The Administrative Data Research Network was an ESRC-funded project that ran from October 2013 to July 2018. It has reached the end of its funding cycle and is no longer taking applications. Administrative data research is being taken forward in a new project, launched at the end of 2018.

Visit the Administrative Data Research Partnership for further information. 

This archival website reflects the state of play at the end of the project in July 2018. All content has been frozen and may not be up to date.

From 2014, ADRC-England organised a series of short courses every year to support members of the research community working, or willing to work, in administrative data research. These courses were designed for researchers in academia, government and the voluntary sector. We had courses to suit all levels of experience, from introductory to advanced. Some of our courses were run in collaboration with other data research centres, such as the ESRC-funded Consumer Data Research Centre and the Farr Institute in London.

Key facts

  • Content: All our courses were strongly applied and most included practical sessions and computer workshops
  • Expertise: Courses were run by experts in their fields, all of whom were experienced in delivering high-quality short courses in a professional environment
  • Duration: Typical short course length was between one and three days
  • Location: Courses were usually held at the University of Southampton and University College London, but we also ran courses in Oxford, Edinburgh, Swansea and Belfast
  • Fees: Our courses were delivered free to ADRN members. We offered discounts for UK-registered postgraduate students, UK academic institutions, Research Councils UK funded researchers, UK public sector staff and staff at UK-registered charity organisations

Please note that the ADRC-E short courses have now concluded.

About the courses

Our courses were highly sought after and often over-subscribed. We therefore always recommended booking a place in advance to avoid disappointment, or joining our waiting list for last-minute cancellations and announcements of new courses.

Over four years, we ran a total of 62 short courses, amounting to 87.5 days of training, and hosted 1,162 professionals from across the UK and other European countries. Attendees came from several Russell Group universities, the ONS, the NHS, local government, national government departments, statistics agencies, professional membership organisations and charities that focus on health policy, social welfare, education and humanitarian aid. (Figures updated to June 2018.)

We often received very positive feedback from our attendees. Here are a few examples from the 2017 courses:

"Very impressed at presenter's knowledge. Variety and pace was well managed"

"Practicals and real life examples were excellent!"

"Nice course. Very good speaker. I am very satisfied!"

"I really enjoyed the sequence of theory and practice throughout the day"

"Excellent course, practically focused"

Other useful links

  • ADRC-E Training Podcasts - Visit this webpage to hear directly from some of the tutors regarding the content of their classes

Courses portfolio

Introduction to Administrative Data 

The course considers what administrative data are, how they differ from survey data, and the potential benefits of using administrative data alongside some of the accompanying challenges. Throughout the course, real cases are used, based on the experience of ONS to date. Whilst the course discusses the benefits to be achieved from linking administrative datasets, it does not go into technical details. The process by which administrative data can be accessed is discussed, as are legal responsibilities under legislation such as the Data Protection Act and the Human Rights Act.

Tutors: Dr Carolyn Watson (ONS), Dr Emma White (University of Southampton). This course ran in 2014 (twice).

Introduction to Data Linkage

Our most popular course. This short course is designed to give participants a practical introduction to data linkage and it is aimed at researchers either intending to use data linkage themselves or to analyse linked data. Examples of the uses of data linkage, data preparation, methods for linkage (including deterministic and probabilistic approaches) and issues for the analysis of linked data are covered. The main focus of this course is health data, although the concepts apply to many other areas. This course includes a practical example involving data to be linked, enabling participants to put theory into practice. 

Tutors: Dr Katie Harron (UCL), Dr James Doidge (UCL). This course ran in 2014 (twice), 2015, 2016 (twice), 2017 (twice) and 2018 (twice). 

Evaluating Linkage Quality for the Analysis of Linked Data

This short course is designed to give participants a practical introduction to handling linked data and evaluating its quality. It is aimed at researchers who want to understand more about how the data linkage process might affect results derived from linked data. We cover processing of linked data, concepts of linkage error and bias, and evaluating how linkage error might affect analysis. The course includes a mixture of lectures and group work that enables participants to put theory into practice.

Tutors: Dr Katie Harron (UCL), Dr James Doidge (UCL). This course ran in 2018 (twice). 

Data Linkage: From Theory to Practice 

The course introduces basic concepts and methods of record linkage and covers methodological and statistical aspects of this newly emerging area. It provides theory and practical applications of deterministic and probabilistic approaches to record linkage, including pre-matching processes, matching weights, types of errors in classification, evaluation of the quality of linkage procedures, implementation of the EM algorithm and an introduction to the analysis of linked datasets. This course is more intensive than 'Introduction to Data Linkage'.

Tutors: Prof Natalie Shlomo (University of Manchester). This course ran in 2014, 2015 and 2016.
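
As an illustration of the matching weights mentioned in the description above, here is a minimal Python sketch of agreement and disagreement weights in the style commonly used for probabilistic linkage. It is not taken from the course materials: the field names and the m- and u-probabilities are hypothetical values chosen only for the example.

```python
import math

# Hypothetical (m, u) probabilities per field, chosen for illustration only:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "postcode": (0.90, 0.02),
}


def match_weight(m: float, u: float, agrees: bool) -> float:
    """Return the log2 weight contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)          # positive when m > u
    return math.log2((1 - m) / (1 - u))  # negative when m > u


def total_weight(agreement: dict) -> float:
    """Sum the field weights for one candidate record pair."""
    return sum(
        match_weight(*FIELDS[field], agreement[field]) for field in FIELDS
    )


# A pair agreeing on every field scores higher than one agreeing on none.
all_agree = total_weight({"surname": True, "birth_year": True, "postcode": True})
none_agree = total_weight({"surname": False, "birth_year": False, "postcode": False})
```

In practice the total weight would be compared against thresholds to classify pairs as links, non-links or candidates for review; the thresholds and the estimation of m and u (for example with the EM algorithm) are outside this sketch.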

Analysis of Linked Datasets 

This 2-day course introduces basic concepts of deterministic and probabilistic approaches to data (record) linkage, including pre-processing requirements, blocking, match weights and types of errors in the classification and evaluation procedures. Methods to compensate for potential linkage errors when fitting standard statistical models to the linked dataset are subsequently presented. These methods assume that linkage errors can be quantified and used to correct for measurement error in the statistical models. Other statistical methods for analysing linked datasets are also presented, such as a multiple imputation approach.

Tutors: Prof Natalie Shlomo (University of Manchester). This course ran in 2014 (twice), 2015 and 2016.

Introduction to Hospital Episode Statistics

This 2-day course provides participants with an understanding of how Hospital Episode Statistics (HES) data are collected and coded, their structure, and how to clean and analyse HES data. A key focus is on developing an understanding of the strengths and weaknesses of HES data, how inconsistencies arise and the approaches that exist to address these inconsistencies. Participants also learn how to ensure individuals’ anonymity and confidentiality when analysing HES data and publishing results based on them. The course consists of a mixture of lectures and practical sessions in which participants use Stata software to clean and analyse HES data.

Tutors: Dr Pia Hardelid (UCL), Dr Linda Wijlaars (UCL). This course ran in 2015, 2016 (twice), 2017 (twice) and 2018.

Image: Dr Pia Hardelid and Dr Linda Wijlaars, taken during the March 2017 class

Introduction to National Pupil Database

This course gives a hands-on introduction to using the National Pupil Database (NPD) to analyse education policy. It provides an overview of all the key pupil background and attainment indicators that are contained in the NPD and an introduction to matching in key survey and administrative data to supplement the database. 

Tutors: Prof Lorraine Dearden, Dr Rebecca Allen, Mr Dave Thomson, Dr Mike Treadaway (UCL). This course ran in 2014, 2015 and 2016.

Quantitative Analysis using the National Pupil Database

This course helps researchers learn more about the structure of the NPD, the types of survey and administrative data that can be merged into the dataset, other data that have been linked to the NPD, and strategies for dealing with measurement error and missingness (absence) in the data.

Tutors: Prof Lorraine Dearden, Dr Rebecca Allen (UCL). This course ran in 2014 and 2015. 

Using Administrative Data to analyse the Impact of Policy Initiatives

The course provides an introduction to non-experimental evaluation methods that can be used with administrative data for identifying the impact of policy initiatives. The course focuses on three methodological approaches used in program evaluation: Difference in Differences, Regression Discontinuity Design and Matching Methods.

Tutors: Prof Lorraine Dearden, Dr Rebecca Allen (UCL). This course ran in 2014 and 2015. 
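
To illustrate the first of the three approaches named above, the following minimal Python sketch computes a simple two-group, two-period Difference in Differences estimate from group means. It is an illustration only, not course material: the function name and the outcome values are hypothetical, and a real evaluation would normally use a regression framework with standard errors.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period Difference in Differences estimate.

    The change in the control group stands in for what would have happened
    to the treated group without the policy (the parallel trends assumption).
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outcome values, invented for the example.
effect = did_estimate(
    treated_before=[10, 12],
    treated_after=[15, 17],
    control_before=[9, 11],
    control_after=[11, 13],
)
```

Here the treated group's mean rises by 5 and the control group's by 2, so the estimated effect of the policy is 3.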

Introduction to Data Visualization

This course provides participants with an introduction to data visualisation. It focuses on making interactive charts and maps using freely available software but also introduces some advanced options to create visuals through coding. The course also uses R to make some charts and maps that go beyond the standard line or scatter plots which academics usually make use of. Different sources of (open) data are used during the course, focussing on health, geographic and weather data. The course is aimed at academics from any discipline wanting to use these techniques either for public engagement or academic publication.

Tutors: Dr Linda Wijlaars (UCL). This course ran in 2016. 

Introduction to QGIS: Understanding and Presenting Spatial Data

This course introduces spatial data and shows you how to import and display it with the free, open-source GIS program QGIS. We show you how to create choropleth maps, explain appropriate methods of visualising spatial data and cover some basic spatial data analysis (e.g. calculating rates). No previous experience of GIS or QGIS is required, but some experience of using spatial data will be beneficial.

Tutors: Dr Nick Bearman (Clear Mapping Co, University of Liverpool). This course first ran in July 2017 and ran twice more in 2018.

Introduction to Spatial Data and Using R (as a GIS)

This course covers an introduction to R, how to load and manage spatial data and how to create maps using R and RStudio. It presents appropriate ways of using classifications for choropleth maps, using loops in R to create multiple maps and some basic spatial analysis. RStudio is used to work with the R environment. By the end of the course attendees should be able to load data into R, represent it effectively and be able to prepare an output quality map.

Tutors: Dr Chris Gale (University of Southampton) - 2016. Dr Nick Bearman (Clear Mapping Co, University of Liverpool) - 2017. This course ran in 2016, in 2017 (twice) and in 2018. 

Image: Dr Nick Bearman, taken during the February 2017 class (photo courtesy of the Consumer Data Research Centre)

Confident Spatial Analysis

This course covers how to prepare and analyse spatial data in RStudio and GeoDa. RStudio is used to perform spatial overlay techniques (such as union, intersection and buffers) to combine different spatial data layers to support a spatial analysis decision. RStudio and GeoDa are also used to explore a range of different spatial analyses, including regression, Moran’s I and clustering. By the end of the course attendees should understand how RStudio manages spatial data and be able to use RStudio for a range of spatial analyses.

Tutors: Dr Nick Bearman (Clear Mapping Co, University of Liverpool). This course ran in 2017 (twice) and in 2018 (twice). 

SQL Database Management Software

Database systems are increasingly being used for working with medical data as they enable the rapid querying of complex data in health and social care. This course introduces the theory behind the relational data model and enables participants to gain an understanding of how data can be modelled and stored in a relational database system, together with the different data types used. Through a series of practical sessions using real-life data, attendees learn how to load existing data into a contemporary relational database management system and how to craft simple and complex queries for analysing the data.

Tutors: Dr Spiros Denaxas (UCL). This course first ran in June 2017 and ran again in 2018.

Using Administrative Data in the Third Sector

This course considers how administrative data can be of use to researchers with an interest in the third sector, charities and social enterprises. It introduces key issues of administrative data, the concept of safe use and the different ways to access data. It explores a number of examples of how third sector researchers and third sector organisations have used administrative data, including for impact evaluation. The course ends by looking to the future, identifying ways to improve access to administrative data and how to design research studies using administrative data. The course is aimed at those conducting or commissioning research related to the third sector, and at those with some research experience of using administrative data who are considering applications in the third sector.

Tutors: Prof Fergus Lyon (Middlesex University), Tracey Gyateng, Dr Prabhat Vaze. This course ran in 2015. 

Combining Data from Multiple Administrative and Survey Sources for Statistical Purposes

In this 2-day course, day one provides a general introduction to combining multiple administrative and survey datasets for statistical purposes. A total-error framework is presented for integrated statistical data, which provides a systematic overview of the origin and nature of the various potential errors. The most typical data configurations are illustrated and the relevant statistical methods reviewed. Day two covers a handful of selected statistical methods. Training is given on the techniques of data fusion, or statistical matching, by which joint statistical data is created from separate marginal observations. The participants are introduced to several imputation or adjustment techniques, in the presence of constraints arising from overlapping data sources.

Tutors: Prof Li-Chun Zhang (University of Southampton). This course ran in 2015 and in 2017 (twice). 

Handling Missing Data in Administrative Studies: Multiple Imputation & Inverse Probability Weighting

The course considers the issues raised by missing data (both item and unit non-response) in studies using routinely collected data, for example electronic health records. Following a review of the issues raised by missing data, the course focuses on two methods of analysis: multiple imputation and inverse probability weighting. Participants also discuss how they can be used together. The relevant concepts are illustrated with medical and social data examples.

Tutors: Prof James Carpenter (LSHTM). This course ran in 2015 and 2017.  
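
As a small illustration of inverse probability weighting, the Python sketch below estimates a mean from incomplete data by weighting each observed value by the inverse of its probability of being observed. It is not drawn from the course: the function name, the data and the response probabilities are hypothetical, and in practice the probabilities would themselves be estimated, for example from a model of non-response.

```python
def ipw_mean(values, response_probs):
    """Normalised inverse-probability-weighted mean of the observed values.

    values         -- outcomes, with None marking a missing value
    response_probs -- assumed probability that each unit's value is observed
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, prob in zip(values, response_probs):
        if value is None:
            continue  # missing units contribute nothing directly
        weight = 1.0 / prob  # units unlikely to respond count for more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical data: the first two units each had a 50% chance of responding,
# and only one of them did, so its value is weighted to stand in for both.
estimate = ipw_mean([1.0, None, 3.0, 3.0], [0.5, 0.5, 1.0, 1.0])
```

With these inputs the unweighted mean of the observed values is about 2.33, while the weighted estimate is 2.0, because the single respondent from the low-response group is counted twice.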

Developing Synthetic Data for Administrative Data Sources

This course focuses on methods for generating synthetic versions of confidential administrative data. The course begins with a review of synthetic data, explaining with illustrations how the general procedure works, with a particular focus on administrative data sources. There is also guidance on how synthetic data can best be used, with strategies for quantifying the risk and utility of such data from both theoretical and practical viewpoints. Details of how this fits in with government data access are also discussed, as are any important differences between dealing with administrative data and traditional survey sample data.

Tutors: Dr Robin Mitra (University of Southampton). This course ran in 2016. 

Generating Synthetic Data for Statistical Disclosure Control

This course provides a detailed overview of the topic, covering all important aspects of the synthetic data approach. Starting with a short introduction to data confidentiality in general and synthetic data in particular, the workshop discusses the different approaches to generating synthetic datasets in detail. Possible modelling strategies and analytical validity evaluations are assessed, and potential measures to quantify the remaining risk of disclosure are presented. Finally, recent extensions of the synthetic data approach are reviewed, and the opportunities and obstacles it presents are discussed.

Tutors: Dr Jörg Drechsler (IAB, Nuremberg). This course ran in 2014 and 2017. 

Page last updated: 31/07/2018