The Administrative Data Research Centre for Scotland provides state-of-the-art facilities to allow secure access to a wide range of de-identified administrative data by accredited researchers. This includes subjects such as housing, transport, welfare, health, social work, older people’s services, education and criminal justice systems.
We also want to discover the research potential of Scotland’s rich historical data collections, including:
- civil registration data (1855-1974), which has recently been digitised
- 1932 and 1947 Scottish Mental Surveys – a mental ability test taken by almost all 11-year-old school children
- Aberdeen Children of the Nineteen Fifties, a cross-sectional study of over 12,000 primary school children born between 1950 and 1956 who took part in the Aberdeen Child Development Survey in 1962
Our expertise can be further illustrated by the following case studies:
Research case studies:
- Research and development of the governance of and public engagement with administrative data: underpinning the ADRC-Scotland and towards a model to help researchers across the UK
- Rolling out the synthpop software tool for producing synthetic versions of sensitive administrative microdata
- New ways of exploring links between educational and developmental outcomes
Linkage case studies:
- National Energy Efficiency Data Framework Unique Property Reference Number (UPRN) Matching (Scotland)
- Development of “read-through” indexes against national research spine
- Linkage of School Pupil Census Education data
See what researchers at ADRC-Scotland are up to in this series of videos:
ADRC-Scotland Case Study 1 - Research and development of the governance of and public engagement with administrative data: underpinning the ADRC-Scotland and towards a model to help researchers across the UK
We have been examining what appears to be a rapidly changing culture of data linkage research and attitudes of the range of stakeholders involved in data linkage research in the social sciences. Our work has included interviews with stakeholders such as data controllers and publics around the release of data, public acceptability of data use and the public benefits of using such data. Engagement with Scottish public authorities in particular has led to a planned workshop in April 2016 investigating the ‘culture of caution’ surrounding the use, sharing and linkage of administrative data in Scotland’s public sector. The workshop will identify key barriers but also facilitate sharing of best practices and the identification of practical tools for overcoming this risk-averse culture in favour of one which is more confident and facilitative of proportionate decision-making.
Publications from the legal work package cover the development of a novel decision-making tool for data controllers and a ‘public interest mandate’. They also assess the potential impact of the forthcoming General Data Protection Regulation on social sciences research in the UK, providing practical analysis from the perspective of research involving administrative data.
ADRC-S staff have been collaborating across disciplines with colleagues in the Farr Institute of Health Informatics across the UK to share learning, developments and best practice. For example, ADRC-Scotland and the Farr Institute have been jointly running fortnightly drop-in sessions for researchers on public engagement since Autumn 2015. ADRC-S and the Farr Institute also jointly hosted a workshop in Summer 2015 on ‘Sharing Data Across Sectors for the Public Good’, funded by a Knowledge Exchange and Impact Grant, which focussed on the concept of interoperability and therefore on what is required to make data linkages work across sectors. We are planning a workshop on social and ethical assessment of social science research data linkage projects, building on work already undertaken looking at socio-technical perspectives of social science research and the value of public engagement to social science research.
Our efforts are helping researchers using the ADRC-Scotland to be aware of how the public voice is integral to their work and to explore how to embed public engagement within their research. Engagement events with different publics have examined the term ‘public benefit’ and how to create meaningful public benefit with public stakeholders. At the core of this is the ADRC-Scotland Public Panel, established in November 2015. The Panel’s next meeting is scheduled to examine criminal records: how they have been used, and how they are planned to be used, by ADRC-Scotland researchers.
The ADRC-S Public Panel’s sphere is adjacent to that of the Farr Scotland Public Panel and the Public Benefit and Privacy Panel for Health and Social Care in Scotland. A major challenge for the ADRC-S and our ADRN colleagues, especially pertinent to one so new as the ADRC-S PP, and still to be worked through, is to understand the scope for such local panels. The challenge includes defining the extent to which they should be consulted beyond the initial remit within which they have been established, and understanding how any new structures for engaging with publics can be put in place across the ADRN that would reinforce, rather than undermine, the strengths of the local structures that have been developed from the ground up.
ADRC-Scotland Case Study 2 - Rolling out the synthpop software tool for producing synthetic versions of sensitive administrative microdata
In working to unlock the potential of administrative datasets, ADRC-S is exploring and implementing some innovative data dissemination strategies that can facilitate data access for research and training purposes. Synthetic microdata, unlike traditional disclosure control measures, provide a high level of privacy protection at relatively low cost in terms of the statistical properties of the original data. Sensitive information is replaced by synthetic values sampled from probability models fitted to the original data. If all values of all variables are generated from the models, the released synthetic dataset contains only completely artificial units and is in principle safe to release for preliminary analysis with approximate results or as a teaching resource.
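The idea of replacing values with draws from fitted models can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the synthpop implementation: the two variables (age and income), the made-up records, the choice of a straight-line model with normal noise, and the function names `fit_line` and `synthesise` are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return a, b, sd


def synthesise(ages, incomes, n_out, seed=0):
    """Generate n_out artificial (age, income) rows, one variable at a time."""
    rng = random.Random(seed)
    a, b, sd = fit_line(ages, incomes)  # model for income given age
    rows = []
    for _ in range(n_out):
        # Assumption: age is drawn by resampling the observed ages.
        age = rng.choice(ages)
        # Income is drawn from the fitted model, so no real income is released.
        income = rng.gauss(a + b * age, sd)
        rows.append((age, round(income, 2)))
    return rows


# Made-up "original" records, for illustration only.
original_ages = [23, 35, 41, 52, 29, 60, 47, 38]
original_incomes = [18000.0, 27000.0, 31000.0, 36000.0,
                    22000.0, 39000.0, 33500.0, 28000.0]
synthetic_rows = synthesise(original_ages, original_incomes, n_out=5)
```

Each synthetic row pairs a resampled age with a model-generated income, so the age and income relationship is roughly preserved while no original income value is copied across.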
Synthetic data are an attractive way to expand the use of confidential microdata, but their use has been limited by the complexity of the production process. The ADRC-S has led the development of the R synthpop package for the generation, evaluation and analysis of synthetic data, which is freely available from the Comprehensive R Archive Network (CRAN). Recent work has focused on evaluating the utility of synthetic data produced using various synthesising models, with the overall aim of developing general guidelines for best practice.
The importance of open non-disclosive datasets has been recognised by an increasing number of data holders. The organisations that have already expressed interest in synthetic data and the synthpop package include Information Services Division Scotland (ISD), Usher Institute of Population Health Sciences and Informatics, Innovate UK and National Records of Scotland (NRS). The potential cooperation with a variety of stakeholders offers helpful guidance on further development needs of the synthpop package.
Our work has progressed in close collaboration with Penn State University, which seconded a student on a long studentship to work at 9 Edinburgh Bioquarter, and it has been increasingly recognised internationally through participation in a series of overseas conferences and workshops.
ADRC-Scotland Case Study 3 - New ways of exploring links between educational and developmental outcomes
ADRC-S has been pioneering the use of administrative data to explore the links between educational and developmental outcomes. One strand examines the extent to which the relationship between family socioeconomic position and educational attainment is moderated by being born small for gestational age and by child development. A parallel strand examines variation in child development by birthweight and by family socioeconomic position.
This is a critical issue for policy affecting children’s educational and wellbeing outcomes. It has been recognised that low birthweight and socio-economic status affect cognitive development, and this has previously been investigated using the Scottish Mental Survey 1932, the Growing Up in Scotland (GUS) longitudinal survey, and the Aberdeen Children of the 1950s study. We are taking forward previous research in Scotland and the UK more widely as a showcase of the type of question that can be fruitfully addressed in the ADRN. Scotland is particularly fertile in offering datasets that enable targeting of this topic, and we have linked Scottish datasets with the Scottish Longitudinal Study. This work has been part of a wider collaboration within the Scottish Health Informatics Programme (SHIP) Research Programme on Demographic, Socio-Economic and Environmental Data Linkage.
Our research to date has identified that a small part of the difference in educational attainment by family socioeconomic position is explained by the different birth outcomes of advantaged and disadvantaged children. Only a very small part of the difference in educational attainment by family socioeconomic position is ‘explained away’ by differences in child development. This is because the child development measures available indicate quite serious conditions which affect only a very small proportion of children. It was also identified that these child development measures were strongly associated with gestational-age-specific birthweight, and that fine motor, social and hearing abnormalities were much more likely for children with parents in lower grade occupations or with parents who were long-term unemployed.
These associations were net of other indicators of family socioeconomic position and birth outcomes, suggesting that infants from disadvantaged backgrounds remained more likely to experience these conditions beyond the differences that could be explained by other relevant factors (such as their birth weight). These findings confirm and support previous analysis through the use of administrative data.
In parallel with this work, colleagues across ADRC-S are at the exciting stage of gauging the responses from the 300 members of the Aberdeen Children of the 1950s cohort who regathered on 20 February 2016 to hear about ideas for future research and to help interpret contextual detail, describing the important influences on social circumstance for families growing up and living in Aberdeen through a time of rapid economic, cultural and technological transformation. It is anticipated that the findings of this work with the cohort will help direct plans for future research in the area of educational and developmental outcomes with particular interest in the potential impact of these childhood factors on aspects of later life.
The National Records of Scotland (NRS) Indexing Team are the trusted third party for ADRC-S. They will be applying and further developing the techniques listed below, in order to ensure robust linkage of education data to the research population spine. “Read-through” indexes will also be created to facilitate quick turnaround for approved research projects requiring linked pupil census data.
1) National Energy Efficiency Data Framework Unique Property Reference Number (UPRN) Matching (Scotland)
The data for this project were provided by the Scottish Government but sourced from several organisations: Scottish Assessors, the Energy Saving Trust, the UK Department of Energy and Climate Change (DECC) and Experian. The project required matching each of these datasets to a UPRN via the NRS Address Register, which combines address information from local authorities, the One Scotland Gazetteer and the Royal Mail Postcode Address File. This exercise was undertaken by the NRS Indexing Team (trusted third party for ADRC-S) in collaboration with the NRS Geography Team.
The project is critical to the development of the National Energy Efficiency Data-Framework (NEED) for Scotland. Analysis from NEED looks at how energy consumption varies for different types of properties and households and enables analysis of the impact of installing retrofit energy efficiency measures on household energy consumption.
DECC currently produces NEED analysis for England and Wales, and the Scottish Government (SG) would look to extend the use of these data to include Scotland, while also holding a dataset for Scotland consistent with the analysis currently carried out by DECC.
Using combinations of string manipulation and exact matching techniques, approximately 95% of Home Energy Efficiency Database address records from the Energy Saving Trust, and 98% of address records on household characteristics from Experian, could be matched to a UPRN on the NRS Address Register.
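As a rough illustration of what combining string manipulation with exact matching can look like, the Python sketch below normalises address strings and then looks them up in a register. It is a simplified assumption-laden example, not the NRS procedure: the abbreviation table, the addresses, the UPRN values and the function names `normalise` and `match_to_uprn` are all invented for illustration.

```python
import re

# Hypothetical abbreviation table; a real exercise would need a far longer
# one and care with ambiguous tokens (for example "ST" can also mean "Saint").
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE", "FL": "FLAT"}


def normalise(address):
    """Upper-case, strip punctuation, collapse spaces, expand abbreviations."""
    text = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def match_to_uprn(records, register):
    """Exact-match normalised addresses against a register of (address, UPRN)."""
    lookup = {normalise(addr): uprn for addr, uprn in register}
    matched = {}
    for addr in records:
        uprn = lookup.get(normalise(addr))
        if uprn is not None:
            matched[addr] = uprn
    return matched, len(matched) / len(records)


# Invented register entries and source records.
register = [
    ("12 High Street, Edinburgh EH1 1AA", "000000000001"),
    ("Flat 2, 5 Mill Road, Perth PH1 5AB", "000000000002"),
]
records = [
    "12 high st., edinburgh, EH1 1AA",
    "FL 2 5 Mill Rd Perth PH1 5AB",
    "99 Unknown Lane, Nowhere",
]
matched, match_rate = match_to_uprn(records, register)
```

Here two of the three invented records match after normalisation; the third has no counterpart in the register and stays unmatched rather than being forced onto a near neighbour.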
The learning and outcomes for this exercise could potentially be applied to ADRC-S projects which require an element of household-level matching.
2) Development of “read-through” indexes against national research spine
The NRS Indexing Team, the trusted third party for ADRC-S, is in the process of drafting a series of memoranda of understanding (MoUs) with various administrative data controllers in order to establish safer and more efficient linkages of datasets under these agreements. Under each MoU, the data controller asks NRS to process their dataset by linking it to the national research spine and creating anonymised “read-through” index keys, which the data controller will hold at the person level on their own dataset. NRS will maintain a look-up of the “read-through” keys against the spine. This means that, for approved research studies involving data which have already been indexed, the data controller just needs to send the read-through keys (without any other personal identifying information) for the people who make up the study cohort to the indexing team, which then generates study-specific index keys in the usual manner. It can also allow the data provider to receive the “read-through” keys and study-specific index numbers from the indexing team when the research cohort originates from a dataset held by a different data controller. This considerably cuts down on the amount of personal identifying information which regularly has to be transferred, particularly for datasets which are frequently requested in research projects. At the moment, MoUs for health, education and the census are very close to sign-off.
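The flow of keys described above can be sketched as a toy Python model. Everything here is an assumption made for illustration: the class `IndexingService`, the spine IDs, the use of random tokens for read-through keys, and the use of a keyed hash (HMAC) to derive study-specific keys are hypothetical and do not describe how NRS generates its keys.

```python
import hashlib
import hmac
import secrets


class IndexingService:
    """Toy stand-in for a trusted third party holding the spine look-up."""

    def __init__(self):
        self._spine_by_key = {}   # read-through key -> spine ID
        self._key_by_person = {}  # (controller, spine ID) -> read-through key

    def index_dataset(self, controller, spine_ids):
        """Return one read-through key per person, specific to this controller.

        Assumes the records have already been linked to spine IDs.
        """
        keys = []
        for spine_id in spine_ids:
            person = (controller, spine_id)
            if person not in self._key_by_person:
                key = secrets.token_hex(8)  # random, carries no identifiers
                self._key_by_person[person] = key
                self._spine_by_key[key] = spine_id
            keys.append(self._key_by_person[person])
        return keys

    def study_keys(self, study_secret, read_through_keys):
        """Map read-through keys to study-specific keys via the spine."""
        result = {}
        for key in read_through_keys:
            spine_id = self._spine_by_key[key]
            digest = hmac.new(study_secret, spine_id.encode(), hashlib.sha256)
            result[key] = digest.hexdigest()[:16]
        return result


service = IndexingService()
health_keys = service.index_dataset("health", ["S001", "S002"])
education_keys = service.index_dataset("education", ["S002", "S003"])
study = service.study_keys(b"hypothetical-study-secret",
                           health_keys + education_keys)
```

In this sketch the person with spine ID `S002` holds different read-through keys in the health and education datasets, yet receives the same study-specific key, so the two extracts can be joined for the study without names or addresses changing hands.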
3) Linkage of School Pupil Census Education data
It has already been demonstrated (see http://www.isdscotland.org/Products-and-Services/eDRIS/Docs/20150421-Linking-ScotXed-Data.pdf ) that it is possible to link, in the absence of name information, Scottish national pupil census information to a population database with high levels of precision and sensitivity. SILC partners in Farr Institute Scotland found that unique exact matches to the Community Health Index, using only date of birth, gender and postcode, yield a linkage rate of more than 93% with 99.9% precision. Incorporating probabilistic techniques and allowing less precise links can boost the linkage rate for a single pupil census year to nearly 98.5%, while overall precision remains above 98%.
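A minimal Python sketch of unique exact matching on date of birth, gender and postcode, together with the two reported measures, is given below. The records, field names and the function names `link_unique_exact`, `linkage_rate` and `precision` are invented for illustration, and the sketch does not reproduce the probabilistic stage or the actual SILC method.

```python
from collections import defaultdict


def link_unique_exact(pupils, population):
    """Link a pupil only when exactly one population record shares the same
    (date of birth, gender, postcode); ambiguous keys are left unlinked."""
    candidates = defaultdict(list)
    for person in population:
        key = (person["dob"], person["gender"], person["postcode"])
        candidates[key].append(person["id"])
    links = {}
    for pupil in pupils:
        key = (pupil["dob"], pupil["gender"], pupil["postcode"])
        hits = candidates.get(key, [])
        if len(hits) == 1:
            links[pupil["pupil_id"]] = hits[0]
    return links


def linkage_rate(links, pupils):
    """Share of pupil records that received a link."""
    return len(links) / len(pupils)


def precision(links, truth):
    """Share of links that point at the correct person (needs known truth)."""
    correct = sum(1 for pid, pop_id in links.items() if truth.get(pid) == pop_id)
    return correct / len(links)


# Invented records: C1 and C2 are twins at one address, so their key is ambiguous.
population = [
    {"id": "C1", "dob": "2005-03-14", "gender": "F", "postcode": "EH1 1AA"},
    {"id": "C2", "dob": "2005-03-14", "gender": "F", "postcode": "EH1 1AA"},
    {"id": "C3", "dob": "2006-07-01", "gender": "M", "postcode": "G12 8QQ"},
    {"id": "C4", "dob": "2004-11-30", "gender": "M", "postcode": "AB10 1XG"},
]
pupils = [
    {"pupil_id": "P1", "dob": "2005-03-14", "gender": "F", "postcode": "EH1 1AA"},
    {"pupil_id": "P2", "dob": "2006-07-01", "gender": "M", "postcode": "G12 8QQ"},
    {"pupil_id": "P3", "dob": "2004-11-30", "gender": "M", "postcode": "AB10 1XG"},
    {"pupil_id": "P4", "dob": "2005-09-09", "gender": "F", "postcode": "KY1 1AA"},
]
truth = {"P1": "C1", "P2": "C3", "P3": "C4", "P4": "C7"}

links = link_unique_exact(pupils, population)
rate = linkage_rate(links, pupils)
prec = precision(links, truth)
```

Requiring a unique match trades linkage rate for precision: the twin and the pupil with no counterpart stay unlinked, which is the kind of gap that probabilistic techniques and less strict links aim to close.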