Evaluating non-consent and non-linkage biases in linked datasets
Linked datasets are increasingly used in research, but limited knowledge about possible non-linkage biases in dataset estimates is a barrier to their use. Such biases arise when not all subjects of interest can be linked across datasets and linked and non-linked subjects differ. This project seeks to develop a methodological framework for evaluating such biases that is of use to both dataset producers and users. Central to the work is that the methods will use information on all subjects of interest from the sample dataset to evaluate biases, thus overcoming the issue that, for covariates from secondary datasets linked to the sample, information is available only on linked subjects.
A methodological framework for evaluating non-linkage biases will be of use to both producers of and researchers using linked datasets. It can be used by dataset producers to inform adaptations to production methods - to reduce biases and improve dataset quality - and by researchers to quantify (remaining) biases so that their impact on substantive findings from datasets can be considered.
Datasets and names of the Government departments
Data from the UK Small Business Survey, accessed through the UK Data Service (UKDS) Secure Data Service (SDS).
More about the project
The wide-ranging benefits of record linkage are increasingly recognised. A constraint on the use of linked datasets, though, is limited knowledge concerning non-linkage biases in dataset estimates. These can arise when not all sample subjects are linked (a common occurrence) and linked and non-linked subjects differ. Empirical work suggests that such biases and their impact on inference can be substantial. In practice, however, they are generally ignored because evaluation is made difficult by a lack of information on non-linked subjects, especially when outcomes of interest come from secondary dataset(s) linked to a sample.
In this project, we will investigate the impacts of two components of non-linkage bias on datasets: subject consent to linkage and unique identifier appendability. We will quantify these impacts using sample member attribute information from the sample dataset alone, building on methods first derived to evaluate survey non-response biases. The research will inform practice concerning both linked dataset creation and the use of linked datasets in research. We anticipate that the outputs will be of interest to government and other agencies that use linked datasets.
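As an illustration only, the sketch below shows one way that sample-only information could be used to gauge the risk of non-linkage bias. It computes an R-indicator, a representativeness measure from the survey non-response literature, from linkage propensities estimated within categories of a sample covariate. Whether the project uses this particular measure is an assumption, and the field names `size` and `linked` and the toy records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def linkage_propensities(records, key):
    """Estimate the proportion linked within each category of a sample covariate."""
    counts = defaultdict(lambda: [0, 0])  # category -> [number linked, total]
    for rec in records:
        if rec["linked"]:
            counts[rec[key]][0] += 1
        counts[rec[key]][1] += 1
    return {cat: linked / total for cat, (linked, total) in counts.items()}


def r_indicator(records, key):
    """R = 1 - 2 * S(rho), where S is the standard deviation of the
    estimated linkage propensities across all sample subjects.
    R = 1 means every subject has the same estimated propensity."""
    props = linkage_propensities(records, key)
    rho = [props[rec[key]] for rec in records]
    n = len(rho)
    mean = sum(rho) / n
    variance = sum((p - mean) ** 2 for p in rho) / (n - 1)
    return 1 - 2 * sqrt(variance)


# Hypothetical toy sample: businesses by size band, with a linkage flag.
sample = (
    [{"size": "small", "linked": True}] * 30
    + [{"size": "small", "linked": False}] * 20
    + [{"size": "large", "linked": True}] * 45
    + [{"size": "large", "linked": False}] * 5
)

props = linkage_propensities(sample, "size")
r_value = r_indicator(sample, "size")
print(props)
print(round(r_value, 3))
```

In this toy sample the small businesses are linked less often than the large ones, so the indicator falls below 1. A lower value signals that linked subjects are less representative of the full sample with respect to the chosen covariate, and therefore a greater risk of bias in estimates that depend on it.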
Dr Jamie C. Moore, ADRC-E and University of Southampton
Prof Gabriele Durrant, ADRC-E and University of Southampton
Prof Peter W.F. Smith, ADRC-E and University of Southampton
Further information and links
- Correlates of record linkage and estimating risks of non-linkage biases in business data sets – Journal of the Royal Statistical Society: Series A (Statistics in Society)