Evaluating non-consent and non-linkage biases in linked datasets
Linked datasets are increasingly used in research, but a barrier to their use is limited knowledge of possible non-linkage biases in dataset estimates. Such biases arise when not all subjects of interest can be linked across datasets and linked and non-linked subjects differ. This project seeks to develop a methodological framework for evaluating these biases that is of use to both dataset producers and users. Central to the work is that the methods will use information on all subjects of interest from the sample dataset to evaluate biases. This overcomes the problem that, for covariates from secondary datasets linked to the sample, information is available only for linked subjects.
The methodological framework for evaluating non-linkage biases that this research develops and publicises will be useful both to producers of linked datasets and to researchers who use them. Dataset producers can use it to inform adaptations to production methods that reduce biases and improve dataset quality. Researchers can use it to quantify (remaining) biases so that their impact on substantive findings can be considered.
Datasets and names of Government departments
Data from the UK Small Business Survey, accessed through the UK Data Service (UKDS) Secure Data Service (SDS).
More about the project
The wide-ranging benefits of record linkage are increasingly recognised. However, a constraint on the use of linked datasets is limited knowledge of non-linkage biases in dataset estimates. These can arise when not all sample subjects are linked (a common occurrence) and linked and non-linked subjects differ. Empirical work suggests that such biases, and their impact on inference, can be substantial. In practice, however, they are generally ignored, because a lack of information on non-linked subjects makes them difficult to evaluate, especially when the outcomes of interest come from secondary dataset(s) linked to a sample.
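To make the mechanism concrete, the following is a minimal sketch, assuming a covariate that is recorded in the sample dataset for every sample member (so its full-sample mean is known) and ignoring survey design weights. The function name `nonlinkage_bias` is hypothetical. It computes the bias in the mean of that covariate among linked members and checks it against the standard identity bias = (1 - linkage rate) × (linked mean - non-linked mean), which shows that the bias vanishes only if everyone links or linked and non-linked members do not differ.

```python
from statistics import mean


def nonlinkage_bias(values, linked):
    """Non-linkage bias in the mean of a covariate observed for ALL sample members.

    values: covariate values taken from the sample dataset (known for everyone).
    linked: True if the sample member was linked to the secondary dataset.
    Returns (bias, decomposed), which should agree up to rounding.
    Unweighted: survey design weights are ignored in this sketch.
    """
    linked_vals = [v for v, is_linked in zip(values, linked) if is_linked]
    nonlinked_vals = [v for v, is_linked in zip(values, linked) if not is_linked]
    if not linked_vals:
        raise ValueError("at least one sample member must be linked")

    # Direct estimate: linked-subset mean minus full-sample mean.
    bias = mean(linked_vals) - mean(values)

    # Same quantity written as (1 - linkage rate) * (linked mean - non-linked mean):
    # it is zero if everyone links or if linked and non-linked members do not differ.
    if not nonlinked_vals:
        return bias, 0.0
    link_rate = len(linked_vals) / len(values)
    decomposed = (1 - link_rate) * (mean(linked_vals) - mean(nonlinked_vals))
    return bias, decomposed
```

For example, with ages `[30, 40, 50, 60]` and only the oldest member unlinked, `nonlinkage_bias([30, 40, 50, 60], [True, True, True, False])` returns a bias of -5 by both routes: the linked subset understates the mean age by five years.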
In this project, our objective is to provide a framework for evaluating non-linkage biases using only the attribute information on sample members that is available in the sample dataset, building on methods originally developed to evaluate survey non-response biases. Beyond evaluating overall non-linkage biases, within this framework we will also consider how such biases arise at different stages of the linkage process: first subject consent, and then the linkage of subject records itself (previous studies, where they have addressed these biases at all, have tended to focus on only one of these stages). We will also consider interviewer effects: impacts of interviewers on subject consent propensities have been reported, but possible confounding with geographic (area) effects, and the impacts on biases, have not been investigated. Finally, we will conduct similar research to evaluate analogous biases in biosocial datasets (in which subjects must additionally consent to biological data collection), which have so far received minimal attention in the literature.
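The survey non-response literature often summarises how much response propensities vary across sample members, estimated from covariates known for everyone; here the "response" would be consent or linkage. As an assumed illustration only (the project description does not say which measures it will use), the sketch below computes two such summaries: the R-indicator, R = 1 - 2S(rho), and the coefficient of variation, CV = S(rho) / mean(rho), where S is the standard deviation of the estimated propensities rho. Propensities are estimated here as the outcome rate within categories of a single categorical covariate, which is a simplifying assumption; the helper names `cell_propensities` and `propensity_indicators` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cell_propensities(groups, outcome):
    """Propensity for each member = outcome rate within that member's group.

    This is a saturated model for one categorical covariate; in practice a
    logistic regression on several sample-dataset covariates would typically be used.
    """
    tallies = defaultdict(lambda: [0, 0])  # group -> [successes, members]
    for g, y in zip(groups, outcome):
        tallies[g][0] += int(bool(y))
        tallies[g][1] += 1
    return [tallies[g][0] / tallies[g][1] for g in groups]


def propensity_indicators(groups, outcome):
    """R-indicator and coefficient of variation of estimated propensities (unweighted)."""
    rho = cell_propensities(groups, outcome)
    s = pstdev(rho)  # population SD over sample members
    rho_bar = mean(rho)
    if rho_bar == 0:
        raise ValueError("no sample member had the outcome")
    return {"R": 1 - 2 * s, "CV": s / rho_bar}
```

With two groups where everyone in group A consents but only half of group B does, `propensity_indicators(["A", "A", "B", "B"], [True, True, True, False])` gives R = 0.5 and CV of about 0.33. Uniform propensities would give R = 1 and CV = 0, i.e. no evidence from this covariate that consenters differ from non-consenters.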
Specifically, to ensure that our work reaches the widest possible audience, we will conduct the above research in a series of sub-projects considering datasets on different topics: 1) developing methods for evaluating non-consent biases in the context of a health survey; 2) using similar methods to study business and social surveys in which non-linkage biases result both from differences between consenting and non-consenting subjects and from differences between linkable and non-linkable consenters; 3) investigating how interviewer effects on consent propensities affect non-linkage (non-consent) biases in a social survey; 4) evaluating biases in biosocial datasets.
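When non-linkage arises in two stages (consent, then linkage among consenters), the overall bias in a sample-dataset covariate mean splits exactly into a consent-stage part and a linkage-stage part. The sketch below is a minimal, unweighted illustration of that split; the function `stage_biases` and its arguments are hypothetical, and it assumes the covariate is known for every sample member from the sample dataset.

```python
from statistics import mean


def stage_biases(values, consented, linked):
    """Split the bias in a sample-dataset covariate mean into two stages.

    values:    covariate values for ALL sample members (from the sample dataset).
    consented: True if the member consented to linkage.
    linked:    True if the member's records were linked (only counted for consenters).
    Returns (consent_bias, linkage_bias, overall_bias); the first two sum to the third.
    """
    consenter_vals = [v for v, c in zip(values, consented) if c]
    linked_vals = [v for v, c, l in zip(values, consented, linked) if c and l]
    if not linked_vals:
        raise ValueError("at least one consenter must be linked")

    full_mean = mean(values)
    consenter_mean = mean(consenter_vals)
    linked_mean = mean(linked_vals)

    consent_bias = consenter_mean - full_mean    # consenters vs whole sample
    linkage_bias = linked_mean - consenter_mean  # linked vs all consenters
    overall_bias = linked_mean - full_mean
    return consent_bias, linkage_bias, overall_bias
```

For instance, `stage_biases([30, 40, 50, 60], [True, True, True, False], [True, True, False, False])` returns `(-5, -5, -10)`: half of the understatement of the mean comes from who consents and half from which consenters' records can be linked.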
The research conducted will inform practice concerning both the creation of linked datasets and their use in research. We strongly anticipate that the outputs will be of interest to government and other agencies that use linked datasets.
Dr Jamie C. Moore, ADRC-E and University of Southampton
Prof Gabriele Durrant, ADRC-E and University of Southampton
Prof Peter W.F. Smith, ADRC-E and University of Southampton