Combining survey data, paradata and administrative data for non-response investigation
Quality data and saving money
People are less keen than ever to take part in social surveys these days. They may simply decline to be involved in the first place or they may drop out after taking part once or more. For large-scale household and cohort studies looking to collect data not just once but repeatedly over time from thousands of people, this is a major logistical, practical and financial headache. But research linking three major surveys run by the Office for National Statistics (ONS) with Census information has come up with new ways to overcome these problems and achieve good quality data whilst making substantial financial savings. It could also help secure the future of the UK’s rich and important data sets.
How the research helps
There is increasing pressure on those running large-scale surveys to improve data quality whilst reducing costs. Every individual who declines to take part or drops out of a survey represents both a financial cost, in the time and effort spent trying to contact them, and a blow to data quality, in the bias caused by their non-participation.
Traditionally, the approach to this problem has been to try to find new and better ways of maximising response rates, but this research points to the potential benefits of an alternative approach. This focuses on monitoring subgroups of the survey sample to ensure they are properly represented, and points to major potential savings from a more flexible and adaptive approach during the data collection process.
The research shows that when repeatedly attempting to contact a survey participant:
- implementing a cut-off point (between the sixth and eighth call) makes little or no difference to data quality
- implementing that cut-off point can make savings of between 7 and 15 per cent, depending on the survey
It also shows that:
- it could be profitable for the commissioners of social surveys to reconsider current attitudes towards tackling non-response
- instead of aiming for an overall response rate at the end of data collection, time and money could potentially be spent more wisely by, after a certain point, selectively targeting the subgroups least likely to respond to the survey
- this approach could result in better quality data even if response rates are not improved
- technology could be used to employ this approach in real time by prioritising people for extra efforts to obtain their responses.
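The prioritisation idea above can be sketched in code. The sketch below is purely illustrative: the subgroup names, call counts and response figures are invented, not taken from the ONS surveys. After a round of calls, subgroups whose response rate lags the overall rate are flagged for extra contact effort.

```python
# Illustrative sketch of adaptive fieldwork prioritisation (hypothetical data).
# After a round of calls, subgroups responding below the overall rate are
# ordered lowest-first so remaining effort can be redirected towards them.

def prioritise_subgroups(counts):
    """counts maps subgroup -> (responded, sampled).
    Returns (lagging subgroups, per-subgroup rates, overall rate)."""
    rates = {g: r / n for g, (r, n) in counts.items()}
    overall = (sum(r for r, _ in counts.values())
               / sum(n for _, n in counts.values()))
    # Subgroups below the overall response rate, worst first.
    lagging = sorted((g for g, rate in rates.items() if rate < overall),
                     key=lambda g: rates[g])
    return lagging, rates, overall

# Hypothetical call outcomes after six attempts, by (invented) subgroup.
counts = {
    "urban renters": (120, 400),    # 30% responded so far
    "rural owners": (300, 500),     # 60% responded so far
    "young singles": (80, 320),     # 25% responded so far
}
priority, rates, overall = prioritise_subgroups(counts)
print(priority)  # → ['young singles', 'urban renters']
```

In a real adaptive design the response rates would be replaced by response propensities estimated from auxiliary covariates, and the prioritisation would be re-run after each call attempt.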
What has been achieved?
The research is the result of a positive and productive working partnership with ONS which has made the data available.
The findings have attracted considerable interest, especially from within ONS, which is looking to make substantial savings on its survey budget. ONS has already reduced the maximum number of calls made to a survey participant from 20 to 13, and is conducting tests to confirm that a further reduction to between six and eight calls does not affect survey data quality. There is also interest among the wider research community, especially those working on and with large-scale social surveys.
Findings from the project have also been presented to the Treasury, the Government Economic Service and Government Social Research teams, members of the UK’s Household Longitudinal Study (Understanding Society) based at the University of Essex, the Science and Engineering South research workshop 'The Data Dialogue, Time to Share: Navigating Boundaries and Benefits' at the University of Cambridge, the Royal Statistical Society, the World Statistics Congress in Brazil and an international workshop in Norway.
The research uses survey data collected in three ONS household surveys:
- The Labour Force Survey (LFS), providing official measures of employment and unemployment - the largest household survey in the UK
- The Life Opportunities Survey (LOS), providing data on how disabled and non-disabled people participate in society
- The Opinions and Lifestyle Survey (OPN), providing quick and reliable information about topics of immediate interest
Those surveys are then linked to the UK Census, which provides a detailed snapshot of the population and its characteristics, and underpins funding for public services.
The research also made use of paradata: records kept by survey interviewers of the number of attempts they had made to contact each survey participant.
ONS made the linked data set available to the research team via the Virtual Microdata Laboratory (VML) secure research environment. During the first wave of data collection, the representativeness of the dataset was checked after each call made to a participant.
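One standard way to quantify representativeness in this literature is the R-indicator, R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of estimated response propensities: a value of 1 means every sample member is equally likely to respond. The sketch below is a simplified illustration only; the propensity values are invented and given directly, whereas the study estimated propensities from auxiliary covariates in the linked Census data.

```python
import math

def r_indicator(propensities):
    """R-indicator: R = 1 - 2 * sd(propensities).
    1.0 means a perfectly representative response; lower values mean
    some sample members are much less likely to respond than others."""
    n = len(propensities)
    mean = sum(propensities) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in propensities) / n)
    return 1 - 2 * sd

# Hypothetical estimated response propensities after successive call attempts.
after_call_4 = [0.2, 0.4, 0.6, 0.8]    # propensities vary widely
after_call_8 = [0.45, 0.5, 0.55, 0.6]  # extra calls narrowed the spread

print(round(r_indicator(after_call_4), 3))  # → 0.553
print(round(r_indicator(after_call_8), 3))  # → 0.888
```

Tracking this indicator call by call, as the study did, shows whether additional contact attempts are actually improving the balance of the achieved sample or merely adding more of the same kinds of respondents.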
Gabrielle Durrant, ADRC-E and University of Southampton
Jamie Moore, ADRC-E and University of Southampton
Peter W.F. Smith, ADRC-E and University of Southampton
Further information and links
- Dataset representativeness during data collection in three UK social surveys: generalizability and the effects of auxiliary covariate choice – Journal of the Royal Statistical Society: Series A (Statistics in Society)
- Fieldwork effort, response rate, and the distribution of survey outcomes: A multilevel meta-analysis – Public Opinion Quarterly
- Using linked survey-census data to monitor survey non-response – YouTube video