Cheltenham Science Festival 2017
ADRC-England was at the Cheltenham Science Festival 2017 (#CheltSciFest) on Friday 9 and Saturday 10 June. We were part of the University of Southampton initiative 'Bringing Research to Life' Roadshow. Silvia and Mirela, with the help of Steve (Public Engagement with Research unit, University of Southampton) on Friday evening, talked to 868 people at our ADRC-E stand at the Discovery Zone.
Images: The Discovery Zone and Mirela at Cheltenham Science Festival 2017
'Better Candies Benefit Society': We counted all buttons (our 'Census' data) in the 4 jars (Data Linkage Platform) corresponding to the 4 types of candies provided by the 'Ministry of Candies'. 868 people took part in our experiment: 303 girls < 15 years old (34.9%), 222 boys < 15 years old (25.6%), 212 women ≥ 15 years old (24.4%) and 131 men ≥ 15 years old (15.1%). After careful consideration of the 'Published Results', the 'Candy Council' for CheltSciFest 2017 declared that
Images: Better Candies Benefit Society results at CheltSciFest 2017
We gathered all tweets about the event in a Twitter Moment on our @ADRC_E account.
We also had a few questions left in our ADRC-E 'Book of Unknown Data Research Questions'. Here are the answers provided by our scientists:
- What size are the datasets?
The size of the datasets held by ADRC-E is determined on a project-by-project basis so that we never hold more data than is actually required for analysis. The records in an administrative dataset will usually relate to individual people, households or events, and may therefore vary from a few thousand individuals in a survey to many millions of households or events such as births. The dataset size will also be affected by the number of pieces of information brought together from the original datasets that may be linked for a particular study. ADRC-E datasets are designed to contain only the information needed to do the proposed analysis, which in turn has a bearing on how many variables or separate pieces of information are required to complete the project. A linked dataset will typically contain the matched records, that is, only the people who appear in each linked dataset. The validity of the conclusions is not necessarily related to the size of the dataset. Many things need to be considered when producing a dataset to explore a research question, for example, the level of geography, the population being studied, the available data, and the characteristics, rare or common, that the researcher is interested in learning more about. The study design, which considers all of these issues, will lead to a decision about a suitable range of records to meet the researcher’s needs.
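The idea that a linked dataset keeps only the matched records, and only the variables a project needs, can be sketched in a few lines of Python. This is an illustration only and not a description of ADRC-E's actual process: the function name, the dataset names, the pseudonymised IDs and every value below are invented for the example.

```python
def link_matched_records(datasets, variables_needed):
    """Keep only IDs present in every dataset, and only the requested variables.

    datasets: {dataset_name: {person_id: {variable: value}}}
    variables_needed: {dataset_name: [variable, ...]}
    """
    # People who appear in each dataset (the "matched records").
    common_ids = set.intersection(*(set(records) for records in datasets.values()))
    linked = {}
    for person_id in sorted(common_ids):
        row = {}
        for name, records in datasets.items():
            # Copy across only the variables this project asked for.
            for variable in variables_needed.get(name, []):
                row[variable] = records[person_id][variable]
        linked[person_id] = row
    return linked


# Invented toy data keyed by a hypothetical pseudonymised ID.
education = {
    "p01": {"school_type": "primary", "free_meals": True},
    "p02": {"school_type": "secondary", "free_meals": False},
    "p03": {"school_type": "secondary", "free_meals": True},
}
health = {
    "p02": {"admissions": 1, "gp_visits": 4},
    "p03": {"admissions": 0, "gp_visits": 2},
    "p04": {"admissions": 2, "gp_visits": 7},
}

linked = link_matched_records(
    {"education": education, "health": health},
    {"education": ["school_type"], "health": ["admissions"]},
)
print(linked)
```

In this toy example, p01 and p04 drop out because each appears in only one source, and `free_meals` and `gp_visits` are left behind because the hypothetical project did not request them.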
- How big a dataset is needed for conclusions to be valid?
This is quite a technical question. Researchers use statistical formulae to work out the size of a sample necessary to understand characteristics of a population with a given level of confidence. Administrative data generated by government departments are often neither statistically controlled samples nor complete records of the population. The exact confidence that can be placed in the conclusions from administrative data is therefore complex, and one of our areas of research! You can find out more about samples and populations online by looking up "sample size estimation".
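As a rough illustration of the kind of formula involved, the sketch below uses a standard textbook calculation (often attributed to Cochran) for estimating a proportion from a simple random sample, with an optional correction for a finite population. It is not an ADRC-E method, and it assumes a properly randomised sample, which, as noted above, administrative data often are not. The function and parameter names are our own.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95,
                               expected_proportion=0.5, population_size=None):
    """Sample size needed to estimate a proportion from a simple random sample."""
    # Two-sided critical value of the standard normal distribution,
    # e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion  # 0.5 is the most cautious (largest sample) choice
    n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction: small populations need fewer records.
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)


print(sample_size_for_proportion(0.05))
print(sample_size_for_proportion(0.05, population_size=10_000))
```

With a 5% margin of error at 95% confidence this gives 385 people for a very large population and 370 for a population of 10,000. Halving the margin of error roughly quadruples the sample needed, which is one reason dataset size alone says little about validity.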
- Do you sell your data?
ADRC-E does not own data - we are given permission to hold it by the data owners. So it is not ours to sell, even if we wanted to! We are Government-funded and value the trust that researchers, Government, data subjects and data owners have in us, so sale of data from ADRC-E projects would never be considered.
- Is this part of the Government infrastructure project on Big Data?
ADRC-E is funded by the Economic and Social Research Council (ESRC), which is allocated funds by the Government to distribute for research. ESRC has a Big Data Network programme, which you can learn more about here: http://www.esrc.ac.uk/research/our-research/big-data-network/.
- Will you publish access to datasets for the benefit of data science research commercially and for charity?
ADRC-E gives access to datasets for approved projects led by approved researchers. The research that we support is not for commercial purposes. Charities are able to apply to use ADRN resources through our usual channels (more information is on our ADRN website: https://www.adrn.ac.uk/). We do not make datasets themselves public since it is an essential element of our data security that they are only available to approved researchers. However, it is also a precondition of our projects that they should be of public benefit and that research results are made public.
- What software do you use?
We have all sorts of statistical and analytical software packages, including SPSS, SAS, R, Stata, MLwiN, ArcGIS and others. If a researcher has a specific requirement for something else and gives us enough notice, we will try to ensure that it is available to them, subject to licensing costs and resources.
- Do the datasets provided include market data, e.g. electricity/utility information? Can companies be identified?
ADRN projects could use utility information, subject to permission from the data owners. The data that a researcher receives for analysis will have been de-identified in such a way that the risk of them identifying any people or companies is minimised and the researcher is only able to publish carefully vetted results, not the data themselves.
- How do you make sure that you are not just identifying "nonsense" links?
Sophisticated techniques exist for linking and matching two or more datasets. They are designed to make as many true links as possible while keeping the number of false matches to a minimum. Our partner, the Office for National Statistics, does this for ADRC-E.
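The balance between true links and false matches can be illustrated with a deliberately simple sketch. It is not the method used by the Office for National Statistics, and real linkage systems are considerably more sophisticated. It assumes two hypothetical lists of records, requires an exact match on date of birth, scores name similarity with Python's standard `difflib` module, and accepts a link only above a chosen threshold. All names, IDs and dates are invented.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.85):
    """Link each left record to its best-scoring right record, if good enough."""
    links = []
    for l_rec in left:
        best, best_score = None, 0.0
        for r_rec in right:
            # Only compare records that agree exactly on date of birth.
            if l_rec["dob"] != r_rec["dob"]:
                continue
            score = name_similarity(l_rec["name"], r_rec["name"])
            if score > best_score:
                best, best_score = r_rec, score
        # Accept the candidate only if it clears the threshold.
        if best is not None and best_score >= threshold:
            links.append((l_rec["id"], best["id"]))
    return links


# Invented example records.
left = [
    {"id": "L1", "name": "Katherine Jones", "dob": "1980-02-01"},
    {"id": "L2", "name": "Tom Smith", "dob": "1975-07-12"},
]
right = [
    {"id": "R1", "name": "Katharine Jones", "dob": "1980-02-01"},
    {"id": "R2", "name": "Tim Smyth", "dob": "1975-07-12"},
    {"id": "R3", "name": "Alan Brown", "dob": "1990-01-01"},
]

strict = link_records(left, right, threshold=0.85)
lenient = link_records(left, right, threshold=0.70)
print(strict)
print(lenient)
```

The strict threshold links only the near-identical spelling (L1 with R1). The lenient threshold also links "Tom Smith" with "Tim Smyth", which might be the same person recorded differently or two different people. Choosing where to set that threshold is exactly the trade-off between missing true links and creating "nonsense" ones.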