Novel methods for transforming raw electronic health records into research data

Research Overview

National electronic health records (EHR), administrative data and disease registries are increasingly being linked and made available for research. A primary use-case of linked EHR data is to accurately extract phenotypic information (i.e. disease status), a process known as phenotyping, for use in observational and experimental research. However, important challenges remain in converting the information collected into research-ready data for statistical analyses. This methodological project focuses on developing computational methods for creating and validating EHR phenotypes for research, and on building tools that transform raw EHR into research-ready data for statistical analysis in a transparent and reproducible manner.


Benefit

To improve the conversion of raw information collected in EHR into research-ready data, facilitating research progress and supporting clinical purposes.


Datasets and name of the Government departments

Data from the Clinical Practice Research Datalink (CPRD), NHS Digital and the Office for National Statistics


More about the project

National electronic health records (EHR) data are now also being linked to genotypic data to examine how genetic variants influence susceptibility to disease, to validate drug targets and to study how genetic variants modify drug response. An example of such a research platform is CALIBER, which contains linked EHR data from primary care, hospital care, disease registries and mortality registers for more than 15 million participants.

A primary use-case of linked EHR data is to accurately extract phenotypic information (i.e. disease status), a process known as phenotyping. However, important challenges remain in converting the information collected into research-ready data for statistical analyses. The manner of and reasons for generating, capturing and recording EHR data vary substantially between healthcare settings. Additionally, different medical classification systems are often used for different data sources, so the same clinical information may be recorded in multiple sources but at different levels of clinical detail. Integrating data from these sources is therefore complex, but doing so is advantageous for research and, consequently, for clinical purposes.
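To make the integration problem concrete, here is a minimal rule-based phenotyping sketch. It is not the project's or CALIBER's actual method. It assumes each data source uses its own classification system (Read v2 codes in primary care, ICD-10 in hospital care), that a phenotype is defined by a per-source list of code prefixes, and that a patient's diagnosis date is the earliest matching record across all sources. The `Record` type, `PHENOTYPE_CODES` mapping, the helper functions and the example records are all hypothetical, and the code list is illustrative rather than a validated phenotype definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One coded clinical event from a single (hypothetical) data source."""
    patient_id: str
    source: str       # assumed source labels: "primary_care" or "hospital"
    code: str
    event_date: date


# Illustrative per-source code-list prefixes (NOT a validated phenotype definition).
# Each source records the same condition in a different classification system.
PHENOTYPE_CODES = {
    "primary_care": ("C10F",),  # Read v2: type 2 diabetes mellitus
    "hospital": ("E11",),       # ICD-10: type 2 diabetes mellitus
}


def matches(record: Record) -> bool:
    """True if the record's code falls under the phenotype's code list for its source."""
    prefixes = PHENOTYPE_CODES.get(record.source, ())
    # str.startswith with an empty tuple returns False, so unknown sources never match.
    return record.code.startswith(prefixes)


def earliest_diagnosis(records):
    """Map patient_id -> (event_date, source) of the earliest matching record across sources."""
    first = {}
    for r in records:
        if not matches(r):
            continue
        current = first.get(r.patient_id)
        if current is None or r.event_date < current[0]:
            first[r.patient_id] = (r.event_date, r.source)
    return first


# Hypothetical example records (not real patient data).
example_records = [
    Record("p1", "hospital", "E119", date(2012, 5, 1)),
    Record("p1", "primary_care", "C10F.", date(2010, 3, 14)),
    Record("p2", "hospital", "I210", date(2015, 1, 2)),
    Record("p3", "hospital", "E119", date(2016, 7, 9)),
]

result = earliest_diagnosis(example_records)
print(result)
```

Here patient `p1` is recorded in both sources, and combining them places the onset at the earlier primary-care record. A single-source approach would have missed this. Real phenotyping algorithms typically add further rules (e.g. exclusion codes, medication or test results, and validation against other sources), which this sketch deliberately omits.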


Lead researchers

Dr Spiros Denaxas, ADRC-E and Institute of Health Informatics, UCL


Page last updated: 24/10/2017