Implementation of New Real-World Data Quality Frameworks and Initiatives to Address Challenges in Oncology Electronic Health Record (EHR)-Derived Databases for Research
Author(s)
Zhaohui Su, PhD, Lisa Herms, PhD, Janet Espirito, PharmD, Gayathri Namasivayam, PhD, Robyn Harrell, MS, Amy O'Sullivan, PhD, Jessica Paulus, ScD
Ontada, Boston, MA, USA
OBJECTIVES: This study aims to summarize and implement new real-world data (RWD) quality frameworks and initiatives to improve the relevance, reliability, and external validity of oncology EHR-derived databases for research, thereby making robust and reliable outcome data more readily available for oncology research.
METHODS: Recently published data quality dimensions were summarized, applied, and assessed using Ontada’s On.Genuity RWD platform, which integrates EHR data from approximately 500 US community oncology clinics with external mortality and claims data. New challenges include handling missing data and the potential bias introduced by linked datasets.
RESULTS: Recently published frameworks include the following dimensions: relevance (including availability and feasibility), reliability (including accuracy, completeness, conformance, plausibility, provenance, reproducibility, and traceability), and external validity (including generalizability, replicability, and transparency). Our availability evaluations showed that 250 standardized variables across more than 20 clinical domains were available for over 500K representative patients across 40 tumor types over the past 10 years. Data completeness assessments indicated that extracting information from unstructured data using natural language processing (NLP) improved completeness by 20% or more. Chart abstraction accuracy was evaluated through inter-rater reliability, which exceeded 95%. NLP data quality metrics, including sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F1 score, ranged from 85% to 99%. Mortality data showed high agreement (>96%) with an external data source.
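The NLP metrics and chart abstraction agreement reported in the results are standard ratios. The following is a minimal sketch, not the platform's actual implementation. It assumes a binary NLP extraction task scored against a reference standard such as manual chart abstraction, and it computes the metrics from confusion-matrix counts. The names `ConfusionCounts`, `nlp_quality_metrics`, and `percent_agreement` are hypothetical. Simple percent agreement is only one way to express inter-rater reliability, and the abstract does not state which statistic was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts comparing NLP output with a reference standard."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    fn: int  # NLP negative, reference positive
    tn: int  # NLP negative, reference negative


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")


def nlp_quality_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the six NLP quality metrics named in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),          # also called precision
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        # F1 is the harmonic mean of PPV and sensitivity: 2TP / (2TP + FP + FN).
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share of charts on which two abstractors recorded the same value (assumed IRR measure)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of charts")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    metrics = nlp_quality_metrics(ConfusionCounts(tp=90, fp=5, fn=10, tn=95))
    print({name: round(value, 3) for name, value in metrics.items()})
    print(percent_agreement(["II", "III", "IV", "I"], ["II", "III", "III", "I"]))
```

Expressing F1 directly in counts avoids a division by zero when PPV and sensitivity are both zero. The chance-corrected alternative to percent agreement would be a kappa statistic, which this sketch does not implement.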
CONCLUSIONS: Implementing newly published data quality frameworks and initiatives represents a significant advance in EHR-based oncology research, ensuring the use of relevant, reliable, and externally valid data. These frameworks substantially enhance the utility of real-world evidence generated from RWD and are expected to support better-informed decision-making in clinical practice and policy. Our work serves as a model for other therapeutic areas and underscores the importance of rigorous data quality standards in real-world research.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
RWD172
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance, Health & Insurance Records Systems, Reproducibility & Replicability
Disease
SDC: Oncology