Optimizing EHR Data Completeness: A Conceptual Framework for Bringing Real-World Data into Clinical Research through Relevant Completeness
Author(s)
Priyanka Ramamurthy, BA, MBA, Ruby Maa, BS, Dan Drozd, MD, MSc.
PicnicHealth, San Francisco, CA, USA.
OBJECTIVES: The successful incorporation of real-world data (RWD) into clinical research requires a comprehensive understanding of patients' healthcare journeys. We introduce a conceptual framework to address persistent challenges in achieving data completeness, including defining what constitutes "completeness" in electronic health record (EHR) data, identifying and mitigating gaps caused by patient movement between health systems, addressing events either not recorded or not anticipated, and thus not searched for, and developing scalable strategies for facility-specific record requests.
METHODS: This framework uses Retrieval Density, a measure of relevant completeness defined as the ratio of retrieved visit records from targeted facilities to the number of signals indicating those care encounters occurred. Key steps include:
- (1) Initial Chart Reviews: Establish disease-, treatment-, and study-specific parameters, conducted manually or with human-in-the-loop methods.
- (2) Retrospective Retrieval: Leverage signals from health data sources (e.g., physician notes, claims data) to access relevant completed visits.
- (3) Prospective Prediction: Create synthetic signals to anticipate relevant patient care.
- (4) Patient Engagement: Request provider and visit details directly from patients.
- (5) Data Classification: Differentiate between non-occurrence (event did not happen) and incomplete effort (records not retrieved).
- (6) Bias Management: Implement quality control measures, such as assessments of retrieval density across health status groups.
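As an illustration only, the Retrieval Density ratio and steps (5) and (6) could be sketched as follows. The `Signal` record, its field names, the health-status labels, and the rule that an encounter with no signal is treated as non-occurrence are all assumptions made for this example; the abstract does not specify a data model or an implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    """One signal that a care encounter occurred (hypothetical structure)."""
    patient_id: str
    health_status: str      # hypothetical grouping label, e.g. "stable"
    record_retrieved: bool  # was the matching visit record retrieved?


def retrieval_density(signals):
    """Ratio of retrieved visit records to signals that encounters occurred.

    Returns None when there are no signals, since the ratio is undefined.
    """
    if not signals:
        return None
    retrieved = sum(1 for s in signals if s.record_retrieved)
    return retrieved / len(signals)


def density_by_group(signals):
    """Retrieval density per health-status group (a step 6 style check)."""
    groups = defaultdict(list)
    for s in signals:
        groups[s.health_status].append(s)
    return {group: retrieval_density(members) for group, members in groups.items()}


def classify_gap(has_signal, record_retrieved):
    """Label an expected encounter (a step 5 style classification).

    Assumption for this sketch: no signal means the event is treated as
    not having occurred; a signal without a record means incomplete effort.
    """
    if record_retrieved:
        return "retrieved"
    if has_signal:
        return "incomplete_effort"
    return "non_occurrence"
```

Under these assumptions, a marked difference in `density_by_group` between health status groups would flag a potential retrieval bias that warrants review.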
RESULTS: The framework supports the concept of relevant, rather than exhaustive, completeness, allowing for (1) improved retrieval of key records through disease-informed methods, (2) differentiation between types of missing data, enhancing analytical accuracy, and (3) effective mapping of patient care pathways to identify and address potential biases. Preliminary implementations demonstrate adaptability to diverse diseases and study designs.
CONCLUSIONS: Achieving complete EHR data for clinical research is not a zero-sum game. By prioritizing relevance, the Retrieval Density framework optimizes effort and mitigates biases. Its adoption is expected to advance the rigor of clinical research and support the integration of RWD into regulatory and decision-making processes.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
RWD24
Topic
Real World Data & Information Systems
Topic Subcategory
Distributed Data & Research Networks
Disease
No Additional Disease & Conditions/Specialized Treatment Areas