Development and Application of a Framework for Addressing New Challenges of Missing Data in Real-World Research

Author(s)

Su Z (1), O'Sullivan A (2), Dwyer K (2), Paulus J (2)
(1) Ontada, Chestnut Hill, MA, USA; (2) Ontada, Boston, MA, USA

OBJECTIVES: The growing number of real-world data (RWD) sources introduces new challenges for handling missing data: linkage issues, augmentation by natural language processing (NLP) and machine learning (ML) technologies, and reconciliation across diverse sources, including electronic medical records (EMR), administrative claims, social determinants of health (SDoH), and wearables data. We therefore developed a novel framework for handling missing data in multi-sourced real-world databases.

METHODS: Published frameworks for handling missing data were reviewed. Scientific experts representing population health and data science disciplines developed a new framework that builds upon existing rubrics to address unmet needs in reconciliation across multiple sources, including imputed values. The framework has 5 domains: (1) data relevance and representativeness; (2) data quality; (3) data correlation; (4) intended collection; and (5) quantitative bias and sensitivity analyses. The framework was applied to several large RWD oncology studies that utilized integrated EMR, SDoH and claims data sources.

RESULTS: The new framework highlighted important considerations of data representativeness, quality and potential bias when handling missing data. First, when multiple data sources are integrated, the representativeness of the overlapping patients or data should be examined, documented and reported. Second, the quality of supplementary data sources, and of data derived from predictive technologies including NLP and ML, should be evaluated following a fit-for-purpose framework. Third, correlation among data elements and its impact on results should be assessed. Fourth, clinical expertise must be applied to differentiate between missing data and data not intended to be collected. Finally, quantitative bias and sensitivity analyses are strongly encouraged. Detailed methods and examples from applying this framework to oncology RWD studies will be presented.
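
Two of these considerations can be illustrated with a minimal Python sketch; it is not the authors' implementation. It assumes representativeness of linked patients is checked with a standardized mean difference (SMD), one common choice that the abstract does not specify, and that "intended collection" is recorded as a per-record flag. All function names, field names, and values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous variable, using the pooled sample standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / pooled_sd


def classify_value(value, collection_expected):
    """Separate true missingness from data never intended to be collected."""
    if value is not None:
        return "observed"
    return "missing" if collection_expected else "not_intended"


# Hypothetical ages: EMR patients who linked to claims vs. those who did not.
linked_ages = [52, 61, 58, 63]
unlinked_ages = [67, 70, 74, 79]

smd = standardized_mean_difference(linked_ages, unlinked_ages)
# A common rule of thumb (an assumption here) treats |SMD| > 0.1 as imbalance.
print(f"SMD for age, linked vs. unlinked: {smd:.2f}")

# Hypothetical records: a biomarker test is clinically expected only for some.
records = [
    {"id": 1, "biomarker": "positive", "test_expected": True},
    {"id": 2, "biomarker": None, "test_expected": True},
    {"id": 3, "biomarker": None, "test_expected": False},
]
labels = [classify_value(r["biomarker"], r["test_expected"]) for r in records]
print(labels)
```

In practice the `test_expected` flag would come from clinical review rather than from the data itself, which is why the framework calls for clinical expertise at this step.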

CONCLUSIONS: A novel framework to address new challenges to handling missing data provides an objective approach to maximizing completeness and describing validity concerns. This framework supplements existing frameworks for handling missing data and will increase the quality of real-world evidence studies.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

MSR21

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Missing Data

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Oncology
