Development and Application of a Framework for Addressing New Challenges of Missing Data in Real-World Research
Author(s)
Su Z¹, O'Sullivan A², Dwyer K², Paulus J²
¹Ontada, Chestnut Hill, MA, USA; ²Ontada, Boston, MA, USA
OBJECTIVES: The growing number of real-world data (RWD) sources introduces new challenges for handling missing data. These arise from linkage issues, augmentation by natural language processing (NLP) and machine learning (ML) technologies, and reconciliation across diverse sources, including electronic medical records (EMR), administrative claims, social determinants of health (SDoH), and wearable-device data. We therefore developed a novel framework for handling missing data in multi-sourced real-world databases.
METHODS: Published frameworks for handling missing data were reviewed. Scientific experts representing population health and data science disciplines developed a new framework that builds upon existing rubrics to address unmet needs in reconciliation across multiple sources, including imputed values. The framework has 5 domains: (1) data relevance and representativeness; (2) data quality; (3) data correlation; (4) intended collection; and (5) quantitative bias and sensitivity analyses. The framework was applied to several large RWD oncology studies that utilized integrated EMR, SDoH and claims data sources.
RESULTS: The new framework highlighted important considerations of data representativeness, quality, and potential bias when handling missing data. First, when multiple data sources are integrated, the representativeness of the overlapping patients or data should be examined, documented, and reported. Second, the quality of supplementary data sources, and of data derived from predictive technologies such as NLP and ML, should be evaluated following a fit-for-purpose framework. Third, correlation among data elements and its impact on results should be assessed. Fourth, clinical expertise must be applied to differentiate between data that are missing and data that were never intended to be collected. Finally, quantitative bias and sensitivity analyses are strongly encouraged. Detailed methods and examples from applying this framework to oncology RWD studies will be presented.
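As an illustration only, the first and last considerations above (representativeness of the linked subset, and sensitivity analysis for missing values) could be sketched as follows. This is not the authors' method: the function names, the choice of a standardized mean difference, and the best-case/worst-case bounding approach are all assumptions made for this example, and the numbers are invented.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(linked, unlinked):
    """Standardized mean difference (SMD) of a continuous covariate
    between patients who linked across sources and those who did not.
    Hypothetical helper; uses the average of the two sample variances."""
    pooled_sd = sqrt((variance(linked) + variance(unlinked)) / 2)
    return (mean(linked) - mean(unlinked)) / pooled_sd


def proportion_bounds(observed, n_missing):
    """Best-case/worst-case bounds on the proportion of a binary
    characteristic when n_missing values are unknown.
    Lower bound: every missing value is 0; upper bound: every one is 1."""
    events = sum(observed)
    total = len(observed) + n_missing
    return events / total, (events + n_missing) / total


# Invented example data (not from any study).
age_linked = [60, 62, 64, 66, 68]      # patients present in both EMR and claims
age_unlinked = [50, 52, 54, 56, 58]    # patients present in EMR only
smd = standardized_mean_difference(age_linked, age_unlinked)

biomarker_positive = [1, 0, 1, 1]      # observed binary results
low, high = proportion_bounds(biomarker_positive, n_missing=4)

print(f"SMD for age, linked vs unlinked: {smd:.2f}")
print(f"Biomarker-positive proportion lies between {low:.3f} and {high:.3f}")
```

A large absolute SMD would suggest that the linked subset differs from the source population on that covariate, and wide bounds would indicate that conclusions are sensitive to how the missing values are handled. Any threshold for "large" or "wide" is a study-specific judgment that this sketch does not set.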
CONCLUSIONS: The novel framework addresses new challenges in handling missing data and provides an objective approach to maximizing completeness and describing validity concerns. It supplements existing frameworks for handling missing data and will increase the quality of real-world evidence studies.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR21
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Missing Data
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Oncology