Advancing Causal Inference With Machine Learning and Real-World Data: An Application of Targeted Machine Learning and Super Learners on Hospital-Acquired Pressure Injuries From MIMIC IV
Author(s)
Wilson A1, Gregg M2, Streja E3, Alderden J4, Vanderpuye-Orgle J3, Roessner M3
1Parexel International, Waltham, MA, USA, 2Parexel International, Austin, TX, USA, 3Parexel International, Boston, MA, USA, 4Boise State University, Boise, ID, USA
OBJECTIVES: Traditional causal methods typically rely on parametric statistical models that impose restrictive assumptions about the underlying data structure. Recent advances in targeted machine learning (ML) and super learning allow causal effects to be estimated from real-world data with far fewer assumptions about data complexity or structure, offering a more flexible approach to causal inference. In this study, we use the MIMIC IV database and ML models to investigate causes of hospital-acquired pressure injuries (HAPrI) and to develop a risk prediction algorithm.
METHODS: Using the MIMIC IV dataset, a deidentified electronic health records dataset from Beth Israel Deaconess Medical Center that captures admissions from 2008 to 2019 for nearly 300,000 patients, we applied cost-sensitive ensemble super learning to predict HAPrI in the ICU. We then estimated the potential causal effect of albumin on HAPrI development using clinically informed debiasing methods, including directed acyclic graphs and targeted maximum likelihood estimation (TMLE).
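To make the TMLE step concrete, the following is a minimal sketch on synthetic data, not the authors' analysis. It assumes a single binary confounder `W`, a binary exposure `A` (standing in for low albumin) and a binary outcome `Y`. All function names, coefficients and the simulated cohort are hypothetical. The initial outcome model deliberately ignores `W`, the propensity score is a plain stratum proportion rather than a super learner, and the targeting step fits one fluctuation parameter per exposure arm so that each counterfactual risk, and hence the marginal odds ratio, is updated.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n, seed=1):
    """Synthetic cohort (hypothetical): W confounds exposure A and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.4)
        a = int(rng.random() < (0.6 if w else 0.2))
        y = int(rng.random() < expit(-3.0 + 0.7 * a + 1.2 * w))
        rows.append((w, a, y))
    return rows


def fit_epsilon(offsets, covariates, outcomes, max_iter=50):
    """One-parameter logistic fluctuation with an offset, solved by Newton's method."""
    eps = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for off, h, y in zip(offsets, covariates, outcomes):
            p = expit(off + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break
    return eps


def tmle_odds_ratio(rows):
    # Initial outcome model: crude risk by exposure, deliberately ignoring W.
    q_init = {a: mean(y for _, a_i, y in rows if a_i == a) for a in (0, 1)}
    # Propensity model g(W) = P(A=1 | W), estimated by stratum proportions.
    g = {w: mean(a for w_i, a, _ in rows if w_i == w) for w in (0, 1)}

    def p_arm(a, w):
        # Probability of the observed exposure level a within stratum w.
        return g[w] if a == 1 else 1.0 - g[w]

    risk = {}
    for a in (0, 1):
        arm = [(w, y) for w, a_i, y in rows if a_i == a]
        offset = logit(q_init[a])
        # Clever covariate for this arm is 1 / P(A=a | W).
        eps = fit_epsilon(
            [offset] * len(arm),
            [1.0 / p_arm(a, w) for w, _ in arm],
            [y for _, y in arm],
        )
        # Targeted counterfactual risk: average updated prediction over everyone.
        risk[a] = mean(expit(offset + eps / p_arm(a, w)) for w, _, _ in rows)

    def odds(p):
        return p / (1.0 - p)

    crude_or = odds(q_init[1]) / odds(q_init[0])
    tmle_or = odds(risk[1]) / odds(risk[0])
    return crude_or, tmle_or


rows = simulate(20000, seed=1)
crude_or, tmle_or = tmle_odds_ratio(rows)
print(f"crude OR = {crude_or:.2f}, TMLE-adjusted OR = {tmle_or:.2f}")
```

Because the simulated confounder raises both the chance of exposure and the chance of the outcome, the crude odds ratio in this sketch overstates the effect and the targeted estimate is pulled back toward the marginal value built into the simulation. A real analysis would replace the stratum-based estimators with flexible learners and adjust for the full confounder set implied by the directed acyclic graph.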
RESULTS: Of 28,395 eligible cases, 1,395 (4.9%) developed a pressure injury. The ensemble super learner had a cross-validated AUC of 0.8, with 45.6% sensitivity and 88.8% specificity. The crude odds ratio for pressure injury associated with low albumin (below 3.0) was significant: OR = 2.86, p < 0.0001. The TMLE-adjusted estimate was attenuated but remained significant: OR = 2.22, p < 0.0001.
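The reported sensitivity and specificity can be translated into approximate patient counts. The sketch below is a back-of-the-envelope calculation only. It assumes both figures apply to the full eligible cohort at a single decision threshold, and it works from the rounded percentages in the abstract, so the derived counts are illustrative rather than reported results.

```python
# Figures taken from the abstract.
n_total = 28395
n_cases = 1395
sensitivity = 0.456
specificity = 0.888

# Derived quantities (assumption: one threshold applied to the whole cohort).
n_noncases = n_total - n_cases
tp = round(sensitivity * n_cases)      # injuries correctly flagged
fn = n_cases - tp                      # injuries missed
tn = round(specificity * n_noncases)   # non-injuries correctly cleared
fp = n_noncases - tn                   # false alarms

prevalence = n_cases / n_total
ppv = tp / (tp + fp)                   # positive predictive value
npv = tn / (tn + fn)                   # negative predictive value
accuracy = (tp + tn) / n_total

print(f"prevalence = {prevalence:.1%}")
print(f"TP = {tp}, FN = {fn}, TN = {tn}, FP = {fp}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}, accuracy = {accuracy:.1%}")
```

Under these assumptions, roughly one in six flagged patients would go on to develop a pressure injury. With an outcome prevalence near 5%, overall accuracy can look high even when many cases are missed, so it is a poor summary of performance on its own.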
CONCLUSIONS: Previous models predicting pressure injuries often favor overall accuracy at the cost of clinically uninformative sensitivity, whereas traditional Braden scales classify almost all patients as high-risk. Our results suggest that ML methods can be used to develop accurate risk prediction algorithms for HAPrI. We also identified a significant (causal) effect of low albumin levels on the development of pressure injuries. These findings demonstrate how ML methods can generate valuable causal and predictive models and improve our understanding of data interdependencies. The future of clinical prediction lies at the intersection of accuracy, clinical wisdom, and machine learning's amplified use of real-world data.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
MSR102
Topic
Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records, Health & Insurance Records Systems
Disease
Injury & Trauma, No Additional Disease & Conditions/Specialized Treatment Areas