Identifying Potential Long-COVID Patients Using Machine Learning: A German Claims Data Analysis
Author(s)
Pacis S1, Bolzani A1, Maywald U2, Wilke T3
1Cytel Inc, Berlin, BE, Germany, 2AOK PLUS, Dresden, Germany, 3IPAM e.V., Wismar, Germany
OBJECTIVES: Many patients with long-COVID may have experienced delayed diagnosis and delayed treatment, especially early in the pandemic. This study uses machine learning (ML) to determine which features are important for identifying potential long-COVID patients, using a documented long-COVID diagnosis as the proxy outcome.
METHODS: Data from AOK PLUS, a German sickness fund covering 3.5 million patients in Saxony and Thuringia, were used to identify all adult patients with ≥1 inpatient documentation of confirmed COVID-19 (ICD-10-GM: U07.1) between 01/04/2020 and 31/03/2022 (index date = first COVID-19 diagnosis). Patients alive on 31/03/2022 with ≥90 days of continuous insurance after index were included. The outcome of interest was ≥1 long-COVID diagnosis (inpatient/outpatient; U09.9!) during follow-up (45-365 days after index, or until long-COVID diagnosis).
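The inclusion criteria and outcome window above can be sketched in a few lines of Python. This is a minimal illustration only, not the study's code: the `Patient` record layout and all field names are hypothetical, "adult" is assumed to mean age ≥18 at index, the index date is assumed to be the first inpatient U07.1 documentation inside the study window, and continuous insurance is simplified to a single `insured_until` date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

STUDY_START = date(2020, 4, 1)   # 01/04/2020
STUDY_END = date(2022, 3, 31)    # 31/03/2022


@dataclass
class Patient:
    """Hypothetical per-patient record; not the AOK PLUS data model."""
    age_at_index: int
    covid_inpatient_dates: List[date]   # inpatient U07.1 documentations
    long_covid_dates: List[date]        # U09.9! diagnoses, inpatient or outpatient
    insured_until: date                 # simplification of "continuous insurance"
    death_date: Optional[date] = None


def index_date(p: Patient) -> Optional[date]:
    """First inpatient COVID-19 documentation inside the study window."""
    in_window = [d for d in p.covid_inpatient_dates if STUDY_START <= d <= STUDY_END]
    return min(in_window) if in_window else None


def is_eligible(p: Patient) -> bool:
    """Adult, alive at study end, and insured for >=90 days after index."""
    idx = index_date(p)
    if idx is None or p.age_at_index < 18:   # age cut-off is an assumption
        return False
    alive = p.death_date is None or p.death_date > STUDY_END
    insured = p.insured_until >= idx + timedelta(days=90)
    return alive and insured


def has_outcome(p: Patient) -> bool:
    """>=1 long-COVID diagnosis 45-365 days after index."""
    idx = index_date(p)
    if idx is None:
        return False
    start, end = idx + timedelta(days=45), idx + timedelta(days=365)
    return any(start <= d <= end for d in p.long_covid_dates)


example = Patient(
    age_at_index=67,
    covid_inpatient_dates=[date(2021, 1, 10)],
    long_covid_dates=[date(2021, 4, 20)],
    insured_until=date(2022, 12, 31),
)
print(is_eligible(example), has_outcome(example))
```

A long-COVID code recorded earlier than 45 days after index is deliberately not counted as an outcome here, matching the follow-up definition in the text.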
An XGBoost model (70/30 training/testing split) was developed. Initial features included characteristics at index (age, sex, intubation, comorbidities, Charlson Comorbidity Index [CCI]), new diagnoses and medications during follow-up that had not been recorded in the 45-365 days before index, and healthcare utilization (number of outpatient visits and hospitalization days during follow-up). Shapley values were used for feature interpretability, and the final model included the 25 most important features. Model performance was assessed using AUROC and sensitivity/specificity.
RESULTS: 28,419 patients were included (54% female, mean age 66.5 years, mean CCI 3.6), of whom 6,512 (22.9%) had a long-COVID diagnosis (mean time from index: 3.4 months). AUROC was 0.80, with sensitivity/specificity of 0.93/0.67.
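For readers less familiar with the reported metrics, the following sketch shows how AUROC and sensitivity/specificity are computed from true labels and predicted probabilities. It is illustrative only: the toy labels and scores are invented, the abstract does not report the classification threshold (0.5 is an assumption), and the pairwise AUROC below is written for clarity rather than speed.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold   # threshold is an assumption
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random negative
    (ties count one half). O(n_pos * n_neg), fine for a small illustration."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data, not study data.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
sens, spec = sensitivity_specificity(toy_labels, toy_scores)
print(f"AUROC={auroc(toy_labels, toy_scores):.3f} sens={sens:.2f} spec={spec:.2f}")

# Consistency check on the reported outcome share: 6,512 of 28,419 patients.
outcome_share_pct = round(100 * 6512 / 28419, 1)
```

A sensitivity of 0.93 alongside a specificity of 0.67 means that most patients with a documented long-COVID diagnosis are flagged, at the cost of also flagging about one third of patients without one.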
Before feature selection, 207 features were included. The most important features for identifying potential long-COVID patients included higher healthcare utilization, CCI, older age, intubation, comorbidities at index (diabetes, chronic heart failure, chronic kidney disease [CKD]), new diagnoses (breathing disorders, pneumonia, hypertension, CKD), and new medications (antipsychotics, anti-inflammatory/antirheumatic agents, peptic ulcer drugs, diuretics).
CONCLUSIONS: ML can be used to determine important features for identifying potential long-COVID patients, including new diagnoses and medications following initial COVID-19 hospitalization.
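The feature-selection step described in the methods (reducing 207 candidate features to the 25 most important by Shapley values) might look like the sketch below. The abstract does not state which summary statistic was used for ranking; mean absolute Shapley value per feature is assumed here because it is a common convention. The function name, feature names, and numbers are hypothetical.

```python
from typing import List, Sequence


def top_k_features(
    shap_values: Sequence[Sequence[float]],  # rows = patients, columns = features
    feature_names: Sequence[str],
    k: int = 25,
) -> List[str]:
    """Rank features by mean absolute Shapley value and keep the top k."""
    n_rows = len(shap_values)
    if n_rows == 0:
        raise ValueError("shap_values must contain at least one row")
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(mean_abs, key=mean_abs.get, reverse=True)[:k]


# Invented toy values for four hypothetical features and two patients.
toy_names = ["outpatient_visits", "age", "cci", "sex"]
toy_shap = [
    [0.50, -0.20, 0.10, 0.00],
    [-0.70, 0.30, -0.10, 0.05],
]
print(top_k_features(toy_shap, toy_names, k=2))
```

Taking absolute values before averaging keeps features whose contributions push predictions in both directions from cancelling out in the ranking.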
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
MSR118
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Cardiovascular Disorders (including MI, Stroke, Circulatory), Diabetes/Endocrine/Metabolic Disorders (including obesity), Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory), Systemic Disorders/Conditions (Anesthesia, Auto-Immune Disorders (n.e.c.), Hematological Disorders (non-oncologic), Pain), Urinary/Kidney Disorders