Identifying Potential Long-COVID Patients Using Machine Learning: A German Claims Data Analysis

Author(s)

Pacis S¹, Bolzani A¹, Maywald U², Wilke T³
¹Cytel Inc, Berlin, BE, Germany; ²AOK PLUS, Dresden, Germany; ³IPAM e.V., Wismar, Germany

OBJECTIVES: Many patients with long-COVID may have experienced delayed diagnosis and untimely treatment, especially early in the pandemic. This study uses machine learning (ML) to determine the features most important for identifying potential long-COVID patients, as a proxy for a long-COVID diagnosis.

METHODS: Data from AOK PLUS, a German sickness fund covering 3.5 million insured persons in Saxony and Thuringia, were used to identify all adult patients with ≥1 inpatient record of confirmed COVID-19 (ICD-10-GM: U07.1) between 01/04/2020 and 31/03/2022 (index date = first COVID-19 diagnosis). Patients alive on 31/03/2022 with ≥90 days of continuous insurance after index were included. The outcome of interest was ≥1 long-COVID diagnosis (inpatient/outpatient; U09.9!) during follow-up (45-365 days after index, or until the long-COVID diagnosis).
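
The cohort and outcome definitions above reduce to date arithmetic around the index date. The following is a minimal Python sketch under stated assumptions, not the authors' code: the `Patient` record layout and the helper names are hypothetical, age is assumed to be measured at index, and window bounds are treated as inclusive.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

# Parameters as stated in the Methods; the record layout below is hypothetical.
INDEX_START = date(2020, 4, 1)
INDEX_END = date(2022, 3, 31)
MIN_INSURED_DAYS = 90        # continuous insurance required after index
FOLLOW_UP_START_DAY = 45     # follow-up window opens 45 days after index
FOLLOW_UP_END_DAY = 365      # and closes 365 days after index


@dataclass
class Patient:
    """Hypothetical per-patient summary; not the AOK PLUS data model."""
    age_at_index: int
    first_covid_inpatient_dx: Optional[date]   # first inpatient U07.1 code
    insured_until: date                        # end of continuous insurance
    alive_on_2022_03_31: bool
    long_covid_dx_dates: List[date] = field(default_factory=list)  # U09.9! codes


def is_eligible(p: Patient) -> bool:
    """Adult, index inside the study window, alive, >=90 days insured after index."""
    index = p.first_covid_inpatient_dx
    if index is None or p.age_at_index < 18:
        return False
    if not (INDEX_START <= index <= INDEX_END):
        return False
    return p.alive_on_2022_03_31 and (p.insured_until - index).days >= MIN_INSURED_DAYS


def has_long_covid_outcome(p: Patient) -> bool:
    """True if any long-COVID code falls 45-365 days after the index date."""
    index = p.first_covid_inpatient_dx
    if index is None:
        return False
    start = index + timedelta(days=FOLLOW_UP_START_DAY)
    end = index + timedelta(days=FOLLOW_UP_END_DAY)
    return any(start <= d <= end for d in p.long_covid_dx_dates)


# Usage: index on 10 Jan 2021, long-COVID code 100 days later.
p = Patient(
    age_at_index=70,
    first_covid_inpatient_dx=date(2021, 1, 10),
    insured_until=date(2022, 6, 30),
    alive_on_2022_03_31=True,
    long_covid_dx_dates=[date(2021, 4, 20)],
)
print(is_eligible(p), has_long_covid_outcome(p))  # True True
```

Checking every long-COVID code date, rather than only the first, means a code recorded before day 45 does not hide a later one that falls inside the window.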

An XGBoost model (70/30 training/testing split) was developed. Initial features included characteristics at index (age, sex, intubation, comorbidities, Charlson Comorbidity Index [CCI]), any new diagnoses and medications during follow-up that had not occurred 45-365 days before index, and healthcare utilization (number of outpatient visits and hospitalization days during follow-up). Shapley values were used to interpret feature importance, and the final model included the 25 most important features. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and sensitivity/specificity.
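
The "new diagnosis/medication" features can be read as a set difference: codes recorded during follow-up minus codes recorded in the pre-index lookback window. This is a minimal Python sketch under assumptions, not the study's implementation: the `(code, date)` event layout, the function names, the inclusive window bounds, and the code granularity (3-character ICD-10-GM, full ATC) are hypothetical, and the lookback is read literally as 45-365 days before index.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Optional, Set, Tuple

LOOKBACK_FAR_DAY = 365   # lookback opens 365 days before index
LOOKBACK_NEAR_DAY = 45   # and closes 45 days before index (as worded in the abstract)
FOLLOW_UP_START_DAY = 45
FOLLOW_UP_END_DAY = 365


def new_codes(
    index: date,
    events: Iterable[Tuple[str, date]],
    follow_up_end: Optional[date] = None,
) -> Set[str]:
    """Codes recorded during follow-up that were absent from the lookback window.

    `events` is a hypothetical list of (code, date) pairs; a code may be an
    ICD-10-GM diagnosis or an ATC medication code. `follow_up_end` optionally
    truncates follow-up, e.g. at the first long-COVID diagnosis.
    """
    events = list(events)  # allow one-shot iterables to be scanned twice
    lb_lo = index - timedelta(days=LOOKBACK_FAR_DAY)
    lb_hi = index - timedelta(days=LOOKBACK_NEAR_DAY)
    fu_lo = index + timedelta(days=FOLLOW_UP_START_DAY)
    fu_hi = index + timedelta(days=FOLLOW_UP_END_DAY)
    if follow_up_end is not None:
        fu_hi = min(fu_hi, follow_up_end)
    seen_before = {code for code, d in events if lb_lo <= d <= lb_hi}
    seen_after = {code for code, d in events if fu_lo <= d <= fu_hi}
    return seen_after - seen_before


def to_binary_features(codes: Set[str], vocabulary: Iterable[str]) -> Dict[str, int]:
    """One binary 'new_<code>' column per code in a fixed vocabulary."""
    return {f"new_{c}": int(c in codes) for c in vocabulary}


index = date(2021, 1, 10)


def day(n: int) -> date:
    """Date n days after (or before, if negative) the example index date."""
    return index + timedelta(days=n)


events = [
    ("E11", day(-100)),     # diabetes in lookback ...
    ("E11", day(80)),       # ... so its recurrence is not "new"
    ("I10", day(-10)),      # before index but outside the 45-365 day lookback
    ("I10", day(90)),       # -> counted as new
    ("J18", day(20)),       # before follow-up starts -> ignored
    ("J96", day(60)),       # -> new
    ("C03CA01", day(50)),   # ATC code for furosemide (a diuretic) -> new
    ("N18", day(200)),      # -> new unless follow-up is truncated earlier
]
print(sorted(new_codes(index, events)))
# ['C03CA01', 'I10', 'J96', 'N18']
print(sorted(new_codes(index, events, follow_up_end=day(150))))
# ['C03CA01', 'I10', 'J96']
```

Read literally, the lookback leaves a gap in the 45 days just before index, so a code such as I10 recorded 10 days pre-index still counts as new if it reappears in follow-up; a different reading of the window would change that.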

RESULTS: A total of 28,419 patients were included (54% female, mean age 66.5 years, mean CCI 3.6), of whom 6,512 (22.9%) had long-COVID (mean time from index: 3.4 months). AUROC was 0.80, and sensitivity/specificity were 0.93/0.67.
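
AUROC and sensitivity/specificity are computed from predicted scores on held-out data, such as the 30% test split. The sketch below uses the Mann-Whitney formulation of AUROC: the probability that a randomly chosen long-COVID patient receives a higher score than a randomly chosen patient without long-COVID, with ties counted as one half. The abstract does not report the classification threshold behind 0.93/0.67, so the 0.5 default here is an assumption, and the toy labels and scores are made up for illustration.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney AUROC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) at a score cut-off."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 3 long-COVID patients (label 1) and 4 without (label 0).
y = [1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(round(auroc(y, s), 3))           # 0.917 (11 of 12 pairs ranked correctly)
print(sensitivity_specificity(y, s))   # (0.6666666666666666, 0.75)
```

The pairwise loop is O(n*m) and is meant only to make the definition concrete; for a test set of roughly 8,500 patients, a rank-based computation or scikit-learn's `roc_auc_score` would be the practical choice.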

Before feature selection, 207 features were included. The most important features for identifying potential long-COVID patients included higher healthcare utilization, CCI, older age, intubation, comorbidities at index (diabetes, chronic heart failure, chronic kidney disease [CKD]), new diagnoses (breathing disorders, pneumonia, hypertension, CKD), and new medications (antipsychotics, anti-inflammatory/antirheumatic agents, peptic ulcer drugs, diuretics).
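
Reducing the 207 candidate features to the 25 most important can be sketched as ranking features by mean absolute Shapley value across patients. The abstract does not say how per-patient Shapley values were aggregated, so the mean-absolute criterion is an assumption (a common convention). In practice, the matrix would come from a SHAP explainer applied to the fitted XGBoost model, for example the `shap` package's `TreeExplainer`. The function name, feature names, and values below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    k: int = 25,
) -> Tuple[List[str], Dict[str, float]]:
    """Rank features by mean |SHAP| over rows (patients) and keep the top k."""
    n_rows = len(shap_values)
    if n_rows == 0:
        raise ValueError("need at least one row of SHAP values")
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(feature_names)
    }
    # sorted() is stable even with reverse=True, so ties keep column order.
    ranked = sorted(feature_names, key=lambda name: importance[name], reverse=True)
    return ranked[:k], importance


# Toy matrix: 3 patients x 4 features (made-up values).
names = ["age", "cci", "intubation", "new_J96"]
shap_matrix = [
    [0.2, -0.5, 0.0, 0.3],
    [-0.4, 0.1, 0.1, 0.3],
    [0.0, -0.3, 0.05, -0.6],
]
selected, imp = top_k_by_mean_abs_shap(shap_matrix, names, k=2)
print(selected)  # ['new_J96', 'cci']
```

Taking the absolute value before averaging keeps features that push some patients toward and others away from a long-COVID prediction from cancelling out. A final model would then be retrained using only the selected columns.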

CONCLUSIONS: ML can be used to determine important features for identifying potential long-COVID patients, including new diagnoses and medications following initial COVID-19 hospitalization.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR118

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Cardiovascular Disorders (including MI, Stroke, Circulatory), Diabetes/Endocrine/Metabolic Disorders (including obesity), Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory), Systemic Disorders/Conditions (Anesthesia, Auto-Immune Disorders (n.e.c.), Hematological Disorders (non-oncologic), Pain), Urinary/Kidney Disorders
