Predicting Antimicrobial Resistance in Uncomplicated Urinary Tract Infections Using Machine Learning

Speaker(s)

Kponee-Shovein K1, Cheng WY1, Marijam A2, Schwab P3, Gao C1, Indacochea D1, Ferrinho D3, Mitrani-Gold FS3, Pinheiro L1, Royer J4, Joshi AV3
1Analysis Group, Inc., Boston, MA, USA, 2GSK, Wavre, Belgium, 3GSK, Collegeville, PA, USA, 4Analysis Group, Inc., Montreal, QC, Canada

OBJECTIVES:

Urinary tract infections (UTIs) are among the most common bacterial infections worldwide, with 80% classified as uncomplicated (uUTIs). Over 50% of patients with uUTI are prescribed non-guideline-based antimicrobial treatment, potentially contributing to antimicrobial resistance (AMR) and increased healthcare costs. Here, we present the methodology of our study, which used machine learning to develop and validate robust models that estimate the probability of resistance to the antibiotic classes commonly prescribed for uUTI.

METHODS:

This predictive modeling study used retrospective Optum Electronic Health Record (EHR) data, including lab results, from 1 October 2015 to 29 February 2020. Data from female patients aged ≥12 years, diagnosed with uUTI (via positive Escherichia coli urine culture), with antibiotic susceptibility test results (index date) and ≥12 months of EHR activity prior to the index date, were split into training and testing cohorts based on index date. Least absolute shrinkage and selection operator (LASSO) and random forest (RF) models were evaluated as candidate predictive models. Both were developed on the training cohort to estimate the probability of AMR to each antibiotic class and then assessed on the separate testing cohort. Multiple imputation by chained equations was used to impute missing data. Nested cross-validation was used to select the optimal algorithm, mitigating the optimism bias seen when standard k-fold cross-validation is used for both tuning and performance estimation. The statistical strength of the selected features was calculated by bootstrapping the k-fold cross-validation procedure to account for adaptive feature selection.
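The nested cross-validation comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the hyperparameter grids and fold counts are arbitrary, L1-penalized logistic regression stands in for LASSO on a binary outcome, and scikit-learn's IterativeImputer (a single chained-equations imputation) stands in for full multiple imputation. The names make_model, candidates, and nested_cv_auc are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 300 patients, 8 features,
# a binary resistance label, and roughly 5% of values set to missing.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan


def make_model(classifier):
    # Imputation and scaling live inside the pipeline so they are refit on
    # each training fold and never see the held-out fold.
    return Pipeline([
        ("impute", IterativeImputer(max_iter=5, random_state=0)),
        ("scale", StandardScaler()),
        ("clf", classifier),
    ])


# Candidate algorithms with illustrative (assumed) tuning grids.
candidates = {
    "lasso": (
        make_model(LogisticRegression(penalty="l1", solver="liblinear")),
        {"clf__C": [0.01, 0.1, 1.0]},
    ),
    "rf": (
        make_model(RandomForestClassifier(n_estimators=50, random_state=0)),
        {"clf__max_depth": [3, None]},
    ),
}


def nested_cv_auc(model, grid):
    # Inner loop tunes hyperparameters; outer loop scores the tuned model on
    # folds that played no part in tuning.
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=inner)
    return cross_val_score(search, X, y, scoring="roc_auc", cv=outer)


scores = {name: nested_cv_auc(model, grid) for name, (model, grid) in candidates.items()}
for name, auc in scores.items():
    print(f"{name}: mean outer-fold AUROC = {auc.mean():.3f}")
```

Because tuning happens only inside each outer training fold, the outer-fold AUROC values give a less optimistic basis for choosing between the two algorithms than the best score from a single tuned k-fold run.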

RESULTS:

LASSO was selected over RF because it slightly outperformed RF on the area under the receiver operating characteristic curve (AUROC) and was judged to be more interpretable.

CONCLUSIONS:

This predictive algorithm for AMR among patients with uUTI improves upon existing models because the Optum EHR dataset is larger than those used in other published models in the United States, enabling greater statistical power and generalizability.

Code

MSR41

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas