Identifying Treatment Patterns for Diffuse Large B-Cell Lymphoma in Real-World Data Using Unsupervised Machine Learning
Author(s)
Wang Y1, Vanness D2
1Pennsylvania State University, State College, PA, USA, 2Pennsylvania State University, University Park, PA, USA
OBJECTIVES: Because many hematological oncology treatments delivered in practice do not precisely match treatment guidelines, researchers cannot rely on guidelines alone to identify treatment patterns and detect switches in therapy observed in real-world data. We explore whether unsupervised machine learning may be useful for identifying treatment patterns for patients with diffuse large B-cell lymphoma (DLBCL).
METHODS: We used 2007-2022 electronic health record data (TriNetX) to identify 7,321 DLBCL patients beginning with non-second-line-only therapies using ICD-10-CM diagnosis codes (C83.3). We identified 30 drugs used for DLBCL treatment from National Comprehensive Cancer Network (NCCN) and American Cancer Society (ACS) guidance and the literature. For each patient, drugs delivered within a 7-day window were grouped into multi-drug encounters. Multiple Correspondence Analysis (MCA) identified dimensions comprising weighted combinations of drugs co-occurring within encounters. Mini-Batch K-Means then clustered encounters on their MCA dimensions. We used the Bayesian Information Criterion (BIC) to determine the optimal number of clusters. Sensitivity analyses varied the time window for grouping drugs, the number of MCA dimensions, and the criteria for selecting the optimal number of clusters.
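The encounter-grouping step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the 7-day window is anchored, so the code below assumes each window starts at the first drug administration not yet assigned to an encounter. The function name `group_encounters`, the record layout, and the example records are all hypothetical.

```python
from datetime import date, timedelta


def group_encounters(records, window_days=7):
    """Group (patient_id, drug, date) records into multi-drug encounters.

    Assumption (not stated in the abstract): a window opens at the first
    unassigned administration and covers the following `window_days` days.
    Returns a list of (patient_id, window_start, frozenset_of_drugs).
    """
    # Collect each patient's administrations.
    by_patient = {}
    for patient_id, drug, day in records:
        by_patient.setdefault(patient_id, []).append((day, drug))

    encounters = []
    for patient_id in sorted(by_patient):
        start, drugs = None, set()
        # Walk the administrations in chronological order.
        for day, drug in sorted(by_patient[patient_id]):
            if start is None or day - start >= timedelta(days=window_days):
                # Close the current encounter (if any) and open a new one.
                if start is not None:
                    encounters.append((patient_id, start, frozenset(drugs)))
                start, drugs = day, set()
            drugs.add(drug)
        encounters.append((patient_id, start, frozenset(drugs)))
    return encounters


# Hypothetical example records, for illustration only.
records = [
    ("p1", "rituximab", date(2020, 1, 1)),
    ("p1", "cyclophosphamide", date(2020, 1, 1)),
    ("p1", "doxorubicin", date(2020, 1, 2)),
    ("p1", "rituximab", date(2020, 1, 22)),
    ("p2", "rituximab", date(2021, 3, 5)),
]
encounters = group_encounters(records)
```

Each resulting drug set could then be encoded as a row of presence/absence indicators over the 30 drugs before applying MCA. Changing `window_days` corresponds to the sensitivity analysis on the grouping window.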
RESULTS: Our base case approach successfully identified meaningful treatment patterns distinguishing between recognizable first-line and second-line therapies. Reducing MCA dimensions or expanding the drug grouping window reduced the optimal number of clusters, undesirably assigning some recognized first-line and second-line therapies to the same cluster. Replacing BIC with the Akaike Information Criterion (AIC) yielded results similar to the base case in some scenarios but dramatically increased the optimal number of clusters in others.
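The tendency of AIC to select more clusters than BIC follows from the standard definitions of the two criteria, shown here for reference. The abstract does not state how the likelihood was defined for the K-Means solutions, so the symbols are generic: \hat{L} is the maximized likelihood, p is the number of free parameters (which grows with the number of clusters), and n is the number of observations (assumed here to be encounters).

```latex
\mathrm{AIC} = 2p - 2\ln\hat{L}
\qquad
\mathrm{BIC} = p\,\ln n - 2\ln\hat{L}
```

The two criteria differ only in the penalty per parameter: 2 for AIC and \ln n for BIC. Whenever \ln n > 2 (that is, n \geq 8), BIC penalizes each additional cluster more heavily, so in a large sample AIC will generally favor an equal or larger number of clusters.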
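CONCLUSIONS: Unsupervised machine learning is a promising approach for identifying meaningful treatment patterns. However, results are sensitive to learning parameters, such as the drug grouping window, the number of MCA dimensions, and the cluster selection criterion, so these choices require careful consideration.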
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
MSR55
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas