Prognosis of Myelodysplastic Syndrome Using De-Identified Market Clarity Database

Author(s)

Khan S1, Markan R2, Tanwar K2, Sanyal S2, Verma V1, Gaur A2, Daral S2, Kukreja I2, Nayyar A2, Roy A2
1Optum, Gurgaon, HR, India, 2Optum, Gurugram, HR, India

OBJECTIVES: Myelodysplastic syndrome (MDS) is a rare blood cancer that arises in the bone marrow and does not form solid tumors. Hence, conventional diagnostic assessments are not relevant for MDS. With a growing MDS patient pool, efficient disease prognosis is essential for selecting appropriate treatment modalities.

Accordingly, we evaluated machine learning (ML) algorithms to predict MDS progression. To provide a more precise estimate, clinical notes were also analyzed using generative artificial intelligence (AI).

METHODS: A total of 100,164 patients from 2017 through 2021 were considered from the Optum® de-identified Market Clarity database. Of these, 1,434 patients had 12 months of continuous eligibility, were at least 45 years of age, and met the remaining eligibility criteria.

To reduce the confounding effects of demographic factors, cases were matched to controls using propensity score matching. Analyses were carried out in two settings: using structured electronic health record (EHR) data only, and using structured EHR data combined with unstructured clinical notes. Logistic regression, random forest, and XGBoost classifiers were used, with clinical presentations, laboratory tests, and demographics as predictors. A literature review was conducted to determine the key terms to search for in the clinical notes. Generative AI was used to increase the accuracy and speed of customizing the clinical notes data.
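As an illustration of the matching step only, the following is a minimal sketch of greedy 1:1 nearest-neighbour propensity score matching without replacement. It is not the authors' implementation: the function name, the caliper of 0.05, and the patient identifiers are hypothetical, and the propensity scores are assumed to have been estimated beforehand (for example, by a logistic regression of case status on demographic covariates).

```python
def match_cases_to_controls(case_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    case_scores, control_scores: dict mapping patient id -> propensity score.
    caliper: maximum allowed score difference for a valid match (assumed value).
    Returns a dict mapping each matched case id to its control id.
    """
    available = dict(control_scores)  # controls not yet used
    matches = {}
    # Process cases in a fixed order so the result is reproducible.
    for case_id, score in sorted(case_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by control id.
        best_id = min(available, key=lambda cid: (abs(available[cid] - score), cid))
        if abs(available[best_id] - score) <= caliper:
            matches[case_id] = best_id
            del available[best_id]  # each control is used at most once
    return matches


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    cases = {"case_a": 0.30, "case_b": 0.70, "case_c": 0.95}
    controls = {"ctrl_1": 0.32, "ctrl_2": 0.68, "ctrl_3": 0.10, "ctrl_4": 0.50}
    print(match_cases_to_controls(cases, controls))
```

In this toy input, the case with score 0.95 has no control within the caliper and is left unmatched, which is how caliper matching typically trades sample size for closer covariate balance.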

RESULTS: Detailed review of the clinical notes revealed that exposure to radiation and certain chemicals increases the risk of developing MDS. Logistic regression performed best, achieving 87% precision and an 84% F1-score with the model based on standard EHR data and customized clinical notes.
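For readers less familiar with the reported metrics, the sketch below shows how precision, recall, and the F1-score relate to one another. The function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for these values")
    return f1 * precision / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    print(round(recall_from_precision_f1(0.87, 0.84), 3))
```

Applied to the rounded figures above (precision 0.87, F1 0.84), the identity implies a recall of roughly 0.81. The abstract does not report recall, so this is only an arithmetic consequence of rounded values, not a reported result.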

CONCLUSIONS: The findings indicate that the terminology used in clinical notes is a more precise indicator of prognostic symptoms than structured EHR data. Additionally, generative AI proved more efficient than manual effort in annotating clinical notes.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

RWD76

Topic

Clinical Outcomes, Real World Data & Information Systems

Topic Subcategory

Clinical Outcomes Assessment, Data Protection, Integrity, & Quality Assurance

Disease

Oncology
