Machine Learning Approaches Towards Identification of Phenotypes in Various Diseases Using Electronic Health Records
Author(s)
Kumar A, Pradhan H, Adhikary RR
Novartis Healthcare Pvt. Ltd, Hyderabad, India
OBJECTIVES:
Machine learning (ML), which involves algorithmic modeling to draw general inferences from large and complex real-world datasets (e.g., electronic health records, or EHR), has been rapidly adopted in healthcare over the past few decades. Traditionally, phenotyping in diseases like asthma was based on clinical features only. We describe herein the application of various ML algorithms to large EHR datasets as a suitable improvement for identifying novel phenotypes and confirming existing ones in disease areas such as allergy, cardiology, and oncology.
METHODS:
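The numeric steps this section describes (missing value imputation, non-linear dimensionality reduction, and Bayesian Gaussian mixture clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the data are synthetic, the function names `make_toy_ehr` and `find_phenotypes` are hypothetical, scikit-learn's `IterativeImputer` with a random forest estimator stands in for MissForest as a similar iterative approach, and `KernelPCA` is used as a substitute non-linear reducer when the `umap-learn` package is not installed. The natural language processing and expert verification steps are outside the scope of the sketch.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.mixture import BayesianGaussianMixture

try:
    from umap import UMAP  # provided by the optional umap-learn package
except ImportError:
    UMAP = None


def make_toy_ehr(n_patients=120, n_features=5, missing_rate=0.1, seed=0):
    """Synthetic stand-in for numeric EHR features: two patient groups with missing values."""
    rng = np.random.default_rng(seed)
    half = n_patients // 2
    X = np.vstack([
        rng.normal(0.0, 1.0, size=(half, n_features)),
        rng.normal(3.0, 1.0, size=(n_patients - half, n_features)),
    ])
    # Blank out a random fraction of cells to mimic incomplete records.
    X[rng.random(X.shape) < missing_rate] = np.nan
    return X


def find_phenotypes(X, max_phenotypes=5, seed=0):
    """Impute, embed in two dimensions, and cluster; returns (embedding, labels)."""
    # MissForest-style imputation (assumption: random forest inside IterativeImputer).
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    X_filled = imputer.fit_transform(X)
    X_scaled = StandardScaler().fit_transform(X_filled)

    # Non-linear dimensionality reduction: UMAP if available, else RBF kernel PCA.
    if UMAP is not None:
        reducer = UMAP(n_components=2, random_state=seed)
    else:
        reducer = KernelPCA(n_components=2, kernel="rbf")
    embedding = reducer.fit_transform(X_scaled)

    # max_phenotypes is an upper bound; the Dirichlet process prior can
    # leave unneeded mixture components with negligible weight.
    mixture = BayesianGaussianMixture(
        n_components=max_phenotypes,
        weight_concentration_prior_type="dirichlet_process",
        random_state=seed,
    )
    labels = mixture.fit_predict(embedding)
    return embedding, labels
```

Calling `find_phenotypes(make_toy_ehr())` returns a two-dimensional embedding and one cluster label per patient. In practice, the resulting clusters would still need the visualization and medical interpretation described in this section before being treated as phenotypes.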
The present concept outlines the identification of phenotypes through non-linear patient characteristics within EHR including demographics, clinical details, comorbidities, medications, procedures, diagnostics, and healthcare encounters. Considering the fragmented, missing, and inaccurate data within EHR, lexical and logical methods with verification from medical experts can be used to create a “fit-for-purpose” dataset. Various ML algorithms applied to such datasets can support individual steps of the phenotyping process, including natural language processing for decoding physicians’ notes, the MissForest algorithm (which uses random forest techniques) for missing value imputation, Uniform Manifold Approximation and Projection (UMAP) for non-linear dimensionality reduction, and a Bayesian Gaussian mixture model as an advanced clustering algorithm to identify the phenotypes. These steps, followed by informative and interactive data visualization and medical interpretation of shared clinical characteristics, can help in identifying and validating phenotypes.
RESULTS:
We outlined herein the concepts involved in using multiple ML algorithms, complemented with medical expertise, to identify novel phenotypes and confirm established ones from large, complex, real-world EHR data by grouping clinical characteristics. This can help in the development of innovative medicines in various disease areas by supporting clinical trial recruitment and by aiding the design of next-generation clinical trials based on real-world data.
CONCLUSIONS:
Machine learning algorithms can enable identification of novel phenotypes from real-world EHR data, thus helping in better disease understanding and in developing innovative medicines.
Conference/Value in Health Info
2022-11, ISPOR Europe 2022, Vienna, Austria
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR118
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Missing Data
Disease
SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory), SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory), STA: Persona