Machine Learning Approaches Towards Identification of Phenotypes in Various Diseases Using Electronic Health Records

Author(s)

Kumar A, Pradhan H, Adhikary RR
Novartis Healthcare Pvt. Ltd, Hyderabad, India

OBJECTIVES:

Machine learning (ML), which involves algorithmic modeling to draw general inferences from large and complex real-world datasets (e.g., electronic health records, or EHR), has been rapidly adopted in healthcare over the past few decades. Traditionally, phenotyping in diseases like asthma was based on clinical features only. We describe herein the application of various ML algorithms to large EHR datasets as an improvement in identifying novel phenotypes and confirming existing ones in disease areas such as allergy, cardiology, and oncology.

METHODS:

The present concept outlines the identification of phenotypes through non-linear patient characteristics within EHR, including demographics, clinical details, comorbidities, medications, procedures, diagnostics, and healthcare encounters. Because EHR data are often fragmented, missing, or inaccurate, lexical and logical methods, verified by medical experts, can be used to create a “fit-for-purpose” dataset. ML algorithms applied to such a dataset can support the successive steps of the phenotyping process: natural language processing for decoding physicians’ notes, the MissForest algorithm (based on random forest techniques) for missing value imputation, Uniform Manifold Approximation and Projection for non-linear dimensionality reduction, and a Bayesian Gaussian Mixture model as an advanced clustering algorithm to identify the phenotypes. This is followed by informative, interactive data visualization and medical interpretation of the shared clinical characteristics, which can help in identifying and validating phenotypes.
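The imputation, dimensionality reduction, and clustering steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names `make_synthetic_patients` and `phenotype_pipeline` are hypothetical, scikit-learn's `IterativeImputer` with a random forest estimator only approximates MissForest, and t-SNE stands in for Uniform Manifold Approximation and Projection (which is provided by the separate umap-learn package) so that the sketch depends on scikit-learn alone.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import TSNE
from sklearn.mixture import BayesianGaussianMixture
from sklearn.preprocessing import StandardScaler


def make_synthetic_patients(n_per_group=60, n_features=6, missing_rate=0.1, seed=0):
    """Hypothetical stand-in for numeric EHR features: three latent groups with gaps."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 4.0, size=(3, n_features))
    X = np.vstack([rng.normal(c, 1.0, size=(n_per_group, n_features)) for c in centers])
    X[rng.random(X.shape) < missing_rate] = np.nan  # knock out values at random
    return X


def phenotype_pipeline(X_missing, max_clusters=8, seed=0):
    """Impute, embed, and cluster; returns (imputed data, 2-D embedding, labels)."""
    # Step 1: MissForest-like imputation (assumption: iterative imputation with a
    # random forest regressor approximates MissForest for numeric features only).
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    X_imputed = imputer.fit_transform(X_missing)

    # Step 2: non-linear dimensionality reduction. t-SNE is used here as a
    # stand-in for UMAP; the abstract itself names UMAP.
    X_scaled = StandardScaler().fit_transform(X_imputed)
    perplexity = min(30.0, (X_scaled.shape[0] - 1) / 3.0)
    embedding = TSNE(
        n_components=2, perplexity=perplexity, random_state=seed
    ).fit_transform(X_scaled)

    # Step 3: Bayesian Gaussian mixture clustering. max_clusters is only an upper
    # bound; the Dirichlet-process prior can leave surplus components unused.
    bgm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    )
    labels = bgm.fit_predict(embedding)
    return X_imputed, embedding, labels


if __name__ == "__main__":
    X = make_synthetic_patients()
    X_imputed, embedding, labels = phenotype_pipeline(X)
    print("candidate phenotype clusters:", np.unique(labels).size)
```

The cluster labels are only candidate phenotypes. As the abstract notes, they still need visualization and medical interpretation before being treated as validated, and real EHR data would first require the natural language processing and expert-verified cleaning steps that this sketch omits.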

RESULTS:

We outlined herein the concepts involved in using multiple ML algorithms, complemented by medical expertise, to identify novel phenotypes and confirm established ones from large, complex, real-world EHR data by grouping clinical characteristics. This can help in the development of innovative medicines in various disease areas by supporting clinical trial recruitment and by aiding the design of next-generation clinical trials based on real-world data.

CONCLUSIONS:

Machine learning algorithms can enable the identification of novel phenotypes from real-world EHR data, thus supporting better disease understanding and the development of innovative medicines.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

MSR118

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Missing Data

Disease

SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory), SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory), STA: Persona
