Electronic Health Records With Unstructured Text to Predict Outcomes With Machine Learning: A Therapeutic Area Fingerprint
Author(s)
Cossio M1, Gilardino R2
1Universitat de Barcelona, Dubendorf, ZH, Switzerland, 2HE-Xperts Consulting LLC, Miami, FL, USA
OBJECTIVES: Given the exponential growth in the application of Machine Learning (ML) to predict outcomes by analyzing unstructured text from electronic health records (EHR), we assessed which therapeutic areas or medical specialties are keen to employ unstructured data to capture disease-related information.
METHODS: We searched PubMed and Google Scholar using the search terms "Electronic Health Records" and "Machine Learning", screening all publications in English up to September 2021. The variables analyzed were the number of patients and duration of data collection by therapeutic area, type of data structure, automatic text analysis techniques, and clinical outcomes. Data are presented as means for continuous variables and percentages for categorical variables.
RESULTS: We selected 117 papers covering 18 different therapeutic areas; Cardiovascular (27/117, 23%), Psychiatry (19/117, 16.2%), and Oncology (14/117, 11.9%) were among the areas that most frequently employed unstructured EHR data. Five papers (5/117, 4.2%) represented data from the EU and UK. Patient populations ranged from 577 to 2,341,877, and the duration of data capture ranged from 3 to 20.5 years. ICD was the principal coding system in 78/117 (67%) of these papers. Unstructured data were presented in 47/117 (40%), and of the 20 automatic text analysis techniques recorded, cTAKES and MetaMap were the most frequent.
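As a rough check on the proportions above, the reported counts can be re-expressed as percentages of the selected papers. The sketch below assumes a denominator of 117 for every count and rounds to one decimal place; the name `share` and the dictionary layout are illustrative and not part of the study, and the rounded values may differ from the reported ones in the last digit.

```python
# Counts are taken from the RESULTS text.
# Assumption: all counts share the denominator of 117 selected papers.
TOTAL_PAPERS = 117

counts = {
    "Cardiovascular": 27,
    "Psychiatry": 19,
    "Oncology": 14,
    "EU and UK data": 5,
    "ICD as principal coding system": 78,
    "Unstructured data presented": 47,
}


def share(count: int, total: int = TOTAL_PAPERS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of the selected papers for each reported count.
shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_PAPERS} ({pct}%)")
```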
CONCLUSIONS: The selected papers covered a wide diversity of medical specialties. However, we observed a lack of standardization in the number of patients and the duration of studies. Likewise, we identified a lack of standardization in automatic text analysis, which is a concern because many under-resourced health centers store their clinical data only as unstructured text.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
RWD102
Topic
Epidemiology & Public Health, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Disease Classification & Coding
Disease
No Additional Disease & Conditions/Specialized Treatment Areas