NATURAL LANGUAGE PROCESSING METHODS ENHANCE MACE IDENTIFICATION FROM ELECTRONIC HEALTH RECORDS

Author(s)

St. Laurent S (1), Guo M (1), Alfonso R (1), Okoro T (1), Johansen K (2), Dember L (3), Lindsay A (1)
(1) GSK, Collegeville, PA, USA; (2) University of California, San Francisco, San Francisco, CA, USA; (3) University of Pennsylvania, Philadelphia, PA, USA

OBJECTIVES: Electronic health records (EHRs), digital versions of patients’ clinical records, can include free-text fields that offer insights into outcomes and events but are challenging to analyze because of misspellings, abbreviations and extraneous notes. Natural language processing (NLP), the application of linguistics in computer science, can be used to identify events in free-text fields. The objective of this study was to apply NLP methods to identify major adverse cardiac events (MACE) in a large dialysis organization database using hospital discharge free-text fields.

METHODS: The key search terms for MACE focused on stroke (‘stroke’, ‘cva’, ‘cerebrovascular accident’) and MI (‘myocardial infarction’, ‘MI’, ‘heart attack’, ‘stemi’, ‘nstemi’) and were based on clinical trial protocols and National Library of Medicine terms. Terms were also identified for negation, diminutive and temporal criteria. The analysis (funded by GSK) was conducted in R using several text-mining packages (“tm”, “stringr”, “RTextTools”, “SnowballC”, “plyr”). A corpus of the text fields was cleaned by removing numbers, stripping white space and transforming all text to lowercase. Regular-expression (RegEx) string searches were run iteratively on data frames created from the corpus, using word-matching functions and Boolean expressions.

RESULTS: Classification rules were applied to 16,613 unique hospital discharge free-text fields, resulting in 87 fields containing stroke terms and 113 fields containing MI terms. A review by an experienced team comprising cardiologists, nephrologists, and an endocrinologist confirmed 78 of the 87 stroke-term fields (90%) and 109 of the 113 MI-term fields (96%).

CONCLUSIONS: NLP methods conservatively simulated the logic used by clinicians to detect outcomes from free-text fields, returning a broadly relevant subset of fields (90% clinically confirmed for stroke terms and 96% for MI terms). Refinements to the algorithm will continue to improve the precision of identifying true MACE from free-text fields in an EHR database.
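The cleaning and term-matching steps described in METHODS can be sketched roughly as follows. This is an illustration only and not the authors' code: the study was run in R with the packages listed above, whereas this sketch uses Python's standard `re` module. The search terms are those quoted in the abstract. The negation cues, the two-word look-back window, and the function names (`clean`, `term_pattern`, `has_term`) are assumptions made for the example, since the abstract does not publish its negation rules. The diminutive and temporal criteria are not modeled.

```python
import re

# Search terms quoted in the abstract.
STROKE_TERMS = ["stroke", "cva", "cerebrovascular accident"]
MI_TERMS = ["myocardial infarction", "mi", "heart attack", "stemi", "nstemi"]

# Hypothetical negation rule: a cue word followed by at most two other words,
# ending right where the matched term begins. The abstract does not list the
# cues it used, so these are placeholders.
NEGATION = re.compile(
    r"\b(?:no|denies|without|ruled out|negative for)\s+(?:\w+\s+){0,2}$"
)


def clean(text):
    """Lowercase, remove numbers, and strip extra white space."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def term_pattern(terms):
    """Compile a whole-word alternation so 'mi' does not match inside 'admitted'."""
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b")


def has_term(text, pattern):
    """Return True if the cleaned text holds at least one non-negated match."""
    cleaned = clean(text)
    for match in pattern.finditer(cleaned):
        preceding = cleaned[:match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


STROKE = term_pattern(STROKE_TERMS)
MI = term_pattern(MI_TERMS)

# Made-up example notes, not data from the study.
notes = [
    "Pt ADMITTED 2017 with NSTEMI",
    "Patient denies stroke or heart attack",
    "Hx of CVA, now stable",
]
flagged = [(has_term(n, STROKE), has_term(n, MI)) for n in notes]
```

Fields flagged this way would still need clinician review, as in the study, because a simple cue-word rule cannot capture every way a note can negate or qualify an event.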

Conference/Value in Health Info

2018-05, ISPOR 2018, Baltimore, MD, USA

Value in Health, Vol. 21, S1 (May 2018)

Code

PRM40

Topic

Real World Data & Information Systems

Topic Subcategory

Reproducibility & Replicability

Disease

Cardiovascular Disorders
