PERFORMANCE OF NLP TOOL DESIGNED TO IDENTIFY AND EXTRACT BIOLOGIC DRUG INFUSION DATA FROM CLINICAL NOTES

Author(s)

Leng J¹, Lu C¹, Cannon G², Teng C¹, Zhou X¹, He T¹, Harrison DJ³, Shah N³, Sauer BC¹
¹Departments of Internal Medicine, University of Utah, Salt Lake City, UT, USA, ²VA Salt Lake City Health Care System, Salt Lake City, UT, USA, ³Amgen, Inc, Thousand Oaks, CA, USA

OBJECTIVES: Infusions of outpatient medications, including biologic Disease Modifying Anti-Rheumatic Drugs (DMARDs), administered at Veterans Health Administration (VHA) facilities are well documented in the electronic medical record, but the data are not consistently entered into the pharmacy dispensing or nurse administration structured data sources. Although CPT codes can be used to identify many infusion events, inconsistent coding does not allow estimation of the administered dose. To address this, we developed Natural Language Processing (NLP) software to identify potential infusion notes, extract drug and dosage information, and standardize the results.
METHODS: Trained reviewers compared the NLP extractions to source documents and judged whether the software correctly extracted and standardized the data. The software contains a display window that allows reviewers to assess each NLP extraction directly. NLP was run on all notes, but note titles were selected for evaluation based on their likelihood of containing infusion data. Accuracy, defined as the number of correct extractions divided by the number of reviewed notes, was used to evaluate NLP performance. NLP performance was considered acceptable when the lower bound of the 95% Confidence Interval (CI) for accuracy reached 80%.
RESULTS: The NLP was trained on 4,000 notes, and the evaluation phase reviewed 35,190 notes from 255 note titles covering 93.1% of eligible notes. The overall accuracy rate was 94.0% [95% CI: 85.5-100%]: 33,077 notes were correctly extracted, 1,950 failed to extract infusion data, and 163 contained an incorrect extraction. The lower bound of the 95% CI across the 255 titles ranged from 78.8% to 100%; 247 (96.9%) titles had a lower bound >80%.
CONCLUSIONS: The NLP software demonstrated acceptable accuracy when extracting biologic DMARD infusion data for approximately 97% of note titles, suggesting that clinical notes are a reliable data source to identify biologic DMARD infusion data when coding is inconsistent.
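The accuracy measure and acceptance rule described in METHODS can be illustrated with a minimal sketch. The abstract does not state how its confidence intervals were computed, so the Wilson score interval used here is an assumption, as are the function names and the reading of "reached 80%" as "greater than or equal to 80%". An interval computed this way on the pooled counts need not match the reported overall interval.

```python
import math


def wilson_interval(correct, reviewed, z=1.96):
    """Wilson score interval for a proportion (assumed method, not from the abstract)."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    p = correct / reviewed
    denom = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def is_acceptable(correct, reviewed, threshold=0.80):
    """Acceptance rule from METHODS: lower 95% CI bound reaches the threshold."""
    lower, _ = wilson_interval(correct, reviewed)
    return lower >= threshold


# Counts reported in RESULTS.
correct, failed, incorrect = 33_077, 1_950, 163
reviewed = correct + failed + incorrect

# Accuracy = correct extractions / reviewed notes.
accuracy = correct / reviewed
print(f"reviewed={reviewed}, accuracy={accuracy:.1%}")

# Hypothetical single note title: 95 correct extractions out of 100 reviewed notes.
print(is_acceptable(95, 100))
```

Applied per note title, a check like `is_acceptable` would flag titles whose lower bound falls short of 80%, which corresponds to the 8 of 255 titles that did not meet the criterion.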

Conference/Value in Health Info

2014-05, ISPOR 2014, Palais des Congres de Montreal

Value in Health, Vol. 17, No. 3 (May 2014)

Code

PRM37

Topic

Real World Data & Information Systems

Topic Subcategory

Reproducibility & Replicability

Disease

Musculoskeletal Disorders
