PERFORMANCE OF NLP TOOL DESIGNED TO IDENTIFY AND EXTRACT BIOLOGIC DRUG INFUSION DATA FROM CLINICAL NOTES
Author(s)
Leng J1, Lu C1, Cannon G2, Teng C1, Zhou X1, He T1, Harrison DJ3, Shah N3, Sauer BC1
1Departments of Internal Medicine, University of Utah, Salt Lake City, UT, USA, 2VA Salt Lake City Health Care System, Salt Lake City, UT, USA, 3Amgen, Inc, Thousand Oaks, CA, USA
OBJECTIVES: Infusions of outpatient medications, including biologic Disease Modifying Anti-Rheumatic Drugs (DMARDs), administered at Veterans Health Administration (VHA) facilities are well documented in the electronic medical record, but the data are not consistently entered into the pharmacy dispensing or nurse administration structured data sources. Although CPT codes can be used to identify many infusion events, inconsistent coding does not allow estimation of the administered dose. To address this, we developed Natural Language Processing (NLP) software to identify potential infusion notes, extract drug and dosage information, and standardize the results.
METHODS: Trained reviewers compared the NLP extractions to source documents and judged whether the software correctly extracted and standardized data. The software contains a display window allowing reviewers to directly assess the NLP extraction. NLP was run on all notes, but note titles were selected for evaluation based on the likelihood of containing infusion data. Accuracy, defined as the number of correct extractions divided by the number of reviewed notes, was used to evaluate NLP performance. NLP performance was considered acceptable when the lower bound of the 95% Confidence Interval (CI) reached 80% accuracy.
RESULTS: The NLP was trained on 4,000 notes, and the evaluation phase reviewed 35,190 notes based on 255 note titles covering 93.1% of eligible notes. The overall accuracy rate was 94.0% [95% CI: 85.5-100%]: 33,077 notes were correctly extracted, in 1,950 the software failed to extract infusion data, and 163 contained an incorrect extraction. Across the 255 titles, the lower bound of the 95% CI ranged from 78.8% to 100%; 247 (96.9%) titles had a lower bound >80%.
CONCLUSIONS: The NLP software demonstrated acceptable accuracy when extracting biologic DMARD infusion data for approximately 97% of note titles, suggesting that clinical notes are a reliable data source to identify biologic DMARD infusion data when coding is inconsistent.
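As an illustration of the accuracy measure and the acceptability criterion described in METHODS, the following is a minimal Python sketch. The abstract does not state how the 95% CI was computed, so the Wilson score interval used here is an assumption, and the function names (accuracy, wilson_interval, is_acceptable) are hypothetical. A simple binomial interval on the pooled counts is much narrower than the reported 85.5-100%, so the authors' interval was evidently computed differently; this sketch is not intended to reproduce it.

```python
import math


def accuracy(correct: int, reviewed: int) -> float:
    """Accuracy as defined in METHODS: correct extractions / reviewed notes."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return correct / reviewed


def wilson_interval(correct: int, reviewed: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion.

    ASSUMPTION: the abstract does not name its CI method; Wilson is used
    here only as a common choice that stays within [0, 1].
    """
    p = accuracy(correct, reviewed)
    z2 = z * z
    denom = 1 + z2 / reviewed
    centre = (p + z2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def is_acceptable(correct: int, reviewed: int, threshold: float = 0.80) -> bool:
    """Acceptability rule from METHODS: lower 95% CI bound reaches 80%.

    ASSUMPTION: "reached" is read as greater than or equal to the threshold.
    """
    lower, _ = wilson_interval(correct, reviewed)
    return lower >= threshold


# Pooled counts reported in RESULTS.
correct, failed, incorrect = 33_077, 1_950, 163
reviewed = correct + failed + incorrect  # 35,190 reviewed notes

print(f"accuracy = {accuracy(correct, reviewed):.1%}")
print("acceptable:", is_acceptable(correct, reviewed))
```

The same functions could be applied per note title, with that title's own counts, to mirror the title-level assessment reported in RESULTS; the per-title counts are not given in the abstract.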
Conference/Value in Health Info
2014-05, ISPOR 2014, Palais des Congres de Montreal
Value in Health, Vol. 17, No. 3 (May 2014)
Code
PRM37
Topic
Real World Data & Information Systems
Topic Subcategory
Reproducibility & Replicability
Disease
Musculoskeletal Disorders