Adaptation and Validation of a Natural Language Processing Algorithm to Use in Electronic Health Records to Identify Patients with Progressive Fibrosing-Interstitial Lung Disease in Spain
Author(s)
Balcells E1, Castellví I2, Caballero P3, Salinas MB4, Tort M5
1Hospital del Mar, Respiratory Department, Barcelona, Spain, 2Hospital Universitari de la Santa Creu i Sant Pau, Department of Rheumatology, Barcelona, Spain, 3Hospital Universitario de La Princesa, Radiology Department, Madrid, Spain, 4Hospital Universitario de Basurto, Respiratory Department, Bilbao, Spain, 5Boehringer Ingelheim, Sant Cugat, Barcelona, Spain
OBJECTIVES:
Progressive fibrosing-interstitial lung disease (PF-ILD) is a recently recognized condition that is often registered, implicitly or explicitly, in electronic health records (EHRs). We aimed to adapt and validate a natural language processing (NLP) algorithm that identifies patients with PF-ILD from the free text of EHRs in Spain.
METHODS:
This cross-sectional, retrospective, observational study included adults registered at the Hospital del Mar (Barcelona, Spain) between 1 January 2015 and 31 December 2019. Potential PF-ILD cases were detected according to progression criteria occurring within 24 months: decline in forced vital capacity, worsening of respiratory symptoms, and increased extent of fibrotic changes. The main study outcomes were the precision, sensitivity, and accuracy of the algorithm in identifying PF-ILD patients, assessed against experts’ verification of source data.
RESULTS:
The algorithm identified 43 PF-ILD patients among 323,713 screened EHRs; 4 of these patients were registered using disease codes and the rest using free text. The algorithm was precise (100%), sensitive (100%), and accurate (F1-measure >90%) in identifying PF-ILD cases and in classifying intermediate algorithm variables (e.g., idiopathic pulmonary fibrosis: 84%, 98%, and >90%, respectively; interstitial lung disease: 98%, 96%, and >90%, respectively).
CONCLUSIONS:
The adapted algorithm proved precise, sensitive, and accurate in identifying PF-ILD diagnoses that were not explicitly coded in EHRs. Given the algorithm’s performance (which may improve as the algorithm learns) and its high applicability (e.g., standardized and economical), its implementation to refine epidemiological indicators may be considered. In addition, this methodology may be of interest for progressive diseases and for other diseases with no explicit disease code.
“The author(s) meet criteria for authorship as recommended by the International Committee of Medical Journal Editors (ICMJE). BI was given the opportunity to review the abstract for medical and scientific accuracy as well as intellectual property considerations.”
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
RWD94
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records
Disease
Rare & Orphan Diseases, Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)