Adaptation and Validation of a Natural Language Processing Algorithm to Use in Electronic Health Records to Identify Patients with Progressive Fibrosing-Interstitial Lung Disease in Spain

Author(s)

Balcells E1, Castellví I2, Caballero P3, Salinas MB4, Tort M5
1Hospital del Mar, Respiratory Department, Barcelona, Spain, 2Hospital Universitari de la Santa Creu i Sant Pau, Department of Rheumatology, Barcelona, Spain, 3Hospital Universitario de La Princesa, Radiology Department, Madrid, Spain, 4Hospital Universitario de Basurto, Respiratory Department, Bilbao, Spain, 5Boehringer Ingelheim, Sant Cugat, Barcelona, Spain

OBJECTIVES: Progressive fibrosing-interstitial lung disease (PF-ILD) is a recently recognized condition that is often registered implicitly or explicitly in electronic health records (EHRs). We aimed to adapt and validate a natural language processing (NLP) algorithm that identifies PF-ILD patients from the free text of EHRs in Spain.

METHODS: This cross-sectional, retrospective, observational study included adults registered at the Hospital del Mar (Barcelona, Spain) between 1 January 2015 and 31 December 2019. The main study outcomes were the precision, sensitivity, and accuracy of the algorithm in identifying PF-ILD patients. Potential PF-ILD cases were detected according to progression criteria occurring within 24 months: decline in forced vital capacity, worsening of respiratory symptoms, and increased extent of fibrotic changes. These metrics were assessed against experts’ verification of source data.

RESULTS: The algorithm identified 43 PF-ILD patients among 323,713 screened EHRs, of whom 4 were registered using disease codes and the rest using free text. The algorithm was precise (100%), sensitive (100%), and accurate (F1-measure >90%) in identifying PF-ILD cases and in classifying intermediate algorithm variables (e.g., idiopathic pulmonary fibrosis: 84%, 98%, and >90%, respectively; interstitial lung disease: 98%, 96%, and >90%, respectively).
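
As a rough illustration of how the three reported metrics relate, the sketch below recomputes the F1-measure from the precision and sensitivity figures quoted above. It assumes the F1-measure is the standard harmonic mean of precision and sensitivity; the abstract does not state the formula, and the function and variable names are hypothetical rather than taken from the study.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall).

    Assumption: this is the usual F1 definition; the abstract does not
    spell out how its F1-measure was calculated.
    """
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity) pairs as reported in the RESULTS paragraph.
reported = {
    "PF-ILD": (1.00, 1.00),
    "idiopathic pulmonary fibrosis": (0.84, 0.98),
    "interstitial lung disease": (0.98, 0.96),
}

# Recompute F1 for each variable from the reported pair.
f1_by_variable = {
    name: f1_score(p, s) for name, (p, s) in reported.items()
}

for name, f1 in f1_by_variable.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Under that assumption, each recomputed value exceeds 0.90, which is consistent with the ">90%" F1-measure reported for all three variables.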

CONCLUSIONS: The adapted algorithm proved to be precise, sensitive, and accurate in identifying PF-ILD diagnoses that were not explicitly coded in EHRs. Considering the algorithm's performance, which may improve as the algorithm learns, and its high applicability (e.g., standardized and economical), its implementation to refine epidemiological indicators may be considered. In addition, this methodology may be of interest for other diseases with no explicit disease code and for progressive diseases.

“The author(s) meet criteria for authorship as recommended by the International Committee of Medical Journal Editors (ICMJE). BI was given the opportunity to review the abstract for medical and scientific accuracy as well as intellectual property considerations.”

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

RWD94

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records

Disease

Rare & Orphan Diseases, Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)
