Natural Language Processing for Automated Classification of Qualitative Data From Interviews of Patients With Cancer

Dec 1, 2022, 00:00

10.1016/j.jval.2022.06.004

https://www.valueinhealthjournal.com/article/S1098-3015(22)02040-X/fulltext



First page : 1995

Section Title : PATIENT-REPORTED OUTCOMES

Open access? : Yes

Section Order : 1995

Objectives

This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life.

Methods

We tested the ability of 4 NLP models to accurately classify text from interview transcripts as “symptom,” “quality of life impact,” or “other.” Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at paragraph and sentence level using the HCC testing data set. Performance was analyzed with the one-versus-rest technique and quantified by the receiver operating characteristic area under the curve (ROC AUC) score.
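As an illustration of the one-versus-rest evaluation described above, the following is a minimal sketch and not the authors' implementation. It treats each of the 3 classes in turn as the positive class, computes a binary ROC AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half), and then averages across classes. The function names (`binary_roc_auc`, `one_vs_rest_auc`, `macro_average`) and the toy scores are assumptions made up for this example; they are not data from the study.

```python
from typing import Dict, List, Sequence

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ("symptom", "quality of life impact", "other")


def binary_roc_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: P(score of positive > score of negative)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[str],
    y_score: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
) -> Dict[str, float]:
    """Compute one binary ROC AUC per class, treating all other classes as negative."""
    per_class: Dict[str, float] = {}
    for k, label in enumerate(labels):
        is_positive = [y == label for y in y_true]
        class_scores = [row[k] for row in y_score]
        per_class[label] = binary_roc_auc(is_positive, class_scores)
    return per_class


def macro_average(per_class: Dict[str, float]) -> float:
    """Unweighted mean of the per-class AUC values."""
    return sum(per_class.values()) / len(per_class)


# Hypothetical toy data: 6 text units with made-up model scores per class.
toy_true: List[str] = [
    "symptom", "symptom",
    "quality of life impact", "quality of life impact",
    "other", "other",
]
toy_score: List[List[float]] = [
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.2, 0.3],
]

toy_auc = one_vs_rest_auc(toy_true, toy_score)
for label, auc in toy_auc.items():
    print(f"{label}: {auc:.3f}")
print(f"macro average: {macro_average(toy_auc):.3f}")
```

The pairwise formulation is quadratic in the number of examples and is chosen here only for readability; for real transcripts, a library routine such as scikit-learn's `roc_auc_score` with `multi_class="ovr"` would be the more practical choice.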

Results

NLP models accurately classified multiclass text from patient interviews. The Bidirectional Encoder Representations from Transformers (BERT) model generally outperformed all other models at paragraph and sentence level. The highest predictive performance of the BERT model was observed when the HCC data set was used for training and the BTC data set for testing (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from BTC and gastric cancer populations.

Conclusions

NLP models were accurate in predicting multiclass classification of text from interviews of patients with cancer, with most surpassing 0.9 ROC AUC at paragraph level. NLP may be a useful tool for scaling up processing of patient interviews in clinical studies and, thus, could serve to facilitate patient input into drug development and to improve patient care.

Categories :

Artificial Intelligence, Machine Learning, Predictive Analytics
Methodological & Statistical Research
Oncology
Specific Diseases & Conditions
Study Approaches
Surveys & Expert Panels

Tags :

Bidirectional Encoder Representations from Transformers
natural language processing
patient interviews
patient-reported outcomes

Regions :

North America
