Natural Language Processing for Automated Classification of Qualitative Data From Interviews of Patients With Cancer

Dec 1, 2022
10.1016/j.jval.2022.06.004
https://www.valueinhealthjournal.com/article/S1098-3015(22)02040-X/fulltext
Citation : https://www.valueinhealthjournal.com/action/showCitFormats?pii=S1098-3015(22)02040-X&doi=10.1016/j.jval.2022.06.004
First page : 1995
Section Title : PATIENT-REPORTED OUTCOMES
Open access? : Yes

Objectives

This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life.

Methods

We tested the ability of 4 NLP models to accurately classify text from interview transcripts as “symptom,” “quality of life impact,” or “other.” Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at the paragraph and sentence levels on the HCC testing data set using the one-versus-rest technique, quantified by the area under the receiver operating characteristic curve (ROC AUC).
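The one-versus-rest ROC AUC evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels, and the predicted probabilities are all hypothetical, and the unweighted mean across the 3 classes is an assumption, because the abstract does not state how per-class scores were averaged. Each class is scored in turn as "this class" versus "everything else," and the binary ROC AUC is computed from the rank-sum identity (the fraction of positive/negative pairs that the model orders correctly).

```python
from typing import Dict, List, Sequence

# The three classes named in the Methods section.
LABELS: List[str] = ["symptom", "quality of life impact", "other"]


def binary_roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney U identity).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[str],
    probs: Sequence[Dict[str, float]],
    labels: Sequence[str] = LABELS,
) -> Dict[str, float]:
    """Score each class against all other classes combined."""
    per_class: Dict[str, float] = {}
    for label in labels:
        binary = [1 if y == label else 0 for y in y_true]
        scores = [p[label] for p in probs]
        per_class[label] = binary_roc_auc(binary, scores)
    return per_class


def macro_average(per_class: Dict[str, float]) -> float:
    """Unweighted mean of the per-class scores (an assumed averaging rule)."""
    return sum(per_class.values()) / len(per_class)


# Hypothetical gold labels and model probabilities for six text segments.
y_true = [
    "symptom", "symptom", "quality of life impact",
    "other", "other", "quality of life impact",
]
probs = [
    {"symptom": 0.8, "quality of life impact": 0.1, "other": 0.1},
    {"symptom": 0.4, "quality of life impact": 0.5, "other": 0.1},
    {"symptom": 0.2, "quality of life impact": 0.7, "other": 0.1},
    {"symptom": 0.1, "quality of life impact": 0.2, "other": 0.7},
    {"symptom": 0.3, "quality of life impact": 0.1, "other": 0.6},
    {"symptom": 0.5, "quality of life impact": 0.4, "other": 0.1},
]

per_class = one_vs_rest_auc(y_true, probs)
macro = macro_average(per_class)
print(per_class, macro)
```

A score of 1.0 for a class means every segment of that class received a higher probability for it than any segment of another class; 0.5 corresponds to chance-level ranking.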

Results

NLP models accurately classified multiclass text from patient interviews. The Bidirectional Encoder Representations from Transformers (BERT) model generally outperformed the other models at both the paragraph and sentence levels. BERT achieved its highest predictive performance when trained on the HCC data set and tested on the BTC data set (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from the BTC and gastric cancer populations.

Conclusions

NLP models accurately predicted the class of text from interviews of patients with cancer, with most surpassing a ROC AUC of 0.9 at the paragraph level. NLP may be a useful tool for scaling up the processing of patient interviews in clinical studies and could thus facilitate patient input into drug development and help improve patient care.

Categories :
  • Artificial Intelligence, Machine Learning, Predictive Analytics
  • Methodological & Statistical Research
  • Oncology
  • Specific Diseases & Conditions
  • Study Approaches
  • Surveys & Expert Panels
Tags :
  • Bidirectional Encoder Representations from Transformers
  • natural language processing
  • patient interviews
  • patient-reported outcomes
Regions :
  • North America