A Natural Language Processing (NLP) Approach to Automate Patients’ Testimonials Analysis


Hayat P¹, Clemente C¹, Martenot V², Rollot M³
¹Quinten Health, Paris, France; ²Quinten, Paris, France; ³Quinten Health, 75017 Paris, France


OBJECTIVES: Patients’ testimonials (e.g. posts on forums or responses to questionnaires) provide valuable insights for defining and characterizing patient-reported outcomes (PRO), quality of life and patients’ perspectives on disease symptoms. However, traditional NLP methods used for the automated analysis of patients’ testimonials rely on word co-occurrence frequencies, and are therefore not fit for purpose on such data, where texts are short and co-occurrences are rare. Building on an efficient method based on semantic proximity that we introduced recently, we aim to improve the post-processing of results and the scalability of the method.


METHODS: First, testimonials are vectorized into embeddings with a pre-trained Sentence-BERT language model, which captures the meaning of the texts beyond simple word co-occurrence. To ease interpretation, the dimensionality of the embeddings is reduced to two using the UMAP algorithm. Agglomerative clustering is then performed on the reduced embeddings, with the number of clusters chosen to optimize the silhouette score. Extending our previous work, the clustering dendrogram facilitates post-processing interventions by automatically pre-selecting the clusters that can be merged together or split into two subclusters. Each cluster is labeled with its most prevalent terms. Sentiment analysis is also performed to refine the tags and to check that the cluster definitions are relevant.
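The clustering and labeling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the Sentence-BERT and UMAP steps are not reproduced, so the input is taken to be an array of 2-D points standing in for the reduced embeddings. The linkage is scikit-learn's default (Ward), which the abstract does not specify. The function names, the tiny stopword list, the tokenization rule and the example sentences are all hypothetical and serve only as illustration.

```python
import re
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Hypothetical, deliberately tiny stopword list (assumption, not from the abstract).
STOPWORDS = {"the", "and", "was", "too", "after", "every"}


def cluster_with_best_silhouette(points, k_min=2, k_max=10):
    """Try each cluster count in [k_min, k_max]; keep the best silhouette score."""
    points = np.asarray(points, dtype=float)
    # The silhouette score is only defined for 2 <= k <= n_samples - 1.
    k_max = min(k_max, len(points) - 1)
    if k_min < 2 or k_max < k_min:
        raise ValueError("need at least 3 points and k_min >= 2")
    scores = {}
    best_k, best_labels = None, None
    for k in range(k_min, k_max + 1):
        # Default linkage is Ward (an assumption about the method).
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
        if best_k is None or scores[k] > scores[best_k]:
            best_k, best_labels = k, labels
    return best_k, best_labels, scores


def label_clusters(texts, labels, top_n=3):
    """Tag each cluster with its most frequent non-stopword terms."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[int(label)].update(
            t for t in tokens if len(t) > 2 and t not in STOPWORDS
        )
    return {
        label: [term for term, _ in counter.most_common(top_n)]
        for label, counter in counts.items()
    }


# Synthetic stand-in for 2-D reduced embeddings: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
demo_points = np.vstack(
    [c + rng.normal(scale=0.3, size=(20, 2)) for c in centers]
)
best_k, labels, scores = cluster_with_best_silhouette(demo_points, k_max=6)

# Invented example sentences with hand-assigned cluster ids, for labeling only.
demo_texts = [
    "Terrible headache after the first dose",
    "The headache lasted two days",
    "Headache and nausea every morning",
    "Treatment was easy to follow",
    "I stopped the treatment too early",
    "My treatment schedule is confusing",
]
tags = label_clusters(demo_texts, [0, 0, 0, 1, 1, 1], top_n=2)
```

In a full pipeline, `demo_points` would be replaced by the UMAP-reduced Sentence-BERT embeddings of the testimonials, and the labels returned by the clustering step would be passed to `label_clusters` together with the original texts. The dendrogram-based pre-selection of clusters to merge or split and the sentiment analysis step are not covered by this sketch.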

RESULTS: Tested on patients’ testimonials of an average length of 15 words, our method provides more consistent and interpretable topics than state-of-the-art approaches (e.g. latent Dirichlet allocation, non-negative matrix factorization). Compared to previous work, the improved clustering post-processing makes the analysis pipeline much faster to execute and more scalable, without altering performance.


CONCLUSIONS: Our proposed method makes it possible to extract more consistent topics from a large volume of short texts in a more automated and less time-consuming way. It provides stronger insights into patients’ perceptions of a wide range of healthcare topics (side effects, treatment, symptoms...), paving the way for better PRO definitions and patient-centric evaluation, and ultimately for better adherence to treatments.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)




Patient-Centered Research

Topic Subcategory

Adherence, Persistence, & Compliance, Patient Behavior and Incentives, Patient-reported Outcomes & Quality of Life Outcomes, Stated Preference & Patient Satisfaction


No Additional Disease & Conditions/Specialized Treatment Areas
