Utilizing LLMs to Enhance Patient-Reported Outcome Measures: Application to the EQ-5D and Bolt-ons

Author(s)

Jan Heijdra Suasnabar, MSc.
Biomedical Data Science, Leiden University Medical Center, Leiden, Netherlands.

OBJECTIVES: Large language models (LLMs) have shown promising applications in healthcare, yet little is known about their potential to improve the measurement of patient-reported outcomes, which are central to informing decision-making at all levels of the health system. We explored the use of LLMs to develop or extend patient-reported outcome measures (PROMs) based on patient-reported free-text data.
METHODS: The GPT-4o model was used to analyze data from 1,977 members of the Dutch Celiac Association who completed the EQ-5D-5L and narratively described the impact of celiac disease on their lives. Prompts to the LLM were designed to identify possible additional dimensions (i.e., ‘bolt-on’ dimensions) to the EQ-5D-5L, and to produce preliminary bolt-on item wordings for selected dimensions. Evaluation of the approach comprised: comparison of the LLM-identified dimensions with those identified by two alternative approaches (i.e., qualitative analysis and topic modelling); text-entry-level agreement on identified dimensions, measured with Cohen’s kappa; suitability of LLM-generated bolt-on wordings, assessed against existing criteria using Likert scales; and a critical appraisal consisting of face validity assessments and a SWOT analysis.
RESULTS: The LLM identified 12 potential bolt-on dimensions to the EQ-5D-5L, of which 9 were also identified using qualitative analysis, and 5 using topic modelling. Text-entry-level agreement between the LLM and qualitative approaches was ‘substantial’ or ‘almost perfect’, with two exceptions that showed poor/fair agreement (median kappa=0.70, IQR=0.44-0.89). The LLM-generated potential bolt-on wordings for the 4 most common dimensions (i.e., ‘Dietary restrictions’, ‘Fatigue’, ‘Social participation’, and ‘Gastrointestinal symptoms’) scored 4/5, 4.4/5, 4.3/5, and 4.2/5, respectively, when assessed against existing criteria.
CONCLUSIONS: This study demonstrates the potential of LLMs to inform the development or modification of PROMs based on patient-reported text data. The approach’s dependence on prompt quality limits its generalizability and reliability. Further research should assess the approach’s transferability across disease areas and different data sources (e.g., social media, EHRs).
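
ILLUSTRATION: The text-entry-level agreement reported above can be sketched as follows. This is a minimal sketch, not the study's code. It assumes that each approach assigns a binary label per text entry (1 if the dimension was identified in that entry, 0 otherwise), computes Cohen's kappa per dimension, and summarizes the per-dimension kappas by median and quartiles. The function names and the toy labels are hypothetical, and the quartile method (Python's default) may differ from the one used in the study.

```python
import math
from statistics import median, quantiles


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' binary (0/1) labels on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal rate of assigning 1.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both raters used one identical constant label; kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)


def summarize(kappas):
    """Return (median, first quartile, third quartile) of per-dimension kappas."""
    q1, _, q3 = quantiles(kappas, n=4)
    return median(kappas), q1, q3


if __name__ == "__main__":
    # Hypothetical labels for one dimension across ten text entries.
    llm_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    qualitative_labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
    print(round(cohens_kappa(llm_labels, qualitative_labels), 2))
```

In the toy example the two approaches agree on 9 of 10 entries and chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.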

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

P61

Topic

Methodological & Statistical Research, Patient-Centered Research

Topic Subcategory

Instrument Development, Validation, & Translation, Patient-reported Outcomes & Quality of Life Outcomes

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
