Using Artificial Intelligence to Predict Patients' Preferences
Author(s)
Tina Cheng, MPH, Juan M. Gonzalez, PhD, Shelby Reed, RPh, PhD, Semra Ozdemir, PhD;
Duke University, Durham, NC, USA
OBJECTIVES: This study examined the potential of large language models (LLMs), such as OpenAI's GPT-4, to predict patient preferences for medical treatments based on past choice data.
METHODS: The predictive capabilities of GPT-4 were evaluated using synthetic data derived from real patient choices in a discrete choice experiment (DCE) related to cancer care. The synthetic dataset included 50 patients, each answering 48 questions comparing two treatment options that varied by expected survival, long-term survival, health limitations, and out-of-pocket cost. For each patient, the data were split into training and testing sets, and GPT-4 was tasked with predicting the treatment option the patient would choose on the testing-set questions. Various input conditions were tested, including framing GPT-4 as an “assistant” versus a “patient”, randomizing training and testing set questions, and standardizing outputs. The analysis included quantitative measures (prediction accuracy) and qualitative assessments (semantic and contextual appropriateness of responses).
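The abstract does not publish the authors' prompts or code. The Python sketch below illustrates one plausible form the described procedure could take, assuming the OpenAI chat-completions API; the prompt wording, data structures, and function names are hypothetical and not taken from the study.

```python
# Minimal sketch of the METHODS setup: show GPT-4 a patient's past DCE
# choices (training set), then ask it to predict a held-out choice.
# Assumptions (not from the abstract): prompt wording, attribute field
# names, and the two role framings are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def format_question(q):
    """Render one DCE question (two treatment options) as plain text."""
    lines = []
    for label, opt in zip(("Option A", "Option B"), q["options"]):
        lines.append(
            f"{label}: expected survival {opt['expected_survival']}, "
            f"long-term survival {opt['long_term_survival']}, "
            f"health limitations {opt['health_limitations']}, "
            f"out-of-pocket cost {opt['cost']}"
        )
    return "\n".join(lines)


def predict_choice(training_questions, test_question, role="assistant"):
    """Ask GPT-4 which option the patient would choose on a held-out question."""
    history = "\n\n".join(
        f"{format_question(q)}\nPatient chose: {q['choice']}"
        for q in training_questions
    )
    # Two framings tested in the study: GPT-4 as an assistant vs. as the patient.
    if role == "assistant":
        system = "You are an assistant predicting a patient's treatment choices."
    else:
        system = "You are the patient whose past choices are shown below."
    user = (
        f"Past choices:\n{history}\n\nNew question:\n"
        f"{format_question(test_question)}\n"
        "Answer with exactly 'Option A' or 'Option B'."  # standardized output
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```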
RESULTS: When identical training and testing questions were used for each respondent, GPT-4 achieved an average prediction accuracy of 70.5% (Standard Deviation [SD] = 7.8%). After randomizing training and testing questions, accuracy remained stable at 69.9% (SD = 10.9%), although greater variability was observed across respondents. This variability likely reflected differences in the informativeness of DCE questions, with some questions featuring larger attribute differences between alternatives. Prompt engineering revealed that GPT-4 predictions aligned with expected utility theory, regardless of its assigned role (assistant or patient) or response depth (a simple answer or an answer with reasoning).
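For clarity on how the reported figures are summarized: per-respondent accuracy is the fraction of held-out questions predicted correctly, and the averages and SDs above are taken across the 50 respondents. A minimal sketch, with illustrative function names not taken from the study:

```python
import statistics


def respondent_accuracy(predictions, actual_choices):
    """Fraction of held-out questions where the prediction matched the patient's choice."""
    correct = sum(p == a for p, a in zip(predictions, actual_choices))
    return correct / len(actual_choices)


def summarize(per_respondent_accuracies):
    """Mean and sample SD of accuracy across respondents, as reported in RESULTS."""
    mean = statistics.mean(per_respondent_accuracies)
    sd = statistics.stdev(per_respondent_accuracies)
    return mean, sd
```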
CONCLUSIONS: This study demonstrates that LLMs have the potential to predict patient preferences effectively, with accuracy surpassing the roughly 50% accuracy typical of caregiver predictions, which is comparable to random guessing. However, further research is needed to evaluate LLM reliability and the ability to detect preference heterogeneity, and to address ethical considerations. These findings highlight the potential of LLMs to support patient-centered decision-making, particularly for patients who lose decision-making capacity.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR31
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology