DECODING PARTICIPANT VOICES WITH ARTIFICIAL INTELLIGENCE (AI): A PILOT ANALYSIS OF FREE-TEXT PARTICIPANT-REPORTED OUTCOME (PRO) DATA FROM THE PURPOSE 1 STUDY OF LENACAPAVIR FOR HIV PRE-EXPOSURE PROPHYLAXIS (PREP)
Author(s)
SAEID SHAHRAZ, MD, PHD1, Ryan Thaliffdeen, PharmD, MS1, JeanPierre Coaquira Castro, MPH1, Aaditya Rawal, MS2, Veda Donthireddy, BA2, Shubhi Pathak, BS, MS2, Dylan Mezzio, MS, PharmD1.
1Gilead Sciences, Inc., Foster City, CA, USA, 2Costello Medical, Boston, MA, USA.
OBJECTIVES: Free-text responses in PRO questionnaires can elicit deeper insights from complex questions than structured-response formats, but distilling consistent concepts from free text is labor-intensive. AI could facilitate automated analysis of large free-text datasets. A PRO questionnaire in the PURPOSE 1 study (NCT04994509) asked participants about their PrEP administration preferences (daily pills vs twice-yearly injections) and included a free-text question to explain their reasons. This pilot analysis compared manual versus AI-based categorization of free-text reasons for PrEP administration preference among PURPOSE 1 participants who indicated a preference for injections at Week 52.
METHODS: Using behavioral theories, we developed an ontology reflecting human reasoning behind administration preference. Two independent reviewers manually categorized 1724 free-text responses into 17 ontological concepts, with arbitration by a third reviewer. To generate AI-based concepts, the same dataset was provided to a secure version of Microsoft Copilot® (GPT-5), along with the ontological categories and few-shot examples. A random 10% sample, reviewed by two raters with third-reviewer adjudication, was used to validate Copilot’s classifications. Inter-rater agreement was quantified using Cohen’s kappa. Accuracy was defined as the number of responses correctly categorized by Copilot divided by the total number of responses in the validation sample. The most frequent human- and Copilot-assigned concepts were compared to assess categorization alignment.
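The validation metrics described above (percent agreement and Cohen’s kappa between the human benchmark and Copilot’s labels) can be sketched in a few lines of Python. This is an illustrative computation only; the label lists below are hypothetical examples, not study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels (illustrative sketch)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of items where the two raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Hypothetical human-benchmark and model labels (for illustration only)
human = ["efficacy", "convenience", "adherence", "other", "efficacy", "convenience"]
model = ["efficacy", "convenience", "other", "other", "efficacy", "adherence"]

accuracy = sum(h == m for h, m in zip(human, model)) / len(human)
kappa = cohens_kappa(human, model)
```

Accuracy here is simple percent agreement, while kappa discounts agreement expected by chance given each rater’s marginal label frequencies, which is why the two numbers can diverge.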
RESULTS: Copilot classified 32% of the 1724 responses as “other” (vs 3% by human reviewers), 0% as “perceived efficacy” (vs 38%), 21% as “convenience/logistical effort” (vs 29%), and 14% as “adherence feasibility” (vs 21%). In the validation sample, Copilot classified 67% (117/174) of responses accurately, with a Cohen’s kappa of 0.63, indicating substantial agreement.
CONCLUSIONS: Despite mismatches between the human benchmark and Copilot’s categorization of the most common concepts, this pilot analysis generated valuable insights and suggests that AI offers a viable approach to analyzing free-text data. Large language models with retrieval-augmented generation could substantially improve the efficiency and reliability of PRO analyses and unlock further opportunities to leverage free-text data.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR4
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, PRO & Related Methods, Survey Methods
Disease
SDC: Infectious Disease (non-vaccine)