DECODING PARTICIPANT VOICES WITH ARTIFICIAL INTELLIGENCE (AI): A PILOT ANALYSIS OF FREE-TEXT PARTICIPANT-REPORTED OUTCOME (PRO) DATA FROM THE PURPOSE 1 STUDY OF LENACAPAVIR FOR HIV PRE-EXPOSURE PROPHYLAXIS (PREP)
Author(s)
SAEID SHAHRAZ, MD, PHD1, Ryan Thaliffdeen, PharmD, MS1, JeanPierre Coaquira Castro, MPH1, Aaditya Rawal, MS2, Veda Donthireddy, BA2, Shubhi Pathak, BS, MS2, Dylan Mezzio, MS, PharmD1.
1Gilead Sciences, Inc., Foster City, CA, USA, 2Costello Medical, Boston, MA, USA.
OBJECTIVES: Free-text responses in PRO questionnaires can elicit deeper insights from complex questions than structured-response formats, but distilling consistent concepts from free text is labor-intensive. AI could facilitate automated analysis of large free-text datasets. A PRO questionnaire in the PURPOSE 1 study (NCT04994509) asked participants about their PrEP administration preferences (daily pills vs twice-yearly injections) and included a free-text question to explain their reasons. This pilot analysis compared manual versus AI-based categorization of free-text reasons for PrEP administration preference among PURPOSE 1 participants who indicated a preference for injections at Week 52.
METHODS: Using behavioral theories, we developed an ontology reflecting human reasoning behind administration preference. Two independent reviewers manually categorized 1724 free-text responses into 17 ontological concepts, with arbitration by a third reviewer. To generate AI-based concepts, the same dataset was provided to a secure version of Microsoft Copilot® (GPT-5), along with the ontological categories and few-shot examples. A random 10% sample, reviewed by two raters with third-reviewer adjudication, was used to validate Copilot’s classifications. Inter-rater agreement was quantified using Cohen’s kappa. Accuracy was defined as the number of responses correctly categorized by Copilot divided by the total number of responses in the validation sample. The most frequent human- and Copilot-assigned concepts were compared to assess categorization alignment.
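The validation metrics described above (percent agreement and Cohen’s kappa between the human benchmark and Copilot’s labels) can be sketched in a few lines of Python. This is an illustrative computation only; the label lists below are hypothetical examples, not study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels (illustrative sketch)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of items where the two raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Hypothetical human-benchmark and model labels (for illustration only)
human = ["efficacy", "convenience", "adherence", "other", "efficacy", "convenience"]
model = ["efficacy", "convenience", "other", "other", "efficacy", "adherence"]

accuracy = sum(h == m for h, m in zip(human, model)) / len(human)
kappa = cohens_kappa(human, model)
```

Accuracy here is simple percent agreement, while kappa discounts agreement expected by chance given each rater’s marginal label frequencies, which is why the two numbers can diverge.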
RESULTS: Copilot classified 32% of the 1724 responses as “other” (vs 3% by human reviewers), 0% as “perceived efficacy” (vs 38%), 21% as “convenience/logistical effort” (vs 29%), and 14% as “adherence feasibility” (vs 21%). In the validation sample, Copilot classified 67% (117/174) of responses accurately, with a Cohen’s kappa of 0.63, indicating substantial agreement.
CONCLUSIONS: Despite mismatches between the human benchmark and Copilot’s categorization of the most common concepts, this pilot analysis generated valuable insights and suggests that AI offers a viable approach to analyzing free-text data. Large language models with retrieval-augmented generation could substantially improve the efficiency and reliability of PRO analyses and unlock further opportunities to leverage free-text data.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR4
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, PRO & Related Methods, Survey Methods
Disease
SDC: Infectious Disease (non-vaccine)