Automated Extraction of Cost-Effectiveness Model Data from Health Technology Assessment Submissions Using Large Language Models (LLMs): Does the Prompting Approach Matter?
Speaker(s)
Szabó G1, Pinsent A2, Slim M3, Sullivan S4, Benedict Á5, Rivolo S6
1Evidera, a part of Thermo Fisher Scientific, Budapest, Hungary, 2Evidera, a part of Thermo Fisher Scientific, London, UK, 3Evidera, a part of Thermo Fisher Scientific, Montreal, QC, Canada, 4Evidera, a part of Thermo Fisher Scientific, Paris, France, 5Evidera, a part of Thermo Fisher Scientific, Vienna, Austria, 6Evidera, a part of Thermo Fisher Scientific, San Felice Segrate, MI, Italy
OBJECTIVES: Large language models (LLMs) can automatically extract cost-effectiveness model (CEM) data from prior technology assessments (TAs). However, the quality of LLM outputs depends strongly on the prompting strategy, and limited guidance is currently available on prompting for health economics and outcomes research (HEOR). This study evaluated the performance of alternative prompting strategies for automated CEM data extraction from TAs using the GPT-4 LLM.
METHODS: Nine TAs across three health technology assessment agencies (NICE-UK, CADTH-Canada, ICER-US) were reviewed. For each TA, automated GPT-4-based data extraction was performed for 10 CEM domains, encompassing simple (model structure, time horizon, cycle length, comparator list, health states) and advanced (key outcome modelling approach, cost categories, utility approach, committee critiques, modelling assumptions) domains. Four alternative prompting strategies were examined: 1) a simple prompt; 2) a chain of three prompts of increasing complexity; 3) a complex chain-of-thought prompt; and 4) a complex chain-of-thought prompt combined with a domain-specific example. The LLM extractions were then compared against human-validated extractions and scored on a four-point (simple domains) or five-point (advanced domains) Likert scale.
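As an illustration of what a chained-prompt extraction could look like in practice, the minimal Python sketch below chains three increasingly specific prompts for a single simple domain (time horizon) using the OpenAI chat completions client. The prompt wording, model identifier, system message, and input file are assumptions made for illustration only; the abstract does not report the study's actual prompts, tooling, or document pre-processing.

```python
# Illustrative sketch only: prompt text, model name, and file handling are
# assumptions, not the prompts or pipeline used in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical chain of three prompts of increasing complexity for one CEM
# domain (time horizon); the study's actual wording is not given in the abstract.
CHAINED_PROMPTS = [
    "Identify the part of the appraisal text that describes the economic model.",
    "From that part, state the time horizon used in the base-case analysis.",
    "Report the time horizon as a single value with units (e.g. '30 years'), "
    "or 'not reported' if it cannot be found.",
]


def extract_domain(ta_text: str, prompts: list[str], model: str = "gpt-4") -> str:
    """Run a chain of prompts against one TA document, feeding each answer forward."""
    messages = [
        {"role": "system", "content": "You extract cost-effectiveness model data "
                                      "from health technology assessment documents."},
        {"role": "user", "content": ta_text},
    ]
    answer = ""
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
    return answer  # the final, most specific answer in the chain


if __name__ == "__main__":
    # Hypothetical pre-extracted plain-text version of a TA document.
    ta_text = open("nice_ta_example.txt").read()
    print(extract_domain(ta_text, CHAINED_PROMPTS))
```

In a chained design of this kind, each intermediate answer is appended to the conversation so that later, more demanding prompts can build on earlier, simpler ones; this is one plausible reading of "a chain of three prompts of increasing complexity" described above.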
RESULTS: The chain-of-prompts strategy outperformed the other prompting strategies, correctly extracting the simpler domains most of the time (model structure: 100%; time horizon: 78%; cycle length and health states: 67% each). The list of comparators was not adequately extracted by any prompting strategy (<45% correct extractions). The advanced domains were more challenging, with the second and fourth prompting strategies outperforming the others, achieving 44%-78% correct or partially correct extractions across domains. Committee critiques and modelling assumptions were the most challenging domains to extract correctly (33%-56%).
CONCLUSIONS: The chain of prompts was the most promising prompting strategy for LLM-assisted CEM data extraction from TAs. However, all prompting strategies showed suboptimal performance in extracting the advanced domains. Further research is needed, alongside the continuing evolution of LLMs, to optimize HEOR-specific prompting strategies and inform best practices for LLM-assisted data extraction.
Code
MSR179
Topic
Economic Evaluation, Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas