ACCELERATING DYNAMIC HTA LANDSCAPING IN ONCOLOGY THROUGH AUTONOMOUS GENERATIVE AI-DRIVEN MULTILINGUAL DATA EXTRACTION
Author(s)
Manuel Cossio, MPhil, MS¹, Lilia Leisle, PhD²
¹Cytel, Director, Artificial Intelligence Lead, Dubendorf, Switzerland; ²Cytel, Berlin, Germany
OBJECTIVES: To develop and evaluate autonomous large language model (LLM)-based agents for structured information extraction from multilingual Health Technology Assessment (HTA) reports to support EU Joint Clinical Assessment (JCA) Population-Intervention-Comparator-Outcome (PICO) simulation, including standard PICO elements and context-specific (CS) HTA evidence.
METHODS: Two sequential LLM-based agents were developed to perform information extraction using 21 expert-generated questions. The extraction covered standard PICO components, including the assessed population, accepted comparators, and outcomes, as well as context-specific elements such as methodological requirements, reasons for non-acceptance of outcomes or comparators, and other critique points reported in HTA documents. Agent 1 used a general prompt, while Agent 2 incorporated additional clarification instructions within selected questions to improve contextual understanding. Performance was evaluated using a custom scoring framework assigning 1 point each for accuracy and completeness. Any response containing hallucinated content received a total score of 0 regardless of accuracy or completeness. The agents were evaluated on publicly available osimertinib HTA reports from Spain (4,678 words), the Netherlands (2,512 words), and France (9,876 words).
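The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the `Judgement` class, its field names, and the `score` and `label` functions are hypothetical. The mapping of 2 points to "fully correct" and 1 point to "partially correct" is an assumption, since the abstract does not define those categories in terms of points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """Expert review of one extracted answer (hypothetical field names)."""
    accurate: bool
    complete: bool
    hallucinated: bool


def score(j: Judgement) -> int:
    """Return 0-2 points: 1 for accuracy, 1 for completeness.

    Any hallucinated content zeroes the score, regardless of the
    accuracy and completeness flags.
    """
    if j.hallucinated:
        return 0
    return int(j.accurate) + int(j.complete)


def label(j: Judgement) -> str:
    """Map a judgement to a category (assumed mapping, not from the source)."""
    if j.hallucinated:
        return "hallucinated"
    return {2: "fully correct", 1: "partially correct", 0: "incorrect"}[score(j)]


if __name__ == "__main__":
    examples = [
        Judgement(accurate=True, complete=True, hallucinated=False),
        Judgement(accurate=True, complete=False, hallucinated=False),
        Judgement(accurate=True, complete=True, hallucinated=True),
    ]
    for j in examples:
        print(score(j), label(j))
```

Treating the hallucination flag as an override, rather than as a third point to subtract, mirrors the stated rule that a hallucinated response scores 0 even when it is otherwise accurate and complete.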
RESULTS: Both agents completed the full extraction set across all documents, with approximately 90% of questions answered without hallucinations. Agent 2 outperformed Agent 1, achieving a higher mean number of fully correct responses (16.6 vs. 13.3). The French HTA report showed the highest performance for both agents. Agent 1 generated more partially correct answers (mean 6.6 vs. 5) and was the only agent to produce hallucinated content, observed in the Spanish report.
CONCLUSIONS: Expert-guided prompt refinement substantially improved autonomous extraction of both standard and CS HTA information from multilingual reports. LLM-based agents show promise for scalable HTA data extraction to support EU JCA PICO simulations. However, further methodological refinement and integration of HTA experts as humans-in-the-loop remain essential to reduce hallucinations, verify completeness and accuracy, and ensure reliability in regulatory and HTA applications.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
HTA27
Topic
Health Technology Assessment
Topic Subcategory
Decision & Deliberative Processes
Disease
SDC: Oncology