Open-Source LLMs Performance on Information Retrieval Tasks for Health Outcomes Research
Author(s)
Achilleas Livieratos, PhD1, Junjing Lin, PhD2, Di Zhang, PhD3, All-shine Chen, PhD4, Maria Kudela, PhD4, Yuxi Zhao, PhD4, Cynthia Basu, PhD4, Sai Hurrish Dharmarajan, PhD5, Margaret Gamalo, PhD4.
1SPAIML Scientific Working Group, New York, NY, USA, 2SPAIML Scientific Working Group/Takeda Pharmaceuticals, New York, NY, USA, 3SPAIML Scientific Working Group/Teva Pharmaceuticals, New York, NY, USA, 4SPAIML Scientific Working Group/Pfizer, New York, NY, USA, 5SPAIML Scientific Working Group/Sarepta Therapeutics, New York, NY, USA.
OBJECTIVES: We evaluated the effectiveness of open-source large language models (LLMs) in information retrieval tasks for Health Economics and Outcomes Research (HEOR). Specifically, our objectives were to analyze the performance of four open-source LLMs (Qwen2-72B, Llama-3.1-8B, Mistral-7B, and Phi-3-Mini-4K) on data extraction tasks from clinical abstracts and full manuscripts, using a proprietary model, OpenAI's o1, as an evaluator.
METHODS: We conducted zero-shot assessments across three prompt scenarios—open-ended prompts for abstracts, open-ended prompts for full manuscripts, and narrow, task-specific prompts for full manuscripts—applied to immunology-focused publications sourced from PubMed. We adopted a simplified Fine-grained Language Model Evaluation based on Alignment Skill Sets (FLASK) framework, assessing models on six critical metrics: Accuracy, Robustness, Creativity, Insights, Quantitative Information, and Logical Reasoning. Pairwise output comparisons and win-rate calculations provided insight into model performance across the prompt scenarios.
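The abstract describes the pairwise comparisons and win-rate calculation without implementation details. As a minimal, illustrative sketch (not the authors' code), the following Python snippet shows how per-model win rates could be computed from pairwise verdicts returned by a judge model such as OpenAI's o1; the judgment records, function names, and toy data below are assumptions for illustration only.

```python
from collections import defaultdict

# Models and FLASK metrics named in the abstract.
MODELS = ["Qwen2-72B", "Llama-3.1-8B", "Mistral-7B", "Phi-3-Mini-4K"]
METRICS = ["Accuracy", "Robustness", "Creativity",
           "Insights", "Quantitative Information", "Logical Reasoning"]

def win_rates(judgments):
    """Compute per-model win rates from pairwise judge verdicts.

    judgments: iterable of (model_a, model_b, winner) tuples, where
    winner is model_a, model_b, or None for a tie. In the study these
    verdicts would come from prompting the evaluator model; here they
    are supplied directly as placeholders.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for a, b, winner in judgments:
        comparisons[a] += 1
        comparisons[b] += 1
        if winner is not None:
            wins[winner] += 1
    return {m: wins[m] / comparisons[m] for m in comparisons}

# Toy example (hypothetical data): Qwen2-72B preferred in every pairing.
toy = [("Qwen2-72B", other, "Qwen2-72B") for other in MODELS[1:]]
print(win_rates(toy))  # {'Qwen2-72B': 1.0, 'Llama-3.1-8B': 0.0, ...}
```

In practice each (document, metric) pair would contribute one judgment per model pairing, so a model's win rate aggregates over documents, the six FLASK metrics, and opponents.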
RESULTS: While no significant differences were observed across models, certain trends emerged. Notably, Qwen2-72B performed best, particularly on open-ended tasks, achieving win rates above 50% against the other models. These findings suggest that open-source LLMs, particularly with refined prompt engineering, are viable alternatives to proprietary models for HEOR-specific applications. The study supports the adoption of open-source models in pharmaceutical research, highlighting their flexibility, cost-effectiveness, and adaptability to regulatory requirements.
CONCLUSIONS: Open-source LLMs are an underutilized yet promising tool for HEOR, offering substantial benefits in scalability and customization for data-driven health outcomes research. Further investigation into varied document types and prompt configurations may deepen understanding of their utility in HEOR and medical affairs.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, Canada
Value in Health, Volume 28, Issue S1
Code
SA1
Topic
Study Approaches
Disease
SDC: Systemic Disorders/Conditions (Anesthesia, Auto-Immune Disorders (n.e.c.), Hematological Disorders (non-oncologic), Pain)