Which Generative AI Method Is Used for High Specificity? A Methodological Comparison From the Systematic Literature Review of the Burden of Influenza in France

Author(s)

Ludovic Lamarsalle, MSc, PharmD¹, Magali Lemaitre, PhD².
¹HEALSTRA, Lyon, France, ²Health Data Expertise, Génissieux, France.

OBJECTIVES: To evaluate the efficiency and accuracy of two artificial intelligence (AI) methodologies for conducting systematic literature reviews (SLR) on influenza burden among elderly populations in France, comparing performance metrics, resource utilization, and consistency of findings.
METHODS: A dataset of 2,060 research abstracts published between 2010 and 2025 was analyzed using two distinct AI methodologies. Method 1 employed a batch processing approach in which two AI models (GPT-4o and Mistral-large) analyzed abstracts in groups of 25, followed by human arbitration of discrepancies. Method 2 used a direct comparison workflow in which Claude-3.7-Sonnet and GPT-4o independently analyzed each individual abstract according to predefined criteria. When both models agreed on selection or rejection, the decision was accepted; when they disagreed, Mistral-large provided arbitration. For both methods, the inclusion criteria were identical: elderly population (aged 60+), sample size over 10,000, conducted in France, and addressing at least one component of influenza burden (epidemiological, clinical, economic, or humanistic). Performance metrics included processing time, selection accuracy, and resource requirements. Human validation served as the reference standard.
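The decision rule described for Method 2 (accept when the two models agree, otherwise defer to a third model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Screener` type, the function `consensus_decision`, and the keyword-based stand-in screeners are all hypothetical, and no real model API is called.

```python
from typing import Callable, Tuple

# Hypothetical interface: a screener takes an abstract's text and
# returns True (include) or False (exclude).
Screener = Callable[[str], bool]


def consensus_decision(
    abstract: str,
    model_a: Screener,
    model_b: Screener,
    arbiter: Screener,
) -> Tuple[bool, bool]:
    """Return (decision, arbitrated) for one abstract.

    If the two primary screeners agree, their shared decision is accepted.
    If they disagree, the arbiter's decision is used instead.
    """
    a = model_a(abstract)
    b = model_b(abstract)
    if a == b:
        return a, False
    return arbiter(abstract), True


# Stand-in screeners (assumptions for illustration only; a real workflow
# would prompt a language model with the inclusion criteria).
def stub_a(text: str) -> bool:
    return "france" in text.lower()


def stub_b(text: str) -> bool:
    return "influenza" in text.lower()


def stub_arbiter(text: str) -> bool:
    return "elderly" in text.lower()


if __name__ == "__main__":
    examples = [
        "Influenza hospitalizations among the elderly in France",
        "Influenza costs among the elderly in Spain",
        "Dengue surveillance in France",
    ]
    for text in examples:
        decision, arbitrated = consensus_decision(
            text, stub_a, stub_b, stub_arbiter
        )
        print(f"{text!r}: include={decision}, arbitrated={arbitrated}")
```

Returning the `arbitrated` flag alongside the decision makes it possible to count how often the third model was needed, which is one way resource use could be tracked in such a workflow.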
RESULTS: Methods 1 and 2 yielded 30 and 13 relevant abstracts, respectively. Method 2 demonstrated 56% fewer false positives than Method 1 (2 vs. 36) while maintaining comparable selection sensitivity. Processing time was approximately 1 hour with Method 1 and 2 hours with Method 2. The selected abstracts indicated that in France, elderly populations bear a disproportionate influenza burden, accounting for 80% of hospital deaths, 70% of excess hospitalizations, and 77% of associated costs. Influenza causes 25,000-55,000 hospitalizations annually in people over 65, with a 20% rehospitalization rate within 3 months and overall costs of 155-350 million euros per season.
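Counts such as false positives are defined relative to the human-validated reference standard. The sketch below shows one way such counts and the derived precision, sensitivity, and specificity could be computed from sets of abstract identifiers. The function name `screening_metrics` and the identifiers used are hypothetical and are not the study's data.

```python
from typing import Dict, Optional, Set


def screening_metrics(
    selected: Set[int],
    reference: Set[int],
    screened: Set[int],
) -> Dict[str, Optional[float]]:
    """Compare an AI selection against a human reference standard.

    selected  : abstract IDs the AI method chose to include
    reference : abstract IDs the human reviewers judged relevant
    screened  : all abstract IDs that were screened
    """
    tp = len(selected & reference)             # correctly included
    fp = len(selected - reference)             # included but not relevant
    fn = len(reference - selected)             # relevant but missed
    tn = len(screened - selected - reference)  # correctly excluded

    def ratio(num: int, den: int) -> Optional[float]:
        # Return None rather than dividing by zero.
        return num / den if den else None

    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "true_negatives": tn,
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    metrics = screening_metrics(
        selected={1, 2, 4, 5},
        reference={1, 2, 3},
        screened=set(range(1, 11)),
    )
    for name, value in metrics.items():
        print(name, value)
```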
CONCLUSIONS: AI-augmented systematic literature reviews demonstrate significant efficiency gains while maintaining acceptable accuracy. The model consensus approach (Method 2) showed superior precision over batch processing (Method 1), suggesting its preferential use for SLRs requiring high specificity.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR224

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Geriatrics, Infectious Disease (non-vaccine)

Presentation (CTI)