Which Generative AI Method Is Used for High Specificity? A Methodological Comparison From the Systematic Literature Review of the Burden of Influenza in France
Author(s)
Ludovic Lamarsalle, MSc, PharmD1, Magali Lemaitre, PhD2.
1HEALSTRA, Lyon, France, 2Health Data Expertise, Génissieux, France.
OBJECTIVES: To evaluate the efficiency and accuracy of two artificial intelligence (AI) methodologies for conducting systematic literature reviews (SLR) on influenza burden among elderly populations in France, comparing performance metrics, resource utilization, and consistency of findings.
METHODS: A dataset of 2,060 research abstracts published between 2010 and 2025 was analyzed using two distinct AI methodologies. Method 1 employed a batch processing approach in which two AI models (GPT-4o and Mistral-large) analyzed abstracts in groups of 25, followed by human arbitration of discrepancies. Method 2 used a direct comparison workflow in which Claude-3.7-Sonnet and GPT-4o independently analyzed each individual abstract according to predefined criteria. When both models agreed on selection or rejection, the decision was accepted; when they disagreed, Mistral-large provided arbitration. Inclusion criteria were the same for both methods: elderly population (aged 60+), sample size over 10,000, conducted in France, and addressing at least one component of influenza burden (epidemiological, clinical, economic, or humanistic). Performance metrics included processing time, selection accuracy, and resource requirements. Human validation served as the reference standard.
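The decision rule of Method 2 (accept when the two primary models agree, otherwise defer to an arbiter) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `consensus_screen`, the classifier callables, and the keyword-based stand-ins are all hypothetical, and real use would replace the stand-ins with calls to the respective models.

```python
from typing import Callable, Tuple

# A screener takes an abstract and returns True (include) or False (exclude).
# In the study these roles were played by language models; here they are
# hypothetical callables supplied by the caller.
Screener = Callable[[str], bool]


def consensus_screen(
    abstract: str,
    primary_a: Screener,
    primary_b: Screener,
    arbiter: Screener,
) -> Tuple[bool, bool]:
    """Return (decision, arbitrated) for one abstract.

    If both primary screeners agree, their shared decision is accepted.
    Otherwise the arbiter decides, and `arbitrated` is True.
    """
    decision_a = primary_a(abstract)
    decision_b = primary_b(abstract)
    if decision_a == decision_b:
        return decision_a, False
    # The arbiter is consulted only when the primary screeners disagree.
    return arbiter(abstract), True


if __name__ == "__main__":
    # Toy keyword-based stand-ins (assumptions for illustration only).
    mentions_france = lambda text: "france" in text.lower()
    mentions_elderly = lambda text: "elderly" in text.lower()
    mentions_influenza = lambda text: "influenza" in text.lower()

    sample = "Influenza hospitalizations among elderly patients in Spain"
    print(consensus_screen(sample, mentions_france, mentions_elderly, mentions_influenza))
```

Returning the `arbitrated` flag alongside the decision makes it straightforward to count how often the third model had to be consulted, which is one way the resource requirements of such a workflow could be tracked.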
RESULTS: Methods 1 and 2 yielded 30 and 13 relevant abstracts, respectively. Method 2 demonstrated 56% fewer false positives than Method 1 (2 vs. 36), while maintaining comparable selection sensitivity. Processing time was approximately 1 hour with Method 1 and 2 hours with Method 2. The selected abstracts showed that in France, elderly populations bear a disproportionate influenza burden, accounting for 80% of hospital deaths, 70% of excess hospitalizations, and 77% of associated costs. Influenza causes 25,000-55,000 annual hospitalizations in people over 65, with a 20% re-hospitalization rate within 3 months and overall costs of 155-350 million euros per season.
CONCLUSIONS: AI-augmented systematic literature reviews demonstrate significant efficiency gains while maintaining acceptable accuracy. The model consensus approach (Method 2) showed superior precision over batch processing (Method 1), suggesting its preferential use for SLRs requiring high specificity.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR224
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Geriatrics, Infectious Disease (non-vaccine)