AI Time and Motion - Analysis of the Accuracy and Efficiency of AI for HTA-Standard SLRs
Author(s)
Fadel Shoughari, MSc1, Amanda Hansson Hedblom, MS1, Karl Freemyer, MBA2, Caroline Barwood, MS3, Kevin Kallmes, BS, MA, JD4, Devendra Patil, MS4.
1FIECON, a Herspiegel Company, London, EC1R 3AW, United Kingdom, 2FIECON, a Herspiegel Company, Bloomfield, NJ, USA, 3FIECON, a Herspiegel Company, St. Albans, United Kingdom, 4Nested Knowledge, St. Paul, MN, USA.
OBJECTIVES: Health technology assessors (e.g. NICE, ICER) and regulatory authorities (e.g. FDA) have signalled their intent to synthesize and evaluate new product evidence using artificial intelligence (AI) and large language models (LLMs). Industry has responded by adopting AI and LLM technology, substantially accelerating systematic literature reviews (SLRs). To ensure proper stewardship, industry, HTA bodies, and regulators must verify the efficiency and accuracy of these tools relative to manual processes. This research aims to: 1) compare the efficiency of manual SLRs versus AI-assisted methods; 2) analyze the accuracy of Hybrid-AI and fully-AI screening; and 3) analyze the accuracy of AI extractions.
METHODS: A matched research question and literature search string were used to conduct three screening workflows: 1) a manual dual-screening SLR using Excel; 2) a Hybrid-AI SLR using a supervised AI screener; and 3) a fully-AI SLR using an unsupervised AI screener. For extraction, fully-AI extraction was compared with manual Excel-based extraction. The time taken for each task was compared across the different literature reviews. Recall, Precision, and Accuracy were compared between the human reviewer-level, Hybrid-AI, and fully-AI approaches for screening, with the adjudicator's decisions considered the gold standard; for extraction, manual Excel extraction was the gold standard.
RESULTS: The literature search identified 234 publications. Hybrid-AI screening had 97.0% Accuracy, 96.2% Precision, and 80.6% Recall; fully-AI screening had 93.2% Accuracy, 100% Precision, and 48.4% Recall; for comparison, human reviewer-level screening had 90.1% Accuracy, 61.5% Precision, and 88.7% Recall. Compared with fully manual methods, Hybrid-AI screening provided 51.4% time savings and fully-AI screening 81.1%. Fully-AI extraction achieved 95.7% Accuracy and 93.1% time savings compared with the manual SLR. AI extractions were consistent with human extractions, but the AI extractions notably captured qualitative text.
CONCLUSIONS: This research demonstrates that using AI and LLMs to conduct SLRs produces substantial time savings while maintaining a high degree of accuracy. While quantitative extraction requires further exploration, Hybrid-AI approaches produced the highest screening accuracy.
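ILLUSTRATIVE SKETCH: The screening metrics and time savings reported above follow standard definitions, which can be sketched in Python as shown here. This is a minimal illustration and not the authors' analysis code. The abstract does not report the underlying confusion matrices or task durations, so the class name, function names, counts, and hours below are all hypothetical assumptions chosen only to show the arithmetic. An "include" decision that matches the adjudicator counts as a true positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Screening decisions compared with the adjudicator (gold standard)."""
    tp: int  # included by screener, included by adjudicator
    fp: int  # included by screener, excluded by adjudicator
    fn: int  # excluded by screener, included by adjudicator
    tn: int  # excluded by screener, excluded by adjudicator


def recall(c: ScreeningCounts) -> float:
    # Share of truly relevant records that the screener included.
    return c.tp / (c.tp + c.fn)


def precision(c: ScreeningCounts) -> float:
    # Share of the screener's inclusions that were truly relevant.
    return c.tp / (c.tp + c.fp)


def accuracy(c: ScreeningCounts) -> float:
    # Share of all decisions that agree with the adjudicator.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def time_savings(manual_hours: float, assisted_hours: float) -> float:
    # Fraction of the manual time saved by the AI-assisted workflow.
    return 1.0 - assisted_hours / manual_hours


if __name__ == "__main__":
    # Hypothetical counts and durations, NOT data from the study.
    example = ScreeningCounts(tp=40, fp=10, fn=5, tn=145)
    print(f"Recall:    {recall(example):.1%}")
    print(f"Precision: {precision(example):.1%}")
    print(f"Accuracy:  {accuracy(example):.1%}")
    print(f"Time savings: {time_savings(10.0, 4.0):.1%}")
```

When relevant records are a small share of the search results, Accuracy is dominated by correct exclusions. A screener can therefore combine high Accuracy with low Recall, which is why the three metrics are reported together.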
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR75
Topic
Health Policy & Regulatory, Health Technology Assessment, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas