Can AI-Assisted Data Extraction From HTA Reports Improve Comparative HTA Research: A Case Study on NICE Assessment Reports
Author(s)
Jan-Willem Versteeg, MSc, PharmD1, Marie De Bruin, PhD1, Maarten Schermer, Drs.1, Shiva Nadi Najafabadi, MSc1, Modhurita Mitra, PhD1, Christine Leopold, PhD1, Aukje Mantel-Teeuwisse, PhD1, Wim Goettsch, MSc, PhD2, Lourens Bloem, PhD1.
1Utrecht University, Utrecht, Netherlands, 2Zorginstituut Nederland, Diemen, Netherlands.
OBJECTIVES: Data used in comparative health technology assessment (HTA) research are often extracted manually from HTA reports, which limits the scope, reproducibility, updateability, and credibility of this research. This study examines automated methods for extracting research-relevant attributes from publicly available HTA reports and compares the performance of several text-mining techniques, aiming to demonstrate the relevance and opportunities of these extraction methods.
METHODS: Fourteen research-relevant attributes were extracted from National Institute for Health and Care Excellence (NICE) HTA reports using two natural language processing (NLP) techniques, a rule-based approach (NLP-R) and classification models (NLP-CM), and one generative AI technique, a large language model (LLM; Claude 3 Opus). Accuracy and other method-specific performance measures were calculated and compared across the extraction methods. Additionally, the data extracted with the LLM-based approach were analyzed for policy insights.
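As an illustration of the accuracy comparison described in the methods, the sketch below computes per-attribute accuracy of extracted values against a manually curated reference set. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how a match was determined, so exact matching after simple normalization is assumed here, and the function names, attribute names, and example values are all hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and trim a value so trivial formatting differences do not count as errors.

    Assumption: the abstract does not describe the matching rule; this is illustrative only.
    """
    if value is None:
        return ""
    return str(value).strip().lower()


def per_attribute_accuracy(reference, extracted):
    """Return {attribute: share of reports where the extracted value matches the reference}.

    `reference` and `extracted` are equal-length lists of dicts (one dict per report),
    each mapping a hypothetical attribute name to its value.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ref_row, ext_row in zip(reference, extracted):
        for attribute, ref_value in ref_row.items():
            total[attribute] += 1
            if normalize(ext_row.get(attribute)) == normalize(ref_value):
                correct[attribute] += 1
    return {attribute: correct[attribute] / total[attribute] for attribute in total}


# Hypothetical toy data: two reports, two attributes.
reference = [
    {"comparator": "Placebo", "rea_outcome": "added benefit"},
    {"comparator": "Drug B", "rea_outcome": "no added benefit"},
]
extracted = [
    {"comparator": " placebo ", "rea_outcome": "added benefit"},
    {"comparator": "Drug C", "rea_outcome": "no added benefit"},
]

accuracy = per_attribute_accuracy(reference, extracted)
print(accuracy)
```

Running the same function once per extraction method (NLP-R, NLP-CM, LLM) on a shared reference set would give directly comparable per-attribute figures; free-text attributes such as the comparator would likely need a more lenient matching rule than the one assumed here.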
RESULTS: Extraction accuracies depended on the extraction method and attribute. Overall, the LLM-based approach performed best (88-98% accuracy for 12/14 attributes). Extraction of the outcome of the relative effectiveness assessment (REA) and the comparator was most challenging and had the lowest accuracies (~70% for the LLM-based approach). NLP-based methods required more development work and were unable to extract attributes at the medicine-indication combination level; however, they were independent of commercial software and free from reproducibility issues, which were the most significant limitations of the LLM-based approach. Graphs created using the LLM-extracted data give important policy insights that are updateable and reproducible and would have been difficult to obtain with manual data extraction.
CONCLUSIONS: Automatic data extraction for research-relevant attributes from HTA reports is possible and can provide important insights for comparative HTA research. Room for improvement remains, and future research should focus on expanding the system to different HTA organizations and refining the LLM-based approach.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
P2
Topic
Health Technology Assessment, Methodological & Statistical Research, Study Approaches
Disease
No Additional Disease & Conditions/Specialized Treatment Areas