Optimizing Systematic Literature Reviews in Endometrial Cancer: Leveraging AI for Real-Time Article Screening and Data Extraction in Clinical Trials


Datta S1, Lee K2, Paek H1, Mojarad MR1, Prabhu V3, Zhang J3, Foley E1, Glasgow J1, Liston C1, Zheng Y3, Huang YL3, Du J1, Wang X1, Cossrow N3, Cai J4, Wang D3
1IMO health, Rosemont, IL, USA, 2IMO health, Ardsley, NY, USA, 3Merck & Co., Inc., Rahway, NJ, USA, 4Merck & Co., Inc., West Point, PA, USA

OBJECTIVES: Traditional systematic literature review (SLR) methods produce high-quality results but are time-consuming and resource-intensive, a problem aggravated by the growing volume of trials and publications. We aimed to improve this process by developing an efficient artificial intelligence (AI) system, based on Generative Pre-trained Transformer 4 (GPT-4), for real-time article screening and data extraction.

METHODS: We collected endometrial cancer clinical trial studies from PubMed using specific keywords to construct our GPT system. Our GPT-4-based automatic literature review system comprises three components: abstract screening, full-text screening, and data element extraction from abstracts. Both screening components generate an eligibility decision based on the specific inclusion and exclusion criteria outlined in the study protocols. Data element extraction identifies important study details (e.g., population size) and clinical trial outcomes (e.g., overall response rate). Quantitative and qualitative evaluations were conducted to assess system performance on 50 randomly selected endometrial cancer clinical trial articles labeled by experts.
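The three components described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the names `Article`, `ReviewResult`, `screen`, `extract`, and `review` are hypothetical, the prompt wording is invented, and the model is passed in as a plain callable so that no particular GPT-4 client API is implied. The sketch also assumes that extraction runs only on articles that pass screening, which the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]


@dataclass
class Article:
    pmid: str
    abstract: str
    full_text: Optional[str] = None


@dataclass
class ReviewResult:
    pmid: str
    abstract_eligible: bool
    full_text_eligible: Optional[bool] = None  # None if no full text was screened
    elements: Dict[str, str] = field(default_factory=dict)


def screen(llm: LLM, text: str, criteria: List[str]) -> bool:
    """Ask the model for an eligibility decision against protocol criteria."""
    prompt = (
        "Decide whether the study meets the criteria. "
        "Answer INCLUDE or EXCLUDE.\nCriteria:\n"
        + "\n".join(f"- {c}" for c in criteria)
        + "\n\nText:\n" + text
    )
    return llm(prompt).strip().upper().startswith("INCLUDE")


def extract(llm: LLM, abstract: str, element_names: List[str]) -> Dict[str, str]:
    """Ask the model for each data element; skip elements reported as absent."""
    found: Dict[str, str] = {}
    for name in element_names:
        reply = llm(
            f"Extract '{name}' from the abstract. Reply NA if absent.\n\n{abstract}"
        ).strip()
        if reply and reply.upper() != "NA":
            found[name] = reply
    return found


def review(llm: LLM, article: Article, criteria: List[str],
           element_names: List[str]) -> ReviewResult:
    """Abstract screening, then full-text screening, then extraction."""
    result = ReviewResult(article.pmid, screen(llm, article.abstract, criteria))
    if not result.abstract_eligible:
        return result
    if article.full_text is not None:
        result.full_text_eligible = screen(llm, article.full_text, criteria)
        if not result.full_text_eligible:
            return result
    # Assumption: extraction is performed on the abstract of eligible articles.
    result.elements = extract(llm, article.abstract, element_names)
    return result


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model: keyword rules only, for demonstration."""
    if prompt.startswith("Decide"):
        text = prompt.split("Text:\n", 1)[-1]
        return "INCLUDE" if "endometrial" in text.lower() else "EXCLUDE"
    if "'population size'" in prompt:
        return "120"
    return "NA"
```

Passing the model in as a callable keeps the control flow testable without network access; a real deployment would replace `stub_llm` with a function that wraps whichever model client is actually used.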

RESULTS: For abstract screening, the system’s accuracy was 86%, with precision, recall (sensitivity), and F1 score of 0.86, 0.94, and 0.90, respectively. The system’s accuracy in full-text screening was 78.95%. For data element extraction, the F1 scores for identifying study details and clinical outcome values were 0.97 and 0.88, respectively, under strict matching and 0.99 and 0.97, respectively, under relaxed matching. Qualitative analysis identified challenges including accurately interpreting the language of the screening criteria and capturing partial information during element extraction. Running the complete pipeline, covering both screening and extraction, took 42 minutes.
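As a reading aid for the screening metrics, the short sketch below shows the standard definitions of precision, recall, and F1, and checks that the reported abstract-screening F1 follows from the reported precision and recall. The function names are hypothetical, and the confusion-matrix counts in the second example are purely illustrative; they are not taken from the study.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check on the reported abstract-screening values.
reported_f1 = round(f1_from(0.86, 0.94), 2)  # 0.9, matching the reported 0.90

# Illustrative counts only (not study data): 8 true positives,
# 2 false positives, 1 false negative.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=1)
```

Recall is the share of expert-labeled eligible articles that the system also marked eligible, which is why a recall of 0.94 corresponds to identifying 94% of the manually labeled eligible articles.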

CONCLUSIONS: Our real-time article screening and data extraction AI system positively identified the majority (94%) of the manually labeled eligible articles in a fraction of the time required for manual identification. The AI system has the potential to be used for real-time updates and timely insights.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)




Clinical Outcomes, Methodological & Statistical Research, Organizational Practices

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Industry


