Feasibility of GPT-4-Based Content Extraction to Identify Eligible Titles and Abstracts in a Systematic Literature Review (SLR)
Speaker(s)
Cichewicz A1, Pande A2, Casañas i Comabella C3, Mittal L4, Slim M5
1Evidera, a part of Thermo Fisher Scientific, Waltham, MA, USA, 2Evidera (Thermo Fisher Scientific), Waltham, MA, USA, 3Evidera, a part of Thermo Fisher Scientific, London, LON, UK, 4Evidera, a part of Thermo Fisher Scientific, Bangalore, KA, India, 5Evidera (Thermo Fisher Scientific), Hamilton, ON, Canada
OBJECTIVES: The performance of machine learning algorithms for title/abstract screening in SLRs has been explored; however, these models must be pre-trained on a set of records and do not provide justification for inclusion/exclusion decisions. Large language models (LLMs), including GPT-4, are universally applicable without the burden of SLR-specific training. Therefore, we aimed to evaluate the feasibility of Smart Tag Recommendations, a GPT-4-based content extraction feature developed by Nested Knowledge, as a potential SLR screening tool.
METHODS: Question-based prompts were developed from pre-defined SLR eligibility criteria (based on a PICOS framework). Smart tag recommendations were generated from the abstracts of 19 records included in a previously conducted SLR of trials in pre-treated ovarian cancers and used to determine the eligibility of each record for inclusion. The presence of a tag recommendation indicated the record met that criterion for inclusion (e.g., population), while the absence of a recommendation indicated the record failed to meet that criterion. Records with tag recommendations fulfilling all PICOS criteria were deemed eligible for inclusion.
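The decision rule described in METHODS can be sketched in a few lines of Python. This is an illustrative sketch only, not the Nested Knowledge implementation: the criterion names, the `is_eligible` and `failed_criteria` helpers, and the two example records are all hypothetical, and each record is assumed to be reduced to a mapping from criterion to whether a tag recommendation was present.

```python
from typing import Dict, List

# Hypothetical labels for the criteria named in the abstract.
CRITERIA: List[str] = [
    "population",
    "intervention_comparator",
    "outcome",
    "study_design",
    "line_of_therapy",
]


def is_eligible(tags: Dict[str, bool], criteria: List[str] = CRITERIA) -> bool:
    """Eligible only if a tag recommendation is present for every criterion."""
    return all(tags.get(c, False) for c in criteria)


def failed_criteria(tags: Dict[str, bool], criteria: List[str] = CRITERIA) -> List[str]:
    """List the criteria with no tag recommendation (treated as not met)."""
    return [c for c in criteria if not tags.get(c, False)]


# Made-up records for illustration; these are not data from the study.
record_a = {c: True for c in CRITERIA}
record_b = {**record_a, "line_of_therapy": False}

print(is_eligible(record_a))      # all tags present
print(is_eligible(record_b))      # one tag missing
print(failed_criteria(record_b))  # which criterion caused the exclusion
```

Passing a shorter `criteria` list to `is_eligible` corresponds to re-running the rule with some criteria left out of consideration.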
RESULTS: All records satisfied the outcome criterion, 89% satisfied the population criterion, and 84% the intervention/comparator criterion. Study design and line of therapy criteria had the lowest recall rates at 68% and 58%, respectively. Only 32% of records were deemed eligible for inclusion when considering tag recommendations for all PICOS criteria simultaneously; this low rate was driven by the treatment-related criteria. When these criteria were not considered, 63% were deemed eligible.
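Because the percentages are reported for only 19 records, the underlying record counts can be back-calculated. The sketch below is a reader's consistency check, not part of the authors' analysis; the `implied_counts` helper is hypothetical, and the counts it yields are inferred from the rounded percentages rather than reported in the abstract.

```python
N_RECORDS = 19  # number of records stated in METHODS


def implied_counts(pct: int, n: int = N_RECORDS) -> list:
    """Return every integer count k for which k/n rounds to pct percent."""
    return [k for k in range(n + 1) if round(100 * k / n) == pct]


# Percentages as reported in RESULTS.
reported = {
    "outcome": 100,
    "population": 89,
    "intervention/comparator": 84,
    "study design": 68,
    "line of therapy": 58,
    "eligible, all criteria": 32,
    "eligible, treatment-related criteria excluded": 63,
}

for label, pct in reported.items():
    print(f"{label}: {pct}% -> implied count(s) out of {N_RECORDS}: {implied_counts(pct)}")
```

Each reported percentage maps to exactly one whole number of records, so the figures are internally consistent with a sample of 19.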
CONCLUSIONS: Smart tag recommendations, originally designed for content extraction, may not currently be suitable as a screening tool for SLRs. Their utility depends on the text available in each record, so truncated abstracts and inconsistent reporting by authors limit performance. Instead, this approach could potentially be used to augment human screening or to categorize/prioritize records satisfying certain PICOS criteria for targeted reviews.
Code
MSR136
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas