Feasibility of GPT-4-Based Content Extraction to Identify Eligible Titles and Abstracts in a Systematic Literature Review (SLR)

Speaker(s)

Cichewicz A¹, Pande A², Casañas i Comabella C³, Mittal L⁴, Slim M⁵
¹Evidera, a part of Thermo Fisher Scientific, Waltham, MA, USA, ²Evidera (Thermo Fisher Scientific), Waltham, MA, USA, ³Evidera, a part of Thermo Fisher Scientific, London, LON, UK, ⁴Evidera, a part of Thermo Fisher Scientific, Bangalore, KA, India, ⁵Evidera (Thermo Fisher Scientific), Hamilton, ON, Canada

Presentation Documents

Cichewicz et al_ISPOR EU 2024_Final_Poster_v0.1_31Oct2024146679.pdf

OBJECTIVES: The performance of machine learning algorithms for title/abstract screening in SLRs has been explored; however, these models must be pre-trained on a set of records and do not provide justification for inclusion/exclusion decisions. Large language models (LLMs), including GPT-4, are universally applicable without the burden of SLR-specific training. Therefore, we aimed to evaluate the feasibility of Smart Tag Recommendations, a GPT-4-based content extraction feature developed by Nested Knowledge, as a potential SLR screening tool.

METHODS: Question-based prompts were developed from pre-defined SLR eligibility criteria (based on a PICOS framework). Smart tag recommendations were generated from the abstracts of 19 records included in a previously conducted SLR of trials in pre-treated ovarian cancers and used to determine the eligibility of each record for inclusion. The presence of a tag recommendation indicated the record met that criterion for inclusion (e.g., population), while the absence of a recommendation indicated the record failed to meet that criterion. Records with tag recommendations fulfilling all PICOS criteria were deemed eligible for inclusion.
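The eligibility rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the Nested Knowledge implementation: the criterion names, the `is_eligible` and `failed_criteria` helpers, and the example record are all hypothetical, and the presence of a tag recommendation is modelled as a simple boolean per criterion.

```python
from typing import Dict, List

# Hypothetical criterion labels; the abstract names population,
# intervention/comparator, outcome, study design, and line of therapy.
CRITERIA = [
    "population",
    "intervention_comparator",
    "outcome",
    "study_design",
    "line_of_therapy",
]


def is_eligible(tags: Dict[str, bool], criteria: List[str] = CRITERIA) -> bool:
    """A record is eligible only if a tag recommendation exists for every criterion."""
    # A missing key is treated the same as an absent recommendation.
    return all(tags.get(criterion, False) for criterion in criteria)


def failed_criteria(tags: Dict[str, bool], criteria: List[str] = CRITERIA) -> List[str]:
    """List the criteria for which no tag recommendation was generated."""
    return [criterion for criterion in criteria if not tags.get(criterion, False)]


# Illustrative record (not data from the study): no recommendation was
# generated for line of therapy, so the record is not deemed eligible.
example_record = {
    "population": True,
    "intervention_comparator": True,
    "outcome": True,
    "study_design": True,
}

print(is_eligible(example_record))
print(failed_criteria(example_record))
```

Because eligibility requires every criterion to be tagged, a single missing recommendation excludes a record, which is why the strictest criteria dominate the overall inclusion rate.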

RESULTS: All records satisfied the outcome criterion, 89% satisfied the population criterion, and 84% the intervention/comparator criterion. Study design and line of therapy criteria had the lowest recall rates at 68% and 58%, respectively. Only 32% of records were deemed eligible for inclusion when considering tag recommendations for all PICOS criteria simultaneously; this low rate was driven by the treatment-related criteria. When these criteria were not considered, 63% were deemed eligible.
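With only 19 records, each record accounts for roughly 5 percentage points, so the percentages above correspond to small whole-number counts. The sketch below back-calculates the counts consistent with each reported percentage, assuming the denominator is always the 19 records described in the methods and that percentages were rounded to the nearest integer. The dictionary keys and the `counts_matching` helper are hypothetical labels introduced for illustration; the counts themselves are not stated in the abstract.

```python
from typing import Dict, List

N_RECORDS = 19  # number of records screened, as stated in the methods

# Percentages as reported in the results; keys are illustrative labels.
reported_pct: Dict[str, int] = {
    "outcome": 100,
    "population": 89,
    "intervention_comparator": 84,
    "study_design": 68,
    "line_of_therapy": 58,
    "all_criteria": 32,
    "excluding_treatment_related": 63,
}


def counts_matching(pct: int, n: int = N_RECORDS) -> List[int]:
    """Return every count k (0..n) whose percentage k/n rounds to pct."""
    return [k for k in range(n + 1) if round(100 * k / n) == pct]


for label, pct in reported_pct.items():
    print(f"{label}: {pct}% is consistent with {counts_matching(pct)} of {N_RECORDS} records")
```

Under these assumptions each reported percentage maps to exactly one count, which makes clear how few records separate the criteria from one another.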

CONCLUSIONS: Smart tag recommendations, originally designed for content extraction, may not currently be suitable as a screening tool for SLRs. Their utility depends on the text available in each abstract, so truncated abstracts and inconsistent reporting by authors limit performance. Instead, this approach could potentially be used to augment human screening or to categorize/prioritize records satisfying certain PICOS criteria for targeted reviews.

Code

MSR136

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

ISPOR Europe 2024

17 - 20 November