Evaluating the Effectiveness of AI-Generated vs Real Abstracts in Training Machine Learning Models for Study Selection in Systematic Literature Reviews

Author(s)

Elissa C1, Bravo A2, Atanasov P3
1Amaris Consulting, Barcelona, Spain, 2Amaris Consulting, Barcelona, Spain, 3Amaris Consulting, Barcelona, Spain


OBJECTIVES: This study evaluates the effectiveness of AI-generated abstracts in training machine learning (ML) models to identify relevant publications in systematic literature reviews (SLRs).

METHODS: An SLR on CAR-T therapy for multiple myeloma in Australia retrieved 989 publications. Using the PICOS framework, ChatGPT 3.5 (free browser version) generated 50 abstracts meeting the inclusion criteria and 50 meeting the exclusion criteria.
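The synthetic-abstract step can be sketched as a prompt builder. This is a minimal illustration, not the study's actual prompt: `build_prompt` is a hypothetical helper, and the PICOS values below are plausible placeholders inferred from the review topic (CAR-T therapy for multiple myeloma), not the authors' exact criteria.

```python
def build_prompt(picos, include=True):
    """Compose an illustrative ChatGPT prompt asking for a synthetic
    abstract that meets (or violates) the PICOS inclusion criteria.
    The wording is a sketch, not the study's actual prompt."""
    goal = "meets all of" if include else "violates at least one of"
    criteria = "; ".join(f"{k}: {v}" for k, v in picos.items())
    return (
        "Write a realistic scientific abstract for a clinical study that "
        f"{goal} the following PICOS criteria. {criteria}. "
        "Return only the abstract text."
    )

# Placeholder PICOS criteria for the CAR-T / multiple myeloma review.
picos = {
    "Population": "adults with relapsed/refractory multiple myeloma",
    "Intervention": "CAR-T cell therapy",
    "Comparator": "any or none",
    "Outcomes": "efficacy or safety endpoints",
    "Study design": "clinical trials or observational studies",
}
```

Calling the builder with `include=False` yields the exclusion-side prompt, so the same helper produces both halves of the 50/50 synthetic training set.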

We trained ML models on a set of abstracts to assign a relevance score to each remaining publication, ranking them so that the most relevant candidates for inclusion appear first. Four scenarios were compared: (A) trained on 100 abstracts randomly selected and annotated by experts; (B) trained solely on AI-generated abstracts; (C) scenario A enriched with the 50 AI-generated inclusion abstracts; (D) trained on the top 100 real abstracts ranked by the scores from (B).
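The train-then-rank step above can be sketched as follows. The abstract does not specify the feature representation, so TF-IDF features are an assumption, and `train_and_score` is an illustrative name; only the logistic regression classifier is named in the methods.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_and_score(train_texts, train_labels, pool_texts):
    """Train a relevance classifier on annotated abstracts (1 = include,
    0 = exclude) and rank the unscreened pool, most relevant first.
    Sketch only: TF-IDF features are an assumed representation."""
    vec = TfidfVectorizer(stop_words="english")
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    scores = clf.predict_proba(vec.transform(pool_texts))[:, 1]
    return sorted(zip(pool_texts, scores), key=lambda t: -t[1])
```

Under this sketch, each scenario differs only in which `train_texts` are supplied: expert-annotated abstracts (A), AI-generated ones (B), their union (C), or the top-scored real abstracts from B (D).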

A logistic regression model scored each publication's likelihood of inclusion. Results were plotted as screening progression curves showing the percentage of included publications found versus the percentage of publications screened, and performance was calculated as the area under the curve (AUC). The curves from the four scenarios were compared with optimal screening (100%), in which all included publications appear first.
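The progression-curve metric can be sketched as below. `screening_curve` and `screening_auc` are illustrative names, and normalizing the raw trapezoidal area by that of the optimal ordering is an assumption about how the abstract scales optimal screening to 100%.

```python
import numpy as np

def screening_curve(scores, labels):
    """Rank publications by descending relevance score and return the
    progression curve: fraction screened (x) versus fraction of the
    included publications found so far (y)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    found = np.cumsum(np.asarray(labels)[order])
    n = len(labels)
    return np.arange(1, n + 1) / n, found / found[-1]

def _trapezoid(y, x):
    # area under the curve by the trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def screening_auc(scores, labels):
    """AUC of the progression curve, normalized (assumption) so that
    optimal screening — every included publication ranked first —
    scores 100%."""
    x, y = screening_curve(scores, labels)
    x_opt, y_opt = screening_curve(
        np.arange(len(labels), 0, -1), sorted(labels, reverse=True))
    return 100 * _trapezoid(y, x) / _trapezoid(y_opt, x_opt)
```

A perfect ranking reproduces the optimal curve and scores 100%; a ranking that buries the included publications scores proportionally lower, matching how the four scenarios are compared below.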

RESULTS: Scenario A achieved an AUC of 79.51% and Scenario B 74.48%. Scenario C showed the highest performance, reaching 81.49%, while Scenario D achieved 79.86%.

Scenario C identified 80% of the included publications by screening only 50% of the total set, outperforming scenarios A and D, which required screening 54% and 59% of the publications, respectively. Scenario B needed to screen 69% to identify 80% of the included publications.

CONCLUSIONS: AI-generated abstracts can effectively train ML models for publication identification. Integrating AI-generated abstracts with real ones enhances the screening process, reducing workload and accelerating the identification of relevant publications. These findings suggest that AI is a promising secondary reviewer in SLRs.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR165

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
