Sentence-Level Abstract Section Classification in HEOR Literature using a Biomedical Language Model
Author(s)
Reza Jafar, PhD¹; Mengmeng Zhang, PhD²; Vicki Young, PhD³
¹Cytel Inc, Vancouver, BC, Canada; ²Cytel Inc, Toronto, ON, Canada; ³Cytel Inc, London, United Kingdom
OBJECTIVES: Health Economics and Outcomes Research (HEOR) relies on the systematic review and synthesis of a vast biomedical literature to inform decision-making. Traditional abstract screening and data extraction are time-consuming and labor-intensive. Using artificial intelligence (AI) to split abstracts into their constituent sections could help human reviewers understand abstracts faster, work more efficiently, and produce higher-quality results. This study therefore aims to develop and validate a natural language processing (NLP) model that automatically classifies each sentence of a medical abstract into one of four predefined categories: Objectives, Methods, Results, or Conclusions. Such a model supports efficient evidence synthesis by letting reviewers locate the relevant text quickly. This sentence-level classification is also expected to improve the accuracy of downstream AI-driven screening and extraction by making critical information easier to identify precisely.
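To make the task concrete, the sketch below shows what "splitting an abstract into sections" could look like once each sentence has a predicted label: split the abstract into sentences, then group the sentences by label in their original order. It is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names `split_sentences` and `group_into_sections` are hypothetical, the regex sentence splitter is deliberately naive, and the labels are hard-coded where a classifier would normally supply them.

```python
import re

# The four section labels named in the abstract.
SECTIONS = ("Objectives", "Methods", "Results", "Conclusions")


def split_sentences(text):
    """Naively split text at ., ! or ? followed by whitespace and a capital letter.

    Abbreviations such as "vs." before a capitalised word will be split
    incorrectly; a production pipeline would use a proper sentence splitter.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


def group_into_sections(sentences, labels):
    """Group sentences by predicted section label, keeping the original order."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    sections = {name: [] for name in SECTIONS}
    for sentence, label in zip(sentences, labels):
        if label not in sections:
            raise ValueError(f"unknown section label: {label!r}")
        sections[label].append(sentence)
    # Rejoin each section's sentences and drop sections with no sentences.
    return {name: " ".join(sents) for name, sents in sections.items() if sents}


sentences = split_sentences(
    "We aimed to classify abstract sentences. We fine-tuned a model. "
    "Accuracy was high. The approach may help reviewers."
)
# Labels would normally come from the sentence classifier; hard-coded here.
labels = ["Objectives", "Methods", "Results", "Conclusions"]
print(group_into_sections(sentences, labels)["Results"])  # Accuracy was high.
```

Sections with no predicted sentences are simply omitted, so an abstract without an explicit objective statement yields only three sections.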
METHODS: We fine-tuned SciFive, a biomedical language model, for a sentence classification task on a proprietary dataset comprising 2,000 annotated medical abstracts. Each sentence within the abstracts was labeled according to its respective section: Objectives, Methods, Results, or Conclusions. The model’s performance was evaluated using accuracy, precision, recall, and F1-score on a held-out test set.
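SciFive is a T5-based text-to-text model, so one plausible way to fine-tune it for sentence classification is to present each sentence as input text and train the model to generate the label name. The abstract does not say whether the authors used this framing or a classification head, nor how the held-out test set was drawn. The sketch below therefore assumes text-to-text training, uses a hypothetical task prefix (`"classify section: "`), and holds out whole abstracts rather than individual sentences so that sentences from one abstract cannot appear in both training and test data. All function names are illustrative.

```python
import random

LABELS = ("Objectives", "Methods", "Results", "Conclusions")
PREFIX = "classify section: "  # hypothetical task prefix, not from the source


def to_text_to_text(sentence, label):
    """Turn one labelled sentence into a (source, target) pair for a T5-style model."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return PREFIX + sentence, label


def parse_prediction(generated):
    """Map generated text back to a label; return None if it matches no label."""
    text = generated.strip().lower()
    for label in LABELS:
        if text == label.lower():
            return label
    return None


def split_by_abstract(abstracts, test_fraction=0.2, seed=0):
    """Hold out whole abstracts (not single sentences) for testing.

    `abstracts` is a list of abstracts, each a list of (sentence, label) pairs.
    Splitting at abstract level keeps each abstract's sentences in one set only.
    """
    indices = list(range(len(abstracts)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(abstracts) * test_fraction)
    test_ids = set(indices[:n_test])
    train, test = [], []
    for i, abstract in enumerate(abstracts):
        pairs = [to_text_to_text(s, l) for s, l in abstract]
        (test if i in test_ids else train).extend(pairs)
    return train, test
```

Generated outputs that match none of the four labels come back as `None`, which makes malformed generations easy to count separately during evaluation.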
RESULTS: The model achieved an overall accuracy of approximately 98% in classifying sentences into the four section categories. Per-section accuracies for Objectives, Methods, Results, and Conclusions were 99%, 99%, 98%, and 99%, respectively. Overall precision, recall, and F1-score were each 0.98. These results indicate that the model is highly reliable on the held-out test set and could be applied to automate the segmentation of medical abstract content.
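The per-section accuracies (98% to 99%) can each sit at or above the overall 98% if they were computed one-vs-rest, where correctly rejecting other sections also counts as a hit. The abstract does not say how per-section accuracy or the "overall" precision, recall, and F1 were defined, so the sketch below assumes one-vs-rest accuracy per class and support-weighted averaging for precision, recall, and F1. The function name `evaluate` and the toy labels are illustrative only.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels):
    """Overall accuracy, one-vs-rest per-class accuracy, and support-weighted P/R/F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    n = len(y_true)
    overall_acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    support = Counter(y_true)
    per_class_acc = {}
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        # One-vs-rest accuracy also credits true negatives, so it is usually
        # at least as high as the overall multi-class accuracy.
        per_class_acc[c] = (tp + tn) / n
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        w = support[c] / n  # weight each class by its share of true labels
        weighted["precision"] += w * precision
        weighted["recall"] += w * recall
        weighted["f1"] += w * f1
    return overall_acc, per_class_acc, weighted


y_true = ["Objectives", "Methods", "Methods", "Results", "Results", "Conclusions"]
y_pred = ["Objectives", "Methods", "Results", "Results", "Results", "Conclusions"]
acc, per_class, avg = evaluate(y_true, y_pred, ["Objectives", "Methods", "Results", "Conclusions"])
```

Under support weighting, weighted recall always equals overall accuracy (each class's recall is multiplied by its share of the true labels, which sums to total correct predictions divided by n). That would be consistent with the reported recall of 0.98 alongside an overall accuracy of about 98%, if the authors used this averaging.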
CONCLUSIONS: Our NLP model offers a robust solution for automatically classifying abstract sentences into structured sections, which could substantially improve the efficiency of abstract screening and data extraction in HEOR. By reducing manual effort and increasing consistency, this approach can accelerate systematic reviews, support evidence synthesis, and ultimately inform more timely and cost-effective healthcare decisions.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR26
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas