HYBRID AI SYSTEM FOR AUTOMATED SAFETY-RELATED DOCUMENT CLASSIFICATION

Author(s)

Oleg Roderick, PhD
Otsuka Pharmaceutical, Data and Analytics, Princeton, NJ, USA

OBJECTIVES: Document processing in pharmaceutical operations requires accurate identification of safety-relevant content. Manual review can produce reliable outcomes, but it is resource-intensive and subjective.
Our study evaluates a hybrid analytical framework combining business rules, traditional machine learning, and large language models (LLMs) to automate detection of pharmacovigilance-relevant sections in documentation.
METHODS: We developed a multilayer classification system to identify safety-related content in vendor contracts and to explain its pharmacovigilance significance. The system architecture integrates three analytical approaches: deterministic rules encoding regulatory requirements, a supervised ML classifier trained on domain-specific corpora, and zero-shot/few-shot LLM inference. The system is intended to augment human decision-making, and it can use human feedback to create additional training data and to improve the prompts given to the LLM. Performance was evaluated on a labeled dataset of documents. Quality evaluation focused on avoiding false negatives, which cause safety information to be missed during review.
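As an illustration only, the three layers described in METHODS could be arranged as a cascade like the one sketched below. This is not the authors' implementation: the rule patterns, the thresholds `low` and `high`, and the callables `ml_score` and `llm_judge` are hypothetical placeholders, since the abstract does not specify them. The `recall` helper reflects the stated focus on false negatives.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical rule patterns; the actual business rules are not given in the abstract.
RULE_PATTERNS = [r"\badverse event", r"\bpharmacovigilance\b", r"\bsafety report"]


@dataclass
class Decision:
    label: bool    # True if the section is flagged as safety-relevant
    source: str    # which layer made the call: "rules", "ml", or "llm"
    score: float   # ML score (1.0 when a rule fired)


def rule_layer(text: str) -> bool:
    """Deterministic layer: flag the text if any rule pattern matches."""
    return any(re.search(p, text, re.IGNORECASE) for p in RULE_PATTERNS)


def classify(
    text: str,
    ml_score: Callable[[str], float],   # assumed: returns P(safety-relevant) in [0, 1]
    llm_judge: Callable[[str], bool],   # assumed: wraps an LLM call, returns a yes/no verdict
    low: float = 0.2,                   # assumed threshold, kept low to limit false negatives
    high: float = 0.8,                  # assumed threshold
) -> Decision:
    """Rules first, then the ML score; only uncertain cases go to the LLM."""
    if rule_layer(text):
        return Decision(True, "rules", 1.0)
    score = ml_score(text)
    if score >= high:
        return Decision(True, "ml", score)
    if score <= low:
        return Decision(False, "ml", score)
    return Decision(llm_judge(text), "llm", score)


def recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Share of truly safety-relevant items that were flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In a sketch like this, lowering `low` sends more borderline sections to the LLM (and ultimately to the human reviewer), trading review effort for fewer missed safety statements.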
RESULTS: Business rules alone can be automated through NLP techniques such as keyword extraction and topic modelling, but they yield very limited quality with no possibility for improvement. Zero-shot LLM approaches face the same limitation: they are only as good as the business rules behind their prompts. Standalone ML models improve in quality as the training data grows, but they also hit a quality ceiling. However, the integrated hybrid system demonstrates superior performance, with the statistical/ML component serving as a robust mechanism that decides how to involve the LLM and which elements of the training data to include in prompts.
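One way to picture the ML component choosing "which elements of the training data to include in prompts" is a similarity-based selection of few-shot examples, sketched below. The abstract does not say how the selection is done; the token-overlap (Jaccard) similarity, the function names, and the prompt wording here are all assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

Example = Tuple[str, bool]  # (section text, is it safety-relevant?)


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic word set; a deliberately simple stand-in for real features."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(query: str, labeled: List[Example], k: int = 2) -> List[Example]:
    """Pick the k labeled sections most similar to the query."""
    ranked = sorted(labeled, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, labeled: List[Example], k: int = 2) -> str:
    """Assemble a few-shot prompt from the selected examples (hypothetical wording)."""
    parts = ["Decide whether each contract section is safety-relevant (yes/no)."]
    for text, label in select_examples(query, labeled, k):
        parts.append(f"Section: {text}\nSafety-relevant: {'yes' if label else 'no'}")
    parts.append(f"Section: {query}\nSafety-relevant:")
    return "\n\n".join(parts)
```

Under this assumption, reviewer corrections could be appended to `labeled` over time, so later prompts draw on the human feedback mentioned in METHODS.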
CONCLUSIONS: Effective automation of safety-critical document classification in pharmaceutical operations requires coordination of complementary analytical methods. Rather than treating AI as a monolithic solution, organizations should architect systems where generative models coordinate traditional tools, combining the interpretability of rules, the pattern recognition of ML, and the contextual reasoning of LLMs. This approach supports regulatory compliance while maintaining the transparency required in HEOR applications.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

HPR29

Topic

Health Policy & Regulatory

Topic Subcategory

Approval & Labeling

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)