HYBRID AI SYSTEM FOR AUTOMATED SAFETY-RELATED DOCUMENT CLASSIFICATION
Author(s)
Oleg Roderick, PhD;
Otsuka Pharmaceutical, Data and Analytics, Princeton, NJ, USA
OBJECTIVES: Document processing in pharmaceutical operations requires accurate identification of safety-relevant content. Manual review can produce reliable outcomes, but it is resource-intensive and subjective.
Our study evaluates a hybrid analytical framework combining business rules, traditional machine learning, and large language models (LLMs) to automate detection of pharmacovigilance-relevant sections in documentation.
METHODS: We developed a multilayer classification system to identify safety-related content in vendor contracts and to explain its pharmacovigilance significance. The system architecture integrates three analytical approaches: deterministic rules encoding regulatory requirements, a supervised ML classifier trained on domain-specific corpora, and zero-shot/few-shot LLM inference. The system is intended to augment human decision-making, and it can use human feedback to create additional training data and to improve the prompts given to the LLM. Performance was evaluated on a labeled dataset of documents. Quality evaluation focused on avoiding false negatives, which would cause safety information to be missed during review.
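As an illustration of how three such layers might be combined, the following minimal Python sketch routes a contract section through a rule check, then an ML score, and escalates to an LLM only when the score is inconclusive. The abstract does not specify the layer ordering, the thresholds, or any interfaces, so everything here is an assumption: the keyword patterns, the `classify_section` function, the `ml_score` and `ask_llm` callables, and the `low`/`high` cut-offs are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical keyword rules; real rules would encode regulatory requirements.
RULE_PATTERNS = [
    re.compile(r"\badverse event", re.IGNORECASE),
    re.compile(r"\bpharmacovigilance\b", re.IGNORECASE),
    re.compile(r"\bsafety report", re.IGNORECASE),
]


@dataclass
class Decision:
    label: bool        # True means the section is flagged as safety-relevant
    source: str        # layer that decided: "rule", "ml", or "llm"
    explanation: str   # short rationale shown to the human reviewer


def rule_layer(text: str) -> Optional[str]:
    """Return the first matched rule term, or None if no rule fires."""
    for pattern in RULE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None


def classify_section(
    text: str,
    ml_score: Callable[[str], float],            # assumed: probability in [0, 1]
    ask_llm: Callable[[str], Tuple[bool, str]],  # assumed: (label, explanation)
    low: float = 0.2,                            # assumed thresholds
    high: float = 0.8,
) -> Decision:
    # Layer 1: deterministic rules flag the section outright.
    hit = rule_layer(text)
    if hit is not None:
        return Decision(True, "rule", f"matched rule term '{hit}'")

    # Layer 2: the ML score settles confident cases on its own.
    score = ml_score(text)
    if score >= high:
        return Decision(True, "ml", f"classifier score {score:.2f}")
    if score <= low:
        return Decision(False, "ml", f"classifier score {score:.2f}")

    # Layer 3: inconclusive scores are escalated to the LLM.
    label, why = ask_llm(text)
    return Decision(label, "llm", why)
```

Because the evaluation emphasizes avoiding false negatives, a sketch like this would likely set `low` conservatively, so that borderline sections reach the LLM or a human reviewer rather than being dismissed by the classifier alone.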
RESULTS: Business rules alone can be automated through NLP techniques such as keyword extraction and topic modelling, but they yield limited quality with no path to improvement. Zero-shot LLM approaches face the same limitation: they are only as good as the business rules behind their prompts. Standalone ML models improve as the training data grows, but they also hit a quality ceiling. However, the integrated hybrid system demonstrates superior performance, with the statistical/ML component serving as a robust mechanism that decides how to involve the LLM and which elements of the training data to include in prompts.
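One way a statistical component could choose which elements of the training data to include in a prompt is to rank labeled examples by similarity to the section under review. The sketch below uses bag-of-words cosine similarity for this purpose. It is a hypothetical illustration, not the selection mechanism used in the study: the similarity measure, the function names (`select_examples`, `build_prompt`), and the prompt wording are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, bool]  # (section text, is safety-relevant)


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(section: str, labeled: List[Example], k: int = 2) -> List[Example]:
    """Pick the k labeled examples most similar to the section."""
    query = bow(section)
    ranked = sorted(labeled, key=lambda ex: cosine(query, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(section: str, labeled: List[Example], k: int = 2) -> str:
    """Assemble a few-shot prompt from the selected examples (assumed wording)."""
    parts = ["Decide whether the contract section is safety-relevant (yes/no) and explain why."]
    for text, label in select_examples(section, labeled, k):
        parts.append(f"Section: {text}\nSafety-relevant: {'yes' if label else 'no'}")
    parts.append(f"Section: {section}\nSafety-relevant:")
    return "\n\n".join(parts)
```

In a design like this, reviewer corrections would be appended to the labeled pool, so the prompts improve as feedback accumulates without retraining the LLM itself.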
CONCLUSIONS: Effective automation of safety-critical document classification in pharmaceutical operations requires coordination of complementary analytical methods. Rather than treating AI as a monolithic solution, organizations should architect systems in which generative models coordinate with traditional tools, combining the interpretability of rules, the pattern recognition of ML, and the contextual reasoning of LLMs. This approach supports regulatory compliance while maintaining the transparency required in HEOR applications.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
HPR29
Topic
Health Policy & Regulatory
Topic Subcategory
Approval & Labeling
Disease
No Additional Disease & Conditions/Specialized Treatment Areas