FROM TEXT TO SIGNAL: EVALUATING LLMS FOR VALID CASE IDENTIFICATION IN PHARMACOVIGILANCE
Author(s)
Artur Nowak, MSc, Ewa Borowiack, MSc, Ewelina Sadowska, MPharm, Monika Opalek, PhD, Iwona Kmicikiewicz, PhD, Joanna Konieczna, MSc, Damian Stachura, MSc
Evidence Prime, Krakow, Poland
OBJECTIVES: Pharmacovigilance (PV) requires rapid identification of valid individual case safety reports (ICSRs) within an expanding scientific literature. Because a valid case must include an identifiable patient and reporter, a suspected drug, and an adverse event, manual screening is labor-intensive and inconsistent. Specialized LLM agents may streamline triage by extracting case elements as structured, database-ready records with an auditable trail (supporting quotes, document location, and brief rationale) and enabling downstream standardization (e.g., MedDRA mapping). We evaluated an LLM-agent pipeline for suspected drug identification as the first step toward automated extraction of all valid case elements.
METHODS: The dataset (n=71) was manually developed and tagged in Laser AI by experienced reviewers for the presence of suspected drugs. It was split into a training set, used to develop prompts and configure domain-specific agents, and a held-out test set. Agents were executed as an extraction pipeline, producing one structured record per publication along with an audit trail. Outputs were compared with the human reference values, and performance was assessed with F1 scores calculated on the structured outputs and with a qualitative error analysis that categorized and interpreted discrepancies.
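As an illustration of how an F1 score can be computed on structured outputs, the sketch below compares predicted and reference suspected drugs per publication and micro-averages the counts. It is a minimal example under stated assumptions, not the evaluation code used in this study: the set-based record format, the name normalization, the micro-averaging choice, and the names `normalize` and `micro_f1` are all hypothetical.

```python
from typing import Dict, Set, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match (assumed rule)."""
    return " ".join(name.lower().split())


def micro_f1(
    reference: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all publications.

    Each dict maps a publication id to the set of suspected drug names.
    """
    tp = fp = fn = 0
    for pub_id in reference.keys() | predicted.keys():
        ref = {normalize(d) for d in reference.get(pub_id, set())}
        pred = {normalize(d) for d in predicted.get(pub_id, set())}
        tp += len(ref & pred)   # drugs found by both reviewers and agents
        fp += len(pred - ref)   # drugs returned only by the agents
        fn += len(ref - pred)   # drugs tagged only by the reviewers
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data for illustration only; these are not study results.
reference = {"pub1": {"Ipilimumab", "Nivolumab"}, "pub2": {"Metformin"}}
predicted = {"pub1": {"ipilimumab"}, "pub2": {"metformin", "aspirin"}}
precision, recall, f1 = micro_f1(reference, predicted)
```

In this toy case one reference drug is missed and one extra drug is returned, so precision, recall, and F1 all equal 2/3. A macro-averaged variant (one F1 per publication, then the mean) would weight multi-patient case series differently and is an equally plausible reading of "F1 on structured outputs".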
RESULTS: Agents achieved an F1 score of 80% on the held-out test set (n=50). Most errors (82% of failures) occurred in multi-patient case series, where not all patient-drug pairs were captured. Additional failure modes were combination therapy handling, in which merged entities such as “ipilimumab/nivolumab” were returned rather than individual drugs, and conservative causality assessment, in which suspected drugs were omitted when attribution was implicit or uncertain.
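The combination therapy failure mode suggests one possible post-processing step: splitting a merged entity into individual drug names before comparison with the reference. The sketch below is a hypothetical illustration rather than part of the evaluated pipeline; the separator list and the function name `split_combination` are assumptions, and a rule this simple would also split fixed-dose products that a reference standard may code as a single entity.

```python
import re
from typing import List

# Assumed separators between drugs in a merged entity: "/", "+", "and", "plus".
_SEPARATORS = re.compile(r"\s*(?:/|\+|\band\b|\bplus\b)\s*", flags=re.IGNORECASE)


def split_combination(entity: str) -> List[str]:
    """Split a merged drug entity into lowercase names, keeping first-seen order."""
    names: List[str] = []
    for part in _SEPARATORS.split(entity):
        key = part.strip().lower()
        if key and key not in names:  # skip empty fragments and duplicates
            names.append(key)
    return names


merged = split_combination("ipilimumab/nivolumab")
single = split_combination("Pembrolizumab")
```

Here `merged` holds the two individual drugs and `single` is returned unchanged apart from lowercasing, so the step leaves monotherapy records untouched.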
CONCLUSIONS: Specialized agents can reliably identify and tag suspected drugs in PV literature while providing structured, auditable outputs to support scalable case triage. Subsequent studies will extend extraction to the remaining valid case elements (patient, reporter, and adverse event) and expand coverage to non-English publications, multimodal inputs, and ontology-based mapping, ultimately supporting end-to-end valid case identification.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR129
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas