GIVE ME THE NUMBERS: FINE-GRAINED, STRUCTURED DATA EXTRACTION BEYOND NARRATIVE LLM OUTPUTS
Author(s)
Artur Nowak, MSc, Monika Opalek, PhD, Ewa Borowiack, MSc, Ewelina Sadowska, MPharm, Joanna Konieczna, MSc, Damian Stachura, MSc
Evidence Prime, Krakow, Poland
OBJECTIVES: Generative LLMs capture high-level insights but often miss the precise structure needed for evidence synthesis. Laser AI uses domain-specific agents to produce structured, vocabulary-mappable, analysis-ready outputs that are stored directly in databases for downstream teams and scalable analytics workflows, with traceability for quality control. The objectives were to assess the feasibility of using specialized LLM agents for fine-grained extraction of highly contextual patient-flow data, and to develop a framework for building high-quality domain datasets, including annotation guidelines and field definitions, to support later project stages. Three patient-flow variables served as a controlled testbed for refining dataset design.
METHODS: Three clinically relevant patient-flow variables were selected: the number of patients assessed for eligibility (per study), and the numbers of patients randomized and lost to follow-up (per study arm). Datasets were created and annotated by trained reviewers within the Laser AI environment. The training set was used to develop LLM prompts. AI agents were executed as an extraction pipeline, producing one structured record per study (and per arm where applicable). In addition to structured numeric outputs, agents returned an audit trail comprising supporting text quotes with their document location and a brief rationale for each extracted value. Accuracy assessment included F1 scores and a qualitative error analysis to categorize and interpret discrepancies.
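To make the shape of such a record concrete, the following is a minimal sketch of one structured record per study, with per-arm values and an audit trail attached to every extracted number. It is an illustration only: the class names, field names, locator format, and all example values are assumptions, not the Laser AI schema or data from the studies.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """Audit-trail entry for one extracted value (hypothetical layout)."""
    quote: str      # supporting text copied from the source document
    location: str   # where the quote was found; the locator format is assumed
    rationale: str  # brief explanation of why the quote supports the value


@dataclass
class ExtractedValue:
    value: Optional[int]  # None when the study does not report the number
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class ArmRecord:
    arm_name: str
    randomized: ExtractedValue
    lost_to_follow_up: ExtractedValue


@dataclass
class StudyRecord:
    study_id: str
    assessed_for_eligibility: ExtractedValue  # per-study variable
    arms: List[ArmRecord] = field(default_factory=list)  # per-arm variables


def to_json(record: StudyRecord) -> str:
    """Serialize a record so it can be stored or passed downstream."""
    return json.dumps(asdict(record), indent=2)


# Invented example values, for illustration only.
example = StudyRecord(
    study_id="study-001",
    assessed_for_eligibility=ExtractedValue(
        value=150,
        evidence=[Evidence(
            quote="150 patients were assessed for eligibility",
            location="page 4, Figure 1",
            rationale="Flow diagram reports the number screened.",
        )],
    ),
    arms=[
        ArmRecord(
            arm_name="Intervention",
            randomized=ExtractedValue(value=60),
            lost_to_follow_up=ExtractedValue(value=3),
        ),
        ArmRecord(
            arm_name="Control",
            randomized=ExtractedValue(value=60),
            lost_to_follow_up=ExtractedValue(value=None),  # not reported
        ),
    ],
)

print(to_json(example))
```

Keeping the evidence next to each value, rather than in a separate log, means a reviewer can check any single number without re-reading the whole record.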
RESULTS: A total of 160 studies were included, with separate training and test sets for each extraction field. LLM-based agents demonstrated strong extraction performance across the patient-flow variables (macro-averaged F1 score: 90%). Qualitative review further assessed correctness, consistency, and acceptability of both extracted values and their supporting evidence (quotes and brief rationale), informing refinement of guidelines and field definitions.
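For readers unfamiliar with the metric, the sketch below shows one way a macro-averaged F1 score can be computed across extraction fields. The abstract does not specify the matching rule, so the exact-match scoring used here is an assumption, and the field names and toy values are invented: a predicted value counts as a true positive only when it equals the annotated value, a wrong or spurious prediction counts as a false positive, and a missed or wrongly extracted annotated value counts as a false negative. The macro average is the unweighted mean of the per-field F1 scores.

```python
from typing import Dict, List, Optional, Tuple

# One (annotated, predicted) pair per study or arm; None means "no value".
Pair = Tuple[Optional[int], Optional[int]]


def f1_for_field(pairs: List[Pair]) -> float:
    """F1 for one field under the assumed exact-match rule."""
    tp = fp = fn = 0
    for gold, pred in pairs:
        if pred is not None and pred == gold:
            tp += 1
        else:
            if pred is not None:
                fp += 1  # a value was extracted, but it is wrong or spurious
            if gold is not None:
                fn += 1  # the annotated value was missed or not matched
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(pairs_by_field: Dict[str, List[Pair]]) -> float:
    """Unweighted mean of per-field F1 scores."""
    if not pairs_by_field:
        raise ValueError("at least one field is required")
    scores = [f1_for_field(pairs) for pairs in pairs_by_field.values()]
    return sum(scores) / len(scores)


# Toy data, invented for illustration; not results from the study.
toy = {
    "assessed_for_eligibility": [(10, 10), (20, 20), (30, None), (None, None)],
    "randomized": [(1, 1), (2, 2)],
    "lost_to_follow_up": [(5, 5), (6, 7)],
}

print(f"macro-averaged F1: {macro_f1(toy):.3f}")
```

Because each field contributes equally to the macro average, a field with few test examples influences the headline figure as much as a field with many, which is worth keeping in mind when interpreting a single summary score.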
CONCLUSIONS: Specialized agents can reliably extract highly contextual patient-flow information. Subsequent project stages will extend these capabilities to multilingual extraction, multimodal input processing, and ontology-based vocabulary mapping, ultimately supporting granular evidence extraction for drug and medical device effectiveness, safety, burden of disease (from economic, humanistic, and epidemiological perspectives), and health-state utility outcomes.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR179
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas