CONTEXT-AWARE GENERATIVE AI FOR EVIDENCE GENERATION, ANALOG-DRIVEN ENROLLMENT FORECASTING, AND OPERATIONAL RISK MANAGEMENT IN ONCOLOGY TRIALS
Author(s)
Ashwin Kumar Rai, MS1, Victoria Ikoro, PhD2, Devika Bhandary, MSc3, Andre Ng, MSc4, Marielle Bassel, BA5.
1Director of Data Science & Advanced Analytics, Thermo Fisher Scientific, Overland Park, KS, USA, 2Thermo Fisher Scientific, Kitchener, ON, Canada, 3Thermo Fisher Scientific, London, United Kingdom, 4Thermo Fisher Scientific, London, United Kingdom, 5Thermo Fisher Scientific, Montreal, QC, Canada.
OBJECTIVES: To develop and demonstrate a context-aware GenAI evidence-generation workflow that identifies study-design-matched “analog” oncology trials from ClinicalTrials.gov and linked publications, extracts enrollment performance and operational difficulty signals, and produces explainable enrollment-rate recommendations for a planned study, with a pathway to scale the same pipeline for machine learning (ML).
METHODS: We built a hybrid GenAI + rules pipeline combining (1) structured registry fields (study type/phase, allocation/masking, arms/interventions, endpoints, enrollment, dates, locations) and (2) publications linked via NCT identifiers and robust title/acronym matching. GenAI normalized free-text eligibility and design narratives into a standardized “trial fingerprint” capturing indication/stage/line, biomarker gates, comparator class, endpoint/procedure burden proxies, run-in/washout complexity, and geographic footprint. Analog retrieval used hard filters on critical design attributes, then embedding-based similarity ranking over fingerprints. Enrollment velocity labels were computed as participants/site/month when recruitment windows and site counts were available from publications; otherwise conservative registry-derived proxies were used. Difficulty signals (eligibility restrictiveness themes, biomarker-testing friction, visit/procedure burden, and competitive recruiting density) were extracted to contextualize forecasts. For demonstration, five completed oncology trials were treated as “planned targets,” each matched to ~10 concluded analogs to generate similarity-weighted enrollment-rate distributions and explanatory rationales.
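The retrieval and labeling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Trial` fields, the choice of phase and randomization as the hard-filter attributes, the use of cosine similarity, and the precomputed `embedding` vectors are all assumptions made for illustration. The abstract specifies only that hard filters precede embedding-based ranking and that velocity is expressed as participants/site/month.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trial:
    """Hypothetical, simplified trial fingerprint (field names are assumptions)."""
    trial_id: str              # placeholder label, not a real registry identifier
    phase: str
    randomized: bool
    embedding: List[float]     # assumed to be precomputed from the fingerprint text
    enrolled: int
    sites: int
    recruitment_months: float


def enrollment_velocity(trial: Trial) -> float:
    """Enrollment velocity in participants per site per month."""
    return trial.enrolled / (trial.sites * trial.recruitment_months)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_analogs(target: Trial, candidates: List[Trial],
                     k: int = 10) -> List[Tuple[float, Trial]]:
    """Apply hard filters first, then rank survivors by embedding similarity."""
    # Hard filters: candidates must match the target on the design attributes
    # treated as critical (here, phase and randomization, as an assumption).
    passed = [c for c in candidates
              if c.phase == target.phase and c.randomized == target.randomized]
    # Similarity ranking over the fingerprint embeddings, highest first.
    scored = [(cosine_similarity(target.embedding, c.embedding), c) for c in passed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Filtering before ranking means a candidate that differs on a critical design attribute can never enter the analog set, however close its embedding is to the target.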
RESULTS: Across five targets, the system generated consistent trial fingerprints and ranked analog sets aligned on key design drivers. The analog method produced enrollment-rate recommendations (median and uncertainty bands) and highlighted factors expected to accelerate or slow enrollment (e.g., biomarker confirmation requirements, comparator acceptability, procedure intensity, competition). Outputs were traceable to registry fields and publication evidence for stakeholder review.
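One way a median and uncertainty band could be derived from a similarity-weighted analog set is a weighted quantile, sketched below in Python. The abstract does not state which estimator was used, so the function `weighted_quantile`, the 10th/90th percentile band, and the example rates and weights are illustrative assumptions rather than reported results.

```python
from typing import List


def weighted_quantile(values: List[float], weights: List[float], q: float) -> float:
    """Smallest value whose cumulative weight reaches fraction q of the total.

    `values` are analog enrollment rates (participants/site/month) and
    `weights` are non-negative similarity scores for the same analogs.
    """
    pairs = sorted(zip(values, weights))          # sort by enrollment rate
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]                           # guard against rounding at q = 1


# Made-up illustrative inputs, not data from the study.
rates = [0.30, 0.50, 0.20, 0.80]
similarities = [0.9, 0.8, 0.5, 0.3]

median = weighted_quantile(rates, similarities, 0.5)
lower = weighted_quantile(rates, similarities, 0.1)
upper = weighted_quantile(rates, similarities, 0.9)
```

Weighting by similarity lets the closest analogs dominate the recommended rate while more distant analogs still widen the band.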
CONCLUSIONS: Context-aware GenAI enables an evidence-generation paradigm for feasibility planning: it systematically identifies design-matched analog trials, extracts enrollment performance and operational challenges, and produces explainable recommendations for expected enrollment rate and study difficulty. The same evidence-engineering pipeline can be scaled to produce large structured datasets suitable for ML-based prediction as additional concluded trials are ingested.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR60
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology