Automated Extraction of Kaplan-Meier Survival Curves Using Generative Artificial Intelligence and Computer Vision

Author(s)

Ying Li, PhD¹, Augustine Annan, PhD², Majid R. Mojarad, PhD², Jingcheng Du, PhD², Yingxin Xu, PharmD, PhD¹
¹Regeneron Pharmaceuticals, Inc, Tarrytown, NY, USA; ²Intelligent Medical Object, Inc, Rosemont, IL, USA
OBJECTIVES: Kaplan-Meier (KM) survival curves are essential in clinical research, but traditional manual digitization of published curves is time-consuming and error-prone. We propose an automated pipeline combining generative artificial intelligence (AI) and computer vision (CV) to extract KM data from published literature.
METHODS: The pipeline integrates GPT-4o for plot metadata extraction (axis limits, scales) with CV for curve point extraction using adaptive and distance thresholding. It was compared with GPT-4o and Claude 3.5 on 10 published single-arm KM curves to assess the accuracy of median survival time and its confidence interval (CI). Additionally, 150 synthetic single-arm KM curves were generated from several distributions (exponential, Weibull, mixture, Gompertz), with and without censoring, at sample sizes of 50 to 500. Performance was assessed using median survival accuracy, root mean square error (RMSE), mean absolute error (MAE), and Bland-Altman analysis.
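As an illustration of the synthetic-curve step, the following is a minimal sketch of how one single-arm KM curve could be simulated and estimated. It is not the authors' generator: the Weibull parameters, the exponential censoring mechanism, and the function names `simulate_weibull`, `kaplan_meier`, and `median_survival` are all assumptions made for the example.

```python
import random
from collections import Counter

def simulate_weibull(n, shape, scale, censor_rate=None, seed=0):
    """Draw n (time, event) pairs; event=1 is observed, event=0 is censored.

    Event times are Weibull distributed. Censoring times, if requested,
    are exponential with rate `censor_rate` (an illustrative assumption).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t_event = rng.weibullvariate(scale, shape)  # (alpha=scale, beta=shape)
        t_cens = rng.expovariate(censor_rate) if censor_rate else float("inf")
        data.append((min(t_event, t_cens), 1 if t_event <= t_cens else 0))
    return data

def kaplan_meier(data):
    """Product-limit estimate: [(time, S(time))] at each distinct event time."""
    deaths = Counter(t for t, e in data if e == 1)
    leaving = Counter(t for t, _ in data)  # events and censorings both leave the risk set
    at_risk = len(data)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            surv *= 1.0 - deaths[t] / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve

def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached, e.g. under heavy censoring

# Hypothetical settings: 200 patients, time in months, roughly moderate censoring.
sample = simulate_weibull(n=200, shape=1.3, scale=12.0, censor_rate=0.05, seed=42)
km = kaplan_meier(sample)
median = median_survival(km)
```

Plotting `km` as a step function would give one synthetic curve image with a known ground truth, against which extracted points and the extracted median can be compared.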
RESULTS: Our pipeline outperformed GPT-4o and Claude 3.5 on published KM curves (±0.6 months median survival deviation, 89% CI accuracy vs. GPT-4o: ±1.9 months, 75%; Claude 3.5: ±1.8 months, 77%). For synthetic data with censoring, our method achieved ±0.7 months deviation, RMSE 0.014, MAE 0.011 (vs. GPT-4o: ±1.64 months, RMSE 0.090, MAE 0.076; Claude 3.5: ±1.68 months, RMSE 0.087, MAE 0.079). Performance improved for uncensored data: ±0.6 months deviation, RMSE 0.011, MAE 0.008 (vs. GPT-4o: ±1.5 months, RMSE 0.079, MAE 0.064; Claude 3.5: ±1.61 months, RMSE 0.078, MAE 0.065). Bland-Altman analysis of our pipeline on the synthetic KM curves showed minimal bias (mean difference <0.02) in survival probability extraction across diverse survival distributions and censoring patterns.
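The agreement metrics reported above can be sketched as follows. The abstract does not say how the curves were aligned, so this example assumes RMSE, MAE, and the Bland-Altman statistics are computed on survival probabilities read from both curves on a shared time grid; the helper names `step_value` and `agreement` and the toy curves are hypothetical.

```python
import math
from statistics import mean, stdev

def step_value(curve, t):
    """Survival probability of a right-continuous step curve at time t.

    `curve` is a list of (time, survival) pairs sorted by time; survival
    is 1.0 before the first step.
    """
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

def agreement(true_curve, extracted_curve, grid):
    """RMSE, MAE, and Bland-Altman bias with 95% limits of agreement."""
    diffs = [step_value(extracted_curve, t) - step_value(true_curve, t) for t in grid]
    bias = mean(diffs)       # Bland-Altman mean difference
    sd = stdev(diffs)        # sample standard deviation of the differences
    return {
        "rmse": math.sqrt(mean([d * d for d in diffs])),
        "mae": mean([abs(d) for d in diffs]),
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }

# Toy curves (illustrative values only, not data from the study).
truth = [(1, 0.80), (2, 0.50), (3, 0.20)]
extracted = [(1, 0.82), (2, 0.49), (3, 0.20)]
result = agreement(truth, extracted, grid=[0.5, 1.5, 2.5, 3.5])
```

A bias near zero with narrow limits of agreement indicates that the extracted survival probabilities neither systematically overshoot nor undershoot the reference curve.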
CONCLUSIONS: Our pipeline demonstrates high accuracy for automated single-arm KM curve data extraction. Future work will address multi-arm and subgroup survival curves, enabling patient-level data reconstruction for meta-analyses and cost-effectiveness studies.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR33

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
