Automated Extraction of Kaplan-Meier Survival Curves Using Generative Artificial Intelligence and Computer Vision
Author(s)
Ying Li, PhD¹; Augustine Annan, PhD²; Majid R. Mojarad, PhD²; Jingcheng Du, PhD²; Yingxin Xu, PharmD, PhD¹
¹Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA; ²Intelligent Medical Objects, Inc., Rosemont, IL, USA
OBJECTIVES: Kaplan-Meier (KM) survival curves are essential in clinical research, but manually digitizing published curves is time-consuming and error-prone. We propose an automated pipeline that combines generative artificial intelligence (AI) and computer vision (CV) to extract KM data from published literature.
METHODS: The pipeline uses GPT-4o to extract plot metadata (axis limits and scales) and CV to extract curve points using adaptive and distance thresholding. Using 10 published single-arm KM curves, we compared the pipeline with GPT-4o and Claude 3.5 on the accuracy of extracted median survival times and confidence intervals (CIs). We also generated 150 synthetic single-arm KM curves from exponential, Weibull, mixture, and Gompertz distributions, with and without censoring, and with sample sizes from 50 to 500. Performance was assessed using median survival accuracy, root mean square error (RMSE), mean absolute error (MAE), and Bland-Altman analysis.
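The abstract describes the CV step only at a high level. As an illustration, the following Python sketch shows one way adaptive and distance thresholding could turn a plot image into (time, survival) pairs once the axis metadata are known. It is not the authors' implementation, and several points are assumptions. The image is a grayscale grid of 0 to 255 values cropped to the plot area with axis lines removed. The `AxisMeta` record and all function names are hypothetical stand-ins for the metadata GPT-4o returns. Both axes use a linear scale. Finally, "distance thresholding" is interpreted here as rejecting ink that lies more than `max_jump` pixel rows from the previously traced point.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AxisMeta:
    """Hypothetical metadata record: axis limits in data units (as an LLM such as
    GPT-4o might report them) plus the pixel bounds of the plot area."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    px_left: int
    px_right: int
    px_top: int
    px_bottom: int


def adaptive_mask(gray: List[List[int]], radius: int = 1, c: float = 10.0) -> List[List[bool]]:
    """Mean-based adaptive threshold: a pixel is ink if it is darker than the mean
    of its (2*radius+1)^2 neighbourhood (clipped at the border) by more than c."""
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [gray[a][b]
                      for a in range(max(0, i - radius), min(h, i + radius + 1))
                      for b in range(max(0, j - radius), min(w, j + radius + 1))]
            mask[i][j] = gray[i][j] < sum(window) / len(window) - c
    return mask


def ink_runs(mask: List[List[bool]], col: int, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Contiguous vertical runs (first_row, last_row) of ink pixels in one column."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for r in range(top, bottom + 1):
        if mask[r][col] and start is None:
            start = r
        elif not mask[r][col] and start is not None:
            runs.append((start, r - 1))
            start = None
    if start is not None:
        runs.append((start, bottom))
    return runs


def row_gap(run: Tuple[int, int], row: int) -> int:
    """Vertical distance in pixels from a row to a run (0 if the run contains it)."""
    lo, hi = run
    return 0 if lo <= row <= hi else min(abs(row - lo), abs(row - hi))


def trace_curve(mask: List[List[bool]], meta: AxisMeta, max_jump: int = 3) -> List[Tuple[float, float]]:
    """Follow one curve column by column and convert pixels to (time, survival)."""
    points: List[Tuple[float, float]] = []
    prev: Optional[int] = None
    for col in range(meta.px_left, meta.px_right + 1):
        runs = ink_runs(mask, col, meta.px_top, meta.px_bottom)
        if not runs:
            continue
        if prev is None:
            run = runs[0]  # assume the curve starts at its highest ink run
        else:
            gap, run = min((row_gap(r, prev), r) for r in runs)
            if gap > max_jump:
                continue  # distance threshold: nothing near the trace, treat as noise
        prev = run[1]  # bottom of a vertical run is the survival level after a drop
        t = meta.x_min + (col - meta.px_left) * (meta.x_max - meta.x_min) / (meta.px_right - meta.px_left)
        s = meta.y_max - (prev - meta.px_top) * (meta.y_max - meta.y_min) / (meta.px_bottom - meta.px_top)
        points.append((t, s))
    return points


# Toy 11 x 11 plot: survival 1.0 until t = 5, a drop to 0.5, plus one stray mark.
img = [[255] * 11 for _ in range(11)]
for j in range(5):
    img[0][j] = 0
for i in range(6):
    img[i][5] = 0
for j in range(5, 11):
    img[5][j] = 0
img[9][2] = 0
meta = AxisMeta(0.0, 10.0, 0.0, 1.0, px_left=0, px_right=10, px_top=0, px_bottom=10)
pts = trace_curve(adaptive_mask(img), meta)
print(pts)  # 11 points: survival 1.0 for t = 0..4, then 0.5 for t = 5..10
```

Taking the bottom of each vertical ink run follows the step shape of a KM curve: at a drop, the lower end of the vertical segment is the new survival level. The stray mark at row 9 is ignored because it is far from the traced path. A log-scaled axis, censoring tick marks that overlap the curve, or multiple arms would need extra handling, such as a different coordinate mapping or per-color masks.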
RESULTS: On published KM curves, our pipeline outperformed both comparators, with a median survival deviation of ±0.6 months and 89% CI accuracy (GPT-4o: ±1.9 months, 75%; Claude 3.5: ±1.8 months, 77%). On synthetic curves with censoring, our method achieved a median survival deviation of ±0.7 months, RMSE of 0.014, and MAE of 0.011 (GPT-4o: ±1.64 months, RMSE 0.090, MAE 0.076; Claude 3.5: ±1.68 months, RMSE 0.087, MAE 0.079). Performance was better on uncensored curves, where our method achieved ±0.6 months deviation, RMSE of 0.011, and MAE of 0.008 (GPT-4o: ±1.5 months, RMSE 0.079, MAE 0.064; Claude 3.5: ±1.61 months, RMSE 0.078, MAE 0.065). Bland-Altman analysis of our pipeline on the synthetic KM curves showed minimal bias (mean difference <0.02) in extracted survival probabilities across survival distributions and censoring patterns.
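The abstract does not say how extracted and reference curves were aligned before the agreement metrics were computed. The Python sketch below assumes that both curves are evaluated as right-continuous step functions on a shared time grid. `step_value`, `median_survival`, `agreement`, and the toy curves are hypothetical illustrations, not the study's code or data. The sketch computes four quantities:

- the median survival time, taken as the smallest time with S(t) ≤ 0.5
- the RMSE of the survival-probability differences
- the MAE of the survival-probability differences
- the Bland-Altman bias (mean difference) with 95% limits of agreement (bias ± 1.96 SD)

```python
import bisect
import math
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (time, survival probability), sorted by time


def step_value(curve: Curve, t: float) -> float:
    """Value of a right-continuous KM step function at time t (1.0 before the first point)."""
    times = [p[0] for p in curve]
    k = bisect.bisect_right(times, t)
    return curve[k - 1][1] if k > 0 else 1.0


def median_survival(curve: Curve) -> Optional[float]:
    """Smallest listed time at which survival is at or below 0.5; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def agreement(reference: Curve, extracted: Curve, grid: Sequence[float]) -> Dict[str, object]:
    """RMSE, MAE, and Bland-Altman bias with 95% limits of agreement on a shared grid."""
    diffs = [step_value(extracted, t) - step_value(reference, t) for t in grid]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return {
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "mae": sum(abs(d) for d in diffs) / len(diffs),
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical reference and extracted curves (not data from the study).
reference = [(0.0, 1.0), (2.0, 0.8), (4.0, 0.5), (6.0, 0.3)]
extracted = [(0.0, 1.0), (2.2, 0.79), (4.1, 0.5), (6.0, 0.31)]
result = agreement(reference, extracted, grid=[float(t) for t in range(8)])
print(result)
print(median_survival(reference), median_survival(extracted))  # 4.0 4.1
```

With these toy curves, the extracted median differs from the reference median by 0.1 time units. Evaluating both curves on a common grid avoids having to match individual extracted points to individual event times. The grid spacing is an analyst's choice and will affect RMSE and MAE.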
CONCLUSIONS: Our pipeline extracts data from single-arm KM curves with high accuracy. Future work will extend it to multi-arm and subgroup survival curves, with the aim of enabling patient-level data reconstruction for meta-analyses and cost-effectiveness studies.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR33
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas