A Large Language Model Approach for Evaluating Real-World Evidence in FDA Approvals
Author(s)
Anzhuo Xie, MS1, Xue Wen, PhD2, Jinge Wang, PhD3, Lixia Yao, PhD3
1Columbia University, New York, NY, USA, 2Louisiana State University, Baton Rouge, LA, USA, 3Polygon Health Analytics LLC, Philadelphia, PA, USA
OBJECTIVES: Real-world evidence (RWE) plays an increasingly critical role in supporting U.S. Food and Drug Administration (FDA) approvals for drugs and biologics. This study leverages Large Language Models (LLMs) to systematically evaluate the trends, purposes, and acceptability of RWE in FDA approvals from 2019 to 2023.
METHODS: Using ChatGPT-4o, publicly available FDA review documents for New Drug Applications (NDAs) and Biologics License Applications (BLAs) were analyzed. The analysis identified whether RWE was used in the application, its purpose (e.g., safety, effectiveness, or therapeutic context), and its contribution to regulatory decisions (e.g., substantial, supportive, or inadequate). Study designs, such as prospective or retrospective cohort studies, were also examined to assess methodological quality and relevance.
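The labels named in the methods can be written down as a small record type. This is a minimal sketch, not the authors' pipeline: the JSON reply format, the field names, and the `parse_assessment` helper are all assumptions made for illustration, and only the label values (safety, effectiveness, therapeutic context; substantial, supportive, inadequate) come from the abstract.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Label sets taken from the abstract; everything else below is hypothetical.
PURPOSES = {"safety", "effectiveness", "therapeutic context"}
CONTRIBUTIONS = {"substantial", "supportive", "inadequate"}


@dataclass
class RweAssessment:
    application_id: str
    uses_rwe: bool
    purposes: Tuple[str, ...] = ()
    contribution: Optional[str] = None
    study_design: Optional[str] = None  # e.g. "retrospective cohort"


def parse_assessment(application_id: str, raw_reply: str) -> RweAssessment:
    """Validate one model reply, assumed to be a JSON object (hypothetical format)."""
    data = json.loads(raw_reply)
    if data.get("uses_rwe") is not True:
        return RweAssessment(application_id, False)

    purposes = tuple(p.lower() for p in data.get("purposes", []))
    unknown = set(purposes) - PURPOSES
    if unknown:
        raise ValueError(f"unexpected purpose labels: {sorted(unknown)}")

    contribution = data.get("contribution")
    if contribution is not None:
        contribution = contribution.lower()
        if contribution not in CONTRIBUTIONS:
            raise ValueError(f"unexpected contribution label: {contribution}")

    return RweAssessment(
        application_id, True, purposes, contribution, data.get("study_design")
    )
```

Rejecting labels outside the fixed sets is one way to keep free-text model output comparable across several hundred review documents.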
RESULTS: Among 538 drugs and 47 BLAs reviewed, 183 drugs and 32 BLAs included RWE in their submissions. The percentage of applications incorporating RWE increased from 29.8% in 2019 to 52.7% in 2023. RWE was most frequently used in therapeutic areas such as reproductive health, allergy, and neurology. In 94.8% of RWE-supported applications, RWE positively influenced regulatory decision-making, demonstrating its utility. Primary applications leveraged RWE to demonstrate effectiveness (29.8%), safety (20.9%), or both (48.4%). Over half of these cases were classified as providing substantial or supportive evidence, underscoring the value of RWE in regulatory assessments. However, 28.8% of applications faced challenges due to experimental constraints or study design limitations, highlighting areas for improvement in integrating RWE.
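The counts reported above imply overall and per-pathway shares that the abstract does not state directly. The short calculation below derives them; it assumes that "538 drugs" refers to the drug (NDA) applications reviewed, and the variable names are illustrative.

```python
# Counts as reported in the results (assumption: "drugs" = NDA applications).
drugs_reviewed, blas_reviewed = 538, 47
drugs_with_rwe, blas_with_rwe = 183, 32


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


overall_share = share(drugs_with_rwe + blas_with_rwe, drugs_reviewed + blas_reviewed)
drug_share = share(drugs_with_rwe, drugs_reviewed)
bla_share = share(blas_with_rwe, blas_reviewed)

# Reported purpose breakdown: effectiveness, safety, both.
purpose_total = round(29.8 + 20.9 + 48.4, 1)

print(overall_share, drug_share, bla_share, purpose_total)
# 36.8 34.0 68.1 99.1
```

On these figures, 215 of 585 reviewed applications (36.8%) included RWE across the five-year window, and BLAs included it at roughly twice the rate of drug applications. The three purpose categories sum to 99.1%; the abstract does not describe the remainder.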
CONCLUSIONS: This study underscores the growing importance of RWE in FDA regulatory decisions while identifying persistent methodological challenges limiting its broader adoption. By utilizing LLMs, manual review efforts were reduced by 95%, enabling efficient and scalable analysis of FDA review documents. Ground-truth assessments validated the effectiveness of well-tuned LLM prompts, showcasing their potential to enhance the accuracy and scalability of RWE evaluations.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
RWD55
Topic
Real World Data & Information Systems
Disease
No Additional Disease & Conditions/Specialized Treatment Areas