Enhancing Causal Discovery in Chronic Diseases: The MAGIC Framework Using Multiple LLMs

Author(s)

Jihee Kim, BA¹, Minseol Jang, PharmD², Miryoung Kim, RPh, MCP, PhD³, Hyun Jin Han, MBA, MPH, PhD², Kangjun Noh, BS¹, Sumin Park, BA¹, Kyungwoo Song, PhD¹, Hae Sun Suh, MA, MS, PhD⁴.
¹Department of Statistics and Data Science, Yonsei University, Seoul, Republic of Korea; ²Department of Regulatory Science, Graduate School, Kyung Hee University, Seoul, Republic of Korea; ³Sunchon National University, Suncheon, Republic of Korea; ⁴College of Pharmacy, Kyung Hee University, Seoul, Republic of Korea.
OBJECTIVES: Understanding causal relationships among chronic diseases is essential for identifying associations and minimizing bias. Traditionally, the construction of directed acyclic graphs (DAGs) has relied on expert knowledge and literature review, which limits scalability and introduces potential bias. With recent advances in large language models (LLMs), it is now possible to explore knowledge-informed DAG construction. This study aimed to evaluate the feasibility of LLM-based approaches and to introduce MAGIC (Multi-LLM Assisted Graph Inference and Correction), a novel framework that integrates statistical, clinical, and language-based feedback to improve DAG generation.
METHODS: The study consisted of two parts: development of a reference DAG and comparative performance evaluation of causal discovery methods. The reference DAG was constructed through literature review and expert consensus. MAGIC combines (1) statistical metrics (phi coefficients, BDeu scores, disease duration) computed from individual-level data from the Korea National Health and Nutrition Examination Survey; (2) external clinical knowledge from publicly available sources to enrich disease-specific context; and (3) a consensus-based voting mechanism across multiple LLMs to reduce model-specific bias. The clinical plausibility and methodological validity of each method were reviewed by a panel of three clinical experts and three statisticians. Performance was assessed against the reference DAG using standard metrics: skeleton and orientation precision, recall, and F1-score, and Structural Hamming Distance (SHD).
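Two of the ingredients named above, the phi coefficient between two binary disease indicators and a vote across several LLMs on the direction of one candidate edge, can be sketched as follows. This is a minimal illustration, not the MAGIC implementation: the abstract does not state the voting rule, so the strict-majority rule, the edge labels ("->", "<-", "none"), and the function names are all assumptions.

```python
import math
from collections import Counter


def phi_coefficient(x, y):
    """Phi coefficient for two equal-length binary (0/1) sequences."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One variable is constant, so the coefficient is undefined;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def majority_vote(votes):
    """Return the label backed by a strict majority of LLM votes.

    `votes` holds one hypothetical label per model: "->", "<-", or "none".
    Without a strict majority the edge is left out ("none"). This rule is
    an assumption; the abstract only says the voting is consensus-based.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "none"


# Toy data only, not taken from the survey used in the study.
diabetes = [1, 1, 0, 0, 1, 0]
hypertension = [1, 1, 0, 0, 0, 0]
print(round(phi_coefficient(diabetes, hypertension), 3))
print(majority_vote(["->", "->", "<-"]))
```

Falling back to "none" on a split vote is a conservative choice: an edge enters the graph only when most models agree on its direction.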
RESULTS: After five rounds of expert review and discussion, MAGIC was deemed clinically plausible and methodologically robust. Quantitatively, MAGIC achieved the best overall performance with skeleton precision 0.941, recall 0.640, F1-score 0.762; orientation precision 0.735, recall 0.500, F1-score 0.595; SHD 27 after the third iteration.
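The evaluation metrics can be illustrated with a short sketch that compares a predicted edge set with a reference edge set. The definitions used here are common conventions and are assumptions about the study's exact scoring: skeleton metrics ignore edge direction, orientation metrics require the direction to match, and SHD counts each missing, extra, or reversed edge once. The toy graphs are invented for illustration. The helper `f1` also confirms that the reported F1-scores follow from the reported precision and recall.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf(pred, true):
    """Precision, recall, and F1 for two sets of items."""
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall, f1(precision, recall)


def skeleton(edges):
    """Drop direction: each (parent, child) edge becomes an unordered pair."""
    return {frozenset(edge) for edge in edges}


def evaluate(pred_edges, true_edges):
    """Score a predicted DAG against a reference DAG.

    Edges are (parent, child) tuples. A reversed edge counts once toward
    SHD, which is an assumption; some definitions count it twice.
    """
    pred, true = set(pred_edges), set(true_edges)
    skel_pred, skel_true = skeleton(pred), skeleton(true)
    missing_or_extra = len(skel_pred ^ skel_true)
    reversed_edges = sum(1 for (a, b) in pred if (b, a) in true)
    return {
        "skeleton": prf(skel_pred, skel_true),
        "orientation": prf(pred, true),
        "shd": missing_or_extra + reversed_edges,
    }


# Toy graphs: one edge is correct, one is reversed, one is missing.
reference = {("A", "B"), ("B", "C"), ("C", "D")}
predicted = {("A", "B"), ("C", "B")}
print(evaluate(predicted, reference))

# Consistency check on the values reported in the abstract.
print(round(f1(0.941, 0.640), 3))  # 0.762 (skeleton)
print(round(f1(0.735, 0.500), 3))  # 0.595 (orientation)
```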
CONCLUSIONS: MAGIC demonstrates the potential of LLM-guided, feedback-enhanced causal discovery for scalable and reliable causal graph construction. By integrating real-world data, clinical context, and multi-model consensus, this approach offers a reproducible and interpretable framework for complex chronic disease research and supports broader applications in healthcare and epidemiology.

Conference/Value in Health Info

2025-09, ISPOR Real-World Evidence Summit 2025, Tokyo, Japan

Value in Health Regional, Volume 49S (September 2025)

Code

RWD270

Topic Subcategory

Health & Insurance Records Systems

Disease

SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity)
