Enhancing Causal Discovery in Chronic Diseases: The MAGIC Framework Using Multiple LLMs

Author(s)

Jihee Kim, B.A.¹, Minseol Jang, PharmD², Miryoung Kim, RPh, MCP, PhD³, Hyun Jin Han, MBA, MPH, PhD², Kangjun Noh, B.S.¹, Sumin Park, B.A.¹, Kyungwoo Song, PhD¹, Hae Sun Suh, MA, MS, PhD⁴.
¹Department of Statistics and Data Science, Yonsei University, Seoul, Korea, Republic of, ²Department of Regulatory Science, Graduate School, Kyung Hee University, Seoul, Korea, Republic of, ³Sunchon National University, Suncheon, Korea, Republic of, ⁴College of Pharmacy, Kyung Hee University, Seoul, Korea, Republic of.

OBJECTIVES: Understanding causal relationships among chronic diseases is essential for identifying associations and minimizing bias. Construction of directed acyclic graphs (DAGs) has traditionally relied on expert knowledge and literature review, which limits scalability and can introduce bias. Recent advances in large language models (LLMs) make it possible to explore knowledge-informed DAG construction. This study aimed to evaluate the feasibility of LLM-based approaches and to introduce MAGIC (Multi-LLM Assisted Graph Inference and Correction), a novel framework that integrates statistical, clinical, and language-based feedback to improve DAG generation.
METHODS: The study consisted of two parts: development of a reference DAG and a comparative performance evaluation of causal discovery methods. The reference DAG was constructed through literature review and expert consensus. MAGIC combines (1) statistical metrics (phi coefficients, BDeu scores, disease duration) computed from individual-level data from the Korea National Health and Nutrition Examination Survey; (2) external clinical knowledge from publicly available sources to enrich disease-specific context; and (3) a consensus-based voting mechanism across multiple LLMs to reduce model-specific bias. The clinical plausibility and methodological validity of each method were reviewed by a panel of three clinical experts and three statisticians. Performance was assessed against the reference DAG using standard metrics: skeleton and orientation precision, recall, and F1-score, and Structural Hamming Distance (SHD).
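The evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes skeleton metrics compare undirected edge sets, orientation metrics compare directed edge sets, and SHD counts each missing, extra, or reversed edge once (definitions vary between papers). The function names and the toy disease graph are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def skeleton(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Drop edge direction: each edge becomes an unordered pair of nodes."""
    return {frozenset(e) for e in edges}


def precision_recall_f1(predicted: set, reference: set) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a predicted edge set against a reference set."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def shd(predicted: Iterable[Edge], reference: Iterable[Edge]) -> int:
    """Structural Hamming Distance (assumed convention: a reversal counts once)."""
    pred, ref = set(predicted), set(reference)
    pred_skel, ref_skel = skeleton(pred), skeleton(ref)
    # Edges present in only one of the two skeletons (extra or missing).
    extra_or_missing = len(pred_skel ^ ref_skel)
    # Edges present in both skeletons but pointing the wrong way.
    shared = pred_skel & ref_skel
    reversed_edges = sum(
        1 for e in pred if frozenset(e) in shared and e not in ref
    )
    return extra_or_missing + reversed_edges


# Hypothetical toy graphs, for illustration only.
reference_dag = {("obesity", "diabetes"), ("diabetes", "ckd"), ("hypertension", "ckd")}
predicted_dag = {("obesity", "diabetes"), ("ckd", "diabetes"), ("obesity", "hypertension")}

skel_scores = precision_recall_f1(skeleton(predicted_dag), skeleton(reference_dag))
orient_scores = precision_recall_f1(predicted_dag, reference_dag)
distance = shd(predicted_dag, reference_dag)

print("skeleton P/R/F1:", skel_scores)
print("orientation P/R/F1:", orient_scores)
print("SHD:", distance)
```

In this toy case the predicted graph recovers two of three reference adjacencies but orients only one of them correctly, so the orientation scores fall below the skeleton scores, which is the same pattern the two metric families are designed to separate.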
RESULTS: After five rounds of expert review and discussion, MAGIC was deemed clinically plausible and methodologically robust. Quantitatively, MAGIC achieved the best overall performance with skeleton precision 0.941, recall 0.640, F1-score 0.762; orientation precision 0.735, recall 0.500, F1-score 0.595; SHD 27 after the third iteration.
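As a consistency check, and assuming the standard definition of the F1-score as the harmonic mean of precision $P$ and recall $R$, the reported F1 values follow from the reported precision and recall:

```latex
\[
F_1 = \frac{2PR}{P + R}
\]
\[
F_1^{\text{skeleton}} = \frac{2 \times 0.941 \times 0.640}{0.941 + 0.640}
  = \frac{1.20448}{1.581} \approx 0.762
\]
\[
F_1^{\text{orientation}} = \frac{2 \times 0.735 \times 0.500}{0.735 + 0.500}
  = \frac{0.735}{1.235} \approx 0.595
\]
```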
CONCLUSIONS: MAGIC demonstrates the potential of LLM-guided, feedback-enhanced causal discovery for scalable and reliable causal graph construction. By integrating real-world data, clinical context, and multi-model consensus, this approach offers a reproducible and interpretable framework for complex chronic disease research and supports broader applications in healthcare and epidemiology.

Conference/Value in Health Info

2025-09, ISPOR Real-World Evidence Summit 2025, Tokyo, Japan

Value in Health Regional, Volume 49S (September 2025)

Code

RWD270

Topic Subcategory

Health & Insurance Records Systems

Disease

SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity)
