Leveraging Real-World Data and NLP to Identify At Risk Metabolic Dysfunction Associated Steatohepatitis in the General Population
Author(s)
Or Shaked, MD MPH1, David Gruzman, MSC1, Talia Tron, PhD1, Talia Kustin, PhD1, Ben Giladi, MD1, Siegal Sadetzki, MD MPH1, Gadi Lalazar, MD2.
1Briya, Tel Aviv, Israel, 2Shaare Zedek Medical Center, Jerusalem, Israel.
OBJECTIVES: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects approximately 25% of the global population, with the highest prevalence reported in the Middle East (32%). A subset of these patients develops Metabolic Dysfunction-Associated Steatohepatitis (MASH), and those with a METAVIR fibrosis score ≥ F2 are at increased risk of adverse liver-related outcomes, including cirrhosis and hepatocellular carcinoma (HCC). Despite its clinical and public health significance, MASLD screening remains inconsistent, and diagnosed cases are frequently under-documented in electronic health records (EHRs). With therapeutic options emerging, identifying undiagnosed or undocumented MASH cases has become a critical priority in real-world clinical settings. This study aimed to leverage real-world data (RWD) and develop an advanced natural language processing (NLP) model to identify patients with at-risk MASH from radiology reports and laboratory test results.
METHODS: Abdominal radiology reports and Fibrosis-4 (FIB-4) index scores (calculated from age, ALT, AST, and platelet count) from 2020-2024 were analyzed. Data were sourced from a leading state-mandated health provider in Israel. Reports were labeled using multiple validation sources, including expert radiologist reviews, coded diagnoses, age, and laboratory values. A machine learning-based NLP classification model was developed using the Briya© computational platform (Briya NLP). Model performance was evaluated using area under the curve (AUC), sensitivity, specificity, and accuracy.
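For readers unfamiliar with the quantities named in the methods, the following is a minimal illustrative sketch, not the Briya NLP implementation. It computes the FIB-4 index with the standard published formula, (age × AST) / (platelet count × √ALT), and derives sensitivity, specificity, and accuracy from binary labels. The function names `fib4` and `binary_metrics` and the unit conventions (AST and ALT in U/L, platelets in 10^9/L) are assumptions made for illustration. The abstract does not state which FIB-4 thresholds were used, so none are applied here.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT)).

    Assumed units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    if alt_u_l <= 0 or platelets_10e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }
```

As a usage example, `fib4(50, 40, 25, 200)` evaluates (50 × 40) / (200 × √25); the input values are made up and are not study data.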
RESULTS: The study analyzed 132,124 radiology reports from 78,741 unique patients. External validation demonstrated high model performance, with sensitivity and specificity exceeding those of traditional diagnostic methods (specific metrics pending). Implementation of the Briya NLP model significantly increased at-risk MASH case identification compared with traditional diagnostic codes alone.
CONCLUSIONS: This RWD-driven, NLP-based approach offers an efficient, scalable solution for identifying under-documented at-risk MASH cases in routine clinical practice. This automated system could enable earlier intervention through lifestyle modifications and targeted therapies, ultimately improving patient outcomes in real-world healthcare settings.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
P13
Topic
Real World Data & Information Systems
Disease
SDC: Gastrointestinal Disorders