A COMPREHENSIVE EVALUATION FRAMEWORK FOR ARTIFICIAL INTELLIGENCE IN CLINICAL DATA EXTRACTION AND NORMALIZATION FOR HEALTH DATA SPACES

Author(s)

Gabriel de Maeztu, MD;
IOMED, Co-founder, Barcelona, Spain
OBJECTIVES: To develop and validate a robust framework for the verification and validation of Artificial Intelligence (AI) outputs, specifically Natural Language Processing (NLP) and Automated Terminology Mapping (ATM), in real-world clinical data integration across European Health Data Spaces.
METHODS: We applied a two-tiered evaluation framework to 10 distinct cohorts across 10 European hospitals, analyzing 550 variables. The methodology comprised:
1) a physician-led verification process using Remote Source Data Verification (rSDV) to assess inference accuracy (precision, recall, F1-score) on samples determined using Bayesian statistics; and
2) a clinical data scientist-led validation phase evaluating six data quality dimensions based on the Kahn Framework.
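The tier-1 accuracy metrics follow the standard definitions: precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 is their harmonic mean. Below is a minimal Python sketch. It assumes each sampled AI output is classified by the reviewing physician as a true positive, false positive, or false negative. `VerificationCounts` and `precision_recall_f1` are hypothetical names, and the counts in the usage line are illustrative, not study data.

```python
from dataclasses import dataclass


@dataclass
class VerificationCounts:
    """Hypothetical tallies from a physician review of sampled AI outputs."""
    true_positives: int   # AI extraction confirmed against the source record
    false_positives: int  # AI extraction not supported by the source record
    false_negatives: int  # item present in the source record but missed by the AI


def precision_recall_f1(c: VerificationCounts) -> tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 when its denominator is 0."""
    tp, fp, fn = c.true_positives, c.false_positives, c.false_negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall, equivalent to 2TP / (2TP + FP + FN).
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only; not data from the study.
    p, r, f = precision_recall_f1(VerificationCounts(90, 10, 15))
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With these toy counts, F1 = 180/205, or about 0.878. The abstract does not state how per-center scores were pooled, so pooling is not modeled here.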
We assessed consistency across centers using Intraclass Correlation Coefficients (ICC) and mixed-effects models, and inter-rater reliability using Fleiss' Kappa. NLP was applied to unstructured notes and ATM to structured data, mapping local identifiers and ICD-9, ICD-10, and SNOMED codes to the OMOP Common Data Model (CDM).
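Fleiss' kappa compares the observed agreement among a fixed number of raters per item with the agreement expected by chance from the overall category proportions. Below is a minimal sketch. It assumes each verified item was rated by the same number of physicians into a fixed set of categories, such as correct vs. incorrect. `fleiss_kappa` is a hypothetical helper, and the toy matrix is invented for illustration.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for an items x categories matrix of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j;
    every row must sum to the same number of raters n >= 2.
    """
    if not counts:
        raise ValueError("at least one rated item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the marginal proportion of each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy data (not from the study): 5 items, 3 raters each,
    # columns = [correct, incorrect].
    ratings = [[3, 0], [3, 0], [2, 1], [0, 3], [3, 0]]
    print(f"Fleiss' kappa = {fleiss_kappa(ratings):.3f}")
```

For the toy matrix, observed agreement is 13/15 and chance agreement is 137/225, which gives kappa = 29/44, or about 0.66.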
RESULTS: The AI capabilities achieved a pooled F1-score of 0.88 (95% CI: 0.84-0.92) across centers. Inter-rater reliability analysis yielded a Fleiss' Kappa of 0.81, indicating substantial agreement among physician annotators, with 91% agreement with gold-standard annotators. The ICC for F1-scores was 0.0014, indicating minimal between-center variability and high generalizability. Integration of AI-generated data increased unique data points by 40% (from 1.2M to 1.68M) and significantly improved data diversity (Margalef's Index increased from 3.5 to 5.8). Clinical outcome analysis showed moderate-to-large effect sizes (Cohen's d: 0.75-0.95) across cohorts.
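The data-gain, diversity, and effect-size figures rest on simple formulas: relative increase = (after - before)/before, Margalef's richness index D = (S - 1)/ln N, and Cohen's d as a difference in means divided by the pooled standard deviation. Below is a minimal sketch with hypothetical helper names. It assumes S is the number of distinct concepts and N the total number of data points, because the abstract does not say how these quantities were defined. Without the underlying counts, it cannot reproduce the reported indices of 3.5 and 5.8.

```python
import math
import statistics


def margalef_index(n_distinct: int, n_total: int) -> float:
    """Margalef richness D = (S - 1) / ln(N).

    Assumption: S is the number of distinct concepts (e.g. OMOP concept IDs)
    and N the total number of data points; N must exceed 1.
    """
    if n_total <= 1:
        raise ValueError("n_total must be greater than 1")
    return (n_distinct - 1) / math.log(n_total)


def relative_increase(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.4 means +40%)."""
    return (after - before) / before


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation.

    Each group needs at least two observations for the sample variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Unique data points reported in the abstract: 1.2M before, 1.68M after.
    print(f"increase: {relative_increase(1_200_000, 1_680_000):.0%}")
```

Only the 1.2M and 1.68M figures come from the abstract. Applied to them, `relative_increase` returns 0.4, which matches the reported 40%.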
CONCLUSIONS: The implementation of a rigorous verification and validation framework confirms the reliability of AI-generated clinical data. While NLP and ATM significantly enhance data volume and diversity, a systematic physician-led verification process ensures high-fidelity representation in the OMOP CDM, enabling robust secondary use of real-world data.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

P19

Topic

Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory), SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Oncology, SDC: Rare & Orphan Diseases