FLAME: An Open-Source Federated Platform for Deeply Phenotyped and Multimodal Studies at Population Scale
Author(s)
Rafael Garcia-Dias, PhD1, Alexandre Triay Bagur, PhD1, Virginia Fernandez, PhD1, Paul Wright, PhD1, Lawrence Adams, PhD2, Mcanroe Fernandes, Bachelor of Engineering3, Stephanie Jones, Master2, M Jorge Cardoso, Reader1, Joe Zhang, MD2, Sebastien Ourselin, Professor1, James Teo, Professor1.
1King's College London, London, United Kingdom, 2Artificial Intelligence Centre for Value Based Healthcare, London, United Kingdom, 3Guy's and St. Thomas' Hospitals NHS Foundation Trust, London, United Kingdom.
OBJECTIVES: Despite advances in interoperability, most NHS hospital data remains siloed. Privacy and policy barriers to physical data transfer limit analysis on bulk datasets. Federation is key to NHS strategy, but most platforms are commercial and rely on highly curated datasets. Uncurated data 'locked' in NHS systems better represents diverse patients with complex multi-morbidity. This multi-modal data (text and imaging) enables deep phenotyping and supports AI biomarker development. We aimed to develop an open-source codebase for multi-modal data engineering and FLAME (Federated Learning and Analytics across Multimodal Environments), a platform for cohort discovery, federated analysis, and AI model training as a Federated NHS Secure Data Environment.
METHODS: We built three types of orchestrated pipelines (SQL/dbt for structured data, NLP on NVIDIA hardware for text, and radiology metadata extraction) that connect to over twenty NHS source systems and curate data into the OMOP Common Data Model. FLAME uses FastAPI to integrate an OMOP database replica, XNAT, AI-assisted annotation tools (MONAI Label), and a semi-standardised Federated Learning framework built on NVFLARE that can be adapted to various applications. FLAME was validated through federated querying, analytics, and model training across two datasets.
RESULTS: FLAME produced validated results in federated querying, descriptive/aggregate/regression statistics, and classification tasks. In real-world deployments, multi-modal pipelines and FLAME provide access to full pathway data across 10 million patients in two UK quaternary hospital networks, including deep phenotypes and endpoints derived from NLP.
CONCLUSIONS: Our open-source components support the complete workflow from multi-modal data curation through OMOP to research and AI applications, enabling population-scale analyses with previously inaccessible methods and data. Current infrastructure and timelines support local and cloud deployment across hospital networks covering the London population. Cohort discovery, observational and model-driven analyses, and biomarker development are supported on an NHS platform designed with full systems integration.
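ILLUSTRATIVE SKETCH: The abstract does not describe how FLAME computes federated descriptive statistics, so the following is only a minimal sketch of the general idea, under stated assumptions: each site shares aggregate summaries (count, sum, sum of squares) rather than patient-level rows, and a coordinator pools them. The names `SiteSummary`, `summarise_site`, and `pooled_mean_sd` are hypothetical and are not part of FLAME or NVFLARE; the example uses only the Python standard library.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a site shares; no patient-level values leave the site."""
    n: int            # number of observations at the site
    total: float      # sum of the values
    total_sq: float   # sum of the squared values


def summarise_site(values: Sequence[float]) -> SiteSummary:
    """Runs locally at each site (hypothetical helper)."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_sd(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Runs at the coordinator: pooled mean and sample standard deviation."""
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations across all sites")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sufficient statistics; clamp tiny negative
    # values that can arise from floating-point rounding.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, sqrt(variance)


# Two hypothetical sites holding the same measurement for different patients.
site_a = summarise_site([1.0, 2.0, 3.0])
site_b = summarise_site([4.0, 5.0])
mean, sd = pooled_mean_sd([site_a, site_b])
```

The pooled result matches what would be obtained by computing the mean and sample standard deviation on the combined data, although only three numbers per site are exchanged. A production system would likely add safeguards such as small-count suppression, which this sketch omits.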
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
RWD85
Topic
Medical Technologies, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Distributed Data & Research Networks
Disease
No Additional Disease & Conditions/Specialized Treatment Areas