Epidemiology Information Synthesis for Focal Segmental Glomerular Sclerosis (FSGS): An Innovative Approach Using Human-in-the-Loop AI
Author(s)
Chhaya V1, Bajwa S2, Khambholja K3, Ambika A4
1Catalyst Clinical Research, Thiruvananthapuram, Kerala, India, 2Catalyst Clinical Research, Panchkula, HR, India, 3Catalyst Clinical Research, Vadodara, India, 4Genpro Research, Thiruvananthapuram, Kerala, India
OBJECTIVES: Collecting epidemiological information (EI) on rare diseases (RD) is challenging because the conditions are complex and regional or national surveys offer limited coverage. Epidemiologists therefore rely on case reports, studies, real-world data registries, and expert opinion. However, the diversity of sources, the scarcity of data, the need to train AI models, and non-standardised reporting all complicate EI synthesis for RD. This paper discusses an innovative approach to EI synthesis for Focal Segmental Glomerular Sclerosis (FSGS).
METHODS: A semi-automated, AI-powered EI synthesis workflow for FSGS was designed with a multidisciplinary taskforce of AI/ML experts, UI/UX designers, software engineers, senior HEOR analysts, and consultant epidemiologists. The proposed roadmap includes six key steps: (1) a design-thinking workshop; (2) input data retrieval; (3) development of an FSGS-specific Named Entity Recognition (NER) corpus; (4) annotation of the training dataset using NLP techniques (entity linking and relationship extraction); (5) integration with a knowledge graph; and (6) product testing by the technical and HEOR teams against a predefined user acceptability testing (UAT) plan.
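Steps 3 to 5 of this roadmap can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract does not describe in detail: it swaps the trained NER model for a simple lexicon-and-regex tagger, and the lexicons, labels, `HAS_VALUE` relation name, and example sentence are all assumptions made for illustration. The number in the sentence is a placeholder, not a reported estimate.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicons. A real FSGS corpus would be far larger and
# curated by epidemiologists; these entries are illustrative assumptions.
METRIC_TERMS = {"incidence": "INCIDENCE", "prevalence": "PREVALENCE", "mortality": "MORTALITY"}
DISEASE_TERMS = {"fsgs": "FSGS"}

# Matches rate expressions such as "9.9 per 100,000".
VALUE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s+per\s+(?:\d{1,3}(?:,\d{3})+|\d+)", re.IGNORECASE)


@dataclass(frozen=True)
class Entity:
    label: str  # METRIC, DISEASE, or VALUE
    text: str   # normalised lexicon concept, or the matched value string
    start: int
    end: int


def find_entities(sentence):
    """Tag lexicon terms and rate expressions, ordered by position in the sentence."""
    entities = []
    lowered = sentence.lower()
    for label, lexicon in (("METRIC", METRIC_TERMS), ("DISEASE", DISEASE_TERMS)):
        for term, concept in lexicon.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
                entities.append(Entity(label, concept, m.start(), m.end()))
    for m in VALUE_PATTERN.finditer(sentence):
        entities.append(Entity("VALUE", m.group(0), m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def link_metric_to_value(entities):
    """Naive relationship extraction: pair each METRIC with the next VALUE after it."""
    relations = []
    for i, ent in enumerate(entities):
        if ent.label != "METRIC":
            continue
        for later in entities[i + 1:]:
            if later.label == "VALUE":
                relations.append((ent.text, "HAS_VALUE", later.text))
                break
    return relations


# Synthetic sentence with a placeholder number, not taken from any study.
SENTENCE = "In this made-up example, the incidence of FSGS was 9.9 per 100,000 person-years."

if __name__ == "__main__":
    ents = find_entities(SENTENCE)
    for ent in ents:
        print(ent)
    print(link_metric_to_value(ents))
```

The resulting (metric, relation, value) tuples are the kind of structured output that could be loaded into a knowledge graph in step 5. A production system would presumably replace the lexicon matcher with a model trained on the annotated corpus and keep a human reviewer in the loop to confirm or correct each extraction.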
RESULTS: Following the design-thinking workshop, epidemiological study abstracts and registry data from the United States, Canada, Japan, and EU5 countries were retrieved using a generic human-in-the-loop (HITL) AI evidence synthesis tool. This retrieval informed the generation of the FSGS NER corpus. Training datasets annotated exclusively for FSGS improved the identification and classification of FSGS EI metrics. Knowledge graph integration enhanced semantic querying, and literature curation on FSGS EI achieved 95% accuracy during UAT. The workflow delivered significant time savings (50-75%) by providing rapid EI reports on FSGS burden with integrated end-to-end citation management.
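How a knowledge graph can support semantic querying with citation tracking can be sketched as follows. This is a simplified stand-in, assuming a small in-memory triple store rather than whatever graph technology the authors used. The class `TripleStore`, the predicates `metric`, `country`, and `cited_in`, and all identifiers in the demo data are hypothetical placeholders.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def sources_for(store, metric, country):
    """Find the citations behind every estimate of `metric` in `country`."""
    by_metric = {s for s, _, _ in store.match(predicate="metric", obj=metric)}
    by_country = {s for s, _, _ in store.match(predicate="country", obj=country)}
    estimates = by_metric & by_country
    return sorted(
        o for est in estimates
        for _, _, o in store.match(subject=est, predicate="cited_in")
    )


def build_demo_store():
    """Placeholder graph: the estimate and source IDs are invented for illustration."""
    store = TripleStore()
    for est, metric, country, source in [
        ("estimate_1", "incidence", "Japan", "abstract_A"),
        ("estimate_2", "prevalence", "Japan", "registry_B"),
        ("estimate_3", "incidence", "Canada", "abstract_C"),
    ]:
        store.add(est, "metric", metric)
        store.add(est, "country", country)
        store.add(est, "cited_in", source)
    return store


if __name__ == "__main__":
    demo = build_demo_store()
    print(sources_for(demo, "incidence", "Japan"))
```

Because every estimate node carries its own `cited_in` edge, a query by metric and country returns the supporting sources directly, which is one plausible way to keep citations attached to each figure in a generated report.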
CONCLUSIONS: A hybrid workflow combining epidemiologists’ expertise and AI/ML capabilities can facilitate rapid RD burden estimates with high accuracy. For robust product development, extensive training datasets and tailored ontologies supported by live outcomes from RD patient communities are recommended.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR182
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Rare & Orphan Diseases