Artificial Intelligence-Powered Identification, Access, and Utility Mapping of Real-World Data Sources for Lung Cancer in Asia Pacific
Author(s)
Jia Hao Wong, MPH, BSc (Hons) Pharmacy1, Sharanya Jois, MSc Medical Biotechnology and Business Management1, Louise Hogg, MA Healthcare Ethics and Law, BSc (Hons) Biology2, Kezia Tan, BSc, Nutritional Sciences in Public Health1, Kaywei Low, Master of Pharmacy2, Vanessa Escalante, MSc3, Vineet Jain, MSc3.
1Market Access & HEOR, APAC, Ipsos, Singapore, Singapore, 2Ipsos, Singapore, Singapore, 3Data Science and Advanced Analytics, Ipsos, London, United Kingdom.
OBJECTIVES: To identify real-world data (RWD) sources for lung cancer (LC) in Asia Pacific (AP) and to determine their accessibility and utility for evidence generation studies.
METHODS: An AI-powered systematic literature review of academic publications (PubMed, 2014-2024) was conducted to identify AP RWD sources for LC. A large language model (gemini-1.5-pro with an enhanced prompt) applied a semantic search protocol to identify relevant data sources and extract key information, including database type, coverage, demographics, treatments, and clinical, humanistic, and economic data. Results were manually validated by two independent reviewers. Human oversight at all stages ensured the integrity of AI outputs and addressed potential biases. Data sources with the highest number of publications (top 20%) were prioritized for assessment.
RESULTS: 380 citations were retrieved. After manual validation, 136 unique data sources across 5 AP countries were identified; 33% cover Japan, followed by China (24%), South Korea (17%), Australia (15%) and Taiwan (10%). Registries accounted for the largest proportion (40%), followed by hospital/electronic medical records (28%), population surveys (9%) and insurance databases (8%). Data access, assessed by number of publications, was greatest in Taiwan, followed by Japan, South Korea, Australia and China. 26 data sources were prioritized for utility assessment. Data utility, determined by number of variables available, was greatest in Japan, followed by Australia, South Korea, Taiwan and China. The presence of genetic biomarkers was reported in 30% of data sources, with EGFR, ALK, PD-L1, KRAS, ROS-1, HER2 and MET being the most commonly reported.
CONCLUSIONS: Mapping LC RWD is valuable for assessing the feasibility of real-world evidence and health economics and outcomes research, and for informing downstream evidence generation activities. Japan, China, South Korea, Australia and Taiwan are key contributors of RWD, offering diverse datasets for better understanding and management of LC in AP. Data gaps and variable access remain a challenge, highlighting the importance of collaborative efforts in comprehensive data collection.
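The prioritization step described in METHODS (ranking data sources by publication count and keeping the top 20%) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the example source names and counts, the alphabetical tie-breaking, and the rounding-down of the cut-off are all assumptions, since the abstract does not state how ties or rounding were handled.

```python
import math


def prioritize_top_fraction(publication_counts, fraction=0.2):
    """Return the names of the data sources in the top `fraction` by publication count.

    Assumptions (not stated in the abstract): ties are broken alphabetically,
    the cut-off is rounded down, and at least one source is always kept.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not publication_counts:
        return []
    # Sort by descending publication count, then by name for a stable order.
    ranked = sorted(publication_counts.items(), key=lambda item: (-item[1], item[0]))
    keep = max(1, math.floor(len(ranked) * fraction))
    return [name for name, _ in ranked[:keep]]


# Hypothetical counts, for illustration only (not data from the study).
example_counts = {
    "Registry A": 10,
    "Claims database B": 3,
    "Hospital EMR C": 7,
    "Survey D": 1,
    "Registry E": 5,
}
prioritized = prioritize_top_fraction(example_counts)
print(prioritized)  # ['Registry A']
```

With five hypothetical sources, a 20% cut-off keeps one source. A different rounding or tie-handling rule would change how many sources pass the threshold.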
Conference/Value in Health Info
2025-09, ISPOR Real-World Evidence Summit 2025, Tokyo, Japan
Value in Health Regional, Volume 49S (September 2025)
Code
RWD245
Topic Subcategory
Health & Insurance Records Systems
Disease
SDC: Oncology