Millennial Medical Record (MMR) Data Profile: A Japanese Electronic Medical Records (EMRs) Database Utilizing Unstructured Data for Lung Cancer Research
Author(s)
Ayumi Hamaguchi, MSc1, Kenichiro Shiba, MEng2, Mari Kato, MSc3, Daisuke Wakasugi, MEng4, Tadashi Koga, PhD3, Suguru Nozue, MEng4, Takayuki Sawada, RPh, PhD3, Tomoyo Morita, BA4, Amanda Pulfer, BA1, Dimitra Lambrelli, MASc, MSc, PhD1.
1Thermo Fisher Scientific, London, United Kingdom, 2General Incorporated Association Life Data Initiative, Kyoto, Japan, 3Clinical Study Support, Inc., Nagoya, Japan, 4NTT DATA Japan Corporation, Tokyo, Japan.
OBJECTIVES: Health insurance claims data are widely used for real-world research in Japan but often lack detailed clinical information. EMRs provide richer insights from medical examinations; however, EMR databases have not previously been commercially available. MMR manages data from EMRs, claims and Diagnosis Procedure Combination (DPC) under the Next Generation Medical Infrastructure Act in Japan, enabling comprehensive analyses. This study aims to describe the characteristics of MMR data, focusing on their utility for lung cancer research.
METHODS: As of February 2025, MMR covered 1.9 million patients from 24 hospitals, mostly large general hospitals with ≥300 beds in secondary and tertiary settings across Japan. Data collection started in 2015. Hospitals contribute claims, DPC and EMRs in structured and unstructured formats. Clinical data include patient characteristics (e.g., age, sex, comorbidities, complications), diagnostics (e.g., laboratory and imaging tests) and treatments. Data available as of February 2025 were analysed. Patients with lung cancer were identified using the ICD-10 code C34. Among these patients, a keyword search was applied to the unstructured data to quantify how often each term was recorded, regardless of whether the corresponding finding was confirmed as present. All analyses were descriptive.
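The cohort identification and keyword search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Patient` structure, the `KEYWORDS` lists and the plain substring matching are all assumptions made for illustration, since the abstract does not publish the actual search terms or data model.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists (assumption): the abstract does not list
# the actual search terms used against the unstructured EMR text.
KEYWORDS = {
    "adenocarcinoma": ["adenocarcinoma", "腺癌"],
    "EGFR": ["egfr"],
    "CEA": ["cea", "carcinoembryonic antigen"],
}


@dataclass
class Patient:
    """Hypothetical record: structured ICD-10 codes plus free-text notes."""
    patient_id: str
    icd10_codes: list[str]
    notes: list[str] = field(default_factory=list)


def has_lung_cancer(patient: Patient) -> bool:
    # ICD-10 C34 and its subcodes (e.g. C34.1, C349) share the "C34" prefix.
    return any(code.upper().startswith("C34") for code in patient.icd10_codes)


def mention_rates(patients: list[Patient]) -> dict[str, float]:
    """Percentage of lung cancer patients whose notes mention each concept.

    A mention counts regardless of context, so "EGFR negative" still
    counts as an EGFR mention: this measures availability, not presence.
    """
    cohort = [p for p in patients if has_lung_cancer(p)]
    if not cohort:
        return {}
    rates = {}
    for concept, terms in KEYWORDS.items():
        hits = sum(
            1
            for p in cohort
            if any(term in note.lower() for note in p.notes for term in terms)
        )
        rates[concept] = 100.0 * hits / len(cohort)
    return rates
```

Plain substring matching is used here for brevity; short terms such as "cea" can match inside unrelated words, so a real search would likely need word-boundary handling or tokenisation suited to Japanese text.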
RESULTS: A total of 29,545 patients with lung cancer were identified, of whom 18,630 (63.1%) were male. The largest proportion was in the 70-79 age group (44.2%). Keywords indicating adenocarcinoma appeared in 35-40% of cases, small cell carcinoma in 10-15%, and squamous cell carcinoma in 10-15%. Laboratory-related terms included carcinoembryonic antigen (70-75%), squamous cell carcinoma antigen (35-40%) and cytokeratin 19 fragment (15-20%). Genetic test terms included Epidermal Growth Factor Receptor (EGFR, 20-25%), Anaplastic Lymphoma Kinase (ALK, 15-20%) and c-ros Oncogene 1 (ROS1, 5-10%). Treatment response (progressive) was extracted for 10-15%.
CONCLUSIONS: MMR demonstrates the potential of unstructured clinical data to enrich oncology research, as illustrated by the example of lung cancer, where data such as histological type, biomarkers and treatment responses are available from real-world clinical practice.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
RWD125
Topic
Real World Data & Information Systems
Topic Subcategory
Health & Insurance Records Systems
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Oncology