A Proof of Concept Study to Build a Japanese-Based Electronic Medical Record Data Model Suited for Real-World Research
Author(s)
Michelle Saffranella, MPH, BA1, Lacey Wasson, BS2, Emi Fujinuma, MPH, BS1, Brent Arakaki, BS1, Masao Higuchi, LL.B.3, Tomoko Morita, PhD3, Hitoshi Deguchi, B.Ed.4, Emily Rubinstein, MPH, BSCE1.
1Aetion, Inc., New York, NY, USA, 2Aetion, Inc., Boston, MA, USA, 3NEC Corporation, Tokyo, Japan, 4NEC Solution Innovators, Ltd., Tokyo, Japan.
OBJECTIVES: Electronic medical record (EMR) systems are built to document patient information collected during routine care. To use EMR data for real-world evidence (RWE) studies, the data must be harmonized into a model that is appropriate and intelligible for analysis. We conducted a proof of concept (POC) study to determine whether data from a new Japanese-based EMR system (MegaOakHR from NEC) could be made fit for conducting RWE studies.
METHODS: EMR data from one hospital were exported for a sample of patients. Tables were compared with data dictionaries to confirm their contents, exploratory SQL queries were run to understand field distributions, and a data schema was built. Data elements were excluded from the final dataset based on our findings (e.g., fields that were unused) and on knowledge of RWE (e.g., fields that were not relevant for studies). The data were then transformed into a longitudinal structure and connected to the Aetion Evidence Platform for analysis.
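The field-level exploration described in the methods can be illustrated with a minimal sketch. It is not the study's actual code: SQLite stands in for whatever SQL engine was used, and the table name `lab_results`, its columns, and the helper names `field_fill_rates` and `unused_fields` are hypothetical. The sketch computes, for each field, the share of rows with a non-empty value, then flags fields that are never populated as candidates for exclusion.

```python
import sqlite3


def field_fill_rates(conn, table, columns):
    """Return {column: fraction of rows with a non-null, non-blank value}.

    Table and column names are interpolated directly into the SQL, so they
    must come from a trusted source such as the data dictionary.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rates = {}
    for col in columns:
        query = (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {col} IS NOT NULL AND TRIM({col}) <> ''"
        )
        filled = conn.execute(query).fetchone()[0]
        rates[col] = filled / total if total else 0.0
    return rates


def unused_fields(rates, threshold=0.0):
    """List fields whose fill rate is at or below the threshold."""
    return sorted(col for col, rate in rates.items() if rate <= threshold)


def build_demo_db():
    """Create a tiny in-memory table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lab_results "
        "(patient_id TEXT, test_code TEXT, result_value TEXT, legacy_flag TEXT)"
    )
    conn.executemany(
        "INSERT INTO lab_results VALUES (?, ?, ?, ?)",
        [
            ("P1", "HBA1C", "6.8", None),
            ("P2", "HBA1C", "", None),
            ("P3", "LDL", "130", None),
            ("P4", "LDL", None, None),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    columns = ["patient_id", "test_code", "result_value", "legacy_flag"]
    rates = field_fill_rates(conn, "lab_results", columns)
    for col in columns:
        print(f"{col}: {rates[col]:.0%} filled")
    print("Candidates for exclusion:", unused_fields(rates))
```

A fill rate alone identifies only unused fields; deciding whether a populated field is relevant for RWE studies remains a judgment call of the kind the methods describe.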
RESULTS: The sample included approximately 3,400 patients with hospital encounters dating back to 2010. The hospital data model contained 18 data tables; 1,868 fields; 3 master files; and 16 lookup tables. After the data were explored and the analytic data model was built, the NEC data source includes 15 data tables and 339 fields and integrates the 3 master files; lookup tables were used to build a data dictionary. Fields that are consistently used in RWE generation were translated; highly clinical fields were eliminated; tables were restructured to be tall rather than wide; and universal coding systems were surfaced in addition to hospital-specific ones. The data model currently supports analyses involving diagnoses, hospital admissions, labs, prescriptions, and administrations.
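As an illustration of the wide-to-tall restructuring mentioned in the results, the following minimal sketch unpivots a record that stores several values in separate columns into one row per value. The field names (`patient_id`, `visit_date`, `dx_1` to `dx_3`), the codes, and the helper `wide_to_tall` are hypothetical and do not reflect the actual MegaOakHR schema.

```python
def wide_to_tall(rows, id_fields, value_fields):
    """Unpivot wide records into one tall record per populated value field.

    rows         -- list of dicts, one per wide record
    id_fields    -- keys copied onto every tall record (e.g. patient, date)
    value_fields -- keys whose values each become a separate tall record
    """
    tall = []
    for row in rows:
        for field in value_fields:
            value = row.get(field)
            if value is None or value == "":
                continue  # skip empty slots instead of emitting blank rows
            record = {key: row[key] for key in id_fields}
            record["source_field"] = field
            record["value"] = value
            tall.append(record)
    return tall


# Made-up wide records: one row per visit, up to three diagnosis columns.
WIDE_ROWS = [
    {"patient_id": "P1", "visit_date": "2015-04-01",
     "dx_1": "E11", "dx_2": "I10", "dx_3": None},
    {"patient_id": "P2", "visit_date": "2016-09-12",
     "dx_1": "J45", "dx_2": "", "dx_3": None},
]

if __name__ == "__main__":
    tall_rows = wide_to_tall(
        WIDE_ROWS,
        id_fields=["patient_id", "visit_date"],
        value_fields=["dx_1", "dx_2", "dx_3"],
    )
    for record in tall_rows:
        print(record)
```

In the tall layout, each event occupies its own row keyed by patient and date, which generally makes longitudinal queries simpler than scanning a variable number of columns.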
CONCLUSIONS: Patient data from EMRs can be a valuable source of insight for researchers, but only if the data are structured and formatted appropriately. This POC suggests a replicable method for building fit-for-research data sources from EMR systems.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
RWD3
Topic
Organizational Practices, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Health & Insurance Records Systems
Disease
No Additional Disease & Conditions/Specialized Treatment Areas