Implementation of a Real-World Data Quality Framework in a Nationwide Oncology Electronic Health Record-Derived Database


Castellanos E¹, Wittmershaus B², Chandwani S³
¹Flatiron Health Inc., Philadelphia, PA, USA, ²Flatiron Health Inc., New York, NY, USA, ³Flatiron Health Inc., Somerset, NJ, USA

OBJECTIVES: In recent years, multiple real-world data (RWD) quality frameworks have been released that identify key dimensions of quality. Practical considerations in applying these frameworks to scaled datasets have not been well described. We demonstrate the implementation of an RWD quality framework, incorporating core published dimensions of data quality, in a scaled, electronic health record (EHR)-based oncology RWD source.

METHODS: We assessed the nationwide Flatiron Health EHR-derived de-identified database, with data originating from ~280 US academic and community cancer clinics, using structured and unstructured sources as well as external linkages to genomic and claims data. We examined quality assessment approaches for generating oncology RWD and mapped them to quality dimensions across published frameworks.

RESULTS: Our RWD quality framework aligns with published frameworks and includes the following dimensions: relevance (including sufficiency and representativeness) and reliability (including accuracy, completeness, provenance, and timeliness). Dataset size and the breadth and depth of data elements, drawn from structured and unstructured EHR-derived data or linked data sources, are selected to optimize relevance to broad or specific sets of use cases. A range of validation approaches is implemented, including direct comparison to an external or internal reference standard and indirect benchmarking. Verification checks, implemented at the patient and cohort levels throughout the data lifecycle, assess conformance, consistency, and plausibility. Completeness is assessed against clinical expectations for documentation at the source. Provenance is addressed by recording data transformations, documenting data management procedures, and maintaining auditable metadata. Timeliness is addressed by setting refresh frequency to minimize lags in data capture (e.g., 30-day recency).
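
To make the verification, completeness, and timeliness checks described above more concrete, the following is a minimal Python sketch. It is an illustration only and not the authors' implementation: the record fields, the stage value set, the age bounds, and the function names are all hypothetical assumptions. Only the check categories and the 30-day recency example come from the abstract.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional

# Hypothetical value set for a conformance check (assumption, not from the source).
ALLOWED_STAGES = {"I", "II", "III", "IV"}


@dataclass
class PatientRecord:
    """Hypothetical, simplified patient-level record."""
    patient_id: str
    birth_date: Optional[date]
    diagnosis_date: Optional[date]
    death_date: Optional[date]
    stage: Optional[str]


def verify(record: PatientRecord) -> List[str]:
    """Return the names of patient-level verification checks that fail."""
    issues = []
    # Conformance: a recorded value must belong to the expected value set.
    if record.stage is not None and record.stage not in ALLOWED_STAGES:
        issues.append("conformance:stage")
    # Consistency: related dates must agree (death cannot precede diagnosis).
    if (record.diagnosis_date and record.death_date
            and record.death_date < record.diagnosis_date):
        issues.append("consistency:death_before_diagnosis")
    # Plausibility: age at diagnosis must fall in an assumed 0-120 year range.
    if record.birth_date and record.diagnosis_date:
        age = (record.diagnosis_date - record.birth_date).days / 365.25
        if not 0 <= age <= 120:
            issues.append("plausibility:age_at_diagnosis")
    return issues


def completeness(records: List[PatientRecord], field: str,
                 expected: Callable[[PatientRecord], bool] = lambda r: True) -> float:
    """Cohort-level share of records with `field` populated, restricted to
    records where documentation is expected (per the `expected` predicate)."""
    eligible = [r for r in records if expected(r)]
    if not eligible:
        return 0.0
    filled = sum(1 for r in eligible if getattr(r, field) is not None)
    return filled / len(eligible)


def is_timely(last_refresh: date, as_of: date, max_lag_days: int = 30) -> bool:
    """Timeliness: the data lag must not exceed the recency threshold."""
    return (as_of - last_refresh) <= timedelta(days=max_lag_days)


if __name__ == "__main__":
    cohort = [
        PatientRecord("p1", date(1950, 1, 1), date(2020, 6, 1), None, "IV"),
        PatientRecord("p2", date(1960, 1, 1), date(2021, 3, 1), date(2021, 1, 1), "V"),
    ]
    for rec in cohort:
        print(rec.patient_id, verify(rec))
    print("stage completeness:", completeness(cohort, "stage"))
    print("timely:", is_timely(date(2023, 5, 1), date(2023, 5, 20)))
```

The `expected` predicate is one simple way to express "clinical expectations for documentation": completeness is computed only over records where the field would reasonably be documented, rather than over the whole cohort.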

CONCLUSIONS: Our data quality assessments address the common dimensions of reliability and relevance using a range of approaches to balance robustness, scalability, and feasibility. This framework can be flexibly applied across other RWD sources, enables transparency in determining fitness for use, and standardizes language for data quality implementation.

Conference/Value in Health Info

2023-05, ISPOR 2023, Boston, MA, USA

Value in Health, Volume 26, Issue 6, S2 (June 2023)




Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks, Health & Insurance Records Systems, Reproducibility & Replicability


No Additional Disease & Conditions/Specialized Treatment Areas
