Challenges in Estimating Sample Size for Retrospective Real-World Research


Robert N1, Espirito J2, DiLullo S2, Haydon W2, Longenecker A2, Montelongo N2, Sumner E2, Spark S2
1Ontada, Irving, TX, USA, 2Ontada, The Woodlands, TX, USA

OBJECTIVES: In retrospective real-world research (RWR), inclusion and exclusion criteria can be applied to structured electronic health record data to identify patients eligible for a study. However, when chart review is performed, review of unstructured data may show that some charts do not meet the criteria, and those charts are disqualified (DQ) from the study. We aimed to understand RWR DQ rates and the factors affecting them to help inform sample size estimates.

METHODS: A study-specific estimated DQ rate, based on eligibility criteria and structured data completeness, was initially identified for each of 26 oncology chart review studies completed in 2019 and 2020. A total of 9,156 patient charts were reviewed. The difference between estimated and actual DQ rates was analyzed. Four studies were investigated in more detail to identify inclusion and exclusion criteria that relied on unstructured data to confirm eligibility and that affected the DQ rate.

RESULTS: The average initial estimated DQ rate across all studies was 20%. More than half of the studies had higher than anticipated DQ rates, resulting in smaller than expected sample sizes. The average difference between estimated and actual DQ rates was 13%, with a 78% difference in one outlier study. With this outlier removed, the average difference was 11%, suggesting a need to increase estimated DQ rates. In the four case studies, higher than estimated DQ rates closely correlated with eligibility criteria that relied on unstructured data, including multiple disease statuses, verification of treatment type and initiation, and diagnosis of other concomitant primary cancer types.

CONCLUSIONS: For retrospective RWR, understanding the factors that contribute to high DQ rates can improve chart review sample size estimates and study timelines. Increased collaboration among members of the RWR team can lead to upfront identification of which eligibility criteria rely on structured versus unstructured data, which helps predict DQ rates and study sample sizes.
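The abstract does not state how an estimated DQ rate is converted into a chart count. The sketch below shows one common way to do that arithmetic, assuming the number of charts to review is the target number of eligible patients divided by (1 minus the DQ rate), rounded up. The function names, the target of 200 patients, and the 33% "actual" rate are hypothetical and chosen only for illustration; only the 20% planning rate comes from the abstract.

```python
import math
from fractions import Fraction


def _as_fraction(dq_rate: float) -> Fraction:
    """Convert a rate such as 0.20 to an exact fraction (1/5) to avoid float rounding."""
    if not 0 <= dq_rate < 1:
        raise ValueError("dq_rate must be in the range [0, 1)")
    return Fraction(dq_rate).limit_denominator(10_000)


def charts_to_review(target_eligible: int, dq_rate: float) -> int:
    """Charts to pull so that the expected number surviving review meets the target.

    Assumption (not from the abstract): a simple inflation of the target
    by 1 / (1 - dq_rate), rounded up to a whole chart.
    """
    return math.ceil(target_eligible / (1 - _as_fraction(dq_rate)))


def expected_eligible(charts_reviewed: int, dq_rate: float) -> float:
    """Expected number of charts that remain eligible after chart review."""
    return float(charts_reviewed * (1 - _as_fraction(dq_rate)))


if __name__ == "__main__":
    target = 200          # hypothetical target sample size
    planned_rate = 0.20   # average initial estimated DQ rate reported in the abstract
    actual_rate = 0.33    # hypothetical higher-than-planned DQ rate

    planned_charts = charts_to_review(target, planned_rate)
    print("Charts planned at the estimated DQ rate:", planned_charts)
    print("Expected eligible if the actual rate is higher:",
          expected_eligible(planned_charts, actual_rate))
    print("Charts needed at the higher rate:",
          charts_to_review(target, actual_rate))
```

Under these assumptions, a study planned at the estimated DQ rate falls short of its target when the actual rate is higher, which matches the abstract's observation that higher than anticipated DQ rates produced smaller than expected sample sizes.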

Conference/Value in Health Info

2021-05, ISPOR 2021, Montreal, Canada

Value in Health, Volume 24, Issue 5, S1 (May 2021)




Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance


