IMPACT OF METHOD CHOSEN FOR DEFINING OBSERVABLE TIME IN LINKED OPEN DATA SOURCES
Author(s)
Anna Swenson, MPH, Gursimran Basra, MCA, Paul Buzinec, MS, Kathryn Starzyk, BA, MSc
OM1, Boston, MA, USA
OBJECTIVES: Defining observable time in real-world data sources that lack enrollment information is an important study design consideration. We explore how different methods for defining observation periods in linked electronic medical record (EMR) and open claims data affect sample size, comorbidity prevalence, medication usage, and healthcare resource utilization (HCRU) across three conditions.
METHODS: Eligible patients were identified from OM1 RWDC curated clinical datasets with linked EMR and open medical claims in Atopic Dermatitis (AD), Rheumatoid Arthritis (RA), and Major Depressive Disorder (MDD). We compared three observability methods in the linked data. Method 1: Persistence window (gap < 548 days), requiring complete EMR/claims overlap. Method 2: Encounter-based (≥1 EMR and claims encounter within 12 months post-index; any encounter >12 months post-index). Method 3: Modified encounter-based (≥1 EMR and claims encounter within 12 months post-index; ≥1 EMR and claims encounter >12 months post-index). Demographics, comorbidities, medications, and HCRU were compared across methods.
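The three observability rules above can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the function names, the 365-day reading of "12 months", the treatment of the index date as the start of follow-up, and the interpretation of "complete EMR/claims overlap" as the earlier of the two persistence end dates are all assumptions, and the example patient is hypothetical.

```python
from datetime import date, timedelta

MAX_GAP = timedelta(days=548)  # persistence gap threshold from Method 1
YEAR = timedelta(days=365)     # assumption: "12 months" taken as 365 days


def persistent_end(dates, index):
    """Last encounter reachable from the index date through gaps < MAX_GAP."""
    end = None
    prev = index
    for d in sorted(d for d in dates if d >= index):
        if d - prev >= MAX_GAP:
            break  # gap too long: observable time stops at the previous encounter
        end = prev = d
    return end


def has_early(dates, index):
    """At least one encounter within 12 months on or after the index date."""
    return any(index <= d <= index + YEAR for d in dates)


def has_late(dates, index):
    """At least one encounter more than 12 months after the index date."""
    return any(d > index + YEAR for d in dates)


def method1(emr, claims, index):
    """Assumed reading: observable until the earlier of the two persistence ends.

    Returns None when either source has no qualifying encounter.
    """
    emr_end = persistent_end(emr, index)
    claims_end = persistent_end(claims, index)
    if emr_end is None or claims_end is None:
        return None
    return min(emr_end, claims_end)


def method2(emr, claims, index):
    """Both sources seen in year 1; any encounter from either source afterwards."""
    early = has_early(emr, index) and has_early(claims, index)
    return early and (has_late(emr, index) or has_late(claims, index))


def method3(emr, claims, index):
    """Both sources seen in year 1 and both sources seen afterwards."""
    early = has_early(emr, index) and has_early(claims, index)
    return early and has_late(emr, index) and has_late(claims, index)


# Hypothetical patient: EMR activity continues past year 1, claims do not.
index_date = date(2020, 1, 1)
emr_dates = [date(2020, 3, 1), date(2021, 6, 1)]
claims_dates = [date(2020, 5, 1)]

m1_end = method1(emr_dates, claims_dates, index_date)
m2_ok = method2(emr_dates, claims_dates, index_date)
m3_ok = method3(emr_dates, claims_dates, index_date)
```

Under these assumptions, the hypothetical patient qualifies under Method 2 but not Method 3, because no claims encounter occurs after the first year. This is the kind of difference that would make Method 3 the more restrictive of the two encounter-based definitions.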
RESULTS: Initial patient counts were 94,368 (AD), 282,850 (RA), and 1,063,161 (MDD). After applying observability criteria, Method 1 produced the largest sample size reductions (-59.6% [RA] to -74.7% [MDD]), Method 2 the smallest (-36.2% [RA] to -46.8% [MDD]), and Method 3 intermediate reductions (-56.5% [RA] to -66.7% [MDD]). Mean age and the proportion of patients with a Charlson Comorbidity Index ≥2 were highest for Method 1 and lowest for Method 2. While comorbidity and medication prevalence varied only slightly across methods, outpatient visit counts showed larger differences, with Method 1 having the highest counts and Method 2 the lowest. Although results were largely consistent across conditions, the magnitude of the differences varied.
CONCLUSIONS: Specifying observation periods requires trade-offs between data completeness and sample representativeness. Stricter definitions of linked data availability yielded smaller, older, and sicker populations with more complete data. Method choice must align with study goals to ensure fit-for-purpose data, and selection bias should be carefully assessed and mitigated with attention to condition-specific and data source-specific characteristics.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD31
Topic
Real World Data & Information Systems
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Sensory System Disorders (Ear, Eye, Dental, Skin)