Leveraging Data Aggregators to Annotate Deidentified Genomic Data Derived from Commercial Laboratory Specimens to Study COVID-19 Severity
Author(s)
Dandiker S1, Latham A1, Tanpaiboon P2, Ratajski AM2, Bare L3, Chanock S4, Fesko YA2, Joseph V1, Offit K1
1Memorial Sloan Kettering Cancer Center, New York, NY, USA, 2Quest Diagnostics, Secaucus, NJ, USA, 3Quest Diagnostics, Moraga, CA, USA, 4National Cancer Institute, Bethesda, MD, USA
OBJECTIVES: To identify genetic factors associated with severe COVID-19 infections, we merged de-identified data from a commercial diagnostic laboratory with limited clinical annotation from a data aggregator.
METHODS: The study used limited clinical data from patients who tested positive for SARS-CoV-2 by nucleic acid analysis within a year (median 1 month) of remnant whole-blood collection for clinical care. A commercial laboratory (Quest Diagnostics) removed HIPAA identifiers from the remnant specimens and coded them with a study ID (QD-pID). The de-identified samples and a Limited Use Data set (zip code, state, gender, vital status, date of ascertainment, and SARS-CoV-2 results) were sent to an academic center (MSKCC), which further de-identified the specimens by replacing the QD-pID with a study participant ID (MSK-pID) and sent the samples to the National Institutes of Health for germline whole-genome SNP array genotyping. The research team used de-identified individual-level data provided by HealthVerity, a data aggregator, to determine disease severity as represented by patient claims linked to the SARS-CoV-2 test accession ID and QD-pID. The study was reviewed by the Western Institutional Review Board and deemed exempt from the requirement for consent under 45 CFR § 46.104(d)(4).
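The second de-identification step described above (replacing one study ID with another before samples leave the academic center) can be sketched as follows. This is a minimal illustration only, not the procedure the study used: the function names, the field names `qd_pid` and `msk_pid`, the `MSK-` prefix, and the use of random hex tokens are all assumptions made for the example.

```python
import secrets


def recode_ids(qd_ids, prefix="MSK-"):
    """Build a crosswalk from laboratory study IDs to fresh random IDs.

    Hypothetical helper: the abstract does not say how MSK-pIDs were
    generated; random tokens are assumed here.
    """
    crosswalk = {}
    used = set()
    for qd_id in qd_ids:
        if qd_id in crosswalk:
            continue  # the same participant keeps a single new ID
        new_id = prefix + secrets.token_hex(8)
        while new_id in used:  # regenerate on the (unlikely) collision
            new_id = prefix + secrets.token_hex(8)
        used.add(new_id)
        crosswalk[qd_id] = new_id
    return crosswalk


def strip_and_recode(records, crosswalk, id_field="qd_pid", new_field="msk_pid"):
    """Return copies of the records with the old ID removed and the new ID added."""
    recoded = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec[new_field] = crosswalk[rec[id_field]]
        recoded.append(new_rec)
    return recoded


if __name__ == "__main__":
    # Made-up sample manifest, for illustration only.
    manifest = [
        {"qd_pid": "QD-0001", "tube": "A1"},
        {"qd_pid": "QD-0002", "tube": "A2"},
    ]
    crosswalk = recode_ids(rec["qd_pid"] for rec in manifest)
    for row in strip_and_recode(manifest, crosswalk):
        print(row)
```

In a design like this, the crosswalk would stay with the party that performs the re-coding, so the genotyping laboratory sees only the new ID.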
RESULTS: Quest Diagnostics provided identifiers for N=9,241 SARS-CoV-2-positive samples, of which 4,644 samples from COVID-19 patients were sent for genotyping. Of the 4,644 samples genotyped, 1,118 (24%) had comprehensive ICD codes in HealthVerity’s data marketplace for severity stratification and were classified as 914 mild, 10 moderate, and 194 severe COVID-19-positive cases. Correlation with genotype is ongoing.
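The counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the results; the variable and function names are illustrative, and the within-group shares it prints are derived here rather than reported by the authors.

```python
# Counts as stated in the RESULTS paragraph.
identifiers_provided = 9241
genotyped = 4644
severity_counts = {"mild": 914, "moderate": 10, "severe": 194}


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Samples with ICD codes sufficient for severity stratification.
annotated = sum(severity_counts.values())

# Share of each severity class among annotated samples (derived, not reported).
shares = {label: pct(n, annotated) for label, n in severity_counts.items()}

if __name__ == "__main__":
    print(f"annotated samples: {annotated}")
    print(f"annotated / genotyped: {pct(annotated, genotyped):.1f}%")
    for label, share in shares.items():
        print(f"{label}: {share:.1f}% of annotated samples")
```

The three severity classes sum to the 1,118 annotated samples, and 1,118 of 4,644 rounds to the reported 24%.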
CONCLUSIONS: This design offers a model, compliant with human subjects research requirements, for public health studies of the association of inherited genetic variation with COVID-19 severity. It links remnant biospecimens from a commercial laboratory with clinical annotation provided by a data aggregator and de-identified genotyping by a third-party laboratory.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
EPH95
Topic
Epidemiology & Public Health, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Disease Classification & Coding, Health & Insurance Records Systems, Public Health
Disease
Infectious Disease (non-vaccine), No Additional Disease & Conditions/Specialized Treatment Areas