Quantifying the Effectiveness of a RWD Quality Framework: A Case Study Using Paroxysmal Nocturnal Hemoglobinuria (PNH)
Speaker(s)
Snell Taylor S1, Chen J2, Tubinis D1, Ramamurthy P3
1PicnicHealth, San Francisco, CA, USA, 2PicnicHealth, Portland, OR, USA, 3PicnicHealth, New York, NY, USA
OBJECTIVES: Many frameworks exist within real-world data (RWD) research to organize how data quality is considered and measured. Here, we apply a quality framework to a dataset of patients with a rare blood disorder, paroxysmal nocturnal hemoglobinuria (PNH).
METHODS: We build on an established quality framework to assess data accuracy, plausibility, conformance, and consistency. We chose PNH for our case study because of the robust availability and completeness of its records. Patients with PNH consent to have the PicnicHealth Research Platform collect their medical records across U.S. health systems. Data were abstracted from structured and unstructured medical records using human-validated machine learning. We applied programmatic rules designed to flag potential quality issues for researcher review. Descriptive statistics were reported for key data elements.
RESULTS: The PNH cohort includes 90 patients, 70% female and 30% male. Of these patients, 70% are white, 12% are Black, and 12% are Hispanic or Latino. The median number of records per patient was 222, with a median (IQR) of 10 (6, 14) years of available records.
Of the cohort, 89% have a diagnosis date documented in the narrative text of a specialist record, and 11% have a diagnosis confirmed by the first appearance of PNH anywhere in their medical records. Of the diagnosis dates, 36% are reported to day precision, 39% to month, and 26% to year. We deployed 143 disease-agnostic plausibility and conformance quality rules on this cohort, along with 19 rules specific to PNH. After abstracting and processing 3 months of new records, 105 errors were flagged, of which 9 were PNH-specific and 96 were disease-agnostic. Each flag was reviewed, leading either to a data correction or to a justification of why no correction was needed.
CONCLUSIONS: These findings inform abstractor retraining efforts and improve model performance. Data corrections are tracked and traceable to source records, resulting in improved data provenance and better data quality.
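As an illustration of the programmatic rules described in METHODS, the following is a minimal sketch of how plausibility and conformance checks might flag records for researcher review. It is not the authors' implementation. The `Record` fields, the two example rules, and the `run_rules` helper are all hypothetical, chosen only to mirror the data elements mentioned in the abstract (diagnosis date and its day/month/year precision).

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    """Hypothetical abstracted patient record (field names are assumptions)."""
    patient_id: str
    birth_date: date
    diagnosis_date: Optional[date]
    diagnosis_precision: Optional[str]  # expected: "day", "month", or "year"


@dataclass
class Flag:
    """A potential quality issue raised by a rule, pending human review."""
    patient_id: str
    rule: str
    message: str


ALLOWED_PRECISIONS = {"day", "month", "year"}


def diagnosis_after_birth(r: Record) -> Optional[str]:
    # Plausibility rule: a diagnosis should not predate the patient's birth.
    if r.diagnosis_date is not None and r.diagnosis_date < r.birth_date:
        return "diagnosis date precedes birth date"
    return None


def precision_is_valid(r: Record) -> Optional[str]:
    # Conformance rule: precision must come from the allowed vocabulary.
    if r.diagnosis_date is not None and r.diagnosis_precision not in ALLOWED_PRECISIONS:
        return f"unexpected precision value: {r.diagnosis_precision!r}"
    return None


# Hypothetical rule registry; a real deployment would hold many more rules.
RULES: Dict[str, Callable[[Record], Optional[str]]] = {
    "plausibility/diagnosis_after_birth": diagnosis_after_birth,
    "conformance/precision_is_valid": precision_is_valid,
}


def run_rules(records: List[Record]) -> List[Flag]:
    """Apply every rule to every record and collect flags for review."""
    flags: List[Flag] = []
    for r in records:
        for name, rule in RULES.items():
            message = rule(r)
            if message is not None:
                flags.append(Flag(r.patient_id, name, message))
    return flags
```

In this sketch each rule returns a message rather than modifying data, so a reviewer decides whether a flag leads to a correction or to a justification for leaving the value unchanged, in line with the review step described in RESULTS.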
Code
RWD147
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance
Disease
Rare & Orphan Diseases, Systemic Disorders/Conditions (Anesthesia, Auto-Immune Disorders (n.e.c.), Hematological Disorders (non-oncologic), Pain)