UTILIZING TOKENIZATION TO INTEGRATE THREE DATA SOURCES IN RARE DISEASE RESEARCH: MUSCULAR DYSTROPHY-LONGITUDINAL INTEGRATED CLAIMS (MD-LINC)
Author(s)
Bryan D. Innis, MS, Sourav Santra, PhD, Shane Hornibrook, BA, Jacko Logan, MS, Richard Baxter, BSc (Hons), Katherine L. Gooch, PhD
Sarepta Therapeutics, Inc., Cambridge, MA, USA
OBJECTIVES: Muscular dystrophy (MD) is a group of rare genetic diseases that cause progressive weakness and degeneration of skeletal muscles. Identifying patients with MD within administrative databases can be challenging with existing methods. To provide a foundation for health economics and outcomes research (HEOR), the objective was to develop a tokenized dataset, "Muscular Dystrophy-Longitudinal Integrated Claims (MD-LINC)," of patients with MD by linking administrative, mortality, and laboratory data.
METHODS: This novel methodology used a retrospective cohort design. Datavant®'s proprietary tokenization software was used to integrate three distinct data sources: Inovalon Closed Claims, Datavant Mortality, and consolidated genetic laboratory data. A remediation protocol provided guidance for censoring data elements not pertinent to the research, enabling reliable identification and long-term tracking of patients with MD in a real-world context while maintaining privacy and confidentiality. Descriptive statistics were generated for the source and resulting datasets.
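The general idea of token-based linkage can be illustrated with a minimal sketch. This is not Datavant's proprietary software, whose internals are not described here. As a stand-in, it assumes a keyed hash (HMAC-SHA256) over normalized identifiers, and the key, field names, function names, and join rule (keep tokens present in both claims and laboratory data, attach mortality when available) are all hypothetical choices made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would manage keys securely.
SECRET_KEY = b"hypothetical-linkage-key"


def make_token(first_name, last_name, dob, sex):
    """Derive a de-identified token from normalized identifiers.

    Assumption: a keyed SHA-256 hash stands in for the proprietary
    tokenization step; it is not the vendor's actual algorithm.
    """
    normalized = "|".join(
        part.strip().upper() for part in (first_name, last_name, dob, sex)
    )
    return hmac.new(
        SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def link_sources(claims, mortality, labs):
    """Join three token-keyed dictionaries into one linked dataset.

    Assumption: a patient is kept only when the token appears in both
    the claims and the laboratory source; mortality is optional.
    """
    linked = {}
    for token in claims.keys() & labs.keys():
        linked[token] = {
            "claims": claims[token],
            "lab": labs[token],
            "mortality": mortality.get(token),  # None when no death record
        }
    return linked
```

Because only tokens are compared, each source can be tokenized separately and joined without exchanging names or dates of birth, which is the property that lets otherwise unrelated sources be combined while preserving privacy.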
RESULTS: From over 200,000 individuals, approximately 7% had a positive test confirming MD during the study period, creating the MD-LINC dataset. Of those, the majority were male (58%) and almost one-third were white (32%). Over 40% had commercial insurance with a median duration of four years of medical and pharmacy insurance. To demonstrate application, the MD-LINC dataset was used to develop a patient identification algorithm in a Duchenne muscular dystrophy (DMD) study. A set of three algorithms (broad, narrow, and restrictive) reported positive predictive values (PPV) between 78.4% and 84.5% and were able to identify and distinguish between patients with DMD and Becker muscular dystrophy (BMD).
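For readers less familiar with the metric, PPV is the share of algorithm-flagged patients whose diagnosis is confirmed by the reference standard (here, presumably the genetic test result): PPV = true positives / (true positives + false positives). A minimal sketch follows; the function name and the patient identifiers in the usage example are hypothetical and are not study data.

```python
def positive_predictive_value(flagged_ids, confirmed_ids):
    """Return the fraction of flagged patients who are confirmed cases.

    flagged_ids: patients the claims algorithm identified.
    confirmed_ids: patients confirmed by the reference standard.
    """
    flagged = set(flagged_ids)
    if not flagged:
        raise ValueError("PPV is undefined when no patients are flagged")
    true_positives = len(flagged & set(confirmed_ids))
    return true_positives / len(flagged)


# Hypothetical example: 4 patients flagged, 3 of them confirmed.
example_ppv = positive_predictive_value(
    ["p1", "p2", "p3", "p4"], ["p1", "p2", "p3", "p9"]
)
print(f"PPV = {example_ppv:.1%}")  # prints "PPV = 75.0%"
```

A more restrictive algorithm typically flags fewer patients, which tends to raise PPV at the cost of missing some true cases.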
CONCLUSIONS: The ability to link three inherently unrelated data sources provides a unique opportunity to conduct HEOR and real-world evidence research into rare diseases that has not been available previously. This will help improve the accuracy of patient identification and fill existing knowledge gaps, offering insight into the care and long-term outcomes of patients with MD.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
SA31
Topic
Study Approaches
Disease
SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Neurological Disorders, SDC: Pediatrics, SDC: Rare & Orphan Diseases