USE OF MUTUAL INFORMATION THEORY IN BUILDING A PREDICTIVE MODEL FOR ANKYLOSING SPONDYLITIS DIAGNOSIS
Author(s)
Garges C1, Shukla O2, DeFreitas F1, Grabowsky T2, Park Y3, Deodhar A4
1HVH Precision Analytics, King of Prussia, PA, USA, 2HVH Patient Precision Analytics, LLC, King of Prussia, PA, USA, 3Novartis Pharmaceuticals, East Hanover, NJ, USA, 4Oregon Health & Science University, Portland, OR, USA
OBJECTIVES: Mutual Information (MI) metrics can be used to identify predictive patterns that are obscured by the volume of data in claims databases. MI measures the information shared between two data samples and quantifies the relevance of each predictor to a diagnosis. Predictors are classified by predictive ability and reliability, and the analysis is then repeated. Each iteration eliminates predictors and changes the ranking of those that remain, so that iterative optimization eventually defines the most effective predictive model. Many US patients with ankylosing spondylitis (AS) experience a 7- to 13-year delay before correct diagnosis. Delayed diagnosis and treatment contribute to considerable economic, physical, and psychological burdens on patients, caregivers, physicians, and society. We therefore aim to develop a predictive model for AS, based on the sequence and timing of diagnostic, procedure, prescription, and provider (DPPP) codes observed in the histories of patients with an AS diagnosis, to aid in earlier identification of AS patients.
METHODS: Data for this retrospective cohort study were extracted from de-identified US claims covering more than 182 million people from January 1, 2006 through September 2015. The study population comprised patients with an AS diagnosis (ICD-9-CM 720.0). For each AS patient, a minimum of 10 patients without AS, matched by age, gender, enrollment period, and geographic region, were randomly selected from the same database. MI was applied to identify DPPP codes that differentiate AS patients from the matched-control population. Combinations of DPPP codes were ranked by MI value (a higher MI value indicates higher relevance to AS diagnosis) to determine predictors.
RESULTS: Claims histories of 12,162 patients diagnosed with AS and 121,620 matched controls were analyzed to build a proof-of-concept predictive risk model that separates AS patients from matched controls. A total of 12,678 features were analyzed and 150 classifiers were built (with 3-fold cross-validation).
CONCLUSIONS: Additional modifications and rigorous validation of the proof-of-concept predictive model will be made to enhance its clinical relevance and practicality.
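ILLUSTRATIVE SKETCH: The MI-based ranking step described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each code is reduced to a binary present/absent flag per patient (the abstract also uses combinations, sequence, and timing of codes, which are not modeled here), and it uses the standard discrete definition I(X;Y) = sum over (x,y) of p(x,y) log2[p(x,y) / (p(x) p(y))]. The function names, the code labels, and the toy data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) written in terms of counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def rank_codes(code_flags, labels):
    """Rank codes by MI with the case/control label, highest first."""
    scores = {
        code: mutual_information(flags, labels)
        for code, flags in code_flags.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only (1 = case, 0 = matched control).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
code_flags = {
    "code_A": [1, 1, 1, 1, 0, 0, 0, 0],  # present only in cases
    "code_B": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to case status
    "code_C": [1, 1, 1, 0, 0, 0, 0, 1],  # partly related to case status
}

ranking = rank_codes(code_flags, labels)
for code, score in ranking:
    print(f"{code}: {score:.3f} bits")
```

In this toy example the code that appears only in cases receives the highest MI and the unrelated code receives an MI of zero, which mirrors the ranking principle stated in the methods (a higher MI value indicates higher relevance to the diagnosis).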
Conference/Value in Health Info
2017-05, ISPOR 2017, Boston, MA, USA
Value in Health, Vol. 20, No. 5 (May 2017)
Code
PRM85
Topic
Methodological & Statistical Research
Topic Subcategory
Modeling and simulation
Disease
Musculoskeletal Disorders, Systemic Disorders/Conditions