EXPLORING ISSUES IN ANALYZING NATIONAL DATABASES USING LOGISTIC REGRESSION- APPLICATION OF MEDICAL EXPENDITURE PANEL SURVEY
Author(s)
Althemery AU, Lai L, Alfaifi A
Nova Southeastern University, Davie, FL, USA
OBJECTIVES: Most national survey databases use a complex, stratified, multistage probability design, with clusters, strata, and weight adjustments, to extrapolate study results to the national level. Survey procedures are available in Statistical Analysis System (SAS) 9.4. However, several issues can arise if these procedures are not used appropriately. Moreover, no clear agreement exists on how to detect multicollinearity in logistic regression or how to generate ROC curves within these survey procedures. The study investigated three main issues that arise when applying logistic regression to nationally representative multistage survey data: subgroup analysis in multistage sampling design data, multicollinearity in logistic regression, and receiver operating characteristic (ROC) curves for survey procedures.
METHODS: The study reviewed, discussed, and compared the available principles and techniques. First, results from three procedure statements for subpopulation analyses in SAS were contrasted. Second, two methods for detecting multicollinearity were applied: linear regression, and the weight matrix adjusted by the maximum likelihood algorithm. Lastly, ROC curves for survey logistic regression were generated using direct and indirect procedures. A cohort of patients diagnosed with high blood cholesterol was drawn from the 2012 Medical Expenditure Panel Survey (MEPS) and used to provide examples of the reviewed statistical techniques.
RESULTS: Analyses run without a domain statement yielded potentially overestimated point estimates and standard errors. The tolerance and variance inflation factor (VIF) values used to detect multicollinearity changed slightly after adjusting the weight matrix. However, the two methods agreed that none of the tested independent factors were collinear. ROC curves accounting for national estimation were successfully generated and offered similar but more reliable estimates.
CONCLUSIONS: Accounting for total population weights when analyzing a subgroup in national databases is important. New methods are required for exploring multicollinearity in survey logistic regression procedures.
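As a rough illustration of the linear-regression approach to multicollinearity named in the methods, the Python sketch below regresses each predictor on the others and reports tolerance (1 - R^2) and VIF (1 / tolerance). It is not the study's SAS code: the function name `tolerance_and_vif`, the simulated predictors, and the weights are all hypothetical. The optional `weights` argument is an assumption about how a weight adjustment could enter the calculation, for example survey weights or weights taken from a fitted logistic model.

```python
import numpy as np

def tolerance_and_vif(X, weights=None):
    """Return (tolerance, VIF) arrays with one entry per column of X.

    Each column is regressed on the remaining columns plus an intercept
    using (optionally weighted) least squares.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    sqrt_w = np.sqrt(w)
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        # Weighted least squares: scale rows by sqrt(w), then solve by OLS.
        beta, *_ = np.linalg.lstsq(others * sqrt_w[:, None], y * sqrt_w, rcond=None)
        resid = y - others @ beta
        y_bar = np.average(y, weights=w)
        ss_res = np.sum(w * resid ** 2)
        ss_tot = np.sum(w * (y - y_bar) ** 2)
        tol[j] = ss_res / ss_tot  # equals 1 - R^2 for this predictor
    return tol, 1.0 / tol

# Simulated data (hypothetical, not MEPS): x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.05 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])
w = rng.uniform(0.5, 2.0, size=500)  # hypothetical observation weights

tol, vif = tolerance_and_vif(X, w)
print("tolerance:", np.round(tol, 4))
print("VIF:", np.round(vif, 2))
```

In this simulated example the first and third predictors should show very large VIF values, while the second stays close to 1. A common rule of thumb treats a VIF above 10 (tolerance below 0.1) as a sign of problematic collinearity, although thresholds vary by field.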
Conference/Value in Health Info
2016-05, ISPOR 2016, Washington DC, USA
Value in Health, Vol. 19, No. 3 (May 2016)
Code
SY3
Topic
Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Confounding, Selection Bias Correction, Causal Inference, Reproducibility & Replicability
Disease
Cardiovascular Disorders, Diabetes/Endocrine/Metabolic Disorders