IMPUTATION TECHNIQUES FOR MISSING COVARIATES WHEN MODELING DISEASE PROGRESSION
Author(s)
Kotze L
Isimo Health, Bellville, WC, South Africa
OBJECTIVES: The aim of the study was to identify an appropriate technique for imputing missing data in covariates when modelling disease progression.
METHODS: Authorisation (treatment request) and claims data for 393 breast cancer patients, provided by medical schemes belonging to the ICON network, were obtained from Isimo Health. Using this dataset as a basis, a dataset was simulated with the TPmsm package in R: the dgpTP function was used to generate data from the illness-death model using Gumbel's bivariate exponential distribution with unit exponential margins. Values were then deleted at random from the simulated dataset in R, creating three datasets with 5%, 10% and 15% of the data missing, respectively. Two imputation techniques were tested on these datasets, one based on chained equations and the other on random forests, implemented with the mice and missForest packages, respectively. The mice package creates multiple imputations instead of a single imputation, whereas missForest treats the missing data problem as a prediction problem: each variable is regressed on all the other variables with a random forest, and the fitted forest is then used to predict the missing values of that variable. Seven performance measures, calculated with the Metrics package in R, were used to identify the better imputation technique.
RESULTS: For all three datasets, missForest outperformed mice on five of the seven performance measures and was only marginally inferior on the other two. Overall, the random-forest-based missForest algorithm performed better across the performance measures.
CONCLUSIONS: Based on these statistical performance measures, it can be concluded that the missForest package effectively imputes missing covariates before disease progression is modelled.
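The evaluation design in METHODS (delete values at random at 5%, 10% and 15%, impute them, then score the imputed values against the deleted originals) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's R code: deletion is assumed to be completely at random (MCAR), the dataset is hypothetical rather than generated by dgpTP, the helpers `make_mcar`, `mean_impute` and `score` are hypothetical, column-mean imputation is a deliberately simple stand-in for mice and missForest, and the error measures shown are illustrative rather than the study's seven measures.

```python
import math
import random


def make_mcar(data, rate, seed=0):
    """Hypothetical helper: copy `data` and set round(rate * n_cells) cells to None,
    chosen completely at random (MCAR). Returns the copy and the masked (row, col) cells."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = rng.sample(cells, round(rate * len(cells)))  # sampled without replacement
    out = [row[:] for row in data]  # leave the complete "truth" dataset untouched
    for i, j in masked:
        out[i][j] = None
    return out, masked


def mean_impute(data):
    """Stand-in imputer (NOT mice or missForest): fill each missing cell
    with the mean of the observed values in its column."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def score(truth, imputed, masked):
    """Compare imputed values with the deleted true values, on masked cells only."""
    errors = [imputed[i][j] - truth[i][j] for i, j in masked]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "mean_error": sum(errors) / n,  # imputed minus true, averaged
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical complete dataset: 200 rows, 3 numeric covariates (two correlated).
    truth = []
    for _ in range(200):
        x = rng.gauss(55, 10)
        truth.append([x, 0.5 * x + rng.gauss(0, 5), rng.expovariate(1.0)])

    # One incomplete dataset per missingness rate, as in the study design.
    for rate in (0.05, 0.10, 0.15):
        incomplete, masked = make_mcar(truth, rate, seed=1)
        imputed = mean_impute(incomplete)
        print(rate, len(masked), score(truth, imputed, masked))
```

Scoring only the masked cells matters: cells that were never deleted are left unchanged by the imputer in this sketch and would otherwise dilute the error measures. In the study's setting, `mean_impute` would be replaced by imputations from mice or missForest run in R.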
Conference/Value in Health Info
2019-05, ISPOR 2019, New Orleans, LA, USA
Value in Health, Volume 22, Issue S1 (2019 May)
Code
PNS222
Topic
Methodological & Statistical Research
Disease
No Specific Disease