To Bootstrap, or Not to Bootstrap Before Multiple Imputation: That Is the Question.
Author(s)
Rose Gagnon, MPT, MSc, PhD(c)1, Simon LaRue, MSc2, Kadija Perreault, PT, PhD1, Luc J. Hébert, Fellow PT, PhD, CD1, Jason R. Guertin, PhD1
1Université Laval, Québec, QC, Canada, 2Population Health and Optimal Health Practices Program, CHU de Québec - Université Laval Research Centre, Québec, QC, Canada
OBJECTIVES: Multiple imputation (MI) is an increasingly popular method for handling missing data in randomized clinical trials (RCTs). However, there is no consensus on the order in which MI and bootstrapping should be performed in cost-effectiveness analyses. We aimed to determine whether the order in which these two procedures are performed influences the results obtained.
METHODS: We conducted secondary analyses of cost and effectiveness data collected during a pragmatic RCT (#NCT04009369, n=78, data missing at random). Analyses were performed in R according to two scenarios. Scenario 1: The base sample was bootstrapped (1,000 samples, n=78 per sample), then each bootstrap sample was imputed using the mice package (number of imputations set according to the percentage of missing data, predictive mean matching for continuous variables). The imputed datasets were then analyzed and the results combined to obtain valid inferences for each bootstrap iteration. Scenario 2: We reversed the order of the MI and bootstrap procedures. The cost-effectiveness planes (CEPs) obtained under each scenario were compared with that of a reference scenario (complete case analysis) to assess whether any differences were present.
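The two orderings can be illustrated with a minimal Python sketch. This is not the authors' R code: it handles a single outcome, replaces predictive mean matching with a simple random draw from the observed values, and uses an arbitrary fixed number of imputations. All function names and the toy data are hypothetical. For Scenario 2 the abstract only states that the order was reversed, so the pooling step shown here (resampling each completed dataset and averaging across them) is one possible reading.

```python
import random
import statistics


def impute_once(values, rng):
    """Fill each missing value (None) with a random draw from the observed values.

    Assumption: this simple hot-deck draw stands in for predictive mean matching.
    """
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values to impute from")
    return [v if v is not None else rng.choice(donors) for v in values]


def resample(rows, rng):
    """Draw a bootstrap sample of the same size, with replacement."""
    return [rng.choice(rows) for _ in rows]


def bootstrap_then_mi(values, n_boot, m, rng):
    """Scenario 1: bootstrap the incomplete data, then impute each sample m times."""
    estimates = []
    for _ in range(n_boot):
        sample = resample(values, rng)
        # Pool the m completed-data means by averaging them.
        pooled = statistics.fmean(
            statistics.fmean(impute_once(sample, rng)) for _ in range(m)
        )
        estimates.append(pooled)
    return estimates


def mi_then_bootstrap(values, n_boot, m, rng):
    """Scenario 2 (one possible reading): impute m times, then bootstrap."""
    completed = [impute_once(values, rng) for _ in range(m)]
    estimates = []
    for _ in range(n_boot):
        # Resample every completed dataset and average the resulting means.
        pooled = statistics.fmean(
            statistics.fmean(resample(data, rng)) for data in completed
        )
        estimates.append(pooled)
    return estimates


if __name__ == "__main__":
    rng = random.Random(2025)
    # Toy data: 78 values, roughly 20% set to missing (hypothetical).
    costs = [rng.gauss(500.0, 150.0) for _ in range(78)]
    costs = [None if rng.random() < 0.2 else c for c in costs]
    s1 = bootstrap_then_mi(costs, n_boot=1000, m=5, rng=rng)
    s2 = mi_then_bootstrap(costs, n_boot=1000, m=5, rng=rng)
    print("Scenario 1 mean and SD:", statistics.fmean(s1), statistics.stdev(s1))
    print("Scenario 2 mean and SD:", statistics.fmean(s2), statistics.stdev(s2))
```

In a cost-effectiveness setting, the same structure would be applied to the incremental cost and incremental effect between arms rather than to a single mean.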
RESULTS: The distribution (%) of cost-effectiveness points by quadrant (Q) in the reference case’s CEP was as follows: Q1, 44.2; Q2, 5.9; Q3, 1.7; Q4, 48.2. The first scenario (bootstrap then MI) produced a very similar distribution (Q1: 45.6, Q2: 8.0, Q3: 2.3, Q4: 44.1). The second scenario (MI then bootstrap) also produced a similar distribution (Q1: 44.1, Q2: 9.21, Q3: 3.99, Q4: 42.5). In all scenarios, the intervention was judged to be either dominant (Q4) or cost-effective (Q1).
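A quadrant distribution of this kind can be computed from bootstrap replicates of incremental effect and incremental cost. The sketch below assumes the conventional counterclockwise numbering starting from the north-east quadrant, which matches the abstract's labels for Q1 and Q4 but is not stated for Q2 and Q3. The function names, the treatment of points lying exactly on an axis, and the example points are all hypothetical.

```python
from collections import Counter


def quadrant(delta_effect, delta_cost):
    """Assign one cost-effectiveness point to a quadrant of the plane.

    Assumed numbering: Q1 more effective and more costly, Q2 less effective
    and more costly, Q3 less effective and less costly, Q4 more effective and
    less costly (dominant). Points on an axis are assigned by the >= rule
    below, which is an arbitrary choice.
    """
    if delta_effect >= 0 and delta_cost >= 0:
        return "Q1"
    if delta_effect < 0 and delta_cost >= 0:
        return "Q2"
    if delta_effect < 0 and delta_cost < 0:
        return "Q3"
    return "Q4"


def quadrant_shares(points):
    """Return the percentage of (delta_effect, delta_cost) points per quadrant."""
    if not points:
        raise ValueError("points must not be empty")
    counts = Counter(quadrant(de, dc) for de, dc in points)
    total = len(points)
    return {q: 100.0 * counts.get(q, 0) / total for q in ("Q1", "Q2", "Q3", "Q4")}


if __name__ == "__main__":
    # Four made-up replicates, one per quadrant.
    example = [(0.10, 50.0), (0.20, -30.0), (-0.10, 20.0), (-0.05, -10.0)]
    print(quadrant_shares(example))
```

Applying quadrant_shares to the replicates from each scenario gives distributions that can be compared side by side, as in the results reported above.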
CONCLUSIONS: The order in which MI and bootstrapping were performed did not substantially affect the cost-effectiveness results obtained using data collected during an RCT. Further studies with different missing data patterns are needed to confirm these conclusions.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR94
Topic
Methodological & Statistical Research
Topic Subcategory
Missing Data, PRO & Related Methods
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)