A Stacked Ensemble Approach for Building Composite Measure Models in Healthcare

Speaker(s)

Shi P1, Shang Q2, Zhao J2, Tregear S2, Fuller E2, Zhang S2
1Booz Allen Hamilton, McLean, VA, USA, 2Booz Allen Hamilton, McLean, VA, USA

OBJECTIVES: Healthcare program evaluations commonly use multiple quality measures to assess performance. The traditional approach builds a separate model for each measure and then attempts to derive a composite score for overall program impact. This method has drawbacks: the measures are often correlated, and the single-measure, single-model approach overlooks those correlations, potentially biasing estimates of program impact. Determining appropriate weights for each measure in the composite score is also difficult. This paper proposes a novel solution in the form of a stacked ensemble approach. A multi-output model is constructed to cover all measures within a single model, so that correlations among measures are taken into account. SHapley Additive exPlanations (SHAP) are then used to calculate individual measure contributions, even in the presence of multicollinearity among the measures.
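The attribution idea can be illustrated with a minimal sketch of exact Shapley values, the quantity that SHAP estimates. This is not the authors' implementation and does not use the SHAP library: it enumerates every coalition of measures directly, which is only practical for a handful of inputs. The cost function `predict_cost`, the three measure names, and the background rows are hypothetical. The value of a coalition is taken to be the average prediction when the coalition's measures are fixed and the rest are drawn from the background rows, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for a single instance x.

    v(S) is the mean prediction when the features in S are fixed to
    their values in x and the others are taken from each background row.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical expenditure model over three quality measures.
def predict_cost(measures):
    readmit, er_visits, adherence = measures
    return (500.0 + 120.0 * readmit + 80.0 * er_visits
            - 60.0 * adherence + 15.0 * readmit * er_visits)


# Hypothetical background rows; the first two measures move together.
background = [
    [1.0, 1.1, 0.0],
    [2.0, 1.9, 1.0],
    [3.0, 3.2, 0.0],
    [4.0, 3.8, 1.0],
]
x = [4.0, 3.8, 1.0]

phi = shapley_values(predict_cost, x, background)
for name, contribution in zip(["readmit", "er_visits", "adherence"], phi):
    print(f"{name}: {contribution:.2f}")
```

The contributions always sum to the gap between the prediction for `x` and the average prediction over the background rows, so each measure receives a share of the total even when measures are correlated with one another.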

METHODS: We first use simulated data to compare the performance of a multi-output model against the conventional one-measure, one-model approach, using mean squared error as the evaluation metric. We then illustrate the use of SHAP values to calculate the contribution of each measure to healthcare expenditure, addressing the challenges posed by multicollinearity, a significant obstacle in traditional statistical approaches.
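The simulation comparison can be sketched as below. Everything here is an assumption made for illustration rather than the authors' setup: the data-generating process (a shared latent driver plus noise with pairwise correlation `rho`), the sample sizes, the ordinary least squares base learners, the 5-fold split, and the particular stacking scheme, in which a second-stage model regresses every measure on the out-of-fold predictions for all measures. It is not intended to reproduce the reported results and makes no claim about which approach wins.

```python
import numpy as np


def fit_ols(X, Y):
    """Least-squares coefficients with an intercept column prepended."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def simulate(n, n_features, n_measures, rho, rng):
    """Measures share one latent driver; noise has pairwise correlation rho."""
    X = rng.normal(size=(n, n_features))
    latent = X @ rng.normal(size=n_features)
    loadings = rng.uniform(0.5, 1.5, size=n_measures)
    cov = np.full((n_measures, n_measures), rho) + (1.0 - rho) * np.eye(n_measures)
    noise = rng.multivariate_normal(np.zeros(n_measures), cov, size=n)
    return X, np.outer(latent, loadings) + noise


def out_of_fold_predictions(X, Y, n_folds, rng):
    """Stage 1: per-measure OLS, predicted on rows held out of the fit.

    lstsq with a 2-D Y solves each column independently, so this is
    the same as fitting one separate model per measure.
    """
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.empty_like(Y)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        oof[held_out] = predict_ols(fit_ols(X[train], Y[train]), X[held_out])
    return oof


def fit_stacked(X, Y, n_folds, rng):
    stage1 = fit_ols(X, Y)  # refit on all training rows for prediction time
    oof = out_of_fold_predictions(X, Y, n_folds, rng)
    stage2 = fit_ols(oof, Y)  # each measure uses all stage-1 predictions
    return stage1, stage2


def predict_stacked(model, X):
    stage1, stage2 = model
    return predict_ols(stage2, predict_ols(stage1, X))


def compare(rho, seed=0, n_train=60, n_test=2000, n_features=10, n_measures=4):
    """Return test MSE (averaged over measures) for both approaches."""
    rng = np.random.default_rng(seed)
    X, Y = simulate(n_train + n_test, n_features, n_measures, rho, rng)
    X_tr, Y_tr, X_te, Y_te = X[:n_train], Y[:n_train], X[n_train:], Y[n_train:]
    single = predict_ols(fit_ols(X_tr, Y_tr), X_te)
    stacked = predict_stacked(fit_stacked(X_tr, Y_tr, 5, rng), X_te)
    mse = lambda pred: float(np.mean((Y_te - pred) ** 2))
    return mse(single), mse(stacked)


for rho in (0.0, 0.5, 0.9):
    mse_single, mse_stacked = compare(rho)
    print(f"rho={rho:.1f}  single={mse_single:.3f}  stacked={mse_stacked:.3f}")
```

Using out-of-fold predictions for the second stage keeps it from being trained on predictions that have already seen their own targets, which is the usual reason for the fold split in stacking.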

RESULTS: On the simulated data, the multi-output model always achieved smaller mean squared errors (MSEs) than the single-output models and produced more consistent MSEs. Regardless of the correlation coefficient, the multi-output MSEs hovered around 50, whereas the MSEs of the single-output models ranged from 54 to 82.

CONCLUSIONS: Our hybrid approach is advantageous when there are interdependencies or correlations among the target variables, allowing the model to capture complex relationships and dependencies more effectively.

Code

MSR99

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas