The Use of Machine Learning Techniques for Healthcare Cost Prediction: A Preliminary Analysis of Administrative Databases From a Health Protection Agency in Northern Italy
Speaker(s)
Sala I1, Rozza D2, Losa L2, Conti S3, Ferrara P4, Bagnardi V5, Fornari C2, Zucchi A6, Ciampichini R6, Sampietro G6, Mantovani LG2
1University of Milano-Bicocca, Milano, MI, Italy, 2University of Milano-Bicocca, Monza, MB, Italy, 3University of Milan-Bicocca, Milan, MI, Italy, 4Research Center on Public Health (CESP), Department of Medicine and Surgery, University of Milano-Bicocca, Monza, MB, Italy, 5University of Milano-Bicocca, Milan, MI, Italy, 6ATS Bergamo, Bergamo, BG, Italy
OBJECTIVES: Predicting future expenditures is crucial for healthcare planning, as longer life expectancy and the increasing prevalence of chronic diseases drive up demand for healthcare services and the related costs. This is the first step of a study that aims to predict direct healthcare costs for the Italian National Healthcare Service (NHS) by identifying population segments that are homogeneous in healthcare expenditure.
METHODS: For each individual aged ≥18 years covered by the Health Protection Agency of Bergamo (Northern Italy) between 2010 and 2022, access to inpatient and outpatient services and total annual costs based on healthcare service tariffs were traced from administrative healthcare databases.
A regression tree was built to predict yearly costs for subjects in 2011 based on their history of access to the NHS in 2010. The tree identified homogeneous segments, for each of which it provided a mean predicted cost. For each segment, the ratio of the difference between total predicted and total observed costs to total observed costs was calculated and used as a measure of prediction error.
RESULTS: In 2010, 70.7% of the 902,023 included subjects used at least one inpatient (n=99,860) and/or outpatient service (n=631,451), for a total cost of €692,298,400. High-cost subjects (>€15,000 yearly), accounting for 0.8% of the population, absorbed 28.7% of total costs.
The tree identified 21 segments, with mean predicted costs for 2011 ranging from €282 (subjects aged <60 with at most one follow-up visit) to €35,000 (dialysis subjects). Prediction error was highest in the segment of subjects aged <60 with at least three follow-up visits and at most one hospitalization (ratio=19.74%) and lowest in the segment of subjects with two hospitalizations and at least two CT scans (ratio=0.32%).
CONCLUSIONS: These preliminary results identify several short-term predictors of direct costs for the NHS. To achieve more accurate predictions, ensemble methods, such as random forests, will be employed.
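The segment-level prediction error described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `segment_error_ratios`, the segment labels, and all cost figures are hypothetical, and the ratio is assumed to be signed (predicted minus observed), since the abstract does not say whether absolute values were taken. In practice the segment labels would be the leaves of the fitted regression tree.

```python
from collections import defaultdict


def segment_error_ratios(segments, predicted, observed):
    """Per-segment (total predicted - total observed) / total observed.

    segments  : segment label for each subject (e.g. a tree leaf id)
    predicted : predicted yearly cost for each subject
    observed  : observed yearly cost for each subject
    """
    pred_total = defaultdict(float)
    obs_total = defaultdict(float)
    for seg, p, o in zip(segments, predicted, observed):
        pred_total[seg] += p
        obs_total[seg] += o
    # Segments with zero observed cost are skipped to avoid dividing by zero.
    return {
        seg: (pred_total[seg] - obs_total[seg]) / obs_total[seg]
        for seg in obs_total
        if obs_total[seg] != 0
    }


# Made-up toy data: two segments, two subjects each (costs in euros).
segments = ["A", "A", "B", "B"]
predicted = [300.0, 300.0, 5000.0, 5000.0]  # each subject gets the segment mean
observed = [250.0, 250.0, 4000.0, 6000.0]

ratios = segment_error_ratios(segments, predicted, observed)
for seg, ratio in sorted(ratios.items()):
    print(f"segment {seg}: error ratio = {ratio:.2%}")
```

Because the ratio compares segment totals, over- and under-predictions for individual subjects within a segment can cancel out, as in segment B of the toy data. The measure therefore describes accuracy at the segment level rather than for individual subjects.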
Code
EPH23
Topic
Epidemiology & Public Health, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Public Health
Disease
No Additional Disease & Conditions/Specialized Treatment Areas