Machine Learning Insights Into COVID-19 Variant Spread Across US Regions
Author(s)
Hu L1, Zhang X1, Weimer I2, Yapici HO1, Shenoy A1, Lodaya K1, D'Souza F1
1Boston Strategic Partners, Inc., Boston, MA, USA, 2Boston Strategic Partners, Inc., Saint Paul, MN, USA
OBJECTIVES: This study uses machine learning, specifically random forest regression, to analyze how regional and temporal factors influence the percentage share of key COVID-19 variants. It aims to uncover predictors of variant prevalence and to contribute to a more data-driven approach to pandemic management.
METHODS: The study integrated data from the National COVID Cohort Collaborative (N3C), the Bureau of Transportation Statistics, World Weather Online, the United States Environmental Protection Agency, and the US Census to build a predictive model. Random forest regression was used to analyze how different factors affect the spread of COVID-19 variants (Delta, Omicron BA.5, Alpha, and XBB.1.5) across US regions, and how sensitive each variant is to those factors.
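As an illustration of the modeling step described above, the following is a minimal sketch of fitting a random forest regressor to a variant's percentage share and scoring it with R² on held-out rows. It assumes scikit-learn; the feature names, the synthetic data generator, and the hyperparameters are all hypothetical, since the abstract does not list the actual columns or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the abstract does not list the exact columns.
FEATURES = ["temperature", "uv_index", "sun_hours", "ozone", "land_area", "income"]


def make_synthetic_data(n_rows=600, seed=0):
    """Build a synthetic table of region-level rows. This is NOT real data."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_rows, len(FEATURES)))
    # Assumed relationship, chosen only for illustration: the variant share
    # depends mostly on the "ozone" column, with smaller contributions
    # from "temperature" and "uv_index".
    share = 100.0 * (0.7 * X[:, 3] + 0.2 * X[:, 0] ** 2 + 0.1 * X[:, 1])
    share += rng.normal(0.0, 1.0, size=n_rows)
    return X, np.clip(share, 0.0, 100.0)


def fit_variant_model(X, y, seed=0):
    """Fit a random forest and return it with its held-out R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


X, y = make_synthetic_data()
model, r2 = fit_variant_model(X, y)

# Rank features by the forest's impurity-based importances.
ranked = sorted(
    zip(FEATURES, model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
print(f"held-out R^2: {r2:.2f}")
for name, importance in ranked:
    print(f"{name:12s} {importance:.3f}")
```

In a setup like this, one model would be fit per variant, and the importance ranking is one possible way to read off which factors a given variant's share tracks most closely.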
RESULTS: The model demonstrated high predictive accuracy, with R² values of 0.99 for Delta and BA.5, 0.98 for Alpha, and 0.97 for XBB.1.5, well above the R² of 0.78 for a mixed-variant baseline. It revealed that the spread of Delta correlated strongly with ozone density, BA.5 with sun hours and UV index, Alpha with temperature and air quality, and XBB.1.5 with land area and income. These results suggest a complex interplay between environmental factors and variant spread. The analysis also showed that each variant has its own favorable environment: for example, BA.5 is less sensitive than the other variants to UV index, and Delta is more sensitive to ozone density but less sensitive to temperature.
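The abstract does not say how per-variant sensitivity was measured. One common approach, sketched here under that assumption, is a manual partial-dependence sweep: hold all rows fixed, set one feature to each value on a grid, and record how far the average prediction moves. The two "variants", their weights, and the feature names below are synthetic and hypothetical; the output demonstrates the technique only and says nothing about real variants.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["ozone", "temperature", "uv_index"]  # hypothetical column names


def synthetic_variant(weights, n_rows=500, seed=0):
    """Synthetic share (%) as a weighted sum of features plus noise. NOT real data."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_rows, len(FEATURES)))
    y = 100.0 * (X @ np.asarray(weights)) + rng.normal(0.0, 1.0, size=n_rows)
    return X, y


def sensitivity(model, X, column, grid_size=20):
    """Range of the average prediction as one feature sweeps its observed range."""
    grid = np.linspace(X[:, column].min(), X[:, column].max(), grid_size)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, column] = value  # overwrite one feature, keep the others as observed
        averages.append(model.predict(X_mod).mean())
    return max(averages) - min(averages)


# Assumed weights, chosen only so the two synthetic "variants" differ.
profiles = {"variant_a": [0.7, 0.1, 0.2], "variant_b": [0.1, 0.3, 0.6]}

results = {}
for name, weights in profiles.items():
    X, y = synthetic_variant(weights)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    results[name] = {
        feature: sensitivity(model, X, index) for index, feature in enumerate(FEATURES)
    }

for name, scores in results.items():
    print(name, {feature: round(score, 1) for feature, score in scores.items()})
```

A larger value means the predicted share moves more when that factor changes, so comparing the same feature across the per-variant models gives a rough sense of which variant is more sensitive to it.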
CONCLUSIONS: This study shows that machine learning is a useful tool for identifying the multifaceted contributors to the spread of COVID-19 variants. The results may inform targeted public health interventions and policies, and they highlight the vital role of data-driven models during a pandemic.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR6
Topic
Epidemiology & Public Health, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Public Health
Disease
Infectious Disease (non-vaccine), No Additional Disease & Conditions/Specialized Treatment Areas