Comparing Methodologies to Predict Incidence of COVID-19 in US Counties
Author(s)
Coplan P1, Shah S2, Bhardwaj A3, Gurubaran A3, Dwarakanathan H3, Cafri G4, Chitnis A1, Khanna R5, Kakade O3, Nandi B3, Holy C6
1Johnson & Johnson, New Brunswick, NJ, USA, 2Mu Sigma, Bangalore, KA, India, 3Mu Sigma, Bengaluru, KA, India, 4Johnson & Johnson, San Diego, CA, USA, 5Johnson & Johnson Co., New Brunswick, NJ, USA, 6Johnson & Johnson, Somerville, MA, USA
OBJECTIVES: With the worldwide spread of the SARS-CoV-2 virus, governments have adopted stringent measures to prevent disease transmission. As lockdowns are eased, models that evaluate the potential resurgence of disease are increasingly important. The aim of this study was to compare methodologies for predicting the incidence of COVID-19 in US counties.
METHODS: Reported numbers of COVID-19-positive cases were obtained from the CDC, social distancing scores (SDS) from Unacast, population density from US Census data, and testing rates from the CDC website. Data covered the period from February 28, 2020, to May 28, 2020. Poisson and linear regression models were built to predict the number of reported cases using 1-week lagged SDS, tests per day, and population density. Damped Holt linear trend (DHLT) coefficients and simple moving averages were calculated from the daily case counts of the most recent 14 days. All models were built at the county level. Four methodologies were compared: Poisson regression, linear regression, DHLT, and simple moving average (SMA). Reported data from June 2020 were used to validate the predictions.
RESULTS: US counties were ranked from highest to lowest annualized incidence, and the top 100 counties were identified for each methodology. Counties predicted to be in the top 100 were compared with the counties that were in the top 100 according to reported counts. The Poisson and linear regressions each correctly identified 45 of the top 100 counties, whereas SMA and DHLT identified only 36 and 29, respectively.
CONCLUSIONS: Linear and Poisson regression were the most accurate of the four methods at predicting high incidence. Confounding factors such as mask use or behavioral changes were not included in this study; further research on these factors is needed to improve prediction accuracy.
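The abstract names the regression predictors but not how the models were fitted. The following is a minimal sketch of the Poisson variant, assuming a log-link Poisson GLM fitted by Newton-Raphson in pure Python. `lagged_design`, `solve`, `fit_poisson`, and `predict` are hypothetical helpers, and the 1-week lag is applied to SDS and tests per day. Population density is left out of this single-county sketch because it is constant within one county and would be collinear with the intercept; it carries information only in a model pooled across counties.

```python
import math
from typing import List, Sequence, Tuple

LAG_DAYS = 7  # "1-week lagged" predictors, as described in the abstract


def lagged_design(sds: Sequence[float], tests: Sequence[float],
                  cases: Sequence[float], lag: int = LAG_DAYS) -> Tuple[List[list], List[float]]:
    """Pair each day's case count with SDS and tests per day observed `lag` days earlier."""
    X = [[1.0, sds[t - lag], tests[t - lag]] for t in range(lag, len(cases))]
    y = [float(cases[t]) for t in range(lag, len(cases))]
    return X, y


def solve(A: List[list], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_poisson(X: List[list], y: List[float], iters: int = 50, tol: float = 1e-10) -> List[float]:
    """Poisson regression with log link: maximize the log-likelihood by Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the mean daily count (must be > 0)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X^T (y - mu)
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        # Fisher information X^T diag(mu) X (the negative Hessian)
        info = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict(beta: List[float], row: List[float]) -> float:
    """Expected daily case count for one design row [1, lagged SDS, lagged tests]."""
    return math.exp(sum(b * x for b, x in zip(beta, row)))
```

For a county with `n` observed days, the next day's expected count would be `predict(beta, [1.0, sds[-7], tests[-7]])`, since the predictors are read one week back. The linear-regression variant would instead fit ordinary least squares to the same lagged design matrix.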
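The two time-series baselines use only the last 14 days of daily case counts. The abstract states that DHLT coefficients were calculated but does not report their values or how they were estimated, so the smoothing parameters `alpha`, `beta`, and `phi` below are hypothetical placeholders; in practice they would typically be estimated, for example by minimizing one-step-ahead squared error. The sketch uses the standard damped-trend recursions with a simple initialization (first value as level, first difference as trend), and the function names are illustrative.

```python
from typing import List, Sequence

WINDOW = 14  # the abstract uses the latest 14 days of daily case counts


def sma_forecast(daily_cases: Sequence[float], window: int = WINDOW) -> float:
    """Simple moving average: mean of the last `window` days, used as a flat daily forecast."""
    recent = list(daily_cases)[-window:]
    return sum(recent) / len(recent)


def damped_holt_forecast(daily_cases: Sequence[float], horizon: int,
                         alpha: float = 0.5,  # level smoothing (hypothetical value)
                         beta: float = 0.3,   # trend smoothing (hypothetical value)
                         phi: float = 0.9,    # damping factor, 0 < phi < 1 (hypothetical value)
                         window: int = WINDOW) -> List[float]:
    """Damped Holt linear trend forecasts for 1..horizon days after the window."""
    y = list(daily_cases)[-window:]
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]  # simple initialization
    for obs in y[1:]:
        prev_level = level
        # l_t = alpha * y_t + (1 - alpha) * (l_{t-1} + phi * b_{t-1})
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        # b_t = beta * (l_t - l_{t-1}) + (1 - beta) * phi * b_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts
```

Because `phi < 1`, each additional day adds a smaller increment of the trend, so DHLT forecasts flatten out over a month-long horizon instead of extrapolating the recent slope indefinitely. Summing the 30 daily forecasts gives a predicted monthly total that can be ranked against the other methods.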
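The evaluation counts how many counties a method places in its top 100 that are also in the top 100 by reported counts. The abstract does not define the incidence scale or the county key. The sketch below assumes incidence per 100,000 population per year, scaled from a window of `days` days, and keys counties by an identifier such as a FIPS code. All three function names are hypothetical, and ties are broken by county key purely so the result is deterministic.

```python
from typing import Dict, Set


def annualized_incidence(cases: float, population: int, days: int, per: int = 100_000) -> float:
    """Cases per `per` population, scaled from a `days`-long window to 365 days (assumed scale)."""
    return cases / population * per * 365 / days


def top_k(incidence: Dict[str, float], k: int = 100) -> Set[str]:
    """Counties with the k highest incidence values (ties broken by county key)."""
    ranked = sorted(incidence, key=lambda c: (-incidence[c], c))
    return set(ranked[:k])


def top_k_hits(predicted: Dict[str, float], observed: Dict[str, float], k: int = 100) -> int:
    """Number of observed top-k counties that the predictions also place in their top k."""
    return len(top_k(predicted, k) & top_k(observed, k))
```

With `k=100`, a return value of 45 corresponds to the result reported for the two regression models, and 36 and 29 to SMA and DHLT. Because both sets have exactly `k` members, the count is also the precision and recall of the predicted top-100 list.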
Conference/Value in Health Info
2020-11, ISPOR Europe 2020, Milan, Italy
Value in Health, Volume 23, Issue S2 (December 2020)
Code
PIN87
Topic
Epidemiology & Public Health
Topic Subcategory
Public Health
Disease
Infectious Disease (non-vaccine)