Comparing Methodologies to Predict Incidence of COVID-19 in US Counties

Author(s)

Coplan P¹, Shah S², Bhardwaj A³, Gurubaran A³, Dwarakanathan H³, Cafri G⁴, Chitnis A¹, Khanna R⁵, Kakade O³, Nandi B³, Holy C⁶
¹Johnson & Johnson, New Brunswick, NJ, USA, ²Mu Sigma, Bangalore, KA, India, ³Mu Sigma, Bengaluru, KA, India, ⁴Johnson & Johnson, San Diego, CA, USA, ⁵Johnson & Johnson Co., New Brunswick, NJ, USA, ⁶Johnson & Johnson, Somerville, MA, USA

OBJECTIVES:

With the spread of the SARS-CoV-2 virus worldwide, governments have adopted stringent measures to curb transmission. As lockdowns are eased, models that can anticipate a resurgence of disease become increasingly important. The aim of this study is to compare methodologies for predicting the incidence of COVID-19 in US counties.

METHODS:

Reported counts of COVID-19-positive cases were obtained from the CDC, social distancing scores (SDS) from Unacast, population density from US Census data, and testing rates from the CDC website. The data covered the period February 28, 2020 to May 28, 2020. Poisson and linear regression models were built to predict the number of reported cases from 1-week lagged SDS, tests per day, and population density. Damped Holt linear trend (DHLT) coefficients and simple moving averages were calculated from the daily number of cases over the most recent 14 days. All models were built at the county level. Four methodologies were compared: Poisson regression, linear regression, DHLT, and simple moving average (SMA). Data from June 2020 were used to validate the results.
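The two time-series baselines can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say how the DHLT smoothing and damping coefficients were estimated or which forecast horizon was used, so the grid search on in-sample one-step-ahead squared error, the 7-day default horizon, the parameter grids, and the function names below are all hypothetical. The recursions follow the standard damped-trend form of Holt's method (level ℓ, trend b, damping φ), whose h-step forecast is ℓ + (φ + φ² + … + φʰ)b.

```python
import itertools


def sma_forecast(daily_cases, window=14, horizon=7):
    """Flat forecast: mean of the last `window` daily counts, repeated `horizon` times."""
    recent = daily_cases[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def _dhlt_run(y, alpha, beta, phi):
    """Apply damped Holt smoothing to y; return final level, final trend, one-step SSE."""
    level, trend = y[0], y[1] - y[0]  # simple initialisation from the first two points
    sse = 0.0
    for obs in y[1:]:
        one_step = level + phi * trend  # forecast of `obs` made before observing it
        sse += (obs - one_step) ** 2
        prev_level = level
        level = alpha * obs + (1 - alpha) * one_step
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    return level, trend, sse


def dhlt_forecast(daily_cases, window=14, horizon=7,
                  smoothing_grid=(0.1, 0.3, 0.5, 0.7, 0.9),
                  damping_grid=(0.8, 0.9, 0.98)):
    """Damped Holt linear trend forecast fitted to the last `window` days (hypothetical grid search)."""
    y = [float(v) for v in daily_cases[-window:]]
    if len(y) < 2:
        raise ValueError("DHLT needs at least two observations")
    # Assumption: pick (alpha, beta, phi) with the smallest in-sample one-step squared error.
    alpha, beta, phi = min(
        itertools.product(smoothing_grid, smoothing_grid, damping_grid),
        key=lambda params: _dhlt_run(y, *params)[2],
    )
    level, trend, _ = _dhlt_run(y, alpha, beta, phi)
    forecasts, damp = [], 0.0
    for h in range(1, horizon + 1):
        damp += phi ** h  # running sum phi + phi^2 + ... + phi^h
        forecasts.append(level + damp * trend)
    return forecasts
```

For example, `sma_forecast(list(range(1, 15)), horizon=3)` returns `[7.5, 7.5, 7.5]`. Because φ < 1, successive DHLT forecast increments (φʰb) shrink, so a recent upward trend is extrapolated less aggressively than by an undamped Holt model, while the SMA ignores the trend entirely.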

RESULTS:

US counties were ranked by annualized incidence of disease, from highest to lowest, and the top 100 counties were identified for each methodology. Counties predicted to be in the top 100 were compared with those that were actually in the top 100 according to reported counts. The Poisson and linear regressions each correctly identified 45 of the top 100 counties, whereas SMA and DHLT identified only 36 and 29, respectively.
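The ranking comparison above amounts to counting the overlap between the predicted and observed top-100 sets. The sketch below assumes annualized incidence is cases per 100,000 residents over the validation window, scaled to 365 days; the abstract does not define the formula, so it, the function names, and the toy county IDs are hypothetical.

```python
def annualized_incidence(cases, population, days, per=100_000):
    """Assumed definition: cases per `per` residents over a `days`-long window, scaled to 365 days."""
    return cases / population * per * 365 / days


def top_k(incidence_by_county, k=100):
    """IDs of the k counties with the highest incidence (ties keep input order)."""
    ranked = sorted(incidence_by_county, key=incidence_by_county.get, reverse=True)
    return set(ranked[:k])


def top_k_hits(predicted, observed, k=100):
    """Number of observed top-k counties that also appear in the predicted top k."""
    return len(top_k(predicted, k) & top_k(observed, k))


# Toy example with hypothetical county IDs and k=2 instead of 100.
predicted = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 20.0}
observed = {"A": 45.0, "B": 10.0, "C": 35.0, "D": 25.0}
print(top_k_hits(predicted, observed, k=2))  # 1: only "A" is in both top-2 sets
```

Under this metric, a score of 45 means 45 of the 100 counties with the highest reported June incidence were also among a model's 100 highest predicted counties.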

CONCLUSIONS:

Linear and Poisson regression were the most accurate in identifying counties with high incidence. This study did not account for confounding factors such as mask use or other behavioral changes. Further research on these factors is needed to improve prediction accuracy.

Conference/Value in Health Info

2020-11, ISPOR Europe 2020, Milan, Italy

Value in Health, Volume 23, Issue S2 (December 2020)

Code

PIN87

Topic

Epidemiology & Public Health

Topic Subcategory

Public Health

Disease

Infectious Disease (non-vaccine)
