Li Zhang, Jianping Huang, Haipeng Yu, Xiaoyue Liu, Yun Wei, Xinbo Lian, Chuwei Liu, Zhikun Jing
a Collaborative Innovation Center for Western Ecological Safety, Lanzhou University, Lanzhou, China
b Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China
Keywords: COVID-19; Statistical method; Levenberg-Marquardt algorithm; SIR model
ABSTRACT
At the time of writing, coronavirus disease 2019 (COVID-19) is seriously threatening human lives and health throughout the world. Many epidemic models have been developed to provide references for decision-making by governments and the World Health Organization. To capture and understand the characteristics of the epidemic trend, parameter optimization algorithms are needed to obtain model parameters. In this study, the authors propose using the Levenberg-Marquardt algorithm (LMA) to identify epidemic models. This algorithm combines the advantages of the Gauss-Newton method and the gradient descent method and improves the stability of the parameters. The authors selected four countries with relatively high numbers of confirmed cases to verify the advantages of the Levenberg-Marquardt algorithm over the traditional epidemiological model method. The results show that the Statistical-SIR (Statistical-Susceptible-Infected-Recovered) model using LMA can fit the actual curve of the epidemic well, whereas the epidemic simulated by the traditional model evolves too fast and its peak value is too high to reflect the real situation.
The novel and highly contagious coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), has spread worldwide since early December 2019, claiming 659 077 lives and infecting 16 670 063 people as of 29 July 2020 (COVID-19 Data Repository of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, 2020). Countries around the world have taken various control measures to mitigate the spread of the virus, including social distancing, wearing masks, and so on (Lewnard and Lo, 2020; Maier and Brockmann, 2020). However, the daily number of newly confirmed cases throughout the world has approached 300 000, which suggests that the containment measures have not been strictly enforced. Furthermore, the pandemic not only poses an imminent threat to people’s lives and health, but also raises very serious socioeconomic problems (Fernandes, 2020). If we fail to come together and adopt more scientific and strict control measures to deal with it, the pandemic will get worse. What is more worrying is that the epidemic may combine with other disasters, and possibly even coexist with humans in the long term, which would be an unbearable catastrophe for everyone (Kissler et al., 2020; Salas et al., 2020).
Besides social distancing, quarantine measures, and research into therapeutic drugs and vaccines, mathematical modeling is also crucial, because it can provide early warning to the public and help governments allocate medical resources to save more lives (Enserink and Kupferschmidt, 2020). Many traditional and generalized models have been used or developed to simulate the spread of COVID-19 and study the dynamics of epidemic transmission, such as the Susceptible-Infected-Recovered (SIR) model, the Susceptible-Exposed-Infectious-Removed (SEIR) model, and other modified models (Alvarez et al., 2020; Hou et al., 2020; Maier and Brockmann, 2020; Prem et al., 2020; Singh and Adhikari, 2020; Yang et al., 2020). Calafiore et al. (2020) proposed a modified SIR model that includes a proportionality factor relating reported confirmed cases to unreported cases. Peng et al. (2020) developed a generalized SEIR model to project the evolution of the epidemic in China. Davies et al. (2020) introduced age-dependent effects into the projection of the COVID-19 epidemic based on a modified SEIR model. Huang et al. (2020) developed a global prediction system for the COVID-19 pandemic (GPCP) that takes environmental and temperature factors into account, based on a statistical-dynamic method.
The real parameters of traditional epidemiological models, including the infection rate and withdrawal rate, are very difficult to derive, even using modern-day big data technology (Zhou et al., 2020). The predictions of traditional epidemic models tend to be on the high side. As with the statistical-dynamic forecast method in weather forecasting, using statistical methods to obtain epidemiological model parameters is an effective approach. Yu et al. (2014a, 2014b) developed an analogue-dynamical method by combining statistical and dynamical methods, which effectively improved the accuracy of medium- and extended-range forecasts. Zheng et al. (2013) filtered the poorly predictable components of a numerical model and used the analogue-dynamical approach to improve the forecast skill of the predictable components. Godio et al. (2020) proposed using a Particle Swarm Optimization solver to identify a generalized SEIR model, which improves the reliability of the parameters. Simha et al. (2020) developed a simple stochastic SIR model and optimized its parameters by simultaneously minimizing the square integral error, terminal error, and terminal rate error between officially reported data and model simulation data.
In this paper, we propose using a damped least-squares method called the Levenberg-Marquardt algorithm to identify the modified SIR model developed by Huang et al. (2020). The Levenberg-Marquardt algorithm interpolates between the Gauss-Newton least-squares method and the gradient descent method (Madsen et al., 2004).
The global epidemic data are from the COVID-19 Data Repository of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data) (Dong et al., 2020). The repository compiles these data from authorities around the world, such as the World Health Organization, the European Centre for Disease Prevention and Control, the US CDC, the COVID Tracking Project, the National Health Commission of the People’s Republic of China, and so on. We used the “csse_covid_19_time_series” data to obtain the cumulative and daily new numbers of confirmed cases, recovered cases, and deaths, which were then used to invert the parameters.
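The step from cumulative counts to daily new cases can be sketched as follows. This is a minimal illustration, not the authors' processing code: the sample table is synthetic and only mimics the assumed layout of the time-series files (four metadata columns followed by one cumulative-count column per date), the country names are invented, and clipping negative differences to zero is an assumption made here to handle downward data corrections.

```python
import csv
import io

# Synthetic sample in the assumed layout of the CSSE time-series files:
# one row per region, one column per date, cumulative counts.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/22/20,3/23/20,3/24/20,3/25/20
,Exampleland,0.0,0.0,10,15,15,27
North,Samplestan,1.0,1.0,3,4,6,6
South,Samplestan,2.0,2.0,1,1,2,5
"""

def cumulative_by_country(csv_text, country):
    """Sum the cumulative counts over all rows that belong to one country."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_dates = len(header) - 4              # the first four columns are metadata
    totals = [0] * n_dates
    for row in reader:
        if len(row) > 4 and row[1] == country:
            for k in range(n_dates):
                totals[k] += int(row[4 + k])
    return header[4:], totals

def daily_new(cumulative):
    """Day-to-day differences of a cumulative series (negative values clipped to 0)."""
    return [max(b - a, 0) for a, b in zip(cumulative, cumulative[1:])]

dates, cum = cumulative_by_country(SAMPLE_CSV, "Samplestan")
print(dates[1:], daily_new(cum))
```

The same differencing applies to the recovered and death series; their sum gives the removed compartment used by the model.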
The equations of the traditional SIR model are as follows:

$$\frac{dS}{dt} = -\frac{\beta I S}{N}, \tag{1}$$

$$\frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \tag{2}$$

$$\frac{dR}{dt} = \gamma I, \tag{3}$$

where $S$ is the number of susceptible cases, $I$ is the number of infected cases, $R$ is the number of recovered cases and deaths, and $N$ is the population size. $\beta$ is the number of cases infected by one infected case, and $\gamma$ is the probability of the infected cases withdrawing from the epidemiological system (mortality rate plus cure rate). We assumed that the population of each country remains unchanged during the epidemic, so that

$$N = I + S + R. \tag{4}$$

We then substitute this equation into Eq. (2) to obtain the modified model used in the Global Prediction System for COVID-19, which is described below:

$$\frac{dI}{dt} = (r\beta - \gamma) I - \varepsilon I (I + R), \tag{5}$$

where $\varepsilon = r\beta / N$, $r$ is the number of people who came into contact with an infected person, and $\beta$ here is the infection rate.

To identify these parameters, we used a damped least-squares method, the Levenberg-Marquardt algorithm, which is an improvement on the Gauss-Newton method shown below. Given a vector function $f(x, \alpha)$, we want to optimize the parameters $\alpha$ of $f(x, \alpha)$ by minimizing the sum of the squares of the deviations over the $m$ data points $(x_i, y_i)$:

$$E(\alpha) = \sum_{i=1}^{m} \left[ y_i - f(x_i, \alpha) \right]^2. \tag{6}$$

The minimization starts from an initial guess for $\alpha$, like $\alpha = (1, 1, 1, 1, \ldots, 1)$. Then, in each step of the iteration loop, the parameters $\alpha$ are replaced by $\alpha + \delta$, where $\delta$ is the increment. The functions $f(x_i, \alpha + \delta)$ are linearly approximated using the Jacobian matrix:

$$f(x_i, \alpha + \delta) \approx f(x_i, \alpha) + J_i \delta, \tag{7}$$

$$J_i = \frac{\partial f(x_i, \alpha)}{\partial \alpha}, \tag{8}$$

$$E(\alpha + \delta) \approx \sum_{i=1}^{m} \left[ y_i - f(x_i, \alpha) - J_i \delta \right]^2. \tag{9}$$

Or, in vector notation,

$$E(\alpha + \delta) \approx \left\| y - f(\alpha) - J\delta \right\|^2. \tag{10}$$

Taking the derivative of $E(\alpha + \delta)$ with respect to $\delta$ and setting the result to 0 gives

$$\left( J^{T} J \right) \delta = J^{T} \left[ y - f(\alpha) \right], \tag{11}$$

where $J$ is the Jacobian matrix, $J_i$ is the $i$th row of the Jacobian matrix, and the $i$th components of $f$ and $y$ are $f(x_i, \alpha)$ and $y_i$, respectively. This is a system of linear equations that can be solved for $\delta$. The above is the solution of the traditional least-squares method. Levenberg improved Eq. (11) by adding a damping coefficient $\lambda$:

$$\left( J^{T} J + \lambda M \right) \delta = J^{T} \left[ y - f(\alpha) \right], \tag{12}$$

where $M$ is the identity matrix. Fletcher (1971) found that if the damping coefficient $\lambda$ is too large, the curvature information in $J^{T} J$ is hardly used at all, so he scaled each component of the gradient according to the curvature of the function and replaced the identity matrix with the diagonal matrix $\mathrm{diag}(J^{T} J)$, which constitutes the final Levenberg-Marquardt algorithm:

$$\left[ J^{T} J + \lambda \, \mathrm{diag}\left( J^{T} J \right) \right] \delta = J^{T} \left[ y - f(\alpha) \right]. \tag{13}$$

In the model of the Global Prediction System for COVID-19, the optimization function is given by Eq. (14), where $T$ is the number of daily new confirmed cases, and $r$, $\beta$, and $\varepsilon$ are the parameters that need to be optimized. First, $R$ in Eq. (5) is determined using the forward difference method and then substituted into Eq. (14) for the optimal parameter inversion to obtain the parameters $r$, $\beta$, and $\varepsilon$.

The Levenberg-Marquardt algorithm introduces a damping coefficient into the traditional least-squares method in the calculation of the approximate Hessian matrix $J^{T} J$. For any damping coefficient greater than 0, the coefficient matrix is positive definite, which ensures that the increment $\delta$ points in a descent direction. This improves the robustness and parameter stability of the traditional algorithm. We define the modified SIR model of the GPCP identified with this algorithm as the Statistical-SIR model. This statistical model combines the actual observational data with the dynamic mechanism of the spread of the epidemic and fits the epidemic curve of each country to capture the characteristics of the epidemic’s development, thereby providing the possibility for accurate epidemic forecasts.
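The damped iteration described above can be sketched in a few lines of standard-library Python. Several things here are assumptions rather than the authors' implementation: the fitted function is taken to be the daily new cases written as `a*I - eps*I*(I + R)` with `a` standing for the product rβ and `eps` for ε (this follows from the definitions in the text, but the paper's own optimization function is not reproduced here); λ is divided by 10 after an accepted step and multiplied by 10 after a rejected one, a common heuristic not stated in the source; and the "observations" are synthetic, generated from a daily-step SIR run with hypothetical values.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda q: abs(m[q][c]))
        m[c], m[p] = m[p], m[c]
        for q in range(c + 1, n):
            f = m[q][c] / m[c][c]
            for k in range(c, n + 1):
                m[q][k] -= f * m[c][k]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x

def levenberg_marquardt(f, jac, xs, ys, alpha, lam=1e-3, max_iter=50, tol=1e-10):
    """Minimize sum_i (y_i - f(x_i, alpha))^2; returns (alpha, RMSE history)."""
    def sse(a):
        return sum((y - f(x, a)) ** 2 for x, y in zip(xs, ys))
    n = len(alpha)
    cost = sse(alpha)
    rmse = [math.sqrt(cost / len(ys))]
    for _ in range(max_iter):
        rows = [jac(x, alpha) for x in xs]                    # Jacobian J, one row per data point
        res = [y - f(x, alpha) for x, y in zip(xs, ys)]       # residual y - f(alpha)
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
        jtr = [sum(r[i] * e for r, e in zip(rows, res)) for i in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = J^T (y - f)
        lhs = [[jtj[i][j] + (lam * jtj[i][i] if i == j else 0.0) for j in range(n)]
               for i in range(n)]
        delta = solve(lhs, jtr)
        if all(abs(d) <= tol * (abs(a) + tol) for d, a in zip(delta, alpha)):
            break                                             # increment is negligible
        trial = [a + d for a, d in zip(alpha, delta)]
        trial_cost = sse(trial)
        if trial_cost < cost:      # accept the step and reduce the damping
            alpha, cost, lam = trial, trial_cost, lam / 10.0
        else:                      # reject the step and increase the damping
            lam *= 10.0
        rmse.append(math.sqrt(cost / len(ys)))
    return alpha, rmse

# Synthetic "observations" from a daily-step SIR run (all values hypothetical).
N, RB_TRUE, GAMMA = 1.0e6, 0.3, 0.1
i, r, xs, ys = 100.0, 0.0, [], []
for _ in range(80):
    new = RB_TRUE * i * (N - i - r) / N      # daily new cases
    xs.append((i, r))
    ys.append(new)
    i, r = i + new - GAMMA * i, r + GAMMA * i

def model(x, alpha):
    inf, rem = x
    a, eps = alpha                           # a = r*beta, eps = r*beta/N
    return a * inf - eps * inf * (inf + rem)

def model_jac(x, alpha):
    inf, rem = x
    return [inf, -inf * (inf + rem)]         # derivatives with respect to a and eps

alpha, rmse = levenberg_marquardt(model, model_jac, xs, ys, [1.0, 1.0])
print("a =", alpha[0], "eps * N =", alpha[1] * N, "iterations:", len(rmse) - 1)
```

Because the synthetic data are noise-free and generated by the same functional form, the fit should recover the generating values; with reported case counts the residual would not vanish and the result would depend more strongly on data quality and on the initial guess.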
The parameters of the traditional SIR model are β = 0.4 and γ = 0.15. The parameters of the Statistical-SIR model are presented in Table 1.
Fig. 1. Time series of daily new cases in four countries (Brazil, India, South Africa, Mexico) between 22 March and 18 July 2020. The simulation of the Statistical-SIR model is shown by the green line, the simulation of the traditional SIR model is shown by the blue line, and the reported cases are denoted by the red line.
Table 1 The parameters obtained from the Statistical-SIR model.
Fig. 2. (a) Differences in daily new cases in each country between the traditional SIR model and reported cases on 31 May 2020. (b) Differences in daily new cases in each country between the Statistical-SIR model and reported cases on 31 May 2020.
Fig. 3. RMSE (root-mean-square error) as a function of the number of iterations for the four countries.
On the one hand, the epidemic simulated by the traditional SIR model evolves much faster than the actual situation. The simulated peak in the four countries occurred between the end of May and early June, whereas in reality these countries were still in a rapid outbreak stage. Also, the projection shows that the epidemic will end within two months, which is seriously inconsistent with reality. On the other hand, the peak value of the traditional model is much higher than the reported data. Although the number of actual cases is considered to be higher than the number of reported cases (Calafiore et al., 2020; Hortaçsu et al., 2020; Liu et al., 2020), the epidemic development trend simulated by the Statistical-SIR model is basically in line with the facts, which can provide a reference for policymakers and medical resource allocation. In fact, limited by medical conditions and testing levels, the reported number of newly confirmed daily cases in India and Brazil cannot reach 200 000 and 300 000, respectively. However, the simulation of the Statistical-SIR model reflects well the actual outbreak trend of the epidemic in these four countries. The fitted curve is highly consistent with the actual evolution curve of the epidemic, and can therefore provide relatively accurate future trends.
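The rapid rise and collapse of a fixed-parameter SIR curve can be reproduced with a short simulation. The sketch below uses the values β = 0.4 and γ = 0.15 given in the text; the population size, the initial number of infected cases, and the daily explicit Euler step are hypothetical choices made here for illustration, so the peak day it prints should not be read as a reproduction of Fig. 1.

```python
def simulate_sir(beta, gamma, n, i0, days):
    """Daily-step (explicit Euler) integration of the traditional SIR equations."""
    s, i, r = n - i0, float(i0), 0.0
    daily_new = []
    for _ in range(days):
        new_inf = beta * i * s / n       # newly infected on this day (-dS/dt)
        removed = gamma * i              # recovered or dead on this day (dR/dt)
        s, i, r = s - new_inf, i + new_inf - removed, r + removed
        daily_new.append(new_inf)
    return daily_new, (s, i, r)

BETA, GAMMA = 0.4, 0.15                  # values quoted in the text
POPULATION, SEED = 1.0e8, 1.0e3          # hypothetical, for illustration only

curve, final = simulate_sir(BETA, GAMMA, POPULATION, SEED, days=200)
peak_day = max(range(len(curve)), key=curve.__getitem__)
print("peak day:", peak_day, "peak daily new cases:", round(curve[peak_day]))
print("fraction ever infected:", 1.0 - final[0] / POPULATION)
```

With constant β and γ the only brake on growth is the depletion of susceptibles, so the simulated outbreak peaks within weeks and then burns out, which is the behaviour the paragraph above contrasts with the reported data.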
The four countries are all developing and populous countries. Poor sanitary conditions and densely populated living environments provide an ideal environment for the spread of the virus (Chatterjee et al., 2020; Pequeno et al., 2020). The epidemic in these four countries is difficult to control effectively in the short term. Therefore, it is extremely urgent to use models to simulate the development trend of the epidemic so that limited medical resources can be allocated optimally to save more lives.
Fig. 2 shows the differences between each model (the traditional SIR model and the Statistical-SIR model) and the reported daily new cases in each country around the world on 31 May 2020. Most countries peaked on 31 May in the simulation of the traditional model, whereas in fact they were still in the outbreak phase. The differences in daily new cases between the traditional SIR model and reported cases in the United States, Brazil, China, Russia, India, and Southeast Asian countries are more than 60 000. Errors in European countries, Central African countries, and some South American countries are more than 30 000. Meanwhile, the errors of the Statistical-SIR model are much smaller than those of the traditional model, except for the United States and Brazil, which is due to the fluctuation of the real epidemic curve.
Using the Levenberg-Marquardt algorithm, the identification of the Statistical-SIR model converges quickly. The inversion of parameters for Brazil and India reached convergence after only two iterations, as shown in Fig. 3. For South Africa and Mexico, the RMSE also declined and reached convergence after the seventh and second iteration, respectively. In general, this algorithm converges after a few iterations, which saves computing resources and allows the model parameters to be inverted in a timely manner.
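The convergence measure can be made concrete with a small example. The sketch below computes the root-mean-square error between a simulated and a reported series of daily new cases and evaluates it for three successive iterates; all numbers are synthetic and serve only to show how an RMSE-versus-iteration curve is obtained.

```python
import math

def rmse(simulated, reported):
    """Root-mean-square error between two equally long series."""
    if len(simulated) != len(reported):
        raise ValueError("series must have the same length")
    total = sum((s - o) ** 2 for s, o in zip(simulated, reported))
    return math.sqrt(total / len(reported))

# Synthetic reported series and three hypothetical model iterates.
reported = [120.0, 150.0, 190.0, 240.0]
iterates = [
    [100.0, 100.0, 100.0, 100.0],   # initial guess
    [115.0, 148.0, 185.0, 230.0],   # after the first update
    [120.0, 151.0, 189.0, 241.0],   # after the second update
]
history = [rmse(sim, reported) for sim in iterates]
print(history)
```

Convergence in this sense means that further iterations no longer reduce the RMSE appreciably.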
The COVID-19 pandemic is posing a serious threat to the health systems of countries around the world (Sohrabi et al., 2020). Governments of all countries need to take reasonable measures that offer a balance between maintaining economic development and protecting people’s lives and property (McKibbin and Fernando, 2020). Therefore, before vaccines and specific therapeutic drugs are developed, it is important to use numerical models to predict the development of the epidemic.
This paper proposes a Statistical-SIR model identified with the Levenberg-Marquardt algorithm to simulate the epidemic in countries all over the world. The Levenberg-Marquardt algorithm is built on the Gauss-Newton algorithm by introducing a damping coefficient. We chose four countries where the epidemic is severe to compare the simulation results of the traditional SIR model and the Statistical-SIR model. These four countries are currently in the rapid outbreak stage. All the results of the traditional model for the four countries were seriously inconsistent with the actual situation. The main problems with the traditional model are that the simulated epidemic evolves too fast and the peak is overestimated. Compared to the traditional model, the Statistical-SIR model can obtain more accurate parameters to fit the actual development curve of the epidemic. The fitting results are highly consistent with the actual situation, especially in countries where the epidemic is rising. This simple and practical model can help people better understand the development of the epidemic and take better response measures. It can provide a qualitative judgment of the development trend of the epidemic and a deeper understanding of the dynamic mechanism of the spread of COVID-19.
However, due to the inaccuracy of the reported numbers, the infection and withdrawal rates obtained by the inversion may not be exactly the same as the real values (CDC COVID-19 Response Team, 2020). Additionally, the inversion result will be affected by the initial guess if it is too far away from the optimal value. Besides the Levenberg-Marquardt algorithm, there are many other parameter optimization algorithms that can be used to invert the parameters of epidemic models, for example, numerous machine learning algorithms and various improved nonlinear inversion algorithms (Hu et al., 2020). We need to develop the best possible epidemic model parameter optimization algorithm to obtain more accurate parameters so that we can better respond to the COVID-19 epidemic crisis and ultimately overcome it.
Funding
This work was jointly supported by the National Natural Science Foundation of China [grant number 41521004 ] and the Gansu Provincial Special Fund Project for Guiding Scientific and Technological Innovation and Development [grant number 2019ZX-06 ].
Acknowledgments
The authors acknowledge the Center for Systems Science and Engineering at Johns Hopkins University for providing the COVID-19 data.
Disclosure statement
No potential conflict of interest was reported by the authors.
Atmospheric and Oceanic Science Letters, 2021, Issue 4