
Severe acute respiratory syndrome coronavirus 2 is the causative agent of the ongoing coronavirus disease pandemic. Initial estimates of the early dynamics of the outbreak in Wuhan, China, suggested a doubling time of the number of infected persons of 6–7 days and a basic reproductive number (R_{0}) of 2.2–2.7. We collected extensive individual case reports across China and estimated key epidemiologic parameters, including the incubation period (4.2 days). We then designed 2 mathematical modeling approaches to infer the outbreak dynamics in Wuhan by using high-resolution domestic travel and infection data. Results show that the doubling time early in the epidemic in Wuhan was 2.3–3.3 days. Assuming a serial interval of 6–9 days, we calculated a median R_{0} value of 5.7 (95% CI 3.8–8.9). We further show that active surveillance, contact tracing, quarantine, and early strong social distancing efforts are needed to stop transmission of the virus.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiologic agent of the current rapidly growing outbreak of coronavirus disease (COVID-19), originating from the city of Wuhan, Hubei Province, China.

Studying the dynamics of a newly emerged and rapidly growing infectious disease outbreak, such as COVID-19, is important but challenging because of the limited amount of data available. In addition, the unavailability of diagnostic reagents early in the outbreak, changes in surveillance intensity and case definitions, and overwhelmed healthcare systems confound data-based estimates of the outbreak's growth. Initial estimates of the exponential growth rate of the outbreak were 0.1–0.14/day (a doubling time of 6–7 days), and estimates of the basic reproductive number (R_{0}; defined as the average number of secondary cases attributable to infection by an index case after that case is introduced into a susceptible population) ranged from 2.2 to 2.7.

We collected an expanded set of case reports across China from publicly available information, estimated key epidemiologic parameters, and provided a new estimate of the early epidemic growth rate and R_{0} in Wuhan. Our approaches integrate high-resolution domestic travel data with early infection data reported in provinces other than Hubei to infer outbreak dynamics in Wuhan, and they are designed to be less sensitive to biases and confounding factors in the data and model assumptions. Because we do not directly use case confirmation data from Wuhan, we avoid potential biases in reporting and case confirmation there. In addition, because of the high level of domestic travel before the Lunar New Year in China, inference based on these data reduces uncertainty and the risk of misspecification and bias in data and model assumptions.

We developed 2 modeling approaches to infer the growth rate of the outbreak in Wuhan from data from provinces other than Hubei. In the first model, the first arrival model, we computed the likelihood of the arrival times of the first known cases in provinces outside Hubei as a function of the exponentially growing population of infected persons in Wuhan before late January. This calculation used domestic travel data to compute the probability that an infected person traveled from Wuhan to a given province, which depends on the unknown actual number of infected persons in Wuhan and on the probability that they traveled. The timing of the first arrivals of infected persons in different provinces therefore reflects the rate of epidemic growth in Wuhan.
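A minimal sketch of a first arrival likelihood follows, written in Python under simplifying assumptions of our own rather than the authors' exact formulation: infected persons in Wuhan grow deterministically as exp(r(t − t0)), normalized to 1 at t0; each traveler is an independent draw from a Wuhan population of fixed size, so is infected with probability I(t)/population (the shrinking Wuhan population is ignored); and provinces are independent. For one province, the likelihood is the probability that no infected traveler arrived on any earlier day times the probability that at least one arrived on the observed day. The function names, parameter names, and toy travel numbers are hypothetical.

```python
import math


def first_arrival_loglik(arrival_day, travelers, r, t0, population):
    """Log-likelihood that the first infected traveler reaches one province on arrival_day.

    travelers[t] is the number of travelers from Wuhan to the province on day t
    (hypothetical input); population is Wuhan's population, held fixed for simplicity.
    """
    loglik = 0.0
    for t in range(arrival_day + 1):
        # Deterministic exponential growth normalized so that I(t0) = 1
        infected = math.exp(r * (t - t0))
        p_inf = min(infected / population, 1.0 - 1e-12)
        # log P(no infected person among today's travelers) = n * log(1 - p)
        log_none = travelers[t] * math.log1p(-p_inf)
        if t < arrival_day:
            loglik += log_none  # no infected arrival before the observed day
        else:
            p_some = -math.expm1(log_none)  # 1 - exp(log_none), numerically stable
            if p_some <= 0.0:
                return float("-inf")
            loglik += math.log(p_some)
    return loglik


def total_loglik(arrivals, travel, r, t0, population):
    """Sum over provinces, assuming their first arrivals are independent."""
    return sum(
        first_arrival_loglik(arrivals[k], travel[k], r, t0, population)
        for k in arrivals
    )


# Hypothetical toy inputs: two provinces, 1,000 travelers per day each
travel = {"A": [1000] * 40, "B": [1000] * 40}
arrivals = {"A": 30, "B": 32}
for r in (0.1, 0.2, 0.3):
    print(r, round(total_loglik(arrivals, travel, r, t0=0, population=1e7), 2))
```

Maximizing the summed log-likelihood over a grid of r and t0 values would give point estimates; with real data, travelers[t] would come from daily travel estimates such as the Baidu Migration data described in the text.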

In the second model, the case count model, we accounted for the detection of additional persons who were infected in Wuhan and received their diagnoses in other provinces, and we modeled those persons explicitly by using a hybrid deterministic–stochastic SEIR (susceptible-exposed-infectious-recovered) model. We then fitted this model to daily counts of new cases reported outside Hubei Province during the period before substantial transmission occurred outside the province.

By using data collected outside Hubei Province, we minimized the effect of changes in surveillance intensity. By the time cases were confirmed in provinces outside Hubei, all provinces of China had access to diagnostic kits and were actively surveilling travelers from Wuhan (e.g., using temperature detectors and digital data to identify infected persons).

We collected publicly available reports of 140 confirmed COVID-19 cases (mostly outside Hubei Province). These reports were published by the Chinese Center for Disease Control and Prevention (China CDC) and provincial health commissions; accession dates were January 15–30, 2020.

We used the Baidu Migration server (

We considered realistic distributions for the latent and infectious periods to calculate R_{0}. We describe the methods used to calculate R_{0} and to assess the effect of intervention strategies on the outbreak.

We first translated documents and news reports published daily on the China CDC website and on the official websites of health commissions across provinces and special municipalities in China during January 15–30, 2020. Altogether, we collected 137 individual case reports from China and 3 additional case reports from outside China.

By using this dataset, we estimated the distributions of basic parameters: the durations from initial exposure to symptom onset, from symptom onset to hospitalization, and from hospitalization to discharge or death. Our estimate of the time from initial exposure to symptom onset (i.e., the incubation period) is 4.2 days (95% CI 3.5–5.1 days).

Epidemiologic characteristics of early dynamics of the coronavirus disease outbreak in China. Distributions of key epidemiologic parameters: durations from infection to symptom onset (A), from symptom onset to hospitalization (B), from hospitalization to discharge (C), and from hospitalization to death (D).

The time from symptom onset to hospitalization showed evidence of time dependence.

Moving from empirical estimates of basic epidemiologic parameters to an understanding of the early growth rates of COVID-19 requires model-based inference and data. We first collected real-time travel data during the epidemic by using the Baidu Migration server, which provides real-time travel patterns in China based on mobile-phone positioning services.

Extremely high level of travel from Wuhan, Hubei Province, to other provinces during January 2020, as estimated by using high-resolution and real-time travel data, China. A) A modified snapshot of the Baidu Migration online server interface showing the human migration pattern out of Wuhan (red dot) on January 19, 2020. Thickness of curved white lines denotes the size of the traveler population to each province. The names of most of the provinces are shown in white. B) Estimated daily population sizes of travelers from Wuhan to other provinces.

We then integrated spatiotemporal domestic travel data to infer the outbreak dynamics in Wuhan by using 2 mathematical approaches. In both, the number of infected persons in Wuhan grows exponentially as I^{*}(t) = e^{r(t − t_{0})}, where r is the exponential growth rate and t_{0} is the theoretical time of the exponential growth initiation, so that I^{*}(t_{0}) = 1 in the deterministic model. We call t_{0} a “theoretical” time in the sense that it should not be interpreted as the time of the first infection in a population. We should expect t_{0} to be later than the date of the first infection, because multiple spillover events from the animal reservoir might be needed to establish sustained transmission and stochasticity might play a large role in the initial dynamics before the onset of exponential growth.

Estimates of the exponential growth rate and the date of exponential growth initiation of the 2019 novel coronavirus disease outbreak in China based on 2 different approaches. A) Schematic illustrating the export of infected persons from Wuhan. Travelers (dots) are assumed to be random samples from the total population (whole pie). Because the infected population (orange pie) grows and the total population in Wuhan shrinks over time, the probability that infected persons travel to other provinces (orange dots) increases. B) The dates of documented first arrivals of infected persons in 26 provinces. C) Best fit of the case count model to daily counts of new cases (including only imported cases) in provinces other than Hubei. Error bars indicate SDs.

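We used travel data for each of the provinces to evaluate the likelihood of the observed first arrival times as a function of the exponential growth rate and the time of exponential growth initiation, t_{0}. In this first arrival approach, t_{0} is estimated to be December 20, 2019 (95% CI December 11–26). As we show later, the estimate of t_{0} is subject to larger uncertainty.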

We further estimated that the total infected population size in Wuhan was ≈4,100 (95% CI 2,423–6,178) on January 18.

An alternative model, the case count approach, used daily counts of new cases among persons who received a COVID-19 diagnosis in other provinces but had been in Hubei Province within 14 days before becoming symptomatic. This model uses data beyond the first appearance of an infected person from Wuhan and also accounts for the stochastic nature of the process by using a hybrid model: the infected population in Wuhan was described with a deterministic model, whereas the infected persons who traveled from Wuhan to other provinces were tracked with a stochastic SEIR model. In this approach, the estimated time of exponential growth initiation (t_{0}) is December 16, 2019 (95% CI December 12–21), and the exponential growth rate is 0.30/day (95% CI 0.26–0.34/day). These estimates are consistent with those of the first arrival approach.
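The hybrid structure can be illustrated with a small forward simulation. This is a hedged sketch, not the authors' model: it assumes the Wuhan infected population follows exp(r(t − t0)) deterministically, that each traveler is independently infected with probability I(t)/population, that exported persons start in the exposed state and develop symptoms after an exponentially distributed incubation period (the 4.2-day mean in the usage line borrows the incubation estimate from the text, but the exponential shape is our assumption), and that symptom onset is treated as detection. Later SEIR stages are omitted, and all names are hypothetical. Fitting would compare simulated daily counts with reported imported cases.

```python
import math
import random


def simulate_imported_onsets(r, t0, travelers, population, incubation_mean, seed=0):
    """Daily counts of symptom onsets among persons exported from Wuhan.

    Deterministic part: infected persons in Wuhan grow as exp(r * (t - t0)).
    Stochastic part: each traveler is infected with probability I(t) / population,
    arrives in the exposed (E) state, and leaves E with a constant daily probability.
    Onset is treated as detection, and detected persons are removed.
    """
    rng = random.Random(seed)
    p_onset = 1.0 - math.exp(-1.0 / incubation_mean)  # daily exit probability from E
    exposed = 0
    onsets = []
    for t, n_travelers in enumerate(travelers):
        p_inf = min(math.exp(r * (t - t0)) / population, 1.0)
        # Binomial draws written as Bernoulli sums to stay in the standard library
        exported = sum(rng.random() < p_inf for _ in range(n_travelers))
        exposed += exported
        new_onsets = sum(rng.random() < p_onset for _ in range(exposed))
        exposed -= new_onsets
        onsets.append(new_onsets)
    return onsets


# Hypothetical toy run: 5,000 travelers/day for 35 days
counts = simulate_imported_onsets(r=0.3, t0=0, travelers=[5000] * 35,
                                  population=1e7, incubation_mean=4.2, seed=1)
print(counts)
```

Keeping Wuhan deterministic while exported persons are stochastic reflects the idea in the text: the source population is large enough for averages to hold, whereas the few exported cases early on are subject to chance.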

Marginalized likelihoods of growth rate.

In both models, we assumed perfect detection of infected cases outside Hubei Province. However, a certain fraction of cases probably was not reported. To investigate the robustness of our estimates, we performed extensive sensitivity analyses to test 23 different scenarios of surveillance intensity. In some of these scenarios, t_{0} would be earlier than our estimate, but the estimated growth rate remained the same.
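Why undetected cases can move t0 without changing the growth rate can be seen with a back-of-the-envelope argument, assuming a deterministic exponential curve and a detection probability p that stays constant over time (our simplifying assumption; it does not reproduce the 23 scenarios). Observed counts p·exp(r(t − t0)) equal exp(r(t − t0')) with t0' = t0 + ln(1/p)/r, so the log-slope r is unchanged and a fit that assumes perfect detection places the start time ln(1/p)/r days too late. The function names and values below are hypothetical.

```python
import math


def t0_shift(p_detect, r):
    """Days by which fitting with perfect detection places t0 too late,
    if only a constant fraction p_detect of infections is observed."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    return math.log(1.0 / p_detect) / r


def log_slope(curve, t1, t2):
    """Growth rate recovered from two points of a log-linear curve."""
    return (math.log(curve(t2)) - math.log(curve(t1))) / (t2 - t1)


r, t0, p = 0.3, 0.0, 0.25  # hypothetical values


def true_curve(t):
    return math.exp(r * (t - t0))


def observed_curve(t):
    return p * true_curve(t)  # constant fraction detected


print(log_slope(true_curve, 10, 20), log_slope(observed_curve, 10, 20))  # equal slopes
print(round(t0_shift(p, r), 2))  # true t0 lies this many days earlier
```

If detection instead improved over time, the observed curve would rise faster than the true one, which is one reason surveillance scenarios can also affect the growth rate estimate.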

In addition to using 2 modeling approaches, we looked for other evidence of a high outbreak growth rate to cross-validate our estimates. We found that the time series of reported deaths in Hubei, which is less subject to the biases of confirmed case counts, is not consistent with a growth rate of 0.1/day.

Overall, these analyses suggest that although uncertainties exist depending on the level of surveillance, the exponential growth rate of the outbreak is probably 0.21–0.30/day. This estimate is much higher than those in previous reports, in which the growth rate was estimated to be 0.1–0.14/day.
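For exponential growth at rate r per day, the doubling time is ln 2 / r. The short Python sketch below (function name hypothetical) converts the 0.21–0.30/day range into the 2.3–3.3-day doubling times quoted in the Discussion.

```python
import math


def doubling_time(r):
    """Doubling time (days) for exponential growth at rate r (per day): ln 2 / r."""
    return math.log(2.0) / r


for r in (0.21, 0.30):
    print(f"r = {r:.2f}/day -> doubling time {doubling_time(r):.1f} days")
```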

The basic reproductive number, R_{0}, depends on the exponential growth rate of an outbreak, as well as on additional factors such as the latent period (the time from infection to infectiousness) and the infectious period. With the growth rates estimated here, R_{0} is in general high, and the longer the latent and infectious periods, the higher the estimated R_{0}.
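The dependence of R_{0} on the growth rate and on the latent and infectious periods can be made concrete with a standard relation for SEIR models whose latent and infectious periods are Erlang distributed with m and n stages (means L and D): R_{0} = (1 + rL/m)^{m} · rD / (1 − (1 + rD/n)^{−n}). With exponentially distributed periods (m = n = 1) this reduces to (1 + rL)(1 + rD). This is offered as an illustration under those distributional assumptions, not as the exact calculation behind the estimates reported here; the function name and example values are hypothetical.

```python
def r0_from_growth(r, latent_mean, infectious_mean, m=1, n=1):
    """R0 implied by growth rate r (per day) for an SEIR model whose latent and
    infectious periods are Erlang-distributed with m and n stages."""
    latent_factor = (1.0 + r * latent_mean / m) ** m
    x = r * infectious_mean
    infectious_factor = x / (1.0 - (1.0 + x / n) ** (-n))
    return latent_factor * infectious_factor


# Hypothetical period choices at r = 0.3/day, exponential periods (m = n = 1)
for latent, infectious in [(2.2, 4.0), (3.0, 4.0), (4.0, 4.0)]:
    print(latent, infectious, round(r0_from_growth(0.3, latent, infectious), 2))
```

Lengthening either period at a fixed growth rate raises R_{0}, which is the qualitative behavior described above.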

To derive realistic values of R_{0}, we used previous estimates of serial intervals for COVID-19. The serial interval is estimated to be ≈7–8 days based on data collected early in the outbreak in Wuhan. Using serial intervals in this range, we estimated R_{0} to be 5.8 (95% CI 4.4–7.7). When we integrated over a broader range of serial intervals (6–9 days), the median R_{0} is 5.7 (95% CI 3.8–8.9). R_{0} can be lower if the serial interval is shorter. However, recent studies reported that persons can be infectious for a long period, such as 1–3 weeks after symptom onset.

Estimation of the basic reproductive number (R_{0}), derived by integrating uncertainties in parameter values, during the coronavirus disease outbreak in China. A) Changes in R_{0} based on different growth rates and serial intervals. Each dot represents a calculation with a given mean latent period (range 2.2–6 days) and mean infectious period (range 4–14 days). Only estimates falling within the range of serial intervals of interest were plotted. B) Histogram summarizing the estimated R_{0} of all dots in panel A (i.e., serial intervals of 6–9 days). The median R_{0} is 5.7 (95% CI 3.8–8.9).

The R_{0} value also determines the herd immunity threshold, 1 − 1/R_{0}, the fraction of the population that must be immune to stop transmission. At R_{0} = 2.2, this threshold is only 55%. But at R_{0} = 5.7, it rises to 82% (i.e., >82% of the population has to be immune, through either vaccination or prior infection, to achieve herd immunity and stop transmission).
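The threshold arithmetic is simple enough to check directly. This Python sketch (function name hypothetical) evaluates 1 − 1/R_{0} for the 2 values discussed and reproduces the 55% and 82% figures.

```python
def herd_immunity_threshold(r0):
    """Fraction of the population that must be immune to stop transmission: 1 - 1/R0."""
    return 1.0 - 1.0 / r0


for r0 in (2.2, 5.7):
    print(f"R0 = {r0}: threshold {herd_immunity_threshold(r0):.0%}")
```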

We then evaluated the effectiveness of nonpharmaceutical interventions, such as contact tracing, quarantine, and social distancing, by using the framework of Lipsitch et al.
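To illustrate how combined measures trade off against each other, the sketch below uses a deliberately simpler model than the Lipsitch et al. framework: it assumes each measure independently removes a fixed fraction of transmission, so R = R_{0}(1 − c)(1 − s), where c is the fraction removed by contact tracing and quarantine and s the fraction removed by social distancing. Both the multiplicative form and the function names are hypothetical; the sketch only shows why a higher R_{0} demands stronger combined efforts to bring R below 1.

```python
def effective_r(r0, tracing_reduction, distancing_reduction):
    """Hypothetical multiplicative model: each measure independently removes a
    fraction of transmission."""
    return r0 * (1.0 - tracing_reduction) * (1.0 - distancing_reduction)


def min_distancing(r0, tracing_reduction):
    """Smallest distancing reduction that brings R to 1 given a tracing reduction."""
    remaining = r0 * (1.0 - tracing_reduction)
    if remaining <= 1.0:
        return 0.0  # tracing and quarantine alone already suffice
    return 1.0 - 1.0 / remaining


for tracing in (0.0, 0.3, 0.6):
    print(f"tracing removes {tracing:.0%}: distancing must remove "
          f">{min_distancing(5.7, tracing):.0%} of transmission")
```

With no tracing, the required distancing reduction equals the herd immunity threshold 1 − 1/R_{0}, since both describe the fraction of transmission that must be blocked.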

Levels of minimum efforts of intervention strategies needed to control the spread of severe acute respiratory syndrome coronavirus 2 (i.e., to reduce the reproductive number to <1) during the coronavirus disease outbreak in China. Strategies considered were quarantine of infected persons and persons who had contact with them.

In this study, we estimated several basic epidemiologic parameters, including the incubation period (4.2 days), a time-dependent duration from symptom onset to hospitalization (changing from 5.5 days in early January to 1.5 days in late January outside Hubei Province), and the time from symptom onset to death (16.1 days). By using 2 distinct approaches, we estimated the growth rate of the early outbreak in Wuhan to be 0.21–0.30/day (a doubling time of 2.3–3.3 days), suggesting a much faster rate of spread than initially measured. This finding has important implications for forecasting epidemic trajectories and the effect on healthcare systems, as well as for evaluating the effectiveness of intervention strategies.

We found that R_{0} is likely to be 5.7 given our current state of knowledge, with a broad 95% CI (3.8–8.9). Among many factors, the lack of awareness of this new pathogen and the Lunar New Year travel and gatherings in early and mid-January 2020 might have played a role in the high R_{0}. A recent study based on structural analysis of the virus particles suggests that SARS-CoV-2 has a much higher affinity for the receptor needed for cell entry than the 2003 SARS virus.

How contagious SARS-CoV-2 is in other countries remains to be seen. Given the rapid spread seen in current outbreaks in Europe, we need to be aware of the difficulty of controlling SARS-CoV-2 once it establishes sustained human-to-human transmission in a new population.

Additional data for study of high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2.

Additional methods used for study of high contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2.

These first authors contributed equally to this article.

We thank Alan Perelson, Christiaan van Dorp, and Ruy Ribeiro for suggestions and critical reading of the manuscript and Weili Yin for help with collecting and translating documents from provincial health commission websites.

S.S. and R.K. received funding from the Defense Advanced Research Projects Agency (grant no. HR0011938513) and the Laboratory Directed Research and Development Rapid Response Program through the Center for Nonlinear Studies at Los Alamos National Laboratory. C.X. received funding from the Laboratory Directed Research and Development Program. E.R.S. received funding from the National Institutes of Health (grant no. R01AI135946). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions: R.K. and N.H. conceived the project; R.K. collected data; S.S., Y.T.L., C.X., and R.K. performed analyses; S.S., Y.T.L., E.R.S., N.H., and R.K. wrote and edited the manuscript.

The authors declare no competing interests. All data are available in the main text and in Appendices 1 and 2.

Dr. Sanche is a postdoctoral research associate at Los Alamos National Laboratory, Los Alamos, New Mexico, USA. His primary research interest lies in complex disease dynamics inferred from data science and mathematical modeling. Dr. Lin is also a postdoctoral research associate at Los Alamos National Laboratory. His primary research interest lies in applied stochastic processes, biological physics, statistical inference, and computational system biology.