Discovering spatiotemporal patterns of COVID-19 pandemic in South Korea

The novel severe acute respiratory syndrome coronavirus 2 emerged in December 2019, and it took only a few months for the WHO to declare COVID-19 a pandemic in March 2020. Discovering its complex spatial–temporal transmission mechanisms is very challenging. However, capturing the essential features of the regional-temporal patterns of COVID-19 is crucial for implementing prompt and effective prevention or mitigation interventions. In this work, we develop a novel framework of compatible window-wise dynamic mode decomposition (CwDMD) for nonlinear infectious disease dynamics. The compatible window is a selected representative subdomain of time series data, in which compatibility between spatial and temporal resolutions is established so that DMD can provide meaningful data analysis. A total of four compatible windows have been selected from COVID-19 time-series data from January 20, 2020, to May 10, 2021, in South Korea. The spatiotemporal patterns of these four windows are then analyzed. Several hot and cold spots and their spatial–temporal relationships were identified, and some hidden regional patterns were discovered. Our analysis reveals that the first wave was contained in the Daegu and Gyeongbuk areas, but the infection spread rapidly to the whole of South Korea after the second wave. Later on, the spatial distribution is seen to become more homogeneous after the third wave. Our analysis also identifies some patterns that are not related to regional relevance. These findings have then been analyzed and associated with the inter-regional and local characteristics of South Korea. Thus, the present study is expected to provide public health officials with helpful insights for future region- and time-specific mitigation plans.

A novel virus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified as the pathogen for the outbreak of COVID-19 in December 2019 [1].
Since then, the COVID-19 pandemic has posed huge challenges to public health officials all around the world. Owing to frequent international flights and human mobility, it took only a few months for COVID-19 to spread to more than 200 countries. Currently, many developed countries are in the process of vaccinating their citizens, and some countries hope to achieve herd immunity soon [2]. In fact, the majority of countries with higher proportions of vaccination have shown a significant reduction in the number of COVID-19 cases and deaths from March to June 2021.

Unfortunately, as of July 3, 2021, confirmed COVID-19 cases have increased worldwide due to the Delta variant [3]. This is one of the new variants of COVID-19 and is a potential threat to the goal of herd immunity. At this point, there are a total of more than 180 million confirmed cases and nearly 4 million deaths in 220 countries [4]. The US, India, and Brazil are the top three countries by officially reported cumulative cases and deaths: the US (33,709,176 cases; 605,524 deaths), India (30,502,362; 401,050), and Brazil (18,687,469; 521,952). These numbers are officially reported counts and may be considerable underestimates due to false negatives [5, 6], lack of tracking systems [7], and overloading of healthcare facilities [8]. Therefore, it is urgent to understand the spatial–temporal transmission dynamics of COVID-19 in order to propose effective interventions that mitigate and reduce further morbidity and mortality. COVID-19 has affected regions and social and economic groups disproportionately, even in developed countries [9, 10, 11]. South Korea shows a significant level of variability in the spatiotemporal patterns of COVID-19 as well. As of March 9, 2020, South Korea had a total of 7382 confirmed cases and the largest outbreak of COVID-19 outside China [12].
This was mainly due to a few super-spreading events at the Shincheonji Church in Daegu and the Daenam health care facility in Gyeongsang Province from February 20 to March 20, 2020. As of July 3, 2021, the total confirmed cases and deaths of COVID-19 in South Korea had increased to 159,342 and 2025, respectively. The spatial and temporal heterogeneity of COVID-19 has changed over time.

An in-depth understanding of COVID-19 requires mathematical modeling, which has played an essential role in explaining the complex spatial and temporal transmission dynamics of various infectious diseases. These include recently emerging infectious diseases: novel H1N1 influenza, SARS-CoV-1, Zika, MERS-CoV, and SARS-CoV-2 [13]. Recently emerging infectious diseases tend to spread all over the world on a shorter time scale due to dramatic increases in international flights and human mobility [13, 14]. There has been much research on spatial–temporal patterns of COVID-19 using various modeling approaches [9, 10, 15, 16]. One study investigated the spread of COVID-19 during an early stage of the pandemic in South Korea and identified and analyzed 12 significant spatiotemporal clusters [17]. Its authors observed that early interventions, including 3T (test, trace, treat), were effective, so that cluster size and duration shortened over time. Castro et al. investigated the spatial and temporal patterns of COVID-19 in Brazil and identified several key factors behind the failure of region-specific effective interventions [10]. Sartorius et al. employed a Bayesian hierarchical space-time SEIR model to assess the spatiotemporal variability of COVID-19 in England and found that mobility and social distancing played a critical role in the spatiotemporal patterns of mobility and mortality [18]. Wang et al. demonstrated the spatiotemporal characteristics and trends of COVID-19 in the United States and analyzed the various complex interactions with preventive efforts on COVID-19 [19]. Bag et al.
explored the spatiotemporal patterns of COVID-19 in India and examined the interplay between the space-specific patterns and governmental responses [20].

However, it is very challenging to discover spatial–temporal transmission mechanisms with the standard equation-based frameworks introduced above. In this work, we propose to uncover the highly complex spatial–temporal dynamics of COVID-19 transmission by employing a data-driven approach based on dynamic mode decomposition (DMD). DMD originated in the fluid dynamics community as a method to decompose complex flows into spatiotemporal coherent structures. It is a matrix-free, data-driven method capable of providing an accurate decomposition of a complex system into spatial–temporal coherent structures, and it may even be able to predict short-term future states. Since Schmid and Sesterhenn [21] first introduced the DMD algorithm and demonstrated its ability, a great deal of work has been done on DMD, and the method has become even more popular and is still in development today. Variants include sparsity-promoting DMD [22]; randomized DMD [23], which scales with the intrinsic rank of the dynamics; and consistent DMD, a new method for computing the DMD operator based on a variational framework [24]. DMD has been used successfully in computational epidemiology [25]. Bistrian et al. [26] proposed a framework for reduced-order modeling and forecasting of non-intrusive data with application to epidemiology, using a technique based on randomized DMD combined with ARIMA (AutoRegressive Integrated Moving Average) [27]. This technique has also been used to model SARS-CoV-2 dynamics from the raw data reported by the World Health Organization [28]. Proctor et al. [29] have demonstrated how DMD can aid in the analysis of spatial–temporal disease data.
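As a minimal sketch of the decomposition described above, and not the implementation used in this work, exact DMD can be written in a few lines of NumPy. The function names `dmd` and `reconstruct`, the truncation rank `r`, and the three-dimensional synthetic linear system are assumptions introduced for illustration; no COVID-19 data are used.

```python
import numpy as np

def dmd(snapshots, r):
    """Exact DMD of an (n_space, n_time) snapshot matrix, truncated to rank r."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]      # pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T      # rank-r truncation
    A_tilde = (U.conj().T @ Y @ V) / s              # reduced operator U* Y V S^{-1}
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V) / s) @ W                       # exact DMD modes
    # Coordinates of the first snapshot in the basis of DMD modes.
    amplitudes = np.linalg.lstsq(
        modes.astype(complex), snapshots[:, 0].astype(complex), rcond=None
    )[0]
    return eigvals, modes, amplitudes

def reconstruct(eigvals, modes, amplitudes, k):
    """Approximate the k-th snapshot from the DMD eigenvalues, modes, amplitudes."""
    return (modes @ (amplitudes * eigvals**k)).real

# Hypothetical toy data: a linear system x_{k+1} = A x_k with known eigenvalues.
P = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
A = P @ np.diag([0.95, 0.8, 0.5]) @ np.linalg.inv(P)
snapshots = np.empty((3, 20))
snapshots[:, 0] = [2.0, 2.0, 2.0]
for k in range(19):
    snapshots[:, k + 1] = A @ snapshots[:, k]

eigvals, modes, amplitudes = dmd(snapshots, r=3)
print(np.sort(eigvals.real))  # close to 0.5, 0.8, 0.95 for this linear toy system
```

Each eigenvalue gives the growth or decay rate and the oscillation frequency of its mode, and each column of `modes` gives the associated spatial pattern. For region-wise case counts, the rows of `snapshots` would correspond to regions and the columns to days or weeks.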
The study by Proctor et al. shows that DMD is an effective and efficient computational analysis tool for the study of infectious disease, using several test data sets: Google Flu Trends data, pre-vaccination measles in the UK, and paralytic poliomyelitis wild type-1 cases in Nigeria. We note, however, that Google Flu Trends data in particular has been shown to be influenced more by media clamor than by the true epidemiological burden [30, 31].

In this paper, we propose a compatible window-wise dynamic mode decomposition (CwDMD). The notable difference between our work and other available works is that we handle COVID-19 time series data in such a way that the data sets are made consistent in the sense of Tu et al. [32]. Basically, the compatible window is a selected data set that can be modeled by a linear operator, thereby making DMD analysis meaningful. Further, we show that consistency is equivalent to linearity and demonstrate that DMD in general produces misleading data interpretation for inconsistent or nonlinear data. This indicates that direct and reliable DMD analysis of large time-series data such as COVID-19 data is not feasible. We develop a strategy to choose an adequate set of representative subdomains, called windows, in which an appropriate balance or compatibility between spatial and temporal resolutions is built. The total size-times-duration of all the windows serving a given system depends only on local situations that can arise in the full time-series data. We then apply DMD to each window, which results in robust and reliable data analysis. It is easy to see that DMD analysis is adequate if the data is linear, while it is not for nonlinear data. Oftentimes such an inadequacy has been justified through Koopman mode analysis in the framework of Hankel DMD. However, it is well known that Hankel DMD is proven to work only for ergodic data [33, 34].
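Consistency in the sense of Tu et al. can be checked numerically. Snapshot matrices X and Y, holding the states at consecutive times, are linearly consistent when Xc = 0 implies Yc = 0, which holds exactly when Y = AX for the operator A given by Y times the pseudoinverse of X. The sketch below assumes NumPy and uses hypothetical helper names and toy systems rather than the windows of this paper. It evaluates the relative residual of that identity for a linear system and for the nonlinear logistic map.

```python
import numpy as np

def consistency_residual(X, Y):
    """Relative residual ||Y - (Y X^+) X||_F / ||Y||_F of the best linear fit."""
    A = Y @ np.linalg.pinv(X)          # least-squares linear operator
    return np.linalg.norm(Y - A @ X) / np.linalg.norm(Y)

def snapshot_pairs(step, x0, n_steps):
    """Iterate x_{k+1} = step(x_k) and return the shifted matrices (X, Y)."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    data = np.column_stack(states)     # shape (n_space, n_steps + 1)
    return data[:, :-1], data[:, 1:]

# Toy linear system (assumed for illustration): consistent by construction.
A_true = np.array([[0.9, 0.2], [0.0, 0.7]])
X_lin, Y_lin = snapshot_pairs(lambda x: A_true @ x, [1.0, 1.0], 30)

# Toy nonlinear system (assumed for illustration): the chaotic logistic map.
X_non, Y_non = snapshot_pairs(lambda x: 3.7 * x * (1.0 - x), [0.2], 30)

res_linear = consistency_residual(X_lin, Y_lin)
res_nonlinear = consistency_residual(X_non, Y_non)
print(res_linear, res_nonlinear)
```

For the linear system the residual is at the level of rounding error, while for the logistic map it is far from zero, so no single linear operator reproduces those snapshots and a DMD fit to them would be misleading.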
Frameworks based on Koopman mode analysis and Hankel DMD, therefore, cannot be applied in general to highly nonlinear data. Such data include the internal solitary waves discussed in [35] as well as the COVID-19 data analyzed in the present paper, which are not necessarily ergodic. Notably, a recent work by Zhang et al. [35] is closely related to our method. However, their work is not based on compatible windows, i.e., their windows are constructed without respecting consistency. Phase studies are not carried out there either, unlike in the present paper. Furthermore, starting from the consistency assumption, we make significant and novel progress: the data in any given window can be fitted accurately merely by finding the coordinates of any single data snapshot within the window in terms of the DMD modes. This allows us to achieve a significant computational reduction. The identified coordinates are then used as a scale for selecting important DMD modes.

Our new method is used to investigate the spatiotemporal patterns of COVID-19 in South Korea from January 20, 2020 to May 10, 2021. A total of four compatible windows have been selected from the given COVID-19 time series data. The spatiotemporal patterns of these four windows are then analyzed using a few important DMD modes selected based on our new criterion. Several hot and cold spots and their spatial–temporal relationships were identified, and some hidden regional patterns were discovered. Our analysis reveals that the first wave was contained in the Daegu and Gyeongbuk area, but the infection spread rapidly to the whole of South Korea after the second wave. Later on, the spatial distribution is seen to become more homogeneous after the third wave. These findings have then been associated with the inter-regional and local characteristics of South Korea.
We expect that the present study can provide public health officials with helpful insights for future region- and time-specific mitigation plans.

In this section, we present an overview of COVID-19 data collected in South Korea (see Fig. 1 for more description). Daily confirmed cases and deaths of COVID-19 from January 20, 2020 to May 10, 2021 were obtained from the Korea Centers for Disease Control and Prevention (KCDC) and each provincial website [12]. As of May 10, 2021, there were a total of 127,772 confirmed COVID-19 cases and 1875 deaths in South Korea. To analyze the spatiotemporal patterns of COVID-19, the spatial distribution of confirmed cases is resolved into the 17 first-tier administrative divisions of South Korea. Figure 1 shows a map of South Korea (a) with the spatial distributions of the cumulative number of confirmed COVID-19 cases (b) and the cumulative number of COVID-19 deaths (c). As displayed in Fig. 1b–d, South Korea shows a high level of spatial and temporal heterogeneity across the 17 regions. The temporal pattern in South Korea can be divided into four stages, namely three big waves and a final stage. More precisely, the first window is from January 20, 2020 to April 26, 2020, the second window is from July 28, 2020 to October 12, 2020, the third window is from November 3, 2020 to February 1, 2021, and the period after the third wave is from February 2, 2021 to May 10, 2021. These are chosen as the four windows and are represented by different colors in Fig. 2a.

Figure 1. Spatial distribution of the cumulative confirmed cases and deaths of COVID-19 as of May 10, 2021. (a) A map of South Korea. South Korea is divided into 17 first-tier administrative divisions: 7 metropolitan cities (Seoul, Busan, Daegu, Incheon, Gwangju, Daejeon, and Ulsan), 1 special self-governing city (Sejong), and 9 provinces. The metropolitan area refers to Seoul, Incheon, and Gyeonggi. (b) Cumulative confirmed cases.
(c) Cumulative deaths. (d) Geographical descriptions of each region, such as population, area, and population density, together with COVID-19 profiles. Population density is extremely polarized between metropolitan cities and non-metropolitan areas, except Gyeonggi. The total population of the three regions of the metropolitan area is about 26 million as of May 2021, which is more than 50% of the South Korean population.

Figure 2. Time series of the COVID-19 outbreak in South Korea. (a) Daily incidence of COVID-19 in South Korea. South Korea went through three big waves; after the third wave, the incidence has been maintained with no significant increase or decrease. The four windows of main interest are colored and given as: (1) the first wave (January 20, 2020–April 26, 2020); (2) the second wave (July 28, 2020–October 12, 2020); (3) the third wave (November 3, 2020–February 1, 2021); and (4) after the third wave (February 2, 2021–May 10, 2021). (b) Weekly incidence and cumulative cases in 17 r
https://www.nature.com/articles/s41598-021-03487-2