Tensor decomposition for infectious disease incidence data

Abstract Many demographic and ecological processes generate seasonal and other periodicities. Seasonality in infectious disease transmission can result from climatic forces such as temperature and humidity; variation in contact rates as a result of migration or school calendar; or temporary surges in birth rates. Seasonal drivers of acute immunizing infections can also drive longer‐term fluctuations. Tensor decomposition has been used in many disciplines to uncover dominant trends in multi‐dimensional data. We introduce tensors as a novel method for decomposing oscillatory infectious disease time series. We illustrate the reliability of the method by applying it to simulated data. We then present decompositions of measles data from England and Wales. This paper leverages simulations as well as much‐studied data to illustrate the power of tensor decomposition to uncover dominant epidemic signals as well as variation in space and time. We then use tensor decomposition to uncover new findings and demonstrate the potential power of the method for disease incidence data. In particular, we are able to distinguish between annual and biennial signals across locations and shifts in these signals over time. Tensor decomposition is able to isolate variation in disease seasonality as a result of variation in demographic rates. The method allows us to discern variation in the strength of such signals by space and population size. Tensors provide an opportunity for a concise approach to uncovering heterogeneity in disease transmission across space and time in large datasets.

of other forcing mechanisms (Dorelien et al., 2013). Where contact rates vary seasonally (e.g. as a result of school terms), these seasonal forces thus dominate any seasonality in births (Dorelien et al., 2013).
In the case of measles in England and Wales (E&W), and many other contexts, the predominant transmission forcing is via contact in schools (Dorelien et al., 2013;Finkenstadt & Grenfell, 2000;Grenfell et al., 2002). Effectively deterministic measles dynamics occur in large populations, where susceptible replenishment is substantial enough that infections will wane but never disappear altogether; the threshold for this epidemic equilibrium is approximately 300,000 in Europe and North America (Bartlett, 1957;Dorelien et al., 2013;Grenfell et al., 2002). In small populations, the pathogen will go extinct until it is reintroduced via imports; imported infections can trigger a large epidemic when susceptible numbers have built up sufficiently. While large locations have outbreaks at the same time every year or every other year, small towns experience more stochastic, violent outbreaks (Bartlett, 1957;Bharti, Xia, Bjornstad, & Grenfell, 2008;Bolker & Grenfell, 1995;Grenfell, Bjørnstad, & Kappey, 2001).
Wavelet analysis is used frequently to explore nonstationary cyclicality and heterogeneity in ecological time series (Bjornstad, Peltonen, Liebhold, & Baltensweiler, 2002;Cazelles et al., 2008;Grenfell et al., 2001;Grinsted, Moore, & Jevrejeva, 2004). For measles in particular, wavelets have been used to determine seasonal and longer period outbreaks as well as spatial variation in the lag between epidemics (Grenfell et al., 2001). Local dynamics of measles transmission in E&W are generally composed of seasonal (driven by the school term) and long-term cycles usually annual or biennial driven by susceptible replenishment. The strength of these longer-term cycles can vary between places and across time.
Wavelets provide very detailed data on seasonal signals of individual localities; in this paper, we present tensor decomposition as a method for characterizing the wavelet spectra of many places at once.
Tensor decomposition is a multi-dimensional generalization of matrix decomposition methods such as principal components analysis (PCA) or singular value decomposition (SVD). Analogous to these matrix methods, tensor decomposition reduces the dimensionality of the data by providing lower-dimensional components which describe much of the variance in the data (Cichocki et al., 2015;Kolda & Bader, 2009;Rabanser, Shchur, & Günnemann, 2017;Sidiropoulos et al., 2017).
Tensor decomposition has been used successfully in a number of fields to uncover trends in large, multi-dimensional datasets.
The method has been used in neuroscience (Cong et al., 2012(Cong et al., , 2015Lee, Kim, Cichocki, & Choi, 2007;Vanderperren et al., 2013), text analysis (Acar, Camtepe, Krishnamoorthy, & Yener, 2005;Ifada, 2014;Zheng, Ding, Lin, & Chen, 2016) and photogrammetry (Guo, Huang, Zhang, & Zhang, 2013). In neuroscience, tensors have been a useful method for feature extraction and pattern detection in electroencephalography (EEG) signals. Tensor decomposition has successfully isolated task-related brain activity from the mixture of unrelated brain activity, interference and noise (Cong et al., 2012(Cong et al., , 2015Lee et al., 2007;Vanderperren et al., 2013). The method can also extract differences in EEG signals among individual subjects, experimental conditions, or tasks (Cong et al., 2012;Vanderperren et al., 2013). These studies use data in some variation of a channeltime-subject format; in other words, data which can be represented coherently across three dimensions. In general, these dimensions are comprised of a subject such as an individual person or place or task, and two dimensions which represent categories (such as channels or frequencies or subject areas) and time. A data structure amenable to tensor decomposition is a one in which one dimension represents a discrete unit of observation (a trial, a person, a subject), and the other two can be easily interpreted as vectors (such as changes in power overtime across channels or frequencies). Though tensor decomposition for data structures beyond three dimensions is certainly possible, we focus on applications to three-dimensional data both for its relative ease of understanding as well as its relevance to our project. Fields such as neurology and computer science have utilized tensors in processing signal data in a myriad of settings meanwhile applications to ecological data lag behind.
We validate the value of the method for disease incidence data by decomposing simulated epidemic data; we simulate epidemics under three different birth regimes and demonstrate the method's ability to identify these differences. We then apply the method to the well-studied E&W measles data to uncover previously undocumented variation in measles seasonality.

| Simulations
We simulate epidemics using a discrete-time stochastic susceptibleinfected-recovered (SIR) model (Becker & Grenfell, 2017;Bharti et al., 2008;Caudron et al., 2014;Siettos & Russo, 2013). At each time step, each individual is assumed to be either susceptible to infection, infected or recovered (dead or immune). Once an individual recovers, we assume they can never be infected again. We simplify the simulation process by assuming that any individual infected at time t will be recovered at time t + 1. New suscpetibles are supplied by births determined by a pre-defined annual crude birth rate (CBR) which we distribute uniformly across each time step. Imported cases are determined by drawing from a binomial distribution with a 10% probability of importation.
We use a starting population of 300,000 for all simulations, and initial infected population of 10 and initial susceptible population of 1,000. The susceptible dynamics are determined by: We add births (B t−1 ) and subtract new infections (I t ). The expected number of infected individuals at time t is defined as a function of transmission rate (β), local susceptible (S) and infected individuals (I) as well as imported infections (ι): In Equation 2, α is a tuning parameter, fixed to 0.97, consistent with previous analyses and simulations of measles (Becker et al., 2016;Becker & Grenfell, 2017;Grenfell et al., 2002). We give transmission a seasonal shape consistent with what has been estimated for London (Becker & Grenfell, 2017;Grenfell et al., 2002), with an average R 0 (or number of secondary infections per individual infection) set to 15 for all simulations, within the range of typical estimates for measles Grenfell et al., 2002;Guerra et al., 2017). We draw the number of infections using a Poisson distribution to introduce stochasticity (Becker & Grenfell, 2017;: We allow approximately 80 years for the epidemics to settle into equilibrium and evaluate the following 20 years so the scale is comparable to the 22 years of E&W data. To alter the dynamics of each epidemic, we vary the CBR used in each simulation. We use three different birth regimes: a constant CBR of 0.015, a constant CBR of 0.03, and a variable CBR which begins at 0.012 and increases to 0.036. The simulated birth rates cover the range of birth rates in the data we use; the tenth percentile of CBRs in E&W during this period is 0.012, 0.015 is approximately the median, and the max is 0.035. We collect 50 time series for each birth regime.

| England and Wales urban districts 1944-1966
For our primary analysis, we consider measles cases in all 954 urban districts in England and Wales for the pre-vaccination period   Bjørnstad et al., 2002;Caudron et al., 2014;Grenfell et al., 2001). We use annual births and population sizes to calculate the CBR for each year and evaluate the relationship between epidemic cycles and demographic conditions. In particular, we considered the CBR for locations above the critical community size for measles (Finkenstadt & Grenfell, 2000). The spatial influence of large cities on regional dynamics is substantial (Grenfell et al., 2001), and therefore we limit our analysis to larger locations as we have more confidence these locations are determining their own dynamics rather than echoing the dynamics of endemic neighbours (Bartlett, 1957;Grenfell et al., 2001). The post-war baby boom resulted both in a surge of birth rates as well as a large range in birth rates across districts; for these reasons, we examine baby boom CBRs in particular. The following sections describe the methods (wavelet transform and tensor decomposition) we use to evaluate the relationship between epidemic seasonality and demographic conditions across all 954 locations.

| Continuous wavelet transform
To ground our analysis in previous studies, we performed a local wavelet analysis to the log-transformed data to assess the time-frequency variation in the signal (Grenfell et al., 2001). Generalizing Fourier analysis, wavelets allow insight into a potentially non-stationary epidemic signal by decomposing it into multiple frequencies over time (Gouhier, Grinsted, & Simko, 2019;Torrence & Compo, 1998). Like a Fourier Transform, wavelets decompose a complex signal into its component frequencies.
However, in addition to learning which frequencies dominate the signal, wavelets allow us to determine whether and how those dominant frequencies change over time. Rather than using a sinusoid as in Fourier analysis, wavelet transformation uses wavelet basis functions which can explore local (in time) variations in frequency (Grenfell et al., 2001;Torrence & Compo, 1998). In this analysis, we used the Morlet wavelet function, essentially a damped complex exponential:

| Tensor decomposition
Tensor Decomposition. Once we have compiled our three-dimensional data, we can decompose it into vector components. Tensor decomposition can be understood as a multi-dimensional generalization of PCA (Cichocki et al., 2015;Fanaee-T & Gama, 2016;Kolda & Bader, 2009).
As with PCA, we seek to reduce the dimensionality of the data by expressing it in terms of components which capture the most variance in the data. In the CWT case, each component consists of a location vector, a frequency vector and a time vector. The outer product of the frequency and time vector produce a general wavelet power spectrum as in Figure 1. For the ith location, the ith scalar in the location vector describes the amount the power spectrum contributes to that location's original signal. In other words, each component describes a particular frequency and its power as a function of time. The locationspecific scalars represent how much that signal is magnified or dampened within that location's data. (3)

| 1693
Methods in Ecology and Evoluঞon KOREVAAR Et Al.
To reconstruct the original data for a specific location, we compute and add such a matrix for each component (Cichocki et al., 2015;Guo et al., 2013;Kolda & Bader, 2009). If our tensor decomposition had three components and we wanted to reconstruct an estimate of our original data, we would calculate and sum three wavelet power spectra using the three time and frequency vectors along with the F I G U R E 1 Wavelet power spectra for London . Darker blues indicate more power, red areas indicate time/frequencies that were determined to be statistically significant by Monte Carlo methods. The parabolic line indicates the cone of influence, the boundary of points that may be affected by edge effect artifacts. Similar to spectral analysis, errors will occur at the beginning or the end of time series. Padding the data with zeros introduces discontinuities into the data, as we increase in scale, the amplitude is decreased as more zeros enter the analysis. For the regions outside the cone of influence, it is not clear if decreases in variance are due to the additional zeros F I G U R E 2 Tensor: array of wavelet spectra across locations. After calculating the wavelet power spectra, we store each of the time × frequency matrices in a three-dimensional tensor with one matrix for each location. This creates a cube of dimensions n × f × t for n places, t time steps and f frequencies location-specific score (Figure 3). To calculate the tensor decomposition, we use the canonical polyadic decomposition (CPD). We can formalize CPD for a three-way tensor as follows: In Equation 6, R denotes the rank of the tensor (Kolda & Bader, 2009). This definition is illustrated in Figure 3 for a rank-three tensor. We use an alternating least squares algorithm to calculate X ⋀ .
In the case, when min is referred to as an exact lowrank approximation of X. In that case, we can write out the matrix form of X ⋀ as: Each X ⋀ (i) is a component, and each a r , b r , c r are factor vectors, and A, B, C are factor matrices. These are analogous to components and loadings in PCA (Cichocki et al., 2015;Fanaee-T & Gama, 2016;Kolda & Bader, 2009). To estimate these components, we fix all except one of the factor matrices and optimize the remaining matrix. For example, we may fix matrices B and C and optimize A given these matrices. We repeat this for each matrix until we reach our stopping criteria (Kolda & Bader, 2009;Li, Bien, & Wells, 2018). In our case, we optimize until the Frobenius norm of the error matrix is below 0.0001 (Li et al., 2018).
For the three-way tensor instance, this can be formalized as follows: Characterizing the rank of a tensor is a complex mathematical problem without a simple solution (Alexeev, Forbes, & Tsimerman, 2011;Ballico, Bernardi, Chiantini, & Guardo, 2018;Kolda & Bader, 2009;Stegeman & Friedland, 2017). Therefore, we attempt tensor decomposition beginning by selecting a single component, and increasing the number of components until the algorithm consistently converges.

| Simulations
Our simulations demonstrate comparable epidemic behaviour to that of E&W (Figure 4). The CPD algorithm consistently converged at three components for the simulated data. In Figure 5 Adding components with group specific scores results in a consistent annual and triennial periodicity for the first group (low birth rate); consistent annual and biennial cycles in the second (high birth

F I G U R E 3
Theoretical canonical polyadic decomposition for tensor X from Figure 2. The outer product of each of the f i × t i vectors produces a wavelet power spectra and each of the n places has a score in the n j vector specific to that wavelet power spectra. In this way we can use the three rank one tensors to approximate the original power spectra for each of the n original places. We can think of the matrix formed by the outer product of t i × f i as a component in the principal component analysis sense. Where each entry in the matrix determines the loading of the that time, frequency value. Each entry in the n j vector represents a place-specific score which determines the contribution of that matrix in the final wavelet power spectrum rate); and a transition from sporadic epidemics to annual and biennial signals in the third group (increasing birth rate). These results concur with samples of the time series (Figure 4) as well as reconstructions of the CWT for each group ( Figure S1).

| England and Wales urban districts 1944-1966
We found four components provided the most consistent convergence with the CPD algorithm for the E&W tensor ( Figure 6). The The dominance of the annual and, in particular, the biennial signal in these data has already been documented (Grenfell et al., 2001). If we were naive regarding the importance of these signals, we could investigate variation in biennial patterns regionally-including synchronicity and temporal lag. Bjorn tad and Grenfell have investigated such patterns in this dataset (Grenfell et al., 2001).  Figure S3). An illustration of the reconstruction for London The strength of the score on all components depends largely on population size. This is consistent with previous work which has shown that dynamics in small, isolated places tend to be erratic rather than seasonal and thus do not have significant annual or biennial signals (Bartlett, 1957;. Though  Figure S2). We know from mathematical models of measles transmission that higher birth rates should lead to larger annual epidemics, as a result of quicker susceptible replenishment. In concordance with these models, our decomposition shows that locations with slightly lower CBRs would begin with more biennial seasonality and locations with crude higher birth rates would begin with more annual seasonality and settle into biennial cycles later. The baby boom CBR has been identified as a crucial bifurcation point for measles cycles using simulations (Finkenstadt & Grenfell, 2000). This local dovetailing in dynamics as a result of variation in the CBR has not been previously illustrated.

| D ISCUSS I ON
In this paper, we explore, to our knowledge, the first application of tensor decomposition to disease surveillance data to confirm pre-  of the data; inherent to this method is the loss of some information.
However, there are solutions which can increase the granularity of the components.
As with PCA, one can opt to decompose the data into additional components. We selected four components for this paper because the CPD algorithm reliably converged at four components, and because each additional component adds substantial computation time to the decomposition algorithm. For these data, if one were interested in more detailed information, one could select a subset of the total dataset which would allow for faster computation of additional components. This subsetting could be done by location (e.g. a random sample or a sample of the largest, most dominant cities), or by frequency (e.g. selecting all data between annual and triennial frequencies).
A first pass naive tensor decomposition, such as the one presented in this paper, may guide additional decompositions. In the case of Norwich (Figure 8) that the 1.5-year cycle in the data is represented as a 2-year cycle in their construction. After a first pass tensor decomposition, the data could be pre-reduced by dropping frequencies. In this case, the data for 6-month periods and below accounts for very little variance in the entire data. Dropping these frequencies could allow the representation of additional cycles in the tensor components, as well as reducing overall computation time.
An additional extension of this analysis would be to use tensor decomposition on the phase or phase-differenced matrices for these locations in order to concisely summarize spatiotemporal dynamics of epidemics (Grenfell et al., 2001).

AUTHOR S' CONTRIBUTIONS
H.K. conceived of the initial project design, carried out the analysis and drafted the manuscript; C.J.M. and B.T.G. provided project guidance, suggested additional project components and provided editorial feedback on the manuscript.

Peer Review
The peer review history for this article is available at https://publo ns.

DATA AVA I L A B I L I T Y S TAT E M E N T
The measles data used in this paper are publicly available via https:// www.nature.com/artic les/s4155 9-020-1186-6 (Lau et al., 2020) (see references). The code used to simulate disease data, calculate wavelet transform and decompose the tensor is available https://doi. org/10.5281/zenodo.3999553 (Korevaar, 2020).