Volume 9, Issue 7
RESEARCH ARTICLE

Statistical inference for home range overlap

Kevin Winner

College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts

Michael J. Noonan

Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia

Department of Biology, University of Maryland, College Park, Maryland

Christen H. Fleming

Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia

Department of Biology, University of Maryland, College Park, Maryland

Kirk A. Olson

Wildlife Conservation Society, Mongolia Program, Ulaanbaatar, Mongolia

Thomas Mueller

Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia

Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Frankfurt (Main), Germany

Department of Biological Sciences, Goethe University, Frankfurt (Main), Germany

Daniel Sheldon

College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts

Department of Computer Science, Mount Holyoke College, South Hadley, Massachusetts

Justin M. Calabrese

Corresponding Author

E-mail address: calabresej@si.edu

Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia

Department of Biology, University of Maryland, College Park, Maryland

Correspondence
Justin M. Calabrese, Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Rd., Front Royal, VA 22630.
Email: calabresej@si.edu
First published: 14 May 2018

Abstract

  1. Despite the routine nature of estimating overlapping space use in ecological research, to date no formal inferential framework for home range overlap has been available to ecologists. Part of this issue is due to the inherent difficulty of comparing the estimated home ranges that underpin overlap across individuals, studies, sites, species, and times. As overlap is calculated conditionally on a pair of home range estimates, biases in these estimates will propagate into biases in overlap estimates. Further compounding the issue of comparability in home range estimators is the historical lack of confidence intervals on overlap estimates. This means that it is not currently possible to determine if a set of overlap values is statistically different from one another.
  2. As a solution, we develop the first rigorous inferential framework for home range overlap. Our framework is based on the autocorrelated‐Kernel density estimation (AKDE) family of home range estimators, which correct for biases due to autocorrelation, small effective sample size, and irregular sampling in time. Collectively, these advances allow AKDE estimates to validly be compared even when sampling strategies differ. We then couple the AKDE estimates with a novel bias‐corrected Bhattacharyya coefficient (BC) to quantify overlap. Finally, we propagate uncertainty in the AKDE estimates through to overlap and thus are able to put confidence intervals on the BC point estimate.
  3. Using simulated data, we demonstrate how our inferential framework provides accurate overlap estimates, and reasonable coverage of the true overlap, even at small sample sizes. When applied to empirical data, we found that building an interaction network for Mongolian gazelles Procapra gutturosa based on all possible ties, vs. only those ties with statistical support, substantially influenced the network’s properties and any potential biological inferences derived from it.
  4. Our inferential framework permits researchers to calculate overlap estimates that can validly be compared across studies, sites, species, and times, and test whether observed differences are statistically meaningful. This method is available via the R package ctmm.

1 INTRODUCTION

Ecologists have long been interested in patterns and drivers of animal space use (Brown & Orians, 1970; Burt, 1943; Jetz, 2004). Decisions on what areas to occupy can influence fitness through a wide range of pathways such as foraging efficiency (Mitchell & Powell, 2012) or predator–prey dynamics (Mitchell & Lima, 2002), and even drive evolutionary trajectories (Lukas & Clutton‐Brock, 2013). Related to this is the question of overlapping space use between individuals and/or populations. Quantifying overlap can provide an informative metric for testing hypotheses on interspecific competition (Berger & Gese, 2007), territoriality (Grant, Chapman, & Richardson, 1992), and mating systems (Powell, 1979). Furthermore, overlap can be used to underpin analyses of social network structure (Frère et al., 2010) and contact rates, with implications for disease transmission (Dougherty, Seidel, Carlson, Spiegel, & Getz, 2018; Sanchez & Hudgens, 2015). Trends in overlapping space use are also routinely used in determining allometric scaling laws (Grant et al., 1992; Jetz, 2004). The rapid increase in both the availability and quality of tracking data in recent years (Kays, Crofoot, Jetz, & Wikelski, 2015) has made the concept of home range (HR) overlap increasingly relevant. Ecologists are now in a position to address overlap‐related questions for a larger number of species and individuals, in more ecosystems, and with more accurate data than ever before.

Despite these advances, a formal inferential framework for HR overlap is still lacking. Overlap is typically quantified by first estimating HRs from tracking data and then applying an overlap metric to the range estimates (Fieberg & Kochanny, 2005; Millspaugh, Gitzen, Kernohan, Larson, & Clay, 2004). A wide range of overlap metrics have been proposed in the literature, spanning the gamut from ad hoc indices to more formal measures. These different metrics have contrasting properties and can produce highly different overlap estimates on the same data (see Fieberg & Kochanny, 2005; Millspaugh et al., 2004). Further compounding this problem is the inherent difficulty of comparing the estimated HRs that underpin overlap across studies, sites, species, and times (Fleming & Calabrese, 2017). There is broad agreement in the literature that HR estimates based on different sampling strategies are difficult to compare, as they may be exposed to different degrees of bias (Fieberg & Börger, 2012; Fleming et al., 2018; Frair et al., 2010). More subtly, even identical sampling strategies can still produce differentially biased HR estimates if the underlying parameters of movement differ among individuals in the comparison (Fleming & Calabrese, 2017). As overlap is calculated conditionally on a pair of HR estimates, biases in the HR estimates will propagate into biases in overlap estimates (Fieberg & Kochanny, 2005). It follows then that differential biases in HR estimates among different groups of interest will tend to propagate into differential biases in overlap estimates, rendering comparisons difficult to interpret and potentially unreliable.

Additionally, none of the overlap metrics of which we are aware come equipped with confidence intervals to quantify the uncertainty in the estimates. This means that it is currently not possible to determine if a set of overlap values is statistically different from one another or from a reference value of interest. To see this, consider a case where one wishes to compare two overlap estimates from two pairs of individuals: 0.35 and 0.55. If the 95% confidence intervals for each estimate are disjoint, then we may conclude that the two pairs have significantly different measures of overlap. If the 95% confidence intervals are not disjoint, then the point estimates may not be significantly different. In other words, without confidence intervals, one cannot properly interpret differences between estimates (Pawitan, 2001).
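The comparison described above can be sketched in a few lines of Python. This is only an illustration of the disjoint-interval check: the interval endpoints below are hypothetical values chosen around the point estimates 0.35 and 0.55, not results from any data set, and `intervals_disjoint` is a helper name introduced here.

```python
def intervals_disjoint(ci_a, ci_b):
    """Return True when two (lower, upper) intervals share no values."""
    lo_a, hi_a = ci_a
    lo_b, hi_b = ci_b
    return hi_a < lo_b or hi_b < lo_a

# Hypothetical 95% CIs around the point estimates 0.35 and 0.55.
narrow_case = intervals_disjoint((0.30, 0.40), (0.50, 0.60))  # tight intervals
wide_case = intervals_disjoint((0.15, 0.60), (0.30, 0.80))    # wide intervals
```

With the tight intervals the two estimates can be called significantly different; with the wide intervals the same point estimates support no such conclusion.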

Here, we develop the first inferential framework for HR overlap by building on previous work in quantifying overlap (Fieberg & Kochanny, 2005) and by leveraging recent advances in HR estimation (Fleming & Calabrese, 2017; Fleming, Fagan, et al., 2015; Fleming et al., 2018). We base our approach on the Bhattacharyya coefficient (BC; Bhattacharyya, 1943, also called the Bhattacharyya affinity), which has a formal basis as a measure of similarity between two probability distributions and is straightforward to calculate and interpret (Fieberg & Kochanny, 2005). We couple the BC with autocorrelated‐Kernel density estimation (AKDE) as a general HR estimator (Fleming & Calabrese, 2017). Basing overlap estimation on AKDE has two primary advantages. First, AKDE corrects for bias due to autocorrelation (Fleming, Fagan, et al., 2015), ordinary small‐sample‐size bias (Fleming & Calabrese, 2017), and temporal sampling bias (Fleming et al., 2018). The net result is that AKDE HR estimates can validly be compared across studies, sites, species, and times, even when sampling strategies and underlying movement parameters differ (Fleming & Calabrese, 2017; Fleming et al., 2018; Noonan et al., in review). Second, the error propagation techniques used to develop confidence intervals on AKDE area estimates (Fleming & Calabrese, 2017) can be extended to overlap estimation, allowing us to develop confidence intervals for overlap estimates. In addition, overlap estimates can exhibit negative bias (Fieberg & Kochanny, 2005), where part of this problem is the result of small‐sample‐size bias in the BC (Djouadi & Snorrason, 1990). As a solution, we derive an approximate, first‐order bias correction to the BC.

We use a combination of simulated and empirical data to demonstrate the power of our inferential framework. First, based on simulations, we study the bias in BC estimates as a function of the amount of autocorrelation in the data and of the effective sample size, both in cases where the underlying HR estimators account for these biases (AKDE), and where they do not (conventional KDE; Worton, 1989). We use a similar approach to quantify the realized coverage of our confidence intervals. We then show how our framework can be used to accurately estimate overlap, even when individuals exhibited different movement strategies and/or were subject to completely different sampling designs, whereas conventional methods fail. Finally, we show how our approach can be used in “downstream” applications that depend on overlap. Specifically, we build an interaction network (Wey, Blumstein, Shen, & Jordán, 2008) for Mongolian gazelles Procapra gutturosa where edges are established only between individuals whose overlap estimates received statistical support.

2 MATERIALS AND METHODS

Our inferential framework consists of bias‐corrected HR estimates, a bias‐corrected BC estimator, and confidence intervals on the BC point estimate. We describe each of these elements in turn. We then describe how our framework can be used in practice via the ctmm R package by extending the workflow for HR analysis described in Calabrese, Fleming, and Gurarie (2016) or through the web‐based graphical user interface at ctmm.shinyapps.io/ctmmweb/ (Dong, Fleming, & Calabrese, 2017).

2.1 Home range estimation

At a minimum, calculating overlap requires a pair of HR estimates (Fieberg & Kochanny, 2005; Millspaugh et al., 2004). More generally, comparisons of overlap among different groups, species, places, or times may also be of interest. Nonetheless, as overlap estimates are conditional on estimated HRs, those underlying HR estimates must be directly comparable across the different groups the researcher wishes to evaluate. Unfortunately, HR estimates are subject to a number of biases, and differences in either sampling schedule, underlying movement parameters, or both can expose different data sets to different degrees of bias (Fieberg & Börger, 2012; Fleming & Calabrese, 2017). Datasets characterized by one or more of these forms of bias, which are the norm in practice, can thus render comparison of HR estimates across groups of interest highly misleading. The propagation of differentially biased HR estimates into differentially biased overlap estimates has been a key impediment to the development of a reliable inferential framework for HR overlap.

In decreasing order of importance, the three main sources of bias in HR estimation are unmodeled autocorrelation (Fleming, Fagan, et al., 2015), small effective sample sizes (Fleming & Calabrese, 2017), and temporally biased sampling (Fleming et al., 2018). The magnitude of the negative bias in HR estimates that results from assuming the data are independent and identically distributed (IID) when, in fact, they are autocorrelated can be arbitrarily large (Fleming & Calabrese, 2017). All else being equal, the bias will increase with the strength of autocorrelation in the data. In contrast, small sample size bias will be estimator‐specific and will tend to be of smaller magnitude than autocorrelation‐related bias for modern GPS data. For example, KDEs based on the conventional Gaussian reference function (GRF) approximation tend to overestimate HR areas at small sample size (Fleming & Calabrese, 2017). Temporally biased sampling occurs when some times are oversampled while others are undersampled (Frair et al., 2010), which can produce data that are not representative of the individual's space use (Fleming et al., 2018). Bias due to nonrepresentative sampling in time will tend to increase with the degree of unevenness in the sampling schedule.

These three sources of bias must be mitigated to validly compare HR estimates, and, by extension, to validly compare overlap estimates. We now describe HR estimation methods that, when used in combination, largely correct these biases. Autocorrelated‐KDE is a generalization of the GRF‐KDE (Fleming, Fagan, et al., 2015). The core advance in AKDE is that the optimization of the smoothing bandwidth, σB, explicitly accounts for autocorrelation in the data. Specifically, an autocorrelated movement model is used to represent the autocorrelation structure of the data in the bandwidth optimization (Fleming et al., 2014c; Fleming, Subaşi, & Calabrese, 2015). Model selection (detailed below) can be used to arrive at an appropriate model for the data’s autocorrelation structure (Calabrese et al., 2016). When the data exhibit no autocorrelation, the IID model would be selected, and AKDE conditional on the IID model is exactly equivalent to the well‐known GRF‐KDE. Recently, Fleming and Calabrese (2017) derived a small‐sample‐size, area‐based correction that mitigates the tendency of KDEs based on the GRF approximation, including AKDE, to over‐smooth the data. Finally, Fleming et al. (2018) developed an optimal weighting scheme, termed “wAKDE,” that leverages the autocorrelation structure of the data to appropriately upweight undersampled times and downweight oversampled times. When used in concert, these innovations result in more accurate HR estimates that are directly comparable across groups of interest. A technical introduction to these estimators is provided in Supporting Information Appendix A.1.

2.2 The Bhattacharyya coefficient

There are many different measures which quantify the relative similarity (overlap) or dissimilarity (distance) of two probability distributions. While both types of metrics can be used to describe the degree of shared space use between individuals, measures of overlap are used more commonly in biological contexts than measures of distance (but see Kranstauber, Smolla, & Safi, 2016). In their comparative analysis of overlap metrics, Fieberg and Kochanny (2005) concluded that the BC and Volume of Intersection statistic (VI; also known as the overlap coefficient; Inman & Bradley, 1989) were the most robust overlap estimators. While these two valid choices exist, we suggest that, for inferential purposes, an overlap estimator should satisfy the following criteria:
  (i) Statistical validity. An appropriate overlap estimator should be based on an established measure of statistical distance or divergence that satisfies related mathematical properties.
  (ii) Geometric interpretability. For uniform distributions, overlap should be proportional to the area of intersection.
  (iii) Objectivity. Overlap should not depend on ad hoc parameters such as particular isopleths (e.g., 95% or 50%) or discretized distributions.
  (iv) Computational efficiency. Computing the overlap of two distributions should scale efficiently with the sample size and extent of both distributions.
  (v) Asymptotic consistency. An overlap estimator should converge to the true overlap in the large sample size limit.
  (vi) Minimal bias. An overlap estimator should have good small‐sample‐size behaviour.
  (vii) Quantifiable uncertainty. Overlap is an estimate derived from data and should be accompanied by a measure of the confidence in that estimate (Pawitan, 2001).
The BC (Bhattacharyya, 1943) is a solid basis for inference on HR overlap because it satisfies criteria (i)–(v), and has the additional benefit of being well known to the ecological community (Fieberg & Kochanny, 2005). Although the VI also meets these criteria (Fieberg & Kochanny, 2005), approximating confidence intervals on the VI for the case of unequal variances presents severe difficulties (Reiser & Faraggi, 1999). Consequently, we base our approach on the BC. The BC between two continuous distributions p1 and p2 is given by
$$\mathrm{BC}(p_1, p_2) = \iint \sqrt{p_1(x, y)\, p_2(x, y)}\; dx\, dy \tag{1}$$
The BC is thus a function of the product of the two distributions and is bounded by 0 ≤ BC ≤ 1, with BC = 0 only when p1 and p2 have no shared support and BC = 1 only when p1 = p2. We now turn our attention to criteria (vi) and (vii) and derive a confidence interval approximation and a bias correction that allow the BC to satisfy these additional criteria.
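As a rough numerical illustration of the definition and its bounds, the sketch below approximates the integral of the square root of the product of two densities with a midpoint rule on a regular grid. The isotropic Gaussian test densities, the grid limits, and the function names are assumptions made for this example; this is not how ctmm evaluates the BC.

```python
import math

def gaussian_density(x, y, mx, my, sd):
    """Isotropic bivariate Gaussian density with mean (mx, my) and standard deviation sd."""
    r2 = (x - mx) ** 2 + (y - my) ** 2
    return math.exp(-0.5 * r2 / sd ** 2) / (2.0 * math.pi * sd ** 2)

def bc_on_grid(p1, p2, lo=-10.0, hi=10.0, n=200):
    """Midpoint-rule approximation of the integral of sqrt(p1 * p2) over a square."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        for j in range(n):
            y = lo + (j + 0.5) * h
            total += math.sqrt(p1(x, y) * p2(x, y))
    return total * h * h

def centred(x, y):
    return gaussian_density(x, y, 0.0, 0.0, 1.0)

def shifted(x, y):
    return gaussian_density(x, y, 2.0, 0.0, 1.0)

bc_identical = bc_on_grid(centred, centred)  # identical distributions
bc_shifted = bc_on_grid(centred, shifted)    # same shape, centroids 2 units apart
```

Identical densities give a value of essentially 1, and moving one centroid away lowers the value towards 0, in line with the bounds stated above.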

2.2.1 Confidence intervals for the BC

When measuring the overlap of two HRs, the BC, as given above, is a point estimate of the overlap between the two distributions, but does not capture any of our uncertainty in the HR estimation procedure. To address this limitation, we derive confidence intervals for the BC, in the Gaussian reference function (GRF) approximation. AKDE's first step involves fitting stochastic movement models (Fleming, Fagan, et al., 2015) to estimate the mean and covariance parameters
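$$\boldsymbol{\mu} = \mathrm{E}[\mathbf{r}(t)], \qquad \boldsymbol{\sigma} = \mathrm{E}\!\left[(\mathbf{r}(t) - \boldsymbol{\mu})(\mathbf{r}(t) - \boldsymbol{\mu})^{\mathrm{T}}\right] \tag{2}$$
where r(t) = (x(t), y(t)) denotes the individual's location. In the GRF approximation, the individual spatial density estimates are given by
$$\hat{p}_i(\mathbf{r}) = \frac{1}{2\pi\sqrt{\det \hat{\boldsymbol{\sigma}}_i}} \exp\!\left(-\tfrac{1}{2}(\mathbf{r} - \hat{\boldsymbol{\mu}}_i)^{\mathrm{T}} \hat{\boldsymbol{\sigma}}_i^{-1} (\mathbf{r} - \hat{\boldsymbol{\mu}}_i)\right), \qquad i = 1, 2 \tag{3}$$
and so the BC between Gaussian density estimates resolves to
$$\widehat{\mathrm{BC}} = \sqrt{\frac{\det \mathrm{GM}}{\det \mathrm{AM}}}\; \exp\!\left(-\tfrac{1}{8} D_M^2\right) \tag{4}$$
in terms of the arithmetic and geometric means of the covariance matrices
$$\mathrm{AM} = \frac{\hat{\boldsymbol{\sigma}}_1 + \hat{\boldsymbol{\sigma}}_2}{2}, \qquad \mathrm{GM} = \sqrt{\hat{\boldsymbol{\sigma}}_1 \hat{\boldsymbol{\sigma}}_2} \tag{5}$$
and the Mahalanobis distance (Mahalanobis, 1936) between the two distributions
$$D_M^2 = (\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2)^{\mathrm{T}}\, \mathrm{AM}^{-1}\, (\hat{\boldsymbol{\mu}}_1 - \hat{\boldsymbol{\mu}}_2) \tag{6}$$
The closely related Bhattacharyya distance (BD; Bhattacharyya, 1946) is defined
$$\mathrm{BD} = -\log \mathrm{BC} \tag{7}$$
which here resolves to
$$\widehat{\mathrm{BD}} = \tfrac{1}{8} D_M^2 + \tfrac{1}{2} \log\!\left(\frac{\det \mathrm{AM}}{\det \mathrm{GM}}\right) \tag{8}$$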
Term‐by‐term all components of the BD are nonnegative, with the first set of terms involving the Mahalanobis distance being zero only for identical mean locations, and the second set of terms invoking the AM‐GM inequality being zero only for identical covariance matrices.
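The closed form for Gaussian distributions can be checked numerically. The sketch below uses the standard expression for the Bhattacharyya distance between two bivariate Gaussians: one eighth of the squared Mahalanobis distance under the average covariance, plus half the log ratio of the determinant of the arithmetic mean covariance to the square root of the product of the two determinants. The helper names and the 2 × 2 list-of-lists matrix representation are choices made for this example only.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def gaussian_bd(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between two bivariate Gaussians (standard closed form)."""
    # Arithmetic mean of the two covariance matrices.
    am = [[0.5 * (sigma1[i][j] + sigma2[i][j]) for j in range(2)] for i in range(2)]
    am_inv = inv2(am)
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    # Squared Mahalanobis distance between the means under the average covariance.
    maha2 = (dx * (am_inv[0][0] * dx + am_inv[0][1] * dy)
             + dy * (am_inv[1][0] * dx + am_inv[1][1] * dy))
    # log(det AM / det GM), with det GM = sqrt(det sigma1 * det sigma2).
    log_ratio = math.log(det2(am)) - 0.5 * math.log(det2(sigma1) * det2(sigma2))
    return maha2 / 8.0 + 0.5 * log_ratio

def gaussian_bc(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya coefficient, BC = exp(-BD)."""
    return math.exp(-gaussian_bd(mu1, sigma1, mu2, sigma2))

identity = [[1.0, 0.0], [0.0, 1.0]]
wider = [[4.0, 0.0], [0.0, 4.0]]
bc_same = gaussian_bc((0.0, 0.0), identity, (0.0, 0.0), identity)
bc_mean_shift = gaussian_bc((0.0, 0.0), identity, (2.0, 0.0), identity)
bc_cov_diff = gaussian_bc((0.0, 0.0), identity, (0.0, 0.0), wider)
```

The three cases isolate the two sets of terms: identical distributions give a BD of zero, a pure shift in the mean activates only the Mahalanobis term, and a pure difference in covariance activates only the determinant term.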
First, we propagate uncertainty in the mean and covariance parameters into uncertainty in $\widehat{\mathrm{BD}}$ via the delta method (Cox, 2005) to obtain $\mathrm{VAR}[\widehat{\mathrm{BD}}]$. Second, as an improvement over asymptotically normal CIs, and as the BD roughly takes the form of a square distance, we approximate the BD statistic as being chi‐squared with degrees of freedom equal to
$$\nu = \frac{2\, \widehat{\mathrm{BD}}^2}{\mathrm{VAR}[\widehat{\mathrm{BD}}]} \tag{9}$$
in accordance with the chi‐square variance formula. We then transform the BD CIs back into BC CIs via BC = exp(−BD). Finally, for the kernel density BC CIs, we apply the same χ2 approximation (9), but with the AKDE point estimate for the BD and the GRF estimate for $\mathrm{VAR}[\widehat{\mathrm{BD}}]$.
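Two steps of this procedure are simple enough to sketch directly. The first function assumes the usual moment-matching reading of "the chi‐square variance formula": a scaled chi‐square variable with ν degrees of freedom has variance equal to 2/ν times its squared mean, so ν = 2 × mean² / variance. The second function back-transforms an interval on the BD into one on the BC. The function names and numerical inputs are hypothetical, and the chi‐square quantile step itself is omitted.

```python
import math

def chi_square_dof(bd_hat, var_bd):
    """Degrees of freedom matching the mean and variance of a scaled chi-square."""
    return 2.0 * bd_hat ** 2 / var_bd

def bd_interval_to_bc(bd_low, bd_high):
    """Map a BD interval to a BC interval via BC = exp(-BD).

    exp(-x) is decreasing, so the upper BD limit gives the lower BC limit.
    """
    return math.exp(-bd_high), math.exp(-bd_low)

dof = chi_square_dof(0.5, 0.05)              # hypothetical estimate and variance
bc_low, bc_high = bd_interval_to_bc(0.2, 1.0)  # hypothetical BD interval
```

Because the transform is monotone, the coverage of the BD interval carries over unchanged to the BC interval.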

2.2.2 Bias correction for the BC

As noted by Fieberg and Kochanny (2005), overlap is likely to be negatively biased at small sample sizes. In addition to negative biases in HR estimation driven by unmodeled autocorrelation, part of this problem is the result of small‐sample‐size bias in the BC (Djouadi & Snorrason, 1990), which is a common property of asymptotically consistent estimators (Basu, 1956). As a solution, here we derive an approximate bias correction for the BD
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0013(10)
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0014(11)
which we will also apply to the AKDE BD point estimate. Even if the two distributions are Gaussian, the BD plug‐in estimator—which calculates the BD directly by assuming that the density estimates are true—is severely biased. This bias correction will be exact in the case of IID processes of equal variance, which is known to be solvable (Djouadi & Snorrason, 1990), but approximately generalized for the movement processes we consider and verified with simulation (Supporting Information Appendix A.2). Most of the bias is due to the fact that uncertainty in the centroids translates strictly into positive BD, even if the two distributions are identical. First, we address this largest source of bias, by decomposing the mean estimates into their expectation values and (mean‐zero) error
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0015(12)
whereupon we can express the first expected BD term
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0016(13)
plus terms like urn:x-wiley:2041210X:media:mee313027:mee313027-math-0017 that we ignore because ξ is mean zero and asymptotically uncorrelated with urn:x-wiley:2041210X:media:mee313027:mee313027-math-0018. Next, we note the approximation
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0019(14)
which is exact for many stationary processes (e.g., Fleming et al., 2014c), with a proportionality constant equal to the effective sample size of the mean. Therefore, we have
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0020(15)
when the two covariances are similar, allowing us to here ignore the biases in urn:x-wiley:2041210X:media:mee313027:mee313027-math-0021. We note that, in general, this term related to home‐range centroid uncertainty is by far the largest source of bias in BD estimation. Furthermore, if the two movement processes are independent of each other, then we have
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0022(16)
For the remaining terms of the plug‐in BD estimator, we require some distributional assumptions on the covariance estimates urn:x-wiley:2041210X:media:mee313027:mee313027-math-0023, urn:x-wiley:2041210X:media:mee313027:mee313027-math-0024, and urn:x-wiley:2041210X:media:mee313027:mee313027-math-0025. We take urn:x-wiley:2041210X:media:mee313027:mee313027-math-0026 and urn:x-wiley:2041210X:media:mee313027:mee313027-math-0027 to be Wishart distributed (Wishart, 1928) where effective sample sizes N1 and N2 are estimated with the parameters (Fleming & Calabrese, 2017). For the average covariance urn:x-wiley:2041210X:media:mee313027:mee313027-math-0028, we construct a Welch–Satterthwaite (Satterthwaite, 1946) like approximation that is exact for equal covariances. If urn:x-wiley:2041210X:media:mee313027:mee313027-math-0029 were χ2 distributed, the ordinary Welch–Satterthwaite approximation would fix its degrees of freedom via the relationship between its variance and that of its constituents. However, urn:x-wiley:2041210X:media:mee313027:mee313027-math-0030 is matrix valued and has many variances. We choose to conserve the trace variance, which is both additive and rotationally invariant:
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0031(17)
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0032(18)
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0033(19)
Next, the expected inverse estimate matrix resolves to
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0034(20)
and so we clamp our effective sample size estimates to N ≥  dim(σ) + 2, which is the smallest discrete number of IID locations with which one can estimate properly. Below this value, the estimate is likely not approximately Wishart distributed and N is likely not well estimated. So by clamping N, we effectively clamp our bias correction. Next, the expected log‐determinant terms resolve to
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0035(21)
in terms of the multivariate digamma function ψd.
Finally, as BD ≥ 0, we debias the plug‐in estimator by dividing by a large number rather than by subtracting a large number:
urn:x-wiley:2041210X:media:mee313027:mee313027-math-0036(22)
which is the same to first order. This serves as a first‐order bias correction to both the BD and the BC.
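The difference between subtracting and dividing can be illustrated with a generic example. The sketch below is not equation (22): it assumes a plug‐in estimate x with an estimated additive bias b, and compares the subtractive form x − b with one possible ratio form, x / (1 + b/x). The function names and inputs are hypothetical. The two forms differ by b² / (x + b), so they agree to first order in b/x, but only the ratio form is guaranteed to stay nonnegative.

```python
def debias_by_subtraction(x, b):
    """Subtract the estimated bias; can become negative when b exceeds x."""
    return x - b

def debias_by_ratio(x, b):
    """Divide by (1 + b/x); equal to x - b to first order in b/x, never negative."""
    return x / (1.0 + b / x)

# Small bias relative to the estimate: the two forms nearly coincide.
small_sub = debias_by_subtraction(0.5, 0.05)
small_ratio = debias_by_ratio(0.5, 0.05)

# Bias larger than the estimate: subtraction leaves the valid range BD >= 0.
large_sub = debias_by_subtraction(0.1, 0.3)
large_ratio = debias_by_ratio(0.1, 0.3)
```

The second case shows why a ratio is the safer choice for a quantity constrained to be nonnegative, which matters most at small effective sample sizes where the estimated bias is large.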

2.3 Workflow

The resulting centerpiece of our inferential framework is a bias‐corrected BC estimate, with confidence intervals, that is comparable across studies. To get to that point, the user must first proceed through a workflow designed to produce the best possible estimates from their data, but warn when such an analysis is inappropriate. This workflow builds on that described in Calabrese et al. (2016) for HR analysis.

The first step is ensuring that the data at hand are appropriate for HR analysis, which means that there must be clear evidence of range‐residency. Data from non‐range‐resident individuals or from range‐resident individuals that were only briefly tracked may not satisfy this criterion. When the data do not show evidence of range‐residency, HR estimation is not appropriate (Calabrese et al., 2016; Fleming & Calabrese, 2017), which implies that HR overlap analysis is also not appropriate. We, therefore, strongly recommend starting with visual verification of range‐residency via variogram analysis (Fleming et al., 2014b). Specifically, the variogram of a range‐resident individual should show a clear asymptote.

Once range‐residency has been verified, the next step is to fit a series of range‐resident movement models to the data, such as the IID, Ornstein‐Uhlenbeck (OU; Uhlenbeck & Ornstein, 1930), and OU‐Foraging (OUF; Fleming et al., 2014b, 2014c) processes. Model selection should then be employed to identify the best model for the data (Fleming et al., 2014c, Fleming, Subaşi, et al., 2015). The selected model should then be visually compared to the variogram to ensure that the model is capturing the key features in the data. Models that fail to converge, or that do not provide a reasonable fit to the data are another indication that HR analysis may be inappropriate (Calabrese et al., 2016).

With a fitted, selected movement model in hand, AKDE HR estimates can then be calculated, and these can be used to obtain BC estimates and CIs. These overlap estimates may either be the final product of the analysis or be used in subsequent analyses. Importantly, the confidence intervals attached to each BC estimate can be straightforwardly propagated into derived quantities, such as the mean overlap within a group, which can facilitate testing hypotheses on similarity or differences among groups of interest. While the workflow we describe involves several steps, the ctmm package and graphical user interface (Dong et al., 2017) streamline this procedure. A full example of the workflow is shown in Supporting Information Appendix B.

2.4 Simulation study

To examine the statistical properties of the BC, and the coverage of our CIs, we simulated tracking data with variable sampling durations and frequencies. Data were simulated based on pairs of both IID processes, and OUF processes (Fleming et al., 2014b, 2014c), parameterized such that the true overlap between these pairs was fixed at 0.5. Simulating from an OUF process generates relocations that feature autocorrelated positions and velocities as well as restricted space use and are representative of modern GPS tracking data commonly used in HR analyses (Fleming & Calabrese, 2017).

Importantly, the timescale over which autocorrelation in position decays, τp (also termed the HR crossing time; Calabrese et al., 2016), is a key parameter for HR estimation (Noonan et al., in review). Formally, τp can be quantified from the data as the timescale over which an individual's positional autocorrelation decays by a factor of $1/e$, and its movement process reverts to the mean location (Fleming & Calabrese, 2017; Fleming, Fagan, et al., 2015). The duration of the observation period (T), in relation to τp, will thus dictate the effective sample size (ne) of a dataset via
$$n_e \approx \frac{T}{\tau_p} \tag{23}$$
which may be interpreted as the approximate number of range crossings that occurred during the sampling period. We tailored our simulations according to their relative effects on ne. These were:
  1. Sampling duration. Observations were recorded eight times/day, and we manipulated sampling duration (ranging from 1 to 4,096 days in a doubling series). For OUF simulations, the HR crossing time was set to 1 day, and the velocity autocorrelation timescale to 1/5 of a day. Notably, this parameterization was such that, in these simulations, the sampling duration in days exhibited a 1:1 relationship with ne.
  2. Sampling frequency. Here, the sampling duration was fixed at 32 days, and we manipulated the sampling frequency (ranging from 1 to 1,024 fixes/day in a doubling series). Again, for the OUF process, HR crossing time was set to 1 day, and the velocity autocorrelation timescale to 1/5 of a day. The fixed sampling duration in these simulations resulted in ne being fixed at 32, irrespective of variation in the sampling frequency.
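The two manipulations can be laid out as a small sketch, assuming the effective sample size is approximated by the ratio of sampling duration to HR crossing time, as the interpretation as a number of range crossings implies. The function and variable names are introduced here for illustration; the actual simulations additionally require simulating IID and OUF movement tracks, which is not shown.

```python
def doubling_series(start, stop):
    """Return start, 2*start, 4*start, ... up to and including stop."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= 2
    return values

def effective_sample_size(duration_days, tau_p_days):
    """Approximate number of range crossings during the sampling period."""
    return duration_days / tau_p_days

TAU_P_DAYS = 1.0  # HR crossing time used for the OUF simulations

# Manipulation 1: sampling duration varies (days), eight fixes per day.
durations = doubling_series(1, 4096)
ne_by_duration = [effective_sample_size(t, TAU_P_DAYS) for t in durations]

# Manipulation 2: sampling frequency varies (fixes/day), duration fixed at 32 days.
frequencies = doubling_series(1, 1024)
ne_by_frequency = [effective_sample_size(32, TAU_P_DAYS) for _ in frequencies]
```

The contrast is the point of the design: lengthening the tracking period increases ne one-for-one, whereas sampling more often within a fixed period leaves ne unchanged.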

We then compared the accuracy of the underlying HR estimates, the accuracy of the estimated overlap, and the realized coverage of the confidence intervals. Results were averaged over 1,000 simulations per manipulation. The computations were conducted on the Smithsonian Institution High Performance Cluster (SI/HPC).

2.5 Empirical study

We demonstrate the functionality of this method using GPS data from Mongolian gazelles. Mongolian gazelles are medium‐sized herbivores that cross their ranges on seasonal timescales (Fleming et al., 2014b, 2014c). Positional data for 36 Mongolian gazelles were collected in Mongolia's Eastern Steppe between 2007 and 2011 (Fleming et al., 2014a). Both variogram analysis (Fleming et al., 2014c) and model selection (Calabrese et al., 2016) were used to confirm that there was evidence of range‐residency in the data. From these diagnostic checks, 13 individuals showed no signs of range‐resident behaviour, and we restricted our analyses to the 23 range‐resident individuals. HR estimation was then carried out using KDE and AKDE as described above. We then computed all pairwise BCs ± 95% CIs on the KDE and AKDE estimates. Notably, the long HR crossing timescales (urn:x-wiley:2041210X:media:mee313027:mee313027-math-0039 days; range = 8.0–443.2), and comparatively short tracking durations (urn:x-wiley:2041210X:media:mee313027:mee313027-math-0040 days; range = 67.2–755.0), here produced a mean ne of 6.1 (range = 0.7–24.6). This is a regime where the negative bias of conventional KDE is known to have serious implications for HR estimates on autocorrelated data (Fleming & Calabrese, 2017).

2.5.1 Downstream analyses

To further highlight the utility of these confidence intervals, we used the estimated overlap to quantify the edges of a spatial interaction network (Wey et al., 2008). As point estimates were accompanied by CIs, we were able to subset edges into two categories:
  1. Supported. Well‐supported edges were identified as cases where two individuals exhibited overlapping space use with a lower CI bound greater than 0.01; that is, there was 95% certainty that the overlap was ≥0.01.
  2. Unsupported. Unsupported edges were identified as cases where the point estimate suggested overlapping space use, but the lower CI bound was less than 0.01; that is, there was insufficient evidence that the overlap differed significantly from 0.
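
The two categories above amount to a simple rule on the lower CI bound, which might be sketched as follows. The `OverlapEstimate` type, the `classify_edge` function, and the "none" label for pairs with a zero point estimate are hypothetical names introduced for illustration; only the 0.01 threshold comes from the text.

```python
from typing import NamedTuple


class OverlapEstimate(NamedTuple):
    """Overlap point estimate with its 95% CI bounds (hypothetical container)."""
    low: float
    est: float
    high: float


THRESHOLD = 0.01  # lower-bound cut-off used in the text


def classify_edge(overlap, threshold=THRESHOLD):
    """Label a pairwise overlap as 'supported', 'unsupported', or 'none'.

    ASSUMPTION: pairs whose point estimate is zero are treated as having no
    edge at all; the text does not discuss that case explicitly.
    """
    if overlap.est <= 0.0:
        return "none"
    return "supported" if overlap.low > threshold else "unsupported"


# First pair: AKDE example values reported in this paper, 0.80 (0.22-0.99).
# Second pair: made-up values for illustration.
pairs = {
    "reported AKDE example": OverlapEstimate(low=0.22, est=0.80, high=0.99),
    "hypothetical weak pair": OverlapEstimate(low=0.001, est=0.15, high=0.60),
}

for name, overlap in pairs.items():
    print(f"{name}: {classify_edge(overlap)}")
```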

We then quantified a number of commonly used diagnostics (i.e., network density, mean path length, and closeness centrality; Wey et al., 2008), to investigate how these might differ when the network was based only on statistically supported edges vs. the inclusion of unsupported edges.
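
For readers unfamiliar with these diagnostics, the sketch below computes them for a toy four-node network, once with all edges and once with a subset standing in for the supported edges. The graph, the function names, and the specific definitions (unweighted edges; closeness as the number of reachable nodes divided by the summed shortest-path distances, averaged over nodes) are assumptions for illustration; network software differs in how it defines and normalizes closeness, and this is not the analysis run in the paper.

```python
from collections import deque
from itertools import combinations


def build_adjacency(nodes, edges):
    """Undirected adjacency sets for an unweighted graph."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def density(nodes, edges):
    """Observed edges divided by the number of possible undirected edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def mean_path_length(nodes, edges):
    """Mean shortest-path length over connected pairs (assumes at least one)."""
    adjacency = build_adjacency(nodes, edges)
    lengths = []
    for a, b in combinations(nodes, 2):
        d = bfs_distances(adjacency, a).get(b)
        if d is not None:
            lengths.append(d)
    return sum(lengths) / len(lengths)


def mean_closeness(nodes, edges):
    """Average over nodes of (reachable nodes) / (sum of distances to them)."""
    adjacency = build_adjacency(nodes, edges)
    values = []
    for node in nodes:
        distances = bfs_distances(adjacency, node)
        total = sum(d for other, d in distances.items() if other != node)
        values.append((len(distances) - 1) / total if total > 0 else 0.0)
    return sum(values) / len(values)


# Toy network (hypothetical): five edges in total, three of them "supported".
nodes = ["A", "B", "C", "D"]
all_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]
supported_edges = [("A", "B"), ("B", "C"), ("C", "D")]

for label, edges in (("all edges", all_edges), ("supported only", supported_edges)):
    print(label, round(density(nodes, edges), 3),
          round(mean_path_length(nodes, edges), 3),
          round(mean_closeness(nodes, edges), 3))
```

The toy example only shows how the diagnostics respond when edges are dropped; the direction and size of the changes in a real network depend on its structure and on the closeness definition used.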

All analyses were conducted in the R environment (R Core Team, 2016), using the methods implemented in the package ctmm (Calabrese et al., 2016).

3 RESULTS

3.1 Simulation results

3.1.1 Asymptotic properties of the BC

Simulations revealed that for IID data, both AKDE and KDE HR estimates provided identical results and were relatively unbiased except at very small sample sizes (Figure 1a). The resulting overlap was also identical between estimators, and increasing the number of fixes, by either increasing the sampling duration (Figure 1b) or frequency (Figure 1e), had the expected effect of increasing the accuracy of the overlap estimate and decreasing the uncertainty. Notably, the CIs on the BC offered reasonable coverage of the true overlap across all sampling regimes, albeit with some persistent negative bias at large sample sizes (Figure 1c,f). This was the result of bias in the BC decaying too slowly relative to the variance (see Supporting Information Appendix A.3).

Figure 1. The asymptotic properties of KDE and AKDE HR estimators (a and d) and the BC (b and e) for simulated, IID data, as well as the coverage of the CIs (panels c and f), as a function of sampling duration (top row) and frequency (bottom row). In all panels, the dashed horizontal lines depict the truth, the solid line the mean point estimate, and the shaded regions the 95% CIs

For autocorrelated data, in contrast, AKDE 95% HR estimates were generally accurate across the range of sampling durations (Figure 2a) and frequencies (Figure 2d) we simulated, whereas KDE HR estimates were severely biased for all but the largest datasets. As a result, although overlap estimates based on both AKDE and KDE converged to the truth as sampling duration increased (Figure 2b), convergence was severely delayed for KDE. Furthermore, increasing the sampling frequency increased the negative bias in overlap estimates derived from KDE, but, appropriately, did not influence overlap estimates based on AKDE (Figure 2e).

Figure 2. The asymptotic properties of KDE and AKDE HR estimators (a and d) and the BC (b and e) for simulated, autocorrelated tracking data, and the coverage of the CIs (c and f), as a function of sampling duration (top row), and frequency (bottom row). In all panels, the dashed horizontal lines depict the truth, the solid line the mean point estimate, and the shaded regions the 95% CIs. Notably, convergence to the truth was much slower for KDE, and the coverage of KDE's CIs was far from appropriate in all cases

The coverage of 95% CIs for the KDE‐derived overlap estimates was severely biased under all of the scenarios we tested (Figure 2c,f). In contrast, the CIs on the AKDE‐derived overlap estimates consistently provided close to nominal coverage of the true overlap.

3.1.2 Comparability of estimates

Our baseline simulation study controlled for the effect of the movement parameters by assuming the individuals exhibited identical movement strategies and were sampled at the exact same times. Under these conditions, the improved accuracy of AKDE HR estimates resulted in more accurate overlap estimates, with 95% CIs that provided close to nominal coverage (Figure 3a). There are realistic complications to our basic simulation strategy, however, including cases where individuals are subject to the same sampling design, but exhibit different movement strategies, and cases where both movement strategies and sampling designs differ. Importantly, we found that AKDE‐based overlap still provided reasonable coverage for both of these cases (Figure 3c,e). In contrast, because of the differential bias in KDE HR estimates, the estimated overlap differed substantially between each of these scenarios, and in every case failed to provide coverage of the true value (Figure 3b,d,f).

Figure 3. HR and overlap estimates for two simulated individuals with a true overlap of 0.50. In all panels, the dashed circles depict the true 95% areas, the solid black lines the estimated 95% areas, and the grey lines the 95% CIs on the area estimates. In the first row, relocations were simulated from OUF models with identical movement parameters and sampling times. In the second row, sampling was held consistent, but the individual plotted in yellow had a HR crossing time of 1 week vs. 1 day for the individual in red. In the third row, movement again differed between individuals, but here, the individual in yellow was sampled once every 30 min vs. once every 3 hr for the individual in red. Note how in all cases AKDE‐based overlap estimates were relatively consistent and provided coverage of the true overlap, whereas KDE‐based overlap estimates varied substantially and consistently failed to provide coverage of the truth

3.2 Empirical case study

Consistent with our simulated findings of negative bias in KDE HR and BC estimates at mid to low ne on autocorrelated data, empirical AKDE HR estimates were larger than KDE estimates for all pairs (Figure 4a). Median pairwise overlap between the 276 pairs of individuals was 0.66 (95% CI 0.58–0.76) when the overlap was estimated from AKDE HR estimates, but fivefold lower when estimated from KDE estimates (median = 0.13; 95% CI 0.06–0.22).

Figure 4. (a) The relationship between pairwise estimates of the BC for Mongolian gazelle, computed from KDE and AKDE HR estimates. The dashed 1:1 line depicts parity between these. Note how all cases fall above this line, highlighting how AKDE‐derived BC suggests more overlap than KDE‐derived BC. An example of this discrepancy is depicted in (b), with AKDE BC suggesting extensive overlap, 0.80 (0.22–0.99), whereas in (c), the negative bias in KDE propagates to produce a biased estimate of the overlap, 0.02 (0.01–0.03). Crucially, with effective sample sizes of c. 4 for each HR estimate, the CIs approximated from the AKDE estimates were appropriately wide, vs. KDE's deceptively narrow CIs

The severe negative bias of KDE‐derived overlap was persistent across all individuals. This can be illustrated in a specific example, where the KDE HR estimates resulted in an estimated overlap of 0.02 (95% CI 0.01–0.03), whereas the AKDE HRs resulted in an overlap of 0.80 (95% CI 0.22–0.99). Visual inspection of the range estimates for these individuals revealed substantial negative bias in the KDE HR, whereas the AKDE HR was larger, with appropriately wide CIs considering the small ne of c. 4 for each HR estimate (Figure 4b,c).

3.2.1 Downstream analyses

As these overlap estimates were accompanied by confidence intervals, the uncertainty can be used to inform downstream analyses. For instance, a spatial network analysis based on the estimated overlap revealed 461 edges of variable strength (Figure 5). Of these, 275 were well supported, whereas 186 had no statistical support. We found that basing the network on all possible edges, rather than only on those edges with statistical support, influenced its properties and any potential biological inferences that would be derived from it. For instance, network density was reduced from 0.86 to 0.63 when the analysis was restricted to only the well‐supported edges. Furthermore, only utilizing statistically supported edges increased the mean path length from 1.13 to 1.39. Interestingly, despite decreasing density and increasing the mean path length, constructing the network based on only well‐supported edges resulted in a twofold increase in the closeness centrality compared to the network constructed with both supported and unsupported edges (0.45 vs. 0.23, respectively).

Figure 5. (a) The GPS locations for 23 Mongolian gazelle tracked in Mongolia's Eastern Steppe; (b) a network diagram with edge weights based on overlap values; and (c) an example case of two HR estimates where the point estimate of the overlap suggests a connection, but the CIs on the estimates suggest that connection might not be statistically significant. The dashed lines in (b) depict pairs where the point estimate suggests a connection, but with CIs that include 0.01 and thus may not be statistically significant. The transparency of the lines is proportional to the point estimate of the BC. The connection depicted in red on the right‐hand side of (b) corresponds to the pair in (c)

4 DISCUSSION

Despite the routine nature of estimating overlapping space use (e.g., Berger & Gese, 2007; Dougherty et al., 2018; Frère et al., 2010; Sanchez & Hudgens, 2015), there exists no formal inferential framework for this analysis. This is largely due to the inherent difficulties associated with HR estimation (Fieberg & Börger, 2012) and exacerbated by the historical lack of CIs on both HR and overlap estimates. As a solution, we have demonstrated how AKDE HR estimates (Fleming & Calabrese, 2017; Fleming, Fagan, et al., 2015) can serve as a reliable foundation on which to base statistical inference. In addition, we have implemented a small‐sample‐size bias correction for the BC and derived well‐behaved, approximate CIs on the point estimate. Collectively, these advances permit researchers to accurately quantify HR overlap, even when sampling strategies and underlying movement parameters differ among groups being compared, and test whether any observed differences are statistically meaningful.

4.1 Home range and overlap estimation: an intrinsic relationship

A crucial component of any statistical inference is having comparable measures on which to base analyses. Overlap is typically conditional on HR estimates (Fieberg & Kochanny, 2005; Millspaugh et al., 2004), which are themselves estimated from animal tracking data. As overlap estimation relies on at least three separate estimates (two HR estimates, and their overlap), it follows that this analysis is particularly vulnerable to issues of estimator bias. Accurate HR estimation is a deceptively challenging problem, however, as autocorrelation (Fleming, Fagan, et al., 2015), small‐sample‐size bias (Fleming & Calabrese, 2017), and sampling irregularities (Fleming et al., 2018; Frair et al., 2010) will significantly influence any statistical analyses applied to animal tracking data. More subtly, even identical sampling strategies can still produce differentially biased HR estimates if the underlying parameters of movement differ markedly between individuals (Fleming & Calabrese, 2017; Noonan et al., in review). As these are nearly ubiquitous aspects of animal tracking data, accurate overlap estimation requires statistical methods that can handle these complications, without introducing artifactual differences due purely to estimator bias.

In this respect, our simulation study revealed that, for autocorrelated data, KDE regularly underestimated HR sizes (Fleming & Calabrese, 2017; Noonan et al., in review), and this negative bias was directly propagated to overlap estimates. For KDE, the amount of data required to achieve an accurate measure of overlap was very large, and most empirical cases are likely to underestimate the true overlap (Fieberg & Kochanny, 2005). In contrast, AKDE HRs were larger, but significantly more accurate, which translated to more accurate overlap estimates. Crucially, when we varied the sampling design and movement strategies between the individuals we were comparing, AKDE‐based estimates provided reliable coverage of the true overlap, whereas this was not the case for KDE. Consistent with the results of our simulation study, empirical AKDE HRs from autocorrelated Mongolian gazelle GPS data were c. twice as large as KDE estimates. This resulted in the median pairwise overlap being fivefold larger when based on AKDE vs. KDE. Had an analysis been based on the biased KDE estimates, one would have erroneously concluded that there was little spatial overlap in this system, whereas the results based on AKDE's more rigorous estimates revealed these individuals actually exhibited extensive overlap. Although these empirical estimates could not be compared to a truth, as per our simulations, this finding is also consistent with a recent analysis by Noonan et al. (in review). In a large‐scale comparative study encompassing 369 individuals across 30 species, they found that AKDE 95% HR estimates consistently included c. 95% of holdout observations, whereas KDE estimates included c. 92% at high ne (>256), but only c. 75% at low ne. This means AKDE's larger estimates are accurate, while those produced by conventional KDE on the same data are consistently, and often grossly, too small. The net result is that AKDE provides a solid foundation for estimating overlap under realistic sampling regimes, resulting in accurate overlap estimates that can validly be compared across studies.

As described above, a fundamental component of estimating HR overlap is having comparable measures on which to base analyses. Notably, in this study, we consider range estimators in the sense of Burt (1943), which estimate a long‐run space use, assuming the focal individual does not change its movement process (Fleming, Fagan, et al., 2015). This includes KDEs, minimum convex polygons (MCP; Mohr, 1947), and time‐naive local convex hulls (LoCoH) (Getz et al., 2007). Also of interest are occurrence distribution estimators such as the Brownian bridge (Horne, Garton, Krone, & Lewis, 2007) or t‐LoCoH (Lyons, Turner, & Getz, 2013) which quantify uncertainty in the animal's location during the sampling period, including times not sampled. Crucially, this uncertainty vanishes in the limit where both the sampling interval and telemetry error approach zero. Although these two mathematically distinct classes of distributions have been historically conflated under the umbrella term of “utilization distributions,” they have very different interpretations and use cases (Fleming, Fagan, et al., 2015). Consequently, overlap based on occurrence estimates has a very different meaning from overlap based on range estimates and is beyond the scope of the present work.

We also note that extending our bias correction and CIs to other HR estimators, such as MCP, LoCoH, or non‐GRF KDE bandwidth optimizers, is not a tractable problem. First, our methods are explicitly based on the GRF approximation, so they are not consistent with non‐GRF estimators. Second, the GRF‐based methods implemented in ctmm are, to our knowledge, the only HR estimators that quantify uncertainty. As an uncertainty estimate is a prerequisite for our error propagation techniques, it would not currently be possible to adapt our approach to other estimators. Finally, the target distributions and expectation values of geometric methods such as MCP and LoCoH are usually unknown, which makes these estimators incompatible with the methods developed here.

4.2 Properties of the overlap estimator

In addition to utilizing reliable HR estimates, the overlap estimator itself should have desirable properties (Fieberg & Kochanny, 2005). While several valid estimators exist, the BC (Bhattacharyya, 1943) stands out because of its statistical validity, geometric interpretability, computational efficiency, and asymptotic consistency. As noted by Fieberg and Kochanny (2005), however, the BC is prone to exhibiting negative, small‐sample‐size bias (Djouadi & Snorrason, 1990). To correct for this, we derived a small‐sample‐size bias correction, which improved the accuracy of BC estimates (Djouadi & Snorrason, 1990).
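
For orientation, the BC between two distributions p and q is the integral of sqrt(p·q) over space, ranging from 0 (no overlap) to 1 (identical distributions). When both distributions are Gaussian, the integral has a well-known closed form, sketched below for the bivariate case. This is a minimal illustration only: it omits the small‐sample‐size bias correction and the CIs developed in this paper, it is not the ctmm implementation, and the function names are hypothetical.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """BC between two bivariate Gaussians via the closed-form distance.

    D_B = (1/8) dmu' S^-1 dmu + (1/2) ln(det S / sqrt(det S1 * det S2)),
    with S = (S1 + S2) / 2, and BC = exp(-D_B).
    No bias correction is applied (unlike the estimator in the paper).
    """
    # Average covariance matrix S and its inverse.
    avg = [[(cov1[i][j] + cov2[i][j]) / 2 for j in range(2)] for i in range(2)]
    d = det2(avg)
    inv = [[avg[1][1] / d, -avg[0][1] / d],
           [-avg[1][0] / d, avg[0][0] / d]]

    # Quadratic form dmu' S^-1 dmu for the difference in means.
    dx, dy = mu1[0] - mu2[0], mu1[1] - mu2[1]
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

    distance = quad / 8 + 0.5 * math.log(d / math.sqrt(det2(cov1) * det2(cov2)))
    return math.exp(-distance)


identity = [[1.0, 0.0], [0.0, 1.0]]

# Identical distributions overlap completely.
print(bhattacharyya_coefficient((0.0, 0.0), identity, (0.0, 0.0), identity))

# Two unit-variance ranges whose centres are sqrt(8 ln 2) apart have BC = 0.5.
separation = math.sqrt(8 * math.log(2))
print(bhattacharyya_coefficient((0.0, 0.0), identity, (separation, 0.0), identity))
```

The second example shows the geometric interpretability noted above: for equal, isotropic ranges the BC depends only on the distance between centres relative to the range size.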

Also problematic is the historical lack of CIs on overlap estimates. Overlap is an estimate derived from data and should be accompanied by a measure of the uncertainty (Pawitan, 2001). Without this, one cannot properly infer the importance of a given estimate. As a solution, we have derived CIs on the BC based on a GRF approximation. Using simulated data, we demonstrated that this implementation provides reasonable coverage of the true overlap. We note, however, that, while generally well behaved, there was some persistent negative bias in the coverage of these CIs. The biased coverage is likely the result of the bias in the BC point estimate decaying too slowly relative to the variance as ne increased (Figure A.2). With asymptotically efficient estimators, this ratio would decay at a rate of urn:x-wiley:2041210X:media:mee313027:mee313027-math-0041 or better, whereas here it increases at a rate of c. urn:x-wiley:2041210X:media:mee313027:mee313027-math-0042. As such, their coverage should be treated with caution, particularly at large ne. Furthermore, because we approximate the HRs as Gaussian when estimating uncertainty, the CIs may exhibit unintended behaviour when the overlap is dependent on non‐Gaussian features.

Despite these limitations, well‐behaved CIs for HR overlap are a novel feature and permit true statistical inference on overlap estimates. For instance, these CIs can be compared against a reference value of interest (e.g., the mean overlap between individuals of the same species studied elsewhere) to test for a significant difference, as opposed to relying on ad hoc comparisons. Additionally, if overlap is being used to inform subsequent analyses, CIs can be used to improve these. For example, we found that differentiating between the 275 overlap estimates that were well supported by the data and the 186 that may have been artifactual significantly influenced the properties of an interaction network of Mongolian gazelle. When based on all possible edges, the network suggested a larger number of edges, but with a low closeness centrality. Conversely, when based only on edges with statistical support, the network density decreased but closeness increased. The supported and unsupported networks would each lead to a unique set of biological interpretations, with only the former being supported by the data.

5 CONCLUSION

In conclusion, we have developed the first inferential framework for HR overlap tailored for the specific needs of ecologists that is both statistically valid and computationally efficient. Collectively, the more accurate and comparable HR estimates provided by AKDE (Fleming & Calabrese, 2017; Fleming, Fagan, et al., 2015; Noonan et al., in review) and our novel bias correction and CIs on the BC permit rigorous overlap estimation. This method is now available via command line interface through the ctmm package (Calabrese et al., 2016) or through the web‐based graphical user interface at ctmm.shinyapps.io/ctmmweb/ (Dong et al., 2017).

ACKNOWLEDGEMENTS

This work was supported by the US NSF Advances in Biological Informatics programme (ABI‐1458748 to J.M.C.). M.J.N. was supported by a Smithsonian Institution CGPS grant. T.M. was funded by the Robert Bosch Foundation.

AUTHORS’ CONTRIBUTIONS

K.W. and M.J.N. contributed equally to this work. C.H.F. and J.M.C. conceived the study. K.W. and C.H.F. developed the methods. K.A.O. and T.M. collected the data. M.J.N. conducted the analyses. M.J.N. and J.M.C. drafted the manuscript. All authors contributed to the study concepts and writing.

DATA ACCESSIBILITY

The Mongolian gazelle data used in this manuscript are available from the Dryad Digital Repository https://doi.org/10.5061/dryad.45157 (Fleming et al., 2014a).
