Volume 8, Issue 5
Research Article

A new kernel density estimator for accurate home‐range and species‐range area estimation

Christen H. Fleming

Corresponding Author

E-mail address: flemingc@si.edu

Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Road, Front Royal, VA, 22630 USA

Department of Biology, University of Maryland College Park, College Park, MD, 20742 USA

Justin M. Calabrese

Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Road, Front Royal, VA, 22630 USA

Department of Biology, University of Maryland College Park, College Park, MD, 20742 USA

First published: 12 October 2016
Citations: 41

Summary

  1. Kernel density estimators are widely applied to area‐related problems in ecology, from estimating the home range of an individual to estimating the geographic range of a species. Currently, area estimates are obtained indirectly, by first estimating the location distribution from tracking (home range) or survey (geographic range) data and then estimating areas from that distribution. This indirect approach leads to biased area estimates and difficulty in deriving reasonable confidence intervals.
  2. We introduce a new kernel density estimator (and associated confidence intervals) focused specifically on area estimation that applies to both independently sampled survey data and autocorrelated tracking data. We test our methods against simulated movement data and demonstrate their use with African buffalo data.
  3. The area‐corrected kernel density estimator produces much more accurate area estimates, particularly at small sample sizes, and the newly derived confidence intervals are more reliable than existing alternatives.
  4. This new method is the most efficient nonparametric home‐range estimator for animal tracking data and should also be considered when calculating nonparametric range estimates from survey data. This estimator is now the default method in the ctmm R package.

Introduction

Range estimation is a critical statistical problem for ecology, wildlife management and conservation biology. Estimating range areas traditionally involves first estimating the distribution of animal locations from tracking or survey data and then calculating coverage regions from the distribution estimate. Nonparametric density estimators are particularly useful for their broad applicability in estimating unknown distributions, as they can account for irregular structures without requiring an understanding of the underlying biological mechanisms. Kernel density estimation (KDE) is the most statistically efficient nonparametric method for probability density estimation known and is supported by a rich statistical literature that includes many extensions and refinements (Silverman 1986; Izenman 1991; Turlach 1993). Ecology has benefited greatly from these developments, but because KDE is used across the sciences, the statistical research community has not turned much attention to the unique needs of ecological applications. For instance, despite KDE's ubiquity in home‐range estimation since its introduction into the field by Worton (1989), only recently has a KDE method been fully generalized to account for autocorrelation (autocorrelated KDE or AKDE, Fleming et al. 2015), even though autocorrelation has long been known to be an important factor in tracking data (Swihart & Slade 1985; Hansteen, Andreassen & Ims 1997).

Even when autocorrelation is properly accounted for, kernel density estimators are biased, although asymptotically consistent, estimators of the true density function of locations. Accounting for autocorrelation removes negative biases that stem from overestimating the effective sample size of the data (relating to the sampling rate) and from failing to extrapolate diffusion (Fleming et al. 2015). The conventional small‐sample‐size bias (relating to the sampling period) that affects KDE remains: although this bias vanishes asymptotically when autocorrelation is accounted for, the resulting area estimates tend to be fairly biased at finite sample sizes (Worton 1995; Seaman & Powell 1996). For example, even for independent and identically distributed (IID) data, KDEs based on the Gaussian reference function (GRF) method will generally tend to overestimate areas (Worton 1995; Seaman & Powell 1996). Furthermore, appropriate confidence intervals for KDE‐derived statistics, such as area or overlap, are difficult to determine (Fieberg & Kochanny 2005). Area estimation is often a primary motivation for collecting animal location data, and accurate area estimates are of paramount importance in both movement ecology and biogeography. It is therefore imperative to formulate kernel density methods that yield accurate area estimates with reasonable confidence intervals.

Here, we derive a corrected kernel density estimator with substantially reduced bias in its area estimates. Specifically, we start with the autocorrelated GRF AKDE (Fleming et al. 2015), calculate the bias in its area estimates under the same GRF approximation and then correct for that bias in an area‐based coordinate system (Fig. 1). For context, our manuscript proceeds by first setting up an area‐based coordinate system, then calculating the bias in GRF‐derived area estimates and finally by applying a transformation using the area‐based coordinate system to remove much of the bias. We also derive improved confidence intervals for both KDE and AKDE area estimates under the GRF approximation. We check the accuracy of our techniques with a simulation study, using both independent and autocorrelated data, and demonstrate their use on an African buffalo example.

Fig. 1. A utilization distribution estimate with (solid line) equiprobable contour containing an area $\hat{A}$ at percentile P. Our method proceeds by first estimating the bias of $\hat{A}$ and then contracting the contour inward, to a new (dashed line) contour with corrected area, as denoted by the black arrows. This is done for every equiprobable contour of the original UD estimate, resulting in an area‐corrected distribution estimate.

Area as a random variable

To remove bias and construct more reliable confidence intervals for area estimates, we must first develop the idea of area as a random variable. It is not sufficient to merely remove bias from the location distribution estimate, because the relationship between the location distribution and its areas is highly nonlinear. Developing the concept of area as a random variable will allow us to formally correct the density function estimate, so that derived area estimates are less biased, and ensure that this correction is both well defined and well behaved.

The region of area A associated with percentile P of the location probability density function p(x, y) is given by the smallest region containing the given cumulative probability
$$A(P) = \min_{\Omega}\left\{ \iint_{\Omega} dx\,dy \;:\; \iint_{\Omega} p(x,y)\,dx\,dy = P \right\}$$ (eqn 1)

which will also be the region containing the largest probability densities. In practice, we rasterize p(x, y) on a fine grid with grid cell areas dA = dx dy and cell probabilities Pcell(x, y) = p(x, y) dA. We then sort the cell probabilities Pcell(x, y) in descending order, from the mode(s) of the density function, which corresponds to the highest probability cells, out to infinity, where the cell probabilities limit to zero. The cumulative sum of this sorted list then gives us P(A), which is easily inverted to obtain A(P). Along any individual equiprobable contour of the location distribution containing an area A, it suffices to consider the coordinate (A, θ) in lieu of (x, y), where θ/2π measures distance around the contour divided by its circumference, very much like an angle (Fig. 2). If there are multiple contours for a given P(A), then we will also require a discrete variable z to classify them. Our area‐based coordinate system can therefore be represented as (A, θ, z).¹ As a coordinate, A is a measure of proximity to the mode location(s). The coordinate transformation (x, y) → (A, θ, z) is mathematically analogous to transforming from Cartesian to polar coordinates, except that the area‐based coordinate system (A, θ, z) conforms to the shape of its corresponding density function, which can have a more interesting topology (necessitating z). With this area‐based coordinate system in place, we can now formally develop a bias correction for the kernel density estimate.

¹ This is confirmed by the fact that the integers $\mathbb{Z}$ are the quotient group of the real line $\mathbb{R}$ and circle $S^1$, or $\mathbb{Z} = \mathbb{R}/S^1$, which explains the general necessity of z, A and θ, respectively.
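The rasterize, sort and accumulate procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ctmm implementation: the function names, the grid extent and the circular Gaussian test density are all hypothetical choices made for the example.

```python
import math

def area_from_percentile(density, cell_area, P):
    """Area of the highest-density region holding cumulative probability P.

    `density` is a 2-D list of p(x, y) values on a regular grid whose cells
    each have area `cell_area` (dA = dx dy).
    """
    # Cell probabilities P_cell = p dA, sorted from the mode(s) outwards.
    cells = sorted((p * cell_area for row in density for p in row), reverse=True)
    total = sum(cells)  # renormalise in case the grid truncates the tails
    cumulative = 0.0
    for k, p_cell in enumerate(cells, start=1):
        cumulative += p_cell / total
        if cumulative >= P:          # first k cells reach P, so A(P) = k dA
            return k * cell_area
    return len(cells) * cell_area

def gaussian_grid(sigma, half_width, step):
    """Circular Gaussian density rasterized on [-half_width, half_width]^2."""
    n = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(n)]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[norm * math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) for x in xs]
            for y in xs]

step = 0.05
grid = gaussian_grid(sigma=1.0, half_width=6.0, step=step)
area95 = area_from_percentile(grid, step * step, 0.95)
# For a unit circular Gaussian the exact 95% area is -2*pi*ln(0.05).
exact95 = -2.0 * math.pi * math.log(0.05)
```

Because the cells are ranked by probability rather than by position, the same routine works unchanged for multimodal densities, which is the situation that motivates the discrete coordinate z.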

Fig. 2. Left panel: A bimodal distribution in the (x, y) coordinate system, with equiprobable contours drawn at different values of area A. Right panel: An overhead view of the same equiprobable contours, where an equivalent coordinate system based on area, (A, θ, z), is constructed. The area A determines how far out locations are from the modes, θ determines the relative distance around the equiprobable contour circumscribing area A, and z determines which contour, if necessary. Note that the discrete variable z only plays a role for the innermost contours, where multiple contours share the same value of A.

Kernel density‐derived area estimates

We consider the q‐dimensional location vector r(t) as a continuous‐time stochastic process with mean function μ(t) and autocorrelation function σ(t, t′):
$$\boldsymbol{\mu}(t) = \langle \mathbf{r}(t) \rangle, \qquad \mathbf{M}_i = \boldsymbol{\mu}(t_i)$$ (eqn 2)
$$\boldsymbol{\sigma}(t,t') = \left\langle [\mathbf{r}(t)-\boldsymbol{\mu}(t)]\,[\mathbf{r}(t')-\boldsymbol{\mu}(t')]^{\mathrm{T}} \right\rangle, \qquad \boldsymbol{\Sigma}_{ij} = \boldsymbol{\sigma}(t_i,t_j)$$ (eqn 3)
where ⟨·⟩ denotes the average over realizations of the process, and where M and Σ denote the mean qn dimensional block vector and qn × qn dimensional autocorrelation matrix, respectively, when the data are sampled at times ti, with i ∈ {1,…, n}. For simplicity, we will further assume the block structure
$$\boldsymbol{\Sigma} = \mathbf{C} \otimes \boldsymbol{\sigma}_0$$ (eqn 4)
where σ0 = σ(t, t) is the ordinary q × q dimensional covariance matrix, ⊗ denotes the tensor product, and Cij = c(ti, tj) is an n × n dimensional correlation matrix. In other words, we assume that the autocorrelation structure and movement behaviour are proportionally the same in all directions. The more general case of an unstructured autocorrelation matrix is straightforward up to the point of the area correction, which is less obvious. Any weighted kernel density estimate with Gaussian kernels will take the form
$$\hat{p}(\mathbf{r}) = \sum_{i=1}^{n} w_i\, \frac{\exp\!\left[-\tfrac{1}{2}(\mathbf{r}-\mathbf{r}_i)^{\mathrm{T}} \boldsymbol{\sigma}_B^{-1} (\mathbf{r}-\mathbf{r}_i)\right]}{\sqrt{\det(2\pi\boldsymbol{\sigma}_B)}}$$ (eqn 5)
where ri = r(ti) are the sampled locations, wi denotes the weights (which sum to one) and σB denotes the bandwidth matrix. With the previous structural assumptions on the autocorrelation matrix, we can express the GRF bandwidth as proportional to the covariance, or σB = h² σ0, where h is the standardized bandwidth. Specifically, we optimize our bandwidth using the AKDE method (Fleming et al. 2015), which is the generalization of the GRF approximation to autocorrelated and possibly non‐stationary data. Unlike conventional KDE methods, AKDE accounts for autocorrelation in the data by first fitting an autocorrelation model to the data, which provides estimates for μ(t), σ(t, t′), and then optimizing the bandwidth h under the assumption of the fitted autocorrelation model. As AKDE is the generalization of the GRF KDE method to autocorrelated data, it reduces to regular KDE for IID data. We improve the area estimates of both AKDE and the conventional GRF KDE below.
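As a concrete illustration of a weighted Gaussian kernel density estimate with a bandwidth matrix proportional to the covariance, the following Python sketch evaluates such an estimate in two dimensions. It assumes the standard Gaussian‐mixture form with weights summing to one; the helper names, the four example locations and the value h = 0.5 are hypothetical, and no bandwidth optimization (the AKDE step) is attempted.

```python
import math

def sample_covariance(points):
    """Population covariance matrix (2 x 2) of a list of (x, y) locations."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return ((sxx, sxy), (sxy, syy))

def gaussian_kde_2d(points, weights, sigma_B):
    """Return p_hat(x, y): a weighted sum of Gaussian kernels with covariance sigma_B."""
    (a, b), (c, d) = sigma_B
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))  # 1/sqrt(det(2 pi sigma_B)) for q = 2

    def p_hat(x, y):
        total = 0.0
        for (xi, yi), w in zip(points, weights):
            dx, dy = x - xi, y - yi
            quad = inv[0][0] * dx * dx + (inv[0][1] + inv[1][0]) * dx * dy + inv[1][1] * dy * dy
            total += w * norm * math.exp(-0.5 * quad)
        return total

    return p_hat

# Hypothetical example data with uniform weights and an arbitrary h.
points = [(0.0, 0.0), (1.0, 0.5), (-1.0, 0.2), (0.3, -0.8)]
weights = [1.0 / len(points)] * len(points)
h = 0.5
sigma_0 = sample_covariance(points)
sigma_B = tuple(tuple(h * h * s for s in row) for row in sigma_0)  # sigma_B = h^2 sigma_0
p_hat = gaussian_kde_2d(points, weights, sigma_B)

# Riemann-sum check that the estimate integrates to (approximately) one.
step = 0.1
grid = [-5.0 + step * k for k in range(101)]
total_probability = sum(p_hat(x, y) for x in grid for y in grid) * step * step
```

Parameterizing the bandwidth through the single scalar h keeps every kernel aligned with the shape of the data, which is what later allows the area bias to be expressed as a single scale factor.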

Area bias correction

Here, we derive the area‐corrected KDE and AKDE methods (hereafter KDEC and AKDEC) under the GRF approximation. We start by considering the first two cumulants of the density estimate. The mean location as predicted by the kernel density estimate calculated from one realization of the movement process is
$$\hat{\boldsymbol{\mu}} = \int \mathbf{r}\,\hat{p}(\mathbf{r})\,d^q r = \sum_{i=1}^{n} w_i\,\mathbf{r}_i$$ (eqn 6)
Similarly, the covariance predicted by (A)KDE and then averaged over many realizations of the process is
$$\langle\hat{\boldsymbol{\sigma}}\rangle = \left\langle \int (\mathbf{r}-\hat{\boldsymbol{\mu}})(\mathbf{r}-\hat{\boldsymbol{\mu}})^{\mathrm{T}}\,\hat{p}(\mathbf{r})\,d^q r \right\rangle$$ (eqn 7)
$$= \left\langle \sum_{i} w_i\,\mathbf{r}_i\mathbf{r}_i^{\mathrm{T}} - \hat{\boldsymbol{\mu}}\hat{\boldsymbol{\mu}}^{\mathrm{T}} \right\rangle + \boldsymbol{\sigma}_B$$ (eqn 8)
$$= \left(1 - \mathbf{w}^{\mathrm{T}}\mathbf{C}\,\mathbf{w}\right)\boldsymbol{\sigma}_0 + \boldsymbol{\sigma}_B + \left[ \sum_{i} w_i\,\boldsymbol{\mu}_i\boldsymbol{\mu}_i^{\mathrm{T}} - \Big(\sum_{i} w_i\,\boldsymbol{\mu}_i\Big)\Big(\sum_{j} w_j\,\boldsymbol{\mu}_j\Big)^{\mathrm{T}} \right]$$ (eqn 9)
with μi = μ(ti), which consists of three parts: the random scatter in the data, the bandwidth smoothing and the intrinsic covariance of the mean trajectory. The latter contribution is zero if the mean is stationary. Only the stochastic and smoothing contributions scale the area, while the third term shapes the area. Therefore, in a GRF approximation, we consider the kernel density estimate of the mean detrended data, whereupon the third term vanishes and we are left with
$$\langle\hat{\boldsymbol{\sigma}}\rangle = \left(1 + h^2 - \mathbf{w}^{\mathrm{T}}\mathbf{C}\,\mathbf{w}\right)\boldsymbol{\sigma}_0$$ (eqn 10)
where the bandwidth is now parameterized as σB = h² σ0 in terms of h. In the GRF approximation, all areas are the product of a standardized quantile function and a standard area proportional to $\sqrt{\det\boldsymbol{\sigma}}$. Therefore, when considering the ratio of the expected area to the true area at any confidence level, the quantile functions will cancel and, in two dimensions (q = 2), the area will be overestimated by a factor of
$$\alpha = 1 + h^2 - \mathbf{w}^{\mathrm{T}}\mathbf{C}\,\mathbf{w}$$ (eqn 11)

Next, we invoke our area‐based coordinate system (A, θ, z) to reduce this bias. While $\hat{A}$ is a biased estimator of areas, $\hat{A}/\alpha$ would be exactly debiased in the GRF approximation if we were to know α exactly. Furthermore, the biased (A)KDE conforms to the distribution of the data and, by not altering this distribution along its remaining coordinates (θ, z), we avoid disturbing this property. If we had instead attempted to modify the bandwidth h, as pragmatically considered in Worton (1995), then this would not generally succeed because the second, negative term in α is 1/n for independent and identically distributed (IID) data, while h² must then be $\mathcal{O}(n^{-1/3})$ for the kernel density estimate to be asymptotically optimal (Silverman 1986). That is, shrinking the bandwidth to the point at which α = 1 would imply $h^2 = 1/n$, which would violate asymptotic optimality.

As we can only estimate the bias factor α in eqn 11 from the autocorrelation estimate $\hat{\mathbf{C}}$, the correction estimated by $\hat{A}/\hat{\alpha}$ removes only the leading order of bias. To see this, note that the estimated correction itself has a bias factor of approximately $\alpha/\hat{\alpha}$, which is much closer to unity than α. Although it is highly nonlinear, this transformation is quite simple in practice. We simply take a contour of area A and probability $\hat{P}(A)$ of the estimated location distribution and then reassign this probability value to the contour of the approximately debiased area $A/\hat{\alpha}$ (Fig. 1). The location distribution is not distorted along its remaining dimensions (θ, z), which means that conformation to the data is unaffected by this correction. For α > 1, which is typically the case here, the effect of this correction is to contract the contours inward (by varying amounts of distance) towards regions of higher probability, where there are more data. This transformation counteracts the natural tendency of the GRF approximation to oversmooth the data.
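The size of the correction can be illustrated numerically. The sketch below assumes that the two‐dimensional bias factor takes the form α = 1 + h² − wᵀCw, which is consistent with the statement above that the negative term equals 1/n for IID data with uniform weights. The function names, the bandwidth value and the exponential correlation matrix used for the autocorrelated case are assumptions for illustration only.

```python
import math

def bias_factor(h, weights, C):
    """Assumed 2-D area bias factor: alpha = 1 + h^2 - w^T C w."""
    wCw = sum(wi * C[i][j] * wj
              for i, wi in enumerate(weights)
              for j, wj in enumerate(weights))
    return 1.0 + h * h - wCw

def corrected_area(area, alpha):
    """Contract a contour's area by the estimated bias factor."""
    return area / alpha

n = 10
h = 0.5
weights = [1.0 / n] * n

# IID data: the correlation matrix is the identity, so w^T C w = 1/n.
C_iid = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
alpha_iid = bias_factor(h, weights, C_iid)

# Autocorrelated data: a hypothetical exponentially decaying correlation
# with sampling interval dt and correlation time-scale tau (both in days).
dt, tau = 0.125, 1.0
C_ou = [[math.exp(-abs(i - j) * dt / tau) for j in range(n)] for i in range(n)]
alpha_ou = bias_factor(h, weights, C_ou)

# An uncorrected contour enclosing 100 area units contracts to area / alpha.
area_iid = corrected_area(100.0, alpha_iid)
```

Each probability value keeps its contour shape and is simply reassigned to the contour whose area is smaller by the factor α, so only the area coordinate changes.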

Effective sample sizes

If the data are autocorrelated, effective sample sizes can inform us as to how much information the data really contain and assist us in estimating confidence intervals. We start by estimating an effective sample size for our optimal bandwidth. Specifically, we determine the number $n_H$ of locations that would have to be sampled from an IID process with the same covariance σ0 to obtain the same optimal bandwidth $h$ and same resolution of kernel density estimate. To solve for $n_H$, first we take the IID mean integrated square error (MISE) from Fleming et al. (2015) eqn 1, differentiate it to obtain the optimal bandwidth relation and then solve for n, which in this case is actually $n_H$. By straightforward calculus, we have in two dimensions (q = 2)
$$n_H = \frac{\dfrac{1}{h^4} - \dfrac{1}{(1+h^2)^2}}{\dfrac{1}{(1+h^2/2)^2} - \dfrac{1}{(1+h^2)^2}}$$ (eqn 12)
Next, we estimate the effective sample size of area estimates $n_A$ in the GRF approximation. In this case, we have the simple two‐dimensional chi‐squared relation
$$2\,n_A = \nu_A = \frac{2\,\hat{A}^2}{\mathrm{VAR}[\hat{A}]}$$ (eqn 13)

where $\hat{A} \propto \sqrt{\det\hat{\boldsymbol{\sigma}}_0}$ in two dimensions, and the variance is calculated from the curvature of the autocorrelated Gaussian likelihood function. For IID data, the effective sample size of the bandwidth $n_H$ is also the number of sampled locations n, as is the effective sample size of the area $n_A$, as we have defined nA with the prefactor of 2 to make these three numbers – n, nH, and nA – directly comparable. Anecdotally, we find that $n_A \le n_H \le n$ for autocorrelated data, which will be an important distinction when we place confidence intervals on the area estimates.
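The bandwidth effective sample size can be checked numerically. The sketch below assumes the standard closed‐form MISE for IID Gaussian data smoothed with a Gaussian kernel of covariance h²σ0 in two dimensions, written up to a constant prefactor; this is taken to correspond to the IID MISE cited above, but that correspondence is an assumption. Setting its derivative with respect to h² to zero and solving for n gives the expression coded in `effective_n_bandwidth`. All function names are hypothetical.

```python
import math

def mise_iid_2d(h, n):
    """IID Gaussian-reference MISE in two dimensions, up to a constant prefactor."""
    u = h * h
    return 1.0 / (n * u) + (1.0 - 1.0 / n) / (1.0 + u) - 2.0 / (1.0 + u / 2.0) + 1.0

def effective_n_bandwidth(h):
    """n_H: the IID sample size whose MISE-optimal bandwidth equals h (q = 2)."""
    u = h * h
    numerator = 1.0 / u ** 2 - 1.0 / (1.0 + u) ** 2
    denominator = 1.0 / (1.0 + u / 2.0) ** 2 - 1.0 / (1.0 + u) ** 2
    return numerator / denominator

def optimal_bandwidth(n, lo=0.01, hi=3.0, iterations=200):
    """Golden-section search for the h minimising the IID MISE.

    Assumes the MISE is unimodal in h on [lo, hi].
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if mise_iid_2d(c, n) < mise_iid_2d(d, n):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

# Round trip: optimise h for n IID locations, then recover n from h.
n_true = 100
h_opt = optimal_bandwidth(n_true)
n_recovered = effective_n_bandwidth(h_opt)
```

The round trip recovers the IID sample size from the bandwidth alone, which is the sense in which an autocorrelated data set with optimal bandwidth h carries the smoothing information of roughly n_H independent locations.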

Area confidence intervals

In KDE, it is customary to place confidence intervals on the density estimate itself, $\hat{p}_{\mathrm{low}}(\mathbf{r})$ and $\hat{p}_{\mathrm{high}}(\mathbf{r})$, as in Horowitz (2001). This practice is not particularly useful for placing confidence intervals on quantities derived from KDE, such as area, because $\hat{p}_{\mathrm{low}}(\mathbf{r})$ and $\hat{p}_{\mathrm{high}}(\mathbf{r})$ are not bona fide probability density functions that integrate to unity. Applying the nonparametric bootstrap to place confidence intervals on derived estimates like area or overlap also proves difficult because kernel density estimators themselves are both nonparametric and biased (Fieberg & Kochanny 2005), and so subtracting off the bootstrap bias from the density estimate also does not result in a bona fide density function. Instead, Fleming et al. (2015) propagated uncertainty in the autocorrelation estimate into uncertainty in the optimal bandwidth $\hat{h}$ of the kernel density estimate, as a means of obtaining confidence intervals on the area estimate from the chi‐squared distribution
$$\nu\,\frac{\hat{A}}{A} \sim \chi^2_{\nu}$$ (eqn 14)
with the effective number of degrees of freedom given by
$$\nu = \frac{2\,\hat{A}^2}{\mathrm{VAR}[\hat{A}]}$$ (eqn 15)
where the variance is the one propagated from the bandwidth uncertainty, and so the old CIs were obtained from the chi‐squared quantile function with point estimate $\hat{A}$ and degrees of freedom $\hat{\nu}$ satisfying the delta method. We have found this useful for small effective sample sizes where the autocorrelation uncertainties and optimal bandwidths are very large, such as in the Mongolian gazelle example in Fleming et al. (2015). However, in the limit of high‐quality data sets, where the effective sample size is large, these propagated uncertainties become severely underestimated. Here, we overcome previous limitations by deriving useful and well‐behaved confidence intervals for the area estimates in the GRF approximation.
First, we note that the GRF itself has the chi‐square area estimate distribution
$$2\,n_A\,\frac{\hat{A}_{\mathrm{GRF}}}{A} \sim \chi^2_{2 n_A}$$ (eqn 16)
which is exact for normally distributed data and where $\hat{n}_A$ can be readily estimated from the curvature of the autocorrelated Gaussian likelihood function, following relation eqn 13. Next for the kernel density estimate, while in the GRF approximation, if we had $n$ IID locations, then the two‐dimensional areas of the KDE would also limit to a chi‐squared distribution with $2n$ degrees of freedom, given that the KDE will approximately consist of a normal distribution of small kernels. The asymptotic behaviour encapsulated by this relation is important, because in the large $n$ limit our previously derived confidence intervals eqn 14 would collapse to values much smaller than the KDE's resolution. Sticking to the GRF approximation, the appropriate effective sample size here is $n_A$ and not $n_H$ – which we confirm by simulation – as $n_A$ provides the exact GRF confidence intervals for the areas. Therefore, our new AKDE CIs are approximated by the same distribution as the area estimates of the GRF
$$2\,n_A\,\frac{\hat{A}_{\mathrm{AKDE}}}{A} \sim \chi^2_{2 n_A}$$ (eqn 17)
with the same GRF estimate of $\hat{n}_A$, but different point estimates for A.
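To make the chi‐squared interval concrete, the following Python sketch inverts the assumed pivot 2nA Â/A ~ χ² with 2nA degrees of freedom to obtain an interval for the true area. This pivot inversion is a standard construction and an assumption here; it is not necessarily the exact procedure used in ctmm. To stay within the standard library, chi‐squared quantiles are computed with the Wilson–Hilferty approximation rather than an exact quantile function, and the function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, nu):
    """Wilson-Hilferty approximation to the chi-squared quantile function."""
    z = NormalDist().inv_cdf(p)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * math.sqrt(2.0 / (9.0 * nu))) ** 3

def area_confidence_interval(area_hat, n_A, level=0.95):
    """Interval for the true area A, assuming nu * area_hat / A ~ chi2(nu), nu = 2 n_A."""
    nu = 2.0 * n_A
    tail = (1.0 - level) / 2.0
    lower = area_hat * nu / chi2_quantile_wh(1.0 - tail, nu)  # large quantile -> lower bound
    upper = area_hat * nu / chi2_quantile_wh(tail, nu)        # small quantile -> upper bound
    return lower, upper

# Hypothetical example: a point estimate of 100 area units with n_A = 10.
low_small, high_small = area_confidence_interval(100.0, 10.0)
# The interval narrows as the effective sample size grows.
low_large, high_large = area_confidence_interval(100.0, 1000.0)
```

The interval is asymmetric about the point estimate at small nA and tightens only as nA increases, regardless of how many raw locations n were recorded.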

Study

Simulation

We performed numerical simulations to check our theoretical results. We compared bias in the area estimates of our new KDEC and AKDEC estimators with the biases of both the independent (KDE) and autocorrelated (AKDE) versions of the GRF method. In our simulation tests, we considered a range of Gaussian processes including an uncorrelated IID process representing survey data, an Ornstein–Uhlenbeck (OU) process (Dunn & Gipson 1977), representing VHF quality tracking data, and an Ornstein–Uhlenbeck–F (OUF) process (Fleming et al. 2014a) representing high‐quality GPS tracking data. For the OU and OUF models, we used a home‐range crossing time of 1 day, and the velocity autocorrelation time‐scale of the OUF model was also set to 1 day. In all simulations, observations were recorded once every three hours, resulting in eight observations per day. For each model, we simulated various periods of data (from 5 to 5120 days in a doubling series), with results averaged over 250 simulations per period. Using the same approach, we also checked the coverage properties of our new confidence intervals by recording the proportion of the 250 replicates for each model for each sampling period that contained the true value of the 95% area. We compared the coverage properties of our new confidence intervals on the kernel density areas, to those of the corresponding GRFs on the normal areas (Fleming et al. 2014b), which are asymptotically exact. In all interval coverage tests, we used a nominal coverage of 50% so that we could detect deviations from the nominal coverage in both directions (i.e. if the realized coverage was too high or too low). For normally distributed processes, coverage behaviour is uniform across confidence levels and the choice of 50% minimizes the necessary number of simulations. 
All simulations and kernel density estimation were performed with the continuous‐time movement modelling (ctmm) R package v0.3.2 (Fleming & Calabrese 2015; Calabrese, Fleming & Gurarie 2016) in R version 3.2.4 (R Core Team 2016).
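For readers who want to reproduce the flavour of the autocorrelated simulations outside ctmm, an OU track can be generated with its exact discrete‐time update. The Python sketch below is an independent illustration, not the ctmm simulator: it assumes an isotropic two‐dimensional OU process whose mean‐reversion time‐scale is identified with the home‐range crossing time (1 day), sampled every three hours, with unit stationary variance. The function name, seed and variance are hypothetical choices.

```python
import math
import random

def simulate_ou_track(days, tau=1.0, dt=3.0 / 24.0, sigma=1.0, seed=0):
    """Simulate an isotropic 2-D Ornstein-Uhlenbeck track sampled every dt days.

    Uses the exact update x[k+1] = rho * x[k] + noise, with rho = exp(-dt / tau)
    and noise variance sigma^2 * (1 - rho^2), so the stationary variance is sigma^2.
    """
    rng = random.Random(seed)
    rho = math.exp(-dt / tau)
    noise_sd = sigma * math.sqrt(1.0 - rho * rho)
    n = int(round(days / dt))
    x = rng.gauss(0.0, sigma)   # start in the stationary distribution
    y = rng.gauss(0.0, sigma)
    track = []
    for _ in range(n):
        track.append((x, y))
        x = rho * x + rng.gauss(0.0, noise_sd)
        y = rho * y + rng.gauss(0.0, noise_sd)
    return track

# Shortest and longest sampling periods of the doubling series (5 to 5120 days).
short_track = simulate_ou_track(days=5)
long_track = simulate_ou_track(days=5120)
```

With eight fixes per day and a 1 day time‐scale, successive locations are strongly correlated, so the number of recorded locations greatly exceeds the number of effectively independent home‐range crossings.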

Application

In our empirical demonstration, we performed home‐range estimation on tracking data from an African buffalo (Syncerus caffer, Cross et al. 2009, 2016; Getz et al. 2007) that was sampled every 2 h for 8·5 months. Both the original AKDE of Fleming et al. (2015) and the area‐corrected AKDEC variant introduced here require first selecting an autocorrelated movement model prior to bandwidth optimization. As candidate models, we considered the IID, OU and OUF movement models, with either circular or elliptical covariances. Because this buffalo's distribution was elongated in one direction, we considered a non‐stationary drift term to determine whether home‐range estimation was meaningful for the full period of data or whether the buffalo's range shifted over time. Both variogram analysis (Fleming et al. 2014a) and model selection were used to confirm that there was indeed evidence of range residency in the data and that it was therefore reasonable to perform home‐range estimation (results not shown, but see Calabrese, Fleming & Gurarie 2016).

Results

In Fig. 3, we show the results of our simulation exercise comparing the area estimates of the GRF (KDE and AKDE) with those of the area‐corrected estimator derived here (KDEC and AKDEC). In all of the autocorrelated simulations, the range crossing time was set to 1 day and so N days of data yields an effective sample size of $n_A \approx N$. These simulations confirm our theoretical results – that our method removes the leading order of bias, but not all of the bias in area estimation. Moreover, although the GRF approximation we relied on is an asymptotic approximation, our area‐corrected estimator also yields considerable improvements with small sample sizes.

Fig. 3. Simulations demonstrating the relation between accuracy in area estimation and sampling duration, with (A)KDE in blue and area‐corrected (A)KDEC in red, for (a) an IID process, (b) an OU process with a 1 day home‐range crossing time‐scale and (c) an OUF process with 1 day home‐range crossing and persistence‐of‐motion time‐scales. Bias in the area estimates is substantially mitigated by our correction in all regimes.

We summarize the results of our simulation exercise comparing the new kernel density confidence intervals with the intervals of the GRF in Fig. 4. The performance of the new intervals is reasonable, although not as efficient as the GRF intervals. Both intervals have bias stemming from the underlying MLEs of the autocorrelation model parameters on which they are based, and it was not possible to tease the two biases apart. In the absence of an analytic formula for nA, the new AKDE CIs rely on the normal MLE of $n_A$. Therefore, we could only judge what degradation there was in the new AKDE CIs relative to the GRF CIs.

Fig. 4. Simulations demonstrating confidence interval coverage for our new kernel density CIs (a) and the corresponding Gaussian reference function (GRF) via normal MLE (b). The simulated processes are the same as in Fig. 3, including IID (blue), OU (red) and OUF (green). GRF CI performance is poor until the effective sample size $\hat{n}_A$ is sufficiently large, while the corresponding new AKDE CI performance is slightly degraded even beyond this point, although asymptotically correct for these processes.

The best‐fit OUF movement model in our buffalo example featured an elliptical covariance, a range crossing time‐scale of 12 (95% CI: 7–22) days and a persistence‐of‐motion time‐scale of 36 (95% CI: 32–40) minutes. The estimated drift term was along the longer axis of the buffalo's location distribution, but its presence increased the AIC of the model and so it was discarded. The empirical variogram exhibited a clear asymptote between time‐lags of 1 and 7 months, which is consistent with range resident behaviour. A comparison of the autocorrelated GRF AKDE estimate vs. our new area‐corrected AKDEC estimate suggested that the former overestimated the buffalo's area by more than 30% (Fig. 5).

Fig. 5. Autocorrelated kernel density estimates for an African buffalo, with and without bias correction. The thick black contour represents the 95% home‐range area point estimate, while the lighter contours represent its confidence intervals, and the grid lines correspond to the resolution of the density estimate. Although this data set consists of n = 1725 autocorrelated locations, the effective sample sizes for home‐range estimation, $\hat{n}_H$ and $\hat{n}_A$, were much smaller. The bias correction pulls the 95% home‐range area contours in towards the bulk of the data, counteracting the positive bias of the Gaussian reference function estimator, which is here expected to overestimate areas by a third (the estimated bias factor $\hat{\alpha}$ of eqn 11).

Discussion

We have derived an improved kernel density estimator with reduced bias in its area estimates, applicable to both IID and autocorrelated data. Our estimator is uniquely tailored to the specific interests of movement ecology and biogeography, where area estimation is a key priority. Essentially, the AKDE method of Fleming et al. (2015) has served as our ‘pilot’ estimator, in much the same way that a fixed‐bandwidth kernel density estimate can serve as the pilot estimate for variable bandwidth KDE. Fleming et al. (2015) generalized the GRF bandwidth optimizer to the case of autocorrelated and possibly non‐stationary data. Fleming et al. (2015) chose to base their generalization of KDE to autocorrelated data on the GRF approximation, not because it is the most accurate, but because its generalization was mathematically tractable. While it has an asymptotically optimal order of error and is asymptotically consistent, the GRF bandwidth also has significant positive area estimation bias from oversmoothing, which the area correction introduced here mitigates. In short, Fleming et al. (2015) removed bias stemming from autocorrelation, which is strictly negative and increases with the sampling rate, while here we have reduced bias stemming from small effective sample sizes, which depends on the bandwidth optimization method and increases with decreasing sampling period. Otherwise, even for IID data, KDE‐derived area estimates exhibit significant bias with small sample sizes.

The substantial sampling‐dependent biases of conventional KDE and minimal convex polygon home‐range estimators have encouraged the promotion of standardized sampling schedules across individuals for making comparisons (Börger et al. 2006; Nilsen, Pedersen & Linnell 2008; Signer et al. 2015). When data are autocorrelated, this problem is actually worse in conventional estimators than has been realized, and such standardizations will often not work. For individuals with similar movement characteristics, standardization of the sampling schedule will produce approximately standardized output bias. However, when individuals exhibit very different movement behaviours (i.e. cross‐species comparisons), there is no guarantee that standardization will be valid. With conventional estimators, individuals exhibiting different movement characteristics will produce different amounts of bias from the same sampling schedule. To see this, consider that different movement processes typically have different characteristic time‐scales, such as the home‐range crossing time‐scale τH. If the sampling interval is substantially longer than τH, then the resulting data will be IID. Otherwise, the data will be autocorrelated and the strength of the autocorrelation will increase as the sampling interval decreases. Imposing the same sampling interval on individuals featuring very different home‐range crossing time‐scales (as will often be the case in cross‐species comparisons) will thus result in data sets that differ substantially in the strength of their autocorrelations. Estimating home ranges from such data with conventional methods that do not account for autocorrelation will then result in estimates that are differentially biased even though they were collected with a standardized sampling interval. The effect that the sampling period (or duration) has on bias in conventional home‐range estimators also operates relative to the home‐range crossing time‐scale. 
The effective sample size for area estimation, nA, is roughly the number of times the individual crosses the linear extent of its home range during the observation period T, or nA ~ T/τH. So again, two individuals differing in home‐range crossing time‐scales that are sampled for the same duration will have different effective sample sizes and thus be subject to different biases in existing home‐range analyses. By avoiding and correcting for these sampling‐dependent biases, the area‐corrected AKDEC allows researchers to make statistically valid comparisons across different sampling schedules and different species with improved statistical accuracy and without having to discard data.
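As a rough numerical illustration of this point, the sketch below reads the effective sample size as the observation period divided by the home-range crossing time-scale. The function name and the two individuals are hypothetical, and the approximation is only meant for tracks sampled finely relative to the crossing time-scale.

```python
def effective_sample_size(duration: float, tau_h: float) -> float:
    """Rough effective sample size: observed home-range crossings, T / tau_H.

    Both arguments must be in the same time units (days here).
    """
    if duration < 0 or tau_h <= 0:
        raise ValueError("duration must be >= 0 and tau_h must be > 0")
    return duration / tau_h


if __name__ == "__main__":
    duration_days = 90.0  # identical tracking duration for both individuals
    # Hypothetical crossing time-scales, in days, for two contrasting species.
    for label, tau_h in (("fast crosser", 1.0), ("slow crosser", 15.0)):
        print(label, effective_sample_size(duration_days, tau_h))
```

With an identical 90-day schedule, the first hypothetical individual yields about 90 effective observations and the second only about 6, so a standardized schedule does not standardize the information content of the data.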

Researchers familiar with KDE might question why we did not consider ‘variable’ or ‘adaptive’ bandwidth estimators, which are also known to reduce bias (Silverman 1986). Unfortunately, variable bandwidth KDEs only remove the leading‐order bias from heavy‐tailed distributions (Terrell & Scott 1992). Heavy tails are not a general concern for home‐range and species‐range distributions, which more frequently exhibit the opposite property – abrupt decay in density, due to topographical features, territoriality, spatially constrained niche preferences and other factors. Furthermore, variable bandwidths remove the leading‐order bias in the density estimate, which does not translate into removing the leading‐order bias in the area estimates, as the relationship between the two is highly nonlinear (eqn 1). More relevant would be to consider methods that reduce error in the distribution estimate (Liu & Yang 2008), which is at least one step closer to the area estimate.

It is important to highlight the workflow we applied in our empirical demonstration, which featured multiple tests for the validity of home‐range estimation (Calabrese, Fleming & Gurarie 2016). Due to migration, dispersal and other range‐shifting behaviours, not all animal tracking data sets will exhibit range residence over long sampling periods, while, in addition to the aforementioned behaviours, shorter sampling periods may not provide enough observed home‐range crossings to make range residency detectable. First, we examined the xy scatter plot of the data to rule out large migration events. Secondly, we examined the empirical variogram (Fleming et al. 2014a) of the data to check for an asymptote (Calabrese, Fleming & Gurarie 2016). Thirdly, we included in the set of candidate models reasonable non‐stationary behaviours (a drift in the mean location) that would violate the assumption of range residence. Fourthly, and finally, we reviewed the effective sample sizes of our best‐fit model, to make certain that multiple range crossings were observed in the data (i.e. an estimated nA well above one).
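The variogram check in the second step can be sketched numerically. The example below is a deliberately simplified stand-in for visual inspection of an empirical variogram: it simulates one range-resident track (an Ornstein-Uhlenbeck process) and one non-resident track (Brownian motion) at unit time steps, then compares the semi-variance at a lag with the semi-variance at twice that lag. All function names, the simulated processes and the lag choice are assumptions made for illustration, and the semi-variance estimator is a basic one for evenly sampled data rather than that of Fleming et al. (2014a).

```python
import math
import random


def simulate_track(n, tau=None, seed=0):
    """Simulate a 2D track at unit time steps.

    tau=None gives Brownian motion (no home range); tau > 0 gives an
    Ornstein-Uhlenbeck process with unit stationary variance per axis.
    """
    rng = random.Random(seed)
    if tau is None:
        a, s = 1.0, 1.0
    else:
        a = math.exp(-1.0 / tau)      # autocorrelation over one time step
        s = math.sqrt(1.0 - a * a)    # keeps the stationary variance at 1
    x = y = 0.0
    track = []
    for _ in range(n):
        x = a * x + s * rng.gauss(0.0, 1.0)
        y = a * y + s * rng.gauss(0.0, 1.0)
        track.append((x, y))
    return track


def semivariance(track, lag):
    """Empirical semi-variance at an integer lag, averaged over both axes."""
    n = len(track) - lag
    if lag < 1 or n <= 0:
        raise ValueError("lag must be >= 1 and shorter than the track")
    total = 0.0
    for i in range(n):
        dx = track[i + lag][0] - track[i][0]
        dy = track[i + lag][1] - track[i][1]
        total += dx * dx + dy * dy
    # Factor 1/2 for the semi-variance and 1/2 for the average over two axes.
    return total / (4.0 * n)


def plateau_ratio(track, lag):
    """Semi-variance at 2*lag divided by semi-variance at lag."""
    return semivariance(track, 2 * lag) / semivariance(track, lag)


if __name__ == "__main__":
    resident = simulate_track(50000, tau=10.0, seed=1)
    wanderer = simulate_track(50000, tau=None, seed=1)
    print("range resident:", plateau_ratio(resident, 100))
    print("non-resident:  ", plateau_ratio(wanderer, 100))
```

A ratio near one indicates that the variogram has levelled off at the chosen lag, as expected under range residence, whereas a ratio near two indicates diffusive growth with no asymptote.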

From the perspective of area as a random variable, we may consider different measures of central tendency or average range area. The 50% ‘core’ home‐range area is also the median area, which gives some justification to focusing on the core home range over other percentiles, such as the frequently considered 95% area. Because the area‐based coordinate system expands outwards from the highest probability locations (initially at smaller values of A), we expect realistic area distributions to be positively skewed. For such distributions, there is a very strong tendency to have the ordering of mode(A) ≤ median(A) ≤ mean(A) (von Hippel 2005). Furthermore, we note that in the case of many unimodal location distributions, such as the Gaussian, the mode area is zero, which limits its utility. As the mean area tends to be larger than the median (50% core) area, this might also be a useful, statistically motivated measure of central tendency alternative to the core home range.
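The Gaussian case can be worked through explicitly. The sketch below assumes an isotropic bivariate Gaussian location distribution with per-axis standard deviation sigma, for which the highest-density region containing probability p is a disc of area A(p) = -2π σ² ln(1 - p). Equivalently, area is exponentially distributed with mean 2π σ², so its mode is zero. The function names are hypothetical.

```python
import math


def coverage_area(p: float, sigma: float = 1.0) -> float:
    """Area of the highest-density region holding probability p.

    Assumes an isotropic bivariate Gaussian with per-axis SD sigma.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -2.0 * math.pi * sigma ** 2 * math.log(1.0 - p)


def mean_area(sigma: float = 1.0) -> float:
    """Mean of the (exponentially distributed) area under the same assumption."""
    return 2.0 * math.pi * sigma ** 2


if __name__ == "__main__":
    print("median (50% core) area:", coverage_area(0.5))
    print("mean area:             ", mean_area())
    print("95% area:              ", coverage_area(0.95))
```

For sigma = 1 the median area is about 4.36, the mean area about 6.28 and the 95% area about 18.82, reproducing the ordering mode ≤ median ≤ mean. In this Gaussian case only, the mean area coincides with the contour enclosing 1 - 1/e, or about 63%, of the probability.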

To identify the sampling properties of our density estimator specifically, we assumed that the true autocorrelation model was known in our simulations, while urn:x-wiley:2041210X:media:mee312673:mee312673-math-0067 was estimated. This is not a serious issue for IID data, but for autocorrelated data, the standard maximum likelihood approach to parameter estimation (Fleming et al. 2014b) can be significantly biased when the effective sample size is small. We do not consider this issue here, even though it affects our confidence intervals for small nA, but note that this is a reasonably well‐studied problem with possible solutions valid for autocorrelated data, such as residual maximum likelihood (REML, Harville 1977) and the parametric bootstrap (Efron & Efron 1982). This separate issue will, however, likely require substantial development and testing, and is thus beyond the scope of this manuscript.

Based on these theoretical, simulated and empirical results, the area‐corrected AKDEC we introduce here is now the default range distribution estimator in the ctmm R package (Fleming & Calabrese 2015; Calabrese, Fleming & Gurarie 2016), which is available on CRAN. The estimated effective sample sizes and the estimated bias factor are stored in the slots DOF.H, DOF.A and bias of ‘utilization distribution’ UD objects. The methods introduced here continue to expand and refine the library of statistical estimators that are derived from first principles with ecological applications in mind. Future methods development for range estimation should include improved autocorrelation model fitting, pooled density estimates and covariate dependence (as in Moorcroft, Lewis & Crabtree 2006; Moorcroft & Lewis 2006).

Acknowledgements

The project was funded by US National Science Foundation grant ABI 1458748 to J.M.C. We thank Paul Cross for providing the African buffalo data. The buffalo data collection was supported by the US NSF and NIH Ecology of Infectious Disease Program (DEB‐0090323 to W.M. Getz).

    Data accessibility

    The African buffalo data used in this manuscript are available at Movebank (Cross et al. 2016) and are included in the ctmm R package (Calabrese, Fleming & Gurarie 2016). The source code for the ctmm package is available on CRAN (Fleming & Calabrese 2015).
