Population-level inference for home-range areas
Handling Editor: Marie Auger-Méthé
Abstract
- Home-range estimates are a common product of animal tracking data, as each range represents the area needed by a given individual. Population-level inference of home-range areas, where multiple individual home ranges are considered to be sampled from a population, is also important to evaluate changes over time, space or covariates such as habitat quality or fragmentation, and for comparative analyses of species averages. Population-level home-range parameters have traditionally been estimated by first assuming that the input tracking data were sampled independently when calculating home ranges via conventional kernel density estimation (KDE) or minimum convex polygon (MCP) methods, and then assuming that those individual home ranges were measured exactly when calculating the population-level estimates. This conventional approach does not account for the temporal autocorrelation that is inherent in modern tracking data, nor for the uncertainties of each individual home-range estimate, which are often large and heterogeneous.
- Here, we introduce a statistically and computationally efficient framework for the population-level analysis of home-range areas, based on autocorrelated kernel density estimation (AKDE), that can account for variable temporal autocorrelation and estimation uncertainty.
- We apply our method to empirical examples on lowland tapir Tapirus terrestris, kinkajou Potos flavus, white-nosed coati Nasua narica, white-faced capuchin monkey Cebus capucinus and spider monkey Ateles geoffroyi, and quantify differences between species, environments and sexes.
- Our approach allows researchers to more accurately compare different populations with different movement behaviours or sampling schedules while retaining statistical precision and power when individual home-range uncertainties vary. Finally, we emphasize the estimation of effect sizes when comparing populations, rather than mere significance tests.
1 INTRODUCTION
Accurately estimating species area requirements is of utmost importance for conservation (Pe'er et al., 2014; Shaffer, 1981) from the individual to the population level, especially in light of the increasing human impact on landscapes (Brashares et al., 2001; Dardanelli et al., 2006; Larsen et al., 2008; Nagy-Reis et al., 2021). At the individual level, space-use requirements are typically described by an animal’s home range (Burt, 1943), which is formalized by the probability distribution of the animal’s locations (Worton, 1995). Population-level inference on space-use parameters is also important—both for quantifying the area requirements of a typical organism and for quantifying the effect of covariates, such as species or taxa (Habel et al., 2019; Matley et al., 2019; Poessel et al., 2020; Rehm et al., 2018), sex (D’haen et al., 2019; Desbiez et al., 2019; Morato et al., 2016; Naveda-Rodríguez et al., 2018), body size (Bašić et al., 2019; Desbiez et al., 2019; Naveda-Rodríguez et al., 2018), age (Averill-Murray et al., 2020; Goldenberg et al., 2018; Kays et al., 2020; Mirski et al., 2020), movement characteristics (Bowman et al., 2002; Desbiez et al., 2019; Swihart et al., 1988), conspecific density (Erlinge et al., 1990; Massei et al., 1997; Trewhella et al., 1988), resource density (Herfindal et al., 2005; Loveridge et al., 2009; Massei et al., 1997), habitat or biome (McBride Jr & Thompson, 2019; Morato et al., 2016; Paolini et al., 2019; Tonra et al., 2019), human influences (Hansen et al., 2020; McBride Jr & Thompson, 2019; Rutt et al., 2020; Ullmann et al., 2020), weather (Kay et al., 2017; Matley et al., 2019; Mirski et al., 2020) and season or time (Bašić et al., 2019; Goldenberg et al., 2018; Matley et al., 2019; Roffler & Gregovich, 2018). Both the mean response and population variation have been studied as important regressors for biological inference (Seigle-Ferrand et al., 2021). 
In any case, individual home-range area point estimates have traditionally been input into conventional analyses (e.g. the sample mean, t-test, generalized linear models, etc.) without accounting for their associated uncertainties, likely due to the lack of a suitable alternative (though see Averill-Murray et al., 2020).
Home-range estimation is subject to a number of potential, differential biases that can challenge comparisons across species, behaviours, sampling schedules, tracking devices or habitats; and only recently have methods been developed to address these issues (Fleming et al., 2015, 2018, 2019, 2020; Fleming & Calabrese, 2017; Noonan et al., 2020). Negative biases in home-range estimation can result from less tortuous and less spatially constrained movement behaviours (Fleming et al., 2015; Swihart et al., 1988), finer sampling rates (Fleming et al., 2015; Noonan, Tucker, et al., 2019; Swihart & Slade, 1985), shorter sampling periods (Fleming et al., 2019; Noonan, Tucker, et al., 2019), larger body size (Noonan et al., 2020) and estimating an inappropriate target distribution (Fleming et al., 2015, 2016; Horne et al., 2020). On the other hand, positive biases in home-range estimation can result from over-smoothing of the density function (Fleming & Calabrese, 2017; Worton, 1995) and location error (Fleming et al., 2020; Moser & Garton, 2007; Thomson et al., 2017). These individual-level biases can differ between groups being compared and are expected to propagate into population-level biases if not dealt with. Furthermore, pragmatic adjustments, such as standardizing the individual sampling schedules (Börger et al., 2006) and thresholding the ‘dilution of precision’ (DOP) values or location classes (Bjørneraas et al., 2010), will not necessarily avoid differential biases, because home-range estimation biases are a product of both the sampling schedule (or location error) and the movement process. Standardization strategies, therefore, rely on the implicit assumption of the sampled individuals and their tracking devices behaving similarly enough that their biases can be matched by discarding potentially informative data (Winner et al., 2018). This can be acceptable in some cases, but cannot be relied on as a general solution. 
Instead, statistically efficient estimators that can handle these factors and make the best use of all of the data are necessary to ensure reliable comparisons (Fleming et al., 2020; Fleming & Calabrese, 2017).
To make population-level inferences, researchers and managers traditionally take differentially biased KDE and MCP home-range estimates and feed them into general-purpose statistical analyses, which assume that the individual home-range areas are measured exactly (see Signer & Fieberg, 2021, for a thorough description). For instance, a single population would be described by its sample mean, and two populations would be compared with a t-test (e.g. Kays & Gittleman, 2001). While the biases of conventional home-range estimators have been studied and more statistically efficient estimators are now available (Fleming et al., 2020; Noonan, Tucker, et al., 2019), and while variation in individuals and their behaviours has also been considered (e.g. Gutowsky et al., 2015), the impact of ignoring home-range estimation uncertainty on population-level estimates and inferences has not been investigated. More simply, the sample mean of unbiased individual estimates produces an unbiased population-mean estimate, but the sample mean will not achieve minimal variance among all possible population-mean estimators. A more statistically efficient population-mean estimator will down-weight uncertain estimates relative to more certain estimates in such a way that the estimated mean has a smaller variance. Researchers are thus not faced with the dilemma of deciding whether or not to include less precisely sampled individuals when calculating a population average. Furthermore, ignoring the individual sampling variances will necessarily lead to overestimated population variances, by an amount comparable to the neglected uncertainties. Indeed, it has historically been the case that home-range uncertainties have not been quantified at all, let alone leveraged in downstream analyses, whereas here we rely on the uncertainty estimates of Fleming and Calabrese (2017).
- We demonstrate with empirical data that differentially biased individual home-range estimates propagate into differentially biased population-mean home-range estimates, which is the case for conventional methods that do not account for autocorrelation. In most situations, this bias is negative and sampling dependent, meaning that conventional methods tend to underestimate mean home-range areas (Noonan, Tucker, et al., 2019).
- We introduce a novel hierarchical modelling framework for population-level home-range estimation and show that it outperforms existing methods, even in best-case scenarios for the application of conventional methods.
- We show the problems associated with traditional significance testing on population differences and argue for the estimation of meaningful effect sizes (Sullivan & Feinn, 2012), which we facilitate with a statistically efficient estimator for comparing population home-range areas. Effect sizes provide more information than significance tests and are important for reproducibility (Halsey et al., 2015).
2 THEORY AND METHODS
2.1 Effective sample sizes
The effective sample size, N, of an individual home-range estimate is the equivalent number of independent and identically distributed (IID) sampled locations required to produce the same quality estimate (Fleming et al., 2019). Biologically speaking, N can also be thought of as the number of observed home-range crossings, as $N=\mathcal{O}\left(T/\tau \right)$, which means that N is ‘on the order of’ $T/\tau $, where T is the total sampling period and τ is the mean-reversion or home-range crossing time-scale. When τ is large relative to the sampling interval and tracking data are strongly autocorrelated, the effective sample size of a home-range estimate, N, can be much smaller than the nominal sample size, n, which is the raw number of locations sampled.
As an example, if a tapir crosses its home range twice per day, for the purposes of home-range estimation, 12 days of tracking data will be approximately worth as much as 24 independently sampled locations, even if fixes were obtained every second, and over a million locations were recorded. The effective sample sizes of individual home-range estimates can often be small. Even with modern tracking data, Noonan, Tucker, et al. (2019), in a study of 369 individuals from 27 species, noted that 30% of animal tracking datasets had an effective sample size of <30—meaning that many large GPS location datasets were worth <30 independently sampled datapoints for the purpose of home-range estimation. Conventional home-range estimators that assume independently sampled data require hundreds to thousands of observed home-range crossings to produce accurate home-range estimates (Noonan, Tucker, et al., 2019). Moreover, when effective sample sizes are small, home-range estimate uncertainties are large, which are also not accounted for in conventional population-level analyses.
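The arithmetic behind the tapir example can be sketched in a few lines. This is only an order-of-magnitude illustration of $N\approx T/\tau $, not the estimator used for the reported effective sample sizes, and the function names are hypothetical.

```python
# Order-of-magnitude effective sample size for home-range estimation, N ~ T / tau.
# Function names are illustrative and do not belong to any package.

def effective_sample_size(sampling_period_days: float, crossing_time_days: float) -> float:
    """Approximate number of observed home-range crossings, N = T / tau."""
    return sampling_period_days / crossing_time_days


def nominal_sample_size(sampling_period_days: float, fix_interval_seconds: float) -> int:
    """Raw number of recorded locations, n, under a regular sampling schedule."""
    return int(sampling_period_days * 86_400 / fix_interval_seconds)


tau_days = 0.5      # two range crossings per day
period_days = 12.0  # total tracking period, T

N = effective_sample_size(period_days, tau_days)  # 24.0 crossings
n = nominal_sample_size(period_days, 1.0)         # 1_036_800 fixes at one per second
```

The nominal sample size exceeds one million, yet the information available for home-range estimation corresponds to roughly 24 independent locations.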
The bias and variance of a home-range estimator are largely a function of the effective sample size, N (Noonan, Tucker, et al., 2019). At small-to-moderate effective sample sizes, the most accurate home-range estimators, at present, are based on autocorrelated kernel density estimation, which conditions bandwidth optimization on a fitted autocorrelation model (Fleming et al., 2015). In terms of the autocorrelation estimates, which are the dominant source of bias at small N, conventional maximum likelihood and conventional Bayesian methods produce a downward $\mathcal{O}\left(1/N\right)$ bias, while residual maximum likelihood (REML)-based estimators and the parametric bootstrap can reduce the order of bias to $\mathcal{O}\left(1/{N}^{2}\right)$ and $\mathcal{O}\left(1/{N}^{3}\right)$, respectively (Fleming et al., 2019). Therefore, to obtain a bias as small as 5%, maximum likelihood and Bayesian methods require on the order of 20 observed range crossings, REML-based methods require on the order of 4–5 observed range crossings, and bootstrapped REML-based methods require on the order of 2–3 observed range crossings. The other important sample size for population estimates is the number of tracked individuals, m, and we are also interested in the impact of small m.
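The quoted numbers of range crossings follow from solving $c/{N}^{k}=0.05$ for N. The sketch below assumes a leading constant of $c=1$ for every estimator, which the order notation does not guarantee, so the outputs should be read as orders of magnitude only. The function name is hypothetical.

```python
# Range crossings needed for a target relative bias when bias ~ c / N**k.
# ASSUMPTION: leading constant c = 1 for all estimators (big-O notation does not fix it).

def crossings_needed(target_bias: float, order: int, constant: float = 1.0) -> float:
    """Solve constant / N**order = target_bias for N."""
    return (constant / target_bias) ** (1.0 / order)


required = {
    "ML or Bayesian, O(1/N)": crossings_needed(0.05, 1),       # about 20
    "REML, O(1/N^2)": crossings_needed(0.05, 2),               # between 4 and 5
    "bootstrapped REML, O(1/N^3)": crossings_needed(0.05, 3),  # between 2 and 3
}

for method, crossings in required.items():
    print(f"{method}: N ~ {crossings:.1f}")
```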
2.2 Hierarchical models
Hierarchical models have long been recognized as providing a natural framework for population-level inference on animal tracking data (Hooten et al., 2016; Jonsen et al., 2003), and here we use them to appropriately weight individual home-range estimates, according to their associated uncertainties, in estimating population-level parameters. Most simply, when modelling the average home-range area of a certain population, it is natural to both consider the individual animal locations to be distributed according to the animal’s home range and to further consider the animals’ home ranges to be randomly distributed according to their population (Figure 1). Hierarchical model estimation largely falls into two categories—either fitting the population model to the entire dataset or first calculating the individual statistics, $\widehat{\mathit{\theta}}$, and then fitting the population model to the set of individual statistics. The former is common to Bayesian analyses, while the latter is common to conventional analyses and meta-analyses (Viechtbauer, 2009). Importantly, if $\widehat{\mathit{\theta}}$ is a sufficient statistic—meaning that there is no additional information in the data beyond $\widehat{\mathit{\theta}}$ regarding $\mathit{\theta}$—and the exact sampling distribution of $\widehat{\mathit{\theta}}$ is leveraged, then there is no approximation invoked by the meta-analysis of said statistics (Fisher, 1922). Such is the case with IID Gaussian area estimates when leveraging their χ^{2} sampling distribution. Even when $\widehat{\mathit{\theta}}$ is not a sufficient statistic, maximum-likelihood and Bayesian analyses that fit the population model to the entire collection of tracking data are not exact per se, and will generally exhibit both $\mathcal{O}\left(1/N\right)$ and $\mathcal{O}\left(1/m\right)$ biases.
2.2.1 Hierarchical model estimators
In this work, we examine the performance of four methods for estimating population-level parameters on animal home ranges—a conventional sample-mean analysis and three proposed alternatives that account for individual uncertainties, including a conventional (normal) meta-analysis, a conventional Bayesian analysis and a novel χ^{2} inverse-Gaussian (χ^{2}-IG) meta-analysis. Importantly, the latter three methods all account for the uncertainties in individual home-range estimates by treating the home-range areas as unknown latent variables within a hierarchical model (Figure 1). All published analyses that we are aware of either neglect home-range uncertainties and reduce to the sample mean in the absence of covariates, or do not estimate a mean home-range area. So, while we refer to the normal meta-analysis and Bayesian analysis as being ‘conventional’, we are not aware of any pre-existing application or examination of these methods for the task of mean home-range area estimation. Finally, in all four cases, we model the population distribution as either Gaussian or inverse-Gaussian (IG), as these distributions can produce unbiased and asymptotically consistent population-mean estimates even when the population distribution is mis-specified.
Sample-mean analysis
In the conventional sample-mean analysis, we summarize populations by the sample mean of home-range area point estimates, which assumes both large N and large m. As the sample mean is unbiased, unbiased individual home-range area estimates will be propagated into unbiased population-mean estimates, and vice versa. On the other hand, because the sample mean is unweighted, home-range estimates with higher uncertainty are not down-weighted relative to those with lower uncertainty. As a result, the variance of the population estimate is not minimized when individual uncertainties are heterogeneous. This leaves researchers with the potential dilemma of whether imperfectly tracked individuals should be omitted from population-level analyses, without clear guidelines on what the threshold of omission should be. In contrast, an optimally weighted mean will produce lower variance population parameter estimates without guesswork, by down-weighting the (small-N) uncertain individual estimates. Moreover, because individual home-range area uncertainties are ignored in the sample mean, estimates of population variation will be substantially biased when the number of observed home-range crossings (N) is small.
Normal–normal meta-analysis
In the conventional meta-analysis, the individual home-range area estimates are modelled as having a normal sampling distribution and the population of home-range areas is also modelled as having a normal distribution. We consider a conventional meta-analysis, because it proposes an easy solution to the challenge of incorporating uncertainties, and is as simple as passing the home-range area point estimates and sampling variance estimates to a single function in R (e.g. metafor, Viechtbauer, 2009). The normal–normal meta-analytic model is at least somewhat problematic here, as both individual home-range areas and mean home-range areas are positive quantities, which the normal distribution does not respect. A link function could be employed to fix the lower bound, but that approach has two key disadvantages here. First, a link function would not directly produce a mean-area estimate, which is our main focus in this work. Second, a link function would give up the unbiased and ‘best linear unbiased estimator’ (BLUE) properties of the normal–normal meta-analysis, whereby unbiased input home-range area estimates will yield unbiased output mean-area estimates, if the input uncertainties are correctly specified.
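To make the weighting concrete, the following is a minimal sketch of a normal–normal random-effects mean. It estimates the between-individual variance with the DerSimonian-Laird method of moments, chosen here only because it has a closed form; it is not necessarily the estimator that metafor or the analyses in this work would use. The function name and inputs are hypothetical.

```python
# Minimal normal-normal (random-effects) meta-analysis of area estimates.
# ASSUMPTION: between-individual variance from the DerSimonian-Laird moment
# estimator, used for simplicity; other estimators (e.g. REML) are common.

def normal_normal_meta(estimates, variances):
    """Return (population mean, variance of that mean, between-individual variance)."""
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two individual estimates are required")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # population variance, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    mean = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    return mean, 1.0 / sum(w_star), tau2


# Equal uncertainties: the weighted mean reduces to the sample mean.
mean_eq, _, tau2_eq = normal_normal_meta([10.0, 20.0, 30.0], [4.0, 4.0, 4.0])

# Heterogeneous uncertainties: the imprecise estimate (30) is down-weighted.
mean_het, _, _ = normal_normal_meta([10.0, 30.0], [1.0, 100.0])
```

Nothing in this model prevents a negative lower confidence limit on the mean area, which is the positivity problem noted above.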
Bayesian analysis
In the traditional Bayesian analysis, we consider the marginal likelihood of the entire dataset, given the population model parameters (Figure 1), with a very weakly informative prior. The conventional Bayesian modelling framework requires us to specify a generative model, but it does not require us to explicitly solve said model, in terms of optimization, integration or density function normalization. Because our likelihood function includes the same conditional density used in the biased maximum likelihood estimation of individuals, we know that these methods will produce large $\mathcal{O}\left(1/N\right)$ biases, at small N, that we want to avoid, for both their posterior predictions of individual $\mathit{\theta}$ and their population-level parameters, $\mathbf{\Theta}$. To see this, consider the limiting case of a flat (non-informative) population distribution, $p\left(\mathit{\theta}|\mathbf{\Theta}\right)$, and the opposing case of a singular population distribution. It is then straightforward to show that both cases produce biased predictions and estimates when employing maximum a posteriori (MAP) estimation, which is the statistically efficient Bayesian analog to maximum likelihood estimation. Moreover, unless all individuals share the same mean location, increasing the number of individuals, m, does not mitigate the small-N bias, because increasing m only pools a larger number of biased likelihoods.
χ^{2}-IG meta-analysis
Finally, we consider a novel meta-analysis framework whereby the individual home-range area estimates are modelled as having a χ^{2} sampling distribution, and the population of home-range areas is modelled as having an inverse-Gaussian distribution (Figure 2). Given the derivations in Appendix A and the included software implementation, this analysis is as simple as feeding our home-range estimates into a single R function. However, the distributional assumptions here are far more reasonable than those of the conventional meta-analysis. In the case of an IID isotropic Gaussian stochastic process, home-range area estimates are sufficient statistics with a χ^{2} sampling distribution, and there is no approximation in performing any χ^{2}-based meta-analysis (versus fitting the population model to the entire dataset). We also find the χ^{2} distribution to be a good approximation more generally, for autocorrelated data, which gives the χ^{2}-based meta-analysis good statistical efficiency when the number of observed home-range crossings (N) is small. The choice of an inverse-Gaussian (IG) population distribution will be shown to facilitate good statistical efficiency when the number of sampled individuals (m) is small.
2.2.2 Statistical inefficiency of the sample mean
For example, if the individual estimates have χ^{2} degrees of freedom $\nu =\left\{1,2,3,\dots ,10\right\}$, then the χ^{2}-based $\mathrm{VAR}\left[{\widehat{\mathrm{\Sigma}}}_{{\chi}^{2}}\right]$ is only 62% of the sample-mean $\mathrm{VAR}\left[{\widehat{\mathrm{\Sigma}}}_{\delta}\right]$, indicating that the sample mean is only 62% efficient for IID Gaussian data, where the χ^{2}-based estimate is 100% efficient. Moreover, in this case, if the worst two observations are omitted, then the sample mean’s efficiency actually improves from 62% to 81%, whereas the χ^{2}-based efficiency degrades from 100% to 95%. When using the conventional sample mean, it can be advantageous to discard the worst estimates, whereas in an appropriately weighted analysis, it is advantageous to use all of the data. Finally, we note that there is nothing special, in this example, about $\mathrm{min}\left(\nu \right)=1$. The exact same result follows from $\nu =\left\{10,20,30,\dots ,100\right\}$. It is the relative differences in individual effective sample sizes that lower the relative efficiency of the sample mean (Figure 3).
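The percentages quoted above can be reproduced under two assumptions that are not spelled out in this section: each individual estimate is a scaled χ^{2} variate with $\nu $ degrees of freedom, so that its variance is $2{\sigma}^{2}/\nu $, and all individuals share the same true area. The function names below are hypothetical.

```python
# Relative efficiency of the unweighted sample mean versus a chi^2-weighted mean.
# ASSUMPTIONS: estimate i ~ sigma * chi2(nu_i) / nu_i, hence variance 2*sigma^2/nu_i,
# and a common true area sigma for all individuals (no population variation).

def var_sample_mean(nu, sigma=1.0):
    """Variance of the unweighted mean of the individual estimates."""
    m = len(nu)
    return sum(2.0 * sigma**2 / v for v in nu) / m**2


def var_chi2_weighted(nu, sigma=1.0):
    """Variance of the nu-weighted mean, which pools all degrees of freedom."""
    return 2.0 * sigma**2 / sum(nu)


full = list(range(1, 11))      # nu = 1, 2, ..., 10
trimmed = full[2:]             # worst two estimates (nu = 1, 2) discarded
best = var_chi2_weighted(full)  # benchmark: weighted mean on all of the data

eff_mean_full = best / var_sample_mean(full)       # about 0.62
eff_mean_trimmed = best / var_sample_mean(trimmed)  # about 0.81
eff_chi2_trimmed = best / var_chi2_weighted(trimmed)  # about 0.95

scaled = [10 * v for v in full]  # nu = 10, 20, ..., 100
eff_mean_scaled = var_chi2_weighted(scaled) / var_sample_mean(scaled)  # same as eff_mean_full
```

The efficiency of the sample mean depends only on the ratios among the $\nu $ values, which is why rescaling them all by 10 leaves the result unchanged.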
2.2.3 The inverse-Gaussian population model
While a χ^{2}-based meta-analysis can provide good statistical efficiency for small $\nu $ or N, the choice of population model and estimators can have a large impact on the statistical efficiency for small m, which we now turn our attention to. The mathematically convenient population distribution for a χ^{2} sampling distribution is the inverse-gamma distribution, as it is a conjugate prior—meaning that the posterior distribution can be calculated in closed form with relative ease. However, the inverse-gamma population distribution is not ensured to produce population mean estimates that fall within the range of the data, and, moreover, produces infinite bias when the shape parameter’s sampling distribution has any support below 1. In contrast, the inverse-Gaussian (IG) distribution has a number of important properties that allow us to derive statistically efficient population-level estimates. Notably, in the absence of a hierarchical model, the inverse-Gaussian distribution’s maximum likelihood mean parameter estimate is the sample mean, which is minimum variance unbiased (MVU) (Folks & Chhikara, 1978), as well as unbiased and asymptotically consistent even if the IG population model is mis-specified.
In Appendix A we derive a suite of tools for population-level home-range area analysis with a χ^{2} sampling distribution and inverse-Gaussian population distribution (χ^{2}-IG), including debiased estimators for the population mean area, $\mathrm{\Sigma}=\mathrm{E}\left[\sigma \right]$, inverse population mean area, $1/\mathrm{\Sigma}=1/\mathrm{E}\left[\sigma \right]$ and square coefficient of variation, $\mathrm{CoV}{\left[\sigma \right]}^{2}=\mathrm{VAR}\left[\sigma \right]/\mathrm{E}{\left[\sigma \right]}^{2}$, where $\sigma $ denotes a random individual home-range area. We note that having both debiased mean and inverse-mean estimates is important because, as we will discuss, a natural effect size for comparing the home-range areas of populations I and J is the ratio ${R}_{IJ}={\mathrm{\Sigma}}_{I}/{\mathrm{\Sigma}}_{J}$, which is the product of ${\mathrm{\Sigma}}_{I}$ and $1/{\mathrm{\Sigma}}_{J}$.
2.3 Relevant effect sizes
In the conventional comparative analysis on two populations, I and J, all individual home-range estimates are calculated; each population is summarized by the sample mean of their individual home-range point estimates, ${\widehat{\mathrm{\Sigma}}}_{I}$ and ${\widehat{\mathrm{\Sigma}}}_{J}$; and a t-test then determines the statistical significance of any difference in the sample means, ${\widehat{\mathrm{\Sigma}}}_{I}-{\widehat{\mathrm{\Sigma}}}_{J}$ (e.g. Kays & Gittleman, 2001). Here we discourage the overreliance on such p-values for several reasons (for further discussion, see Sullivan & Feinn, 2012). Any real pair of different populations will undoubtedly have different mean home-range areas, and the p-value is a combined measure of how different two populations are and how much data has been analysed. As we will show with real data, mean home-range area uncertainties are often relatively large and statistically insignificant p-values often do not rule out substantial differences. In their place, we encourage the estimation of relevant effect sizes with confidence intervals. Effect sizes provide more information than p-values, are more intuitive and are important for reproducibility (Halsey et al., 2015).
Without loss of generality, let us assume that ${\mathrm{\Sigma}}_{I}$ is greater than ${\mathrm{\Sigma}}_{J}$. If the (two-sided) $\alpha $ confidence interval for ${\widehat{R}}_{IJ}$ contains 1, then the difference between ${\mathrm{\Sigma}}_{I}$ and ${\mathrm{\Sigma}}_{J}$ is not statistically significant at $p=\alpha /2$. However, the difference cannot necessarily be said to be insubstantial unless the confidence interval also does not contain substantial ratios, such as 1.5 or 2. On the other hand, the difference can only be said to be substantial if the confidence intervals do not contain any insubstantial ratios, such as 1.05. What constitutes a substantial effect size is still somewhat subjective, but in reporting effect sizes we avoid conflating data quality with importance.
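The reasoning above amounts to three separate checks on the confidence interval of the ratio. The sketch below encodes them with the example thresholds from the text (1.05 and 1.5) as defaults. Both the thresholds and the function name are illustrative, and what counts as substantial remains a judgement for the analyst.

```python
# Reading a confidence interval for the ratio R = Sigma_I / Sigma_J,
# where population I is taken to have the larger mean area.
# ASSUMPTION: the default thresholds are the illustrative values from the text.
from typing import NamedTuple


class RatioReading(NamedTuple):
    significant: bool            # interval excludes a ratio of 1
    excludes_insubstantial: bool  # needed to call the difference substantial
    excludes_substantial: bool    # needed to call the difference insubstantial


def read_ratio_ci(lower: float, upper: float,
                  insubstantial: float = 1.05,
                  substantial: float = 1.5) -> RatioReading:
    """Summarize what a ratio confidence interval does and does not rule out."""
    return RatioReading(
        significant=not (lower <= 1.0 <= upper),
        excludes_insubstantial=lower > insubstantial,
        excludes_substantial=upper < substantial,
    )


wide = read_ratio_ci(0.8, 2.5)     # not significant, yet a doubling is not ruled out
clear = read_ratio_ci(1.2, 1.9)    # significant and at least a 20% difference
tight = read_ratio_ci(0.97, 1.04)  # not significant and any difference is small
```

The first case shows why a non-significant test alone says little: the interval is too wide to exclude a large difference.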
3 EXAMPLES
Our examples include three estimator performance comparisons and an empirical analysis demonstration. In our first comparison, we demonstrate with empirical data that the conventional method of taking the sample average of differentially biased individual home-range area estimates results in differentially biased population-mean home-range area estimates. In our second comparison, we contrast the conventional sample mean, conventional normal–normal meta-analysis, and our χ^{2}-IG meta-analysis on simulated data that are ideal for the sample mean, to demonstrate other advantages of the χ^{2}-IG framework and serious issues with the conventional meta-analysis (sans link function). In our third comparison, we pit a conventional Bayesian analysis against our χ^{2}-IG meta-analysis on simulated data, to examine the small-N and small-m biases of a conventional Bayesian hierarchical estimator. Finally, in our empirical demonstration, we summarize and compare populations (by effect size) in a similar environment, across species and sex.
3.1 Estimator performance comparisons
3.1.1 Meta-analytic tapir cross-validation
For our empirical performance comparisons, we chose to cross-validate lowland tapir because of the abundance of their data and because they have relatively stable home-range areas (Fleming et al., 2019), which are necessary properties to empirically validate across a wide range of effective sample sizes. The Instituto Chico Mendes de Conservação da Biodiversidade (ICMBIO) provided the required annual permits for the capture and immobilization of tapirs and collection of biological samples (SISBIO 14603). The Comissão Técnico-Científica (COTEC) do Instituto Florestal do Estado de São Paulo (IF-SP) provided the required permit to carry out research in Morro do Diabo State Park (SMA 40624/1996). All protocols for the capture, anaesthesia, handling and sampling of tapirs have been reviewed and approved by the Veterinary Advisors of the Association of Zoos and Aquariums (AZA)—Tapir Taxon Advisory Group (TAG) and the Veterinary Committee of the IUCN SSC Tapir Specialist Group (TSG).
Using our largest dataset of 29 Pantanal lowland tapir (Medici, 2022) with median$\left(\widehat{N}\right)=386$, we performed an empirical cross-validation analysis to demonstrate the differential bias of conventional population-parameter estimation—whereby biased KDE estimates were fed into the sample mean and compared to more efficient pHREML-AKDE_{C} home-range estimates (Fleming et al., 2019; Fleming & Calabrese, 2017) fed into our χ^{2}-IG estimator. To examine the small-m bias, we used the entire tracks, where median$\left(\widehat{N}\right)=386$, and took random samples of tapir, from 2 to 20 individuals. To examine the small-N bias, we used the entire sample of Pantanal tapir ($m=29$) and incremented the sampling period from 2 to 10 days, with random segments of time sampled from the datasets. Following Fleming et al. (2019), one day of lowland tapir sampling approximately corresponds to an effective sample size of $N\approx 2$, and, therefore, 2 to 10 days corresponds approximately to effective sample sizes ranging from 4 to 20.
Given the results of Noonan, Tucker, et al. (2019), we expected the conventional population-mean ($\mathrm{\Sigma}$) estimates (conditioned on REML-KDE_{C}) to increase with increasing N and slowly approach the more accurate pHREML-AKDE_{C} estimates for very large N. Following Fleming et al. (2019) and Noonan, Tucker, et al. (2019), we expected the pHREML-AKDE_{C}-based results to be only slightly underestimated at $N\sim 4$ (2 days sampling), where they could be improved by bootstrapping (Fleming et al., 2019), which would be too slow for this number of simulations. We expected the reciprocal-mean ($1/\mathrm{\Sigma}$) estimates to exhibit the opposite biases, relative to those of the respective $\mathrm{\Sigma}$ estimates. We expected the conventional population variance estimates to exhibit positive biases that decrease with increasing N, as a result of conflation with unaccounted uncertainty.
3.1.2 Meta-analytic simulations
We compared the population parameter estimates of three conditional estimation methods—(a) averaging the point estimates as if they were exact (sample-mean), (b) averaging the estimates with a conventional meta-analytic hierarchical model, which assumes a normal sampling distribution and normal population distribution (normal–normal), and (c) averaging the estimates with our χ^{2}-IG meta-analytic framework. In our first set of simulations, we incremented the number of observed home-range crossings (N) from 1 to 20, with the number of individuals (m) set to 100. In our second set of simulations, we incremented the number of individuals from 2 to 20, with the number of observed home-range crossings set to 100. To isolate the biases of the conditional estimators, in all cases we used IID simulated tracking data and Gaussian home-range area estimates, which have ideal statistical efficiency. Furthermore, in each individual meta-analysis, N was held constant so that the conventional sample mean would have its highest efficiency, and other biases could be explored. The population coefficient of variation for the distribution of home ranges was set to one, which is considered to be an intermediate value and was consistent with most of our empirical examples. We performed this analysis both with an inverse-Gaussian population distribution, where our χ^{2}-IG model is exact, and again with a log-normal population distribution, where our χ^{2}-IG model is mis-specified.
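A stripped-down version of this simulation design can be sketched as follows. It skips the tracking data and draws each area estimate directly as its true area times a χ^{2} variate divided by its degrees of freedom, which is a simplification. The inverse-Gaussian sampler uses the transformation method of Michael, Schucany and Haas, and the degrees of freedom, seed and function names are all hypothetical. Under these assumptions the law of total variance gives a variance among estimates of $\mathrm{VAR}\left[\sigma \right]+\left(2/\nu \right)\mathrm{E}\left[{\sigma}^{2}\right]$, which equals $1+4/\nu $ for unit mean and unit coefficient of variation.

```python
# Simplified simulation: inverse-Gaussian population of areas with CoV = 1,
# and unbiased chi^2-distributed area estimates for each individual.
# ASSUMPTIONS: estimates are drawn directly (no tracking data are simulated);
# dof, seed and all function names are illustrative.
import math
import random


def sample_inverse_gaussian(rng, mean, shape):
    """One inverse-Gaussian draw via the Michael-Schucany-Haas transformation."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mean + mean**2 * y / (2 * shape)
         - (mean / (2 * shape)) * math.sqrt(4 * mean * shape * y + mean**2 * y**2))
    return x if rng.random() <= mean / (mean + x) else mean**2 / x


def simulate_area_estimates(rng, m, dof, mean_area=1.0, cov=1.0):
    """Draw m true areas, then one unbiased chi^2-based estimate per area."""
    shape = mean_area / cov**2  # inverse-Gaussian: CoV^2 = mean / shape
    areas = [sample_inverse_gaussian(rng, mean_area, shape) for _ in range(m)]
    # chi2(dof) is Gamma(shape=dof/2, scale=2); dividing by dof gives mean 1.
    return [a * rng.gammavariate(dof / 2.0, 2.0) / dof for a in areas]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / (n - 1)


rng = random.Random(2022)
estimates = simulate_area_estimates(rng, m=100_000, dof=4)
est_mean, est_var = mean_and_variance(estimates)
expected_var = 1.0 + 4.0 / 4  # population variance 1 plus sampling inflation
```

The sample mean of the point estimates stays close to the true mean area, while their sample variance is inflated well above the true population variance of 1, which is the bias that the sample-mean analysis inherits when uncertainties are ignored.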
3.1.3 Bayesian simulations
We compared the population parameter estimates of a general-purpose Bayesian estimator to those of our χ^{2}-IG meta-analytic estimator, when using unbiased REML Gaussian area estimates. We simulated IG distributed home-range areas, and then conditional on those home-range areas, we simulated IID movement processes. The IID tracking data were sampled daily, and would approximate a small canid or deer. The population coefficient of variation for the distribution of home ranges was set to one, which is considered to be an intermediate value and was consistent with most of our empirical examples. We provided our Bayesian estimator with very weak priors that were centred on the truth (Appendix B), and for output point estimates we considered the mode, median, and mean of the marginal posterior, $p\left(\mathrm{\Sigma}\right)$. For our Bayesian estimates, we only considered the integrated hierarchical model, with individual movements conditional on movement characteristics sampled from respective population distributions, and performed a single analysis on all individuals simultaneously, per simulation. In one set of simulations, we incremented the number of observed home-range crossings (N) from 2 to 20, with the number of individuals (m) set to 100. In a second set of simulations, we incremented the number of individuals from 2 to 20, with the number of range crossings set to 100. In this way, we could examine the small sample size biases for both m and N. For our meta-analytic simulations, we computed 10,000 replicates, while for the much slower Bayesian simulations we computed 1,500 replicates with each having 1,500 draws from the posterior after 1,500 discarded ‘burn-in’ points, after checking for convergence in a number of samples.
3.2 Analysis demonstration
3.2.1 Barro Colorado Island frugivore case study
For our analysis demonstration, we considered frugivore home-range estimates from Barro Colorado Island (Alavi et al., 2022). Fieldwork was carried out under IACUC protocol numbers 2014–1001-2017, 2017–0912-2020 and 2017–0605-2020 from STRI, and UC Davis IACUC protocol number 18239.
We compared the home-range areas of four species of frugivores—all located on Barro Colorado Island, Panama (Alavi et al., 2022)—including 12 kinkajou Potos flavus, with median$\left(\widehat{N}\right)=324$, 16 white-nosed coati Nasua narica, with median$\left(\widehat{N}\right)=371$, 8 white-faced capuchin monkey Cebus capucinus, with median$\left(\widehat{N}\right)=193$, and 8 spider monkey Ateles geoffroyi, with median$\left(\widehat{N}\right)=134$. We explored two issues in terms of effect sizes: how home-range areas differed among species and how home-range areas differed between sexes within each species. In all cases, we conditioned our χ^{2}-IG meta-analysis on 95% error-informed pHREML AKDE_{C} home-range area estimates (Calabrese et al., 2016; Fleming et al., 2019; Fleming et al., 2020; Fleming & Calabrese, 2017). Sex differences have been observed for spider monkeys (Campbell, 2008), with male home-range areas being larger than those of females, which might also be the case for kinkajous (Kays & Gittleman, 2001). Sex differences have not been observed for coatis (Gompper, 1997), and are not expected for capuchins, because male and female capuchins move together in a social group.
4 RESULTS
4.1 Estimator performance comparisons
4.1.1 Meta-analytic tapir cross-validation
We summarize our lowland tapir cross-validation in Figure 4. We emphasize that these are real data with unknown true parameters, and so we can only assess consistency under resampling. When estimating mean home-range areas ($\mathrm{\Sigma}$), the conventional KDE estimates increased substantially with increasing sampling period, but did not asymptote enough to match the more accurate AKDE estimates, even when using all of the data. This is most likely because many tapir lacked the effective sample size necessary for REML-KDE_{C} to exhibit asymptotic efficiency (Noonan, Tucker, et al., 2019). We note that, among home-range estimators that assume independently sampled location data, REML-KDE_{C} is relatively efficient, and other conventional home-range estimators can produce relative biases that are an order of magnitude worse than that of KDE_{C}, let alone AKDE_{C} (Noonan, Tucker, et al., 2019). In contrast, when only 2–3 days (N ≈ 4–6) were sampled, the pHREML-AKDE_{C} mean home-range estimates had only a slight, negative bias, which could be remedied by bootstrapping (Fleming et al., 2019). However, the bootstrap itself requires repeated simulations, and is too computationally costly to be included in this analysis. As expected, the reciprocal mean area ($1/\mathrm{\Sigma}$) estimates, which are necessary for easily comparing populations, exhibited biases opposite to those of $\mathrm{\Sigma}$. Finally, both estimators reported extra variation at shorter sampling periods, though this sensitivity was much larger with the conventional estimate.
4.1.2 Meta-analytic simulations
We summarize our meta-analytic simulation comparisons in Figure 5. Again, these simulations were constructed to be ideal for the conventional sample mean, as in each meta-analysis the number of observed home-range crossings (N) was held fixed, which makes the sample mean’s uniform weighting more efficient. Generally speaking, our χ^{2}-IG conditional estimation framework provided unbiased estimates for all parameters of interest; as expected, the conventional sample mean provided unbiased population mean estimates, but moderately biased estimates of other population parameters; and the conventional normal–normal meta-analysis performed much worse than anticipated, with severe bias at small values of N. In our second simulation analysis, with a mis-specified log-normal population distribution, the results were almost indistinguishable from the inverse-Gaussian case.
We were initially surprised by the poor performance of the conventional normal–normal meta-analytic model, as this method provides approximately ‘best linear unbiased estimates’ (BLUEs) that are exactly BLUE if the sampling variances are correctly specified. However, in retrospect, the variances of a χ^{2} process are never exactly known, but are estimated to be proportional to the square of the point estimate. This association causes smaller estimates to be over-weighted and larger estimates to be under-weighted in a normal–normal analysis, which causes the extreme biases depicted in Figure 5.
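The weighting mechanism described above can be illustrated with a deliberately simplified sketch. The code below assumes a fixed-effect setting (every individual has the same true area, with no population variation), a χ^{2} sampling distribution with `dof` degrees of freedom, and plug-in sampling variances of 2 × estimate² / `dof`; the function name `compare_means` and all parameter values are hypothetical. Because this omits the between-individual variance that a normal–normal model also estimates, the size of the bias is not comparable to Figure 5; only the direction is. Under these assumptions the inverse-variance weights are proportional to 1/estimate², and the weighted mean converges to $(\nu-4)/\nu$ times the true area for $\nu$ degrees of freedom.

```python
import random
import statistics


def compare_means(true_area=1.0, dof=20, m=20_000, seed=1):
    """Return (sample mean, inverse-variance weighted mean) of m estimates."""
    rng = random.Random(seed)
    # Unbiased chi-square estimates of a single shared true area.
    x = [true_area * rng.gammavariate(dof / 2.0, 2.0) / dof for _ in range(m)]
    # Plug-in sampling variances are proportional to the squared point estimate.
    var_hat = [2.0 * xi * xi / dof for xi in x]
    weights = [1.0 / v for v in var_hat]
    # Small estimates receive large weights, which drags the average down.
    weighted = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
    return statistics.fmean(x), weighted


sample_mean, weighted_mean = compare_means()
print("sample mean:            ", sample_mean)
print("inverse-variance weighted:", weighted_mean)
```

With `dof=20` the weighted mean should settle near 0.8 of the true area, while the unweighted sample mean stays near the truth.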
Results for $m=2,3$ in column 2 of Figure 5 are not displayed, as they are very much contingent on the chosen model selection criterion and its outcome. If the population variance parameter is supported by model selection, then all χ^{2}-IG parameter estimates remain relatively unbiased. However, if the population variance parameter is not supported, then the coefficient of variation is taken to be zero and the inverse mean is moderately overestimated. Some degree of model selection is necessary, as the point estimate of the population variance parameter can be in the neighbourhood of zero, which would not be selected by any standard model selection criterion, and would cause divergences in both the estimated sampling variances and in the debiased point estimates if retained.
4.1.3 Bayesian simulations
We summarize our Bayesian simulation comparison in Figure 6. Generally speaking, our χ^{2}-IG conditional estimation framework provided unbiased mean area estimates when conditioned on unbiased individual area estimates, as in the previous simulations. In contrast, our Bayesian estimates were far more biased than we anticipated. In terms of observed home-range crossings, we found the small-N bias of our Bayesian averages to be of the anticipated magnitude but in the opposite direction of individual-level maximum likelihood biases. On the other hand, in terms of the number of individuals sampled, we found the small-m bias of our Bayesian averages to be extremely large and positive, such that, to achieve a reasonable amount of bias, our Bayesian estimator would require more tracked individuals than are present in most studies. We tested whether or not this bias was due to a lack of identifiability with the spread of the home-range centres, but this was not the case. Instead, we only found that the small-m bias was very similar in scale to the spread of our prior on $\mathrm{\Sigma}$, even though it was centred on the truth and specified independently of other parameters.
4.2 Analysis demonstration
4.2.1 Barro Colorado Island frugivore case study
We summarize the results of our Barro Colorado Island (BCI) frugivore analysis in Table 1. We found spider monkeys to have the largest home ranges and kinkajou to have the smallest home ranges of the four species, on average, with their mean 95% home-range areas estimated to be 5.3 (95% CI: 2.6–9.7) km^{2} and 0.3 (0.2–0.4) km^{2}, respectively. We could not statistically discriminate the coati and capuchin monkey, and estimated the coati/capuchin ratio of mean home-range areas to be 1.2 (0.8–1.7), which does not rule out a substantial difference. Only in the kinkajou did we find a statistically significant difference between the sexes, where we estimated the male/female ratio of mean home-range areas to be 2.3 (1.5–3.5), which excludes 1, and is both significant and substantial. This test remained statistically significant even when applying the Šidák correction for multiple comparisons. Substantial differences between the sexes could not be ruled out in the other BCI species, due to large uncertainties.
Species / sex | Mean (km^{2}) | CoV |
---|---|---|
Spider monkey | 5.3 (2.6–9.7) | 1.0 (0.4–1.6) |
Male | 6.6 (4.7–9.0) | 0.3 (0.1–0.6)^{a} |
Female | 4.5 (1.6–10.3) | 1.2 (0.3–2.1) |
Coati | 1.4 (1.0–1.8) | 0.6 (0.4–0.8) |
Male | 1.3 (0.8–1.9) | 0.6 (0.3–1.0) |
Female | 1.4 (1.0–2.0) | 0.6 (0.3–0.8) |
Capuchin monkey | 1.1 (0.9–1.5) | 0.3 (0.2–0.5) |
Male | 1.3 (0.8–2.1) | 0.5 (0.1–0.8) |
Female | 1.0 (0.8–1.3) | 0.3 (0.1–0.5) |
Kinkajou | 0.3 (0.2–0.4) | 0.6 (0.3–0.9) |
Male | 0.4 (0.3–0.6) | 0.4 (0.1–0.6) |
Female | 0.2 (0.1–0.3) | 0.4 (0.2–0.6) |
- ^{a} The coefficient of variation for the male spider monkeys was not supported by AIC_{C}, due to a small sample size (m = 3) and relatively large home-range uncertainties, and is, therefore, expected to be underestimated.
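The Šidák correction mentioned in the case-study results can be worked through numerically. The sketch below assumes, purely for illustration, a family of k = 4 tests (one sex comparison per species) at a nominal level of 0.05; the number of comparisons actually used is not stated in the text, and the function names are hypothetical.

```python
def sidak_alpha(alpha, k):
    """Per-test significance level that keeps the family-wise level at alpha
    across k independent tests (Sidak correction)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def sidak_ci_level(alpha, k):
    """Matching per-comparison confidence level for interval estimates."""
    return (1.0 - alpha) ** (1.0 / k)


k = 4  # assumed number of comparisons, for illustration only
print("per-test alpha: ", sidak_alpha(0.05, k))
print("per-test CI level:", sidak_ci_level(0.05, k))
```

Under this assumption each comparison would be judged at a level of roughly 0.013, or equivalently with confidence intervals of roughly 98.7% coverage rather than 95%.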
5 DISCUSSION
We have introduced a computationally and statistically efficient hierarchical modelling framework for summarizing and comparing population home-range areas. While we strongly recommend designing studies with larger sample sizes when possible, this framework facilitates population-level inference with as few as 2–3 observed home-range crossings per individual and with a similarly small number of representative individuals. Representative samples of individuals can be obtained, for instance, by independently sampling a fixed proportion of individuals from areas that are sampled uniformly in space. Importantly, the methods that we have introduced avoid the differential biases inherent in conventional analyses and allow researchers to gain statistical efficiency in using all of their data. In contrast, conventional home-range estimators exhibit downward biases with high sampling rates (Noonan et al., 2020), and even carefully performed data thinning can fail to match these biases across populations (Fleming & Calabrese, 2017). Indeed, data with such high sampling rates require autocorrelation-informed home-range estimators like AKDE. For example, if comparing tapir and jaguar species in the same biome, daily tapir data are more comparable to weekly or monthly jaguar data for the purpose of home-range estimation (Fleming et al., 2019; Morato et al., 2016), and matching the sampling schedules of these two species can produce wildly different biases from conventional home-range estimators. Here we have pointed out and demonstrated that these individual-level biases propagate forward into population-level analyses (Winner et al., 2018).
We have demonstrated that conventional population-level estimators present a second data thinning dilemma to researchers, even when using accurate individual home-range estimates. Conventional population-level estimators perform better when omitting less well-tracked individuals, because certain and uncertain estimates are weighted equally in the sample mean, and because unmodeled individual uncertainties produce positive bias in the population variance estimate. Indeed, choosing to omit less well-tracked individuals is an extreme form of subjective down-weighting that is not optimized in practice, and would still be outperformed by an appropriately weighted method even if it was (Section 8). An appropriately weighted analysis—where uncertain individual home-range estimates are down-weighted relative to more certain estimates—is necessary to produce the best quality population estimates. Our χ^{2}-IG framework provides said weighting via a hierarchical model.
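The positive bias in the conventional population variance estimate follows from the law of total variance. As a sketch, assume that each individual estimate $\hat{A}_i$ is conditionally unbiased with an approximately χ^{2} sampling distribution, $\hat{A}_i \mid A_i \sim A_i\,\chi^2_{\nu}/\nu$, where $A_i$ is the true area of individual $i$, $\mathrm{\Sigma}$ is the population mean area, and $\nu$ (degrees of freedom) and $\sigma_A^2$ (population variance) are notation introduced here for illustration:

$$\mathrm{Var}\left(\hat{A}_i\right) = \mathrm{Var}\left(A_i\right) + \mathrm{E}\left[\mathrm{Var}\left(\hat{A}_i \mid A_i\right)\right] = \sigma_A^2 + \frac{2}{\nu}\left(\sigma_A^2 + \mathrm{\Sigma}^2\right)$$

The second term is strictly positive and shrinks only as $\nu$ grows, so treating the estimates as exact attributes sampling error to population variation, and poorly tracked individuals (small $\nu$) contribute the most to this inflation.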
We recommend that researchers comparing populations do so by way of relevant effect sizes, provided with confidence intervals, rather than p-values, which are more variable and less reproducible. As we have demonstrated, insignificant differences do not imply insubstantial differences (Section 24). A ratio of mean home-range areas CI of (0.9–2.1) contains 1, which implies an insignificant difference, but it also contains 2, which implies that we are not confident that the difference is insubstantial. On the other hand, a ratio CI of (1.01–1.02) does not contain 1, which implies a significant difference, but it does not contain any substantial difference and we are therefore confident that the difference is insubstantial. Effect-size CIs provide a more thorough and meaningful comparison than p-values, as with insufficient data, substantial differences can be insignificant, and with abundant data, insubstantial differences can be significant.
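The distinction between significant and substantial differences can be expressed as a small helper. This is an illustrative sketch only: the function name `interpret_ratio_ci` is hypothetical, and the threshold for a 'substantial' ratio is a subjective, study-specific choice that the analyst must supply (a value of 2 is used below only because it matches the example in the text).

```python
def interpret_ratio_ci(lower, upper, substantial):
    """Classify a confidence interval on a ratio of mean home-range areas.

    Returns (significant, may_be_substantial):
      significant        - the interval excludes a ratio of 1
      may_be_substantial - the interval reaches the chosen threshold,
                           in either direction (ratio or its reciprocal)
    """
    if not 0.0 < lower <= upper:
        raise ValueError("need 0 < lower <= upper")
    if substantial <= 1.0:
        raise ValueError("substantial threshold must exceed 1")
    significant = not (lower <= 1.0 <= upper)
    may_be_substantial = upper >= substantial or lower <= 1.0 / substantial
    return significant, may_be_substantial


# The two hypothetical intervals discussed in the text, with threshold 2.
print(interpret_ratio_ci(0.9, 2.1, substantial=2.0))    # (False, True)
print(interpret_ratio_ci(1.01, 1.02, substantial=2.0))  # (True, False)
```

The first interval is insignificant yet cannot rule out a substantial difference, while the second is significant yet confidently insubstantial, which is why the interval conveys more than a p-value alone.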
5.1 Comparison to other hierarchical methods
While we also considered a conventional (normal) meta-analysis and Bayesian analysis, only our novel χ^{2}-IG meta-analysis proved to be generally suitable for population-level inference on home-range areas. Conventional meta-analyses also down-weight uncertain estimates, and we have shown here that their direct application leads to extreme biases because of the strong association between home-range area uncertainty estimates and home-range area point estimates (Shuster, 2010). We considered the conventional normal–normal meta-analysis without a link function in the hope of obtaining approximate BLUE quality estimates and more general asymptotic consistency. A link function could improve this method’s performance, but at the cost of the unbiased property and more general asymptotic consistency, and with the additional requirement of having to back-transform the output population-parameter estimates. We recommend that researchers using conventional meta-analytic methods for regression analysis, such as in Averill-Murray et al. (2020), also use an appropriate link function and pay careful attention to their residuals. Otherwise, for the purposes demonstrated here, there is no reason to use any of the conventional analyses over the χ^{2}-IG estimator. Finally, our Bayesian analysis produced much larger small-m biases than we anticipated, even though we supplied a non-informative prior similar to that suggested by Gelman (2006) for variance parameters, and further assisted our Bayesian analysis by fixing each prior’s mode to the truth.
5.2 Two-stage analyses and the assumption of range residency
Our proposed method involves a two-stage analysis, with the first stage consisting of individual analyses that are then fed into a second-stage meta-analysis of the population. When making good distributional assumptions and propagating uncertainties appropriately, two-stage analyses such as this offer a promising approach for implementing hierarchical models on ‘big data’ (Muff et al., 2020). In addition to delivering large improvements in computation time, the two-stage approach adds only minimally to existing individual-based workflows, as the first stage of analysis is based on existing methods and software. However, because we focus on the second-stage meta-analysis, we have not considered existing challenges in the first stage of home-range estimation that relate to individual variation. Populations can express considerable variance in their individual movement behaviours (van de Kerk et al., 2021), and AKDE is a general enough method to formally accommodate this variation.
5.3 Future analyses
While our χ^{2}-IG hierarchical model was designed for home-range analysis, it would also provide a natural model for population-level inference on diffusion rates, as they also have an approximately χ^{2} conditional sampling distribution. Mean speeds and travelled distances, however, would be better modelled as χ-IG (Noonan, Fleming, et al., 2019). Moreover, it would be useful to model both fixed and random effects, especially if the same individuals are being grouped in different populations (e.g. pre- and post-treatment). Fixed effects might be incorporated via standard IG-regression models (Folks & Davis, 1981), but random effects would require more effort to retain efficiency. For the time being, regression analyses should be performed with conventional meta-analysis regression methods, and with a carefully selected link function.
6 CONCLUSIONS
We have shown that accurate population-level home-range estimation requires (a) accurate individual home-range estimates to be fed into (b) an appropriate statistical framework. At present, the most accurate nonparametric home-range estimator is AKDE (Noonan, Tucker, et al., 2019), which has an associated R package (ctmm, Calabrese et al., 2016; Fleming & Calabrese, 2015) and graphical user interface (ctmmweb, Calabrese et al., 2021; Dong et al., 2017). Upon calculating individual AKDEs, the χ^{2}-IG meta-analysis that we have introduced here can be evaluated with a single function call, via the meta command (Fleming & Calabrese, 2015; Calabrese et al., 2016), which is complete with documentation, help(meta), and example code, example(meta). These combined methods—pHREML autocorrelation estimation (Fleming et al., 2019), AKDE_{C} density function estimation (Fleming & Calabrese, 2017), and χ^{2}-IG meta-analysis—allow researchers to reap the benefits of using all of their data, avoid differential biases and achieve greater statistical efficiency than has been possible. Future work will extend these methods to diffusion rates, speeds and complete movement models, which are a necessary ingredient in the estimation of statistically efficient population density estimates, as well as extended regression analyses.
ACKNOWLEDGEMENTS
C.H.F., W.F.F. and J.M.C. were supported by NSF IIBR 1915347, M.J.N. was supported by NSERC Discovery grant RGPIN-2021-02758, R.K. was supported by NSF IIBR 1914928, and I.D. and D.R.S. were supported by NSF IIBR 1914887. This work was partially funded by the Center of Advanced Systems Understanding (CASUS) which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament. This project received funding from the NSF BCS 1440755, the Smithsonian Tropical Research Institute, a Packard Foundation Fellowship (2016-65130) and the Alexander von Humboldt Foundation in the framework of the Alexander von Humboldt Professorship endowed by the Federal Ministry of Education and Research awarded to M.C.C.
CONFLICT OF INTEREST
None of the authors have a conflict of interest.
AUTHORS’ CONTRIBUTIONS
C.H.F., J.M.C. and D.S. conceived the study; C.H.F., I.D. and D.S. developed the methods and software; C.H.F., I.D. and S.A. performed the analyses; M.C.C., R.K., B.T.H. and E.P.M. carried out the field work and collected the tracking data. All authors contributed to study concepts and writing.
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13815.
DATA AVAILABILITY STATEMENT
Lowland tapir tracking data are archived in Movebank study 1907973121 (Medici, 2022, https://doi.org/10.5441/001/1.03ck4s52). Barro Colorado Island frugivore home-range estimates are archived on Dryad Digital Repository (Alavi et al., 2022, https://doi.org/10.5061/dryad.k3j9kd58t). χ^{2}-IG and normal–normal meta-analysis methods are implemented in the ctmm R package v0.6.0 and archived on CRAN (Fleming & Calabrese, 2015, https://CRAN.R-project.org/package=ctmm).