Volume 6, Issue 4
Special Feature – Review: New opportunities at the Interface Between Ecology and Statistics

Point process models for presence‐only analysis

Ian W. Renner

Corresponding Author

School of Mathematical and Physical Sciences, The University of Newcastle, University Drive, Callaghan, NSW, 2308 Australia

Correspondence author. E‐mail: ian.renner@newcastle.edu.au

Jane Elith

School of BioSciences, The University of Melbourne, Parkville, Vic., 3010 Australia

Adrian Baddeley

Department of Mathematics & Statistics, Curtin University, GPO Box U1987, Perth, WA, 6845 Australia

William Fithian

Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA, 94303 USA

Trevor Hastie

Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA, 94303 USA

Steven J. Phillips

2201 4th Street, Boulder, CO, 80304 USA

Gordana Popovic

School of Mathematics and Statistics and Evolution & Ecology Research Centre, The University of New South Wales, Sydney, NSW, 2052 Australia

David I. Warton

School of Mathematics and Statistics and Evolution & Ecology Research Centre, The University of New South Wales, Sydney, NSW, 2052 Australia

First published: 23 February 2015

Summary

  1. Presence‐only data are widely used for species distribution modelling, and point process regression models are a flexible tool that has considerable potential for this problem, when data arise as point events.
  2. In this paper, we review point process models, some of their advantages and some common methods of fitting them to presence‐only data.
  3. Advantages include (and are not limited to) clarification of what the response variable is that is modelled; a framework for choosing the number and location of quadrature points (commonly referred to as pseudo‐absences or ‘background points’) objectively; clarity of model assumptions and tools for checking them; models to handle spatial dependence between points when it is present; and ways forward regarding difficult issues such as accounting for sampling bias.
  4. Point process models are related to some common approaches to presence‐only species distribution modelling, which means that a variety of different software tools can be used to fit these models, including maxent or generalised linear modelling software.

Introduction

Species distribution modelling (SDM) provides a framework for determining the distribution of a species’ habitat as a function of environmental variables and is a highly researched topic of interest to ecologists, biologists and climate change scientists. Often, the best available species data come in the form of a list of reported presence locations of a species without any corresponding information about where a species is absent. This type of data is known as ‘presence‐only’ data (Pearce & Boyce 2006) and can be found in museums, atlases and herbaria. A researcher interested in exploring the relationship between a species and the environment is faced with the question of which methods to choose, and there have been calls for unification of SDM concepts (Elith & Leathwick 2009; Aarts, Fieberg & Matthiopoulos 2012). Here, using language common to the SDM literature, we aim to progress understanding of methods by focussing on emerging knowledge of the links between point process models (PPMs), regression and MAXENT.

Presence‐only data typically arise as point events – a set of point locations where a species has been observed. In the statistical literature, a set of point events (in which the location and number of points is random) is known as a point process. Spatial statistics literature provides a suite of tools for modelling point processes (Cressie 1993; Diggle 2003), but only recently have point process models been proposed as a natural way for analysing species presence‐only data in a regression framework (Warton & Shepherd 2010; Chakraborty et al. 2011). PPMs are closely connected to methods already in widespread use in ecology such as MAXENT (Aarts, Fieberg & Matthiopoulos 2012; Fithian & Hastie 2013; Renner & Warton 2013), some implementations of logistic regression (Baddeley et al. 2010; Warton & Shepherd 2010) and estimation of resource selection functions (Aarts, Fieberg & Matthiopoulos 2012; McDonald et al. 2013). PPMs enjoy particular benefits in interpretation and implementation. Consequently, we believe PPMs are a natural choice of analysis method for presence‐only SDM, when the data arise as point events.

In this paper, we review PPMs for species distribution modellers, and different methods for fitting them. We show that the point process viewpoint resolves a number of important questions regarding presence‐only data, including exactly what target quantity is being modelled, how to select background or pseudo‐absence points (named quadrature points in the PPM literature), what assumptions are made and how they can be checked. It also suggests a natural way to deal with biases. We provide a worked example to demonstrate how to fit PPMs with different methods.

Example – distribution of Eucalyptus sparsifolia

We will proceed by briefly describing an example data set to be used throughout the paper to illustrate key ideas. The data set comprises 230 presence‐only locations of Eucalyptus sparsifolia within the Greater Blue Mountains World Heritage Area (GBMWHA) and a surrounding 100‐km buffer zone (Fig. 1), an 86 227‐km² area near Sydney, Australia (NSW Office of Environment and Heritage 2012). This species is known to be abundant and broadly distributed across this region (Hager & Benson 2010). Maps of environmental variables were available over the study region, and the goals of analysis were to map the distribution of Eucalyptus sparsifolia and identify its key environmental correlates.

Fig. 1. Locations of 230 presence‐only Eucalyptus sparsifolia observations within 100 km of the Greater Blue Mountains World Heritage Area.

The presence records were entirely from incidental sightings of the species collated by the responsible state department since 1972. These records were cleaned prior to analysis to remove data from systematically sampled transects, and records with high location errors, leaving only records of opportunistic sightings whose point location was known to a reasonable (1 km) degree of accuracy. As these observations are reported as locations, not counts in transects or grid cells, they are best described as point locations in continuous space, which motivates the use of point process models for analysis.

Environmental predictors selected as likely relevant to the distribution of the species and its records are minimum and maximum annual temperature, annual rainfall, number of fires since 1943 and a categorical soil variable. A description of the soil categories is presented in Section 1 of the Appendix S1. All variables were available at 100‐m grid cell resolution (Renner et al. 2015).

Point process models

Presence‐only data consist of a set of locations $\mathbf{s}_P = \{s_1, \ldots, s_m\}$ at which a species has been observed in some region $\mathcal{A}$. While methods for fitting PPMs are closely related to common regression models, particularly generalised linear models (GLMs, McCullagh & Nelder 1989), PPMs are posed differently. A regression model is typically used when the object of interest is a random variable $Y$, for which we model the mean $\mu$ as a function of covariates $\mathbf{x}$. By contrast, the objects of primary interest in a PPM are the spatial locations of presence points $\mathbf{s}_P$ – that is, the focus is on where the points were observed. We model the locations in $\mathbf{s}_P$ jointly with the number of presences m and characterise them via the intensity or limiting expected number of presence points per unit area λ(s). The link to regression comes because we typically model λ(s) as a function of covariates x(s) measured throughout the study region $\mathcal{A}$.

The first advantage of PPMs, before looking any further, is greater clarity about what exactly is being modelled (Aarts, Fieberg & Matthiopoulos 2012; Dorazio 2012). The target of interest, intensity, is not a probability and is instead a measure of abundance – the number of presence records per unit area. Thus, it need not have an upper bound of one (Aarts, Fieberg & Matthiopoulos 2012). Further, intensity as defined is a function of only two quantities – spatial patterning in the presence‐only data, and the spatial measurement units. Changing the spatial units from kilometres to metres should change intensity proportionally (decreasing by a factor of $10^6$).
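As a small worked illustration of this unit dependence (the subscripted symbols and the numerical value are introduced here only for illustration), an intensity expressed per square kilometre converts to one per square metre as:

```latex
\lambda_{\mathrm{m}}(s) \;=\; \frac{\lambda_{\mathrm{km}}(s)}{1000^{2}} \;=\; 10^{-6}\,\lambda_{\mathrm{km}}(s),
\qquad \text{e.g. } \lambda_{\mathrm{km}}(s) = 0.5 \text{ records per km}^{2}
\;\Rightarrow\; \lambda_{\mathrm{m}}(s) = 5 \times 10^{-7} \text{ records per m}^{2}.
```

The expected number of records in any fixed region is unchanged by this conversion; only the unit of area in which intensity is reported differs.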

It should be emphasised that in most instances, the intensity λ(s) does not reflect the expected abundance per unit area of a species; rather, it reflects the expected abundance of species reportings. It can typically only be used to make inferences about relative patterns in species abundance (Fithian & Hastie 2013).

The Poisson case – no spatial dependence

The simplest type of PPM of use in presence‐only analysis is an inhomogeneous Poisson point process (hereafter referred to as a Poisson PPM), in which we assume (a) point events are independent of each other, which can be shown to imply that the total number of points in the study region is a Poisson random variable, and (b) that the intensity λ(s) varies spatially (and so is indexed by location s). We will further assume it varies according to environmental conditions x(s).

Assumption (a), in which point locations are independent, is a restrictive assumption which often is not satisfied by presence‐only data. Methods to check the independence assumption are described in Section ‘Checking assumptions’, and methods to fit models that account for dependence are described in Section ‘Spatial dependence in point processes’.

Assumption (b) is often refined to a loglinearity assumption, where we model intensity as a loglinear function of environmental covariates:
$\ln \lambda(s) = \beta_0 + \sum_{j=1}^{p} x_j(s)\,\beta_j$ (eqn 1)

where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)$ is a vector that contains the parameters corresponding to the p environmental covariates x(s). Loglinearity is a natural assumption because it ensures that intensity is a non‐negative quantity, and it is the canonical link for Poisson data. While the form of this equation is loglinear, it can readily capture nonlinear relationships between intensity and the environment, for example, using quadratic and interaction terms, smoothed functions in generalised additive models (GAMs, Hastie & Tibshirani 1990) or via kernel regression (Guan 2008; Baddeley et al. 2012).
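To make concrete how a loglinear model can still describe a nonlinear response, the following minimal Python sketch evaluates a loglinear intensity with quadratic and interaction terms. The covariate names, the coefficient values and the helper functions are hypothetical and chosen purely for illustration; they are not estimates from the Eucalyptus sparsifolia analysis.

```python
import math

def log_linear_intensity(x, beta0, beta):
    """Intensity exp(beta0 + sum_j beta_j * x_j); positive for any coefficients."""
    return math.exp(beta0 + sum(b * xj for b, xj in zip(beta, x)))

def with_quadratic_terms(temperature, rainfall):
    """Expand two raw covariates into linear, quadratic and interaction terms."""
    return [temperature, rainfall, temperature ** 2, rainfall ** 2, temperature * rainfall]

# Hypothetical coefficients, for illustration only (not estimates from the paper).
beta0 = -4.0
beta = [0.8, 0.3, -0.5, -0.1, 0.05]

# Intensity along a gradient of standardised temperature, holding rainfall at zero.
# The negative quadratic coefficient gives a hump-shaped response on the intensity scale.
temps = [-2.0, -1.0, 0.0, 0.8, 2.0, 3.0]
intensities = [log_linear_intensity(with_quadratic_terms(t, 0.0), beta0, beta) for t in temps]
for t, lam in zip(temps, intensities):
    print(f"temperature = {t:+.1f}  intensity = {lam:.5f}")
```

With these assumed coefficients the response peaks where 0.8 - 2(0.5)t = 0, that is at a standardised temperature of 0.8, even though the model is linear in its parameters on the log scale.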

Loglinear models of the form of (1) are commonly fitted to count data, and one way to fit a PPM is in fact to break the data into grid cells and fit a Poisson loglinear model to the counts of presence points in each grid cell. However, such a model has the potential to lose information from the data during aggregation from point location to the grid cell level (Renner & Warton 2013). It may, however, be helpful to readers to think of a Poisson PPM as like a model for count data – the key distinction being that the data for analysis are the set of point locations of presences, rather than counts in grid cells.
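The gridded alternative described above can be sketched as follows. This is a minimal illustration under assumed inputs (six hypothetical presence locations, a 1-km grid and a constant intensity), not the analysis used in the paper; the function names are our own.

```python
import math

def grid_counts(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Count presence points falling in each cell of a regular grid."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[row][col] += 1
    return counts

def poisson_count_loglik(counts, expected):
    """Poisson log-likelihood of the cell counts given expected counts per cell."""
    total = 0.0
    for row_counts, row_expected in zip(counts, expected):
        for y, mu in zip(row_counts, row_expected):
            total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

# Hypothetical presence locations (km) in a 4 km x 2 km region.
presences = [(0.2, 0.3), (0.7, 0.4), (0.9, 1.6), (2.5, 0.5), (3.1, 1.2), (3.4, 1.9)]
counts = grid_counts(presences, 0.0, 0.0, cell_size=1.0, n_cols=4, n_rows=2)

# Expected count per cell = intensity x cell area; here an assumed constant
# intensity of 0.75 records per km^2 and cells of 1 km^2.
expected = [[0.75 * 1.0] * 4 for _ in range(2)]
loglik = poisson_count_loglik(counts, expected)
print(counts, round(loglik, 4))
```

After aggregation, the two points in the first cell are indistinguishable from any other pair of points in that cell, which is the loss of information referred to above.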

Regression models of presence–background data

Here we pause to briefly discuss connections with a common practice in the SDM literature, presence–background (PB) regression, also referred to as pseudo‐absence regression (e.g. Chefaoui & Lobo 2008; Phillips et al. 2009; Barbet‐Massin et al. 2012) and more recently as ‘naïve regression’ (Fithian & Hastie 2013). This models presence (y = 1) and background (treated as y = 0) with logistic regression methods usually used for presence–absence data. This approach can be understood as being ‘naïve’ essentially because of a mismatch between the model being fitted and the data that were collected – the presences (y = 1) are the raw data for which we wish to specify a model, and the background points (y = 0) are a fabrication. PB regression was motivated by the need to model distributions of species for which survey (PA) data were unavailable and, early on, by a lack of suitable alternatives. It has remained popular, perhaps because of the examples in which this approach seems to work reasonably well compared with other methods, and the fact that many ecologists are already familiar with regression methods. The approach is somewhat ad hoc. Some users apply arbitrary weights to the background samples, for pragmatic rather than statistically based reasons (Elith et al. 2006). The fitted quantity is interpreted as a relative likelihood of presence, with an unknown scaling linking it to the true probability of presence.

One of the most challenging steps in fitting a PB regression model is selection of the background points. A common choice has been to select a large number (thousands) of points across the landscape of interest (Elith et al. 2006). Other more complex schemes include identifying points more likely to represent a true absence, or at least avoiding presence points (Engler, Guisan & Rechsteiner 2004), or trying to specify an optimal number of background points (or presence–background ratio) for different methods (Barbet‐Massin et al. 2012). These are usually based on an idea about data structure or on evidence that it performs better than an alternative in particular case studies or simulations, but without stronger statistical justification. These have created some confusion among users regarding which approach is the best to use. We think that efforts to clarify background sampling schemes under the naïve model are misdirected and that much is to be gained by changing to the point process viewpoint. As will be seen later, this viewpoint provides a solid statistical framework for understanding the role of background points and for deciding how many, placed where, are sufficient.

Warton & Shepherd (2010) and Fithian & Hastie (2013) discuss problems with using naïve logistic regression and its various extensions for presence‐only data. One problem is scale dependence – the scale of PB regression predictions is meaningless since the predictions change as more background points are added to the sample. But Warton & Shepherd (2010) and Baddeley et al. (2010) showed that PB regression can be understood as an approximation to fitting a Poisson point process model and that the latter resolves the scale dependence issue, and many issues with background choice.
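The scale dependence can be seen in the simplest possible case, a model with no covariates. The sketch below is a simplified illustration rather than an analysis from the paper: it uses the standard results that the intercept-only logistic regression estimate of P(y = 1) is the sample proportion of ones, and that the maximum likelihood estimate of a constant Poisson intensity is the number of points per unit area. The function names are hypothetical; the number of presences and the study area are those of the example data set.

```python
def pb_fitted_probability(n_presence, n_background):
    """Intercept-only logistic regression: fitted P(y = 1) is the proportion of ones."""
    return n_presence / (n_presence + n_background)

def ppm_fitted_intensity(n_presence, area_km2):
    """Homogeneous Poisson PPM: fitted intensity is the number of points per unit area."""
    return n_presence / area_km2

m, area = 230, 86227.0  # presences and study area (km^2) from the example data set
results = []
for n_background in (1_000, 10_000, 100_000):
    p = pb_fitted_probability(m, n_background)
    lam = ppm_fitted_intensity(m, area)
    results.append((n_background, p, lam))
    print(f"background = {n_background:>7}  PB fitted value = {p:.5f}  PPM intensity = {lam:.5f} per km^2")
```

The presence-background fitted value shrinks as background points are added, whereas the fitted intensity does not depend on how many points are used to describe the region.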

Spatial dependence in point processes

An underlying assumption of the Poisson PPMs (and most PB regression methods) is that data are conditionally independent given the covariates; that is, the similarities in intensity in nearby regions are fully explained by the environmental and sampling covariates in the model. This is, however, often not the case, and failing to account for spatial dependence can significantly alter conclusions (Dormann 2007). Common examples of spatial dependence are clustering through dispersal or social aggregation. Spatial dependence may also be induced by failing to measure environmental variables which act to make presence patterns in regions close together seem more similar than those further apart.

The two most common classes of point process models for spatial dependence are Gibbs and Cox processes.

Gibbs or ‘interaction’ processes relax the independence assumption by assuming interactions between sets of points. One useful example is area‐interaction processes (Widom & Rowlinson 1970; Baddeley & van Lieshout 1995), which assume interactions among all points within a distance of 2r. These interactions can be understood as introducing an additional term to the intensity function (conditional on the observed presence points):
$\ln \lambda(s \mid \mathbf{s}_P) = \beta_0 + \sum_{j=1}^{p} x_j(s)\,\beta_j - \theta\, t_r(s; \mathbf{s}_P)$ (eqn 2)
where θ is an interaction parameter (positive values implying clustering of points) and $t_r(s; \mathbf{s}_P)$ is the area of a disc of radius r centred at the location s that does not intersect with similar discs centred around each of the presence points $\mathbf{s}_P$.
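The area statistic described above can be approximated numerically. The following Python sketch uses a simple midpoint rule on a lattice; the function name, the lattice resolution and the example coordinates are assumptions made for illustration, and specialised software would normally be used to compute this quantity.

```python
import math

def uncovered_disc_area(s, presences, r, n_grid=200):
    """Approximate the area of the disc of radius r centred at s that is NOT
    covered by discs of radius r centred at the presence points.

    Midpoint rule on an n_grid x n_grid lattice over the bounding square of the disc.
    """
    sx, sy = s
    h = 2.0 * r / n_grid                      # lattice cell width
    area = 0.0
    for i in range(n_grid):
        x = sx - r + (i + 0.5) * h
        for j in range(n_grid):
            y = sy - r + (j + 0.5) * h
            if (x - sx) ** 2 + (y - sy) ** 2 > r * r:
                continue                      # lattice point lies outside the disc around s
            covered = any((x - px) ** 2 + (y - py) ** 2 <= r * r for px, py in presences)
            if not covered:
                area += h * h
    return area

r = 1.0
area_isolated = uncovered_disc_area((0.0, 0.0), [(3.0, 0.0)], r)  # nearest presence farther than 2r
area_near = uncovered_disc_area((0.0, 0.0), [(1.0, 0.0)], r)      # presence at distance r
area_on_top = uncovered_disc_area((0.0, 0.0), [(0.0, 0.0)], r)    # presence at the same location
print(area_isolated, area_near, area_on_top)
```

A location more than 2r from every presence point keeps the full disc area (about πr²), and the uncovered area falls towards zero as the location approaches an existing presence, which is why only points within a distance of 2r interact.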

Area‐interaction models are potentially useful for SDMs because they are capable of modelling either clustering or inhibition. They also have some biological justification; for example, interaction radius r could be considered as a maximum dispersal distance, with intensity increasing at locations within a distance r of a known presence because of the chance of establishment from that presence point. Beyond area‐interaction processes, there are a number of other types of Gibbs process, in particular processes that involve pairwise interaction between points (for a list, see Baddeley & Turner 2005, Section 9 of the Appendix S1).

An alternative way to deal with clustering and the effects of unmeasured covariates is by fitting a Cox process, the most common example of which is the spatial log‐Gaussian Cox process (LGCP) (Møller, Syversveen & Waagepetersen 1998). This can be understood as a point process analogue of a generalised linear mixed model with a random intercept that is normally distributed.

The intensity λ(s) in a LGCP is a function not just of environmental variables, but also of a stochastic Gaussian process ξ(s):
$\ln \lambda(s) = \beta_0 + \sum_{j=1}^{p} x_j(s)\,\beta_j + \xi(s)$ (eqn 3)

Here, ξ(s) is a spatial Gaussian process with zero mean, and a covariance function that depends on the distance between observations, such that observations closer together in space are assumed to be more positively correlated than those further apart. The ξ(s) can be understood as an unmeasured covariate which is associated with the distribution of the species. Conditional on this latent process, the point events are assumed to be inhomogeneous Poisson. In other words, it is assumed that any spatial dependence in the data is entirely captured by ξ(s).
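As an illustration of a covariance function with this property, one common choice is the exponential covariance. It is given here purely as an assumed example, with σ² and ρ introduced for illustration; the discussion above does not commit to a particular form.

```latex
\operatorname{Cov}\{\xi(s), \xi(s')\} \;=\; \sigma^{2} \exp\!\left(-\frac{\lVert s - s' \rVert}{\rho}\right),
\qquad \sigma^{2} > 0,\; \rho > 0 .
```

Here σ² is the variance of the latent field and ρ controls how quickly the correlation decays with distance: it equals one at zero distance and approaches zero for locations far apart.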

Fitting a point process model

This section provides an overview of the process of fitting a PPM, with more detailed information about software and example code provided in the Appendix S1. We use the example Eucalyptus sparsifolia data introduced previously in illustrative analyses.

Fitting Poisson point processes

The most common approach to fitting a Poisson PPM is to maximise the log‐likelihood function (Cressie 1993), which can be written as:
$\ell(\boldsymbol{\beta}; \mathbf{s}_P) = \sum_{i=1}^{m} \ln \lambda(s_i) - \int_{\mathcal{A}} \lambda(s)\,\mathrm{d}s$ (eqn 4)

For a derivation of this log‐likelihood, and how it differs from the homogeneous Poisson point process case, see Appendix S1 (Section 2). The integral in this expression can be interpreted as the expected number of presence points in the whole study region $\mathcal{A}$, and it is the approximation of this quantity that is the main challenge in model fitting.

Estimation of parameters in the inhomogeneous Poisson PPM is not straightforward, because the integral in (4) does not have a closed form and must be approximated in some way. A standard way to approximate this integral is through the use of numerical integration, otherwise known as ‘quadrature’ (Davis & Rabinowitz 1984). The general idea is to choose a set of ‘quadrature points’ at which the intensity function is evaluated, and these evaluations are then combined as a weighted sum to estimate the integral. Common examples of quadrature methods are Riemann sums and the trapezoidal rule. Irrespective of the quadrature method used, the likelihood can then be written as:
$\ell(\boldsymbol{\beta}; \mathbf{s}_P) \approx \sum_{i=1}^{m} \ln \lambda(s_i) - \sum_{j=1}^{n} w_j\, \lambda(s_j)$ (eqn 5)
$= \sum_{j=1}^{n} w_j \{ z_j \ln \lambda(s_j) - \lambda(s_j) \}$ (eqn 6)
where $\mathbf{w} = (w_1, \ldots, w_n)$ are quadrature weights and $\mathbf{s}_0 = \{s_{m+1}, \ldots, s_n\}$ are quadrature points and $z_j = 1/w_j$ for presence points (j = 1, …, m) and $z_j = 0$ otherwise. The quadrature weights $w_j$ can be understood as applying a spatial scaling, so that the response being modelled (intensity, λ) has spatial units not observational units. For example, in analysing the Eucalyptus sparsifolia data, we modelled the expected number of presences per square kilometre. Broadly speaking, $w_j$ represents the area of the neighbourhood around the point $s_j$, found after partitioning the study region $\mathcal{A}$ into neighbourhoods around each point (including presences and quadrature points).
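The equivalence between the quadrature approximation and the weighted Poisson form can be checked numerically. The sketch below is a deliberately simplified one-dimensional illustration: the study region is the interval [0, 10], the single covariate is the location itself, weights are the lengths of the neighbourhoods closest to each point, and all data values and function names are hypothetical.

```python
import math

def voronoi_weights_1d(points, lower, upper):
    """Quadrature weights: length of the neighbourhood closest to each point."""
    order = sorted(range(len(points)), key=lambda k: points[k])
    weights = [0.0] * len(points)
    for rank, k in enumerate(order):
        left = lower if rank == 0 else 0.5 * (points[order[rank - 1]] + points[k])
        right = upper if rank == len(order) - 1 else 0.5 * (points[k] + points[order[rank + 1]])
        weights[k] = right - left
    return weights

def intensity(s, beta0, beta1):
    """Loglinear intensity with a single hypothetical covariate x(s) = s."""
    return math.exp(beta0 + beta1 * s)

def loglik_quadrature(presences, all_points, weights, beta0, beta1):
    """Sum of log-intensities at presences minus the weighted sum approximating the integral."""
    log_term = sum(math.log(intensity(s, beta0, beta1)) for s in presences)
    integral = sum(w * intensity(s, beta0, beta1) for s, w in zip(all_points, weights))
    return log_term - integral

def loglik_weighted_poisson(all_points, weights, is_presence, beta0, beta1):
    """Weighted Poisson form with pseudo-response z = 1/w at presences and 0 elsewhere."""
    total = 0.0
    for s, w, pres in zip(all_points, weights, is_presence):
        z = 1.0 / w if pres else 0.0
        lam = intensity(s, beta0, beta1)
        total += w * (z * math.log(lam) - lam)
    return total

presences = [1.3, 2.1, 2.4, 7.8]              # hypothetical presence locations (km)
quadrature = [0.5 + k for k in range(10)]     # regular mesh of quadrature points on [0, 10]
all_points = presences + quadrature
is_presence = [True] * len(presences) + [False] * len(quadrature)
weights = voronoi_weights_1d(all_points, 0.0, 10.0)

l_quad = loglik_quadrature(presences, all_points, weights, beta0=-1.0, beta1=0.1)
l_pois = loglik_weighted_poisson(all_points, weights, is_presence, beta0=-1.0, beta1=0.1)
print(l_quad, l_pois)
```

The two values agree up to rounding error, and the weights sum to the length of the region, mirroring the partition of the study region into neighbourhoods around presences and quadrature points.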

Equation 6 is due to Berman & Turner (1992), and it re‐expresses the likelihood as a Poisson likelihood with observation weights $w_j$. The significance of this result is that it makes Poisson PPMs relatively straightforward to fit – they can be fitted using any standard GLM software, such as the glm function in R (R Development Core Team 2010), using the Poisson family and appropriate observation weightings. This can be done in just a few lines of code, as illustrated in the Appendix S1, although specialised packages are available that offer enhanced options – the spatstat package for R (Baddeley & Turner 2005) has a suite of model‐checking tools, and the ppmlasso package adds a LASSO penalty for improved predictive performance.
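To show what such a weighted Poisson fit involves, here is a minimal, self-contained Python sketch that maximises the weighted Poisson log-likelihood by Newton-Raphson for a model with an intercept and one covariate. It is not the code from Appendix S1: the one-dimensional region, the data values and the function names are assumptions made for illustration, and in practice standard GLM software would be used instead.

```python
import math

def neighbourhood_lengths(points, lower, upper):
    """1-D quadrature weights: length of the interval closest to each (distinct) point."""
    pts = sorted(points)
    edges = [lower] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])] + [upper]
    return {p: hi - lo for p, lo, hi in zip(pts, edges, edges[1:])}

def fit_weighted_poisson(x, z, w, n_iter=50, tol=1e-10):
    """Maximise sum_j w_j (z_j * eta_j - exp(eta_j)), eta_j = b0 + b1 * x_j, by Newton-Raphson."""
    def loglik(b0, b1):
        return sum(wj * (zj * (b0 + b1 * xj) - math.exp(b0 + b1 * xj))
                   for xj, zj, wj in zip(x, z, w))

    b0 = math.log(sum(wj * zj for zj, wj in zip(z, w)) / sum(w))  # start at the constant-intensity fit
    b1 = 0.0
    for _ in range(n_iter):
        mu = [wj * math.exp(b0 + b1 * xj) for xj, wj in zip(x, w)]  # w_j * lambda_j
        # Score vector (g0, g1) and negated Hessian (h00, h01, h11).
        g0 = sum(wj * zj - m for zj, wj, m in zip(z, w, mu))
        g1 = sum(xj * (wj * zj - m) for xj, zj, wj, m in zip(x, z, w, mu))
        h00 = sum(mu)
        h01 = sum(xj * m for xj, m in zip(x, mu))
        h11 = sum(xj * xj * m for xj, m in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while loglik(b0 + step * d0, b1 + step * d1) < loglik(b0, b1) and step > 1e-8:
            step /= 2.0                       # step-halving guards against overshooting
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1

presences = [1.3, 2.1, 2.4, 7.8]              # hypothetical presence locations (km)
quadrature = [0.5 + k for k in range(10)]     # regular mesh of quadrature points on [0, 10]
points = presences + quadrature
length = neighbourhood_lengths(points, 0.0, 10.0)
w = [length[p] for p in points]
z = [1.0 / wj if p in presences else 0.0 for p, wj in zip(points, w)]

b0_hat, b1_hat = fit_weighted_poisson(points, z, w)
expected_total = sum(wj * math.exp(b0_hat + b1_hat * p) for p, wj in zip(points, w))
print(b0_hat, b1_hat, expected_total)
```

At the fitted values the weighted sum of intensities equals the number of presence points, the usual property of a Poisson model with an intercept. Here it says that the fitted expected number of points in the region matches the number observed.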

The connection to Poisson GLMs in equation 6 has enabled equivalence results between Poisson PPMs and PB regression in large samples (Baddeley et al. 2010; Warton & Shepherd 2010) and more recently MAXENT (Aarts, Fieberg & Matthiopoulos 2012; Fithian & Hastie 2013; Renner & Warton 2013), offering some insight into these methods. For example, the scale dependence of PB regression can be understood as arising due to the omission of appropriate observation weights $w_j$. The equivalence with MAXENT implies that maxent software can be used to fit a Poisson PPM. We will discuss the capabilities of these alternate methods for fitting Poisson PPMs later.

How to choose quadrature points

A first step in fitting a PPM is selection of quadrature points, to allow estimation of the PPM likelihood. This is an equivalent issue to that of background selection (Section ‘Regression models of presence–background data’). However, our choice of the term ‘quadrature points’ in this section reflects a desire to pose the question of their choice as a quadrature problem, which clarifies their role in analysis and provides a framework for their selection.

From the point process viewpoint, quadrature points are merely a device to estimate an integral. This is true across the various methods that can be used to fit a Poisson PPM (Section 4.6, see also Fithian & Hastie 2013). Being such a device, the important question becomes: How many points, placed where, will be sufficient to accurately estimate the likelihood? This is primarily a question for which the number and location of presence records are irrelevant; hence, ideas of matching number of presence points and sampling far away from presence points are not relevant, except when computational efficiency is an issue, which then requires specifically designed schemes (see later).

Assuming that the extent of the study area is pre‐determined by the analyst, two simple strategies to select the location of quadrature points within that extent are (i) to choose them on a regular mesh (e.g. at regularly spaced intervals in each cardinal direction) or (ii) to choose them randomly. Alternative strategies whereby the density of quadrature points increases with environmental variability may be helpful in scenarios where computation time is slow as these would lead to a smaller data set with negligible loss in accuracy. This is related to the ideas of importance sampling and adaptive quadrature (Davis & Rabinowitz 1984), which seem so far under‐utilised in the SDM literature.

The number of quadrature points should be sufficient for an accurate estimate of the likelihood, which will lead to a stable model that is approximately invariant across repeat samples of the points. To determine an appropriate number of quadrature points, we advise that analysts check that sufficient accuracy is achieved by increasing the number of quadrature points until there is little appreciable change in model fit or in predictive performance (Phillips & Dudík 2008). For instance, for Fig. 2a, we fitted Poisson PPMs to Eucalyptus sparsifolia data repeatedly halving the spacing between quadrature points, selected across a regular mesh. This was done using the ppmlasso package, which has a function specifically designed to perform this operation. The log‐likelihood converged at a 1‐km spacing, which required more than 86 000 quadrature points for our example (Fig. 2a), noticeably more than common software defaults (e.g. 10 000 in maxent).
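The halving strategy can be sketched in a few lines. The Python example below is an illustration only: it refines a regular mesh until a midpoint-rule estimate of the integral of an assumed intensity surface stabilises, whereas the analysis above tracked the maximised log-likelihood using the ppmlasso package. The intensity function, region size and tolerance are hypothetical.

```python
import math

def intensity(x, y):
    """Hypothetical intensity surface (records per km^2), for illustration only."""
    return math.exp(-3.0 + 1.5 * math.sin(x) + math.cos(2.0 * y))

def mesh_integral(spacing, width, height):
    """Midpoint-rule estimate of the integral of the intensity over [0, width] x [0, height]."""
    n_x = round(width / spacing)
    n_y = round(height / spacing)
    total = 0.0
    for i in range(n_x):
        for j in range(n_y):
            total += intensity((i + 0.5) * spacing, (j + 0.5) * spacing)
    return total * spacing * spacing

def refine_until_stable(width, height, start_spacing=4.0, tol=1e-3, max_halvings=8):
    """Halve the mesh spacing until the integral estimate changes by less than tol."""
    spacing = start_spacing
    previous = mesh_integral(spacing, width, height)
    history = [(spacing, previous)]
    for _ in range(max_halvings):
        spacing /= 2.0
        current = mesh_integral(spacing, width, height)
        history.append((spacing, current))
        if abs(current - previous) < tol:
            break
        previous = current
    return history

history = refine_until_stable(width=8.0, height=8.0)
for spacing, estimate in history:
    print(f"spacing = {spacing:<8} points = {round(8.0 / spacing) ** 2:>7}  integral = {estimate:.5f}")
```

The same loop applies when the quantity monitored is the maximised log-likelihood of a refitted model rather than the integral alone; only the function evaluated at each spacing changes.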

Fig. 2. Checking for likelihood convergence as the number of quadrature points changes: (a) Using a rectangular mesh of quadrature points at different spatial resolutions, as available in the ppmlasso package; (b) using random sets of quadrature points and progressively increasing the sample size, as estimated using downweighted Poisson regression models. It appears that there is little benefit in analysing the data at a spatial resolution finer than 1 km (a), or with more than 100 000 quadrature points (b).

Undersampling of quadrature points will lead to error in our coefficient estimates, although this error may still be small compared to the coefficients’ overall standard errors. If quadrature points have been randomly sampled (rather than using a regular mesh), we can easily quantify this error by considering what might happen under repeated sampling of quadrature points. For example, in Fig. 2b, models were fitted with an increasing number of randomly chosen quadrature points, and results replicated 30 times to study how much results varied when using different sets of random quadrature points (code available in Section 5 of the Appendix S1). Figure 2b suggests that 100 000 or more randomly chosen quadrature points would be needed to reliably estimate the maximised log‐likelihood – with significant variation from one set of quadrature points to another before this point. In particular, if using 10 000 random quadrature points, the maximised log‐likelihood varied over a range of more than 50 for different samples of quadrature points.
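The repeated-sampling idea can be illustrated with a small simulation. The sketch below is not the code from Section 5 of the Appendix S1: for simplicity it tracks only the quadrature estimate of the integral of an assumed intensity surface, not the maximised log-likelihood of a refitted model, and the intensity function, region and seed are hypothetical.

```python
import math
import random

def intensity(x, y):
    """Hypothetical intensity surface (records per km^2), for illustration only."""
    return math.exp(-3.0 + 1.5 * math.sin(x) + math.cos(2.0 * y))

def random_quadrature_integral(n_points, width, height, rng):
    """Region area times the mean intensity at n uniformly random quadrature points."""
    total = sum(intensity(rng.uniform(0.0, width), rng.uniform(0.0, height))
                for _ in range(n_points))
    return width * height * total / n_points

def replicate_range(n_points, n_reps, width, height, seed):
    """Range (max - min) of the integral estimate over repeated random quadrature samples."""
    rng = random.Random(seed)
    estimates = [random_quadrature_integral(n_points, width, height, rng) for _ in range(n_reps)]
    return max(estimates) - min(estimates)

ranges = {n: replicate_range(n, n_reps=30, width=8.0, height=8.0, seed=1)
          for n in (100, 1_000, 10_000)}
for n, spread in ranges.items():
    print(f"n = {n:>6}  range over 30 replicates = {spread:.4f}")
```

The spread across replicates shrinks roughly in proportion to one over the square root of the number of quadrature points, which is the behaviour that makes it possible to judge when enough points have been used.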

Alternatively, the error introduced by quadrature can be estimated analytically quite easily. The integral was estimated as an average of estimated intensities at n random points (multiplied by $|\mathcal{A}|$, the area of the study region), so its uncertainty can be estimated using the formula for the standard error of a sample mean, $s/\sqrt{n}$. In our example data set, given an initial fit using 10 000 random quadrature points, the standard deviation of estimated intensities at the quadrature points was s = 0·0103, yielding an estimated standard error of $|\mathcal{A}|\,s/\sqrt{n} = 86\,227 \times 0.0103/\sqrt{10\,000} \approx 8.9$. If we desire an estimate of the log‐likelihood to be within a standard error of two of its true value, we can estimate the required number of quadrature points as $n = (|\mathcal{A}|\,s/2)^2 \approx 197\,000$. This corresponds well with the results of Fig. 2b.
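The arithmetic behind this calculation can be reproduced directly. The sketch below assumes, as described above, that the integral estimate is the study area multiplied by the mean intensity at n random points, so that its standard error is the area times s divided by the square root of n. The function names are hypothetical; the area and standard deviation are the values reported for the example.

```python
import math

def quadrature_standard_error(area, sd_intensity, n_points):
    """Standard error of (area x mean intensity at n random quadrature points)."""
    return area * sd_intensity / math.sqrt(n_points)

def points_needed(area, sd_intensity, target_se):
    """Smallest n for which area * sd / sqrt(n) does not exceed target_se."""
    return math.ceil((area * sd_intensity / target_se) ** 2)

area_km2 = 86227.0       # study region area reported for the example
sd_intensity = 0.0103    # standard deviation of fitted intensities reported in the text

se_10k = quadrature_standard_error(area_km2, sd_intensity, 10_000)
n_needed = points_needed(area_km2, sd_intensity, target_se=2.0)
print(f"standard error with 10 000 points: {se_10k:.2f}")
print(f"points needed for a standard error of 2: {n_needed}")
```

Because the standard error falls only with the square root of n, reducing it from about 9 to 2 requires roughly twenty times as many quadrature points.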

The precise number of points needed for a sufficiently accurate estimate should vary with the roughness of the intensity surface (hence the difficulty of the integration problem). Thus, we may expect to need more quadrature points when environmental data are measured at a finer resolution (equivalently, smaller grid cell sizes) or broader spatial extent.

‘Quadrature thinking’ also leads to other emphases. First, if computational efficiency is of key importance, it could be prudent to sample fewer quadrature points (with higher corresponding quadrature weights) in areas where the species is unlikely to occur, which should have negligible impact on the intensity and the likelihood of the fitted model. However, this needs to be done with care because it relies on correct identification and inclusion in the model of the covariates causing species absence from certain parts of the landscape. An example of where this may be appropriate is in telemetry studies, where an individual's location is typically strongly associated with distance from the last observed location; hence, quadrature points far from that location make negligible likelihood contribution (Warton & Aarts 2013).

Secondly, the quadrature viewpoint tends to place more emphasis on specifying the model in a way that deals with biases, rather than fiddling with quadrature points. Hence with quadrature thinking, one is more likely to accommodate bias by explicitly specifying covariates in the model (e.g. Warton, Renner & Ramp 2013; Fithian et al. 2015) than through the selection of quadrature points (e.g. target‐group background as in Phillips et al. 2009). For example, in our Eucalyptus sparsifolia model, we added two predictors related to site accessibility in order to model observer bias (distance from main roads and distance from urban areas). These two observer bias variables were included to try to account for spatial patterning in presence locations due to behaviour of observers rather than behaviour of the study species. By then making predictions at a common level of observer bias (e.g. distance equals zero), we can map E. sparsifolia distribution controlling for observer bias (Warton, Renner & Ramp 2013; Fithian et al. 2015).
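Prediction at a common level of observer bias amounts to fixing the bias covariates at one value for every location before evaluating the fitted intensity. A minimal Python sketch follows, with hypothetical coefficient and covariate values rather than the fitted values from the Eucalyptus sparsifolia model.

```python
import math

def intensity(env, bias, beta0, beta_env, beta_bias):
    """Loglinear intensity with environmental and observer-bias covariates."""
    eta = beta0
    eta += sum(b * x for b, x in zip(beta_env, env))
    eta += sum(b * x for b, x in zip(beta_bias, bias))
    return math.exp(eta)

# Hypothetical fitted coefficients, for illustration only.
beta0 = -5.0
beta_env = [0.6, -0.2]         # e.g. standardised rainfall and maximum temperature
beta_bias = [-0.08, -0.03]     # distance from main roads and from urban areas (km)

site_env = [1.2, 0.4]          # two sites with identical environmental conditions
near_road = [0.5, 10.0]        # (distance to road, distance to urban area) in km
remote = [15.0, 60.0]

# Raw predictions describe where records are expected, including observer effects.
raw_near = intensity(site_env, near_road, beta0, beta_env, beta_bias)
raw_remote = intensity(site_env, remote, beta0, beta_env, beta_bias)

# Bias-corrected predictions: set both distances to a common value (zero) everywhere.
common_bias = [0.0, 0.0]
corrected_near = intensity(site_env, common_bias, beta0, beta_env, beta_bias)
corrected_remote = intensity(site_env, common_bias, beta0, beta_env, beta_bias)
print(raw_near, raw_remote, corrected_near, corrected_remote)
```

The raw predictions differ between the two sites only because of accessibility, whereas the corrected predictions coincide, as they should for sites with identical environments.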

Finally, most methods for fitting Poisson PPMs attach weights to the quadrature points (the $w_j$ in equation 5) for a scale‐invariant estimate of the log‐likelihood that is comparable across sets of quadrature samples of different size. This can have advantages for model fitting and interpretation (Warton & Shepherd 2010).

Checking assumptions

A suite of diagnostic tools is available in the point process literature that can be used to ground‐truth model assumptions. Just as with ordinary regression models, residual analysis (Baddeley et al. 2005) can be used to assess adequacy of the model for intensity, in particular by checking for a spatial trend in residuals. The assumption of independence among point locations can be checked using Ripley's K‐function (Ripley 1977) and its generalisations (Baddeley, Møller & Waagepetersen 2000).
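For intuition about what the K‐function measures, the following Python sketch computes a naive estimate for a small hypothetical point pattern. It is a deliberately simplified version that assumes constant intensity and applies no edge correction; it is not the inhomogeneous K‐function with simulation envelopes produced by specialised software, and the function name and coordinates are our own.

```python
import math

def ripley_k_naive(points, area, r):
    """Naive estimate of Ripley's K at distance r (constant intensity, no edge correction):
    area / (n (n - 1)) times the number of ordered pairs (i, j), i != j, within distance r."""
    n = len(points)
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

# Hypothetical pattern: two tight clusters of three points in a 10 km x 10 km region.
clustered = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (8.0, 8.0), (8.2, 8.0), (8.0, 8.2)]
r = 0.5
k_hat = ripley_k_naive(clustered, area=100.0, r=r)
k_independent = math.pi * r ** 2   # value expected for a homogeneous Poisson process
print(k_hat, k_independent)
```

Estimates well above πr², as here, indicate more close pairs of points than expected under independence, which is the pattern interpreted as clustering.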

Consider, for example, a Poisson PPM fitted to the E. sparsifolia data. A check of the independence assumption suggests significant clustering of points at radii <10 km (Fig. 3a), so we fit an area‐interaction model (see Section 3 of the Appendix S1 for details). The resulting model fit exhibits some pattern (Fig. 3b), but the magnitude of the residuals is not exceedingly large. Cumulative residuals for increasing longitude and latitude (x and y) do not significantly deviate from Monte Carlo simulation envelopes (Fig. 3c–d), so the model fit may be deemed sufficiently appropriate.

Figure 3. Example diagnostic plots for a Poisson point process model using the spatstat package – (a) an inhomogeneous K‐function with 95% simulation envelope, (b) a map of smoothed Pearson residuals, (c) cumulative Pearson residuals for increasing longitude with 95% simulation envelope and (d) cumulative Pearson residuals for increasing latitude with 95% simulation envelope. The inhomogeneous K‐function (a) suggests point clustering not accounted for by the model as the observed values (red) exceed the upper limit of the envelope (dashed) for radii below 10 km. The residual plot (b) exhibits some pattern, but the magnitude of the residuals is not exceedingly large. Cumulative residuals for increasing longitude and latitude do not significantly deviate from Monte Carlo simulation envelopes (c and d), so the model fit may be deemed sufficiently appropriate.

Other useful diagnostic features demonstrated in Section 3 of the Appendix S1 include influence, leverage and partial residual plots, all derived in direct analogy to how they are used in generalised linear models (as in Baddeley et al. 2013). All of these diagnostic plots are relatively easy to produce using the spatstat package.

When checking the assumption of independence of presence points, an alternative approach is to treat models that incorporate spatial dependence as the default: fit such a model, then study the level of dependence in the resulting fit, as in Fig. 4a. This makes particular sense if one expects spatial dependence a priori and the data are of sufficient quality to reveal the spatial dependence signal. The fitting of such models is discussed below.

Figure 4. Estimation of spatial dependence using a log‐Gaussian Cox process model: (a) Mean and (b) standard deviation of the random Gaussian field, and (c) a posterior mean and 95% credible interval for the interaction coefficient, computed using the dist function on r‐inla. The mean is significantly larger than the standard deviation in some regions, and the interaction coefficient in (c) has a credible interval lying above zero at small distances, both of which suggest clustering in the data beyond that explained by covariates.

Fitting point processes with spatial dependence

Both Gibbs and Cox processes are more difficult to fit than Poisson PPMs, although for different reasons.

Gibbs processes are difficult to fit by maximum likelihood because their specification involves a proportionality constant that is difficult to estimate. One workaround is to use the Poisson process likelihood in place of the true likelihood (Besag 1977), that is, use (5) as a pseudo‐likelihood. This is often done because the Gibbs likelihood has a complex form, but by using the Poisson pseudo‐likelihood instead, estimates are readily available via GLM. This approach is implemented in spatstat and ppmlasso (see Sections 3 and 4 of the Appendix S1 for detailed code), with a number of types of interaction processes to choose from in spatstat, as specified using the interaction argument. One issue to be aware of when using this approach is that traditional methods of likelihood‐based inference (such as likelihood‐based standard errors, likelihood ratio tests, AIC) may no longer apply, because parameters have not been estimated by maximum likelihood.

Cox process models are difficult to fit because they involve an unobserved Gaussian random process (the ξ(s) in equation 3), and so maximum likelihood estimation would involve complex integrals. These models are hierarchical and are therefore naturally suited to a Bayesian hierarchical approach to estimation as in Illian, Møller & Waagepetersen (2009) and Chakraborty et al. (2011). Other estimation techniques include composite likelihood (Guan 2006) and weighted estimating equations (Guan & Shen 2010), as implemented in spatstat.

Two methods of Cox process estimation that are currently popular, and which both have been implemented under a Bayesian framework, are the integrated nested Laplace approximation (INLA, implemented in the r‐inla package, Rue, Martino & Chopin 2009) and a Markov chain Monte Carlo (MCMC) method (implemented in the lgcp package, Taylor et al. 2013). The INLA method seeks to calculate the integrals by a set of carefully chosen approximations. It is generally fast compared to MCMC methods, which can be quite time‐consuming but have potentially greater accuracy. For a comparison of these, see Taylor & Diggle (2014).

Sections 7 and 8 of the Appendix S1 have example code for fitting a Cox process using both the r‐inla and lgcp packages. We found these packages much more difficult for the practitioner to use than Poisson PPMs, with specialist guidance often required. The main gain from this additional effort is the possibility of making valid inferences from the model, taking into account uncertainty, under the assumption of spatial dependence. This is otherwise harder to achieve without resorting to resampling, as below.

Block resampling for inference in the presence of spatial dependence

A complication that can arise when modelling spatially dependent point processes is that likelihood‐based standard errors may be estimated to be too small. This problem arises when using the pseudo‐likelihood approach to fit Gibbs processes or when failing to account for interpoint dependence correctly. It does not arise when fitting a correctly specified Cox process, which is a significant advantage of that approach.

Similarly, but irrespective of what model is fitted to data, standard cross‐validation measures of out‐of‐sample prediction error can be over‐optimistic in the presence of dependence, potentially leading us to erroneously prefer models that overfit to local structure in the data (Wenger & Olden 2012). Block resampling techniques can deal with short‐range interpoint dependence nonparametrically.

Suppose that interpoint dependence is strong for nearby locations, but weak at radius r. Then if we tile our geographic domain into c rectangular blocks of size r × r, the data falling in one block are approximately independent of the data in other blocks. If we believe this, then we can obtain accurate standard errors using a bootstrap algorithm that resamples whole blocks with replacement (Efron & Tibshirani 1993), as in Slavich et al. (2014). Likewise, we can obtain estimates of out‐of‐sample prediction error by cross‐validation where blocks are assigned whole to each fold (Burman, Chow & Nolan 1994), as in Wenger & Olden (2012), Pearson et al. (2013) and Warton, Renner & Ramp (2013).
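The block bootstrap described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the presence locations are made up, the statistic is a simple stand‐in (the mean x coordinate) for the model coefficients one would actually re‐estimate, and blocks containing no points are ignored, which a careful implementation would not do.

```python
import random
from collections import defaultdict


def assign_blocks(points, block_size):
    """Group points by the block_size x block_size tile that contains them."""
    blocks = defaultdict(list)
    for x, y in points:
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append((x, y))
    return blocks


def block_bootstrap(points, block_size, n_boot, statistic, seed=1):
    """Resample whole blocks with replacement and recompute the statistic."""
    rng = random.Random(seed)
    # Simplification: only blocks that contain at least one point are resampled.
    blocks = list(assign_blocks(points, block_size).values())
    replicates = []
    for _ in range(n_boot):
        sample = []
        for _ in range(len(blocks)):
            sample.extend(rng.choice(blocks))
        replicates.append(statistic(sample))
    return replicates


def std_dev(values):
    """Sample standard deviation, used here as the bootstrap standard error."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5


def mean_x(pts):
    """Stand-in statistic; in practice this would refit the model."""
    return sum(p[0] for p in pts) / len(pts)


# Toy data: made-up presence locations in a 100 km x 100 km window.
data_rng = random.Random(42)
presences = [(data_rng.uniform(0, 100), data_rng.uniform(0, 100))
             for _ in range(200)]

reps = block_bootstrap(presences, block_size=20.0, n_boot=500, statistic=mean_x)
se_mean_x = std_dev(reps)
```

For a fitted PPM, the quadrature points falling in each block would be resampled together with the presence points, and `statistic` would return the refitted coefficients.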

For example, the model for Fig. 5c involved a LASSO penalty which was estimated by 5‐fold cross‐validation using blocks of 32 km × 32 km. Section 4 of the Appendix S1 has example code for block cross‐validation with the ppmlasso package.
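Assigning whole blocks to folds can be sketched in plain Python as below. This is not the ppmlasso implementation: the locations are made up, and only the 32 km block size and the five folds mirror the values quoted above. The key property is that no block is ever split between the training and held‐out sets.

```python
import random


def block_of(point, block_size):
    """Index of the square block containing a point."""
    x, y = point
    return (int(x // block_size), int(y // block_size))


def block_folds(points, block_size, n_folds, seed=1):
    """Assign each point to a fold via its block, so blocks are never split."""
    rng = random.Random(seed)
    block_ids = sorted({block_of(p, block_size) for p in points})
    rng.shuffle(block_ids)
    fold_of_block = {b: i % n_folds for i, b in enumerate(block_ids)}
    return [fold_of_block[block_of(p, block_size)] for p in points]


# Made-up locations (km) in a 160 km x 160 km window.
data_rng = random.Random(7)
locations = [(data_rng.uniform(0, 160), data_rng.uniform(0, 160))
             for _ in range(300)]
folds = block_folds(locations, block_size=32.0, n_folds=5)

splits = []
for k in range(5):
    train = [p for p, f in zip(locations, folds) if f != k]
    held_out = [p for p, f in zip(locations, folds) if f == k]
    # A model would be fitted to `train` and scored on `held_out` here.
    splits.append((len(train), len(held_out)))
```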

Figure 5. Predicted intensity of Eucalyptus sparsifolia observations for a Poisson point process model fitted using (a) spatstat or (b) maxent; (c) an area‐interaction model fitted using ppmlasso; and (d) a log‐Gaussian Cox process fitted using r‐inla. Note that spatstat and maxent results look quite similar, as they fit similar models, and that the ppmlasso and r‐inla results, which account for spatial dependence, highlight additional areas of relatively high intensity further west.

Software for fitting point process models

The main software packages currently available for fitting point process models, and their key differences in properties, are summarised in Table 1. All are available in r. In the Appendix S1, we have developed short tutorials stepping the user through analysis of the Eucalyptus sparsifolia data using each of these packages, and we encourage new users to work through these resources when deciding on an approach to analysis of point process data.

Table 1. Summary table of software properties
Property                    spatstat  ppmlasso  iwlr   dwpr   maxent  r‐inla  lgcp
Regularisation              ×         ✓         ✓      ✓      ✓(1)    ×       ×
Standard errors             ✓(2)      ×         ✓(2)   ✓(2)   ×       ✓       ✓
Variable importance plots   ×         ×         ×      ×      ✓       ×       ×
Diagnostic plots            ✓         ✓         ×      ×      ×       ×       ×
Spatial dependence          ✓         ✓         ×      ×      ×       ✓       ✓
Nonlinearity (e.g. smoothers)  ✓      ✓         ✓      ✓      ✓       ✓       ✓
Scale invariant             ✓         ✓         ×      ✓      ✓(3)    ✓       ✓
  • (1) LASSO only.
  • (2) For Poisson models only.
  • (3) Raw output only.

The most established package for point process modelling is spatstat, whose main advantages are the extensive suite of diagnostic tools, and its ability to simulate data from a given point process model. The ppmlasso package was written to be spatstat‐compatible, so it inherits many useful diagnostic tools, while adding a couple of functions of particular interest for SDMs – the ability to regularise parameter estimates (i.e. shrink them towards zero to reduce variance, Hastie, Tibshirani & Friedman 2009) using the LASSO or elastic net, and functions to guide the user regarding quadrature point choice, as in Fig. 2a. Standard errors are not returned in ppmlasso output, since these become very approximate when using a LASSO or other regularisation approach in parameter estimation.

Because point process models can be fitted using standard glm software, one can entirely avoid using specialised point process software, but the onus is on the user to ensure that quadrature points have been chosen in sufficient numbers and locations and that the appropriate quadrature weights have been assigned. The main advantage of this approach is that the user has greater control over what type of model is fitted – for example, as well as GLM, one could use any of its extensions (GAM, elastic net, CART, …). Along these lines, Fithian & Hastie (2013) proposed a simple algorithm based on weighted logistic regression that can be used to estimate slope coefficients in a Poisson PPM, referred to as infinitely weighted logistic regression (IWLR). But IWLR only gives a solution proportional to a point process, because the intercept term and hence the scale of the log‐likelihood is arbitrary. A modification of this idea, which we call ‘downweighted Poisson regression’ (DWPR), is proposed in Section 5 of the Appendix S1. Given a random set of pseudo‐absences, this reduces the steps of assigning quadrature weights and fitting the model to just a few lines. When using DWPR, the intercept and hence the likelihood are estimable, so we can use this technique to look at questions like how many quadrature points to select (as done in Fig. 2b).
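The flavour of a down‐weighted Poisson regression can be conveyed with a self‐contained sketch. It rests on assumptions that are not spelled out in the text above: presence points receive a very small weight (here 1e-6), quadrature points receive weight equal to the window area divided by their number, and the response is the presence indicator divided by the weight. The fit uses a hand‐written Newton iteration for an intercept and one slope in plain Python rather than standard glm software, and the data are made up.

```python
import math


def dwpr_fit(pres_x, quad_x, area, n_iter=50):
    """Down-weighted Poisson regression for log(intensity) = b0 + b1 * x."""
    small = 1e-6                                  # assumed weight for presences
    x = list(pres_x) + list(quad_x)
    y = [1.0] * len(pres_x) + [0.0] * len(quad_x)
    w = [small] * len(pres_x) + [area / len(quad_x)] * len(quad_x)

    def loglik(b0, b1):
        # Weighted Poisson log-likelihood with response y / w (up to a constant).
        return sum(yi * (b0 + b1 * xi) - wi * math.exp(b0 + b1 * xi)
                   for xi, yi, wi in zip(x, y, w))

    b0, b1 = math.log(len(pres_x) / area), 0.0    # start at the constant model
    for _ in range(n_iter):
        mu = [wi * math.exp(b0 + b1 * xi) for xi, wi in zip(x, w)]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))               # score vector
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        h00 = sum(mu)                                            # information
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det                         # Newton step
        d1 = (h00 * g1 - h01 * g0) / det
        if abs(d0) + abs(d1) < 1e-10:
            break
        step, current = 1.0, loglik(b0, b1)
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1) < current:
            step /= 2                                            # step-halving
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1


# Toy data (made up): 120 presences placed at quantiles of a density
# proportional to exp(0.2 * x) on [0, 10], in a 10 x 10 window.
m, slope = 120, 0.2
pres_x = [math.log(1 + (i + 0.5) / m * (math.exp(slope * 10) - 1)) / slope
          for i in range(m)]
# Quadrature points: x coordinates of a regular 50 x 50 grid over the window.
quad_x = [(i + 0.5) * 0.2 for i in range(50) for _ in range(50)]

b0_hat, b1_hat = dwpr_fit(pres_x, quad_x, area=100.0)
fitted_total = sum(100.0 / len(quad_x) * math.exp(b0_hat + b1_hat * xq)
                   for xq in quad_x)
```

Because the intercept is estimated on the intensity scale, the fitted intensity integrates to approximately the number of presence points, which is the property that makes the likelihood comparable across quadrature sets.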

Given that MAXENT has recently been shown to be proportional to a Poisson PPM (Aarts, Fieberg & Matthiopoulos 2012; Fithian & Hastie 2013; Renner & Warton 2013), maxent software can also be thought of as a means of fitting a Poisson PPM. This does, however, require some departures from the default maxent software settings, as described in Section 6 of the Appendix S1. Using maxent to fit a Poisson PPM has the advantages that it is familiar to many users, has a lot of nice mapping features and is easy to use. Example features include the interactive ‘explain’ tool which visualises the link between the mapped prediction and the model at any selected point, and maps that show whether environments in new regions or times of interest are within the training range of the modelled data. maxent can be run from r using the package dismo, which allows streamlined data preparation and modelling, and the potential to use block resampling along the lines of Wenger & Olden (2012). A short tutorial for fitting PPMs using dismo is presented in Section 6 of the Appendix S1.

A key issue, however, with implementations of Poisson PPM using GLM and maxent software is the lack of assumption checking tools. One should not take ‘on faith’ the assumption that there is no spatial dependence in the data beyond that explained by environmental variables included in the model. A related issue is the lack of a capacity to fit models that account for point interactions using GLM or maxent.

Fitting Cox processes in a Bayesian framework, as in r‐inla and lgcp, has the advantage that spatial dependence can be estimated and accounted for in a flexible and statistically efficient way. However, the additional complexity of estimating the latent field results in much slower computation times as compared to competitors. Careful selection of the respective prior distribution for the Gaussian random field is important to avoid overfitting (Illian et al. 2013). But there is currently little guidance about prior selection – such guidelines are currently being developed for r‐inla.

Analysis of Eucalyptus sparsifolia

When fitting point process models to Eucalyptus sparsifolia data, results were broadly similar across methods of fitting Poisson PPMs (Fig. 5a–b), with some variation due to different decisions being made in default implementations (e.g. spatstat does not apply a LASSO penalty, whereas maxent does). But as identified previously, there is evidence of spatial clustering between presence points in close proximity – at distances less than 10 km, the inhomogeneous K‐function in Fig. 3a strays above its simulation envelope, and the point interaction coefficient from a fitted Cox process is significantly above zero (Fig. 4c). Further evidence of dependence can be seen in the mean of the latent Gaussian field from the Cox process fit, which was sometimes large relative to its standard deviation (Fig. 4a–b). Maps produced by models which account for this spatial dependence (Fig. 5c and d) highlight areas of relatively higher intensity further west.

The analysis is largely consistent with pre‐existing knowledge of the distribution of Eucalyptus sparsifolia, thought to prefer ‘low nutrient soils, but some on medium and high nutrient soils, over a wide range of rainfall’ (Hager & Benson 2010). The maxent response curve for rainfall (Fig. 6a) indicates suitability over a wide rainfall range, and several models (Poisson PPM and area‐interaction models produced by ppmlasso, and maxent) either dropped all rainfall terms or assessed them as relatively unimportant in describing the distribution of E. sparsifolia (results not shown). The species appears most strongly associated with low‐nutrient high‐quartz sedimentary soils and has low intensity in volcanic soils. See Section 9 of the Appendix S1 for a list of coefficient estimates.

Figure 6. Maps produced by maxent software to aid interpretation: (a) The explain tool – the location indicated by the arrow has low intensity, and the plots at the right suggest that the high minimum temperature is largely responsible. (b) Multivariate similarity surface plot (right) and most dissimilar variable plot (left) for climate change predictions – the redder colours in the left panel indicate areas that are most dissimilar to the environmental conditions used to build the model, while the right panel identifies which variables are most responsible for the dissimilarity. It seems minimum temperature may be limiting the distribution near the coast under the climate change model.

Minimum annual temperature is strongly associated with E. sparsifolia distribution, yet not mentioned by Hager & Benson (2010). The quadratic term is significantly negative in models produced by ppmlasso and r‐inla, consistent with the response curve produced by maxent (Fig. 6a). This variable has implications for climate change projections, suggesting a substantial decrease in E. sparsifolia intensity at the southern end of its range under warming scenarios (Fig. 6a).

A key distinction in the models produced by the different methods is in the number of significant variables (Section 9 of the Appendix S1). The Poisson PPM and area‐interaction model fitted by ppmlasso added 24 and 18 nonzero terms in the model, respectively, while the Cox process model produced by r‐inla added only seven, five of which were soil indicators. One possible explanation is that there may have been collinearity between the Gaussian random field and environmental predictors, dampening the environmental signal – such a ‘spatial confounding’ effect is seen elsewhere in spatial statistics, and adjustments can account for it when fitting spatial generalised linear mixed models (Hodges & Reich 2010; Hughes & Haran 2013). Extension of these ideas to address spatial confounding in Cox process models is a potential avenue for future research.

Extensions

To this point, the focus has been on spatial point processes, to describe a set of point locations in space. A number of potential variations on the method may be of interest to species distribution modellers.

A time stamp is often available with presences as well as their point location. It may be of interest to study the patterning of points jointly in space and time, thus fitting a spatio‐temporal point process (Cressie & Wikle 2011). This seems especially relevant in telemetry (Hooten et al. 2013), where one would expect strong temporal dependence in spatial patterning (with individuals tending to be found near their last known location); thus, there is a strong case for modelling such point events jointly in space and time, in order to tease apart habitat preference from habitat availability. Another example where spatio‐temporal modelling is of clear interest is in the study of invasive species not at equilibrium (Hooten & Wikle 2008).

Sometimes presences are observed along a network rather than in a region in space. For example, when modelling roadkill events (Ramp et al. 2005), presence points occur along a road network. Poisson PPMs can be fitted to point events arising along networks relatively easily; however, methods for studying and accounting for dependence in point events along networks are a little‐explored topic (Baddeley, Jammalamadaka & Nair 2014).

Simultaneously modelling data from multiple species has potential from a number of standpoints. Fithian et al. (2015) showed how estimating observer bias simultaneously across species can improve outcomes, making use of the idea that the sources of observer bias can often be reasonably assumed constant across species, given that this bias is a property of the observer more so than of the study species. Multispecies models could also be used to study species interaction, by specifying what is known as a marked point process model (Cressie 1993) which explicitly includes terms for species interaction.

An issue with presence‐only data is data quality, and there are a number of potential extensions to take into account data of varying quality. For example, typically there is uncertainty in the spatial location of presence points, and locations are often assigned ‘accuracy’ scores to estimate this which can be accounted for in subsequent modelling (Hefley et al. 2014). Further, environmental data are also measured with uncertainty, maps of environmental data often being spatially interpolated from (often sparsely distributed) weather stations. This is a type of errors‐in‐variables problem (Carroll et al. 2012), and PPMs for such problems, while rare in the literature, should be a relatively straightforward extension given established methods for errors‐in‐variables approaches to GLM (Stoklosa et al. 2015). Presence‐only data are furthermore subject to imperfect detectability, inducing biased estimates, but this bias can be reduced by building a hierarchical model including both presence‐only data and independent presence–absence data (Dorazio 2014).

A key strength of point process models is that they operate at what is usually the most ecologically relevant of sampling levels – the level of the individual. This means they can (in principle) incorporate processes operating at the level of the individual, such as interactions between individual organisms, or covariates that vary across individuals. A limiting factor obviously is the quality of data at hand, but there is exciting potential in this framework.

Discussion

Point process modelling has been introduced as a natural framework for modelling presence‐only data that are better understood as point events rather than as data from transects or grid cells.

A particular benefit of the PPM specification that we have highlighted is greater clarity around the issue of how to choose quadrature points, and the possibility of querying the data being analysed to verify that a given choice of quadrature points is appropriate (Fig. 2). We found in Section 4.2 that for our example data, the number of quadrature points required for sufficient convergence of the log‐likelihood (a change of <2) depended upon the selection method, but was closer to 100 000 than to the 10 000 usually advocated. Perhaps more quadrature points are needed when randomly chosen than when on a regular mesh, suggesting greater efficiency when using a regular mesh design, although at the cost of making it more difficult to quantify uncertainty in the approximation. This is related to classical results from survey sampling, where systematic sampling is often more efficient than random sampling under serial correlation, but for which it is more difficult to quantify uncertainty (Cochran 1946).
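The contrast between regular and random quadrature designs can be illustrated numerically. The sketch below is a toy comparison in plain Python under stated assumptions: a 10 × 10 window, a smooth hypothetical log‐linear intensity whose integral is known exactly, and 900 quadrature points under each design. It is not a reproduction of the analysis in Fig. 2, and the size of the gap will depend on how smooth the intensity is.

```python
import math
import random

AREA = 100.0                                # 10 x 10 window
EXACT = 50.0 * (math.e - 1.0 / math.e)      # exact integral of the intensity


def intensity(x, y):
    """Hypothetical smooth intensity, exp(-1 + 0.2 * x)."""
    return math.exp(-1.0 + 0.2 * x)


def grid_estimate(n_side):
    """Integral estimate from the cell centres of a regular grid."""
    step = 10.0 / n_side
    total = sum(intensity((i + 0.5) * step, (j + 0.5) * step)
                for i in range(n_side) for j in range(n_side))
    return total * AREA / n_side ** 2


def random_estimate(n_points, seed):
    """Integral estimate from uniformly random quadrature points."""
    rng = random.Random(seed)
    total = sum(intensity(rng.uniform(0, 10), rng.uniform(0, 10))
                for _ in range(n_points))
    return total * AREA / n_points


grid_error = abs(grid_estimate(30) - EXACT)             # 900 grid points
random_errors = [abs(random_estimate(900, seed) - EXACT) for seed in range(20)]
mean_random_error = sum(random_errors) / len(random_errors)
```

With the same number of points, the regular grid gives a much smaller approximation error here than the average random design, in line with the efficiency argument above.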

Our hope is that approaching the ‘pseudo‐absence problem’ via numerical quadrature will shift attention of analysis away from quadrature point choice and towards where it belongs – developing and interpreting a plausible model for intensity as a function of environment and possibly observer bias variables.

PPMs as in this paper can be understood as applying regression methods to point event data, and as such, issues that arise in other areas of regression analysis apply equally well here. For example, there is increasing awareness of a dichotomy between prediction and explanation (Elith & Leathwick 2009) – many SDM researchers are interested primarily in the prediction problem and hence are mostly concerned with using a method which has good predictive performance. This leads the user down the road towards regularised methods (such as in ppmlasso) or model averaging (Araújo & New 2007). Others are interested primarily in explanation – identifying key associations between environmental variables and a species. This leads the user to put greater focus on appropriately accounting for different sources of uncertainty, which requires careful consideration of the question of spatial dependence in the data, and potential problems like multicollinearity (Zuur, Ieno & Elphick 2010).

Care must be taken when interpreting a fitted PPM concerning what is actually being modelled, with particular reference to how the data were collected. For example, in order to interpret intensity as relative abundance of individuals per unit area, the intensity of presence records should be proportional to the intensity of individuals (i.e. abundance) of the species. Contrast this with how observers might collect data and how data are entered into online databases. First, observers may tend to go to a site and only record one individual even though many are present, so individuals in abundant sites are under‐represented. In that case, really one is modelling relative intensity of occupied sites or, equivalently, relative probability of presence (which opens a can of worms, since the extent of a site may be unknown or vary between observers). Secondly, in data aggregation services such as the Global Biodiversity Information Facility (GBIF, Belbin et al. 2013), records arrive from multiple providers. Duplicate records may represent the same individual, for instance, because the same record has been contributed through multiple channels or because different specimens of the same individual were lodged in different museums or herbaria. These duplicate records do not always have exactly the same coordinates because of variable data handling practices. These two examples illustrate typical problems in dealing with the realities of presence‐only data, emphasising that it is important that appropriate data cleaning and model interrogation are considered, and that the interpretation of the final model not stray far from the data used to construct it.

While this paper has focussed on the merits from a model‐fitting perspective of taking a point process approach, there are broader advantages afforded by a coherent modelling framework for presence‐only data. From a technical perspective, PPMs can serve as a model for generation of simulated presence‐only data, with and without point interactions (the spatstat package makes this quite straightforward). From an analyst's perspective, PPMs offer a way forward regarding the assessment of goodness‐of‐fit to presence‐only data, obviating the need to adapt tools like ROC curves to the presence‐only context (Jiménez‐Valverde 2012). Apart from a suite of diagnostic plots for goodness‐of‐fit, likelihood‐based procedures can be used to quantify predictive success, for example Kullback‐Leibler distance. From an ecologist's perspective, rather than aggregating to an arbitrary sampling unit, one can specify a model for location and behaviour of individual organisms, for example, incorporate covariate information particular to individuals into analyses, where available. In this respect, the point process framework offers an exciting platform that can be used to study ecological processes.
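As an illustration of using a PPM to generate simulated presence‐only data, the sketch below simulates an inhomogeneous Poisson process on a rectangle by thinning. It is written in plain Python rather than with spatstat, and the intensity function, window size and seed are hypothetical. Candidate points are drawn from a homogeneous process at the maximum intensity, and each is kept with probability equal to the ratio of the local intensity to that maximum.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw a Poisson count by counting unit-rate exponential arrivals."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_poisson_ppm(intensity, max_intensity, width, height, seed=1):
    """Simulate an inhomogeneous Poisson process on a rectangle by thinning."""
    rng = random.Random(seed)
    n_candidates = poisson_draw(max_intensity * width * height, rng)
    points = []
    for _ in range(n_candidates):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        # Keep the candidate with probability intensity / max_intensity.
        if rng.random() < intensity(x, y) / max_intensity:
            points.append((x, y))
    return points


def intensity(x, y):
    """Hypothetical intensity; its maximum on the window is exp(1) at x = 10."""
    return math.exp(-1.0 + 0.2 * x)


sim = simulate_poisson_ppm(intensity, max_intensity=math.exp(1.0),
                           width=10.0, height=10.0, seed=2015)
```

The expected number of simulated points equals the integral of the intensity over the window, about 117.5 for this choice. Simulating point interactions requires other algorithms and is not covered by this sketch.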

Data accessibility

Locations of Eucalyptus sparsifolia, including the 230 presence‐only locations used in this paper, are available from http://www.bionet.nsw.gov.au.

Environmental data for Blue Mountains region: DRYAD entry doi:10.5061/dryad.985s5.

      • Interactive spatial scale effects on species distribution modeling: The case of the giant panda, Scientific Reports, 10.1038/s41598-019-50953-z, 9, 1, (2019).
      • Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization, Methods in Ecology and Evolution, 10.1111/2041-210X.13297, 10, 12, (2118-2128), (2019).
      • Climate Change and Mountaintop-Removal Mining: A MaxEnt Assessment of the Potential Threat to West Virginian Fishes, Northeastern Naturalist, 10.1656/045.026.0304, 26, 3, (499), (2019).
      • Measuring Terrestrial Area of Habitat (AOH) and Its Utility for the IUCN Red List, Trends in Ecology & Evolution, 10.1016/j.tree.2019.06.009, (2019).
      • Predicting large-scale habitat suitability for cetaceans off Namibia using MinxEnt, Marine Ecology Progress Series, 10.3354/meps12934, 619, (149-167), (2019).
      • Changing windows of opportunity: past and future climate-driven shifts in temporal persistence of kingfish (Seriola lalandi) oceanographic habitat within south-eastern Australian bioregions, Marine and Freshwater Research, 10.1071/MF17387, 70, 1, (33), (2019).
      • Representing species distributions in spatially-explicit ecosystem models from presence-only data, Fisheries Research, 10.1016/j.fishres.2018.10.011, 210, (89-105), (2019).
      • Habitat use of toothed whales in a marine protected area based on point process models, Marine Ecology Progress Series, 10.3354/meps12820, 609, (239-256), (2019).
      • Spatiotemporal identification of roadkill probability and systematic conservation planning, Landscape Ecology, 10.1007/s10980-019-00807-w, (2019).
      • Challenges and opportunities in developing decision support systems for risk assessment and management of forest invasive alien species, Environmental Reviews, 10.1139/er-2019-0024, (1-28), (2019).
      • Bioregions in Marine Environments: Combining Biological and Environmental Data for Management and Scientific Understanding, BioScience, 10.1093/biosci/biz133, (2019).
      • Efficient Modelling of Presence-Only Species Data via Local Background Sampling, Journal of Agricultural, Biological and Environmental Statistics, 10.1007/s13253-019-00380-4, (2019).
      • A Sensitivity Analysis of the Application of Integrated Species Distribution Models to Mobile Species: A Case Study with the Endangered Baird’s Tapir, Environmental Conservation, 10.1017/S0376892919000055, (1-9), (2019).
      • New methods for measuring ENM breadth and overlap in environmental space, Ecography, 10.1111/ecog.03900, 42, 3, (444-446), (2018).
      • Opportunistic records reveal Mediterranean reptiles’ scale‐dependent responses to anthropogenic land use, Ecography, 10.1111/ecog.04122, 42, 3, (608-620), (2018).
      • Climate, human influence and the distribution limits of the invasive European earwig, , in Australia, Pest Management Science, 10.1002/ps.5192, 75, 1, (134-143), (2018).
      • blockCV: An r package for generating spatially or environmentally separated folds for k‐fold cross‐validation of species distribution models, Methods in Ecology and Evolution, 10.1111/2041-210X.13107, 10, 2, (225-232), (2018).
      • Understanding the connections between species distribution models for presence-background data, Theoretical Ecology, 10.1007/s12080-018-0389-9, 12, 1, (73-88), (2018).
      • Wrong, but useful: regional species distribution models may not be improved by range‐wide data under biased sampling, Ecology and Evolution, 10.1002/ece3.3834, 8, 4, (2196-2206), (2018).
      • Evaluating the individuality of animal‐habitat relationships, Ecology and Evolution, 10.1002/ece3.4554, 8, 22, (10893-10901), (2018).
      • Spatial modeling with R‐INLA: A review, WIREs Computational Statistics , 10.1002/wics.1443, 10, 6, (2018).
      • Landscape-scale distribution of tree roosts of the northern long-eared bat in Mammoth Cave National Park, USA, Landscape Ecology, 10.1007/s10980-018-0659-3, 33, 7, (1103-1115), (2018).
      • Penalized composite likelihoods for inhomogeneous Gibbs point process models, Computational Statistics & Data Analysis, 10.1016/j.csda.2018.02.005, 124, (104-116), (2018).
      • Species Distributions, Spatial Ecology and Conservation Modeling, 10.1007/978-3-030-01989-1, (213-269), (2018).
      • Spatial Dispersion and Point Data, Spatial Ecology and Conservation Modeling, 10.1007/978-3-030-01989-1, (101-132), (2018).
      • Environmental drivers of spatiotemporal foraging intensity in fruit bats and implications for Hendra virus ecology, Scientific Reports, 10.1038/s41598-018-27859-3, 8, 1, (2018).
      • Climate Change Could Increase the Geographic Extent of Hendra Virus Spillover Risk, EcoHealth, 10.1007/s10393-018-1322-9, 15, 3, (509-525), (2018).
      • Digital footprints: Incorporating crowdsourced geographic information for protected area management, Applied Geography, 10.1016/j.apgeog.2017.11.004, 90, (44-54), (2018).
      • Hen harrier Circus cyaneus nest sites on the Isle of Mull are associated with habitat mosaics and constrained by topography , Bird Study, 10.1080/00063657.2017.1421611, 65, 1, (62-71), (2018).
      • A Statistical Commentary on Mineral Prospectivity Analysis, Handbook of Mathematical Geosciences, 10.1007/978-3-319-78999-6, (25-65), (2018).
      • Improving the spatial allocation of marine mammal and sea turtle biomasses in spatially explicit ecosystem models, Marine Ecology Progress Series, 10.3354/meps12640, 602, (255-274), (2018).
      • A matter of timing: how temporal scale selection influences cetacean ecological niche modelling, Marine Ecology Progress Series, 10.3354/meps12551, 595, (217-231), (2018).
      • Using Ecological Modelling Tools to Inform Policy Makers of Potential Changes in Crop Distribution: An Example with Cacao Crops in Latin America, Economic Tools and Methods for the Analysis of Global Change Impacts on Agriculture and Food Security, 10.1007/978-3-319-99462-8, (11-23), (2018).
      • Citizen science records describe the distribution and migratory behaviour of a piscivorous predator, Pomatomus saltatrix, ICES Journal of Marine Science, 10.1093/icesjms/fsy057, 75, 5, (1573-1582), (2018).
      • (Under What Conditions) Do Politicians Reward Their Supporters? Evidence from Kenya’s Constituencies Development Fund, American Political Science Review, 10.1017/S0003055418000709, (1-17), (2018).
      • Assessment of the response of pollinator abundance to environmental pressures using structured expert elicitation, Journal of Apicultural Research, 10.1080/00218839.2018.1494891, (1-12), (2018).
      • Predicting Yellow Fever Through Species Distribution Modeling of Virus, Vector, and Monkeys, EcoHealth, 10.1007/s10393-018-1388-4, (2018).
      • Monitoring programs of the U.S. Gulf of Mexico: inventory, development and use of a large monitoring database to map fish and invertebrate spatial distributions, Reviews in Fish Biology and Fisheries, 10.1007/s11160-018-9525-2, (2018).
      • Beyond climate control on species range: The importance of soil data to predict distribution of Amazonian plant species, Journal of Biogeography, 10.1111/jbi.13104, 45, 1, (190-200), (2017).
      • Used‐habitat calibration plots: a new procedure for validating species distribution, resource selection, and step‐selection models, Ecography, 10.1111/ecog.03123, 41, 5, (737-752), (2017).
      • Improved species‐occurrence predictions in data‐poor regions: using large‐scale data and bias correction with down‐weighted Poisson regression and Maxent, Ecography, 10.1111/ecog.03149, 41, 7, (1161-1172), (2017).
      • Combining point‐process and landscape vegetation models to predict large herbivore distributions in space and time—A case study of Rupicapra rupicapra, Diversity and Distributions, 10.1111/ddi.12684, 24, 3, (352-362), (2017).
      • The zoon r package for reproducible and shareable species distribution modelling, Methods in Ecology and Evolution, 10.1111/2041-210X.12858, 9, 2, (260-268), (2017).
      • The importance of topographically corrected null models for analyzing ecological point processes, Ecology, 10.1002/ecy.1877, 98, 7, (1764-1770), (2017).
      • How to make more out of community data? A conceptual framework and its implementation as models and software, Ecology Letters, 10.1111/ele.12757, 20, 5, (561-576), (2017).
      • Breeding density, fine‐scale tracking, and large‐scale modeling reveal the regional distribution of four seabird species, Ecological Applications, 10.1002/eap.1591, 27, 7, (2074-2091), (2017).
      • Opening the black box: an open‐source release of Maxent, Ecography, 10.1111/ecog.03049, 40, 7, (887-893), (2017).
      • Using a novel model approach to assess the distribution and conservation status of the endangered Baird's tapir, Diversity and Distributions, 10.1111/ddi.12631, 23, 12, (1459-1471), (2017).
      • Museum specimen data reveal emergence of a plant disease may be linked to increases in the insect vector population, Ecological Applications, 10.1002/eap.1569, 27, 6, (1827-1837), (2017).
      • Modelling imperfect presence data obtained by citizen science, Environmetrics, 10.1002/env.2446, 28, 5, (2017).
      • Bias correction of bounded location errors in presence‐only data, Methods in Ecology and Evolution, 10.1111/2041-210X.12793, 8, 11, (1566-1573), (2017).
      • Integrated species distribution models: combining presence‐background data and site‐occupancy data with imperfect detection, Methods in Ecology and Evolution, 10.1111/2041-210X.12738, 8, 4, (420-430), (2017).
      • Environmental and managerial factors associated with pack stock distribution in high elevation meadows: Case study from Yosemite National Park, Journal of Environmental Management, 10.1016/j.jenvman.2017.01.076, 193, (52-63), (2017).
      • Restricted cross-scale habitat selection by American beavers, Current Zoology, 10.1093/cz/zox059, 63, 6, (703-710), (2017).