Volume 3, Issue 3 p. 545-554

Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions

J. Andrew Royle

Correspondence author. E-mail: [email protected]

Richard B. Chandler

U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD 20708, USA

Charles Yackulic

U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD 20708, USA

James D. Nichols

U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD 20708, USA

First published: 31 January 2012

Summary

1. Understanding the factors affecting species occurrence is a pre-eminent focus of applied ecological research. However, direct information about species occurrence is lacking for many species. Instead, researchers sometimes have to rely on so-called presence-only data (i.e. when no direct information about absences is available), which often results from opportunistic, unstructured sampling. maxent is a widely used software program designed to model and map species distribution using presence-only data.

2. We provide a critical review of maxent as applied to species distribution modelling and discuss how it can lead to inferential errors. A chief concern is that maxent produces a number of poorly defined indices that are not directly related to the actual parameter of interest – the probability of occurrence (ψ). This focus on an index was motivated by the belief that it is not possible to estimate ψ from presence-only data; however, we demonstrate that ψ is identifiable using conventional likelihood methods under the assumptions of random sampling and constant probability of species detection.

3. The model is implemented in a convenient R package which we use to apply the model to simulated data and data from the North American Breeding Bird Survey. We demonstrate that maxent produces extreme under-predictions when compared to estimates produced by logistic regression which uses the full (presence/absence) data set. We note that maxent predictions are extremely sensitive to specification of the background prevalence, which is not objectively estimated using the maxent method.

4. As with maxent, formal model-based inference requires a random sample of presence locations. Many presence-only data sets, such as those based on museum records and herbarium collections, may not satisfy this assumption. However, when sampling is random, we believe that inference should be based on formal methods that facilitate inference about interpretable ecological quantities instead of vaguely defined indices.

Introduction

Species distribution is naturally characterized by the probability of occurrence of a species, say ψ(x) = Pr(y(x) = 1) where y(x) is the true occurrence state of a species at some location (pixel) x (Kéry 2011). Inference about ψ(x) can be achieved directly from presence–absence data using logistic regression and related models (MacKenzie et al. 2002). However, ecologists are not always fortunate enough to have presence–absence data, and many data sets exist which only contain locations of species presence – so-called presence-only data.

maxent (e.g. Phillips et al. 2006) is a popular software package for producing ‘species distribution’ maps from presence-only data. Interestingly, maxent does not produce estimates of occurrence probability but, instead, produces estimates of an ill-defined ‘suitability index’ (Elith et al. 2011). Because maxent does not correspond to an explicit model of species occurrence, it is not suitable for making explicit predictions of an actual state variable or testing hypotheses about factors that influence occurrence probability. Support for producing indices of species distribution from presence-only data, as opposed to estimates of occurrence probability, has been justified in the literature based on the incorrect assertion that occurrence probability ψ (sometimes referred to as ‘prevalence’ or occupancy) cannot be estimated from presence-only data.

The principal aim of our paper is to show that occurrence probability can be estimated from presence-only data. We consider a formal model-based approach to analysis of presence-only data. We emphasize the critical assumption required for statistical inference about species occurrence probability from presence-only data, which is random sampling of space as a basis for accumulating presence-only observations. In addition, the estimator we devise here is most relevant only when species detection probability is constant. We conclude that, under these assumptions, inference about occurrence probability can be achieved directly from presence-only data using conventional likelihood methods (e.g. Lancaster and Imbens 1996). We suspect that this is surprising to many users of maxent and related species distribution modelling tools in the light of repeated statements to the contrary in the literature (e.g. Phillips and Dudik 2008; Elith et al. 2011; Kéry 2011), asserting that probability of occurrence is not identifiable. For example, Elith et al. (2010) state that

Formally, we say that prevalence is not identifiable from presence-only data (Ward et al. 2009). This means that it cannot be exactly determined, regardless of the sample size; this is a fundamental limitation of presence-only data.

In fact, Ward et al. (2009) do not make such a definitive claim. Their precise claim is

[...occurrence probability...] is identifiable only if we make unrealistic assumptions about the structure of [...the relationship between occurrence probability and covariates....] such as in logistic regression....

In that context, it seems that subsequent references to Ward et al. (2009) misconstrue their result. In our view, logistic regression (or other binary regression models) is hardly unrealistic. Indeed, such models are the most common approach to modelling binary variables in ecology (and probably all of statistics), especially in the context of modelling species occurrence (MacKenzie et al. 2002; Tyre et al. 2003; Kéry et al. 2010). Even more generally, the logistic function is the canonical link of the binomial GLM (McCullagh and Nelder 1989, p. 38) and, as such, it is customarily adopted and widely used, and even books have been written about it (Hosmer and Lemeshow 2000).

We demonstrate the application of the formal model-based framework for estimating occurrence probability from presence-only data using a data set derived from the North American Breeding Bird Survey, and we provide an R package for producing estimates of species distribution model parameters from presence-only data.

Before proceeding, we note that the statistical principle of maximum entropy (Jaynes 1957, 1963; Jaynes and Bretthorst 2003) is widely applied to problems in statistics and other disciplines, and our development here is not critical of these ideas. Rather, we are critical of the routine application of the software package maxent as applied to species distribution modelling. We specifically object to the pervasive views in the maxent user community that one should avoid characterizing species distribution by occurrence probability, that occurrence probability is not identifiable and that one should instead obtain indices of species occurrence probability by using maxent.

Genesis of presence-only data

The original motivation for the development of maxent was to estimate and model the distribution of a species using presence-only data (Dudik et al. 2004; Phillips et al. 2004). Species distribution is naturally characterized by occurrence probability, which provides a quantitative description of the probability of the focal species occurring at a location and a mechanism for generating explicit predictions of occurrence and testing hypotheses related to factors that influence occurrence. maxent attempts to approximate the probability of occurrence by using a logistic transformation of its suitability index (Phillips & Dudik 2008). Before explaining the details of this indirect method, we first consider a model to describe the genesis of presence-only data, and the common approach of estimating the probability of occurrence using standard sampling methods.

Occupancy or Presence/Absence Sampling Experiment

We imagine that presence-only data arise by randomly sampling spatial units, say x (e.g. corresponding to a pixel), and then observing a random variable y which we will assert is true presence or absence at x. The state space of potential values of x will be denoted by $\mathcal{X}$, and we assume henceforth that $\mathcal{X}$ is prescribed by the investigator. The data resulting from random sampling are y(x) for each x in the sample, say x1,…,xN. Naturally (under random sampling), we assume
$$y(x) \sim \mathrm{Bernoulli}(\psi(x)) \qquad \text{(eqn 1)}$$
where ψ(x) = Pr(y(x) = 1) is the probability that the species is present at pixel x, or the probability that pixel x is ‘occupied’. To be explicit, we note that ψ(x) is the conditional probability of y given x, which we will subsequently denote by ψ(y|x). In practical applications, attention is usually focused on developing covariate models on the logit-transform of ψ(y|x), for example,
$$\mathrm{logit}(\psi(y = 1 \mid x)) = \beta_0 + \beta_1 z(x)$$
where z(x) is some landscape or habitat covariate measured at pixel x, for example elevation, forest cover or annual precipitation. This is a standard logistic regression model (e.g. Hosmer and Lemeshow 2000) and corresponds also to the state model underlying occupancy models (MacKenzie et al. 2006). The model is widely used to model occurrence, range and distribution of species. We note that while the logistic model is the most widely used because it is the canonical link function of the binomial GLM (McCullagh and Nelder 1989), many other link functions are available, although alternatives are seldom considered in ecological applications.
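As a minimal sketch of this presence/absence benchmark, the following R code simulates y(x) from a logistic model and recovers the coefficients with glm(). The landscape size, the seed and the coefficient values (−1, −1) are assumptions chosen only for illustration.

```r
set.seed(2012)                      # arbitrary seed, for reproducibility only
M <- 10000                          # assumed number of pixels in the landscape
z <- rnorm(M)                       # hypothetical standardized covariate z(x)
psi <- plogis(-1 - 1 * z)           # psi(x): inverse logit of the linear predictor
y <- rbinom(M, 1, psi)              # true presence/absence, y(x) ~ Bernoulli(psi(x))

# Standard logistic regression on the full presence/absence data
fit <- glm(y ~ z, family = binomial(link = "logit"))
coef(fit)                           # estimates of (beta0, beta1)
```

Because the zeros are retained here, this fit is the reference against which a presence-only estimator can be judged.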

Presence-Only Sampling Experiment

We adopt the view here that presence-only data, that is, a sample of locations for which y = 1, arise by discarding the y = 0 observations from a data set that arose by random sampling as described earlier. That is, we sample pixels randomly and obtain x1,…,xN and record y(x1),…,y(xN). Then, we consider only those sites x1,...xn for which y(x) = 1. The corresponding subset of locations constitutes our data set, which we will label here x1,...,xn. We use ‘n’ here instead of ‘N’ as above and recognize that the presence-only x’s are a reordered version of a subset of the initial sample.

Likelihood analysis

The basic characteristic of presence-only data is that the variable y is no longer random in our sample, because y = 1 with probability 1 for all observations. Instead, x is the random quantity, and the set of n locations x1,…,xn are the data upon which inference is based. Importantly, the specific values of x that appear in the sample represent a biased selection from all possible values in the state space $\mathcal{X}$, favouring those for which y = 1. To clarify the nature of the induced bias in our sample of x, we invoke Bayes’ rule. In the remainder, we use π() to represent the probability distributions of x, and ψ() to represent probability distributions of y.

The central statistical problem in the analysis of presence-only data is to identify the likelihood of the observations x in the presence-only sample. To find the likelihood, we need to identify the conditional probability distribution π(x|y = 1), that is, that of x for presence-only (y = 1) pixels. We can compute π(x|y = 1) by an application of Bayes’ rule:
$$\pi(x \mid y = 1) = \frac{\psi(y = 1 \mid x)\,\pi(x)}{\psi(y = 1)} \qquad \text{(eqn 2)}$$

This might appear to be an awkward invocation of Bayes rule because we often do not think of spatial location as the outcome of a stochastic process in most contexts. It is somewhat more natural in the context of environmental covariates (Lancaster and Imbens 1996; Lele and Keim 2006), but they are equivalent formulations. We proceed with this development in terms of x here because this is pervasive in the maxent literature.

The probability distribution π(x) describes the possible outcomes of the random variable x, ‘pixel identity’. We regard the state space of x as discrete here, having M unique elements, and therefore π(x) = 1/M. ψ(y = 1|x) is the probability that y = 1 conditional on x, which we refer to as ‘occurrence probability’. Then, ψ(y = 1) is the marginal probability that a pixel is occupied, which is, by definition, the integral of ψ(y = 1|x) over the state space $\mathcal{X}$ or, in the case of a discrete landscape, the sum over all elements of $\mathcal{X}$:
$$\psi(y = 1) = \sum_{x \in \mathcal{X}} \psi(y = 1 \mid x)\,\pi(x) = \frac{1}{M} \sum_{x \in \mathcal{X}} \psi(y = 1 \mid x)$$
We see that this is the ‘spatially averaged’ occurrence probability and has also been called prevalence in the literature (e.g. Ward et al. 2009).

Equation 2 makes it clear that the x’s for which y = 1 are not a representative sample of all x’s. Intuitively, the presence-only sample (i.e. x’s for which y = 1) will favour pixels for which ψ(y = 1|x) is large relative to ψ(y = 1).
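The selection bias implied by Eqn 2 can be checked numerically. The sketch below assumes a hypothetical landscape of 10 000 pixels with a single standard-normal covariate and an arbitrary logistic occurrence model; it compares the mean covariate value over the whole landscape with its mean under π(x|y = 1), and with the mean among simulated presence pixels.

```r
set.seed(1)                           # arbitrary seed
M <- 10000                            # assumed number of pixels
z <- rnorm(M)                         # hypothetical covariate value of each pixel
psi <- plogis(-1 - 1 * z)             # assumed psi(y = 1 | x)

# Eqn 2 with pi(x) = 1/M: pi(x | y = 1) = psi(x) / sum(psi)
pi_x_given_y1 <- psi / sum(psi)

mean_all <- mean(z)                         # landscape average of z
mean_presence <- sum(z * pi_x_given_y1)     # average of z under pi(x | y = 1)

# Empirical check: simulate presence/absence, then discard the absences
y <- rbinom(M, 1, psi)
mean_observed <- mean(z[y == 1])

c(landscape = mean_all, eqn2 = mean_presence, simulated = mean_observed)
```

With a negative slope, presence pixels are drawn preferentially from low values of z, so the last two entries should agree with each other and sit below the landscape average.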

In this expression of Bayes rule, the variable x is ‘pixel identity’ for which π(x) is constant. It is not an indicator of whether pixel x appears in the sample. In the latter case, Pr(y = 1|x) would be the probability of occupancy conditional on pixel x being in the sample which has no clearly useful meaning. That said, random sampling is important for the invocation of Bayes rule – by imagining that the presence-only sample arises by first random sampling pixels for presence–absence and then discarding the y = 0 pixels (See Appendix). Alternatively, it can be justified by sampling randomly the sample frame consisting of all y = 1 observations. Under random sampling, either with or without replacement, the probability that a sample unit appears in the sample is constant and thus has no effect on Eqn 2.

The Likelihood

We note that this expression of Bayes rule appears in a large number of species distribution modelling papers that involve the development or application of maxent. However, these papers never provide further analysis of the result, instead making the (incorrect) claim that direct analysis of ψ(y = 1|x) is intractable because the background prevalence (ψ(y = 1) here) is not identifiable. In fact, this is incorrect, as has been noted in other contexts that produce similarly biased data (e.g. case–control studies, Lancaster and Imbens 1996, and ‘resource selection probability functions’, RSPF; Manly et al. 2002; Lele and Keim 2006). To clarify this, we use the previous application of Bayes rule to describe the likelihood for the presence-only data.

We emphasize that ψ(y = 1|x) here is precisely the ‘probability of occurrence’ or occupancy probability, that is, Pr(y = 1|x), as in MacKenzie et al. (2002, 2006) and Tyre et al. (2003). In practice, these probabilities would usually depend on parameters, say β, which we write ψ(y|x;β). For example, occurrence probability might vary according to a polynomial response over space, on an appropriate scale. Therefore, π(x|y = 1) is
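$$\pi(x \mid y = 1; \beta) = \frac{\psi(y = 1 \mid x; \beta)\,\pi(x)}{\sum_{x' \in \mathcal{X}} \psi(y = 1 \mid x'; \beta)\,\pi(x')}$$
and with π(x) constant, it cancels from the numerator and denominator and so
$$\pi(x \mid y = 1; \beta) = \frac{\psi(y = 1 \mid x; \beta)}{\sum_{x' \in \mathcal{X}} \psi(y = 1 \mid x'; \beta)} \qquad \text{(eqn 3)}$$
The likelihood for an observation xi in our presence-only data set is based on π(x|y = 1;β) regarded as a function of the parameters β. Therefore, for the presence-only sample x1,…,xn, the joint likelihood is
$$L(\beta; x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{\psi(y = 1 \mid x_i; \beta)}{\sum_{x' \in \mathcal{X}} \psi(y = 1 \mid x'; \beta)} \qquad \text{(eqn 4)}$$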
Parameters β can be estimated by maximizing the likelihood using standard methods. Further, inferences in the form of hypothesis tests, confidence intervals or model selection can be achieved using conventional statistical ideas.

As noted earlier, the denominator is the marginal probability of occurrence over the landscape, and it is computed by summing over all elements of $\mathcal{X}$, where $\mathcal{X}$ is the state space of x, that is, the landscape as defined by the analyst. Clearly, this marginal probability could be estimated by evaluating ψ(y = 1|x) at a random sample of points x independent of y (Lele and Keim 2006). Sometimes, a sample of $\mathcal{X}$ chosen independent of y is referred to as the ‘background’ in species distribution modelling or, in the context of case–control models, ‘contaminated controls’ (Lancaster and Imbens 1996).
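The background idea can be sketched in R as follows. This is an illustration under assumed settings (simulated landscape, 2000 presence points, 2000 background points, arbitrary coefficients), not the implementation used in any package: the sum in the denominator of the likelihood is replaced by an average of ψ over a random background sample, which changes the log-likelihood only by an additive constant.

```r
set.seed(42)                              # arbitrary seed
M <- 10000                                # assumed landscape size
z <- rnorm(M)                             # hypothetical covariate
y <- rbinom(M, 1, plogis(-1 - 1 * z))     # simulated true occurrence
pres <- sample(z[y == 1], 2000)           # presence-only covariate values
bg <- sample(z, 2000)                     # background: random pixels, chosen independently of y

# Negative log-likelihood with the landscape sum approximated by a background average
nll_bg <- function(parm) {
  psi_pres <- plogis(parm[1] + parm[2] * pres)
  psi_bg <- plogis(parm[1] + parm[2] * bg)
  -sum(log(psi_pres / mean(psi_bg)))
}

fit_bg <- optim(c(0, 0), nll_bg)          # Nelder-Mead by default
fit_bg$par                                # estimates of (beta0, beta1)
```

Using the full landscape in the denominator, when it is available, removes the extra Monte Carlo error that a finite background sample introduces.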

Imperfect Detection of Species

In practice, we expect bias in observing species presence, such that the probability of detecting a species given that it is present should be less than 1. A standard model of this phenomenon (MacKenzie et al. 2002; Tyre et al. 2003) is constructed as follows: Let yobs be the observed species presence and then define the probability of detection as Pr(yobs = 1|y = 1, x) = p. If p is constant spatially, the marginal probability of the contaminated observations is Pr(yobs = 1|x) = pψ(y = 1|x), and we see that the constant p cancels from Eqn 4 and inferences about ψ are unaffected.
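Written out under the constant-p model just described, the cancellation takes one line. The superscript "obs" and the summation index x' are notation introduced here for the sketch, and $\mathcal{X}$ denotes the state space of x:

```latex
\pi(x \mid y^{\mathrm{obs}} = 1; \beta)
  = \frac{p\,\psi(y = 1 \mid x; \beta)}
         {\sum_{x' \in \mathcal{X}} p\,\psi(y = 1 \mid x'; \beta)}
  = \frac{\psi(y = 1 \mid x; \beta)}
         {\sum_{x' \in \mathcal{X}} \psi(y = 1 \mid x'; \beta)}
```

If p varied with x, it would no longer factor out of the sum in the denominator, which is why constancy of detection is needed for this result.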

Geographic vs. Environmental Space

maxent is now always discussed in terms of environmental covariates, say z (typically vector-valued), whereas we have developed the problem thus far in terms of space, x. Elith et al. (2011; Appendix) make the first attempt we know of to reconcile what they call the ‘geographic space’ (in terms of x) and the ‘environmental space’ (in terms of z) formulations. Earlier papers seem to simply substitute z for x without much (if any) discussion of that. Conceptually, there is no reason to regard x and z differently with regard to the application of Bayes rule, and so simply substituting z for x is reasonable, if we assume that z is randomly sampled instead of x. In discrete space, this is assured because x in that case is merely an index to unique elements of z, so that z(x) is effectively a 1-to-1 transformation of x, that is, the elements of the sample frame of z(x) are associated with unique elements of x. It is then somewhat more natural to view the application of Bayes rule in terms of how occupancy status relates to environmental covariates because the view of z as a random variable is standard. When y only depends on x through z, this is often expressed as:
$$\pi(z \mid y = 1) = \frac{\psi(y = 1 \mid z)\,\pi(z)}{\psi(y = 1)}$$

By the law of total probability, the marginal probability ψ(y = 1) can be computed directly from ψ(y = 1|z(x)) or estimated if a random sample of z(x) independent of y is available (Lele and Keim 2006).

We introduce covariates into the model by modelling the relationship between ψ(y = 1|z(x)) and those covariates, for example, using a logit link:
$$\mathrm{logit}(\psi(y = 1 \mid z(x))) = \beta_0 + \sum_{j=1}^{J} \beta_j z_j(x) \qquad \text{(eqn 5)}$$
where β0 is an intercept and the other β’s are coefficients associated with each of J covariates.

Likelihood Analysis in R

We provide an example here involving a landscape comprised of 10 000 pixels. We consider a single covariate z, which for our purposes we define by simulating 10 000 values from a standard normal distribution. The probability of occurrence is defined as
$$\mathrm{logit}(\psi(x)) = -1 - 1 \cdot z(x)$$
We generated y (true occurrence) for each pixel on the landscape as a Bernoulli trial with probabilities ψ(x) and sampled 2000 occupied pixels for which we used the resulting values of z(x) as our data set. The likelihood function definition requires about a half-dozen lines of R code. Every line of R code for simulating data, defining the likelihood and obtaining the MLEs is given in the following box:

z <- rnorm(10000, 0, 1)             # simulate a covariate
lpsi <- -1 - 1*z                    # define the linear predictor
psi <- exp(lpsi)/(1 + exp(lpsi))    # occurrence probability
y <- rbinom(10000, 1, psi)          # generate presence-absence data
data <- sample(z[y == 1], 2000)     # keep the presence-only data

# define the neg log-likelihood (Eqn 4)
lik <- function(parm) {
  beta0 <- parm[1]
  beta1 <- parm[2]
  # occurrence probability at every pixel of the landscape
  gridpsi <- exp(beta0 + beta1*z)/(1 + exp(beta0 + beta1*z))
  # occurrence probability at the presence-only locations
  datapsi <- exp(beta0 + beta1*data)/(1 + exp(beta0 + beta1*data))
  -1*sum(log(datapsi/(sum(gridpsi))))
}

# minimize it
out <- nlm(lik, c(0, 0), hessian = TRUE)

# produce the estimates
out$estimate

We conducted 5000 simulations under the model described above and found that the MLEs were unbiased (Fig. 1). Furthermore, the log-likelihood has a distinct mode (Fig. 2), indicating that occurrence probability, ψ(y = 1|z), is identifiable in the situations we examined, under random sampling. We do note, however, that there is a prominent ridge in the likelihood, highlighting the low information content inherent in presence-only data. We developed an R package ‘maxlike’, which implements the likelihood analysis in some generality.
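One way to turn such a fit into interval estimates is to invert the Hessian returned by nlm(). The sketch below re-simulates a data set under the same assumed model so that it runs on its own; the seed and the use of Wald-type 95% intervals are choices made here for illustration, and the normal approximation may be rough given the ridge in the likelihood.

```r
set.seed(123)                                 # arbitrary seed
z <- rnorm(10000)                             # simulated covariate, as in the example above
y <- rbinom(10000, 1, plogis(-1 - 1 * z))     # true occurrence
data <- sample(z[y == 1], 2000)               # presence-only data

# Negative log-likelihood of the presence-only sample (Eqn 4)
nll <- function(parm) {
  gridpsi <- plogis(parm[1] + parm[2] * z)
  datapsi <- plogis(parm[1] + parm[2] * data)
  -sum(log(datapsi / sum(gridpsi)))
}
out <- nlm(nll, c(0, 0), hessian = TRUE)

vc <- solve(out$hessian)                      # approximate variance-covariance matrix
se <- sqrt(diag(vc))                          # approximate standard errors
ci <- cbind(lower = out$estimate - 1.96 * se,
            upper = out$estimate + 1.96 * se)
rownames(ci) <- c("beta0", "beta1")
ci                                            # Wald-type 95% intervals
cov2cor(vc)[1, 2]                             # sampling correlation between beta0 and beta1
```

A correlation between the two estimates that is far from zero is another way of seeing the ridge in the likelihood surface.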

Fig. 1. Distributions of the maximum likelihood estimates obtained by fitting our model to 5000 simulated data sets. β0 and β1 are the intercept and slope parameters of the linear model of occurrence probability (ψ(y = 1|z)). The data-generating values are indicated by vertical lines. Kernel density estimators were used to represent the distributions.

Fig. 2. The log-likelihood surface of the maxlike model for a data set simulated using logit(ψ(y = 1|z)) = −1−1*z. The ‘X’ indicates the maximum. Parameters of the model are identifiable, but there exists a prominent ridge in the likelihood.

MAXENT analysis

The definition of the likelihood for an observed sample of x from presence-only sites is straightforward using Bayes Rule as we demonstrated earlier. maxent is not using the likelihood as a basis for estimation. Instead, maxent is operating on what the various authors refer to as the ‘maximum entropy distribution’. Using a single covariate, z(x), as an example, Phillips & Dudik (2008; and others) define the ‘maximum entropy distribution’ as:
$$q(x) = \frac{\exp(\beta z(x))}{\sum_{x' \in \mathcal{X}} \exp(\beta z(x'))} \qquad \text{(eqn 6)}$$
As in the development of Eqn 4, the summation in the denominator is taken over the background or, if available, all pixels in the state space $\mathcal{X}$. In addition, maxent obtains β by maximizing a penalized version of Eqn 6.

It is natural to wonder how q(x) relates to the likelihood given above in Eqn 4. It is clear from the development (e.g. Phillips and Dudik 2008 and others) that maxent implies the following strict equivalence
$$\pi(x \mid y = 1) \equiv q(x)$$
For example, Phillips and Dudik (2008) and subsequent papers state that maxent is ‘estimating π(x)’, where π(x) is their notation for what we labelled π(x|y = 1) above. To be precise, Phillips and Dudik (2008) state: ‘However, if we have only occurrence data, we cannot determine the species’ prevalence (Phillips et al. 2006; Ward et al. 2009). Therefore, instead of estimating P(y = 1|x) directly, we estimate the distribution π.’

We therefore are led to ask: In what sense is maxent ‘estimating π’? The only clear interpretation of π(x|y = 1) ≡ q(x) is that maxent is estimating a specific version of π(x|y = 1) in which Pr(y = 1|z(x)) = ψ(y = 1|z(x)) = exp(βz(x)) (i.e. occurrence probability is modelled as an exponential function) and, furthermore, a penalized form of that specific π(x|y = 1). Clear advantages to either of these two methodological choices (ψ(y = 1|z(x)) = exp(βz(x)) and the penalty) have not been established. In particular, modelling probabilities by a simple exponential function does not appear to be customary, or even very natural, as it does not have bounded support on [0,1] as ψ(y = 1|z(x)) must.

Identifiability of β0 or ‘Species Prevalence’

There is a widespread and incorrect belief (see Phillips and Dudik quote above) that species prevalence (Elith et al. 2011 also use the term ‘proportion of occupied sites’) cannot be determined from presence-only data (Phillips and Dudik 2008; Elith et al. 2011; Kéry 2011), and this is widely used as justification for producing vaguely defined ‘suitability indices’. While this is repeatedly asserted, there is never any specific discussion or argument as to why this is the case. In fact, it is in direct contradiction to existing literature (Lancaster and Imbens 1996; Lele and Keim 2006).

As we demonstrated, lack of identifiability of occurrence probability is not a general feature of presence-only data. We can clearly estimate the intercept term in Eqn 5 by maximum likelihood, if a suitable parametric form of ψ(y = 1|z) is assumed (Lancaster and Imbens 1996) and a continuous covariate is present (Lele and Keim 2006). Lacking a particular parametric form, one must know ψ(y = 1) (Lancaster and Imbens 1996) and, if covariates are only nominal categorical, then only relative probabilities of occurrence are achievable (Lele and Keim 2006). Conversely, it is clear that, no matter the composition of covariates, with the choice ψ(yi = 1|z) = exp(βzi), the intercept term is not identifiable, as the intercept cancels from Eqn 6 (Lele and Keim 2006). Thus, the inability to estimate occurrence probability is a feature of the specific model used by maxent and not a feature of presence-only data. As such, we do not see an advantage to using the exponential form for ψ(y = 1|z) over the more conventional logistic occurrence probability model.
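The cancellation of the intercept under the exponential form, and its survival under the logistic form, can be verified directly. The sketch below uses a small hypothetical landscape and arbitrary coefficient values; it computes π(x|y = 1) from Eqn 3 under each form for two different intercepts and reports the largest difference.

```r
z <- seq(-2, 2, length.out = 50)     # hypothetical covariate values on a 50-pixel landscape

# pi(x | y = 1) when psi is exponential in the linear predictor
pi_exp <- function(beta0, beta1) {
  w <- exp(beta0 + beta1 * z)
  w / sum(w)
}

# pi(x | y = 1) when psi is logistic in the linear predictor
pi_logit <- function(beta0, beta1) {
  w <- plogis(beta0 + beta1 * z)
  w / sum(w)
}

# Change the intercept from -3 to 0 while holding the slope fixed at 1
diff_exp <- max(abs(pi_exp(-3, 1) - pi_exp(0, 1)))
diff_logit <- max(abs(pi_logit(-3, 1) - pi_logit(0, 1)))
c(exponential = diff_exp, logistic = diff_logit)
```

Under the exponential form the two distributions are identical up to rounding, so the data carry no information about the intercept; under the logistic form they differ, which is what makes the intercept estimable.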

Logistic output from maxent

In relying on the ‘maxent distribution’, instead of adopting a direct focus on occurrence probability, the ability to estimate the intercept is lost. Despite this, maxent provides an ad hoc approach to producing ‘logistic output’ (Phillips and Dudik 2008; Elith et al. 2011, Appendix). This procedure amounts to prescribing a particular value of β0, so that the output of maxent can be interpreted as a probability. Indeed, graphical output from maxent explicitly and misleadingly labels such output ‘probability of presence’. Phillips and Dudik (2008) imply some objective basis for this procedure, by stating (Phillips and Dudik 2008, abstract) ‘we describe a new logistic output format that gives an estimate of the probability of presence’ (our emphasis). The implication here is that one is able to estimate ‘probability of presence’ using maxent. In fact, as the rest of the paper makes clear, the user is required to prescribe a specific value for the intercept. Phillips and Dudik provide the following equation:
$$\psi(y = 1 \mid x) = \frac{\exp(\beta_0)\,q(x)}{1 + \exp(\beta_0)\,q(x)}$$
or, equivalently,
$$\mathrm{logit}(\psi(y = 1 \mid x)) = \beta_0 + \log(q(x))$$
where they recommend setting β0 = H, where H is the mean log(q) for the observed data. They provided an argument that one might as well set β = 0 so that the procedure ‘...assigns typical presence sites probability of presence close to 0.5’. (Phillips and Dudik 2008, p. 165). This is questionable because β0 for ‘typical presence sites’ could be any real number, and thus, the bias of maxent will depend on how different the true value is from the prescribed value of β0. Thus, Phillips and Dudik (2008) fail to provide an objective approach to estimating this value.
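The dependence of such ‘logistic output’ on the prescribed intercept can be illustrated with a short sketch. It assumes the transform has the form exp(β0)q(x)/(1 + exp(β0)q(x)), which is an assumption about the procedure rather than maxent's own code; the landscape, the coefficient and the three intercept values are hypothetical.

```r
z <- seq(-2, 2, length.out = 200)        # hypothetical covariate on a 200-pixel landscape
q <- exp(-1 * z) / sum(exp(-1 * z))      # exponential-form distribution (Eqn 6) with beta = -1

# Assumed logistic transform: exp(beta0) * q / (1 + exp(beta0) * q)
logistic_output <- function(q, beta0) {
  plogis(beta0 + log(q))
}

beta0_values <- c(3, 5, 7)               # arbitrary prescribed intercepts
avg_output <- sapply(beta0_values, function(b) mean(logistic_output(q, b)))
names(avg_output) <- paste0("beta0=", beta0_values)
avg_output                               # landscape-average 'probability' for each choice
```

The same q(x) yields very different apparent prevalence depending on the prescribed value, and nothing in the presence-only sample is used to choose among them.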

‘Regularization’ in maxent

In maxent, the specific objective function maximized is not the likelihood given in Eqn 4 earlier. Rather, it is the exponential function along with a penalty term that has the effect of penalizing the maximum entropy distribution for large covariate effects. This penalization is termed ‘regularization’ in the maxent terminology. The effect of the penalty is clear – it shrinks the regression coefficients to 0 – and it is a standard concept in smoothing methods and other contexts (Green and Silverman 1994; Tibshirani 1996). General motivation for the need of a penalty term in the context of species distribution modelling is not clear, and we have not seen specific justification given in the literature other than the suggestions that it may prevent over-fitting or save the user time by avoiding the need to formally compare competing hypotheses (Phillips et al. 2004). As an alternative to regularization, we recommend that exercising restraint in the creation of covariate data sets and maintaining a focus on developing a priori models can also prevent over-fitting.

The practical problem in using the penalized objective function is that it will generally lead to biased estimators of the important β parameters, and we believe that this should be considered and understood by users of maxent prior to analysis. In our view, this penalty is not necessary in developing occupancy models from presence-only data. Moreover, the way in which it is handled by maxent seems ad hoc, which is to say, the smoothing parameters are fixed a priori based on heuristics.
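The shrinkage produced by a penalty can be seen in a toy version of the problem. The sketch below is not maxent's objective function; it assumes the exponential form of Eqn 6 with a single coefficient, adds a hypothetical L1 penalty λ|β| to the average negative log of q(x), and minimizes it for three arbitrary values of λ.

```r
set.seed(7)                                   # arbitrary seed
z <- rnorm(10000)                             # hypothetical landscape covariate
y <- rbinom(10000, 1, plogis(-1 - 1 * z))     # simulated true occurrence
pres <- sample(z[y == 1], 2000)               # presence-only covariate values

# Average negative log of q(x) over the presence sample, plus an L1 penalty
pen_nll <- function(beta, lambda) {
  log_q <- beta * pres - log(sum(exp(beta * z)))
  -mean(log_q) + lambda * abs(beta)
}

lambdas <- c(0, 0.05, 0.25)                   # arbitrary penalty weights
beta_hat <- sapply(lambdas, function(l) {
  optimize(pen_nll, interval = c(-5, 5), lambda = l)$minimum
})
data.frame(lambda = lambdas, beta_hat = beta_hat)
```

As λ grows, the estimate is pulled towards zero regardless of what the data indicate, which is the source of the bias discussed above.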

There could be at least two situations in which using a penalized objective function is a sensible thing to do. One is when no obvious model set can be developed a priori and the number of potential models is extremely large. In that case, we might wish to fit some omnibus complex model for the sake of hypothesis generation. A second possibility for using a penalty to the objective function is in the presence of sparse data, or small sample size relative to the number of predictors. In that case, some model parameters will be weakly identified, and the penalty term essentially keeps the parameter in a reasonable region of the parameter space and probably alleviates numerical errors and other pathologies that you might expect in such cases.

MAXENT Scalings of q(x)

To implement precisely the estimation problem as implemented in maxent, it is not sufficient to understand the ‘maximum entropy distribution’. This is because maxent implements various scalings to force q(x) to be between 0 and 1. In particular, maxent uses two ‘normalizing constants’ that are involved in the calculation of q(x): the linear normalizer (LN) and the density normalizer (DN), in addition to the penalty (λ). Thus, maxent minimizes the following function:
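$$-\sum_{i=1}^{n} \log\left(\frac{\exp(\mathrm{LN} + \beta z(x_i))}{\mathrm{DN}}\right) + \lambda\,|\beta|$$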
The DN is simply equal to the sum of exp(LN+βz(x)) over all x (not just the observed locations) and thus is just a function of the observed z and the coefficient β. The LN is chosen to equal −1 * maximum value of (β*z(x)), ensuring that LN+β*z(x) is less than zero for all x.

We see that LN itself is a redundant parameter because exp(LN+β*z(x)) = exp(LN)*exp(β*z(x)) and LN is included in both the denominator and numerator. Furthermore, we can absorb DN directly into β, yielding the reparameterization β* = β/DN. As such, the scaling of q does not appear to be a meaningful methodological element of the maxent problem.
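The redundancy of LN is easy to confirm numerically. In the sketch below, the five covariate values and the coefficient are hypothetical, and q(x) is computed as exp(LN + βz(x))/DN with DN taken as the sum of the numerator over all pixels, as described in the text.

```r
z <- c(-1.2, 0.3, 0.8, 2.1, -0.5)    # hypothetical covariate values for five pixels
beta <- 0.7                          # hypothetical coefficient

q_scaled <- function(LN) {
  num <- exp(LN + beta * z)
  DN <- sum(num)                     # density normalizer: sum over all pixels
  num / DN
}

LN_text <- -max(beta * z)            # the linear normalizer described in the text
q1 <- q_scaled(LN_text)
q2 <- q_scaled(0)                    # no linear normalizer at all
all.equal(q1, q2)                    # identical up to rounding
```

Because exp(LN) multiplies both numerator and denominator, any value of LN returns the same q(x).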

Comparison between maxent and maxlike

Simulation Study

We compared the maxent suitability index to estimates of ψ(x) obtained by maximizing the likelihood in Eqn 4 by fitting both models to 100 simulated data sets. We created an artificial landscape in which each pixel had associated elevation and precipitation data. We generated presence–absence data using the following model:
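$$\mathrm{logit}(\psi(x)) = \beta_0 + \beta_1\,\mathrm{elev}(x) + \beta_2\,\mathrm{elev}(x)^2 + \beta_3\,\mathrm{precip}(x) + \beta_4\,\mathrm{precip}(x)^2$$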
where β = {0·5,2,−2,2,−1} and both elevation and precipitation were standardized to have mean 0 and unit variance. We then sampled 1000 presence locations and discarded the absence locations. We estimated parameters using maxent and by maximum likelihood using our R package ‘maxlike’. Figure 3 illustrates the general bias of maxent predictions for estimating ψ(x) and shows that the magnitude of the bias has a nonlinear form. For low values of ψ(x), the index is biased high, whereas it is biased low when ψ(x) is high. The fact that maxent’s ‘logistic output’ is not proportional to ψ(x) results from the program making a guess about species prevalence. If we were to change the default value to some other value, we would have found different results, and possibly higher degrees of bias. This nonconstant bias greatly limits the utility of the maxent output as an index.

Fig. 3. Comparison between maxlike and maxent estimates to true values of ψ(y = 1|z(x)). The grey lines represent the relationship between the estimate and the true value for each of 100 simulated data sets. The maxent index is not proportional to the probability of occurrence.

North American Breeding Bird Survey

We applied the maximum likelihood estimator of occurrence probability to data from the North American Breeding Bird Survey (BBS) and compared the resulting estimates to predictions based on logistic regression and also to maxent’s ‘logistic output’. We fitted models to data on the Carolina wren (Thryothorus ludovicianus) using four land cover variables (per cent cover of mixed forest, deciduous forest, coniferous forest and grasslands) and latitude and longitude. We considered quadratic effects for each covariate. Fitting this model in maxent required that we modify the default settings, so that so-called hinge, threshold and product features were disabled. We restricted our analysis to the 2222 BBS routes surveyed in the United States during 2006, the year when the land cover variables were measured. Each BBS route is an approximately 40-km-long stretch of road consisting of 50 ‘stops’, points at which observers record counts of all bird species seen or heard during a 3-min period.

Traditional analyses of BBS data treat either the stop or the route as the sample unit; however, maxent requires data formatted as rasters (spatially referenced grids) and treats the pixel as the sample unit. Thus, we imposed a 25-km² grid over the study area, and for pixels with >1 stop, we classified the pixel as being occupied (yi = 1) if ≥1 detection was made at any of the stops in the pixel or unoccupied (yi = 0) otherwise. Note that only logistic regression made use of the yi = 0 data. Covariate values for each pixel in the United States were used as the ‘background’.
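
The aggregation from stops to pixels described above can be sketched as follows. This is an illustration only: the function names, the tuple layout, the assumption that stop coordinates are already projected to kilometres, and the example stops are all hypothetical. The minimum number of stops a pixel needs is left as a parameter.

```python
import math
from collections import defaultdict

CELL_KM = 5.0  # side length of a square 25 km^2 pixel


def pixel_of(x_km, y_km, cell=CELL_KM):
    """Grid-cell index of a point given in projected kilometres (assumed units)."""
    return (math.floor(x_km / cell), math.floor(y_km / cell))


def rasterize(stops, min_stops=1):
    """Collapse stop-level records to pixel-level occupancy.

    `stops` is an iterable of (x_km, y_km, detected) tuples. A pixel gets
    y = 1 if the species was detected at any of its stops and y = 0 otherwise.
    Pixels with fewer than `min_stops` stops are left out as unsurveyed.
    """
    counts = defaultdict(int)
    detections = defaultdict(int)
    for x_km, y_km, detected in stops:
        cell = pixel_of(x_km, y_km)
        counts[cell] += 1
        detections[cell] += int(bool(detected))
    return {cell: int(detections[cell] > 0)
            for cell, n in counts.items() if n >= min_stops}


# Hypothetical stops: two fall in one pixel, one in a neighbouring pixel.
stops = [(1.0, 1.0, False), (4.0, 2.5, True), (7.5, 1.0, False)]
pixel_y = rasterize(stops)

# The presence-only methods see only the occupied pixels;
# logistic regression also uses the zeros.
presence_only = [cell for cell, yi in pixel_y.items() if yi == 1]
```

Calling `rasterize(stops, min_stops=2)` would keep only pixels containing more than one stop.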

Of the three methods, the logistic regression model makes the most use of the data and thus is expected to outperform the presence-only models. More generally, presence–absence data should always be preferred to presence-only data because observed zeros are informative about the species’ range. Predicted probability maps from logistic regression have a clear interpretation as the probability that a pixel would yield an observation of the species in question. For these reasons, we considered the results of the logistic regression models as the standard against which we compared results of the other two estimators.

Maps of the Carolina wren distribution created using each of the three estimation methods are shown in Fig. 4. Salient points of the analysis are that maxent’s logistic output is less similar to the logistic regression predictions than are the maximum likelihood estimates, and that it is generally inconsistent with the observed data in the sense that the resulting ‘index’ of species range is less defined and more geographically diffuse. From the maps, we see that maxent’s ‘logistic output’ greatly underestimates the probability of occurrence throughout the core of the species’ range and overestimates occurrence probability in regions where the species was never detected. The reason for this bias is the same as for the bias in our simulation study: maxent uses a default intercept value that implies a baseline prevalence of 0·50. Clearly, the maxent predictions will depend on this value, which is not estimated from the data using the maxent procedure, and this subjectivity prohibits a clear interpretation of the index. Thus, while one might obtain more consistency using a different value of this parameter, there is no objective basis for setting this value.

Fig. 4. Maps of the Carolina wren distribution generated using the three estimators applied to Breeding Bird Survey data.

Another important limitation of maxent for modelling these data relates to the difficulty of specifying the desired model of interest. For example, one cannot test for a specific interaction because the software requires that either all or none of the possible interactions are tested. Similarly, one cannot evaluate the possibility of a specific quadratic effect.

Discussion

Inference about occupancy from presence-only data has proved to be elusive. Rather than developing methods for direct inference about species occurrence, ecologists have settled for the production of ill-defined ‘suitability indices’ such as those issued by maxent. However, under random sampling, formal statistical inference about the probability of species occurrence can be achieved from presence-only data using conventional likelihood methods (Lancaster and Imbens 1996). We imagine that inference based on this likelihood should be accessible to practitioners familiar with ordinary statistical concepts.

Our simulation study using a standard logistic regression type of model indicates that occupancy probability is identifiable from presence-only data, consistent with what has been shown in related classes of models (Lancaster and Imbens 1996; Lele and Keim 2006). Some might argue that parametric assumptions are overly restrictive and, as a result, it is better to estimate something vaguely defined, which only might be proportional to occurrence probability. However, the most sensible and natural interpretation of the model underlying maxent is that it also assumes a parametric relationship between ψ(y = 1|z) and covariates, one that is exponential. This choice is widely justified based solely on the incorrect assertion that the marginal probability of occurrence is not identifiable, and not based on any specific benefit of the exponential function. The identifiability problem is specific to the parametric model that is implemented in maxent, and not a general feature of presence-only data. In our view, it does not make sense to forgo estimation of what is an eminently sensible quantity in the absence of any concrete technical or conceptual argument.
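
A small numerical check of this point, assuming the presence-only likelihood takes the form ψ(x_i)/Σ_x ψ(x): under an exponential model the intercept cancels between numerator and denominator, whereas under a logistic model it does not. The covariate values, coefficients and function names below are hypothetical, and the code does not reproduce maxent itself.

```python
import math


def nll(psi, beta, presence_x, background_x):
    """Negative log-likelihood with contributions psi(x_i) / sum_x psi(x)."""
    log_norm = math.log(sum(psi(beta, x) for x in background_x))
    return -sum(math.log(psi(beta, x)) - log_norm for x in presence_x)


def psi_exponential(beta, x):
    """Exponential model: exp(b0 + b1 * x)."""
    return math.exp(beta[0] + beta[1] * x)


def psi_logistic(beta, x):
    """Logistic model: 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))


background_x = [i / 10.0 for i in range(-20, 21)]  # hypothetical covariate values
presence_x = [0.5, 1.0, 1.2, 1.8, 2.0]             # hypothetical presence locations

# Same slope, two different intercepts.
exp_a = nll(psi_exponential, (-1.0, 1.0), presence_x, background_x)
exp_b = nll(psi_exponential, (-4.0, 1.0), presence_x, background_x)
logit_a = nll(psi_logistic, (-1.0, 1.0), presence_x, background_x)
logit_b = nll(psi_logistic, (-4.0, 1.0), presence_x, background_x)

intercept_matters_exponential = abs(exp_a - exp_b) > 1e-9
intercept_matters_logistic = abs(logit_a - logit_b) > 1e-9
```

The exponential likelihood is flat in the intercept, so the data carry no information about it. The logistic likelihood changes with the intercept, which is what allows it to be estimated.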

The ability to estimate occurrence probability parameters from presence-only data naturally requires larger sample sizes than from presence–absence data. Our analysis of data from the North American BBS provides sufficient data for this purpose, but such data may be unavailable in many studies. This was noted by Ward et al. (2009), who qualified their statement about identifiability by noting that

Even when [...occurrence probability...] is identifiable, the estimate is highly variable.

While clearly the precision of estimates of model parameters in any specific application is a matter of sample size and complexity of the model, this does caution that sufficiently precise estimates might not be produced for all applications. We do not view this as a serious deterrent to seeking out estimates for parameters of models that are ecologically sensible independent of whether or not data are available to achieve a certain level of precision. Whether an estimator is ‘highly variable’ in a situation is not relevant if the estimand is the object of inference, and there are not competing (presumably more precise) estimators available for achieving that objective.

We emphasize that random sampling is critical because under this assumption, the marginal probability that a presence-only unit is included in the sample is constant, and thus, the sample inclusion probability does not affect π(x|y = 1) constructed by invocation of Bayes’ rule. The issue of imperfect detection does not affect the development of the estimator, but it does affect interpretation of results. If detection probability, p, is constant spatially, then parameter estimators under the random sampling presence-only model are unaffected. Despite this, imperfect detection poses a number of complications, because it stands to reason that detection probability should be influenced by a whole host of things including nuisance effects such as ‘effort’ – which might include things such as human population or road density, such that what we obtain in many samples is a random sample of presence-only sites from among those that are easy to access or sample. Also, detection probability might be related to ecological processes including population density of the species being studied, such that detection is more likely at high-density sites and less likely at low-density sites (Royle and Nichols 2003). In such cases, we expect to obtain a sample of presence-only sites that is biased toward high-density sites. Despite the importance of imperfect detection, it is possible to accommodate this in formal models for inference about species occurrence probability. So-called ‘occupancy models’ (MacKenzie et al. 2002; Tyre et al. 2003) generalize the logistic regression model to allow for imperfect detection, that is, false negatives, and some work has also been done to accommodate false positives within the occupancy modelling framework (Royle and Link 2006; Miller et al. 2011).
In our view, application of any statistical procedure to presence-only data should acknowledge the core assumptions, seriously consider consequences of their violation on inferences and discuss them in the context of the specific study.
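
The role of constant detection probability can be written out in one line. This assumes that a site enters the presence-only sample only if it is occupied and the species is detected there, and that the likelihood contribution has the Bayes-rule form used for π(x|y = 1). The second expression, with a spatially varying p(x), is included only for contrast.

```latex
% Constant detection probability p cancels from the likelihood contribution:
\pi(x_i \mid y_i = 1)
  = \frac{p\,\psi(x_i)}{\sum_{x} p\,\psi(x)}
  = \frac{\psi(x_i)}{\sum_{x} \psi(x)}

% Spatially varying detection p(x) does not cancel:
\pi(x_i \mid y_i = 1)
  = \frac{p(x_i)\,\psi(x_i)}{\sum_{x} p(x)\,\psi(x)}
```

In the second case the data inform only the product p(x)ψ(x), which is why covariates that drive detection, such as effort or local density, can distort the estimated occurrence relationship.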

maxent is a popular software package for producing species distribution models, but it is not based on a formal model for species occurrence. As such, the focus of maxent, as it is applied in practice, is not on formal inference about the mechanisms responsible for observed species distribution patterns. We believe that it is not widely appreciated that direct inference about occurrence probability can be achieved using standard likelihood methods. We believe that the likelihood approach advanced in this paper offers a better framework for species distribution modelling because it allows users to estimate the fundamental parameter governing species distributions, the probability of occurrence. In this paper, we have also tried to provide context to certain technical facets of maxent that have important consequences. First among these is the implicit assumption equating the conditional probability Pr(x|y = 1) to the specific exponential function referred to as the ‘maximum entropy distribution’, with the implication that occupancy probability is modelled by an exponential model, thereby neglecting estimation of the intercept parameter and forsaking the ability to estimate occurrence probability. Second, we believe that many maxent users are unaware of the relevance of the penalty term that appears fundamental to maxent. To date, there has not been a formal justification of the need, importance or consequences of the penalty in the context of species distribution modelling, and there has been no mention of the possible bias introduced by this procedure. In our view, poorly motivated and justified technical elements of maxent distract from understanding the central inference problem of species distribution modelling.

Acknowledgements

We thank Peter Blank for providing habitat and weather data sets used in the BBS analysis, and members of PWRC's WMS Research Group.