Revisiting resource selection probability functions and single‐visit methods: clarification and extensions
Summary
- Models accounting for imperfect detection are important. Single‐visit (SV) methods have been proposed as an alternative to multiple‐visit methods to relax the closed‐population assumption. Knape & Korner‐Nievergelt (Methods in Ecology and Evolution, 2015) showed that under certain models for the probability of detection, SV methods are statistically non‐identifiable, leading to biased population estimates.
- There is a close relationship between estimation of the resource selection probability function (RSPF) using weighted distributions and SV methods for occupancy and abundance estimation. We explain the precise mathematical condition needed for RSPF estimation, as stated in Lele & Keim (Ecology, 87, 2006, 3021). The same condition, which remained unstated in our papers on SV methodology, is needed for the SV methodology to work. We show that the class of admissible models is quite broad and does not excessively restrict the application of the RSPF or the SV methodology.
- To complement the work by Knape and Korner‐Nievergelt, we study the performance of multiple‐visit methods under the scaled logistic detection function and a much wider set of situations. In general, under the scaled logistic detection function, multiple‐visit methods also lead to biased estimates.
- As a solution to this problem, we extend the SV methodology to a class of models that allows the use of a scaled probability function. We propose a multinomial extension of the SV methodology that can be used to check whether or not the detection function satisfies the RSPF condition. Furthermore, we show that if the scaling factor depends on covariates, then it can also be estimated.
- We argue that the instances where the RSPF condition is not satisfied are rare in practice. Hence, we disagree with the implication in Knape & Korner‐Nievergelt (Methods in Ecology and Evolution, 2015) that the need for the RSPF condition makes SV methodology irrelevant in practice.
Introduction
Occupancy models (MacKenzie et al. 2002) and N‐mixture models (Royle 2004) are popular approaches to deal with imperfect detection of unmarked organisms. These methods, in their original formulations, require replicate visits to sites. For the identifiability of the model parameters, these methods require the population to be closed during the replicate visits; that is, there is no emigration, immigration, birth or death between the visits. This assumption is often difficult to satisfy in practice, and its violation can lead to biased estimates of occupancy or abundance (Rota et al. 2009; Bayne, Lele & Sólymos 2011). Logistical and cost‐related issues also arise naturally when conducting repeated surveys. For example, is it pragmatic to visit many sites a small number of times or a small number of sites many times? Lele, Moreno & Bayne (2012) and Sólymos, Lele & Bayne (2012) proposed a solution where they showed that, under some easily satisfied conditions, single‐visit (SV) surveys can be used to correct for imperfect detection. This avoids the need for the closed population assumption and eliminates the costs associated with repeated visits. The SV approach is logistically simpler and can be applied to historical data sets that lack replicated surveys. It can also be argued that, with the same budget, the SV methodology allows researchers to study larger geographical areas than replicate‐visit methods and thus gain greater generality. Given the financial limitations most ecological studies face, this can be an important deciding factor.
In a recent paper, Knape & Korner‐Nievergelt (2015), henceforth referred to as K&K, showed that under the log‐link and the scaled logistic model for probability of detection, the SV method can estimate relative change in the abundances but not the absolute abundances. They also point out the relationship between their results and possible problems in the estimation of resource selection probability functions (RSPF) as suggested in Lele & Keim (2006). We are very grateful to the authors for pointing out these issues and particularly the relationship with results in Lele & Keim (2006).
The comments by K&K have made us think about our methodology in a rigorous fashion. We realized that in our papers on SV methodology, we should have stated the necessary conditions on the class of permissible models for probability of detection and occupancy under which identifiability holds. Such a condition, which we now refer to as the RSPF condition, was stated in Lele & Keim (2006). If the RSPF condition is satisfied, it is possible to estimate the absolute probability of selection. Similarly, if the models for probability of detection satisfy the RSPF condition, the results related to SV methodology remain valid. We note that both K&K and Hastie & Fithian (2013) observe non‐identifiability because the log‐link and the scaled logistic model do not satisfy the RSPF condition. Fortunately, neither the log‐link nor the scaled logistic model has found much use in practice. In our experience, most models that are actually used in practice do satisfy the RSPF condition. In this paper, we argue that recommendations about statistical methods should not ignore practical considerations, namely how often an extreme situation in which the method fails is likely to arise in practice.
The RSPF condition: permissible class of models
Estimation of the probability of selection
Let f^A(x) and f^U(x) denote the distribution of resources on available and used units, respectively. They are related to each other as:

f^U(x) = π(x; β) f^A(x) / ∫ π(x; β) f^A(x) dx,

where π(x; β) denotes the probability of selecting the resource x, given that it is encountered (Lele & Keim 2006; Lele et al. 2013). It is called the RSPF. By definition, the function π(x; β) is any function that takes values between 0 and 1. The ratio function f^U(x)/f^A(x), the relative probability of selection, is commonly known as the resource selection function. We emphasize that this ratio function need not be an exponential function. For example, if π(x; β) is based on a logit or complementary log–log link, the ratio function f^U(x)/f^A(x) is not an exponential function.
It is well‐established that, given data arising from f^A(x) and f^U(x), one can estimate the ratio f^U(x)/f^A(x). However, because the scientific interest lies in estimating π(x; β), Lele & Keim (2006) asked the question: if we have an estimate of the ratio f^U(x)/f^A(x), what conditions will allow inference on π(x; β)? Or, equivalently, can we identify the parameter β using only the knowledge of the ratio function? The necessary condition for this was found to be as follows: the RSPF π(x; β) should be such that if β ≠ β*, then π(x; β) ≠ K π(x; β*) for any constant K > 0 (Gilbert, Lele & Vardi 1999; Lele & Keim 2006). For ease of reference, we call this the RSPF condition and the class of models that satisfy this condition the RSPF model class. Loosely speaking, the RSPF condition reduces to the following two conditions: (i) not all covariates in the model are categorical, and (ii) the function log π(x; β) is nonlinear in the covariates and involves all components of the parameter vector.
We first note that the exponential function or the scaled logistic (or, in general, any scaled function) with an unknown scaling constant does not satisfy the RSPF condition. If all covariates in the model are categorical with a finite number of categories, it follows from log‐linear model theory that the probability of selection is necessarily modelled by the exponential function. Hence, we need the condition that not all covariates are categorical. In the following, we explain the mathematical reasoning behind the RSPF condition in a simple situation.
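The failure of the exponential (log‐link) model can be seen numerically. The following sketch (ours, not part of the original analysis; parameter values are illustrative) shows that under the log link, an intercept shift rescales π by the same constant K at every x, so the RSPF condition fails, whereas under the logit link no constant K can absorb the change:

```python
import numpy as np

def pi_log(x, b0, b1):
    # exponential RSPF (log link)
    return np.exp(b0 + b1 * x)

def pi_logit(x, b0, b1):
    # logistic RSPF (logit link)
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-2.0, 2.0, 200)

# Log link: an intercept shift rescales pi by the same constant K at every x,
# so beta0 cannot be separated from K and the RSPF condition fails.
r_log = pi_log(x, -1.0, 0.5) / pi_log(x, -2.0, 0.5)
print(float(r_log.std()))  # essentially 0: the ratio is constant in x

# Logit link: the same intercept shift yields a ratio that varies with x,
# so no constant K > 0 satisfies pi(x; beta) = K * pi(x; beta*).
r_logit = pi_logit(x, -1.0, 0.5) / pi_logit(x, -2.0, 0.5)
print(float(r_logit.std()))  # clearly positive
```
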
A necessary step in any model‐fitting exercise is that we have an idea about the form of the model that we want to fit. Hence, we assume that the model form π(x; β) is specified. Given this model specification, the goal of statistical analysis is to infer about the parameter β. First, we note that if we know the ratio π(x; β)/π(y; β), we know the difference log π(x; β) − log π(y; β). For the simplicity of the argument, let us consider the case where there is only one continuous covariate and the function is differentiable in x. Then, knowing the difference log π(x; β) − log π(y; β) for every pair (x, y) is equivalent to knowing the derivative (d/dx) log π(x; β). It follows that only those parameters that are involved in the function (d/dx) log π(x; β) are potentially identifiable. Further, the RSPF condition can be restated as a necessary and sufficient condition: (d/dx) log π(x; β) = (d/dx) log π(x; β*) for all x if and only if β = β*. The RSPF condition restricts the model space for π(x; β) so that (d/dx) log π(x; β) depends on all components of the parameter vector β. This leads to a simple conclusion that the RSPF model class excludes functions that can be written as:

π(x; β) = exp{c(β)} g(x; β′),

where c(β) is not a constant function of β and g involves only the remaining components β′ of the parameter vector. Simple calculus shows that if the RSPF is based on a logit or complementary log–log link function, the RSPF condition is satisfied. Consideration of polynomial functions in the exponent part of the logistic or complementary log–log functions provides us with a very flexible class of models that satisfy the RSPF condition (Fig. 1). Such functions take values between 0 and 1 but may never reach, or even come arbitrarily close to, either of the boundary values (Figs 1 and 2). In Appendix S1 (Supporting Information), we provide a simple program to generate data under any RSPF model and to estimate its parameters. Readers can try different model forms for π(x; β) and see for themselves whether the methods work or not. Remember that if the nonlinearity on the log‐scale is weak, one may need very large sample sizes to get reasonable estimates.
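The derivative criterion above can also be checked numerically. In the sketch below (ours; parameter values illustrative), a scaling constant c enters log π only additively and therefore vanishes from (d/dx) log π, so c is not identifiable from the ratio, while the intercept of a plain logit model survives in the derivative through the nonlinear term:

```python
import numpy as np

def log_pi(x, b0, b1, c):
    # scaled logistic RSPF: pi(x) = (1/c) * expit(b0 + b1*x), c >= 1;
    # log expit(u) = u - log(1 + exp(u))
    return -np.log(c) + (b0 + b1 * x) - np.log1p(np.exp(b0 + b1 * x))

x = np.linspace(-2.0, 2.0, 401)
h = x[1] - x[0]

# The scaling constant c enters log pi only additively, so it vanishes
# from the derivative d/dx log pi: c is not identifiable from the ratio.
d_c1 = np.gradient(log_pi(x, -0.5, 1.0, 1.0), h)
d_c2 = np.gradient(log_pi(x, -0.5, 1.0, 2.0), h)
print(float(np.abs(d_c1 - d_c2).max()))  # essentially 0

# The intercept b0, in contrast, survives in the derivative through the
# nonlinear term, so it is identifiable under the plain logit link.
d_b0 = np.gradient(log_pi(x, 0.5, 1.0, 1.0), h)
print(float(np.abs(d_c1 - d_b0).max()))  # clearly positive
```
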
SV occupancy studies
Let z and x denote the covariates that affect detection and occupancy, respectively. Let p(z; θ) and Ψ(x; β) denote the probability of detection and probability of occupancy, respectively. Again, for the sake of simplicity, we will consider single continuous covariates. It is trivial to see that we can estimate p(z; θ)Ψ(x; β) from SV data. The question is: Given this product, when can we separate out the components?
For notational simplicity, let us denote the product function p(z; θ)Ψ(x; β) by h(x, z, β, θ). We first note that if we know h(x, z, β, θ) and one of the components, say p(z; θ), then we can obtain the other component, Ψ(x; β). Second, note that the ratio p(z; θ)/p(z′; θ) can be obtained from the product function as h(x, z, β, θ)/h(x, z′, β, θ). Hence, it follows that a necessary condition for identifiability is that p(z; θ) belongs to the RSPF model class. Conversely, we may impose the condition that Ψ(x; β) belongs to the RSPF model class. The RSPF condition needs to be satisfied by at least one of the two components. This allows us to decompose the product function into two components. However, to determine which component is detection and which component is occupancy, we need to impose the condition that the set of covariates that affect detection and the set of covariates that affect occupancy are not completely overlapping.
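The cancellation at the heart of this argument can be illustrated numerically. In the small sketch below (ours; link functions and parameter values are illustrative), the ratio of the product function at two detection‐covariate values is free of the occupancy component, which is what lets us peel off the detection component:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def h(x, z):
    # product of detection p(z; theta) and occupancy psi(x; beta),
    # both on a logit link (illustrative parameter values)
    return expit(-0.2 + 1.0 * z) * expit(0.4 - 0.8 * x)

x = np.linspace(-2.0, 2.0, 50)
ratio = h(x, 1.0) / h(x, -1.0)  # h(x, z)/h(x, z') for two fixed z values

# The occupancy component cancels: the ratio equals p(z)/p(z') and does
# not depend on x at all.
print(float(ratio.std()))                                 # essentially 0
print(float(ratio[0]), float(expit(0.8) / expit(-1.2)))   # same value
```
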
Because of the low information content in binary data, estimation of occupancy and detection from single‐survey data is extremely difficult (Welsh, Lindenmayer & Donnelly 2013). As described in Moreno & Lele (2010) and Lele, Moreno & Bayne (2012), one may need to use a penalized likelihood function to stabilize the estimators. Unfortunately, it is not clear how to choose a good penalty function in general. In the Supplementary Information, we use a quasi‐Bayesian approach where the means of the prior distributions for the parameters are determined from the observations themselves rather than from a probabilistic quantification of ‘belief’. This seems to stabilize the estimation process considerably, resulting in estimators that are nearly unbiased. Unfortunately, this estimation method lacks a strong theoretical basis. We are currently exploring the use of expert opinion (Lele & Allen 2006) to stabilize the estimators in this situation.
SV abundance surveys
Compared with occupancy data, abundance data are more informative, and hence, there seems to be little need for stabilizing the estimators. Under the log‐link model for the abundances, it is clear that one can obtain estimates of p(z; θ)exp(Xβ). Hence, following the logic described above, as long as p(z; θ) belongs to the RSPF model class, we can estimate the abundances using SV survey data, irrespective of the link function for the abundance.
Notice that the RSPF model class does not require models to reach the boundary values 0 or 1 for any covariate combination (Fig. 2). In the Supplementary Material, we provide programs that the readers can use to test the results described above. We also note that, given the nonlinear nature of the problem, it is analytically impossible to know whether the solution to the likelihood equation will be unique. One may use the data cloning method (Lele, Nadeem & Schmuland 2010; Sólymos 2010; Campbell & Lele 2014) to diagnose non‐uniqueness of the solution.
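To make the abundance argument concrete, here is a minimal Python sketch (the paper's supplementary code is in R; the parameter values and sample size below are illustrative): marginally, the SV counts are thinned Poisson, Y_i ~ Poisson(λ_i p_i), and maximum likelihood recovers all parameters when the detection link satisfies the RSPF condition and the covariate sets are distinct:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)  # abundance covariate
z = rng.normal(size=n)  # detection covariate (distinct from x)

lam = np.exp(0.5 + 0.8 * x)                  # log link for abundance
p = 1.0 / (1.0 + np.exp(-(0.3 - 1.0 * z)))   # logit link for detection
N = rng.poisson(lam)
Y = rng.binomial(N, p)                       # single-visit counts

# Marginally Y ~ Poisson(lam * p): minimize the Poisson negative
# log-likelihood (constant terms dropped) in the four parameters.
def nll(par):
    b0, b1, t0, t1 = par
    mu = np.exp(b0 + b1 * x) / (1.0 + np.exp(-(t0 + t1 * z)))
    return np.sum(mu - Y * np.log(mu))

fit = minimize(nll, np.zeros(4), method="Nelder-Mead",
               options={"maxiter": 10000, "maxfev": 10000,
                        "xatol": 1e-6, "fatol": 1e-6})
print(np.round(fit.x, 2))  # close to the true values (0.5, 0.8, 0.3, -1.0)
```
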
Sensitivity to model assumptions
We have now clarified the precise mathematical conditions under which single‐survey methodology is valid. Clearly, the log‐link and scaled logistic models used in K&K do not satisfy the RSPF condition, and hence, if the true detection probability function is a scaled logistic function, the estimates are biased. Curiously, however, K&K seem to imply that because true detection function may not necessarily satisfy the RSPF condition, SV methodology should not be used in practice. Our disagreement is with this implication.
Such an implication is very strange on two counts. First, it is well known that the validity of every statistical method depends on assumptions. The results are sensitive to the violation of these assumptions. Secondly, and more importantly, whether the true generating mechanism satisfies all the assumptions can seldom be known. For example, ecologists use maximum likelihood estimators (MLEs) and the associated confidence intervals that are based on the result that the MLEs are consistent and asymptotically normal. This result holds true only if a number of regularity conditions are satisfied by the underlying true mechanism. Aside from the basic requirement that the model parameters are identifiable, the regularity conditions relate to the expected value of the higher order derivatives of the log‐likelihood function. In the dependent data situation, one needs the underlying true mechanism to be a ϕ‐mixing process of a certain order. Population time‐series analysis is a common research activity in quantitative ecology. We are not aware of any papers that prove that the true underlying mechanism satisfies this mixing condition and other regularity conditions. Hierarchical models are also commonly used to analyse ecological data. These models make distributional assumptions about the latent variables. Statistical inferences are sensitive to the distributional assumptions on the latent variables, and in most cases, one cannot test the validity of the latent variable model specification. In many cases, these latent variables do not even correspond to any observable characteristics, and hence, there is not even a potential to test the assumptions in practice. We could continue this list ad infinitum, but the moral of the story is that it is not news that statistical methods are sensitive to model misspecification. Moreover, one cannot always test the validity of the model specification.
We do not take issue with K&K's mathematical finding that the SV method is sensitive to the RSPF condition. As we showed above, one can launch such a criticism against every statistical method that has ever been proposed. We note that the multiple‐visit N‐mixture (MV) method is not criticized for simply being sensitive to the closed population assumption; it is criticized because the closed population assumption is seldom satisfied in practice (Bayne, Lele & Sólymos 2011; Chandler, Royle & King 2011; Dail & Madsen 2011). If this assumption were satisfied in most situations, we would be using the MV method without qualms. It is well known that the standard bootstrap method fails if, among many other regularity conditions, the underlying true distribution is heavy tailed (Athreya 1987). In spite of the possibility that the underlying true distribution may be heavy tailed, we continue to use the bootstrap method in practice because our experience suggests that such situations are rare.
Vast scientific experience with modelling the probability of an event, not only in the field of detection error but also in various other scientific ventures, suggests that most probability models that are actually used in practice do satisfy the RSPF condition. Hence, we claim that detection error models are more likely to satisfy the RSPF condition than not and that we are likely to be correct with our inferences.
As we have already acknowledged, SV methods are sensitive to the violation of the RSPF condition. There are two ways to deal with the issue of sensitivity. One is to be aware of it and accept the (hopefully small) probability that the results are potentially wrong. The other is to modify the method to make it robust against such deviations. Population closure is considered unlikely in practice, and hence, methods are being developed that are robust against this assumption. The SV method replaces the population closure assumption with the RSPF condition. The generalized (or dynamic) N‐mixture method (DM; Dail & Madsen 2011) replaces the population closure assumption by explicitly modelling the population changes from one time point to another. Both these assumptions are liable to fail. For example, underlying population changes occur on a continuous time‐scale, and many different mechanisms drive them. DM models the population change on a discrete time‐scale and with a particular form of transition matrix. There is no way to check the appropriateness of this discrete time population transition model given the observed data. If it is not appropriate, DM will lead to biased estimates. It is simply unrealistic to make statistical methods robust against every possible model misspecification. It is an occupational hazard that any time a statistical method is used to analyse real data, there is always a (hopefully small) possibility that the true data‐generating mechanism is such that assumptions are not satisfied, and hence, the results are completely wrong.
Sensitivity of the generalized N‐mixture method to scaled probability link
Knape & Korner‐Nievergelt show that when detection probability is modelled as scaled logistic, the SV method leads to biased estimators. K&K, citing Zipkin et al. (2014), suggest that the generalized N‐mixture model might have numerical issues under the standard models. However, the behaviour of the model was not studied using scaled link functions. To fill this gap, we present a simulation study to better understand the behaviour of DM under various situations. We conducted the simulation study so that the stated assumptions of the DM are fully satisfied. We follow the models and notation described in Dail & Madsen (2011). We used two continuous covariates that affected abundance and two continuous covariates that affected detectability. In one set‐up, there was no overlap between the abundance and detection covariates. In the other set‐up, the covariates were overlapping: one of the continuous covariates affected both abundance and detection. For the transition parameters, we used constant ‘arrival’ γ = 1 and constant ‘survival’ ω values (ω ranged from 0 to 1 in steps of 0·1). The assumption that transition parameters do not depend on covariates is not realistic. It implies that as time passes, the abundances are less affected by the covariates. However, when these parameters do depend on covariates, the estimation procedure is extremely slow and extensive simulations are nearly impossible to perform. Because the dependence on the covariates vanishes for later time points, we compared the estimated mean abundances under DM and SV only for the first visit, where the abundances do depend on covariates. We considered 200 locations and 3 replicate visits. This is a fairly large sample for carrying out multiple visits in practice. For a smaller number of locations but more replicate visits, the biases are even larger. We used the ‘unmarked’ package (Fiske & Chandler 2011) to estimate the parameters for DM and the ‘detect’ package (Sólymos et al. 2014) to analyse the SV data.
Reproducible R (R Core Team 2014) simulation code can be found in the Appendix S1.
Let us look at the results (Fig. 3) when the RSPF condition is satisfied, that is when 1/c = 1 (c ≥ 1 is the scaling factor in the logit link following the notation of K&K). In this case, both DM and SV methods are nearly unbiased. DM method exhibits increasing negative bias as the populations become more open (ω < 1) and when scaled logistic model (1/c < 1) is used for detection probability. In fact, if ω = 0, DM likelihood is identical to the SV likelihood, and hence, it also requires that the detection function satisfies the RSPF condition. This is reflected in the fact that the pattern in the DM‐based bias is identical to the pattern in SV‐based bias when the dependence between consecutive visits is weak.
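The mechanism behind the SV bias under the scaled logistic link can be sketched directly (a Python illustration under hypothetical parameter values; the paper's simulations use R). Because exp(b0 + b1 x)·expit(t0 + t1 z)/c = exp(b0 + log(1/c) + b1 x)·expit(t0 + t1 z), a plain‐logit SV fit reproduces the mean perfectly except that the scaling 1/c is absorbed into the abundance intercept, so relative abundances are estimated correctly but absolute abundances are biased:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)

lam = np.exp(0.5 + 0.8 * x)
p = 0.5 / (1.0 + np.exp(-(0.3 - 1.0 * z)))  # scaled logistic, 1/c = 0.5
Y = rng.poisson(lam * p)                    # marginal single-visit counts

# Fit the SV model assuming an ordinary (unscaled) logit detection link.
def nll(par):
    b0, b1, t0, t1 = par
    mu = np.exp(b0 + b1 * x) / (1.0 + np.exp(-(t0 + t1 * z)))
    return np.sum(mu - Y * np.log(mu))

fit = minimize(nll, np.zeros(4), method="Nelder-Mead",
               options={"maxiter": 10000, "maxfev": 10000,
                        "xatol": 1e-6, "fatol": 1e-6})
# The abundance slope is recovered, but the intercept is shifted by
# approximately log(1/c) = log(0.5) = -0.69.
print(float(fit.x[0] - 0.5))
```
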
Fig. 3. Mean abundance estimates based on generalized N‐mixture (DM) and single‐visit N‐mixture (SV) models as a function of the scaling constant (1/c) used in the scaled logistic link. Blue‐to‐red coloured lines indicate different values of the survival rate (ω); models exhibited a trend due to a fixed arrival rate (γ = 1) through the T = 3 repeated visits (only estimates for the first visit are shown; see Appendix S1 for R code and the full set of results). The sets of abundance and detection covariates were disjunct (top row) or included a variable in common (bottom row). Lines represent mean values from 120 replicates. The number of locations was 200.
An obvious conclusion from this simulation study is that DM is also not robust against the scaled logistic detection function. Of course, no simulation study can ever determine that one method is better than other methods under every possible scenario. We have always presented, and still consider, SV method as one of the tools in the statistical toolbox available to the ecologists. It is certainly an improvement over the multiple‐visit N‐mixture method and is definitely a reasonable alternative to the generalized N‐mixture method when the dynamic parameters are unknown or when time‐series data are unavailable.
Diagnosing the presence of scaled probability links
For a scientist and a statistician, a mathematical model is not simply a mathematical formula. A mathematical or a statistical model should represent a realistic mechanism. Neither K&K nor Hastie & Fithian (2013) offer a mechanism that might underlie a scaled probability link function. In this section, we propose a few possible mechanisms that could lead to scaled probability models. Furthermore, we show how SV data sets can be used to diagnose the possibility of scaling in the detection function. A constant scaling model by itself is not very useful in practice because it does not further our understanding of the detection process. For a mathematical model to be useful, we should be able to use it not only to understand the detection process but also to apply that knowledge in designing effective surveys. This suggests that we should try to find out whether there are any covariates that affect the scaling factor. Hence, we show how such covariates can be incorporated in the scaling component of the detection model. It again turns out that the RSPF condition becomes important for the identifiability of covariate‐dependent scaling factors.
A constant scaling factor in the detection function or a RSPF may arise because of data entry errors. For example, occasionally an observer, being human, might transcribe ‘absent’ even when he means to write ‘present’. The proportion of such random data entry errors (to be precise, 1 − the proportion of errors) is represented as a constant scaling factor in the detection function, although we hope such errors are infrequent in practice. Similarly, in telemetry studies, occasionally one may miss a GPS location completely at random; that is, the probability of missingness does not depend on the habitat in which the animal is present (Frair et al. 2004). Again, such errors are rare in practice but can be represented as a constant scaling factor in the RSPF model. It is much more likely that the probability of missingness depends on covariates. We show how to deal with that case later in the section.
We first start with an approach to diagnose the presence of a scaling factor in the detection function when ancillary information at a subset of locations is collected. We emphasize that this information can be collected during the SV surveys without any need to revisit the location repeatedly. Thus, logistical requirements are similar to the SV surveys.
Distance sampling extension of the binomial–ZIP model
It often happens that a subset of the data is collected using a different protocol. For example, when professional biologists are conducting surveys at a few locations as part of an otherwise volunteer‐based program, they may collect information about distance classes within which the individuals are observed. It is known that the inferences are highly sensitive to the correct estimation of the distances. Only a highly trained field staff can obtain reliable information on the distances at a subset of the locations in the larger survey.
Suppose that, at site i, the total count is broken down by distance class: let Yi = (Yi1, …, YiK) denote the counts of individuals in K distance classes around the observer, with Yi· = ΣYik the total count, and let Ni ~ Poisson(λi) denote the true abundance. Given Ni, the vector of counts follows a multinomial distribution with cell probabilities determined by the detection probability pi and by qi, the probability that an individual within the truncation distance is available for detection. A constant truncation distance across the survey locations leads to constant qi = q for all locations. This multinomial model, thus, corresponds to the constant scaling factor model proposed by K&K. Moreover, the marginal distribution of Yi·, the total count at the site, is a binomial–Poisson mixture with detection probability pi q, a scaled probability function. Thus, any constant scaling of pi gets absorbed in q. In practice, to account for a large number of zeros, it is often sensible to model Ni as a zero‐inflated Poisson random variable. As in the SV data (Sólymos, Lele & Bayne 2012), it is known that the zero‐inflation parameter and the detection error parameters tend to be confounded. In Appendix 1, we extend the conditional likelihood method to the multinomial–ZIP model. This analysis can be conducted using the function ‘svabuRDm’ in the Appendix S1.
Analysis of the data using only the marginal distribution of Yi will lead to biased estimates of the intercept parameter for the abundance model. On the other hand, analysis based on the multinomial model will lead to unbiased estimates of the same. Hence, the analysis of such a subset of the data can be used to check the assumption that the detection function satisfies the RSPF condition. One can predict the abundances at this subset of locations using two methods: SV method and the multinomial method. If the predicted abundances are substantially different, scaled probability function might be needed for the detection.
To demonstrate that our proposed multinomial model can be used to estimate abundance in the presence of an unknown scaling factor, we performed a simulation study to compare the relative bias in abundance intercept between the 3‐level multinomial model and the SV method. We used the scaled logistic detection probability function with scaling factor q = 0·5 corresponding to 100 m truncation distance and τ = 80 m effective detection radius. Results in Fig. 4 indicate that the 3‐level multinomial model gives nearly unbiased abundance intercept values. We know that the SV method would lead to biased estimates of the abundance. The SV abundance intercept parameter shows a bias approximately equal to log(q) = log(0·5) = −0·69 (Fig. 4). Comparison of results between the 3‐level model and the SV model would alert the researcher about the need to consider a scaled logistic model. When extra information on the distance classes is available for a subset of locations, it is possible to exploit such extra information to test for scaling and use the estimates as offsets in SV models (see Sólymos et al. 2013 for a discussion of detectability offsets).
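For intuition on the magnitude of q, consider a half‐normal distance function, a standard assumption in point‐count modelling (our assumption here; the text does not state the functional form). It gives q(r) = τ²(1 − exp(−r²/τ²))/r², which for the truncation distance r = 100 m and effective detection radius τ = 80 m yields q ≈ 0.5, matching the scaling factor used above:

```python
import numpy as np

def q_halfnormal(r, tau):
    # probability that an individual within truncation distance r (m) is
    # detected, assuming a half-normal distance function with parameter tau (m)
    return tau**2 * (1.0 - np.exp(-(r / tau) ** 2)) / r**2

q = q_halfnormal(100.0, 80.0)
print(float(q))          # ~0.51, close to the q = 0.5 used in the text
print(float(np.log(q)))  # ~ -0.68, close to the stated bias log(0.5) = -0.69
```
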
Incorporating covariates in the scaling function
Suppose a researcher realizes the need for incorporating a scaling factor in the detection function using the diagnostic tool described above or based on their understanding of the detection process. In our opinion, a detection function is not a nuisance parameter that only needs to be estimated to correct the abundance estimates. It should also be used to help design effective surveys in the future. A constant scaling factor is not of much use for such a purpose. One needs to understand why there is a scaling factor and how one can control its value and potentially increase the detection probability. Given the results in Welsh, Lindenmayer & Donnelly (2013), the goal should be to design surveys so that detection probability is high and correcting the abundance estimates is less important. Hence, after diagnosing the presence of the scaling factor, one should strive to model it as a function of covariates. We give an example of such a situation and show how to fit the scaled detection function that depends on covariates. Not surprisingly, the RSPF condition again becomes important for identifiability. Either the scaling function or the original detection function needs to satisfy the RSPF condition.
Consider the following hierarchical model. Let Ni ~ Poisson(λi) denote the true abundance at site i. Given Ni, each individual is available for detection with probability qi and, given availability, is detected with probability pi, where both pi and qi may depend on covariates through appropriate link functions. Notice that the distribution of the observed data given the true counts reduces to (Yi|Ni) ~ Binomial(Ni, pi qi). If we take qi = q to be independent of any covariates, this reduces to the scaled detection probability mentioned by K&K. However, it is far more likely that the scaling factor depends on covariates. The method of conditional likelihood (Sólymos, Lele & Bayne 2012) can be extended to this case. Mathematical details are available in Appendix 2. Analysis of such data can be conducted using the function ‘svabuRD’ in the Appendix S1. In Fig. 5, we present simulation results showing that when qi depends on a covariate, such as point count radius, all the parameters are estimable using the single survey.
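The estimability of a covariate‐dependent scaling factor from a single survey can be sketched as follows (a Python illustration with hypothetical covariates and parameter values; the ‘svabuRD’ function implements the actual conditional‐likelihood method): the marginal SV counts are Poisson(λi pi qi), and with distinct covariates driving each logit component, maximum likelihood recovers the slope parameters of all three processes:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 8000
x = rng.normal(size=n)  # abundance covariate
z = rng.normal(size=n)  # detection covariate
w = rng.normal(size=n)  # covariate driving the scaling (availability) factor

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

lam = np.exp(0.5 + 0.8 * x)
p = expit(0.3 - 1.0 * z)
q = expit(-0.2 + 0.7 * w)     # covariate-dependent scaling factor
Y = rng.poisson(lam * p * q)  # (Y|N) ~ Binomial(N, p*q), N ~ Poisson(lam)

# Poisson negative log-likelihood (constant terms dropped) in 6 parameters.
def nll(par):
    b0, b1, t0, t1, a0, a1 = par
    mu = np.exp(b0 + b1 * x) * expit(t0 + t1 * z) * expit(a0 + a1 * w)
    return np.sum(mu - Y * np.log(mu))

fit = minimize(nll, np.zeros(6), method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000,
                        "xatol": 1e-6, "fatol": 1e-6})
print(np.round(fit.x, 2))  # slopes close to (0.8, -1.0, 0.7)
```
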
Fig. 5. Simulation results for the model with covariate‐dependent scaling. Mean abundance, mean availability and mean probability of detection corresponding to the three processes are shown in the right panel. Box‐plots represent simulation results based on 100 replicates. Bias was calculated as (estimate − true value), and relative bias was defined as [(estimate − true value)/true value].
Conclusions
All statistical analyses inherently depend on models and assumptions. As practicing scientists, we are well aware of the reality that we never know whether or not the true data‐generating mechanism satisfies these assumptions. In spite of this, we continue to use statistical methods for data analysis as long as the assumptions are not unrealistic. We consider the risk of being wrong occasionally as simply an occupational hazard.
In this paper, we have clarified the conditions under which single‐survey methodology is valid. We argued that the situations under which the RSPF and SV methodologies fail are rare. This is further emphasized by the lack of any references to published studies where the log or scaled logistic model is preferred over the commonly used link functions for the probability of detection. We also illustrated that the generalized N‐mixture method does not protect against the violation of the RSPF condition. If the population is completely open (ω = 0), the DM likelihood reduces to the SV N‐mixture likelihood, and as such, parameters are non‐identifiable under the DM method unless the detection function satisfies the RSPF condition. We discuss various mechanisms that may lead to scaling in the detection function. We discuss a practical method, based only on a single visit to the survey locations, to diagnose the presence of a scaling factor. If the scaling factor is substantially different from 1, one may need to think about possible mechanisms for scaling. We showed that if the scaling factor is a function of covariates, SV methodology is useful to estimate the parameters in the scaling function.
In practice, when designing surveys, one has to strike a balance between the increased cost of multiple visits, the possibility that critical assumptions, such as independence of surveys, fail, and the possibility that the RSPF condition is not satisfied by the true model of detection when aiming to make statements about abundance and occupancy. Even a cursory look at the current literature suggests that the detection models actually in use do belong to the RSPF model class. This fact, together with the breadth of the RSPF model class, suggests that one is on reasonably safe ground in using the SV method for data analysis and inference.
Acknowledgements
SRL was partially supported by the NSERC Discovery grant. Comments from E. Bayne, J. Wright, the associate editor, an anonymous reviewer, and J. Knape on earlier versions improved the paper. This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada Calcul Canada (www.computecanada.ca).
Data accessibility
All data used in this manuscript are present in its Supporting Information. R scripts for data generation and analysis are provided as online Supporting Information and at http://github.com/psolymos/detect/tree/master/extras/revisitingSV; they can be used to reproduce the simulations and results presented in this paper.
Appendix 1
Conditional maximum likelihood estimation of the Multinomial–ZIP model parameters

Conditional on Ni, the observed counts (yi1, …, yiJ) follow a multinomial distribution. The probability mass function for this probability distribution is given by:

$$P(y_{i1}, \ldots, y_{iJ} \mid N_i) = \frac{N_i!}{y_{i0}! \prod_{j=1}^{J} y_{ij}!} \, \pi_{i0}^{y_{i0}} \prod_{j=1}^{J} \pi_{ij}^{y_{ij}},$$

where $y_{i0} = N_i - y_{i\cdot}$ is the unobserved count portion of the unknown variable $N_i$ ($y_{i0} = N_i$ when $y_{i\cdot} = 0$), and $\pi_{i0} = 1 - \sum_{j=1}^{J} \pi_{ij}$ is the corresponding cell probability. The derivation follows along the same lines as in Sólymos et al. (2012).


Hence,

$$P(y_{i1}, \ldots, y_{iJ} \mid y_{i\cdot} > 0) = \frac{\prod_{j=1}^{J} e^{-\lambda_i \pi_{ij}} (\lambda_i \pi_{ij})^{y_{ij}} / y_{ij}!}{1 - e^{-\lambda_i (1 - \pi_{i0})}},$$

which is free of the zero-inflation parameter. The Supporting Information provides code to simulate data under this model and estimate the parameters.
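As a quick numerical check of the claim that conditioning on a non-empty survey removes the zero-inflation parameter from the Multinomial–ZIP likelihood, the following Python sketch (not the paper's R code; the value of λ and the cell probabilities are hypothetical) evaluates the conditional cell-count probability under two different zero-inflation values φ:

```python
import numpy as np
from scipy.stats import poisson

lam = 3.0                       # Poisson mean of the latent abundance N
pi = np.array([0.3, 0.2, 0.1])  # observable cell probabilities (pi_0 = 0.4 unobserved)

def cond_pmf(y, phi):
    # Marginally, the observed cells are independent Poissons with means lam*pi_j,
    # zero-inflated as a block; condition on the total observed count being > 0.
    p_y = (1 - phi) * np.prod(poisson.pmf(y, lam * pi))
    p_all_zero = phi + (1 - phi) * np.exp(-lam * pi.sum())
    return p_y / (1 - p_all_zero)

y = np.array([2, 1, 0])
a = cond_pmf(y, 0.2)
b = cond_pmf(y, 0.5)  # phi cancels: a and b agree to machine precision
```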
Appendix 2
An example where scaled link function for detection probability is applicable

Suppose (Yi | Ni) ~ Binomial(Ni, piqi), where the latent abundance Ni follows a Poisson distribution whose mean λi is determined by the population density Di, and Di is the population density that we model using log-link and covariates. The marginal distributions under the zero-inflated Poisson model and the corresponding negative binomial models can be derived in a similar fashion. It is also trivial to extend the method of conditional likelihood (Sólymos et al. 2012) to this case using the fact that the conditional distribution P(Yi = yi | Yi > 0) is independent of the zero-inflation parameter when the distribution of Ni is zero inflated. For example, for the zero-inflated Poisson model it is given by

$$P(Y_i = y_i \mid Y_i > 0) = \frac{e^{-\lambda_i p_i q_i} (\lambda_i p_i q_i)^{y_i}}{y_i! \left(1 - e^{-\lambda_i p_i q_i}\right)}, \qquad y_i = 1, 2, \ldots$$

Under the RSPF condition, and provided that the covariates that affect ‘p’ and ‘q’ are not completely overlapping, this model leads to identifiable parameters. See Lele et al. (2012) and the present paper for a discussion of identifiability of the single-visit Binomial–Binomial mixture, which is exactly the same as the piqi component discussed here. The Supporting Information provides code to simulate data under this model and estimate the parameters.
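The φ-free conditional likelihood can be illustrated with a small simulation. The sketch below is Python rather than the paper's R code, and the values of λipiqi and φ are hypothetical (with λipiqi held constant across sites for simplicity). Zeros are discarded and the product λpq is recovered by maximizing the zero-truncated Poisson likelihood, which never involves φ:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
n = 5000
lam_pq = 1.8  # product lambda_i * p_i * q_i, held constant for simplicity
phi = 0.3     # zero-inflation parameter

# Simulate from the zero-inflated Poisson marginal of the single-visit counts.
zero = rng.random(n) < phi
Y = np.where(zero, 0, rng.poisson(lam_pq, size=n))

# Conditional likelihood: keep only Y > 0 and fit a zero-truncated Poisson;
# phi drops out of this likelihood entirely.
y = Y[Y > 0]

def nll(mu):
    # Negative log-likelihood of the zero-truncated Poisson (constants dropped).
    return -np.sum(y * np.log(mu) - mu - np.log1p(-np.exp(-mu)))

fit = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
# fit.x estimates lam_pq without ever estimating phi
```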