Volume 1, Issue 2, p. 131–139

Design of occupancy studies with imperfect detection

Gurutzeta Guillera-Arroita (correspondence author; e-mail: [email protected]), Martin S. Ridout and Byron J. T. Morgan

National Centre for Statistical Ecology, School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, CT2 7NZ, UK

First published: 04 May 2010

Summary

1. Occupancy is an important concept in ecology. To obtain an unbiased estimator of occupancy it is necessary to address the issue of imperfect detection, which requires conducting replicate surveys at the sites being sampled. As the allocation of total effort can be done in different ways, occupancy studies should be designed carefully to ensure an efficient use of available resources.

2. In this paper we address the design of single-season single-species occupancy studies with a focus on: (1) issues relating to small sample sizes and (2) the potential relevance of including the precision of the detectability estimator as a criterion for design. We explore analytically the model with constant probabilities and examine how bias and precision are affected by the numbers of sites and replicates used.

3. We show how, for small sample sizes, the estimator properties depart from those predicted by large sample approximations, emphasize the need to use simulations when designing for small sample sizes and provide a new software tool that can assist in this process.

4. We offer advice on the amount of replication needed when the probability of detection is a quantity of interest and show that, in this case, it is more efficient to reduce the number of sites and increase the amount of replication per site compared with situations where only occupancy is of concern.

5. Synthesis and applications. It is essential to have clearly stated objectives before starting a study and to design the sampling accordingly. As the allocation of effort into replication and sites can be done in different ways, occupancy studies should be designed carefully to ensure an efficient use of available resources. To avoid waste, it is crucial to anticipate the quality of the estimates that can be expected from a particular study design. The discussion and guidance provided here are of special interest for those designing occupancy studies with small sample sizes, something not uncommon in the context of ecology and conservation.

Introduction

Occupancy, defined as the proportion of sites occupied by a species, is a state variable commonly used in ecology for the modelling of habitat relationships, metapopulation studies and wildlife monitoring programmes. When species detection is imperfect, occupied sites may be classified as unoccupied based on survey data. If not accounted for, these false absences lead to underestimates of occupancy. The issue of imperfect detection in the context of occupancy studies has received much attention in recent years. MacKenzie et al. (2002) presented a modelling approach for addressing the simultaneous estimation of occupancy and detectability which has since been developed in a number of ways including extensions to cover multiple seasons (MacKenzie et al. 2003), multiple species (MacKenzie, Bailey, & Nichols 2004) and heterogeneity in detection probability (Royle 2006). To account for imperfect detection when modelling occupancy, replicate surveys have to be carried out at sampled sites. Replication is commonly achieved by conducting repeated surveys at different points in time or by surveying different sectors of each sampled site. Other methods include independent surveys carried out by different observers within a single visit or the simultaneous use of independent detection methods. The need for replication creates a trade-off between the number of sites to survey and the number of replicate surveys to carry out per site.

Several papers have addressed the issue of study design in the context of occupancy modelling. MacKenzie et al. (2002), Tyre et al. (2003) and Field, Tyre, & Possingham (2005a) provided some guidance on the number of replicate surveys needed based on simulations. MacKenzie & Royle (2005) presented the first detailed investigation on this subject, giving advice on general issues and providing specific recommendations for the most efficient allocation of survey effort under three sampling schemes and different cost function scenarios. They based their guidance on analytic results obtained by considering the large sample properties of the maximum-likelihood estimator for occupancy probability under a model with constant probabilities of occupancy and detectability. Bailey et al. (2007) later described a software tool developed for exploring design trade-offs for different occupancy models, either using analytic approximations or simulations. They presented an example and noted that the use of simulations is important when working with small sample sizes.

Small sample sizes are not uncommon in ecological studies. In particular they are frequently encountered in surveys linked to conservation projects, as these often have limited resources and tend to focus on rare species. Pilot studies, by their nature, also tend to deal with relatively small amounts of data. Under these circumstances the large sample approximations may be poor. In our experience, the effects of working with small sample sizes are not always addressed in practice and the use of simulations as a tool for assisting study design appears not to be widespread.

While for many studies the primary object of inference is the probability of occupancy, with the probability of detection being regarded merely as a nuisance parameter, there are circumstances when the latter is a quantity of interest in its own right. For instance, this is the case when the estimates obtained from a (pilot) study are to be used as input for the design of subsequent monitoring protocols (e.g. Field et al. 2005b; Pellet 2008) or when there is interest in evaluating the performance of detection methods (e.g. Mortelliti & Boitani 2008). Detectability may also be of interest when it reflects some important characteristic of the ecological system. For example, it could be associated with reproduction (Best & Petersen 1982). Detectability estimates provide information on the number of times that a site needs to be visited before stating with a given degree of certainty whether the species of interest is present or absent at that particular location. This information can be especially relevant in the context of environmental impact assessments. Under these scenarios there is a benefit in obtaining a precise estimate of detection probability.

In this paper we address the design of single-season single-species occupancy studies with a focus on: (1) issues relating to small sample sizes and (2) the potential relevance of including the precision of the detectability estimator as a criterion for design. We investigate analytically the quality of the maximum-likelihood estimators for the occupancy model with constant probabilities of occupancy and detection. We also show how bias and precision are affected by the number of sites and replicates employed and illustrate how the predictions made by large sample theory diverge from the actual distribution of the estimator when sample sizes are small. We discuss how studies are designed using recommendations based on asymptotic approximations and provide guidance to assist survey design when detection probability is a parameter of interest. Finally, we describe the design procedure with an emphasis on the need to use simulations as a tool for sampling design when the sample size is small and provide a numerical example to illustrate the steps. In this context we present a new software application (Single-season Occupancy study Design Assistant, soda) that can assist in the process by automating the search for a suitable design.

Modelling occupancy under imperfect detection: estimator properties

The detailed formulation of occupancy models with imperfect detection is well covered in the literature (e.g. MacKenzie et al. 2006); so, here we limit the description to key aspects relevant to our analysis. Let ψ be the probability of occupancy, p the probability of detection, S the number of sites to be surveyed and K the number of replicate surveys per sampling site. We assume that both occupancy and detection probabilities are constant in time and space. Although in practice this simplification may not always be reasonable, it is necessary in order to provide general study design guidelines. We use the maximum-likelihood approach for model fitting as proposed by MacKenzie et al. (2002) and assume a standard survey design with K surveys carried out in all S sampling sites.

The likelihood function corresponding to the constant probability occupancy model for a standard design can be written in a compact form as follows:
L(ψ, p) = ψ^SD · p^d · (1 − p)^(K·SD − d) · [ψ(1 − p)^K + (1 − ψ)]^(S − SD)    (eqn 1)

where SD is the number of sites where the species was detected at least once, d is the total number of detections in the detection history and p* = 1 − (1 − p)^K is the probability of detecting the species in at least one of the K surveys carried out at an occupied site. Note that (SD, d) is a sufficient statistic as it summarizes the detection history with no loss of information. MacKenzie et al. (2006, p. 95) point out that the analytical solution for the maximum-likelihood parameter estimates (MLEs) satisfies the equations:
ψ̂ = SD/(S·p̂*),   p̂/p̂* = d/(K·SD),   where p̂* = 1 − (1 − p̂)^K    (eqn 2)

That is, as p̂* gets smaller, the estimate of occupancy (ψ̂) increases compared with the naïve estimate obtained by assuming that the species was not missed at any of the occupied sites (SD/S). When evaluating the performance of this model via simulations, MacKenzie et al. (2002) noted that, when working with small probabilities of detection, they sometimes obtained estimates of occupancy that tended to 1. By studying the model analytically we identified the detection histories that result in boundary estimates (ψ̂ = 1). It can be shown that the MLE expressions given by eqn 2 are only valid as long as the observed detection history fulfils the following condition:
(S − SD)/S ≥ [1 − d/(S·K)]^K    (eqn 3)
and that, otherwise, the MLEs are:
ψ̂ = 1,   p̂ = d/(S·K)    (eqn 4)

Eqn 3 indicates that the occupancy estimate hits the boundary when the proportion of sites where the species was not detected (left term) is smaller than the proportion of zeros in the history raised to the power of K (right term). This suggests that boundary estimates may be an issue when working with small sample sizes and low probabilities, especially when the amount of replication is small.
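To make these expressions concrete, here is a minimal Python sketch (illustrative only, not the authors' soda tool) that evaluates the likelihood of eqn 1 at the sufficient statistic (SD, d) and computes the MLEs via eqns 2–4, solving p̂/p̂* = d/(K·SD) by bisection:

```python
import math

def occ_loglik(psi, p, S, K, SD, d):
    """Log-likelihood (eqn 1) of a detection history summarized by (SD, d)."""
    q = (1.0 - p) ** K  # probability an occupied site yields no detections
    return (SD * math.log(psi) + d * math.log(p)
            + (K * SD - d) * math.log(1.0 - p)
            + (S - SD) * math.log(psi * q + 1.0 - psi))

def occupancy_mle(S, K, SD, d):
    """MLEs (psi_hat, p_hat) from the sufficient statistic (SD, d), eqns 2-4."""
    if SD == 0:
        return 0.0, float('nan')  # no detections at all: p is not estimable
    # eqn 3: if the proportion of sites without detections falls below the
    # proportion of zeros in the history raised to the power K, the occupancy
    # estimate hits the boundary and eqn 4 applies
    if (S - SD) / S < (1.0 - d / (S * K)) ** K:
        return 1.0, d / (S * K)
    # interior solution (eqn 2): p_hat solves p / p* = d / (K * SD), with
    # p* = 1 - (1 - p)^K; p/p* increases in p, so bisection converges
    target = d / (K * SD)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - (1.0 - mid) ** K) < target:
            lo = mid
        else:
            hi = mid
    p_hat = 0.5 * (lo + hi)
    psi_hat = SD / (S * (1.0 - (1.0 - p_hat) ** K))
    return psi_hat, p_hat
```

As a sanity check, summing exp(occ_loglik) over all 2^(SK) possible detection histories returns 1, and a history in which the species is detected on every visit to every site produces the boundary solution of eqn 4.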

A graphical representation of all the MLEs obtainable for a given design illustrates the issues resulting from small sample sizes and the effect that increasing the number of sites or replicates has on the quality of the estimates (Fig. 1). Given a finite number of sites (S) and replicates (K) there is a finite number of histories that can be theoretically observed (i.e. 2^SK possible combinations of zeros and ones). Under the model with constant probabilities of occupancy and detectability all those histories that share the same SD and d produce the same estimates of occupancy and detection (eqn 2). This results in (S + 1)[1 + S(K − 1)/2] possible estimate points in the parameter space (dots in the figure). When sample sizes are very small, there are only a few distinct detection histories that can be observed and, correspondingly, few possible parameter estimate values (Fig. 1a). The parameter space is sparsely covered by the MLEs, which means that the estimator is not precise, an effect more pronounced as probabilities of occupancy and detection get smaller. In fact there are no solutions covering the area corresponding to the lowest probabilities, which causes the estimator to be substantially biased in this region. As more samples are added to the study, the MLE solutions cover more of the probability space. Additional replication results in a better coverage of the area corresponding to low probabilities of detection (Fig. 1b), while an increase in the number of sampling sites achieves a more even coverage in the area corresponding to high probabilities of detection (Fig. 1c). When the amount of replication is large, the MLEs coincide with the naïve estimates in most cases, as p* is close to unity except for very low values of p.
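The count of distinct estimate points quoted above is easy to verify by direct enumeration of the possible (SD, d) pairs; a short Python sketch:

```python
def n_estimate_points(S, K):
    """Number of distinct (SD, d) sufficient-statistic values for S sites and
    K replicates: SD = 0 forces d = 0, while each SD >= 1 allows
    d = SD, ..., K*SD, i.e. SD*(K - 1) + 1 possible values."""
    return 1 + sum(SD * (K - 1) + 1 for SD in range(1, S + 1))

# agrees with the closed form (S + 1)[1 + S(K - 1)/2]; e.g. S = 10, K = 3:
print(n_estimate_points(10, 3))  # → 121
```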

Fig. 1. Maximum-likelihood estimates for all possible detection histories that can be observed under a design with (a) S = 10, K = 3, (b) S = 10, K = 9, (c) S = 30, K = 3, (d) S = 30, K = 9, (e) S = 100, K = 3, (f) S = 100, K = 9. No assumptions are made here about true values of the parameters. Each dot represents a pair of estimates (ψ̂, p̂) which corresponds to the solution for all histories summarized by the sufficient statistics (SD, d). There are (S + 1)[1 + S(K − 1)/2] different possible (SD, d) combinations. Dotted lines connect estimates for histories that share the same SD, from 1 (bottom line) to S (top line). Moving along the lines from right to left, each dot corresponds to histories with a decreasing value of d, from a maximum of K·SD to a minimum of SD. At the right-most side of the graph the estimates correspond to the naïve estimates and ‘bend’ upwards as detectability (p) gets smaller. For clarity, (e) and (f) have been plotted without lines and using smaller markers.

Likelihood theory provides tools for approximating the properties of the MLEs when the sample size is large. The theory indicates that the estimators are asymptotically unbiased and thus mean square error (MSE) and variance are the same. The asymptotic variance–covariance matrix can be derived by inverting the information matrix (i.e. the expectation of the second derivative of the negative log-likelihood with respect to the parameters, Severini 2000, p. 91). MacKenzie & Royle (2005) presented the formula for the asymptotic variance of the occupancy estimator:
var(ψ̂) = (ψ/S){(1 − ψ) + (1 − p*)/[p* − K·p·(1 − p)^(K−1)]}    (eqn 5)

where S is the number of sites and TS = SK is the total effort assigned to the survey. They noted that as p* approaches unity, the variance of ψ̂ reduces to the variance of a binomial proportion [i.e. ψ(1 − ψ)/S]. It can be shown that the remaining elements of the variance–covariance matrix are:
cov(ψ̂, p̂) = −p(1 − p)^K / {S·[p* − K·p·(1 − p)^(K−1)]}    (eqn 6)

var(p̂) = p(1 − p)·p* / {TS·ψ·[p* − K·p·(1 − p)^(K−1)]}    (eqn 7)

As p* approaches unity, the covariance tends to zero and the variance of p̂ approaches p(1 − p)/(TS × ψ). For a fixed total effort, as replication increases, the variance of ψ̂ first decreases, as replication allows non-detections at occupied sites to be distinguished from true absences, and then starts increasing towards the binomial proportion ψ(1 − ψ)/S as the number of sampling sites falls. The variance of p̂ also starts by decreasing as more replication is added to the design, but then remains at about a constant level, as it is dictated by the total amount of effort, no matter whether that effort is spent on additional sites or replicates.
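These asymptotic expressions are straightforward to code; a minimal Python sketch of the variance–covariance elements (eqn 5 from MacKenzie & Royle 2005, plus the covariance and detectability terms described above):

```python
import math

def asymptotic_varcov(psi, p, S, K):
    """Large-sample variance-covariance of (psi_hat, p_hat), eqns 5-7.
    Returns (var_psi, cov, var_p); TS = S*K is the total survey effort."""
    pstar = 1.0 - (1.0 - p) ** K
    denom = pstar - K * p * (1.0 - p) ** (K - 1)
    TS = S * K
    var_psi = (psi / S) * ((1.0 - psi) + (1.0 - pstar) / denom)  # eqn 5
    cov = -p * (1.0 - p) ** K / (S * denom)                      # eqn 6
    var_p = p * (1.0 - p) * pstar / (TS * psi * denom)           # eqn 7
    return var_psi, cov, var_p
```

For ψ = 0·2, p = 0·3, K = 5 and S = 70 the first element gives SE(ψ̂) ≈ 0·057, the value used in the worked example later in the paper; and as K grows, cov → 0 and var(p̂) → p(1 − p)/(TS·ψ), matching the limits noted above.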

Design of occupancy studies

Large sample approximations and simulations are tools that can assist in the design of occupancy studies. Here, we comment on these two approaches and provide an overall picture of the design process with an emphasis on small sample sizes. Note that, to design a study, we need to assume values for the parameters to be estimated.

Optimal design based on asymptotic approximations

The asymptotic variance approximations can be of use when designing occupancy studies as they allow us to explore analytically how estimator precision changes with the design parameters. MacKenzie & Royle (2005) derived study design recommendations based on the asymptotic approximation of the variance of the occupancy estimator (Table 1a). Recommendations can also be produced incorporating the variance of p̂ as part of the design criterion, which is useful when detectability is itself a parameter of interest. There are different criteria that can be used for optimal design; for a discussion of their merits, see Atkinson & Donev (1992, p. 106). One common approach is to minimize the trace of the variance–covariance matrix, that is, the sum of the variances of the parameters (the variances of ψ̂ and p̂ in our case). This is called A-optimality and it gives equal weight to the two variances rather than minimizing the variance of each parameter separately. Alternatively, D-optimality minimizes the determinant of the variance–covariance matrix. For large samples, the maximum-likelihood estimators ψ̂ and p̂ are approximately normally distributed, and a D-optimal design minimizes the area of the elliptical confidence region based on this distribution. Here, we derive the optimal number of replicate surveys to be carried out at each sampling site using the A-optimality (Table 1b) and D-optimality (Table 1c) criteria. The optimal number of replicates increases when the variance of p̂ is included in the criterion, with larger changes observed for low probabilities of occupancy and low probabilities of detection respectively. As happens when considering the variance of the occupancy estimator only, the optimal number of replicate surveys in these two cases is determined by the parameter values (ψ and p) irrespective of the total effort assigned to the survey (TS).
Note that the optimal number of replicates is the same regardless of whether the study is designed to minimize survey effort or estimator variance (measured through any of the three criteria above).

Table 1. Optimum number of replicate surveys to be carried out at each sampling site for a standard design with constant per-survey costs, when the criterion for design is based on minimizing (a) the variance of ψ̂ (MacKenzie & Royle 2005), (b) the sum of the variances of ψ̂ and p̂ (A-optimality) and (c) the determinant of the variance–covariance matrix (D-optimality). All values are based on estimator properties that assume large sample sizes.
p \ ψ: 0·1 0·2 0·3 0·4 0·5 0·6 0·7 0·8 0·9
(a)
p 0·1 14 15 16 17 18 20 23 26 34
0·2 7 7 8 8 9 10 11 13 16
0·3 5 5 5 5 6 6 7 8 10
0·4 3 4 4 4 4 5 5 6 7
0·5 3 3 3 3 3 3 4 4 5
0·6 2 2 2 2 3 3 3 3 4
0·7 2 2 2 2 2 2 2 3 3
0·8 2 2 2 2 2 2 2 2 2
0·9 2 2 2 2 2 2 2 2 2
(b)
p 0·1 19 16 17 17 19 20 23 27 34
0·2 13 10 9 9 9 10 11 13 16
0·3 10 7 7 6 6 7 7 8 10
0·4 8 6 5 5 5 5 5 6 7
0·5 7 5 4 4 4 4 4 5 6
0·6 6 4 4 3 3 3 3 4 4
0·7 5 4 3 3 3 3 3 3 4
0·8 4 3 3 2 2 2 2 2 3
0·9 3 2 2 2 2 2 2 2 2
(c)
p 0·1 19 19 20 21 23 24 27 30 36
0·2 9 10 10 11 11 12 13 14 17
0·3 6 6 7 7 7 8 8 9 11
0·4 5 5 5 5 5 6 6 7 8
0·5 4 4 4 4 4 4 5 5 6
0·6 3 3 3 3 3 4 4 4 5
0·7 3 3 3 3 3 3 3 3 4
0·8 2 2 2 2 2 2 3 3 3
0·9 2 2 2 2 2 2 2 2 2
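The table entries can be reproduced by a brute-force search over K; a Python sketch (because the optimum does not depend on the total effort TS, the common 1/TS factor is dropped from each criterion):

```python
def optimal_K(psi, p, criterion='a', K_max=50):
    """Optimal replicates per site for fixed total effort TS = S*K, under
    (a) min var(psi_hat), (b) A-optimality (trace of the variance-covariance
    matrix) or (c) D-optimality (its determinant). The common 1/TS (or 1/TS^2)
    factor does not affect the location of the minimum and is omitted."""
    def score(K):
        q = (1.0 - p) ** K
        pstar = 1.0 - q
        denom = pstar - K * p * (1.0 - p) ** (K - 1)
        var_psi = K * psi * ((1.0 - psi) + q / denom)   # TS * var(psi_hat)
        var_p = p * (1.0 - p) * pstar / (psi * denom)   # TS * var(p_hat)
        cov = -K * p * q / denom                        # TS * cov
        if criterion == 'a':
            return var_psi
        if criterion == 'b':
            return var_psi + var_p
        return var_psi * var_p - cov ** 2
    return min(range(2, K_max + 1), key=score)

# e.g. optimal_K(0.5, 0.3, 'a') reproduces Table 1a's entry of 6 replicates
```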

Design based on a simulation study

Likelihood theory tells us that asymptotic approximations are good when the sample size is large enough; however, it does not tell us how large it needs to be. In Fig. 2 we illustrate how the properties of the MLEs under the constant occupancy model depart from the asymptotic approximation for a combination of design parameter values that is realistic within the context of ecological studies (168 units of total effort). The difference between the approximated and actual estimator distributions is larger for low probabilities of occupancy and detection. Designing an occupancy study based on asymptotic properties of the estimators is therefore not appropriate if the intended sample size is small, especially when dealing with rare and elusive species. Under these circumstances, the actual quality of the estimators may be very different from that predicted by the asymptotic variance expressions and the design identified as optimal using large sample approximations may not be the best available, as illustrated in the example section. In these cases the most appropriate method for designing a study relies on the use of simulations.

Fig. 2. Actual (top row) and asymptotic (bottom row) distributions of the MLEs for different underlying probabilities of occupancy and detectability (marked with a triangle) under an optimal design with 168 units of total effort: (a) 12 sites and 14 replicates (S = 12, K = 14); (b, c) 56 sites and 3 replicates (S = 56, K = 3). Plots show the part of the distribution that contains 0·999 probability. For small probabilities of occupancy and detection the estimators have strong bias, with many of the detection histories resulting in boundary estimates (dots at the top left of the plot). As probabilities increase, the true distribution of the MLEs becomes closer to the bivariate normal distribution predicted by the asymptotic approximation.

Sampling design procedure for occupancy surveys: the big picture

The design of an occupancy survey (Fig. 3) should start with a clear statement of the project requirements in terms of the quality of the estimators (e.g. maximum allowed variance) and total survey effort available. With this in mind the design can be made to either (A) maximize the quality of the estimators or (B) minimize the effort employed. We also need to assume initial values for the parameters to be estimated. These can be based on the results of a pilot study, on studies carried out for the same or similar species in comparable circumstances or on expert opinion. The first issue to address is whether the sample size can be considered large enough to base the choice of design parameters on asymptotic approximations. If the total effort available is large and the probabilities of occupancy and detectability are expected to be relatively high, the design can safely be based on these approximations. Nevertheless, we recommend verifying that the approximations are valid before proceeding to collect data. This involves running a simulation with the chosen design parameters (K and S) and given parameter assumptions (ψ and p). If the sample size is not large enough for the asymptotic approximation to be good, the design needs to be based on a simulation study, in which the quality of estimators is evaluated for different combinations of design parameters. Software exists that simulates the model for a given set of K, S, ψ and p to evaluate estimator bias and variance (genpres; Bailey et al. 2007). Program soda offers the possibility of running an automated search for a suitable design which explores different combinations of K and S given the assumptions and requirements specified by the user. The tool allows the user to select whether priority is given to maximizing estimator quality or minimizing total effort, and allows detectability to be incorporated as part of the design criterion.
Program soda can be freely downloaded at http://www.kent.ac.uk/ims/personal/msr/soda.html. An R function for evaluating the performance of a given design is available at the same site.
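For readers without access to these tools, the simulation loop itself is straightforward; a minimal Python sketch (illustrative only, not soda or genpres) that estimates the actual RMSE of ψ̂ and the frequency of boundary estimates for a candidate design:

```python
import math, random

def fit_mle(S, K, SD, d):
    """Closed-form MLEs (eqns 2-4); p_hat found by bisection when interior."""
    if SD == 0:
        return 0.0, float('nan')
    if (S - SD) / S < (1.0 - d / (S * K)) ** K:
        return 1.0, d / (S * K)              # boundary estimate, psi_hat = 1
    target = d / (K * SD)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(100):                     # bisection on p / p* = d / (K*SD)
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - (1.0 - mid) ** K) < target:
            lo = mid
        else:
            hi = mid
    p_hat = 0.5 * (lo + hi)
    return SD / (S * (1.0 - (1.0 - p_hat) ** K)), p_hat

def evaluate_design(psi, p, S, K, n_sims=5000, seed=1):
    """Monte Carlo estimate of RMSE(psi_hat) and boundary-estimate frequency."""
    rng = random.Random(seed)
    sq_err = boundary = 0.0
    for _ in range(n_sims):
        SD = d = 0
        for _site in range(S):
            if rng.random() < psi:           # site occupied
                hits = sum(rng.random() < p for _ in range(K))
                if hits:
                    SD += 1
                    d += hits
        psi_hat, _ = fit_mle(S, K, SD, d)
        boundary += (psi_hat >= 1.0)
        sq_err += (psi_hat - psi) ** 2
    return math.sqrt(sq_err / n_sims), boundary / n_sims
```

For example, evaluate_design(0.2, 0.3, 70, 5) gives an RMSE close to the 0·070 quoted in the worked example, rather than the 0·057 predicted asymptotically.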

Fig. 3. Occupancy survey design procedure. Shaded boxes represent decision stages. Survey design has to start with clear targets for total effort and estimator quality (e.g. measured as the MSE of ψ̂, or including p̂ with the A- or D-optimality criteria). Priority can be given to maximizing estimator quality (A) or minimizing total effort (B). Although not included here for simplicity, there are other issues, such as the cost of surveys and logistical constraints, that may need to be incorporated in the design process.

Once a candidate design is identified, either through asymptotic approximations or simulations, we need to verify whether it fulfils the requirements of the project. If it does, the study can proceed to data collection. Otherwise, if no suitable design was found, the objectives and constraints of the project need to be reconsidered: can more resources be allocated to this study? Could less precise estimates still be informative for the purpose of the study? If the answer to these questions is negative the study should not continue as it would be a waste of resources that could be used elsewhere (Legg & Nagy 2006). If the project objectives or constraints are redefined, a new design should be sought given the new requirements.

Example: designing an occupancy study when sample size is small

As an illustration of the design process let us assume that (1) our target is for the occupancy estimator to be approximately unbiased with a maximum SE of 0·075 (i.e. maximum RMSE 0·075), (2) the maximum effort that can be employed in the study is TSmax = 350 and (3) the probabilities of occupancy and detectability are thought to be ψi ≈ 0·2 and pi ≈ 0·3. If we decide to start our study design from the recommendations derived from the asymptotic properties of the estimators, the first thing to do is to find the optimal number of replicates to be used, in this case K =5 (Table 1a). Let us first assume that our priority is to minimize the variance (option A in Fig. 3). In this case we will make use of the total available effort and the number of sites to be surveyed (S) will be derived as S = TS/K =350/5 = 70. We should now evaluate the variance of the occupancy estimator under this design (K =5 and S =70) to verify whether it is within our target. From the expression of the asymptotic variance of the occupancy estimator (eqn 5) we get:
var(ψ̂) = (0·2/70) × [(1 − 0·2) + (1 − 0·832)/(0·832 − 5 × 0·3 × 0·7^4)] ≈ 0·0033
which gives an SE of 0·057. According to the asymptotic approximation the estimator is unbiased; so, the RMSE is also 0·057. This RMSE is within the target that our project set (0·057 < 0·075); so, the design seems good. In order to verify that the approximations made for the design were appropriate we would now run a simulation for the chosen design parameters (K =5 and S =70) and assumptions (ψi = 0·2 and pi = 0·3). A simulation with 50 000 iterations reveals that the actual RMSE of the occupancy estimator (0·070) is higher than predicted by the approximation (0·057), although still within the project target, so the design could be kept. However, given that the approximation was not very accurate it may be worth exploring other combinations of parameters as there is no guarantee of the optimality of the chosen design. For instance, a design with K =6 and S =58 would be a better choice (Table 2). Note also that increasing the replication (K =7) would provide a more suitable design if detection probability was to be considered as part of the design criterion instead of occupancy only.
Table 2. Actual and asymptotic root mean-squared errors for ψ̂ and p̂ under different study designs assuming underlying probabilities ψi = 0·2 and pi = 0·3. Each cell gives the value for ψ̂/p̂.
K: 4 5 6 7 8 9
TS ≈ 250
S 62 50 42 36 31 28
 aRMSE ψ̂/p̂ (×10²) 6·9/9·6 6·8/8·6 6·9/7·9 7·1/7·5 7·5/7·3 7·8/7·1
 RMSE ψ̂/p̂ (×10²) 12·6/10·1 10·6/9·3 9·6/8·7 9·3/8·4 9·6/8·2 9·6/8·0
 RMSE* ψ̂/p̂ (×10²) 9·3/9·7 8·2/9·0 7·7/8·4 7·5/8·1 7·7/8·0 7·9/7·7
 Boundary estimates 1·1% 0·7% 0·5% 0·5% 0·5% 0·5%
TS ≈ 300
S 75 60 50 43 37 33
 aRMSE ψ̂/p̂ (×10²) 6·3/8·7 6·2/7·9 6·3/7·3 6·6/6·9 6·9/6·7 7·2/6·5
 RMSE ψ̂/p̂ (×10²) 10·1/9·2 8·4/8·4 7·8/7·8 7·5/7·5 7·9/7·4 8·1/7·2
 RMSE* ψ̂/p̂ (×10²) 8·2/8·9 7·2/8·2 6·9/7·7 6·7/7·4 7·0/7·3 7·2/7·1
 Boundary estimates 0·5% 0·3% 0·2% 0·2% 0·2% 0·3%
TS ≈ 350
S 87 70 58 50 43 39
 aRMSE ψ̂/p̂ (×10²) 5·8/8·1 5·7/7·3 5·9/6·8 6·1/6·4 6·4/6·2 6·6/6·0
 RMSE ψ̂/p̂ (×10²) 8·3/8·5 7·0/7·6 6·6/7·2 6·7/6·9 6·9/6·7 7·1/6·6
 RMSE* ψ̂/p̂ (×10²) 7·4/8·4 6·5/7·6 6·3/7·2 6·3/6·9 6·5/6·6 6·6/6·5
 Boundary estimates 0·2% 0·1% 0·1% 0·1% 0·1% 0·1%
 A-optimality criterion (×10³) 14·1 10·7 9·5 9·2 9·3 9·3
 D-optimality criterion (×10⁻⁵) 3·28 2·21 1·95 1·96 2·04 2·05
  • Three levels of total effort (TS = 250, 300 and 350) and six levels of replication (K = 4–9) were considered. Asymptotic root mean-squared error (aRMSE) was obtained analytically. Actual root mean-squared error (RMSE) was estimated via simulation with 50 000 iterations. The frequency of boundary estimates (ψ̂ = 1) and the actual root mean-squared error after removing these (RMSE*) are also shown for reference. For TS = 350, the sum of the mean-squared errors (A-optimality criterion) and the determinant of the MSE matrix (D-optimality criterion) are also shown.
We now repeat the process assuming that the priority is on minimizing the total effort TS (option B in Fig. 3). In this case the number of sites to be surveyed (S) is derived from the expression of the asymptotic variance of the occupancy estimator (eqn 5):
S = ψ[(1 − ψ) + (1 − p*)/(p* − K·p·(1 − p)^(K−1))]/var(ψ̂) = 0·2 × [0·8 + 0·168/0·472]/0·075² ≈ 41·1, so S = 41 and TS = S × K = 205
The total effort required for this design (205) is within the target that our project set (350); so, the design seems good. However, simulations show that the occupancy estimator has some bias and large variance; its RMSE (0·1391) is almost twice the maximum RMSE allowed by the project (0·075), which renders this design unsuitable. The asymptotic approximation is poor for the sample size in this study; so, it is best to choose the design via simulations. By exploring different combinations of K and S we can identify the design that fulfils the variance target with the minimum effort. In this case, K =7 and S =43 would be a good choice. Note that the number of replicates (7) differs from the optimal K suggested by the asymptotic approximations (5) and the total effort required is substantially larger (301 vs. 205).

Discussion

When faced with the task of planning a study it is essential to address explicitly three basic questions: (1) why is the study needed, (2) what is a suitable state variable and (3) how to do the sampling? (Yoccoz, Nichols, & Boulinier 2001). Here, we have concentrated on aspects related to the ‘how’ question in the context of occupancy studies, in particular on issues derived from the trade-off resulting from the allocation of survey effort between number of sites and number of replicates. However, we emphasize the need to first deal properly with the ‘why’ and ‘what’ questions, as well as to consider other elements related to the ‘how’ such as the selection of sites, the timing of surveys (MacKenzie & Royle 2005) or decisions on the type of replication to be used.

Addressing the ‘why’ question requires a clear statement of the objectives of the study from which design requirements can be derived, including the maximum survey effort available and the level of precision needed for results to be meaningful (Field et al. 2007). Defining this is not just a statistical decision and should incorporate considerations of the species biology and the system in general. For instance, management decisions should explicitly evaluate the costs associated with false positives and false negatives when detecting trends, costs that are not necessarily equal (Field et al. 2005a). Although studies often focus on the estimate of occupancy, here we argue that there are situations when the probability of detection is also of interest. In these cases it is natural for the precision of p to be included as part of the design criterion. We show that, under these scenarios, the best design will tend to require more replication than in cases where only the precision of the occupancy estimator is considered, especially when working with rare species.

Ecological studies often involve small sample sizes. This is particularly true for studies related to conservation. Here, we show that the asymptotic approximations to the distributions of the maximum-likelihood estimators are unreliable for sample sizes that, although small, are realistic in the context of ecology. Estimators are biased and less precise than indicated by these large sample approximations. This is especially relevant when working with rare and elusive species as then the probabilities of occupancy and detection are low. We highlight the importance of taking these issues into consideration when designing occupancy studies and argue that simulations should be used in the design process. It is essential to determine the actual properties of the estimators under a chosen design, to make sure that they fulfil the design targets before spending, and maybe wasting, time and effort in the field. With a clear description of the overall design procedure, supported by a numerical example and a new software application, we aim at promoting the good practice of addressing small sample considerations when designing occupancy studies. However, it is important to note that this guidance does not replace the careful evaluation of each project’s characteristics. Apart from the requirements addressed here, there may be other issues that need to be incorporated in the design process such as decisions on the minimum number of sites that the program aims to survey, the cost of each survey or other logistical considerations. The large sample recommendations discussed are based on the model with constant probabilities. We do not give specific recommendations for studies involving covariates (e.g. occupancy in two habitats) but the same general approach is applicable and the use of simulations remains the best tool to guide study design. Here, we have concentrated on maximum-likelihood inference. 
An alternative Bayesian approach avoids asymptotic assumptions; however, it is still necessary to select an optimal design and prior sensitivity needs to be considered.

Designing a study requires initial values of the parameters to be estimated. It is important to realize that the actual performance of the chosen design depends on the correctness of these initial values. Given that these parameters are the object under study, there may be considerable uncertainty about their true values. Before deciding on a final design, we recommend exploring the sensitivity of the design to a change in these initial values. Bayesian experimental design (Chaloner & Verdinelli 1995) provides a systematic framework to account for prior knowledge on the parameters in the design process. Sequential methods divide studies into stages, with later stages designed using the results of earlier ones to update the initial estimates (Abdelbasit & Plackett 1983). The potential of these techniques in the context of occupancy study design is the subject of future work.

Acknowledgements

This research has been supported by an EPSRC/NCSE grant. The authors thank Darryl MacKenzie and one anonymous reviewer for valuable comments that improved the quality of this manuscript.