How to predict fine resolution occupancy from coarse occupancy data

The area of occupancy (AOO) is a widely used index in conservation assessments, notably in criteria B2 of the International Union for Conservation of Nature (IUCN) red‐list. However, IUCN guidelines require assessing AOO at finer resolution than is generally available. For this reason, extrapolation techniques have been proposed to predict finer AOO from coarser resolution data. Here, we apply 10 published downscaling models to the distributions of a large number of plant and bird species' in contrasting landscapes. We further compare the output of two ensemble models, one relying on all 10 downscaling models and one a subset of five models that can be fit rapidly and robustly, with minimal oversight required. We further compare the accuracy of downscaled predictions with respect to species prevalence. Across all landscapes and taxa, the models predicted AOO consistently. Some, such as the power law and Hui models, were nonlinear with respect to species prevalence. Some models consistently over or under predicted, such as the Nachman and Poisson models. Furthermore, some models proved to give more variable predictions than other models, e.g. Nachman and power law. For these reasons, none of these models are suitable for downscaling if used individually. The Thomas model was also rejected, because it is too computationally intensive, even though its predictions are relatively unbiased. The most effective model, when used by itself, was the improved binomial model. However, the two ensemble models were able to provide accurate predictions of AOO with low variability compared to using any one single model. There was no significant loss in performance using the simpler ensemble model, and therefore this solution is the least computationally intensive and requires least user oversight. Our results show that downscaling models could be potential tools to reliably estimate AOO for conservation assessments. Under circumstances where there is no a priori reason to prefer one model over another then an ensemble of these models may be the best solution for batch analysis of IUCN status under criteria B2. Moreover, we foresee the use of downscaling for the production of other biodiversity indicators, such as for invasive species monitoring.


| INTRODUC TI ON
Knowing the distribution and abundance of organisms at high resolutions is critical for many conservation applications. For example, the area of occupancy (AOO) of a species is an easily calculated measure of a species' rarity and range that is correlated with abundance (Gaston, 1994;Gaston & Lawton, 1990), which is much more difficult to estimate. Consequently, it is a criteria recommended by the International Union for Conservation of Nature and Natural Resources (IUCN) for assessing red list extinction risks under criteria B2 (IUCN, 2012). Gaston and Fuller (2009) found that AOO was used in the red listing of between 37% and 97% of species, depending on the taxonomic group.
To calculate AOO, we simply require information on a species' occurrences across its range which can then be can be summarised as the number of occupied grid cells of a given area (Gaston, 1991(Gaston, , 1994. Rather than requiring targeted studies, it is therefore possible to measure AOO using aggregated data from a wide range of data sources collected over extended time-spans, such as freely available biodiversity data (e.g. GBIF), allowing its calculation to be easily automated (Bachman, Moat, Hill, de la Torre, & Scott, 2011).
Another reason for the popularity of AOO is that organisms are often overlooked when they are small, cryptic, hidden or difficult to identify. Such organisms require more effort to survey at fine resolutions. Biogeographic atlases are often constructed from ad hoc surveys that inevitably lead to spatial unevenness in surveying effort. Therefore, observations are often aggregated, to reduce this geographic bias. So although it might be preferable to study actual populations of organisms for conservation, for a large number species, AOO is a practical measure of their status, rather than the number of observations or their population size.
The grain size at which the records are summarised is nontrivial as we must be certain of the status of the species across all cells.
The finer the grain size the closer the correlation between AOO and true abundance (Kunin, 1998) but the greater the sampling coverage required. Aggregating data at coarse grain sizes reduces false-absences but at the expense of becoming less useful for conservation planning and assessments. Coarse-grain data, commonly referred to as atlas data, can be quickly and cheaply generated, and as the quantity of freely available biological record data is markedly increasing, there has been a surge in their publication (Powney & Isaac, 2015).
Consequently, atlases are some of the most accessible species distribution data.
As no standardised method exists for selecting the most appropriate grain size, atlas data are generated at a wide variety of grain sizes, and grid squares of 25, 100 or 2,500 km 2 are typical for even well studied taxa in European countries. However, conservation assessments typically require AOO to be measured at finer, standardised grain sizes. For example, IUCN guidelines suggest that assessments should be carried out at 4 km 2 resolution (IUCN Standards and Petitions Subcommittee, 2017). They also recommend not to use grain sizes larger than 10 km 2 , above which a single grid cell occupancy would be larger than the critically endangered threshold for AOO. However, for the vast majority of species we have insufficient knowledge across the entirety of their ranges to create maps at such fine grain sizes.
There is therefore a challenge to compare AOO collected at differing grain sizes and for estimating AOO at finer grain size than the available data. A solution is to employ the occupancy-area relationship (OAR, Kunin, 1998); species occupancy is scale-dependent so that AOO increases as grain size is increased. Travelling along the OAR slope allows us to traverse grain sizes. For predicting AOO at the fine grain sizes necessary, we can model the OAR at coarse resolutions for which data are available and then extrapolate the relationship to predict AOO at finer resolutions (Kunin, 1998), termed "occupancy downscaling." These curves are a fundamental property of spatial data, but their shape will be influenced by the degree of aggregation of the organism at those resolutions and by species prevalence at the atlas scale.
At least ten different models have been proposed for extrapolating the OAR. They cover a broad theoretical spectrum and levels of complexity (see Azaele, Cornell, & Kunin, 2012;Barwell, Azaele, Kunin, & Isaac, 2014) (Supporting Information Table S3). The simplest model proposed is the Poisson model which assumes that individuals are distributed independently according to a binomial distribution (Wright, 1991). Three models are variations assuming fractal distribution patterns, the power law (Kunin, 1998), Nachman (Nachman, 1981) and logistic (Hanski & Gyllenberg, 1997) models. Four models are based on the negative binomial distribution and incorporate a measure of over-dispersion: the negative binomial (He & Gaston, 2000), generalised negative binomial (He, Gaston, & Wu, 2002), improved negative binomial (He & Gaston, 2003) and finite negative binomial (Zillio & He, 2010). The Thomas model incorporates spatial point processes in order to more flexibly account for species aggregations (Azaele et al., 2012). The only spatially-explicit model is the Hui model, which differs from the others in that it only requires data at a single grain size (Hui, 2009;Hui, McGeoch, & Warren, 2006;Hui et al., 2009). It is based on the conditional probability that a randomly chosen adjacent cell is occupied when the cell in question is also occupied. This has the advantage of giving some information on the likelihood of adjacent calls being occupied in the finer grid.
A related model, which we do not consider here, accounts for the percolation effect on occupancy at coarse grains, this is the so-called "droopy-tail" model (Hui & McGeoch, 2007 However, Barwell et al. (2014) compared the same models, with the addition of the Hui model, using 38 species of British dragonflies (Odonata) and concluded that the Hui Model gave the best predictions. Among the other models, the Nachman and Power Law models also performed well, but in contrast to Azaele et al. (2012) the Thomas model was among the worst performing. In another comparison of models on different datasets with contrasting extents, the improved negative binomial model was one of the best performing (Hui & McGeoch, 2007).
However, testing the models poses a real challenge for model validation as we must know true occupancy at the prediction scale, which for most species is unknown at fine grains size over wide extent. In fact, current tests of downscaling models have been conducted on only a handful of species, for which high quality data in at least part of their distribution range was available (Azaele et al., 2012;Barwell et al., 2014). Thus, prior to adopting downscaling models for conservation assessment we need to test their performance on a broader range of species and landscapes, particularly, comparing taxa with differing degrees of aggregation; landscapes with different spatial features and data collected at different grain sizes. This is especially true as there are no clear guidelines as to which methods are the most reliable and under which circumstances. The IUCN documentation suggests using a power model, extrapolated between the atlas and 4 km 2 resolution, however, there is no objective assessment of whether this is the most appropriate model (IUCN, 2012). To recommend a suitable approach to downscaling some pertinent questions need to be resolved, such as whether certain models or combination of models are more suitable for particular taxa, whether the models form a consistent order in terms of the predicted AOO (e.g. from the highest predicted AOO to the lowest predicted AOO), and how does the prevalence of the species at the atlas scale influence downscaling.
In this paper, we present the results of downscaling 2,724 distribution maps from 1,325 species of plants and birds in a wide variety of landscapes in north-western Europe. We test the ability of the 10 downscaling models to estimate known AOO at a finer grain size, and examine if model performance is predictable with regards species prevalence. We also compare the performance of individual models to an ensemble approach and make recommendations as to the most suitable for predicting occupancy in situations where the most appropriate model is unknown.

| Study area
Although previous studies have examined just a handful of species from single datasets, in this study we have collated data from six European atlases covering two taxonomic groups, vascular plants and birds, at three spatial scales from regional to country level, totalling 2,724 individual maps from 1,325 species (Table 1).
Vascular plant and bird atlas data were selected due to their high availability and the contrasting spatial distributions of the two taxa.
The selected atlases cover a wide degree of landscape variability, from densely-(Flanders) to sparsely-(Assynt) populated, and from lowland (Fermanagh, Flanders, and Shropshire) to mountainous (Cumbria and Assynt) regions.
Bird data were downloaded from the Global Biodiversity Information Facility (National Biodiversity Data Centre, 2011;Vermeersch et al., 2014). Vascular plant data were downloaded from the database of the Botanical Society of Britain and Ireland database (Evans, Evans, & Rothero, 2002;Forbes & Northridge, 2012;Halliday, 1997;Lockton & Whild, 2015

| Modelling
We tested the ability of ten downscaling models to estimate AOO at finer grain sizes than the input data. Each atlas was first coarsened to three larger grain sizes to generate the OAR to which to fit the models. We investigated several methods of coarsening the atlas data that maintain constant extents across all grain sizes, hereafter referred to as "upgraining" (Marsh, Barwell, Gavish, & Kunin, 2018).
The ten downscaling models were applied to the coarse-grain size atlas data and extrapolated to predict occupancy at the grain size of the original atlas data (which was not employed during modelling), the observed occupancy. As we investigated several methods of upgraining which alter the extents of the atlases, we compared the observed and predicted areas occupied, calculated as the number of occupied cells multiplied by cell area, rather than proportional occupancy within the altered extent. All analyses were carried out in R 3.3.2 (R Core Team, 2017) using the package "downscale" (Marsh et al., 2018).

| Preparing the atlas data-"Upgraining"
For each species, we first aggregated the original atlas data by combining cells in 2 × 2 arrays. This scale is hereafter referred to as the "base" scale and is the smallest grain size used for modelling the OAR. We then aggregated these data a further two times to provide three estimates of occupancy in order to generate the OAR with which to fit the downscaling functions. The process of coarsening the grain size we refer to as "upgraining" in order to distinguish it from alternative processes that are often referred to as "upscaling," such as predicting species richness from a subsampled area or extrapolating a predictive bioclimatic model of species presence across a wider area (e.g. Harte, Smith, & Storch, 2009;Marcer, Pino, Pons, & Brotons, 2012). The exception is the Hui model, the only spatially explicit model, which takes only the base scale data as input.
The boundaries of most atlases are irregular, and therefore aggregating cells to larger grain sizes also increases the apparent extent of the atlas (Figure 1, top row). As the downscaling models fit the proportion of the extent occupied against grain size this is undesirable. Unlike previous studies, we therefore standardised the extent of all modelling grain sizes to that of the largest grain size. To do so, however, requires extending the atlases at all finer grain sizes to unsampled areas by creating new cells (Figure 1, bottom row).
As we assume no a priori knowledge of these unsampled areas we assign these cells as absences. Alternatively, we could include only those largest grain size cells that are completely sampled at the base scale. However, for irregularly-shaped or subsampled atlases this will require discarding a portion of the atlas. We must therefore compromise between making assumptions about the status of the species in unsampled areas and having to discard information by excluding sampled areas.
To explore this compromise we tested four potential solutions

| Downscaling
We compared the ability of each of the ten downscaling models to predict occupancy for each species atlas using each of the four upgraining methods. However, due to the processing time required for the Thomas model, it was only run for the All-Sampled threshold criteria. Before modelling we first checked that during the upgraining process we did not reach the scale of saturation, the grain size at which all cells are occupied, or the scale of endemism, the grain size at which only a single cell is occupied (Azaele et al., 2012). If reached, we discarded all occupancies for grain sizes larger than this point.
In cases where this left fewer than three data points the species was discarded from the analysis. Default parameters were used in all cases, but where necessary starting parameters were adjusted to improve accuracy after visual inspection of the plots. The Hui model does not require upgrained data but to keep continuity with the alternative models we used extent-standardised maps created using the four upgraining procedures.
Additionally, we compared two methods of averaging predictions across multiple models. The first, hereafter "ensemble," involves averaging log predicted AOO across all nine models (ten in the case of the All-Sampled upgraining method for which we also ran the Thomas model). The second method, we term "simple ensemble," involves averaging only across the five simplest models: power law, Nachman, Poisson, logistic and negative binomial. These models were selected as they can be fitted quickly and are robust to the choice of input parameters, meaning that the minimum of user engagement is required when processing large volumes of species.
Verifying the performance of this simple ensemble would allow batch analysis of downscaling models with high confidence of model fit.

| Analyses
In all cases, the fitted models were used to calculate the AOO (not the proportion of occupancy as extents varied between upgraining threshold criteria used) at the grain size of the original atlas data.
The accuracy of the downscaled predictions for each threshold criterion for each species was calculated as the percentage difference between the predicted and observed AOO is dependent upon the spatial characteristics of the species including its prevalence across the landscape. We therefore plotted accuracy against prevalence determined at the base scale.

| Upgraining methods
In our datasets, which are not atypical, using the most stringent threshold for upgraining (Sampled-only) resulted in too few data for downscaling for 580 species maps, because the occupancy rapidly saturates at coarse grains. Additionally, a further 18 species maps, plus one species map using the Gain-Equals-Loss threshold, reached the scale of endemism as occurrences at the edges of the atlas were removed. These species are particularly rare species with confined distributions. The Sampled-only in particular, and the Gain-Equals-Loss methods gave markedly lower and more dispersed estimates than the All-Occurrences and All-Sampled methods (Figure 3). In contrast, the All-Occurrences and All-Sampled methods gave comparable results and for many of the models gave the most accurate downscaling predictions.
F I G U R E 1 "Upgraining" of atlas data to several larger grain sizes for a representative species, Acrocephalus scirpaceus, from the Irish birds dataset. The standard method of aggregating cells in 2 × 2 arrays (top row) results in an increase in the extent of around a third. In this study, we standardise the extent across all grain sizes (bottom row) by extending the cells at all smaller grain sizes to the extent of that of the largest. Added unsampled cells at smaller grain sizes are assigned as absences. The map in the bottom right is the proportion of the area of each cell at the largest grain size that is sampled by cells at the smallest. These values can be used as thresholds to determine if these cells are discarded or retained; we explore four potential solutions to this trade-off (

| Downscaling models
Apart from the Thomas and finite negative binomial models, each model took similar amounts of time to process. On a standard desktop computer more than 150 models could be run in one minute. The finite negative binomial model was slower by half, but the Thomas model took at least two orders of magnitude longer to fit but varied greatly. The time to fit the Thomas model could be reduced by initiating the function with suitable starting parameters, however, it is always slower than other models and prohibitively so for hundreds of models.
Model performance was consistent irrespective of the type of organisms and landscape (Figure 3, Supporting Information Figure   S4). For example the Poisson model, that assumes spatial independence of the distribution, consistently underestimates occupancy, particularly for commoner species. However, the Hui and Power Law models tend to overestimate the occupancy and particularly for those species with higher prevalences.
There were also consistent differences in within-model variability. For example, the power law, Hui, and Thomas models displayed highest variability in accuracy, based on the span of the 50% quartile of all errors (Table 2; Supporting Information Figures S5-S8), whereas the negative binomial and finite negative binomial were more consistent. Ensemble models seem to be as robust as these models, but their means are closer to true AOO. The simple ensemble approach did not display noticeably more variability than the full ensemble approach despite using half the number of models (Table 2). Furthermore, both ensemble models gave good predictions across a wide range of prevalences and little bias at the extremes of rarity and commonness (Supporting Information Figure   S4), suggesting there is little added benefit to utilising the full ensemble approach at the expense of extra processing time and user oversight.
For nearly all models, occupancy prediction of the commonest species was poorer, but this is to be expected because as the occupancy comes close to saturation it contains less information about the underlying species distribution. The exceptions were the power law and Hui models, which greatly overestimated occupancy at lower prevalences. Several of the models gave unbiased predictions of occupancy across a wide range of prevalence up to 50% prevalence.
Notable are the improved negative binomial, generalised negative binomial and Thomas models. For several models, the predictions for the rarest species underestimated occupancy (see Thomas, Poisson and Negative Binomial models in Figure 3). The responses of the F I G U R E 2 The extent-standardised atlas data for a representative species, Acrocephalus scirpaceus, from the Irish birds dataset after upgraining the atlas data a further two scales. We explored the four possible methods of thresholding the extent available in the "downscale" package. Grey cells are absences, red cells presences and white cells no data. Each map is overlain with the outline of the extent of the original, nonstandardised atlas data at the base scale

| Suitable use cases
Downscaling is a useful technique to compare data collected at different resolutions, however, it is important to be aware of its limitations. As with all extrapolation techniques, there has to be confidence in the underlying assumptions upon which the predictions rely and more caution is needed the further you extrapolate from known data.
Downscaling is more accurate for rarer species and for this reason is suitable for use in conservation assessments. However, if the organism is so rare that it is only known from a few sites, then conservation assessments are probably better conducted based directly upon the available data, rather than through extrapolation. Our results show a slight underestimation of occupancy at the lowest prelevancies for several models (Figure 3). This might be a property of the spatial distribution of rare species, but it may also reflect an over-sampling of rare species. For example, if observers are more interested in observing and recording rare species than common ones or if rare species are more likely to be false-positive observations (Groom & Whild, 2017). Biogeographic atlas data are rarely, if ever, a completely systematic sample and often contain spatial and taxonomic sampling biases (Dennis & Thomas, 2000;Rich & Woodruff, 1992).
Reliable downscaling results assume spatially even and saturated surveying of the area of interest in the atlas data. The assumption is therefore that aggregating data from multiple field surveys tends to balance out unevenness in surveying effort. Poor species detectability might influence the result, but if surveying effort is close to saturation this problem will be reduced. Downscaling from a resolution where surveying is particularly incomplete will result in unreliable predictions that are sensitive to difference in surveying effort and detectability. For this reason, there is a compromise in choosing the grain size to downscale from. A large grain size will saturate surveying effort quickly, but will require greater extrapolation to predict occupancy at a fine resolution. The examples we have chosen to study are from published biogeographic atlases. All these atlases have some estimation of the completeness of their surveying and all are considered to be intensely surveyed.
F I G U R E 3 A comparison of occupancy predictions created using four different methods of upgraining. Species were split into 15 bands equally distributed in log space based upon prevalence (proportion of grid cells occupied) at the "base" scale. Lines are the running means for of the percentage difference between log observed and log predicted occupancy of all atlases combined. The coloured polygons are the inner 50% of the records from predictions that is 50% of the records fall within the polygon, 50% outside it. For many species there is not sufficient data for the Sampled-Only method to work, therefore the Sampled-Only data does not include data from as many species as the other methods. The Thomas model was only run on the All-Sampled data   (Kissling et al., 2017;Vanderhoeven et al., 2017).

| Choosing upgraining methods
Upgraining is necessary to create a series of occupancy grids at different resolutions from which to extrapolate from. However, there is a trade-off between making assumptions about absences outside the survey extent and the exclusion of areas not completely covered by the coarsest-grained grid.
In particular, the exclusion of so much data when using the Sampled-Only method is not conducive to a useful prediction in many cases. Not only were a significant number of species discarded as models could not be fitted, but the models also consistently underpredicted occupancy (Figure 3). The Gain-Equals-Loss threshold also undepredicted occupancy on average. Using the All-Occurrences and All-Sampled thresholds produced the most accurate and consistent estimates. Indeed for the power law model, results were identical using the two thresholds. As the power law does not saturate at one, having additional absences just shifts the intercept up or down while the slope remains unaffected which then gets standardised out. As the other models have a saturation point, shifting their intercept closer to that point will also affect their slope. We therefore recommend either the All-Occurrences and All-Sampled upgraining methods. The final choice may be dependent upon the shape of the study region, where the larger the departure from a rectangular extent means that the All-Sampled threshold must extrapolate to more unsampled areas.

| Choosing suitable models
Three properties are desirable for a robust model: on average the model should recover accurate estimates of fine-grain AOO (the best fit lines); there should be minimum variability between species and atlases (the error around the best fit lines); and estimates should be unbiased with respect to prevalence (the slope of the lines).
The Nachman, improved negative binomial, generalised negative binomial and Thomas models produced the most accurate results ( Table 2). The Thomas model, however, is too computationally intensive and requires oversight which prevents it being useful for batch processing. The Poisson, negative binomial and finite negative binomial models generally undepredicted AOO, whereas the power law and Hui models overpredicted.
The Poison, power law, Nachman, generalised negative binomial, Hui and Thomas models all showed high variability in accuracy across species, suggesting that even if mean accuracy is high there is high likelihood of an inaccurate estimate for any given species.
The negative binomial and finite negative binomial models had the lowest variability.
These results are largely consistent with previous model comparisons (Azaele et al., 2012;Barwell et al., 2014;Hui & McGeoch, 2007 appropriate as it averages out much of the inter-and intramodel variation. Furthermore, using the simple ensemble method results in no major loss in performance compared to the more computationally intensive full ensemble method. Furthermore, it utilises only the most robust models that should achieve optimisation without any further user oversight, and so is ideal for bulk processing. The IUCN Standards and Petitions Subcommittee (2017) recommended method of using a power law relationship in particular is poor choice due to its overprediction of occupancy for all species, bar the extremes, and extreme nonlinearity in the response with prevalence, and we recommend this method is avoided. Use of this method may lead to over optimistic conservation assessments due to overestimation of occupancy. However, if downscaling were used in the assessment of change, it could overestimate both the rate of decline of a declining species and the rate of recovery of an increasing species. Despite the diverse landscapes and taxonomic groups, we used as test cases the choice of best model is the same.

| Occupancy downscaling for conservation assessments
Occupancy downscaling is a potentially powerful method for conservation assessments and to compare biogeographic atlases between regions and time periods. The method is simple, uses widely collected data that are becoming increasingly freely-available in online databases and makes few assumptions about the way the data are collected.
However, in this study, we are only extrapolating one grain size down and there is some variability even in the predictions through the ensemble approach. It is likely that for assessments at the IUCN recommended scale of 4 km 2 extrapolation across greater scales than we were able to test is possible. However, we would expect the relative performances of the models to remain consistent, even though the error will increase with further extrapolation. For this reason, additional work is being conducted to test these models across more scales, but with simulated species. It will be important for any assessments made using this approach to include an estimate of the uncertainty around the prediction. One such solution might be to present the range of individual model values as an estimate of error.
To conclude, with the right choice of models, we have shown that AOO can be rapidly and automatically estimated from widely available biodiversity data and for a wide range of species and landscapes. This technique promises to speed the process of red-listing species, but can also be used in repeatable and reliable production of biodiversity change indicators.

ACK N OWLED G EM ENTS
The

AUTH O R S' CO NTR I B UTI O N S
Q.J.G. and C.J.M. sourced the data, reformatted it for use in the study and wrote the code for analysis. Y.G. and W.E.K. provided advice on downscaling and on details of the models. All authors contributed to the drafting of the manuscript.

DATA ACCE SS I B I LIT Y
The bird, Assynt and Shropshire data are available from the Global Biodiversity Information Facility (National Biodiversity Data