Volume 7, Issue 1
Research Article

Estimating dark diversity and species pools: an empirical assessment of two methods

Rob J. Lewis

Corresponding Author

Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, Tartu, 51005 Estonia

Correspondence author. E‐mail: robert.lewis@ut.ee
Robert Szava‐Kovats

Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, Tartu, 51005 Estonia

Meelis Pärtel

Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, Tartu, 51005 Estonia

First published: 20 July 2015
Citations: 35

Summary

  1. Species absent from a community but with the potential to establish (dark diversity) are an important, yet rarely considered component of habitat‐specific species pools. Quantifying this component remains a challenge as dark diversity cannot be observed directly and must be estimated. Here, we empirically test whether species ecological requirements or species co‐occurrences provide accurate estimates of dark diversity.
  2. We used two spatially nested independent datasets, one comprising 3033 samples of coastal grassland vegetation from 4 m2 and 200 m2 plots from Scotland, UK, and another comprising 780 samples of forest vegetation plots from 30 m2 and 500 m2 plots from Switzerland. Dark diversity for each of the smaller scaled plots was estimated through investigating the degree of (i) similarity in ecological requirements (measured as Ellenberg values); and (ii) co‐occurrence likelihood. Estimates were validated using species from the larger spatial scales. Estimates were further validated using observations from all larger scale plots surrounding a focal assemblage within a 2 km (Scottish grassland) and 10 km (Swiss forest) radius.
  3. The co‐occurrence method was shown to be more accurate, resulting in far fewer negative mismatches (i.e. species observed but not predicted), as well as higher proportions of observed and predicted species, relative to the Ellenberg method. Of the species observed in the large‐scale samples, 18% were estimated as part of the smaller scale dark diversity via the co‐occurrence approach relative to 8% for the Ellenberg method, for both the Scottish and Swiss data. These values increased to 67% & 60% (co‐occurrence) and 32% & 35% (Ellenberg), respectively, across all observations within a 2 km (Scottish grasslands) and 10 km (Swiss forests) radius.
  4. The study demonstrates that dark diversity for a community can be successfully estimated using readily available data, through exploring species co‐occurrence patterns. This work substantiates that habitat‐specific species pools can be accurately quantified and should prove valuable for understanding underlying community processes and improving our knowledge of the mechanisms governing species co‐existence.

Introduction

Species present in a community can be viewed as a subset of all species theoretically able to exist in that community (Pärtel 2014). This idea is a refinement of the species‐pool hypothesis (for a review see Cornell & Harrison 2014), in that communities maintain a theoretical suite of species that have the potential to establish and co‐occur (Eriksson 1993; Zobel 1997). By definition, a species pool, irrespective of scale, is always habitat specific inasmuch as it excludes species that would be environmentally and ecologically unsuited to conditions present in the target community. Species absent from a target community, but which remain part of the ‘habitat‐specific’ species pool, constitute dark diversity (Pärtel, Szava‐Kovats & Zobel 2011). Such species maintain the potential to establish, and understanding the reasons for their absence is ecologically meaningful. Dark diversity can therefore serve as a valuable biodiversity metric (Pärtel 2014). Habitat‐specific species pools (hereafter species pools) have been demonstrated to help disentangle trait dissimilarity (i.e. convergence vs divergence patterns) resulting from fine‐scale biotic processes (e.g. phenotypic exclusion), enhancing our understanding of community assembly (de Bello et al. 2012). Understanding species pools in terms of their partitionable additive components (i.e. observed and absent species) also permits measures of community completeness, that is, a logistic expression (ln(observed diversity/dark diversity); Pärtel, Szava‐Kovats & Zobel 2013). Such an approach has previously been postulated as a better means of comparing biodiversity among regions and habitats (de Bello et al. 2010), and should be useful for quantifying habitat degradation and monitoring restoration success (Suding 2011).
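As a small worked illustration of the completeness expression quoted above, ln(observed diversity/dark diversity), the following sketch computes it from two richness counts. The function name and the example numbers are hypothetical and are not taken from this study.

```python
import math

def community_completeness(observed_richness, dark_richness):
    """Return ln(observed diversity / dark diversity) for one community."""
    if observed_richness <= 0 or dark_richness <= 0:
        raise ValueError("both richness values must be positive")
    return math.log(observed_richness / dark_richness)

# Hypothetical community: 20 observed species, 40 species in its dark diversity.
completeness = community_completeness(20, 40)
```

Under this expression, a negative value indicates that more species are absent from the pool than present, and zero indicates that the two components are equal in size.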

Identifying absent species from regional, local or community pools is, however, not straightforward. It has been proposed that four major scale‐dependent processes – speciation, dispersal, selection and drift – operate to govern species distribution patterns and in turn the beta‐niche (i.e. the region of a species' niche that corresponds to the habitat(s) where it is found; Silvertown et al. 2006) to which individual species belong (Vellend 2010). Arbitrarily delineating species pools, through geographic species inventories, is one approach, but one that fails to consider these fundamental ecological constraints, ignoring inherent spatial and temporal variability among species pools within regions (Lessard et al. 2012). Furthermore, it is not possible for inventories to identify those species that constitute a community's dark diversity, which by definition cannot be directly observed. One should also consider the sample effort. Species observations typically represent snapshots of community compositions, potentially overlooking hidden species (see Pärtel 2014) that are either cryptic or dormant (Klimesova & Klimes 2007). Disentangling between dark diversity and those species that have simply been overlooked is not possible from a single observation study and requires repeat sampling.

At the macroscale, recent attention has been paid to accurate estimation of biogeographic species pools (e.g. dispersion fields; Carstensen et al. 2013), accounting for a species site‐specific dispersion probability. However, beyond the biogeographic scale, species pool estimates must consider a species affinity to the local and regional environmental conditions, accounting for a species site‐specific establishment probability. One method is to consider ecological requirements of species based on expert evaluation. Regional floras and faunas can be broadly linked to habitat types; however, more quantitative data can often be sought. For example, for plant species, Ellenberg indicator values (Ellenberg et al. 1991) provide a semi‐quantitative measure of species environmental thresholds and distributions, linking species occurrences with the environment. First demonstrated by Pärtel et al. (1996), the approach considers species with similar ecological requirements to have the ability to establish in similar habitats (Fig. 1). Alternatively, another method is to use species co‐occurrence patterns, patterns that have in the past proven central to the development of ecological theories (Diamond 1975; Gilpin & Diamond 1982), and continue to remain at the forefront of ecological research (Veech 2013). Here, an absent species is considered part of the species pool if it typically co‐occurs with observed species present in the focal community (Ewald 2002; Münzbergová & Herben 2004; Riibak et al. 2015; Ronk, Szava‐Kovats & Pärtel in press; Fig. 2).

Figure 1. Conceptual flow diagram illustrating how ecological requirements of species (i.e. Ellenberg indicator values) are used to predict the dark diversity of a focal community. Illustrated are the sequential steps in the calculation process (i–iv) for a hypothetical assemblage (Community 1) comprising three species: species A (spA), species C (spC) and species D (spD) from a regional species list: species A, B, C, D & E. The influence of two user defined threshold values (i.e. a constant ‘t’ by which the standard deviation (SD) is multiplied), one broad (t1) and one narrow (t2), is represented. For ease of interpretation, the example represents only the minimum number of dimensions of an n‐dimensional Euclidean space, that is two dimensions (W & X).
Figure 2. Conceptual flow diagram illustrating how species co‐occurrence patterns calculated using the Beals smoothing multivariate transformation can be used to predict the dark diversity of a focal community. Illustrated are the sequential steps in the calculation process (i–iv), including the influence of two user defined threshold values, one broad (t1) and one narrow (t2).

The species pools that both methods attempt to measure are the same and comparable, that is the realized beta‐niche of plants (Silvertown et al. 2006); however, the size of dark diversity and in turn the size of species pools can vary as a result of applying different statistically justified threshold values. For the Ellenberg method, these thresholds determine the limits of similarity (i.e. Euclidean distance in an n‐dimensional ellipsoid hypervolume) based on standard deviations (Fig. 1). For the co‐occurrence method, thresholds reflect co‐occurrence probabilities (i.e. percentiles along a species distribution curve; Fig. 2). Broader thresholds, that is larger standard deviations or smaller percentiles, will result in larger estimates of dark diversity and vice versa. The ability to adjust thresholds therefore adds great flexibility to the estimation methods. For example, population studies interested in detecting a single species would benefit from constraining the estimated size of dark diversity, resulting in greater confidence in dark diversity estimations. Alternatively, a lesser constraint may benefit community studies interested in investigating diversity patterns, providing a broad representation of a community's dark diversity.

While both of these analytical methods mediate the difficulties faced when measuring species pools (i.e. identifying species that have a high dispersal and establishment probability but are otherwise absent), an empirical evaluation of these methods has yet to be carried out. Given that dark diversity can for most taxa only be estimated and not measured directly, a robust evaluation of methods and statistically justified thresholds is a necessary prerequisite before the dark diversity concept can be used with confidence for ecological and conservation purposes. Using two independent hierarchical nested data sets, we aim to empirically assess the accuracy of species ecological requirements and co‐occurrences as two methods for estimating dark diversity and in turn quantifying species pools.

Methods

Data

We used plant species data sampled from two very different vegetation types: coastal grassland vegetation of Scotland and forest vegetation of Switzerland, hereafter referred to as SG and SF, respectively. The data resulted from the Scottish Coastal Survey (1975–1977; Shaw, Hewett & Pizzey 1983) and a systematic sample of Swiss forests (Wohlgemuth et al. 2008). Both reflect a nested hierarchical sampling design, measuring nested community compositions from small to larger spatial scales. We use data from the smallest scale (SG = 4 m2 & SF = 30 m2) to estimate dark diversity and data from the largest scale (SG = 200 m2 & SF = 500 m2) to validate our estimations. For the respective habitat types, the 200 m2 and 500 m2 scales were suitably large enough to be representative of the community while not so large as to encompass different community types, maintaining environmental homogeneity. Plots where fewer than 3 species were observed at the smaller scale were removed from the analyses. In turn, the SG data set comprised 3033 nested plots encompassing 465 plant species, while the SF data set comprised 726 nested plots encompassing 410 plant species.

Estimating dark diversity

Our evaluation focused on two analytical methods employed to estimate dark diversity from compositional data: (i) ecological requirements of species, using Ellenberg indicator values; and (ii) species co‐occurrence patterns.

(i) Ellenberg indicator values

Ellenberg indicator values originate from Central European flora, reflecting plant species preferences in respect to several environmental factors. For the Scottish data, values were adjusted for British plants (Hill et al. 1999) for four environmental factors: soil nutrient status, pH, soil moisture and light. A plant community's habitat preference was determined by pinpointing its position in a four‐dimensional hyperspace, calculating an average indicator value for all species in the community in relation to the four habitat factors, each representing an individual axis. Because Ellenberg indicator values only indicate the position of species optima along environmental gradients and contain no information on niche width, the ecological amplitude of species must be considered to prevent severe underestimation of the species pool. We therefore compiled species pool estimates by defining species ecological amplitudes as a chosen constant t multiplied by the standard deviations (SD) of the mean inside a four‐dimensional (w, x, y, z) ellipsoid hypervolume.
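$$EA = \frac{(w - \bar{w})^2}{(t \cdot \mathrm{SD}_w)^2} + \frac{(x - \bar{x})^2}{(t \cdot \mathrm{SD}_x)^2} + \frac{(y - \bar{y})^2}{(t \cdot \mathrm{SD}_y)^2} + \frac{(z - \bar{z})^2}{(t \cdot \mathrm{SD}_z)^2}$$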
where $\bar{w}$, $\bar{x}$, $\bar{y}$ and $\bar{z}$ represent the means and w, x, y and z the individual focal species values of each of the four Ellenberg dimensions (soil nutrient status, soil pH, soil moisture and light). t represents a constant by which the standard deviation (SD) of the community mean values is multiplied. Species pools were in turn estimated with different threshold distances through changing the value of the constant t (0·5, 1·0, 1·5 and 2·0). We selected two measures indicative of what we term ‘broad’ and ‘narrow’ thresholds following results derived from exploring dissimilarity in dark diversity size estimates from the two methods (see below). The broad and narrow thresholds govern the estimated size of dark diversity, being either less constrained (i.e. larger) if broad, or more constrained (i.e. smaller) if narrow. For the Ellenberg method, species were included as part of the dark diversity component based on values of EA being less than or equal to 1 for four environmental proxies: soil nutrient status, pH, soil moisture and light (for a two‐dimensional example see Fig. 1).
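The inclusion rule described here (an absent species joins the dark diversity when its EA value is at most 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the indicator values and the use of the sample standard deviation of the community's species values are all hypothetical choices, and the example uses two dimensions (W and X) for readability, as in Fig. 1.

```python
from statistics import mean, stdev

def ellenberg_dark_diversity(community, indicator_values, t):
    """Absent species whose indicator values lie inside the community ellipsoid (EA <= 1)."""
    axes = range(len(next(iter(indicator_values.values()))))
    present = [indicator_values[sp] for sp in community]
    centres = [mean(v[d] for v in present) for d in axes]
    # Assumption: SD is the sample SD of the community's species values per axis.
    spreads = [stdev(v[d] for v in present) for d in axes]
    if any(s == 0 for s in spreads):
        raise ValueError("zero spread on an axis; the ellipsoid is undefined")
    dark = []
    for sp, vals in indicator_values.items():
        if sp in community:
            continue  # only absent species can belong to dark diversity
        ea = sum(((vals[d] - centres[d]) / (t * spreads[d])) ** 2 for d in axes)
        if ea <= 1:
            dark.append(sp)
    return sorted(dark)

# Hypothetical two-dimensional indicator values for a regional list of five species.
values = {"spA": (4, 5), "spB": (5.5, 6.5), "spC": (6, 7), "spD": (5, 6), "spE": (6.5, 6)}
community = {"spA", "spC", "spD"}
narrow = ellenberg_dark_diversity(community, values, t=1.0)
broad = ellenberg_dark_diversity(community, values, t=2.0)
```

With these made-up values, the narrow threshold admits only spB, while the broad threshold also admits spE, which mirrors how a larger t enlarges the estimated dark diversity.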

(ii) Co‐occurrence patterns

The probability of species co‐occurrences was estimated using Beals index (Beals 1984), following the methods of Münzbergová & Herben (2004):
$$b_{ij} = \frac{1}{S_i - I_{ij}} \sum_{k \neq j} \frac{N_{jk} I_{ik}}{N_k}$$
where $S_i$ is the number of species at site i, $I_{ij}$ is the incidence (0, 1) of species j at site i, $N_{jk}$ is the number of joint occurrences of species j and k, $I_{ik}$ is the incidence (0, 1) of species k at site i, and $N_k$ is the number of occurrences of species k. For a given species, predicting dark diversity for sites where the species is absent required a species‐specific threshold, because frequent species are systematically assigned higher Beals index values than less frequent species. Thresholds were obtained by modelling co‐occurrence probabilities from sites where the focal species was actually present and setting threshold limits to the absolute minimum (sensu Münzbergová & Herben 2004). To exclude bias through outliers common in co‐occurrence probability distributions (Botta‐Dukat 2012), we also modelled the 1st, 5th and 10th percentile as a means of determining species‐specific threshold limits. Species with co‐occurrence probabilities outside the percentile range were excluded from dark diversity estimates (Fig. 2). Co‐occurrence thresholds operated to the same effect as the threshold distances assigned to the Ellenberg analysis, that is they determined whether species were included or excluded from the dark diversity. As with the Ellenberg method, two threshold values were selected indicative of ‘broad’ and ‘narrow’. Threshold selection was based on minimum dissimilarity in dark diversity sizes (see below).
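The co‐occurrence procedure described in this subsection can be sketched in two steps: compute a Beals‐type index for every site and species, then flag an absent species as dark diversity when its index reaches a species‐specific threshold taken from the sites where that species is present. The sketch below is an assumption‐laden illustration rather than the authors' code: the function names and toy data are hypothetical, the target species is left out of its own prediction, and the percentile is taken with a crude lower‐rank rule because the quantile method is not specified in the text.

```python
def beals_index(sites):
    """Beals-type co-occurrence index per site and species (target species excluded)."""
    species = sorted(set().union(*sites))
    n_occ = {k: sum(k in s for s in sites) for k in species}
    joint = {(j, k): sum(j in s and k in s for s in sites)
             for j in species for k in species}
    table = []
    for s in sites:
        row = {}
        for j in species:
            others = [k for k in s if k != j]
            # Mean conditional frequency of j given each other species at the site.
            row[j] = (sum(joint[(j, k)] / n_occ[k] for k in others) / len(others)
                      if others else 0.0)
        table.append(row)
    return table

def dark_diversity_cooccurrence(sites, percentile=0.0):
    """Absent species per site whose index reaches the species-specific threshold."""
    table = beals_index(sites)
    species = list(table[0])
    thresholds = {}
    for j in species:
        present = sorted(table[i][j] for i, s in enumerate(sites) if j in s)
        # Assumption: lower-rank quantile; percentile=0 gives the absolute minimum.
        thresholds[j] = present[int(percentile / 100 * (len(present) - 1))]
    return [sorted(j for j in species if j not in s and table[i][j] >= thresholds[j])
            for i, s in enumerate(sites)]

# Hypothetical incidence data: five sites described by the species they contain.
sites = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"D", "E"}, {"D", "E"}]
dark = dark_diversity_cooccurrence(sites, percentile=0.0)
```

In this toy example only the third site receives a dark diversity species (C), because C always accompanies A and B elsewhere, whereas D and E never co‐occur with any species present at that site.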

Statistical analysis

Adjusting the dark diversity size

We performed a dissimilarity analysis to ensure we selected threshold values for both the Ellenberg and co‐occurrence methods that result in reasonably similar sizes of dark diversity. This initial procedure was necessary to ensure a fair comparison of the two methods. Large differences in dark diversity sizes between the two methods can lead to spurious interpretations. Therefore, data from the small‐scale plots were used to estimate species pools following the methods detailed above, systematically substituting threshold values. Congruence between methods with differing thresholds was tested through a ratio of species pool size estimates from co‐occurrence over that of the Ellenberg approach:
$$lr1 = \log\left(\frac{\text{species pool size}_{\text{co-occurrence}}}{\text{species pool size}_{\text{Ellenberg}}}\right)$$

Values closest to zero identified threshold values that result in the lowest dissimilarity between the two methods. We selected two threshold values with a median lr1 closest to zero. One represented a narrow filter (i.e. minimizing the species pool) and the other a broad filter (i.e. maximizing the species pool); both were used to investigate method performance.
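The threshold‐matching step can be illustrated with a short sketch that computes lr1 per plot for every combination of thresholds and returns the combination whose median is closest to zero. The names and pool sizes are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math
from statistics import median

def log_ratios(cooc_sizes, ellenberg_sizes):
    """Per-plot log ratio of co-occurrence to Ellenberg pool size (natural log assumed)."""
    return [math.log(c / e) for c, e in zip(cooc_sizes, ellenberg_sizes)]

def closest_threshold_pair(cooc_by_threshold, ellenberg_by_threshold):
    """Return (co-occurrence threshold, Ellenberg threshold, median lr1) nearest zero."""
    best = None
    for pct, cooc in cooc_by_threshold.items():
        for t, ell in ellenberg_by_threshold.items():
            m = median(log_ratios(cooc, ell))
            if best is None or abs(m) < abs(best[2]):
                best = (pct, t, m)
    return best

# Hypothetical pool sizes for three plots under two thresholds per method.
cooc = {0: [60, 70, 80], 10: [20, 25, 30]}       # keyed by percentile
ellenberg = {1.0: [30, 35, 40], 2.0: [62, 70, 78]}  # keyed by the constant t
best = closest_threshold_pair(cooc, ellenberg)
```

This sketch returns a single best combination; selecting a second, narrower combination as done in the study would repeat the search over the remaining, more restrictive thresholds.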

Empirical dark diversity performance testing

To evaluate the accuracy of the two methods, we validated the species pool estimates derived from modelling 465 species (SG) and 410 species (SF) recorded at the smaller scale plots, with species observed at the larger scale 200 m2 (SG) and 500 m2 (SF) plots. Species new to the data, that is species only observed at the larger scales, were removed from the analyses, as these cannot be estimated. Data were decomposed using the following notation: a = number of species observed at the larger scale and also predicted from modelling the smaller scale compositions, b = number of species observed at the larger scale but not predicted (negative mismatch), and c = number of species predicted but not observed at the larger scale (positive mismatch). Expressed in terms of matching and mismatching components, we use ternary plots to examine the congruence between the two methods. Differences between ‘a’, ‘b’ and ‘c’ components derived from the two methods were statistically tested using paired t‐tests. Where necessary, data were transformed to meet assumptions of normality and homogeneity of variance.
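The a, b and c components defined above reduce to set operations on the predicted and observed species lists. The following sketch shows that decomposition for one plot; the function name and species labels are hypothetical, and the proportions are the three‐part composition that a ternary plot would display.

```python
def match_components(predicted, observed_large_scale):
    """Decompose one plot into a (match), b (negative mismatch), c (positive mismatch)."""
    predicted = set(predicted)
    observed = set(observed_large_scale)
    a = len(predicted & observed)   # observed and predicted
    b = len(observed - predicted)   # observed but not predicted
    c = len(predicted - observed)   # predicted but not observed
    total = a + b + c
    proportions = tuple(n / total for n in (a, b, c)) if total else (0.0, 0.0, 0.0)
    return {"a": a, "b": b, "c": c, "proportions": proportions}

# Hypothetical plot: four predicted species, three observed at the larger scale.
result = match_components({"A", "B", "C", "D"}, {"B", "C", "E"})
```

Here species B and C are matches, E is a negative mismatch, and A and D are positive mismatches.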

Because it is unlikely that all species belonging to a regional‐scale habitat‐specific species pool will be observed at the upper scales of the data sets (i.e. 200 m2 and 500 m2, for SG and SF, respectively), we extended the above analyses to include a further four scales from the centre of each focal assemblage (SG = 0·2, 0·5, 1 and 2 km radius and SF = 4, 6, 8 and 10 km radius). These scales allowed us to evaluate the proportions of positive mismatches that would otherwise be observed if sampling effort were increased, that is species that belong to the dark diversity. This was achieved by compiling species lists of observed species in all larger scale plots within an increasing radius (km) of each focal assemblage. Even here, large saturation of observed species relative to the species pool is not expected, as even within these broader scales we use a limited number of sample plots and not complete inventories. All statistical analyses were done in R version 3·0·2 (R Development Core Team 2014).
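Compiling the observed species lists within an increasing radius can be sketched as below. The sketch assumes planar plot coordinates in kilometres and uses hypothetical names and data; it counts how many predicted species are confirmed as the search radius grows, without committing to a particular denominator for the percentages reported in the study.

```python
import math

def species_within_radius(focal_xy, plots, radius_km):
    """Pool species from all plots whose centre lies within radius_km of the focal plot."""
    pooled = set()
    for (x, y), species in plots:
        if math.hypot(x - focal_xy[0], y - focal_xy[1]) <= radius_km:
            pooled |= species
    return pooled

# Hypothetical larger-scale plots: ((x_km, y_km), species observed).
plots = [((0.0, 0.0), {"A", "B"}), ((0.3, 0.0), {"C"}),
         ((1.5, 0.0), {"D"}), ((3.0, 0.0), {"E"})]
predicted = {"B", "C", "D", "F"}

# Number of predicted species confirmed at each of the four SG radii.
confirmed = {r: len(predicted & species_within_radius((0.0, 0.0), plots, r))
             for r in (0.2, 0.5, 1.0, 2.0)}
```

In this example the count of confirmed predictions rises with radius, while species F remains a positive mismatch at every scale.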

Results

Adjusting the dark diversity size

As can be expected, size estimates of dark diversity were shown to be sensitive to the user defined threshold values. For SG, congruence between the two methods was closest where threshold values equalled 2 standard deviations of the mean (Ellenberg) and a 1 percentile limit (co‐occurrence); median lr1 = 0·05 (Fig. 3a & Table S1). For SF, similarity in pool size was high where threshold values equalled 1·5 standard deviations of the mean (Ellenberg) and a zero percentile (co‐occurrence); median lr1 = −0·18 (Fig. 3b & Table S1). These threshold values determined the ‘broad’ filter, increasing the size of dark diversity and in turn the size of the species pool (SG median pool size = 63 & 60 and SF = 72 & 84 for the co‐occurrence and Ellenberg methods, respectively). Threshold values of 1·5 and 1 standard deviations of the mean (Ellenberg) and a 10th percentile limit (co‐occurrence) delineated the ‘narrow’ filters for SG and SF, respectively, decreasing the size of dark diversity and in turn the size of the species pool (SG median pool size = 26 & 30 and SF = 26 & 29 for the co‐occurrence and Ellenberg methods, respectively). These restricted threshold values also resulted in close to zero lr1 values: SG lr1 = +0·14 and SF lr1 = +0·11 (Fig. 3; Table S1).

Figure 3. Box plots displaying disparity in estimated species pool size between the two methods (co‐occurrence & Ellenberg) for (a) Scottish grasslands and (b) Swiss forests. Presented are all combinations of tested thresholds between the co‐occurrence (0, 1, 5 & 10 percentiles) and the Ellenberg (0·5, 1, 1·5 and 2 standard deviations of the mean) methods.

Empirical dark diversity performance testing

The co‐occurrence method outperformed the Ellenberg method irrespective of the data set used with significantly larger proportions of species observed and predicted resulting from the co‐occurrence method, relative to the Ellenberg (Fig. 4, Table 1). Negative mismatches (i.e. observed but not predicted) were frequent among the Ellenberg method, while large proportions of positive mismatches (i.e. predicted but not observed) prevailed across both methods (Fig. 4). These patterns were statistically validated. Negative mismatches and indeed positive mismatches resulting from the Ellenberg method were statistically and significantly more frequent relative to the co‐occurrence across both data sets (Table 1). Restricting size estimates of dark diversity (i.e. narrow thresholds) resulted in greater proportions of negative mismatches for both methods (Fig. 4b, d, f, h). However, significantly fewer prevailed among estimates resulting from the co‐occurrence method (Table 1). While neither method resulted in large proportions of component ‘a’, that is where species predicted from smaller scale observations were in turn observed at the larger scale, increasing the observation scale did increase predictive success. Using only broad threshold values, SG predictive success increased from 18% at 200 m2 to 67% (2 km scale) using the co‐occurrence method and from 8% to 32% using the Ellenberg method (Fig. 5a). Similarly, SF predictive success increased from 18% at 500 m2 to 60% (10 km scale) using the co‐occurrence method and from 8% to 35% using the Ellenberg method (Fig. 5b).

Table 1. Results of paired t‐tests statistically comparing co‐occurrence and Ellenberg methods in estimating dark diversity of Scottish grasslands (SG) and Swiss forest (SF). Presented are the mean differences (x diff; co‐occurrence–Ellenberg), t and p statistics from comparisons of individual three part compositions where a = species observed & predicted, b = observed and not predicted (negative mismatch) and c = predicted and unobserved (positive mismatch), for both broad and narrow threshold filters
                                                 Broad                       Narrow
Data set  Comparison                 Component   x diff   t        P         x diff   t        P
SG        Co‐occurrence: Ellenberg   a           0·61     80·13    <0·001    0·79     76·67    <0·001
                                     b           −0·96    −86·44   <0·001    −0·58    −70·64   <0·001
                                     c           −0·03    −2·64    0·008     0·02     1·31     0·19
SF        Co‐occurrence: Ellenberg   a           0·61     27·91    <0·001    0·94     32·63    <0·001
                                     b           −0·66    −29·43   <0·001    −0·4     −31·17   <0·001
                                     c           −0·11    −4·18    <0·001    0·02     0·52     0·6

Figure 4. Ternary diagrams illustrating proportions of observed & predicted, predicted and unobserved (positive mismatch) and observed and not predicted (negative mismatch) for species pool predictions from Scottish grasslands (a–d) and Swiss forests (e–h). Presented are results from two tested methods: Ellenberg (a, b, e, f) and co‐occurrence (c, d, g, h) representing threshold values maximizing the species pool (broad filter; a, c, e, g) and minimizing the species pool (narrow filter; b, d, f, h).
Figure 5. Cumulative per cent of observed and predicted species (± standard error) with increasing distance from the focal plot (Scottish grasslands (a) 200–2000 m; Swiss forests (b) 4000–10 000 m). Observed species at these scales are compound observations across all 200 m2 (Scottish grasslands) and 500 m2 (Swiss forests) plots within the defined spatial extent.

Discussion

Dark diversity, that is species theoretically able to exist in a community but otherwise absent, is an important component of species pools. Unlike its counterpart ‘observed diversity’, dark diversity cannot be directly measured, irrespective of spatial scale. This presents a challenge for deriving ‘complete’ estimates of a species pool, as it is vital that the absent component be accurately accounted for. Such difficulties may in part explain the relatively limited advance in the species pool concept since its formalization (Eriksson 1993; Pärtel et al. 1996). Most often, studies define the species pool subjectively or arbitrarily, using regional or local species inventories (Williams 1947; White & Hurlbert 2010), or by pooling local‐scale observations (Belote, Sanders & Jones 2009; Chase et al. 2011; Kraft 2011). Such unstandardized approaches inherently suffer from bias (Hortal et al. 2008), fail to consider species ecological affinity to a focal assemblage and, importantly, entirely ignore dark diversity. Through the use of two independent and disparate data sets, the present study focused on two analytical approaches, ecological requirements of species taken as Ellenberg indicator values and species co‐occurrence patterns, which both alleviate the difficulties in accurately constructing species pools. Both methods consider species ecological affinities and both account for dark diversity. However, our results suggest the two methods are not equal in terms of overall performance accuracy.

The results of this study suggest that the co‐occurrence approach is more consistent than the Ellenberg method in successfully predicting species observed with increasing sampling scale. The disparity between the two methods was especially clear when species observations were extended across larger Euclidean distances up to 2 and 10 km (depending on the analysed data set) from the focal assemblage, indicating that a large proportion of positive mismatches from the co‐occurrence approach represented local dark diversity and not an overestimation of the species pool. In fact, in this study, disparity between method performances cannot be linked to over‐ and/or underestimations of species pools since we determined and applied threshold values that result in convergent dark diversity sizes. The latter is an important point. Threshold values applied here were not designed to optimize overall estimation performance, as they would be for a specific ‘case study’. Using ecological requirements of species thus simply resulted in more instances where estimates failed to account for locally observed species relative to the co‐occurrence approach.

The relatively better performance of co‐occurrence estimates of dark diversity can be explained in part by the community composition data from which they are derived. As is well understood, compositional data typically display greater dissimilarity with increasing distance, that is they exhibit spatial autocorrelation. This autocorrelation in a broad sense captures information on abiotic, biotic and disturbance factors; thus, the co‐occurrence patterns themselves are a response to the prevailing ecological conditions and emphasize ecological control (Whittaker 1956; Bray & Curtis 1957; Hutchinson 1957). This makes the co‐occurrence approach exceptionally holistic in determining compatible absent species, that is dark diversity, principally because estimates are based on fairly accurate representations of a species' realized niche. This, however, is also a limiting factor. Species are not always in equilibrium with the abiotic environment (Svenning & Skov 2004), yet estimating species pools through species co‐occurrences assumes that they are. Co‐occurrence of ‘leading’ and ‘trailing’ edge populations (Svenning & Sandel 2013) may thus result in an overestimation of dark diversity, estimating species to be part of the pool that are otherwise not environmentally adapted.

On the contrary, the relatively poorer performance of the Ellenberg approach might be explained through its estimates being based on broader representations of a species’ realized niche. The Ellenberg method considers a species part of a community's dark diversity if an absent species shares similar environmental tolerances to those observed in a community. However, the ability to first establish in that community is not considered. Using co‐occurrence patterns, the likelihood of a species to establish among a community given both abiotic and biotic constraints is inherently considered. However, species establishment probability derived from Ellenberg values considers primarily the abiotic environment and even then limited to mean values of just a few specific ecological gradients. The approach thus disregards important mechanisms that govern species community assembly (e.g. biotic exclusion, propagule dispersal and population growth) and in turn fails to fully consider a vital pre‐requisite when estimating species pools, that is to focus only on those species with a reasonable probability to disperse to the study site (Zobel, van der Maarel & Dupré 1998; Pärtel, Szava‐Kovats & Zobel 2011, 2013).

One major challenge in quantifying dark diversity and species pools is how the relative scales (i.e. regional, local and community pools) are delineated. Here, the confines of our data defined our regional pool as all species with the ability to disperse to and establish in a given focal assemblage, but limited to the gamma‐diversity of the entire survey area. Applying either method across large geographic scales can result in an overestimation of populations belonging to the dark diversity. For example, for species co‐occurrences, generalist species that possess broad geographic ranges and which frequently occur can increase the probability that a species historically constrained to one region is included in a community's dark diversity in a region outside the species range. Although this is not strictly incorrect, it must first be asked whether realized or fundamental ranges are of interest, and populations that may not necessarily be at equilibrium with the abiotic environment must also be considered. Therefore, at broad scales, prior to the application of co‐occurrence or even ecological requirements of species to estimate dark diversity, it may be beneficial first to employ additional filters. This can be achieved through the application of concentric circles around a study site (sensu Graves & Gotelli 1983; geographic species pool), or by considering only species that adhere to similar historical and environmental conditions, that is the biogeographic species pool (Carstensen et al. 2013); see Ronk, Szava‐Kovats & Pärtel (in press). This may even provide a possible explanation for the large proportion of predicted and unobserved species (i.e. positive mismatch) apparent in this regional‐scale study, irrespective of the estimation method and vegetation type.

For two distinctly separate vegetation types, we report significantly more negative mismatches (i.e. observed & unpredicted) resulting from the Ellenberg method than the co‐occurrence method. Nevertheless, in some cases, the use of ecological requirements of species may still provide a suitable means of estimating absent species. For example, in this study, we focus on Ellenberg indicator values that are based on individual expert knowledge about species ecological requirements (Ellenberg et al. 1991). Although generally considered good estimates of real environmental conditions, some issues have undergone discussion as to whether Ellenberg indicator values are limited to certain parts of a given ecological gradient, vegetation type (Diekmann 2003) and region of origin (Godefroid & Elías 2007), which together can limit the utility of using Ellenberg values for macro‐ecological species pool estimates. Alternatively, use of Ellenberg might perform better where habitat heterogeneity in the data is high (e.g. along a successional gradient), providing broad ranges of Ellenberg values. In such cases, a variation of the approach used here has been shown to perform well working with nationwide habitat types (Pärtel et al. 1996).

Furthermore, although effective in this study, the success of co‐occurrence patterns in detecting absent species is strongly dependent on large compositional data sets from which patterns can be analysed. The higher a species' frequency, the higher the likelihood of successful predictions (Münzbergová & Herben 2004), with 40 sampled units considered the minimum threshold in order to achieve good correlation between predicted and modelled species (De Cáceres & Legendre 2008). The use of this method can therefore be limited where data are sparse or where species incidence is low, that is where suites of species are temporally hidden (Pärtel 2014) or are simply rare. In fact, disentangling unsampled species from dark diversity is problematic from a single temporal observation, irrespective of the method. Nevertheless, techniques are available whereby sampling effort can be tested, for example species accumulation and rarefaction curves, and Jackknife and Chao estimates (Gotelli & Colwell 2001; Cao et al. 2007). Estimating dark diversity for undersampled studies is therefore problematic for the co‐occurrence approach, and in such instances, regional species lists and ecological requirements of species might prove a useful alternative.
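
To make the sampling‐effort checks mentioned above concrete, the following Python sketch computes two standard incidence‐based richness estimators, the bias‐corrected Chao2 and the first‐order jackknife, from a set of plots. It is a minimal sketch, not an analysis performed in this study; the sample data and function names are hypothetical. Here m is the number of plots, Q1 the number of species found in exactly one plot and Q2 the number found in exactly two.

```python
def incidence_counts(samples):
    """Map each species to the number of samples (plots) in which it occurs."""
    counts = {}
    for sample in samples:
        for species in set(sample):
            counts[species] = counts.get(species, 0) + 1
    return counts

def chao2(samples):
    """Bias-corrected Chao2: S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1))."""
    m = len(samples)
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # species in exactly one plot
    q2 = sum(1 for c in counts.values() if c == 2)  # species in exactly two plots
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))

def jackknife1(samples):
    """First-order jackknife: S_obs + Q1 * (m - 1) / m."""
    m = len(samples)
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)
    return s_obs + q1 * (m - 1) / m

# Hypothetical incidence data: four plots, species coded by letter.
samples = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "d", "e"}, {"a", "b"}]

observed = len(incidence_counts(samples))
print(observed, chao2(samples), jackknife1(samples))
```

An estimate well above the observed richness suggests that many species remain unsampled, in which case absences in the data cannot yet be attributed confidently to dark diversity.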

Our results are clear in that the co‐occurrence method outperforms the Ellenberg method. Nevertheless, given our empirical approach, it remains difficult to make generalizations concerning how the two methods might respond to different ecological factors. For example, as already discussed, generalist species may result in overestimating dark diversity, but to what extent remains unanswered. Similarly, how the two methods might respond to rarity, diversity, homogeneity/heterogeneity among the data, as well as variation in pool size, sample size, functional composition and dominant assembly processes cannot be inferred empirically. Explorative simulations would help in addressing these types of questions, continuing to increase our understanding of the types of ecological factors that enhance or limit successful dark diversity estimates.

Summary

Arguably, absent species can be as ecologically meaningful as present species, and in part, only tradition dictates focus on one rather than the other. Dark diversity has the potential to unveil valuable information, revealing underlying community processes, and to enhance our understanding of the distribution and function of species (de Bello et al. 2012; Riibak et al. 2015). However, estimating dark diversity is challenging. This study brings new empirical evidence on the efficacy of two methods for accurately estimating dark diversity. We demonstrate through empirical, hierarchically sampled data sets that both species ecological requirements and species co‐occurrence patterns provide an opportunity to examine the dark diversity of ecological systems. In particular, our results highlight that using species co‐occurrence to estimate dark diversity, and in turn to calculate species pools, provides far greater accuracy than using species ecological requirements for macro‐scale assessment of ecological communities. While further work is clearly required to fully understand responses of dark diversity estimates to various ecological structures, processes and patterns reflected in community data, this study demonstrates that accurate estimates of dark diversity are achievable.

Acknowledgements

We thank colleagues of the Macroecology workgroup (www.botany.ut.ee/macroecology/en/) for helpful discussions and comments on this work. In particular, we thank Matthew Spencer (University of Liverpool) and Robin Pakeman (James Hutton Institute) for their helpful comments on an earlier version. We are also grateful to three anonymous reviewers, whose comments helped to significantly improve the manuscript. This work has been supported by the European Regional Development Fund (Centre of Excellence FIBIR) and by the institutional research funding (IUT 20‐29) of the Estonian Ministry of Education and Research.

Data accessibility

SG data used for these analyses are archived in the Dryad digital repository, doi: 10.5061/dryad.gv506. SF data are freely available in the R package ‘dave’ (Wildi 2013).
