Volume 9, Issue 6 p. 1401-1408
APPLICATION

mobsim: An R package for the simulation and measurement of biodiversity across spatial scales

Felix May

Corresponding Author

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Halle (Saale), Germany

Correspondence

Felix May

Email: [email protected]

Katharina Gerstner

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Daniel J. McGlinn

Biology Department, College of Charleston, Charleston, SC, USA

Xiao Xiao

School of Biology and Ecology, and Senator George J. Mitchell Center for Sustainability Solutions, University of Maine, Orono, ME, USA

Jonathan M. Chase

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Halle (Saale), Germany

First published: 16 February 2018

Abstract

  1. Estimating biodiversity and its change in space and time poses serious methodological challenges. First, there has been a long debate on how to quantify biodiversity, and second, measurements of biodiversity and its change are scale-dependent. Therefore, comparisons of biodiversity metrics between communities are ideally carried out across scales. Simulations can be used to study the behaviour of biodiversity metrics across scales, but most approaches are system specific, plagued by large parameter spaces, and therefore cumbersome to use and interpret. However, realistic spatial biodiversity patterns can be generated without reference to ecological processes, which suggests a simple simulation framework as an important tool for ecologists.
  2. Here, we present the R package mobsim, which allows users to simulate the abundances and the distributions of individuals of different species in a spatially explicit landscape. Users can define key properties of communities, including the total number of individuals, the species-abundance distribution (SAD) and the degree of intraspecific spatial aggregation. Furthermore, the package provides functions that derive biodiversity measures, such as rarefaction curves and species–area relationships (SAR), from simulated communities or from observed data, as well as functions that simulate different sampling designs.
  3. We show several example applications of the package. First, we illustrate how species rarefaction and accumulation curves can be used to disentangle changes in the fundamental components that underlie biodiversity: (i) total abundance, (ii) species-abundance distribution and (iii) species aggregation. Second, we demonstrate how mobsim can be applied to assess the performance of species-richness estimators. The latter indicates how spatial aggregation challenges classical non-spatial species-richness estimators.
  4. mobsim allows the simulation and analysis of a large range of biodiversity scenarios and sampling designs in a comprehensive way by directly manipulating key community properties. The simplicity and control provided by the package also make it a useful didactic tool. The combination of controlled simulations and their analysis will facilitate a more rigorous interpretation of real-world data that exhibit sampling effects and scale dependence.

1 INTRODUCTION

Understanding how biodiversity varies in space and time poses one of the greatest challenges in ecology. This challenge is at least partly caused by the dependence of most biodiversity measurements on spatial scale (Rahbek, 2005; Rosenzweig, 1995) and on the specific biodiversity index used (reviewed in Magurran & McGill, 2011). Any biodiversity index (e.g. species richness, Shannon or Simpson diversity) transforms the numbers of individuals and species in a given sample into a single metric that necessarily captures only a portion of the information about the underlying abundances and spatial distributions. Furthermore, biodiversity measures vary nonlinearly with spatial scale and thus any comparisons among samples will typically be highly scale-dependent (Chase & Knight, 2013). Despite continued development of approaches for estimating and comparing diversity measures (e.g. Jost, 2006; Colwell et al., 2012), no single measure can capture all of the relevant information about biodiversity and its change. Although we focus on taxonomic diversity measures here, the same issues apply to measurements of functional and (phylo)genetic diversity (Chao, Chiu, & Jost, 2014).

Here, we introduce the software package mobsim that facilitates understanding and interpretation of biodiversity changes across spatial scales. All biodiversity measures, including local (α-) and regional (γ-) diversity, as well as their scaling relationships (measures of β-diversity), depend on three underlying components, namely (i) the total abundance of individuals, (ii) the species-abundance distribution (SAD) (which includes the total number of species and their (un)evenness in abundance) and (iii) the spatial distributions of individuals and species (He & Legendre, 2002; McGill, 2011). mobsim includes spatially explicit simulation tools, which allow user-defined manipulations of these biodiversity components. In nature, these components emerge from species traits and dynamic ecological processes, but with mobsim, we can vary the emergent properties (e.g. abundances and distributions) without specifying lower-level mechanisms (e.g. species interactions, dispersal limitation and/or habitat filtering). Many attempts have been made to link biodiversity patterns with ecological processes (McGill, 2010), but there is substantial disagreement on the relative importance of different processes. We argue that regardless of which ecological processes are operating in a given community, species abundances and spatial distributions have important effects on key biodiversity metrics. Our package allows ecologists to better understand how user-defined differences in community structure influence biodiversity statistics, particularly when it comes to scale dependence in biodiversity measures.

A useful analogy for understanding and teaching how sampling processes influence resulting biodiversity measurements is to consider the problem of drawing jelly beans from a jar (Gotelli & Colwell, 2001; Heard, 2016). In a similar vein, mobsim simulates individuals of different species (the proverbial “jelly beans”) in a landscape and thus allows studying the influence of sampling and scale, as well as the interrelatedness between different biodiversity indices and relationships in a comprehensive way. The package provides functions for three purposes: (1) the simulation of communities in space, (2) the analysis of biodiversity relationships and (3) the simulation of different spatial sampling designs (Figure 1, Table 1).

Figure 1. General overview of the purposes of mobsim. The package provides functions to simulate species abundances and spatial distributions based on user-defined parameters, including the numbers of species and individuals, the species–abundance distribution and the aggregation of conspecific individuals. Simulated or observed distributions can be analysed and visualised using mobsim functions. Alternatively, sampling processes can be simulated using mobsim and the analysis can be done with additional software for classical site-by-species matrices.
Table 1. List of main functions in mobsim

Function name | General purpose | Description | Arguments
sim_sad | Simulation | Simulate local species-abundance distributions (SADs) | No. of species (s_pool); No. of individuals (n_sim); Statistical model of the SAD (sad_type) [a]; Coefficient of the SAD model (sad_coef) [a]; Option to fix the simulated richness at the desired value (fix_s_sim)
sim_poisson_coords | Simulation | Add spatially random coordinates to an SAD | Species abundance vector (abund_vec); Landscape extent in x and y dimension (xrange, yrange)
sim_thomas_coords | Simulation | Add coordinates with intraspecific aggregation to an SAD | Species abundance vector (abund_vec); Parameter vector for cluster sizes (sigma); Parameter vector for no. of clusters (mother_points); Parameter vector for no. of individuals per cluster (cluster_points); Landscape extent in x and y dimension (xrange, yrange)
sim_poisson_community | Simulation | Simulate a community with certain SAD and spatially random coordinates | Combines the arguments of sim_sad and sim_poisson_coords
sim_thomas_community | Simulation | Simulate a community with certain SAD and intraspecific aggregation | Combines the arguments of sim_sad and sim_thomas_coords
spec_sample_curve | Analysis | Derive individual-based species rarefaction curves (non-spatial) and species accumulation curves (spatially explicit) | Community object (comm); Option to choose rarefaction or accumulation curve (method)
divar | Analysis | Derive diversity–area relationships from randomly located square sampling plots of different sizes. Diversity indices include species richness, no. of endemics, Shannon index, Simpson index and the effective numbers of species based on Shannon and Simpson indices | Community object (comm); Vector of sampling areas (prop_area); Number of samples for each sampling area (n_samples); Option to exclude samples without any individuals (exclude_zeros)
dist_decay | Analysis | Derive the distance–decay function from pairwise community similarity measures of square sampling plots | Community object (comm); Sampling area (prop_area); Number of samples (n_samples); Similarity index (method) [b]
sample_quadrats | Sampling | Virtual sampling of different communities using square plots of user-defined sizes. Users can choose different sampling designs, including random sampling, transect and lattice designs. | Community object (comm); No. of sampling quadrats (n_quadrats); Area of sampling quadrats (quadrat_area); Option to plot the sampling design (plot); Type of sampling design (method); Option to avoid overlapping sampling with a random design (avoid_overlap); Position of lower left quadrat for transect and grid designs (x0, y0); Distances among neighbouring quadrats for transect and grid designs (delta_x, delta_y)

  • [a] The parameter values are used as arguments in sads::rsad().
  • [b] This parameter value is used as argument for vegan::vegdist().

Specifically, in spatially explicit simulations users define the total numbers of individuals and species, the shape and evenness of the SAD, and the intraspecific aggregation of species. Functions for the analysis of biodiversity patterns, such as rarefaction curves (Gotelli & Colwell, 2001) and species–area relationships (Rosenzweig, 1995), allow users to assess how different biodiversity indices vary with spatial scale and/or sampling effort. Finally, the package provides functions to simulate different sampling designs and to convert spatially explicit data into classical community matrices (i.e. site-by-species abundance matrices). These matrices can then be analysed using standard analytical tools (Legendre & Legendre, 2012) to assess how the simulated changes are expressed in measures of biodiversity and influenced by the sampling design. The package is available on CRAN and on GitHub (see Data accessibility). Key aspects of mobsim are also available as an interactive Shiny application (http://idiv-app1.inf-bb.uni-jena.de:8080/mobsim_app/).

2 PACKAGE DESCRIPTION

2.1 Simulation of community data

An ecological community is characterised by its species-abundance distribution (SAD) and by the spatial distribution of individuals. In mobsim, users can use a predefined SAD and add simulated positions of individuals, or simulate both the SAD and the positions (Table 1). For the simulation of SADs, a wrapper around the function rsad from the R package sads is provided, which offers many options for the underlying statistical distribution (Prado, Miranda, & Chalom, 2016). In contrast to sads::rsad, the function mobsim::sim_sad allows the simultaneous specification of the simulated number of individuals and the number of species in the pool.
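The two-step route described above (simulate an SAD, then add positions) might look as follows. This is a minimal sketch that uses the argument names listed in Table 1 and the parameter values of the reference community in Figure 2; the seed is arbitrary, and the structure of the returned objects (an abundance vector and a community object with a `census` table) is an assumption about the installed package version.

```r
library(mobsim)

set.seed(1)  # arbitrary seed, for reproducibility only

# Log-normal SAD: 2,000 individuals drawn from a pool of 100 species
# (values of the reference community in Figure 2).
sad <- sim_sad(s_pool = 100, n_sim = 2000,
               sad_type = "lnorm",
               sad_coef = list(meanlog = 3, sdlog = 1))

# Add spatially random (Poisson) positions to the simulated abundances.
comm <- sim_poisson_coords(abund_vec = sad,
                           xrange = c(0, 1), yrange = c(0, 1))

sum(sad)     # total number of simulated individuals
length(sad)  # simulated richness; may be below s_pool unless fix_s_sim = TRUE
```

The one-step alternative is sim_poisson_community, which, according to Table 1, combines the arguments of both functions.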

The spatial coordinates of individuals are simulated using simple stochastic point processes in mobsim, either as a Poisson process, where individuals are placed randomly, or as a Thomas process, where individuals of the same species are clustered (Wiegand & Moloney, 2014). For the Thomas process, a common model of intraspecific aggregation in ecology (e.g. Morlon et al., 2008; Plotkin et al., 2000), users define the numbers and sizes of the clusters, as well as the number of individuals per cluster, either independently or jointly for all species. The Thomas process only considers intraspecific aggregation while individuals of different species are distributed randomly with respect to each other (McGill, 2010).
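To make the difference between the two point processes concrete, the following base-R sketch simulates one species under each of them in the unit square. The helper `sim_thomas_species()` and the nearest-neighbour summary `nn_dist()` are hypothetical names introduced for illustration; they are not part of mobsim, and details such as redrawing offspring that fall outside the landscape are assumptions rather than a description of the package internals.

```r
set.seed(3)  # arbitrary seed

# Hypothetical helper (not part of mobsim): Thomas-type cluster process
# for one species in the unit square.
sim_thomas_species <- function(n, n_clusters, sigma) {
  # Cluster centres ("mother points") are placed uniformly at random.
  mx <- runif(n_clusters)
  my <- runif(n_clusters)
  x <- numeric(n)
  y <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      k  <- sample.int(n_clusters, 1)   # pick a cluster
      xi <- rnorm(1, mx[k], sigma)      # Gaussian displacement from its centre
      yi <- rnorm(1, my[k], sigma)
      # Assumption: offspring outside the landscape are redrawn.
      if (xi >= 0 && xi <= 1 && yi >= 0 && yi <= 1) break
    }
    x[i] <- xi
    y[i] <- yi
  }
  data.frame(x = x, y = y)
}

# Mean distance to the nearest conspecific, as a simple aggregation summary.
nn_dist <- function(pts) {
  d <- as.matrix(dist(pts))
  diag(d) <- Inf
  mean(apply(d, 1, min))
}

random_pts    <- data.frame(x = runif(200), y = runif(200))  # Poisson process
clustered_pts <- sim_thomas_species(n = 200, n_clusters = 2, sigma = 0.02)

nn_dist(random_pts)
nn_dist(clustered_pts)  # expected to be clearly smaller under aggregation
```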

2.2 Analysis of community data

mobsim offers several functions to derive spatial and non-spatial measures and relationships from simulated or empirical data. The function spec_sample_curve derives the expected number of species given a certain number of sampled individuals. Individuals are sampled either randomly, giving the well-known individual-based rarefaction curve (Gotelli & Colwell, 2001), or sampling proceeds from a focal individual to the nearest neighbour, which results in the spatial species–accumulation curve (spatial SAC) (Chiarucci et al., 2009). Note that this is different from the sample-based accumulation curve described in Gotelli and Colwell (2001), which considers the distribution of individuals among samples, but not the spatial location of plots.
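The two curves can be sketched in base R as follows. The rarefaction curve uses the standard hypergeometric expectation, and the spatial curve averages nearest-neighbour accumulation over all focal individuals. The function names and the toy community are illustrative assumptions; spec_sample_curve may be implemented differently.

```r
set.seed(1)  # arbitrary seed

# Toy community: 300 individuals of up to 20 species at random positions.
species <- sample(paste0("sp", 1:20), 300, replace = TRUE, prob = (20:1)^2)
x <- runif(300)
y <- runif(300)

# Individual-based rarefaction: expected richness in a random subsample of
# n individuals, for n = 1, ..., N.
rarefaction_curve <- function(species) {
  abund <- as.vector(table(species))
  N <- sum(abund)
  sapply(seq_len(N), function(n) {
    # Probability that a species with abundance N_i is missed by the subsample.
    p_missed <- exp(lchoose(N - abund, n) - lchoose(N, n))
    sum(1 - p_missed)
  })
}

# Spatial accumulation: start at a focal individual, add individuals in order
# of increasing distance, count species; average over all focal individuals.
spatial_accumulation_curve <- function(x, y, species) {
  d <- as.matrix(dist(cbind(x, y)))
  acc <- sapply(seq_along(x), function(i) {
    ord <- order(d[i, ])              # the focal individual comes first
    cumsum(!duplicated(species[ord]))
  })
  rowMeans(acc)                       # one column per focal individual
}

rare <- rarefaction_curve(species)
sac  <- spatial_accumulation_curve(x, y, species)
```

Both curves start at one species and end at the total richness of the community; they differ in between only when individuals are not randomly distributed in space.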

The function divar (diversity–area relationships) estimates several diversity indices for randomly distributed sampling plots of user-defined areas. Accordingly, this function can be used to derive the species–area and endemics–area relationships (Harte & Kinzig, 1997; Rosenzweig, 1995). In addition, divar estimates the Shannon and Simpson diversity indices, and their corresponding effective numbers of species (ENS) (Jost, 2006). The sampling plots of different areas are distributed independently of each other, instead of being fully nested.
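For reference, the indices named above can be computed from an abundance vector as in the following sketch. The helper `diversity_indices()` is a hypothetical name, and the Simpson index is assumed here to be the Gini-Simpson form (one minus the sum of squared proportions); the form used by divar should be checked in the package documentation.

```r
# Hypothetical helper (not part of mobsim): diversity indices for one sample.
diversity_indices <- function(abund) {
  abund <- abund[abund > 0]
  p <- abund / sum(abund)          # relative abundances
  shannon <- -sum(p * log(p))      # Shannon index
  simpson <- 1 - sum(p^2)          # Gini-Simpson index (assumed form)
  c(richness    = length(abund),
    shannon     = shannon,
    simpson     = simpson,
    ens_shannon = exp(shannon),    # effective number of species (Shannon)
    ens_simpson = 1 / sum(p^2))    # effective number of species (Simpson)
}

even   <- diversity_indices(c(25, 25, 25, 25))
uneven <- diversity_indices(c(97, 1, 1, 1))
```

For the perfectly even sample both ENS values equal the richness of four species, whereas for the uneven sample they fall well below it, which is the sense in which ENS values are comparable on a common "number of species" scale.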

The function dist_decay estimates the distance–decay relationship of community similarity (DDR) (Morlon et al., 2008) from randomly distributed, non-overlapping plots of user-defined area. The calculation is based on pairwise indices of compositional similarity among plots using the function vegdist from the R package vegan (Oksanen et al., 2017). This function calculates many (dis)similarity indices for abundance (e.g. Bray–Curtis index) and presence–absence data (e.g. Jaccard and Sørensen indices). By default, dist_decay uses the Bray–Curtis index and calculates Euclidean distances among sampling plots.
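The underlying calculation pairs each plot-to-plot distance with a compositional similarity. The base-R sketch below does this for three made-up plots; the abundances, plot centres and the helper `bray_curtis_similarity()` are illustrative assumptions, with similarity taken as one minus the Bray–Curtis dissimilarity.

```r
# Hypothetical helper: Bray-Curtis similarity between two abundance vectors.
bray_curtis_similarity <- function(a, b) 1 - sum(abs(a - b)) / sum(a + b)

# Made-up abundances of four species in three plots, and plot centres.
plots <- rbind(p1 = c(10, 5, 0, 0),
               p2 = c( 8, 6, 1, 0),
               p3 = c( 0, 0, 7, 9))
centres <- rbind(p1 = c(0.1, 0.1),
                 p2 = c(0.2, 0.1),
                 p3 = c(0.9, 0.8))

# All pairs of plots: (1,2), (1,3), (2,3).
pairs <- t(combn(nrow(plots), 2))

ddr <- data.frame(
  distance = apply(pairs, 1, function(ij)
    sqrt(sum((centres[ij[1], ] - centres[ij[2], ])^2))),   # Euclidean distance
  similarity = apply(pairs, 1, function(ij)
    bray_curtis_similarity(plots[ij[1], ], plots[ij[2], ]))
)

ddr  # similarity is high for the neighbouring plots and low for distant ones
```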

2.3 Sampling of community data

mobsim also provides functionality to simulate sampling processes by distributing sampling quadrats in a community. The data type provided by the sampling is the classical sites-by-species matrix (Legendre & Legendre, 2012). Users can choose the size and number of quadrats, as well as the spatial design.
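The conversion from mapped individuals to a site-by-species matrix can be sketched in base R as below. The helper `sample_random_quadrats()` is hypothetical and deliberately simpler than sample_quadrats: it assumes a unit-square landscape, supports only the random design and does not prevent overlapping quadrats.

```r
set.seed(4)  # arbitrary seed

# Toy census: one row per individual with position and species identity.
census <- data.frame(x = runif(500), y = runif(500),
                     species = sample(paste0("sp", 1:10), 500, replace = TRUE))

# Hypothetical helper (not part of mobsim): random square quadrats in the
# unit square, returned as a site-by-species abundance matrix.
sample_random_quadrats <- function(census, n_quadrats, quadrat_area) {
  side <- sqrt(quadrat_area)
  sp_levels <- sort(unique(as.character(census$species)))
  # Lower-left corners, chosen so that quadrats lie fully inside the landscape.
  x0 <- runif(n_quadrats, 0, 1 - side)
  y0 <- runif(n_quadrats, 0, 1 - side)
  mat <- matrix(0L, nrow = n_quadrats, ncol = length(sp_levels),
                dimnames = list(paste0("site", seq_len(n_quadrats)), sp_levels))
  for (q in seq_len(n_quadrats)) {
    inside <- census$x >= x0[q] & census$x <= x0[q] + side &
              census$y >= y0[q] & census$y <= y0[q] + side
    # Count individuals per species, keeping zero counts for absent species.
    mat[q, ] <- table(factor(census$species[inside], levels = sp_levels))
  }
  mat
}

site_by_species <- sample_random_quadrats(census, n_quadrats = 5,
                                          quadrat_area = 0.04)
```

A matrix of this shape is what standard community-ecology tools expect as input, so the same downstream analysis can be applied to simulated and field data.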

3 EXAMPLE APPLICATIONS

Here, we present two example applications of mobsim: (i) changes of biodiversity components and (ii) an assessment of species-richness estimators. We provide a third example on extinctions due to habitat loss, as well as the R code for all applications, in the online supporting information.

3.1 Changes of single biodiversity components

The biodiversity in a sampled area depends on three components that can vary independently: (1) the total number of individuals, (2) the SAD of the species pool and (3) the spatial distribution of individuals and species (Chase & Knight, 2013; McGill, 2011). Here, we show how the combination of rarefaction and accumulation curves can be used to disentangle changes in these three biodiversity components (Figure 2). The key point is that the shape of the rarefaction curve only depends on the underlying SAD, but not on the spatial distribution, while the shape of the accumulation curve depends on both the SAD and the spatial distribution (McGlinn et al., 2018). First, we randomly removed 50% of all individuals. This does not affect the underlying SAD, as indicated by overlapping rarefaction and accumulation curves that end at different numbers of individuals (Figure 2a,b). Second, we simulated communities with lower evenness by increasing the variation in species abundances. This resulted in changes in both the rarefaction and the accumulation curves (Figure 2c,d). Here, the difference (or ratio) between the curves changes with sampling effort, which indicates effect sizes that depend on scale and/or sampling effort (Chase & Knight, 2013). Despite having the same number of species in the pool, the simulated communities differ in species richness even for the maximum number of individuals, because the rarest species are not sampled into the local community. Figure S1 shows the same analysis with a fixed species richness of the simulated community, where the curves converge at the largest number of individuals. Third, we added intraspecific aggregation by using a Thomas process instead of a Poisson process, which does not affect the rarefaction curve, but leads to lower expected species richness in the spatial species–accumulation curve (Figure 2e,f).
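A single replicate of the four scenarios could be set up roughly as follows, using the functions of Table 1 and the parameter values given in the caption of Figure 2. This is a sketch, not the authors' code (which is in the supporting information): it draws only one replicate per scenario instead of 1,000, and it approximates the random removal of 50% of the individuals by simulating a community with n_sim = 1000, which is an assumption made here for brevity.

```r
library(mobsim)

set.seed(2)  # arbitrary seed

sad_ref <- list(meanlog = 3, sdlog = 1)

# Reference: 2,000 individuals, pool of 100 species, random positions.
reference <- sim_poisson_community(s_pool = 100, n_sim = 2000,
                                   sad_type = "lnorm", sad_coef = sad_ref)

# (1) Fewer individuals (assumption: simulated directly with n_sim = 1000).
fewer_ind <- sim_poisson_community(s_pool = 100, n_sim = 1000,
                                   sad_type = "lnorm", sad_coef = sad_ref)

# (2) Lower evenness: higher variation in abundances (sdlog = 1.5).
less_even <- sim_poisson_community(s_pool = 100, n_sim = 2000,
                                   sad_type = "lnorm",
                                   sad_coef = list(meanlog = 3, sdlog = 1.5))

# (3) Intraspecific aggregation: Thomas process with cluster extent 0.02.
aggregated <- sim_thomas_community(s_pool = 100, n_sim = 2000,
                                   sad_type = "lnorm", sad_coef = sad_ref,
                                   sigma = 0.02)

scenarios <- list(reference = reference, fewer_ind = fewer_ind,
                  less_even = less_even, aggregated = aggregated)

# Rarefaction and spatial accumulation curves for every scenario.
curves <- lapply(scenarios, spec_sample_curve,
                 method = c("accumulation", "rarefaction"))

sapply(curves, nrow)  # one row per sampled individual in each scenario
```

Because every call draws a new local community, differences between single replicates include sampling noise; confidence bands like those in Figure 2 require repeating the simulation many times.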

Figure 2. Simulated species rarefaction and accumulation curves for changes of three different biodiversity components. The black lines and intervals show the reference community with 2,000 individuals and 100 species in the species pool, a log-normal species-abundance distribution with meanlog = 3 and sdlog = 1, and spatially random positions (Poisson distribution). The red lines show changed communities for half the number of individuals (first column), a decrease in evenness (second column), simulated as higher variation in abundances with sdlog = 1.5, and a higher intraspecific aggregation (third column), simulated with a Thomas process with cluster extent sigma = 0.02. Ribbons indicate 95% confidence intervals derived from 1,000 replicate simulations.

3.2 Testing species-richness estimators

We are often interested in inferring total species richness in a region based on a limited number of samples from within that region, and a number of species-richness estimators have been developed to accomplish this (Chiu, Wang, Walther, & Chao, 2014; Colwell & Coddington, 1994). However, it remains an open question how well these estimators perform for different communities and for different sampling strategies (Colwell & Coddington, 1994; Reese, Wilson, & Flather, 2013). The simulation tools of mobsim are well suited to address this issue.

Here, we assess the performance of a bias-corrected version of the Chao1 estimator (Chiu et al., 2014), in the face of spatial aggregation and different sampling designs. We simulated a community with 1,000 species and 1,000,000 individuals and used the function sample_quadrats to sample from the community. We varied the proportion of total area sampled between 0.01% and 1% as well as the number of sampling quadrats (1–100) that jointly represent the total sampling effort. These combinations of sampling strategies were applied to communities with the same SAD, but with different intraspecific aggregations. We examined four scenarios for aggregation: (1) a random (Poisson) distribution; (2) several large clusters per species; (3) several small clusters per species; and (4) one large cluster per species. We used the function vegan::estimateR to calculate the species-richness estimator of Chiu et al. (2014).
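As a reminder of what such an estimator computes, the sketch below implements the bias-corrected Chao1 formula in the form documented for vegan::estimateR, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of species observed exactly once and twice. The helper `chao1_bc()` and the small site-by-species matrix are hypothetical; the formula is assumed to correspond to the estimator used in the article, which is not verified here.

```r
# Hypothetical helper: bias-corrected Chao1 from a vector of abundances.
chao1_bc <- function(abund) {
  abund <- abund[abund > 0]
  s_obs <- length(abund)      # observed species richness
  f1 <- sum(abund == 1)       # singletons
  f2 <- sum(abund == 2)       # doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# 7 observed species with 3 singletons and 2 doubletons: 7 + 6 / 6 = 8.
chao1_bc(c(5, 3, 1, 1, 1, 2, 2))

# For several quadrats, abundances are pooled across sites before estimation.
site_by_species <- rbind(q1 = c(3, 1, 0, 0),
                         q2 = c(2, 0, 1, 0),
                         q3 = c(4, 1, 0, 2))
pooled <- colSums(site_by_species)
chao1_bc(pooled)
```

The estimate exceeds the observed richness only when singletons are present, which is why strong aggregation, by concentrating each species in few quadrats, can leave the estimator with too little information about unseen species.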

For the community with a spatially random (Poisson) distribution, we found no influence of whether a single large or several small quadrats were sampled on the estimation of regional richness (Figure 3a). However, the estimated richness and its uncertainty strongly varied with total sampling effort. The bias of the estimator decreased with increasing sampling effort, but the estimated and true values only converged at the highest effort. The variance of the estimator decreased drastically with sampling effort (Figure 3a).

Figure 3. Performance of an asymptotic species-richness estimator for communities with different intraspecific aggregation and for different sampling strategies. The panels show the estimated species richness (chao1 from function vegan::estimateR) vs. the proportion of total area sampled. The colours indicate different numbers of randomly distributed sampling quadrats that together form the total amount of sampled area. The different panels show results for communities with the same species-abundance distribution (SAD) but different intraspecific aggregation. The points and error bars represent means and 95% confidence intervals from 1,000 replicate simulations. The horizontal lines indicate the true species richness. The following parameter values were used for the simulations: species pool richness: s_pool = 1,000; number of individuals: n_sim = 1,000,000; a log-normal SAD with meanlog = 3 and sdlog = 1; large cluster size: sigma = 0.05, small cluster size: sigma = 0.01; one cluster per species: mother_points = 1.

For aggregated distributions, the spatial configuration of sampling mattered: a sampling design with several small quadrats was less biased than one with few large quadrats (Figure 3b–d). For high aggregation, species richness was strongly underestimated (Figure 3c,d). This is an important finding, because aggregated species distributions tend to be the rule in nature (McGill, 2010, 2011).

Our simulation results underline the recommendation by developers of species-richness estimators that the estimated values should be only interpreted as lower bounds (Chao, 1987; Chiu et al., 2014). Furthermore, our findings indicate that for aggregated species distributions, both sampling design and sampling effort have a large influence. Recently, Azaele et al. (2015) presented an extrapolation approach that considers spatial aggregation, which might help to solve the problems indicated here.

3.3 Further potential applications of mobsim

The capability of mobsim goes beyond the examples shown here. In order to understand the relationship between the fundamental biodiversity components and aggregated biodiversity indices and relationships, the example on species rarefaction and accumulation curves can be repeated with other descriptors, such as species–area and distance–decay relationships. Furthermore, the simulation and sampling tools of mobsim are well suited to study the scale dependence of species-abundance distributions.

While our second example only assessed the performance of a single species-richness estimator (Chao1), users can apply mobsim to assess the bias and variance of numerous biodiversity extrapolation methods, including univariate biodiversity estimators (Azaele et al., 2015; Chao, Gotelli, et al., 2014), as well as estimators of entire SADs (Chao, Hsieh, Chazdon, Colwell, & Gotelli, 2015).

One important application of mobsim is the simulation of biodiversity change scenarios and the subsequent analysis of the dependence of our detection of biodiversity changes on scale, sampling and biodiversity index (Chase & Knight, 2013). A simple example of such an application is provided in the package vignette “Introduction to mobsim.” The current version of mobsim does not simulate spatial or temporal community dynamics explicitly. Nevertheless, simulated communities can be considered as different snapshots in space or in time, and in this way mobsim is useful for investigating spatio-temporal biodiversity changes in a simple and general way.

4 CONCLUSIONS

The combination of tools for simulation and analysis of biodiversity patterns provided in mobsim is well suited to foster understanding of the emergence and consequences of scale-dependent biodiversity changes. The package integrates key tools of community ecology so that ecologists can derive valid interpretations of biodiversity patterns and changes observed in real-world data. In mobsim simulations, users have to specify key community properties, including the numbers of individuals and species, the SAD and intraspecific aggregation, in order to predict aggregated biodiversity measures, for example the SAR, SAC or DDR. An alternative approach for the investigation of biodiversity patterns is the estimation of key community properties, including total abundance, SAD and spatial distribution, from aggregated metrics and relationships, such as the SAR or the DDR. While such an inverse approach will be technically and computationally much more challenging than the forward simulations of mobsim, the package can facilitate tackling such questions by efficiently searching through the parameter space of total abundance, SAD and spatial distribution with simulations.

ACKNOWLEDGEMENTS

We gratefully acknowledge the support of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig funded by the German Research Foundation (FZT 118). We thank Robert O'Hara, one anonymous associate editor and two anonymous reviewers for valuable comments, which helped to improve the article and the R package.

AUTHORS' CONTRIBUTIONS

F.M. and J.M.C. conceived the package concept and structure. F.M. implemented the first package version. K.G., X.X. and D.M. contributed code and supported the package revision. K.G. implemented the Shiny online application. F.M. wrote the first manuscript draft and all authors critically revised the text and gave approval for publication.

DATA ACCESSIBILITY

This article does not use any empirical data. The package is available on CRAN (https://CRAN.R-project.org/package=mobsim) and on GitHub (https://doi.org/10.5281/zenodo.1170472). The GitHub repository also includes the code for all example applications presented in the article (in the folder “examples”).