mobsim: An R package for the simulation and measurement of biodiversity across spatial scales
Abstract
- Estimating biodiversity and its change in space and time poses serious methodological challenges. First, there has been a long debate on how to quantify biodiversity, and second, measurements of biodiversity and its change are scale-dependent. Therefore, comparisons of biodiversity metrics between communities are ideally carried out across scales. Simulations can be used to study the behaviour of biodiversity metrics across scales, but most approaches are system specific, plagued by large parameter spaces, and therefore cumbersome to use and interpret. However, realistic spatial biodiversity patterns can be generated without reference to ecological processes, which suggests that a simple simulation framework would be an important tool for ecologists.
- Here, we present the R package mobsim that allows users to simulate the abundances and the distributions of individuals of different species in a spatially explicit landscape. Users can define key properties of communities, including the total number of individuals, the species-abundance distribution (SAD) and the degree of intraspecific spatial aggregation. Furthermore, the package provides functions that derive biodiversity measures, such as rarefaction curves and species–area relationships (SAR), from simulated communities or from observed data, as well as functions that simulate different sampling designs.
- We show several example applications of the package. First, we illustrate how species rarefaction and accumulation curves can be used to disentangle changes in the fundamental components that underlie biodiversity: (i) total abundance, (ii) species-abundance distribution and (iii) species aggregation. Second, we demonstrate how mobsim can be applied to assess the performance of species-richness estimators. The latter indicates how spatial aggregation challenges classical non-spatial species-richness estimators.
- mobsim allows the simulation and analysis of a large range of biodiversity scenarios and sampling designs in a comprehensive way by directly manipulating key community properties. The simplicity and control provided by the package also make it a useful didactic tool. The combination of controlled simulations and their analysis will facilitate a more rigorous interpretation of real-world data that exhibit sampling effects and scale dependence.
1 INTRODUCTION
Understanding how biodiversity varies in space and time poses one of the greatest challenges in ecology. This challenge is at least partly caused by the dependence of most biodiversity measurements on spatial scale (Rahbek, 2005; Rosenzweig, 1995) and on the specific biodiversity index used (reviewed in Magurran & McGill, 2011). Any biodiversity index (e.g. species richness, Shannon or Simpson diversity) transforms the numbers of individuals and species in a given sample into a single metric that necessarily only captures a portion of information about the underlying abundances and spatial distributions. Furthermore, biodiversity measures vary nonlinearly with spatial scale and thus any comparisons among samples will typically be highly scale-dependent (Chase & Knight, 2013). Despite continued development of approaches for estimating and comparing diversity measures (e.g. Jost, 2006; Colwell et al., 2012), no single measure can capture all of the relevant information about biodiversity and its change. Although we focus on taxonomic diversity measures here, the same issues apply for measurements of functional and (phylo)genetic diversity (Chao, Chiu, & Jost, 2014).
Here, we introduce the software package mobsim that facilitates understanding and interpretation of biodiversity changes across spatial scales. All biodiversity measures, including local (α-) and regional (γ-) diversity, as well as their scaling relationships (measures of β-diversity), depend on three underlying components, namely (i) the total abundance of individuals, (ii) the species-abundance distribution (SAD) (which includes the total number of species and their (un)evenness in abundance) and (iii) the spatial distributions of individuals and species (He & Legendre, 2002; McGill, 2011). mobsim includes spatially explicit simulation tools, which allow user-defined manipulations of these biodiversity components. In nature, these components emerge from species traits and dynamic ecological processes, but with mobsim, we can vary the emergent properties (e.g. abundances and distributions) without specifying lower level mechanisms (e.g. species interactions, dispersal limitation and/or habitat filtering). Many attempts have been made to link biodiversity patterns with ecological processes (McGill, 2010), but there is substantial disagreement on the relative importance of different processes. We argue that regardless of which ecological processes are operating in a given community, species abundances and spatial distributions have important effects on key biodiversity metrics. Our package allows ecologists to better understand how user-defined differences in community structure influence biodiversity statistics, particularly when it comes to scale dependence in biodiversity measures.
A useful analogy for understanding and teaching how sampling processes influence resulting biodiversity measurements is to consider the problem of drawing jelly beans from a jar (Gotelli & Colwell, 2001; Heard, 2016). In a similar vein, mobsim simulates individuals of different species (the proverbial “jelly beans”) in a landscape and thus allows studying the influence of sampling and scale, as well as the interrelatedness between different biodiversity indices and relationships in a comprehensive way. The package provides functions for three purposes: (1) the simulation of communities in space, (2) the analysis of biodiversity relationships and (3) the simulation of different spatial sampling designs (Figure 1, Table 1).

Function name | General purpose | Description | Arguments |
---|---|---|---|
sim_sad | Simulation | Simulate local species-abundance distributions (SADs) | No. of species (s_pool); No. of individuals (n_sim); Statistical model of the SAD (sad_type)ᵃ; Coefficient of the SAD model (sad_coef)ᵃ; Option to fix the simulated richness at the desired value (fix_s_sim) |
sim_poisson_coords | Simulation | Add spatially random coordinates to an SAD | Species abundance vector (abund_vec); Landscape extent in x and y dimension (xrange, yrange) |
sim_thomas_coords | Simulation | Add coordinates with intraspecific aggregation to an SAD | Species abundance vector (abund_vec); Parameter vector for cluster sizes (sigma); Parameter vector for no. of clusters (mother_points); Parameter vector for no. of individuals per cluster (cluster_points); Landscape extent in x and y dimension (xrange, yrange) |
sim_poisson_community | Simulation | Simulate a community with certain SAD and spatially random coordinates | Combines the arguments of sim_sad and sim_poisson_coords |
sim_thomas_community | Simulation | Simulate a community with certain SAD and intraspecific aggregation | Combines the arguments of sim_sad and sim_thomas_coords |
spec_sample_curve | Analysis | Derive individual-based species rarefaction curves (non-spatial) and species accumulation curves (spatially explicit) | Community object (comm); Option to choose rarefaction or accumulation curve (method) |
divar | Analysis | Derive diversity–area relationships from randomly located square sampling plots of different sizes. Diversity indices include species richness, no. of endemics, Shannon index, Simpson index and the effective numbers of species based on Shannon and Simpson indices | Community object (comm); Vector of sampling areas (prop_area); Number of samples for each sampling area (n_samples); Option to exclude samples without any individuals (exclude_zeros) |
dist_decay | Analysis | Derive the distance–decay function from pairwise community similarity measures of square sampling plots | Community object (comm); Sampling area (prop_area); Number of samples (n_samples); Similarity index (method)ᵇ |
sample_quadrats | Sampling | Virtual sampling of different communities using square plots of user-defined sizes. Users can choose different sampling designs, including random sampling, transect and lattice designs. | Community object (comm); No. of sampling quadrats (n_quadrats); Area of sampling quadrats (quadrat_area); Option to plot the sampling design (plot); Type of sampling design (method); Option to avoid overlapping sampling with a random design (avoid_overlap); Position of lower left quadrat for transect and grid designs (x0, y0); Distances among neighbouring quadrats for transect and grid designs (delta_x, delta_y) |
- ᵃ The parameter values are used as arguments in sads::rsad().
- ᵇ This parameter value is used as argument for vegan::vegdist().
Specifically, in spatially explicit simulations users define the total numbers of individuals and species, the shape and evenness of the SAD, and the intraspecific aggregation of species. Functions for the analysis of biodiversity patterns, such as rarefaction curves (Gotelli & Colwell, 2001) and species–area relationships (Rosenzweig, 1995), allow users to assess how different biodiversity indices vary with spatial scale and/or sampling effort. Finally, the package provides functions to simulate different sampling designs and to convert spatially explicit data into classical community matrices (i.e. site-by-species abundance matrices). These matrices can then be analysed using standard analytical tools (Legendre & Legendre, 2012) to assess how the simulated changes are expressed in measures of biodiversity and influenced by the sampling design. The package is available on CRAN and on GitHub (see Data accessibility). Key aspects of mobsim are also available as an interactive Shiny application (http://idiv-app1.inf-bb.uni-jena.de:8080/mobsim_app/).
2 PACKAGE DESCRIPTION
2.1 Simulation of community data
An ecological community is characterised by its species-abundance distribution (SAD) and by the spatial distribution of individuals. In mobsim, users can use a predefined SAD and add simulated positions of individuals, or simulate both the SAD and the positions (Table 1). For the simulation of SADs, a wrapper around the function rsad from the R package sads is provided, which offers many options for the underlying statistical distribution (Prado, Miranda, & Chalom, 2016). In contrast to sads::rsad, the function mobsim::sim_sad allows the simultaneous specification of the simulated number of individuals and the number of species in the pool.
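The idea of fixing both the pool size and the number of sampled individuals can be illustrated with a minimal, language-neutral sketch in Python. It is not the mobsim implementation: the function name `simulate_sad`, the choice of a log-normal pool, and the `cv_abund` parameter (coefficient of variation of relative abundances) are assumptions made for illustration only.

```python
import math
import random


def simulate_sad(s_pool, n_sim, cv_abund=1.0, seed=None):
    """Draw n_sim individuals from a log-normal pool of s_pool species.

    Hypothetical helper: a sketch of the concept, not mobsim::sim_sad.
    """
    rng = random.Random(seed)
    # Log-normal parameters chosen so that the mean relative abundance is 1
    # and the coefficient of variation equals cv_abund.
    sigma2 = math.log(1.0 + cv_abund ** 2)
    mu = -sigma2 / 2.0
    rel_abund = [rng.lognormvariate(mu, math.sqrt(sigma2)) for _ in range(s_pool)]
    # Assign every individual to a species with probability proportional
    # to the relative abundance of that species in the pool.
    draws = rng.choices(range(s_pool), weights=rel_abund, k=n_sim)
    counts = [0] * s_pool
    for sp in draws:
        counts[sp] += 1
    # Species that were never drawn are absent from the local community.
    return sorted((c for c in counts if c > 0), reverse=True)


abund = simulate_sad(s_pool=50, n_sim=1000, cv_abund=1.0, seed=1)
print(len(abund), "species observed;", sum(abund), "individuals")
```

In this sketch the observed richness can fall below `s_pool`, because rare pool species may not be drawn into the local community.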
The spatial coordinates of individuals are simulated using simple stochastic point processes in mobsim, either as a Poisson process, where individuals are placed randomly, or as a Thomas process, where individuals of the same species are clustered (Wiegand & Moloney, 2014). For the Thomas process, a common model of intraspecific aggregation in ecology (e.g. Morlon et al., 2008; Plotkin et al., 2000), users define the numbers and sizes of the clusters, as well as the number of individuals per cluster, either independently or jointly for all species. The Thomas process only considers intraspecific aggregation while individuals of different species are distributed randomly with respect to each other (McGill, 2010).
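The two point processes can be sketched for a single species as follows. This is an illustrative Python sketch under stated assumptions, not the package code: the function names, the unit-square default landscape, and the rejection of offspring that fall outside the landscape are all choices made here for simplicity.

```python
import random


def poisson_coords(n, xrange=(0.0, 1.0), yrange=(0.0, 1.0), rng=random):
    """Place n individuals uniformly at random (spatial randomness)."""
    return [(rng.uniform(*xrange), rng.uniform(*yrange)) for _ in range(n)]


def thomas_coords(n, n_mothers, sigma, xrange=(0.0, 1.0), yrange=(0.0, 1.0),
                  rng=random):
    """Place n individuals in clusters around randomly located mother points.

    Offspring are displaced from a randomly chosen mother point by Gaussian
    noise with standard deviation sigma (the cluster size).
    """
    mothers = poisson_coords(n_mothers, xrange, yrange, rng)
    points = []
    while len(points) < n:
        mx, my = rng.choice(mothers)
        x, y = rng.gauss(mx, sigma), rng.gauss(my, sigma)
        # Assumption: offspring outside the landscape are simply redrawn.
        if xrange[0] <= x <= xrange[1] and yrange[0] <= y <= yrange[1]:
            points.append((x, y))
    return points


rng = random.Random(42)
clustered = thomas_coords(n=200, n_mothers=3, sigma=0.02, rng=rng)
scattered = poisson_coords(n=200, rng=rng)
```

Calling `thomas_coords` separately for each species, with independent mother points, gives intraspecific aggregation while species remain randomly distributed with respect to each other.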
2.2 Analysis of community data
mobsim offers several functions to derive spatial and non-spatial measures and relationships from simulated or empirical data. The function spec_sample_curve derives the expected number of species given a certain number of sampled individuals. Individuals are sampled either randomly, giving the well-known individual-based rarefaction curve (Gotelli & Colwell, 2001), or sampling proceeds from a focal individual to the nearest neighbour, which results in the spatial species–accumulation curve (spatial SAC) (Chiarucci et al., 2009). Note that this is different from the sample-based accumulation curve described in Gotelli and Colwell (2001), which considers the distribution of individuals among samples, but not the spatial location of plots.
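The difference between the two curves can be made concrete with a small Python sketch. The rarefaction function uses the standard hypergeometric expectation; the accumulation function is a brute-force nearest-neighbour version written for clarity rather than speed. Both function names are hypothetical and neither is taken from the package.

```python
from math import comb, hypot


def rarefaction(abund, n):
    """Expected no. of species in a random draw of n individuals."""
    total_n = sum(abund)
    all_draws = comb(total_n, n)
    # A species is missed if all n individuals come from the other species.
    return sum(1.0 - comb(total_n - a, n) / all_draws for a in abund)


def spatial_accumulation(points, species):
    """Mean no. of species among the k nearest individuals, k = 1..N.

    Each individual serves once as focal individual (it counts as k = 1).
    """
    n = len(points)
    curve = [0.0] * n
    for fx, fy in points:
        order = sorted(range(n),
                       key=lambda i: hypot(points[i][0] - fx, points[i][1] - fy))
        seen = set()
        for k, i in enumerate(order):
            seen.add(species[i])
            curve[k] += len(seen)
    return [c / n for c in curve]


# Two species, each forming its own tight cluster.
points = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0), (1.0, 1.1)]
species = ["A", "A", "B", "B"]
print(rarefaction([2, 2], 2))                 # random draw of 2 individuals
print(spatial_accumulation(points, species))  # nearest-neighbour sampling
```

In this toy community, two neighbouring individuals always belong to the same species, so the spatial curve lies below the rarefaction curve at n = 2.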
The function divar (diversity–area relationships) estimates several diversity indices for randomly distributed sampling plots of user-defined areas. Accordingly, this function can be used to derive the species–area and endemics–area relationships (Harte & Kinzig, 1997; Rosenzweig, 1995). In addition, divar estimates the Shannon- and Simpson diversity indices, and their corresponding effective numbers of species (ENS) (Jost, 2006). The sampling plots of different areas are distributed independently of each other, instead of being fully nested.
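As a rough illustration of the indices involved, the following Python sketch computes them for the abundances found in one sampling plot. It assumes the Gini–Simpson form of the Simpson index (1 − Σp²) and the effective numbers of species of Jost (2006); whether divar uses exactly these variants is an assumption here, and the function name is hypothetical.

```python
from math import exp, log


def diversity_indices(abund):
    """Diversity indices for the species abundances of a single plot."""
    counts = [a for a in abund if a > 0]
    if not counts:
        raise ValueError("plot contains no individuals")
    n = sum(counts)
    p = [a / n for a in counts]           # relative abundances
    shannon = -sum(pi * log(pi) for pi in p)
    sum_p2 = sum(pi * pi for pi in p)
    return {
        "richness": len(counts),
        "shannon": shannon,
        "simpson": 1.0 - sum_p2,          # assumed Gini-Simpson form
        "ens_shannon": exp(shannon),      # effective no. of species (Shannon)
        "ens_simpson": 1.0 / sum_p2,      # effective no. of species (Simpson)
    }


even = diversity_indices([10, 10, 10, 10])
uneven = diversity_indices([37, 1, 1, 1])
print(even["ens_shannon"], uneven["ens_shannon"])
```

For a perfectly even community both effective numbers equal the species richness; they drop below it as abundances become uneven.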
The function dist_decay estimates the distance–decay relationship of community similarity (DDR) (Morlon et al., 2008) from randomly distributed, non-overlapping plots of user-defined area. The calculation is based on pairwise indices of compositional similarity among plots using the function vegdist from the R package vegan (Oksanen et al., 2017). This function calculates many (dis)similarity indices for abundance (e.g. Bray–Curtis index) and presence–absence data (e.g. Jaccard and Sørensen indices). By default, dist_decay uses the Bray–Curtis index and calculates Euclidean distances among sampling plots.
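The pairwise calculation behind a distance–decay relationship can be sketched in a few lines of Python. The sketch assumes that similarity is expressed as one minus the Bray–Curtis dissimilarity and that plots are represented by their centre coordinates; both function names are hypothetical, and plot pairs without any individuals would need separate handling.

```python
from itertools import combinations
from math import hypot


def bray_curtis_similarity(x, y):
    """1 - Bray-Curtis dissimilarity of two abundance vectors."""
    diff = sum(abs(a - b) for a, b in zip(x, y))
    total = sum(a + b for a, b in zip(x, y))
    return 1.0 - diff / total


def distance_decay(centres, site_by_species):
    """Return (distance, similarity) for every pair of sampling plots."""
    pairs = []
    for i, j in combinations(range(len(centres)), 2):
        dist = hypot(centres[i][0] - centres[j][0],
                     centres[i][1] - centres[j][1])
        sim = bray_curtis_similarity(site_by_species[i], site_by_species[j])
        pairs.append((dist, sim))
    return sorted(pairs)


centres = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]
abundances = [[5, 3, 0], [4, 3, 1], [0, 1, 6]]
for dist, sim in distance_decay(centres, abundances):
    print(round(dist, 2), round(sim, 2))
```

Plotting similarity against distance for all pairs gives the empirical distance–decay pattern, to which a curve can then be fitted.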
2.3 Sampling of community data
mobsim also provides functionality to simulate sampling processes by distributing sampling quadrats in a community. The data type provided by the sampling is the classical sites-by-species matrix (Legendre & Legendre, 2012). Users can choose the size and number of quadrats, as well as the spatial design.
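How individuals in space become a sites-by-species matrix can be shown with a minimal Python sketch of a random sampling design. It is a simplified stand-in for the package function: the name `sample_random_quadrats` is hypothetical, the landscape is assumed to be a square starting at the origin, and quadrats are allowed to overlap.

```python
import random


def sample_random_quadrats(points, species, n_quadrats, quadrat_area,
                           extent=1.0, rng=random):
    """Count individuals per species in randomly placed square quadrats.

    Returns the species names (columns), the lower-left corners of the
    quadrats, and the sites-by-species abundance matrix (one row per quadrat).
    """
    side = quadrat_area ** 0.5
    names = sorted(set(species))
    column = {sp: k for k, sp in enumerate(names)}
    corners, matrix = [], []
    for _ in range(n_quadrats):
        # Lower-left corner chosen so that the quadrat lies inside the landscape.
        x0 = rng.uniform(0.0, extent - side)
        y0 = rng.uniform(0.0, extent - side)
        row = [0] * len(names)
        for (x, y), sp in zip(points, species):
            if x0 <= x < x0 + side and y0 <= y < y0 + side:
                row[column[sp]] += 1
        corners.append((x0, y0))
        matrix.append(row)
    return names, corners, matrix


rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(500)]
spp = [rng.choice("ABCDE") for _ in range(500)]
names, corners, matrix = sample_random_quadrats(pts, spp, n_quadrats=4,
                                                quadrat_area=0.01, rng=rng)
```

Transect or lattice designs would differ only in how the quadrat corners are chosen.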
3 EXAMPLE APPLICATIONS
Here, we present two example applications of mobsim: (i) changes of biodiversity components and (ii) an assessment of species-richness estimators. We provide a third example on extinctions due to habitat loss, as well as the R code for all applications, in the online supporting information.
3.1 Changes of single biodiversity components
The biodiversity in a sampled area depends on three components that can vary independently: (1) the total number of individuals, (2) the SAD of the species pool and (3) the spatial distribution of individuals and species (Chase & Knight, 2013; McGill, 2011). Here, we show how the combination of rarefaction and accumulation curves can be used to disentangle changes in these three biodiversity components (Figure 2). The key point is that the shape of the rarefaction curve only depends on the underlying SAD, but not on the spatial distribution, while the shape of the accumulation curve depends on both the SAD and the spatial distribution (McGlinn et al., 2018). First, we randomly removed 50% of all individuals. This does not affect the underlying SAD, as indicated by overlapping rarefaction and accumulation curves that end at different numbers of individuals (Figure 2a,b). Second, we simulated communities with lower evenness by increasing the variation in species abundances. This resulted in changes in both the rarefaction and the accumulation curves (Figure 2c,d). Here, the difference (or ratio) between the curves changes with sampling effort, which indicates that effect sizes depend on scale and/or sampling effort (Chase & Knight, 2013). Despite having the same number of species in the pool, the simulated communities differ in species richness even for the maximum number of individuals, because the rarest species are not sampled into the local community. Figure S1 shows the same analysis with a fixed species richness of the simulated community, in which case the curves converge at the largest number of individuals. Third, we added intraspecific aggregation by using a Thomas process instead of a Poisson process, which does not affect the rarefaction curve, but leads to lower expected species richness in the spatial species–accumulation curve (Figure 2e,f).

3.2 Testing species-richness estimators
We are often interested in inferring total species richness in a region based on a limited number of samples from within that region, and a number of species-richness estimators have been developed to accomplish this (Chiu, Wang, Walther, & Chao, 2014; Colwell & Coddington, 1994). However, it remains an open question how well these estimators perform for different communities and for different sampling strategies (Colwell & Coddington, 1994; Reese, Wilson, & Flather, 2013). The simulation tools of mobsim are well suited to address this issue.
Here, we assess the performance of a bias-corrected version of the Chao1 estimator (Chiu et al., 2014), in the face of spatial aggregation and different sampling designs. We simulated a community with 1,000 species and 1,000,000 individuals and used the function sample_quadrats to sample from the community. We varied the proportion of total area sampled between 0.01% and 1% as well as the number of sampling quadrats (1–100) that jointly represent the total sampling effort. These combinations of sampling strategies were applied to communities with the same SAD, but with different intraspecific aggregations. We examined four scenarios for aggregation: (1) a random (Poisson) distribution; (2) several large clusters per species; (3) several small clusters per species; and (4) one large cluster per species. We used the function vegan::estimateR to calculate the species-richness estimator of Chiu et al. (2014).
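For readers unfamiliar with the estimator, the following Python sketch computes the widely used bias-corrected form of Chao1 from a vector of pooled species abundances. Two assumptions apply: the formula shown is the common textbook form, which may differ in detail from what vegan::estimateR computes, and pooling abundances over all quadrats (column sums of the sites-by-species matrix) is one plausible way to apply it, not necessarily the procedure used in this study.

```python
def chao1_bias_corrected(abund):
    """Bias-corrected Chao1 estimate of total species richness.

    S_est = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)), where f1 and f2 are the
    numbers of species observed exactly once and exactly twice.
    """
    counts = [a for a in abund if a > 0]
    s_obs = len(counts)
    f1 = sum(1 for a in counts if a == 1)   # singletons
    f2 = sum(1 for a in counts if a == 2)   # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical pooled sample: 7 observed species, 3 singletons, 2 doubletons.
pooled = [10, 5, 2, 2, 1, 1, 1]
print(chao1_bias_corrected(pooled))  # 7 + 3 * 2 / (2 * 3) = 8.0
```

Because the estimate depends only on the counts of rare species in the sample, any sampling design that changes how many singletons and doubletons are found will change the estimate.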
For the community with a spatially random (Poisson) distribution, the estimate of regional richness did not depend on whether a single large quadrat or several small quadrats were sampled (Figure 3a). However, the estimated richness and its uncertainty strongly varied with total sampling effort. The bias of the estimator decreased with increasing sampling effort, but the estimated and true values only converged at the highest effort. The variance of the estimator decreased drastically with sampling effort (Figure 3a).

For aggregated distributions, the spatial configuration of sampling mattered: a sampling design with several small quadrats was less biased than one with few large quadrats (Figure 3b–d). For high aggregation, species richness was strongly underestimated (Figure 3c,d). This is an important finding, because aggregated species distributions tend to be the rule in nature (McGill, 2010, 2011).
Our simulation results underline the recommendation by developers of species-richness estimators that the estimated values should be only interpreted as lower bounds (Chao, 1987; Chiu et al., 2014). Furthermore, our findings indicate that for aggregated species distributions, both sampling design and sampling effort have a large influence. Recently, Azaele et al. (2015) presented an extrapolation approach that considers spatial aggregation, which might help to solve the problems indicated here.
3.3 Further potential applications of mobsim
The capability of mobsim goes beyond the examples shown here. In order to understand the relationship between the fundamental biodiversity components and aggregated biodiversity indices and relationships, the example on species rarefaction and accumulation curves can be repeated with other descriptors, such as species–area and distance–decay relationships. Furthermore, the simulation and sampling tools of mobsim are well suited to study the scale dependence of species-abundance distributions.
While our second example only assessed the performance of a single species-richness estimator (Chao1), users can apply mobsim to assess the bias and variance of numerous biodiversity extrapolation methods, including univariate biodiversity estimators (Azaele et al., 2015; Chao, Gotelli, et al., 2014), as well as estimators of entire SADs (Chao, Hsieh, Chazdon, Colwell, & Gotelli, 2015).
One important application of mobsim is the simulation of biodiversity change scenarios and the subsequent analysis of the dependence of our detection of biodiversity changes on scale, sampling and biodiversity index (Chase & Knight, 2013). A simple example of such application is provided in the package vignette “Introduction to mobsim.” The current version of mobsim does not simulate spatial or temporal community dynamics explicitly. Nevertheless, simulated communities can be considered as different snapshots in space or in time and in this way, mobsim is useful for investigating spatio-temporal biodiversity changes in a simple and general way.
4 CONCLUSIONS
The combination of tools for simulation and analysis of biodiversity patterns provided in mobsim is well suited to foster understanding of the emergence and consequences of scale-dependent biodiversity changes. The package integrates key tools of community ecology so that ecologists can derive valid interpretations of biodiversity patterns and changes observed in real-world data. In mobsim simulations, users have to specify key community properties, including the numbers of individuals and species, the SAD and intraspecific aggregation, in order to predict aggregated biodiversity measures, for example the SAR, SAC or DDR. An alternative approach for the investigation of biodiversity patterns is the estimation of key community properties, including total abundance, SAD and spatial distribution, from aggregated metrics and relationships, such as the SAR or the DDR. While such an inverse approach will be technically and computationally much more challenging than the forward simulations of mobsim, the package can facilitate tackling such questions by efficiently searching through the parameter space of total abundance, SAD and spatial distribution with simulations.
ACKNOWLEDGEMENTS
We gratefully acknowledge the support of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig funded by the German Research Foundation (FZT 118). We thank Robert O'Hara, one anonymous associate editor and two anonymous reviewers for valuable comments, which helped to improve the article and the R package.
AUTHORS' CONTRIBUTIONS
F.M. and J.M.C. conceived the package concept and structure. F.M. implemented the first package version. K.G., X.X. and D.M. contributed code and supported the package revision. K.G. implemented the shiny online application. F.M. wrote the first manuscript draft and all authors critically revised the text and gave approval for publication.
DATA ACCESSIBILITY
This article does not use any empirical data. The package is available on CRAN (https://CRAN.R-project.org/package=mobsim) and on GitHub (https://doi.org/10.5281/zenodo.1170472). The GitHub repository also includes the code for all example applications presented in the article (in the folder “examples”).