adiv: An R package to analyse biodiversity in ecology
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13430
Abstract
- R is an open‐source programming environment for statistical computing and graphics structured by numerous contributed packages. The current packages used for biodiversity research focus on limited, particular aspects of biodiversity. Most packages focus on the number and abundance of species.
- I present an R package named adiv that provides additional methods to measure and analyse biodiversity. adiv contains approaches to quantify species‐based, trait‐based (functional) and phylogenetic diversity (a) within communities (α diversity) and (b) between communities (β diversity), and (c) to partition this diversity over space and time (α, β and γ levels of diversity). Partitioning approaches allow evaluating whether the levels of α and β diversity could have been obtained by chance. Moreover, groups of biological entities (e.g. species of the same clade or with similar biological characteristics) that drive each level of diversity (α, β and γ) can be identified via ordination analyses.
- Although the package focuses on interspecific diversity in its current state, the developed approaches can also be applied to analyse intraspecific diversity or, at another level, ecosystem diversity. More generally, the functions can be applied in any discipline interested in the concept of diversity, such as economics or linguistics. Indeed, all available approaches can be easily applied at other scales and to other disciplines provided that the data have the required format: a matrix of abundance or presence/absence data of some entities in some collections and information on the differences between the entities.
- adiv aims to complement existing R packages to provide scientists with a wide variety of diversity indices, as each index reflects a very specific facet of biodiversity. adiv will grow in the future to integrate as many validated approaches for biodiversity analysis as possible that are not yet available in R. As it includes both traditional and recent viewpoints on how biodiversity should be evaluated, adiv offers a promising platform where methods to analyse biodiversity can be developed and compared in terms of their statistical behaviour and biological relevance. Applying the most relevant tools for a given study aim will eventually improve research on human‐driven variations in biodiversity.
1 INTRODUCTION
The package named adiv (for Analysis of bioDIVersity) focuses on the measurement of biological diversity (or biodiversity, the variability in life from genes to ecosystems) and on the analysis of its organization in space and/or time. I developed adiv in the open‐source R environment for statistical computing and graphics (R Core Team, 2019). Although this environment is rich in contributed packages, most of the approximately 40 packages that focus on biodiversity (among the 15,300 current packages; R Core Team, 2019) actually concentrate on species diversity (see Table 1 for a glossary with the definitions of the expressions in italics). A few other packages (e.g. FD, Laliberté, Legendre, & Shipley, 2014; TPD, Carmona, 2018) focus on functional diversity, while others (e.g. PhyloMeasures, Tsirogiannis & Sandel, 2017) are dedicated to phylogenetic diversity. More generally, some packages, for example, entropart (Marcon & Herault, 2015), focus on species, phylogenetic and functional diversity and consider a limited number of diversity indices. Only a few packages focus on other aspects of biodiversity; for example, diveRsity (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013) evaluates genetic diversity in the context of population genetic analyses.
| Concept | Definitions used in this paper |
|---|---|
| Community | Set of species observed at a given time in a given place |
| Functional diversity | Trait‐based diversity applied to functional traits (see the definition of trait‐based diversity below). A functional trait is a trait that influences the way species (or organisms) respond to environmental conditions or the way they contribute to ecosystem properties |
| Phylogenetic diversity | Diversity in the positions of species (or of organisms, for abundance‐weighted diversity) on a phylogenetic tree. A community with many closely related species may have less phylogenetic diversity than a community with few distantly related species |
| Phylogenetic correspondence analysis | Extension, proposed by Pavoine (2016), of the well‐known ordination approach named correspondence analysis to include phylogenetic data |
| Phylogenetic signal | Positive correlation between the differences in species trait values and the distances between species on a phylogenetic tree |
| Species diversity | Number and abundance of species. Indices of species diversity increase with the number of species in a community and with the evenness in species abundances |
| Species evenness | Evenness in species abundances |
| Species richness | Number of species |
| Taxonomic diversity | Diversity in the taxonomic relationships between species. For example, a community with many species of the same genus may have less taxonomic diversity than a community with few species that belong to different orders |
| Trait‐based diversity | Diversity in trait values. A community with many organisms that share similar trait values may have lower trait‐based diversity than a community with few organisms characterized by distinct trait values |
adiv enriches the total library of biodiversity packages by providing both traditional statistical approaches to diversity (e.g. Hill, 1973) and recent approaches that measure trait‐based and phylogenetic diversity (e.g. Chao, Chiu, & Jost, 2010; Faith, 1992; Kondratyeva, Grandcolas, & Pavoine, 2019; Pavoine, Baguette, & Bonsall, 2010; Pavoine, Love, & Bonsall, 2009; Pavoine & Ricotta, 2019). In addition, all approaches can be easily applied to other aspects of diversity. Indeed, adiv relies on the following principle: biodiversity emerges from the differences between entities, regardless of the entities selected (e.g. species, individuals, genera or assemblages) and the criterion used to evaluate the differences between the selected entities (e.g. morphology, behaviour, evolution). The main functions of adiv permit the quantification of species‐based, trait‐based and phylogenetic diversity within communities (α diversity) and between communities (β diversity). The functions complement these measurements, which partition the diversity over all communities (γ diversity), with ordination methods that identify groups of biological entities (e.g. species of the same clade or with similar biological characteristics) that drive the levels of diversity (Figure 1). The package is available on CRAN (https://cran.r-project.org/web/packages/adiv/index.html). As an R package, it is open‐source to encourage transparency in science, as all code can be freely read and checked.

2 DATASETS
The functions in adiv typically work on a matrix of species presence–absence or abundance data in communities and on species trait data and/or phylogenetic (or taxonomic) data (Figure 1). Data that characterize communities can also be included, such as environmental, spatial or temporal data. As trait‐based and phylogenetic aspects of diversity are often compared in ecological studies, the adiv package implements a few functions to test if closely related species share similar traits (phylogenetic signal, functions rtestdecdiv, K, Kstar and Kw).
For illustrative purposes, adiv currently contains eight datasets. One of these datasets, named batcomm, is used below to illustrate some of the functions of adiv. batcomm is a list of two components: batcomm$ab contains the abundance of 34 bat species in four habitats (rainforests, cacao plantations, old fields and cornfields) in the Selva Lacandona of Chiapas, Mexico (data collected by Medellín, Equihua, & Amin, 2000); batcomm$tre contains a phylogenetic tree. The tree is ultrametric, meaning that the total branch length from any tip (species) to the root remains constant.
3 DIVERSITY INDICES
3.1 Species diversity
adiv contains two main functions for species diversity indices: speciesdiv, which includes widely used indices such as species richness and the Shannon (1948a,b) index, and divparam, which includes indices that have a parameter to control the importance given to rare versus abundant species in diversity measurements. As an illustration, using divparam, I applied Hill (1973) numbers to the bat dataset. Note that all scripts used in this paper are available in the adiv vignette (Pavoine, 2020), named ‘adiv Package User Guide’, that can be accessed from the R console using the following script: browseVignettes('adiv'). As a traditional diversity index, the Hill index increases with species richness and evenness in species abundances. Its parameter, denoted q, increases with the importance given to abundant species compared to rare species in diversity measurement (q = 0 means species richness, i.e. equal importance to all species). According to the Hill index, the rainforest dominates in terms of species diversity; however, variations among the diversity levels of the other three habitats indicate how the consideration of abundance data may influence our interpretation of the impact of environmental disturbance on biodiversity (Figure 2a).
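The behaviour of the Hill index with respect to q can be illustrated with a minimal from-scratch sketch. It is written in Python for self-containment and is not the code of divparam; the function name `hill_number` and the abundance vector are hypothetical and do not come from the batcomm dataset. The sketch uses the standard definition of Hill (1973) numbers, with the exponential of the Shannon entropy as the limit at q = 1.

```python
import math

def hill_number(abundances, q):
    """Hill number of order q for one community (raw or relative abundances)."""
    total = sum(abundances)
    # Keep only species that are present; absent species contribute nothing.
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical abundances of four species in one habitat (not the batcomm data).
community = [50, 30, 15, 5]

# Diversity profile: q = 0 is species richness; larger q gives more weight
# to abundant species.
profile = {q: hill_number(community, q) for q in (0, 0.5, 1, 2)}
```

For an uneven community such as this one, the profile decreases as q increases, and a perfectly even community keeps a Hill number equal to its species richness for every q.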

With species diversity indices, species are implicitly considered interchangeable because only the number and/or abundance of species is important, not their identity. An assemblage of three bird species, say, a blackbird, sparrow and pigeon, would be considered as rich as an assemblage with three species of different groups, say, a blackbird, domestic cat and common wall lizard. In contrast, by considering phylogenetic analysis rather than strictly presence‐ or abundance‐based analyses, a higher value of diversity could be attributed to the second assemblage than to the first as the species in the second are more phylogenetically distinct and have more divergent biological characteristics or traits.
3.2 From species diversity to phylogenetic diversity
In the adiv package, four main functions consider the phylogeny of species when measuring their diversity. Among them, evodiv and evodivparam consist of replacing species in traditional diversity indices (for evodiv) and traditional parametric diversity indices (for evodivparam) with branch units or ‘features’ on phylogenetic trees where species are the tips (Faith, 1992; Pavoine & Ricotta, 2019). It is assumed that the number of features supported by a given branch of a phylogenetic tree is equal to the length of that branch such that the richness of features in a community is Faith's widely used phylogenetic diversity index (Faith, 1992). The abundance of any feature on a given branch of the phylogenetic tree is measured as the summed abundance of all tips (species) descending from that branch. This approach of replacing species by features is simple and can be applied to all diversity indices developed so far. For example, phylogenetic data can be simply added to the analysis of the bat communities using the evodivparam function. The results, which are displayed in Figure 2b, show that when abundant species are given high weight (q > 2), all habitats reach similarly low levels of phylogenetic diversity, indicating that, within each habitat, abundant species tend to be closely related.
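The feature perspective can be sketched on a toy tree as follows. This is a from-scratch Python illustration under stated assumptions, not the code of evodiv or evodivparam: the tree, the species names and the function names are hypothetical, and the parametric index assumes that a branch of length L carries L features that all share the abundance of that branch.

```python
import math

# Toy ultrametric tree (hypothetical), stored as a list of branches.
# Each branch has a length and the set of tips (species) descending from it.
branches = [
    {"length": 1.0, "tips": {"sp1"}},
    {"length": 1.0, "tips": {"sp2"}},
    {"length": 2.0, "tips": {"sp3"}},
    {"length": 1.0, "tips": {"sp1", "sp2"}},  # internal branch above sp1 and sp2
]

def feature_abundances(branches, abundance):
    """Abundance of the features on each branch: summed abundance of its tips."""
    return [sum(abundance.get(t, 0) for t in b["tips"]) for b in branches]

def faith_pd(branches, abundance):
    """Feature richness: total length of the branches with a present tip."""
    ab = feature_abundances(branches, abundance)
    return sum(b["length"] for b, a in zip(branches, ab) if a > 0)

def feature_hill(branches, abundance, q):
    """Hill number applied to features instead of species (assumed weighting)."""
    ab = feature_abundances(branches, abundance)
    total = sum(b["length"] * a for b, a in zip(branches, ab))
    # (number of features on the branch, relative abundance of each feature)
    terms = [(b["length"], a / total) for b, a in zip(branches, ab) if a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L * p * math.log(p) for L, p in terms))
    return sum(L * p ** q for L, p in terms) ** (1.0 / (1.0 - q))

# Hypothetical community abundances.
community = {"sp1": 10, "sp2": 5, "sp3": 5}
pd_value = faith_pd(branches, community)
hill_q2 = feature_hill(branches, community, 2)
```

Under these assumptions, the feature-based Hill number at q = 0 equals Faith's phylogenetic diversity, which mirrors the relation between species richness and the species-based Hill number.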
3.3 Trait‐based diversity
The indices dedicated to phylogenetic diversity can be used with trait‐based data if a dendrogram is established, for example, by applying a clustering approach to a matrix of trait‐based differences between species. Three other functions in adiv allow the measurement of diversity using direct trait‐based differences or similarities between species: QE, qHdiv and Rentropy. QE implements Rao's (1982) quadratic entropy, which is the average abundance‐weighted trait‐based difference between any two species in a community. The more different the traits of any two individuals in a community are, the higher the quadratic entropy is. Rentropy is equivalent to QE but species' relative abundances are square‐root transformed before calculation. Both QE and Rentropy generalize well‐known species diversity indices to include trait‐based data: QE generalizes the Simpson (1949) index, which is a simple function of the Hill index if its parameter q = 2, and Rentropy is a direct generalization of the Hill index when q = 0.5. qHdiv allows, when required, intraspecific variation in biological trait values to be considered.
These (dis)similarity‐based indices can conversely be used to evaluate phylogenetic diversity if phylogenetic (dis)similarities are used instead of trait‐based (dis)similarities. For example, with the bat dataset, I calculated cophenetic distances (sum of branch lengths along the shortest path) between species on the phylogenetic tree and used them in functions QE and Rentropy (Figure 2c). With QE, all habitats had similar levels of phylogenetic diversity, which is in accordance with the results obtained above with function evodivparam if parameter q = 2 (Figure 2b). With Rentropy, the rainforest dominates in terms of phylogenetic diversity, with cacao plantations and old fields having intermediate values and cornfields having the lowest value, which is in accordance with the results of function evodivparam if q approaches 0.5 (Figure 2b).
4 DISSIMILARITY INDICES
The concepts of diversity and (dis)similarity are linked: the diversity of an assemblage is null if all its components are identical. The biodiversity of a region increases with the increase in the dissimilarities between species and also between communities. Indices of dissimilarity thus complement those of diversity in adiv.
4.1 From species‐to‐species to community‐to‐community dissimilarity indices
A few functions are dedicated to the calculation of dissimilarities or similarities between species using trait, taxonomic or phylogenetic data (e.g. CFprop, CFbinary, dsimFun, dsimTax and dsimTree). These functions lead to particular mathematical properties for the (dis)similarities between species. These properties, named ‘positive semidefinite’ for similarity matrices and ‘Euclidean’ for dissimilarity matrices, are exploited, for example, in the dsimcom function of adiv (see details in the adiv vignette named ‘adiv Package User Guide’). dsimcom implements the Pavoine and Ricotta (2014) index of similarity between two communities by comparing their species lists, the abundances of each species and the functional or phylogenetic similarities between the species. This function is restricted to matrices of similarities between species that are positive semidefinite. Other indices of the (dis)similarity between two communities can integrate any matrix of (dis)similarities between species without any restrictions on their mathematical properties apart from having non‐negative values and sometimes being bounded between 0 and 1. This is the case, for example, for the indices available in the dissABC and dissRicotta functions of adiv.
4.2 From compositional dissimilarity to tree‐based dissimilarity
adiv contains two functions dedicated to phylogeny‐based indices of the dissimilarity between two communities: evodiss and evodiss_family. Although these functions are dedicated to use with phylogenetic trees, they can be more generally applied to other tree‐shaped data, such as trait‐based dendrograms. These functions use the feature perspective described in Section 3.2, where a feature is a branch unit on a phylogenetic tree. From this perspective, traditional dissimilarity indices are not applied to species presence/absence or abundance data but to the presence/absence or abundance of each feature. This perspective grants access to a family of dissimilarity indices, and the evodiss and evodiss_family functions contain 30 key indices from this family, including six parametric indices where the importance given to rare versus abundant features can be controlled. For example, applying function evodiss with the chord distance to the bat communities reveals that the two habitats with the most divergent phylogenetic compositions are the old field and cornfield habitats (phylogenetic dissimilarity = 0.30), while the most similar habitats are the rainforest and cacao plantation habitats (0.12). Given the large number of species‐based dissimilarity indices developed so far, more indices from this family could be easily added in the future depending on the needs identified in the ecological and conservation literature.
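To make the feature perspective on dissimilarity concrete, here is a hedged Python sketch of a chord distance computed on feature abundances rather than on species abundances. It is not the code of evodiss: the toy tree, the communities and the function names are hypothetical, and the sketch assumes that each branch of length L counts as L features sharing the same abundance, so that each squared difference is weighted by the branch length.

```python
import math

# Toy tree (hypothetical): each branch has a length and its descending tips.
branches = [
    {"length": 1.0, "tips": {"sp1"}},
    {"length": 1.0, "tips": {"sp2"}},
    {"length": 2.0, "tips": {"sp3"}},
    {"length": 1.0, "tips": {"sp1", "sp2"}},
]

def feature_abundances(branches, abundance):
    """Abundance of the features on each branch: summed abundance of its tips."""
    return [sum(abundance.get(t, 0) for t in b["tips"]) for b in branches]

def feature_chord(branches, com1, com2):
    """Chord distance between two communities described by feature abundances."""
    lengths = [b["length"] for b in branches]
    a1 = feature_abundances(branches, com1)
    a2 = feature_abundances(branches, com2)
    # Euclidean norms of the feature vectors, each branch counted L times.
    n1 = math.sqrt(sum(L * x * x for L, x in zip(lengths, a1)))
    n2 = math.sqrt(sum(L * y * y for L, y in zip(lengths, a2)))
    return math.sqrt(
        sum(L * (x / n1 - y / n2) ** 2 for L, x, y in zip(lengths, a1, a2))
    )

# Hypothetical communities.
close_pair = feature_chord(branches, {"sp1": 4, "sp2": 1}, {"sp1": 1, "sp2": 4})
distant_pair = feature_chord(branches, {"sp1": 1}, {"sp3": 1})
```

In this sketch, two communities that share no branch reach the maximum value of the chord distance, the square root of 2, whereas communities composed of closely related species remain close even when their species abundances differ.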
5 APPORTIONMENT AND ORDINATION OF DIVERSITY
5.1 Alpha, beta and gamma diversity across space
When more than two communities are compared, the dissimilarity among them, named β diversity, complements the diversity within each community, named α diversity. The diversity of the whole set of communities (γ diversity) emerges from the combination of α and β diversity. adiv contains functions to partition species‐based, trait‐based or phylogenetic diversity in a nested hierarchy (with α, β and γ levels; abgdivparam, abgevodivparam, eqRao, eqRS, eqRSintra and wapqe functions). For example, abgevodivparam implements partitioning of Hill numbers. Applied to the bat dataset, this function shows that phylogenetic differences between the habitats mostly concern rare species, as phylogenetic β diversity decreases with parameter q (Figure 2d).
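The idea of partitioning γ diversity into α and β components can be sketched with a common multiplicative scheme for Hill numbers, in which β = γ / α and all communities receive equal weights. This Python example is an assumption made for illustration and is not necessarily the scheme implemented in abgdivparam or abgevodivparam; the function names and the data are hypothetical.

```python
import math

def hill_from_props(p, q):
    """Hill number of order q from a vector of relative abundances."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def partition_hill(communities, q):
    """Return (alpha, beta, gamma) with equal community weights, beta = gamma / alpha."""
    rel = [[a / sum(c) for a in c] for c in communities]
    n_com = len(rel)
    n_sp = len(rel[0])
    # Gamma: Hill number of the pooled (averaged) relative abundances.
    pooled = [sum(r[i] for r in rel) / n_com for i in range(n_sp)]
    gamma = hill_from_props(pooled, q)
    if abs(q - 1.0) < 1e-12:
        # Alpha at q = 1: exponential of the mean Shannon entropy.
        mean_h = sum(math.log(hill_from_props(r, 1)) for r in rel) / n_com
        alpha = math.exp(mean_h)
    else:
        mean_sum = sum(x ** q for r in rel for x in r if x > 0) / n_com
        alpha = mean_sum ** (1.0 / (1.0 - q))
    return alpha, gamma / alpha, gamma

# Hypothetical data: rows are communities, columns are species.
disjoint = [[1, 1, 0, 0], [0, 0, 1, 1]]
identical = [[3, 1, 0, 0], [3, 1, 0, 0]]
alpha_d, beta_d, gamma_d = partition_hill(disjoint, 2)
alpha_i, beta_i, gamma_i = partition_hill(identical, 2)
```

With this scheme, β equals 1 when all communities have the same composition and equals the number of communities when they share no species, so it can be read as an effective number of distinct communities.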
Some partitioning functions are associated with simple permutation tests to evaluate whether each level of diversity could have been obtained by chance. For example, the application of one of these tests (function rtestEqRS, permutation test) to the bat dataset shows that the differences in the phylogenetic compositions of the four habitats are not significant when species abundances are considered (statistic of the test, β diversity standardized between 0 and 1 = 2.42 × 10⁻², p‐value = 0.071), although the differences are significant when only presence/absence data are evaluated (β = 2.37 × 10⁻², p‐value = 0.049; nominal α = 0.050). This result confirms that the most abundant species in each habitat tend to be phylogenetically similar, while phylogenetic differences occur for rare species.
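The logic of such a permutation test can be sketched in Python as follows. This is a generic illustration, not the procedure of rtestEqRS: the statistic (an additive β component based on the Gini-Simpson index), the permutation scheme (individuals shuffled among communities while community sizes are kept) and all names and data are assumptions made for the example.

```python
import random

def gini_simpson(species):
    """1 - sum(p_i^2) for a list of species labels, one label per individual."""
    n = len(species)
    counts = {}
    for s in species:
        counts[s] = counts.get(s, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def beta_simpson(species, sites):
    """Gamma diversity minus the size-weighted mean alpha diversity."""
    groups = {}
    for sp, st in zip(species, sites):
        groups.setdefault(st, []).append(sp)
    n = len(species)
    alpha = sum(len(g) / n * gini_simpson(g) for g in groups.values())
    return gini_simpson(species) - alpha

def permutation_test(species, sites, nrep=199, seed=1):
    """Return the observed beta and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = beta_simpson(species, sites)
    shuffled = list(sites)
    as_extreme = 0
    for _ in range(nrep):
        # Reassign individuals to communities at random, keeping community sizes.
        rng.shuffle(shuffled)
        if beta_simpson(species, shuffled) >= observed:
            as_extreme += 1
    # The observed value is counted among the possible outcomes.
    return observed, (as_extreme + 1) / (nrep + 1)

# Hypothetical data: 40 individuals, two species, two fully distinct communities.
species = ["a"] * 20 + ["b"] * 20
sites = ["x"] * 20 + ["y"] * 20
observed, p_value = permutation_test(species, sites)
```

Because the observed value is included among the possible outcomes, the smallest attainable p-value with 199 permutations is 1/200, which is why the number of permutations bounds the precision of the test.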
adiv also implements a range of ordination analyses to visualize species and communities as points in a space that reveals which species contribute to the differences between communities according to their traits or taxonomic or phylogenetic positions. For example, the application of phylogenetic correspondence analysis (function evoCA) to the bat dataset with presence/absence data (Figure 3) shows that cornfields were characterized by the absence of many species and many clades observed in other habitats and the presence of two closely related species: the long‐tongued bats Hylonycteris underwoodi and Lichonycteris obscura (Figure 3a–c). However, the representation of the phylogenetic tree on the map of the phylogenetic correspondence analysis highlights that the phylogenetic differences between habitats are small, although significant (see entangled phylogenetic branches on Figure 3d).

5.2 Temporal variations and crossed partitioning
Although they have thus far been mostly used in the context of spatial hierarchy, all partitioning and ordination approaches cited above could be explored to analyse the temporal variations in species, taxonomic, trait‐based and phylogenetic diversity. Additionally, crossed double principal coordinate analysis (crossed‐DPCoA, crossdpcoa_maineffect, crossdpcoa_version1 and crossdpcoa_version2 functions) allows for the partitioning of species‐based, taxonomic, trait‐based or phylogenetic diversity between two crossed factors according to the methods in Pavoine, Blondel, Dufour, Gasc, and Bonsall (2013). For example, 10 regions could each be followed for 5 years, resulting in space (regions) and time (years) being two crossed factors affecting diversity. Crossed‐DPCoA allows the effects of space to be disentangled from the effects of time with regard to variations in diversity level. adiv also allows studying spatial or temporal variations in trait‐based diversity in a phylogenetic context thanks to the decdiv function. decdiv implements the approach by Pavoine et al. (2010) to partition the α, β and γ trait‐based diversity across the nodes of a phylogenetic tree (see also the rtestdecdiv function for associated permutation tests).
6 CONCLUSIONS AND PERSPECTIVES
- Connections with other packages on diversity. I have written the adiv package to complement existing packages on diversity. Some functions contained in other packages (e.g. TPD, Carmona, 2018) are thus not available in adiv. Future developments could include the importation of functions from available packages to ease their calculation with the data format used by adiv.
- Additions of new methodologies to measure diversity and partition it across spatial and temporal units. Each current R package that tackles the concept of biodiversity contains only a few of the myriad diversity indices developed so far in the literature. The consequence is that the indices are being used in scientific studies based on their accessibility in popular packages rather than based on their true scientific relevance and interest. Indeed, having these indices in a package renders them more accessible for researchers for whom programming is a constraint and a challenge. adiv aims to complement these packages to offer ecologists a wider variety of diversity indices. In the interest of open science in biodiversity research, researchers around the globe and from any discipline are invited to suggest critical methodologies for analysing biodiversity that could be implemented in adiv.
- Originality, uniqueness and redundancy. adiv also complements current packages by offering ways to evaluate how trait and phylogenetic information impacts evaluations of biodiversity. In particular, adiv contains functions (uniqueness and treeUniqueness) to quantify how redundant or unique a community is compared to a scenario where all species would be maximally dissimilar (e.g. having the most distinct values possible for biological traits). It also contains functions (e.g. distinctDis, distinctTree, distinctTopo and distinctUltra) to identify species that are original in a community because they have unique states of biological traits or no close relatives in the phylogeny. The presence of original species increases the diversity of the community. Further versions of adiv will include more originality indices, especially those that account for species abundances and species extinction risks (see Kondratyeva et al., 2019 and references therein).
Having all these methodologies in the same package will ease the diffusion of statistical methods among all researchers and data analysts in environmental organizations interested in the analysis of biodiversity. It will also facilitate comparison among methods in terms of their statistical behaviour and biological relevance. All these developments in adiv will contribute, over the years, to open science and advances in (bio)diversity research.
ACKNOWLEDGEMENTS
The author thanks the reviewers for their useful comments on this article. She also thanks Stéphane Dray for his help in transferring certain functions from the now deprecated class ‘phylog’ of ade4 to the class ‘phylo’ developed in the ape package and used in the adephylo package, and Giovanni Bacaro for co‐writing the dissRicotta and rare_Rao functions and allowing their inclusion in adiv. The help files of the corresponding functions mention their contributions.
Open Research
DATA AVAILABILITY STATEMENT
All data used in this paper are available online in package adiv: https://cran.r-project.org/web/packages/adiv/index.html.




