COMADRE: a global data base of animal demography

Summary: The open‐data scientific philosophy is being widely adopted and is promoting considerable progress in ecology and evolution. Open‐data global data bases now exist on animal migration, species distribution, conservation status, etc. However, a gap exists for data on population dynamics spanning the rich diversity of the animal kingdom world‐wide. This information is fundamental to our understanding of the conditions that have shaped variation in animal life histories and their relationships with the environment, as well as the determinants of invasion and extinction. Matrix population models (MPMs) are among the demographic tools most widely used by animal ecologists. MPMs project population dynamics based on the reproduction, survival and development of individuals in a population over their life cycle. The outputs from MPMs have direct biological interpretations, facilitating comparisons among animal species as different as Caenorhabditis elegans, Loxodonta africana and Homo sapiens. Thousands of animal demographic records exist in the form of MPMs, but they are dispersed throughout the literature, rendering comparative analyses difficult. Here, we introduce the COMADRE Animal Matrix Database, an open‐data online repository, which in its version 1.0.0 contains data on 345 species world‐wide, from 402 studies with a total of 1625 population projection matrices. COMADRE also contains ancillary information (e.g. ecoregion, taxonomy, biogeography, etc.) that facilitates interpretation of the numerous demographic metrics that can be derived from its MPMs. We provide R code for some of these examples. Synthesis: We introduce the COMADRE Animal Matrix Database, a resource for animal demography. Its open‐data nature, together with its ancillary information, will facilitate comparative analysis, as will the growing availability of databases focusing on other aspects of the rich animal diversity, and tools to query and combine them. 
Through future frequent updates of COMADRE, and its integration with other online resources, we encourage animal ecologists to tackle global ecological and evolutionary questions with unprecedented sample size.


Online Figure S2. Classification of Matrix Population Models (MPMs) in COMADRE according to: A. the type of matrix (see MatrixComposite in Table 1). B. the environmental conditions of the studied population (Captivity). C. the general type of treatment of the matrix model under consideration (MatrixTreatment). D. which sex was modelled (StudySex). E. whether the matrix A (equation 2) was split into submatrices U, F and C (MatrixSplit).

Other sources (Caswell, 2001; Morris & Doak, 2002) and scripts (Cochran & Ellner, 1992; Stubben, 2007; Stott et al., 2012; Metcalf et al., 2013) can be consulted for further details. The format of the database will likely evolve with time, and some of the functions below may become obsolete. Nevertheless, we maintain a set of useful scripts and functions here: https://github.com/jonesor/compadreDB/.

S4.1. Citation checking
First, set the working directory to where the COMADRE R data object has been saved and load the data. The rcrossref package has a convenient function, cr_search_free, which conducts a free-text search of the CrossRef database. To use it, one needs to provide some query text, so in this case we can simply create a text string by concatenating the authors, journal and year of publication from COMADRE. For example, suppose we want the full reference and DOI for the matrices for the koala, Phascolarctos cinereus. First, identify the pertinent rows in the metadata:
id <- which(comadre$metadata$SpeciesAccepted == "Phascolarctos_cinereus")
Then use these row indices to obtain the source information (authors, journal and year of publication) for the 5 matrices:
temp <- comadre$metadata[id, c("Authors", "Journal", "YearPublication")]
Now paste this information together to form a single search string for each matrix. We can optionally ask R to return only the unique set of values.
Armed with the DOI, it is easy to obtain the full title, author list etc. from CrossRef in a range of formats using the function cr_cn. This function takes the raw DOI, without the http://dx.doi.org/ prefix, so the prefix must first be stripped from the query result using gsub.
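The string-handling steps described above can be sketched as follows. This is a minimal illustration on mock data, not the original appendix code: the data frame values and the DOI are invented placeholders, and no call to CrossRef is made.

```r
# Mock stand-in for comadre$metadata[id, c("Authors", "Journal", "YearPublication")].
# All values are invented for illustration only.
temp <- data.frame(
  Authors = c("Smith; Jones", "Smith; Jones", "Lee"),
  Journal = c("J Anim Ecol", "J Anim Ecol", "Ecology"),
  YearPublication = c(2001, 2001, 2010),
  stringsAsFactors = FALSE
)

# One free-text query string per matrix (i.e. per metadata row)
queries <- paste(temp$Authors, temp$Journal, temp$YearPublication)

# Optionally keep each distinct source only once
unique_queries <- unique(queries)

# Strip the resolver prefix from a DOI URL (placeholder DOI, not a real record).
# fixed = TRUE treats the prefix as literal text rather than a regular expression.
doi_url <- "http://dx.doi.org/10.0000/example.doi"
raw_doi <- gsub("http://dx.doi.org/", "", doi_url, fixed = TRUE)
```

The vector unique_queries is what one would pass to the free-text search, and raw_doi is in the form expected by cr_cn.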

library(taxize)
Next, make a new vector called SpeciesBinomial by concatenating the accepted genus (GenusAccepted) and accepted species epithet (SpeciesEpithetAccepted). This is necessary, rather than simply using SpeciesAccepted, because SpeciesAccepted retains the infra-specific information, which is not used by the following code:
comadre$metadata$SpeciesBinomial <- paste(comadre$metadata$GenusAccepted, comadre$metadata$SpeciesEpithetAccepted)
Some species do not have an epithet (e.g. Tribolium sp.), and for these the epithet is listed as NA. Therefore, to search the Catalogue of Life effectively, the NA needs to be removed using gsub:
comadre$metadata$SpeciesBinomial <- gsub("NA", "", comadre$metadata$SpeciesBinomial)
Because species appear in the database numerous times, it is advisable for efficiency reasons to make a unique subset of the data:
temp <- unique(comadre$metadata[, c("SpeciesBinomial", "GenusAccepted", "Family", "Order", "Class", "Phylum", "Kingdom")])
This dataset is still quite large (334 rows), so here we obtain the information for the first 5 rows only:
temp <- temp[1:5,]
The taxonomy is retrieved using the classification function, which queries the Catalogue of Life once for each entry. Note that whenever there is uncertainty as to which species is intended, the software prompts the user to select a species from a list.
x <- classification(temp$SpeciesBinomial, db = 'col')
The taxonomic Order of the species of interest can then be extracted from x, and one could repeat this for each part of the taxonomy.
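The binomial construction and the extraction of a single rank can be sketched offline as below. This is an illustration under stated assumptions rather than the original code: the metadata rows are mock values, the object x imitates the list of name/rank tables that classification is expected to return, and get_rank is a hypothetical helper. The anchored pattern " NA$" is a variant of the gsub call above that also avoids leaving a trailing space.

```r
# Mock metadata rows (invented for illustration)
meta <- data.frame(
  GenusAccepted = c("Tribolium", "Ursus", "Ursus"),
  SpeciesEpithetAccepted = c(NA, "americanus", "americanus"),
  stringsAsFactors = FALSE
)

# paste() turns a missing epithet into the literal text "NA"
binomial <- paste(meta$GenusAccepted, meta$SpeciesEpithetAccepted)

# Remove a trailing " NA" only, so the genus name is left untouched
binomial <- gsub(" NA$", "", binomial)
binomial_unique <- unique(binomial)

# Mock of a classification result: one table of names and ranks per species.
# The column names "name" and "rank" are an assumption about the returned structure.
x <- list(
  "Ursus americanus" = data.frame(
    name = c("Animalia", "Chordata", "Mammalia", "Carnivora",
             "Ursidae", "Ursus", "Ursus americanus"),
    rank = c("kingdom", "phylum", "class", "order",
             "family", "genus", "species"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: pull the name recorded for one rank from one table
get_rank <- function(tab, rank) tab$name[tolower(tab$rank) == rank]

orders <- sapply(x, get_rank, rank = "order")
families <- sapply(x, get_rank, rank = "family")
```

Changing the rank argument is all that is needed to repeat the extraction for another level of the taxonomy.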

S4.3. Plotting a life cycle from a matrix population model
This example plots a life cycle diagram with the stages and transitions of a given matrix chosen from the comadre database. It will call the R function plotLifeCycle from the COMADRE github repository. This works well with matrices of relatively low dimensionality (≤ 7), and where not many transitions are depicted. The function is based on the library DiagrammeR, so this needs to be called first:

library(DiagrammeR)
Let us first consider plotting the lifecycle for one of the species containing the word "lion" in the common name used by the author(s) in the original source used in the COMADRE database. As before, first load comadre. Then find the species with the word "lion" in their common name by searching the common names recorded in the metadata. To plot the lifecycle of the chosen species, source the function plotLifeCycle from the GitHub repository of the database using the source_url function from the devtools library.
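The common-name search can be sketched with base R's grep, as below. The metadata here is a mock data frame, and the column name CommonName is an assumption about the metadata layout; the call to plotLifeCycle itself is not shown because its arguments are not given in this text.

```r
# Mock metadata (invented rows; "CommonName" is an assumed column name)
meta <- data.frame(
  SpeciesAccepted = c("Panthera_leo", "Zalophus_californianus", "Ursus_americanus"),
  CommonName = c("African lion", "California sea lion", "American black bear"),
  stringsAsFactors = FALSE
)

# Row indices whose common name contains "lion", ignoring case
hits <- grep("lion", meta$CommonName, ignore.case = TRUE)

# Inspect the candidates and choose one to plot
candidates <- meta[hits, ]
```

Because the search is a plain substring match, it returns both the lion and the sea lion, so the user still has to pick the row of interest.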

S4.4. Simple demographic output for a subset of populations
This example produces some basic output, such as the population growth rate (λ) and damping ratio (Caswell 2001), for a subset of species and populations given some selection criteria.
First we can subset the database to the data of interest: only mean matrices for bony fish from studies of three years duration or longer, and with a matrix dimension of three or greater.
tempMetadata <- subset(comadre$metadata, [...])
The row names from the subsetted dataframe can now be used to subset the entire comadre database using the function subsetDB, which is available as part of this supplementary information (see Appendix S4.7 below).
id <- as.numeric(rownames(tempMetadata))
x <- subsetDB(comadre, id)
The object x is now a subsetted version of the comadre database object that contains only the matrices that match the search criteria.
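To make the idea of row-based subsetting concrete, here is a minimal sketch on a toy object. It is not the authors' subsetDB: subset_db_sketch is a hypothetical stand-in, and the toy object assumes only two components, a metadata data frame and a parallel list mat, with all other components passed through unchanged.

```r
# Hypothetical stand-in for subsetDB: keep the chosen rows of the metadata
# and the matching elements of the matrix list; other components are copied as-is.
subset_db_sketch <- function(db, rows) {
  out <- db
  out$metadata <- db$metadata[rows, , drop = FALSE]
  out$mat <- db$mat[rows]
  out
}

# Toy database with the same parallel layout (values invented for illustration)
toy_db <- list(
  metadata = data.frame(
    SpeciesAccepted = c("a", "b", "c"),
    MatrixDimension = c(2, 3, 4),
    stringsAsFactors = FALSE
  ),
  mat = list(diag(2), diag(3), diag(4))
)

# Same pattern as in the text: subset the metadata, then reuse its row names
tempMetadata <- subset(toy_db$metadata, MatrixDimension >= 3)
id <- as.numeric(rownames(tempMetadata))
x <- subset_db_sketch(toy_db, id)
```

The approach relies on the metadata rows and the matrix list being in the same order, which is why the row names of the subsetted metadata can serve as indices.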
These matrices can now be analyzed by applying functions in a loop, or by using lapply. For example, to calculate population growth rate and damping ratio for the subset of matrices, we can first create an empty data.frame to accommodate the output:
output <- data.frame(lambdas = rep(NA, length(x$mat)), damps = rep(NA, length(x$mat)))
Then use the functions in the popbio package to derive the demographic output (you may need to install the package first). Now one can plot the population growth rates and damping ratios derived from these matrices. In this plot, the vertical, dotted red line indicates population growth rate λ = 1 (or log(λ) = 0):
par(mfrow = c(1,2))
hist(log(output$lambdas), xlab = "Log population growth rate", col = "gold", main = "")
abline(v = 0, col = "red", lwd = 4, lty = 3)
hist(output$damps, xlab = "Damping ratio", col = "brown", main = "")
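The step that fills output before plotting can be sketched as below. To keep the example self-contained it uses base R's eigen rather than popbio (whose lambda and damping.ratio functions serve the same purpose), and it runs on two invented matrices. Storing each model under x$mat[[i]]$matA is an assumption about the layout of the data object.

```r
# Mock subset with two invented 2 x 2 matrices (matrix() fills column by column)
x <- list(mat = list(
  list(matA = matrix(c(0, 0.5, 2, 0.5), nrow = 2)),
  list(matA = matrix(c(0, 0.3, 1, 0.6), nrow = 2))
))

output <- data.frame(lambdas = rep(NA, length(x$mat)),
                     damps = rep(NA, length(x$mat)))

for (i in seq_along(x$mat)) {
  A <- x$mat[[i]]$matA
  # Eigenvalue moduli, largest first
  mods <- sort(Mod(eigen(A, only.values = TRUE)$values), decreasing = TRUE)
  output$lambdas[i] <- mods[1]           # population growth rate
  output$damps[i] <- mods[1] / mods[2]   # damping ratio: dominant / subdominant modulus
}
```

The same loop body could be wrapped in a function and applied with lapply; the loop form is used here only because it mirrors the pre-allocated output data frame.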

S4.5. Geographic distribution of studied populations
This example plots on a world map the viability (population growth rate λ > 1, λ = 1, λ < 1) of a subset of studied populations given some selection criteria, colour-coding the location of each population according to its value of λ.
First, subset mean matrices for all Carnivora in the wild in the Northern hemisphere, with no issues for survival (no stage-specific survival >1), for which matrices have been split into A = U + F + C, and for which reproduction was explicitly modeled.
tempMetadata <- subset(comadre$metadata, [...])
Now, use the row names from the subsetted dataframe to subset the matrices. The function subsetDB used here is described below (see Appendix S4.7).
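The colour-coding step can be sketched as follows. The λ values, the tolerance tol used to decide when λ counts as equal to 1, and the colour choices are all illustrative assumptions; the map itself is not drawn here.

```r
# Invented growth rates for three populations
lambdas <- c(0.92, 1.00, 1.08)

# Hypothetical tolerance around 1 within which a population is treated as stable
tol <- 0.001

# Classify each population by its growth rate
status <- cut(lambdas,
              breaks = c(-Inf, 1 - tol, 1 + tol, Inf),
              labels = c("declining", "stable", "growing"))

# Look up one colour per population (colour choices are arbitrary)
palette_by_status <- c(declining = "red", stable = "grey40", growing = "blue")
cols <- unname(palette_by_status[as.character(status)])
```

The resulting cols vector can be supplied to the col argument of points once the population coordinates have been added to a base map.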

S4.6. Ternary plots
Here we illustrate how to produce a ternary plot à la Silvertown et al. (1993), with a life history trait such as population growth rate (λ), mean life expectancy (ηe), or reactivity (||Â||1) as the "fourth" dimension. We will use Caswell (2001) [...]
Use the row names from the subsetted dataframe to subset the matrices.
keep <- as.numeric(rownames(tempMetadata))
Define the object containing MPMs in the same order that their metadata appears in tempMetadata.

tempMat <-comadre$mat[keep]
These MPMs can now be analyzed by applying functions in a loop, or by using lapply.
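As one possible illustration of the quantities placed on the three axes of such a plot, the sketch below computes elasticities with base R and sums them into stasis, growth and fecundity components. This is an assumption-laden example rather than the original appendix code: the matrices matU and matF are invented, elasticity_matrix is a hypothetical helper, and every off-diagonal survival transition is counted as growth for simplicity.

```r
# Hypothetical helper: elasticity of lambda to each element of A,
# e_ij = (a_ij / lambda) * v_i * w_j / <v, w>
elasticity_matrix <- function(A) {
  right <- eigen(A)
  k <- which.max(Re(right$values))
  lambda <- Re(right$values[k])
  w <- Re(right$vectors[, k])            # stable stage distribution (unscaled)
  left <- eigen(t(A))
  v <- Re(left$vectors[, which.max(Re(left$values))])  # reproductive values (unscaled)
  sens <- outer(v, w) / sum(v * w)       # sensitivities
  A * sens / lambda
}

# Invented three-stage model split into survival (matU) and fecundity (matF)
matU <- matrix(c(0.1, 0.3, 0,
                 0,   0.4, 0.3,
                 0,   0,   0.8), nrow = 3)   # filled column by column
matF <- matrix(0, 3, 3)
matF[1, 2] <- 1.5
matF[1, 3] <- 3
matA <- matU + matF

E <- elasticity_matrix(matA)

# Share of each element's elasticity attributable to survival or fecundity
E_U <- E * ifelse(matA > 0, matU / matA, 0)
E_F <- E * ifelse(matA > 0, matF / matA, 0)

stasis <- sum(diag(E_U))
growth <- sum(E_U) - stasis
fecundity <- sum(E_F)
```

The three components sum to one, which is what allows each matrix to be placed as a single point in a ternary diagram.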

S4.7. Advanced subsetting and refined searches
The subsetDB function is a helper function that subsets the entire comadre database quickly and conveniently. It takes two arguments: db and sub.

Below we provide the full list of citations used in the information compiled in the first release of COMADRE (version 1.0.0). Users of these materials are strongly encouraged to credit the work of the specific studies by citing the publications whose information they may use.
The name "SpeciesAuthor" corresponds to the exact taxonomic name used by the author in the publication, as detailed in Table 1, with a sequential numerical suffix if more than one study exists for the same species (e.g. Ursus_americanus, Ursus_americanus_2). NA in citation refers to a secondary citation (see Table 1 in manuscript).

-Analyses: Carried out analyses presented for summary statistics in the manuscript.
-Wrote paper: Wrote the first full draft of the manuscript, including tables, figures and references, and integrated subsequent comments by coauthors.
-Wrote section: Wrote one section of the manuscript.
-Edited paper: Provided significant comments to the paper, as described by the authorship standards of the ICMJE.
-Compiled Supporting Information Appendixes: Organized and wrote the first full draft of the SOM, and integrated subsequent comments by coauthors.