Volume 6, Issue 3
Application

TR8: an R package for easily retrieving plant species traits

Gionata Bocci

Corresponding Author

Institute of Life Sciences, Scuola Superiore Sant'Anna, Via Santa Cecilia, 3, 56127 Pisa, Italy

Correspondence author. E‐mail: g.bocci@sssup.it
First published: 12 December 2014

Summary

  1. A large amount of data on plant functional traits is available to researchers, but it is scattered over various data bases; searching for and downloading trait values for a specific list of plant species of interest can be a time‐consuming and error‐prone activity if carried out manually.
  2. The TR8 package for R was built to provide plant scientists with a simple tool for retrieving plant functional traits from freely accessible online traitbases.

Introduction

Functional traits have been increasingly used by plant ecologists as it has been suggested that they could serve as a valid tool for reaching a broad generalization in understanding community ecology dynamics (McGill et al. 2006).

General frameworks for assessing functional diversity in ecosystems have already been presented, especially for plant communities (e.g. Lavorel & Garnier 2002). Researchers have also been trying to unravel functional dependencies among different trophic levels [e.g. trying to relate plant community dynamics to arbuscular mycorrhizal fungi (AMF) (Moora 2014), to invertebrate herbivores (Moretti et al. 2013) or trying to address various ecosystem processes at once (Bello et al. 2010)].

Plant functional traits have been used by ecologists to answer a wide range of research questions, for example to evaluate the response of plant communities to global change (e.g. Suding et al. 2008; Gornish & Prather 2014), to predict or analyse the susceptibility of plant communities to the invasion of alien species (e.g. Tecco et al. 2013) and to evaluate the efficacy of restoration policies (Cordlandwehr et al. 2013; Fischer, von der Lippe & Kowarik 2013). As a consequence of this broad interest, ‘research devoted to plant traits has generated large amounts of data, leading to the imperious need of devising systems allowing potential users to access and value these data’ (Garnier & Navas 2011).

Even though some authors have suggested that, for small‐scale experiments, traits should be measured in situ (Cordlandwehr et al. 2013), for large‐scale analyses the use of existing traitbases is essential; thus, various well‐known data bases containing plant functional traits have been built and are extensively used by plant scientists. Some of these data bases can be freely accessed by users [e.g. BiolFlor (Klotz, Kühn & Durka 2002), LEDA Traitbase (Kleyer et al. 2008), the Ecological Flora of the British Isles (Fitter & Peat 1994)], while for others a preliminary registration is needed. In this study, the TR8 package for R (R Development Core Team 2014) is presented: the package aims at helping plant scientists to retrieve plant functional traits programmatically.

Package description

The TR8 package was created to provide users with a simple tool for downloading plant traits from six data bases which are available on the internet and for which the users' registration is not mandatory. The queried data bases are the following:
  • BiolFlor (Klotz, Kühn & Durka 2002)
  • LEDA Traitbase (Kleyer et al. 2008)
  • ECOFLORA: Ecological Flora of the British Isles (Fitter & Peat 1994)
  • Ellenberg values for the Italian Flora (Pignatti, Menegoni & Pietrosanti 2005)
  • Flowering periods for the Italian Flora (Pignatti, Menegoni & Pietrosanti (2005), data retrieved from http://luirig.altervista.org/)
  • Mycorrhizal intensity data base (Akhmetzhanova et al. 2012)
  • MycoFlor (Hempel et al. 2013); MycoFlor is included in the development version 0.9.12.

The researchers in charge of the above‐listed traitbases were first contacted and asked whether they would agree to have their data sets queried programmatically by an R package built for that purpose. After obtaining their authorization (for BiolFlor it was agreed that a subset of all the available traits would be included, see Table 1 for a list of all the traits available through the TR8 package), several R scripts were built, tested and later collected in the TR8 package.

Table 1. List of traits available for download with the TR8 package: for a more detailed description of each trait please refer to the original sources (Ell = Ellenberg values, Flow = Flowering, Myco = Mycorrhizal)
LEDA                               | Ecoflora                 | BiolFlor       | Pignatti           | Akhmetzhanova et al. | MycoFlor
-----------------------------------|--------------------------|----------------|--------------------|----------------------|------------
Age of first flowering             | Maximum height           | Life form      | Ell light          | Infection of AMF     | Myco status
Branching                          | Minimum height           | Life span      | Ell temperature    |                      |
Bud bank seasonality at soil level | Leaf area                | Rosettes       | Ell continentality |                      |
Buoyancy                           | Leaf longevity           | Type of reprod | Ell moisture       |                      |
Mean canopy height                 | Photosynthetic pathway   | Strategy type  | Ell soil reaction  |                      |
Dispersal type                     | Life form                | Pollen vector  | Ell nitrogen       |                      |
Leaf distribution along the stem   | Vegetative reprod method |                | Ell salinity       |                      |
Leaf dry matter content            | Flow earliest month      |                | Life form          |                      |
Leaf mass                          | Flow latest month        |                | Distribution       |                      |
Leaf size                          | Pollen vector            |                | Flow first month   |                      |
Dispersal morphology               | Seed weight mean         |                | Flow last month    |                      |
Growth form                        | Method of propagation    |                |                    |                      |
Life span                          | Ell light                |                |                    |                      |
Releasing height                   | Ell moisture             |                |                    |                      |
Seed bank                          | Ell pH                   |                |                    |                      |
Seed mass                          | Ell nitrogen             |                |                    |                      |
Shoot growth form                  | Ell salt                 |                |                    |                      |
Seed number per shoot              |                          |                |                    |                      |
Woodiness                          |                          |                |                    |                      |
Terminal velocity                  |                          |                |                    |                      |

The listed traitbases provide data in different ways: BiolFlor, ECOFLORA and http://luirig.altervista.org/ rely on an SQL back end and render query results as structured web pages, while the other data bases provide data as structured files (e.g. spreadsheet or text files) which are available for download on their websites (LEDA is an in‐between case, as the .txt files it provides are the results of internal SQL queries). Given these differences among sources of information, manually downloading and processing trait data from the different traitbases is a long and error‐prone process; thus, the package was built to be as simple as possible so that this process would be easy even for researchers with limited programming skills. The package revolves around a single function (tr8()) which accepts two vectors as inputs, one containing plant species names and the other composed of codes corresponding to the traits to be downloaded. All the internal functions and classes used by the package (e.g. those which actually retrieve trait data) were intentionally hidden from the user to preserve this simplicity of use. Particular attention has been paid to reminding users of the correct bibliographic citations to use for the downloaded data.

Example of use

The TR8 package is now available on CRAN; it can thus be installed following the standard procedure (making sure that the packages TR8 relies on are installed as well) and loaded into an R session:

    > install.packages("TR8", dependencies = TRUE)
    > library("TR8")

The tr8 function accepts two arguments:
  • species_list: a vector containing scientific names of plant species without authors' names,
  • download_list: a vector containing codes which correspond to the traits to be retrieved for the selected species.
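
Since species_list expects names without authors' names, the following is a minimal base R sketch of how simple binomials could be cleaned before being passed on. The helper strip_authors is hypothetical (it is not part of TR8) and simply keeps the first two words of each name, so it would mishandle subspecies, varieties or hybrids; a dedicated taxonomic tool is a safer choice for real species lists.

```r
# Hypothetical helper (not part of TR8): keep genus + specific epithet only.
strip_authors <- function(x) {
  x <- trimws(x)
  # Capture the first two whitespace-separated words and drop whatever follows
  sub("^(\\S+)\\s+(\\S+).*$", "\\1 \\2", x)
}

raw_names <- c("Populus alba L.", "Populus nigra L.",
               "Humulus lupulus", "Rumex sanguineus L.")
my_species <- strip_authors(raw_names)
my_species
```

The resulting character vector has the shape expected by the species_list argument.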
The codes accepted by the function are described in the available_tr8 dataframe; each entry (row) of the dataframe represents a trait and for each trait, the following data are provided:
  • a short description of the trait (column description),
  • the code that should be passed to the download_list argument in the tr8() function (column short_code),
  • the data base from which data will be retrieved (column db).

The first lines of the dataframe can be inspected through:

    > head(available_tr8)

which shows, as a result, the following table:

      short_code            description       db
    1      h_max         Maximum height Ecoflora
    2      h_min         Minimum height Ecoflora
    3    le_area              Leaf area Ecoflora
    4    le_long         Leaf longevity Ecoflora
    5  phot_path Photosynthetic pathway Ecoflora
    6    li_form              Life form Ecoflora
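
Because the codes live in an ordinary dataframe, they can be looked up with standard subsetting. The sketch below is only an illustration: it works on a mock dataframe (here called available) that copies the six rows shown above, so that it runs without the package installed; with TR8 loaded, the same expressions should apply to available_tr8, assuming it has the three columns described.

```r
# Mock of the six rows shown above; with TR8 loaded, available_tr8 would be used instead
available <- data.frame(
  short_code  = c("h_max", "h_min", "le_area", "le_long", "phot_path", "li_form"),
  description = c("Maximum height", "Minimum height", "Leaf area",
                  "Leaf longevity", "Photosynthetic pathway", "Life form"),
  db          = rep("Ecoflora", 6),
  stringsAsFactors = FALSE
)

# Codes whose description mentions "height"
height_codes <- available$short_code[grepl("height", available$description,
                                           ignore.case = TRUE)]

# All codes offered by a given data base
ecoflora_codes <- available$short_code[available$db == "Ecoflora"]

height_codes
ecoflora_codes
```

Either vector could then be passed as the download_list argument.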

Suppose that Maximum height and Life form trait data are needed for the following four species from a typical Italian riparian community (Populus alba L., Populus nigra L., Humulus lupulus, Rumex sanguineus L.). First, two vectors are created, one containing the species names and another containing the short_codes corresponding to Maximum height and Life form; then the tr8 function is run, passing the two vectors as arguments, and the results are stored in the traits object:

    > my_species <- c("Populus alba", "Populus nigra",
    +                 "Rumex sanguineus", "Humulus lupulus")
    > my_traits <- c("h_max", "li_form")
    > traits <- tr8(species_list = my_species, download_list = my_traits)

Once all the data are downloaded (which may take a while), printing the object's name on the console will show the retrieved data:

    > traits

                     h_max         li_form
    Humulus lupulus    800 hemicryptophyte
    Populus alba      2400            <NA>
    Populus nigra     3000    phanerophyte
    Rumex sanguineus   100 hemicryptophyte

Whenever no data are found for a species, an NA is used. Moreover, to keep the table structure as clean as possible, the short versions of trait names are adopted; to see the correspondence between the short and long versions of the codes, the traits object should be passed to the lookup function:

    > lookup(traits)
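
Since missing data are reported as NA, it can be useful to check how complete the retrieved table is before any analysis. The following is a small sketch on a mock dataframe (traits_df, a name introduced here) whose values are copied from the output printed above; the column types returned by the real package may differ.

```r
# Mock of the trait table printed above (values copied from the example output)
traits_df <- data.frame(
  h_max   = c(800, 2400, 3000, 100),
  li_form = c("hemicryptophyte", NA, "phanerophyte", "hemicryptophyte"),
  row.names = c("Humulus lupulus", "Populus alba",
                "Populus nigra", "Rumex sanguineus"),
  stringsAsFactors = FALSE
)

# Number of missing values for each trait
missing_per_trait <- colSums(is.na(traits_df))

# Species for which at least one trait is missing
incomplete_species <- rownames(traits_df)[!complete.cases(traits_df)]

missing_per_trait
incomplete_species
```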

One of the main concerns for researchers who provide open data is the correct citation of their work; thus, users of the TR8 package are always reminded to cite the sources of information for the data that they download; the bib method makes this easier; in this case simply run:

    > bib(traits)

and a list with the correct bibliographic citations (only for those data bases which were actually queried) will be provided. The traits object contains various pieces of information about the retrieved data; to extract only the dataframe containing plant traits, the extract_traits method can be used:

    > traits_dataframe <- extract_traits(traits)

This dataframe can be used for further statistical analysis: the package comes with a vignette called TR8_workflow which shows one example of analysis [i.e. the RLQ analysis, as implemented in the ade4 package (Dray & Dufour 2007), with all the required preceding steps as described in Kleyer et al. (2012)]. The tr8 function also lets users select the traits to be downloaded interactively; to do so, instead of passing a vector of trait codes, the gui_config parameter should be set to TRUE:

    > traits <- tr8(species_list = my_species, gui_config = TRUE)

and R will show a multipanel window (see Fig. 1) where the user can choose which traits should be downloaded. The use of a GUI for selecting traits was implemented to help scientists who are not familiar with R, but the non‐interactive way should be preferred whenever possible as it guarantees that the data retrieval can be repeated by other scientists; thus, it makes any analysis based on data collected by the TR8 package fully reproducible.

Fig. 1. The multipanel graphical interface used by the TR8 package to ask the user which plant traits should be downloaded.

Data retrieval can be much more complex than what is presented here; thus, the above‐mentioned TR8_workflow vignette also shows a typical data retrieval workflow, highlighting the most common problems that can be expected in such a procedure and proposing possible solutions.

Caveats

Retrieving data from remote servers is a time‐ and bandwidth‐consuming activity; thus, the higher the number of plant species and traits to be retrieved, the longer it will take tr8 to complete its job. It is also fair not to overload the remote data bases with HTTP requests; thus, the functions were written in such a way that they pause between one query and the next. To avoid possible problems, users are also urged to check plant species names before using tr8, e.g. with the taxize package (Chamberlain & Szöcs 2013), which can also be used to remove authors' names from species names. In the current implementation of TR8, whenever multiple values for the same trait are found, a mean is calculated if the variable is continuous, while for categorical variables all the levels are joined together: this approach was chosen to make the results of the queries transparent to the users, although it may generate long, clumsy strings when several levels are found; this behaviour may change in future if user feedback suggests a different strategy.
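
The rule for multiple values can be illustrated with a short sketch. The helper combine_values below is hypothetical and is not the package's internal code: it only mimics the behaviour described above, and both the separator used to join levels and the removal of duplicated levels are assumptions made for this example.

```r
# Hypothetical helper illustrating the rule described above (not TR8's internal code)
combine_values <- function(values, sep = " - ") {
  values <- values[!is.na(values)]
  if (length(values) == 0) return(NA)      # nothing found: report NA
  if (is.numeric(values)) {
    mean(values)                           # continuous trait: average the records
  } else {
    # categorical trait: join the distinct levels into one string
    paste(unique(as.character(values)), collapse = sep)
  }
}

height_cm <- combine_values(c(20, 30, 25))
vectors   <- combine_values(c("wind", "water", "wind"))

height_cm
vectors
```

Joining levels keeps every reported category visible, at the cost of strings that grow with the number of distinct levels.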

As a last remark, users should be aware that collecting data from websites (web scraping) is an inherently fragile technique as it relies on the structure of the queried web pages; thus, even a minor change in the way remote servers render their data can lead to errors in the retrieved data. Users of the TR8 package are invited to report any unexpected behaviour of the package to the author so that the underlying functions can be promptly updated.
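
The two caveats above (pausing between queries and coping with pages that change) can be sketched in a few lines of base R. This is not the package's implementation: polite_fetch and fetch_one are hypothetical names, and fetch_one is a local stub that fails for one species so that the example runs without network access.

```r
# Stand-in for a function querying a remote source; a local stub is used here
# so that the sketch runs offline. It fails for one species on purpose.
fetch_one <- function(species) {
  if (species == "Rumex sanguineus") stop("unexpected page structure")
  nchar(species)  # dummy "trait" value
}

# Query species one at a time, pausing between requests and turning
# retrieval errors into NA instead of aborting the whole run.
polite_fetch <- function(species, fetch, pause = 1) {
  results <- setNames(rep(NA_real_, length(species)), species)
  for (sp in species) {
    results[sp] <- tryCatch(
      fetch(sp),
      error = function(e) {
        message("retrieval failed for ", sp, ": ", conditionMessage(e))
        NA_real_
      }
    )
    Sys.sleep(pause)  # wait before sending the next query
  }
  results
}

out <- polite_fetch(c("Populus alba", "Rumex sanguineus"), fetch_one, pause = 0.1)
out
```

A failed query then shows up as an NA together with a message, which makes it easier to spot when a remote page has changed.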

Conclusion and future directions

The TR8 package is actively developed on GitHub: users can get the latest version at https://github.com/GioBo/TR8 and are invited to cooperate in the package development (e.g. by reporting bugs or suggesting improvements). A second vignette called Expanding_TR8 is distributed with the package: it shows R programmers how functions for retrieving data from remote servers should be written so that they can be easily integrated in TR8.

The TR8 package is currently focused mainly on European floras, as the above‐listed data bases are the ones the author has been using the most; if other freely accessible data bases come to the author's attention, they may be included as sources of information, thus providing help to a broader range of plant ecologists.

Acknowledgments

The author thanks the researchers who made the six traitbases available. The comments and suggestions provided by the associate editor and the two reviewers considerably helped in improving the quality of both the paper and the TR8 package.

Data accessibility

All the data used in the examples are retrieved by the TR8 package from the publicly available data bases listed in the paragraph Package description.
