CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases
rOpenSci Resources:
The software package CoordinateCleaner, developed as part of this research effort, was extensively reviewed and approved by the rOpenSci project (https://ropensci.org). A full record of the review is available at: https://github.com/ropensci/CoordinateCleaner
Funding information:
A.A. and A.Z. are supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007‐2013, ERC Grant Agreement n. 331024 to A.A.). D.S. received funding from the Swedish Research Council (2015‐04748). A.A. is further supported by the Swedish Research Council, the Swedish Foundation for Strategic Research, a Wallenberg Academy Fellowship, the Faculty of Sciences at the University of Gothenburg, and the David Rockefeller Center for Latin American Studies at Harvard University. C.D.R. is financed by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico ‐ Brazil: 249064/2013‐8).
Abstract
- Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo‐referencing or dating, can diminish their usefulness. Manual cleaning is time‐consuming, error prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records.
- Here, we present CoordinateCleaner, an R package to scan datasets of species occurrence records for geo‐referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo‐referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio‐temporal tests for fossils.
- We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing datasets (18.5%) might be biased by rasterized coordinates. In PBDB, 1,205 records (6.3%) are potentially problematic.
- All cleaning functions and the biodiversity institution database are open‐source and available within the CoordinateCleaner R package.
1 INTRODUCTION
The digitalization of biological and palaeontological collections from museums and herbaria is rapidly increasing the public availability of species’ geographical distribution records. To date, more than 1 billion geo‐referenced occurrence records are freely available from online databases, such as the Global Biodiversity Information Facility (GBIF, www.gbif.org), BirdLife International (www.birdlife.org) or other taxonomically, temporally, or spatially more focused databases (e.g. http://www.paleobiodb.org, http://bien.nceas.ucsb.edu/bien). Together, these resources have become widely used in ecological, biogeographical and palaeontological research and have greatly facilitated our understanding of biodiversity patterns and processes (e.g. Díaz et al., 2016; Zanne et al., 2014).
Most biodiversity databases are composed of, or provide access to, a variety of sources. Hence, they integrate data of varying quality, often compiled and curated at different times and places. Unfortunately, the available meta‐data, for example on the nature of the records (museum specimen, survey, citizen science observation), the collection method (GPS record, grid cell from an atlas project) and collection‐time, varies and often meta‐data are missing. As a consequence, data quality in online databases is a major concern, and has limited their utility and reliability for research and conservation (Anderson et al., 2016; Chapman, 2005; Gratton et al., 2017; Yesson et al., 2007).
In the case of species occurrence records for extant taxa, problems with the geographical location constitute a major concern. In particular, erroneous or overly imprecise geographical coordinates can bias biodiversity patterns at multiple spatial scales (Maldonado et al., 2015). Common problems include (a) occurrence records assigned to country or province centroids due to automated geo‐referencing from vague locality descriptions, (b) records with switched latitude and longitude, (c) zero coordinates due to data entry errors, (d) records from zoos, botanical gardens or museums, (e) records based on rasterized collections and (f) records that have been subject to strong decimal rounding (Table 1; Gueta & Carmel, 2016; Maldonado et al., 2015; Robertson, Visser, & Hui, 2016; Yesson et al., 2007). Records affected by these issues can cause severe bias depending on the research question and the geographical scale of analyses (Graham et al., 2008; Gueta & Carmel, 2016; Johnson & Gillingham, 2008).
| Test function | Level | Flags | Main error source | GBIF (%) | PBDB (%) |
|---|---|---|---|---|---|
| cc_cap | REC | Radius around country capitals | Imprecise geo‐referencing based on vague locality description | 1.1 | – |
| cc_cen | REC | Radius around country and province centroids | Imprecise geo‐referencing based on vague locality description | 1.8 | 1 |
| cc_coun | REC | Records outside indicated country borders | Various, e.g. swapped latitude and longitude | – | – |
| cc_dupl | REC | Records from one species with identical coordinates | Various, e.g. duplicates from various institutions, records from genetic sequencing data | – | – |
| cc_equ | REC | Records with identical lon/lat | Data entry errors | 1.6 | 1 |
| cc_gbif | REC | Radius around the GBIF headquarters in Copenhagen | Data entry errors, erroneous geo‐referencing | 0 | 0 |
| cc_inst | REC | Radius around biodiversity institutions | Cultivated/captured individuals, data entry errors | 0.8 | 0 |
| cc_iucn | REC | Records outside external range polygon | Naturalized individuals, data entry errors | – | – |
| cc_outl | REC | Geographically isolated records of a species | Various, e.g. swapped latitude and longitude | – | – |
| cc_sea | REC | Records located within oceans | Various, e.g. swapped latitude and longitude | 0.1 | – |
| cc_urb | REC | Records from within urban areas | Cultivated individuals, old records | – | – |
| cc_val | REC | Records outside lat/lon coordinate system | Data entry errors, e.g. wrong decimal delimiter | 0 | 0 |
| cc_zero | REC | Plain zeros in the coordinates and a radius around (0/0) | Data entry errors, failed geo‐referencing | 1.6 | 0.01 |
| cd_ddmm | DS | Over proportional drop of records at 0.6 | Erroneous conversion from dd.mm to dd.dd | 4.1% datasets | – |
| cd_round | DS | Decimal periodicity or over proportional number of zero decimals | Rasterized or rounded data | 18.5% datasets | – |
| cf_age | FOS/REC | Temporal outliers in fossil age or collection year | Various | – | – |
| cf_equal | FOS | General time validity | Data entry errors | – | 0 |
| cf_range | FOS | Overly imprecise age ranges | Lack of data | – | 3.3 |
| cf_outl | FOS | Outliers in space‐time | Data entry error | – | 2.1 |
- REC, record‐level; DS, dataset‐level; FOS, fossil‐level; dd.mm, degree minute annotation; dd.dd, decimal degree annotation; GBIF, Global Biodiversity Information Facility; PBDB, Paleobiology Database.
In addition to spatial issues, the temporal information (i.e. the year of collection) associated with occurrence records can be erroneous. In the case of fossil occurrences, temporal information includes the age of the specimen typically defined by the stratigraphic range of the sampling locality. Although sampling biases (and their temporal and spatial heterogeneity) are arguably the most severe issue in the analysis of the fossil record (Foote, 2000; Xing et al., 2016), overly imprecise or erroneous fossil ages, data entry errors or taxonomic uncertainties can negatively affect the reliability of the analysis (Varela, Lobo, & Hortal, 2011). While large‐scale analyses of the fossil record appear resilient to error in the data (Adrain & Westrop, 2000; Sepkoski, 1993), the inclusion of erroneous data is likely to generate non‐negligible biases at smaller temporal and taxonomic scales.
Manual cleaning is possible, but time‐consuming and limited to the taxonomic and geographical expertise of individual researchers. It is thus generally not feasible for datasets that comprise thousands or millions of occurrence records. Furthermore, manual cleaning — often based on poorly documented and thus irreproducible ad hoc decisions — can add subjectivity and, in the worst case, bias. These issues call for standardized data validation and cleaning tools for large‐scale biodiversity data (Gueta & Carmel, 2016).
2 DESCRIPTION
Here, we present CoordinateCleaner, a new software package for standardized, reproducible and fast identification of potential geographical and temporal errors in databases of recent and fossil species occurrences. CoordinateCleaner is implemented in R (R Core Team, 2018) based on standard tools for data handling and spatial statistics (Allaire et al., 2018; Arel‐Bundock, 2018; Becker, Wilks, Brownrigg, Minka, & Deckmyn, 2017; Bivand & Lewin‐Koh, 2017; Bivand & Rundel, 2018; Chamberlain, 2017; Hester, 2017; Hijmans, 2017a,b; Pebesma & Bivand, 2005; Varela, Gonzalez Hernandez, & Fabris Sgarbi, 2016; Wickham, 2011, 2016; Wickham, Danenberg, & Eugster, 2017; Wickham & Hesselberth, 2018; Wickham, Hester, & Chang, 2018; Xie, 2018). See the online documentation available at https://ropensci.github.io/CoordinateCleaner for an in‐depth description of methods and simulations. The main features of the package are listed below.
2.1 Automatic tests for suspicious geographical coordinates or temporal information
CoordinateCleaner compares the coordinates of occurrence records to reference databases of country and province centroids, country capitals, urban areas, known natural ranges and tests for plain zeros, equal longitude/latitude, coordinates at sea, country borders and outliers in collection year. The reference databases are compiled from several sources (Central Intelligence Agency, 2014; South, 2017, and www.naturalearthdata.com/). All functions available in CoordinateCleaner are summarized in Table 1 and each of them can be customized with flexible parameters and individual reference databases.
2.2 A global database of biodiversity institutions
A common problem is occurrence records that match the location of biodiversity institutions, such as zoological and botanical gardens, museums, herbaria or universities. These can have various origins: records from living individuals in captivity or horticulture, individuals that have escaped horticulture near the institution, or specimens without collection coordinates that have been erroneously geo‐referenced to their physical location (e.g. a museum). To address these problems we compiled a global reference database of 9,691 biodiversity institutions from multiple sources (Botanic Gardens Conservation International, 2017; GeoNames, 2017; Global Biodiversity Information Facility, 2017; Index Herbariorum, 2017; The Global Registry of Biodiversity Repositories, 2017; Wikipedia, 2017) and geo‐referenced them using the ggmap and opencage R packages (Kahle & Wickham, 2013; Salmon, 2017). Where automatic geo‐referencing failed (c. 50% of the entries), we geo‐referenced manually using Google Earth Pro (Google Inc, 2017) or information from the institutions' web pages, if available. We acknowledge that this database might not be complete, and have set up a website at http://biodiversity-institutions.surge.sh/ where scientists can explore the database and submit additions or corrections. See https://ropensci.github.io/CoordinateCleaner/articles/Background_the_institutions_database.html for a detailed description of the database.
2.3 Algorithms to identify conversion errors and rasterized data
Two types of potential bias are unidentifiable on record‐level if the relevant meta‐data are missing: (A) coordinate conversion errors based on the misinterpretation of the degree sign (°) as decimal delimiter and (B) occurrence records derived from rasterized collection designs or subjected to strong decimal rounding (e.g. presence/absence in 100 × 100 km grid cells). This may be particularly problematic for studies with small geographical scale, which need high precision, and if the erroneous records have been combined with precise GPS‐records into datasets of mixed precision. CoordinateCleaner implements two novel algorithms to identify these problems on a dataset‐level (a dataset in this context can either be all available records or subsets thereof, for instance from different contributing institutions). The tests assume that datasets with a sufficient number of biased records show a characteristic periodicity in the statistical distribution of their coordinates or coordinate decimals.
To detect coordinate conversion bias (A), we use a binomial test together with the expectation of a random distribution of the coordinate decimals in the dataset (implemented in the cd_ddmm function). If we consider a dataset of coordinates spanning several degrees of latitude and longitude, we can expect the distribution of decimals to be roughly uniform in the range [0, 1). In the case of a conversion error, the coordinate decimal cannot be above 0.59 (because the minutes within one degree only range from 0 to 59). Thus, conversion errors tend to inflate the frequency of coordinates with decimals <0.6. We use two tests to identify this bias. First, we use the fraction of coordinate decimals below 0.6 to fit a binomial distribution with parameter q = 0.592 (which assumes uniformly distributed decimals). This yields estimates of (a) a p‐value accepting or rejecting the hypothesis of a uniform distribution and (b) the parameter $\hat{q}$, which best explains the empirical distribution of decimals below and above 0.6. The first test is therefore given by the p‐value that can be used to reject the hypothesis of a uniform distribution when smaller than a given threshold. The second test is based on the relative difference ($r = (\hat{q} - q)/q$) between the estimated frequency of decimals below 0.6 ($\hat{q}$) and the expected one (q). Thus any r > 0 indicates a higher‐than‐expected frequency of decimals smaller than 0.6. We flag a dataset as biased if the p‐value is smaller than a user‐defined threshold (by default set to 0.025) and r is larger than a user‐defined threshold (by default set to 1).
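The two criteria can be illustrated with a short base R sketch. It is a one‐dimensional simplification under stated assumptions, not the code of cd_ddmm: the helper name ddmm_sketch, its arguments, the one‐sided alternative and the simulated data are hypothetical, and r is computed as the relative difference between the estimated and the expected fraction of decimals below 0.6. In this simplified setting r cannot exceed (1 − q)/q ≈ 0.69, so the sketch uses an illustrative threshold of 0.5 instead of the default quoted above.

```r
# Hypothetical helper: a simplified, one-dimensional illustration of the
# two criteria described above (NOT the implementation used by cd_ddmm).
ddmm_sketch <- function(coords, q = 0.592, p_threshold = 0.025,
                        r_threshold = 0.5) {
  decimals <- abs(coords) %% 1        # decimal part of each coordinate
  n_below  <- sum(decimals < 0.6)     # decimals compatible with dd.mm notation
  # One-sided binomial test (assumption): more decimals below 0.6 than expected?
  bt <- stats::binom.test(n_below, length(decimals), p = q,
                          alternative = "greater")
  q_hat <- unname(bt$estimate)        # estimated fraction of decimals below 0.6
  r     <- (q_hat - q) / q            # relative difference to the expectation
  list(p_value = bt$p.value, q_hat = q_hat, r = r,
       flagged = bt$p.value < p_threshold && r > r_threshold)
}

set.seed(1)
# Simulated longitudes with uniformly distributed decimals
clean_lon <- runif(2000, min = 10, max = 20)
# Simulated conversion error: minutes (0-59) written as if they were decimals
biased_lon <- sample(10:19, 2000, replace = TRUE) +
  sample(0:59, 2000, replace = TRUE) / 100

ddmm_sketch(clean_lon)$flagged
ddmm_sketch(biased_lon)$flagged
```

In the simulated data only the second dataset, in which every decimal lies below 0.6, meets both criteria.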
To detect rasterized sampling bias (B), we test for the regular pattern in the sample coordinates caused by a rasterized sampling (or strong decimal rounding). This test involves three steps, which are implemented in a single function (cd_round). First, the algorithm amplifies the pattern by binning the coordinates and then calculates the autocorrelation among the number of records per bin as the covariance of two consecutive sliding windows. This step generates a vector x of autocorrelation values. Second, values of x are flagged as outliers if they exceed a quantile‐based threshold:

$$x > Q_{75} + T \cdot \mathrm{IQR}$$

where $Q_{75}$ is the 75% quantile of $x_k$, $\mathrm{IQR}$ is its interquartile range, and T is a user‐set multiplier defining the test sensitivity. Third, we compute the distance (in degrees) between all flagged outliers and identify D as the most common distance. A dataset is then flagged as potentially biased if D is within a user‐defined range (by default between 0.1 and 2 degrees) and the number of outliers spaced by a distance D exceeds a user‐defined value (by default set to 3).
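The second and third steps can be sketched in base R as follows. This is an illustration under assumptions, not the code of cd_round: the helper periodicity_sketch, its argument names, the multiplier value of 7 and the simulated autocorrelation vector are hypothetical, outliers are taken to be values above the 75% quantile plus a multiple of the interquartile range, and distances are computed between all pairs of flagged positions.

```r
# Hypothetical helper illustrating steps two and three only
# (NOT the implementation used by cd_round).
periodicity_sketch <- function(x, positions, mult = 7,
                               d_range = c(0.1, 2), min_outliers = 3) {
  # Step 2: flag values above Q75 + mult * IQR (assumed outlier rule)
  threshold <- stats::quantile(x, 0.75, names = FALSE) + mult * stats::IQR(x)
  out_pos <- positions[x > threshold]
  if (length(out_pos) < 2) {
    return(list(D = NA_real_, n = 0L, flagged = FALSE))
  }
  # Step 3: most common pairwise distance (in degrees) between outliers
  d   <- round(as.vector(stats::dist(out_pos)), 6)
  tab <- table(d)
  D   <- as.numeric(names(tab)[which.max(tab)])
  n   <- as.integer(max(tab))
  list(D = D, n = n,
       flagged = D >= d_range[1] && D <= d_range[2] && n > min_outliers)
}

# Simulated autocorrelation vector: low background, one peak every 0.5 degrees
positions <- (0:100) / 10
x <- 1 + 0.05 * sin(0:100)
x[(0:100) %% 5 == 0] <- 10

res <- periodicity_sketch(x, positions)
res$D        # most common distance between flagged outliers
res$flagged  # TRUE if D lies in d_range and occurs more than min_outliers times
```

In this simulated case the peaks are spaced 0.5 degrees apart, so the most common distance falls inside the default range and the dataset is flagged.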
We optimized all default settings based on simulations to obtain high sensitivity for datasets of variable size and geographical scale. The cd_ddmm and cd_round functions successfully identified bias (A) and bias (B) in simulated datasets with more than 100 records and more than 50 individual sampling locations, respectively (https://ropensci.github.io/CoordinateCleaner/articles/Background_dataset_level_cleaning.html). Both functions include optional visual diagnostic output to evaluate the results for flagged datasets, which we recommend using to guide a final decision, especially for datasets with few records or a geographically restricted extent.
2.4 Spatio‐temporal tests for fossil data

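A record i is flagged as an outlier if

$$r_i > Q_{75}(r) + M \cdot \mathrm{IQR}(r)$$

where $Q_{75}(r)$ is the 75% quantile of the age range (a) or age (b) across all records in the set, $\mathrm{IQR}(r)$ is the interquartile range of r and M is a user‐defined sensitivity threshold (by default set to 5).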
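As a worked illustration of this quantile rule applied to age ranges, the following base R sketch flags records whose age range lies far above the bulk of the data. The helper flag_age_range and the fossil ages are made up for illustration, and the rule is assumed to be "range greater than the 75% quantile plus M times the interquartile range"; it is not the package's own code.

```r
# Hypothetical helper (NOT the package implementation): flag records whose
# age range exceeds Q75 + M * IQR across all records.
flag_age_range <- function(min_age, max_age, M = 5) {
  r <- max_age - min_age   # age range per record
  r > stats::quantile(r, 0.75, names = FALSE) + M * stats::IQR(r)
}

# Made-up fossil ages in million years; the last record is dated imprecisely
min_age <- c(10, 12, 15, 20, 22, 5)
max_age <- c(12, 15, 17, 23, 24, 150)

flag_age_range(min_age, max_age)
```

Here the age ranges are 2, 3, 2, 3, 2 and 145 million years, giving a 75% quantile of 3, an interquartile range of 1 and therefore a threshold of 8, so only the last record is flagged.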

where Q is a user‐set sensitivity threshold (five by default). The test is replicated n times, where each replicate uses a randomly sampled age within the age range of i. Records are flagged if they have been identified as outliers in a fraction of k replicates, where n and k are user‐defined parameters (by default set to 5 and 0.5 respectively). The cf_range and cf_outl functions can identify outliers across entire datasets or on a per‐taxon basis.
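The replication logic can be sketched in base R. Only the resampling and voting steps are shown; the outlier rule inside each replicate is a stand‐in that looks at the sampled ages alone and ignores the spatial component, so this is not the criterion used by cf_outl. The helper replicate_vote, its arguments and the ages are hypothetical, and "a fraction of k replicates" is interpreted here as at least k.

```r
# Hypothetical helper (NOT the package implementation): only the replication
# and voting logic is shown; the outlier rule is a stand-in based on ages.
replicate_vote <- function(min_age, max_age, n = 5, k = 0.5, Q = 5) {
  votes <- replicate(n, {
    # Draw one age per record from within its age range
    age <- stats::runif(length(min_age), min = min_age, max = max_age)
    # Stand-in outlier rule (assumption): age above Q75 + Q * IQR
    age > stats::quantile(age, 0.75, names = FALSE) + Q * stats::IQR(age)
  })
  # Flag records identified as outliers in at least a fraction k of replicates
  rowMeans(votes) >= k
}

set.seed(42)
min_age <- c(10:18, 300)   # made-up ages in million years
max_age <- min_age + 1

replicate_vote(min_age, max_age)
```

With these made‐up ages the tenth record is an outlier in every replicate and is the only one flagged.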
3 Running CoordinateCleaner
CoordinateCleaner includes three wrapper functions: clean_coordinates, clean_dataset and clean_fossils which combine a set of tests suitable for the respective data. clean_coordinates is the main function and creates an object of the S3‐class ‘spatialvalid’, which has a summary and plotting method. Flagged occurrence records can easily be identified, checked or removed before further analyses. We provide two tutorials demonstrating how to use CoordinateCleaner on recent and fossil datasets and multiple short examples on the package at https://ropensci.github.io/CoordinateCleaner/. A reproducible minimal example is:
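The following sketch shows what such a call could look like; it is not necessarily the authors' original example. The simulated data, the column names and the added record at (0/0) are illustrative assumptions, the column names are passed explicitly because default names may differ between package versions, and the set of tests is restricted to keep the run short.

```r
library(CoordinateCleaner)

set.seed(1)
# Simulated occurrences (illustrative column names and coordinates)
exmpl <- data.frame(
  species = sample(letters, size = 250, replace = TRUE),
  decimalLongitude = runif(250, min = 42, max = 51),
  decimalLatitude = runif(250, min = -26, max = -11)
)
# Add one deliberately problematic record at (0/0)
exmpl <- rbind(
  exmpl,
  data.frame(species = "a", decimalLongitude = 0, decimalLatitude = 0)
)

# Run a subset of the record-level tests via the wrapper function
flags <- clean_coordinates(
  x = exmpl,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  species = "species",
  tests = c("capitals", "centroids", "equal", "zeros")
)

summary(flags)                      # number of records flagged per test
cleaned <- exmpl[flags$.summary, ]  # keep records that passed all tests
```

The returned object keeps one row per input record, and its .summary column is FALSE for records flagged by at least one test, so it can be used directly to subset the data.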
Alternatively, each cleaning function can be called individually, for instance in pipelines based on the magrittr pipe (%>%).
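A pipeline of individual tests might look like the sketch below. The simulated data and column names are illustrative assumptions, the column names are passed explicitly because defaults may differ between package versions, and the three tests are chosen only because they need no reference data.

```r
library(CoordinateCleaner)
library(magrittr)

set.seed(1)
# Simulated occurrences; the last record sits at (0/0) on purpose
exmpl <- data.frame(
  species = sample(letters, size = 250, replace = TRUE),
  decimalLongitude = c(runif(249, min = 42, max = 51), 0),
  decimalLatitude = c(runif(249, min = -26, max = -11), 0)
)

# Each function returns the data without the records it flagged
cleaned <- exmpl %>%
  cc_val(lon = "decimalLongitude", lat = "decimalLatitude") %>%
  cc_equ(lon = "decimalLongitude", lat = "decimalLatitude") %>%
  cc_zero(lon = "decimalLongitude", lat = "decimalLatitude")

nrow(exmpl) - nrow(cleaned)  # number of records removed
```

Chaining the functions this way makes the order and choice of tests explicit, which helps when a cleaning procedure has to be documented and reproduced.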
4 EMPIRICAL EXAMPLE
We demonstrate CoordinateCleaner on occurrence records for flowering plants available from GBIF (c. 91 million geo‐referenced records; Global Biodiversity Information Facility, 2017, accessed 02 Feb 2017) and the Palaeobiology Database (PBDB, c. 19,000 records; PBDB, 2018, accessed 26 Jan 2018). We chose GBIF and PBDB as examples because they are large and widely used providers of biodiversity data. We stress that both platforms put substantial effort into identifying problematic records and acquiring meta‐data to increase data quality, and that we consider their data as having generally high quality and improving. We ran the clean_coordinates, clean_fossils and clean_dataset wrapper functions with all tests recommended in our tutorials, except those that are dependent on downstream analyses (Table 1). We used a custom gazetteer with a 1‐degree buffer for cc_sea, to avoid flagging records close to the coastline (available in the package with data("buffland")). For computational efficiency, we divided the GBIF data into subsets of 200,000 records.
clean_coordinates flagged more than 3,340,000 GBIF records (3.6%), the majority due to coordinates matching country centroids, zero coordinates and equal latitude and longitude (Table 1). Figure 1a shows the number of occurrence records flagged per 100 × 100 km grid‐cell, globally. Concerning the fossil data from PBDB, clean_fossils flagged 1,205 records (6.3%), mostly due to large uncertainty in dating and unexpectedly old age or distant location. These flags might include records where a precise dating was not possible, records with low taxonomic resolution, homonyms or problems during data entry. Figure 1b shows the number of fossil records flagged per 100 × 100 km grid‐cell, globally.

On the dataset‐level, we retrieved 2,494 individual datasets of flowering plants from GBIF, mostly representing data from different publishers (e.g. collections of specific museums). These datasets varied considerably in the number of records (from 1 record to 16 million) and geographical extent (<1 degree to global). We limited the tests to 641 datasets with at least 50 individual sampling locations to test for bias in decimal conversion (function cd_ddmm, Table 1) and 966 datasets with more than 100 occurrence records for the rasterization bias (function cd_round, Table 1). clean_dataset flagged 26 (4.1%) datasets as biased towards decimals below 0.6 (potentially related to dd.mm to dd.dd conversion) and 179 datasets (18.5%) with a signature of decimal periodicity (potential rounding or rasterization). The high percentage of datasets with biased decimals was surprising, and these might include datasets with clustered sampling. Since the value of such data for biological research is strongly dependent on follow‐up analyses, we recommend case‐by‐case judgement based on the desired precision, diagnostic plots and meta‐data for a final decision on the flagged datasets. In general, not all flagged records and datasets are necessarily erroneous: our tests only indicate deviations from common and explicit assumptions. Flagged data may require further validation by researchers or exclusion from subsequent analyses.
5 COMPARISON TO OTHER SOFTWARE
To our knowledge, few other tools exist for standardized data cleaning, namely the scrubr (Chamberlain, 2016) and biogeo (Robertson et al., 2016) R packages. Additionally, the modestR package (García‐Roselló et al., 2013) implements a graphical user interface and includes cleaning of GBIF data based on habitat suitability. Some of the basic functions performed by CoordinateCleaner overlap with these packages; however, CoordinateCleaner provides a substantially more comprehensive set of options, including novel tests and data (see https://ropensci.github.io/CoordinateCleaner/articles/Background_comparison_other_software for a function‐by‐function comparison of CoordinateCleaner, scrubr and biogeo).
Primarily, CoordinateCleaner adds the following novelties as compared to available packages: (a) A unique set of tests for problematic geographical coordinates, tailored to common but often overlooked problems in biological databases and not restricted to specific organisms, (b) A global, geo‐referenced database of biodiversity institutions, to identify records from cultivation, zoos, museums, etc., (c) Novel algorithms to identify problems not identifiable on record‐level, for example errors from the conversion of the coordinate annotation or low coordinate precision due to rasterized data collection, (d) Tests tailored to fossils, accounting for problems in dating and (e) Applicability to large datasets. These features, in combination with their user‐friendly implementation and extensive documentation and tutorials, will render CoordinateCleaner a useful tool for research in biogeography, palaeontology, ecology and conservation.
In general, no hard rule exists to judge data quality for biogeographical analyses – what is ‘good data’ depends largely on downstream analyses. For instance, continent‐level precision might suffice for ancestral range estimation in some global studies, whereas species distribution models based on environmental data can require a 1‐km precision. The objective of CoordinateCleaner is to automate the identification of problematic records as far as possible for all scales, with default values tailored to large datasets with millions of records and thousands of species. Nevertheless, some researcher judgement will always be necessary to choose suitable tests, specify appropriate thresholds, and avoid adding bias by cleaning. In the worst case, automatic cleaning could bias downstream analyses by information loss caused by overly strict filtering, exacerbating sampling bias by false outlier removal, and over‐confidence in the cleaned data. In most cases, however, CoordinateCleaner speeds up the identification of problematic records and common problems in a dataset for further verification. In some cases, disregarding flagged records might be warranted, but we recommend carefully judging and verifying flagged records when possible, especially for the outlier and dataset‐level tests. We provide extensive documentation to guide cleaning and output interpretation (https://ropensci.github.io/CoordinateCleaner).
ACKNOWLEDGEMENTS
We thank all GBIF and PBDB administrators and contributors for their excellent work. We thank Sara Varela, Carsten Meyer and an anonymous reviewer for helpful comments on an earlier version of the manuscript, and rOpenSci, Maëlle Salmon, Irene Steves and Francisco Rodriguez‐Sanchez for helpful comments on the R‐code, as well as Juan D Carrillo for valuable feedback on the tutorial for cleaning fossil records.
AUTHORS’ CONTRIBUTIONS
A.Z. developed the tools and designed this study. D.S. and A.Z. designed and implemented the dataset‐level cleaning algorithms. D.E. developed the website for contributing to the biodiversity institutions database. A.Z., T.A., J.A., C.D.R., H.F., A.H., M.A., R.S., S.t.S., N.W. and V.Z. contributed data to the biodiversity institutions database. A.Z. wrote the manuscript, with contributions from A.A., D.S., T.A., J.A., D.E., H.F. and V.Z. All authors read and approved the final version of the manuscript.
DATA ACCESSIBILITY
The code of CoordinateCleaner is open source and has been reviewed by rOpenSci. The package is available as an R package from the CRAN repository (stable, https://cran.rstudio.com/web/packages/CoordinateCleaner/index.html) and GitHub (developmental, https://github.com/ropensci/CoordinateCleaner). The biodiversity institutions database is part of the package under a CC‐BY license. Cleaning pipelines for occurrence records from GBIF and fossils from PBDB are available from https://ropensci.github.io/CoordinateCleaner (https://doi.org/10.5281/zenodo.2539408) and from CRAN as part of the package.
- Xiao Feng, Daniel S. Park, Cassondra Walker, A. Townsend Peterson, Cory Merow, Monica Papeş, A checklist for maximizing reproducibility of ecological niche models, Nature Ecology & Evolution, 10.1038/s41559-019-0972-5, (2019).
- Benjamin M. Marshall, Colin T. Strine, Exploring snake occurrence records: Spatial biases and marginal gains from accessible social media, PeerJ, 10.7717/peerj.8059, 7, (e8059), (2019).
- John Waller, Data Location Quality at GBIF, Biodiversity Information Science and Standards, 10.3897/biss.3.35829, 3, (2019).
- David W. Armitage, Stuart E. Jones, Coexistence barriers confine the poleward range of a globally distributed plant, Ecology Letters, 10.1111/ele.13612.
- Craig F. Barrett, Joshua Lambert, Mathilda V. Santee, Brandon T. Sinn, Samuel V. Skibicki, Heather M. Stephens, Hana Thixton, Genetic, morphological, and niche variation in the widely hybridizing ‐ species complex, Plant Species Biology, 10.1111/1442-1984.12293.
- Thomas L.P. Couvreur, Gilles Dauby, Anne Blach‐Overgaard, Vincent Deblauwe, Steven Dessein, Vincent Droissart, Oliver J. Hardy, David J. Harris, Steven B. Janssens, Alexandra C. Ley, Barbara A. Mackinder, Bonaventure Sonké, Marc S.M. Sosef, Tariq Stévart, Jens‐Christian Svenning, Jan J. Wieringa, Adama Faye, Alain D. Missoup, Krystal A. Tolley, Violaine Nicolas, Stéphan Ntie, Frédéric Fluteau, Cécile Robin, Francois Guillocheau, Doris Barboni, Pierre Sepulchre, Tectonics, climate and the diversification of the tropical African terrestrial flora and fauna, Biological Reviews, 10.1111/brv.12644.
- Iliana Chollett, D. Ross Robertson, Comparing biodiversity databases: Greater Caribbean reef fishes as a case study, Fish and Fisheries, 10.1111/faf.12497.
- Conor A. Waldock, Adriana De Palma, Paulo A. V. Borges, Andy Purvis, Insect occurrence in agricultural land‐uses depends on realized niche and geographic range properties, Ecography, 10.1111/ecog.05162.




