Volume 5, Issue 11
Application

ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models

Robert Muscarella

Corresponding Author

Department of Ecology, Evolution and Environmental Biology, Columbia University, 1200 Amsterdam Ave., New York, NY 10027, USA

Correspondence author. E‐mail: bob.muscarella@gmail.com

Peter J. Galante

Department of Biology, City College of the City University of New York, 160 Convent Ave., New York, NY 10031, USA

Mariano Soley‐Guardia

Department of Biology, City College of the City University of New York, 160 Convent Ave., New York, NY 10031, USA

Graduate Center of the City University of New York, 365 5th Ave., New York, NY 10016, USA

Robert A. Boria

Department of Biology, City College of the City University of New York, 160 Convent Ave., New York, NY 10031, USA

Jamie M. Kass

Department of Biology, City College of the City University of New York, 160 Convent Ave., New York, NY 10031, USA

Graduate Center of the City University of New York, 365 5th Ave., New York, NY 10016, USA

María Uriarte

Department of Ecology, Evolution and Environmental Biology, Columbia University, 1200 Amsterdam Ave., New York, NY 10027, USA

Robert P. Anderson

Department of Biology, City College of the City University of New York, 160 Convent Ave., New York, NY 10031, USA

Graduate Center of the City University of New York, 365 5th Ave., New York, NY 10016, USA

Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, Central Park West & 79th Street, New York, NY 10024, USA

First published: 03 September 2014

Summary

  1. Recent studies have demonstrated a need for increased rigour in building and evaluating ecological niche models (ENMs) based on presence‐only occurrence data. Two major goals are to balance goodness‐of‐fit with model complexity (e.g. by ‘tuning’ model settings) and to evaluate models with spatially independent data. These issues are especially critical for data sets suffering from sampling bias, and for studies that require transferring models across space or time (e.g. responses to climate change or spread of invasive species). Efficient implementation of procedures to accomplish these goals, however, requires automation.
  2. We developed ENMeval, an R package that: (i) creates data sets for k‐fold cross‐validation using one of several methods for partitioning occurrence data (including options for spatially independent partitions), (ii) builds a series of candidate models using Maxent with a variety of user‐defined settings and (iii) provides multiple evaluation metrics to aid in selecting optimal model settings. The six methods for partitioning data are n−1 jackknife, random k‐folds ( = bins), user‐specified folds and three methods of masked geographically structured folds. ENMeval quantifies six evaluation metrics: the area under the curve of the receiver‐operating characteristic plot for test localities (AUCTEST), the difference between training and testing AUC (AUCDIFF), two different threshold‐based omission rates for test localities and the Akaike information criterion corrected for small sample sizes (AICc).
  3. We demonstrate ENMeval by tuning model settings for eight tree species of the genus Coccoloba in Puerto Rico based on AICc. Evaluation metrics varied substantially across model settings, and models selected with AICc differed from default ones.
  4. In summary, ENMeval facilitates the production of better ENMs and should promote future methodological research on many outstanding issues.

Introduction

Correlative ecological niche models (ENMs, often termed species distribution models, SDMs) have become an important tool for research in ecology, conservation and evolutionary biology (Guisan & Thuiller 2005; Elith et al. 2006; Kozak, Graham & Wiens 2008; Dormann et al. 2012). These tools, however, are subject to a number of methodological issues including the challenge of balancing goodness‐of‐fit with model complexity (Warren & Seifert 2011), and the need to evaluate model performance with independent data (Veloz 2009; Hijmans 2012).

A number of recent studies have demonstrated the sensitivity of ENM performance to model specification (e.g. Araújo & Guisan 2006; Elith, Kearney & Phillips 2010; Anderson & Gonzalez 2011; Elith et al. 2011; Warren & Seifert 2011; Araújo & Peterson 2012; Merow, Smith & Silander 2013; Shcheglovitova & Anderson 2013; Syfert, Smith & Coomes 2013; Radosavljevic & Anderson 2014; Warren et al. 2014). Two main conclusions are that: (i) species‐specific tuning of settings (also called ‘smoothing’) can improve model performance, and (ii) spatially independent training and testing (also called ‘calibration’ and ‘evaluation’) data sets can reduce the degree to which models are overfit (e.g. to biased sampling). These issues are particularly critical for studies involving transfer across space or time, especially those requiring extrapolation into non‐analog conditions (e.g. Elith, Kearney & Phillips 2010; Anderson 2013). In practice, however, these recommendations are rarely implemented, largely because they can be prohibitively laborious and time‐consuming (Phillips & Dudík 2008). As a result, most empirical studies rely on default settings of a given algorithm/software package and potentially biased evaluation methods.

We developed an R package (ENMeval) to help address these issues. Specifically, the current version of ENMeval facilitates construction and evaluation of ENMs with one of the most commonly used presence‐only methods, Maxent (Phillips, Anderson & Schapire 2006). The structure of ENMeval, however, will allow later expansion to other niche modelling algorithms. Briefly, Maxent quantifies statistical relationships between predictor variables at locations where a species has been observed versus ‘background’ locations in the study region. These modelled relationships are constrained by various transformations of the original predictor variables (‘feature classes’ or FCs) – allowing more FCs enables more flexible and complex fits to the observed data. Higher flexibility, however, can increase the propensity for model overfitting (Peterson et al. 2011). By default, Maxent determines which FCs to allow based on the number of occurrence localities in a data set. Regardless of which feature classes are permitted in a model run, Maxent provides protection against overfitting via regularization, which penalizes the inclusion of additional parameters that result in little or no ‘gain’ to the model (Merow, Smith & Silander 2013). Users can specify which FCs to allow, and adjust the level of regularization via a single regularization multiplier (RM; default = 1·0). The RM acts in concert across all FCs as a coefficient multiplied to the individual regularization values (βs in Maxent) that correspond to each respective FC (Phillips & Dudík 2008). Several existing studies provide additional details on the mathematical underpinnings of Maxent (Phillips, Anderson & Schapire 2006; Phillips & Dudík 2008; Elith et al. 2011; Merow, Smith & Silander 2013; Yackulic et al. 2013).

Although the current default settings in Maxent were based on an extensive empirical tuning study (Phillips & Dudík 2008), recent work has shown that they can result in poorly performing models (Shcheglovitova & Anderson 2013; Radosavljevic & Anderson 2014). Additionally, artificial spatial autocorrelation between training and testing data partitions (e.g. due to sampling bias) can inflate metrics used to evaluate model performance (Veloz 2009; Wenger & Olden 2012; Radosavljevic & Anderson 2014). ENMeval should help address these issues and facilitate increased rigour in the development of Maxent models.

Package description

ENMeval provides a number of novel resources for Maxent users. First, it includes six methods to partition data for training and testing, including three designed to achieve spatially independent splits. Secondly, it executes a series of models across a user‐defined range of settings (i.e. combinations of FCs and RM values). Finally, it provides six evaluation metrics to characterize model performance. All of these operations can be completed with a single call to the primary function of the package, ENMevaluate, although supporting functions are also available (Table 1). The evaluation metrics returned can be used to compare models, and depending on the user's choice of evaluation criteria, select the optimally performing settings. ENMeval specifically does not perform model selection because it is not clear which optimality criteria are most appropriate for evaluating ENMs (Fielding & Bell 1997; Lobo, Jiménez‐Valverde & Real 2008; Peterson et al. 2011; Warren & Seifert 2011). Rather, the various evaluation statistics provided can be used to select settings based on recommendations from current and future literature. Below, we briefly outline the components of the package and demonstrate its functionality by conducting species‐specific tuning for eight species of native trees in Puerto Rico.

Table 1. The functions included in ENMeval. See the main text and package manual (Appendix S2) for additional details

Function name | Description
calc.aicc | Calculate the Akaike Information Criterion corrected for small sample sizes (AICc) based on Warren & Seifert (2011)
calc.niche.overlap | Compute pairwise niche overlap (similarity of estimated suitability scores) in geographic space for Maxent predictions. The value ranges from 0 (no overlap) to 1 (identical predictions). Based on the ‘nicheOverlap’ function of the dismo R package (Hijmans et al. 2011)
corrected.var | Calculate variance corrected for non‐independence of k‐fold iterations (Shcheglovitova & Anderson 2013)
ENMevaluate | The primary function of ENMeval. It automatically executes Maxent across a range of feature class and regularization multiplier settings, providing several evaluation metrics to aid in identifying settings that balance model goodness‐of‐fit with model complexity
get.evaluation.bins | A general title for six separate functions (‘get.randomkfold’, ‘get.jackknife’, ‘get.user’, ‘get.block’, ‘get.checkerboard1’, ‘get.checkerboard2’) that partition occurrence and background localities into separate bins for training and testing (i.e. calibration and evaluation)
make.args | Generate a list of arguments to pass to Maxent and to use as labels in plotting
eval.plot | A basic plotting function to visualize evaluation metrics generated by ENMevaluate

Data partitioning and model execution

A run of ENMevaluate begins by using one of the six methods to partition occurrence localities into testing and training bins (folds) for k‐fold cross‐validation (Fig. 1; Fielding & Bell 1997; Peterson et al. 2011). The ‘random k‐fold’ method partitions occurrence localities randomly into a user‐specified number of (k) bins (equivalent to the ‘cross‐validate’ partitioning scheme available in the current version of the Maxent software). Primarily when working with small data sets (e.g. < ca. 25 localities), users may choose a special case of k‐fold cross‐validation where the number of bins (k) is equal to the number of occurrence localities (n) in the data set (Fig. 1a; Pearson et al. 2007; Shcheglovitova & Anderson 2013). Note that neither of these methods accounts for spatial autocorrelation between testing and training localities, which can inflate evaluation metrics, at least for data sets that result from biased sampling (Veloz 2009; Hijmans 2012; Wenger & Olden 2012). As a third option, users can define a priori partitions, which provides a flexible way to conduct spatially independent cross‐validation with background masking (see below).
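The two non-spatial schemes described above can be sketched in a few lines. This is a minimal Python illustration of the idea, not the code used by ENMeval; the function names (`random_kfold`, `jackknife`, `train_test_split`) and the `seed` argument are hypothetical.

```python
import random


def random_kfold(n_localities, k, seed=0):
    """Assign each locality to one of k bins at random, with near-equal bin sizes."""
    bins = [i % k + 1 for i in range(n_localities)]  # labels 1..k, as even as possible
    random.Random(seed).shuffle(bins)                # randomise which locality gets which label
    return bins


def jackknife(n_localities):
    """Special case k = n: every locality forms its own bin."""
    return list(range(1, n_localities + 1))


def train_test_split(bins, test_bin):
    """Indices of training and testing localities when `test_bin` is withheld."""
    train = [i for i, b in enumerate(bins) if b != test_bin]
    test = [i for i, b in enumerate(bins) if b == test_bin]
    return train, test
```

Looping `train_test_split` over every bin label gives the k train/test iterations; with `jackknife`, each iteration withholds exactly one locality.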

Fig. 1. ENMeval provides six methods for partitioning data into bins (each of which implements a variation on k‐fold cross‐validation), four of which are illustrated here. In all panels, different coloured occurrence localities represent different bins. (a) With the n−1 jackknife method, each of n occurrence localities is used for testing once (e.g. white locality here), while all others are used for training in that iteration (coloured localities). A total of n models are run, and evaluation metrics are summarized across these iterations. (b) The ‘block’ method partitions data into four bins based on the lines of latitude and longitude that divide occurrence localities as equally as possible. The amount of geographic (and environmental) space corresponding to each bin, however, is likely to differ. (c and d) The two ‘checkerboard’ methods involve aggregating the original environmental input grids into either one or two checkerboard‐like grids based on user‐defined aggregation factors. For the ‘checkerboard1’ method (c), a single grid is used to partition occurrence localities into two bins. The ‘checkerboard2’ method (d) is identical to ‘checkerboard1’ except that an additional second level of spatial aggregation is specified (i.e. fine‐ and coarse‐grain aggregation). Localities are first partitioned into two groups according to the ‘fine’‐grain checkerboard (as in ‘checkerboard1’). Then, these groups are further subdivided into two groups each based on the ‘coarse’‐grain checkerboard (with an aggregation factor specified by the user), yielding 4 bins. Note that, creating these grids to define bins does not affect the grain size of the environmental predictor variables themselves. As opposed to the ‘block’ method, both ‘checkerboard’ methods result in an approximately equal sample of geographic (and likely environmental) space in each bin. The numbers of occurrence localities, however, are likely to differ across bins. A warning is given if fewer than four bins are created as a result of the spatial configuration of occurrence localities. For each of these last three methods (b, c and d), k models are run iteratively using k−1 bins for training and the remaining one for testing. Evaluation metrics are then summarized across the k iterations.

Three additional methods are variations of what Radosavljevic & Anderson (2014) referred to as ‘masked geographically structured’ data partitioning (Fig. 1). The ‘block’ method partitions data according to the latitude and longitude lines that divide the occurrence localities into four bins of (insofar as possible) equal numbers (Fig. 1b). Both occurrence and background localities are assigned to each of the four bins based on their position with respect to these lines. ENMeval also includes two variants of a ‘checkerboard’ approach to partition occurrence localities. These generate checkerboard grids across the study extent that partition the localities into bins (Fig. 1c,d). In contrast to the block method, the checkerboard methods subdivide geographic space equally but do not ensure a balanced number of occurrence localities in each bin.
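The geographically structured schemes can be illustrated with a short Python sketch. It rests on stated assumptions rather than on the package source: `block_bins` halves the localities by latitude and then halves each group by longitude, and the checkerboard functions take cell sizes in map units (standing in for an aggregation factor times the raster resolution) with a hypothetical `origin`. All function names are illustrative, and only occurrence localities are binned here.

```python
import math


def block_bins(points):
    """points: list of (lon, lat). Four bins with (insofar as possible) equal counts."""
    by_lat = sorted(range(len(points)), key=lambda i: points[i][1])
    half = (len(points) + 1) // 2
    south, north = by_lat[:half], by_lat[half:]
    bins = [0] * len(points)
    for first_label, group in ((1, south), (3, north)):
        by_lon = sorted(group, key=lambda i: points[i][0])
        cut = (len(by_lon) + 1) // 2
        for i in by_lon[:cut]:
            bins[i] = first_label        # western part of this latitude band
        for i in by_lon[cut:]:
            bins[i] = first_label + 1    # eastern part of this latitude band
    return bins


def checkerboard1(points, cell_size, origin=(0.0, 0.0)):
    """Two bins: the colour of the checkerboard square that holds each point."""
    bins = []
    for lon, lat in points:
        col = math.floor((lon - origin[0]) / cell_size)
        row = math.floor((lat - origin[1]) / cell_size)
        bins.append((row + col) % 2 + 1)
    return bins


def checkerboard2(points, fine_size, coarse_size, origin=(0.0, 0.0)):
    """Four bins: fine-grain colour subdivided by coarse-grain colour."""
    fine = checkerboard1(points, fine_size, origin)
    coarse = checkerboard1(points, coarse_size, origin)
    return [(c - 1) * 2 + f for f, c in zip(fine, coarse)]
```

The contrast drawn in the text is visible in the sketch: `block_bins` balances the number of localities per bin, whereas the checkerboard functions balance area and leave the counts to the spatial configuration of the data.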

Choosing among the data partitioning methods depends on the research objectives and the characteristics of the study system. For example, the block method may be desirable for studies involving model transfer across space or time, including the possibility of encountering non‐analog conditions (e.g. native versus invaded regions, climate change effects; Wenger & Olden 2012). In contrast, the checkerboard methods (which are less likely to require extrapolation in environmental space) may be more appropriate when model transferability is not required. Nonetheless, we emphasize that evaluating models with various combinations of data partitions and software settings does not guarantee the reliability of models projected across space or time. For applications that rely on model transferability, researchers should identify non‐analog conditions, illustrate extrapolated response curves, quantify uncertainty based on the manner of extrapolation and interpret those predictions with additional caution.

After data partitioning, the ENMevaluate function iteratively builds k models for each combination of settings, s, using k−1 bins for model training and the withheld bin for testing. Importantly, for the geographically structured partitioning methods, background localities in the same geographic area as the bin holding testing localities are not included in the training phase (Phillips 2008; Phillips & Dudík 2008; Radosavljevic & Anderson 2014). A ‘full’ model (using the entire, unpartitioned data set) is also made to calculate AICc, resulting in a total of s × (k + 1) model runs.
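The background masking step can be sketched as follows. This is a minimal Python illustration under the assumption that both occurrence and background localities already carry bin labels; `masked_folds` and its dictionary keys are hypothetical names, not part of ENMeval.

```python
def masked_folds(occ_bins, bg_bins):
    """For each withheld bin, list the locality indices used in that iteration.

    occ_bins / bg_bins: bin label of every occurrence / background locality.
    Background localities in the withheld bin are masked out of training.
    """
    folds = {}
    for test_bin in sorted(set(occ_bins)):
        folds[test_bin] = {
            "train_occ": [i for i, b in enumerate(occ_bins) if b != test_bin],
            "test_occ": [i for i, b in enumerate(occ_bins) if b == test_bin],
            "train_bg": [i for i, b in enumerate(bg_bins) if b != test_bin],
        }
    return folds
```

Each entry of the returned dictionary corresponds to one of the k partitioned model runs for a given combination of settings.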

Evaluation metrics

Because no consensus currently exists regarding the most appropriate metric or approach to evaluate performance of ENMs (Fielding & Bell 1997; Lobo, Jiménez‐Valverde & Real 2008; Peterson et al. 2011; Warren & Seifert 2011), ENMeval provides several metrics likely to be useful for presence‐background evaluations (Table 2). All calculations in ENMeval are based on Maxent raw output values (Merow, Smith & Silander 2013; Yackulic et al. 2013). Note, however, that any rescaling that preserves rank (e.g. cumulative or logistic) will lead to the same evaluation values for the rank‐based metrics used here (based on omission rates or AUC), but not for AICc (Warren et al. 2009; Peterson et al. 2011) or Schoener's D (see below). First, ENMeval calculates a measure of the model's ability to discriminate conditions at withheld occurrence localities from those at background samples: the area under the curve of the receiver operating characteristic plot based on the testing data (AUCTEST). In this implementation, AUCTEST is calculated with the full set of background localities (corresponding to all k bins) to enable comparison among k‐fold iterations (Radosavljevic & Anderson 2014). Secondly, to quantify overfitting, ENMeval calculates the difference between training and testing AUC (AUCDIFF), which is expected to be high with overfit models (Warren & Seifert 2011). ENMeval also calculates two omission rates that quantify model overfitting when compared with the respective theoretically expected omission rates: the proportion of test localities with Maxent output values lower than that corresponding to (i) the training locality with the lowest value (i.e. the minimum training presence, MTP; = 0% training omission) or (ii) a 10% omission rate of the training localities ( = 10% training omission) (Pearson et al. 2007). ENMeval provides the mean and variance (corrected for the non‐independence of the k iterations, Shcheglovitova & Anderson 2013) for each of these four metrics. 
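The rank‐based and threshold‐based metrics described above reduce to a few comparisons of predicted values. The Python sketch below is illustrative only: the function names are hypothetical, and the rule used to locate the 10% training threshold (the value of the training locality ranked at the 90% position) is an assumption about one reasonable convention.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: chance that a presence outranks a background point (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_diff(auc_train, auc_test):
    """AUCTRAIN minus AUCTEST, returned as zero when AUCTRAIN < AUCTEST."""
    return max(0.0, auc_train - auc_test)


def omission_rate(train_scores, test_scores, training_omission=0):
    """Proportion of test localities scoring below a training-based threshold.

    training_omission: 0 for the minimum training presence threshold,
    10 for the 10% training omission threshold.
    """
    ranked = sorted(train_scores, reverse=True)
    n_kept = (len(ranked) * (100 - training_omission) + 99) // 100  # integer ceiling
    threshold = ranked[n_kept - 1]
    omitted = sum(1 for s in test_scores if s < threshold)
    return omitted / len(test_scores)
```

Because all three functions depend only on the ordering of scores, any rank‐preserving rescaling of the Maxent output leaves their values unchanged.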
The function also calculates the AICc value, ΔAICc and AICc weight for each full model, providing information on the relative quality of a model given the data (Burnham & Anderson 2004; Warren & Seifert 2011). Because AICc is calculated using the full data set, it is not affected by the method chosen for data partitioning. We note that, following Warren & Seifert (2011), AICc is calculated based on the number of non‐zero parameters in the Maxent lambda file and that this value may not accurately estimate the total degrees of freedom in the model (Hastie, Tibshirani & Friedman 2009). Nonetheless, the relative performance of AICc versus other model evaluation metrics is a topic of ongoing research (e.g. Cao et al. 2013; Radosavljevic & Anderson 2014; Warren et al. 2014), and ENMeval should help advance this line of research. Finally, to quantify how predictions differ in geographic space (e.g. Fig. 2), ENMeval computes pairwise niche overlap between models using Schoener's D (Schoener 1968; Warren et al. 2009). ENMeval also includes a basic plotting function to visualize evaluation statistics (see Fig. 3).
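As a rough illustration of these calculations, the Python sketch below follows the standard formulas: AICc = 2K − 2 ln L + 2K(K + 1)/(n − K − 1), Akaike weights from ΔAICc, and Schoener's D as one minus half the summed absolute difference between two predictions standardized to sum to one. It assumes the likelihood is taken from raw predictions standardized over all grid cells, in the manner of Warren & Seifert (2011). The function names and the choice to return NaN when the correction term is undefined are assumptions of this sketch, not a description of `calc.aicc` or `calc.niche.overlap`.

```python
import math


def aicc(raw_at_occurrences, raw_all_cells, n_params):
    """AICc from raw predictions standardized to sum to 1 across all grid cells."""
    total = sum(raw_all_cells)
    log_lik = sum(math.log(v / total) for v in raw_at_occurrences)
    n = len(raw_at_occurrences)
    if n_params >= n - 1:
        return float("nan")  # small-sample correction is undefined here
    correction = (2 * n_params * (n_params + 1)) / (n - n_params - 1)
    return 2 * n_params - 2 * log_lik + correction


def delta_and_weights(aicc_values):
    """Delta AICc relative to the best model, and the corresponding AICc weights."""
    best = min(aicc_values)
    deltas = [v - best for v in aicc_values]
    rel = [math.exp(-d / 2) for d in deltas]
    total = sum(rel)
    return deltas, [r / total for r in rel]


def schoener_d(pred_a, pred_b):
    """Schoener's D between two predictions over the same cells (0 = none, 1 = identical)."""
    sum_a, sum_b = sum(pred_a), sum(pred_b)
    return 1 - 0.5 * sum(abs(a / sum_a - b / sum_b) for a, b in zip(pred_a, pred_b))
```

Note that `schoener_d` standardizes each prediction before comparing them, so two maps that differ only by a constant multiplier have D = 1.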

Table 2. The evaluation metrics calculated by ENMeval

Metric | Description | References
AUCTEST | The threshold‐independent metric AUC based on predicted values for the test localities (i.e. localities withheld during model training), averaged over k iterations. Higher values reflect a better ability for a model to discriminate between conditions at withheld (testing) occurrence localities and those of background localities (by ranking the former higher than the latter based on their predicted suitability values). The rank‐based AUC does not indicate model fit | Hanley & McNeil (1982), Peterson et al. (2011)
AUCDIFF | The difference between the AUC value based on training localities (i.e. AUCTRAIN) and AUCTEST (AUCTRAIN − AUCTEST). If AUCTRAIN < AUCTEST, the returned value is zero. The value of AUCDIFF is expected to be positively associated with the degree of model overfitting | Warren & Seifert (2011)
ORMTP (‘Minimum Training Presence’ omission rate) | A threshold‐dependent metric that indicates the proportion of test localities with suitability values (Maxent relative occurrence rates) lower than that associated with the lowest‐ranking training locality. Omission rates greater than the expectation of zero typically indicate model overfitting | Fielding & Bell (1997), Peterson et al. (2011), Radosavljevic & Anderson (2014)
OR10 (10% training omission rate) | A threshold‐dependent metric that indicates the proportion of test localities with suitability values (Maxent relative occurrence rates) lower than the threshold that excludes the 10% of training localities with the lowest predicted suitability. Omission rates greater than the expectation of 10% typically indicate model overfitting | Fielding & Bell (1997), Peterson et al. (2011)
AICc | The Akaike Information Criterion corrected for small sample sizes reflects both model goodness‐of‐fit and complexity. The model with the lowest AICc value (i.e. ΔAICc = 0) is considered the best model out of the current suite of models; all models with ΔAICc < 2 are generally considered to have substantial support | Burnham & Anderson (2004), Warren & Seifert (2011)

Fig. 2. Occurrences and predictions of Maxent models shown in selected portions of Puerto Rico for Coccoloba microstachya (a, b) and C. pyrifolia (c, d). Models shown here correspond to those producing the minimum AICc (a, c), and those built with default settings (b, d). Occurrence localities are indicated with an x. Scale bars show Maxent logistic output (used here only for visualization purposes); higher values (warmer colours) indicate higher predicted suitability.

Fig. 3. Three evaluation metrics for Coccoloba microstachya (a–c) and C. pyrifolia (d–f) resulting from Maxent models made across a range of feature‐class combinations and regularization multipliers. Left panels show ΔAICc, centre panels show the omission rate of testing localities at the 10% training threshold (OR10), and right panels show AUCTEST. Dotted horizontal line in ΔAICc plots represents ΔAICc = 2, which delimits models that are generally considered to have substantial support relative to others examined – that is those below the line (Burnham & Anderson 2004). Default settings and settings that yielded minimum AICc are indicated with arrows. Legends denote feature classes allowed (L = linear, Q = quadratic, H = hinge, P = product and T = threshold). Note that for these species, AICc‐selected settings (based on all localities) resulted in substantially lower omission rates (in the models run with the partitioned data; ‘checkerboard2’ method) than were achieved by the default settings. However, maximal AUCTEST showed low correspondence with either the AICc‐chosen or default settings. For these species, AICc consistently selected regularization multipliers higher than the default value. Results for all eight species (including estimates of variance) are provided in Appendix S1.

Recent work has demonstrated equivalency between the Maxent algorithm and loglinear generalized linear models (Renner & Warton 2013), as well as close links to inhomogeneous Poisson process (IPP) models (Fithian & Hastie 2013). These connections open numerous additional diagnostic tools that are not readily available in the current Maxent software (e.g. using the data to determine the most appropriate spatial resolution of predictor variables). Future work will benefit by capitalizing on the connections among these approaches.

Case study

To demonstrate ENMeval, we tuned Maxent models for eight native tree species from the genus Coccoloba (Polygonaceae) in Puerto Rico (Table 3). We compiled occurrence localities (ranging from 24 to 71 across species) from herbaria at the University of Puerto Rico, the US National Museum of Natural History and the New York Botanical Garden. As predictor variables, we used a categorical map of soil parent material (Bawiec 1999) and four climatic variables: log‐transformed mean annual precipitation (log [mm year−1]), coefficient of variation of monthly mean precipitation, mean temperature of the coldest month (°C) and mean daily temperature range (°C) (Daly, Helmer & Quiñones 2003). To reduce the influence of spatial sampling bias, we applied a weighted‐target group approach (Anderson 2003) by using 22,858 tree species occurrence localities throughout Puerto Rico as background localities (Phillips et al. 2009). After partitioning occurrence data using the checkerboard2 method (see Fig. 1, aggregation factor = 5), we built models with RM values ranging from 0·5 to 4·0 (increments of 0·5) and with six different FC combinations (L, LQ, H, LQH, LQHP, LQHPT; where L = linear, Q = quadratic, H = hinge, P = product and T = threshold); this resulted in 1920 individual model runs.
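The total of 1920 runs follows directly from the tuning grid. The short Python sketch below reproduces the arithmetic; it is bookkeeping only and does not call Maxent. The variable names are illustrative, and k = 4 reflects the four bins produced by the checkerboard2 method.

```python
from itertools import product

rm_values = [0.5 * i for i in range(1, 9)]            # 0.5, 1.0, ..., 4.0
fc_combos = ["L", "LQ", "H", "LQH", "LQHP", "LQHPT"]  # six feature-class combinations
settings = list(product(fc_combos, rm_values))        # every FC x RM combination

k = 4           # bins from the checkerboard2 partition
n_species = 8

runs_per_species = len(settings) * (k + 1)  # k partitioned models plus one full model
total_runs = runs_per_species * n_species

print(len(settings), runs_per_species, total_runs)  # 48 240 1920
```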

Table 3. Evaluation metrics of Maxent ENMs generated by ENMeval for eight species of Coccoloba trees in Puerto Rico

Species | n | Model | FC | RM | AUCTEST | AUCDIFF | ORMTP | OR10 | ΔAICc | D
C. costata (C. Wright) | 48 | AICc | LQHPT | 1·5 | 0·823 | 0·070 | 0·039 | 0·175 | 0·0 | 0·932
C. costata (C. Wright) | 48 | Default | LQH | 1 | 0·829 | 0·069 | 0·062 | 0·175 | 20·9 |
C. diversifolia (Jacq.) | 69 | AICc | LQ | 3 | 0·760 | 0·080 | 0·081 | 0·222 | 0·0 | 0·807
C. diversifolia (Jacq.) | 69 | Default | LQH | 1 | 0·776 | 0·078 | 0·056 | 0·250 | 95·8 |
C. krugii (Lindau) | 24 | AICc | L | 3 | 0·958 | 0·017 | 0·071 | 0·117 | 0·0 | 0·849
C. krugii (Lindau) | 24 | Default | LQH | 1 | 0·945 | 0·030 | 0·071 | 0·242 | 19·9 |
C. microstachya (Willd.) | 58 | AICc | LQH | 2 | 0·783 | 0·070 | 0·098 | 0·214 | 0·0 | 0·916
C. microstachya (Willd.) | 58 | Default | LQH | 1 | 0·785 | 0·073 | 0·023 | 0·251 | 15·0 |
C. pyrifolia (Desf.) | 71 | AICc | LQHP | 4 | 0·692 | 0·053 | 0·022 | 0·150 | 0·0 | 0·871
C. pyrifolia (Desf.) | 71 | Default | LQH | 1 | 0·690 | 0·083 | 0·010 | 0·170 | 36·4 |
C. sintenisii (Urb.) | 27 | AICc | LQ | 1·5 | 0·798 | 0·090 | 0·069 | 0·188 | 0·0 | 0·822
C. sintenisii (Urb.) | 27 | Default | LQH | 1 | 0·813 | 0·097 | 0·091 | 0·272 | NA |
C. swartzii (Meisn.) | 42 | AICc | L | 1·5 | 0·713 | 0·092 | 0·019 | 0·128 | 0·0 | 0·909
C. swartzii (Meisn.) | 42 | Default | LQH | 1 | 0·697 | 0·133 | 0·128 | 0·300 | 29·8 |
C. venosa (L.) | 44 | AICc | LQH | 2·5 | 0·712 | 0·113 | 0·095 | 0·312 | 0·0 | 0·860
C. venosa (L.) | 44 | Default | LQH | 1 | 0·707 | 0·145 | 0·115 | 0·312 | 53·9 |
  • Results are based on the ‘checkerboard2’ method for data partitioning and are shown for settings that gave minimum AICc values (i.e. ΔAICc = 0) as well as for Maxent default settings. Number of occurrence records used for each species is given by n. Schoener's D statistic (Schoener 1968; Warren et al. 2009) – a measure of model similarity in geographic space – compares the predictions of AICc‐selected models with those based on default settings. Values of D range from zero to one, with higher values indicating more similar models.

Here, we summarize the relative performance of models built with default settings versus those selected by AICc (i.e. ΔAICc = 0; Fig. 2); comprehensive results are provided in Appendix S1. Settings of AICc‐selected models differed from default settings for all species. Although we did not find general trends regarding FCs, AICc‐selected models had higher RM values than the default of 1·0 (Table 3; Fig. 3). Because higher regularization ‘smoothes’ response curves by imposing a higher penalty for the inclusion of parameters, this result suggests that default settings tended to result in more complex models relative to AICc‐selected models (Fig. 2). Overall, AICc‐selected models also had lower omission rates than default models, indicating less overfitting (Fig. 3). However, AICc‐selected models generally showed slightly lower AUCTEST values than those made by default, suggesting somewhat lower discriminatory ability (Table 3; Fig. 3). This preliminary analysis reinforces the importance of species‐specific tuning to build ENMs with Maxent.

Conclusions

By relieving some of the logistical challenges associated with species‐specific tuning and model evaluation, ENMeval facilitates research in ecological niche modelling. For instance, although beyond the scope of this software description, future work based on both simulated data sets and a variety of real species should help clarify the strengths and weaknesses of various data partitioning methods and evaluation metrics. Overall, we anticipate that ENMeval will help to advance research in model evaluation and methods for extrapolating ENMs in environmental space. Similar issues exist for algorithms other than Maxent, and the current structure of ENMeval will allow later incorporation of other algorithms.

Obtaining ENMeval

ENMeval requires a current R installation (freely available from http://cran.r-project.org/) and can be downloaded from CRAN at: http://cran.r-project.org/web/packages/ENMeval/index.html. The package manual is provided in Appendix S2. ENMeval is under development and we welcome bug reports and feedback, including suggestions for features that could be included in future versions.

Acknowledgements

We thank Fabiola Areces, Franklin Axelrod, Danilo Chinea and Jeanine Vélez at the University of Puerto Rico for providing access to digitized herbarium records. Rebecca Panko and Víctor José Vega López contributed by tirelessly georeferencing herbarium specimens. This manuscript benefitted from the comments of two anonymous reviewers. This research was supported by NSF‐DEB 1311367 to MU and RM, NSF‐DEB 1119915 to RPA, the Graduate Center of the City University of New York (Science Fellowship and Doctoral Student's Council Dissertation Award to MSG, CUNY Science Scholarship and Graduate Assistantship to JMK), and the Louis Stokes Alliance for Minority Participation (Bridge to Doctorate Fellowship to RAB). The authors have no conflict of interest to declare.

Data accessibility

We compiled the occurrence data used in our case study from publicly accessible databases including GBIF (www.gbif.org) and digitized records from several herbaria (NY, MAPR, US and UPRRP). Environmental data layers were compiled from published studies. Specifically, the geologic substrate layer comes from Bawiec (1999) and the climatic data layers come from Daly, Helmer & Quiñones (2003). We provide all necessary data in Appendix S3, together with an R script to repeat our case study analysis.
