Volume 10, Issue 2
APPLICATION

blockCV: An R package for generating spatially or environmentally separated folds for k‐fold cross‐validation of species distribution models

Roozbeh Valavi

Corresponding Author

E-mail address: rvalavi@student.unimelb.edu.au

School of Biosciences, University of Melbourne, Parkville, Victoria, Australia

Jane Elith

School of Biosciences, University of Melbourne, Parkville, Victoria, Australia

José J. Lahoz‐Monfort

School of Biosciences, University of Melbourne, Parkville, Victoria, Australia

Gurutzeta Guillera‐Arroita

School of Biosciences, University of Melbourne, Parkville, Victoria, Australia

First published: 12 October 2018

Abstract

  1. When applied to structured data, conventional random cross‐validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection.
  2. We present the R package blockCV, a new toolbox for cross‐validation of species distribution modelling. Although it has been developed with species distribution modelling in mind, it can be used for any spatial modelling.
  3. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds.
  4. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.

Foreign Language Abstract (translated from Persian)

When cross‐validation is applied to structured data (spatial, environmental, and so on), it can lead to incorrect estimation of prediction error and, as a result, to mistakes in model selection.

Here we present the blockCV library for the R programming environment as a new tool for cross‐validation of species distribution models. Although this library was developed with species distribution modelling in mind, it can also be used for other kinds of spatial modelling.

The library can build spatially and environmentally separated subsets of the data. It also includes a tool for measuring the range of spatial autocorrelation in the predictor variables, which gives the user a better view of the spatial structure of these data. The library also has graphical tools for exploring and building spatial blocks.

The blockCV library enables modellers to apply a wider range of validation methods, and helps us learn about the effects of different validation methods on our understanding of the predictive power of species distribution models.

1 INTRODUCTION

Species distribution modelling (SDM) is a popular tool in Ecology, in part because it is able to produce spatial predictions of species distributions. An important component of the modelling process is evaluation of the output (Guisan, Thuiller, & Zimmermann, 2017). Is it fit for purpose? Are some models more suited to the nominated end‐use than others (Guillera‐Arroita et al., 2015)? Here we introduce a software package relevant to these questions.

When evaluating SDM performance, it is common to test how well predictions match observations at a set of locations (Franklin, 2010; Guisan et al., 2017; Peterson et al., 2011). Whilst early applications of SDM tended to focus on statistical measures of model fit on the data used to train the model, attention has gradually shifted towards testing on independent data (Elith & Leathwick, 2009). Since fully independent data (such as new surveys) are rarely available, a common approach involves sub‐sampling the data available for modelling. In ecology, this usually involves splitting the data into subsets for training and testing (also known as calibration and validation, respectively) (Franklin, 2010; Radosavljevic & Anderson, 2014). Training data are used for fitting the model and testing data for evaluating the performance. This is termed model validation (Hastie, Tibshirani, & Friedman, 2009), with variants based on splitting and testing strategies. For instance, one or more splits can be made, and testing can be done once or repeated in some way (e.g. k‐fold cross‐validation) (Box 1).

In recent years, some discussion has focussed on the best way to allocate data to training and testing datasets (e.g. Bahn & McGill, 2012; Radosavljevic & Anderson, 2014; Wenger & Olden, 2012). Should allocation be random, or organised in some way (hereafter “non‐random”)? This cannot be answered without knowledge of the data and the modelling task. Ecological data are often autocorrelated, i.e. observations close to each other (in space or time) are more similar than distant ones (Legendre, 1993). In species distribution modelling this is true of the response (the species data) and the predictor variables, and might result from biotic or abiotic processes. Spatially‐separated training and testing datasets can help test whether the model performs as well in nearby locations as it does in more distant places (Telford & Birks, 2009). If it does not, structure might not be properly accounted for in the model or the model might be over‐fitted to the training data (Dormann et al., 2007; Roberts et al., 2017). Several applications of SDMs require extrapolation to new times, and potentially new environments (Elith & Leathwick, 2009). Environmentally‐separated training and testing sets allow evaluation of whether models can extrapolate usefully (Roberts et al., 2017). The package we introduce here, written in the R programming language (R Development Core Team, 2017), provides useful tools for allocating data, in non‐random ways, to training and testing folds.

2 THE blockCV R PACKAGE: OVERVIEW

Other software packages do exist for creating non‐random datasets for cross‐validation, for instance the R packages sperrorest (Brenning, 2012) and ENMeval (Muscarella et al., 2014) and the Python‐based ArcGIS SDMtoolbox (Brown, Bennett, & French, 2017). These have various relevant features, but as users of SDMs we found them limited in their applicability to distribution modelling across typical nuances of species data and analytical aims. Specifically, there were limited options for making splits, for allocating data to training and testing datasets, for dealing with both presence–absence and presence‐background datasets, and for finding solutions where species are patchily distributed. Package blockCV aims to fill that gap.

In a nutshell, package blockCV provides functions to build training and testing datasets using three general strategies, described in detail below: spatial blocks, buffers, and environmental blocks. It offers several options to construct those blocks and to allocate them to cross‐validation (cv) folds (see definitions, Box 1). It includes a function that applies geostatistical techniques to investigate the level of spatial autocorrelation in the chosen predictor variables, to inform the choice of block and buffer size. In addition, visualisation tools further aid selection of block size and provide understanding of the spread of species data across generated folds. The package has been written with species distribution modelling in mind, though we have kept the output general enough that it is likely to be useful more generally. The functions allow for a number of common usage scenarios, including presence–absence and presence‐background species data (Box 1), rare and common species, and raster data for predictor variables.

The generated output is stored in lists and vectors that identify allocation of locations to training or testing data and, where relevant, to cross‐validation folds. These can be directly input to any species modelling workflow in R, and formats for the widely‐used species distribution modelling package biomod2 (Thuiller, Georges, Engler, & Breiner, 2017) are also included. With the package, we provide a vignette and example data to demonstrate use of these functions in modelling. The following sections describe the functionalities of the package in more detail.

BOX 1 Cross‐validation, blocks, folds and species data – definitions

Cross‐validation and folds

Cross‐validation (cv) is a technique for evaluating predictive models. It partitions the data into k parts (folds), using one part for testing and the remaining parts (k − 1 folds) for model fitting. In k‐fold cv the process is iterated until all the folds have been used for testing. If k is equal to the number of records, it is called n‐fold, or leave‐one‐out cross‐validation (Hastie et al., 2009, p. 241). When the folds are not randomly chosen and some sort of environmental, temporal or spatial strategy is used to construct the folds, it is called block cross‐validation (Roberts et al., 2017).
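
As a minimal illustration of the definition above, the following Python sketch turns a vector of fold ids (one per record) into the k train/test splits of k‐fold cv. The function name `k_fold_splits` and the example fold ids are hypothetical; the sketch is independent of blockCV and only shows the bookkeeping.

```python
def k_fold_splits(fold_ids):
    """Yield (fold, train_indices, test_indices) for each distinct fold id.

    fold_ids holds one fold label per record; each fold is used once for
    testing while the remaining k - 1 folds are used for model fitting.
    """
    for fold in sorted(set(fold_ids)):
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield fold, train, test


# Hypothetical example: four records allocated to three folds.
for fold, train, test in k_fold_splits([1, 2, 1, 3]):
    print(fold, train, test)
```

In block cross‐validation only the way the fold ids are produced changes (by spatial or environmental grouping rather than at random); the loop over folds stays the same.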

Blocks and folds

Broadly speaking, blocks are groupings with similar characteristics. In our package they are units of geographical area (e.g. rectangles, spatial polygons and buffers of specific distance) or environmental clusters. Within these units, all species data are treated together—for instance, allocated to the same fold of cv. Several blocks could be allocated to one cv fold.

Species data

Presence‐only data are locations where a species was observed. These can be coupled with a large sample of points across the landscape, known as background data (Renner et al., 2015). Presence–absence data include both the locations where the species has been observed and where it has been searched for, but not found (Guillera‐Arroita et al., 2015). Following common convention, our code expects presence data to be represented by a 1 and absence or background by a 0.

3 INITIAL CONSIDERATIONS

Conceptually each block is a unit, within which all species data are treated in the same way. Several blocks could be allocated to one cv fold, so decisions need to be made about how to group the blocks into folds for cross‐validation (Box 1). The practical implementation of this “blocks into folds” step is affected by the species data, as described in the following sections. The package currently explicitly deals with the two main types of species response data (Box 1): presence‐only (with background points, where relevant) and presence–absence data. However all blocking strategies can be applied, with this package, to a broad range of data types, as discussed later in relation to the “species” argument.

The package assumes that species spatial data and predictor variables (rasters) have the same projection and similar extent—i.e. the predictors at least extend across all species data. By default, the package creates blocks according to the extent and shape of the study area, based on the predictors. Alternatively, the spatial blocks can be created based on the extent of the species spatial data. This is especially useful when the species data are not evenly dispersed across the whole region e.g. for rare species.

Spatial data sometimes have coordinate reference systems in degrees. Whilst the most satisfactory approach is for the user to first convert these to projected reference systems (e.g. UTMs), the package provides alternatives that handle data in degrees. For instance, those blocking strategies relying on distance (size of the blocks or buffers) use an (optional) scaling factor to convert degrees to metres. Computations (described later) regarding spatial autocorrelation in predictor raster data with a geographic reference system use the great circle distance (the shortest distance between two points on the surface of a sphere; Longley, Goodchild, Maguire, & Rhind, 2015, p. 305).
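
The two ideas in this paragraph can be sketched in a few lines of Python. The haversine formula below is one standard way to compute a great circle distance; the Earth radius and the metres‐per‐degree factor are assumed round values chosen for illustration, not values taken from the package, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # assumed mean Earth radius (spherical approximation)
METRES_PER_DEGREE = 111_320.0  # assumed length of one degree at the equator


def great_circle_distance(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Haversine distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the value just above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def metres_to_degrees(size_m, metres_per_degree=METRES_PER_DEGREE):
    """Convert a block or buffer size in metres to degrees with a fixed scaling factor."""
    return size_m / metres_per_degree
```

A single scaling factor is only a rough approximation: one degree of longitude covers about half as many metres at 60° latitude as at the equator, which is why projecting the data first is the more satisfactory option.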

4 BLOCKING STRATEGIES

blockCV supports three strategies to separate training and testing data: spatial blocking, spatial buffering, and environmental blocking, each explained in detail below. We first explain the strategies, and then discuss approaches for choosing block or radius size.

4.1 Spatial blocking

A general strategy to account for spatial autocorrelation when evaluating models is to split data spatially into blocks. Package blockCV provides several methods to create spatial blocks (Figure 1). One is to build square spatial blocks of a specified size (i.e. width) (Figure 1a). The spatial position of these blocks can be shifted vertically or horizontally, allowing assessment of the sensitivity of model evaluation metrics to specific block arrangements. The package also allows division of the study area into vertical or horizontal bins of a given height or width (rectangle blocks, Figure 1b,c), as used by Wenger and Olden (2012) and Bahn and McGill (2012) respectively. Finally, the blocks can be specified by a user‐defined spatial polygon layer (Figure 1d).

Figure 1. Illustration of different spatial blocking strategies. In all panels (a‐d), blocks are outlined in red. The numbers in the blocks are fold numbers, showing allocation of blocks to folds.

The package allows allocation of blocks to folds (Box 1) randomly, systematically or in a checkerboard pattern. Blocks‐to‐folds is one of the key steps for species modelling because species data are rarely evenly dispersed over landscapes. When random selection of folds is chosen, constraints can be set to avoid folds with little or no presence or (where relevant) absence data. Techniques are also implemented for finding block‐to‐fold allocations that achieve the most even spread of species data across folds (e.g., a similar number of presence and absence records in each fold). In systematic allocation, blocks are numbered and assigned to folds sequentially. The number of folds can be specified by the user in the systematic and random allocations and it can be equal to or less than the number of blocks. The checkerboard pattern has only two folds, but is useful for enforcing no adjacent blocks in a fold. As explained later, interactive tools are provided for tabulating and visualising the placement of folds and distribution of species data over folds. We note that in all the spatial blocking scenarios, all data in the test folds (including background points, if relevant) are excluded from the training datasets (e.g., Figure 2).

Figure 2. Spatial blocks (left) with systematic fold assignment. The middle and right figures show the training and testing sets in fold‐4 as an example. As can be seen, no data in the testing blocks are available as training data.
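
The block‐then‐fold logic of this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's implementation: coordinates are assumed to be projected (e.g. metres), all points are assumed to lie inside the grid, only occupied blocks are numbered, and the random option is a plain shuffle without the constraints or balancing searches described above. All names are hypothetical.

```python
import random


def block_index(x, y, xmin, ymin, size, n_cols):
    """Id (row-major) of the square block of width `size` containing (x, y)."""
    col = int((x - xmin) // size)
    row = int((y - ymin) // size)
    return row * n_cols + col


def assign_folds(points, xmin, ymin, size, n_cols, k, selection="systematic", seed=0):
    """Return one fold id per point; points in the same block share a fold."""
    ids = [block_index(x, y, xmin, ymin, size, n_cols) for x, y in points]
    blocks = sorted(set(ids))
    if selection == "systematic":
        # number the blocks and cycle through folds 1..k in order
        fold_of_block = {b: i % k + 1 for i, b in enumerate(blocks)}
    elif selection == "random":
        # shuffle a balanced list of fold labels over the blocks
        folds = [i % k + 1 for i in range(len(blocks))]
        random.Random(seed).shuffle(folds)
        fold_of_block = dict(zip(blocks, folds))
    elif selection == "checkerboard":
        # two folds only; blocks sharing an edge never share a fold
        fold_of_block = {b: (b // n_cols + b % n_cols) % 2 + 1 for b in blocks}
    else:
        raise ValueError("selection must be systematic, random or checkerboard")
    return [fold_of_block[b] for b in ids]


# Hypothetical example: a 2 x 2 grid of 10-unit blocks.
pts = [(5, 5), (15, 5), (5, 15), (15, 15), (6, 6)]
print(assign_folds(pts, 0, 0, 10, n_cols=2, k=2))
```

Because folds are attached to blocks rather than to individual points, every record in a test block is withheld from training, which is the property illustrated in Figure 2.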

4.2 Buffering

The buffering strategy generates spatially separated training and testing folds by considering circular buffers of specified radius around each observation point (Le Rest, Pinaud, Monestiez, Chadoeuf, & Bretagnolle, 2014). This approach is related to leave‐one‐out cross‐validation (Box 1), and can be desirable if a user wants to ensure that no test data abuts training data. In this case there is no need to distinguish between blocks (buffered points) and folds, because each left‐out point is equivalent to a fold.

In the following description, we envisage the method from the viewpoint of the test data. The approach varies slightly with the type of species data available (specified by the spDataType argument).

For presence–absence data, folds are created based on all records, both presences and absences. Each target observation (presence or absence) forms a test point that can be considered the centre of the buffer; all other presence and absence points within the buffer are ignored. The training set comprises all presence and absence points outside the buffer.

When working with presence‐background data, test folds are determined by the presence data. A buffer is defined around a central target presence point using the specified radius (range). By default, the testing fold comprises the target presence point and all background points within the buffer (Figure 3). Note that the background points within the buffer are NOT available for training the model. Since some modellers may wish to deal only with presence points, there is an option (addBG = FALSE) for working without background data. Any non‐target presence points inside the buffer are excluded from both the training and testing sets. All points (presence and background) outside the buffer are used as the training set. The method cycles through all the presence data, each time allocating one point for testing, so the number of folds is equal to the number of presence points in the dataset.

Figure 3. The buffering fold with the target point (presence) and background points in the testing fold.
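
The presence‐background rules described above can be written out as a short Python sketch. It follows the default behaviour in the text (background points inside the buffer join the test fold); the function name, the use of planar Euclidean distance on projected coordinates, and the choice to treat points exactly on the buffer edge as inside are assumptions made for illustration.

```python
import math


def buffered_folds(points, labels, radius):
    """Leave-one-out folds with a buffer, for presence-background data.

    points: list of (x, y) in projected units; labels: 1 = presence, 0 = background.
    Returns one (train_indices, test_indices) pair per presence point.
    """
    folds = []
    for i, (xi, yi) in enumerate(points):
        if labels[i] != 1:
            continue  # only presence points act as fold centres
        train, test = [], [i]
        for j, (xj, yj) in enumerate(points):
            if j == i:
                continue
            inside = math.hypot(xj - xi, yj - yi) <= radius
            if not inside:
                train.append(j)   # everything outside the buffer is training data
            elif labels[j] == 0:
                test.append(j)    # background inside the buffer joins the test fold
            # non-target presences inside the buffer are dropped from both sets
        folds.append((train, test))
    return folds


# Hypothetical example: three presences and two background points on a line.
pts = [(0, 0), (1, 0), (10, 0), (0.5, 0), (20, 0)]
lab = [1, 1, 1, 0, 0]
print(buffered_folds(pts, lab, radius=2))
```

The nested loop makes the cost explicit: every presence point produces its own training set, so the number of model fits grows with the number of presences.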

A “species” argument is used to handle different types of species response data. The default setting (“NULL”) means each row of data (each site) is treated in the same way—this is the relevant setting for presence‐only data (with no absence or background points) or other types of response data (e.g. multi‐class points for classifying remotely sensed imagery, or a continuous response such as tree height). However, for presence–absence or presence‐background data it should be used to direct the code to the column of data containing the 0s and 1s (see Box 1).

4.3 Environmental blocking

This strategy uses clustering methods (k‐means; Hartigan & Wong, 1979) to specify sets of similar environmental conditions based on the input covariates and the chosen number of clusters in (possibly multivariate) environmental space. Species data within each cluster are assigned to a fold, so the number of folds is by default equal to the number of clusters chosen by the user. This algorithm only makes sense with continuous raster data (i.e. categorical covariates like vegetation classes should be excluded).

The clustering can be based on all raster cells or only on values at the species points. When based on the complete rasters, the clusters will be consistent throughout the region and across all species being considered in that region. However, this does not guarantee that all clusters contain species records, especially when species data are not dispersed across all environments. So, the resulting folds in practice might be fewer than the specified k. Alternatively, the clustering can be done based only on the values of the predictors at the species presence and absence (or background) points. In this case, the number of folds will always be k. Although the clustering is repeated several times to achieve the most homogeneous clusters, the resulting clusters might change in different runs. It is therefore worth setting a random seed in R before using this function, to achieve reproducibility.
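
To make the idea concrete, the sketch below clusters covariate values at the species points and uses the cluster id as the fold id. It is a plain Lloyd's k‐means written in Python for illustration, not the Hartigan and Wong algorithm cited above and not the package's code; the function names, the number of restarts and the fixed seed are assumptions. The restarts and the seed mirror the two points made in the text: clustering is repeated to find the most homogeneous solution, and a seed is needed for reproducible folds.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean_point(rows):
    """Column-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def assign(rows, centres):
    """Index of the nearest centre for every row."""
    return [min(range(len(centres)), key=lambda c: sq_dist(r, centres[c])) for r in rows]


def environmental_folds(rows, k, n_starts=5, n_iter=50, seed=42):
    """Cluster covariate rows with Lloyd's k-means; return fold ids in 1..k."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same folds
    best_labels, best_cost = None, float("inf")
    for _ in range(n_starts):
        centres = rng.sample(rows, k)
        for _ in range(n_iter):
            labels = assign(rows, centres)
            new_centres = []
            for c in range(k):
                members = [r for r, l in zip(rows, labels) if l == c]
                # keep the old centre if a cluster ends up empty
                new_centres.append(mean_point(members) if members else centres[c])
            if new_centres == centres:
                break
            centres = new_centres
        labels = assign(rows, centres)
        cost = sum(sq_dist(r, centres[l]) for r, l in zip(rows, labels))
        if cost < best_cost:  # keep the most homogeneous restart
            best_labels, best_cost = labels, cost
    return [l + 1 for l in best_labels]
```

In practice the covariates would usually be put on a common scale before clustering, so that a variable measured in large units does not dominate the distance; that step is omitted here for brevity.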

5 CHOOSING BLOCK SIZE

One of the challenges of using spatial blocks or buffered cross‐validation is choosing the optimal size of blocks or buffers (Trachsel & Telford, 2016). The spatial autocorrelation range in model residuals has been used to define the optimal separation distance between training and testing sets (Roberts et al., 2017; Telford & Birks, 2009; Trachsel & Telford, 2016). This is the range over which residuals are approximately independent, and can be characterized using the empirical variogram, a fundamental geostatistical tool for measuring spatial autocorrelation. The empirical variogram describes the structure of spatial autocorrelation by measuring variability between all pairs of points (O'Sullivan & Unwin, 2010), and the range could be used as block size. It is easy to define residuals for presence–absence data (e.g. Bio, De Becker, De Bie, Huybrechts, & Wassen, 2002; Dunn & Smyth, 1996) but less obvious for presence‐only data (Baddeley, Turner, Møller, & Hazelton, 2005).

Having the residuals, however, implies having fitted the model already, and this ties assessment to the fitted model. We might instead prefer to apply block CV methods prior to model fitting, raising the question of how to define autocorrelation distances and block size. One option suggested for presence–absence data is to fit a variogram to the raw species data (Roberts et al., 2017 and see Bio et al., 2002) and use the resulting distance from the analysis as block size. Alternatively, to support a first choice of block size, prior to any model fitting, package blockCV allows the user to look at the existing autocorrelation in the predictors, as an indication of landscape spatial structure (Figure 4). The function works by automatically fitting variogram models to each continuous raster and finding the effective range of spatial autocorrelation. A number of random points (5,000 by default) are sampled from each input raster and parallel processing is used to speed up the computation. For the sake of simplicity, we used an isotropic (non‐directional) variogram and assumed that the data met the necessary geostatistical criteria, e.g. stationarity (having constant variance). The variogram fitting procedure uses the automap package (Hiemstra, Pebesma, Twenhöfel, & Heuvelink, 2009). Output plots show the spatial autocorrelation ranges of input raster covariates and the spatial block that has been created based on the median of these ranges (Figure 4). After model fitting, the user may want to reassess the appropriateness of the chosen block size by looking at the model residuals.

Figure 4. Output graph from the package blockCV showing: (a) spatial autocorrelation ranges in input covariates, and (b) corresponding spatial blocks (the selected block size is based on median spatial autocorrelation range across all input data).
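
For readers unfamiliar with variograms, the following Python sketch computes an isotropic empirical variogram from sampled points and then takes the median of a set of per‐covariate ranges as a candidate block size. It covers only the empirical step; fitting a variogram model and reading off its effective range, which the package delegates to automap, is not reproduced here. The function names and the example range values are hypothetical.

```python
import math
import statistics


def empirical_variogram(coords, values, bin_width, n_bins):
    """Isotropic empirical variogram.

    For each distance bin, semivariance = sum((z_i - z_j)^2) / (2 * N),
    where N is the number of point pairs whose separation falls in the bin.
    Returns a list of (bin_centre, semivariance) for non-empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            b = int(h // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]))
            for b in range(n_bins) if counts[b] > 0]


# Hypothetical autocorrelation ranges (metres) for three covariates;
# the median is used as the suggested block size.
ranges_m = [30_000, 50_000, 120_000]
block_size = statistics.median(ranges_m)
```

Semivariance typically rises with distance and levels off; the distance at which it levels off is the range that a fitted variogram model estimates. Using the median across covariates keeps a single covariate with a very long or very short range from dictating the block size.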

6 INTERACTIVE VISUALISATION TOOLS

Package blockCV provides two visualisation tools, developed as local web applications using the R package shiny (Chang, Cheng, Allaire, Xie, & McPherson, 2017). Through a user interface, these tools (a) allow for graphical exploration of the generated folds (the foldExplorer tool) and (b) assist the selection of a suitable spatial block size (the rangeExplorer tool). The foldExplorer tool displays a map where folds are overlaid, and provides a summary of the number of records in each fold. This helps the user to assess the adequacy of the distribution of training and testing folds throughout the study area. The tool is available for all three blocking strategies; in the case of the environmental blocks, it visualises the blocks in geographic space. The rangeExplorer tool (Figure 5) allows the user to interactively change the size of spatial blocks, visualise the blocks and assess the impact of block size on the number and arrangement of blocks in the landscape (and optionally on the distribution of species data in those blocks).

Figure 5. The graphical interface for choosing a suitable spatial block size.

7 FINAL REMARKS/CONCLUSION

The R package blockCV offers a suite of tools for creating data folds via blocking for evaluation of species distribution models. It provides three blocking strategies, tools that help users deal with typical nuances in species data, and tools for choosing block size and allocating blocks to folds. Which strategy is best to follow in a particular situation will depend on the purpose of the modelling and the region's environmental structure (Roberts et al., 2017). For instance, both environmental blocks and horizontal spatial blocks could create distinct climatic groups in training and testing datasets, which can be useful for assessing models aiming to predict to new climatic conditions.

We recommend that users become familiar with the literature on block cross‐validation, some of which we have cited in this article, so they can make appropriate choices. Different choices will have different implications on estimates of predictive performance. For instance, buffering is considered useful for enforcing spatial separation between training and testing folds (Le Rest et al., 2014; Pohjankukka, Pahikkala, Nevalainen, & Heikkonen, 2017), but—depending on the relative sizes of the buffer and the region—it may produce training sets across repeats more similar than those produced by other blocking strategies. A disadvantage of similar training sets is that the error estimator tends to have high variance—i.e. it is sensitive to the dataset being used to estimate the error (Hastie et al., 2009, p. 242). Buffering also enforces as many training sets as presence or presence–absence points, so it can be quite computationally expensive. Other blocking strategies and choices of how to allocate blocks to folds will also have flow‐on effects for estimates of performance, so it is important to think through what is most appropriate.

The package provides example data and a vignette with worked examples so users can explore the functions and learn how to use them in a species modelling workflow. The package will be actively maintained, and new features introduced as needs arise. We hope that this package enables modellers to more easily implement a range of evaluation approaches, so the modelling community learns more about the impacts of evaluation set‐up on our understanding of predictive performance of SDMs.

ACKNOWLEDGEMENTS

R.V. is supported by an Australian Government Research Training Program Scholarship and a Rowden White Scholarship; G.G.‐A. by an Australian Research Council (ARC) Discovery Early Career Researcher Award (DE160100904), and J.J.L.‐M. and J.E. by ARC Discovery Project 160101003. J.E. appreciates the support of the ARC's Centre of Excellence for Environmental Decisions (CE11001000104). We also thank Babak Mirbagheri, Nick Golding, and reviewers and editors: Robert Anderson, David Roberts, an anonymous reviewer and Associate Editor, for their helpful suggestions and advice.

    AUTHORS’ CONTRIBUTIONS

    R.V. conceived the idea, wrote the code, drafted all documentation, and led the writing of the manuscript. All authors contributed to the design and testing of the package, and contributed critically to the manuscript drafts.

    DATA ACCESSIBILITY

    No data were included in this article. Version 1.0 of the blockCV R package is archived at https://doi.org/10.5281/zenodo.1453551.
