biodivMapR: An r package for α‐ and β‐diversity mapping using remotely sensed images

The accelerated erosion of biodiversity is a critical environmental challenge. Operational methods for the monitoring of biodiversity taking advantage of remotely sensed data are needed in order to provide information to ecologists and decision‐makers. We present an R package designed to compute a selection of α‐ and β‐diversity indicators from optical imagery, based on spectral variation hypothesis. This package builds upon previous work on biodiversity mapping using airborne imaging spectroscopy, and has been adapted in order to process broader range of data sources, including Sentinel‐2 satellite images. biodivMapR is able to produce α‐diversity maps including Shannon and Simpson indices, as well as β‐diversity maps derived from Bray–Curtis dissimilarity. It is able to process large images efficiently with moderate computational requirements on a personal computer. Additional functions allow computing diversity indicators directly from field plots defined as polygon shapefiles for easy comparison with ground data and validation. The package biodivMapR should contribute to improved standards for biodiversity mapping using remotely sensed data. It should also contribute to the identification of relevant Remotely Sensed enabled Essential Biodiversity Variables.

ground in tropical environments. Féret and Asner (2014) developed a method meeting these requirements and validated its performances for tropical forest diversity mapping based on very high spatial resolution airborne imaging spectroscopy. Sentinel-2 data now provides global coverage of Earth surface with a revisit time of five days and high spatial resolution multispectral imagery. The package biodi-vMapR is an adaptation of the method proposed by Féret and Asner (2014) to Sentinel-2 data, and aims at providing operational tools to ecologists and remote-sensing communities in order to accelerate and ease their capacity to assess existing methodologies based on abundant remotely sensed data. Here biodivMapR provides a limited set of functions to process large datasets, from high dimensionality imaging spectroscopy acquired by airborne sensors or satellites, to high spatial coverage multispectral satellites such as Sentinel-2 images. The method is based on the spectral variation hypothesis (Palmer, Earls, Hoagland, White, & Wohlgemuth, 2002;Rocchini et al., 2010;Rocchini, Chiarucci, & Loiselle, 2004), and takes advantage of high spatial resolution multispectral information to differentiate species or groups of species based on the optical traits corresponding to the reflectance of each pixel (Homolová, Malenovský, Clevers, García-Santos, & Schaepman, 2013;Ustin & Gamon, 2010). However, it may be applicable for other types of data than optical data, as long as the hypothesis of spatial heterogeneity is relevant to estimate specific biodiversity components. The package includes a sample of Sentinel-2 image, and a script illustrating the workflow of the method: the optical image is transformed with principal component analysis (PCA), and a subset of selected components is used to produce spectral species maps based on k-means clustering. α-and β-diversity metrics including Shannon index and Bray-Curtis dissimilarity are then computed from this spectral species map at the spatial resolution of a window size defined by user. Some functions are also provided in order to perform direct validation if ground-truth is available.

| DE SCRIP TI ON OF THE PACK AG E
The workflow of the core processing of the package is presented in Figure 1 and the full pseudo code is proposed in Appendix S1.
Preprocessing functions are available in the package and are introduced in the next section. Front-end user interface is limited to core functions, in order to apply this workflow as straightforwardly as possible, and parameterize the different steps of the process flawlessly, even with limited expertise in image processing. Back-end functions are documented within the code, making it easier for a more experienced person to develop upon biodivMapR. We strongly recommend using bottom of atmosphere reflectance instead of top of atmosphere reflectance or radiance.

| Data checking and conversion
The current version of the package requires ENVI HDR format with band-order set to BIL ('Bands Interleaved by Line') for all raster data.
This decision was initially motivated by computational efficiency, but next versions of the package will accept more raster formats.
Please notice that the current version of the code expects spectral bands to be defined in the field 'wavelength' of the header file associated with the binary image. The raster format (image and mask) is checked internally when running the functions perform _ radiometric _ filtering and perform _ PCA. A message is return, stating if the image format is as expected, or if it should be modified.
If the image format should be modified, the process will automatically stop, in order to let the user convert the image. In this case, the function raster2BIL allows conversion to the expected format.
However, raster2BIL is based on the raster package (Hijmans, 2018) and converts any GDAL format into ENVI HDR format with BIL interleave. It also includes additional options such as conversion F I G U R E 1 Flowchart of the processing performed in biodivMapR to integers to reduce file volume, and definition of the sensor for automated header formatting.

| Radiometric filtering
The main purpose of the package is biodiversity mapping of vegetated areas, with a focus on forest monitoring, and more specifically tropical forests. Hence, preliminary filtering of irrelevant pixels is crucial to get optimal results from the method. Irrelevant pixels include non-vegetated areas, shaded areas, and cloudy areas. The function perform _ radiometric _ filtering performs a preliminary filtering in order to mask such pixels. This filter applies basic radiometric thresholding, and users may need to define a complementary mask as initial input if the filtering functions available in the package are not sufficient or relevant. The thresholding is only relevant when using optical multi or hyperspectral images, and is performed as follows: -Non-vegetated/Dry vegetation pixels: they are identified based on a low thresholding of the Normalized Difference Vegetation Index.
-Shaded areas: these are characterized by low overall reflectance. Pixels characterized by a near infrared (NIR) reflectance inferior to a given threshold are then masked. In temperate and tropical forests, a low threshold of 15%-20% reflectance in the NIR domain (800-1100 nm) usually results in correct detection of shaded areas. However, this does not hold with coniferous forests.
-Cloudy pixels: residuals from atmospheric corrections may lead to increased reflectance in the blue domain. Therefore, a high thresholding can be performed on the blue channel, when available.
The blue, red and NIR spectral bands as well as the corresponding thresholds can be adjusted by user. The updated mask is then saved and used during the following steps of the process. This radiometric filtering should not be performed when the input image does not include blue, red, or NIR spectral bands. The above-mentioned criteria may also be irrelevant if both dry and green vegetation should be monitored. Users are invited to judge on the relevance of these filters on their own situation.

| Computation of the PCA
A series of pre-processing steps, including atmospheric water band removal and continuum removal (CR) of the reflectance data and a PCA are applied on the image, by calling the function perform _ PCA.
Continuum removal is applied on high spatial resolution imaging spectroscopy in order to reduce within-crown variability caused by brightness variations, and effects of changing illumination conditions between flightlines. It follows the hypothesis that these effects influence canopy reflectance homogeneously over the full spectrum, and can be assimilated to the application of a multiplicative factor to the canopy reflectance. This CR is computed by fitting a convex hull over the top of the reflectance spectrum, and defining it as a baseline continuum. This continuum is then removed by dividing it into the reflectance spectrum for each pixel in the image. The importance of CR may not be as strong when the spatial resolution is the same order as or larger than the dimension of tree crowns, as

| Component selection
Principal

| Spectral species mapping
Spectral species mapping is based on k-means clustering of the components selected from the PCA. First, a subset of pixel is extracted from PCA, equivalent to the number of pixels randomly selected when computing PCA. This pool of pixels is then split into nb _ partitions partitions, and k-means clustering is performed on each partition, in order to define nbClusters clusters. The clustering is then applied to the whole PCA image in order to assign a cluster to each pixel based on closest centroid, for each repetition.
Therefore, nb _ partitions cluster maps result from these independent clusterings. Each cluster map is then processed until the computation of α-and β-diversity metrics, and these diversity metrics are finally averaged. The computation of these diversity metrics is explained in the next section. This operation can be considered as analogous to signal averaging, increasing signal to noise ratio by replicating measurements. The function map _ spectral _ species performs this clustering and writes the resulting raster file identifying the cluster corresponding to each pixel and each partition: the raster is named SpectralSpecies, with the same dimensions as the original image and nb _ partitions bands, each band corresponding to a map of centroids for a specific partition.

| α-Diversity mapping
The diversity maps are computed from the SpectralSpecies raster. The α-diversity maps are produced with the function map _ alpha _ div, and correspond to the computation of an α-diversity metric based on the distribution of clusters for a given window size over the whole image. This window size is expressed in pixels (from the original image), and defined by the input variable window _ size. Therefore, the actual surface of the windows used to compute the diversity metrics needs to be adjusted depending on i) the spatial resolution of the image being processed, and ii) the type of ecosystem and spatial scale of the diversity patterns expected. A minimum number of pixels are required to define this distribution: a low number of pixels lead to lower range of variation, and high sensitivity to noise. window _ size also needs to be compatible with the surface usually defined in the field to perform species inventories, and needs to include a relevant number of individuals. Surfaces between 0.5 ha and 4 ha are usually compatible with field observations, and correspond to 50-400 pixels of 10 m resolution products from Sentinel-2. window _ size is then usually defined in order to fall within this range. If data with coarser spatial resolution are processed (LandSat: 30m, MODIS: 500m), the hypotheses behind the method (pixel size inferior or equivalent to individuals) may not be respected.
biodivMapR 1.0 includes two α-diversity indicators: Shannon and Simpson index. More indices will be available in the next versions of the package.

| β-Diversity mapping
β-Diversity maps are derived from the pairwise Bray-Curtis dissimilarity (BC) matrix obtained from the spectral species map. In the initial version of the method, the BC matrix was then processed with an ordination method in order to project the n × n dimensional space of the dissimilarity matrix into an n × 3 dimensional space. However, such procedure rapidly becomes unmanageable when processing large datasets. In order to alleviate computational requirements, β-diversity maps are currently produced in two steps. First, ordination aiming at defining the projected 3D space from a dissimilarity matrix is performed on a limited subset of windows. Then, the dissimilarity between each window in the image and the subset of windows used in ordination is computed.
Finally, the coordinates of each window in the resulting 3D space are defined based on the coordinates of the three most similar windows used during ordination, using inverse weighted distance.
Non-metric multidimensional scaling (NMDS) was used as ordination method used by Féret and Asner (2014). However, NMDS is also particularly computer intensive compared to eigenanalyses such as principal coordinate analysis (PCoA) (Legendre, 1998).
Both NMDS and PCoA are available in biodivMapR. More dissimilarity metrics will be implemented in future versions of the package.

| Computation of α-and β-diversity for a set of plots
Finally, the function diversity _ from _ plots computes diversity indices over each spatial polygon of a shapefile of plots, if available, in order to compare field inventories with diversity indices estimated from remotely-sensed images. The biodiversity indices are then directly computed from the SpectralSpecies raster, in order to keep the exact contour of the plot in the native spatial resolution of the image. The resulting diversity indices (α-diversity indices and BC matrix) are then saved in csv format in a dedicated 'VALIDATION' subdirectory.
F I G U R E 2 Example of Sentinel-2 image subset acquired over tropical forest in Cameroon and field plots with corresponding vegetation type F I G U R E 3 α-Diversity (left) and βdiversity (right) maps produced with biodivMapR from the Sentinel-2 image subset displayed in Figure 2

| E X AMPLE
The package includes a 10 × 10 km Sentinel-2 subset with associated plot network shapefiles. The plots were defined based on visual interpretation of the image, therefore values for α-and β-diversity metrics are not provided. Samples were selected over very diverse forest, moderately diverse forest, monodominant forest, and degraded forest/low vegetation close to the main roads ( Figure 2). The script used to process the data is provided in Appendix S2 of SI.  -PCoA#1 allows differentiating medium and high diversity forests from low diversity forest and low vegetation, but does not discriminate medium and high diversity forests.
-PCoA#2 allows differentiating low diversity forest from medium/ high diversity forests and low vegetation -PCoA#3 allows differentiating medium diversity forests from high diversity forests and low vegetation.

ACK N OWLED G M ENTS
We acknowledge financial support from Agence Nationale de la