A scalable model of vegetation transitions using deep neural networks

Abstract In times of rapid global change, anticipating vegetation changes and assessing their impacts is of key relevance to managers and policy makers. Yet, predicting vegetation dynamics often suffers from an inherent scale mismatch: abundant data and process understanding are available at a fine spatial grain, while the relevance for decision-making increases with spatial extent. We present a novel approach for scaling vegetation dynamics (SVD), using deep learning to predict vegetation transitions. Vegetation is discretized into a large number (10³–10⁶) of potential states based on its structure, composition and functioning. Transition probabilities between states are estimated via a deep neural network (DNN) trained on observed or simulated vegetation transitions in combination with environmental variables. The impact of vegetation transitions on important ecological indicators is quantified by probabilistically linking attributes such as carbon storage and biodiversity to vegetation states. Here, we describe the SVD approach and present results of applying the framework in a meta-modelling context. We trained a DNN using simulations of a process-based forest landscape model for a complex mountain forest landscape under different climate scenarios. Subsequently, we evaluated the ability of SVD to project long-term vegetation dynamics and the resulting changes in forest carbon storage and biodiversity. SVD captured spatial (e.g. elevational gradients) and temporal (e.g. species succession) patterns of vegetation dynamics well, and responded realistically to changing environmental conditions. In addition, we tested the computational efficiency of the approach, highlighting the utility of SVD for country- to continental-scale applications. SVD is, to our knowledge, the first vegetation model harnessing deep neural networks. The approach has high predictive accuracy and is able to generalize well beyond training data.
SVD was designed to run on widely available input data (e.g. vegetation states defined from remote sensing, gridded global climate datasets) and exceeds the computational performance of currently available highly optimized landscape models by three to four orders of magnitude. We conclude that SVD is a promising approach for combining detailed process knowledge on fine‐grained ecosystem processes with the increasingly available big ecological datasets for improved large‐scale projections of vegetation dynamics.


S1: A scalable model of vegetation transitions
SVD is a model for the simulation of vegetation dynamics across large spatial extents. The core component is a deep neural network (DNN) which predicts the transitions between vegetation states for each simulated cell (cell size currently set to 100m). The DNN itself can be trained with data from different sources, for instance from simulation modeling or remote sensing. Section S2 provides more details on the specific setup and training of the network used in this study. For the description in S1 we assume that a fully trained network is already available.
The SVD model is a standalone software that integrates the Deep Learning framework TensorFlow (Abadi et al. 2016) for DNN inference, i.e. the process of applying a trained model to new data. The core model is programmed in C++, which not only provides high flexibility and performance but also allows a full technical integration of the core libraries of TensorFlow into the framework. The model is designed with a particular focus on performance: it makes heavy use of parallel processing to exploit the multi-core architecture of modern CPUs, and offloads DNN inference to a graphics processing unit (GPU) whenever possible. In addition, the model is memory efficient, as the state of a simulated cell is stored compactly as its vegetation state and residence time. Cells that are due for evaluation are collected into batches; whenever a batch is full (or the landscape is fully traversed), the data is sent to the DNN.
Based on the predictions of the DNN, the core model schedules future updates (e.g., one cell might transition to a certain state in four years, while another cell might remain in the same state for the next ten years). After all cells have been processed, the scheduled changes for the current year are applied, and the residence time (R) of each cell is increased. Whenever the DNN model receives a batch of cells, it runs the neural network inference using TensorFlow and returns a probability distribution over the future state S* and the time until the state change (ΔR). The outputs of SVD include wall-to-wall maps of S and R for any given time step, as well as results for ecosystem attributes derived from the VAD for these states. Since the attribute data are probabilistic for each state, several options are available: if the central tendency is of particular interest, the mean or median of the attribute distribution can be assigned for each combination of S×R. Uncertainties can be assessed via the standard deviation or percentile ranges of the underlying distributions in the VAD.
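The predict-schedule-apply cycle described above can be illustrated with a minimal Python sketch. This is not the actual SVD implementation (which is written in C++); all names are hypothetical, and the DNN is mocked as a function returning the two probability distributions (over the future state S* and the time until the state change).

```python
import numpy as np

def step(cells, dnn_predict, year, rng):
    """One simulation year: collect cells due for evaluation into a
    batch, query the DNN, and schedule their future state changes.

    Each cell is a dict with keys 'state' (S), 'residence' (R) and
    'next_change' (the year of the scheduled transition).
    `dnn_predict` maps a batch of cells to two lists of probability
    distributions: over future states S*, and over the number of
    additional years until the state change.
    """
    due = [c for c in cells if c['next_change'] == year]
    if due:
        p_state, p_dt = dnn_predict(due)  # batched DNN inference
        for cell, ps, pt in zip(due, p_state, p_dt):
            # sample the future state and the time until the state
            # change from the predicted probability distributions
            cell['state'] = int(rng.choice(len(ps), p=ps))
            cell['next_change'] = year + 1 + int(rng.choice(len(pt), p=pt))
            cell['residence'] = 0
    # the residence time of all other cells increases by one year
    due_ids = {id(c) for c in due}
    for c in cells:
        if id(c) not in due_ids:
            c['residence'] += 1
```

In the real model the batch is handed to TensorFlow for inference on the GPU; here the mock simply stands in for that call.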

S2: Deep neural network training

Training data
This section describes the training of the DNN used in the application described in the main text. We used the individual based forest landscape and disturbance model iLand (Seidl et al. 2012) to generate training data for the DNN. We applied iLand to Kalkalpen National Park (KANP) in the Austrian Alps. The landscape with a size of 20,850 ha is located in the northern front range of the Alps (N47.47°, E14.22°) and ranges from 385 m to 1,963 m a.s.l.
It encompasses three of the most important forest types of Central Europe, that is, European beech (Fagus sylvatica L.) forests, Norway spruce (Picea abies (L.) Karst.) forests, and mixed forests of Norway spruce, silver fir (Abies alba Mill.), and European beech. More details on the study landscape can be found in Thom et al. (2016) and Thom, Rammer & Seidl (2016). In order to generate training data for the DNN, we simulated forest dynamics at KANP starting from the current landscape composition (2013); each transition was simulated over a ten-year period. The spatial context was calculated for both the local and the intermediate neighborhood (Figure S1), and was defined as the average share of each species in these neighborhoods. In addition, we derived annual values for selected ecosystem attributes for each cell from iLand. The attributes used in this study were live tree carbon (C) and D, the exponent of the Shannon index of alpha-diversity (based on basal area shares of tree species on a given cell).
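The diversity attribute D (the exponent of the Shannon index, also known as true diversity) can be computed from the basal area shares as in the following sketch; the function name is ours, not from the SVD code.

```python
import numpy as np

def shannon_d(basal_area):
    """True diversity D = exp(H), where H is the Shannon index
    computed from basal area shares of tree species on a cell."""
    ba = np.asarray(basal_area, dtype=float)
    shares = ba / ba.sum()
    shares = shares[shares > 0]   # by convention, 0 * log(0) = 0
    h = -np.sum(shares * np.log(shares))
    return float(np.exp(h))
```

For a cell with equal basal area of three species, D = 3 ("three effective species"); for a monoculture, D = 1.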
Training data for the DNN was generated from the raw simulation output of iLand (Figure S1). The input features include:

SSC: Two indicators for static site conditions: fertility rating (plant available nitrogen in kg ha⁻¹ yr⁻¹) and soil depth (m).

CLIM10: Monthly mean values for temperature and precipitation for 10 years.
Network structure
We used TensorFlow (Abadi et al. 2016) and the top-level Python library Keras (https://keras.io/) for defining the DNN architecture as well as for network training. The architecture of the DNN was a feed-forward neural network with 1.33 million trainable parameters that integrated concepts from natural language processing (Figure S2). The DNN merged different types of inputs (see Table S2). Following the notion that the response to a given climate forcing remains consistent over time, we used a "TimeDistributed" layer that applies the same weights for each year in the climate input data. To decrease the generalization error of the network, we included Dropout layers.
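The weight sharing behind the "TimeDistributed" layer can be illustrated without Keras: the same dense-layer weights are applied to the climate vector of every year, so identical climate years necessarily produce identical activations. This is a NumPy sketch of the concept (with the ELU activation mentioned below), not the actual Keras layer; the shapes are illustrative assumptions.

```python
import numpy as np

def elu(z):
    """Exponential Linear Unit: z for z > 0, exp(z) - 1 otherwise."""
    return np.where(z > 0, z, np.exp(z) - 1.0)

def time_distributed_dense(x, w, b):
    """Apply the same dense-layer weights to every time step.

    x: climate input of shape (years, features), e.g. (10, 24) for
       10 years of monthly temperature and precipitation means.
    w: shared weight matrix (features, units); b: bias (units,).
    The identical w and b are used for each year, mirroring the
    weight sharing of Keras' TimeDistributed(Dense(units)).
    """
    return elu(x @ w + b)  # broadcasting applies w, b to each year
```

Because the weights are shared, the layer has far fewer parameters than ten independent dense layers, and it encodes the assumption that the climate response does not depend on the position of a year in the sequence.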

Figure S2. Structure of the trained DNN. FC=fully connected layer. Numbers in parentheses indicate the number of neurons in each layer. State and climate inputs are processed separately and merged with all other inputs into a single layer (Concatenate layer). From this layer two separate branches for ΔR and S* lead to the two final Softmax classification layers.
We used categorical cross-entropy as the loss function for both output layers and calculated the total loss as a weighted sum (state: 0.66, time: 0.33). After evaluating different activation functions, we selected the Exponential Linear Unit (Clevert, Unterthiner & Hochreiter 2015) which showed slightly better performance than rectified linear units or self-normalizing linear units (Klambauer et al. 2017).
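The loss computation described above can be written out explicitly. This is a NumPy sketch of categorical cross-entropy and the weighted total loss (state: 0.66, time: 0.33), not the TensorFlow implementation used for training.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy; y_true holds one-hot labels,
    y_pred the predicted class probabilities (rows sum to 1)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return float(-np.mean(np.sum(y * np.log(p), axis=-1)))

def total_loss(loss_state, loss_time, w_state=0.66, w_time=0.33):
    """Weighted sum of the two output losses, as used for the
    state (S*) and time (deltaR) classification heads."""
    return w_state * loss_state + w_time * loss_time
```

For example, a prediction of 0.5 probability on the correct class contributes -log(0.5) ≈ 0.693 to the corresponding head's loss.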
We used the ADAM optimizer (Kingma & Ba 2014) and a simple scheme that reduced the learning rate by a factor of 0.5 after three consecutive epochs without progress, from 0.001 down to 0.00001. The training of the final network took approximately three hours on a single workstation (Intel i5-6600 CPU, Nvidia GTX-1070 GPU). The final network structure and parameters were stored for later use in SVD applications.
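The learning-rate scheme corresponds to the common "reduce on plateau" heuristic (in Keras, the ReduceLROnPlateau callback). A minimal sketch of its logic, with our own function name:

```python
def reduce_on_plateau(losses, lr0=0.001, factor=0.5, patience=3, lr_min=1e-5):
    """Halve the learning rate after `patience` consecutive epochs
    without improvement of the loss, bounded below by lr_min.
    Returns the learning rate that was in effect in each epoch."""
    lr, best, wait, used = lr0, float('inf'), 0, []
    for loss in losses:
        used.append(lr)
        if loss < best:                   # progress: reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:          # plateau: reduce the rate
                lr = max(lr * factor, lr_min)
                wait = 0
    return used
```

With the parameters above, repeated plateaus step the rate through 0.001, 0.0005, 0.00025, ... until the floor of 0.00001 is reached.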

Results of the training and experiments
The DNN was able to predict vegetation transitions accurately over a wide range of climatic (e.g., mean annual temperatures from 3.5 to 13 °C) and edaphic conditions. The achieved classification accuracy (categorical cross-entropy) was 0.86 for both the future state and the time until state transition. More insightful than the raw accuracy is the top-K accuracy: this is the fraction of examples for which the network predicted either the correct label ("Correct") or for which the correct label was within the top K predicted classes.
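Top-K accuracy can be computed directly from the predicted class probabilities; the following is a generic NumPy sketch (not code from SVD).

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """Fraction of examples whose true label is among the k classes
    with the highest predicted probability.

    probs:  array of shape (n_examples, n_classes)
    labels: iterable of true class indices, length n_examples
    """
    probs = np.asarray(probs)
    topk = np.argsort(-probs, axis=1)[:, :k]   # k most probable classes
    hits = [lab in row for lab, row in zip(labels, topk)]
    return float(np.mean(hits))
```

Top-1 accuracy equals the ordinary classification accuracy; as K grows, top-K accuracy increases monotonically towards 1.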

Effect of spatial context
In the underlying process-based model iLand, the simulated transitions depend not only on the vegetation state and the environmental conditions experienced on a given cell, but are also influenced by neighboring cells, e.g., due to an influx of seeds. SVD incorporates spatial context information: for example, the DNN might learn that a transition to a state with a higher oak share is more likely if oak is already present in neighboring cells. We tested the relevance of spatial context by training DNN variants with and without spatial context information, and compared their performance regarding prediction accuracy metrics. The version with spatial context used all available information as described in the previous section (i.e., the species distribution in the local and intermediate neighborhood (Figure S1) as well as the distance to the nearest seed source outside of the boundary of the simulated area), while the version without spatial context lacked both types of information. Figure S4 shows that the DNN was able to extract meaningful information from the spatial context, and that including this data improved the DNN predictions.

Figure S4. DNN training performance for models with and without spatial context information. A: total loss on the validation data set (lower is better), B: accuracy relative to the validation data set for the classification of the future state (higher is better). Classification performance improved when spatial context information was available during training.
We also analyzed whether disregarding spatial context information translates into different vegetation patterns in the dynamic simulations with SVD. Figure S5 indicates that while the broad spatial patterns (e.g., areas dominated by beech or Norway spruce) persisted in both variants, the results without spatial context showed much higher local variation (noise).
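The neighborhood features themselves (average species shares around a focal cell) are straightforward to compute. The following NumPy sketch, under the simplifying assumption of one dominant species id per cell and a square window, illustrates the idea; the actual SVD features use the local and intermediate neighborhoods of Figure S1.

```python
import numpy as np

def neighborhood_shares(species_grid, row, col, radius, n_species):
    """Share of each species among the cells within a square window
    of `radius` cells around (row, col), excluding the focal cell.
    `species_grid` holds one integer species id per cell."""
    g = np.asarray(species_grid)
    r0, r1 = max(0, row - radius), min(g.shape[0], row + radius + 1)
    c0, c1 = max(0, col - radius), min(g.shape[1], col + radius + 1)
    window = g[r0:r1, c0:c1].ravel().tolist()
    # remove one occurrence of the focal cell's species id
    # (the resulting counts are the same regardless of which one)
    window.remove(int(g[row, col]))
    counts = np.bincount(window, minlength=n_species)
    return counts / counts.sum()
```

A DNN receiving such shares as input can, for instance, associate a high neighborhood share of a species with a higher probability of transitions towards states containing that species.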

Simulating vegetation transitions
The following figures compare the simulated species composition in SVD with the results of the process-based model iLand for all four climate scenarios considered.