Volume 9, Issue 8 p. 1810-1821
IMPROVING BIODIVERSITY MONITORING USING SATELLITE REMOTE SENSING
Open Access

Integration of satellite remote sensing data in ecosystem modelling at local scales: Practices and trends

Damiano Pasetto (Corresponding Author)
Laboratory of Ecohydrology, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

Salvador Arenas-Castro
CIBIO/InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal

Javier Bustamante
Estación Biológica de Doñana, CSIC, Sevilla, Spain

Renato Casagrandi
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy

Nektarios Chrysoulakis
Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas, Heraklion, Greece

Anna F. Cord
Department of Computational Landscape Ecology, UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany

Andreas Dittrich
Department of Computational Landscape Ecology, UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany

Cristina Domingo-Marimon
Grumets Research Group, CREAF, Universitat Autònoma de Barcelona, Bellaterra, Spain

Ghada El Serafy
Deltares, Delft, The Netherlands
Department of Applied Mathematics, Delft University of Technology, Delft, The Netherlands

Arnon Karnieli
Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Beersheba, Israel

Georgios A. Kordelas
Information Technologies Institute, Centre for Research and Technology Hellas, Thermi, Greece

Ioannis Manakos
Information Technologies Institute, Centre for Research and Technology Hellas, Thermi, Greece

Lorenzo Mari
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy

Antonio Monteiro
CIBIO/InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal

Elisa Palazzi
Institute of Atmospheric Sciences and Climate, National Research Council, Turin, Italy

Dimitris Poursanidis
Institute of Applied and Computational Mathematics, Foundation for Research and Technology Hellas, Heraklion, Greece

Andrea Rinaldo
Laboratory of Ecohydrology, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Department of Civil Environmental and Architectural Engineering, University of Padova, Padova, Italy

Silvia Terzago
Institute of Atmospheric Sciences and Climate, National Research Council, Turin, Italy

Alex Ziemba
Deltares, Delft, The Netherlands
Department of Applied Mathematics, Delft University of Technology, Delft, The Netherlands

Guy Ziv
School of Geography, Faculty of Environment, University of Leeds, Leeds, UK
First published: 06 August 2018

Abstract

  1. Spatiotemporal ecological modelling of terrestrial ecosystems relies on climatological and biophysical Earth observations. Owing to their increasing availability, global coverage, frequent acquisition and high spatial resolution, satellite remote sensing (SRS) products are frequently integrated with in situ data in the development of ecosystem models (EMs) that quantify the interactions between the vegetation component and the hydrological, energy and nutrient cycles. This review highlights the main advances achieved in the last decade in combining SRS data with EMs, with particular attention to the challenges modellers face in applications at local scales (e.g. small watersheds).
  2. We critically review the literature on progress made towards the integration of SRS data into terrestrial EMs: (1) as input to define model drivers; (2) as a reference to validate model results; and (3) as a tool to sequentially update the state variables, and to quantify and reduce model uncertainty.
  3. The number of applications reported in the literature shows that EMs may profit greatly from the inclusion of spatial parameters and forcings provided by vegetation- and climate-related SRS products. Limiting factors for the application of such models at local scales are: (1) mismatch between the resolution of SRS products and the model grid; (2) unavailability of specific products in free and public online repositories; (3) temporal gaps in SRS data; and (4) quantification of model and measurement uncertainties. This review provides examples of possible solutions adopted in recent literature, with particular reference to the spatiotemporal scales of analysis and data accuracy. We propose that analysis methods such as stochastic downscaling techniques and multi-sensor/multi-platform fusion approaches are necessary to improve the quality of SRS data for local applications. Moreover, we suggest coupling models with data assimilation techniques to improve their forecast abilities.
  4. This review encourages the use of SRS data in EMs for local applications, and underlines the need for closer collaboration between EM developers and remote sensing scientists. With more satellite missions upcoming, especially the Sentinel platforms, concerted efforts to further integrate SRS into modelling are in great demand, and these types of applications will certainly proliferate.

1 INTRODUCTION

Anthropogenic and climate change pressures constitute serious threats to the integrity of the delicate ecosystems of several protected areas, such as National Parks, UNESCO World Heritage sites and Natura 2000 sites (Marris, 2011). Ecosystem models (EMs) help researchers understand the dynamics of these terrestrial environments, and improve monitoring capabilities by filling spatiotemporal data gaps and predicting the short- and long-term impacts of different management strategies. Mechanistic ecohydrological models that couple hydrological and vegetation processes (Chen, Wang, Ma, & Liu, 2015) can estimate features such as forest productivity and growth (Huber et al., 2013), or evaluate system water stress under different climate scenarios (Bhattarai, Wagle, Gowda, & Kakani, 2017). These EMs differ mainly in the complexity of the vegetation component and its interaction with the carbon, nutrient and water cycles. Research in this field has mainly focused on improving the physical description of physiological processes (Chen et al., 2015; Fatichi, Ivanov, & Caporali, 2012) to accurately quantify vegetation photosynthesis and growth. However, this modelling effort requires the measurement or estimation of several biophysical parameters (e.g. leaf area index [LAI], canopy height) and input fluxes (e.g. precipitation, land surface temperature, irradiation) that are heterogeneous at the landscape scale and evolve in time (Fatichi, Pappas, & Ivanov, 2016; Pappas, Fatichi, & Burlando, 2016).

Monitoring systems based on in situ, airborne and unmanned aerial vehicle (UAV) measurements may not always be able to provide the large amount of data required by EMs. Despite their very high spatial resolution (a few centimetres), UAV data suffer from low spectral resolution, the limited flight endurance of drones and, in several countries, strict laws regulating UAV use for research purposes (Paneque-Gálvez, McCall, Napoletano, Wich, & Koh, 2014), while in situ and airborne measurements are time-, cost- and labour-intensive. The constantly expanding list of ready-to-use satellite remote sensing (SRS) products can complement and, if necessary, replace these measurements (Lawley, Lewis, Clarke, & Ostendorf, 2016). Their main advantages are free availability of archives spanning several decades (e.g. since 1972 for Landsat and 2000 for MODIS), near-regular revisits (although usable acquisitions depend on atmospheric conditions) and high spectral resolution.

In a seminal work, Plummer (2000) highlighted the crucial role that SRS was already playing two decades ago for the improvement of terrestrial models, defining four strategies for linking SRS data and EMs:
  1. using SRS data for the estimation of EM forcings;
  2. using SRS data for the calibration and validation of EMs;
  3. using SRS data for updating the state variables of EMs;
  4. using EMs to interpret SRS data.

As foreseen in Plummer's conclusions, research efforts in the past decade have been fostered by closer collaboration between ecological modellers and the remote sensing community (e.g. the ESA Climate Change Initiative), which has led to improved SRS products together with estimates of their uncertainties (Merchant et al., 2017). Nowadays, a number of online repositories allow direct access to images from different satellite missions and sensors in near real-time, and to higher-level global SRS products certified through quality assurance (QA) tests and standards. While SRS data have been extensively used to update the state of global terrestrial models, e.g. hydrological (De Lannoy & Reichle, 2016) and carbon cycle (Scholze, Buchwitz, Dorigo, Guanter, & Quegan, 2017) models, the potential of combining SRS data and EMs for near real-time monitoring at local scales, such as those of small watersheds and protected areas (areas of up to a few hundred km²), is yet to be fully exploited. This step is fundamental to achieving the objectives of projects such as the ECOPOTENTIAL H2020 project (http://www.ecopotential-project.eu), which brought together ecologists, park managers and SRS experts with the goal of monitoring several European protected areas through the integration of SRS products into EMs.

Given the marked advances since Plummer (2000), we present updates on state-of-the-art approaches and discuss some remaining difficulties that modellers may face when integrating SRS data into EMs. As high spatial and temporal resolutions of model drivers and parameters are particularly important to describe vegetation dynamics at local scales, Section 2 highlights the strategies adopted in the literature to properly downscale high-level SRS products (i.e. level 2 and level 3 products, Section 2.1) or to directly obtain the quantities of interest by processing observations at the sensor level (level 1 products, Sections 2.2 and 2.3). Best practices for comparing EM outputs with the associated SRS products are described in Section 3. Data assimilation (DA) techniques that are proving successful in enhancing the forecast abilities of EMs through SRS data are described in Section 4, together with techniques for evaluating model and data uncertainties. Finally, the concluding section summarizes the paper and suggests strategies for further advancements.

2 PREPARING SRS DATA FOR USE IN LOCAL TERRESTRIAL EMS

Satellite remote sensing products are useful for characterizing spatial parameters and forcings related to the hydrological component (precipitation, evapotranspiration [ET], soil moisture, snow cover, etc.), the energy balance (land surface temperature [LST], irradiation, albedo) and vegetation properties (land use, vegetation cover, vegetation height/biomass, fraction of absorbed photosynthetically active radiation [FAPAR], LAI; Amitrano et al., 2014). Vegetation indices (VIs) have frequently been used as indicators of vegetation cover type and photosynthetic activity, and for the estimation of phenological parameters. van der Kwast et al. (2009), for example, used the normalized difference vegetation index (NDVI) derived from ASTER to estimate the input parameters of the Surface Energy Balance System (SEBS), while Turner et al. (2006) used land cover and seasonal LAI derived from the Landsat Enhanced Thematic Mapper Plus (ETM+) as input to the Biome-BGC carbon cycle model to estimate landscape gross primary production (GPP) and net primary production (NPP; see Table 1 for further examples).
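
As a concrete illustration of the index mentioned above, NDVI is computed per pixel from red and near-infrared (NIR) surface reflectance as NDVI = (NIR - Red)/(NIR + Red). The minimal Python/NumPy sketch below assumes that the two bands are already atmospherically corrected reflectance arrays on the same grid; the function name `ndvi` is our own illustrative choice.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red).

    nir, red: surface reflectance arrays of equal shape (assumed already
    atmospherically corrected and co-registered). Pixels where the
    denominator is not positive (e.g. no-data) are returned as NaN.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom > 0)
    return out

# A dense-canopy pixel, a sparse-vegetation pixel and a no-data pixel
values = ndvi([0.40, 0.25, 0.0], [0.05, 0.15, 0.0])
print(values)
```

Dense green vegetation reflects strongly in the NIR and absorbs red light, so it yields values close to 1, whereas bare soil and sparse cover yield values near 0.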

Table 1. A non-exhaustive list of examples of ecosystem model (EM) applications at local scales using satellite remote sensing (SRS) products to describe model parameters and/or forcings. Detailed model descriptions and references are available in Supplementary Information 3.
| Study | EM | EM description | Modelled area (km²) | Cell size/resolution (m) | Time step | SRS products used |
|---|---|---|---|---|---|---|
| Turner et al. (2006) | Biome-BGC | Process-based model derived from Forest-BGC considering the physical and biological processes governing energy, water, carbon and nitrogen fluxes among the vegetation and soil layers | 25 | 25 | 1 day | Land cover, LAI (estimated from Landsat ETM+); NPP/GPP (from MODIS, for validation) |
| van der Kwast et al. (2009) | SEBS | Single-source model for the estimation of turbulent heat fluxes and ET | 5 | 90 | | Albedo, vegetation cover (estimated from ASTER) |
| Govind et al. (2009) | BEPS-TerrainLab | Couples the hydrological model TerrainLab with the biophysical simulator BEPS; estimates mass and energy fluxes among soil, vegetation and atmosphere, together with plant growth and the C cycle | ~40 | 25 | 1 day | Land cover (from ESOD); LAI (estimated from NDVI of Landsat TM) |
| Strauch and Volk (2013) | SWAT | Semi-distributed watershed model for simulations of daily discharge and nutrient, pesticide and sediment loads; spatial heterogeneities are considered by dividing the domain into sub-basins | 242 | | 1 day | ET and LAI (from MODIS) |
| Hwang et al. (2008) | RHESSys | Ecohydrological modelling framework designed to simulate carbon, water and nutrient fluxes | 15.8 | 30 | 1 day | LAI (estimated from NDVI of Landsat ETM+) |
| Lopes, Aranha, Walford, O'Brien, and Lucas (2014) | Forest-BGC | Process-based ecophysiological model for NPP estimations based on canopy interception, evaporation, photosynthesis, growth, carbon allocation, litter fall, nitrogen mineralization and mortality | 60 and 24 | 30 | 1 day | LAI (estimated from EVI and NDVI of Landsat ETM+) |

The spatial resolution, temporal frequency and accuracy of “off-the-shelf” high-level SRS products (Table S1) frequently do not match the requirements of local applications, where grid cells tens to hundreds of metres in size are used. Moreover, the assumptions and ancillary data used to compute these high-level products (often “hidden” in technical documentation, in chains of scientific articles or, for more recent changes, on providers' websites) might not be consistent with the assumptions or other inputs of local-scale EMs. For example, the MODIS LAI product relies on a specific land cover map with eight biome classes (Yan et al., 2016), which may differ from the one used in the EM. We present three typical strategies that modellers currently adopt, often in combination, to address these problems:
  1. downscaling low-resolution SRS products, which is particularly useful for climatic products having resolutions of several kilometres (Table S1);
  2. deriving high-resolution products from SRS level 1 data, which is relevant to obtain accurate vegetation-related parameters for the domain under study (Table S2);
  3. applying multi-sensor/multi-platform fusion techniques, which can fill temporal gaps in the estimated vegetation parameters.

2.1 Downscaling methods for climatic products

Downscaling helps modellers overcome the scale mismatch between high-level (levels 2, 3) SRS products and the desired model resolution. While applicable to many types of SRS products, downscaling methods are frequently used for climatic variables (e.g. precipitation and LST), which are among the main driving factors of terrestrial EMs describing ecosystem seasonality and long-term trends.

2.1.1 Precipitation

Satellite remote sensing-based precipitation measurements (Table S1) constitute a valid alternative to datasets obtained through spatial interpolation of in situ measurements. Such interpolated datasets achieve regional (DAYMET, PRISM over North America) or global (WorldClim) coverage with resolutions of up to 1 km, but are known to be unreliable in areas where the density of ground stations is low and uneven (Mourtzinis, Rattalino Edreira, Conley, & Grassini, 2017). Moreover, from the temporal perspective, the monthly resolution of the provided climatologies is too coarse to effectively drive the many EMs that rely on daily or sub-daily forcing data (Table 1). The main problems of SRS-based rainfall products concern spatial coverage, because different satellites cover different ranges of latitudes (e.g. GPCP, CMAP or GPM), and their generally coarse spatial resolution, from 2.5° (about 280 km at the equator) to 0.1° (about 11 km). A recent comparison of precipitation datasets derived from gauges, models and SRS data also highlighted the variability among products, in particular in SRS-derived seasonal precipitation and in the distribution of extreme events (Sun et al., 2018).

Statistical downscaling has been used to obtain rainfall data at about 1 km resolution from products such as TRMM using classic geostatistical analysis (Chen, Liu, Liu, & Li, 2014; Shi & Song, 2015). Covariates available at higher spatial resolution (e.g. VIs, elevation and other topographic parameters, and in situ weather data) are used to explain part of the large spatiotemporal variability of the precipitation field. Stochastic methods are particularly suitable for generating synthetic precipitation patterns whose statistical properties are consistent with those of observed precipitation. One example is the Rainfall Filtered Autoregressive Model (RainFARM), which extrapolates the Fourier power spectrum of coarse-scale precipitation to small scales (D'Onofrio, Palazzi, von Hardenberg, Provenzale, & Calmanti, 2014; Terzago, Palazzi, & von Hardenberg, 2018).
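
To illustrate the stochastic idea, the Python/NumPy sketch below generates a fine-scale Gaussian field with a prescribed power-law Fourier spectrum, applies an exponential (lognormal) transformation, and rescales the result so that every coarse cell conserves its original precipitation. It is a simplified, spatial-only sketch in the spirit of RainFARM, not the RainFARM implementation. Several choices are our own assumptions: the spectral slope `alpha` is supplied by the user (RainFARM estimates it from the coarse data), the coarse grid is square, and all function names are hypothetical.

```python
import numpy as np

def gaussian_random_field(n, alpha, rng):
    """Zero-mean, unit-variance 2-D Gaussian field on an n x n grid whose
    isotropic power spectrum decays as |k|**(-alpha)."""
    k1d = np.fft.fftfreq(n)
    k = np.hypot(k1d[:, None], k1d[None, :])
    k[0, 0] = np.inf                        # zero amplitude for the k = 0 (mean) mode
    amplitude = k ** (-alpha / 2.0)         # power = amplitude**2 ~ |k|**(-alpha)
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    field = np.real(np.fft.ifft2(amplitude * noise))
    return (field - field.mean()) / field.std()

def upsample(a, factor):
    """Repeat each coarse cell over a factor x factor block of fine cells."""
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

def stochastic_downscale(coarse, factor, alpha=1.7, seed=None):
    """Disaggregate a square coarse precipitation field by `factor` in each
    direction while conserving the mean of every coarse cell.
    alpha=1.7 is an arbitrary illustrative default, not a fitted value."""
    coarse = np.asarray(coarse, dtype=float)
    n_coarse = coarse.shape[0]
    assert coarse.shape == (n_coarse, n_coarse), "sketch assumes a square grid"
    rng = np.random.default_rng(seed)
    g = gaussian_random_field(n_coarse * factor, alpha, rng)
    r = np.exp(g)                           # positive, lognormal small-scale fluctuations
    r_block = r.reshape(n_coarse, factor, n_coarse, factor).mean(axis=(1, 3))
    return upsample(coarse, factor) * r / upsample(r_block, factor)

coarse_rain = np.array([[2.0, 0.0], [5.0, 1.0]])    # mm/day, illustrative values only
fine_rain = stochastic_downscale(coarse_rain, factor=16, seed=42)
print(fine_rain.shape)
print(fine_rain.reshape(2, 16, 2, 16).mean(axis=(1, 3)))
```

Because each seed yields a different but statistically consistent realization, an ensemble of seeds can be used to propagate small-scale rainfall uncertainty through an EM.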

2.1.2 Land surface temperature

Satellite remote sensing measurements provide daily global coverage of LST at resolutions ranging from 1 km (Sentinel-3) to 56 km (AMSR-E). Higher-resolution data (30 m) are provided at 16-day intervals by Landsat 8 (Table S1). Spatial downscaling of LST (also known as sharpening or disaggregation) relies on information about soil type, emissivity and vegetation cover, frequently using NDVI or LAI as a proxy for the latter. Downscaling has been applied to estimate LST at higher spatial resolution (e.g. 250 m) from MODIS and AVHRR data while maintaining their original temporal resolution (Liu & Pu, 2008; Metz, Rocchini, & Neteler, 2014). Fusion approaches have been tested to retain the daily temporal resolution of MODIS while downscaling it to Landsat (30 m) or ASTER (90 m) resolution (Weng, Fu, & Gao, 2014; Yang et al., 2016). It is worth stressing that LST may differ by several degrees from the near-surface air temperature measured by surface stations (Good, 2016). Approaches to estimate near-surface air temperature from satellite observations are still under development, and daily global mapping is a goal of the EUSTACE H2020 project (Brugnara, Auchmann, & Brönnimann, 2017).
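
The regression-based sharpening idea can be sketched as follows, assuming (as a simplification) a single linear LST-NDVI relationship fitted across coarse pixels and applied to fine-scale NDVI, with the coarse-scale regression residual added back so that every coarse pixel keeps its observed mean. Operational methods use more elaborate predictors (e.g. fractional vegetation cover, emissivity); the function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def block_mean(a, factor):
    """Aggregate a fine grid to a coarse grid by averaging factor x factor blocks."""
    ny, nx = a.shape
    return a.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def upsample(a, factor):
    """Repeat each coarse value over its factor x factor block of fine cells."""
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

def sharpen_lst(lst_coarse, ndvi_fine, factor):
    """Linear LST-NDVI sharpening with coarse-scale residual correction."""
    ndvi_coarse = block_mean(ndvi_fine, factor)
    # Fit LST = a + b * NDVI across the coarse pixels
    b, a = np.polyfit(ndvi_coarse.ravel(), lst_coarse.ravel(), 1)
    residual = lst_coarse - (a + b * ndvi_coarse)
    # Apply the relationship at fine scale, then restore each coarse mean
    return a + b * ndvi_fine + upsample(residual, factor)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
ndvi_fine = 0.2 + 0.5 * x[None, :] + 0.05 * rng.standard_normal((40, 40))  # synthetic
lst_true = 320.0 - 20.0 * ndvi_fine           # synthetic "true" fine-scale LST (K)
lst_coarse = block_mean(lst_true, 10)         # what a coarse thermal sensor would see
lst_sharp = sharpen_lst(lst_coarse, ndvi_fine, 10)
```

In this idealized case the true LST is exactly linear in NDVI, so the sharpened field recovers it. With real data, the residual term carries whatever the NDVI proxy cannot explain (e.g. soil moisture or topographic effects).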

It is important to remember that SRS climate products might have low accuracy in topographically complex areas, mainly because of their low resolution. To improve accuracy for local applications, downscaling can be combined with bias correction techniques, taking into account the ancillary data used to obtain the original SRS products (e.g. Maggioni, Meyers, & Robinson, 2016).
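
Empirical quantile mapping is one widely used bias correction technique. We name it here only as an example; the text above does not prescribe a specific method. The sketch below assumes a calibration period with co-located satellite and reference (e.g. gauge) values, and maps each new satellite value to the reference value at the same empirical quantile. Function and variable names are illustrative.

```python
import numpy as np

def quantile_mapping(sat_train, ref_train, sat_new, n_quantiles=101):
    """Empirical quantile mapping bias correction.

    sat_train, ref_train: satellite and reference samples from a common
    calibration period; sat_new: satellite values to correct.
    Values outside the calibration range are clamped to the extreme
    reference quantiles (the behaviour of np.interp).
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    sat_q = np.quantile(sat_train, probs)
    ref_q = np.quantile(ref_train, probs)
    return np.interp(sat_new, sat_q, ref_q)

rng = np.random.default_rng(1)
gauge = rng.gamma(shape=2.0, scale=3.0, size=5000)   # synthetic reference daily rain (mm)
satellite = 1.5 * gauge + 1.0                        # synthetic, systematically biased estimate
corrected = quantile_mapping(satellite, gauge, np.array([4.0, 8.5, 16.0]))
print(corrected)
```

For precipitation with many dry days, ties in the lower quantiles require extra handling (e.g. treating zero rainfall separately), which this sketch omits.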

2.2 Deriving high-resolution SRS products

Assessing the key parameters that describe photosynthesis, ET and NPP is crucial to compute energy, water and carbon fluxes. Several global SRS products related to vegetation biogeochemistry are freely available (Table S1). Although these products have the clear advantage of having passed external QA, they differ in terms of algorithms, ancillary data and product uncertainty, and might not be consistent with the particular requirements and assumptions of EMs, especially for local applications. Processing satellite images with empirical, semi-empirical or physically based approaches may be required to obtain consistent products at high spatiotemporal resolution. We underline, however, that deep expertise in remote sensing is needed to correctly exploit the available datasets for the desired accuracy and application, in particular when assessing the uncertainty of the retrieved variables. The indirect nature of SRS measurements makes them potentially hard to interpret and to relate to physically measurable quantities (Disney, 2016).

2.2.1 Empirical approaches

Empirical approaches establish mathematical relationships between SRS data and the biophysical variables of interest via calibration on in situ data (Chuvieco & Huete, 2009). Various empirical approaches to identify leaf and canopy properties from SRS data have been proposed in the literature (Tables S2 and S3), including the estimation of biophysical parameters such as water, nitrogen and chlorophyll content from VIs or, more recently, from solar-induced chlorophyll fluorescence. Empirical models require extensive field measurements for calibration, and their results depend on site conditions, time period and sensor, which limits their applicability to other sites (Croft, Chen, & Zhang, 2014). However, their low computational cost makes them attractive. For instance, recent applications to Sentinel-2 data have provided LAI and chlorophyll content at high resolution (10 m) with a 5-day revisit period (Clevers, Kooistra, & van den Brande, 2017). Empirical approaches are also implemented within cloud computing platforms such as Google Earth Engine, which hosts both raw imagery archives (including Landsat and Sentinel-2) and machine-learning algorithms (e.g. robust linear regression, random forest, support-vector machines).
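
A minimal sketch of the empirical route follows. It assumes an exponential VI-LAI relationship, which is one common functional form; the choice is ours and is not prescribed by the studies cited. The relationship is calibrated on in situ plot data and then applied pixel by pixel. The plot values below are synthetic, and, as noted above, the coefficients would need recalibration for other sites, periods or sensors.

```python
import numpy as np

def fit_vi_lai(vi_plots, lai_plots):
    """Calibrate LAI = a * exp(b * VI) by least squares on log(LAI)."""
    b, log_a = np.polyfit(np.asarray(vi_plots, dtype=float), np.log(lai_plots), 1)
    return float(np.exp(log_a)), float(b)

def predict_lai(vi_map, a, b):
    """Apply the calibrated relationship to a VI raster."""
    return a * np.exp(b * np.asarray(vi_map, dtype=float))

def rmse(pred, obs):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))

# Synthetic field campaign: VI extracted at plot locations, LAI measured in situ
rng = np.random.default_rng(3)
vi_plots = rng.uniform(0.2, 0.8, size=30)
lai_plots = 0.25 * np.exp(3.5 * vi_plots) * rng.lognormal(0.0, 0.05, size=30)
a, b = fit_vi_lai(vi_plots, lai_plots)
lai_map = predict_lai(np.array([[0.3, 0.5], [0.7, 0.75]]), a, b)
print(a, b, rmse(predict_lai(vi_plots, a, b), lai_plots))
```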

2.2.2 Physically based models

Physically based models describe surface reflectance through the physical laws of radiative transfer inside the canopy and its interaction with the soil surface, and offer an explicit connection between the biophysical variables of vegetation and soil and canopy reflectance (Banskota et al., 2015; Houborg, McCabe, Cescatti, et al., 2015). Physically based models have strong advantages over empirical approaches: they allow causal inference and prediction, they can be adapted to a wide range of land cover situations, time periods and sensor configurations, and they do not require the simultaneous acquisition of in situ and SRS data. Biophysical variables are retrieved from physically based models through inversion techniques (Chuvieco & Huete, 2009), including quasi-Newton algorithms, look-up tables and artificial neural networks (e.g. Sehgal, Chakraborty, & Sahoo, 2016). Beyond the effort to adequately describe light interaction processes, one of the main challenges of radiative transfer models (RTMs) remains the development of correction factors that account for the uncertainty in the radiative response of 3D-heterogeneous vegetation. The RAMI4PILPS comparison experiments (Widlowski et al., 2011), which evaluate the consistency of several simple RTMs used within EMs, show that as the structure of plant canopies becomes more complex, model-to-model agreement generally deteriorates and model-to-reference deviation increases.
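
Look-up-table inversion, one of the techniques listed above, can be sketched with a deliberately simple two-band canopy model. The forward model below is a hypothetical toy that mixes soil and dense-canopy reflectance through a Beer-Lambert gap fraction. It is not a published RTM such as those evaluated in RAMI4PILPS, and all reflectance values and the extinction coefficient `k` are assumptions.

```python
import numpy as np

# Hypothetical reflectances (red, NIR); illustrative assumptions only
SOIL = np.array([0.12, 0.20])     # bare soil
DENSE = np.array([0.03, 0.45])    # optically infinite (very dense) canopy

def toy_canopy_reflectance(lai, k=0.5):
    """Toy forward model: mix soil and dense-canopy reflectance using the
    Beer-Lambert gap fraction exp(-k * LAI)."""
    gap = np.exp(-k * np.asarray(lai, dtype=float))[..., None]
    return DENSE + (SOIL - DENSE) * gap

def build_lut(lai_max=8.0, n=801):
    """Pre-compute simulated reflectance for a regular grid of LAI values."""
    lai = np.linspace(0.0, lai_max, n)
    return lai, toy_canopy_reflectance(lai)

def invert_lut(observed, lut_lai, lut_refl, n_best=10):
    """Return the mean LAI of the n_best LUT entries closest (RMSE) to the observation."""
    cost = np.sqrt(np.mean((lut_refl - observed) ** 2, axis=1))
    best = np.argsort(cost)[:n_best]
    return float(lut_lai[best].mean())

lut_lai, lut_refl = build_lut()
obs = toy_canopy_reflectance(3.0) + np.array([0.002, -0.004])   # synthetic noisy pixel
print(invert_lut(obs, lut_lai, lut_refl))
```

Averaging the few best-matching entries, rather than taking only the single best, is a common way to make LUT retrievals less sensitive to measurement noise. With a real RTM, the table would span several variables (e.g. LAI, leaf chlorophyll, soil brightness) at once.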

2.2.3 Semi-empirical models

Semi-empirical models rely on the theoretical formulations used in physical models, while adjusting some parameters through empirical relationships based on SRS data. Such models reduce the complexity of physical models by reducing the number of parameters that require calibration. One possibility is to approximate the outputs of RTMs with surrogate functions that are faster to evaluate, e.g. Gaussian process emulators. Calibration procedures such as Markov chain Monte Carlo (MCMC) or DA schemes are still necessary for the inversion in this case (Gómez-Dans, Lewis, & Disney, 2016). Semi-empirical models have been applied to estimate, among others, canopy height and structure and time series of LAI (Kumar, Kumari, & Saha, 2013).
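
The emulator idea can be sketched with a small Gaussian process regression in NumPy. A handful of runs of an expensive model are used to fit the GP, and its mean prediction then replaces the model inside an inversion loop (e.g. MCMC, not shown). The "expensive model" here is a hypothetical one-line stand-in, and the kernel hyperparameters are fixed by assumption rather than estimated, as a real application would require.

```python
import numpy as np

def expensive_model(lai):
    """Hypothetical stand-in for an RTM run: NIR reflectance as a function of LAI."""
    return 0.45 - 0.25 * np.exp(-0.5 * np.asarray(lai, dtype=float))

def rbf_kernel(x1, x2, length_scale, variance):
    """Squared-exponential covariance between two 1-D sets of inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

class GPEmulator:
    """Zero-mean GP (after centring the outputs) with fixed hyperparameters."""

    def __init__(self, length_scale=1.0, variance=1.0, nugget=1e-6):
        self.length_scale = length_scale
        self.variance = variance
        self.nugget = nugget

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()
        K = rbf_kernel(self.x, self.x, self.length_scale, self.variance)
        K += self.nugget * np.eye(self.x.size)       # small jitter for numerical stability
        self.weights = np.linalg.solve(K, y - self.y_mean)
        return self

    def predict(self, x_new):
        """Posterior mean of the GP at new inputs (cheap to evaluate)."""
        k_star = rbf_kernel(np.asarray(x_new, dtype=float), self.x,
                            self.length_scale, self.variance)
        return self.y_mean + k_star @ self.weights

train_lai = np.linspace(0.0, 8.0, 17)                 # 17 "expensive" runs
emulator = GPEmulator().fit(train_lai, expensive_model(train_lai))
test_lai = np.linspace(0.25, 7.75, 16)                # points between training runs
max_error = np.max(np.abs(emulator.predict(test_lai) - expensive_model(test_lai)))
print(max_error)
```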

2.3 Multi-sensor/multi-platform approaches

Expert SRS users are frequently adopting multi-sensor approaches to overcome the fixed temporal and spatial resolutions of single data sources. Data fusion techniques, which require knowledge of sensor limitations and uncertainties, offer advantages for temporally continuous mapping of vegetation parameters (Dusseux, Corpetti, Hubert-Moy, & Corgne, 2014) and for the spatial extrapolation of in situ observations. Limitations of VIs derived from optical data, such as saturation effects and dependence on weather conditions, can be reduced by integration with radar data, taking into account the differences between the responses of optical and radar sensors. In the fields of long-term land use mapping and monitoring, a wide range of studies employ multi-sensor SRS data, mostly combining Landsat with ALOS/PALSAR, Radarsat, or ERS datasets (reviewed in Joshi et al., 2016).

LAI products are highly relevant for EM applications because, when derived through physically based RTMs, LAI is directly connected to EM parameters and inputs such as plant productivity, transpiration and energy fluxes (Asner, Scurlock, & Hicke, 2003). Recent studies showed that the low temporal frequency (16 days) of Landsat-derived LAI products can be improved by combining them with MODIS reflectance and LAI data, which have higher temporal resolution (Myneni, Knyazikhin, & Park, 2015), while keeping the Landsat spatial resolution (Houborg, McCabe, & Gao, 2015).
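
The core idea behind many Landsat-MODIS fusion schemes can be sketched, in a deliberately simplified form, as transferring the temporal change observed at coarse resolution to the last available fine-resolution image. Operational algorithms (e.g. STARFM-type methods) additionally weight neighbouring pixels by spectral, temporal and spatial similarity. That weighting is omitted here, and the function name and synthetic grids are our own.

```python
import numpy as np

def upsample(a, factor):
    """Repeat each coarse value over its factor x factor block of fine cells."""
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

def predict_fine(fine_t1, coarse_t1, coarse_t2, factor, lower=0.0):
    """Predict a fine-resolution field (e.g. Landsat-scale LAI) at date t2 from a
    fine image at t1 plus the coarse-resolution change (e.g. MODIS LAI) between
    t1 and t2. Assumes co-registered grids with an integer size ratio and a
    coarse product radiometrically consistent with the fine one."""
    change = coarse_t2 - coarse_t1
    return np.maximum(fine_t1 + upsample(change, factor), lower)   # LAI cannot be negative

# Synthetic example: 2 x 2 coarse cells, each covering 3 x 3 fine cells
fine_t1 = np.arange(36, dtype=float).reshape(6, 6) / 10.0
coarse_t1 = fine_t1.reshape(2, 3, 2, 3).mean(axis=(1, 3))
coarse_t2 = coarse_t1 + np.array([[0.5, 0.0], [-0.2, 1.0]])
fine_t2 = predict_fine(fine_t1, coarse_t1, coarse_t2, factor=3)
```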

Mountain areas pose particular challenges for SRS applications. For optical data, thorough topographic and atmospheric correction is indispensable; for radar data, methods are needed that account for foreshortening and layover effects and for variations in surface water content affecting dielectric properties (Gupta, 2018). In particular, multi-sensor/multi-platform approaches require radiometrically homogeneous and consistent reflectance among images from different sources and time periods. For instance, Attarchi and Gloaguen (2014) developed a statistical model for above-ground biomass in a mountain forest site in Iran by combining corrected and co-registered Landsat ETM+ and ALOS/PALSAR data; its quality improved significantly compared with the use of (uncorrected) optical or radar data alone. In (semi-)arid regions, synergistic optical and radar data were used to model daily ET (Hu & Jia, 2015), a key parameter in this kind of ecosystem. Multi-sensor approaches can also be effective in estuarine environments: Chakraborty, Ferrazzoli, and Rahmoune (2014) analysed the MODIS Enhanced Vegetation Index and AMSR-E passive microwave signatures to estimate the amount of vegetation biomass submerged during monsoon flooding, while Rangoonwala, Enwright, Ramsey, and Spruce (2016) related the persistence of marshland flooding (radar data) to changes in vegetation (optical data).

While the use of multi-sensor approaches offers new opportunities with respect to time-series analysis, cross-sensor calibration is a challenging task. The long-term availability of data from Landsat and SPOT missions makes these sensors still the most suitable for multi-temporal analysis. As Sentinel-2, with the first platform launched in 2015, has a similar spectral coverage but better spatial and temporal resolution, current efforts focus on combining it with Landsat data to provide near daily global coverage at 30 m resolution (NASA Harmonized Landsat-Sentinel-2 project, Claverie & Masek, 2017). Although some research has been carried out to develop methods to obtain radiometric homogenization between different sensor time series (Padró et al., 2017; Pons, Pesquer, Cristóbal, & González-Guerrero, 2014), there are still challenges to be met for the use of multi-temporal, multi-source SRS data in ecology.
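
Relative radiometric normalization is one simple way to approach the homogenization problem. For each band, a linear gain and offset are fitted between co-registered pseudo-invariant pixels of the image to be adjusted and a reference image, and then applied to the whole image. The sketch below assumes that such pixels have already been selected. It is not the specific procedure of the studies cited above, and all names and values are illustrative.

```python
import numpy as np

def fit_normalization(subject, reference):
    """Per-band linear gain/offset mapping `subject` reflectance onto `reference`.

    subject, reference: arrays of shape (n_pixels, n_bands) holding the same
    pseudo-invariant pixels observed by the two sensors (or dates).
    """
    n_bands = subject.shape[1]
    gains = np.empty(n_bands)
    offsets = np.empty(n_bands)
    for band in range(n_bands):
        gains[band], offsets[band] = np.polyfit(subject[:, band], reference[:, band], 1)
    return gains, offsets

def apply_normalization(image, gains, offsets):
    """Apply the band-wise correction to an image of shape (rows, cols, n_bands)."""
    return image * gains + offsets

rng = np.random.default_rng(7)
ref_pixels = rng.uniform(0.02, 0.5, size=(200, 3))            # synthetic reference reflectance
true_gain = np.array([1.05, 0.97, 1.02])
true_offset = np.array([-0.01, 0.005, 0.0])
sub_pixels = (ref_pixels - true_offset) / true_gain           # synthetic sensor to adjust
gains, offsets = fit_normalization(sub_pixels, ref_pixels)
normalized = apply_normalization(sub_pixels.reshape(10, 20, 3), gains, offsets)
```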

3 MODEL CALIBRATION AND VALIDATION USING SRS DATA

Satellite remote sensing is currently used not only as an input to EMs, but also to assess model reliability through validation techniques (Bennett et al., 2013). A large number of SRS products, such as LAI, FAPAR, soil moisture, GPP and NPP, have served to assess EM results at the ecosystem level (see examples in Table S4).

The operations that transform model outputs into a variable consistent with the measured data constitute the so-called “observation operator” (Kaminski & Mathieu, 2017), which is necessary for implementing calibration, validation and assimilation schemes. Two strategies can be used to compare EM outputs with SRS data (Plummer, 2000), namely indirect and direct comparison, each associated with a different observation operator.

3.1 Indirect comparison

Indirect comparison considers high-level SRS products (e.g. LAI, FAPAR, etc.). In this case, the observation operator adapts model outputs to the measurements, typically through downscaling/upscaling procedures. The correction of biases due to different assumptions between a particular EM and products (e.g. over-simplification of the vegetation layer in the EM) is a particularly important step that, if not considered, might lead to large discrepancies in the results (Liu et al., 2018).
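
For indirect comparison with a coarser product, the simplest observation operator is spatial aggregation of the model output onto the product grid. The sketch below assumes that the model grid nests exactly within the product grid (an integer `factor`), treats NaN model cells (e.g. cells outside the watershed) as missing, and leaves out the bias correction discussed above. The names and the 50% coverage threshold are illustrative choices.

```python
import numpy as np

def aggregate_to_product_grid(model_field, factor, min_valid_fraction=0.5):
    """Block-average a fine model field onto a coarser SRS product grid.

    The model grid dimensions are assumed to be multiples of `factor`.
    Coarse cells where fewer than `min_valid_fraction` of the model cells are
    valid (finite) are returned as NaN, so they can be excluded from the
    comparison.
    """
    ny, nx = model_field.shape
    blocks = model_field.reshape(ny // factor, factor, nx // factor, factor)
    valid = np.isfinite(blocks)
    n_valid = valid.sum(axis=(1, 3))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    out = np.full(n_valid.shape, np.nan)
    enough = n_valid >= min_valid_fraction * factor ** 2
    out[enough] = sums[enough] / n_valid[enough]
    return out

model_lai = np.array([[1.0, 2.0, 4.0, np.nan],
                      [3.0, 2.0, np.nan, np.nan],
                      [0.5, 0.5, 1.0, 1.0],
                      [0.5, 0.5, 1.0, 3.0]])
aggregated = aggregate_to_product_grid(model_lai, factor=2)
print(aggregated)
```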

3.2 Direct comparison

Direct comparison couples ecological and radiative transfer models: the EM outputs become inputs to RTMs, so that measured and modelled radiances can be compared directly. Direct comparison is particularly appealing since it avoids possible inconsistencies between SRS products and model outputs, as well as the inversion of an RTM. Examples are the coupling between ecological and radiative transfer models in hydrological applications, e.g. using brightness temperature from the SMOS microwave sensor (De Lannoy & Reichle, 2016), or the computation of modelled canopy reflectance for comparison with MODIS data (Quaife et al., 2008). The observation operator required for direct comparison is not easily implemented in most EMs, because it involves adapting EM outputs to RTM inputs and simulating the processes represented in the RTM, including scattering by soil, vegetation and atmosphere. These processes are rarely represented explicitly because of the computational burden of complex dedicated RTMs, so simplified RT schemes are employed (e.g. 1D, effective LAI, etc.). Thus, direct comparison requires the calibration of several additional parameters. Because of these difficulties and the free availability of many SRS products, indirect comparison is still frequently adopted, especially when validation is performed only through qualitative approaches (Table S4).

Note that indirect comparison requires the evaluation of error metrics between raster maps (Stow et al., 2009), which are typically characterized by strong spatial autocorrelation. Because of the errors introduced by SRS downscaling procedures, the limited accuracy of the sensor and the complex nature of environmental systems, classical validation metrics based on per-pixel comparison (such as the root mean squared error, Pearson's correlation coefficient or the Nash-Sutcliffe efficiency) might fail to reveal common spatial patterns, thus limiting the comparison to mere qualitative considerations. In these cases, residual-based metrics should be replaced by the analysis of statistical moments, spectra and other quantitative measures of spatial structure that have been developed to objectively reveal common patterns among maps (Koch, Jensen, & Stisen, 2015). The quantified uncertainties associated with model results and SRS observations are frequently neglected during validation, but should be included to weigh their relative contribution to the error metric (uncertainty quantification is discussed further below).
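
The limitation of per-pixel metrics can be illustrated with a synthetic random map. A copy displaced by a few pixels reproduces the same spatial structure, yet it scores poorly on RMSE and Nash-Sutcliffe efficiency, whereas a structural measure such as the radially averaged power spectrum is unaffected. The sketch below shows only one possible structural measure, not the specific metrics of Koch, Jensen, & Stisen (2015). We chose a circular shift (`np.roll`) so that the spectrum is preserved exactly.

```python
import numpy as np

def rmse(sim, obs):
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, <= 0 is no better than the mean."""
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def radial_power_spectrum(field, n_bins=16):
    """Radially averaged 2-D power spectrum of the field's anomalies."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(field - field.mean()))) ** 2
    ny, nx = field.shape
    yy, xx = np.indices((ny, nx))
    radius = np.hypot(yy - ny // 2, xx - nx // 2)
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    idx = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(5)
obs_map = rng.standard_normal((64, 64))
sim_map = np.roll(obs_map, shift=(2, 3), axis=(0, 1))   # same structure, displaced
print(rmse(sim_map, obs_map), nse(sim_map, obs_map))
print(np.allclose(radial_power_spectrum(sim_map), radial_power_spectrum(obs_map)))
```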

4 ASSIMILATION OF SRS DATA FOR THE UPDATE OF MODEL STATE VARIABLES

Ecosystem models are affected by many sources of error, which may propagate in time and amplify the uncertainty in model outputs. Model-Data Fusion techniques (MDFs) reduce and control this uncertainty by consistently combining EMs and data (Peng, Guiot, Wu, Jiang, & Luo, 2011). DA schemes are a particular family of MDFs, developed mainly by the meteorological community to improve model forecasts by updating the state variables with measured data. DA has been applied in several fields of ecology (Luo et al., 2011), e.g. for modelling the spread of infectious diseases, fire and fisheries (Niu et al., 2014), to improve carbon cycle models (Benavides Pinjosovsky et al., 2017) and hydrological predictions (Lahoz & De Lannoy, 2014).

Open-access repositories such as OpenDA, the Parallel DA Framework (Kurtz et al., 2016) and the DA Research Testbed (Mizzi, Arellano, Edwards, Anderson, & Pfister, 2016) have been developed to facilitate the coupling between EMs and state-of-the-art DA methods. A particularly useful tool for EM applications at local scales is ESA's EO-Land Data Assimilation System (EO-LDAS; Lewis et al., 2012), which provides a Python library for retrieving geophysical parameters by assimilating optical medium-resolution data from Sentinel-2.

The core of DA schemes is the update (or analysis) step that corrects the model state variables towards the observations, reducing the forecast uncertainty. Different implementations of the update step characterize three main approaches (Montzka, Pauwels, Hendricks Franssen, Han, & Vereecken, 2012):
  1. Variational methods, which minimize a cost function associated with the residual between forecast and observations (Benavides Pinjosovsky et al., 2017).
  2. Kalman-based methods, which extend the well-known Kalman filter to nonlinear/non-Gaussian models, e.g. using MC simulations as in the Ensemble Kalman filter (EnKF; Quaife et al., 2008).
  3. Particle filters, which directly apply MC sampling using a Bayesian approach (De Bernardis, Vicente-Guijalba, Martinez-Marin, & Lopez-Sanchez, 2016).
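As an illustration of the Kalman-based family, the following is a minimal sketch of the stochastic (perturbed-observation) EnKF analysis step. It assumes a linear observation operator `H` and Gaussian observation errors with covariance `R`; the two-variable state and all numbers are hypothetical, and the code is a textbook form of the update rather than the implementation used in any cited study.

```python
import numpy as np

def enkf_analysis(X_f, y, H, R, rng):
    """Stochastic EnKF analysis step with perturbed observations (sketch).

    X_f : (n_state, n_ens) forecast ensemble of state variables
    y   : (n_obs,) observation vector
    H   : (n_obs, n_state) linear observation operator (assumed)
    R   : (n_obs, n_obs) observation-error covariance
    """
    n_ens = X_f.shape[1]
    A = X_f - X_f.mean(axis=1, keepdims=True)  # ensemble anomalies
    P_f = A @ A.T / (n_ens - 1)                 # sample forecast covariance
    S = H @ P_f @ H.T + R                       # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)            # Kalman gain
    # Each member is updated towards its own perturbed copy of the observations
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T
    return X_f + K @ (Y - H @ X_f)

# Hypothetical two-variable state (e.g. LAI and soil moisture); only LAI is observed
rng = np.random.default_rng(0)
X_f = rng.normal(loc=[[2.0], [0.3]], scale=[[0.5], [0.05]], size=(2, 100))
H = np.array([[1.0, 0.0]])
R = np.array([[0.1 ** 2]])
y = np.array([2.5])
X_a = enkf_analysis(X_f, y, H, R, rng)
```

Because the observation error (SD 0.1) is much smaller than the forecast spread (SD 0.5), the analysis ensemble of the observed variable moves close to the observation and its spread shrinks.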

The analysis step of DA updates the model state variables by balancing model and observation uncertainties, which are described by estimates of their probability distributions. Obtaining such estimates is particularly difficult for both EM outputs and SRS data.
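The balance is easiest to see in the simplest possible case: a single scalar state variable observed directly (observation operator equal to the identity), with unbiased Gaussian forecast and observation errors of variances σ_f² and σ_o². This is a textbook illustration under those assumptions, not the update used in the cited studies:

```latex
x^{a} = x^{f} + K\,(y - x^{f}), \qquad
K = \frac{\sigma_f^{2}}{\sigma_f^{2} + \sigma_o^{2}}, \qquad
(\sigma^{a})^{2} = (1 - K)\,\sigma_f^{2} = \frac{\sigma_f^{2}\,\sigma_o^{2}}{\sigma_f^{2} + \sigma_o^{2}}
```

For example, with x^f = 2, y = 3 and σ_f² = σ_o² = 1, the gain is K = 0.5, giving x^a = 2.5 and (σ^a)² = 0.5. When σ_f² is much smaller than σ_o², K tends to 0 and the observation is almost ignored; when σ_f² is much larger, K tends to 1 and the analysis follows the observation.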

4.1 SRS measurement uncertainty

The assimilation of SRS data requires computing error cross-covariances at the pixel level, taking into account the cumulative effect of different sources of uncertainty: sensor errors, errors in the RTM, errors of representativeness (due to upscaling or downscaling steps) and errors introduced during data processing, e.g. atmospheric and radiometric corrections as well as the filtering or masking of optical encumbrances such as clouds and haze (Pfeifer, Disney, Quaife, & Marchant, 2012; Waller, Dance, Lawless, & Nichols, 2014). However, the propagation of instrument and parameter errors is frequently neglected during the inversion of RTMs, and product accuracy is assessed a posteriori, by means of costly validation against in situ measurements or comparison with the output of calibrated process-based models (Wanders, Karssenberg, De Roo, De Jong, & Bierkens, 2012). The QA information of online products usually describes pixel values qualitatively (e.g. the QA band for Landsat VIs flags pixel conditions such as cloud, snow or water) and only rarely provides the quantitative information on data accuracy that DA requires (e.g. the SD at the pixel level for MODIS LAI/FAPAR). The lack of information on the errors of SRS products can be mitigated by using the statistics of the residuals between EM outputs and SRS measurements evaluated during the DA updates (Crow & Reichle, 2008). Direct coupling represents a valid alternative, allowing the assimilation of the optical signal at the sensor, so that the observation error is described by the accuracy of the sensor (e.g. De Lannoy & Reichle, 2016; Zhang, Shi, & Dou, 2012). In this latter case, the evaluation of model uncertainty (see Section 3) has to be propagated through the observation operator (described in Section 2.2.2). Moreover, model uncertainties might be amplified by the unknown parameters of the RTM.
As an example, EnKF assimilation of canopy reflectance from MODIS has been shown to improve EM estimates of GPP and reduce model uncertainty (Quaife et al., 2008).
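The idea of using residual statistics to fill in missing SRS error information can be sketched with a simple consistency relation. Assuming unbiased and mutually uncorrelated forecast and observation errors, the variance of the innovations d = y − H x_f equals the forecast error variance (in observation space) plus the observation error variance, so the latter can be estimated by difference. This is a deliberately simplified scalar version on synthetic data, not the adaptive filtering schemes compared by Crow and Reichle (2008); the function name and all values are hypothetical.

```python
import numpy as np

def estimate_obs_error_var(innovations, forecast_var):
    """Innovation-based estimate of the observation-error variance (scalar case).

    Assumes unbiased, mutually uncorrelated forecast and observation errors,
    so that var(y - x_f) = forecast variance + observation variance.
    """
    d = np.asarray(innovations, dtype=float)
    # Clip at zero: sampling noise can make the difference slightly negative
    return max(float(np.var(d) - np.mean(forecast_var)), 0.0)

rng = np.random.default_rng(42)
n = 5000
truth = rng.normal(1.0, 0.5, n)          # synthetic "true" pixel values
x_f = truth + rng.normal(0.0, 0.3, n)    # model forecasts, error SD 0.3
y = truth + rng.normal(0.0, 0.2, n)      # SRS observations, error SD 0.2 (unknown to the estimator)
forecast_var = np.full(n, 0.3 ** 2)      # e.g. ensemble variance reported by the EM

r_hat = estimate_obs_error_var(y - x_f, forecast_var)
print(np.sqrt(r_hat))                    # should be close to 0.2
```

The estimate is only as good as the forecast variance fed into it: if the EM uncertainty is misspecified, the error is transferred directly to the inferred observation error.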

4.2 EM uncertainty

Correct estimation of model error is fundamental for DA. Underestimating model uncertainty reduces the weight given to the data (the assimilation would only marginally correct the system state variables), increasing the risk that the model diverges from the actual state of the system. Overestimating it, instead, results in poor forecast capabilities, with model forecasts spanning a large range of possible solutions. The latter is nonetheless more favourable to the DA analysis, since the true system state is more likely to fall within the prediction interval.
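The trade-off can be quantified on synthetic data by checking how often the truth falls inside a nominal 95% prediction interval when the assumed model error SD is too small, correct or too large. The actual forecast errors are assumed Gaussian with SD 1; all numbers are illustrative.

```python
import numpy as np

def interval_coverage(truth, forecast_mean, assumed_sd, z=1.96):
    # Fraction of cases in which the truth lies inside the nominal 95% prediction interval
    return float(np.mean(np.abs(truth - forecast_mean) <= z * assumed_sd))

rng = np.random.default_rng(1)
truth = rng.normal(0.0, 1.0, 10_000)   # actual forecast errors have SD 1 (synthetic)
forecast_mean = np.zeros_like(truth)

under = interval_coverage(truth, forecast_mean, assumed_sd=0.3)  # underestimated model error
right = interval_coverage(truth, forecast_mean, assumed_sd=1.0)  # correctly specified
over = interval_coverage(truth, forecast_mean, assumed_sd=3.0)   # overestimated model error
print(under, right, over)
```

With an underestimated SD the truth falls outside the interval in more than half of the cases, so the filter trusts a wrong forecast; with an overestimated SD the truth is almost always covered, but the interval is three times wider than necessary, i.e. the forecast alone is uninformative.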

EM uncertainties are mainly evaluated during the forecast step, which propagates the state variables forward to the next observation time. The main sources of uncertainty are input variables (initial conditions, external forcing), unknown parameters, structural uncertainties due to the physical simplification of the governing processes, and numerical approximations in the discretization of continuous processes (Refsgaard, van der Sluijs, Hojberg, & Vanrolleghem, 2007). MC-based DA schemes, such as the EnKF and particle filters, estimate forecast uncertainties by running an ensemble of EM realizations, each driven by different samples drawn from the probability density functions (PDFs) of the inputs, forcing and parameters. Defining these PDFs is typically straightforward for the forcing, based on direct error statistics (e.g. De Lannoy & Reichle, 2016). Assessing the parameter PDFs is more difficult and computationally expensive. For many systems, the literature provides insights into the distribution limits, type of density function and modal value, which can be supplemented by local knowledge and expertise. The PDFs can then be tuned through sensitivity analysis and Bayesian parameter calibration techniques (Harrison, Kumar, Peters-Lidard, & Santanello, 2012). Finally, characterizing model structural uncertainty is non-trivial and constitutes one of the main issues in model identification, with important implications for forecasting. Structural uncertainty reflects the fundamental inability of a model to represent the processes it is designed to replicate. Numerical errors, which arise from the spatiotemporal discretization of model equations, are frequently considered together with structural uncertainty (El Serafy et al., 2011).
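The ensemble forecast step can be sketched as follows. When the literature provides only distribution limits and a modal value, a triangular density is a natural assumed choice for the parameter PDFs; forcing uncertainty is represented here by an assumed multiplicative log-normal perturbation, and the initial condition by a Gaussian. The toy logistic-growth model `toy_em_step` and every numerical value are hypothetical stand-ins for a real EM.

```python
import numpy as np

def toy_em_step(b, r, K, forcing, dt=1.0):
    """Hypothetical EM: logistic biomass growth scaled by a forcing index in [0, 1]."""
    return b + dt * r * forcing * b * (1.0 - b / K)

rng = np.random.default_rng(7)
n_ens, n_steps = 200, 30

# Parameter PDFs from assumed literature limits and modal values (triangular densities)
r = rng.triangular(0.05, 0.10, 0.20, n_ens)        # growth rate [1/day]
K = rng.triangular(800.0, 1000.0, 1300.0, n_ens)   # carrying capacity [g/m2]

# Forcing uncertainty: multiplicative log-normal perturbation of a nominal forcing series
forcing_nominal = np.clip(0.6 + 0.3 * np.sin(np.linspace(0, np.pi, n_steps)), 0, 1)
noise = rng.lognormal(0.0, 0.2, (n_ens, n_steps))
forcing_ens = np.clip(forcing_nominal[None, :] * noise, 0, 1)

# Uncertain initial condition
b = rng.normal(100.0, 15.0, n_ens)

# Forecast step: drive every member forward to the next observation time
for t in range(n_steps):
    b = toy_em_step(b, r, K, forcing_ens[:, t])

print(b.mean(), b.std())  # ensemble mean and spread passed to the DA update
```

Each member carries its own parameter set, forcing series and initial state, so the final ensemble spread reflects all three sources of uncertainty at once; structural error is not represented in this sketch.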
The simplest approach to account for structural uncertainties is to model the error as Gaussian, with a covariance matrix estimated by a procedure developed ad hoc for each model and application (Reichle, 2008). As an example of EM uncertainty estimation, De Lannoy and Reichle (2016) assigned static perturbation statistics to the error of the land surface model GEOS-5 CLSM: two model state variables are perturbed with additive noise characterized by fixed temporal and spatial correlations. In a more sophisticated fashion, El Serafy et al. (2011) proposed an iterative procedure based on an MC sampling method to estimate the error covariance of a model for suspended particulate matter concentration (Delft3D-WAQ). In general, determining structural uncertainties through sensitivity analyses, parameterization and conceptual models (Matott, Babendreier, & Purucker, 2009; Refsgaard, van der Sluijs, Brown, & van der Keur, 2006) relies heavily on a sufficiently large amount of in situ data (Uusitalo, Lehikoinen, Helle, & Myrberg, 2015).
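Additive Gaussian model error with a fixed temporal correlation can be generated with a first-order autoregressive (AR(1)) process, a common construction assumed here for illustration; it is not claimed to be the exact scheme used for GEOS-5 CLSM, and spatial correlation is omitted for brevity. The function name and the values of `sigma` and `tau` are hypothetical.

```python
import numpy as np

def ar1_model_error(n_steps, n_ens, sigma, tau, dt, rng):
    """Additive Gaussian model-error perturbations with fixed temporal correlation.

    Each ensemble member receives a stationary AR(1) series with standard
    deviation `sigma` and e-folding correlation time `tau` (same units as `dt`).
    """
    phi = np.exp(-dt / tau)                  # lag-1 autocorrelation
    e = np.empty((n_steps, n_ens))
    e[0] = rng.normal(0.0, sigma, n_ens)     # start from the stationary distribution
    for t in range(1, n_steps):
        # Scaling by sqrt(1 - phi^2) keeps the variance constant at sigma^2
        e[t] = phi * e[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.normal(0.0, sigma, n_ens)
    return e, phi

rng = np.random.default_rng(3)
noise, phi = ar1_model_error(n_steps=2000, n_ens=50, sigma=0.02, tau=3.0, dt=1.0, rng=rng)

# Empirical check of the prescribed statistics
lag1 = np.corrcoef(noise[:-1].ravel(), noise[1:].ravel())[0, 1]
print(noise.std(), lag1, phi)
```

In a forecast step, `noise[t]` would simply be added to the perturbed state variable of each member before propagating it to the next time.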

Many studies neglect these different sources of uncertainty, thus possibly underestimating the model output variances (Matott et al., 2009). Further research and inquiry are required to provide standard methods for the estimation of model structural uncertainty.

5 CONCLUSIONS

In the two decades since Plummer (2000), the ecological community has gained confidence in the use of SRS data, mainly owing to the increased availability of global products subject to rigorous QA tests. Global SRS datasets are now also used by ecologists and conservation managers who may lack experience or knowledge of how SRS datasets are generated. For modellers, the major risks of integrating SRS products into EMs are that the specific assumptions underlying their production (e.g. the land cover considered in the RTM) might be incompatible with the assumptions of the EMs, or that the practices used to downscale SRS data for local-scale modelling (e.g. nearest neighbour resampling) are inappropriate. Remote sensing experts need to provide better information about the conditions, assumptions and ancillary data behind SRS data production along with off-the-shelf products, perhaps via community-driven adoption of detailed metadata standards. On the other hand, modellers need to be trained to properly use SRS data and their QA metadata, starting with improved university programmes in ecology and physical geography, which currently lack sufficient depth on SRS (Bernd et al., 2017).

Global SRS products are increasingly available to the ecological modelling community, but their application to local-scale EMs is not straightforward, requiring downscaling or the adaptation of the developed algorithms to higher-resolution satellite imagery. So far, the lack of shared code, the difficulty of adapting it to different SRS products and the need for larger computational resources have hindered the use of such algorithms and products. Two emerging trends may be key to changing this. First, the rise of cloud computing (e.g. Google Earth Engine, ESA RUS) allows SRS experts to scale up their algorithms and produce large-scale coverage (e.g. Pekel, Cottam, Gorelick, & Belward, 2016). This will be further pushed by the six Copernicus Data and Information Access Services (DIAS) platforms to be launched in 2018. Second, the adoption of open research practice (and not just open data) for new projects, such as those funded by the European Framework Programmes (e.g. Horizon 2020), is pushing the SRS community to release the code developed for local applications. New initiatives, such as the “Model Web” within the GEO System of Systems, aim to make such models and algorithms accessible and executable online.

To fully exploit the potential of SRS data, we suggest that the development of spatially explicit EMs should adopt the following strategies:
  1. Direct coupling of EMs with RTMs, a strategy that is rarely adopted in ecological applications at present. Although direct coupling would require the calibration of a larger number of model parameters, the possibility of directly simulating the system reflectance offers considerable advantages for (1) the validation of model outputs, (2) the assimilation of SRS data in near real time (without having to wait for the production of higher-level products) and (3) the estimation of measurement uncertainties, which is required in DA techniques and allows for prediction.
  2. Structuring the simulation codes so that they can be easily connected to the available DA platforms. The relatively small investment required to update model implementations would be more than offset by access to a number of state-of-the-art, well-tested algorithms for the assimilation of SRS data and the assessment of model uncertainty.

Further collaborations between the remote sensing and the ecosystem modelling scientific communities will help to overcome the outlined difficulties and develop standardized techniques to include SRS data into EMs. There are promising prospects for new SRS missions, and for the development of new algorithms, standards and platforms. Realizing these will require collaborative projects demonstrating the use of SRS in EMs, similar to the Horizon 2020 ECOPOTENTIAL project whose focus was on improving protected areas management using remote sensing.

The coming years promise great potential for SRS and EMs. Several new missions are planned for launch in the next decade, including P-band SAR (BIOMASS mission) and ISS-mounted LiDAR (GEDI) for estimating biomass, hyperspectral imaging spectrometers for more accurate RTM inversion (EnMAP, HyspIRI, HISUI), high-resolution sun-induced fluorescence for estimating plant photosynthetic activity (FLEX), and a thermal radiometer to monitor water stress (ECOSTRESS). In addition, constellations of small, less expensive satellites taking high-resolution imagery of the entire Earth every day, such as PlanetScope (optical) and ICEYE (SAR), are becoming a reality. Combined with longer time series from the Sentinel missions and the computational power of cloud platforms such as Google Earth Engine and the upcoming DIAS platforms, these developments mean that the sky is the limit for ecological modellers.

ACKNOWLEDGEMENTS

This work has been carried out within the H2020 project “ECOPOTENTIAL: Improving Future Ecosystem Benefits Through Earth Observations”, coordinated by CNR-IGG (http://www.ecopotential-project.eu). The project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 641762.

AUTHORS’ CONTRIBUTION

D.P., I.M., A.F.C. and G.Z. conceived the ideas and the scope of the review; D.P. led the writing of the manuscript. All authors critically contributed to the drafts and gave final approval for publication.

DATA ACCESSIBILITY

The manuscript does not include any data.