Volume 9, Issue 5
APPLICATION

An integrated phenology modelling framework in r

Koen Hufkens

Corresponding Author

E-mail address: koen.hufkens@gmail.com

INRA, UMR ISPA, Villenave d'Ornon, France

Department of Applied Ecology and Environmental Biology, Ghent University, Ghent, Belgium

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

Correspondence

Koen Hufkens

Email: koen.hufkens@gmail.com

David Basler

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

Tom Milliman

Earth Systems Research Center, University of New Hampshire, Durham, NH, USA

Eli K. Melaas

Department of Earth & Environment, Boston University, Boston, MA, USA

Andrew D. Richardson

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA

Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, USA

First published: 17 January 2018

Abstract

  1. Phenology is a first‐order control on productivity and mediates the biophysical environment by altering albedo, surface roughness length and evapotranspiration. Accurate and transparent modelling of vegetation phenology is therefore key in understanding feedbacks between the biosphere and the climate system.
  2. Here, we present the phenor r package and modelling framework. The framework leverages measurements of vegetation phenology from four common phenology observation datasets: the PhenoCam network, the USA National Phenology Network (USA‐NPN), the Pan European Phenology Project (PEP725) and MODIS phenology (MCD12Q2), combined with (global) retrospective and projected climate data.
  3. We show an example analysis, using the phenor modelling framework, which quickly and easily compares 20 included spring phenology models for three plant functional types. An analysis of model skill using the root mean squared error (RMSE) shows little or no difference regardless of model structure, corroborating previous studies. We argue that addressing this issue will require novel model development combined with easy data assimilation as facilitated by our framework.
  4. In conclusion, we hope the phenor phenology modelling framework in the r language and environment for statistical computing will facilitate reproducibility and community driven phenology model development, in order to increase their overall predictive power, and leverage an ever growing number of phenology data products.

1 INTRODUCTION

Seasonal leaf development, or vegetation phenology, is strongly linked to seasonal changes in temperature and considered an indicator of climate change. Currently, rising temperatures due to climate change have moved spring forward in time by 2.3 days per decade since the 1970s (Rosenzweig et al., 2007). Vegetation phenology hereby does not only disproportionately influence ecosystem productivity by advancing and delaying the season (Richardson et al., 2010, 2013), it also changes canopy properties such as albedo and atmospheric boundary layer properties (Hollinger et al., 1999; Sakai, Fitzjarrald, & Moore, 1997). As such, models of seasonal leaf development, rigorously validated against in situ observations, are key to understanding how climate change will affect ecosystem productivity and biophysical vegetation properties.

Luckily, phenology has been recorded by amateurs and professionals, such as national meteorological institutions, supporting contemporary analysis of past or ongoing climate change (Chuine et al., 2004). Recently, individual observations have been formalized into rigorous citizen science efforts through, for example, the USA National Phenology Network (USA‐NPN; https://www.usanpn.org/; Betancourt et al., 2005) and Project Budburst (http://budburst.org/). In addition, automated camera networks (i.e. the PhenoCam network, https://phenocam.sr.unh.edu/; Richardson et al., 2018) and remote sensing (Zhang et al., 2003) provide a consistent, continuous and canopy‐wide way of evaluating the development of vegetation across larger areas (Melaas, Friedl, & Richardson, 2016; White et al., 2009). Numerous studies have demonstrated the value of the PhenoCam‐derived Gcc index, a measure of vegetation greenness as percentage green within a digital image, for characterizing the seasonal trajectory of vegetation colour and activity (Hufkens et al., 2016; Keenan et al., 2014; Klosterman et al., 2014; Toomey et al., 2015). Similarly, the MODIS MCD12Q2 phenology product has been a proven source of phenological data (Chen, Melaas, Gray, Friedl, & Richardson, 2016).

These observations of vegetation phenology allow us to estimate changes in the timing of vegetation development in response to year‐to‐year variation in weather, climate change and climate variability (Chuine et al., 2004; Melaas et al., 2016; Vitasse, Porté, Kremer, Michalet, & Delzon, 2009). Most process‐based models try to simulate various internal and environmental influences, such as whole plant physiological status (paradormancy), internal factors of the developing bud (endodormancy) and external factors driving or suppressing seasonal development (ecodormancy) (Lang, Early, Martin, & Darnell, 1987).

One of the first such ecodormancy models was the growing degree day model, proposed by De Reaumur in 1735. Although vegetation phenology is often driven by temperature, multiple additional constraints have been proposed, including daylength, chilling degrees, precipitation, relative humidity or vapour pressure deficit (Chuine & Cour, 1999; García‐Mozo et al., 2009; Hunter & Lechowicz, 1992; Laube, Sparks, Estrella, & Menzel, 2014; Laube et al., 2013; Xin, Broich, Zhu, & Gong, 2015). Similarly, fall senescence has been modelled using chilling degree days with additional constraints such as daylength (Archetti, Richardson, O'Keefe, & Delpierre, 2013; Gill et al., 2015; Jeong & Medvigy, 2014). These various models are either used in isolation to address particular physiological questions or included in land surface models to scale phenological processes (Richardson et al., 2011). Model development, in isolation or coupled to larger land surface models, often integrates multiple environmental drivers, which increases model complexity (Chen et al., 2016; Jeong & Medvigy, 2014). Yet, models which include more complex concepts, based upon growing degree days, do not necessarily perform better than a simple regression‐based approach. As such, model structures still explain a limited amount of the year‐to‐year variability, and fail to generalize well (Basler, 2016; Clark, Salk, Melillo, & Mohan, 2014; Fisher, Richardson, & Mustard, 2007; Linkosalo, Häkkinen, & Hänninen, 2006; Schaber & Badeck, 2003). For example, model studies have shown that biologically “incorrect” models can be parameterized to provide good predictions while lacking any biological representation (Hunter & Lechowicz, 1992). A study by Migliavacca et al. (2012) has shown that between‐model differences by the end of the century are almost as large as the differences between climate scenarios. As a consequence, different model assumptions will behave disproportionately differently under future scenarios, affecting their potential impacts and uncertainties (Migliavacca et al., 2012).

With vegetation phenology as a first‐order control on ecosystem productivity, accurate and transparent model predictions of vegetation phenology in a changing climate are key. In order to facilitate easy model comparison and future development of new models, we developed the phenor model framework for the r language and environment for statistical computing (R Core Team 2016). The phenor r package assimilates four important phenological records across a variety of ecosystems, plant functional types and scales. The assimilated datasets provide extensive coverage in the US and Europe, and results can be easily scaled globally using various gridded data products made accessible through the software. Here, we provide a worked example for the phenor r package using the recent standardized PhenoCam dataset (Richardson et al. 2018; http://phenocam.us) to demonstrate the ease with which a suite of phenological models (Table 1) can be evaluated and scaled up from sites to regions and biomes, and extrapolated in both forecast and hindcast modes.

2 MATERIALS AND METHODS

2.1 The phenor r package

The phenor r package assimilates four important phenological records, based on direct observation, near‐surface remote sensing and satellite remote sensing, across a variety of ecosystems and plant functional types. The phenor r package combines data from near‐surface remote sensing through the PhenoCam network, using the phenocamr and daymetr r packages, into a phenology modelling framework which covers data preparation, model optimization and model visualization, and consists of a number of key functions. In addition, data from the USA National Phenology Network (USA‐NPN), the Pan European Phenology Project (PEP725) and the MODIS land surface phenology product (MCD12Q2) can be ingested. In the interest of brevity, the phenocamr and daymetr r packages and the PhenoCam source data are described in Appendix S1.

The format_phenocam() phenor function combines phenophases (also called transition dates) generated by the phenocamr r package with the climate data downloaded using the daymetr r package. The function requires the location (path) of the generated phenophase output files, together with parameters specifying the phenophase (direction = rising; with rising for spring or falling for autumn), the threshold value used (threshold = 25), the Gcc percentile to use (gcc_value = 90, Sonnentag et al. 2012) and the offset as a day‐of‐year value. The offset is the day‐of‐year in the previous year on which to start reporting climate data, running until this day in the subsequent year. The function returns model calibration/validation and driver data as a nested list of data frames, used in subsequent model optimization (df, see description of optimize_parameters() and model_calibration() below). [Code listing: example format_phenocam() call]

Similarly, the format_pep725() function uses PEP725 observational data together with European E‐OBS climate data (Haylock et al., 2008) to compile a consistent calibration/validation dataset for European observational records (e.g. Basler, 2016). Data can be downloaded using the download_pep725() function. We provide similar functionality for the USA‐NPN data. Data can be downloaded through the USA‐NPN application programming interface, using download_npn() and correctly formatted with format_npn(). Furthermore, the format_modis() function correctly formats a directory of MODIS MCD12Q2 land surface phenology data (i.e. phenophases, Zhang et al., 2003) as downloaded with the MODISTools r package (Tuck et al., 2014).

Spatial scaling of model results is facilitated through a number of functions. The format_daymet() function uses gridded pre‐processed Daymet tiles to generate spatially explicit driver data (download_daymet_tiles() and daymet_tmean() functions of daymetr, Appendix S1). The format_eobs() function provides the same functionality for the E‐OBS climate data. Yet another source of hindcast data is compiled using the format_berkeley_earth() function, which allows the user to subset 1 × 1 degree global historical climate data for any year since 1850 through the Berkeley Earth project (Rohde et al., 2012). Similarly, the format_cmip5() function formats 1/4th degree NASA Earth Exchange (NEX) global gridded Coupled Model Intercomparison Project (CMIP5) data of historical reanalysis and representative concentration pathway (RCP 4.5 and RCP 8.5) projections. Unlike format_phenocam() or format_modis(), no calibration/validation data are included in the gridded spatial data.

The resulting dataset of all formatting functions is a nested list with the following layout: [Code listing: nested list layout]

Within the phenor data structure, the top level is a particular site. For each site, critical parameters are provided: the day‐of‐year range (doy, as specified by the offset), the geographic location (or georeferencing), matrices holding minimum temperature (Tmini), maximum temperature (Tmaxi), mean daily temperature (Ti), precipitation (Pi), vapour pressure deficit (VPDi) and daylength in hours (Li), and the calibration/validation transition dates (transition_dates). Matrices are organized with columns representing a given year, and rows representing a given day‐of‐year. Other data are represented as vectors matching the number of columns present in the climate data matrices. Where necessary, data are truncated to match the available climate data. When certain data sources are missing, the content of a field is set to NULL.
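The layout described above can be mimicked with a small toy object. The sketch below is an illustration only: it uses base r, the field names follow the text, and every value (site name, coordinates, offset, temperatures, transition dates) is a synthetic placeholder. How the package itself encodes the day‐of‐year range across the turn of the year is not specified in the text, so the wrap‐around used here is an assumption.

```r
# Toy stand-in for one site entry of the nested list described in the text.
# All values are synthetic; only the field names come from the description.
offset  <- 264                             # assumed start day-of-year (previous year)
n_years <- 3
doy     <- c(offset:365, 1:(offset - 1))   # assumed wrap-around over New Year
n_days  <- length(doy)

set.seed(1)
toy_matrix <- function(mean, sd) {
  # rows = day-of-year steps, columns = years
  matrix(rnorm(n_days * n_years, mean, sd), nrow = n_days, ncol = n_years)
}

site <- list(
  site             = "toy_site",           # hypothetical site name
  location         = c(42.5, -72.2),       # made-up latitude, longitude
  doy              = doy,
  transition_dates = c(120, 115, 124),     # one value per year (column)
  Ti               = toy_matrix(8, 10),    # mean daily temperature
  Tmini            = toy_matrix(3, 10),    # minimum temperature
  Tmaxi            = toy_matrix(13, 10),   # maximum temperature
  Pi               = NULL,                 # missing data sources are set to NULL
  VPDi             = NULL,
  Li               = matrix(12, nrow = n_days, ncol = n_years)  # daylength (hours)
)

# The top level of the structure is the site.
data <- list(toy_site = site)
str(data, max.level = 2)
```

Keeping years in columns means a model can be applied to every site year with a single column‐wise operation, and the vector of transition dates lines up with those columns.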

The optimize_parameters() function allows for the easy optimization of model parameters. This function uses two common optimizers, GenSA (Xiang, Gubian, Suomela, & Hoeng, 2013) and rgenoud (Mebane & Sekhon, 2011). The GenSA algorithm combines both the Boltzmann machine and faster Cauchy machine simulated annealing approaches for fast optimizations (Tsallis & Stariolo, 1996), while the genoud routine combines an evolutionary algorithm with a derivative‐based (quasi‐Newton) method to solve difficult optimization problems (Mebane & Sekhon, 2011). To optimize a calibration/validation dataset (df), one specifies a particular model (e.g. the Thermal Time model, TT), a defined optimizer (e.g. GenSA), an objective function such as the root mean squared error (RMSE), upper and lower parameter limits and, when required, parameter starting values. Additional control parameters, such as the maximum number of iterations (e.g. max.call), can be provided using a list of options to the control parameter. An example function call to optimize the TT model is provided below. [Code listing: example optimize_parameters() call for the TT model]

Final predicted values for the optimized parameters can be retrieved by running the estimate_phenology() function with the optimized parameters. [Code listing: example estimate_phenology() call]
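The optimize‐then‐predict workflow described above can be illustrated without the package. The following is a minimal base r sketch under stated assumptions, not the phenor implementation: thermal_time() is a hypothetical three‐parameter growing degree day model (start day t0, base temperature T_base, critical forcing F_crit), the temperature data and "observed" dates are synthetic, and a simple bounded random search stands in for the GenSA and genoud optimizers named in the text.

```r
# Root mean squared error between observed and predicted transition dates.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Hypothetical Thermal Time sketch: forcing accumulates above T_base from
# step t0 onward; the prediction is the first step where the sum >= F_crit.
thermal_time <- function(par, Ti) {
  t0 <- round(par[[1]]); T_base <- par[[2]]; F_crit <- par[[3]]
  apply(Ti, 2, function(temp) {
    forcing <- pmax(temp - T_base, 0)
    forcing[seq_len(t0 - 1)] <- 0            # no forcing before t0
    hit <- which(cumsum(forcing) >= F_crit)
    if (length(hit) == 0) 9999 else hit[1]   # penalty if never reached
  })
}

# Synthetic driver data: rows are days, columns are years.
set.seed(42)
days <- 1:365
Ti <- sapply(1:8, function(i) {
  10 - 15 * cos(2 * pi * (days - 15) / 365) + rnorm(365, sd = 3)
})
observed <- thermal_time(c(30, 5, 200), Ti)  # synthetic "observations"

# Objective function, parameter limits and starting values.
cost  <- function(par) rmse(observed, thermal_time(par, Ti))
lower <- c(1, -5, 0)
upper <- c(365, 10, 2000)
start <- c(1, 0, 500)

# Bounded random search (a stand-in for simulated annealing).
n_draws <- 2000
candidates <- rbind(
  start,
  cbind(runif(n_draws, lower[1], upper[1]),
        runif(n_draws, lower[2], upper[2]),
        runif(n_draws, lower[3], upper[3]))
)
costs    <- apply(candidates, 1, cost)
best_par <- candidates[which.min(costs), ]

# Final predictions for the optimized parameters.
predicted <- thermal_time(best_par, Ti)
```

Because the predicted date is a step function of the parameters, the cost surface is flat almost everywhere, which is one reason derivative‐free global optimizers are a natural choice for this kind of model.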

The output will automatically be formatted as a map of phenology dates or a vector, depending on the input data class. However, running models across all grid cells of spatial data would provide a naively broad representation of land surface phenology. For example, only a small subset of the US is dominated by any particular plant functional type (PFT), such as deciduous broadleaf forests. In order to better differentiate between dominant PFTs, we include a function land_cover_density() which calculates the percentage coverage of a particular MODIS MCD12Q1 IGBP land cover class (Friedl et al., 2010) within a given raster cell for a given location (i.e. CMIP5 data, see Figure 5a,b).
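The idea behind such a coverage calculation can be sketched in base r. The helper below, land_cover_fraction(), is hypothetical and is not the land_cover_density() function of the package: it assumes a fine‐resolution matrix of land cover class codes and aggregates it into square blocks, reporting the percentage of cells in each block that belong to the target class. The class codes in the toy matrix are made up for illustration.

```r
# Percentage of fine-resolution cells of class `target` inside each
# block x block coarse cell (hypothetical helper, base R only).
land_cover_fraction <- function(classes, target, block) {
  stopifnot(nrow(classes) %% block == 0, ncol(classes) %% block == 0)
  is_target <- (classes == target) * 1           # 1 where class matches
  row_id <- rep(seq_len(nrow(classes) / block), each = block)
  col_id <- rep(seq_len(ncol(classes) / block), each = block)
  counts <- rowsum(is_target, row_id)            # collapse rows into blocks
  counts <- t(rowsum(t(counts), col_id))         # collapse columns into blocks
  100 * counts / block^2
}

# Toy 4 x 4 land cover map aggregated to 2 x 2 coarse cells.
classes <- matrix(c( 4,  4,  1,  1,
                     4,  4,  1, 10,
                    10, 10,  4,  4,
                    10,  4,  4,  4), nrow = 4, byrow = TRUE)
frac <- land_cover_fraction(classes, target = 4, block = 2)

# Keep only coarse cells dominated by the target class (> 50% coverage).
dominant <- frac > 50
```

A mask like dominant can then be used to blank out model output in coarse cells where the plant functional type of interest is not the dominant cover.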

A wrapper function, model_calibration(), is provided for both the optimize_parameters() and estimate_phenology() functions which integrates the previously described steps providing both summary statistics (RMSE and AICc) and a plot (Figure 1) of the model fit. Likewise, the model_comparison() function serves as a wrapper for multi‐model parameter optimization runs. For a visual comparison limited to two model runs we provide an arrow plot function, arrow_plot(), displaying directional changes in the modelled values between the model outputs (Figure 2).
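As a rough illustration of the summary statistics mentioned above, the sketch below computes an RMSE, a reference RMSE for a model that always predicts the mean observed date, and a small‐sample corrected AIC. It assumes the common least‐squares form of the AICc; the text does not state which formulation the package uses, and the observed and predicted dates are made‐up numbers.

```r
# Root mean squared error.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Least-squares AICc for a model with k parameters (assumed formulation).
aicc <- function(obs, pred, k) {
  n   <- length(obs)
  rss <- sum((obs - pred)^2)
  aic <- n * log(rss / n) + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
}

# Made-up observed and modelled transition dates (day-of-year).
obs  <- c(120, 125, 118, 130, 127, 122)
pred <- c(121, 123, 120, 128, 128, 121)

rmse_model <- rmse(obs, pred)
rmse_null  <- rmse(obs, rep(mean(obs), length(obs)))  # mean-date reference
aicc_model <- aicc(obs, pred, k = 3)
```

Comparing rmse_model against rmse_null shows how much a model improves on simply predicting the mean date, while the AICc penalizes models that reach a similar fit with more parameters.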

Figure 1. Output of the model_calibration() function in the phenor r package which produces a scatter plot of measured and modelled budburst dates. In this case an optimization was run for the deciduous broadleaf data on a Thermal Time model, using 40,000 iterations of the generalized simulated annealing routine, the package default optimizer. Model fit statistics such as the RMSE and AIC are provided in the top right and bottom left corner of the graph and on the command line output. RMSE NULL is the RMSE assuming the mean of the measured values as the optimal model output. RMSE, root mean squared error; AIC, Akaike information criterion.
Figure 2. Output of the arrow_plot() function in the phenor r package which produces a scatter plot of measured and modelled budburst dates showing directional changes between model runs of the thermal time and photothermal time models. The direction and magnitude of the change are indicated by the arrow's colour and length, respectively. When changes delay phenological development arrows are coloured orange, when phenological development is advanced arrows are coloured blue. Values unchanged between model structures are indicated by small black dots.

2.2 A worked example: a quick model comparison

As a worked example we partially recreate the spring phenology model comparison by Basler (2016), using PhenoCam data. However, we note that a similar exercise could be executed with any of the other phenology data sources available through phenor. The model structures included in this worked example can be described by three broad categories: (1) a simple linear regression to spring temperature, (2) models explaining ecodormancy release only, (3) models explaining the release of endo‐ and ecodormancy. A reference NULL model assumes a fixed mean date of leaf unfolding.

A total of 22 phenology models are included in the package (Table 1). These include 20 spring phenology models including precipitation driven models, one fall senescence chilling degree day model and one grassland pollen release model. In our worked example of the phenor r package, we will focus only on the 20 spring phenology models. A full list of the model structures and parameter ranges for the models are provided in Table 2 of Appendix S2 and included in the phenor library https://github.com/khufkens/phenor/blob/master/inst/extdata/parameter_ranges.csv.

Table 1. Overview of the phenological models for leaf unfolding, leaf senescence and pollen release included in this study (adapted from Basler (2016), Table 1). The models are grouped by implemented processes and drivers: chilling temperatures (C), forcing temperatures (F), photoperiod (P), precipitation (R) and vapour pressure deficit (V). Function names are given in round brackets, while full model structures are listed in Table 1 of Appendix S2.

| Model name | Drivers | # Parameters | References/comments |
|---|---|---|---|
| NULL | | 1 | Mean date of leaf unfolding |
| LIN | F | 2 | Linear regression against spring temperatures |
| Ecodormancy release only | | | |
| Thermal Time [a] (TT, TTs) | F | 3 (4) | (Cannell & Smith, 1983; Chuine, Cour, & Rousseau, 1999; De Reaumur, 1735; Hänninen, 1990; Hunter & Lechowicz, 1992; Kramer, 1994; Leinonen, Repo, & Hänninen, 1997; Wang, 1960) |
| Chilling Degree Day (CDD) | C | 3 | (Jeong & Medvigy, 2014) |
| Photothermal‐time [a] (PTT, PTTs) | PF | 3 (4) | (Črepinšek, Kajfež‐Bogataj, & Bergant, 2006; Masle, Doussinault, Farquhar, & Sun, 1989) |
| M1 [a] (M1s) | PF | 4 (5) | (Blümel & Chmielewski, 2012) |
| Endo‐ and ecodormancy release | | | |
| Alternating (AT) | CF | 5 | (Cannell & Smith, 1983; Murray, Cannell, & Smith, 1989) |
| Sequential [b] (SQ, SQb) | CF | 8 | (Hänninen, 1990; Kramer, 1994) |
| Parallel (PA, PAb) | CF | 9 | (Hänninen, 1990; Kramer, 1994; Landsberg, 1974) |
| Unified (UN) | CF | 9 | (Chuine, 2000) |
| Sequential M1 [b] (SM1, SM1b) | CPF | 9 | Combination with the M1 model |
| Parallel M1 [b] (PM1, PM1b) | CPF | 10 | Combination with the M1 model |
| Unified M1 (UM1) | CPF | 10 | Combination with the M1 model |
| Growing Season Index [c] (SGSI, AGSI) | FPV | 7 | (Xin et al., 2015) |
| Grassland Pollen model (GRP) | FR | 5 | (García‐Mozo et al., 2009) |

  • [a] Also calibrated using a sigmoid temperature response (Hänninen, 1990; Kramer, 1994), adding one parameter.
  • [b] Also calibrated using a bell‐shaped chilling response (Chuine, 2000).
  • [c] Calibrated using a cumulative response rather than the rolling mean.

For this study, we combined spring phenology dates based on PhenoCam 3‐day summary data from the standardized PhenoCam Dataset (Richardson et al. 2018) with Daymet data (Thornton et al., 2017) for three common PFTs, deciduous broadleaf forests, evergreen needleleaf forest and grasslands. A total of 102 sites and 508 site years were included in our calibration/validation dataset, of which 63 were deciduous broadleaf forest sites (358 site years), 18 were evergreen needleleaf forest sites (63 site years) and 21 were grasslands (88 site years). Deciduous broadleaf sites which are moisture limited, and all sites outside Daymet coverage, were excluded from our analysis. The final selected sites span a large geographic area ranging from New Mexico to Southern Alaska, and Maine to California (Figure 3a).

Figure 3. Various data products which the phenor r package can query for model calibration/validation, and their respective demonstration dataset locations (see demo_data.r to compile this dataset). (a) All locations of the selected 102 PhenoCam Dataset V1.0 sites as used in the worked example. (b) Example locations of all Acer rubrum sites for which “Breaking Leaf Buds” observations were made within the USA National Phenology Network. (c) Example locations of MODIS MCD12Q2 grassland phenology colocated with the PhenoCam grassland sites. (d) All Fagus sylvatica ssp. observations of BBCH code 11 or “first true leaf” for sites where at least 60 observations exist. The three plant functional types displayed, deciduous broadleaf forests, evergreen needleleaf forests and grasslands, are marked with open yellow squares, green circles and blue triangles, respectively.

We acknowledge that phenological development as measured using PhenoCam data represents different physiological processes for different PFTs. For example, the phenology of deciduous forests or grasslands is closely linked to the development of new leaf tissues (Hufkens et al., 2016; Keenan et al., 2014), whereas evergreen forest phenology is determined by dehardening of existing needles at the end of the winter season. Thus, optimized model parameters and their interpretation are specific to each PFT.

For all PFTs, model optimization was executed using the default generalized simulated annealing (GenSA) package and algorithm minimizing the RMSE between the greenness rising PhenoCam phenophase estimations and model predictions (see Appendix S1). The optimizer was run for 40,000 iterations with a starting temperature of 10,000. To determine the influence of locations at the margin of the forest biome on model optimizations, a subset of sites centrally located within the deciduous forest biome was created (Melaas et al., 2016, Appendix S2 Table 2). This subset was optimized separately and compared to results for the complete deciduous broadleaf dataset. We assess proper convergence of the optimized parameters by initializing the optimizer, using 12 random sets of parameters. We report mean and standard deviations of the RMSE between observations and predictions on the optimized parameter values for all datasets. We compare model performance with a log transformed ANOVA, combined with a post hoc Tukey HSD test. Model errors are evaluated for normality, using a Shapiro–Wilk test.
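The statistical comparison outlined above can be sketched with functions from the base r stats package. The example below uses entirely synthetic RMSE values and model errors, generated only to show the mechanics of a log‐transformed ANOVA, a post hoc Tukey HSD test and a Shapiro–Wilk test; the three model labels and the 12 runs per model are illustrative choices, and the numbers carry no meaning for the study's results.

```r
# Synthetic RMSE values: 12 optimization runs for each of three models.
set.seed(7)
models <- c("NULL", "LIN", "TT")
runs <- data.frame(
  model = factor(rep(models, each = 12), levels = models),
  rmse  = c(rlnorm(12, meanlog = log(12),  sdlog = 0.05),
            rlnorm(12, meanlog = log(8),   sdlog = 0.05),
            rlnorm(12, meanlog = log(7.8), sdlog = 0.05))
)

# Log-transformed one-way ANOVA on model skill.
fit <- aov(log(rmse) ~ model, data = runs)
summary(fit)

# Post hoc pairwise comparison between models.
tukey <- TukeyHSD(fit)
print(tukey$model)

# Shapiro-Wilk normality test on synthetic (skewed) model errors.
errors <- rexp(60) - 1
sw <- shapiro.test(errors)
print(sw$p.value)
```

The rows of tukey$model list each pair of models with the estimated difference, its confidence bounds and an adjusted p value, which is the form in which comparisons against the NULL model would be read off.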

For illustrative purposes, we produce overview maps (Figure 5) of spatial patterns both in hindcast and forecast mode. In hindcast mode, we use 1 × 1 km Daymet gridded data across New England, while we present the difference in predicted spring phenology (ΔDOY) between years 2099 and 2011 for forecast CMIP5 IPSL‐CM5A‐MR (Mid‐Resolution Institut Pierre Simon Laplace Climate Model 5) model runs, across the contiguous US. The Thermal Time (TT) and Accumulated Growing Season Index models were optimized for deciduous broadleaf and grassland PhenoCam data, respectively. For forecast data only pixels dominated by their particular PFT (>50% coverage) are displayed, limiting a naively broad interpretation of the results.

Our comparison of 20 spring phenology models across three PFTs showed that most models were significantly different from the NULL model, with the exception of the SGSI model in the evergreen PFT (post hoc Tukey HSD test, p < .001, Figure 4). The model performance of the centrally located deciduous broadleaf sites was marginally better (RMSE: c. 7.6 ± 0.7 days) compared to the complete deciduous broadleaf dataset (RMSE: c. 7.9 ± 1.2 days). This difference between the full deciduous broadleaf forest dataset and a subset of more centrally located sites within the biome suggests an influence of geographic location on model error.

Figure 4. Model comparison for 20 spring phenology models and three main plant functional types in the PhenoCam Dataset V1.0 (deciduous broadleaf, evergreen needleleaf and grassland). An additional subset of the deciduous broadleaf dataset is created, matching the PhenoCam locations used in Melaas et al. (2016). All panels show boxplots of the root mean squared error of measured versus predicted budburst dates (in days) for the 20 models used in this study. Models accounting for ecodormancy influences are marked in orange. Models accounting for endo‐ and ecodormancy are marked in blue. The linear model is marked in black, while the NULL model is represented by the dashed horizontal line.

The influence of different model structures on individual values was visualized using the arrow plot (Figure 2) between two model runs. When visually comparing the Thermal Time (TT) and the Photothermal Time (PTT) models, small changes are noted (Figure 2). Both models accumulate growing degree days, where the PTT model adds a photoperiod component to the original TT model. In the PTT model, leaf unfolding is therefore in part dependent on a daylength requirement in addition to thermal forcing. In our example, the addition of a daylength requirement shifts model results for early and late developing plants towards earlier leaf out dates, while at the same time shifting mid‐season developing plants towards later leaf out dates. Despite these changes, the overall model accuracy remains the same. A description of all statistical results is provided in Appendix S1, Section 5.
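The structural difference between the two models can be made concrete with a small base r sketch. It is an illustration under assumptions, not the package code: the photoperiod component is represented by scaling daily forcing with the daylength fraction (Li / 24), daylength comes from a standard astronomical approximation for a made‐up latitude, temperatures are synthetic, and both models share the same base temperature and critical forcing rather than being optimized separately as in the worked example.

```r
# First step at which accumulated forcing reaches the critical value.
first_crossing <- function(forcing, F_crit) {
  hit <- which(cumsum(forcing) >= F_crit)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Approximate daylength in hours for a given day-of-year and latitude.
daylength <- function(doy, lat) {
  decl <- -23.44 * pi / 180 * cos(2 * pi * (doy + 10) / 365)
  24 / pi * acos(-tan(lat * pi / 180) * tan(decl))
}

days <- 1:365
Li   <- daylength(days, lat = 42.5)        # made-up latitude

# Synthetic daily mean temperatures: rows are days, columns are years.
set.seed(3)
Ti <- sapply(1:5, function(i) {
  10 - 15 * cos(2 * pi * (days - 15) / 365) + rnorm(365, sd = 3)
})

T_base <- 5
F_crit <- 200

# Thermal Time: plain growing degree day accumulation.
tt <- apply(Ti, 2, function(temp) {
  first_crossing(pmax(temp - T_base, 0), F_crit)
})

# Photothermal Time: forcing scaled by the daylength fraction.
ptt <- apply(Ti, 2, function(temp) {
  first_crossing((Li / 24) * pmax(temp - T_base, 0), F_crit)
})

# Per-year shift between models, as an arrow plot would display it.
shift     <- ptt - tt
direction <- ifelse(shift > 0, "delayed",
                    ifelse(shift < 0, "advanced", "unchanged"))
```

With identical parameters the daylength scaling can only slow the accumulation, so every shift here is a delay or zero. The shifts in both directions reported for the worked example arise because each model is optimized with its own parameter set.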

3 DISCUSSION

Here, we have demonstrated how the phenor r package and its included “model zoo”, together with consistent estimates of vegetation phenology through the PhenoCam network (e.g. phenocamr, Appendix S1) or other phenology data sources, can be leveraged for fast and transparent model comparisons. More so, easy access to various gridded data sources allows for quick spatial scaling of optimized models in both hindcast and forecast mode (Figure 5). For example, the code required to partially reproduce a study by Basler (2016) relied on a mere 15 r commands (see run_model_comparison.r in the phenor manuscript github repository), while the models used are easily readable and well documented. Furthermore, adding model structures is easy compared to other frameworks, which either rely on low‐level languages, are closed source or do not work cross‐platform (Brown et al., 2014; Chuine, Garcia Cortazar Atauri, Kramer, & Hänninen, 2013; Hänninen & Kramer, 2007). More so, reasonable processing times were recorded for our complete case study (c. 48 CPU hours on a recent desktop workstation), despite relying on a slower scripting language. Computational loads for data generation and processing at a global scale remained marginal. The case study demonstrates the ease with which we executed our model comparisons in phenor, corroborating previous studies and once more highlighting the limitations of current model structures in explaining year‐to‐year variability (Basler, 2016; Clark et al., 2014; Fisher et al., 2007; Linkosalo et al., 2006). This result therefore underscores the need for tools such as phenor to facilitate easy and transparent model development and comparison. Furthermore, new visualization methods such as the arrow plot might help in this process. The arrow plot (Figure 2) suggests that the assessment of model skill through summarizing values such as the RMSE seems suboptimal, hiding structural differences in model performance. The non‐normal distribution of model errors within all models (p < .001) suggests as much.

Figure 5. Overview map comparing various spatial outputs of the TT and AGSI model optimized to the deciduous broadleaf and grassland PhenoCam data, respectively. (a) phenor model output of the difference in estimates of spring phenology between the year 2099 and 2011 for 1/4th degree NASA Earth Exchange (NEX) global gridded CMIP5 Mid‐Resolution Institut Pierre Simon Laplace Climate Model 5 (IPSL‐CM5A‐MR) model runs, using the TT model parameterized on deciduous forest PhenoCam sites. Only pixels with more than 50% deciduous broadleaf or mixed forest cover per 1/4th degree pixel, using MODIS MCD12Q1 land cover data, are shown; (b) phenor model output of the difference in estimates of spring phenology between the year 2099 and 2011 for NEX CMIP5 IPSL‐CM5A‐MR model runs using the AGSI model parameterized on grassland PhenoCam sites. Only pixels with more than 50% grassland coverage per 1/4th degree pixel, using MODIS MCD12Q1 land cover data, are shown; (c) phenor model output for 11 Daymet gridded datasets (tiles) for the year 2011. TT, Thermal Time; AGSI, Accumulated Growing Season Index; CMIP5, Coupled Model Intercomparison Project 5.

4 KNOWN LIMITATIONS

We acknowledge that previous efforts have been made to provide phenology model frameworks (Brown et al., 2014; Chuine et al., 2013; Hänninen & Kramer, 2007). Yet, their use, interoperability and scalability are limited due to platform restrictions or their closed source nature. We recognize that the models as currently presented are by no means an exhaustive list of all model structures found in literature. However, the phenor r package is open source, and adding model structures is easy compared to frameworks written in low‐level languages (e.g. C/Fortran) and is actively encouraged. We are aware that the phenor r package does not include all possible climatological drivers or phenological calibration/validation data either, although we stress that access to four larger freely available phenological data sources is provided by the phenor r package. Similarly, other sources of climate data, such as the ERA‐interim reanalysis data (Dee et al., 2011), could be integrated as long as the described data structure is followed.

5 CONCLUSION

Accurately representing vegetation phenology, a first‐order control on ecosystem productivity, under future conditions is key in understanding feedback between the climate and the biosphere. Here, we demonstrated the advantages of the phenor r package and modelling framework, through a worked example, by quickly and easily comparing 20 spring phenology models and their model skill for three plant functional types. Our results corroborate previous analyses, showing little or no difference in predictive power between models, which suggests that convenient tools for further analysis or novel model development are needed to capture current and future phenological changes as well as their underlying physiological processes. We hope the phenor phenology modelling framework in r will allow for a better integration of observational and experimental data, providing opportunities to better understand the environmental factors driving seasonality, and past and future responses of vegetation to climate change and variability.

ACKNOWLEDGEMENTS

The Richardson Lab acknowledges support from the NSF Macrosystems Biology programme (awards EF‐1065029 and EF‐1702697). D.B. acknowledges the Harvard Forest Bullard Fellowship programme. K.H. acknowledges support from the LabEx COTE MicroMic project and BELSPO Brain programme (project BR/175/A3/COBECORE). We thank ORNL, the Daymet team and Michele M. Thornton for the continued support in developing daymetr. We thank the World Climate Research Program's Working Group on Coupled Modelling, which is responsible for CMIP, and the climate modelling groups for producing and making available their model output. We are grateful to the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison, which provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. Climate scenarios used were from the NEX‐GDDP dataset, prepared by the Climate Analytics Group and NASA Ames Research Center using the NASA Earth Exchange, and distributed by the NASA Center for Climate Simulation (NCCS). We thank the NASA Earth Exchange project for making these data available. Finally, we thank our many collaborators, including site PIs and technicians, for their efforts in support of PhenoCam.

AUTHORS' CONTRIBUTIONS

K.H. conceived and designed all three packages with contributions from D.B.; T.M. developed the application programming interface queried by phenocamr; K.H. analysed the data; K.H., D.B., E.M., T.M. and A.D.R. interpreted the results. K.H. drafted the manuscript. All authors commented on and approved the final manuscript.

CODE AND DATA AVAILABILITY

All three packages, phenor (https://doi.org/10.5281/zenodo.1145369), phenocamr (https://doi.org/10.5281/zenodo.1136008) and daymetr (https://doi.org/10.5281/zenodo.1136231), can be found on the author's personal GitHub (http://github.com/khufkens) and are easily installed and loaded in the r statistical software using the following commands. [Code listing supplied as an image in the original; not reproduced here.]
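As an illustration only, installation from GitHub might look like the sketch below. It assumes that each repository under the khufkens account carries the same name as its package, and that the devtools package is used for installation. Neither detail is stated in the text, and the helper `install_and_load()` is hypothetical, not part of any of the three packages.

```r
# Assumption: repository names mirror the package names under
# github.com/khufkens; this is not confirmed by the text above.
pkgs  <- c("phenor", "phenocamr", "daymetr")
repos <- paste0("khufkens/", pkgs)

# Hypothetical helper: install any missing package from GitHub, then load it.
install_and_load <- function(repos) {
  # devtools is only installed when it is not yet available
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  for (repo in repos) {
    pkg <- basename(repo)  # package name assumed equal to repository name
    if (!requireNamespace(pkg, quietly = TRUE)) {
      devtools::install_github(repo)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(repos)
}

# Requires network access; uncomment to run.
# install_and_load(repos)
```

Wrapping the steps in a function keeps the sketch free of side effects until it is called explicitly, and the `requireNamespace()` checks avoid reinstalling packages that are already present.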

The data used in the worked examples are freely available as a curated dataset (Richardson et al., 2018). Manuscript data and figures can be generated using the r scripts listed in the manuscript repository (https://github.com/khufkens/phenor_manuscript/, https://doi.org/10.5281/zenodo.1136233).
