Volume 14, Issue 6 p. 1424-1431
PRACTICAL TOOLS
Open Access

iPhenology: Using open-access citizen science photos to track phenology at continental scale

Yves P. Klinger (Corresponding Author)

Division of Landscape Ecology and Landscape Planning, Justus Liebig University Gießen, Gießen, Germany

R. Lutz Eckstein

Department of Environmental and Life Sciences, Biology, Karlstad University, Karlstad, Sweden

Till Kleinebecker

Division of Landscape Ecology and Landscape Planning, Justus Liebig University Gießen, Gießen, Germany

First published: 23 April 2023
Handling Editor: Hooman Latifi

Abstract

  1. Photo observations are a highly valuable but rarely used source of citizen science (CS) data. Recently, the number of publicly available photo observations has increased sharply, partly due to the use of smartphone applications for species identification. This has made it possible to gain ecological insights into poorly studied subjects. One of the fields with the highest potential to benefit from the use of photo observations is phenology.
  2. We propose a workflow for iPhenology, the use of publicly available photo observations to track phenological events at large scales. The workflow comprises data acquisition, cleaning of observations, phenological classification and modelling of spatiotemporal patterns of phenology. We explore the suitability of iPhenology for observing key phenological stages in the reproductive cycle of a model species and discuss limitations and future prospects of the approach, using the example of an invasive species in Europe.
  3. We show that iPhenology is suitable to track key phenological events of widespread species. However, the number and quality of available observations may differ among species and phenological stages.
  4. Overall, publicly available CS photo observations are suitable for tracking key phenological events and can thus significantly advance knowledge of the timing and drivers of plant phenology. In future, integrating the workflow with automated image processing and analysis may enable real-time tracking of plant phenology.

Abstract (German version)

  1. Photo observations are an extremely valuable but rarely used source of citizen science data. Recently, the number of publicly available photo observations has increased sharply, partly due to the use of smartphone apps for species identification. This development makes it possible to gain insights into poorly studied areas of ecology. Phenology is one of the fields with the greatest potential to benefit from the use of photo observations.
  2. We describe a workflow for iPhenology, the use of publicly accessible photo observations to analyse phenological events. The workflow comprises data acquisition, cleaning of observations, phenological classification and the modelling of spatiotemporal patterns. Using a model species, we examine the suitability of iPhenology for observing different phenological events in the plant reproductive cycle and discuss current limitations and future fields of application of the approach.
  3. We show that publicly available photo observations are suitable for analysing key phenological events, such as flowering or fruit ripening, of widespread species. However, the number and quality of available observations can differ strongly depending on species and phenological phase.
  4. Photo observations are suitable for analysing important phenological events and can thus considerably expand knowledge of the timing and the drivers of phenological phases. In future, integrating the iPhenology workflow with tools for automated image processing and analysis could enable real-time observation of phenological events.

INTRODUCTION

Citizen science (CS) comprises the participation of nonprofessional volunteers in scientific projects or investigations. Over recent decades, CS has enabled data collection at unprecedented scales (Dickinson et al., 2012). Among the different types of data collected by citizen scientists, photo observations are an invaluable but underused source of research data (Depauw et al., 2022). Recently, the number of photos potentially available for ecological research has increased strongly, mainly due to the use of smartphone applications for species identification, such as iNaturalist (inaturalist.org), Pl@ntNet (plantnet.org) or Flora Incognita (Mäder et al., 2021). These applications have become widely popular in recent years, with more than 12 million downloads for the apps mentioned above alone. This has drastically increased both the number and the range of photo observations. Many photos are (semi-)automatically used to populate databases such as the Global Biodiversity Information Facility (GBIF; gbif.org). This upsurge makes it possible to gain ecological insights into relatively poorly studied subjects and processes.

Phenology remains one of the most understudied aspects of plant functional ecology (Garnier et al., 2016). The ability to reach pivotal reproductive stages such as bud burst, flowering and seed production under differing climates is essential for a species to persist across its geographical range. Consequently, studies on the latitudinal distribution of species can give valuable information on their climatic niches and are crucial for predicting future range shifts under a warming climate. Despite their relevance, empirical phenological studies across large geographical scales are limited (but see Ludewig et al., 2022; Nordt et al., 2021), mainly because such studies require frequent, simultaneous observations across different latitudes. This is very time-intensive and usually requires the cooperation of scientists from many different institutions, which makes observations costly and hard to organize. Accordingly, such projects can only observe a limited number of individuals, species and ecosystems, often under relatively artificial conditions. Because of these limitations, phenology is one of the research fields that can benefit most from the use of CS photo observations.

Despite their potential for phenology (Depauw et al., 2022), few studies have explored the use of CS photo observations to track phenological events across large scales (but see Puchałka et al., 2022; Reeb et al., 2022). As CS data collected without guidance may be biased in their spatial and temporal distribution (Pötzelsberger et al., 2021; Tiago et al., 2017), researchers interested in phenology may find it difficult to deal with the challenges arising from the use of CS data. Here, we propose a workflow for iPhenology, the use of publicly available photos to track phenological events (following the iEcology definition of Jarić et al., 2020). In the workflow, we check the data for frequently occurring biases, provide an example of phenological classification and model the spatiotemporal patterns of two key phenological stages. We use the herbaceous species Lupinus polyphyllus to demonstrate the workflow of iPhenology. The species is particularly suitable, as it is widespread, has prominent flowers and fruits, and is easy to identify due to the lack of native relatives in Europe (Eckstein et al., 2023). Using photo observations, we track the flowering and fruiting stages in the species' invaded range across Europe. In particular, we address the following questions:
  1. Can key phenological stages be tracked at the continental scale using iPhenology?
  2. What are the current limitations and future prospects of iPhenology?

METHODS

The workflow for iPhenology comprises the acquisition of photo observations, pre-processing and phenological classification (Figure 1). The resulting phenological observations support a wide range of analyses, for example, assessing spatiotemporal patterns or modelling climatic drivers of phenology (see e.g. Puchałka et al., 2022).

Figure 1. Proposed workflow for iPhenology. First, observations are pre-processed by removing problematic observations and, if necessary, reducing spatial aggregation. Second, photos are checked for correct identification and suitability before being classified. Unsuitable or misidentified photos are removed. The resulting phenological observations have many potential uses.

Image acquisition and pre-processing

We acquired data from GBIF (gbif.org). The GBIF query was restricted to human observations of L. polyphyllus with at least one image. It covered the distribution of the species across Central and Northern Europe and included photo observations from −4.6° to 41.3° longitude and from 42.3° to 69.7° latitude for the years 2018–2021. The observations originate from several sources (e.g. iNaturalist Research-grade observations, Pl@ntNet observations, Swedish Species Observation Service records and others). The full original dataset contains 8429 observations and is available at https://doi.org/10.15468/dl.jnxvnn; classified observations are available at https://doi.org/10.5061/dryad.h70rxwdpk.
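
The query was run through GBIF itself, but its criteria can also be written as a local filter, for example to re-check a downloaded dataset. The following Python sketch is illustrative only and is not the authors' code. It assumes that each record is a dict with GBIF/Darwin Core-style keys (decimalLongitude, decimalLatitude, year, basisOfRecord) and a media list whose entries carry a type such as "StillImage", and that the records are already restricted to L. polyphyllus. The function name matches_query, the example coordinates and the example URLs are hypothetical.

```python
# Hypothetical local re-implementation of the query criteria (not the authors' code).
# Assumes Darwin Core-style keys and records already restricted to L. polyphyllus.

LON_MIN, LON_MAX = -4.6, 41.3   # longitude bounds from the query
LAT_MIN, LAT_MAX = 42.3, 69.7   # latitude bounds from the query
YEARS = range(2018, 2022)       # 2018-2021 inclusive


def matches_query(record: dict) -> bool:
    """Return True for a human observation with an image inside the query window."""
    lon = record.get("decimalLongitude")
    lat = record.get("decimalLatitude")
    if lon is None or lat is None:
        return False
    has_image = any(m.get("type") == "StillImage" for m in record.get("media", []))
    return (
        record.get("basisOfRecord") == "HUMAN_OBSERVATION"
        and has_image
        and LON_MIN <= lon <= LON_MAX
        and LAT_MIN <= lat <= LAT_MAX
        and record.get("year") in YEARS
    )


# Example records (coordinates and URLs are made up for illustration).
records = [
    {"decimalLongitude": 8.68, "decimalLatitude": 50.58, "year": 2020,
     "basisOfRecord": "HUMAN_OBSERVATION",
     "media": [{"type": "StillImage", "identifier": "https://example.org/a.jpg"}]},
    {"decimalLongitude": 8.68, "decimalLatitude": 50.58, "year": 2017,  # outside 2018-2021
     "basisOfRecord": "HUMAN_OBSERVATION",
     "media": [{"type": "StillImage", "identifier": "https://example.org/b.jpg"}]},
    {"decimalLongitude": 13.50, "decimalLatitude": 59.38, "year": 2021,  # no image
     "basisOfRecord": "HUMAN_OBSERVATION", "media": []},
]
kept = [r for r in records if matches_query(r)]
print(len(kept))  # 1
```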

We used the CoordinateCleaner package (Zizka et al., 2019) to identify and remove observations with problematic coordinates (e.g. coordinates locked to centroids of administrative areas or located in the ocean). Furthermore, observations with a coordinate uncertainty of >20 km were removed from the dataset, as were observations with identical coordinates. After cleaning the data, we used a custom R script to access the links to the images in each GBIF observation (Appendix S1). Images were then manually classified according to their phenology using a modified version of the R Shiny application provided in the appendix of Puchałka et al. (2022). Classification of ~8000 images took approximately one working day (8 h) each for a researcher and a student assistant experienced in phenological classification.
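
The two explicit cleaning rules and the image-link step can be sketched in a few lines. This hypothetical Python sketch does not reproduce the CoordinateCleaner tests. It assumes the Darwin Core field coordinateUncertaintyInMeters, keeps records that report no uncertainty, and interprets "identical coordinates" as keeping the first record at each coordinate pair and dropping later ones. It returns the first still image of each record, since the first image was the one checked for suitability. The helper names clean and first_image_url and the example URLs are illustrative.

```python
from typing import Iterable, List, Optional

MAX_UNCERTAINTY_M = 20_000  # observations with >20 km coordinate uncertainty are removed


def clean(records: Iterable[dict]) -> List[dict]:
    """Drop imprecise records and records at an already-seen coordinate pair.

    Assumptions: records without coordinateUncertaintyInMeters are kept, and the
    first record at each coordinate pair is retained while later ones are dropped.
    """
    kept, seen = [], set()
    for r in records:
        unc = r.get("coordinateUncertaintyInMeters")
        if unc is not None and unc > MAX_UNCERTAINTY_M:
            continue
        key = (r["decimalLongitude"], r["decimalLatitude"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


def first_image_url(record: dict) -> Optional[str]:
    """Return the link to the first still image of a record, or None if there is none."""
    for m in record.get("media", []):
        if m.get("type") == "StillImage" and m.get("identifier"):
            return m["identifier"]
    return None


records = [
    {"decimalLongitude": 8.68, "decimalLatitude": 50.58, "coordinateUncertaintyInMeters": 50,
     "media": [{"type": "StillImage", "identifier": "https://example.org/1.jpg"},
               {"type": "StillImage", "identifier": "https://example.org/2.jpg"}]},
    {"decimalLongitude": 8.68, "decimalLatitude": 50.58, "coordinateUncertaintyInMeters": 10,
     "media": []},  # same coordinates as the first record -> dropped
    {"decimalLongitude": 13.50, "decimalLatitude": 59.38, "coordinateUncertaintyInMeters": 25_000,
     "media": []},  # uncertainty > 20 km -> dropped
    {"decimalLongitude": 10.40, "decimalLatitude": 63.43, "coordinateUncertaintyInMeters": None,
     "media": [{"type": "Sound", "identifier": "https://example.org/3.mp3"}]},
]
cleaned = clean(records)
print(len(cleaned))                             # 2
print([first_image_url(r) for r in cleaned])  # ['https://example.org/1.jpg', None]
```

Checking uncertainty before duplicates means an imprecise record never blocks a later, precise record at the same coordinates.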

For phenological classification, we used photo observations showing a full stand, plant or inflorescence. If multiple images were associated with one observation, we checked the first image for suitability. We originally distinguished seven phenological stages based on flowering and fruiting phenology, which were then aggregated into four main stages (vegetative, flowering, fruiting and open pods; Table S1). Classification was based on the majority of flowers; individuals were classified as flowering and fruiting at the same time if both flowers and fully developed fruits were visible. Observations were assigned to the class open pods when any opened pods were visible. Images with low resolution, images displaying only single leaves or small parts of inflorescences, and images showing personal information or people were excluded from the analysis. They were classified either as unsuitable for phenological classification or as misidentified if no specimen of L. polyphyllus could be found in the image.
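
The decision rules above can be expressed as an ordered set of checks. In this hypothetical Python sketch, the boolean fields stand in for the classifier's visual judgement of the first image. The seven detailed stages of Table S1 and the majority-of-flowers criterion are not reproduced, and letting visible open pods take precedence over all other stages is an assumption based on the "any opened pods" rule. The names PhotoAssessment and main_stage are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PhotoAssessment:
    """Hypothetical record of what a classifier sees in the first image of an observation."""
    shows_target_species: bool  # is an individual of L. polyphyllus visible?
    suitable: bool              # resolution/framing allow classification, no people visible
    flowers: bool               # flowers visible
    developed_fruits: bool      # fully developed fruits visible
    open_pods: bool             # any opened pods visible


def main_stage(a: PhotoAssessment) -> str:
    """Map an assessment to a main stage (or exclusion class), checked in a fixed order."""
    if not a.shows_target_species:
        return "misidentified"
    if not a.suitable:
        return "unsuitable"
    if a.open_pods:  # assumption: any open pod overrides the other stages
        return "open pods"
    if a.flowers and a.developed_fruits:
        return "flowering and fruiting"
    if a.flowers:
        return "flowering"
    if a.developed_fruits:
        return "fruiting"
    return "vegetative"


print(main_stage(PhotoAssessment(True, True, True, True, False)))    # flowering and fruiting
print(main_stage(PhotoAssessment(True, True, False, False, False)))  # vegetative
print(main_stage(PhotoAssessment(False, True, True, False, False)))  # misidentified
```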

Data analysis

For the analysis of spatiotemporal patterns of flowering and fruiting phenology, observations were dummy-coded to represent flowering (classes 2–5 in Table S1) vs. non-flowering (all others) and fruiting (class 6) vs. non-fruiting (all others) individuals. We used generalized additive models (GAMs) for binomially distributed response data, fitted by restricted maximum likelihood (REML), to model the probability of flowering/fruiting of L. polyphyllus. GAMs provide a highly flexible framework for exploratory data analysis and can be used to model both spatial and temporal patterns in complex datasets (Wood, 2017). We included smoothers for the day of the year (doy), the coordinates (long/lat) and the interaction between both as explanatory variables. Year was included as a factor to account for yearly variation in phenology. To address seasonal patterns emerging from phenological data, we included fixed knots for doy at the start and the end of each year. Elevation above sea level (GMTED, 2010) and whether observations were located in urban heat islands, that is, areas with a distinct urban microclimate (CIESIN-Columbia University, 2016), were initially included but later removed, as they had no effect on flowering or fruiting phenology. Model assumptions were checked using the gam.check function. Analyses were performed using the mgcv package (Wood, 2017) in R 4.1.3 (R Core Team, 2022); plots were created using the ggplot2 package (Wickham, 2016).
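
One plausible way to write the model structure described above is sketched below. It assumes an additive decomposition into a main doy effect, a spatial effect and a doy × space interaction; the symbols β and f₁–f₃ are ours and are not taken from the fitted models. The end-point condition on f₁ is our reading of the fixed doy knots at the start and end of the year (a cyclic smoother); smoothing parameters are estimated by REML.

```latex
\begin{aligned}
y_i &\sim \operatorname{Bernoulli}(p_i), \qquad
y_i = \begin{cases}
1 & \text{flowering model: classes 2--5;\ fruiting model: class 6}\\
0 & \text{otherwise}
\end{cases}\\[4pt]
\operatorname{logit}(p_i) &= \beta_{\mathrm{year}(i)}
  + f_1(\mathrm{doy}_i)
  + f_2(\mathrm{lon}_i, \mathrm{lat}_i)
  + f_3(\mathrm{doy}_i, \mathrm{lon}_i, \mathrm{lat}_i)\\[4pt]
f_1(d_{\mathrm{start}}) &= f_1(d_{\mathrm{end}}), \qquad
f_1'(d_{\mathrm{start}}) = f_1'(d_{\mathrm{end}})
\end{aligned}
```

Here β_year(i) is the fixed effect of the observation year, and f₁ to f₃ are penalized smooth functions.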

RESULTS

Spatiotemporal patterns

Suitable photo observations of L. polyphyllus (n = 5780) showed a distinct temporal pattern with a summer peak in June (Figure 2a), following the phenological cycle of the species (Figure 2b). The number of photo observations of L. polyphyllus increased almost ninefold, from 185 in 2018 to 1646 in 2020, and stayed at a similar level in 2021 (Figure 2a). Within the year, most observations were made in spring and summer (May–July), when the species is flowering. Accordingly, more than half of the photo observations showed flowering (2958) or flowering and fruiting (640) individuals, whereas vegetative (1290) and fruiting (484) specimens and plants with open pods (408) were observed less frequently (Figures 2b and 3, Table S2). Almost half of the suitable observations (48%) were made in urban heat islands (CIESIN-Columbia University, 2016). Unsuitable observations of L. polyphyllus (n = 1570) were mostly due to low image quality, images not allowing phenological classification, or the specimen having been removed from its habitat. Only very few (23) observations were misidentified, that is, they did not show an individual of L. polyphyllus.
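
The counts reported above can be cross-checked with simple arithmetic: the five main stages add up to the 5780 suitable observations, flowering alone accounts for just over half of them, flowering plus flowering-and-fruiting accounts for about 62%, and the increase from 2018 to 2020 corresponds to a factor of about 8.9. A short Python check using only the numbers given in the text and in Figure 2b:

```python
# Suitable observations per main stage, as reported in the Results and Figure 2b.
stage_counts = {
    "vegetative": 1290,
    "flowering": 2958,
    "flowering and fruiting": 640,
    "fruiting": 484,
    "open pods": 408,
}

total = sum(stage_counts.values())
flowering_any = stage_counts["flowering"] + stage_counts["flowering and fruiting"]
fold_increase = 1646 / 185  # observations in 2020 vs. 2018

print(total)                                        # 5780
print(round(stage_counts["flowering"] / total, 3))  # 0.512
print(round(flowering_any / total, 3))              # 0.622
print(round(fold_increase, 1))                      # 8.9
```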

Figure 2. Temporal patterns of photo observations of L. polyphyllus. (a) Monthly number of photo observations for the years 2018–2021 (n = 5780). The number of observations increased between 2018 and 2020, and most observations were made in June, when the species flowers in Europe. (b) Box-and-whisker plot showing the timing of observations classified into the main phenological stages vegetative (n = 1290), flowering (n = 2958), flowering and fruiting (n = 640), fruiting (n = 484) and open pods (n = 408). Boxes show the median and interquartile range (IQR); whiskers extend up to 1.5 × IQR.
Figure 3. Spatial distribution of CS observations of different phenological stages, 2018–2021. CS observations mostly comprise (b) flowering or (c) flowering and fruiting individuals, whereas (a) vegetative and (d) fruiting individuals are observed less frequently. Senescent plants with open seed pods (e) are observed least often.

Flowering and fruiting phenology

According to the model predictions, flowering of L. polyphyllus begins in southwestern Europe and gradually extends further north (Figure 4). Additionally, southern populations of L. polyphyllus have either an extended or a second flowering period in July–August. Flowering probability of L. polyphyllus changed with doy (p < 0.001) and longitude/latitude (p < 0.001), and there was a significant interaction between both (p < 0.001; adjusted R² = 0.55). There were significant but minor differences in overall flowering probability between the three years (2019–2021), with 2021 having lower flowering probabilities than 2019 and 2020. Fruiting of L. polyphyllus was characterized by a single peak between doy 180 and 220, depending on the location. As for flowering, the GAM identified a distinct pattern of fruiting with changing doy (p < 0.001) and longitude/latitude (p < 0.001; Figure 5). According to the model, fruiting probabilities were lower than flowering probabilities, and the model explained less of the variation in the data (adjusted R² = 0.3).

Figure 4. Predicted flowering probabilities of L. polyphyllus across Europe. Flowering occurs between May and October. Northern populations flower later and for a shorter period than southern populations, which can have a second flowering period in August. Model predictions are shown for 2021.
Figure 5. Predicted fruiting probabilities of L. polyphyllus across Europe. Fruiting occurs between June and October. Northern populations fruit later and for a shorter period than southern populations. Compared with flowering, fruiting probability reaches a lower maximum (~0.8), as fewer fruiting specimens were observed. Model predictions are shown for 2021.

DISCUSSION

Our results confirm that iPhenology, the use of publicly available CS photo observations, is suitable for tracking plant phenology across large geographical scales. Prominent and widespread species are likely to be most suitable for the approach. However, depending on the ecology of the target species and the behaviour of citizen scientists, the extent to which different phenological phases are observed, and the suitability of photos for phenological classification, may differ widely. For example, Puchałka et al. (2022) found that for the forest herb Anemone nemorosa, most observations showed flowering individuals, and the available CS data were less suitable for tracking other phenological stages such as fruiting. Unlike other CS data, photo observations can be checked for correct identification and suitability before further processing. For our model species, most images were suitable for phenological classification, and photo observations covered the distributional range of L. polyphyllus across Europe (cf. Eckstein et al., 2023). Thus, they can be considered representative of the species' geographical distribution.

Despite the spatial aggregation of observations in densely populated areas, which is typical for CS observations (Speed et al., 2018), more than half of the observations were made outside highly urbanized areas. However, opportunistic CS activities have been shown to be prone to further spatial biases, such as being more frequent in more accessible areas (e.g. with a higher density of roads and footpaths; Tiago et al., 2017) or, for some species groups, in protected areas (Girardello et al., 2019). For plants, Tiago et al. (2017) also found a highly significant positive relationship between the number of CS observations and the cover of forests and other (semi-)natural habitats. Since the habitat-specific species pools of these habitats are large (e.g. Jiménez-Alfaro et al., 2018; Pärtel et al., 2005) and many widespread non-native species occur in anthropogenic habitats such as roadsides (Meyer et al., 2021), we suggest that opportunistic CS observations may provide robust representations of the distributions of forest and grassland plants as well as of many invasive species. To address potential issues, methods to identify spatial bias in the data, such as cleaning dubious coordinates (Zizka et al., 2019), may be necessary, although suitable tests and thresholds strongly depend on the data at hand (Zizka et al., 2020). In our workflow, we use GAMs to model the timing of flowering and fruiting across Europe. GAMs provide a flexible framework for modelling complex relationships and have been widely used for ecological data (Wood, 2017). Alternatively, tools to reduce spatial aggregation, such as spatial thinning (Aiello-Lammens et al., 2015) and spatial regression models (Kühn, 2007), may help address oversampling in densely populated areas, but they require informed choices, as they can also negatively affect the performance of ecological models (Steen et al., 2021).
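
As an illustration of spatial thinning, the following Python sketch keeps an observation only if it lies at least a minimum great-circle distance from all previously kept observations. It is a simple greedy variant written for illustration; it is not the spThin algorithm of Aiello-Lammens et al. (2015), which uses repeated randomized runs to retain as many records as possible, and the 5 km threshold and example coordinates are arbitrary. Because the greedy result depends on input order, both the threshold and the ordering are among the informed choices mentioned above.

```python
import math


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def thin(points, min_km):
    """Greedy thinning: keep a (lon, lat) point only if it is >= min_km from all kept points."""
    kept = []
    for lon, lat in points:
        if all(haversine_km(lon, lat, klon, klat) >= min_km for klon, klat in kept):
            kept.append((lon, lat))
    return kept


points = [
    (8.68, 50.58),   # Gießen
    (8.69, 50.585),  # roughly 1 km from the first point
    (13.50, 59.38),  # Karlstad
]
print(thin(points, min_km=5))  # [(8.68, 50.58), (13.5, 59.38)]
```
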

Despite the potential of iPhenology, our results clearly show that not all phenological phases are observed to the same extent. Furthermore, less accessible areas and ecosystems may be underrepresented in the dataset, and for species not typically found in anthropogenic habitats, spatial bias may be more problematic than for our model species.

Concerning temporal patterns, observations of L. polyphyllus clearly followed its phenological cycle (Ludewig et al., 2022). Temporal patterns in iEcology data have been shown to follow human interactions with species, for example, for Google searches on ticks in Denmark (Jensen et al., 2022). These patterns are probably amplified for species observations made with mobile applications, where the observation process is part of the interaction. Furthermore, plant species are mostly observed during summer, as shown, for example, for Norway (Speed et al., 2018), when many plants in temperate regions are flowering and more people are outdoors for recreational activities. This finding is confirmed by our dataset, in which more than half of the observations show flowering individuals. Once spatiotemporal patterns and potential biases in the available observations have been identified, targeted CS assessments of underrepresented species, areas or phenological phases may help address such issues (Aavik et al., 2020).

Potential for future applications

iPhenology, the observation of phenological events using publicly available CS photo observations, is a highly promising approach to advance phenological research for many widespread species. Potential fields of application include comparing expert-based phenology data with CS data, modelling climatic drivers of phenology using CS observations and determining the right timing for the management of invasive alien species based on their phenology. In future, phenological classification of CS photos using deep learning may allow automated real-time assessments of phenological events for a vast number of species.

AUTHOR CONTRIBUTIONS

Yves P. Klinger conceptualized the study, performed the data analysis and led the manuscript preparation with input from Till Kleinebecker and R. Lutz Eckstein. All authors edited the manuscript.

ACKNOWLEDGEMENTS

We are grateful to the several hundred citizen scientists who contributed to this study. We thank two anonymous reviewers for their insightful comments on a previous version of the manuscript. We are very grateful to A. Horn, L. Degott and H. Paikert for invaluable help with data mining and phenological classification and to M. Spenner for drawings of phenological phases. Open Access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST STATEMENT

None.

DATA AVAILABILITY STATEMENT

All data used for this study are publicly available in the Global Biodiversity Information Facility (gbif.org; https://doi.org/10.15468/dl.jnxvnn); classified observations are available at https://doi.org/10.5061/dryad.h70rxwdpk.