Volume 10, Issue 8
APPLICATION

AeRobiology: The computational tool for biological data in the air

Jesús Rojo

Corresponding Author

E-mail address: jesus.rojo.ubeda@gmail.com

Institute of Environmental Sciences (Botany), University of Castilla‐La Mancha, Toledo, Spain

Center of Allergy & Environment (ZAUM), Member of the German Center for Lung Research (DZL), Technical University of Munich/Helmholtz Center, Munich, Germany


Antonio Picornell

Department of Plant Biology, University of Malaga, Malaga, Spain

Jose Oteros

Center of Allergy & Environment (ZAUM), Member of the German Center for Lung Research (DZL), Technical University of Munich/Helmholtz Center, Munich, Germany

First published: 06 May 2019
Citations: 20

Abstract

  1. Aerobiological databases are constantly increasing. Many of them contain long and extensive time series of data which are very difficult and tedious to manage.
  2. The development of new real‐time automatic sampling devices also requires new tools to reduce the time needed for calculations and data management. To this end, the AeRobiology R package has been implemented to accelerate and facilitate these tasks.
  3. This package was structured in three sections based on (a) the checking of the database, (b) calculation of the main aerobiological indexes and (c) visualization of the results.
  4. The AeRobiology package contains numerous functions which, in conjunction, solve the main general tasks that scientists must carry out when analysing biological data.
  5. The package is freely distributed under GNU General Public License and can directly be installed from CRAN (http://cran.r-project.org/). The reference manual is available at https://cran.r-project.org/web/packages/AeRobiology. Contact: aerobiology.package@gmail.com.

1 INTRODUCTION

The analysis of the biotic content of the troposphere is of interest from ecological, agronomic and medical points of view (Oteros et al., 2019; Recio et al., 2018; Romero‐Morte, Rojo, Rivero, Fernández‐González, & Pérez‐Badia, 2018). Aerobiology is the scientific field that studies the abundance and dynamics of bioaerosols (pollen, spores, bacteria, viruses, etc.) in the air. Different devices have been used all over the world for monitoring airborne biological particles, but over recent decades a standardized methodology based on Hirst‐type volumetric traps has been established in Europe (Buters et al., 2018; Galán et al., 2014). The foundation of aerobiological monitoring networks in Europe and other parts of the world during recent decades has increased the number of scientific publications on aerobiological topics (Beggs, Šikoparija, & Smith, 2017).

New aerobiological methods based on automatic sampling devices have been developed in recent years, and further devices are being improved (Crouzy, Stella, Konzelmann, Calpini, & Clot, 2016; Kawashima et al., 2017; Oteros et al., 2015). Such novel techniques will require new methods for analysing the huge amount of real‐time aerobiological data they provide (Ghitarrini et al., 2018). In any case, both conventional and real‐time methods need faster techniques for analysing aerobiological data. Programming in an environment such as R (R Core Team, 2018) is an efficient way to get from raw data to interpretable results (e.g., Big Data analysis, generation of reports or development of forecasting models). Further advantages of using programming for data science are the reproducibility of the results and the robustness of the analysis against human error.

Some functionalities, such as the definition of the pollen season or the replacement of outliers in aerobiological data, were previously implemented in R by the pollen package (Nowosad, 2018). Other packages related to air pollution analysis, such as openair, are helpful for modelling (Carslaw & Ropkins, 2012). The forecast package is also useful for general statistical methods (e.g., time series analysis) (Hyndman & Khandakar, 2008). However, a new tool was needed to manage and visualize aerobiological data, covering tasks such as checking aerobiological time series, filling gaps in databases, performing the main calculations and producing multiple visualizations of the results. With these objectives, the AeRobiology package has been developed for R (R Core Team, 2018).

2 STRUCTURE OF THE AeRobiology PACKAGE

The functions of the package AeRobiology are structured in three sections based on their functionality (Table 1): (a) checking the data quality, (b) data analysis and (c) visualization of the results.

Table 1. List of main functions included in the AeRobiology package

Section | Function | Description
Checking the data quality | quality_control | Checking the quality of a historical database of several types of bioaerosol
Checking the data quality | interpollen | Filling the gaps of missing data according to different implemented interpolation methods
Data analysis | calculate_ps | Estimation of the main parameters of the pollen or spore season
Data analysis | pollen_calendar | Pollen calendar using different commonly used methods from a historical pollen database
Data analysis | analyse_trend | Trend analysis for the main seasonal indexes of the reproductive season (start‐dates, peak‐dates, end‐dates and amount of bioaerosols)
Visualization of the results | iplot_abundance | Plot of the relative abundance of the bioaerosols in the air
Visualization of the results | iplot_pheno | Phenological plot based on phenological parameters such as start‐ and end‐dates of the reproductive season
Visualization of the results | iplot_pollen | Interactive plotting of pollen data (one reproductive season)
Visualization of the results | iplot_years | Interactive plotting of pollen data (one type of bioaerosol)
Visualization of the results | plot_ps | Plotting the reproductive season of a single type of bioaerosol
Visualization of the results | plot_normsummary | Plotting the amplitude of several reproductive seasons
Visualization of the results | plot_summary | Plotting the data during several reproductive seasons
Visualization of the results | plot_trend | Calculating and plotting trends from a historical pollen database
Visualization of the results | plot_hour | Intradiurnal analysis and plotting of the hourly patterns of pollen concentration
Visualization of the results | plot_heathour | Graphical representation of the hourly patterns of pollen concentration as a heatplot

2.1 Checking the data quality

The functions of the package take as input a database of airborne biological particles structured as a time series, that is, different types of bioaerosol measured along a time sequence. The dataset must have the dates in the first column and one column per bioaerosol type in the remaining columns. The AeRobiology package is expected to be applied mainly to databases of biological data in the air, although several general functionalities, such as quality control or interpolation of time series, could also be applied to other environmental data (e.g., meteorological time series).
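The input layout described above can be illustrated with a minimal base R sketch. The object name toy_pollen, the helper season_curve and all values are hypothetical and invented for illustration; they are not part of the package or of the Munich dataset.

```r
# Hypothetical toy dataset in the layout described above:
# first column = dates, remaining columns = one bioaerosol type each.
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
doy <- as.integer(format(dates, "%j"))  # day of the year (1-365)

# Synthetic bell-shaped season (invented values, illustration only).
season_curve <- function(doy, peak, width, height) {
  round(height * exp(-((doy - peak)^2) / (2 * width^2)), 1)
}

toy_pollen <- data.frame(
  date    = dates,
  Betula  = season_curve(doy, peak = 110, width = 8,  height = 400),
  Poaceae = season_curve(doy, peak = 170, width = 20, height = 120)
)

# Basic structural checks before any analysis:
# the first column must hold dates, all other columns must be numeric.
first_is_date <- inherits(toy_pollen[[1]], "Date")
rest_numeric  <- all(vapply(toy_pollen[-1], is.numeric, logical(1)))
str(toy_pollen)
```

A data frame with this shape (one row per day, one column per type) is the kind of object the text refers to as a historical database.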

In a global scientific discipline such as aerobiology, it is very important to maintain a quality control that keeps the data comparable and reproducible. Such quality control has been carried out in Europe to check each step of the sampling procedure (Galán et al., 2014; Oteros, Galán, Alcázar, & Domínguez‐Vilches, 2013). In addition, the quality_control function of the AeRobiology package was implemented to check the quality of a historical database before data analysis. The dataset is searched for missing data within the time series and assessed against certain criteria, for example the number and length of the gaps and their position within the season. The user may then fill the gaps using one of the methods implemented in the interpollen function: a moving average of the daily concentrations; different regression models (linear or spline regressions); interpolation according to the seasonality of the historical database; or interpolation of the missing data of a station by modelling them from the data of a neighbouring station. These functions were implemented for aerobiological tasks, but other environmental time series could also be checked and completed using this R package (e.g., databases of inorganic air pollutants or meteorological time series).
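The general idea of locating gaps and filling them by regression can be sketched in base R. This is only an illustration on invented values, assuming a single daily series; it does not reproduce the internals of quality_control or interpollen, and the choice of approx and smooth.spline is an assumption made for this sketch.

```r
# Synthetic daily series with two artificial gaps (invented values).
dates <- seq(as.Date("2015-04-01"), as.Date("2015-04-30"), by = "day")
conc <- c(2, 5, 9, 14, 22, 35, 50, 64, 80, 95,
          110, 120, 118, 105, 90, 72, 60, 48, 37, 28,
          20, 15, 11, 8, 6, 4, 3, 2, 1, 1)
conc_gaps <- conc
conc_gaps[c(8, 9, 20)] <- NA

# Locate the gaps: run-length encoding of the NA indicator gives the
# number of gaps and the length of each one.
runs <- rle(is.na(conc_gaps))
gap_lengths  <- runs$lengths[runs$values]
perc_missing <- 100 * mean(is.na(conc_gaps))

# Option 1: linear interpolation between the nearest observed days.
observed <- !is.na(conc_gaps)
x_num <- as.numeric(dates)
filled_linear <- approx(x = x_num[observed], y = conc_gaps[observed],
                        xout = x_num)$y

# Option 2: smoothing spline fitted to the observed days and used only
# at the missing positions; negative predictions are truncated at zero.
fit <- smooth.spline(x = x_num[observed], y = conc_gaps[observed], spar = 0.2)
filled_spline <- conc_gaps
filled_spline[!observed] <- pmax(0, predict(fit, x_num[!observed])$y)
```

Only the missing days are replaced in either option, so the observed concentrations remain untouched.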

2.2 Data analysis

The dynamics of the bioaerosols are described by phenological parameters such as the onset and end dates of the period of the year when a given particle is recorded in the air. Nevertheless, the methods to calculate these parameters are subject to controversy (Bastl, Kmenta, & Berger, 2018), and the choice of method depends on the purpose and the type of bioaerosol (Jato et al., 2006). The methods currently used to estimate the pollen season have been implemented in the calculate_ps function, which readily returns the phenological parameters and pollen intensity measurements (Andersen, 1991; Galán, García‐Mozo, Cariñanos, Alcázar, & Domínguez‐Vilches, 2001; Pfaar et al., 2017; Ribeiro, Cunha, & Abreu, 2007). Moreover, a new method is proposed by the authors for the first time in this package. According to this method, the pollen season is defined by the moving average of the aerobiological series reaching a threshold. This avoids the high variability introduced by daily fluctuations in the concentration of airborne particles, which can make the estimation of phenological dates more difficult. The AeRobiology package therefore represents a significant contribution to automating the analysis of a key aerobiological issue, the definition of the pollen season.
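Two of the ideas above, a percentage‐based season and a season defined by a moving average reaching a threshold, can be sketched in base R as follows. The function names season_percentage and season_moving_avg, the 5‐day window, the threshold of 10 and the rule for the end date are assumptions made for this illustration; they are not the definitions used by calculate_ps.

```r
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
doy <- seq_along(dates)

# Synthetic bell-shaped season plus one isolated early spike (invented values).
conc <- 300 * exp(-((doy - 110)^2) / (2 * 8^2))
conc[60] <- 40  # single-day spike well before the main season

# Percentage approach: the season spans the central `perc` % of the
# annual total, cutting equal tails at both ends.
season_percentage <- function(dates, conc, perc = 95) {
  tail_frac <- (1 - perc / 100) / 2
  cum_frac <- cumsum(conc) / sum(conc)
  list(start = dates[which(cum_frac >= tail_frac)[1]],
       end   = dates[which(cum_frac >= 1 - tail_frac)[1]])
}

# Moving-average approach (assumed rule): the season runs from the first
# to the last day on which the centred moving average is >= threshold.
season_moving_avg <- function(dates, conc, window = 5, threshold = 10) {
  ma <- stats::filter(conc, rep(1 / window, window), sides = 2)
  above <- which(ma >= threshold)
  if (length(above) == 0) return(list(start = NA, end = NA))
  list(start = dates[min(above)], end = dates[max(above)])
}

pct <- season_percentage(dates, conc)
mov <- season_moving_avg(dates, conc)

# For comparison: a naive rule using the raw daily values is triggered
# by the isolated spike, whereas the moving average smooths it out.
naive_start <- dates[which(conc >= 10)[1]]
```

In this toy series the naive daily rule starts the season on the day of the isolated spike, while the moving‐average rule waits for the sustained rise, which is the behaviour the text attributes to smoothing out daily fluctuations.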

Another important aerobiological task is the calculation of the pollen or spore calendar, which is a graphical representation of the biological spectrum of the atmosphere at a given location throughout the year. These calendars include information about the phenology and the intensity of the concentration of different types of bioaerosol in the air. The pollen_calendar function calculates the calendar from a historical aerobiological database using the most common methods in the field of aerobiology (O'Rourke, 1990; Rojo et al., 2016; Spieksma, 1991; Werchan, Werchan, & Bergmann, 2018).
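The aggregation that underlies a calendar can be sketched in base R by averaging daily concentrations over several years within fixed periods of the year. The ten‐day periods and the synthetic data below are assumptions chosen for illustration only; the sketch does not reproduce any of the published calendar methods cited above, nor the output of pollen_calendar.

```r
# Three synthetic years of daily data for one pollen type (invented values).
dates <- seq(as.Date("2013-01-01"), as.Date("2015-12-31"), by = "day")
doy  <- as.integer(format(dates, "%j"))
year <- format(dates, "%Y")

# The peak day shifts slightly from year to year.
peak_by_year <- c("2013" = 105, "2014" = 110, "2015" = 115)
peak <- unname(peak_by_year[year])
conc <- 200 * exp(-((doy - peak)^2) / (2 * 8^2))

# Assign each day to a ten-day period of the year
# (periods 1 to 37; the last one is shorter).
period <- (doy - 1) %/% 10 + 1

# Calendar values: mean daily concentration per period over all years.
calendar <- tapply(conc, period, mean)
peak_period <- as.integer(names(calendar)[which.max(calendar)])
```

The resulting vector has one value per period and could be drawn as a bar or heat strip for each bioaerosol type.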

In addition, the general spectrum of bioaerosols in the air may be analysed using the iplot_abundance function, which calculates the relative abundances (as percentages) of the specific biological particles in the air. In a long‐term context, trend studies are very useful when analysing extensive historical series of environmental data. For this reason, the analyse_trend function was included in the AeRobiology package. This function carries out a trend analysis of the main seasonal indexes over several years, a crucial analysis for evaluating the influence of climatic variations on the dynamics of the bioaerosols (Recio et al., 2018).
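Both calculations can be illustrated with a short base R sketch on invented annual totals. The data frame annual and its values are hypothetical, and the use of a simple linear regression slope as the trend estimate is an assumption made for this sketch rather than a description of analyse_trend.

```r
# Hypothetical annual pollen totals per type (invented values).
annual <- data.frame(
  year    = 2010:2015,
  Betula  = c(5200, 6100, 4800, 7000, 6600, 7400),
  Poaceae = c(3100, 2900, 3300, 3000, 3200, 3100),
  Urtica  = c(1500, 1700, 1600, 1800, 1750, 1900)
)

# Relative abundance: share of each type in the total over the whole period.
totals <- colSums(annual[-1])
abundance_pct <- round(100 * totals / sum(totals), 1)

# Trend: slope of a simple linear regression of each annual total on year
# (units: change in the annual total per year).
trend_slope <- vapply(
  annual[-1],
  function(x) unname(coef(lm(x ~ annual$year))[2]),
  numeric(1)
)

print(abundance_pct)
print(trend_slope)
```

The same regression could be applied to start‐, peak‐ or end‐dates expressed as day of the year, which are the other seasonal indexes mentioned in the text.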

2.3 Visualization of the results: static and interactive plots

The AeRobiology package offers numerous options for visualizing results. A proper visualization of the raw aerobiological data can be a great help before analysing the data (plot_normsummary and plot_summary functions). In addition, the user may compare the airborne daily concentrations of different years and types of bioaerosol (iplot_pollen and iplot_years functions) or visualize the phenological parameters graphically (iplot_pheno function). Intradiurnal analysis and graphical representation of the hourly patterns of pollen concentration may be performed with the plot_hour and plot_heathour functions. Furthermore, the plot_ps and pollen_calendar functions provide different graphical representations. Most of the functions included in the AeRobiology package provide some kind of visualization, either to check the results internally or to produce figures suitable for publication. Users may choose between static (Wickham, 2016) and interactive plots (Chang, Cheng, Allaire, Xie, & McPherson, 2018; Sievert, 2018) to improve the interpretation of the results.

3 EXAMPLE OF APPLICATION

This example illustrates how the AeRobiology package is applied for the management and analysis of the pollen database of Munich (period 2010–2015). This database was provided by the research team of Prof. Jeroen Buters (Zentrum Allergie und Umwelt, ZAUM, directed by Prof. Carsten B. Schmidt‐Weber).
  • > library(AeRobiology)
  • > data(munich_pollen)
Figure 1 shows the outputs obtained from several functions of the AeRobiology package. As a first step, it is advisable to check the quality of the data (Figure 1.1) and to fill the gaps of missing data, here using a spline regression (Figure 1.2):
  • > quality_control(munich_pollen, int.window = 2, perc.miss = 20) # Figure 1.1
  • > interpollen(munich_pollen, method = "spline", spar = 0.2) # Figure 1.2
Figure 1. Example of the different tasks implemented in the AeRobiology package. All these graphs may be obtained by running the script provided in the example in the text: 1 Quality control; 2 Interpolation of the pollen data for filling the missing data; 3 Estimation of the pollen season for Betula pollen; 4 Pollen calendar; 5 Relative abundances of the bioaerosols; 6 Main phenological parameters; 7 Graph for the average and amplitude of the Poaceae pollen; 8 Trends of the pollen time series. Pollen data obtained at Munich (Zentrum Allergie und Umwelt, ZAUM) and provided in this package
Users may perform numerous calculations relevant to the field of aerobiology, for example calculating the phenological parameters of the season or generating a calendar that represents the abundance and dynamics of the bioaerosols in the atmosphere. This example shows the calculation of the season (start‐, peak‐ and end‐dates) for Betula pollen following the criterion proposed by Pfaar et al. (2017) for birch (Figure 1.3). Furthermore, Figure 1.4 shows the output of the calendar calculation using a violin plot design (O'Rourke, 1990):
  • > plot_ps(munich_pollen, pollen.type = "Betula", year = 2011, method = "clinical", type = "birch") # Figure 1.3
  • > calendar.plot <- pollen_calendar(munich_pollen, method = "violinplot") # Figure 1.4
Different examples of visualization of the data are shown in Figure 1 such as graphs about relative abundance of the bioaerosols (Figure 1.5), phenological graphs (Figure 1.6), graphs showing the average daily concentrations and the amplitude of a specific biological particle (Figure 1.7) or graphs about trends of the data (Figure 1.8):
  • > iplot_abundance(munich_pollen, n.types = 7, type.plot = "static") # Figure 1.5
  • > iplot_pheno(munich_pollen, method = "percentage", perc = 95, type.plot = "static") # Figure 1.6
  • > plot_normsummary(munich_pollen, pollen = "Poaceae", color.plot = "darkgreen") # Figure 1.7
  • > analyse_trend(munich_pollen, export.plot = FALSE, export.result = FALSE)[[1]] # Figure 1.8

4 CONCLUSIONS

The AeRobiology R package represents an important contribution to aerosol science in general, and to aerobiology in particular, since it provides solutions for automating the calculations and time‐consuming procedures of data analysis. This gain in efficiency and effectiveness is especially relevant at a time when, for the first time, technological development makes it possible to obtain large and reliable databases on the biological quality of the air automatically.

ACKNOWLEDGEMENTS

We especially thank the team of Prof. Jeroen Buters (Christine Weil & Ingrid Weichenmeier) and the Zentrum Allergie und Umwelt (ZAUM, directed by Prof. Carsten B. Schmidt‐Weber) for contributing the pollen data from Munich. We acknowledge our institutions (University of Castilla‐La Mancha [UCLM], University of Malaga [UMA] and ZAUM) for their support. A.P. was supported by a predoctoral grant financed by the Ministry of Education, Culture and Sport of Spain, in the Program for the Promotion of Talent and its Employability (FPU15/01668). J.O. was supported by a Postdoctoral grant of Helmholtz Zentrum Munich PFP II 2018‐2020. We are also grateful to Dr. Jakub Nowosad for many helpful suggestions to improve the R package.

AUTHORS' CONTRIBUTIONS

J.R., A.P., and J.O. designed the R package; J.R., A.P., and J.O. contributed to developing and checking the R functions; and J.R., A.P., and J.O. wrote and revised this manuscript.

DATA ACCESSIBILITY

The package is freely distributed under GNU General Public License (GPL) and can directly be installed from CRAN (http://cran.r-project.org/). The reference manual is available at https://cran.r-project.org/web/packages/AeRobiology.
