Volume 30, Issue 6
Standard Paper
Open Access

The importance of biotic interactions for the prediction of macroinvertebrate communities under multiple stressors

Nele Schuwirth

Corresponding Author

Eawag: Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, P.O.Box 611, CH‐8600 Dübendorf, Switzerland

Correspondence author. E‐mail: nele.schuwirth@eawag.ch
Anne Dietzel

Eawag: Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, P.O.Box 611, CH‐8600 Dübendorf, Switzerland

Peter Reichert

Eawag: Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, P.O.Box 611, CH‐8600 Dübendorf, Switzerland

First published: 13 November 2015

Summary

  1. The community assembly of macroinvertebrates in streams depends on the regional taxon pool, dispersal limitations, local habitat conditions and biotic interactions. By integrating existing knowledge about these processes from theoretical ecology in a mechanistic model, we can test our mechanistic understanding and disentangle multiple stressor effects on community assembly.
  2. To assess to which degree we can predict the community composition of macroinvertebrates, we integrated these processes in the mechanistic food web model Streambugs and tested it on 36 sites in the Glatt catchment on the Swiss plateau. The model predicts the observation probability of taxa from a regional taxon pool at each site taking into account uncertain knowledge on parameters, environmental conditions at the sites and sampling errors.
  3. We use allometric scaling according to the metabolic theory of ecology, ecological stoichiometry and autecological data from trait databases that include the current knowledge on habitat requirements of the different taxa to parameterize their growth, respiration and death.
  4. Without any calibration, for the majority of taxa at the 36 sites, the difference between the observed and predicted relative frequency of occurrence is <50% when taking prior parameter uncertainty and the uncertainty of environmental conditions into account (79% compared to 61% for the random model). By calibrating taxon‐specific modification factors for the growth rate, we can increase the model compliance with data.
  5. Analysing the influence of different ecological traits and their corresponding environmental influence factors reveals that feeding types and sensitivity to organic toxicants contribute most to the predictive capabilities of the model in this catchment. The influence of temperature stress and oxygen depletion due to pollution with organic matter on the community composition is negligible. These results confirm our expectations regarding the most important water quality issues of streams on the Swiss plateau. Current velocity plays an intermediate role in this model application.
  6. The contribution of the feeding types to model performance highlights the importance of taking biotic interactions (competition for food sources and predator–prey interactions) into account to predict the coexistence of taxa. Better knowledge of the actual feeding links in the food web (e.g. from gut content or stable isotope data) that are currently inferred from feeding types, body size and food availability could further improve this approach.

Introduction

Ecologists have been interested in community assembly and the coexistence of species for decades (e.g. Hutchinson 1961; Diamond 1975; Connor & Simberloff 1979). Understanding the rules that determine the formation of local communities from species of a regional pool is a fundamental aim in a broad range of research fields, such as evolutionary biology, biogeography, community ecology, conservation biology and biodiversity research across terrestrial, marine and freshwater ecosystems. This topic is of particular importance given the pressures induced by rapid climate and land‐use changes, which influence the structure (biodiversity) and functioning of ecosystems. It is therefore an attractive challenge to combine the wealth of knowledge about the processes that control community assembly into a mechanistic model and to explore how well such a model can predict local community composition.

Community assembly can be described by the following processes (e.g. Lake, Bond & Reich 2007; HilleRisLambers et al. 2012): a regional species pool is formed by evolutionary and biogeographic processes. Dispersal and chance events lead to the colonization of local sites. Environmental factors (such as habitat conditions and food availability) as well as biotic interactions (intra‐ and interspecific competition, predation) finally determine the local coexistence of taxa. Disentangling local effects of such natural and anthropogenic ‘stressors’ on the community assembly process would considerably improve our understanding of observed communities and help to improve decision‐making for river management. One example where such an integrative perspective would be helpful is the coordination between hydromorphological river restoration and water quality management.

Benthic macroinvertebrates are the most widely used organisms in freshwater biomonitoring of human impacts (Bonada et al. 2006; Cao & Hawkins 2011). Understanding the mechanisms that lead to observed communities is thus very important. Our knowledge of these mechanisms can be explored by formulating it in the form of mathematical models and testing their predictions with observed data. In addition, predicting the community composition of macroinvertebrates in natural and impacted streams is of high relevance for the management of freshwater ecosystems. Due to this interest from applied science, autecological traits of these organisms are well studied (Poff et al. 2006; Statzner, Bonada & Dolédec 2007; Tachet et al. 2010; Schaefer et al. 2011; Schmidt‐Kloiber & Hering 2012). This enables us to include this knowledge in mechanistic models. To model the community composition of macroinvertebrates in streams, we recently integrated concepts of theoretical food web modelling, the metabolic theory of ecology and ecological stoichiometry, together with the use of functional trait databases, into the model Streambugs (Schuwirth & Reichert 2013). In this model, we evaluate the long‐term equilibrium of biomasses of taxa from the regional taxon pool to determine stable coexistence. For each taxon, vital rates are described as density‐dependent nonlinear processes of growth, respiration and death that scale with temperature and mean body mass according to the metabolic theory of ecology (Brown et al. 2004). A self‐inhibition term in the growth rate describes the effect of stabilizing niche differences (Chesson 2000), that is, negative intraspecific interactions at high population densities. In this term, habitat capacity depends on environmental conditions (regarding temperature, current and microhabitat) and on the tolerances of taxa to these conditions, which are compiled in autecological trait databases (e.g. Schmidt‐Kloiber & Hering 2012). Ecotoxicological effects of water quality as well as effects of oxygen depletion due to organic matter are implemented by increasing death rates for sensitive species using the ‘species at risk’ concept (Liess & von der Ohe 2005; Liess, Schaefer & Schriever 2008) and the saprobic system (Rolauffs et al. 2004; DIN 38410‐1:2004‐10). The differences in autecological traits lead to relative fitness differences of taxa that influence local survival or extinction of the population. Biotic interactions included in the model are competition for food resources and predation. We account for effects of ecological stoichiometry by confining the yield of consumption based on the elemental composition and energy content of food sources (Elser & Hassett 1994; Hessen, Faerovig & Andersen 2002; Andersen, Elser & Hessen 2004; Reichert & Schuwirth 2010).

When using models to make predictions, it is essential to acknowledge the sources of uncertainty that contribute to the uncertainty of model outcomes. Ecological models usually have many parameters about which we have only uncertain prior information. The uncertainty of model parameters and other model inputs should therefore be propagated to the model results, which can then be compared with observational data to judge the predictive capability of the model. Ideally, we can use Bayesian inference (Ellison 2004; Gelman et al. 2004) to learn from the observations about model parameters in addition to using prior knowledge about the parameters. This would allow us to base the predictions on the best available scientific knowledge. However, for models with a high number of parameters and a likelihood function with a complex structure (e.g. with many local optima induced by a large number of data partially contradictory to the model), this can be numerically infeasible.

The aim of this paper was to test our mechanistic understanding of the community assembly processes of macroinvertebrates by comparing predictions of an extended version of the model Streambugs with field observations of 36 sites in the Glatt catchment on the Swiss Plateau. We address the following research questions: How good are a priori predictions of the community composition in the Glatt catchment by the model Streambugs compared to a random null model? To which degree can we increase the accuracy of predictions with model calibration? How important are the different environmental influence factors and corresponding traits of taxa (regarding temperature tolerance, current velocity, water quality, food sources) for the model predictions in this catchment? Finally, how can we improve the model by identifying and reducing model deficits?

Materials and methods

Model Description

The model Streambugs 1.0 (Schuwirth & Reichert 2013) describes growth, respiration and death of periphyton and macroinvertebrate taxa (species, genus or family level) based on the following environmental conditions: temperature, irradiation, nutrient concentration, current conditions, microhabitats, water quality regarding organic toxicants (plant protection products and biocides) and organic matter that leads to oxygen depletion (saprobic conditions), input of coarse organic matter (leaf litter) and suspended organic matter. We formulated the growth, consumption and death processes as source and sink terms in the ordinary differential equations for the time evolution of taxa biomass. These processes depend on the environmental conditions. Biotic interactions are included via predation and competition for food. Details of the model are provided in Schuwirth & Reichert (2013), and all equations and parameters are given in Appendix S1 and Table S3 (Supporting Information). The model is implemented in the statistics and graphics software R (R Development Core Team, 2014), and the differential equations are solved with the R package deSolve (Soetaert, Petzoldt & Setzer 2010). In addition to the features of version 1.0, the improved implementation used in this paper allows different habitat types to be specified per reach (regarding all environmental conditions). Furthermore, it allows fish predation to be included and time‐dependent inputs and parameters to be used. However, these features are not used in the current application. Finally, the improved implementation allows the differential equations to be solved in C as an alternative to R, which speeds up computation by a factor of approximately 20. This is important for the use of Monte Carlo methods for uncertainty propagation and Bayesian inference, where many model evaluations are needed. Stoichiometric coefficients are automatically calculated based on elemental composition, energy content and feeding relationships with the R package stoichcalc (Reichert & Schuwirth 2010).

Observational Data and Environmental Conditions

To evaluate the predictive capacity of the model, we used environmental and taxonomic data from 36 sites in the catchment of the river Glatt. The Glatt valley is situated in the north‐eastern part of the Swiss Plateau, and the river is a tributary of the Rhine River (Fig. 1). Data from the Glatt valley sites were provided by the Office of Waste, Water, Energy and Air of the Canton of Zurich (AWEL – Amt für Abfall, Wasser, Energie und Luft, Kanton Zürich). They include discharge, temperature, oxygen, nitrate, ammonium and phosphate concentrations, insecticides, hydromorphology, shading and the abundance of observed invertebrate taxa (on species, genus, family, order or class level) at each site. Invertebrates were collected by kick‐sampling using a multihabitat sampling strategy (AWEL 2006; Stucki 2010). For the chemical and temperature data, monthly grab samples were available from 1967 to 2011. Macroinvertebrate samples were taken 4–9 times between 1995 and 2011, usually in spring and autumn. In addition, GIS analyses provided information about the location of wastewater treatment plants (WWTP), the slope at the sites and the fraction of broad‐leaved trees around the sites. We did not use any data collected before 1990, since the conditions may have changed over time. In the following, we describe how the model input was derived from observational data.

Fig. 1. Four subcatchments in the Glatt valley (the position of the Glatt valley in Switzerland is shown in the upper right) and sites with calibration data. WWTP, wastewater treatment plants.

Sources: Einzugsgebietsgliederung Schweiz EZGG‐CH©2012, Bundesamt für Umwelt BAFU, Bern; swisstopo (Art. 30 GeoIV): 5704 000 000/swissTLM3D@2011/Vector200©2010, reproduced with permission of swisstopo/JA100119; Kläranlagen der Schweiz ©2011, Eawag: überarbeitet auf der Basis des Projektes: Maurer M. und Herlyn A. (2007) Zustand, Kosten und Investitionsbedarf der schweizerischen Abwasserentsorgung. Eawag/BAFU Bericht.

Mean water temperature

The mean water temperature was calculated as the average of the measured temperatures of all monthly grab samples. The standard deviation of the mean water temperature was calculated as the standard deviation of the measurements divided by the square root of the number of measurements (Fig. S1).
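
The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example temperatures are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import math

def mean_and_standard_error(temperatures):
    """Return (mean, standard deviation of the mean) for a list of measurements."""
    n = len(temperatures)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(temperatures) / n
    # Sample variance with n - 1 in the denominator (an assumption; the
    # text does not say which estimator was used).
    variance = sum((t - mean) ** 2 for t in temperatures) / (n - 1)
    # Standard deviation of the measurements divided by sqrt(n).
    return mean, math.sqrt(variance) / math.sqrt(n)

# Hypothetical monthly grab-sample temperatures in degrees Celsius.
example_temperatures = [4.0, 6.0, 10.0, 14.0, 16.0, 10.0]
example_mean, example_sem = mean_and_standard_error(example_temperatures)
```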

Water temperature class

The water temperature class corresponding to the temperature tolerance trait is defined based on either the maximum morning temperature in summer or the mean maximum in summer, depending on data availability (e.g. Graf et al. 2008). As a proxy for this characteristic temperature, we took the second highest temperature from all monthly grab samples at each site to estimate the temperature class of the site. We assumed a higher uncertainty than for the mean temperature, described by a standard deviation of 3 °C, as the classification into different temperature classes for values close to each other is somewhat arbitrary (Fig. S2).

River width

The reach width was included in the hydromorphological assessment according to the Swiss modular concept for stream assessment (Bundesamt für Umwelt (BAFU), 1998, www.modul-stufen-konzept.ch). For the few sites at which hydromorphological assessment was not available, the width was estimated from aerial pictures.

Light intensity

As turbidity measurements are missing, we could not account for light attenuation in the water column. For the light intensity at the water surface at unshaded areas, we used an annual mean value of 125 W m−2 (≈1000 kWh m−2 year−1, Šúri et al. 2007) for all sites, assuming a standard deviation of 5%.

Shading

The fraction of shaded area at each site was taken from observations from AWEL and checked by analysing aerial pictures. We assumed an absolute standard deviation of 5% shading with the exception of one unshaded site where we assumed 0·1% as mean and standard deviation (Fig. S3).

Nutrient concentrations

We took the median of all monthly grab samples for phosphate and the median of the sum of ammonium and nitrate as an indication for the nutrient levels at each site, assuming a relative standard deviation of 5% (Figs S4 and S5).

Suspended particulate organic matter

Measurements of suspended particulate organic matter were not available. For all sites, we assumed a mean value and absolute standard deviation of 1 mg DM L−1, taking into account measured values in the Moenchaltorfer Aa, which is one of the subcatchments in this study (L. Spalinger, unpublished).

Current

We estimated the current class of each site directly from estimated flow velocities (v), since the estimation of shear stress and comparison with FST hemispheres, as done in Schuwirth & Reichert (2013), seemed to be even more uncertain. We used the following classification: standing: v = 0 m s−1; slow current: 0 < v ≤ 0·25 m s−1; moderate current: 0·25 < v ≤ 0·5 m s−1; high current: v > 0·5 m s−1. The mean flow velocity was estimated by solving Manning–Strickler's equation (e.g. Chow 1959), written here for a wide rectangular channel in which the hydraulic radius is approximated by the water depth,
$$v = \left(\frac{1}{n}\right)^{3/5} S_0^{3/10} \left(\frac{Q}{w}\right)^{2/5}$$ (eqn 1)
with friction coefficient n [s m−1/3], slope S0 [−], average discharge Q [m3 s−1] and width w [m]. The average discharge was derived from AWEL data, the slope was calculated by a GIS analysis, and the information about the width was used as described above. The friction coefficient n was estimated after Cowan (1956) as the sum of the coefficients n1, …, n6, where n1 gives information about the character of the channel and its surface material, n2 refers to surface irregularities, n3 to the variations of the size and shape of the cross section, n4 to obstructions, n5 to vegetation and flow conditions and n6 indicates meanders and curves. The coefficients were estimated from the hydromorphology data. For n2, this was not possible, and hence, a default value was used. To describe the uncertainty of the current class information, we assigned 80% probability to the estimated current class and 10% to each of the two neighbouring classes, or 20% to the only neighbouring class where there is just one (Fig. S6).
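
The velocity estimate, the classification and the class probabilities can be illustrated with a short Python sketch. It assumes a wide rectangular channel (hydraulic radius approximated by water depth h, and Q = v·w·h), so that the Manning–Strickler relation v = (1/n)·h^(2/3)·S0^(1/2) can be solved for v in closed form. The function names and the example site values are hypothetical.

```python
CURRENT_CLASSES = ["standing", "slow", "moderate", "high"]

def mean_flow_velocity(n, slope, discharge, width):
    """Velocity in m/s from Manning-Strickler for a wide rectangular channel.

    Assumption: hydraulic radius ~ water depth h and Q = v * w * h, which
    turns v = (1/n) * h**(2/3) * S0**0.5 into a closed-form expression.
    """
    return (1.0 / n) ** 0.6 * slope ** 0.3 * (discharge / width) ** 0.4

def current_class(v):
    """Map a velocity in m/s to the current class limits given in the text."""
    if v <= 0.0:
        return "standing"
    if v <= 0.25:
        return "slow"
    if v <= 0.5:
        return "moderate"
    return "high"

def class_probabilities(estimated, classes=CURRENT_CLASSES):
    """80% on the estimated class, the remaining 20% split over its neighbours."""
    idx = classes.index(estimated)
    neighbours = [i for i in (idx - 1, idx + 1) if 0 <= i < len(classes)]
    probs = {c: 0.0 for c in classes}
    probs[estimated] = 0.8
    for i in neighbours:
        probs[classes[i]] = 0.2 / len(neighbours)
    return probs

# Hypothetical site: n = 0.04 s m^-1/3, slope 0.005, Q = 1 m3/s, width 5 m.
v_example = mean_flow_velocity(0.04, 0.005, 1.0, 5.0)
probs_example = class_probabilities(current_class(v_example))
```

The same neighbour-weighting helper would apply to any ordered set of classes for which an 80%/10%/10% (or 80%/20%) scheme is used.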

Organic toxicants

To decide whether a site is classified as affected by organic toxicants (such as pesticides), we used information about WWTPs from GIS data. If a WWTP is situated upstream of a site, the site was classified as polluted. Sites that received a moderate, poor or bad insecticide evaluation based on substance measurements (AWEL, 2006) were also classified as polluted. If neither condition applied, the site was classified as ‘not polluted’. Since all sites are within catchments with agricultural as well as urban land use, we cannot be sure that these sites are unaffected by organic toxicants (e.g. from agricultural nonpoint sources or combined sewer overflows). Therefore, we assigned a 49% probability that these sites are polluted. For sites classified as polluted, we were less uncertain and assigned only a 20% probability that the site is not affected by organic toxicants (Fig. S7).
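
As an illustration, the classification rule and the assigned probabilities can be written as a small Python function. The function name, argument names and the spelling of the rating labels are hypothetical; only the two criteria and the 80% and 49% probabilities are taken from the text.

```python
def pollution_probability(wwtp_upstream, insecticide_rating=None):
    """Probability that a site is affected by organic toxicants.

    wwtp_upstream: True if a wastewater treatment plant lies upstream.
    insecticide_rating: evaluation label, or None if no measurement exists
    (the label strings are an assumption made for this sketch).
    """
    rated_polluted = insecticide_rating in {"moderate", "poor", "bad"}
    if wwtp_upstream or rated_polluted:
        # Classified as polluted: 20% probability of being unaffected.
        return 0.80
    # Classified as 'not polluted', but diffuse inputs cannot be excluded.
    return 0.49
```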

Water quality class corresponding to the Saprobic System

The water quality class that corresponds to the Saprobic system was estimated from ammonium and oxygen concentrations given in the AWEL data according to Table S9. For the oxygen concentration, the minimum of all samples of each site was used. To account for uncertainty, especially due to the fact that the biological oxygen demand was not available, we assigned 80% probability to the estimated class and 10% to each of the neighbouring or 20% to the only neighbouring class (Fig. S8).

Input of leaf litter

The litter input in gDM m−2 year−1 was estimated from the fraction of the area of the riverbed and of the 10 m wide riparian zone that is covered by broad‐leaved trees from an aerial view analysis (see Appendix S2 for details). We assumed a standard deviation of 20% (Fig. S9).

Microhabitat class

We did not have information about microhabitat/substrate classes in the catchment and could therefore not include the limitation by microhabitat requirements that is implemented in the model.

Regional taxon pool and observations

To account for dispersal limitation, the Glatt valley was divided into four subcatchments [Pfaeffikersee (63 km2, outlet at 441 m.a.s.l.), Moenchaltorfer Aa (51 km2, outlet at 435 m.a.s.l.), Glatt1 (270 km2, outlet at 418 m.a.s.l.) and Glatt2 (417 km2, outlet at 339 m.a.s.l.), Fig. 1], encompassing 6–11 measurement sites each. The four areas have a similar land use. The percentage of urban land use ranges from 19% (Moenchaltorfer Aa) to 30% (Glatt1). The Moenchaltorfer Aa has 63% agricultural area, whereas the three other subcatchments have 41–43% agricultural land use. The percentage of forest cover ranges from 17% (Moenchaltorfer Aa) to 24% (Glatt2). The regional invertebrate taxon pool was estimated for each subcatchment separately and is given in Table S4. It was defined as those taxa that occurred at more than 25% of all sites within the subcatchment and that were identified at least to family level. If a taxon belonged to a higher‐level taxon that was also included in the source pool, we modelled only the higher‐level taxon (e.g. if an invertebrate family is part of the source pool, species or genera belonging to that family are not modelled separately). As an exception, Baetis and Baetis rhodani were modelled separately, as their traits, especially their sensitivity to organic toxicants, differ. This resulted in five of the 49 taxa being included in the taxon pool of only one subcatchment, 18 included in two subcatchments, eight included in three subcatchments and 18 included in all four subcatchments (Table S4). We assumed that taxa of the taxon pool are not dispersal limited within each subcatchment.
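
The frequency criterion for the regional taxon pool can be sketched as follows. This is a minimal illustration, assuming that a taxon counts as occurring at a site if it was observed there on at least one sampling date; the function name, the data layout and the example sites are hypothetical. The additional rules on taxonomic level and nested taxa are not covered here.

```python
def regional_taxon_pool(site_observations, min_fraction=0.25):
    """Return the taxa observed at more than `min_fraction` of the sites.

    site_observations maps a site id to the set of taxa ever observed there.
    """
    n_sites = len(site_observations)
    counts = {}
    for taxa in site_observations.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    # Strict inequality: "more than 25% of all sites".
    return {t for t, c in counts.items() if c / n_sites > min_fraction}

# Hypothetical subcatchment with four sites.
example_sites = {
    "site1": {"Baetis", "Gammaridae"},
    "site2": {"Gammaridae", "Leuctra"},
    "site3": {"Baetis", "Gammaridae"},
    "site4": {"Gammaridae"},
}
example_pool = regional_taxon_pool(example_sites)
```

In this example, Leuctra occurs at exactly 25% of the sites and is therefore not part of the pool.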

To obtain observations of the chosen source pool taxa that are comparable to model results and suitable for model calibration, the abundance per square metre of each taxon at each site and sampling date was calculated by dividing the individual counts by the sampling area. For each taxon, we checked whether additional lower‐level taxa belonging to that taxon were observed that are not part of the regional taxon pool. In these cases, the abundance of the lower‐level taxon was added to the abundance of the higher‐level taxon.
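
A possible way to implement this aggregation is sketched below. The function names, the `parent_of` lookup (a mapping from each taxon to its next higher taxon) and the example values are hypothetical; the sketch assumes that this mapping forms a tree without cycles.

```python
def abundance_per_m2(count, sampling_area_m2):
    """Individuals per square metre from a raw count."""
    return count / sampling_area_m2

def aggregate_to_pool(abundances, parent_of, pool):
    """Add abundances of lower-level taxa outside the pool to their ancestor in the pool."""
    result = {taxon: abundances.get(taxon, 0.0) for taxon in pool}
    for taxon, value in abundances.items():
        if taxon in pool:
            continue  # pool taxa keep their own abundance
        # Walk up the taxonomy until an ancestor in the pool is found.
        parent = parent_of.get(taxon)
        while parent is not None and parent not in pool:
            parent = parent_of.get(parent)
        if parent is not None:
            result[parent] += value
    return result

# Hypothetical example: Baetis rhodani is in the pool and stays separate.
example_pool_taxa = {"Baetis", "Baetis rhodani", "Gammaridae"}
example_parent_of = {
    "Baetis rhodani": "Baetis",
    "Baetis alpinus": "Baetis",
    "Gammarus fossarum": "Gammarus",
    "Gammarus": "Gammaridae",
}
example_abundances = {
    "Baetis": 10.0,
    "Baetis rhodani": 5.0,
    "Baetis alpinus": 3.0,
    "Gammarus fossarum": 8.0,
}
example_aggregated = aggregate_to_pool(example_abundances, example_parent_of, example_pool_taxa)
```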

Systems Analysis

To estimate the taxonomic composition of the macroinvertebrate community at each site, we evaluated the long‐term steady state of the biomass density for each taxon at each site under stationary environmental conditions, Bss(x,θ). Here, B = (B1,…, Bn) is the vector of biomass densities of all taxa, n is the number of taxa, ss indicates the steady‐state solution of the model equations, x are the environmental conditions (xj for site j), and θ is the parameter vector (see Schuwirth & Reichert 2013 for the details of the equations that are solved to calculate Bss).

We compare model results with observational data by specifying a likelihood function $L(y \mid \theta)$ depending on parameters θ and observations y. The likelihood is defined as the probability distribution for potential observations {yi,j,k} given model parameters θ under the assumption that the model is true. We define yi,j,k = 1 if taxon i is observed and yi,j,k = 0 if it is not observed at site j and sampling date k.

The probability of not observing a taxon i at a site j on a sampling event k, $P(y_{i,j,k} = 0 \mid \theta)$, is given by:
$$P(y_{i,j,k} = 0 \mid \theta) = p_{\mathrm{abs}} + (1 - p_{\mathrm{abs}})\,(1 - p_{\mathrm{obs}})^{\left(B_{\mathrm{ss},i}(x_j,\theta)/M_i + n_{\mathrm{drift}}\right) A_{j,k}}$$ (eqn 2)
where pabs is the probability that a taxon is absent at sampling date k despite the fact that it has a stable population at sampling site j (e.g. due to short‐term disturbance by floods or emergence just before the sampling event), and pobs is the probability of observing (i.e. catching and correctly identifying) a taxon at sampling event k that is present at site j at average environmental conditions at steady state with one individual per sampling area Aj,k. The exponent of (1 – pobs) indicates the absolute number of predicted individuals at the site, where Mi is the mean individual biomass of taxon i, and ndrift is the typical number of individuals per m2 that are introduced by drift or misclassification.
We assume an exponential decrease of the probability pabs with the predicted abundance:
$$p_{\mathrm{abs}} = p_{\mathrm{abs0}} \exp\!\left(-\ln 2 \cdot \frac{B_{\mathrm{ss},i}(x_j,\theta)/M_i}{K_{\mathrm{abs}}}\right)$$ (eqn 3)
where pabs0 is the y‐axis intercept and Kabs the predicted abundance at which the probability pabs is 50% of pabs0.
The probability of observing a taxon i is one minus the probability of not observing it:
$$P(y_{i,j,k} = 1 \mid \theta) = 1 - P(y_{i,j,k} = 0 \mid \theta)$$ (eqn 4)
The likelihood is calculated as the product of the probabilities $P(y_{i,j,k} \mid \theta)$ of the observations for all taxa, sites and sampling dates.
$$L(y \mid \theta) = \prod_{i} \prod_{j} \prod_{k} P(y_{i,j,k} \mid \theta)$$ (eqn 5)

For each site, we had 4–9 observations from several seasons and several years (between 1995 and 2011). For pobs, we assume 0·5; for pabs0, we assume 0·1; and for ndrift, we assume 0·05 ind m−2 (Fig. 2). Note that it would be possible to make the parameters pobs, pabs0, ndrift and Kabs specific to taxon, site and sampling date in order to include knowledge about emergence patterns, drift or catastrophic loss due to floods, where such knowledge is available.
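
The observation model can be sketched in Python as follows. The sketch reflects one reading of eqns 2 to 4 and should be treated as an assumption: the probability of not observing a taxon is taken as pabs + (1 − pabs)(1 − pobs)^((B/M + ndrift)·A), with pabs halving at an abundance of Kabs. The function and argument names are hypothetical; Kabs = 100 is the value given in the caption of Fig. 2.

```python
import math

def p_absent(abundance, p_abs0=0.1, k_abs=100.0):
    """Exponential decay from p_abs0 that reaches p_abs0 / 2 at abundance k_abs."""
    return p_abs0 * math.exp(-math.log(2.0) * abundance / k_abs)

def p_not_observed(biomass_ss, mean_ind_mass, area,
                   p_obs=0.5, p_abs0=0.1, k_abs=100.0, n_drift=0.05):
    """Probability of not observing a taxon at one site and sampling date."""
    abundance = biomass_ss / mean_ind_mass          # individuals per m2
    p_abs = p_absent(abundance, p_abs0, k_abs)
    n_individuals = (abundance + n_drift) * area    # exponent of (1 - p_obs)
    return p_abs + (1.0 - p_abs) * (1.0 - p_obs) ** n_individuals

def log_likelihood(observations, p_not_obs):
    """Sum of log probabilities over parallel sequences of y (0 or 1) and P(y = 0)."""
    total = 0.0
    for y, p0 in zip(observations, p_not_obs):
        total += math.log(1.0 - p0) if y == 1 else math.log(p0)
    return total
```

Working with the sum of log probabilities rather than the product of probabilities avoids numerical underflow when many taxa, sites and sampling dates are combined.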

Fig. 2. (a) Probability that a taxon that has a stable population is absent due to short‐term disturbance, for example by floods or emergence, with pabs0 = 0·1 and Kabs = 100 according to eqn 3; (b and c) probability to observe or not observe a taxon according to eqn 4 for different ranges of the x‐axis; (d) as b and c but with the logarithm of the probability on the y‐axis, which corresponds to the log likelihood, with pobs = 0·5, ndrift = 0·05 ind. m−2 and a sampling area of 1 m2.
We defined a prior probability distribution for all parameters that reflects the current state of our knowledge about these parameters. The definitions of these marginal distributions are given in Table S3. We assume no correlation between the parameters. The model predictions and their uncertainty were estimated by propagating the distributions of parameters and environmental conditions through the model (using Monte Carlo simulation with a sample size of 1000). We evaluated model results by propagating [A] prior parameter uncertainty at the mean of our knowledge about environmental conditions, and [B] prior parameter uncertainty plus uncertainty about environmental conditions. To compare the model results with the observed relative frequency of occurrence of taxon i at site j, fobs,i,j, we calculate the predicted, expected relative frequency of occurrence of taxon i at site j, $\mathrm{E}[f_{\mathrm{pred},i,j}]$, as:
$$\mathrm{E}[f_{\mathrm{pred},i,j} \mid \theta] = \frac{1}{N_j} \sum_{k=1}^{N_j} P(y_{i,j,k} = 1 \mid \theta)$$ (eqn 6)
$$\mathrm{E}[f_{\mathrm{pred},i,j}] \approx \frac{1}{N} \sum_{l=1}^{N} \mathrm{E}[f_{\mathrm{pred},i,j} \mid \theta_l]$$ (eqn 7)
where $P(y_{i,j,k} = 1 \mid \theta)$ is the probability of observing taxon i at site j on sampling date k (eqn 4), Nj is the number of sampling dates at site j, N is the number of parameter samples used for approximating the expected value by Monte Carlo simulation, and θl is the l‐th parameter sample. Density plots of the difference between predicted and observed relative frequency of occurrence ($\mathrm{E}[f_{\mathrm{pred},i,j}] - f_{\mathrm{obs},i,j}$) for all taxa i and sites j were produced using the implementation of the truncated normal distribution of the R package truncnorm (Trautmann et al. 2014). Furthermore, we evaluate the number of taxa for which the absolute difference between observed and predicted relative frequency of occurrence is below a threshold that varies between 0 and 1.
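
The two averaging steps and the threshold evaluation can be illustrated with a short Python sketch. It assumes that the predicted frequency for one parameter sample is the mean observation probability over the sampling dates at a site, and that the expected value is the mean over parameter samples. All function names and numbers are hypothetical.

```python
def predicted_frequency(p_observe_by_date):
    """Mean observation probability over the sampling dates at one site."""
    return sum(p_observe_by_date) / len(p_observe_by_date)

def expected_predicted_frequency(samples):
    """Monte Carlo average over parameter samples.

    Each element of `samples` holds the per-date observation probabilities
    obtained with one parameter sample.
    """
    return sum(predicted_frequency(s) for s in samples) / len(samples)

def fraction_below(f_obs, f_pred, threshold):
    """Fraction of taxon-site pairs with |f_obs - f_pred| below the threshold."""
    hits = sum(1 for o, p in zip(f_obs, f_pred) if abs(o - p) < threshold)
    return hits / len(f_obs)

# Hypothetical taxon at one site: two parameter samples, two sampling dates.
example_expected = expected_predicted_frequency([[0.2, 0.4], [0.6, 0.8]])

# Hypothetical observed and predicted frequencies for three taxon-site pairs.
example_fraction = fraction_below([0.0, 0.5, 1.0], [0.6, 0.5, 0.9], 0.5)
```

Evaluating `fraction_below` on a grid of thresholds between 0 and 1 gives a cumulative curve of model agreement.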

To evaluate the predictive capacity of the model, we compare it to a random null model where the predicted frequency of occurrence (fpred,i,j) is binomially distributed with n corresponding to the number of sampling events at site j and p = 0·5. For this model, $\mathrm{E}[f_{\mathrm{pred},i,j}] = 0.5$.
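
A small Python sketch can make the behaviour of this null model explicit. It assumes that a taxon at a site counts as overestimated when the expected predicted frequency exceeds the observed one by at least 0·5, and as underestimated in the opposite case; function names are hypothetical. Under these assumptions, with an expected value of 0·5, a taxon falls within the 0·5 band exactly when it was observed on some but not all sampling dates.

```python
from math import comb

def binomial_expected_frequency(n, p=0.5):
    """Expected value of k / n for k ~ Binomial(n, p), summed over the pmf."""
    return sum(k / n * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))

def classify_against_null(f_obs, expected=0.5, threshold=0.5):
    """Compare an observed frequency with the expected frequency of the null model."""
    diff = expected - f_obs
    if diff >= threshold:
        return "overestimated"   # only possible for f_obs = 0
    if -diff >= threshold:
        return "underestimated"  # only possible for f_obs = 1
    return "within"
```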

The prior probability of the observed data can be estimated by the marginal of the joint distribution of parameters and observations. This can be approximated by the average of the likelihood values for a prior parameter sample θ1, …, θN of size N:
$$P(y) \approx \frac{1}{N} \sum_{l=1}^{N} L(y \mid \theta_l)$$ (eqn 8)
For numerical reasons (because the likelihood values are very small numbers), we calculated the logarithm of this probability and added an offset C to the log likelihood values that we subtracted afterwards (otherwise the result would be 0 due to numerical underflow):
$$\log P(y) \approx \log\!\left(\frac{1}{N} \sum_{l=1}^{N} \exp\bigl(\log L(y \mid \theta_l) + C\bigr)\right) - C$$ (eqn 9)
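
The offset trick can be illustrated in Python. The choice of C as the negative of the largest log likelihood is an assumption made for this sketch (the text does not state how C was chosen); it shifts the largest term to exp(0) = 1 so that the sum cannot underflow to zero. The function name and example values are hypothetical.

```python
import math

def log_mean_likelihood(log_likelihoods):
    """Logarithm of the average likelihood, computed from log likelihood values."""
    offset = -max(log_likelihoods)  # C: assumption, makes the largest term exp(0)
    shifted_sum = sum(math.exp(ll + offset) for ll in log_likelihoods)
    return math.log(shifted_sum / len(log_likelihoods)) - offset

# Hypothetical log likelihood values of the order reported for this kind of model.
example_log_likelihoods = [-9000.0, -9010.0, -9100.0]
example_log_prior_probability = log_mean_likelihood(example_log_likelihoods)
```

Without the offset, `math.exp(-9000.0)` evaluates to 0.0 in double precision, and the logarithm of the average would be undefined.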

We analysed deficits in the model predictions to identify possibilities for model improvement.

To explore to which degree the integration of different ecological traits and corresponding environmental factors contributed to the predictive power of the model, we successively excluded traits (temperature tolerance, current tolerance, sensitivity to organic toxicants and saprobic conditions, and feeding types, the latter by defining all taxa as omnivorous, as well as combinations of these) and analysed the resulting a priori model predictions. For this analysis, we used Monte Carlo simulation with a parameter sample size of 1000.

Due to the large number of uncertain parameters and a nonsmooth surface of the likelihood function with many local maxima, we were not successful in applying Bayesian inference. The complicated shape of the likelihood function (as a function of the parameters for given, actual observations) is caused by the large number of sites and multiple observations per site. Different parameter combinations can lead to similarly good overall results by modelling different taxa at different sites correctly. These parameter combinations can be confined to isolated narrow regions in the parameter space.

To nevertheless gain insight into how much model predictions could improve through calibration, we analysed the model results for the parameter set with the highest posterior density from Markov chain Monte Carlo runs in which we inferred only the taxon‐specific modification factors of the growth rate, fgrotax. The global optimization algorithms we tested (Xiang et al. 2013) did not find parameter values with a higher posterior density. This does not imply that this parameter set lies at the maximum of the posterior; it only highlights the difficulty of finding the global maximum. This analysis does not provide any information about the posterior predictive uncertainty but merely illustrates the potential for improving the model by calibration. We performed a local sensitivity analysis at the mean of the prior and at the parameter set with the highest posterior density to visualize the surface of the posterior density function at these points in parameter space.

Results

Without any calibration, for 79% of the taxa at the 36 sites, the difference between the observed and predicted relative frequency of occurrence is <50% when taking prior parameter uncertainty and the uncertainty of environmental conditions (= model inputs) into account (Table 1). The median of the absolute difference between the observed and predicted relative frequency of occurrence is 0·24 (Fig. 3). Including or excluding uncertainty of environmental conditions has a minor influence on model results (Table 1, Fig. 3). The overall percentage of taxa with $\lvert f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \rvert < 0.5$ slightly increases, but the prior probability of observations (estimated with the average likelihood) slightly decreases when including uncertainty of environmental conditions (Table 1). More taxa are overestimated than underestimated by the model (Table 1), as indicated by the skewness of the density plot of the difference between the predicted and observed relative frequency of occurrence (Fig. 3). The uncalibrated model performs considerably better than the random model, which results in 61% of taxa with $\lvert f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \rvert < 0.5$.

Table 1. Number of taxa at sites with $\lvert f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \rvert < 0.5$ of the model when propagating prior parameter uncertainty using the best estimate of environmental conditions (A) or including uncertainty regarding environmental conditions (B) and for a random model where fpred,i,j is binomially distributed with p = 0·5 and n = number of sampling events at site j, leading to $\mathrm{E}[f_{\mathrm{pred},i,j}] = 0.5$. Taxa at sites with $\mathrm{E}[f_{\mathrm{pred},i,j}] - f_{\mathrm{obs},i,j} \ge 0.5$ are called ‘overestimated’, taxa with $f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \ge 0.5$ are called ‘underestimated’

| Parameters | Environ. condition | Case (Fig. 3) | Total no. of taxa at sites (a) | No. of taxa with $\lvert f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \rvert < 0.5$ (b) | No. of over‐estim. taxa (c) | No. of under‐estim. taxa (d) | Fraction of taxa with $\lvert f_{\mathrm{obs},i,j} - \mathrm{E}[f_{\mathrm{pred},i,j}] \rvert < 0.5$ (b/a) | Log prior probability of observations (eqs 8 and 9) (f) |
| prior dist. | Mean | A | 1233 | 948 | 226 | 59 | 0·77 | −9012 |
| prior dist. | Uncertain | B | 1233 | 979 | 193 | 61 | 0·79 | −9134 |
| Random model | | | 1233 | 757 | 359 | 117 | 0·61 | |
Fig. 3. (a) Density plots (smoothing results over taxa and sites) of the difference between predicted and observed relative frequency of occurrence when propagating prior parameter uncertainty at the mean environmental conditions (black line, A), prior parameter uncertainty plus uncertainty about environmental conditions (grey line, B), and for a random model with a binomial distribution (dashed grey line); (b) percentage of taxa at sites with |fpred,i,j − fobs,i,j| below a certain threshold that varies between 0 and 1.
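The bookkeeping behind Table 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names and the toy frequencies are hypothetical, the boundary handling (≥ 0·5 counted as overestimated, ≤ −0·5 as underestimated) is inferred from the fact that columns b, c and d sum to column a, and drawing a single binomial realization per taxon and site is only one possible reading of the random model.

```python
import numpy as np

def classify_predictions(f_pred, f_obs, threshold=0.5):
    """Count taxa-at-sites that are within the threshold, over- or underestimated."""
    diff = np.asarray(f_pred, dtype=float) - np.asarray(f_obs, dtype=float)
    within = np.abs(diff) < threshold
    over = diff >= threshold
    under = diff <= -threshold
    return {
        "total": int(diff.size),
        "within": int(within.sum()),
        "over": int(over.sum()),
        "under": int(under.sum()),
        "fraction_within": float(within.mean()),
    }

def random_model(n_events, rng, p=0.5):
    """Relative frequency of occurrence from one Binomial(n, p) draw per entry."""
    n_events = np.asarray(n_events, dtype=int)
    return rng.binomial(n_events, p) / n_events

# Hypothetical observed and predicted relative frequencies for four taxa at one site.
f_obs = np.array([0.0, 0.25, 1.0, 0.5])
f_pred = np.array([0.8, 0.30, 0.2, 0.5])
summary = classify_predictions(f_pred, f_obs)

# Random model with four sampling events per entry (assumed value).
rng = np.random.default_rng(42)
f_rand = random_model([4, 4, 4, 4], rng)
random_summary = classify_predictions(f_rand, f_obs)

# Column b/a of Table 1 for case B, recomputed from the reported counts.
table1_case_b = round(979 / 1233, 2)
```

In the toy data, two of the four entries fall within the threshold, one is overestimated and one is underestimated; the three counts always add up to the total, as they do in Table 1.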

The different ecological traits and their corresponding environmental factors influence the predictive capabilities of the model to different degrees (Table 2 and Fig. 4), and their effects differ slightly between the four subcatchments. The trait with the largest influence on the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 is the feeding type in the subcatchments Glatt1 and Glatt2 and the sensitivity to organic toxicants in the subcatchments Pfaeffikersee and Moenchaltorfer Aa.

Table 2. Fraction of taxa at sites with an absolute difference between the observed and predicted relative frequency of occurrence below 0·5 when propagating prior parameter uncertainty and uncertainty of environmental conditions and excluding ecological traits and their corresponding environmental influence factors in the model

| Sub‐catchment | incl. all traits | excl. Tᵃ | excl. Sᵃ | excl. Cᵃ | excl. Toxᵃ | excl. Fᵃ | excl. T, S, C, Tox | excl. T, S, C, Tox, F |
|---|---|---|---|---|---|---|---|---|
| Pfsee | 0·80 | 0·80 | 0·77 | 0·75 | 0·68 | 0·74 | 0·57 | 0·41 |
| Moench | 0·77 | 0·78 | 0·76 | 0·71 | 0·64 | 0·74 | 0·56 | 0·43 |
| Glatt1 | 0·80 | 0·79 | 0·78 | 0·73 | 0·72 | 0·60 | 0·61 | 0·35 |
| Glatt2 | 0·81 | 0·81 | 0·80 | 0·74 | 0·75 | 0·61 | 0·63 | 0·36 |
| Whole catchment | 0·79 | 0·80 | 0·78 | 0·73 | 0·70 | 0·66 | 0·60 | 0·38 |
| Log prior prob. of obs.ᵇ | −9134 | −9150 | −9127 | −9437 | −9108 | −11100 | −9423 | −11580 |

  • a T, limitation due to temperature stress; S, limitation due to saprobic conditions; C, limitation due to current conditions; Tox, limitation due to organic toxicants; excl. F., ignoring feeding types = all taxa are omnivore.
  • b Equations 8 and 9.
Fig. 4. (a) Density plots (smoothing results over taxa and sites) of the difference between the predicted and observed frequency of occurrence; ‘All’: incl. all traits; ‘T’: excl. limitation due to temperature conditions; ‘S’: excl. saprobic limitation; ‘C’: excl. current limitation; ‘Tox’: excl. sensitivity to organic toxicants; ‘F’: ignoring feeding types = all taxa are omnivore; ‘TSCTox’: excl. temperature, saprobic, current limitation, and sensitivity to organic toxicants; ‘TSCTox_F’: excl. temperature, saprobic, current limitation, sensitivity to organic toxicants and ignoring feeding types; and the random model; (b) fraction of taxa at sites with |fpred,i,j − fobs,i,j| below a threshold that varies between 0 and 1.

Summarized over all subcatchments, feeding types, followed by sensitivities to organic toxicants and tolerances to current conditions, have the strongest influence on the results; excluding them decreases the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 from 79% to 66%, 70% and 73%, respectively (Table 2, Fig. 4). The predicted frequency of occurrence of taxa with a low frequency of observation increases when these traits are excluded (Fig. 4a). The exclusion of stress due to temperature and saprobic conditions has a minor influence on model results. The exclusion of all five traits decreases the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 to below 45% in all subcatchments (38% for the whole catchment; Table 2), which is below the random model.
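The ranking of traits stated above follows directly from the whole‐catchment row of Table 2. The short Python sketch below recomputes it from the reported (rounded) fractions; the dictionary keys reuse the labels of Fig. 4, and the variable names are chosen here for illustration only.

```python
# Whole-catchment fractions of taxa with |fpred - fobs| < 0.5, taken from Table 2.
baseline = 0.79  # model including all traits
single_exclusions = {"T": 0.80, "S": 0.78, "C": 0.73, "Tox": 0.70, "F": 0.66}
combined_exclusions = {"TSCTox": 0.60, "TSCTox_F": 0.38}
random_model = 0.61  # fraction for the random model, from Table 1

# Decrease in the fraction when a single trait is excluded (negative = slight increase).
drops = {trait: round(baseline - frac, 2) for trait, frac in single_exclusions.items()}

# Traits ordered from the largest to the smallest decrease.
ranking = sorted(drops, key=drops.get, reverse=True)

# Configurations whose fraction falls below that of the random model.
all_cases = {"All": baseline, **single_exclusions, **combined_exclusions}
below_random = [case for case, frac in all_cases.items() if frac < random_model]

print(ranking)
print(drops)
print(below_random)
```

Because the inputs are rounded to two decimals, differences of one percentage point (as for T and S) should not be over‐interpreted.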

At the mean of the prior, the model is most sensitive to the universal growth and respiration rate of invertebrates (fcons, fresp), to the taxon‐specific modification factors of the growth rates for Elmis and Riolus (fgrotax) and the parameter determining the increase in death rate for sensitive taxa due to pollution with organic toxicants (forg pollut crit) (Fig. S10).

By calibrating the taxon‐specific multipliers of the growth rates (fgrotax), we can increase the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 from 66% at the mean of the prior to 86% at the parameter set with the highest posterior density we found (Table S10). The log likelihood increases from −15 717 at the mean of the prior to −6764. This analysis demonstrates the potential for model improvement but does not provide a mechanistic explanation for the deviation of taxa from the metabolic theory of ecology (see discussion below).

Discussion

The Streambugs model performs considerably better than the random model, indicating that (i) species assembly is not a purely random process and (ii) we already have sufficient knowledge about important mechanisms to create a model that is better than random. The majority of taxa have a difference between the mean predicted and observed frequency of occurrence below 50% with the uncalibrated model (79% compared to 61% for the random model). However, the model still has some deficits. Analysing the remaining deficits can help us improve the model. Reasons why some taxa are systematically under‐ or overestimated at specific sites can be: (a) incomplete or incorrect trait information, especially for genus or family level taxa, (b) incomplete or incorrect estimation of the environmental conditions, and (c) deficiencies in the model structure and process description. Examples of (a) are temperature and microhabitat preferences, which are available only for some orders of invertebrates (Ephemeroptera, Plecoptera, Trichoptera, Chironomidae) in the freshwater ecology database that was used in this study. Feeding interactions of taxa are inferred from feeding types and body size. Better information about feeding links from stable isotope analyses, gut content analyses or a database with observed feeding links (Gray et al. 2014) could further improve this approach.

Regarding (b), we did not have monitoring data for suspended organic matter and used a mean value of measurements from one part of the catchment as input for all sites. Furthermore, we did not have information on the microhabitats (substrate) at the sites; therefore, we could not take into account the limitation of taxa by available microhabitats in this study. The slope of each reach has a large influence on the estimated current conditions. However, we only have this information from digital elevation maps, which are not very precise and do not take drops into account.

Regarding (c), some taxa may have specific habitat requirements that are not yet included in the model, for example the sensitivity to clogging of the river bed with fine sediments (Turley et al. 2015). Furthermore, ecotoxicological effects may not yet be resolved sufficiently. Taxon‐specific fish predation is implemented in the model but not yet explicitly accounted for in our application; because information on fish densities and taxon‐specific predation is missing, it is only included in the death rate. All these factors might contribute to the fact that the density in Fig. 3 is skewed and we have more overestimated than underestimated taxa.

The current version of the model does not include a dispersal process. We account for dispersal limitation of taxa by extracting the modelled taxon pool from observations of the regional pool. To estimate how much this restriction contributes to the predictive capacity of the Streambugs model, we compared it to a random null model, which proved to be considerably inferior. A larger taxon pool including dispersal‐limited taxa would result in more overestimated taxa (in both the random and the Streambugs model). Differences among the four subcatchments regarding the predictive performance of the model are rather small (Table 2). This could be explained by a comparable range in environmental conditions (Figs S1–S9) and the fact that they belong to the same catchment.

Several taxa were systematically overestimated with the prior distribution of parameters and environmental conditions. Bithynia tentaculata was overestimated at 20 of 21 sites. The temperature tolerance of this taxon is not included in the trait database, and it is coded as indifferent regarding current conditions. However, according to Tachet et al. (2010), it only occurs in standing and slow‐flowing waters. A revision of the current tolerance (defining this taxon as limno‐ to rheophilic instead of indifferent) improves model results for this taxon (increasing the log likelihood from −15 717 to −15 476 and the percentage of taxa with |fpred,i,j − fobs,i,j| < 0·5 from 79 to 81) since the sites in this catchment have moderate to high current (Fig. S6). Gammarus pulex was overestimated at 22 of 36 sites, Physella acuta at 14 of 21 sites, Proasellus at all six sites, Radix balthica at 13 of 15 sites, Riolus at 20 of 30 sites and Tabanidae at 12 of 15 sites. For these taxa, information on temperature tolerance is missing. For Tabanidae, current tolerance and saproby are missing as well. Complementing the trait information could help improve the model predictions for these taxa. The calibration leads to a decrease of the taxon‐specific modification factors of the growth rates for these taxa (Table S10).

The only taxon that was systematically underestimated is Simulium (at 19 of 36 sites). This taxon, a filter feeder, is food‐limited in the model. Better information on the concentration of suspended organic particles as well as on the half‐saturation parameter regarding food limitation would help improve model results.

The analysis of the influence of different traits and their corresponding environmental conditions reveals that the feeding type has the largest influence on the results. Assuming all taxa are omnivores leads to the largest decrease in the likelihood and to the largest decrease in the number of taxa with |fpred,i,j − fobs,i,j| < 0·5 in the subcatchments Glatt1 and Glatt2. This highlights the importance of biotic interactions in the model, which are expressed via competition for food sources and predator–prey interactions. Such factors cannot be easily included in statistical habitat models like the RIVPACS approach (Wright, Sutcliffe & Furse 2000). Better knowledge of the actual links in the food web could further improve this approach. Excluding the limitation of sensitive taxa by organic toxicants leads to the second largest decrease in the number of taxa with |fpred,i,j − fobs,i,j| < 0·5 over all catchments but not to a significant change of the prior probability of observations, whereas water quality conditions based on organic matter only play a minor role for model results in the Glatt catchment. Most sites in the Glatt catchment are in a water quality class corresponding to β‐mesosaprobic conditions (Fig. S8), which is not limiting for most of the taxa; therefore, this factor has only a minor influence on model results in this catchment. This confirms our expectation regarding the most important water quality issues in Switzerland, where the WWTPs mainly fulfil the standards regarding nutrient and organic matter removal, but do not yet remove micropollutants such as pesticides and biocides from agricultural and urban sources to a sufficient degree. The upgrade of WWTPs with an ozonation or powdered activated carbon step to remove micropollutants is currently a topic of large importance in Switzerland (The Federal Office for the Environment (FOEN), 2009, 2012).
The marked influence of organic toxicants on the community composition of macroinvertebrates is in concordance with findings for small streams in Germany, France and Australia (Beketov et al. 2013). The minor influence of temperature tolerance on model results may partly be explained by the discrete classification into four temperature groups, where only the boundary of 18 °C affects model results in the catchment, and by the fact that this trait information is only available for a part of the community (Ephemeroptera, Plecoptera, Trichoptera, Chironomidae). Including information for missing taxa from other databases (e.g. Tachet et al. 2010, which is more complete but distinguishes only between temperatures below and above 15 °C) and, furthermore, a finer resolution of the information on the temperature sensitivity of taxa might increase the sensitivity of the model to this factor.

Conclusions

The mechanistic model Streambugs allows us to disentangle the effects of biotic interactions and environmental factors on the prediction of the community composition of macroinvertebrates. Even without any calibration, for 79% of the taxa the absolute difference between observed and predicted frequency of occurrence is below 0·5 when taking uncertainty about parameters and environmental conditions into account. With a calibration of taxon‐specific modification factors of the growth rate, we can increase the predictive capability of the model. From the analysis of existing model deficits, we conclude that the trait information should be complemented and refined, for example by including information from other trait databases. To further decrease uncertainty and improve the predictive capability of the model, the estimation of current conditions could be improved (based on measured current velocity) and information on substrate/microhabitats could be included. Our study shows that biotic interactions described by food web processes and competition for food sources have a higher influence on model results than traits that describe abiotic habitat requirements. This highlights the importance of integrating biotic and abiotic factors in a model to predict community assembly of macroinvertebrates in streams and confirms ecological theory on community assembly (Hillerislambers et al. 2012).

Acknowledgements

Monitoring data were kindly provided by the Office for Waste, Water, Energy and Air of the Canton of Zurich (AWEL Zurich). We acknowledge Rosi Siber for GIS analyses and help with Fig. 1. We thank Cédric Mondy, Christian Stamm, Christopher Robinson and Amael Paillex for stimulating discussions. We thank Carlo Albert, Andreas Scheidegger, Mark Honti and Dmitri Kavetski for numerous discussions about numerical algorithms. We acknowledge the constructive comments of three anonymous reviewers on earlier versions of the manuscript. This study was part of the project ‘iWaQa: Integrated river water quality management’ funded by the Swiss National Science Foundation (NRP61 on Sustainable Water Management).

    Data accessibility

    All data are included in the manuscript and supporting information.
