The importance of biotic interactions for the prediction of macroinvertebrate communities under multiple stressors
Summary
- The community assembly of macroinvertebrates in streams depends on the regional taxon pool, dispersal limitations, local habitat conditions and biotic interactions. By integrating existing knowledge about these processes from theoretical ecology in a mechanistic model, we can test our mechanistic understanding and disentangle multiple stressor effects on community assembly.
- To assess to which degree we can predict the community composition of macroinvertebrates, we integrated these processes in the mechanistic food web model Streambugs and tested it on 36 sites in the Glatt catchment on the Swiss plateau. The model predicts the observation probability of taxa from a regional taxon pool at each site taking into account uncertain knowledge on parameters, environmental conditions at the sites and sampling errors.
- We use allometric scaling according to the metabolic theory of ecology, ecological stoichiometry and autecological data from trait databases that include the current knowledge on habitat requirements of the different taxa to parameterize their growth, respiration and death.
- Without any calibration, for the majority of taxa at the 36 sites, the difference between the observed and predicted relative frequency of occurrence is <50% when taking prior parameter uncertainty and the uncertainty of environmental conditions into account (79% compared to 61% for the random model). By calibrating taxon‐specific modification factors for the growth rate, we can increase the model compliance with data.
- Analysing the influence of different ecological traits and their corresponding environmental influence factors reveals that feeding types and sensitivity to organic toxicants contribute most to the predictive capabilities of the model in this catchment. The influence of temperature stress and oxygen depletion due to pollution with organic matter on the community composition is negligible. These results confirm our expectations regarding the most important water quality issues of streams on the Swiss plateau. Current velocity plays an intermediate role in this model application.
- The contribution of the feeding types to model performance highlights the importance of taking biotic interactions (competition for food sources and predator–prey interactions) into account to predict the coexistence of taxa. Better knowledge of the actual feeding links in the food web (e.g. from gut content or stable isotope data) that are currently inferred from feeding types, body size and food availability could further improve this approach.
Introduction
Ecologists have been interested in community assembly and the coexistence of species for decades (e.g. Hutchinson 1961; Diamond 1975; Connor & Simberloff 1979). Understanding the rules that determine the formation of local communities from species of a regional pool is a fundamental aim in a broad range of research fields, such as evolutionary biology, biogeography, community ecology, conservation biology, and biodiversity research across terrestrial, marine and freshwater ecosystems. This topic is of particular importance regarding pressures induced by rapid climate and land‐use changes, which influence the structure (biodiversity) and functioning of ecosystems. It is therefore an attractive challenge to combine the wealth of knowledge about important processes that control community assembly into a mechanistic model that is intended to explore to which degree we can predict the local community composition.
Community assembly can be described by the following processes (e.g. Lake, Bond & Reich 2007; HilleRisLambers et al. 2012): a regional species pool is formed by evolutionary and biogeographic processes. Dispersal and chance events lead to the colonization of local sites. Environmental factors (such as habitat conditions and food availability) as well as biotic interactions (intra‐ and interspecific competition, predation) finally determine the local coexistence of taxa. Disentangling local effects of such natural and anthropogenic ‘stressors’ on the community assembly process would considerably improve our understanding of observed communities and help to improve decision‐making for river management. One example where such an integrative perspective would be helpful is the coordination between hydromorphological river restoration and water quality management.
Benthic macroinvertebrates are the most widely used organisms in freshwater biomonitoring of human impacts (Bonada et al. 2006; Cao & Hawkins 2011). Understanding the mechanisms that lead to observed communities is thus very important. Our knowledge of these mechanisms can be explored by formulating it in the form of mathematical models and testing their predictions with observed data. In addition, predicting the community composition of macroinvertebrates in natural and impacted streams is of high relevance for the management of freshwater ecosystems. Due to this interest from applied science, autecological traits of these organisms are well studied (Poff et al. 2006; Statzner, Bonada & Dolédec 2007; Tachet et al. 2010; Schaefer et al. 2011; Schmidt‐Kloiber & Hering 2012). This enables us to include this knowledge in mechanistic models. To model the community composition of macroinvertebrates in streams, we recently integrated concepts of theoretical food web modelling, the metabolic theory of ecology, and ecological stoichiometry, and the use of functional trait databases into the model Streambugs (Schuwirth & Reichert 2013). In this model, we evaluate the long‐term equilibrium of biomasses of taxa from the regional taxon pool to determine stable coexistence. For each taxon, vital rates are described as density‐dependent nonlinear processes of growth, respiration and death that scale with temperature and mean body mass according to the metabolic theory of ecology (Brown et al. 2004). A self‐inhibition term in the growth rate describes the effect of stabilizing niche differences (Chesson 2000), that is negative intraspecific interactions at high population densities. In this term, habitat capacity depends on environmental conditions (regarding temperature, current, and microhabitat) and the tolerance of taxa to these conditions that are assembled in autecological trait databases (e.g. Schmidt‐Kloiber & Hering 2012). 
Ecotoxicological effects of water quality as well as effects of oxygen depletion due to organic matter are implemented by increasing death rates for sensitive species using the ‘species at risk’ concept (Liess & Von Der Ohe 2005; Liess, Schaefer & Schriever 2008) and the saprobic system (Rolauffs et al. 2004; DIN 38410‐1:2004‐10). The differences in autecological traits lead to relative fitness differences of taxa that influence local survival or extinction of the population. Biotic interactions included in the model are competition for food resources and predation. We account for effects of ecological stoichiometry by confining the yield of consumption based on the elemental composition and energy content of food sources (Elser & Hassett 1994; Hessen, Faerovig & Andersen 2002; Andersen, Elser & Hessen 2004; Reichert & Schuwirth 2010).
When using models to make predictions, it is very important to acknowledge the sources of uncertainty that influence the uncertainty of model outcomes. Ecological models usually have many parameters about which we have only uncertain prior information. It is very important to propagate the uncertainty of model parameters and other model inputs to the model results and compare them with observational data to judge the predictive capability of the model. Ideally, we can use Bayesian inference (Ellison 2004; Gelman et al. 2004) to learn from the observations about model parameters in addition to using prior knowledge about the parameters. This would allow us to base the predictions on the best available scientific knowledge. However, for models with a high number of parameters and a likelihood function with a complex structure (e.g. with many local optima induced by a large number of data partially contradictory to the model), this can be numerically infeasible.
The aim of this paper was to test our mechanistic understanding of the community assembly processes of macroinvertebrates by comparing predictions of an extended version of the model Streambugs with field observations of 36 sites in the Glatt catchment on the Swiss Plateau. We address the following research questions: How good are a priori predictions of the community composition in the Glatt catchment by the model Streambugs compared to a random null model? To which degree can we increase the accuracy of predictions with model calibration? How important are the different environmental influence factors and corresponding traits of taxa (regarding temperature tolerance, current velocity, water quality, food sources) for the model predictions in this catchment? Finally, how can we improve the model by identifying and reducing model deficits?
Materials and methods
Model Description
The model Streambugs 1.0 (Schuwirth & Reichert 2013) describes growth, respiration and death of periphyton and macroinvertebrate taxa (species, genus or family level) based on the following environmental conditions: temperature, irradiation, nutrient concentration, current conditions, microhabitats, water quality regarding organic toxicants (plant protection products and biocides) and organic matter that leads to oxygen depletion (saprobic conditions), input of coarse organic matter (leaf litter) and suspended organic matter. We formulated the growth, consumption and death processes as source and sink terms in the ordinary differential equations for the time evolution of taxa biomass. These processes depend on the environmental conditions. Biotic interactions are included via predation and competition for food. Details of the model are provided in Schuwirth & Reichert (2013), and all equations and parameters are given in Appendix S1 and Table S3 (Supporting Information). The model is implemented in the statistics and graphics software R (R Development Core Team, 2014), and the differential equations are solved with the R package deSolve (Soetaert, Petzoldt & Setzer 2010). Compared with version 1.0, the improved implementation used in this paper allows specifying different habitat types per reach (regarding all environmental conditions). It also allows including fish predation and using time‐dependent inputs and parameters. However, these features are not used in the current application. Furthermore, the improved implementation allows solving the differential equations in C as an alternative to R, which accelerates the computation approximately by a factor of 20. This is important for the use of Monte Carlo methods for uncertainty propagation and Bayesian inference, where many model evaluations are needed.
Stoichiometric coefficients are automatically calculated based on elemental composition, energy content and feeding relationships with the R package stoichcalc (Reichert & Schuwirth 2010).
Observational Data and Environmental Conditions
To evaluate the predictive capacity of the model, we used environmental and taxonomic data from 36 sites in the catchment of the river Glatt. The Glatt valley is situated in the north‐eastern part of the Swiss Plateau, and the river is a tributary of the Rhine River (Fig. 1). Data from the Glatt valley sites were provided by the Office of Waste, Water, Energy and Air of the Canton of Zurich (AWEL – Amt für Wasser, Abwasser, Energie und Luft, Kanton Zürich). They include discharge, temperature, oxygen, nitrate, ammonium and phosphate concentrations, insecticides, hydromorphology, shading and the abundance of observed invertebrate taxa (on species, genus, family, order or class level) at each site. Invertebrates were collected with kick‐sampling using a multihabitat sampling strategy (AWEL 2006; Stucki 2010). For the chemical and temperature data, monthly grab samples were available from 1967 to 2011. Macroinvertebrate samples were taken 4–9 times between 1995 and 2011, usually in spring and autumn. In addition, GIS analyses gave information about the location of wastewater treatment plants (WWTP), the slope at the sites, as well as the fraction of broad‐leaved trees around the sites. We did not account for any data collected before 1990, since the conditions may have changed over time. In the following, we describe how the model input was derived from observational data.

Sources: Einzugsgebietsgliederung Schweiz EZGG‐CH©2012, Bundesamt für Umwelt BAFU, Bern; swisstopo (Art. 30 GeoIV): 5704 000 000/swissTLM3D@2011/Vector200©2010, reproduced with permission of swisstopo/JA100119; Kläranlagen der Schweiz ©2011, Eawag: überarbeitet auf der Basis des Projektes: Maurer M. und Herlyn A. (2007) Zustand, Kosten und Investitionsbedarf der schweizerischen Abwasserentsorgung. Eawag/BAFU Bericht.
Mean water temperature
The mean water temperature was calculated as the average of the measured temperatures of all monthly grab samples. The standard deviation of the mean water temperature was calculated as the standard deviation of the measurements divided by the square root of the number of measurements (Fig. S1).
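The calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and the temperature values are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def mean_and_standard_error(temperatures):
    """Return the mean and the standard deviation of the mean (SD / sqrt(n))."""
    n = len(temperatures)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(temperatures)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(temperatures)
    return mean, sd / math.sqrt(n)


# Hypothetical grab-sample temperatures in deg C (not AWEL data).
samples = [4.0, 6.0, 10.0, 14.0, 16.0, 10.0]
mean_temp, se_temp = mean_and_standard_error(samples)
print(f"mean = {mean_temp:.2f} deg C, sd of mean = {se_temp:.2f} deg C")
```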
Water temperature class
The water temperature class corresponding to the temperature tolerance trait is defined based on either the maximum morning temperature in summer or the mean maximum in summer, depending on data availability (e.g. Graf et al. 2008). As a proxy for this characteristic temperature, we took the second highest temperature from all monthly grab samples at each site to estimate the temperature class of the site. We assumed a higher uncertainty than for the mean temperature, described by a standard deviation of 3 °C, as the classification into different temperature classes for values close to each other is somewhat arbitrary (Fig. S2).
River width
The reach width was included in the hydromorphological assessment according to the Swiss modular concept for stream assessment (Bundesamt für Umwelt (BAFU), 1998, www.modul-stufen-konzept.ch). For the few sites at which hydromorphological assessment was not available, the width was estimated from aerial pictures.
Light intensity
As turbidity measurements are missing, we could not account for light attenuation in the water column. For the light intensity at the water surface at unshaded areas, we used an annual mean value of 125 W m−2 (≈1000 kWh m−2 year−1, Šúri et al. 2007) for all sites, assuming a standard deviation of 5%.
Shading
The fraction of shaded area at each site was taken from observations from AWEL and checked by analysing aerial pictures. We assumed an absolute standard deviation of 5% shading with the exception of one unshaded site where we assumed 0·1% as mean and standard deviation (Fig. S3).
Nutrient concentrations
We took the median of all monthly grab samples for phosphate and the median of the sum of ammonium and nitrate as an indication for the nutrient levels at each site, assuming a relative standard deviation of 5% (Figs S4 and S5).
Suspended particulate organic matter
Measurements of suspended particulate organic matter were not available. For all sites, we assumed a mean value and absolute standard deviation of 1 mg DM L−1, taking into account measured values in the Moenchaltorfer Aa, which is one of the subcatchments in this study (L. Spalinger, unpublished).
Current
(eqn 1)
Organic toxicants
To classify whether a site is affected by organic toxicants (such as pesticides), we used GIS information about WWTPs. A site was classified as polluted if a WWTP is situated upstream of it. Sites were also classified as polluted if they received a moderate, poor or bad insecticide evaluation based on substance measurements (AWEL, 2006). If neither condition applied, the site was classified as ‘not polluted’. Since all sites are within catchments with agricultural as well as urban land use, we cannot be sure that those sites are not affected by organic toxicants (e.g. from agricultural nonpoint sources or combined sewer overflows). Therefore, we assigned a 49% probability that these sites are polluted. For sites that were already classified as polluted, we were less uncertain and assigned only a 20% probability that the site is not affected by organic toxicants (Fig. S7).
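The classification rule and the assigned probabilities can be written down as a small function. This is only an illustrative sketch: the function name, the argument names and the rating labels other than ‘moderate’, ‘poor’ and ‘bad’ are hypothetical and do not come from the AWEL data set.

```python
POLLUTED_RATINGS = {"moderate", "poor", "bad"}


def pollution_probability(wwtp_upstream, insecticide_rating=None):
    """Probability that a site is affected by organic toxicants.

    wwtp_upstream: True if a WWTP is situated upstream of the site.
    insecticide_rating: rating label, or None if no evaluation exists
    (labels other than moderate/poor/bad are hypothetical).
    """
    classified_polluted = wwtp_upstream or insecticide_rating in POLLUTED_RATINGS
    if classified_polluted:
        # 20% probability that the site is nevertheless not affected.
        return 0.80
    # 'Not polluted' sites may still receive diffuse inputs.
    return 0.49


p_downstream_of_wwtp = pollution_probability(True, "good")
p_poor_rating = pollution_probability(False, "poor")
p_unclassified = pollution_probability(False, None)
```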
Water quality class corresponding to the Saprobic System
The water quality class that corresponds to the Saprobic system was estimated from ammonium and oxygen concentrations given in the AWEL data according to Table S9. For the oxygen concentration, the minimum of all samples of each site was used. To account for uncertainty, especially due to the fact that the biological oxygen demand was not available, we assigned 80% probability to the estimated class and 10% to each of the neighbouring or 20% to the only neighbouring class (Fig. S8).
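The probability assignment described above can be sketched as follows. The number of classes and the zero‐based class indices are assumptions made for illustration; the actual class boundaries are defined in Table S9, which is not reproduced here.

```python
def class_probabilities(estimated, n_classes):
    """Spread probability over ordered classes around the estimated one.

    The estimated class gets 0.8. The remaining 0.2 is split equally
    between the neighbouring classes (0.1 each), or given entirely to
    the single neighbour when the estimated class is at either end.
    """
    if n_classes < 2 or not 0 <= estimated < n_classes:
        raise ValueError("invalid class index or number of classes")
    probs = [0.0] * n_classes
    probs[estimated] = 0.8
    neighbours = [c for c in (estimated - 1, estimated + 1) if 0 <= c < n_classes]
    for c in neighbours:
        probs[c] = 0.2 / len(neighbours)
    return probs


# Assumption for illustration: five ordered classes, indexed 0..4.
edge = class_probabilities(0, 5)
middle = class_probabilities(2, 5)
```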
Input of leaf litter
The litter input in gDM m−2 year−1 was estimated from the fraction of the area of the riverbed and of the 10 m wide riparian zone that is covered by broad‐leaved trees from an aerial view analysis (see Appendix S2 for details). We assumed a standard deviation of 20% (Fig. S9).
Microhabitat class
We did not have information about microhabitat/substrate classes in the catchment and could therefore not include the limitation by microhabitat requirements that is implemented in the model.
Regional taxon pool and observations
To account for dispersal limitation, the Glatt valley was divided into four subcatchments [Pfaeffikersee (63 km2, outlet at 441 m.a.s.l.), Moenchaltorfer Aa (51 km2, outlet at 435 m.a.s.l.), Glatt1 (270 km2, outlet at 418 m.a.s.l.) and Glatt2 (417 km2, outlet at 339 m.a.s.l.), Fig. 1], encompassing 6–11 measurement sites each. The four areas have a similar land use. The percentage of urban land use ranges from 19% (Moenchaltorfer Aa) to 30% (Glatt1). The Moenchaltorfer Aa has 63% agricultural area whereas the three other subcatchments have 41–43% agricultural land use. The percentage of forest cover ranges from 17% (Moenchaltorfer Aa) to 24% (Glatt2). The regional invertebrate taxon pool was estimated for each subcatchment separately and is given in Table S4. It was defined by those taxa that occurred at more than 25% of all sites within the subcatchment and that were identified at least on family level. If a taxon belonged to another taxon on a higher level that was also included in the source pool, we modelled only the higher level taxon (e.g. if an invertebrate family is part of the source pool, species or genera belonging to that family are not modelled separately). As an exception, Baetis and Baetis rhodani were modelled separately, as their traits, especially their sensitivity to organic toxicants, differ. This resulted in five of the 49 taxa being included in the taxon pool of only one subcatchment, 18 included in two subcatchments, eight included in three subcatchments and 18 included in all four subcatchments (Table S4). We assumed that taxa of the taxon pool are not dispersal limited within each subcatchment.
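The selection rules for the regional taxon pool can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Table S4: the data structures, the function name and the example observations are hypothetical, and the taxonomy is reduced to a simple parent lookup.

```python
RANK_ORDER = {"species": 0, "genus": 1, "family": 2, "order": 3, "class": 4}


def regional_pool(site_taxa, taxonomy, min_fraction=0.25, keep_separate=frozenset()):
    """Select the taxon pool of one subcatchment.

    site_taxa: dict mapping site -> set of observed taxa.
    taxonomy: dict mapping taxon -> (rank, parent taxon or None).
    keep_separate: taxa modelled separately despite a pooled ancestor.
    """
    n_sites = len(site_taxa)
    counts = {}
    for taxa in site_taxa.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    # Rule 1: observed at more than 25% of sites, identified to family or lower.
    candidates = {
        t for t, c in counts.items()
        if c / n_sites > min_fraction
        and RANK_ORDER[taxonomy[t][0]] <= RANK_ORDER["family"]
    }

    def has_pooled_ancestor(taxon):
        parent = taxonomy[taxon][1]
        while parent is not None:
            if parent in candidates:
                return True
            parent = taxonomy.get(parent, (None, None))[1]
        return False

    # Rule 2: model only the higher-level taxon, except for listed exceptions.
    return {t for t in candidates if t in keep_separate or not has_pooled_ancestor(t)}


# Hypothetical observations at four sites of one subcatchment.
site_taxa = {
    "s1": {"Baetis", "Baetis rhodani", "Gammaridae", "Gammarus fossarum"},
    "s2": {"Baetis", "Baetis rhodani", "Gammaridae"},
    "s3": {"Baetis", "Gammaridae", "Gammarus fossarum", "Oligochaeta"},
    "s4": {"Oligochaeta", "Elmis"},
}
taxonomy = {
    "Baetis": ("genus", None),
    "Baetis rhodani": ("species", "Baetis"),
    "Gammaridae": ("family", None),
    "Gammarus": ("genus", "Gammaridae"),
    "Gammarus fossarum": ("species", "Gammarus"),
    "Oligochaeta": ("class", None),
    "Elmis": ("genus", None),
}
pool = regional_pool(site_taxa, taxonomy, keep_separate={"Baetis rhodani"})
```

In this toy example, Oligochaeta is dropped because it is identified above family level, Elmis because it occurs at exactly 25% of the sites, and Gammarus fossarum because its family is already in the pool.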
To obtain observations of the chosen source pool taxa that are comparable to model results and suitable for model calibration, we calculated the abundance per square metre of each taxon at each site and sampling date by dividing the individual counts by the sampling area. For each taxon, we checked whether additional lower‐level taxa belonging to it were observed that are not part of the regional taxon pool. In these cases, the abundance of the lower‐level taxon was added to the abundance of the higher‐level taxon.
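A minimal sketch of this aggregation step is given below. The function name, the lookup table from lower‐level taxa to pool taxa, the counts and the sampling area are all hypothetical values chosen for illustration.

```python
def pooled_abundance(counts, sampling_area_m2, pool, parent_in_pool):
    """Abundance per square metre for pool taxa at one site and date.

    counts: dict mapping taxon -> number of individuals counted.
    pool: set of taxa in the regional taxon pool.
    parent_in_pool: dict mapping a lower-level taxon that is not in the
        pool to the higher-level pool taxon it belongs to.
    """
    densities = {}
    for taxon, n in counts.items():
        target = taxon if taxon in pool else parent_in_pool.get(taxon)
        if target is None:
            continue  # taxon is neither in the pool nor part of a pool taxon
        densities[target] = densities.get(target, 0.0) + n / sampling_area_m2
    return densities


# Hypothetical sample: counts from a 1.25 m2 sampling area.
counts = {"Gammaridae": 10, "Gammarus pulex": 30, "Baetis": 20, "Oligochaeta": 5}
pool = {"Gammaridae", "Baetis"}
parent_in_pool = {"Gammarus pulex": "Gammaridae"}
abundance = pooled_abundance(counts, 1.25, pool, parent_in_pool)
```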
Systems Analysis
To estimate the taxonomic composition of the macroinvertebrate community at each site, we evaluated the long‐term steady state of the biomass density for each taxon at each site under stationary environmental conditions, Bss(x,θ). Here, B = (B1,…, Bn) is the vector of biomass densities of all taxa, n is the number of taxa, ss indicates the steady‐state solution of the model equations, x are the environmental conditions (xj for site j), and θ is the parameter vector (see Schuwirth & Reichert 2013 for the details of the equations that are solved to calculate Bss).
We compare model results with observational data by specifying a likelihood function depending on parameters θ and observations y. The likelihood is defined as the probability distribution for potential observations {yi,j,k} given model parameters θ under the assumption that the model is true. We define yi,j,k = 1 if taxon i is observed and yi,j,k = 0 if it is not observed at site j and sampling date k. It is specified by eqns 2–5, which cover all taxa, sites and sampling dates:
(eqn 2)
(eqn 3)
(eqn 4)
(eqn 5)
For each site, we had 4–9 observations from several seasons and several years (between 1995 and 2011). For pobs, we assume 0·5; for pabs0, we assume 0·1; and for ndrift, we assume 0·05 ind m−2 (Fig. 2). Note that it would be possible to make the parameters pobs, pabs0, ndrift and Kabs taxon, site and sampling date specific to include knowledge about emergence patterns, drift or catastrophic loss due to floods if it is available.

, as:
(eqn 6)
(eqn 7)
) for all taxa i and sites j were produced using the implementation of the truncated normal distribution of the R package truncnorm (Trautmann et al. 2014). Furthermore, we evaluate the number of taxa for which the absolute difference between observed and predicted relative frequency of occurrence is below a threshold that varies between 0 and 1.
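The threshold evaluation can be illustrated with a short sketch. The function name and the frequencies are hypothetical, and the strict inequality mirrors the ‘<’ used elsewhere in the text; expressing the result as a fraction rather than a count is a choice made here for readability.

```python
def fraction_below(f_pred, f_obs, threshold):
    """Fraction of taxon-site pairs with |f_pred - f_obs| below the threshold."""
    diffs = [abs(p - o) for p, o in zip(f_pred, f_obs)]
    return sum(d < threshold for d in diffs) / len(diffs)


# Hypothetical predicted and observed relative frequencies of occurrence.
f_pred = [0.9, 0.2, 0.5, 0.75]
f_obs = [1.0, 0.75, 0.0, 0.5]

thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = [fraction_below(f_pred, f_obs, t) for t in thresholds]
```

Plotting `curve` against `thresholds` gives a cumulative curve that rises from 0 to 1; a model with better predictions rises earlier.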
To evaluate the predictive capacity of the model, we compare it to a random null model where the predicted frequency of occurrence (fpred,i,j) is binomially distributed with n corresponding to the number of sampling events at site j and p = 0·5. For this model, the mean predicted relative frequency of occurrence is 0·5.
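The behaviour of such a null model can be sketched as follows. This is an illustration, not the study's implementation: the function names and example frequencies are hypothetical, the simulated count is divided by n to obtain a relative frequency (an assumption), and the ‘overestimated’ and ‘underestimated’ labels are applied to differences of at least 0·5 in either direction, which is inferred as the complement of the |difference| < 0·5 criterion.

```python
import random


def simulate_null_frequency(n_events, p=0.5, rng=None):
    """One draw of the relative frequency of occurrence under the null model."""
    rng = rng or random.Random()
    return sum(rng.random() < p for _ in range(n_events)) / n_events


def classify(f_pred_mean, f_obs, threshold=0.5):
    """Label a taxon-site pair by the signed prediction error."""
    diff = f_pred_mean - f_obs
    if diff >= threshold:
        return "overestimated"
    if diff <= -threshold:
        return "underestimated"
    return "within threshold"


# The simulated mean approaches p = 0.5 for any number of sampling events.
rng = random.Random(1)
draws = [simulate_null_frequency(6, rng=rng) for _ in range(20000)]
mean_null = sum(draws) / len(draws)

# Hypothetical observed relative frequencies at four taxon-site pairs.
labels = [classify(0.5, f) for f in (0.0, 0.25, 1.0, 0.5)]
```

With a mean prediction of 0·5, only taxa that were never observed or always observed at a site fall outside the threshold, so the score of the null model depends only on how many observed frequencies are exactly 0 or 1.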
(eqn 8)
(eqn 9)
We analysed deficits in the model predictions to identify possibilities for model improvement.
To explore to which degree the integration of different ecological traits and corresponding environmental factors contributed to the predictive power of the model, we subsequently excluded traits (temperature tolerance, current tolerance, sensitivity to organic toxicants and saprobic conditions, feeding types by defining all taxa as omnivorous and combinations of these) and analysed resulting a priori model predictions. For this analysis, we used Monte Carlo simulation with a parameter sample size of 1000.
Due to the large number of uncertain parameters and a nonsmooth surface of the likelihood function with many local maxima, we were not successful in applying Bayesian inference. The complicated shape of the likelihood function (as a function of the parameters for given, actual observations) is caused by the large number of sites and multiple observations per site. Different parameter combinations can lead to similarly good overall results by modelling different taxa at different sites correctly. These parameter combinations can be confined to isolated narrow regions in the parameter space.
To nevertheless gain insight into how much model predictions could improve by calibration, we analysed the model results of the parameter set with the highest posterior density from Markov chain Monte Carlo runs in which we inferred only the taxon‐specific modification factors of the growth rate fgrotax. The global optimization algorithms we tested (Xiang et al. 2013) were not successful in finding parameter values with a higher posterior density. This does not imply that it is the parameter set at the maximum of the posterior; it just highlights the difficulty of finding the global maximum. This analysis does not provide any information about the predictive posterior uncertainty but just illustrates the potential of improving the model by calibration. We performed a local sensitivity analysis at the mean of the prior and at the parameter set with the highest posterior density to visualize the surface of the posterior density function at these points in parameter space.
Results
Without any calibration, for 79% of the taxa at the 36 sites, the difference between the observed and predicted relative frequency of occurrence is <50% when taking prior parameter uncertainty and the uncertainty of environmental conditions (=model inputs) into account (Table 1). The median of the absolute difference between the observed and predicted relative frequency of occurrence is 0·24 (Fig. 3). Including or excluding uncertainty of environmental conditions has a minor influence on model results (Table 1, Fig. 3). The overall percentage of taxa with |fpred,i,j − fobs,i,j| < 0·5 slightly increases but the prior probability of observations (estimated with the average likelihood) slightly decreases when including uncertainty of environmental conditions (Table 1). More taxa are overestimated than underestimated by the model (Table 1), as indicated by the skewness of the density plot of the difference between the predicted and observed relative frequency of occurrence (Fig. 3). The uncalibrated model performs considerably better than the random model, which results in 61% of taxa with |fpred,i,j − fobs,i,j| < 0·5.
Table 1. Results of the model when propagating prior parameter uncertainty using the best estimate of environmental conditions (A) or including uncertainty regarding environmental conditions (B) and for a random model where fpred,i,j is binomially distributed with p = 0·5 and n = number of sampling events at site j, leading to a mean predicted relative frequency of occurrence of 0·5. Taxa at sites with fpred,i,j − fobs,i,j ≥ 0·5 are called ‘overestimated’, taxa with fpred,i,j − fobs,i,j ≤ −0·5 are called ‘underestimated’
| Parameters | Environ. condition | Case Fig. 3 | Total no. of taxa at sites | No. of taxa with \|fpred,i,j − fobs,i,j\| < 0·5 | No. of over‐estim. taxa | No. of under‐estim. taxa | Fraction of taxa with \|fpred,i,j − fobs,i,j\| < 0·5 | Log prior probability of observations (eqs 8 and 9) |
|---|---|---|---|---|---|---|---|---|
| | | | a | b | c | d | b/a | f |
| prior dist. | Mean | A | 1233 | 948 | 226 | 59 | 0·77 | −9012 |
| prior dist. | Uncertain | B | 1233 | 979 | 193 | 61 | 0·79 | −9134 |
| Random model | | | 1233 | 757 | 359 | 117 | 0·61 | |

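[Fig. 3: density of the difference between the predicted and observed relative frequency of occurrence, and number of taxa with |fpred,i,j − fobs,i,j| below a certain threshold that varies between 0 and 1.]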
The different ecological traits and their corresponding environmental factors influence the predictive capabilities of the model to different degrees (Table 2 and Fig. 4), and their effects slightly differ between the four subcatchments. The trait with the largest influence on the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 is the feeding type in the subcatchments Glatt1 and Glatt2 and the sensitivity to organic toxicants in the subcatchments Pfaeffikersee and Moenchaltorfer Aa.
| Sub‐catchment | incl. all traits | excl. T (a) | excl. S (a) | excl. C (a) | excl. Tox (a) | excl. F (a) | excl. T, S, C, Tox | excl. T, S, C, Tox, F |
|---|---|---|---|---|---|---|---|---|
| Pfsee | 0·80 | 0·80 | 0·77 | 0·75 | 0·68 | 0·74 | 0·57 | 0·41 |
| Moench | 0·77 | 0·78 | 0·76 | 0·71 | 0·64 | 0·74 | 0·56 | 0·43 |
| Glatt1 | 0·80 | 0·79 | 0·78 | 0·73 | 0·72 | 0·60 | 0·61 | 0·35 |
| Glatt2 | 0·81 | 0·81 | 0·80 | 0·74 | 0·75 | 0·61 | 0·63 | 0·36 |
| Whole catchment | 0·79 | 0·80 | 0·78 | 0·73 | 0·70 | 0·66 | 0·60 | 0·38 |
| Log prior prob. of obs. (b) | −9134 | −9150 | −9127 | −9437 | −9108 | −11100 | −9423 | −11580 |

(a) T, limitation due to temperature stress; S, limitation due to saprobic conditions; C, limitation due to current conditions; Tox, limitation due to organic toxicants; excl. F, ignoring feeding types = all taxa are omnivores.
(b) Equations 8 and 9.

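[Fig. 4: model predictions when individual traits are excluded, including the number of taxa with |fpred,i,j − fobs,i,j| below a threshold that varies between 0 and 1.]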
Summarized over all subcatchments, feeding types followed by sensitivities regarding organic toxicants and tolerances to current conditions have the strongest influence on results; excluding them decreases the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 from 79% to 66%, 70% and 73%, respectively (Table 2, Fig. 4). The predicted frequency of occurrence of taxa with low frequency of observation increases due to the exclusion of these traits (Fig. 4a). The exclusion of stress due to temperature and saprobic conditions has a minor influence on model results. The exclusion of all five traits leads to a decrease in the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 to below 45% in all cases (38% for the whole catchment, Table 2), which is below the random model (61%).
At the mean of the prior, the model is most sensitive to the universal growth and respiration rate of invertebrates (fcons, fresp), to the taxon‐specific modification factors of the growth rates for Elmis and Riolus (fgrotax) and the parameter determining the increase in death rate for sensitive taxa due to pollution with organic toxicants (forg pollut crit) (Fig. S10).
By calibrating the taxon‐specific multipliers of the growth rates (fgrotax), we can increase the fraction of taxa with |fpred,i,j − fobs,i,j| < 0·5 from 66% at the mean of the prior to 86% at the parameter set with the highest posterior density we found (Table S10). The log likelihood increases from −15 717 at the mean of the prior to −6764. This analysis demonstrates the potential for model improvement but does not provide a mechanistic explanation for the deviation of taxa from the metabolic theory of ecology (see discussion below).
Discussion
The Streambugs model performs considerably better than the random model, indicating that (i) species assembly is not a purely random process and (ii) we already have sufficient knowledge about important mechanisms to create a model that is better than random. The majority of taxa have a difference between the mean predicted and observed frequency of occurrence below 50% with the uncalibrated model (79% compared to 61% for the random model). However, the model still has some deficits. Analysing the remaining deficits can help us improve the model. Reasons why some taxa are systematically under‐ or overestimated at specific sites can be: (a) incomplete or incorrect trait information, especially for genus or family level taxa, (b) incomplete or incorrect estimation of the environmental conditions, (c) deficiencies in the model structure and process description. Examples for (a) are temperature and microhabitat preferences that are available only for some orders of invertebrates (Ephemeroptera, Plecoptera, Trichoptera, Chironomidae) in the freshwater ecology database that was used in this study. Feeding interactions of taxa are inferred from feeding types and the body size. Better information about feeding links from stable isotope analyses, gut content analyses or a database with observed feeding links (Gray et al. 2014) could further improve this approach.
Regarding (b), we did not have monitoring data for suspended organic matter and used a mean value of measurements from one part of the catchment as input for all sites. Furthermore, we did not have information on the microhabitats (substrate) at the sites; therefore, we could not take into account the limitation of taxa by available microhabitats in this study. The slope of each reach has a big influence on the estimated current conditions. However, we only have this information from digital elevation maps, which are not very precise and do not take drops into account.
Regarding (c), some taxa may have specific habitat requirements that are not included in the model yet, for example the sensitivity regarding a clogging of the river bed with fine sediments (Turley et al. 2015). Furthermore, ecotoxicological effects may not yet be resolved sufficiently. Taxon‐specific fish predation is implemented in the model but not explicitly accounted for in our application; because information on fish densities and taxon‐specific predation is missing, it is only included in the death rate. All these factors might contribute to the fact that the density in Fig. 3 is skewed and we have more overestimated than underestimated taxa.
The current version of the model does not include a dispersal process. We account for dispersal limitation of taxa by extracting the modelled taxon pool from observations of the regional pool. To estimate how much this restriction contributes to the predictive capacity of the Streambugs model, we compared it to a random null model, which proved to be significantly inferior. A larger taxon pool including dispersal‐limited taxa would result in more overestimated taxa (in both the random and the Streambugs model). Differences among the four subcatchments regarding the predictive performance of the model are rather small (Table 2). This could be explained by a comparable range in environmental conditions (Figs S1–S9) and the fact that they belong to the same catchment.
The following taxa were systematically overestimated with the prior distribution of parameters and environmental conditions: Bithynia tentaculata was overestimated at 20 of 21 sites. The temperature tolerance of this taxon is not included in the trait database, and it is coded as indifferent regarding current conditions. However, according to Tachet et al. (2010), it only occurs in standing and slow flowing waters. A revision of the current tolerance (defining this taxon as limno‐ to rheophil instead of indifferent) improves model results for this taxon (increasing the log likelihood from −15 717 to −15 476 and the percentage of taxa with $|f_{\mathrm{pred},i,j} - f_{\mathrm{obs},i,j}| < 0.5$ from 79 to 81) since the sites in this catchment have moderate to high current (Fig. S6). Gammarus pulex was overestimated at 22 of 36 sites, Physella acuta at 14 of 21 sites, Proasellus at all six sites, Radix balthica at 13 of 15 sites, Riolus at 20 of 30 sites and Tabanidae at 12 of 15 sites. For these taxa, information on temperature tolerance is missing. For Tabanidae, current tolerance and saproby are missing as well. Complementing trait information could help improve the model predictions regarding these taxa. The calibration leads to a decrease of the taxon‐specific modification factors of the growth rates for these taxa (Table S10).
The only taxon that was systematically underestimated is Simulium (at 19 of 36 sites). This taxon, a filter feeder, is food‐limited in the model. Better information on the concentration of suspended organic particles as well as on the half‐saturation parameter regarding food limitation would help improve model results.
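To illustrate why both the food concentration and the half‐saturation parameter matter for a food‐limited filter feeder, the following sketch assumes a Monod‐type saturation function. The text above does not spell out the functional form used in Streambugs, so the form, the function name and the numbers are assumptions for illustration only.

```python
def food_limitation_factor(food_conc, half_saturation):
    """Monod-type limitation factor in [0, 1) (assumed functional form).

    food_conc: concentration of the food source (e.g. suspended organic
        particles), in arbitrary but consistent units.
    half_saturation: concentration at which the factor equals 0.5.
    """
    if food_conc < 0:
        raise ValueError("food_conc must be non-negative")
    if half_saturation <= 0:
        raise ValueError("half_saturation must be positive")
    return food_conc / (half_saturation + food_conc)


# Hypothetical values: the same half-saturation parameter with two
# different estimates of the suspended organic matter concentration.
factor_at_half_saturation = food_limitation_factor(2.0, 2.0)
factor_at_low_food = food_limitation_factor(0.5, 2.0)
```

Under this assumed form, an input concentration that is too low, or a half‐saturation parameter that is too high, reduces the growth of the filter feeder and could lead to the kind of underestimation described for Simulium.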
The analysis of the influence of different traits and their corresponding environmental conditions reveals that the feeding type has the largest influence on the results. Assuming all taxa are omnivores leads to the largest decrease in the likelihood and to the largest decrease in the number of taxa with $|f_{\mathrm{pred},i,j} - f_{\mathrm{obs},i,j}| < 0.5$ in the subcatchments Glatt1 and Glatt2. This highlights the importance of biotic interactions in the model, which are expressed via competition for food sources and predator–prey interactions. Such factors cannot be easily included in statistical habitat models like the RIVPACS approach (Wright, Sutcliffe & Furse 2000). Better knowledge of the actual links in the food web could further improve this approach.
Excluding the limitation of sensitive taxa by organic toxicants leads to the second largest decrease in the number of taxa with $|f_{\mathrm{pred},i,j} - f_{\mathrm{obs},i,j}| < 0.5$ over all catchments, but not to a significant change in the prior probability of the observations. In contrast, water quality conditions related to organic matter play only a minor role for model results in the Glatt catchment: most sites are in a water quality class corresponding to β‐mesosaprobic conditions (Fig. S8), which is not limiting for most of the taxa. This confirms our expectation regarding the most important water quality issues in Switzerland, where the WWTPs mainly fulfil the standards regarding nutrient and organic matter removal, but do not yet remove micropollutants such as pesticides and biocides from agricultural and urban sources to a sufficient degree. The upgrade of WWTPs with an ozonation or powdered activated carbon step to remove micropollutants is currently a topic of large importance in Switzerland (The Federal Office for the Environment (FOEN), 2009, 2012). The marked influence of organic toxicants on the community composition of macroinvertebrates is in concordance with findings for small streams in Germany, France and Australia (Beketov et al. 2013).
The minor influence of temperature tolerance on model results may partly be explained by two factors: the discrete classification into four temperature groups, of which only the boundary at 18 °C affects model results in the catchment, and the fact that this trait information is only available for part of the community (Ephemeroptera, Plecoptera, Trichoptera, Chironomidae). Including information for missing taxa from other databases (e.g. Tachet et al. 2010, which is more complete but distinguishes only between temperatures below and above 15 °C) and a better resolution of the information about the temperature sensitivity of taxa might increase the sensitivity of the model to this factor.
Conclusions
The mechanistic model Streambugs allows us to disentangle the effects of biotic interactions and environmental factors on the prediction of the community composition of macroinvertebrates. Even without any calibration, for 79% of the taxa the absolute difference between observed and predicted frequency of occurrence is below 0·5 when taking uncertainty about parameters and environmental conditions into account. With a calibration of taxon‐specific modification factors of the growth rate, we can increase the predictive capability of the model. From the analysis of existing model deficits, we conclude that the trait information should be complemented and refined, for example by including information from other trait databases. To further decrease uncertainty and improve the predictive capability of the model, the estimation of current conditions could be improved (based on measured current velocity) and information on substrate/microhabitats could be included. Our study shows that biotic interactions described by food web processes and competition for food sources have a higher influence on model results than traits that describe abiotic habitat requirements. This highlights the importance of integrating biotic and abiotic factors in a model to predict community assembly of macroinvertebrates in streams and confirms ecological theory on community assembly (HilleRisLambers et al. 2012).
Acknowledgements
Monitoring data were kindly provided by the Office for Waste, Water, Energy and Air of the Canton of Zurich (AWEL Zurich). We acknowledge Rosi Siber for GIS analyses and help with Fig. 1. We thank Cédric Mondy, Christian Stamm, Christopher Robinson and Amael Paillex for stimulating discussions. We thank Carlo Albert, Andreas Scheidegger, Mark Honti and Dmitri Kavetski for numerous discussions about numerical algorithms. We acknowledge the constructive comments of three anonymous reviewers on earlier versions of the manuscript. This study was part of the project ‘iWaQa: Integrated river water quality management’ funded by the Swiss National Science Foundation (NRP61 on Sustainable Water Management).
Data accessibility
All data are included in the manuscript and supporting information.
References