Machine learning algorithms to infer trait-matching and predict species interactions in ecological networks

1. Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be more likely if the proportions of a bee's tongue fit a plant's flower shape. Empirical estimates of the importance of trait-matching for determining species interactions, however, vary significantly among different types of ecological networks. 2. Here, we show that ambiguity among empirical trait-matching studies may have arisen at least in parts from using overly simple statistical models. Using simulated and real data, we contrast conventional generalized linear models (GLM) with more flexible Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naïve Bayes, and k-Nearest-Neighbor), testing their ability to predict species interactions based on traits, and infer trait combinations causally responsible for species interactions. 3. We found that the best ML models can successfully predict species interactions in plant–pollinator


| INTRODUC TI ON
The understanding and analysis of species interactions in ecological networks has become a central building block of modern ecology. Research in this field, however, has concentrated in particular on analyzing observed network structures (e.g. Galiana et al., 2018;González, Dalsgaard, & Olesen, 2010;Mora, Gravel, Gilarranz, Poisot, & Stouffer, 2018;Poisot, Stouffer, & Gravel, 2015). Our understanding of why particular species interact, and others not, is comparatively less developed (cf. Bartomeus et al., 2016;Poisot et al., 2015). A key hypothesis regarding this question is that species interact when their functional properties (traits) make an interaction possible (e.g. Eklöf et al., 2013;Jordano, Bascompte, & Olesen, 2003). In plant-pollinator networks, for example, one would imagine that an interaction is easier to achieve when the tongue or body of the bee matches with the shape and size of the flower (Garibaldi et al., 2015;Stang, Klinkhamer, & van der Meijden, 2007). The idea that interactions will occur when traits are compatible is known as trait-matching (e.g. Schleuning, Fründ, & García, 2015, see also Figure 1).

F I G U R E 1
An illustration of the trait-matching concept. (a) Two classes of organisms, each with their own traits, interact in a bipartite network. (b) The goal of the statistical algorithm is to predict the probability of a plantpollinator interaction, based on their trait values and (c) to infer the trait-trait interaction structure (trait-matching) that is causally responsible for those interactions Progress on these questions is complicated by the fact that, until very recently, analyses of empirical networks relied almost exclusively on conventional regression models and phylogenetic predictors (Brousseau et al., 2018;Pearse & Altermatt, 2013;Pomeranz et al., 2019), or on simple regression trees (e.g. Berlow et al., 2009).
Reasonable doubts exist as to whether these models are flexible enough to capture the way traits give rise to interactions (see e.g. Mayfield & Stouffer, 2017). Machine Learning (ML) models could be a solution to this problem. Modern ML models can flexibly detect interactions between predictors (trait-trait interactions), depend on fewer a-priori assumptions and usually achieve higher predictive performance than traditional regression techniques (e.g. Breiman, 2001b can successfully predict food web interactions. It therefore seems promising to further explore the performance of machine learning algorithms for predicting species interactions from measurable traits, and whether those more flexible models change our view on the importance of trait matching for plant-pollinator interactions. When assessing the suitability of ML algorithms for this problem, it is important to note that, while ML models tend to excel in predictive performance, their interpretation is often challenging (e.g. Ribeiro, Singh, & Guestrin, 2016). Ecologists, however, would likely not be satisfied with predicting species interactions, but would also want to know which traits are causally responsible for those interactions, for instance due to their importance as essential biodiversity variables (see Kissling et al., 2018). Unlike for statistical models, however, fitted ML models typically provide no direct information about how they generate their predictions. In recent years, also in response to issues such as fairness and discrimination (see Olhede & Wolfe, 2018), techniques aiming at interpreting fitted ML models have emerged (e.g. Guidotti et al., 2018). For example, permutation techniques (Fisher, Rudin, & Dominici, 2018) allow estimating the importance of predictors for any kind of model, similar to the variable importance in tree-based models (Breiman, 2001a). In this case, however, we are not primarily interested in the effects of a single predictor, but we want to know how interactions between predictors (trait-traitmatching) influence interaction probabilities. A suggested solution to this problem is the H-statistic (Friedman & Popescu, 2008), which uses partial dependencies to estimate feature-feature (trait-trait) interactions from fitted ML models. Assuming that networks emerge due to a few important trait-trait interactions (Eklöf et al., 2013), the H-statistic should be able to identify those from a fitted ML, but to our knowledge, the efficacy of this or similar techniques in inferring causal traits has not yet been demonstrated.
The purpose of this study is to (a) systematically assess the pre-

| Machine learning models for predicting species interactions from trait-matching
Throughout this paper, we consider that empirical observations of species interactions may be available either as binary (presenceabsence) or weighted (counts, intensity, interaction frequencies) data. The objective for the models is to predict those plant-pollinator interactions based on the species' traits. We selected seven classes of ML models, either because they were previously used for trait-matching, or because the general ML literature suggests that they should perform well for this task (Table 1). For more details on the respective models, see the column 'Design principle' and the cited literature in Table 1, and the Supporting Information S1 in the Appendix.
Each of the ML models in Table 1 includes model-specific tuning parameters (so-called hyperparameters, for instance to control the model's learning behaviour) that can be adjusted by hand or optimized. To factor out idiosyncrasies due to the choice of these parameters, we optimized each models' hyperparameters with a random search in 30 (20 for empirical data) steps (see also Bergstra & Bengio, 2012), with nested cross-validations to avoid overfitting (for details see Appendix S1). Furthermore, ML models often perform poorly with imbalanced classes (proportion of plant-pollinator interactions to no plant-pollinator interactions is extremely low/high, Krawczyk, 2016). To address this, we applied the standard approach

| Simulating plant-pollinator interactions
To assess predictive and inferential performance of the models, we

| Predicting species interactions in simulated plant-pollinator networks
To assess predictive performance, we simulated reference data with six traits for each plant and pollinator. A possible issue with measuring predictive performance is that hidden correlations or structure in the data can lead to seemingly higher-than-random predictive performance even on random data (e.g. Roberts et al., 2017). To check that this is not the case, we created a first baseline scenario, consisting of equal species abundances and no trait-trait interactions (no trait-matching, the latter was achieved by setting the trait-trait interaction niche extremely wide). A second issue is that interactions of rare species will be less frequent than those of abundant species. As a result, models can achieve higher-than-random performance even without any trait-trait interactions when species abundances are TA B L E 1 Machine learning models and their usage for trait-matching

ML models Type Design principle Applied with trait-matching
Random forest (RF) Tree-based Ensemble of a finite number of regression trees (see Breiman, 2001a).
Desjardins-Proulx et al. ( 2017), Ryo and Rillig (2017) and Hu, Li, Yang, Shen, and Yu ( 2016) Boosted regression trees (BRT) Tree-based After fitting the first weak regression tree to the response, subsequent regression trees are fitted on the previous residuals (see Friedman, 2001).

Distancebased
In the n-dimensional feature space, a hyperplane to separate the classes is fitted (see Cristianini & Shawe-Taylor, 2000). Fang et al. (2013) Deep neural networks (DNN) Neural networks By learning to represent the input over several hidden layers, they are able to identify the patterns in the data for the task Wen et al. (2017) Convolutional neural networks (CNN) Neural networks Topological patterns in the input space (images, sequences) are preserved and processed by a number of kernels to extract features (see LeCun et al., 2015).
Liu, Tang, Chen, and Wang (2016) Naive Bayes Probabilistic classifier The model learns the probability belonging to a class given a specific input vector.
Fang et al. (2013) GLM Parametric A specific theory or model is fitted to the data Pomeranz et al. (2019) uneven. To ensure that the performance of our models exceeds these trivial performance levels, we created a second baseline scenario with exponential abundance distributions, but without trait-matching.
For the trait-matching scenario, we simulated networks with even abundance distributions and three trait-trait interactions (A1-B1, A2-B2, and A3-B3), each with a weight of 10. The scale parameter controlling the niche width was randomly sampled between 0.5 and 1.2 for simulating varying degrees of specialization in ecological networks (cf. Blüthgen et al., 2007). The even abundance distributions assumed here are unrealistic to some extent, but allow a better contrast between the models (because abundance effects are removed). In the case studies, we consider real abundance distributions. Other than that, the trait-matching scenario used the same parameter settings as the baseline scenarios (network size 50*100, ≈40% class balance). To test additionally for the effect of network sizes and observation time, we also varied network size to 25*50 and 100*200 (plants*pollinators) setting and proportions plant-pollinator interactions to ≈10%, ≈25%, and ≈40% one-factor-at-a-time from the base setting.  Tables S1 and S2 for detailed information). When traits for a species were available from different sources, they were averaged. We filled missing trait values with a multiple imputation algorithm based on random forest (Stekhoven & Bühlmann, 2012). We used all available traits as predictors in our models.

| Measures of predictive performance
To assess the models' predictive performance on the simulated plant-pollinator networks, we used the area under the receiver operating characteristic curve (AUC, measures how well the models are able to distinguish between plant-pollinator interaction and no plant-pollinator interaction regardless of classification threshold) and true skill statistic (TSS, which assess the predictive performance under a specific classification threshold, see Allouche, Tsoar, & Kadmon, 2006) for presence-absence, and spearman's correlation for interaction frequencies. Because the TSS for the empirical plant-pollinator database (case study 1) was similar, we additionally calculated classification threshold-dependent performance measurements: accuracy (proportion of correct predicted labels), sensitivity (recall), precision, and specificity (true negative rate).
Classification thresholds were optimized with TSS. The interpretation of these statistics is as follows: if our focus is to detect plantpollinator interactions, we want to achieve a high true positive rate (sensitivity) with an acceptable rate of false positives in the as true predicted labels (precision). Specificity estimates the rate of true negatives of all predicted negatives (no plant-pollinator interaction).

| Inferential performance in simulated plantpollinator networks
To assess the accuracy with which causal trait combinations can be identified from the fitted models via Friedman's H-statistic, we con- interactions between the two species' groups. We calculated for each, the averaged true positive rate (true trait-trait interaction in found interactions with highest H-statistic) over the eight/ten repetitions. In a second step, based on our interim results (see results), we repeated the procedure with BRT and DNN for 50*100 (plants*pollinators) simulated networks (see Appendix S1 for details regarding model fitting).
For GLMs, we selected the n (n = number of true trait-trait interactions) predictors with lowest p-value to calculate the true positive rate.

| Case study 2-Inferring trait-matching in a plant-hummingbird network
As a case study for inferring causally responsible traits, we used a To predict plant-pollinator interactions, we used bill length, bill curvature, body mass, wing length, and tail length of hummingbirds, and corolla length, corolla curvature, inner corolla diameter width, and external corolla diameter width of plants. Flower volume was calculated by corolla length and external diameter (Maglianesi et al., 2014). We used all available traits because the ML models should automatically learn trait-trait interactions.
We fitted the BRT with a Poisson maximum likelihood estimator and RF with a root mean squared error (RMSE) objective function (we did not log count data). We optimized DNNs with Poisson and negative binomial likelihood loss functions. We trained models on each elevation and on combined elevations (e.g. Low, Mid, High, Low-Mid-High, for details see Appendix S1). We calculated for the

Low, Mid, High and Low-Mid-High models interaction strengths (H-statistics) for all possible trait-trait interactions (with trait-trait
interactions within hummingbird/plant group). We checked the eight trait-trait interactions with highest interaction strengths for their biological plausibility by reviewing relevant literature.

| Predictive performance in simulated plantpollinator networks
In While all models improved their predictive performance with increasing network sizes with count data (Figure S2c), only DNN, RF, and BRT improved their performance with increasing network sizes with presence-absence plant-pollinator interactions ( Figure S2a,b).
Prolonging the observation time (i.e. creating more plant-pollinator interactions and thus reducing data imbalance) generally increased the models' performances ( Figure S2d-f).

| Predicting species interactions in a global crop-pollination database
After fitting the models to real data from a global crop-pollination database, we calculated AUC, TSS and additional performance measures ( Figure 3, Table S4) on the left-out samples. kNN achieved the highest TSS (0.36), RF achieved the highest AUC (0.73), and naïve Bayes achieved highest TPR, followed by CNN. Overall, RF achieved the overall best predictive performance with highest AUC and second highest TSS (Figure 3, Table S4).

| Inference of causal trait-trait interactions in simulated networks
In the second analysis step, we tested the ability of the H-statistics When increasing network size (from 25*50 to 50*100), DNN and BRT improved their overall performance to 70%-95% and 87%-98% for presence-absence networks ( Figure S3a), but showed a lower TPR for count data ( Figure S3b).

| Inference of causal trait-trait interactions in a plant-hummingbird network
In a second case study, we computed interaction strength (H-statistic) for all possible trait-trait interactions in plant-hummingbird networks ( Figure S5). The seven trait-trait interactions with highest interaction strength were identified by RF (Figure 5b). These interactions also achieved highest predictive performance ( Figure S4). The four traittrait interactions with highest interaction strength identified by BRT were in accordance with the ones that RF identified ( Figure S5).
RF and BRT identified corolla length-bill length, corolla curvature-bill length, inner diameter-bill length, and external diameterbody mass as the most important trait-trait interactions (Figure 5b, Figure S5). The models identified varying trait-trait interactions for networks at different elevations, but corolla and bill associations tended to be most important across elevations ( Figure S5).

| D ISCUSS I ON
We assessed the ability of seven ML models, plus GLM as a reference, to predict plant-pollinator interactions based on their traits.
In a second step, we tested whether it is possible to identify the causally responsible trait-trait interaction structure (trait-matching) from the fitted models. Our main results are that the best ML models (RF, BRT, and DNN) outperform GLMs to a substantial degree in predicting plant-pollinator interactions from their traits, and that it is possible to identify the trait-trait interactions causally responsible for plant-pollinator interactions from the fitted models with satisfying accuracy. The best ML models outperformed the simpler GLMs particularly for more complex trait-trait interaction structures, for which GLM performance dropped sharply.

| Comparison of performance in predicting species interactions
In our analysis of predictive performance, we found that ML models such as RF, BRT and DNNs exceeded GLM performance for predicting plant-pollinator interactions from trait-matching data. They also worked surprisingly well with small network sizes (25*50, Figure   S2a), such that performance did not increase substantially for larger networks (50*100, 100*200, Figure S2a-c).
An important point, also for comparing our performance indicators to the literature, is that all algorithms can achieve higher than naïve random performance (e.g. AUC of 0.5) when species distributions are uneven, even when plant-pollinator interactions are not tied to traits (Figure 2). These results, in line with earlier findings (e.g. Aderhold, Husmeier, Lennon, Beale, & Smith, 2012;Canard et al., 2014), highlight the importance of considering abundance when analyzing network structures: frequent species tend to have more observed interactions, and this effect might interfere with the trait-matching signal (e.g. Olito & Fox, 2015). While the trait-matching effect may influence which plant-pollinator interaction is feasible, the species abundance effect determine the actual observed plant-pollinator interactions. Without adjusting observed F I G U R E 5 (a) Elevation profile for the three plant-hummingbird networks in Costa Rica (details see Maglianesi et al., 2014). (b) The eight strongest traittrait interactions (blue-yellow gradient) inferred with the H-statistic from RF models fitted to the combined planthummingbird network (colors code the ranking of strengths). Corolla length-bill length and corolla curvature-bill length had the highest interaction strength plant-pollinator interactions for species abundances, it is difficult to separate the contributions of abundance and trait-matching to predictive performance (Olito & Fox, 2015).
Observation time and type are further critical factors in ecological network analysis. Short observation times often lead to sparse networks with many unobserved plant-pollinator interactions, potentially creating biases in the analysis. Moreover, few plant-pollinator interactions result in data with imbalanced class distributions, presenting challenges for many ML methods (Krawczyk, 2016), which is also reflected in our results ( Figure S2d-f). On the other hand, too long observation times could also negatively affect predictive performance, in particular when using binary links. The reason is that, given sufficient time, even weak links will be included in the network, potentially reducing the models' ability to identify the essential traits. Count data are more robust to these problems, and as our approach is equally applicable with count data, this data type seems generally preferable (see also Dormann & Strauss, 2014).
While the ML models detected the important trait-trait interactions automatically, GLMs were pre-specified with all possible two-way trait-trait interactions. To check that the resulting high complexity did not disadvantage them unduly, we additionally confirmed that AIC selection on their interaction structure did not increase their predictive performance. We therefore believe that their lower performance is either explained by the fact that GLMs are not flexible enough to capture the complex form of the trait-matching structures (see also Mayfield & Stouffer, 2017), or that ML methods are more successful than AIC variable selection in addressing overfitting induced by the high combinatorial number of possible trait-trait interactions. These results mirror findings in the literature: while a few studies showed that GLM can predict species interactions based on trait-matching (e.g. Gravel et al., 2013), most studies struggled in predicting species interactions with the trait-matching signal alone (e.g. Brousseau et al., 2018;Pearse & Altermatt, 2013;Pomeranz et al., 2019). We speculate based on our results that previous studies based on GLMs may have underestimated the importance of trait-matching considerably, unless a very small number of trait-trait interactions (1-2) is dominantly responsible for the structure of the networks.
Previous studies often showed improved performance in predicting species interactions by using phylogenetic predictors, serving as proxies for unobserved traits (see Morales-Castilla, Matias, Gravel, & Araújo, 2015). The drawback, however, is that such phylogenetic proxies can be hard to interpret in the context of specific ecological hypothesis of why species interact (see Díaz et al., 2013). For example, a phylogenetic signal could arise both as a result of trait-matching (because traits tend to be phylogenetically conserved), or as a result other genetically coded preferences for particular interactions that are not accessible as traits. Based on our results, we expect that the relative importance of phylogenetic proxies will decrease when using appropriate ML models, which could help to better explore to what extent species interactions are determined by measurable functional traits.
We found that the models' predictive performance was lower for the empirical plant-pollinator database than for the simulated networks. There are several plausible reasons for this. Firstly, trait-matching rules may change over scales (Poisot et al., 2015).
As the database consists of globally observed plant-pollinator interactions, this may complicate the identification of a common trait-matching signal. Secondly, the high share of discrete predictors and high-class imbalance is likely to negatively affect the predictive performance. Despite these obstacles, kNN, RF, and CNN achieved >0.3 TSS, and CNN and RF >70% AUC (Figure 2, Table S4), much higher than null expectation, and consistent with results from the simulated networks. While it may be possible to improve GLM performance by manual selection of predictors, we also find that the case study highlights that algorithms such as RF and BRT are more parsimonious and robust in their use than a GLM which further suffered convergence problems.

| Causal inference of trait-matching
To infer trait-trait interactions causally responsible for species interactions, we used the H-statistics. We found that this method, coupled with RF, DNN and BRT, could identify around 90% of the true trait-trait interactions in simulated plant-pollinator networks ( Figure 4, Figure S5). Increasing the network size improved the detection accuracy of true trait-matches for BRT and DNN ( Figure S3).
When increasing the number of trait-trait interactions, the approach outperformed GLMs (Figure 2).
Our results demonstrate that identifying trait-matching from fitted models with the H-statistic works, but it also comes with drawbacks. The H-statistic depends on partial dependencies (Friedman & Popescu, 2008) and is therefore sensitive to collinearity (see Apley, 2016 Kress, 2009;Maglianesi et al., 2014;Vizentin-Bugoni, Maruyama, & Sazima, 2014;Weinstein & Graham, 2017). Collinearity of traits likely explains other matches. For instance, body mass is positively correlated with tail length, explaining why corolla volume was associated with tail length. These results further support the view that it is possible to infer trait-matching with ML in ecologically realistic settings, without a priori assumptions.
Estimated trait-trait interactions in the plant-hummingbird networks differed for the three elevations, but the match of corolla length-bill length was generally most important ( Figure S5). Maglianesi et al. (2014) and Maglianesi, Blüthgen, Böhning-Gaese, and Schleuning (2015) reported similarly varying trait-trait interactions in plant-hummingbird networks across elevations, consistent with our results. While interactions in ecological networks vary over scales (Poisot et al., 2015), a common backbone is assumed (Mora et al., 2018). With corolla length-bill length, identified by RF and BRT with highest interaction strength (Figure 5b, Figure S5), we speculate that we identified with ML the central trait-matching phenomenon in plant-hummingbird networks.

| CON CLUS IONS
In conclusion, our study demonstrates that RF, BRT, and DNN exceeded GLM performance in predicting plant-pollinator interactions from trait information. ML models could also identify causally responsible trait-trait interactions with a higher accuracy than GLMs.
The ability to automatically extract species interactions from observed networks and traits, and causally interpreting the underlying trait-trait interactions, makes our approach, which we provide in an r package, a powerful new tool for ecologists. While we considered only plant-pollinator networks in this study, our method could be applied to other types of species interaction networks such as any mutualistic and antagonistic interactions in complex food webs (this is also supported by Desjardins-Proulx et al., 2017). In either of these ecological network types, there are ample opportunities for further analyses, for example how species interactions will change under global change or how species interactions will rewire in novel communities with reshuffled species and trait composition (Bailes et al., 2015;see Kissling & Schleuning, 2015). By identifying crucial rules of trait-matching between species, our approach can give insights into how biotic interactions shape community assembly and also contribute to the identification of Essential Biodiversity Variables in the context of global change (Kissling et al., 2018).

DATA AVA I L A B I L I T Y S TAT E M E N T
The plant-hummingbird data associated with this study is available