New robust weighted averaging‐ and model‐based methods for assessing trait–environment relationships

Statistical analysis of trait–environment association is challenging owing to the lack of a common observation unit: Community‐weighted mean regression (CWMr) uses site points, multilevel models focus on species points, and the fourth‐corner correlation uses all species‐site combinations. This situation invites the development of new methods capable of using all observation levels. To this end, new multilevel and weighted averaging‐based regression methods are proposed. Compared to existing methods, the new multilevel method, called MLM3, has additional site‐related random effects; they represent the unknowns in the environment that interact with the trait. The new weighted averaging method combines site‐level CWMr with a species‐level regression of Species Niche Centroids on to the trait. Because species can vary enormously in frequency and abundance giving diversity variation among sites, the regressions are weighted by Hill's effective number (N2) of occurrences of each species and the N2‐diversity of a site, and are subsequently combined in a sequential test procedure known as the max test. Using the test statistics of these new methods, the permutation‐based max test provides strong statistical evidence for trait–environment association in a plant community dataset, where existing methods show weak evidence. In simulations, the existing multilevel model showed bias and type I error inflation, whereas MLM3 did not. Out of the weighted averaging‐based regression methods, the N2‐weighted version best controlled the type I error rate. MLM3 was superior to the weighted averaging‐based methods with up to 30% more power. Both methods can be extended (a) to account for phylogeny and spatial autocorrelation and (b) to select functional traits and environmental variables from a greater set of variables.


| INTRODUC TI ON
Functional traits are useful in going beyond species-based understanding of community assembly processes (McGill, Enquist, Weiher, & Westoby, 2006;Shipley, 2010a). They allow predicting species' responses to environmental and trait variation, even if the trait-environment relationship is derived from data in which the species of interest were rare or even absent (Funk et al., 2017;Lavorel & Garnier, 2002). However, the selection and testing of relevant traits and environmental variables from data is statistically challenging, because they lack a common observation unit: traits are observed on species, environmental variables on sites, and the mediating abundances on species-site combinations. The typical data format is shown in Figure 1.
A main issue in all statistical approaches to assess trait-environment association is the number of statistical units. Figure 2 contrasts three standard methods: Community-weighted mean regression (CWMr), the multilevel model, and the fourth-corner correlation. In the popular CWMr, there are as many points as there are sites (n) ( Figure 2a). In the model-based analyses proposed by Pollock, Morris, and Vesk (2012) and Jamil, Ozinga, Kleyer, and ter Braak (2013), that is the first and second multilevel models (MLM1 and MLM2) in Table 1, there are as many points as there are species (m). MLM1 and MLM2 focus on the variability in the species response to the environmental variable. This is illustrated in Figure 2b which plots species-specific regression coefficients against the trait. Finally, the fourth-corner correlation, a weighted correlation between trait and environmental variable, can be displayed as a plot of n × m points corresponding to all species-site combinations (Figure 2c). It is worth noting that all three methods use all the available data, but in some way or another, the points not represented are either averaged out or, in the case of the fourth-corner correlation, not yet calculated.
For instance MLM1 and MLM2 allow for the prediction and simulation of abundances from the model in order to construct a graph like Figure 2a. However, the n points so calculated do not play a role in the statistical assessment of the trait-environment association.
Peres-Neto, Dray, and ter Braak (2017) and Miller, Damschen, and Ives (2019) showed that CWMr with its n points (Figure 2a) detects traits to be associated with environmental variables far too often in the null situation of no association (see Zelený (2018) for a review). The generalized linear model (GLM) model proposed by Brown et al. (2014) and Warton, Shipley, and Hastie (2015), denoted as MLM2(glm) in Table 1, essentially uses n points and is too liberal as well Miller et al., 2019). Of course, MLM2(glm) uses all data, but statistical inference is based on bootstrapping the residuals of n sites. For the maximum entropy approach (maxent) that is related to MLM2(glm) (Warton, Shipley, et al., 2015), Shipley (2010b) proposed inference via permutation of trait values (i.e. of species). Later, ter  argued that valid detection of trait-environment association by MLM2(glm) (i.e. a model without random effects) requires both site-and species-level tests to make up for the missing randomness. Each test results in a p-value and both p-values are then combined by taking the maximum, which is known as the max test (ter Braak, Cormont, & Dray, 2012).
At this point, it is not clear whether MLM1 and MLM2 (Figure 2b) are valid methods for detecting and selecting traits that interact with particular environmental variables. Does the statistical machinery of generalized linear mixed models (GLMM) overarching MLM1 and MLM2 prevent Type I error rate inflation? Is the scatter of the random species-specific effects around the line a sufficient basis for inference ? Miller et al. (2019) showed already that MLM1 and MLM2 may give elevated Type I error rates, supporting the idea that current applications of GLMM to test for trait-environment association may not be as robust as assumed. Miller et al. (2019) were the first to perform a simulation-based comparison of the computationally elaborate multilevel methods and the simple weighted averaging methods, such as CWMr and fourth-corner. Miller et al. (2019) introduced the term weighted correlation methods, but I prefer the term weighted averaging (WA) because of its efficiency in ecological niche models (ter Braak & Barendregt, 1986;ter Braak & Looman, 1986) and its common use F I G U R E 1 The data needed to assess trait-environment association between a single trait t and a single environmental variable e with Y the table with abundance values of species across sites. Derived statistics, which are important in the WA-based methods, are the community weighted means, which are sitespecific trait means, and the species niche centroids, which are species-specific environmental means. The size of circles (black for species, white for sites and grey for species-site combinations) is proportional to the value in the reconstruction of palaeo environments (Birks, Line, Juggins, Stevenson, & ter Braak, 1990;Juggins & Birks, 2012). In the simulations by Miller et al. (2019) the site-level only methods, CWMr and MLM2(glm), had severely inflated Type I error rate, in agreement with recent results Peres-Neto et al., 2017). Miller et al. (2019) studied the association between the trait leaf carbon-to-nitrogen ratio (C:N) and the environmental variable topographic moisture gradient (TMG) in a subset of the Whittaker Siskiyou Mountains Revisit data (Damschen, Harrison, & Grace, 2010). In this paper, 'Revisit data' refer to this subset. After all their analyses, with the relatively well-behaving methods resulting in p-values just below or above .05, Miller et al. (2019) posed the question 'is there a C:N-TMG association'? and their answer was 'most likely not'.  to have different overall abundance; 0 e i gives the mean response of species to the environmental gradient e = {e i }(site i = 1,...,n); c j e i allows species to differ in their response to the gradient e; 0 t j gives the mean response of abundance to the trait t = {t j }; d i allows sites to differ in abundance independently from the gradient e; b i t j allows the abundance to respond differently to the trait t in different sites; te t j e i gives the trait-environment association, which is the target of statistical testing.
c This model has fixed effects only and is thus a GLM, but inference proceeds by bootstrapping residuals of n sites, making it multilevel.
TA B L E 1 Parameters and terms in the linear predictor of GLMMs, also called multilevel models (MLMs), for trait-environment association (f/r = fixed/ random)

| Dataset
The data are from the revisit to the low-elevation, non-serpentine sites of Robert Whittaker's historic plant community study sites in the Siskiyou Mountains of Southwest Oregon (Damschen et al., 2010;Whittaker, 1960). Briefly, the data consist of the abundance of 75 species in 52 sites, calculated from the number of 100 quadrat corners per site that each species intersected (Miller et al., 2019). The environmental variable and functional trait are standardized versions of the original topographic moisture gradient (TMG) and the leaf carbon-to-nitrogen ratio (C:N), respectively (Miller et al., 2019). Note that Miller et al. (2019) were not able to find convincing statistical evidence for a C:N-TMG association in the data based on methods reviewed in this paper.

| Multilevel model approaches (MLM1, MLM2, MLM3)
This whereas b * i and c * j denote regression coefficients with respect to the trait and environmental variable that may depend on site and species respectively. Finally, d i represents the site-specific error, which accounts for unobserved site-level variables that influence all species in the same way and a j accounts for differential abundance among species. If b * i = 0 and c * j = 0 , then Equation 1 is a main effects only model, but in general the coefficients b* and c* may be site-and species-specific. Trait-environment association is introduced by submodels that are linear in e and t, where 0 and 0 are intercepts and ′ te and ′′ te are slopes with respect to {e i } and {t j } respectively; b i and c j are the errors in these submodels, taken to be independent with variances 2 b and 2 c respectively. It should be noted that this can be generalized to account for spatial and phylogenetic (auto)correlation (Li & Ives, 2017;Ovaskainen et al., 2017). Figure 3 shows these models for the Revisit data. Insertion of these submodels into Equation 1 gives the MLM3 model: with te = � te + �� te the usual trait-environment interaction coefficient. Table 1 describes the purpose of each term. There is no t-e association under the null hypothesis H 0 : te = 0, while under the alternative (1) In Appendix S7, unrestricted count data are modelled similarly with a Poisson or a negative binomial response distribution with log link.
Compared to MLM1 (Miller et al., 2019;Pollock et al., 2012), MLM3 has the extra terms d i + ( 0 + b i )t j , that is the main trait effect and two site-specific terms (Table 1) if needed, despite the fact that these could in principle be absorbed in the random species-and site-specific random intercepts  The multisite maxent approach to community assembly by trait selection (CATS) (Shipley, 2010a(Shipley, , 2010b) is a special case of Equation 1 in which a j is a known offset that accounts for the abundances in the local species pool, c * j = 0 for each j, and b * i and d i are treated as a fixed effects. This maxent approach thus lacks explicit environmental variables, but, as MLM3, includes site-specific trait effects. Warton, Shipley, et al. (2015) extended CATS to include environmental variables resulting in MLM(glm).

| Weighted averaging-based methods (N 2 -weighted regressions)
The best known WA-based method is CWMr, a site-level method which detects trait-environment association far too often when no such association exists (Miller et al., 2019;Peres-Neto et al., 2017). This defect can be ameliorated (ter Braak, Peres-Neto, et al., 2018) by combining CWMr with a species-level method in which SNC is regressed against the trait (SNCr). The SNC is a species-specific environmental mean ( Figure 1) Effectively, however, this SNC is an average of almost one value because of the dominant 50. The Hill number N 2 expresses this notion (Hill, 1973;ter Braak & Verdonschot, 1995). In formula, the N 2 -number of species j is with y +j = ∑ i y ij , the species total. In the example, N 2j = 1.08. When all non-zero abundances of a species are equal, N 2j is simply the number of non-zero values. Hill numbers of order other than 2 appear less relevant to weighted averages. The N 2 -number for sites is the inverse of the Simpson index (Hill, 1973). When all species present in a site have equal abundance, its N 2 -number equals richness. Note that the The correlations that follow directly from CWMr and SNCr are not suited for measuring trait-environment association; they can be overly variable and unstable and also differ between the two regressions (Peres-Neto et al., 2017). The fourth-corner correlation is a better measure but is, in terms of weights in CWMr and SNCr, restricted to site and species totals. Appendix S3.5 generalizes the fourth-corner correlation to a general weighting system. In particular, it proposes a definition for the N 2 -weighted and unweighted fourth-corner correlations.

| Data analysis and simulations
The trait and environmental variables were standardized to zero The MLM3 model was used to simulate new datasets with the R code simulate(MLM3), from which a parametric bootstrap confidence interval for te was constructed. Subsequently, the MLM3 object was modified by changing the entry for ̂t e so as to simulate from models without and with varying strengths of trait-environment interaction. Additional simulations were carried out to investigate the robustness of the methods to large site-specific trait effects, both in terms of type I error rate (for te = 0), bias in ̂t e and its standard error.
For this, the entry for ̂b was increased to 1.
Appendix S7 contains similar analyses and simulations based on another real dataset using log-linear models for count data and Appendix S8 contains simulations from models with residual correlation between species, so as to see how sensitive the methods are to such correlation.
Sometimes, species and sites had no occurrence in the simulation dataset and such species and sites were removed before fitting the models. For maximum precision of method comparison, all methods were applied to the same datasets with permutation tests using identical permutations.
Appendix S1 is a tutorial in R and Appendix S10 contains the R-functions and scripts used in this paper. The code and a package implementing the methods, called TraitEnvMLMWA, is available at https://doi.org/10.6084/m9.figshare.8152655.

| Dataset
MLM3 gave a better fit than MLM2 ( 2 2 = 28, p << 0.0001); adding squared main effect terms did not. The trait-related standard deviation (̂b = 0.34) is of the same order as the environment-related one    Figure 6). In simulations with log-linear models, MLM2 was unbiased for te but downward biased for its standard error (Appendixes S7 and S8). In simulations of logit models with residual correlation, MLM2 and MLM3 showed 30 and 20% downward bias in ̂t e , respectively, which, for MLM3 only, disappeared without residual correlation; no such bias was found in log-linear models (Appendix S8). In MLM3, the standard error of ̂t e increases sharply with the size of the variance components 2 b and 2 c but only slightly with the size the structured noise; MLM2 fails to pick up the variance in 2 b , as it does not include this random component ( Figure A2 in Appendix A8.3).

| Simulations
In simulations with and without residual correlation (Appendix S8), MLM3 controlled the type I error rate, whereas MLM2 did not (showing type I error rates up to 0.52). However, for small number of species or sites (say, less than 20) the MLM3-based Wald test performed less well (Table A5 in Appendix S8). In a small simulation study, the parametric bootstrap test had 15% more power than MLM3-based max test, at the cost of being based on more assumptions (Appendix S5.2).
Whereas the model-based methods showed type I error inflation for small number of species or sites, the WA-based methods showed some inflation for large numbers, when both b and c are large. Of these methods, N 2 -weighted lm is the least affected (Appendix S8).
If either b or c is zero, the WA-based max tests do control the type I error (Appendices S5.1 and S8.3).
TA B L E 2 A P-values for tests on trait (C:N ratio)-environment (topographic moisture gradient: TMG) association in the Revisit data for five statistical methods and four ways of obtaining

| D ISCUSS I ON
This paper introduces two new methods for statistical analysis of trait-environment association, N 2 -weighted lm and MLM3.
The permutation-based max tests using these methods and the MLM3-based bootstrap test provide strong statistical evidence for a C:N-TMG association in the Revisit data (  The max test attempts to account for random effects by resampling sites and species, even when the test statistic, such as the fourth-corner correlation, does not account for them. Braga, ter Braak, Thuiller, and Dray (2018) proposed a max test using the fourth-corner correlation with randomizations of the trait and environmental variable that conserve phylogenetic and spatial correlations. The model-based approaches of Warton, Blanchet, et al. (2015) and Ovaskainen et al. (2017) can also account for spatial and phylogenic correlations. These approaches also model residual correlation which may make them robust to the issues found in MLM2.
MLM3 and the WA-based methods give very similar results for the Revisit data. This does not need necessarily hold true for other data, for example data where species have differential niche widths (Jamil, Kruk, & ter Braak, 2014 F I G U R E 6 Bias (boxplot) and standard error (red violin plot) of ̂t e in MLM2 (left) and MLM3 (right) in simulated data as in Figure 1 with b = 1, that is data simulated from the MLM3 model of the Revisit data with b set to 1. The standard deviation of ̂t e across simulations is shown as a red diamond

MLM3
With highly variable data, an easy approach, warned against by Warton, Lyons, Stoklosa, and Ives (2016), is to transform the data by taking square-roots or logarithms. Such transformations also implicitly change the model, and hence the (link-)scale on which the interaction is tested. The mass ratio hypothesis, the importance of which is emphasized by Funk et al. (2017), also argues against transformation of abundance in CWM and SNC calculations. An issue is that these methods lack a formal way of choosing the transformation, enabling potential cherry-picking of the transformation showing the most significant effect (in the Revisit data this is the quarter-root transformation). MLM3 and max-test-based WA-methods are superior to MLM2.
One may argue that MLM2 gives higher power than the WA-based methods, but this power comes at a cost of (a) lack of protection against unobserved environmental variation that interacts with the trait, (b) sensitivity to residual correlation, at least in logistic models, and (c) much higher computational costs. MLM3 is superior to the WA-based methods; (a) it gives better type I error control, at least for reasonable numbers of species and sites, and (b) had up to 30% higher power in the simulations. These advantages are offset by greater computational complexity. If a practitioner applies both WA-based methods and MLM3 to the same data with sufficient numbers of species and sites (both >20), the results of MLM3 should be leading.

ACK N OWLED G EM ENTS
I thank Henry Ehlers, John Birks, Kevin Mildau, Stéphane Dray and Pedro Peres-Neto for comments on this or an earlier version.
Comments by the Editor, and by Tony Ives, Bill Shipley and the third reviewer led to clarifications and improvements. The author is the senior author of the software Canoco.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data are available from the Dryad Digital Repository https ://doi. org/10.5061/dryad.7gj0s3b (Miller, Damschen, & Ives, 2018