Effect sizes and standardization in neighbourhood models of forest stands: potential biases and misinterpretations
Summary
- Effects of conspecific neighbours on survival and growth of trees have been found to be related to species abundance. Both positive and negative relationships may explain observed abundance patterns. Surprisingly, it is rarely tested whether such relationships could be biased or even spurious due to transforming neighbourhood variables or influences of spatial aggregation, distance decay of neighbour effects and standardization of effect sizes.
- To investigate potential biases, communities of 20 identical species were simulated with log-series abundances but without species-specific interactions. No relationship of conspecific neighbour effects on survival or growth with species abundance was expected. Survival and growth of individuals were simulated in random and aggregated spatial patterns using no, linear, or squared distance decay of neighbour effects.
- Regression coefficients of statistical neighbourhood models were unbiased and unrelated to species abundance. However, variation in the number of conspecific neighbours was positively or negatively related to species abundance depending on transformations of neighbourhood variables, spatial pattern and distance decay. Consequently, effect sizes and standardized regression coefficients, often used in model fitting across large numbers of species, were also positively or negatively related to species abundance depending on transformation of neighbourhood variables, spatial pattern and distance decay.
- Tests using randomized tree positions and identities provide the best benchmarks by which to critically evaluate relationships of effect sizes or standardized regression coefficients with tree species abundance. This will better guard against potential misinterpretations.
Introduction
Whether or not conspecific negative density dependence (CNDD) at small neighbourhood scales shapes species abundances in tropical tree communities at larger scales is far from resolved, and we probably should not even expect the answer to be simple. In principle, there are several possibilities. First, the strength of CNDD is unrelated to abundance. Secondly, the strength of CNDD is negatively related to abundance (strong CNDD for abundant but weak CNDD for rare species). This would prevent abundant species from becoming even more abundant and thereby competitively excluding other species. Moreover, it would confer a rare-species advantage and possibly lead to a community compensatory trend (Connell, Tracey & Webb 1984). Thirdly, the strength of CNDD is positively related to abundance (strong CNDD for rare but weak for abundant species). This would explain the rarity and low abundance of the species with strong CNDD and the high abundances of species with weak CNDD (Comita et al. 2010). Two further possibilities remain: either a mix of positive and negative processes is operating, or the observed relationships are simply spurious (i.e. the result of a statistical artefact).
In an empirical study, Newbery & Stoll (2013) showed negative effects of conspecific neighbours on absolute growth rate of medium-sized trees. The argument was that reduced growth of an individual tree will, other factors being equal, translate into survivorship and fecundity reductions and hence affect species abundances. Nevertheless, direct effects of conspecifics on survival could be more relevant for population dynamics of different species within communities. Therefore, the tests reported here simulate both individual survival and growth rate and use a framework of neighbourhood analysis similar to that of Newbery & Stoll (2013) to show that all possible relationships between the strength of CNDD and abundance may emerge without any species-specific effects or effects of abundance. Moreover, we show that potential biases do not depend on the nature of the dependent variable.
Relationships between the strength of CNDD and abundance were investigated using a simple, spatially explicit, individual-based model which simulated identical species without any species-specific interactions. Thus, no relationships between the strength of CNDD and abundance would be expected in communities simulated under these assumptions. Nevertheless, relationships do emerge because of interfering effects of spatial patterns and distance decay (i.e. the functional form relating neighbour effects to distance from focal trees, Fig. 1) and, perhaps more importantly, due to transforming (e.g. log-transformation) and/or scaling (e.g. standardization or z-transformation) of the input variables. For example, if rare species have lower variability in the number of conspecifics in their local neighbourhoods compared to common species, scaling is expected to decrease effect sizes (or standardized partial correlation coefficients) of rare relative to common species, possibly leading to spurious negative relationships between the strength of CNDD and abundances. Scaling is recommended (e.g. Schielzeth 2010) and applied especially in hierarchical Bayesian modelling to speed up or even ensure numerical convergence (e.g. Gelman & Hill 2007).
Motivation to investigate the relationships between the strength of CNDD and abundance more carefully using simulations came from the opposite outcomes of two recent publications. A consistent negative relationship between the strength of CNDD (i.e. effect sizes derived from statistical neighbourhood models) and abundance (total basal area of species) in randomization tests was shown by Newbery & Stoll (2013). By contrast, a strong positive relationship between the strength of CNDD and abundance was found by Comita et al. (2010). Such different results are interesting and might be explained by different underlying biological mechanisms operating on different species at different locations. Before drawing such a conclusion, however, possible differences arising from artefacts and biases of the statistical methods should first be ruled out.
Materials and methods
A completely neutral forest without any species-specific effects was simulated. Initial size distributions of individuals (basal area, ba) were log-normal with mean 2 and standard deviation 1, and simulations were initialized with no spatial dependency in individual size. Individuals of 20 identical species with log-series abundances (i.e. 2827, 1408, 935, 699, 557, 462, 395, 344, 305, 273, 248, 226, 208, 192, 179, 167, 157, 147, 139, 132) were placed on plots (200 × 400 m) either randomly or with aggregated spatial patterns. The aggregated pattern was realized by dispersing individuals around ‘parent trees’ (assigned random locations according to a homogeneous Poisson process), using a Gaussian dispersal kernel with mean 0 and standard deviation 3 m. Thus, the species distributions were modelled as a Thomas cluster process, which in turn is a special case of a Neyman-Scott cluster process (Neyman & Scott 1952). Because each species was generated separately in this way, species are spatially independent of one another.
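The aggregated placement described above can be sketched as follows. This is a minimal illustration, not the simulation code used for the study: the number of parent trees, the orientation of the 200 × 400 m plot, the wrap-around treatment of plot edges and the function name `thomas_pattern` are all assumptions. The sketch also fixes the number of trees per species in advance, so offspring are assigned to parents at random rather than drawn as Poisson counts per parent.

```python
import random

def thomas_pattern(n_trees, n_parents, width=200.0, height=400.0, sd=3.0, seed=0):
    """Place n_trees around n_parents randomly located parent points."""
    rng = random.Random(seed)
    # Parent locations: uniform on the plot (homogeneous pattern, fixed count)
    parents = [(rng.uniform(0, width), rng.uniform(0, height))
               for _ in range(n_parents)]
    trees = []
    for _ in range(n_trees):
        px, py = rng.choice(parents)
        # Gaussian dispersal kernel with mean 0 and SD 3 m around the parent;
        # coordinates are wrapped at the plot edges (assumption of this sketch)
        x = (px + rng.gauss(0.0, sd)) % width
        y = (py + rng.gauss(0.0, sd)) % height
        trees.append((x, y))
    return trees

# Rarest species from the abundance list above; 10 parents is a hypothetical choice
rare_species = thomas_pattern(n_trees=132, n_parents=10, seed=1)
print(len(rare_species), rare_species[0])
```

Calling the function once per species with a different seed would keep species spatially independent of one another, as in the text.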
Neighbourhood models (as in Stoll & Newbery 2005) were then fitted to the simulated data over all possible combinations of radii for heterospecific (HET) and conspecific (CON) neighbours using R (R Development Core Team 2012), and parameter estimates were taken from those models yielding the highest adjusted R²-values. Five runs with different seeds were performed, and estimates of regression coefficients from best-fitting neighbourhood models, effect sizes (Cohen 1988; Nakagawa & Cuthill 2007) or standardized regression coefficients (e.g. Warner 2012) were averaged across the five runs. Effect sizes (i.e. squared partial correlation coefficients, t²/[t² + residual degrees of freedom], t = t-value) and standardized regression coefficients (b, i.e. β's obtained from regressions on variables standardized by subtracting their mean and dividing by their standard deviation) were then correlated with species abundances (i.e. plot level basal area, BA, log-transformed). Standardized regression coefficients can also be calculated from unstandardized β's as b = β * SDX, where SDX is the standard deviation of the independent variable, if the dependent variable itself is not standardized (e.g. survival). In the case of continuous dependent variables (e.g. absolute growth rate, agr), however, the dependent variable itself is often standardized as well. In these cases, variability in the dependent variable (SDY) is also involved in standardizing regression coefficients and b = β * SDX/SDY. In both cases, a positive correlation of b with abundance implies that less abundant, rare species have stronger CON effects, that is, β is more negative (as in Comita et al. 2010), whereas a negative relationship implies more abundant species have stronger CON effects (as in Newbery & Stoll 2013). Note, however, that possible correlations of b with abundance may be biased due to correlations of SDX (or additionally SDY) with abundance.
But if β's are negative, large SDX lead to more negative b-values and the relationship with abundance may switch direction not because of a difference in the strength of conspecific interactions between rare and common species, but because of differences in the variability of number and abundance of conspecific neighbours. Moreover, because the simulations and analyses for both survival and agr as dependent variables are based on a multiple regression approach, the basic consequences described above (i.e. possible biases in standardized regression coefficients because of differences in SDX between rare and common species) are essentially the same independent of the nature of the dependent variable.
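The two quantities defined above can be written out directly. The sketch below is illustrative only: the function names and all numerical values (β, SDX) are hypothetical and are not taken from the simulations. It shows that an identical unstandardized β yields different standardized coefficients as soon as SDX differs between a rare and a common species.

```python
def effect_size(t, df_resid):
    """Squared partial correlation coefficient: t^2 / (t^2 + residual df)."""
    return t * t / (t * t + df_resid)

def standardized_coef(beta, sd_x, sd_y=None):
    """b = beta * SDX, or beta * SDX / SDY if the response is standardized too."""
    b = beta * sd_x
    return b if sd_y is None else b / sd_y

# Identical conspecific effect for both species (hypothetical value)
beta = -0.05
# Hypothetical SDs of conspecific neighbour numbers on the untransformed scale
b_rare = standardized_coef(beta, sd_x=1.4)
b_common = standardized_coef(beta, sd_x=6.7)

print(f"b (rare)   = {b_rare:.3f}")
print(f"b (common) = {b_common:.3f}")
print(f"effect size for t = 2, df = 96: {effect_size(2.0, 96):.2f}")
```

With a negative β, the species with the larger SDX receives the more negative b, although the underlying strength of the conspecific effect is the same.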
Results
There were no significant regressions between conspecific density-dependent effects on survival (regression coefficient β3 in eqn 1) and species abundance (Fig. 2), regardless of whether the untransformed or log-transformed number of conspecific neighbours was used to quantify the neighbourhood. However, variability in number of conspecific neighbours was positively correlated with abundance if untransformed (eqn 1) but negatively correlated if log-transformed (eqn 2). Consequently, standardized regression coefficients were negatively correlated with abundance if number of conspecific neighbours was quantified on the untransformed scale but positively correlated with abundance if number of conspecific neighbours was log-transformed. Frequency distributions for the rarest (n = 132) and most common (n = 2827) simulated species on untransformed and log-transformed scales (Fig. 3) demonstrate that log-transforming the number of conspecific neighbours expands variability for rare species (small values) but compresses it for common species (large values). Thus, variability in number of conspecific neighbours (SDX) increases from rare to common species on the untransformed scale but decreases from rare to common species on the log-transformed scale.
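The opposite behaviour of SDX on the two scales can be reproduced with a small sketch for the rarest and the most common species in a random spatial pattern. Several choices here are assumptions rather than details from the study: the 20 m neighbourhood radius, the log(x + 1) form of the transformation, the wrap-around distances used to avoid edge effects, and the restriction to 200 focal trees.

```python
import math
import random
import statistics

def conspecific_counts(n_trees, radius=20.0, width=200.0, height=400.0,
                       n_focal=200, seed=0):
    """Count conspecifics within `radius` of focal trees in a random pattern."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_trees)]
    counts = []
    for i, (fx, fy) in enumerate(pts[:n_focal]):
        c = 0
        for j, (x, y) in enumerate(pts):
            if i == j:
                continue
            # Wrap-around (torus) distances: an assumption of this sketch
            dx = min(abs(x - fx), width - abs(x - fx))
            dy = min(abs(y - fy), height - abs(y - fy))
            if dx * dx + dy * dy <= radius * radius:
                c += 1
        counts.append(c)
    return counts

def sd_raw_and_log(counts):
    """SD of neighbour counts on the untransformed and the log(x + 1) scale."""
    logged = [math.log1p(c) for c in counts]
    return statistics.stdev(counts), statistics.stdev(logged)

rare_raw, rare_log = sd_raw_and_log(conspecific_counts(132))
common_raw, common_log = sd_raw_and_log(conspecific_counts(2827))
print(f"rare:   SD raw = {rare_raw:.2f}, SD log = {rare_log:.2f}")
print(f"common: SD raw = {common_raw:.2f}, SD log = {common_log:.2f}")
```

In this sketch the common species has the larger SD on the untransformed scale and the smaller SD on the log scale, which is the pattern described for Fig. 3.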
There were no significant regressions between conspecific density-dependent effects on growth (regression coefficient β3 in eqn 3) and species abundance (plot level basal area), regardless of distance decay or spatial pattern (Fig. 4). Variation in parameter estimates was largest for squared distance decay and random spatial pattern. Best-fitting radii for bigger conspecific neighbours were unbiased in neighbourhood models without distance decay and random spatial pattern (Table 1). However, in the aggregated pattern and with linear distance decay, they were slightly underestimated. The underestimation was more pronounced with squared distance decay, with estimates (mean ± SD) of 15·9 ± 2·6 in the random spatial pattern and 14·5 ± 3·2 in the aggregated pattern.
Distance decay | Random spatial pattern | Aggregated spatial pattern |
---|---|---|
No | 20·0 ± 0·0 | 19·8 ± 0·4 |
Linear | 19·6 ± 0·5 | 19·1 ± 1·0 |
Squared | 15·9 ± 2·6 | 14·5 ± 3·2 |
Variability in local conspecific neighbour density (within 20 m) varied depending on distance decay and spatial pattern (Fig. 5). A strong negative regression with abundance emerged without distance decay in both spatial patterns. With linear distance decay, the regression was not significant with random spatial pattern but still negative in the aggregated pattern. With squared distance decay, the regression switched to positive in the random pattern, but it was not significant in the aggregated pattern.
As a consequence of variation in local conspecific neighbour density, effect sizes (Fig. 6) and standardized regression coefficients (b3, Fig. 7) showed various relations with abundance depending on distance decay and spatial pattern. Without distance decay, both effect sizes and standardized regression coefficients were positively related with abundance, regardless of spatial pattern. This was also the case for effect sizes and linear distance decay, whereas standardized regression coefficients were not significantly related with abundance in random spatial pattern but still positively related with abundance in the aggregated pattern. For squared distance decay, both effect sizes and standardized regression coefficients were negatively related with abundance in random spatial patterns but unrelated in aggregated patterns. Apparently, the squared distance decay cancelled the effect of aggregation.
Discussion
The simulations and neighbourhood analyses with individual survival or growth as dependent variable showed that estimates of regression coefficients (β) were unrelated to species abundances independent of transformations, spatial pattern and distance decay – as expected based on the simulations of identical species without species-specific interactions. However, variability in local density of conspecifics (SDX) showed various relationships with species abundances depending on transformations of neighbourhood variables, degree of spatial pattern and form of distance decay. As a consequence (i.e. b = β * SDX), relationships between effect sizes, or standardized regression coefficients (b), and species abundances were either non-significant, positive or negative.
If untransformed scales are used to quantify conspecific neighbourhoods, relationships between variability and abundance are expected to be generally positive (Fig. B2 in Appendix S2), at least in the cases and range of abundances investigated here. In these cases, relationships between standardized effect sizes and abundance will be negative. If, however, log-transformed scales are used to quantify conspecific neighbourhoods, relationships between variability and abundance (Fig. B2 in Appendix S2) can be modified in all possible ways, that is, be absent, positive or negative, depending on spatial pattern and the exact form of distance decay, but also on whether or not relative size differences are taken into account. These many, and sometimes rather non-transparent, possibilities make it very difficult to systematically evaluate the published literature on neighbourhood models and possible relationships between the strength of CNDD and species abundance, especially where details of how variables were handled are incompletely reported and data have not been archived to allow independent checks.
By using neighbourhood models without distance decay and unstandardized input variables, in single-species analyses, a negative relationship between CNDD and forest-level abundance was found, at least for the first of the two 10-year periods analysed (Newbery & Stoll 2013). Using no distance decay, yet standardizing before fitting their models, Lin et al. (2012) found positive relationships over their dry-season interval. Using an exponential distance decay, Comita et al. (2010) centred (subtracted the mean) but did not standardize (divide by standard deviation) their input variables (L. Comita, personal communication) and found a strong positive relationship too. Whereas Lin et al. (2012) fitted mixed models using maximum likelihood estimation, that is without any prior information being involved, Comita et al. (2010) used a hierarchical Bayesian analysis with non-informative priors distributed according to the scaled inverse-Wishart function. This conjugate distribution models the covariance matrix of the species-level regression. Nevertheless, both studies did find positive relationships, thereby apparently supporting one another's conclusions.
The specific scale and distribution of the priors used by Comita et al. (2010) might have introduced additional critical information that determined in part the estimation of their coefficients, in a similar way as standardization did in our simulations, and may also have done for Lin et al. (2012). Gelman & Hill (2007) discuss the use of the inverse Wishart distribution in some detail and highlight in particular the need to confirm that Bayesian priors are indeed non-informative across the same ranges of independent variables that result in the posterior probabilities. Dennis (1996) has discussed fundamental issues concerning the use of non-informative priors and Bayesian analysis for ecology in general.
The results of Newbery & Stoll (2013) dealt with effects of conspecific neighbours (as large tree abundance) on growth of small trees, whereas those of Comita et al. (2010) concerned conspecific neighbour effects (as either local tree seedling density or tree abundance) on survival of those seedlings. Their result could be more generally important if confirmed to be fully robust to statistical treatment. It might then support the notion that fundamentally different density-dependent processes are likely operating at the seedling as opposed to the small-tree stage in tropical forest dynamics (Uriarte et al. 2004a,b; Newbery & Stoll 2013).
Because of conceptual similarities of neighbourhood analyses of Comita et al. (2010) and those of others (e.g. Uriarte et al. 2004a,b; Lin et al. 2012), the analysis presented here could be more widely relevant. Since standardization can lead to spurious relationships between CNDD and species abundances, its potential influence needs to be carefully considered when interpreting relationships of small-scale effects of conspecific neighbours on larger scale abundance patterns within diverse tree communities. Similarly, care should be taken when specifying and justifying prior information in hierarchical Bayesian analyses. Our recommendation, following from Newbery & Stoll (2013), is that tests that randomize tree positions and identities indeed provide the best benchmark by which to critically evaluate and judge relationships between effect sizes, or standardized regression coefficients, and tree species abundances.
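As an illustration of the benchmark idea, the following sketch builds a null distribution by permuting species identities among fixed tree positions. It is a generic label-permutation test under stated assumptions, not the randomization procedure of Newbery & Stoll (2013); the function names, the transect layout and the toy statistic (number of adjacent conspecific pairs) are hypothetical.

```python
import random

def permutation_null(labels, statistic, n_perm=199, seed=0):
    """Null distribution of `statistic` under random reassignment of identities."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # positions stay fixed, identities are permuted
        null.append(statistic(shuffled))
    return null

def p_upper(observed, null):
    """One-sided Monte Carlo p-value, counting the observed value as one draw."""
    return (1 + sum(v >= observed for v in null)) / (len(null) + 1)

def adjacent_conspecifics(labels):
    """Toy statistic: number of neighbouring pairs along a transect that match."""
    return sum(a == b for a, b in zip(labels, labels[1:]))

# Hypothetical transect with two fully clustered species
transect = ["A"] * 10 + ["B"] * 10
observed = adjacent_conspecifics(transect)
null = permutation_null(transect, adjacent_conspecifics)
print(observed, p_upper(observed, null))
```

In a real analysis the statistic would be the relationship of interest, for example the slope of effect sizes or standardized regression coefficients on species abundance, recomputed for each randomized data set.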
Acknowledgement
We thank the Editors and M. Uriarte for comments on earlier drafts of this forum article.
Data accessibility
This manuscript does not use any data.