The central role of mean‐variance relationships in the analysis of multivariate abundance data: a response to Roberts (2017)

The mean‐variance relationship is a central property of multivariate abundances – it has been shown that when not accounted for, potentially serious artifacts can be introduced to analyses. One such effect is the confounding of location and dispersion. Roberts (in press) recently argued that mean‐variance relationships are not important in understanding properties of distance‐based analyses, and that row standardisation fixes the problem of apparent location‐dispersion confounding. We use simulation to disprove both statements. In situations where there is a shift in total abundance from one location to the next, the effects of location‐dispersion confounding can be considerable. But we also show that even if there is no systematic difference in total abundance between two communities, and no change in dispersion, distance‐based analysis may falsely claim that there is a change. We agree that multivariate abundance data are hierarchical, and that it is helpful to study effects at both the species‐level and the community level. However, disentangling species‐level from community‐level effects is not possible without a hierarchical method of analysis. Recently proposed model‐based approaches to ordination offer a way forward to resolve the issues discussed here.


Introduction
In distance-based multivariate analyses, scaling decisions are made when combining data from multiple taxa into a single distance measuredoes one treat all taxa equally, divide by total abundance first, etc. Warton, Wright & Wang (2012) argued that these scaling decisions can be understood as making implicit assumptions concerning the mean-variance relationship in the data. The mean-variance relationship is a key property in multivariate data because the variance of abundance typically varies over several orders of magnitude, often over a million-fold, from one taxon or location to another (Warton, Wright & Wang 2012). Failing to account for this appropriately can introduce serious artifacts into analysis, one of which is the confounding of location and dispersion effects in ordination. Specifically, in situations where there is no difference in dispersion between two communities, differences in means can lead to apparent differences in dispersion in ordinations, because changes in mean lead to changes in variance. This point was illustrated in Warton, Wright & Wang (2012) using a coral dataset which had almost a 12-fold change in total abundance (in response to a bleaching event), which led to ordinations that pointed to a significant change in dispersion, but no change in location. Simulations verified that this situation can arise when two communities are identical in relative composition, but which differ in total abundance. Roberts (in press) recently wrote a response to Warton, Wright & Wang (2012), making a number of claims, and the purpose of this paper is to respond to the core claims. Most critically, Roberts (in press) argued that mean-variance relationships 'play no role' in understanding the properties of distance-based methods of multivariate analysis, and that row standardisation 'fixes the paradox' of changes in magnitude of within-sample distances under shifts in total abundance. In this paper, we use simulation to disprove both claims. We agree with the claim that multivariate analysis needs to be thought about in a hierarchical fashion, and outline a way to action this. In addition, we identify arguments that Roberts (in press) made using semantics rather than statistics, find claims not supported by his own simulations, and consider the implications of his simulation design, which are not actually contradictory to Warton, Wright & Wang (2012 Roberts (2017) made two main assertions in relation to the ordination simulations of Warton, Wright & Wang (2012), which he referred to as WWW data: that the apparent change in dispersion in WWW data was 'independent' of mean-variance relationships; and that the location-confounding issue could be addressed by row standardisation, to account for differences in total abundance. We have been able to disprove both of these assertions by simply changing the mean-variance relationship in the simulation that Roberts (in press) used to support his argument.
We simulated data from the Poisson distribution rather than the negative binomial, and reran analyses using the scripts previously used (Roberts in press). Everything else about analyses was kept exactly as previously, including the random seed (Details of the simulation design are described in Appendix S2 of Warton, Wright & Wang 2012; and code is available as Appendix S3 of this paper). If the mean-variance relationship were not important in multivariate analysis, changing the mean-variance relationship would not affect analyses. If row standardisation were to fix the issue, it would do so here as well as in Roberts (in press).
However, results clearly showed that there were substantial differences in dispersion between the two samples for all distance measures, and that substantial differences remain after row standardisation (Fig. 1). Looking at results via non-metric multi-dimensional scaling ordinations (MDS) (Kruskal & Wish 1978) revealed the same pointin all cases, a substantial dispersion effect was evident, both before and after row standardisation (Fig. 2).
The reason row standardisation proved ineffective can be understood via the mean-variance relationship (Fig. 3). Standard rules for variances tell us that if you divide a random variable by a constant, its variance is divided by the square of the constant, whereas the mean is divided by the constant and not by its square. So the effect of row standardisation, roughly speaking, is to shift the two samples of points towards each other until the means align, but shifting them along a line of slope two (such as y = x 2 , the red dashed line on Fig. 3) given the different ways that means and variances are affected by rescaling. For negative binomial data (with sufficiently large means) the mean-variance relationship is not far from a line of slope two, and so row-standardisation does a good job of aligning the variances across samples (Fig. 3, top right). For Poisson data, the mean approximately equals the variance rather than its square (Fig. 3, bottom left), so shifting the points to align their means, as row standardisation (approximately) does, will misalign their variances (Fig. 3, bottom right).
There is an extensive literature on the types of meanvariance relationships seen in abundance data, since the seminal paper introducing Taylor's power law (Taylor 1961). Abundance counts are typically overdispersed compared to the Poisson (Warton 2005), but a slope less steep than two is not uncommon (Taylor 1961 ; table 1). Furthermore, one should expect that when measuring abundance in different ways (e.g. cover estimates, as in the coral dataset reanalysed in Roberts in press) the abundances will have a different distribution hence a different mean-variance relationship. So in ecological practice, the generality of the claim that row standardisation addresses location-confounding is questionable.  Fig. 1. Boxplots of within-and between-community distances and dissimilarities, as in figure 1 of Roberts (in press), but using data simulated from a Poisson distribution. Left column has simulated data for the WWW dataset, right column has row-standardised data, and four different distances are used on the four rows: (a,b) Euclidean, (c,d) Manhattan, (e,f) Canberra, (g,h) Bray-Curtis. Note in all cases that a strong dispersion effect is evident, with distances tending to be of quite different magnitudes for samples A and B, despite communities differing in total abundance only and not in composition.

Formally testing for dispersion
Roberts (in press) declared a difference in dispersion between two samples if the median within-sample distance for one sample did not fall between the quartiles of the other. This is a very conservative check, and is somewhat subjectively motivated. This approach led Roberts (in press) to claim, for his simulated WWW dataset, that row standardisation removed dispersion effects for all distance measures. He also claimed that when using the Canberra and Bray-Curtis distances, no difference in dispersion was evident for the simulated WWW dataset. Only a single simulated dataset was considered by Roberts (in press), so as well as the question of whether his test was sufficiently powerful, we should also question the extent to which the patterns he was seeing were due to systematic vs. random variation. A common solution to this problem is to replicate, such that one can better detect the effects above and beyond that which can be explained by random sampling variation.
We repeated the simulation of Roberts (in press), but generated 1,000 datasets using his simulation models, and tested for dispersion using the distance-based test proposed by Anderson (2006). We recorded the number of times the test falsely declared significance at the 5% level, sometimes referred to as a Type I error simulation. Results (Fig. 4a, WWW) show that Roberts (in press) was incorrect to claim that row-standardisation removes dispersion effects in the WWW dataset, and to claim that the Canberra and Bray-Curtis distances suggest no dispersion. For some datasets simulated using the WWW code, no dispersion was apparent, but for others there was, depending on random sampling variation. For the original simulation considered by Roberts (in press), under all distance measures, irrespective of row standardisation, false positives were observed at least 22% of the time. This is more than four times as often as they were supposed to be observed, when using a 5% significance level. Row standardisation had an effect in reducing false positive rates for the Euclidean and Manhattan distances, but for Canberra and Bray-Curtis distances, the false positive rate actually increased under row standardisation.
Upon changing the mean-variance relationship, by simulating under the Poisson distribution rather than the negative binomial (Fig. 4b, WWW), all distance and standardisation choices gave the wrong answer 100% of the time! Choice of example datasets Roberts (in press) argued that the coral dataset used to motivate the WWW simulation is unrealistic. We agree that it is an extreme example, with a 12-fold reduction in total abundance in response to an El Niño bleaching event. However it is an important example and a situation that is likely to be seen elsewhere, which ecologists should be aware of. The example is well-known: it was used as an example in the PRIMER manual (Clarke & Gorley 2006); it was used to argue that the multivariate technique ANOSIM is capable of detecting changes in dispersion (Clarke 1993), and to argue that disturbance increased dispersion in communities (Warwick & Clarke 1993; along with three other datasets with similar patterns); it has since appeared as an example in a number of methodological articles (Clarke 1993;Warton 2008;Anderson et al. 2011;Warton, Wright & Wang 2012;Hui et al. 2015;Chao & Chiu 2016). The example describes a situation likely to be seen elsewhere, because there are many situations where abundance can be expected to vary considerably across sampling times. For example, it is common to see a large number of plants attempt to establish following a disturbance, and to thin over  figure 5 of Roberts (in press), but using data simulated from a Poisson distribution. Left column has simulated data for the WWW dataset, right column has row-standardised data, and four different distances are used on the four rows: Euclidean, Manhattan, Canberra, and Bray-Curtis. Note that in all cases, a substantial change in dispersion is evident, despite communities differing in total abundance only and not in composition.
time, with abundance per unit area reducing by orders of magnitude (Westoby 1984). Arthropod abundance can vary considerably seasonally -Richards & Windsor (2007) found 10fold variation across one season. It can also vary considerably on short timescales, Morrisey et al. (1992) sometimes observed two-fold variation in invertebrate abundance from 1 day to the next. Thus the possibility of artifacts in multivariate analysis, purely from changes in total abundance, might be important to plant ecologists collecting multivariate data across sites with different times since disturbance, and to ecologists studying invertebrate communities using data collected at different times during the year, possibly even on different days. Ironically, the simulation design Roberts (in press, 'Data sets' section) used in place of WWW ('DWR' simulation) was quite unrealisticwhile it assumed a very large change in community composition (with complete turnover of half the taxa), it was engineered such that the total variance (i.e. the variance computed across all sites) was exactly identical across the two communities, for each of the 30 taxa. There was also unusually little variation in means across taxa (sixfold variation), but this point is less important. Largely because there was no change in variance, none of the distance measures used for analysis detected a dispersion effect, with or without standardisation. We can draw two points from this example simulation: the distance-based test for dispersion (Anderson 2006) seems to be a valid test for equality of (all) total variances, for the range of distance measures and standardisations considered here (Fig. 4); misspecifying the mean-variance relationship does not always lead to confounding of location and dispersion effects, i.e. there are at least some settings where this will not happen. It is worth noting that neither of these points refutes any of the findings in Warton, Wright & Wang (2012), but instead they are complementary results.
We made a minor change to the DWR simulation to study the performance of the distance measures considered when there was a substantial change in turnover but no systematic change in total mean or variance, across communities (the 'WH method'). Specifically, we let the means be random rather than fixed, within the range specified by Roberts (in press). Roberts (in press) set each group of 10 taxa to have a mean whose square root was evenly distributed along the interval between 1 and 40, at fixed points along the interval, and at the Mean (log scale) Fig. 3. Mean-variance plots (on the log-log scale) for WWW data (left column) and standardised WWW data (as a percentage, right column) for counts generated from: (a) a negative binomial distribution; (b) a Poisson distribution. A line with slope two on the log-log plot (y = x 2 ) is indicated with a red dashed line, because row standardisation can be understood (roughly speaking) as moving samples of points along a line with slope two until their means align. Note that negative binomial data has a mean-variance relationship whose slope approaches two (a, left), so row standardisation approximately aligns variances across samples (a, right), which is not the case for Poisson data (b).
same points across communities. Instead we generated means uniformly at random between 1 and √40, separately for each set of 10 taxa hence separately across communities. Overall, across repeated simulations, one would not expect an overall difference in total mean or in total variance across communities. But in any individual simulated dataset, there will be a difference, although probably a small one. This scenario aligns with the situation where an ecologist would not expect differences in dispersionthe variance of abundance was generated in exactly the same way across communities. This was achieved without artificially constraining variances to be equal for each taxon.
Under this simulation scenario, when data were simulated under the Poisson distribution, the false positive rate was again too high for all distances, irrespective of row standardisation (Fig. 4b, 'WH'). Tests using these distances incorrectly claimed there was a change in dispersion across communities 13-38% of the time, rather than maintaining the desired 5% false positive rate. As previously, these tests fared much better when data were overdispersed, with Type I error inflation only for the Euclidean and Manhattan distances without row standardisation (Fig. 4a, 'WH').
These patterns are far less strong than for the WWW dataset, but they are still of concern, leading us to conclude that these distances and standardisations cannot be relied upon to test for dispersion even in a setting where there is no systematic change in total abundance from one community to the next. We can conclude, contrary to Roberts (in press), that one does not need a shift in mean and variance across all taxa in order to confound location and dispersion effects. Even with no systematic shift in total abundance, the same effect can still arise, albeit not as strongly.

Distances vs. dissimilarities
A theme receiving some emphasis in Roberts (in press) was whether a given measure was a distance or a dissimilarityin mathematics, this distinction would be more commonly understood as concerning whether or not a distance is metric (see e.g. Lefschetz 2015). Roberts (in press) made no coherent statistical argument as to why one should prefer a metric or a non-metric distance in ecology, or why failure to satisfy the triangle inequality (or otherwise) might be desirable in ecology. The literature suggests results and their interpretation can be quite   (Legendre & Gallagher 2001) or not. One point of difference is that metric distances can have computational advantages (Legendre & De C aceres 2013). In focussing on the distance-dissimiliarity dichotomy, Roberts (in press) may have been making a point largely about semantics, when instead what needs to be the focus is the statistical properties implied by using one distance measure vs. another, and how well those properties align with those of the observed data. One point that is clear from our results (Figs 1-3, see also Warton, Wright & Wang 2012) is that there is potential for a misalignment between the sorts of scaling decisions being made by common distance measures, and the properties of multivariate abundance datasets, which can introduce undesired artifacts into analyses. This perhaps arises in part due to a lack of understanding of what a given distance measure implicitly assumes in its scaling decisions, and whether this has dealt appropriately with the properties of the data at hand. There are frequent calls for more study of the properties of different distance measures (e.g. Legendre & De C aceres 2013;Blanchet et al. 2014), and we argue that the focus should be on their statistical properties. If distance-based measures are to have any sort of future in the analysis of multivariate abundances, this knowledge gap desperately needs to be addressed.
Hierarchical nature of multivariate abundances Roberts (in press) argued that multivariate abundances are hierarchical, and that there are species-level and communitylevel effects to consider. We agree. Species-level effects inevitably interact with community-level effects (as in Roberts in press; extinction reduces alpha diversity). But to tease the two apart, one would require a statistical model that accounts for both species and community-level effects, without confounding one with the other. Such a model is commonly referred to as a hierarchical model (Cressie et al. 2009), and such an approach is increasingly used in multivariate analysis in ecology (Dunstan, Foster & Darnell 2011;Pollock et al. 2014;Hui et al. 2015;Warton et al. 2015;Ovaskainen et al. 2017). Thinking in a hierarchical framework, the issues with distancebased analysis methods discussed in Warton, Wright & Wang (2012) can be understood as confounding species and community effectsmaking incorrect scaling decisions is akin to misspecifying the species-level model, which has been shown to lead to incorrect inferences about dispersion effects in the community.
A better approach would be to use an ordination method capable of explicitly accounting for mean-variance relationships in the data. Such a method has been developed for ecology in recent years, and is commonly referred to as using a latent variable model or a factor analytical approach (Walker & Jackson 2011;Hui et al. 2015). Software is available to fit such models (Walker 2015;Hui 2016;Ovaskainen et al. 2017) and a review of these types of models recently appeared . Simulations have shown that this method can be much better at recovering underlying gradients than MDS, even when assumptions of the model are not satisfied . The basic idea of the model is that we assume species (or other taxonomic classification) respond to some underlying gradients z, the ordination axes, and that the observed data y would follow a GLM if the values of z were known. For example, if we had overdispersed counts, we could assume a negative binomial model: where typically we assume the overdispersion parameter / ij takes a common value across all observations of a response, / ij = / j , and the variance of the latent variables r 2 i is similarly assumed constant. The a i are optional, but these have the effect of row standardisation, if one wished to study relative abundance while controlling for differences in sampling intensity Warton et al. 2015). Model-based ordination of the coral dataset was discussed in Hui et al. (2015), and it was shown that inclusion of row effects in the model explained most of the differences in abundance across sampling times, i.e. the bleaching event primarily affected total abundance, and had little effect on composition.
In the above model, note that distributional assumptions were made in two placeswe assumed the observations y ij were negative binomial in eqn 1, hence made a species-level assumption, and we assumed the unobserved latent variables z ij were independently normally distributed in eqn 2, a community-level assumption. The species-level assumption, and its implications for the mean-variance relationship, are very important to performance of the method, however the normality assumption is less important .
One point that has been missing from the discussion so far is an appropriate definition of the notion of multivariate dispersion. Multivariate dispersion has often been discussed in ecology (e.g. Legendre, Borcard & Peres-Neto 2005;Anderson et al. 2011), and algorithms have been proposed for its estimation, but without a clear definition of the target quantity of interest, it is not obvious how one should approach estimating it. Dispersion cannot be defined via distance measures aloneit is illogical to define an estimator and then try to work out what it is estimating afterwards. Moreover, the quantity being estimated would change whenever the distance measure or its implementation was altered. Instead, what is needed first is a definition of multivariate dispersion. Warton, Wright & Wang (2012) in effect equated changes in dispersion with changes in the mean-variance relationship. Roberts (in press) argued that this is a species-level definition, and that changes in species means can lead to changes in dispersion at the communitylevel. This is a good point, and a hierarchical model offers a way forward in understanding this issue, as there are two places where variability enters into the model: via a change in mean-variance relationship through changes in dispersion (/ ij in eqn 1); or via a change in variability of the latent variables hence in the species means (r 2 i in eqn 2). The latent variability r 2 i may be of particular interest, as it refers to how much variation there is in points mapped onto an ordination. One helpful avenue of future research would be to test for dispersion via estimating this target quantity and looking for changes in its value across samples (for a given set of ordination axes/factor loadings). For example, in the coral dataset, we could set the latent variance to a constant value for each of 1981 and 1983 samples, and compare or test for differences in the estimates of r 1981 and r 1983 . We leave this as an avenue for future research.
Authors' contributions D.W. designed and carried out the simulations and led the writing, F.H. provided ideas and input, and contributed critically to drafts.