Bringing multivariate support to multiscale codependence analysis: Assessing the drivers of community structure across spatial scales
Abstract
- Multiscale codependence analysis (MCA) quantifies the joint spatial distribution of a pair of variables in order to provide a spatially explicit assessment of their relationships to one another. For the sake of simplicity, the original definition of MCA only considered a single response variable (e.g. a single species). However, that definition would limit the application of MCA when many response variables are studied jointly, for example when one wants to study the effect of the environment on the spatial organisation of a multi-species community in an explicit manner.
- In the present paper, we generalise MCA to multiple response variables. We conducted a simulation study to assess the statistical properties (i.e. type I error rate and statistical power) of multivariate MCA (mMCA) and found that it had an honest type I error rate and sufficient statistical power for practical purposes, even with modest sample sizes. We also exemplified mMCA by applying it to two ecological datasets.
- The simulation study confirmed the adequacy of mMCA from a statistical standpoint: it has honest type I error rates and sufficient power to be useful in practice. Using mMCA, we were able to detect variation in fish community structure along the Doubs River (in France), which was associated with large spatial structures in the variation of physical and chemical variables related to water quality. Also, mMCA usefully described the spatial variation of an Oribatid mite community structure associated with a gradient of water content superimposed on various smaller-scale spatial features associated with vegetation cover in the peat blanket surrounding Lac Geai (in Québec, Canada).
- In addition to demonstrating the soundness of mMCA in theory and practice, we further discuss the strengths and assumptions of mMCA and describe other potential scenarios where it would be helpful to biologists interested in assessing the influence of environmental conditions on community structure in a spatially explicit way.
1 INTRODUCTION
Multiscale codependence analysis (MCA; Guénard, Legendre, Boisclair, & Bilodeau, 2010) is a statistical method to estimate the joint spatial structures of pairs of variables by quantifying to what extent they fluctuate in unison, following the same trends in space, which are described by an orthonormal set of geographic structuring variables called spatial eigenvectors (described in particular by Blanchet, Legendre, & Borcard, 2008; Borcard & Legendre, 2002; Dray, Legendre, & Peres-Neto, 2006; Griffith, 2000; Griffith & Peres-Neto, 2006). Any mention of space in the present paper may equally apply to time or space-time data and processes. These structuring variables can be calculated from regularly or irregularly spaced points. This aspect is important for applicability to ecological datasets, where sampling is often not regular along a transect or on a grid. The interest of MCA for the analysis of ecological data lies in the fact that natural processes almost always operate at particular spatial scales and, consequently, the ecosystem features that derive from these processes are also generally structured in space. Hence, the assessment of the structures emerging from spatiotemporal organisation is now widely recognised as a cornerstone paradigm to understand ecological processes (Cottenie, 2005; Legendre, 1993; Wagner & Fortin, 2005; Wiens, Stenseth, Van Horne, & Ims, 1993). For instance, landscape ecology is concerned with how the spatial organisation of environmental features of the landscape structures the functioning of ecosystems (Forman, 1995; Forman & Godron, 1986).
Multiscale codependence analysis was initially developed as a way of incorporating spatiotemporal information about environmental conditions in modelling the distribution of a species. In its original definition, MCA was presented as a method applicable only to a single response variable. That limitation does not reflect the impossibility of calculating multivariate codependence but, rather, a choice made in that early version of the method for the sake of simplicity. It is expected, however, that MCA could be utilised in a much broader range of applications if it could handle multivariate response data. Ecosystems are often characterised by their species content for different target groups of organisms, which are multivariate data. There is therefore a need for statistical methods that allow scientists to quantify the joint spatial trends of community structure (or some other similar multivariate ecosystem response) and environmental conditions.
The objective of the present study is to develop a multivariate implementation of MCA, assess its statistical properties (type I error rate and statistical power) using a Monte-Carlo simulation study, and present a few examples of applications to help readers figure out its relevance and the practical interpretation of its results. Monte-Carlo simulations were performed for a variety of sample sizes using both parametric and permutation testing whereas the examples encompassed case scenarios from river fish ecology and wetland ecology.
2 MATERIALS AND METHODS
2.1 Computation of multivariate MCA
To quantify the joint spatial dependence of a response and an explanatory data table, MCA requires a set of spatial eigenvectors (U; Borcard & Legendre, 2002; Dray et al., 2006; Griffith & Peres-Neto, 2006) suitable to represent spatial patterns of variation in the data (Guénard et al., 2010). These variables have to be centred (i.e., their values have to sum to 0) and orthonormal (i.e., their cross-products $\mathbf{u}_i^{\top}\mathbf{u}_j = 0$ for all i ≠ j, and their sums of squares $\mathbf{u}_i^{\top}\mathbf{u}_i = 1$ for all i, where ⊤ denotes the matrix transpose). In short, these variables represent a suite of potential spatial patterns of various shapes (such as gradients, ridges, and bumps) and sizes, occurring at different locations along a sampled transect or surface. By combining spatial eigenvectors in a linear equation, one puts together a representation of the multiple features of a landscape. To understand how to compute spatial eigenvectors, see the documentation of the adespatial package in R (Dray et al., 2016; available at https://cran.r-project.org/web/packages/adespatial/adespatial.pdf). Readers who want to understand spatial eigenvectors could see the video course “Multi-scale modelling of the spatial structure of ecological communities” by P. Legendre on the Web at http://adn.biol.umontreal.ca/numericalecology/Trieste16/day5.html.
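The two requirements on the spatial eigenvectors (centred and orthonormal) can be checked numerically. The following is a minimal sketch in plain Python, not part of the codep or adespatial packages; the function name `is_centred_orthonormal` and the toy set `U` (three vectors for N = 4 sites) are hypothetical and serve only as an illustration.

```python
def is_centred_orthonormal(U, tol=1e-9):
    """Check a set of candidate spatial eigenvectors.

    U is a list of vectors (one list of N floats per eigenvector).
    Returns True when every vector sums to 0 (centred), has a sum of
    squares of 1, and has a zero cross-product with every other vector.
    """
    for i, ui in enumerate(U):
        if abs(sum(ui)) > tol:          # centring: values sum to 0
            return False
        for j, uj in enumerate(U):
            dot = sum(a * b for a, b in zip(ui, uj))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return True


# Hypothetical toy set for N = 4 sites (at most N - 1 = 3 such vectors).
U = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
print(is_centred_orthonormal(U))
```

A set that fails either condition (for instance a vector that does not sum to 0) would return `False`, signalling that it should not be used as U.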
















The assumptions related to testing are the union of those of the multivariate regression of Y against ui with those of the linear regression of x against ui. Notably, residuals of both Y and x with respect to ui (and other eigenvectors in Us, if any) have to be (multivariate) normally distributed and their variances should be homogeneous along the range of values in ui. In cases where the normality assumption (for either Y or x, or both) is not met or difficult to ascertain (e.g. when sample size is too small to reliably assess the probability distribution), testing may be done using Monte-Carlo permutations. It is also noteworthy that while the τ statistic was signed and allowed one to perform either one-way or two-way inference tests, the ϕ statistic is strictly positive and tests the null hypothesis (H0) of no codependence against multiple two-way alternative hypotheses (i.e. H1: presence of codependence of any sign depending on particular responses yj in Y).
1. Compute the vector [CU;Y,x] of the codependence coefficients.
2. Sort values of [CU;Y,x] in descending order.
3. Select the spatial eigenvector umax associated with the highest codependence coefficient among those that have not been tested (i.e. umax is not a member of Us at that point).
4. Calculate the ϕ statistic and its associated probability (P) using the theoretical distribution or by permutation.
5. Test the significance of umax by comparing its p-value to a predetermined significance level α. If significant, incorporate umax permanently in Us and proceed again from step 3 to test another coefficient. If non-significant, stop here.
That method ensures that we highlight the best codependence coefficients, but since many eigenvectors are generally tested (sometimes as many as the sample size minus one), it comes at the price of an inflated type I error rate. As for MCA(u), that issue can be addressed by considering all possible inference tests as a family of independent tests (eigenvectors being orthogonal) and applying a correction to transform the probabilities of single tests (i.e. testwise p-values) into probabilities for the whole family of tests (i.e. familywise p-values). We propose using a sequential version of the Šidák correction (Šidák, 1967; Wright, 1992), the same method used by Guénard et al. (2010) for MCA(u).
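To make the idea of a sequential correction concrete, the sketch below applies one common step-down form of the Šidák correction to testwise p-values taken in the order in which the eigenvectors are tested. It is an assumption that the k-th test is corrected for the number of eigenvectors still untested at that point; the exact exponent used in the codep package may differ. The function names and the example p-values are hypothetical.

```python
def sequential_sidak(testwise_p, n_candidates):
    """Familywise p-values for tests performed one after the other.

    testwise_p   : testwise p-values, in the order the tests were made
    n_candidates : number of eigenvectors available before the first test
    Assumption: the k-th test (k = 0, 1, ...) is treated as one of
    (n_candidates - k) remaining independent tests.
    """
    adjusted = []
    for k, p in enumerate(testwise_p):
        remaining = n_candidates - k
        if remaining < 1:
            raise ValueError("more tests than candidate eigenvectors")
        adjusted.append(1.0 - (1.0 - p) ** remaining)
    return adjusted


def n_retained(familywise_p, alpha):
    """Count the eigenvectors retained before the first non-significant test."""
    count = 0
    for p in familywise_p:
        if p > alpha:
            break                      # the stepwise procedure stops here
        count += 1
    return count


# Hypothetical testwise p-values for three successive tests, 29 candidates.
familywise = sequential_sidak([0.0001, 0.0004, 0.02], n_candidates=29)
print(n_retained(familywise, alpha=0.05))
```

The stopping rule mirrors the last step of the procedure: once a familywise p-value exceeds α, no further eigenvector is tested.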







2.2 Simulation study
We ran Monte-Carlo simulations to estimate the type I and II error rates (i.e. the probability of rejecting the null hypothesis when it is true and that of failing to reject it when it is false, respectively) generated by mMCA when it was applied to pairs of variables Y (multivariate) and x (univariate). Simulations were performed using parametric testing for normal random deviates and by permutation testing for non-normal random deviates simulating species abundances. These non-normal deviates were generated as the floor-rounded integers of the exponential of random normal deviates with a mean of 0 and a standard deviation of 1.5. That approach generated a zero-inflated distribution. We regarded that distribution as a fair approximation of that often encountered for species abundances in the wild.
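The generation of the non-normal deviates can be illustrated with a short sketch. It uses Python's standard library rather than the authors' simulation code; the function name `simulate_abundances` and the seed are arbitrary choices made for the illustration.

```python
import math
import random


def simulate_abundances(n, rng, mean=0.0, sd=1.5):
    """Floor-rounded exponential of normal deviates (mean 0, SD 1.5)."""
    return [math.floor(math.exp(rng.gauss(mean, sd))) for _ in range(n)]


rng = random.Random(42)               # arbitrary seed, for reproducibility
counts = simulate_abundances(10_000, rng)

# A deviate is 0 whenever exp(z) < 1, i.e. whenever z < 0, so roughly half
# of the simulated abundances are expected to be zeros.
zero_fraction = counts.count(0) / len(counts)
print(round(zero_fraction, 2))
```

Because the underlying normal deviate is negative about half of the time, about half of the simulated counts are zeros, which is the zero inflation referred to above; the remaining values form a long right tail.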
The procedure consisted in generating transects of N evenly spaced sampling locations, by assigning sets of pseudo-random numbers to an N × M response data matrix Y and to a descriptor vector x with N elements. We used seven different sample sizes (N) between 10 and 1,000, each of which we combined with four different numbers of species (M) between 1 and 500 (Table 1), resulting in 28 different conditions, which were all analysed using parametric tests, whereas samples with sizes up to 100 were also analysed using permutation tests. The grand total of simulated conditions, including those with parametric and permutation tests, was therefore 44. Each condition was run 10,000 times; 440,000 simulations were thus done.
Number of sites (N) | Test | Number of species (M) | | | |
---|---|---|---|---|---|
10 | Parametric | 1 | 2 | 3 | 5 |
10 | Permutations | 1 | 2 | 3 | 5 |
25 | Parametric | 1 | 3 | 5 | 10 |
25 | Permutations | 1 | 3 | 5 | 10 |
50 | Parametric | 1 | 5 | 10 | 20 |
50 | Permutations | 1 | 5 | 10 | 20 |
100 | Parametric | 1 | 10 | 20 | 50 |
100 | Permutations | 1 | 10 | 20 | 50 |
250 | Parametric | 1 | 20 | 50 | 100 |
500 | Parametric | 1 | 50 | 100 | 250 |
1,000 | Parametric | 1 | 100 | 250 | 500 |
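The totals quoted in the text (44 conditions and 440,000 simulations) follow directly from Table 1. The short sketch below enumerates the design as a check; the variable names are illustrative only.

```python
# Numbers of species (M) tried for each number of sites (N), from Table 1.
species_by_sites = {
    10: [1, 2, 3, 5],
    25: [1, 3, 5, 10],
    50: [1, 5, 10, 20],
    100: [1, 10, 20, 50],
    250: [1, 20, 50, 100],
    500: [1, 50, 100, 250],
    1000: [1, 100, 250, 500],
}
PERMUTATION_MAX_N = 100   # permutation tests were run for N up to 100 only
REPLICATES = 10_000       # simulations per condition

conditions = []
for n_sites, species_counts in species_by_sites.items():
    for n_species in species_counts:
        conditions.append((n_sites, n_species, "parametric"))
        if n_sites <= PERMUTATION_MAX_N:
            conditions.append((n_sites, n_species, "permutation"))

n_parametric = sum(1 for c in conditions if c[2] == "parametric")
n_permutation = len(conditions) - n_parametric
print(n_parametric, n_permutation, len(conditions), len(conditions) * REPLICATES)
# prints: 28 16 44 440000
```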












2.3 Illustrative examples
We used two well-studied datasets to illustrate the application of mMCA. The first dataset was collected by Verneaux (1973) and consists of 30 sites sampled along a 453 km transect in the Doubs, a river located in eastern France, in which 27 fish species were observed (the response variables) and 11 explanatory quantitative variables (the descriptors) were measured. These descriptors were the river slope (slope, ‰), mean minimum discharge (flow, m³/s), pH, hardness (hardness, i.e. calcium concentration, mg/L), biological oxygen demand (BOD, mg/L), dissolved phosphate ([PO₄³⁻], mg/L), nitrate ([NO₃⁻], mg/L), ammonium ([NH₄⁺], mg/L), and oxygen ([O2], mg/L). Spatial eigenfunctions were calculated on the basis of the distance from the source of the river (in km, as the fish swim). Fish count data were Hellinger-transformed into square-rooted profiles of relative species abundances before analysis. As in previously published studies of this dataset, site 8, where no fish were caught, was excluded from our analysis.
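The Hellinger transformation mentioned above replaces each count by the square root of its relative abundance within the site. The sketch below is a minimal plain-Python illustration (the function name `hellinger` and the toy counts are hypothetical); it also shows why a site without any fish has to be excluded beforehand, since its relative abundances are undefined.

```python
import math


def hellinger(counts):
    """Hellinger-transform a sites-by-species table of counts.

    Each value becomes sqrt(count / site total), so that the squared
    values of every site (row) sum to 1.
    """
    transformed = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Relative abundances are undefined for an empty site.
            raise ValueError("site with no individuals; exclude it first")
        transformed.append([math.sqrt(c / total) for c in row])
    return transformed


# Hypothetical toy table: 2 sites, 2 species.
toy = hellinger([[1, 3], [0, 4]])
print(toy)
```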
The second dataset was collected by Borcard and Legendre (1994) and comprised 70 cores composed mostly of Sphagnum mosses, sampled from a rectangular plot, c. 2.5 m × 10 m, located on the peat mat surrounding Lac Geai, which is a bog lake located on the territory of the Station de Biologie de l'Université de Montréal in Saint-Hippolyte, Québec, Canada (latitude +45.9954, longitude −73.9936). The dataset consists of a response table of abundances (counts) of 35 morpho-species of Oribatid mites (Acari) and a second table containing five environmental variables: two quantitative (substratum density, g/L; water content of the substratum, in % of volume) and three qualitative (substrate composition, seven classes; presence and abundance of shrubs, three ordered classes; micro-topography of the peat, two classes). For the analysis, the qualitative variables were transformed into 12 (i.e. 7 + 3 + 2) binary (dummy) variables, yielding a grand total of 14 descriptors. Spatial eigenfunctions were calculated using the geographic (i.e. Euclidean) distances between the sampling sites in the rectangular plot.
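The count of 14 descriptors follows from coding each class of a qualitative variable as its own binary variable. A minimal sketch of that coding is given below; the function name `dummy_code` and the class labels in the example are hypothetical, and only the numbers of classes (7, 3, and 2) come from the text.

```python
def dummy_code(values, levels):
    """Code a qualitative variable as one binary (0/1) column per class."""
    return [[1 if value == level else 0 for level in levels] for value in values]


# Hypothetical labels for the three shrub classes, observed at two sites.
shrub_levels = ["none", "few", "many"]
print(dummy_code(["few", "many"], shrub_levels))
# prints: [[0, 1, 0], [0, 0, 1]]

# Descriptor count: two quantitative variables plus one binary variable
# per class of the three qualitative variables.
n_classes = {"substrate": 7, "shrubs": 3, "topography": 2}
n_binary = sum(n_classes.values())
n_descriptors = 2 + n_binary
print(n_binary, n_descriptors)
# prints: 12 14
```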
2.4 Computer package
The R computer package “codep”, originally developed for MCA(u), has been updated to support mMCA from version 0.6-5 onward. It is freely available online for multiple computer platforms from the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/).
3 RESULTS
3.1 Simulation study
3.1.1 Type I error rate
The type I error rates obtained from the simulation study were close to the significance levels of the test. The expected rejection values under the null hypothesis of absence of codependence are the significance levels. This was true for all significance levels tested and all simulated sample sizes (N and M), for both the parametric (Figure 2) and permutation (Figure 3) tests. For N = 10 sites and M = 1 to 5 species, the permutation test was somewhat conservative, the simulations producing fewer spurious signal detection events than expected for the smallest α significance levels (0.01 and 0.005). The N = 10 sample size is lower than what would be found in real studies, and statistical power is extremely low under such conditions.




3.1.2 Statistical power
Statistical power increased as N increased (parametric test: Figure 4; permutation test: Figure 5), with a comparatively smaller but noticeable positive influence of M. Also, permutation tests carried out on non-normal deviates were slightly less powerful than the parametric test computed on normally distributed data, but the method remained entirely fit for practical purposes. For instance, for a statistical power of 0.95, the permutation test detected a signal with snr = 0.53 for N = 50 and M = 20, and snr = 0.96 (a roughly equal amount of signal and random noise) for N = 25 and M = 5. For the same statistical power and under the same two (N, M) combinations, the parametric test could detect comparatively weaker signals (i.e. smaller snr) on average: 0.36 and 0.60, respectively. For species abundance data, which seldom (if ever) conform to the normal distribution, the permutation test will be the preferred method because it carries fewer assumptions than the parametric test.


3.2 Illustrative examples
3.2.1 Doubs River
The first sampling site was located 300 m from the source of the Doubs River and the last one was 453 km from it, with distance between neighbouring sites ranging from 1.9 to 34.4 km (average: 16.17 km). The first explanatory variable found to be significant by the mMCA test was flow and it was associated with the scale of the first spatial eigenvector (that with the largest eigenvalue). The second one was BOD and it was related to the fourth spatial eigenvector. Then came [NH₄⁺], related to the third spatial eigenvector, and finally [O2], at the scale of the second spatial eigenvector (Table 2).
Scale | Descriptor | ϕ | ν1 | ν2 | p |
---|---|---|---|---|---|
MEM 1 | flow | 2,434.3 | 27 | 27 | .005 |
MEM 4 | BOD | 30.67 | 27 | 26 | .01 |
MEM 3 | [NH₄⁺] | 27.78 | 27 | 25 | .01 |
MEM 2 | [O2] | 42.85 | 27 | 24 | .01 |
The first principal component of the fish community structure (PC1) was positively associated with a species having preference for small and well-oxygenated streams or rivers (TRU, a Salmonid), which was found in the upstream portion of the watershed, as opposed to the more tolerant species found in large and more oxygen-depleted reaches located in the downstream portion of the watershed (Figure 6a). The sum of the four components of the spatial codependence corresponds to a slight increase in PC1 loadings in the first 150 km from the river source, followed by a steep decrease from 150 to 300 km, and a plateau from 300 km to the river mouth (Figure 6b). That figure, which shows a way of representing the influence of the MEM eigenfunctions along a river, could also be used to represent the results of mMCA analysis of transects or time series.


The second principal component (PC2) was associated with species having good tolerance to oxygen deprivation, yet showing low propensity to high [NH₄⁺]. This was not the case for the salmonid species (TRU), which had a high positive loading on PC1. The sum of the components corresponds to a decrease in PC2 loading between 0 and 100 km followed by a rather sharp increase between 100 and 200 km, then an even sharper decrease between 200 and 310 km, and, finally, an increase from 310 km to the river mouth (Figure 6c).
A notable feature of the results, which was also previously noted in other studies using these data, is that sites 23, 24 and 25, located immediately before and after the city of Besançon (304 to 327 km from the river source along the abscissa of Figure 6c), are polluted sites. The effect of these three sites on the spatial patterns of community variation is readily visible on PC2 and is driven by the BOD and [NH₄⁺] variables at the spatial scales represented by MEM4 and MEM3, respectively.
Any other principal component associated with a substantial portion of the community variation could have been analysed similarly with respect to spatial codependence.
3.2.2 Oribatid mites
The strongest component of multiscale codependence associated peat water content (WaterCont) with the Oribatid community structure at the scale of the first spatial eigenvector (MEM1; Table 3). The latter covers the whole study plot in the north-south direction (i.e. from the forest in the south to the northern edge where the peat mat meets the open lake water). The second strongest component associated community structure with the prevalence of shrubs (Shrub:Many) at the spatial scale described by the fourth spatial eigenvector (MEM4), which also varies in the north-south direction along the plot, forming a pair of waves having roughly half the wavelength of MEM1. The third component associated community structure with the first type of peat moss assemblage (Subs:Sphagnum1; peat containing Sphagnum rubellum with some S. magellanicum) at the scale of the second spatial eigenvector (MEM2), which describes a wave with a wavelength and orientation similar to those of MEM1, but offset by approximately a quarter of a wavelength (c. 90°). The fourth and last statistically significant component of multiscale codependence pinpoints hummocks (Topo:Hummock, i.e. elevated landforms) as another driver of Oribatid community structure at the scale of the third spatial eigenvector (MEM3). MEM3 varies transversely with respect to the north-south geographic axis of the plot.
Scale | Descriptor | ϕ | ν1 | ν2 | p |
---|---|---|---|---|---|
MEM 1 | WaterCont | 1,785.1 | 35 | 68 | .005 |
MEM 4 | Shrub:Many | 324.4 | 35 | 67 | .005 |
MEM 2 | Subs:Sphagnum1 | 51.15 | 35 | 66 | .01 |
MEM 3 | Topo:Hummock | 67.52 | 35 | 65 | .01 |
Morpho-species with positive loadings on the first principal component of the mite community structure (PC1; e.g. Sp16, Sp31; Figure 7) are found in peat with high water content and few shrubs, and are associated with substrate composed of Sphagnum rubellum with some S. magellanicum and with elevated peat mounds (Figure 8). They contrast with the species having negative PC1 loadings (e.g. morpho-species Sp13, Sp14, Sp15). The combination of all these separate effects highlights that a large amount of species variation occurs along an edaphic gradient associated with wetter substrate as one approaches the open lake water.


On the other hand, species with positive loadings on the second principal component of the mite community structure (PC2; e.g. morpho-species Sp13, Sp16, Sp23; Figure 7) are found in smaller abundances in peat with high water content, but follow similar trends with respect to the other descriptors, preferring few shrubs, Subs:Sphagnum1, and elevated peat mounds, compared to morpho-species with negative PC2 loadings (e.g. Sp31; Figure 9). The combination of these effects highlights the fact that species were distributed along an axis partially inclined east-west with respect to PC1. This is likely due to the fact that species with high positive PC2 loadings are more prevalent in sites on peat mounds, which are more common in the eastern part of the plot, compared to those with high positive PC1 loadings.

4 DISCUSSION
In the present study, we defined an extension of MCA for multivariate response datasets, and investigated its statistical properties. The method performed as expected, yielding honest inference tests (i.e. correct type I error) and having good statistical power, even for relatively modest sample sizes compared to those generally encountered in community ecology. Adding species improved statistical power, but not as much as adding sampling sites. In that respect, our simulation study was sufficiently extensive, covering a wide range of conditions, to provide a clear demonstration that multivariate MCA (mMCA) is a useful method for practical statistical analysis.
The mMCA method was designed to answer the following question: at what scales do we find important species-environment correlations? A somewhat related approach is multiscale ordination (MSO), a method developed by Wagner (2003, 2004) and implemented in R in package vegan's mso() function (Oksanen et al., 2015).
Multiscale ordination was developed to answer a different question than mMCA: it tests the hypothesis that the explanatory (e.g. environmental) variables can account for the spatial correlation observed in the response matrix, for example in community composition data. The response spatial variation is analysed and represented by a multivariate variogram, which includes a test of significance of the variation accounted for by the various distance classes. The method can then examine, through RDA or partial RDA, if the environmental variables are sufficient to explain that spatial variation and leave spatially unstructured residuals.
mMCA and MSO bring complementary answers to the analysis of scale-dependent effects of explanatory (e.g. environmental) factors on the response data. Their similarity resides in the fact that both methods can use spatial eigenfunctions. In the original publications about MSO, Wagner used polynomials of the geographic coordinates, not spatial eigenfunctions. These eigenfunctions, under the name PCNM, were in their infancy at the time. The analyses reported in the next paragraph were the first to use MSO with spatial eigenfunctions, as an extension of the method.
Multiscale ordination was used in Borcard, Gillet, and Legendre (2011, section 7.5.2) and in Legendre and Legendre (2012, section 14.4) to analyse the mite data (second example of the present paper). MSO first showed that the multivariate variogram of the detrended mite data was not flat; it displayed significant spatial structure. In a second analysis with canonical ordination (RDA) involving the environmental factors as explanatory variables, it became clear that the species-environment correlation varied with scale, so that a global estimation was meaningless unless one controlled for the regional-scale spatial structure causing the problem. This control was obtained by using spatial eigenfunctions as covariables in the analysis described in Borcard et al. (2011, section 7.5.2). In contrast, mMCA (present paper) directly computes codependence coefficients and tests of significance for the relationships between the spatial eigenfunctions representing the spatial scales and the individual environmental variables.
In the mMCA mite analysis shown in the present paper, we identified four significant codependence coefficients between spatial eigenfunctions representing the spatial scales and individual environmental variables (Table 3). These relationships were represented on maps of the sites, separately for ordination axes PC1 (Figure 8) and PC2 (Figure 9).
The three main assumptions underlying mMCA with parametric tests are (1) multinormality of the residuals of the response against the spatial eigenvectors involved, as well as normality of the residuals of the explanatory variables against these eigenvectors, (2) linear relationships between the response and the eigenvectors and between the descriptors and the eigenvectors, and (3) homogeneity of the residuals' variances (i.e. homoscedasticity). Permutation testing relaxes the normality assumptions, leaving assumptions 2 and 3 to be satisfied. In the present study, we did not assess the robustness of the method when these assumptions are not met. A future development of mMCA would consist in generalising the method to other frequency distributions in the exponential family using Iteratively Re-weighted Least Squares (IRLS), as in generalised linear models (Hastie & Pregibon, 1991; Nelder & Wedderburn, 1972). Calculations would proceed as in the normally distributed case described in the Methods section, but with IRLS weights.
Fish assemblages in the Doubs were driven by flow quality, which varied following the main gradient of the river's course, but also by chemical conditions related to water quality (namely BOD, [NH₄⁺], and [O2]), which varied following large-scale successions. The Brown trout (TRU) was the species most responsive to these effects. The analysis highlighted that this species was positively associated with [NH₄⁺]-rich waters in spite of its well-known reliance on high concentrations of dissolved oxygen. [NH₄⁺] is the form of nitrogen that is readily produced by fish through excretion. However, under aerobic conditions any [NH₄⁺] is rapidly oxidised to ammonia (NH3), nitrite ([NO₂⁻]), and finally [NO₃⁻] by ubiquitous bacteria. [NO₂⁻], which is the intermediate in the nitrification process, is toxic to fish as it binds to haemoglobin and hinders oxygen transport (see Lewis & Morris, 1986 for a review) and salmonids are among the most sensitive fish to that anion. Local conditions affecting the nitrification process by slowing the conversion of [NH₄⁺] to toxic [NO₂⁻] may explain that association between [NH₄⁺] and fish community structure. A larger study involving more extensive sampling may help shed light on the effect of nitrification on fish assemblages in river ecosystems.
Oribatid mite assemblages in the peat mat surrounding Lac Geai were primarily driven by the peat's water content, which varied widely following a gradient going from the open water (north) towards the forest edge (south), and then by the presence of dense shrubs. The effects of peat moss assemblages and landforms were also evidenced. The mite morpho-species responded in various ways to variation in their habitat structure, probably as a consequence of their traits, such as their ability to move up and down in the peat mat, their preferred sources of food, and multiple physiological requirements. Had we had information about traits for the different species in that dataset, it would have been computationally straightforward to project them on the principal components for the sake of displaying their prevalence in different parts of the sampling plot. In that respect, a future development of the codependence method may involve quantifying the spatially explicit relationships between species traits and environmental variables (e.g. using bilinear algebra) instead of the relationships between multiple species responses and the environment, as illustrated in the present study.
It is noteworthy that it is possible to nest many local-scale mMCAs within a single analysis performed at a larger spatial scale. For instance, one may want to analyse the local and regional patterns of codependence for a mosaic of forested patches spread at the regional scale in a landscape. Assuming that each forested patch was sampled at multiple locations, one could perform mMCAs on each patch and then nest these local mMCAs in a single, regional, mMCA. However, if too few locally repeated measurements are available to perform local mMCAs with reasonable statistical power (e.g. N < 20 for the local samples), one should perform a single mMCA.
In the latter patchwork scenario, within-patch distances are much smaller than among-patch distances. As a consequence, there is a gap between the smallest patterns of regional spatial variation and the largest patterns of local spatial variation. When a single mMCA is used, representing the scales of regional and local spatial variation separately, using sets of spatial eigenvectors specially tailored for that purpose, gives results that are easier to interpret than those obtained using a single set of spatial eigenvectors. It can be achieved by first calculating regional-scale spatial eigenvectors, substituting the patch centroid for individual observations. That analysis yields a maximum of Np − 1 non-zero eigenvalues (where Np corresponds to the number of forested patches), their associated eigenvectors being invariant among the sites pertaining to a patch. Then, one can calculate the local spatial eigenvectors for each patch. Each of these sets has to be padded to match the size of the whole dataset, by assigning the value 0 to the elements corresponding to the observations in the other patches, as shown in Appendix 1 of Declerck, Coronel, Legendre, and Brendonck (2011). The local eigenvector sets thus padded are appended to the regional eigenvectors. One computes the cumulative sum of the eigenvalues in the same order as the eigenfunctions are appended. From that procedure, the maximum number of local eigenvectors one can obtain is N − Np, where N corresponds to the total number of sites in all the patches. That number adds to that of the regional eigenvectors to give a grand maximum of N − 1 spatial eigenvectors. That number is the same as the maximum number of eigenvectors obtained when not accounting for the spatial scale gap associated with the spatial organisation of the patches in the landscape. Other examples where such a spatial arrangement can be observed are lakes in a landscape, islands of an archipelago, coral reefs, etc.
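The padding step and the eigenvector counts described above can be sketched as follows. This is an illustration in plain Python under stated assumptions, not code from Declerck et al. (2011) or from the codep package: the function names, the row indices, and the example patch sizes are hypothetical, and the count of local eigenvectors assumes that a patch with n sites yields at most n − 1 of them.

```python
def pad_local(local_vectors, patch_rows, n_total):
    """Pad the local eigenvectors of one patch to the full dataset size.

    local_vectors : eigenvectors computed within the patch
    patch_rows    : row indices of the patch's sites in the whole dataset
    n_total       : total number of sites (N)
    Elements belonging to sites of the other patches are set to 0.
    """
    padded = []
    for vector in local_vectors:
        column = [0.0] * n_total
        for row, value in zip(patch_rows, vector):
            column[row] = value
        padded.append(column)
    return padded


def max_eigenvector_counts(patch_sizes):
    """Maximum numbers of regional, local, and total spatial eigenvectors."""
    n_patches = len(patch_sizes)                        # Np
    regional = n_patches - 1                            # Np - 1
    local = sum(size - 1 for size in patch_sizes)       # N - Np
    return regional, local, regional + local            # total is N - 1


# Hypothetical example: one local eigenvector for a 2-site patch (rows 2, 3)
# in a dataset of 5 sites, and three patches of 4, 3, and 5 sites.
print(pad_local([[0.7, -0.7]], patch_rows=[2, 3], n_total=5))
print(max_eigenvector_counts([4, 3, 5]))
# second line prints: (2, 9, 11)
```

With N = 12 sites in Np = 3 patches, the sketch recovers 2 regional plus 9 local eigenvectors, that is N − 1 = 11 in total, in agreement with the counts given in the text.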
Whereas the two illustrative examples presented in the present study featured (Hellinger-transformed) species abundances as the response data, any dimensionally homogeneous set of response variables can be used as well. As for multivariate regression, mMCA implicitly uses the Euclidean metric for distances among the sampling units. It is possible to alleviate that apparent limitation using principal coordinate analysis (PCoA; Gower, 1966; see Legendre & Legendre, 2012, for a description) in a similar fashion as in distance-based Redundancy Analysis (Legendre & Anderson, 1999). Using the principal coordinates as a set of response variables in mMCA allows great flexibility in the type of ecological questions that it can address. For instance, one can calculate a distance metric incorporating information on both species occurrence and phylogeny, and submit it to PCoA to obtain principal coordinates. The resulting principal axes can then be used as response variables in mMCA to evidence how ecological drivers act on biodiversity at a suite of different spatial scales. For example, a distance metric can be obtained by using the inverse of the phylogenetic (i.e. patristic) distance among species to weight the counts of species occurrence in the calculation of the Jaccard index of similarity among sites (see Legendre & Legendre, 2012, section 7.2, for a description) and then calculating the corresponding distances. Given two pairs of sites with the same total species richness and number of coincident species, the aforementioned distance metric would place the sites of the pair with the most phylogenetically different species at a greater distance from one another than those of the pair with the most phylogenetically similar species. Following a similar approach, metrics of site dissimilarity can be developed to help answer a broad array of questions in ecology and evolution (e.g. assessing taxonomic or functional diversity).
We hope to see many applications of mMCA in the near future given its usefulness to ecologists and environmental scientists interested in unveiling the role of the naturally occurring and anthropogenic phenomena structuring the spatial distribution of species assemblages and other environmental responses in the landscape. The impressive number of large-scale (and often geographically referenced) datasets now publicly available on the Internet is an opportunity to revisit many hypotheses that might have been left untested by previous studies. The method allows researchers to readily test hypotheses that could not have been directly tested before, which may allow previously overlooked theories about the functioning of nature to emerge.
ACKNOWLEDGEMENTS
We are thankful to the many people, from Université de Montréal and abroad, who helped us during the elaboration of the present study. The present version of the paper has greatly benefited from the insightful comments and suggestions given by Prof. Robert B. O'Hara and two anonymous reviewers. G.G. was supported by Discovery Grant #7738 from the Natural Sciences and Engineering Research Council of Canada (NSERC) to P.L.
AUTHORS' CONTRIBUTIONS
The computational approach to multivariate MCA is the result of thoughts and discussions between G.G. and P.L. G.G. wrote the software, performed the simulation study, and authored a first draft of the manuscript in close collaboration with P.L. After two rounds of commenting (P.L.) and editing (G.G.), the manuscript was submitted by G.G. upon approval by P.L. Following evaluation by the Journal, G.G. headed the revision, helped by P.L. Both authors approve the present version of the manuscript, agree to be held accountable for any of its aspects, and ensure that questions about the accuracy or integrity of any of its parts have been suitably addressed.
DATA ACCESSIBILITY
Computer code and data necessary to replicate the simulation study and examples are available from Dryad Digital Repository https://doi.org/10.5061/dryad.n4288 (Guénard & Legendre, 2017).