Inferring competitive outcomes, ranks and intransitivity from empirical data: A comparison of different methods

The inference of pairwise competitive outcomes (PCO) and multispecies competitive ranks and intransitivity from empirical data is essential to evaluate how competition shapes plant communities. Three categories of methods, differing in theoretical background and data requirements, have been used: (a) theoretically sound coexistence theory‐based methods, (b) index‐based methods, and (c) ‘process‐from‐pattern’ methods. However, how they are related is largely unknown. In this study, we explored the relations between the three categories by explicitly comparing three representatives of them: (a) relative fitness difference (RFD), (b) relative yield (RY), and (c) a reverse‐engineering approach (RE). Specifically, we first conducted theoretical analyses with Lotka–Volterra competition models to explore their theoretical linkages. Second, we used data from a long‐term field experiment and a short‐term greenhouse experiment with eight herbaceous perennials to validate the theoretical findings. The theoretical analyses showed that RY or RE applied with equilibrium data indicated equivalent, or very similar, PCO respectively to RFD, but these relations became weaker or absent with data further from equilibrium. In line with this, both RY and RE converged with RFD in indicating PCO over time in the field experiment as the communities became closer to equilibrium. Moreover, the greenhouse PCO (far from equilibrium) were only similar to the field PCO of earlier rather than later years. Intransitivity was more challenging to infer because it could be reshuffled by even a small competitive shift among similar competitors. For example, the field intransitivity inferred by three methods differed greatly: no intransitivity was detected with RFD; intransitivity detected with RY and RE was poorly correlated, changed substantially over time (even after equilibrium) and failed to explain coexistence. 
Our findings greatly help the comparison and generalization of studies using different methods. For future studies, if equilibrium data are available, one can infer PCO and multispecies competitive ranks with RY or RE. If not, one should apply RFD with density gradient or time‐series data. Equilibria could be evaluated with t tests or standard deviations. To reliably infer intransitivity, one needs high quality data for a given method to first accurately infer PCO, especially among similar competitors.


| INTRODUCTION
The inference of long-term competitive outcomes from empirical data is essential to understand how competition shapes the structure, dynamics and functioning of plant communities (Aschehoug, Brooker, Atwater, Maron, & Callaway, 2016). For instance, to understand and predict plant coexistence, we need to explicitly estimate long-term competitive outcomes between plant species (Hart, Freckleton, & Levine, 2018), rather than merely the intensity of competition or overall competitive effects and responses. In theory, a more competitive species will ultimately exclude a less competitive one over the long term unless sufficient niche differences exist between the two in space and time (Chesson, 2000; Gause, 1934). Therefore, we need to focus on the long-term competitive outcomes at equilibrium, rather than the short-term outcomes that do not easily extrapolate to the effects of competition on population dynamics. Based on pairwise competitive outcomes (PCO), multispecies competitive ranks can be constructed (Keddy & Shipley, 1989). If the ranks are not hierarchical, intransitive competition (i.e. intransitivity) can emerge as in the 'paper-rock-scissors' game: species A beats B which beats C which in turn beats A (i.e. A > B > C > A) (Gallien, 2017).
The methods used to infer PCO from empirical data differ in theoretical background and data requirements, and we group them into three categories (Table 1).
The first category of methods is rooted in coexistence theory; these methods infer PCO from parameters in dynamic models that describe long-term competition between species (Chesson, 2000; MacArthur & Levins, 1967; Tilman, 1982; see Table 1). These methods provide a theoretically sound measure of PCO, because they indicate which species, in the absence of niche differences, is the competitive winner at equilibrium (Hart et al., 2018). Among them, relative fitness difference (RFD) derived from Chesson's coexistence theory (Chesson, 2000) is commonly used in recent studies (e.g. Chu & Adler, 2015; Godoy, Kraft, & Levine, 2014). In theory, RFD is largely consistent with other measures from classical coexistence theories, e.g. MacArthur and Levins (1967) and May (1974)'s i j , and Tilman (1982)'s R* (Carroll, Cardinale, & Nisbet, 2011; Letten, Ke, & Fukami, 2017). However, these methods are challenging to apply, because they require a large amount of density gradient or time-series data (often from field experiments) to parameterize dynamic competition models.
The second category comprises traditional indices that are often calculated by comparing the performance of a plant of a given species grown in monoculture (or alone) vs. mixture (Table 1; for a full review, see Weigelt & Jolliffe, 2003). Relative yield (RY), i.e. the ratio of a species' yield in mixture to its yield in monoculture (de Wit, 1960), is commonly used among them. The application of RY and other indices (e.g. relative mixture response, relative competition intensity) in this category requires much less data (often from greenhouse experiments). However, these indices have been frequently criticized as they may only indicate short-term rather than long-term competitive outcomes (Freckleton & Watkinson, 2000; Hart et al., 2018).
The third category refers to the 'process-from-pattern' methods, which often use statistical tools to infer competition from observed species abundance data based on classic community assembly rules (Diamond, 1975), e.g. a reverse-engineering approach (RE), C-scores, multivariate logistic regression (Table 1). The RE is a recently developed method that applies Markov chain models to estimate competition matrices that best fit the observed species abundances (Ulrich et al., 2014). Within this category, the RE is the only one that can explicitly estimate PCO and has been frequently applied in recent studies (e.g. Soliveres et al., 2015), but how it relates to methods in the other two categories is not well known (but see . In short, the three categories of methods not only have different theoretical backgrounds, but also often use data from different sources (e.g. field experiments, greenhouse experiments, observations) which to some extent characterize different conditions of study systems (from equilibrium to non-equilibrium).
However, it is largely unknown how different methods are related, which hinders the appropriate interpretation and application of them.
In this study, we chose three commonly used methods: RFD, RY and RE, as representatives of the three categories, respectively, and thoroughly compared them. As it is impossible to comprehensively compare the large number of existing methods (Table 1), and as different methods within each category share similarities, we chose specific methods as representatives of different categories to perform a full comparison. First, we used analytical approaches and simulations with Lotka-Volterra competition models to theoretically analyse how the three methods are related to infer PCO and multispecies competitive ranks and intransitivity, when using data from equilibrium to non-equilibrium conditions. Second, we used data from a long-term biodiversity experiment in the field and a short-term microcosm experiment in the greenhouse to validate the theoretical findings. Based on our results, we discuss other methods in the three categories and previous studies using different methods to draw general conclusions, and also provide recommendations for future studies to choose an appropriate method for a particular objective and situation.

| Three representative methods
The RFD between two species i and j is often defined with Lotka-Volterra competition models (Equation 1) (Chesson, 2013). Note, however, that equivalent measures can also be derived from other phenomenological or mechanistic competition models (e.g. Carroll et al., 2011; Hart et al., 2018; Saavedra et al., 2017).
In Equation 1, $N_{i,t+1}$ and $N_{i,t}$, and $N_{j,t+1}$ and $N_{j,t}$, are the biomasses of species i and j at times t + 1 and t, respectively; $r_i$ and $r_j$ are the intrinsic growth rates of species i and j, respectively; and the alphas ($\alpha_{ii}$, $\alpha_{jj}$, $\alpha_{ij}$, $\alpha_{ji}$) are the intraspecific and interspecific competition coefficients.
If $f_i/f_j > 1$, species i will outcompete species j in the absence of niche differences, because species j is overall more sensitive to competition than species i. To estimate the parameters in Equation 1, data on the performance of species at very low densities (to estimate $r_i$ and $r_j$) and with different densities of conspecific competitors (to estimate $\alpha_{ii}$ and $\alpha_{jj}$) and heterospecific ones (to estimate $\alpha_{ij}$ and $\alpha_{ji}$) are required. The required data can be obtained by experimentally creating density gradients (Godoy et al., 2014) or using time-series data of population dynamics with sufficient variation in densities (Chu & Adler, 2015).
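As a point of reference, one Lotka-Volterra formulation that is consistent with the symbols defined above can be written as follows, together with the fitness ratio that Chesson's framework yields for this parameterization. The discrete-time update rule shown here is an assumption made for illustration (other forms sharing the same per-capita competition term exist), so it should be read as a sketch rather than as the exact model fitted in this study.

```latex
% Assumed discrete-time Lotka-Volterra competition model (illustrative form)
\begin{align}
N_{i,t+1} &= N_{i,t} + r_i N_{i,t}\left(1 - \alpha_{ii} N_{i,t} - \alpha_{ij} N_{j,t}\right) \\
N_{j,t+1} &= N_{j,t} + r_j N_{j,t}\left(1 - \alpha_{jj} N_{j,t} - \alpha_{ji} N_{i,t}\right)
\end{align}

% Relative fitness difference for this parameterization:
% species j's sensitivity to competition (alpha_jj, alpha_ji) sits in the
% numerator, species i's sensitivity (alpha_ii, alpha_ij) in the denominator.
\begin{equation}
\frac{f_i}{f_j} = \sqrt{\frac{\alpha_{jj}\,\alpha_{ji}}{\alpha_{ii}\,\alpha_{ij}}}
\end{equation}
```

Under this form, $f_i/f_j > 1$ means that the product of the coefficients acting on species j exceeds the product of those acting on species i, which matches the verbal statement that species j is overall more sensitive to competition.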
The RY is calculated as a species' yield in mixture divided by its yield in monoculture, often using data from a replacement series design in which total density is held constant (sensu de Wit, 1960). This definition does not specify whether the data should come from study systems at equilibrium or not, and in fact RY is often calculated with 'non-equilibrium' data from short-term greenhouse experiments.
TABLE 1 The three categories of methods and their representatives to infer pairwise competitive outcomes and multispecies competitive ranks and intransitivity
The RE estimates PCO by using Markov chain models to (a) randomly generate a large number (100,000) of competition matrices whose elements $C_{ij}$ specify the probability that species i outcompetes species j, with $C_{ji} = 1 - C_{ij}$ (if $C_{ij} > C_{ji}$, species i outcompetes species j), (b) transform the competition matrices to patch transition matrices from which species abundances can be predicted, and (c) select the competition matrix that best fits the observed abundances of species (for more details, see Ulrich et al., 2014). If the goodness-of-fit of the best fitting matrix ($r_S$) is low, it indicates that forces other than competition (e.g. niche-based processes) may also play important roles in determining species abundances. The RE can be applied to temporal, spatial, or temporal × spatial abundance data, and allows one to infer PCO for many species without doing a large number of pairwise competition experiments (e.g. Soliveres et al., 2015).
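The three steps of the RE can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of Ulrich et al. (2014): the rule used to turn a competition matrix into a patch transition matrix (a patch is challenged by one of the other species chosen uniformly at random) is an assumption, the number of candidate matrices is far smaller than 100,000, the use of a Spearman rank correlation as the fit measure is an assumption motivated by the symbol $r_S$, and all function names and the abundance values are hypothetical.

```python
import random

def random_competition_matrix(n, rng):
    """Step (a): C[i][j] = probability that i outcompetes j, C[j][i] = 1 - C[i][j]."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = rng.random()
            c[j][i] = 1.0 - c[i][j]
    return c

def transition_matrix(c):
    """Step (b), ASSUMED rule: a patch held by j is challenged by one of the
    other n - 1 species (uniformly chosen); challenger i wins with C[i][j].
    Returns a column-stochastic matrix P[i][j] = Pr(patch of j becomes i)."""
    n = len(c)
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j:
                p[i][j] = c[i][j] / (n - 1)
        p[j][j] = 1.0 - sum(p[i][j] for i in range(n) if i != j)
    return p

def stationary_abundances(p, iters=200):
    """Predicted relative abundances: stationary vector by power iteration."""
    n = len(p)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

def ranks(v):
    """Average ranks (1-based), ties share their mean rank."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    out = [0.0] * len(v)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and v[order[end + 1]] == v[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            out[order[k]] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return out

def spearman(a, b):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5

def reverse_engineer(observed, n_matrices=300, seed=0):
    """Step (c): keep the random matrix whose prediction best fits the data."""
    rng = random.Random(seed)
    best_c, best_fit = None, -2.0
    for _ in range(n_matrices):
        c = random_competition_matrix(len(observed), rng)
        fit = spearman(stationary_abundances(transition_matrix(c)), observed)
        if fit > best_fit:
            best_c, best_fit = c, fit
    return best_c, best_fit

observed = [0.4, 0.3, 0.2, 0.1]  # hypothetical relative abundances
best_c, best_fit = reverse_engineer(observed, n_matrices=300, seed=1)
print("goodness of fit:", round(best_fit, 3))
```

In this sketch, species i is taken to outcompete species j whenever `best_c[i][j] > best_c[j][i]`, which mirrors the decision rule described above.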

| Multispecies competitive ranks and intransitivity
Using PCOs inferred by any of the methods above, multispecies competitive ranks and intransitivity can be constructed.
The competitive ranks can simply be constructed by counting the number of wins each species has. To measure intransitivity, six indices have been proposed to capture different elements of topological variation in an intransitive network (Laird & Schamp, 2018) (see Appendix S1 for details): Slater's and Petraitis's i, Ulrich's I, Kendall and Babington Smith's d, Bezembinder's δ, unbeatability u and always-beatability a.

| Theoretical test
First, we used the analytical approach with the two-species Lotka-Volterra model (Equation 1) to derive the relations between RFD, RY and RE in indicating PCO using data at equilibrium and before equilibrium. To seek generality, we also derived the relations using other competition models (e.g. Beverton-Holt model, annual plant model; see Table S1 in Appendix S2). For the 'before-equilibrium' case where it turned out analytical solutions are not possible, however, we used simulations to explore the relations (see below). The details of these theoretical analyses are in Appendix S2.
Second, we used numerical simulations with the Lotka-Volterra model to explore how the relations changed as the systems moved away from equilibrium (Appendix S2). In the simulations, we also had eight species and 28 two-species pairs as in the experimental test (see below). Model parameters were randomly drawn from the range of parameters in the Lotka-Volterra models fitted with the field experiment (see below), i.e. with intrinsic growth rates, intra- and inter-specific competition coefficients randomly drawn from 0. > 1, $RY_i > RY_j$ and $C_{ij} > C_{ji}$, respectively. All the simulations were performed with the 'deSolve' package (Soetaert, Petzoldt, & Setzer, 2010) in R 3.4.3 (R Core Team, 2018).
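To illustrate the kind of comparison such simulations make, the following Python sketch integrates a two-species Lotka-Volterra model and compares the ranking given by RY before and near equilibrium with the ranking given by the fitness ratio. It is not the simulation code of this study: the continuous-time model form, the forward-Euler integrator (a stand-in for an ODE solver such as deSolve), the replacement-series style starting biomasses and all parameter values are assumptions chosen for illustration.

```python
import math

def fitness_ratio(a_ii, a_ij, a_ji, a_jj):
    """f_i/f_j for the assumed Lotka-Volterra parameterization."""
    return math.sqrt((a_jj * a_ji) / (a_ii * a_ij))

def simulate(r, alpha, n0, t_end, dt=0.01):
    """Forward-Euler integration of dN_i/dt = r_i N_i (1 - sum_k alpha[i][k] N_k)."""
    n = list(n0)
    size = len(n)
    for _ in range(int(round(t_end / dt))):
        rates = [
            r[i] * n[i] * (1.0 - sum(alpha[i][k] * n[k] for k in range(size)))
            for i in range(size)
        ]
        n = [n[i] + dt * rates[i] for i in range(size)]
    return n

def relative_yields(r, alpha, n0, t_end):
    """RY of both species at t_end: mixture biomass / monoculture biomass.

    Assumption: each monoculture starts at the total initial biomass of
    the mixture (2 * n0), mimicking a replacement series.
    """
    mix = simulate(r, alpha, [n0, n0], t_end)
    ry = []
    for i in range(2):
        mono = simulate([r[i]], [[alpha[i][i]]], [2.0 * n0], t_end)[0]
        ry.append(mix[i] / mono)
    return ry

# Hypothetical parameters: species i (index 0) grows slowly but is the
# stronger competitor; species j (index 1) grows fast.
r = [0.2, 1.0]
alpha = [[1.0, 0.5], [0.8, 1.0]]

ratio = fitness_ratio(alpha[0][0], alpha[0][1], alpha[1][0], alpha[1][1])
ry_early = relative_yields(r, alpha, n0=0.05, t_end=10.0)   # far from equilibrium
ry_late = relative_yields(r, alpha, n0=0.05, t_end=500.0)   # close to equilibrium

print("f_i/f_j:", round(ratio, 3))
print("RY early (i, j):", [round(v, 3) for v in ry_early])
print("RY late  (i, j):", [round(v, 3) for v in ry_late])
```

With these illustrative parameters the fast-growing species j has the higher RY early on, whereas near equilibrium species i has the higher RY, in agreement with $f_i/f_j > 1$. This is the qualitative pattern that motivates distinguishing equilibrium from before-equilibrium data.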
Based on PCO inferred by each method, we then constructed multispecies competitive ranks for the eight species. For intransitivity, we first theoretically constructed 30 multispecies communities (species richness >2) as in the experimental test (see below), and then calculated six intransitivity indices for each community based on the PCO. We also counted the number of three-species intransitive loops among the eight species as another measure of intransitivity.
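The two simplest multispecies summaries described here, ranks from win counts and the number of three-species intransitive loops, can be sketched as below. The function names and the four-species example are hypothetical; the six intransitivity indices themselves are not reproduced.

```python
from itertools import combinations

def competitive_ranks(wins):
    """Number of pairwise wins per species; wins[i][j] is True if i beats j."""
    n = len(wins)
    return [sum(1 for j in range(n) if j != i and wins[i][j]) for i in range(n)]

def intransitive_triplets(wins):
    """All species triplets forming a loop a > b > c > a (in either direction)."""
    loops = []
    for a, b, c in combinations(range(len(wins)), 3):
        forward = wins[a][b] and wins[b][c] and wins[c][a]
        backward = wins[b][a] and wins[c][b] and wins[a][c]
        if forward or backward:
            loops.append((a, b, c))
    return loops

# Hypothetical outcomes: species 0, 1, 2 form a rock-paper-scissors loop,
# and species 3 beats all of them.
T, F = True, False
wins = [
    [F, T, F, F],  # 0 beats 1
    [F, F, T, F],  # 1 beats 2
    [T, F, F, F],  # 2 beats 0
    [T, T, T, F],  # 3 beats 0, 1 and 2
]

print(competitive_ranks(wins))      # wins per species
print(intransitive_triplets(wins))  # triplets that form a loop
```

A `wins` matrix of this kind can be filled from any of the three methods, for example by setting `wins[i][j]` to whether $f_i/f_j > 1$, $RY_i > RY_j$ or $C_{ij} > C_{ji}$.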

| Experimental test
We used data from a long-term field experiment and a short-term greenhouse experiment to validate the findings in the theoretical test. Below, we briefly describe the two experiments (for details, see Appendix S3).

| PCO and multispecies competitive ranks and intransitivity
As in the theoretical test, we applied each method to the field and greenhouse experiments to calculate PCO and multispecies competitive ranks and intransitivity, and counted the number of three-species intransitive loops.

RFD
For the field experiment, we first fitted exponential growth models with annual monoculture biomass data from 2002 to 2003, during which all species appeared to experience exponential growth, to estimate their intrinsic growth rates. We then fitted a joint two-species Lotka-Volterra model with time-series biomass data (2002–2015) of all two-species mixtures to simultaneously estimate all intra- and inter-specific competition coefficients. Similarly, for the greenhouse experiment, we estimated intrinsic growth rates with mono1 biomass data in June and September, and competition coefficients with mono2 and mix data in June and September. After the models for the two ex-

RY
For the field experiment, we calculated RY using biomass data from monocultures and two-species mixtures for each year (2002–2015), but in the main text, we only presented the results of 2015, when communities appeared to be closest to equilibrium (see Appendix S5 for the results of other years, and equilibrium evaluation below in Data Analysis). As the monoculture biomass of P. pratense in 2015 was zero, we used its monoculture data from 2014. For the greenhouse experiment, we calculated RY using mono1 and mix data by averaging the biomasses in June and September.
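The RY calculation and the resulting pairwise outcome can be sketched as follows. The function names and biomass values are hypothetical, and the guard against a zero monoculture biomass only flags the situation noted above for P. pratense; how to substitute data in that case is left to the analyst.

```python
def relative_yield(mixture_biomass, monoculture_biomass):
    """RY = yield in mixture / yield in monoculture."""
    if monoculture_biomass <= 0:
        # RY is undefined here; another year's monoculture data may be needed.
        raise ValueError("monoculture biomass must be positive to compute RY")
    return mixture_biomass / monoculture_biomass

def pco_from_ry(mix_i, mono_i, mix_j, mono_j):
    """Return 'i' if RY_i > RY_j, 'j' if RY_j > RY_i, else 'tie'."""
    ry_i = relative_yield(mix_i, mono_i)
    ry_j = relative_yield(mix_j, mono_j)
    if ry_i > ry_j:
        return "i"
    if ry_j > ry_i:
        return "j"
    return "tie"

# Hypothetical biomasses (arbitrary units) for one two-species mixture.
print(relative_yield(30.0, 60.0))            # RY of species i
print(relative_yield(10.0, 80.0))            # RY of species j
print(pco_from_ry(30.0, 60.0, 10.0, 80.0))   # inferred winner
```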

RE
For the field experiment, we applied RE with data of species relative abundance (biomass of each species divided by total biomass) of two 'spatial' replicates of each two-species mixture for each year (2002–2015), but similarly here we only presented the results of 2015 (see Appendix S6 for those of other years). For the greenhouse experiment, we applied RE with data of two replicates of each two-species mixture.

| Data analysis
For both theoretical and experimental tests, we tested how PCO (Phi correlations), multispecies competitive ranks (Wilcoxon signed-rank tests) or values of each intransitivity index (Phi or Pearson correlations depending on whether the variable is binary or not) inferred by the three methods were related. Moreover, we checked whether three-species intransitive loops based on different methods differed (Appendix S7). We also used Pearson correlations to test how each intransitivity index is related to extinction rates (i.e. number of species lost by 2015 divided by total number of species; lower extinction rates indicate greater coexistence) of 30 multispecies communities in the field experiment (Appendix S8). To evaluate whether and when equilibrium was achieved in the field experiment, we used paired t tests to test whether the biomasses of species (in monocultures or two-species mixtures) in the current year were significantly different from those in the previous year (Appendix S9). A significant difference indicates that the system is not at equilibrium. We also used standard deviations to evaluate the equilibrium but the results were similar (for the details, see Appendix S9). We illustrated all the correlations using the 'ggpairs' function in the GGally R package (Schloerke et al., 2018), and plotted competitive networks of eight species using the igraph R package (Csardi & Nepusz, 2006). All the analyses were performed in R 3.4.3 (R Core Team, 2018).
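Two of the statistics used here are simple enough to sketch directly. The code below computes a Phi coefficient between two binary vectors of pairwise outcomes and the paired t statistic for year-to-year biomass differences. It is a minimal illustration with hypothetical function names and data; it returns only the t statistic (obtaining a p-value would require a t distribution, e.g. from a statistics library), and it is not the analysis code of this study.

```python
import math

def phi_coefficient(x, y):
    """Phi correlation between two binary sequences (e.g. 1 = species i wins)."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (n11 * n00 - n10 * n01) / denom

def paired_t_statistic(current, previous):
    """t statistic of the paired differences current - previous."""
    diffs = [c - p for c, p in zip(current, previous)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n)

# Hypothetical pairwise outcomes from two methods for four species pairs.
print(phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0]))
# Hypothetical biomasses of three plots in two consecutive years.
print(paired_t_statistic([5.0, 6.0, 7.0], [4.0, 4.0, 4.0]))
```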

| PCO
Our analytical and simulation tests demonstrated that PCO inferred by RY or RE with data from study systems at equilibrium were mathematically equivalent or very similar, respectively, to those inferred by RFD, regardless of the competition models used (Figure 1a; for details see Appendix S2). This is because $RY_i - RY_j > 0$ and mostly $C_{ij} - C_{ji} > 0$ if $f_i/f_j > 1$ (Appendix S2). However, the simulations showed that these relationships became weaker the further the system was from equilibrium, where $RY_i - RY_j$ and $C_{ij} - C_{ji}$ both were affected not only by competition coefficients (as in the equilibrium case) but also by species' intrinsic growth rates and previous biomass or density (Figure 1a and Appendix S2).
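The equivalence between RY and RFD at equilibrium can be illustrated with a short derivation. It is a sketch under stated assumptions, not a reproduction of Appendix S2: it assumes a Lotka-Volterra form in which the per-capita growth of species i vanishes when $1 - \alpha_{ii} N_i - \alpha_{ij} N_j = 0$, a stable coexistence equilibrium, and the fitness ratio $f_i/f_j = \sqrt{\alpha_{jj}\alpha_{ji}/(\alpha_{ii}\alpha_{ij})}$.

```latex
% Monoculture equilibrium of species i (set N_j = 0):
K_i = \frac{1}{\alpha_{ii}}, \qquad K_j = \frac{1}{\alpha_{jj}}

% Two-species equilibrium (solve the two zero-growth conditions):
N_i^{*} = \frac{\alpha_{jj} - \alpha_{ij}}{D}, \qquad
N_j^{*} = \frac{\alpha_{ii} - \alpha_{ji}}{D}, \qquad
D = \alpha_{ii}\alpha_{jj} - \alpha_{ij}\alpha_{ji}

% Relative yields at equilibrium:
RY_i = \frac{N_i^{*}}{K_i} = \frac{\alpha_{ii}\,(\alpha_{jj} - \alpha_{ij})}{D}, \qquad
RY_j = \frac{N_j^{*}}{K_j} = \frac{\alpha_{jj}\,(\alpha_{ii} - \alpha_{ji})}{D}

% Their difference (the alpha_ii alpha_jj terms cancel):
RY_i - RY_j = \frac{\alpha_{jj}\alpha_{ji} - \alpha_{ii}\alpha_{ij}}{D}

% With stable coexistence, D > 0, so
RY_i > RY_j
\iff \alpha_{jj}\alpha_{ji} > \alpha_{ii}\alpha_{ij}
\iff \frac{f_i}{f_j} > 1
```

The intrinsic growth rates do not appear anywhere in this equilibrium comparison, which is consistent with the statement that growth rates and previous biomass only matter before equilibrium.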
In line with the theoretical findings, our tests with the field experiment showed that PCO inferred by RE (using data from 2015) were strongly positively correlated with those inferred by RFD (Figure 1b), but for RY its correlation with RFD peaked in 2010 (Figure S6 in Appendix S5). Moreover, the correlations between PCO inferred by RY or RE and those by RFD became stronger over time (Figures S6 and S9 in Appendix S6), as communities became closer to equilibrium (Appendix S9). Note that equilibrium in monocultures and two-species mixtures was apparently achieved around 2005 and 2008, respectively (Appendix S9). The PCO inferred by the three methods were not correlated in the greenhouse experiment (Figure 1b), and overall the greenhouse PCO were only positively correlated with those in the field experiment of earlier (2003–2005) rather than later years (Figures S6 and S9). The reliability of the field RFD (mean = 0.90 and most >0.80) was on average much greater than that of the greenhouse RFD (mean = 0.79, with many lower than 0.75) (Figures S4 and S5 in Appendix S4).

| Multispecies competitive ranks and intransitivity
Multispecies competitive ranks inferred by the three methods showed equilibrium vs. non-equilibrium patterns similar to those of PCO (Table 2 and Figures 2, 3; also see Table S4 in Appendix S5 and Table S6 in Appendix S6). In the simulations, we found positive correlations between the three methods with equilibrium data in inferring most intransitivity indices (Figure 4) (also see Appendix S7), but the correlations were overall weaker than those for PCO with equilibrium data, and also tended to get much weaker when the system was far from equilibrium (Figure 4). The tests with the field experiment showed that there was no intransitivity inferred by RFD (Figure 5). The RY and RE detected intransitivity but their intransitivity values (and three-species intransitive loops) were poorly correlated and also changed substantially over time (Appendix S5 and S6). Moreover, the three methods overall inferred very different values of six intransitivity indices in the greenhouse experiment (Figure 5), and the values were also poorly correlated with the field ones (Figure 5; Figure S8 in Appendix S5 and Figure S11 in Appendix S6). In addition, all the intransitivity indices were poorly or counterintuitively correlated with extinction rates (Appendix S8).

| DISCUSSION
Among the three categories, only the coexistence theory-based methods explicitly separate the forces determining competitive outcomes from those causing niche differences, through the strict analysis of population dynamics (Chesson, 2000; Tilman, 1982). Therefore, they provide a theoretically robust measure of PCO (Hart et al., 2018), in comparison to the other two categories whose inferences are mainly based on species biomasses or abundances (state variables). Interestingly, however, our theoretical analyses of the three representative methods showed that RY or RE applied to data at equilibrium give equivalent or very similar estimates of PCO, respectively, to RFD. Note that this also implies that RY is unrelated to niche differences in such a case, which questions whether RY could, as often claimed (Jolliffe, 2000), indicate niche differentiation between species. As different methods within each category share similarities, our findings could to some extent generalize beyond the three chosen representative methods. For example, as RFD can also be derived from a wide range of other competition models (e.g. resource competition model), our findings should also apply when other coexistence theory-based measures (e.g. Tilman's R*) are used to infer PCO. Moreover, most of the other index-based methods also compare monoculture vs. mixture treatments as RY does (e.g. relative mixture response, relative interaction index), while some compare grown-alone vs. mixture treatments (e.g. relative competition intensity). As the formulas of these indices deviate from that of RY, our theoretical analyses showed that with equilibrium data they could indicate PCO either the same as (relative mixture response) or more frequently different from (relative interaction index, relative competition intensity) those inferred by RY and RFD (for details, see Appendix S2). Lastly, among the 'process-from-pattern' methods to tease apart the relative importance of competition vs. other forces (such as niche-based processes, stochasticity) in community assembly, the RE is the only one which can explicitly give competitive outcomes between species. However, our findings of the close link between RE and RFD may also help the other 'process-from-pattern' methods (e.g. C-scores, the null model approach) to build their basis of inferring community assembly on species coexistence theory.
FIGURE 1 Phi correlations of PCO between eight species (28 pairs in total) based on the three different methods (relative fitness difference: RFD; relative yield: RY; the reverse-engineering approach: RE), in (a) the simulation test and (b) the experimental test. In the simulation test, each method is applied to equilibrium (EQM) and before-equilibrium (before-EQM) conditions. In the experimental test, each method is applied to the field experiment (field) and the greenhouse experiment (GH). For the field experiment, both RY and RE used data from 2015. Asterisks indicate significance of the correlations as **p < .01; ***p < .001. The lines in the lower diagonal are based on linear regressions, and the grey area indicates the 95% confidence interval. Data points are slightly jittered to improve readability
FIGURE 2 Competitive ranks and networks of eight simulated species based on the three different methods (relative fitness difference: RFD; relative yield: RY; the reverse-engineering approach: RE), in the simulation test. Each method was applied to (a-c) equilibrium (EQM) and (d-f) before-equilibrium (bef-EQM) conditions. In the networks, node sizes are proportional to species' competitive ranks (i.e. nodes are larger if ranks are higher), and arrows point to the winner in competition. Different species are indicated with different colours
TABLE 2 Results of Wilcoxon signed-rank tests of competitive ranks of eight species based on the three different methods (relative fitness difference: RFD, relative yield: RY, the reverse-engineering approach: RE), in (a) the simulation test and (b) the experimental test. In the simulation test, each method was applied to equilibrium (EQM) and before-equilibrium (bef-EQM) conditions. In the experimental test, each method was applied to the field experiment (field) and the greenhouse experiment (GH). For the field experiment, both RY and RE used data from 2015. Numbers are the p-values of Wilcoxon signed-rank tests
In support of the theoretical results, our test with the field experiment showed that both RY and RE converged with RFD in indicating PCO over time as communities got closer to equilibrium. However, for RY, the convergence peaked in 2010 rather than 2015, which may be because of the deterioration of monoculture performance for some species after 2010 (Appendix S9), possibly due to the accumulation of pathogens over time (Cortois, Schröder-Georgi, Weigelt, Putten, & Deyn, 2016). In line with our findings, a former study found that RY using long-term, presumably equilibrial, data was positively correlated with Tilman's R* (Fargione & Tilman, 2006). Moreover, some recent studies also showed that RY at or close to equilibrium could reasonably measure the degree to which Lotka-Volterra models parameterized with monoculture and biculture data capture species abundances of polycultures (Fort, 2018; Halty, Valdés, Tejera, Picasso, & Fort, 2017). Our findings also suggest that the former applications of the RE with equilibrium data (e.g. Soliveres et al., 2015; Ulrich et al., 2016) .
FIGURE 3 Competitive ranks and networks of eight species based on the three different methods (relative fitness difference: RFD; relative yield: RY; the reverse-engineering approach: RE), in the experimental test. Each method was applied to (a-c) the field experiment (field) and (d-f) the greenhouse experiment (GH). For the field experiment, both RY and RE used data from 2015. In the networks, node sizes are proportional to species' competitive ranks (i.e. nodes are larger if ranks are higher), and arrows point to the winner in competition. Different species are indicated with different colours
If RFD between species i and j > 1, species i is more competitive than species j, and vice versa.
Our 'reliability' analyses showed that the estimates of the field RFD were much more reliable than the greenhouse ones, because for the former on average most of their posterior RFD (90%) agreed with the estimates (either >1 or <1), a lot greater than that for the latter (79%) (Appendix S4). Moreover, compared to the greenhouse one, the posteriors of competition coefficients in the field experiment yielded much smaller SD relative to means (Appendix S4).
These results are likely because the field experiment consists of long time-series data with substantial density variations. However, the data from the greenhouse experiment (and also from most other greenhouse experiments) have very limited density gradients, and this may have led to less accurate estimation of model parameters.
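The 'reliability' measure described above, i.e. the share of posterior RFD values that fall on the same side of 1 as the point estimate, can be sketched as follows. The function name and the posterior draws are hypothetical, and how the posterior draws are obtained (Appendix S4) is outside the scope of this sketch.

```python
def rfd_reliability(estimate, posterior_draws):
    """Fraction of posterior RFD draws on the same side of 1 as the estimate."""
    if not posterior_draws:
        raise ValueError("at least one posterior draw is required")
    agree = sum(1 for d in posterior_draws if (d > 1) == (estimate > 1))
    return agree / len(posterior_draws)

# Hypothetical example: the estimate says species i is the better competitor
# (RFD > 1), and three of the four posterior draws agree.
print(rfd_reliability(1.3, [1.2, 1.5, 0.9, 1.1]))
```

A value close to 1 indicates that the inferred winner of a pair is robust to parameter uncertainty, whereas a value near 0.5 indicates that the winner is essentially undetermined.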
In contrast to the equilibrium case, both RY and RE only poorly predicted PCOs in non-equilibrium systems. This is because the RY and RE calculated from non-equilibrium data are affected not only by competition coefficients (as in the equilibrium cases) but also by species' growth rates, previous biomass or density (Appendix S2) (Freckleton & Watkinson, 2000). This suggests that the use of RY with short-term greenhouse data in many former studies (Table 1) probably fails to correctly indicate PCO. In line with this, our experimental test showed that the greenhouse PCO was only positively correlated with the field PCO from earlier rather than later years. In the non-equilibrium cases, therefore, PCO could only be reliably inferred from RFD by fitting competition models with density gradient or time-series data. However, if the required data are not of high quality (as in our greenhouse experiment), RFD might not make accurate inferences either (Appendix S4), which may explain the poor correlations between the greenhouse and field RFD.
FIGURE 4 Pearson correlations of each of intransitivity indices (a-f) (Slater's and Petraitis's i, Ulrich's I, Kendall and Babington Smith's d, Bezembinder's δ, unbeatability u and always-beatability a) derived from the pairwise competitive outcomes based on each of the three different methods (relative fitness difference: RFD; relative yield: RY; the reverse-engineering approach: RE), in the simulation test. Each method was applied to equilibrium (EQM) and before-equilibrium (before-EQM) conditions. Asterisks indicate significance of correlations as *p < .05; **p < .01; ***p < .001. The lines in the lower diagonal are based on linear regressions, and the grey area indicates the 95% confidence interval. Data points are slightly jittered to improve readability

Multispecies competitive ranks inferred by the three methods showed equilibrium vs. non-equilibrium patterns similar to those of PCO. However, competitive relationships among multiple species are more subtle when their competitive ranks are not hierarchical, as intransitivity indices were much more sensitive to the methods used and the condition of study systems (at equilibrium or not). This is because intransitivity tended to occur mainly among similar competitors ('weak intransitivity') (see Figure S12 in Appendix S7). Therefore, even a small shift in competitive outcomes could reshuffle intransitive relationships. In the field experiment, the intransitivity values inferred by RY and RE were poorly correlated and also changed substantially over time (Appendix S5 and S6). Moreover, the detected intransitivity failed to explain species coexistence (Appendix S8), which may imply that the estimates of intransitivity are not very reliable. These results suggest that the intransitivity inferred by RY or related indices using short-term (often greenhouse) data in former studies (e.g. Grace, Guntenspergen, & Keough, 1993; Keddy & Shipley, 1989; Shipley, 1993) is probably not reliable. Moreover, our findings also call for stricter evaluation when inferring intransitivity by RE with equilibrium data (e.g. Soliveres et al., 2015) and by RFD with density gradient data (e.g. Godoy, Stouffer, Kraft, & Levine, 2017). In short, all this makes it generally challenging to accurately infer intransitivity regardless of the methods, unless there are high quality data to allow a given method to accurately infer competitive outcomes among similar competitors.

FIGURE 5 Pearson correlations of each of intransitivity indices (a-f) (Slater's and Petraitis's i, Ulrich's I, Kendall and Babington Smith's d, Bezembinder's δ, unbeatability u, and always-beatability a) derived from the pairwise competitive outcomes based on each of the three different methods (relative fitness difference: RFD; relative yield: RY; the reverse-engineering approach: RE), in the experimental test. Each method was applied to the field experiment (field) and the greenhouse experiment (GH). For the field experiment, both RY and RE used data from 2015, and no intransitivity was detected with RFD so its correlations with others are NA. Asterisks indicate significance of correlations as *p < .05; **p < .01; ***p < .001. The lines in the lower diagonal are based on linear regressions, and the grey area indicates the 95% confidence interval. Data points are slightly jittered to improve readability

| Choosing an appropriate method
Based on our findings, we recommend future studies to choose appropriate methods depending on questions of interest and data availability.

Multispecies competitive ranks and intransitivity.
Our recommendations for inferring multispecies competitive ranks are similar to those for PCO. However, for inferring intransitivity, which is also derived from PCO but very sensitive to even a small shift of PCO among similar competitors, our recommendations are stricter. In other words, to reliably infer intransitivity, one needs high quality data for a given method (e.g. unambiguous equilibrium data for RY and RE, and a large amount of density gradient or time-series data with sufficient density variations for RFD) to first accurately infer PCO especially among similar competitors.

ACKNOWLEDGEMENT
This study has been supported by the German Science Foundation (RO2397/8) in the framework of the Jena Experiment (FOR 456/1451).
We thank Anne Ebeling for coordination of the field site and contribution of data, as well as the gardeners and numerous students