qad: An R-package to detect asymmetric and directed dependence in bivariate samples
Abstract
 Correlations belong to the standard repertoire of ecologists for quantifying the strength of dependence between two random variables. Classical dependence measures are usually not capable of detecting nonmonotonic or nonfunctional dependencies. Furthermore, they completely fail to detect asymmetry and direction in dependence, which exist in many situations and should not be ignored.
 In this paper, we present qad (short for quantification of asymmetric dependence), a nonparametric statistical method to quantify directed and asymmetric dependence of bivariate samples. Qad is applicable in general (e.g. linear, nonlinear, or nonmonotonic) situations, is sensitive to noise in data, exhibits a good small sample performance, detects asymmetry in dependence, shows high power in testing for independence, requires no assumptions regarding the underlying distribution of the data and reliably quantifies the information gain/predictability of quantity Y given knowledge of quantity X, and vice versa (i.e. q(X,Y) $\ne $ q(Y,X)).
 Here, we briefly recall the methodology underlying qad and introduce the functions of the R-package qad. The package returns estimates for three measures: $q\left(X,Y\right)$, denoting the directed dependence of $Y$ on $X$ (or, equivalently, the influence of $X$ on $Y$); $q\left(Y,X\right)$, the directed dependence of $X$ on $Y$; and $a\left(X,Y\right):=q\left(X,Y\right)-q\left(Y,X\right)$, the asymmetry in dependence. Furthermore, qad can be used to predict Y given knowledge of X, and vice versa. Additionally, we compare the empirical performance of qad with that of seven other well-established measures and demonstrate the applicability of qad on ecological datasets.
 We illustrate that direction and asymmetry in dependence are universal properties of bivariate associations. Qad thus provides additional information gain and avoids model bias and will therefore advance and facilitate the understanding of ecological systems.
1 INTRODUCTION
Although the number of available statistical tools is continuously increasing, classical measures such as correlations often remain the first choice for quantifying the dependence between two random variables (Anderson et al., 2021; Bolt et al., 2021). Usually, the choice of a specific correlation method is based on the method's underlying assumptions about the data; for example, Pearson's r should be used for continuous data, whereas Spearman's $\rho $ is advised for data on the ordinal scale. These two dependence measures, however, provide information on different aspects of bivariate distributions: Pearson's r quantifies how linear a relationship is, whereas Spearman's $\rho $ measures the extent of monotonicity. Additional insight may be gained by considering other less frequently applied or less well-known dependence measures. Examples are distance correlation (dCor; Székely et al., 2007), which is implemented in the R-package energy, and the information-theoretic maximal information coefficient (MIC; Reshef et al., 2011; R-package minerva). Very recent developments are the asymmetric dependence measures xicor (Chatterjee, 2021) and quantification of asymmetric dependence (qad; Junker et al., 2021).
In recent years, the usefulness of symmetric dependence measures for inferring the structure of complex systems or causality in bivariate associations has been debated and potential biases have been discussed (see, for instance, Zhang et al. (2015), Wang and Huang (2014), Okimoto (2008), Hirano and Takemoto (2019)). Thus, the concept of asymmetry/direction in dependence, which exists in most situations, should not be ignored in data analysis. In a linear setting, the dependence between two variables X and Y is indeed symmetric (Figure 1a) in the sense that Y can be predicted from X equally well as X from Y. The situation is different in more complex relationships. For instance, for a two-dimensional sample in the form of a parabola (Figure 1b) or a sinusoidal curve (Figure 1c), the dependence structure is clearly asymmetric. In these cases, knowing the value of the variable $X$ strongly improves the predictability of $Y$, whereas in the other direction, the information gain is considerably smaller. As an example, consider the year of deglaciation along a glacier forefield and plant diversity (Junker et al., 2020). Naturally, the year of deglaciation has a strong influence on plant diversity (not vice versa), and this directed dependence structure is clearly captured by qad (Figure 1d). Especially in cases where no a priori knowledge about the causal relationship is available, directional dependence is a useful measure for exploring and estimating the association between two random variables in a more detailed and more realistic way than classical (symmetric) dependence measures. In addition, qad will provide more detailed insights into the structure of communities and functional linkages between organisms or individuals and may thus assist network inference. The limitations of standard methods (e.g. Spearman's correlation coefficient) in network inference have recently been pointed out (Coenen & Weitz, 2018), and directed and asymmetric approaches have been called for (Amblard & Michel, 2011; Carr et al., 2019; Karmon & Pilpel, 2016).
Here, we present the method qad, a nonparametric and directed, hence asymmetric, measure of dependence, which is publicly available in the free software environment R (Griessenberger et al., 2021; Junker et al., 2021). qad returns estimates for the measures $q\left(X,Y\right)$, denoting the directed dependence of $Y$ on $X$ (or, equivalently, the influence of $X$ on $Y$), $q\left(Y,X\right)$, the directed dependence of $X$ on $Y$, and $a\left(X,Y\right):=q\left(X,Y\right)-q\left(Y,X\right)$, the asymmetry in dependence. The measure $a\left(X,Y\right)$ can be interpreted as the difference between the predictability of $Y$ given knowledge of $X$ and the predictability of $X$ given knowledge of $Y$. In this paper, we first describe the methodology of qad and demonstrate the application of the R-package qad. Furthermore, we compare the empirical performance of qad with existing publicly available dependence measures and highlight the information gained by considering asymmetry and direction in dependence. A complementary R-shiny app is available as Supporting Information (https://rqad.shinyapps.io/quantification_of_dependence/), facilitating the interpretation and comparison of the results and performance returned by qad and other dependence measures. An application of qad to real-world data concludes the paper. We hope that this introduction to qad, the comparative analyses and the resources provided will be helpful for ecologists and researchers from other disciplines.
2 BRIEF METHODOLOGICAL DESCRIPTION OF THE COPULA-BASED DEPENDENCE MEASURE qad
Commonly used approaches to quantify the strength of associations between two variables, such as correlation or regression, capture only a fraction of the information that is contained in the data. In contrast, copulas contain full information about associations and are therefore frequently applied in finance and other disciplines (Ghosh et al., 2020). In the bivariate case, copulas are two-dimensional distribution functions restricted to the unit square with uniformly distributed univariate marginals. The theorem of Sklar (see Nelsen (2007)) allows the joint distribution function $H$ of the random vector $\left(X,Y\right)$ to be split into the dependence structure $C$ and the marginal distributions $F$ and $G$, that is, $H\left(x,y\right)=C\left(F\left(x\right),G\left(y\right)\right)$ for every $\left(x,y\right)\in {\mathbb{R}}^{2}$. The aforementioned dependence structure $C$ is exactly the copula. Since copulas are scale-invariant (see again Nelsen (2007)), it is natural to study scale-free dependence measures on a copula basis. For more background on copulas and their application in dependence modelling, we refer to the books of Nelsen (2007) and Durante and Sempi (2015). The copula-based dependence measure qad, originally introduced as ${\mathrm{\zeta}}_{1}$ in Trutschnig (2011), is defined as a type of distance between the conditional distribution functions of the copula $C$ underlying the random vector $\left(X,Y\right)$ and the uniform distribution representing independence of X and Y. In other words, qad measures how much the dependence structure of $\left(X,Y\right)$ differs from independence. Contrary to many other approaches, qad is able to detect both complete dependence (i.e. Y is a function of X) and independence.
The method works as follows: Given a two-dimensional sample $\left({x}_{1},{y}_{1}\right),\dots ,\left({x}_{n},{y}_{n}\right)$ of size $n$ from the random vector $\left(X,Y\right)$ (see Figure 2a), the normalized ranks of the sample are calculated first (i.e. we get values of the form $\left(i/n,j/n\right)$ for $i,j\in \{1,\dots ,n\}$). Then the so-called empirical copula ${\widehat{E}}_{n}$ is computed (see Figure 2b). As the next step, the empirical copula is aggregated to the empirical checkerboard copula (a two-dimensional histogram in the copula setting): the masses of the small squares (empirical copula) are summed up to the larger $N\times N$ squares, whereby the resolution $N$ depends on the sample size $n$ (see Figure 2c,d). Note that by default the resolution of the empirical checkerboard copula is proportional to the square root of the sample size; thus, as for any statistical method, qad results become more reliable as the sample size increases. We recommend a sample size of at least $n=16$, resulting in a resolution of $N=4$. Finally, the conditional distribution functions of the checkerboard copula are compared with the distribution function of the uniform distribution on the unit interval (in the sense that the area between the graphs is calculated). This step is conducted both for the vertical strips (to calculate the influence of $X$ on $Y$) and for the horizontal strips (to calculate the influence of $Y$ on $X$), see, for instance, Figure 2e,f. Computing the sum of all areas and normalizing appropriately with the constant 3 (see Junker et al. (2021)) yields the two directed qad-values $q\left(X,Y\right)\in \left[0,1\right]$, quantifying the influence of $X$ on $Y$, and $q\left(Y,X\right)\in \left[0,1\right]$, denoting the influence of $Y$ on $X$. High values indicate strong associations, whereas low values describe weakly dependent random variables. Note that for dependence measures that cannot take negative values (e.g. qad), a deviation from 0 in the case of independence is to be expected.
As an example, a value of $q\left(X,Y\right)=0.2$ is common for independent random variables X and Y. Thus, the value of $q\left(X,Y\right)$ alone is clearly insufficient for deciding whether or not the sample is likely to come from independent random variables. To overcome this problem, a permutation test is implemented in the R-package qad to obtain a p-value for $q\left(X,Y\right)$ and $q\left(Y,X\right)$ in testing for independence, that is, testing the hypothesis ${H}_{0}:q\left(X,Y\right)=0=q\left(Y,X\right)$. Accordingly, nonsignificant qad values (p-value > 0.05) provide no evidence of dependence. This makes it possible to interpret the obtained values and puts them into perspective.
Furthermore, if we have $q\left(X,Y\right)>q\left(Y,X\right)$, then the qad estimator informs us that the variable $X$ provides more information about $Y$ than vice versa. The same holds for the reverse direction. This information is also gathered in the measure for asymmetry, which is computed as $a\left(X,Y\right):=q\left(X,Y\right)-q\left(Y,X\right)$ and can therefore attain values within the interval $\left[-1,1\right]$. Additionally, as a rank-based quantity, qad is robust to outliers and invariant with respect to strictly monotone transformations, for instance, log-transformations.
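The steps described in this section can be sketched in a few lines of base R. The following is a minimal illustration, not the implementation used in the package: it assumes continuous data without ties, uses the resolution N = floor(sqrt(n)), and all function names (rank_weights, checkerboard_mass, directed_dependence, qad_sketch, perm_pvalue) are hypothetical helpers introduced here. The permutation p-value uses one common convention, (1 + number of permuted values at least as large as the observed one) / (1 + nperm), which may differ from the convention used in qad.

```r
# Share of each rank interval [(r-1)/n, r/n] lying in each of the N grid
# intervals [(k-1)/N, k/N]; every row sums to 1 (assumes no ties).
rank_weights <- function(r, n, N) {
  lo <- outer((r - 1) / n, (seq_len(N) - 1) / N, pmax)
  hi <- outer(r / n, seq_len(N) / N, pmin)
  pmax(hi - lo, 0) * n
}

# N x N mass matrix of the empirical checkerboard copula;
# entry [k, j] is the mass of the k-th X-strip and j-th Y-strip.
checkerboard_mass <- function(x, y, N = floor(sqrt(length(x)))) {
  n <- length(x)
  crossprod(rank_weights(rank(x), n, N), rank_weights(rank(y), n, N)) / n
}

# Integral of |f| over an interval of width h, where f is linear with
# endpoint values a and b (handles a sign change inside the interval).
abs_linear_area <- function(a, b, h) {
  ifelse(a * b >= 0,
         h * (abs(a) + abs(b)) / 2,
         h * (a^2 + b^2) / (2 * (abs(a) + abs(b))))
}

# 3 * (average area between the conditional distribution function of each
# strip and the uniform distribution function on [0, 1]).
directed_dependence <- function(M) {
  N <- nrow(M)
  K <- t(apply(M, 1, cumsum)) * N          # conditional CDFs at y = j/N
  D <- sweep(K, 2, seq_len(N) / N)         # difference to uniform CDF
  D0 <- cbind(0, D[, -N, drop = FALSE])    # values at the left grid points
  3 * sum(abs_linear_area(D0, D, 1 / N)) / N
}

qad_sketch <- function(x, y) {
  M <- checkerboard_mass(x, y)
  q_xy <- directed_dependence(M)           # influence of X on Y
  q_yx <- directed_dependence(t(M))        # influence of Y on X
  c(q_xy = q_xy, q_yx = q_yx, asym = q_xy - q_yx)
}

# Permutation test for H0: independence, based on q(X,Y).
perm_pvalue <- function(x, y, nperm = 200) {
  obs <- qad_sketch(x, y)[["q_xy"]]
  null <- replicate(nperm, qad_sketch(x, sample(y))[["q_xy"]])
  (1 + sum(null >= obs)) / (1 + nperm)
}

set.seed(1)
x <- runif(100, -1, 1)
y <- x^2 + rnorm(100, 0, 0.05)             # noisy parabola (simulated data)
res <- qad_sketch(x, y)
p <- perm_pvalue(x, y)
print(round(res, 3))
print(p)
```

Because only ranks enter the computation, replacing x by a strictly increasing transformation such as exp(x) leaves all three values unchanged, which mirrors the invariance property stated above.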
3 APPLICATION OF THE R PACKAGE qad
The package qad is implemented in the software R (R Development Core Team, 2020) and is publicly available on CRAN (https://cran.r-project.org/web/packages/qad/index.html). The development version of qad is accessible via GitHub (https://github.com/griefl/qad). In the following, we briefly sketch the main functions of the package. Additionally, each function contains examples in its documentation, which can be called via the R help function (e.g. ?qad). The following code snippets, which are applied to the data depicted in Figure 1d, sketch the application of qad.
3.1 Calculating the directed dependence measure q
Given bivariate observations $\left({x}_{1},{y}_{1}\right),\dots ,\left({x}_{n},{y}_{n}\right)$ of size $n$, the function qad(…) computes the dependence values $q\left(X,Y\right)$ and $q\left(Y,X\right)$, the maximum dependence (i.e. max(c(q(X,Y), q(Y,X)))), and the asymmetry in dependence $a\left(X,Y\right)$. The function qad(…) requires two numeric vectors containing the observations of the sample or, alternatively, accepts a numeric data frame of the form data.frame(sample_X, sample_Y). The optional argument p.value (default is TRUE) controls whether p-values (based on permutations with nperm runs) are calculated for q(X,Y) and q(Y,X). A p-value below 0.05 provides evidence against the independence of X and Y. The output of qad shows the dependence values and their respective p-values as well as further descriptive statistics, for example, the sample size and the number of unique ranks, which are essential in calculating the resolution of the underlying empirical checkerboard copula. The checkerboard resolution is adjustable through the parameter resolution. However, since the output strongly depends on the resolution, we highly recommend using the default setting (resolution = NULL), which uses the optimal choice (optimal in the sense that the estimator performs well independent of the underlying dependence structure; Junker et al., 2021).
Furthermore, the function qad returns an object of class 'qad', which allows the application of the generic functions coef(), summary() and plot(). The plot function generates a two-dimensional histogram (heatmap) visualizing the empirical checkerboard copula. The colour of each square corresponds to the density of the normalized ranks (the so-called pseudo-observations). The checkerboard plot helps to understand the type of the dependence structure underlying the variables $X$ and $Y$. Setting the optional parameter copula to FALSE yields a two-dimensional histogram of the unscaled (raw) data. In our example, we obtain significant q-values ($q\left(x1,x2\right)=0.478,p<0.001$ and $q\left(x2,x1\right)=0.320,p<0.01$), which clearly indicate an asymmetric setting ($a=0.157$). The additional plots underline the findings and suggest a slightly inverted U-shaped pattern.
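A call along the lines described above might look as follows. This is a hedged sketch: the data are simulated here (they are not the glacier forefield data of Figure 1d), only the arguments named in the text (two numeric vectors, p.value) and the generics coef(), summary() and plot() are used, and the block is guarded so that it does nothing when the package qad is not installed.

```r
# Simulated bivariate sample (assumption: stands in for the data of Figure 1d)
set.seed(42)
n <- 100
x <- runif(n, -1, 1)
y <- x^2 + rnorm(n, 0, 0.1)

if (requireNamespace("qad", quietly = TRUE)) {
  # directed dependence values with permutation-based p-values
  fit <- qad::qad(x, y, p.value = TRUE)
  print(coef(fit))   # q(x1,x2), q(x2,x1), asymmetry and related values
  summary(fit)       # descriptive overview of the fit
  plot(fit)          # heatmap of the empirical checkerboard copula
}
```

The exact names and layout of the returned coefficients should be checked in the package documentation (?qad).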
3.2 Using qad as a prediction tool
As a by-product of the checkerboard approach, the random variable $Y$ given $X=x$ and $X$ given $Y=y$ can be predicted for every $x\in Range\left(X\right)$ and $y\in Range\left(Y\right)$. This additional feature is implemented in the R function predict.qad(…). Note that prediction is possible only within the range of measured X and Y values; since qad is calculated independently of a parametric regression function, no extrapolation is possible. In contrast to regression methods and many machine learning algorithms, qad does not return point estimates, but probabilities that values of Y fall in a given range given X (or vice versa). The function predict.qad(…) requires three arguments: a 'qad' object, the conditioning variable and a vector of x-values. The function then returns the probabilities of the event that Y falls into the interval ${I}_{j}$ given X = x, or vice versa. The intervals ${I}_{j}$ are calculated as the retransformed intervals defining the checkerboard grid, that is, for every $j\in \{1,\dots ,N\}$ the interval ${I}_{j}$ is defined as ${I}_{j}:=\left[{G}_{n}^{-}\left(\frac{j-1}{N}\right),{G}_{n}^{-}\left(\frac{j}{N}\right)\right]$, whereby ${G}_{n}^{-}$ denotes the empirical quantile function of $Y$ and $N$ denotes the resolution of the checkerboard copula. Via several optional parameters, the size and number of the prediction intervals as well as visualizations may be adjusted as desired. As an example, we compared the plant diversity within the glacier forefield for two different deglaciation years. The returned plot highlights the corresponding years with red rectangles. As a result, for areas with a deglaciation year around 1920, the Shannon diversity of plants is very unlikely to be below 1.48, whereas for areas with a deglaciation year around 2000 the probability is clearly higher (probability of 0.357).
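The prediction idea can be illustrated with a small base-R sketch. It is not the code behind predict.qad(…): it assumes simulated data without ties, a sample size n that is a multiple of the resolution N (so that no rank square straddles a grid line), and the helper names cell and predict_sketch are hypothetical. The interval end points are obtained with quantile(…, type = 1), the inverse of the empirical distribution function.

```r
set.seed(7)
n <- 100
N <- 10                                   # assumption: n is a multiple of N
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, 0, 0.2)

# grid cell (1..N) of each observation, based on its normalized rank
cell <- function(v) ceiling(rank(v) * N / n)

# checkerboard masses: M[k, j] = share of points in X-strip k and Y-strip j
M <- table(factor(cell(x), levels = 1:N), factor(cell(y), levels = 1:N)) / n

predict_sketch <- function(x0) {
  stopifnot(x0 >= min(x), x0 <= max(x))   # no extrapolation beyond the data
  k <- ceiling(mean(x <= x0) * N)         # X-strip containing x0
  breaks <- quantile(y, probs = (0:N) / N, type = 1, names = FALSE)
  data.frame(lower = breaks[-(N + 1)],    # interval I_j = [lower, upper]
             upper = breaks[-1],
             prob  = as.numeric(M[k, ]) * N)  # P(Y in I_j | X = x0)
}

pred <- predict_sketch(5)
print(pred)
```

Each row of the result gives one interval of Y together with the estimated probability that Y falls into it for the chosen x-value; the probabilities in one call sum to 1.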
3.3 Multivariate application of qad
Given a multivariate sample with more than two variables, the function pairwise.qad(…) can be applied to quantify all pairwise dependencies and allows an interpretation similar to that of a correlation matrix. The function pairwise.qad(…) requires an $n\times d$ numeric matrix or, alternatively, a data.frame of the form data.frame(sample_X1, sample_X2, …, sample_Xd), describing the observations of a d-dimensional random vector. Note that a p-value correction should be applied in multiple testing. To this end, the parameter p.adjust.method in the function pairwise.qad(…) can be used to select a suitable correction method. Among other details, the main output of pairwise.qad(…) is a data.frame containing all pairwise dependencies and corresponding (adjusted) p-values, which may be readily visualized by heatmap.qad(…). Optional parameters make it possible to select between the directed dependence measures and the asymmetry values and to highlight all significant pairs.

# simulate a four-dimensional sample of size 100
x1 <- runif(100); x2 <- x1^2 + rnorm(100, 0, 0.1)
x3 <- runif(100); x4 <- x3 - x2

# calculate all pairwise qad-values
fit <- pairwise.qad(cbind(x1, x2, x3, x4), p.value = TRUE, p.adjust.method = "fdr")

# visualize the pairwise qad values and highlight significant pairs
heatmap.qad(fit, select = "dependence", significance = TRUE)

Each of the functions provides several parameters that enable specific adjustments and modifications. For details, we refer to the R documentation (Griessenberger et al., 2021) or the vignette, which is available, for example, using the following lines of code:

# vignette of the qad package (available for qad version >= 1.0.1)
browseVignettes("qad")

4 PERFORMANCE AND COMPARISON OF qad WITH OTHER DEPENDENCE MEASURES
The main features of qad compared with seven other well-established dependence measures available in R are summarized in Table 1 and also discussed in Supplementary Information 3. For each measure, we provide information on whether it allows for linear, monotonic or general dependence estimation, whether it is scale-invariant, whether the estimator returns a value in [0,1] and whether it captures asymmetry in dependence. Dependence measures that capture the dependence in nonlinear situations should assign similar scores of dependence to equally noisy data in a manner independent of the concrete functional relationship (Reshef et al., 2011). Accordingly, the measure qad decreases with increasing noise irrespective of the functional relationship between X and Y (see Figure 3a,b,d–f). Note that qad returned dependence values slightly smaller than 1 in functional settings without noise (see Figure 3a,b,d–f), which is directly caused by the checkerboard binning. It is guaranteed, however, that asymptotically qad attains the maximum value 1 in these settings. Therefore, a direct comparison of two qad values must always take the sample size into account. Unlike commonly used measures of association such as Pearson's r and Spearman's $\rho $, or more recent measures such as distance correlation and MIC, which are symmetric by construction, qad (as well as xicor) indicated asymmetry in dependence in settings in which (on average) more information on Y could be obtained by knowing the value of X than vice versa, that is, $q\left(X,Y\right)>q\left(Y,X\right)$ (Figure 3b,d–f,i). Further details on Figure 3 are discussed in Supplementary Information 3.
Table 1. Main features of qad and seven other dependence measures available in R. The columns Linear, Monotonic and Non-monotonic indicate which relationships are detected as non-independent.
Measure  R function  Linear  Monotonic  Non-monotonic  Scale invariance  Estimator in $\left[0,1\right]$  Asymmetry/Directional
dCor  energy::dcor()  ✓  ✓  ✓  ✗  ✓  ✗
MICe  minerva::MIC(,est = "mic_e")  ✓  ✓  ✓  ✓  ✓  ✗
Pearson's r  cor()  ✓  ✗  ✗  ✗  ✓^{a}  ✗
qad  qad::qad(…)  ✓  ✓  ✓  ✓  ✓  ✓
RCD  rcd::rcd(…)  ✓  ✓  ✓  ✓  ✓  ✗
rdc  5 lines of R code (see Lopez-Paz et al., 2013)  ✓  ✓  ✓  ✓  ✓  ✗
Spearman's ρ  cor(…, method = "spearman")  ✓  ✓  ✗  ✓  ✓^{a}  ✗
xicor  XICOR::xicor()  ✓  ✓  ✓  ✓  ✗  ✓
 a If absolute values are considered, which is (in this case) essential to ensure comparability with the other measures.
In further empirical studies, qad ranked high in both runtime analysis and power analysis compared with all other studied dependence measures. As an example, Figure 4 depicts the estimated power in a linear and two nonlinear settings with noise. qad (as well as other nonlinear measures of dependence) outperformed Pearson and Spearman correlation in non-monotonic settings, in which the latter two might completely fail to detect any deviation from independence. Further details on the results shown in Figure 4, a runtime evaluation of the different methods, and discussions on the power analysis can be found in Supplementary Information 3. Additionally, to facilitate the applicability and interpretation of the dependence measures, we provide an R script as well as an R-shiny app allowing the user to evaluate the effects of sample size, noise and dependence structure on the results obtained by each of the eight dependence measures (see Supplementary Information 2: dep_measures.R and app.R and the online resource, available at https://rqad.shinyapps.io/quantification_of_dependence/).
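How such a power estimate is obtained can be sketched with base R for the two classical measures. The settings below (sample size, noise levels, number of repetitions and the function names estimate_power, linear and parabola) are illustrative assumptions and do not reproduce the simulation design behind Figure 4; the power is estimated as the share of simulated samples in which the independence hypothesis is rejected at level 0.05.

```r
set.seed(123)

# share of simulated samples in which cor.test rejects independence
estimate_power <- function(gen, method, n = 50, reps = 200, alpha = 0.05) {
  rejections <- replicate(reps, {
    d <- gen(n)
    cor.test(d$x, d$y, method = method)$p.value < alpha
  })
  mean(rejections)
}

# data-generating settings (assumed for illustration)
linear <- function(n) {
  x <- runif(n, -1, 1)
  list(x = x, y = x + rnorm(n, 0, 0.3))
}
parabola <- function(n) {
  x <- runif(n, -1, 1)
  list(x = x, y = x^2 + rnorm(n, 0, 0.1))
}

power <- c(
  pearson_linear    = estimate_power(linear, "pearson"),
  pearson_parabola  = estimate_power(parabola, "pearson"),
  spearman_linear   = estimate_power(linear, "spearman"),
  spearman_parabola = estimate_power(parabola, "spearman")
)
print(round(power, 2))
```

In the linear setting both tests reject almost always, whereas in the symmetric parabola setting they reject only rarely although Y depends strongly on X. The same loop can be reused for any other measure that supplies a p-value.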
5 APPLYING qad ON ECOLOGICAL DATA
We tested the qad package on a dataset of microbiota and additional environmental metadata publicly available at http://oceanmicrobiome.embl.de/companion.html (Albanese et al., 2018; de Vargas et al., 2015; Sunagawa et al., 2015; Villar et al., 2015). More precisely, we used the aggregated version of the annotated 16S mitags OTU count table, available in the additional materials of Albanese et al. (2018), and conducted a similar analysis. We computed all pairwise q-values across the relative abundances of genera with less than 10% ties and the environmental variables (mean depth, mean salinity, mean temperature and mean oxygen level), resulting in 94 variables and $n=115$ samples. We then compared the qad results with the values of Pearson's and Spearman's correlation coefficients and illustrated the information gain provided by qad over the classical symmetric methods by the number of detected relationships and some specific examples. Since directly comparing dependence values of different measures is not reasonable, we considered the significant relationships detected by each of the measures. As usual, we used $\alpha =0.05$ and controlled the false discovery rate to correct for multiple testing.
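The effect of the false discovery rate correction can be illustrated with the base-R function p.adjust. The p-values below are made up for illustration and are not taken from the analysis; method = "fdr" is the Benjamini-Hochberg procedure.

```r
# hypothetical raw p-values of eight pairwise tests
p_raw <- c(0.001, 0.008, 0.012, 0.030, 0.041, 0.200, 0.470, 0.800)

# Benjamini-Hochberg adjustment controlling the false discovery rate
p_fdr <- p.adjust(p_raw, method = "fdr")

alpha <- 0.05
significant_raw <- p_raw <= alpha
significant_fdr <- p_fdr <= alpha

print(data.frame(p_raw, p_fdr = round(p_fdr, 4), significant_raw, significant_fdr))
```

Adjusted p-values are never smaller than the raw ones, so the number of pairs declared significant can only decrease after the correction.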
Overall, the measure qad returned 2907 significant relationships, whereas Spearman's $\rho $ (2564) and Pearson's $r$ (1729) found substantially fewer significant pairs. Furthermore, the classical measures $r$ and $\rho $ assigned relatively low dependence scores to many relationships that were highly ranked by the measure qad (see Figure 5a,f). This again results from the fact that the classical measures fail to detect many nonlinear and non-monotonic dependence structures. We depicted several pairs of variables attaining a high qad value but at the same time low Pearson and Spearman correlations to demonstrate the major differences in the information gain between symmetric measures and an asymmetric measure of dependence. For instance, qad detected a significant asymmetric dependence between the variable MethylophilaceaeOM43 clade (variable X) and a Sphingomonas strain (variable Y), whereas Pearson's correlation returned a nonsignificant dependence. The scatterplot depicted in Figure 5d reveals an inverted U-shaped pattern of the data points, that is, knowing the relative abundance of MethylophilaceaeOM43 clade is more informative for the prediction of the Sphingomonas strain than vice versa. Moreover, qad also picked up a highly asymmetric dependence structure between AlteromonadaceaeSAR92 clade and a Marinoscillum strain. The detected dependence structure can be revealed by a log-transformed scatterplot (Figure 5e). Note that qad is scale-invariant and hence invariant with respect to log-transformation of samples. We obtained similar results, for example, for the variables MethylophilaceaeOM43 clade and AlcaligenaceaeMWHUniP1, see Figure 5i, and the variables MethylophilaceaeOM43 clade and mean temperature in °C, depicted in Figure 5j. Additionally, Pearson's r is very sensitive to outliers (see, for instance, Figure 5c), which explains why several relationships are highly ranked by Pearson's correlation but ignored by qad or Spearman's correlation.
6 CONCLUSION
Our theoretical and real-world examples demonstrate that the measure qad is able to quantify and indicate the extent of dependence also in nonlinear settings, whereas classical measures only capture linear and monotonic associations. In most real-world situations no, or almost no, prior knowledge about the interdependence of variables is available. Aiming at an objective estimate of the strength of dependence, it is therefore unavoidable to work with measures not relying on distributional assumptions. Considering non-monotonic and non-functional relationships naturally expands our ability to detect more complex, and potentially asymmetric, relationships between organisms and their environment. We demonstrated that none of the methods discussed here outperforms all other methods in full generality; every statistical tool exhibits limitations in specific settings. If it is known in advance that the data originate from a linear or a monotonic setting, we recommend classical measures of association such as Pearson's $r$, Spearman's $\rho $ or dCor. These measures are well established and show greater power in these settings than other methods. In most situations, however, wrongly imposing linearity/monotonicity without prior knowledge may lead to wrong conclusions. We therefore recommend the use of qad for quantifying pairwise dependencies in the general case. We showed that qad is powerful in detecting dependence and provides reliable and easily interpretable results.
Another important property of bivariate associations is asymmetry and direction in dependence in the sense that the predictability of quantity Y given knowledge of quantity X is not the same as vice versa. Considering direction and asymmetry in dependence facilitates the detection and extraction of patterns from ecological datasets and the testing of refined hypotheses. For instance, correlation analysis testing for relationships between the abundances of pairs of taxa is usually performed as a basis for network inference, which, in turn, facilitates the interpretation of, for example, microbiome structure. Ecological relationships between organisms may be reciprocal in the sense that taxa mutually affect each other, either positively (mutualism) or negatively (competition). They may, however, also be directed in such a way that a given taxon facilitates or inhibits the growth of another taxon without being affected itself by the other taxon (e.g. commensalism, amensalism). As shown before, conventional correlation analysis neither detects directed relationships nor discriminates between directed and mutual relationships and is therefore of limited value for the interpretation of community dynamics. We are aware of only two methods that are able to quantify directed dependence, namely qad (Junker et al., 2021) and xicor (Chatterjee, 2021). We have shown that qad has a higher overall power in detecting deviation from independence; especially in very noisy datasets, qad performs better than xicor. The power deficiency of xicor is also discussed in Shi et al. (2022). Furthermore, the estimator implemented in qad never attains negative values, whereas xicor can attain negative values, which are hard to interpret. In very large datasets, however, xicor is more efficient with respect to runtime because it uses a p-value based on asymptotic theory, whereas qad runs a permutation test.
An additional feature of the R-package qad is that it provides user-friendly outputs and a number of additional features that facilitate the interpretation of the results, as well as functions to use qad as a prediction tool.
We conclude that the interpretation of ecological data may be strongly biased by the choice of statistical approaches quantifying dependence between two random variables. The acknowledgement and adequate handling of asymmetry, a universal property of bivariate associations, is an important step towards additional information gain and the avoidance of model bias for small, medium and large datasets, and will advance and allow for a deeper understanding of ecological systems.
AUTHOR CONTRIBUTIONS
Florian Griessenberger, Robert R. Junker and Wolfgang Trutschnig designed the study; Florian Griessenberger analysed the data; Florian Griessenberger, Robert R. Junker and Wolfgang Trutschnig wrote the manuscript.
ACKNOWLEDGEMENTS
This study was funded by the Austrian Science Fund (FWF, Y 1102 B29) granted to RRJ. Moreover, the first and the second authors gratefully acknowledge the support of the WISS 2025 project ‘IDAlab Salzburg’ (20204WISS/225/1972019 and 20102F1901166KZP). Open Access funding enabled and organized by Projekt DEAL. [Corrections added on 4 July 2023, after first online publication: Projekt DEAL funding statement has been added.]
CONFLICT OF INTEREST
The authors declare no conflict of interest.
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13951.
DATA AVAILABILITY STATEMENT
All data and supplementary code used in the study can be found at other sources (mentioned in the corresponding paragraphs). The qad package is available for the R programming language and can be downloaded at https://cran.r-project.org/web/packages/qad/index.html. This paper describes the latest CRAN version of qad (v.1.0.2). To install the package, run install.packages("qad"). The development version of qad is available on GitHub (https://github.com/griefl/qad) and can be installed by running devtools::install_github("griefl/qad", dependencies = TRUE, build_vignettes = TRUE). Code stored at github.com is also archived on Zenodo (Griessenberger et al., 2022, qad v1.0.2 (v1.0.2). Zenodo. https://doi.org/10.5281/zenodo.6816606). Code presented in Supplementary Information 1 and 2 can be found at Mendeley Data (Junker et al., 2022, 'code: qad: An R-package to detect asymmetric and directed dependence in bivariate samples' Mendeley Data V2 https://doi.org/10.17632/wx5ydxhsry.1). An R-shiny application demonstrating the empirical behaviour of various dependence measures is available at https://rqad.shinyapps.io/quantification_of_dependence/.