sensiPhy: An R package for sensitivity analysis in phylogenetic comparative methods
Abstract
- Biological conclusions drawn from phylogenetic comparative methods can be sensitive to uncertainty in species sampling, phylogeny and data. To be confident about our conclusions, we need to quantify their robustness to such uncertainty.
- We present sensiPhy, an R package, to easily and rapidly perform sensitivity analysis for phylogenetic comparative methods. sensiPhy allows researchers to evaluate the sampling effort, detect influential species and clades, assess phylogenetic uncertainty and quantify the effects of intraspecific variation, for phylogenetic regression and for metrics of phylogenetic signal, diversification and trait evolution.
- Uniquely, sensiPhy allows users to simultaneously quantify the effects of different types of uncertainty and potential interactions among them.
- Using real data, we show how conclusions from comparative methods can be affected by uncertainty and how sensiPhy can help determine if a conclusion is robust.
- By providing a single, intuitive and user‐friendly resource that can evaluate various sources of uncertainty, sensiPhy aims to encourage researchers, and particularly less‐experienced users, to incorporate sensitivity analyses in their phylogenetic comparative analyses.
1 INTRODUCTION
Over the last few decades, phylogenetic comparative methods have become a central approach in ecology and evolutionary biology, boosted by the expansion of comparative methods available in R (Garamszegi, 2014; Paradis, 2012). Like all statistical models, phylogenetic comparative methods are subject to several types of uncertainty, which can affect the conclusions we draw from these analyses (Donoghue & Ackerly, 1996; Felsenstein, 2008; Huelsenbeck, Rannala, & Masly, 2000). Yet, the sensitivity of (biological) conclusions to uncertainty is seldom considered (Cooper, Thomas, & FitzJohn, 2016). This can cause researchers to overestimate the reliability of their findings, for instance by estimating confidence intervals that are too narrow or by providing biased parameter estimates (Rangel et al., 2015; Silvestro, Kostikova, Litsios, Pearman, & Salamin, 2015).
Three main sources of uncertainty can affect comparative methods (Figure 1). (1) Species sampling uncertainty encompasses uncertainty in parameter estimates resulting from (arbitrary) variation in the species set included. (2) Phylogenetic uncertainty encompasses uncertainty in phylogenies used in comparative analyses. (3) Data uncertainty includes both within‐species variation in trait values as well as measurement error that might occur when determining trait values. Sensitivity analysis is a powerful approach to evaluate if conclusions are influenced by these uncertainties in comparative biology (Cooper et al., 2016; Cornwell & Nakagawa, 2017; Donoghue & Ackerly, 1996). Here, we present sensiPhy, an R package, to perform sensitivity analysis for the most frequently used phylogenetic comparative methods. Our main goal is to make it easier for less‐experienced users to implement the best practices when running comparative analyses. To our knowledge, this is the first effort to combine in a single resource functions to account for three types of uncertainty in commonly used comparative methods.

2 THE sensiPhy PACKAGE
sensiPhy is written in R (R Core Team, 2017) and is available on the CRAN repository. The package provides a suite of statistical and graphical methods to estimate and report sensitivity to uncertainty in phylogenetic comparative analysis (PGLS, phylogenetic signal, diversification and trait evolution). We leverage methods implemented in the R packages phylolm, phytools and geiger (Harmon, Weir, Brock, Glor, & Challenger, 2008; Ho & Ané, 2014; Revell, 2012) and implement functions to perform sensitivity analysis for phylogenetic generalized least squares models (PGLS; both using linear and logistic regression models), for estimates of phylogenetic signal in trait data (Blomberg, Garland, & Ives, 2003; Pagel, 1999), for macroevolutionary models (both continuous and discrete, binary, traits) and estimates of diversification rates (Harmon et al., 2008; Magallon & Sanderson, 2001). For each type of sensitivity analysis, a specific set of diagnostic graphics and summary statistics is provided (Figure 1). In all PGLS functions, the evolutionary model to use can be specified (e.g. Brownian Motion and Ornstein‐Uhlenbeck; Ho & Ané, 2014), allowing the user to analyse the fit of different models and select the most appropriate one (Cornwell & Nakagawa 2017; Garamszegi, 2014; Pennell, FitzJohn, Cornwell, & Harmon, 2015). Scientists can also use sensiPhy to assess results originally obtained with other software (e.g. PGLS with caper or gls), provided those analyses use the same macroevolutionary models implemented in phylolm, phytools and geiger (e.g. Brownian Motion, OU, lambda; see package vignette for examples and details).
3 SOURCES OF UNCERTAINTY
We briefly highlight the three main sources of uncertainty, indicating how they can affect conclusions, and then provide two examples of how researchers can use sensiPhy. A full tutorial, highlighting examples for all sources of uncertainty and implemented functions, can be found in the package vignette and on Github (https://github.com/paternogbc/sensiPhy/wiki).
3.1 Species sampling uncertainty
Some species, or clades of species, are particularly important drivers of parameter estimates. However, often the set of species sampled in a comparative analysis is determined by considerations that are arbitrary from an evolutionary perspective, like the presence in a trait database or easy access in the field. Also, conclusions can be sensitive to the number of species being studied, or the sampling effort. Moreover, particular species or clades can represent influential cases and can drive key results because they show a pattern that is different in strength or direction than the general pattern. Since in all of these cases, the source of uncertainty is driven by the set of species considered, we group all these issues under the name of species sampling uncertainty.
The samp‐functions (samp_phylm, samp_phyglm, samp_physig, samp_continuous and samp_discrete; Figure 1) use a jackknifing method to test if models are robust to variation in the set of species and sample size (Efron, 1982; Werner, Cornwell, Sprent, Kattge, & Kiers, 2014). These functions fit PGLS regressions, test for phylogenetic signal or calculate metrics for trait evolution after iteratively removing user‐defined fractions of species at random, and compare the resulting simulations with the model fitted to the full dataset.
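The jackknifing idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: ordinary least squares (`lm`) stands in for the phylogenetic regression, so phylogeny is ignored entirely, and the function name `samp_sketch`, its arguments and the toy data are hypothetical rather than part of sensiPhy.

```r
# Jackknife-style sampling sketch. lm() is a stand-in for PGLS (assumption);
# samp_sketch and its arguments are hypothetical names, not the sensiPhy API.
samp_sketch <- function(formula, data, breaks = c(0.1, 0.3, 0.5), n.sim = 30) {
  full <- unname(coef(lm(formula, data = data))[2])   # slope with all species
  n <- nrow(data)
  sims <- do.call(rbind, lapply(breaks, function(frac) {
    n.drop <- round(n * frac)                          # species removed per run
    slopes <- replicate(n.sim, {
      keep <- sample(n, n - n.drop)                    # random subset of species
      unname(coef(lm(formula, data = data[keep, , drop = FALSE]))[2])
    })
    data.frame(fraction = frac, n.removed = n.drop, slope = slopes,
               perc.change = abs((slopes - full) / full) * 100)
  }))
  list(full.slope = full, sims = sims)
}

# Hypothetical toy data: 60 "species" with a positive trait relationship
set.seed(1)
toy <- data.frame(x = rnorm(60))
toy$y <- 0.5 * toy$x + rnorm(60, sd = 0.3)

res <- samp_sketch(y ~ x, toy)
# Mean percentage change in the slope for each fraction of species removed
print(aggregate(perc.change ~ fraction, data = res$sims, FUN = mean))
```

A result would usually be considered robust to sampling if the slope changes little even when a large fraction of species is removed.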
The influ‐functions (Figure 1) perform leave‐one‐out deletion analysis to test if specific species are strongly driving the results. For each species in turn, these functions fit a new model without that species (reduced data) and compare the estimated parameters with those from the full dataset. This analysis can reveal influential cases (species driving relatively large changes in parameter estimates) and test model stability across samples. The clade‐functions (Figure 1) extend the same leave‐one‐out approach to detect influential clades (or more generally, groupings of species). The functions remove all species belonging to a clade and compare the reduced and the full datasets using a randomization test to correct for the number of species removed.
Three simple measures are used to estimate sensitivity in model parameters.
- the raw difference:

$$db_i = b_i - b_0 \qquad (1)$$

where $b_i$ is the estimated parameter for the reduced dataset and $b_0$ is the estimated parameter for the full dataset;
- the standardized difference:

$$Sdb_i = \frac{db_i}{SD_{db_i}} \qquad (2)$$

where $SD_{db_i}$ is the standard deviation of $db_i$, thus $Sdb_i$ is a simple z‐score of $db_i$; and
- the percentage of change:

$$\text{change}_i\,(\%) = \frac{|db_i|}{|b_0|} \times 100 \qquad (3)$$

where $|db_i|$ is the absolute raw difference (Equation 1).

While these functions provide useful estimates of how subsets of the dataset change key results, they do not account for potential structural biases in the available data (e.g. bias in missing data). For instance, a common problem in comparative analyses occurs when data are missing non‐randomly with respect to the phylogeny. To help detect this problem, we provide a supplementary function (miss.phylo.d), which detects phylogenetic signal in missing data (D‐statistics; Fritz & Purvis, 2010; Orme, Freckleton, Thomas, Petzoldt, & Fritz, 2013).
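As a rough illustration of the leave‐one‐out analysis and the three sensitivity measures listed above, the base R sketch below computes them for a toy regression. It makes several assumptions: `lm` replaces the phylogenetic model, the percentage of change is taken relative to the full‐dataset estimate, and the function name, column names and the cutoff of 2 for flagging species are hypothetical choices for this example.

```r
# Leave-one-out influence sketch. lm() is a stand-in for PGLS (assumption);
# influ_sketch and its column names are hypothetical, not the sensiPhy API.
influ_sketch <- function(formula, data) {
  b0 <- unname(coef(lm(formula, data = data))[2])        # full-dataset slope
  bi <- vapply(seq_len(nrow(data)), function(i) {
    unname(coef(lm(formula, data = data[-i, , drop = FALSE]))[2])
  }, numeric(1))                                         # slope without species i
  dbi  <- bi - b0                 # raw difference
  sdbi <- dbi / sd(dbi)           # standardized difference (z-score of dbi)
  perc <- abs(dbi / b0) * 100     # percentage of change (relative to b0, assumed)
  data.frame(removed = rownames(data), estimate = bi,
             dbi = dbi, sdbi = sdbi, perc = perc)
}

# Hypothetical toy data in which species "sp30" departs from the general pattern
set.seed(2)
toy <- data.frame(x = 1:30)
toy$y <- 0.5 * toy$x + rnorm(30, sd = 0.5)
toy$y[30] <- toy$y[30] + 10
rownames(toy) <- paste0("sp", 1:30)

out <- influ_sketch(y ~ x, toy)
# Species whose removal shifts the slope by more than 2 standard deviations
print(out[abs(out$sdbi) > 2, ])
```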
3.2 Phylogenetic uncertainty
Phylogenetic uncertainty refers to the notion that there are usually a number of alternative phylogenetic hypotheses with different topologies and/or branch lengths. Yet, comparative studies often analyse a single tree which is thought of as the “best” estimate out of a family of candidate phylogenies, without accounting for phylogenetic uncertainty, potentially biasing statistical inference (Donoghue & Ackerly, 1996; Hernandez et al., 2013; Rangel et al., 2015). A simple way to account for phylogenetic uncertainty in comparative methods is to repeat the analysis using a sample of relevant phylogenetic trees (Donoghue & Ackerly, 1996). The influence of phylogenetic uncertainty can be quantified by the amount of variation in model parameters between competing models fitted with alternative trees (Hernandez et al., 2013; Martinez et al., 2015). The tree‐functions (Figure 1) account for multiple phylogenetic hypotheses, by rerunning the models over a multiPhylo object containing different candidate phylogenies and comparing parameter estimates across these reruns.
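The logic of rerunning a model over alternative trees can be sketched without any phylogenetics package. Under Brownian motion, PGLS reduces to generalized least squares with the tree's species covariance matrix, so the sketch below represents each candidate "tree" by a simple covariance matrix. The helpers `gls_slope` and `toy_vcv`, the clade structure and the data are all hypothetical; in practice the covariance would come from real trees in a multiPhylo object.

```r
# GLS slope for a given species covariance matrix V. With V taken from a tree
# under Brownian motion this is the PGLS slope estimate.
gls_slope <- function(y, x, V) {
  X <- cbind(1, x)
  Vi <- solve(V)
  beta <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y)
  beta[2, 1]
}

# Hypothetical stand-in for a candidate tree: species in the same clade share
# a fraction `shared` of their history (total tree depth scaled to 1).
toy_vcv <- function(clade, shared) {
  V <- outer(clade, clade, "==") * shared
  diag(V) <- 1
  V
}

set.seed(42)
n <- 40
x <- rnorm(n)
y <- 0.4 * x + rnorm(n, sd = 0.5)

# Ten alternative "trees" differing in clade membership and branch lengths
trees <- lapply(1:10, function(k) {
  toy_vcv(sample(1:5, n, replace = TRUE), runif(1, 0.2, 0.8))
})

# Refit the same model on every tree and summarise the spread of the slope
est <- vapply(trees, function(V) gls_slope(y, x, V), numeric(1))
print(round(c(mean = mean(est), sd = sd(est), min = min(est), max = max(est)), 3))
```

The standard deviation and range of the estimates across trees give a simple measure of how much the result depends on the phylogenetic hypothesis.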
3.3 Data uncertainty
Intraspecific variation due to differences between individuals or to measurement errors is an important source of uncertainty and can influence both parameter estimation and hypothesis testing (Felsenstein, 2008; Garamszegi & Møller, 2010; Silvestro et al., 2015). One way to account for intraspecific variation is by simulating trait values for each species derived from the intraspecific standard deviation of the mean, which users can calculate from their own data if they have multiple measurements per tip (Martinez et al., 2015). Rather than assuming a single trait value per species, this approach tests the sensitivity of comparative models to variation in the underlying trait data, accounting for the confidence range around the estimate (Garamszegi, 2014). The intra‐functions (Figure 1) account for such uncertainties both in response and explanatory variables. While the statistical distribution of such intraspecific variation may not always be known, the functions implement two potential trait distributions (normal and uniform).
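A minimal base R sketch of this simulation approach follows. It assumes that a species mean and an intraspecific standard deviation are available for the explanatory variable, uses `lm` in place of the phylogenetic model, and draws uniform values from the interval mean ± SD; that interval, the function name `intra_sketch` and the toy data are assumptions made for illustration.

```r
# Intraspecific-variation sketch: resample trait values around species means
# and refit the model each time. lm() is a stand-in for PGLS (assumption).
intra_sketch <- function(mean_x, sd_x, y, n.intra = 100,
                         distrib = c("normal", "uniform")) {
  distrib <- match.arg(distrib)
  n <- length(mean_x)
  replicate(n.intra, {
    x_sim <- if (distrib == "normal") {
      rnorm(n, mean = mean_x, sd = sd_x)
    } else {
      runif(n, min = mean_x - sd_x, max = mean_x + sd_x)  # assumed interval
    }
    unname(coef(lm(y ~ x_sim))[2])                        # slope for this draw
  })
}

# Hypothetical toy data: 50 species with a mean and SD for the predictor
set.seed(7)
mean_x <- rnorm(50)
sd_x <- runif(50, 0.05, 0.2)
y <- 0.6 * mean_x + rnorm(50, sd = 0.3)

slopes <- intra_sketch(mean_x, sd_x, y, n.intra = 100, distrib = "normal")
# Spread of the slope across simulated datasets
print(round(quantile(slopes, c(0.025, 0.5, 0.975)), 3))
```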
3.4 Interactions among uncertainty types
Most users of phylogenetic comparative methods will face multiple sources of uncertainty simultaneously (Cooper et al., 2016; Cornwell & Nakagawa, 2017). Different types of uncertainty can interact, potentially further reducing the robustness of a result. Yet, the interaction between types of uncertainty is rarely studied (but see: Martinez et al., 2015), even in cases where sensitivity to single uncertainties is quantified (Werner et al., 2014), potentially because of a lack of available tools. We implemented functions to study interactions of both phylogenetic uncertainty (tree‐functions) and data uncertainty (intra‐functions) with sampling uncertainty (clade‐, influ‐ and samp‐functions), as well as interactions between data and phylogenetic uncertainty.
3.5 Example 1: Influential clades
We included two datasets in sensiPhy: “primates” (Jones et al., 2009) and “alien” (González‐Suárez, Bacher, & Jeschke, 2015). Each dataset contains a multiPhylo object with 101 phylogenetic trees drawn from a pseudo‐posterior distribution and pruned to match the species in the data (Fritz, Bininda‐Emonds, & Purvis, 2009; Kuhn, Mooers, & Thomas, 2011). As an example, we use the “primates” dataset to investigate how the deletion of entire clades (families) can influence model parameters for a PGLS linear regression between sexual maturity (days) and adult body mass (g).
The function clade_phylm reruns the phylogenetic regression between sexual maturity and body mass, iteratively leaving out individual families. This is defined by the argument “clade.col” which indicates the grouping variable defining which species to include. Typically, these will be taxonomically defined, but other groupings can be used, for instance based on geographical locations, sampling methods or data sources. The function sensi_plot can be used to visualize the results (Figure 2) while summary shows the effect of each clade on model parameters (Table 1; complete output in Supplementary Material).
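A call along the following lines would run this analysis. It is a sketch, not the authors' exact code: the log transformation of both variables, the column names `sexMaturity`, `adultMass` and `family`, the use of the first tree only, and the small `n.sim` value (chosen for speed) are assumptions, and the argument names should be checked against `?clade_phylm` in the installed version. The wrapper `run_example1` is a hypothetical helper that skips the analysis when sensiPhy is not installed.

```r
# Sketch of Example 1. Column and argument names are assumptions based on the
# package documentation as recalled; confirm with ?clade_phylm.
run_example1 <- function(n.sim = 10) {
  if (!requireNamespace("sensiPhy", quietly = TRUE)) {
    message("sensiPhy is not installed; skipping the example")
    return(invisible(NULL))
  }
  e <- new.env()
  utils::data("primates", package = "sensiPhy", envir = e)
  sensiPhy::clade_phylm(
    log(sexMaturity) ~ log(adultMass),   # log-log model (assumed)
    phy = e$primates$phy[[1]],           # a single tree from the multiPhylo object
    data = e$primates$data,
    clade.col = "family",                # grouping variable defining the clades
    n.sim = n.sim                        # randomizations per clade
  )
}

fit <- run_example1(n.sim = 10)
if (!is.null(fit)) {
  print(summary(fit))                            # effect of each clade on parameters
  sensiPhy::sensi_plot(fit, clade = "Cebidae")   # diagnostic plots for one clade
}
```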

| Clade removed | Estimate | DIFestimate | Change (%) | pval | pval.randomization |
|---|---|---|---|---|---|
| Cercopithecidae | 0.308 | 0.057 | 22.8 | 5.7E‐11 | .168 |
| Cebidae | 0.220 | −0.031 | 12.2 | 7.3E‐07 | .006 |
| Callitrichidae | 0.226 | −0.024 | 9.8 | 5.3E‐08 | .004 |
| Lemuridae | 0.258 | 0.008 | 3.1 | 1.3E‐09 | .430 |
The analysis reveals that without species from the Cercopithecidae the regression slope is 22.8% higher than in the full‐dataset model (Table 1; Figure 2a), indicating that this family has a major negative influence on the relationship between sexual maturity and mass. Removal of Cebidae species had a smaller effect in the opposite direction (Table 1; Figure 2b), while Lemuridae species had only a minor effect on model parameters (Table 1).
However, Cercopithecidae contains substantially more species (N = 32) than Cebidae (N = 19). We would therefore expect Cercopithecidae to have a larger effect on parameter estimates, by virtue of it containing a larger proportion of the species analysed. To correct for clade size, a randomization test analyses if the change in parameter estimate is significantly different from a null distribution when randomly removing the same number of species as the focal clade. The randomization test shows that in fact the Cercopithecidae are an influential clade only because they contain a large number of species, not because the biological pattern is substantially different (p = .168, Table 1, Figure 2a,b). This is different for the Cebidae (and the Callitrichidae), which strongly influence our parameter estimates even when correcting for clade size, indicating a substantially different pattern (p = .006, Table 1, Figure 2c,d). The exclusion of the Lemuridae continues to have no effect, both in absolute terms and when correcting for clade size (Table 1).
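The randomization test described above can be sketched in base R as follows. The sketch assumes `lm` in place of PGLS and a one‐sided p‐value in the direction of the observed change; the function `clade_rand_sketch`, its output names and the toy clades are hypothetical and may differ from how sensiPhy computes `pval.randomization`.

```r
# Randomization test for clade influence, corrected for clade size.
# lm() is a stand-in for PGLS (assumption); all names here are hypothetical.
clade_rand_sketch <- function(formula, data, clade.col, clade, n.sim = 200) {
  slope <- function(d) unname(coef(lm(formula, data = d))[2])
  b0 <- slope(data)                                   # full-dataset slope
  in.clade <- data[[clade.col]] == clade
  k <- sum(in.clade)                                  # size of the focal clade
  dif.obs <- slope(data[!in.clade, , drop = FALSE]) - b0
  # Null distribution: remove k species chosen at random, ignoring clades
  dif.null <- replicate(n.sim, {
    drop <- sample(nrow(data), k)
    slope(data[-drop, , drop = FALSE]) - b0
  })
  # One-sided p-value in the direction of the observed change (assumed)
  p <- if (dif.obs >= 0) mean(dif.null >= dif.obs) else mean(dif.null <= dif.obs)
  list(clade = clade, n.species = k, DIFestimate = dif.obs, p.randomization = p)
}

# Hypothetical toy data: clade "A" follows a steeper relationship than the rest
set.seed(3)
toy <- data.frame(clade = rep(c("A", "B", "C"), times = c(15, 45, 30)),
                  x = rnorm(90))
toy$y <- ifelse(toy$clade == "A", 2, 0.5) * toy$x + rnorm(90, sd = 0.2)

resA <- clade_rand_sketch(y ~ x, toy, "clade", "A")
resC <- clade_rand_sketch(y ~ x, toy, "clade", "C")
print(unlist(resA[c("n.species", "DIFestimate", "p.randomization")]))
print(unlist(resC[c("n.species", "DIFestimate", "p.randomization")]))
```

A small p‐value indicates that removing the focal clade changes the estimate more than removing the same number of randomly chosen species would.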
3.6 Example 2: Interaction among influential clades and phylogenetic uncertainty
In the first example, we considered only a single primate phylogeny. However, a range of alternative phylogenetic hypotheses is available for this group (Fritz et al., 2009; Kuhn et al., 2011). We can use the function tree_clade_phylm to evaluate potential interactions among these two uncertainty types.
This function reruns Example 1 across multiple trees to test if the effect of clade removal on model parameters interacts with phylogenetic uncertainty. The number of trees evaluated is set with the argument “n.trees”.
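A hedged sketch of such a call is given below. As in Example 1, the log‐transformed formula and the column names are assumptions, and `run_example2` is a hypothetical wrapper that returns nothing when sensiPhy is not installed. The tree‐count argument is written here as `n.tree`, which is how it is recalled from the package manual; check `?tree_clade_phylm` for the exact spelling in the installed version. Small values are used only to keep the run short.

```r
# Sketch of Example 2. Argument names are assumptions based on the package
# documentation as recalled; confirm with ?tree_clade_phylm.
run_example2 <- function(n.tree = 3, n.sim = 10) {
  if (!requireNamespace("sensiPhy", quietly = TRUE)) {
    message("sensiPhy is not installed; skipping the example")
    return(invisible(NULL))
  }
  e <- new.env()
  utils::data("primates", package = "sensiPhy", envir = e)
  sensiPhy::tree_clade_phylm(
    log(sexMaturity) ~ log(adultMass),   # same model as Example 1 (assumed)
    phy = e$primates$phy,                # full multiPhylo object
    data = e$primates$data,
    clade.col = "family",
    n.sim = n.sim,                       # randomizations per clade and tree
    n.tree = n.tree                      # number of trees to evaluate
  )
}

fit2 <- run_example2(n.tree = 3, n.sim = 10)
if (!is.null(fit2)) {
  print(summary(fit2))   # clade effects summarised across trees
}
```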
This analysis reveals that clade effects on estimates remained the same after taking into account multiple phylogenetic trees (Figure 3, Supplementary Table 1). For instance, the removal of the Cercopithecidae family continues to cause a strong increase in slope (Figure 3a). Furthermore, the effect of Cebidae (and Callitrichidae) on parameter estimates is significantly different from the null expectation across all alternative phylogenies tested (few blue dots below the red line in Figure 3b), while the effect of Cercopithecidae and Lemuridae falls within the null distribution (Figure 3c,d). Therefore, this analysis confirms the robustness of previous results, suggesting there is no interaction among sampling and phylogenetic uncertainty.

3.7 Implications and solutions of a sensitive result
Sensitivity analyses from sensiPhy can be a starting point for further analyses (Table 2). Considering our examples, a first step could be to verify if the Cebidae data are somehow biased, resulting in a substantially different pattern. For instance, perhaps a different method to estimate sexual maturity was used than in the other primates, which may have overestimated age of sexual maturity in this clade. Alternatively, there could be biological reasons why the Cebidae show a stronger correlation among traits, which could provide interesting biological insight. New biological hypotheses could in turn be tested using comparative analyses. For instance, if an interaction with climate might drive the differential effects of body mass on sexual maturity in the Cebidae and the Callitrichidae, an expanded comparative analysis could test that hypothesis.
| Biological question | sensiPhy method | Implications/potential solutions |
|---|---|---|
| Do influential species or clades drive result? | clade or influ | |
| Does sampling effort influence results? | samp | |
| Does intraspecific variation influence results? | intra | |
| Does phylogenetic uncertainty influence results? | tree | |
We highlight that a sensiPhy analysis cannot directly reveal the underlying reason why a biological effect is not robust to a given type of uncertainty. This can be for various methodological reasons or reflect an actual biological effect. While the implications of finding that a biological conclusion is sensitive to some, or multiple, forms of uncertainty will be highly context and model‐system specific, we provide general pointers and solutions that users can explore (Table 2).
4 CONCLUSIONS AND FUTURE DIRECTIONS
The sensiPhy package offers a quick and easy approach to check the robustness of frequently used comparative methods to multiple types of uncertainty. Performing sensitivity analysis can greatly benefit authors by providing ways to estimate and account for uncertainties and to detect and report possible bias in inference. The package makes it straightforward for researchers to scrutinise their results, increasing transparency in reporting results from comparative analyses. We hope sensiPhy will encourage the inclusion of sensitivity analysis as a common practice in comparative biology. The statistical reasoning implemented in sensiPhy can be applied more generally to many other types of analyses. The package is open‐platform and users are welcome to contribute new functionality through the Github platform, facilitating new developments for sensitivity analysis in phylogenetic comparative methods.
ACKNOWLEDGEMENTS
We are grateful to Carlos Fonseca, László Garamszegi, Marcelo Salas and Pablo Martinez for their suggestions and insightful comments on sensiPhy and on earlier versions of this manuscript. We also thank the editor and three reviewers for their many helpful comments. G.B.P. is supported by a CAPES Doctoral Scholarship (Brazil). G.D.A.W. is supported by a Newton International Fellowship (Royal Society).
AUTHORS’ CONTRIBUTIONS
G.B.P., C.P. and G.D.A.W. conceived the ideas, developed the statistical reasoning, wrote the code and the manuscript. All authors contributed equally to this work and gave final approval for publication.
DATA ACCESSIBILITY
All data and code used in this manuscript are available on Github (https://github.com/paternogbc/sensiPhy) and deposited at Zenodo (https://doi.org/10.5281/zenodo.1179248).