Volume 8, Issue 10
Application

Intertwining phylogenetic trees and networks

Klaus Schliep

University of Massachusetts Boston, Boston, MA, USA

Alastair J. Potts

Corresponding Author

E-mail address: potts.a@gmail.com

Nelson Mandela University, Port Elizabeth, South Africa

David A. Morrison

Uppsala University, Uppsala, Sweden

Guido W. Grimm

University of Vienna, Vienna, Austria

First published: 07 March 2017

Summary

  1. The fields of phylogenetic tree and network inference have dramatically advanced in the past decade, but independently with few attempts to bridge them.
  2. Here we provide a framework, implemented in the phangorn library in R, to transfer information between trees and networks.
  3. This includes: (i) identifying and labelling equivalent tree branches and network edges, (ii) transferring tree branch support to network edges, and (iii) mapping bipartition support from a sample of trees (e.g. from bootstrapping or Bayesian inference) onto network edges.
  4. The ability to readily combine tree and network information should lead to more comprehensive evolutionary comparisons and inferences.

Background and purpose

Traditional phylogenetic inference has almost exclusively relied on the assumption that evolution is successfully captured by a bifurcating tree (Mindell 2013). However, tree‐based methods usually perform poorly when this assumption is violated, and phylogenetic networks should be used instead (Bapteste et al. 2013). Despite advances in both fields (e.g. Yang, Grünewald & Wan 2013; Balvočiūtė, Spillner & Moulton 2014; Salichos, Stamatakis & Rokas 2014), the interface between trees and networks has rarely been bridged (Holland & Moulton 2003; Holland et al. 2008; Huber et al. 2016). The decision to use trees or networks is usually not based on any arguments about the superiority of one approach over the other (but see Morrison 2014), but rather the evolutionary complexity of the group under investigation and the resulting dataset. Nonetheless, tree‐based methods remain the most common analytical choice.

When the levels of data incongruence are great, however, researchers may resort to networks – often as a last option after all other tree‐based methods have failed – to provide some way of making sense of the patterns within a dataset. The only alternatives are to filter ‘rogue’ taxa that cause topological conflict and decreased branch support (Aberer, Krompass & Stamatakis 2013), or to analyse character subsets that each have an approximately tree‐like history (e.g. non‐recombining sequence blocks). Thus, the wide range of available network methods (Huson & Bryant 2006) has remained underutilised, likely because of the difficulties that arise when comparing trees and networks (such as matching tree branches to network edges).

The advances in tree and network inferences call for an integration of both methodologies. However, a framework enabling automated integration has been lacking. Here, we provide an R‐based framework – implemented in the phangorn library (Schliep 2011), which has a dependency on the extensive ape library (Paradis, Claude & Strimmer 2004) – to intertwine trees and networks.

Using this framework, we can:
  1. Visually compare trees and networks by identifying shared or exclusive branches or edges between trees and networks constructed from the same dataset (Fig. 1a). We hope this will help researchers bridge the conceptual gap between tree and network thinking (Morrison 2010, 2014).
    Figure 1. Mapping tree information onto a network using a mitochondrial gene (cytB) woodmouse (Apodemus sylvaticus) dataset (the standard test set from the APE library). (a) Identification of edge bundles (in black) in a neighbour‐net (NN) network based on uncorrected P‐distances that correspond to branches (labelled 1–12) in a maximum likelihood (ML) tree. Asterisks refer to zero‐length tree branches (soft polytomies), of which one (branch 7) has no corresponding edge bundle in the NN network. (b) Nonparametric ML bootstrap (ML‐BS) support for all branches (branch labels) defining the ML tree mapped onto the corresponding edge bundles of the NN network. (c) Frequencies of bipartitions found in the ML‐BS pseudoreplicates mapped on the corresponding edge bundles of the NN network using a threshold of 10% (i.e. any edge is labelled that occurs in at least 100 of the 1000 ML‐BS pseudoreplicates). Edge bundles not found in the ML tree are labelled using grey numbers.
  2. Map onto a phylogenetic network any support value that can be linked to a tree branch (e.g. nonparametric bootstrap support: Felsenstein 1985; Bayesian posterior probabilities: Rannala & Yang 1996; internode certainty: Salichos, Stamatakis & Rokas 2014) (Fig. 1b). This will help researchers investigate, for example, any ambiguous support (any value <1·0/100) of tree branches, and determine whether this is due to data incompatibility or to insufficient signals in the underlying data (e.g. Draper, Hedenäs & Grimm 2007).
  3. Map bipartition frequencies from a sample of trees (e.g. from nonparametric bootstrapping or Bayesian inference) onto network edges (Fig. 1c; Grimm et al. 2006). This will help provide much needed confidence in networks, and facilitate investigation of topological alternatives that are not captured by the tree inference itself.
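The first two steps can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not reproduce the phangorn implementation; the function names, the five taxa and the support values are hypothetical. The idea shown is that a tree branch and a network edge bundle are equivalent when they induce the same taxon bipartition, so both are reduced to a canonical key before support values are transferred.

```python
from typing import Dict, FrozenSet, Iterable, Optional

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Return the side of a bipartition that excludes a fixed reference taxon.

    A bipartition A|B is the same split whichever side is listed, so both
    descriptions must map to one key before branches and edges are compared.
    """
    all_taxa = frozenset(taxa)
    side_set = frozenset(side)
    reference = min(all_taxa)  # arbitrary but fixed choice of reference taxon
    return all_taxa - side_set if reference in side_set else side_set


def transfer_support(
    tree_support: Dict[Split, float],
    network_splits: Iterable[Iterable[str]],
    taxa: Iterable[str],
) -> Dict[Split, Optional[float]]:
    """Label each network split with the support of the equivalent tree branch.

    Network splits without an equivalent tree branch are labelled None.
    """
    taxa = list(taxa)
    lookup = {canonical_split(s, taxa): v for s, v in tree_support.items()}
    labels: Dict[Split, Optional[float]] = {}
    for split in network_splits:
        key = canonical_split(split, taxa)
        labels[key] = lookup.get(key)
    return labels


# Hypothetical example: two supported tree branches, three network edge bundles.
TAXA = ["A", "B", "C", "D", "E"]
TREE_SUPPORT = {frozenset({"A", "B"}): 95, frozenset({"D", "E"}): 60}
NETWORK_SPLITS = [{"C", "D", "E"}, {"D", "E"}, {"C", "D"}]  # first one equals A,B | C,D,E
LABELS = transfer_support(TREE_SUPPORT, NETWORK_SPLITS, TAXA)
```

In this toy case the edge bundle listed as {C, D, E} receives the support of the tree branch {A, B}, because both describe the same bipartition, whereas {C, D} has no counterpart in the tree and stays unlabelled.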

This open‐source R‐based tree–network framework also provides a meeting point for the output of network and tree inference software (e.g. SplitsTree: Huson & Bryant 2006; MrBayes: Ronquist et al. 2012; RAxML: Stamatakis 2014), and the results can either be visualised within R or exported to other visualisation software (e.g. SplitsTree; FigTree: Rambaut 2014). The phangorn library – including source code, examples and vignettes – can be installed within R via CRAN (https://cran.r-project.org/web/packages/phangorn/index.html). Ultimately, this framework will provide users with the ability to conduct a range of analyses where transferring information between networks and trees may provide more robust evolutionary interpretations.

Why networks?

As illustrated in Fig. 2, networks are commonly more comprehensive than trees, and they thus have a twofold use in phylogenetic analyses and the interpretation of trees: as a means for exploratory data analysis (EDA; Morrison 2010), and as a basis for hypothesis testing. The essential difference between EDA and hypothesis testing is that in the latter we explicitly set up alternative a priori hypotheses with the intention of claiming support for (only) one of them. This might be likelihood support or Bayesian inference probability. In contrast, in EDA the intention is to explore the data and to ‘see what we can see’. However, there are not necessarily any computational differences between these uses in practice. In either case, the modus operandi is to construct one or several trees, using different data partitions, and to establish branch support.

Figure 2. Trees, networks and data incompatibility. Shown is a four taxon set, with taxa differing by colour and shape. Top: The two possible unrooted trees explaining the data and their consensus network. Bottom: The same two trees and their consensus network rooted with an outgroup differing both in colour and shape.

We suggest that additional steps for phylogenetic analysis should include constructing consensus networks of the alternative trees and/or the tree sample(s) used to establish support, and/or using the data directly (via characters or a distance matrix) to infer a network (e.g. median network or neighbour‐net). Combining trees and networks usually provides a better means to understand the underlying phylogenetic signal. The importance of networks in this context is that inclusion or exclusion of edges in a network is a more powerful representation of the data. Networks include more edges than do trees, and thus display more of the data (EDA) and also potentially test more hypotheses. Moreover, trees exclude all incompatible edges, whereas networks can include them (Fig. 2), and thus the exclusion of an edge from a network is a more powerful statement about relationships than is exclusion from a tree. However, absence of an edge in a network does not necessarily exclude support for it, depending on the type of graph (circular, planar, etc.).
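The notion of incompatible edges can be made concrete with the standard pairwise test for bipartitions: two splits A|A' and B|B' fit on the same tree only if at least one of the four intersections A∩B, A∩B', A'∩B, A'∩B' is empty. The sketch below applies this test to a four‐taxon set like that of Fig. 2; the taxon names are hypothetical, and it is assumed that one tree groups the taxa by colour and the other by shape.

```python
from typing import FrozenSet, Iterable


def compatible(a: Iterable[str], b: Iterable[str], taxa: Iterable[str]) -> bool:
    """Return True if two bipartitions can be displayed on the same tree.

    Each split is given by one of its sides; the other side is the complement
    within `taxa`. The splits are compatible when at least one of the four
    pairwise intersections of their sides is empty.
    """
    all_taxa: FrozenSet[str] = frozenset(taxa)
    a_side, b_side = frozenset(a), frozenset(b)
    a_sides = (a_side, all_taxa - a_side)
    b_sides = (b_side, all_taxa - b_side)
    return any(not (x & y) for x in a_sides for y in b_sides)


# Hypothetical four-taxon set, with taxa differing by colour and shape.
TAXA = ["red_circle", "red_square", "blue_circle", "blue_square"]
COLOUR_SPLIT = {"red_circle", "red_square"}   # groups taxa by colour
SHAPE_SPLIT = {"red_circle", "blue_circle"}   # groups taxa by shape

# The two splits conflict: a tree can show only one of them,
# whereas a network can show both as a box.
CONFLICT = not compatible(COLOUR_SPLIT, SHAPE_SPLIT, TAXA)
```

A split is always compatible with itself and with any trivial split separating a single taxon, so only pairs such as the colour and shape splits force a choice in a tree.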

With this in mind, there are four possible data patterns in phylogenetic reconstruction: (1) patterns that are well supported by the data and appear in the optimised tree; (2) patterns that are well supported by (part of) the data but do not appear in the optimised tree, i.e. they are incompatible with the tree; (3) patterns that are weakly supported by the data but appear in the optimised tree anyway, i.e. they are compatible with the tree; and (4) patterns that are weakly supported by the data and do not appear in the optimised tree. Patterns 1 and 4 are accepted in phylogenetics and are captured in the optimised tree, but patterns 2 and 3 are often overlooked when using trees. Networks should effectively address pattern 2, which is their main conceptual advantage. Pattern 3, on the other hand, refers to bipartitions that are compatible with pattern 1, and thus will ‘come along for the ride’ (i.e. reconstructed due to association rather than support in the data) as the tree is being constructed stepwise. They can therefore end up with low‐to‐high branch support (depending on the circumstances), but because they have little character support they are unlikely to be represented as edges in networks based on characters or distances. In this case, explicitly relating the corresponding tree branches and network edges highlights which tree branches may be misleading, i.e. showing high branch support but with little character support. An example of these four patterns is provided in Fig. 3 (see also Morrison 2013a).
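The four patterns form a two‐by‐two classification (data support crossed with presence in the optimised tree), which the following sketch spells out. The numeric support scale, the 0.5 cut‐off and the example splits are illustrative assumptions and are not values taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SplitEvidence:
    label: str
    data_support: float  # hypothetical measure of character support, scaled 0..1
    in_tree: bool        # does the split appear as a branch in the optimised tree?


def data_pattern(evidence: SplitEvidence, threshold: float = 0.5) -> int:
    """Assign a split to one of the four data patterns described in the text.

    1: well supported, in the tree       2: well supported, not in the tree
    3: weakly supported, in the tree     4: weakly supported, not in the tree
    The threshold separating 'well' from 'weakly' supported is an assumption.
    """
    if evidence.data_support >= threshold:
        return 1 if evidence.in_tree else 2
    return 3 if evidence.in_tree else 4


# Hypothetical splits covering each case.
EXAMPLES: List[SplitEvidence] = [
    SplitEvidence("s1", 0.90, True),
    SplitEvidence("s2", 0.80, False),
    SplitEvidence("s3", 0.10, True),
    SplitEvidence("s4", 0.05, False),
]
PATTERNS: Dict[str, int] = {ev.label: data_pattern(ev) for ev in EXAMPLES}

# Patterns 2 and 3 are the ones a tree alone tends to hide.
OVERLOOKED = [label for label, p in PATTERNS.items() if p in (2, 3)]
```

In practice the two inputs would come from different analyses, for example a split weight from a distance‐based network and branch presence from an ML tree, which is why relating tree branches to network edges is a prerequisite for this kind of classification.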

Figure 3. An example of the relationship between splits weights, from a neighbour‐net splits network (generated using SplitsTree version 4.14.4), and tree‐based pseudoreplicate bootstrap percentages, from a maximum likelihood analysis (generated using RAxML version 8.2.4); both analyses used the same dataset. The data are from Wang, Braun & Kimball (2012), with 28 taxa and 25 700 aligned nucleotides. The figure shows the split weights or bipartition support for the 62 splits that were included in the network. These form a collection of circular splits, which do not necessarily include all of the splits supported by the data – those splits not in the network, but present in the bootstrap sample, are shown in red with a split weight of 0·00001 (rather than zero, to accommodate the log scale). Also, shown in blue are the bootstrap percentages (from 1000 bootstrap pseudoreplicates) for all of the splits in the network plus all of those branches with a bootstrap frequency greater than 1%. Splits present in the network that did not appear in any of the bootstrap pseudoreplicates are shown in green. Note that two splits highly supported by the network (bottom right) did not appear in any bootstrap replicate because they conflict with other better‐supported splits, and thus cannot appear with them in a tree (i.e. pattern 2); and two splits with high bootstrap support (>85%, top left) do not appear in the network because they conflict with several other splits, which are chosen instead for the circular collection displayed in the neighbour‐net network.

Using networks to understand ambiguous tree branch support – a case of bears lost in a forest of trees

N‐dimensional consensus networks can be an efficient means for discriminating patterns 2 and 3 (described above), and they allow for a better understanding of ambiguous support values along one, or several alternative, trees. In contrast to a one‐dimensional tree or a two‐dimensional planar network, a consensus network shows all splits (taxon bipartitions) in a collection of trees, which makes it a versatile tool to investigate bootstrap pseudoreplicate samples and competing branch support. This is necessary when we want to understand why a branch in a tree did not receive full support. Let us assume two incongruent nuclear and mitochondrial (or plastid) genealogies, with the nuclear data being underrepresented in the concatenated dataset. The bootstrap replicates will capture the conflicting signals to some degree and, in the ideal case, reflect them proportionally. For example, if 25% of the distinct alignment patterns come from the nuclear gene partitions, the taxon bipartitions reflecting the nuclear genealogy will ideally receive bootstrap support under ML (BSML) of ~25%, complementing a BSML of ~75% for the competing mitochondrial (or plastid)‐preferred splits.

A real‐world example is shown in Fig. 4, based on data from Kutschera et al. (2014). Different sets of sequence data available for bears – the paternally inherited complete Y‐chromosome and the maternally inherited mitochondrial genes – prefer largely conflicting but generally well‐supported topologies (mostly following data pattern 1) for the inter‐species relationships in the core clade of modern bears, the Ursinae (Sun Bear, Sloth Bear and species of genus Ursus comprising the North American and Asian Black Bear, Brown Bear and Polar Bear; Fig. 4a vs. b). Recently assembled nuclear‐encoded autosomal‐intron data (biparentally inherited) prefer a tree congruent with the Y‐chromosome tree (Kutschera et al. 2014), but most branches are poorly supported (pattern 3; Fig. 4c). The overall support is lower for the autosomal‐intron data, although the deeper branches are longer. The longer branches are due to the autosomal‐intron data having a combination of data pattern 2 in addition to data pattern 3 (Fig. 5; see below). The branch lengths of the trees further indicate that the mitochondrial data have a higher divergence than the Y‐chromosome and nuclear‐intron data by a factor of 10. For instance, the unambiguously inferred sister relationship between Brown and Polar Bears translates into ~2E‐4 and 1E‐3 expected substitutions per site supporting this branch in the case of the Y‐chromosome and autosomal data (BSML = 94/83), respectively, but more than 0·05 (factor of 50!) for the mitochondrial data. A combined tree, however, shows a topology mostly following the preferred topology of the Y‐chromosome/autosomal intron, but with branch lengths relating to the much more divergent mitochondrial data. The substantial conflict between the paternal/biparental and maternal genealogies is only expressed by the low support for critical branches in the Ursinae (BSML ≤60 based on the combined dataset), and the fact that the Sloth and Sun Bears are recognised as sisters (red clade; Fig. 4d) within the ‘Asian’ subclade of the Ursinae (pink clade; Fig. 4a,b,d). Notably, this split is not seen in any of the individual trees, and is hence most probably a branching artefact due to the combination of incongruent datasets.

Figure 4. Real‐world example for data pattern 2 (described in text). Trees are largely uninformative when it comes to conflicting signals and their amplitudes, illustrated here as conflicting paternal (Y‐chromosome; YCh), maternal (mitochondrial genes; mtG) and biparental (nuclear autosomal introns, ncAI) genealogies for bears (Ursidae); the original datasets are from Kutschera et al. (2014). (a–d) ‘Best‐known’ maximum likelihood (ML) trees with nonparametric bootstrap support (BSML) annotated along the branches. ‘Jumping’ taxa are represented by open circles; corresponding branches are coloured with the same colour (terminal branches are correspondingly coloured for better visibility of the differences between topologies). (a) Y chromosome tree, recognising the American Black Bear as sister to the Brown and Polar Bear with unambiguous support. (b) Mitochondrial gene tree, recognising a sister relationship between the American and Asian Black Bears with high support; also note the much higher (more than a factor 10) divergence of these data compared to (a: YCh tree) and (c: ncAI tree). (c) Tree inferred from ncAI data, showing a topology identical to the YCh tree (a) but with generally (much) lower support despite similar overall divergence (lower support is due to ambiguous signal in the individual introns). (d) Tree inferred from the concatenated (conc.) dataset, recovering a topology that mostly follows the one shared by the factor 10 lower divergent Y‐chromosome and ncAI data. The exception is the poorly supported sister relationship between the Sloth and Sun Bear (red clade), not seen in any of the individual genealogies (a–c). (e) Consensus network (here: network equivalent of a cladogram, i.e. all edges are set to the same length) of all three ‘best‐known’ ML trees, with competing ML bootstrap support (obtained from the bootstrap sample) annotated along alternative edge bundles. The consensus network shows all topological alternatives realised in the trees (a–c) and allows cross‐mapping of ML bootstrap support from all four analyses. The lower support (BSML [conc.] = 30) for the Sloth, Sun and Asian Black Bear clade (purple; BSML [YCh] = 81; BSML [ncAI] = 42) in the concatenated tree is due to the conflicting signal from the mitochondrial genes rejecting this topological alternative (BSML [mtG] = 0); also both the Y‐chromosome and ncAI data reject the placement of the Sloth Bear as sister to all other core group bears (BSML [YCh/ncAI] = 4/8 vs. BSML [mtG] = 66).
Figure 5. Neighbour‐net of the concatenated bear nuclear autosomal introns including bootstrap (BS) support from maximum likelihood (ML) analyses of individual introns and the concatenated dataset. The partitioned and unpartitioned ML analyses BS support values for all neighbour‐net edges relating to the ingroup (Ursinae) are provided, whereas symbols are used for each individual intron's support: ‘+’: well supported, BSML ≥ 85 (mostly 100), no competing second or third alternative found in the BS tree sample; ‘o’: one of two (rarer three) competing possibilities, BSML low (≥15) to moderate (≤70); ‘’: rejected, i.e. split not or very rarely found in the BS tree sample (BSML = 0–10) being incompatible with a well‐supported split; ‘?’: no discriminating signal*. Colouring of edges is the same as in Fig. 4; edges seen in the corresponding ML tree (Fig. 4c) are in bold and thickened. *The signal from intron IGSF22 is highly ambiguous, as only two splits are found with a frequency of more than 20% in the BS tree sample. Thus, IGSF22 neither supports nor rejects any topological alternative.

Figure 4e shows the strict consensus network of the three different ML trees (Fig. 4a‐c); it simultaneously includes all alternative, partly incompatible, taxon bipartitions favoured by the three datasets (the graph is two‐dimensional in this case because there are always only two competing alternatives, i.e. incompatible splits). Using the R function to identify corresponding branches in trees and edges in networks (Fig. 1a), we can match the consensus network edge bundles to the tree branches, and identify whether they come from the Y‐chromosome/autosomal intron or mitochondrial data tree. We can explore each hypothesis (and the one based on the combined data) by mapping the differential bootstrap supports onto the consensus network (see Fig. 1c). We see that several aspects of the combined data hypothesis are rejected by the mitochondrial data, whereas the mitochondriome‐preferred deeper topology, recognising the Sloth Bear as sister to the remainder of the core group, and the Polar and Brown Bear as sister to an Asian‐North American clade, is rejected by the other two datasets.

On the other hand, low bootstrap support may also just reflect pattern 3: a lack of discriminatory signal but compatibility with the overall tree topology. In that case, no alternative will receive high levels of bootstrap support.

To understand ambiguous tree branch support, discrimination of data pattern 2 (conflict) and pattern 3 (low signal amplitude) is crucial. Kutschera et al. (2014) explored the complex phylogenetic relationships among the six morphologically and ecologically distinct species of Ursine bears. Resolving these relationships in a tree‐based bifurcating framework was complicated by conflicting signals between the nuclear and mitochondrial genomes, the paternal and maternal genealogies (Fig. 4) as well as considerable heterogeneity among the nuclear loci. Kutschera et al. (2014) related this to incomplete lineage sorting and introgression, highlighting the idea that these complex patterns cannot be explored in the tree‐based framework, and suggesting that they can only be understood using multiple independently inherited loci in a coalescence framework. Here, we demonstrate how the integration of trees and networks also facilitates the study of such complex data.

Figure 5 shows how EDA would work by integrating a distance‐based (planar) network and bootstrap trees. The neighbour‐net based on species‐consensed sequences computed from the concatenated nuclear‐intron dataset by Kutschera et al. (2014) forms the basis for our assessment. Using our R function to map bipartition frequencies from a collection of trees (bootstrap pseudoreplicate trees in our case) onto the network (Fig. 1c), we can assess which alternative relationships would receive the highest support from the concatenated data and the individual introns without being restricted to a single one‐dimensional tree. With this combination, we can infer the degree to which the ML support estimates match the structure of the distance‐based neighbour‐net, and thus decide whether the pairwise genetic distances reflect potential phylogenetic relationships or not. In the case of the bear nuclear‐intron data, there is good agreement between the overall genetic distances and the preferred phylogenetic relationships, even though the support is generally low. Furthermore, we can pinpoint the contribution of each individual nuclear intron to the overall inference. We note that the more prominent box‐like structures in the neighbour‐net and correspondingly low overall support (e.g. BSML = 40 vs. 45 for Sloth Bear sister to Asian Black Bear vs. Sun Bear; dark green vs. red edges in Figs 4 and 5) relate to conflicting signals from the individual introns (most supporting the latter; data pattern 2), whereas the low support for the central edges (i.e. relatively deep branches in contained trees) is mostly due to lack of discriminating signal (data pattern 3).

When using bootstrap support consensus networks for hypothesis testing or EDA, one will usually resort to applying a frequency cut‐off threshold for most real‐world datasets. Depending on the number of taxa, the number of discriminative sites, and the level of incompatible signals, the n‐dimensional bootstrap consensus networks can quickly become chaotic. A single ‘rogue’ taxon, which fails to be consistently placed in the phylogeny, can have a different position in each bootstrap replicate, and all of these positions will be reflected in the consensus network. For hypothesis testing, we can opt for a higher threshold (e.g. only keeping splits found in >25% or >33% of the bootstrap replicates) because we want to filter only the relatively probable alternatives, or the second‐best alternative. For EDA, thresholds of 10–20% can be useful, depending on the complexity of the data. Traditional parsimony, frequently used for non‐molecular data, puts a strong emphasis on the topologies of the most‐parsimonious trees (MPT). Hence, applying no threshold for the consensus network would be the most honest way to summarise the collection of MPT and the alternative evolutionary scenarios provided by them (Mardulyn 2012).
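The effect of the frequency cut‐off can be sketched as follows. The code counts how often each non‐trivial bipartition occurs in a sample of trees and then keeps only those at or above a threshold. Trees are represented as nested Python tuples purely for illustration, the ten‐tree sample is hypothetical, and the helper names are not part of any existing library.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Split = FrozenSet[str]


def _collect_clades(node, clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set below `node`, recording the leaf set of each internal node."""
    if isinstance(node, str):
        return frozenset([node])
    below = frozenset().union(*(_collect_clades(child, clades) for child in node))
    clades.append(below)
    return below


def nontrivial_splits(tree) -> Set[Split]:
    """Return the non-trivial bipartitions of a tree, each as its reference-free side."""
    clades: List[FrozenSet[str]] = []
    taxa = _collect_clades(tree, clades)
    reference = min(taxa)  # fixed reference taxon makes each split's key unique
    splits = set()
    for clade in clades:
        side = taxa - clade if reference in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # drop the root and single-taxon splits
            splits.add(side)
    return splits


def split_frequencies(trees) -> Dict[Split, float]:
    """Return the proportion of trees in the sample that contain each split."""
    counts = Counter(s for tree in trees for s in nontrivial_splits(tree))
    return {split: n / len(trees) for split, n in counts.items()}


def apply_threshold(freqs: Dict[Split, float], threshold: float) -> Dict[Split, float]:
    """Keep only splits found in at least `threshold` of the sampled trees."""
    return {split: f for split, f in freqs.items() if f >= threshold}


# Hypothetical sample of ten bootstrap trees on five taxa.
T1 = (("A", "B"), "C", ("D", "E"))
T2 = (("A", "C"), "B", ("D", "E"))
T3 = (("A", "D"), "B", ("C", "E"))
TREES = [T1] * 6 + [T2] * 3 + [T3] * 1

FREQS = split_frequencies(TREES)
EDA_SPLITS = apply_threshold(FREQS, 0.10)   # exploratory: keep rare alternatives
TEST_SPLITS = apply_threshold(FREQS, 0.33)  # hypothesis testing: main alternatives only
```

With this sample the 10% threshold retains all five observed splits, including those contributed by the single deviating tree, whereas the 33% threshold retains only the two splits that dominate the sample.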

Using Bayesian posterior probabilities instead of bootstrap support for hypothesis testing and EDA in the case of non‐trivial datasets

In principle, the same workflow (hypothesis testing and EDA; elucidating the reason for ambiguous tree branch support) can be followed using support consensus networks based on a Bayesian tree sample. For example, in their critique of the Bayes Factor for hypothesis testing, Bergsten, Nilsson & Ronquist (2013) compared alternative topologies by their ‘Bayesian model odds’, i.e. the actual frequencies of alternative splits supporting different evolutionary hypotheses in the Bayesian tree collection (calculated after burn‐in). A more comprehensive approach would be to visualise all bipartition frequencies in the Bayesian tree sample using a support consensus network based on that sample. Using networks, instead of trees, in this context has the advantage that all alternatives can be simultaneously studied (see Figs 4e and 5).

When doing this, one has to keep in mind that Bayesian inference does not resample the data. For our theoretical example above, 25% nuclear patterns vs. 75% mitochondrial (or plastid) patterns supporting (partly) conflicting topologies, the topologies fitting the plastid (or mitochondrial) tree will always be more likely than those fitting the nuclear data. The MCMC algorithm, like any other tree inference algorithm, optimises for the topology that best explains the complete data, and quickly filters alternative topologies conflicting with the majority of the data. Accordingly, the nuclear topologies, rejected by 75% of the data patterns, will not (or rarely) be found in the Bayesian tree sample, particularly if we only include the topologies from the final Bayesian plateau. Thus, any PP of x < 1·0 may indicate conflict in the data, if a single competing alternative received a PP of 1−x. So, we would usually opt for lower thresholds when summarising a Bayesian tree sample into a consensus network than for summarising the bootstrap pseudoreplicate sample from the same data.
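The rule of thumb about a PP of x and a single competitor at 1−x can be written as a simple check. For a given split, the sketch below looks among the incompatible splits in the sample for one whose posterior probability matches the shortfall 1−x. The posterior probabilities, the taxa and the tolerance of 0·02 are hypothetical values chosen for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Optional

Split = FrozenSet[str]


def compatible(a: Split, b: Split, taxa: Iterable[str]) -> bool:
    """Two splits are compatible if one of the four side intersections is empty."""
    all_taxa = frozenset(taxa)
    return any(not (x & y) for x in (a, all_taxa - a) for y in (b, all_taxa - b))


def single_competitor(
    split: Split, pp: Dict[Split, float], taxa: Iterable[str], tol: float = 0.02
) -> Optional[Split]:
    """Return the incompatible split whose PP accounts for the shortfall 1 - PP(split).

    Returns None when no single incompatible split has a PP within `tol` of
    the shortfall, i.e. when the missing support is spread thinly.
    """
    taxa = list(taxa)
    shortfall = 1.0 - pp[split]
    rivals = [s for s in pp if s != split and not compatible(s, split, taxa)]
    if not rivals:
        return None
    best = min(rivals, key=lambda s: abs(pp[s] - shortfall))
    return best if abs(pp[best] - shortfall) <= tol else None


# Hypothetical split frequencies from a post-burn-in Bayesian tree sample.
TAXA = ["A", "B", "C", "D", "E"]
AB, AC = frozenset({"A", "B"}), frozenset({"A", "C"})
DE, CD, CE = frozenset({"D", "E"}), frozenset({"C", "D"}), frozenset({"C", "E"})
PP = {AB: 0.70, AC: 0.30, DE: 0.85, CD: 0.05, CE: 0.05}

AB_RIVAL = single_competitor(AB, PP, TAXA)  # one rival takes the whole shortfall
DE_RIVAL = single_competitor(DE, PP, TAXA)  # shortfall spread over several rivals
```

In this toy sample the split {A, B} (PP = 0·70) is opposed by a single alternative at 0·30, the signature of conflict, whereas the shortfall for {D, E} is divided among several weak alternatives and no single competitor is returned.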

To avoid overlooking data conflict in multi‐gene datasets, we recommend always doing a full ML tree inference and bootstrapping analysis based on both the single‐gene and concatenated datasets, and then using the functions/approach described here to map possibly competing support patterns. When summarising the results of an additional Bayesian analysis, it should be obligatory to use the consensus network (with low threshold) instead of the commonly used majority rule consensus tree (Morrison 2013a). We suspect that many studies noting a conflict between ML and Bayesian analysis simply overlooked ambiguous or even conflicting signals in the data (data patterns 2 and 3), which had different effects on decreasing bootstrap support vs. posterior probabilities and eventually led to misleading branches in the optimised ML tree (see, e.g. Grímsson, Grimm & Zetter 2017, file S6; a critical revisitation of Loranthaceae data included in a recent all‐Santalales phylogeny showing partly strongly different BS and PP for critical branches).

Application outlook

Baum, Smith & Donovan (2005) highlighted that many biologists, both students and established researchers, struggle with ‘tree‐thinking’ (the interpretation of phylogenetic trees). This theme of problematic tree perception was further expanded in Baum & Smith (2012). However, a general understanding of phylogenetic networks, especially given the diversity of network types, remains a potentially even greater obstacle than tree interpretation (Morrison 2010, 2013b); even many phylogenetic systematists are unable to interpret phylogenetic networks correctly. We hope that the ability to easily shift information among trees and networks will act as a crucial step to bridge the conceptual gaps between these two phylogenetic models. But more importantly, this framework will aid researchers in developing comprehensive interpretations of the signals in a dataset. Combining trees and networks, often by plotting tree information onto a network, provides a visually accessible means of comparing a multitude of data sources (e.g. Figs 3 and 4), which is a powerful conceptual and EDA tool. Thus, we envisage that this framework for intertwining trees and networks will have a multitude of uses, such as investigating specific phylogenetic signals, identifying competing evolutionary scenarios and pinpointing methodological shortcomings.

For example, tree branches that are not present in the edges of a network can easily be identified, or vice versa; this often highlights significant and identifiable discrepancies between trees, which may arise from specific phylogenetic processes (e.g. rapid ancient radiations) or method‐inherent biases (branching artefacts, model‐induced differences). Networks can also be used to improve phylogenetic tree inference (Morrison 2010). For example, ‘lost branches’ can be identified, i.e. alternative phylogenetic splits that receive relatively high data support but are not represented in the inferred tree. Comparison of support networks based on different tree samples with optimised trees can help to understand, and possibly even dissolve, method‐related discrepancies (e.g. conflict between the ML optimised tree and the Bayesian majority rule consensus tree).

In addition, researchers may be interested in the differential support of topological alternatives that are masked by polytomies in the commonly used majority rule trees, strict consensus trees or single ‘representative’ trees with mapped support values (e.g. Mardulyn 2012). For example, one could use a consensus network to visualise competing, equally good topologies based on a dataset (Maximum Likelihood ‘Islands’, Morrison 2007), and where the ML tree lies on an ‘ML terrace’ (Sanderson et al. 2015), and then map differential support from a bootstrap analysis (or series of analyses) onto this network. The new R/Phangorn functions further allow extraction of the differential support for each competing split in a tabulated form. This would be useful for large datasets with large numbers of taxa (e.g. Izquierdo‐Carrasco, Smith & Stamatakis 2011), which will be hard to investigate visually (or to draw). However, at the moment this is not feasible with the available linear implementations for computing consensus networks (SplitsTree, R/Phangorn) and mapping bootstrap support values to networks (R/Phangorn) due to hardware constraints.
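As an illustration of what such a table might contain, the sketch below arranges the competing bootstrap supports quoted in the caption of Fig. 4 for two alternative placements in the bear example. The layout, the function name and the use of NA for a value not given in the caption are assumptions; the sketch does not reproduce the output format of the R functions.

```python
from typing import Dict, List

# Analyses from Fig. 4: Y-chromosome, mitochondrial genes,
# nuclear autosomal introns, concatenated data.
ANALYSES: List[str] = ["YCh", "mtG", "ncAI", "conc."]

# ML bootstrap support (BSML) values as quoted in the Fig. 4 caption.
SUPPORT: Dict[str, Dict[str, int]] = {
    "Sloth + Sun + Asian Black Bear": {"YCh": 81, "mtG": 0, "ncAI": 42, "conc.": 30},
    # The caption gives no concatenated value for this split, so it is left out.
    "Sloth Bear sister to all other core-group bears": {"YCh": 4, "mtG": 66, "ncAI": 8},
}


def tabulate_support(support: Dict[str, Dict[str, int]], analyses: List[str]) -> List[str]:
    """Return tab-separated rows: one header row, then one row per competing split."""
    rows = ["\t".join(["split"] + analyses)]
    for split, values in support.items():
        cells = [str(values[a]) if a in values else "NA" for a in analyses]
        rows.append("\t".join([split] + cells))
    return rows


TABLE = tabulate_support(SUPPORT, ANALYSES)
for row in TABLE:
    print(row)
```

Read row by row, such a table shows at a glance which datasets favour and which reject each alternative, without requiring the network to be drawn.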

This framework will help phylogenetic practitioners to readily transfer information both ways between tree‐based and network‐based analyses, and thereby visualise and investigate similarities and differences between them. We believe that phylogenetic networks displaying those edges supported by tree‐based algorithms (e.g. maximum likelihood or Bayesian inference) offer the most comprehensive representation of the evolutionary signals in a phylogenetic dataset irrespective of its complexity (Fig. 1c; e.g. Potts, Hedderson & Grimm 2014).

Authors’ contributions

G.W.G., A.J.P. and D.A.M. conceived the ideas and designed methodology; K.S. developed the R algorithms; G.W.G. and D.A.M. conceived and outlined the case study; G.W.G. and A.J.P. led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

Acknowledgements

This work was supported in part by a grant from the National Science Foundation (DEB 1350474 to K.S.). G.W.G. acknowledges funding by the Austrian Science Fund (FWF), project number M1751‐B16. A.J.P. acknowledges funding from the National Research Foundation (RCA13091944022). D.A.M. acknowledges support from Försäkringskassan and Trygghetsstiftelsen. Thanks to Axel Janke for providing the bear genomic data. Alexandros Stamatakis and an anonymous reviewer contributed ideas and comments that improved this manuscript.

Data accessibility

The Phangorn v. 2.0.3 library, including source code, examples and vignettes, can be installed within R via CRAN (https://cran.r-project.org/web/packages/phangorn/index.html) and is also available for download from GitHub (https://github.com/KlausVigo/phangorn). Files related to the bear example (original datasets from Kutschera et al. 2014) can be found here: www.palaeogrimm.org/data/Schlp16_AddOn.zip.
