Toytree: A minimalist tree visualization and manipulation library for Python
Abstract
- Toytree is a lightweight Python library for programmatically visualizing and manipulating tree‐based data structures. It implements a minimalist design aesthetic and modern plotting architecture suited for interactive coding in IPython/Jupyter.
- Tree drawings are generated in HTML using the toyplot library backend, and display natively in Jupyter notebooks with interactivity features. Tree drawings can be combined with other plotting functions from the toyplot library (e.g. scatterplots, histograms) to create composite figures on a shared coordinate grid, and can be exported to additional formats including PNG, PDF and SVG.
- To parse and store tree data, toytree uses a modified fork of the ete3 TreeNode object, which includes functions for manipulating, annotating and comparing trees. Toytree integrates these functions with a plotting layout to allow node values to be extracted from trees in the correct order to style nodes for plotting. In addition, toytree provides functions for parsing additional tree formats, generating random trees, inferring consensus trees and drawing grids or clouds from multiple trees to visualize discordance.
- The goal of toytree is to provide a simple Python equivalent to commonly used tree manipulation and plotting libraries in R, and in doing so, to promote further development of phylogenetic and other tree‐based methods in Python. Toytree is released under the GPLv3 license. Source code is available on GitHub and documentation is available at https://toytree.readthedocs.io.
1 INTRODUCTION
Tree‐based data structures (e.g. directed acyclic graphs) are commonly used in evolutionary biology, genetics and other fields to represent hierarchical relationships (Baum, Smith, & Donovan, 2005). A common example is a phylogeny, the representation of relationships among species and their common ancestors. Software for displaying and manipulating trees has been developed over several decades and includes both stand‐alone tools with graphical user interfaces (e.g. Figtree; Rambaut, 2009) and programmatic plotting libraries (e.g. ape; Paradis & Schliep, 2019). The latter are particularly useful for combining data visualization and analysis into reproducible scripts.
The Python programming language has become one of the most widely used tools for scientific computing. Modern applications in Python make extensive use of IPython (Perez & Granger, 2007) for interactive coding; Jupyter (Kluyver et al., 2016) for working in reproducible web‐based documents; Conda (https://conda.io/) for simplifying package management; and numpy (Oliphant, 2015) and scikit‐learn (Pedregosa et al., 2011) for mathematical and statistical operations. As Python programming has shifted towards interactive web‐based documents, visualization tools have also kept pace, with many new libraries (e.g. bokeh, toyplot, altair) supporting vector graphics that display natively in a web browser, often with interactive features (e.g. hover or tool tip functions) enabled by JavaScript. The ability to generate these complex web‐based visualizations with simple Python code is powerful.
Despite these advances, Python currently lags significantly behind the R programming language in support for tree‐based analysis and plotting tools. The R package ape, first released in 2004 (Paradis, Claude, & Strimmer, 2004), kick‐started the development of an ecosystem of tree‐based statistical libraries in R, including many tools for comparative evolutionary analyses on trees (e.g. Harmon, Weir, Brock, Glor, & Challenger, 2008; Revell, 2012). At the heart of this development ecosystem was a robust and simple tree visualization framework.
Python lacks an equivalent tree plotting library, although several options are available. For example, a tree layout can be produced using generic network plotting tools (e.g. networkx) combined with almost any plotting library, but this approach is far from simple since the code syntax and default styling are not specific to tree plotting. Of the several Python libraries developed for working with trees, including dendropy (Sukumaran & Holder, 2010), ete3 (Huerta‐Cepas, Serra, & Bork, 2016), ivy (http://www.reelab.net/ivy/) and Bio.Phylo (Talevich, Invergo, Cock, & Chapman, 2012), none has yet achieved comparable popularity to R's ape for producing publication quality figures, although they are widely used for tree manipulations, comparisons and other analyses. The relative paucity of Python‐based tree plots may reflect differences in their default styling, or the complexity of their code syntax. Another important feature supported in ape is the ability to combine trees with other data (e.g. barplots, scatterplots) on shared coordinate axes. There remains a significant demand for a Python tree plotting library that is simple, lightweight and capable of integrating easily with other data plotting tools.
Here I describe the Python tree plotting library toytree, which can integrate with the plotting library toyplot (Shead, 2014) to generate rich composite figures that combine trees with data. Toytree is available for Python 2.7, 3.5 and later versions. It has few dependencies and can be installed with a single command using Conda or pip. Here I demonstrate a few advantages of the toytree design and ethos. Many more examples can be found in the documentation (https://toytree.readthedocs.io) which also includes a cookbook section. The toytree documentation is automatically generated when the library is updated and is tested at each update through continuous integration. Source code is hosted at https://github.com/eaton-lab/toytree under the GPLv3 license.
2 RESULTS
2.1 ToyTree objects
Toytree can read and write trees in the newick or extended New Hampshire format (based on ete3 newick functions), and additionally supports parsing tree blocks of Nexus formatted files, and those with complex annotations of node and edge labels or names (e.g. mrbayes, bpp and astral tree files). A ToyTree object can be generated by loading data from a string, file path or URL using the .tree() function. The library is explicitly object oriented with few main object classes, and most attributes and functions are accessible from those class objects. For example, the ToyTree object can be used to access attributes of a tree, including features assigned to nodes (e.g. names, support values, edge lengths), and functions for drawing and manipulating the tree. For this, the interactive nature of IPython is useful, as you can use tab completion to interactively view all attributes or functions associated with ToyTree objects.
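To make the newick format concrete, the following is a minimal standard‐library sketch of how a newick string maps onto nested nodes with names and edge lengths. It is not toytree's parser (which is based on ete3 functions and handles far more cases); the Node class, parse_newick() and tip_names() are hypothetical names used only for illustration, and labels on internal nodes (e.g. support values) are simply stored as names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal node: a label, an edge length and child nodes."""
    name: str = ""
    dist: float = 0.0
    children: List["Node"] = field(default_factory=list)


def parse_newick(newick: str) -> Node:
    """Parse a simple newick string (labels and edge lengths only)."""
    text = newick.strip().rstrip(";")
    pos = 0

    def read_label() -> str:
        # Read characters up to the next structural delimiter.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(read_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(read_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        node.name = read_label()
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            node.dist = float(read_label())
        return node

    root = read_node()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return root


def tip_names(node: Node) -> List[str]:
    """Return tip labels in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in tip_names(child)]


tree = parse_newick("((a:1,b:1)90:2,(c:1.5,d:1.5)75:1.5);")
print(tip_names(tree))
```

A full parser would also need to handle quoted labels, comments and the extended annotations mentioned above, which this sketch deliberately omits.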
2.2 Toytree drawings
Tree drawings display natively in Jupyter notebooks and can be exported to a number of formats including SVG, PNG and PDF. Toytree aims to make tree plotting simple through its default styling while still allowing extensive customization. The function .draw() can be called from ToyTree objects to generate a plot that will automatically render in Jupyter notebooks (Figure 1). If the plot objects are stored as variables, they can be further modified or saved.

2.3 TreeNodes and ToyTrees
Toytree uses a forked and modified version of the TreeNode class object from the ete3 library (Huerta‐Cepas et al., 2016) to represent nodes in a tree. A tree is represented by a collection of TreeNodes with pointers designating ancestor and descendant relationships. Each TreeNode has a default set of features (name, dist, support, height, idx) that can be modified or added to by users. In toytree, TreeNodes are nested within a ToyTree object, and thus most users will not interact with TreeNodes directly. TreeNodes represent individual nodes, and ToyTrees represent the collection of TreeNodes, with attributes and functions at that level. ToyTrees store plotting coordinates that are updated whenever trees are modified; provide user‐friendly functions for modifying trees; and provide functions for accessing attributes of trees, such as node values or tip labels that can be used to extract values in node plot order to easily modify node styles (e.g. size, colour) based on attribute values. Tree modification functions of ToyTrees are largely based on ete3 functions but sync with plotting coordinates, return copies instead of operating in‐place to allow for chaining multiple functions together, and allow selecting clades or tips based on fuzzy label matching, simple expressions or common ancestor relationships.
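The two design ideas described above, nodes linked by ancestor and descendant pointers and modification functions that return copies so that calls can be chained, can be sketched in a few lines of standard‐library Python. The Node and Tree classes and their methods below are hypothetical stand‐ins and do not reproduce the TreeNode or ToyTree API.

```python
import copy
from typing import List, Optional


class Node:
    """Hypothetical node with pointers to its ancestor and descendants."""

    def __init__(self, name: str = "", dist: float = 1.0):
        self.name = name
        self.dist = dist
        self.up: Optional["Node"] = None
        self.children: List["Node"] = []

    def add_child(self, child: "Node") -> "Node":
        child.up = self
        self.children.append(child)
        return child


class Tree:
    """Hypothetical wrapper around a root node; methods return modified copies."""

    def __init__(self, root: Node):
        self.root = root

    def tip_names(self) -> List[str]:
        # Depth-first traversal that visits children left to right.
        names, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(reversed(node.children))
            else:
                names.append(node.name)
        return names

    def scale_edges(self, factor: float) -> "Tree":
        new = copy.deepcopy(self)  # leave the original tree untouched
        stack = [new.root]
        while stack:
            node = stack.pop()
            node.dist *= factor
            stack.extend(node.children)
        return new

    def rotate_root(self) -> "Tree":
        new = copy.deepcopy(self)
        new.root.children.reverse()
        return new


root = Node("root", 0.0)
ab = root.add_child(Node("ab"))
ab.add_child(Node("a"))
ab.add_child(Node("b"))
root.add_child(Node("c"))

tree = Tree(root)
modified = tree.scale_edges(2.0).rotate_root()  # chained calls on copies
print(tree.tip_names(), modified.tip_names())
```

Because each method returns a new object, the original tree remains available for comparison after a chain of modifications, at the cost of copying the tree at each step.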
2.4 Reproducible and robust workflows
A challenge when working with trees and tree drawings is to ensure that the correct values (e.g. names or support values) are plotted on the correct nodes or edges of the tree, and that these remain associated throughout manipulations of the tree, such as collapsing or rotating nodes, modifying names or re‐rooting. Toytree aims to reduce errors that come from improperly aligning node or tip data with a tree structure once it is modified by providing functions for extracting data directly from ToyTree objects. This approach ensures the data are returned in the order in which they will be plotted (Figure 2). Mistakes can be further avoided by using the style argument node_hover = True, which activates an interactive hover feature to show all attributes of nodes when hovered over by a cursor. This allows one to easily check that node names, supports and edge lengths are coordinated, and to explore multiple node features at once.
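The alignment problem can be illustrated with a small standard‐library sketch. Here a tree is encoded as nested dictionaries (a hypothetical encoding, not toytree's), and get_values() is a hypothetical helper that collects one feature from every node in a fixed traversal order. Values extracted from the tree stay paired with the correct nodes after a rotation, whereas a list of values saved before the rotation does not.

```python
from typing import Any, Dict, List, Optional


def make_node(name: str, support: float = 0.0,
              children: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Hypothetical node encoding: name, support value and child nodes."""
    return {"name": name, "support": support, "children": children or []}


def rotate(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the node with the order of its children reversed."""
    return {**node, "children": list(reversed(node["children"]))}


def get_values(node: Dict[str, Any], feature: str) -> List[Any]:
    """Collect one feature from every node in preorder traversal."""
    values = [node[feature]]
    for child in node["children"]:
        values.extend(get_values(child, feature))
    return values


tree = make_node("root", 100.0, [
    make_node("AB", 95.0, [make_node("A"), make_node("B")]),
    make_node("C"),
])

stale_supports = get_values(tree, "support")  # saved before modifying the tree
rotated = rotate(tree)

names = get_values(rotated, "name")
fresh = dict(zip(names, get_values(rotated, "support")))
stale = dict(zip(names, stale_supports))
print(fresh["AB"], stale["AB"])
```

In this example the freshly extracted values assign the support of 95 to clade AB, while the stale list assigns it to tip C.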

2.5 Utility functions
Toytree includes functions to generate random trees in the .rtree submodule, and to modify trees in the .mod submodule. The .rtree functions can generate trees that are balanced or imbalanced, with equal or random branch lengths, and as coalescent trees. The .mod functions can be used to scale node heights by a constant, to randomly slide node heights or to make trees ultrametric. These simple functions can be generally useful for generating trees for comparing the effects of tree shape and edge lengths when testing evolutionary hypotheses.
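As a rough illustration of what such utilities involve, the standard‐library sketch below generates a random bifurcating tree by repeatedly joining two randomly chosen lineages, and then makes it ultrametric by extending tip edges until all tips are equally distant from the root. The function names, the uniform edge‐length distribution and the tip‐extension strategy are assumptions made for this example and are not necessarily how the .rtree and .mod functions work.

```python
import random
from typing import Dict, List, Optional


class Node:
    """Hypothetical minimal node with a name, an edge length and children."""

    def __init__(self, name: str = "", dist: float = 0.0):
        self.name = name
        self.dist = dist
        self.children: List["Node"] = []


def random_tree(ntips: int, seed: Optional[int] = None) -> Node:
    """Join two randomly chosen lineages until a single root remains."""
    rng = random.Random(seed)
    lineages = [Node(f"t{i}", rng.uniform(0.5, 1.5)) for i in range(ntips)]
    while len(lineages) > 1:
        left = lineages.pop(rng.randrange(len(lineages)))
        right = lineages.pop(rng.randrange(len(lineages)))
        parent = Node(dist=rng.uniform(0.5, 1.5))
        parent.children = [left, right]
        lineages.append(parent)
    root = lineages[0]
    root.dist = 0.0  # the root has no edge above it
    return root


def tip_depths(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Map each tip name to its summed edge length from the root."""
    if not node.children:
        return {node.name: depth}
    depths: Dict[str, float] = {}
    for child in node.children:
        depths.update(tip_depths(child, depth + child.dist))
    return depths


def make_ultrametric(root: Node) -> None:
    """Extend tip edges in place so all tips are equally far from the root."""
    depths = tip_depths(root)
    target = max(depths.values())
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            node.dist += target - depths[node.name]


tree = random_tree(8, seed=42)
before = tip_depths(tree)
make_ultrametric(tree)
after = tip_depths(tree)
print(max(before.values()) - min(before.values()))
print(max(after.values()) - min(after.values()))
```

Extending tip edges preserves all internal edge lengths; other approaches (e.g. rescaling internal edges) would alter the tree differently.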
2.6 MultiTree objects
Multiple trees can be parsed and stored together in the MultiTree class object to coordinate comparing and plotting sets of trees. Two MultiTree plotting functions are currently available: .draw_tree_grid() and .draw_cloud_tree(). The first arranges multiple trees on a coordinate grid, while the latter plots overlapping trees on the same coordinate axes (Figure 3). Cloud trees are primarily used to visualize discordance among topologies, which is most easily observed when overlapping trees are plotted with a fixed order of the tips. In tree grid drawings, users can also optionally fix the order of tips to better visualize discordance (Figure 3a,b). For both types of plots, the style of individual trees can be modified as well, as in Figure 3c, where different edge colours are applied to trees that match or do not match the majority rule consensus topology. Programmatic drawing of cloud trees makes it possible to generate complex plots using very simple code (Figures 3d and 4). Finally, MultiTree objects can return a majority rule consensus tree from a set of trees and calculate support values for splits in the tree.
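The counting step behind a majority rule consensus can be sketched with the standard library alone. In this hypothetical example trees are written as nested tuples of tip names and are treated as rooted, so each internal node defines a clade (a set of tips); clades found in more than half of the trees are retained, and their frequencies serve as support values. This illustrates the general technique only and is not the MultiTree implementation, which would also need to assemble the retained clades into a tree.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a tip is a string, an internal node is a tuple.
NestedTree = Union[str, Tuple["NestedTree", ...]]


def clades(tree: NestedTree, found: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the tip set below each internal node; return this node's tips."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(clades(child, found) for child in tree))
    found.add(tips)
    return tips


def majority_clades(trees: List[NestedTree]) -> Dict[FrozenSet[str], float]:
    """Return clades present in more than half of the trees with their support."""
    counts: Counter = Counter()
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        all_tips = clades(tree, found)
        found.discard(all_tips)  # the root clade is trivially in every tree
        counts.update(found)
    ntrees = len(trees)
    return {c: n / ntrees for c, n in counts.items() if n / ntrees > 0.5}


trees = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
]
support = majority_clades(trees)
for clade, value in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), round(value, 2))
```

For unrooted trees the same counting would be applied to bipartitions of the tips rather than to rooted clades.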


3 CONCLUSIONS
Toytree is a simple but powerful tree plotting library for Python. It provides pleasant out‐of‐the‐box styling for displaying trees, simple methods for manipulating them, and is easy to install and use. It can integrate easily with existing Python tree analysis packages. Future developments will include support for circular tree layouts, faster rendering of trees by simplifying CSS code, additional tree modification functions and integration with other Python and R libraries for plotting results of comparative analyses on trees.
ACKNOWLEDGEMENTS
During the development of toytree I received support from the National Science Foundation grant DEB‐1557059 and Columbia University start‐up funds. I thank Timothy Shead for helpful advice and for motivating the development of this software.
Open Research
DATA AVAILABILITY STATEMENT
Toytree source code is available at https://github.com/eaton-lab/toytree with the current version 1.0.0 archived at Zenodo https://doi.org/10.5281/zenodo.3445526. A Jupyter notebook to reproduce figures in this publication is available in the supplementary materials and on GitHub: https://nbviewer.jupyter.org/github/eaton-lab/toytree/tree/master/manuscript/




