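Preface: ggtree, an R package for visualizing and beautifying phylogenetic trees, was written by Dr. Guangchuang Yu (余光创) of the University of Hong Kong. Everything below is reposted here only for convenient reference.
Source: http://127.0.0.1:28433/session/Rvig.368047b416ec.html
References: 2041-210X.12628.pdf, Ggtree.pdf
Other tutorials by Dr. Yu: https://yulab-smu.github.io/treedata-book/chapter5.html?tdsourcetag=s_pcqq_aiomsg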


ggtree: a phylogenetic tree viewer for different types of tree annotations

Guangchuang Yu

School of Basic Medical Sciences, Southern Medical University

2019-01-14

Citation

If you use ggtree in published research, please cite the most appropriate paper(s) from this list:

  1. G Yu, DK Smith, H Zhu, Y Guan, TTY Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.
  2. G Yu, TTY Lam, H Zhu, Y Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043. doi: 10.1093/molbev/msy194.

Introduction

This project arose from our need to annotate nucleotide substitutions on phylogenetic trees. We found that no existing tree visualization software could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Some viewers can display bootstrap values, but it is hard or impossible to display other information on the tree. Our first solution for showing nucleotide substitutions was to embed this information in the node/tip names and display the tree with a traditional viewer. This worked, but we believe such an indirect approach is inefficient.
Previously, phylogenetic trees were much smaller, and annotating them was less necessary than it is today, now that much more data is becoming available. We want to associate our experimental data, for instance antigenic change, with evolutionary relationships. Visualizing these associations on a phylogenetic tree can help us identify evolutionary patterns. We believe a next-generation tree viewer should be programmable and extensible. It should display a phylogenetic tree as easily as classical software does, and it should support adding annotation data in layers above the tree. This is the objective of ggtree (Yu et al. 2017). Common annotation tasks should be easy, and complicated ones should be achievable by adding multiple layers of annotation.
ggtree is designed by extending the ggplot2 package (Wickham 2009). It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages also implement tree viewers with ggplot2, including OutbreakTools, phyloseq (McMurdie and Holmes 2013) and ggphylo. Most of them create complex tree-view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate diverse user inputs that are related to nodes (taxa). ggtree differs from them by interpreting a tree as a collection of taxa. This allows general flexibility in annotating phylogenetic trees with diverse types of user input.

Getting data into R

Most tree viewers (including R packages) focus on the Newick and Nexus formats. However, many evolutionary analysis programs write their own file formats, and these files contain supporting evidence that is ready to be used for annotating a phylogenetic tree. The treeio package supports several of these file formats and software outputs. It brings analysis findings to R users for further analysis (e.g. summarization, visualization, comparison and testing), and it also allows external data to be mapped onto the phylogeny. Please refer to the treeio vignette for more details.
Users can open the vignette with the following command:

```r
vignette("Importer", package="treeio")
```

All data parsed or integrated by the treeio package can be used to visualize or annotate phylogenetic trees in ggtree (Yu et al. 2017).

Tree Visualization and Annotation

Tree visualization in ggtree is easy and takes a single command: ggtree(tree_object). Supported layouts include:

- rectangular, slanted, circular and fan, for phylograms and cladograms;
- equal_angle and daylight, for unrooted layouts;
- time-scaled and two-dimensional phylogenies.

The Tree Visualization vignette describes these features in detail.
We implement several functions to manipulate a phylogenetic tree visually. They include viewing a selected clade to explore a large tree, clustering taxa, rotating a clade or the whole tree, and zooming out on or collapsing clades.
Tree manipulation functions.

| Function | Description |
|---|---|
| collapse | collapse a selected clade |
| expand | expand a collapsed clade |
| flip | exchange the positions of 2 clades that share a parent node |
| groupClade | group clades |
| groupOTU | group OTUs by tracing back to their most recent common ancestor |
| identify | interactive tree manipulation |
| rotate | rotate a selected clade by 180 degrees |
| rotate_tree | rotate a circular-layout tree by a specified angle |
| scaleClade | zoom in or out on a selected clade |
| open_tree | convert a tree to a fan layout with a specified open angle |

Details and examples can be found in the Tree Manipulation vignette.
Most phylogenetic trees are scaled by evolutionary distance (substitutions/site). In ggtree, a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time, dN/dS, etc.). Numerical and categorical variables can be used to color a phylogenetic tree.
The ggtree package provides several layers to annotate a phylogenetic tree. These layers are building blocks that can be freely combined together to create complex tree visualization.
Geom layers defined in ggtree.

| Layer | Description |
|---|---|
| geom_balance | highlights the two direct descendant clades of an internal node |
| geom_cladelabel | annotates a clade with a bar and text label |
| geom_cladelabel2 | annotates a clade with a bar and text label for unrooted layouts |
| geom_hilight | highlights a clade with a rectangle |
| geom_hilight_encircle | highlights a clade with an xspline for unrooted layouts |
| geom_label2 | modified version of geom_label, with subsetting supported |
| geom_nodelab | layer for node labels, which can be text or images |
| geom_nodepoint | annotates internal nodes with symbolic points |
| geom_point2 | modified version of geom_point, with subsetting supported |
| geom_range | bar layer to present the uncertainty of evolutionary inference |
| geom_rootpoint | annotates the root node with a symbolic point |
| geom_segment2 | modified version of geom_segment, with subsetting supported |
| geom_strip | annotates associated taxa with a bar and (optional) text label |
| geom_taxalink | associates two related taxa by linking them with a curve |
| geom_text2 | modified version of geom_text, with subsetting supported |
| geom_tiplab | layer for tip labels, which can be text or images |
| geom_tiplab2 | layer for tip labels in the circular layout |
| geom_tippoint | annotates external nodes with symbolic points |
| geom_tree | tree structure layer, with multiple layouts supported |
| geom_treescale | tree branch scale legend |

ggtree supports creating phylomoji using Emoji fonts; please refer to the Phylomoji vignette.
ggtree integrates the PhyloPic database, so silhouette images of organisms can be downloaded and used directly to annotate a phylogenetic tree. ggtree also supports using local or remote images for annotation. For details, please refer to the ggimage package vignette, which can be opened with the following command:

```r
vignette("ggtree", package="ggimage")
```

ggtree also supports visualizing an annotated phylogenetic tree together with a numerical matrix (e.g. a genotype table), a multiple sequence alignment or subplots. Examples of annotating phylogenetic trees can be found in the Tree Annotation vignette.

Vignette Entry

ggtree homepage: https://guangchuangyu.github.io/software/ggtree (contains more information about the package, more documentation, a gallery of beautiful published images and links to related resources).

Need help?

If you have questions or issues, please visit the ggtree homepage first, where most problems are already documented. If you think you have found a bug, please follow the guide and post a reproducible example on the GitHub issue tracker. For questions, please post to the Google group. Users are strongly encouraged to subscribe to the mailing list.

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

```r
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows Server 2012 R2 x64 (build 9600)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.5.2 magrittr_1.5 htmltools_0.3.6 tools_3.5.2
## [5] prettydoc_0.2.1 yaml_2.2.0 Rcpp_1.0.0 stringi_1.2.4
## [9] rmarkdown_1.11 highr_0.7 knitr_1.21 stringr_1.3.1
## [13] digest_0.6.18 xfun_0.4 evaluate_0.12
```

References

McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4):e61217. https://doi.org/10.1371/journal.pone.0061217.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. 1st ed. Springer.
Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.

—————————————————————————

Tree Visualization

Guangchuang Yu and Tommy Tsan-Yuk Lam

School of Basic Medical Sciences, Southern Medical University

2019-01-14

To view a phylogenetic tree, we first need to parse the tree file into R. The ggtree (Yu et al. 2017) package supports many file formats via the treeio package, including output files of commonly used software packages in evolutionary biology. For more details, please refer to the treeio vignette.

```r
library("treeio")
library("ggtree")
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
```
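You can also experiment without a file. read.tree (from ape, which treeio builds on) accepts a Newick string through its text argument. This is a minimal sketch; the five-taxon tree and the names newick, tree_small and p_small are made up for illustration.

```r
library(ape)     # read.tree(), Ntip(), Nnode()
library(ggtree)

# hypothetical tree written inline in Newick format
newick <- "((A:1,B:2):1,(C:1.5,(D:0.5,E:0.5):1):0.5);"
tree_small <- read.tree(text = newick)

Ntip(tree_small)    # number of tips (taxa)
Nnode(tree_small)   # number of internal nodes

# build the plot; call print(p_small) to draw it
p_small <- ggtree(tree_small) + geom_tiplab()
```
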

Viewing a phylogenetic tree with ggtree

The ggtree package extends the ggplot2 package (Wickham 2009) to support viewing phylogenetic trees. It implements the geom_tree layer for displaying a phylogenetic tree, as shown below:

```r
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree()
```

The ggtree() function is implemented as a shortcut for visualizing a tree, and it works exactly the same as the command above.
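ggtree treats a tree as a collection of taxa, so every plot carries a data frame with one row per node. You can see this by inspecting the data slot of the returned ggplot object. The sketch below does that on a random tree. The column names checked are the ones we have seen ggtree produce; treat them as an assumption that may vary slightly between releases.

```r
library(ape)
library(ggtree)

set.seed(42)
tr <- rtree(10)          # random rooted binary tree with 10 tips
p <- ggtree(tr)

td <- p$data             # fortified tree: one row per node
nrow(td)                 # 10 tips + 9 internal nodes
table(td$isTip)          # TRUE for tips, FALSE for internal nodes
head(as.data.frame(td)[, c("node", "parent", "x", "y", "isTip")])
```

Layers such as geom_tippoint or geom_text2 then simply draw subsets of these rows.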
ggtree inherits all the advantages of ggplot2. For example, we can change the color, size and type of the lines just as we do with ggplot2.

```r
ggtree(tree, color="firebrick", size=1, linetype="dotted")
```

By default, the tree is drawn in ladderized form. Set the parameter ladderize = FALSE to disable this.

```r
ggtree(tree, ladderize=FALSE)
```

Branch lengths (branch.length) are used to scale the edges. Set branch.length = "none" to view only the tree topology (cladogram). Alternatively, pass the name of another numerical variable to scale the tree by it (e.g. dN/dS; see also the Tree Annotation vignette).

```r
ggtree(tree, branch.length="none")
```

Layout

Currently, ggtree supports several layouts, including:

  • rectangular (by default)
  • slanted
  • circular
  • fan

These layouts apply to phylograms (the default) and to cladograms when branch.length='none' is set explicitly. Unrooted (equal angle and daylight methods), time-scaled and two-dimensional layouts are also supported.

Phylogram and Cladogram

```r
library(ggtree)
set.seed(2017-02-16)
tr <- rtree(50)
ggtree(tr)
ggtree(tr, layout="slanted")
ggtree(tr, layout="circular")
ggtree(tr, layout="fan", open.angle=120)
ggtree(tr, layout="equal_angle")
ggtree(tr, layout="daylight")
ggtree(tr, branch.length='none')
ggtree(tr, branch.length='none', layout='circular')
ggtree(tr, layout="daylight", branch.length='none')
```
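To compare layouts side by side, you can build one plot per layout and arrange them in a grid. This is a small convenience sketch, not part of the original vignette. It assumes the cowplot package (used later in this document for plot_grid) is installed. The names layouts, plots and grid_plot are illustrative.

```r
library(ape)
library(ggtree)
library(cowplot)   # plot_grid()

set.seed(2017-02-16)
tr <- rtree(50)

layouts <- c("rectangular", "slanted", "circular", "fan")
# one plot per layout, each titled with its layout name
plots <- lapply(layouts, function(l) ggtree(tr, layout = l) + ggtitle(l))
grid_plot <- plot_grid(plotlist = plots, ncol = 2)
```
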
Other layouts can be drawn by modifying scales or coordinate systems. Examples include reversing the time-scale labels and re-proportioning a circular/fan tree.

```r
ggtree(tr) + scale_x_reverse()
ggtree(tr) + coord_flip()
ggtree(tr) + scale_x_reverse() + coord_flip()
print(ggtree(tr), newpage=TRUE, vp=grid::viewport(angle=-30, width=.9, height=.9))
ggtree(tr, layout='slanted') + coord_flip()
ggtree(tr, layout='slanted', branch.length='none') +
    coord_flip() + scale_y_reverse() + scale_x_reverse()
ggtree(tr, layout='circular') + xlim(-10, NA)
ggtree(tr) + scale_x_reverse() + coord_polar(theta='y')
ggtree(tr) + scale_x_reverse(limits=c(10, 0)) + coord_polar(theta='y')
```

Time-scaled tree

A phylogenetic tree can be scaled by time (a time-scaled tree) by specifying the mrsd (most recent sampling date) parameter.

```r
tree2d <- read.beast(system.file("extdata", "twoD.tree", package="treeio"))
ggtree(tree2d, mrsd="2014-05-01") + theme_tree2()
```
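With mrsd set, the x axis becomes calendar time, and the right-most tip is placed at the decimal-year value of that date. The sketch below checks this on a random tree. The date is arbitrary. The tolerance is deliberately loose because the exact day-to-decimal conversion is an internal detail of ggtree that we have not verified.

```r
library(ape)
library(ggtree)

set.seed(1)
tr <- rtree(10)
# hypothetical sampling date for the most recent tip
p_time <- ggtree(tr, mrsd = "2020-01-01") + theme_tree2()

# the right-most tip should now sit at (approximately) year 2020
max(p_time$data$x)
```
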

Two dimensional tree

ggtree implements two-dimensional trees. The yscale parameter scales the y-axis by a selected tree attribute, which should be a numerical variable. If the attribute is a character/categorical variable, pass a named vector that maps its values to numbers via the yscale_mapping parameter.

```r
ggtree(tree2d, mrsd="2014-05-01",
       yscale="NGS", yscale_mapping=c(N2=2, N3=3, N4=4, N5=5, N6=6, N7=7)) +
    theme_classic() + theme(axis.line.x=element_line(), axis.line.y=element_line()) +
    theme(panel.grid.major.x=element_line(color="grey20", linetype="dotted", size=.3),
          panel.grid.major.y=element_blank()) +
    scale_y_continuous(labels=paste0("N", 2:7))
```

In this example, the figure shows the y-axis quantity (NGS) increasing along the trunk. Users can highlight the trunk with a different line size or color using the functions described in the Tree Manipulation vignette.
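The yscale_mapping argument is an ordinary named numeric vector. Its names are the category values of the attribute, and its values are the y positions. Instead of typing every pair, you can build the same vector with base R's setNames(). The sketch below does that and then passes it to ggtree exactly as in the example above. The name ngs_map is illustrative.

```r
library(treeio)
library(ggtree)

# map category labels "N2".."N7" to y positions 2..7
ngs_levels <- 2:7
ngs_map <- setNames(as.numeric(ngs_levels), paste0("N", ngs_levels))
ngs_map

tree2d <- read.beast(system.file("extdata", "twoD.tree", package = "treeio"))
p_2d <- ggtree(tree2d, mrsd = "2014-05-01",
               yscale = "NGS", yscale_mapping = ngs_map) +
    scale_y_continuous(labels = names(ngs_map))
```
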

Displaying tree scale (evolution distance)

To show the tree scale, use the geom_treescale() layer.

```r
ggtree(tree) + geom_treescale()
```
geom_treescale() supports the following parameters:

  • x and y for tree scale position
  • width for the length of the tree scale
  • fontsize for the size of the text
  • linesize for the size of the line
  • offset for relative position of the line and the text
  • color for color of the tree scale
```r
ggtree(tree) + geom_treescale(x=0, y=12, width=6, color='red')
ggtree(tree) + geom_treescale(fontsize=8, linesize=2, offset=-1)
```

We can also use theme_tree2() to display the tree scale by adding an x-axis.

```r
ggtree(tree) + theme_tree2()
```

The tree scale is not restricted to evolutionary distance; ggtree can re-scale the tree with other numerical variables. More details can be found in the Tree Annotation vignette.

Displaying nodes/tips

Showing all the internal nodes and tips in the tree can be done by adding a layer of points using geom_nodepoint, geom_tippoint or geom_point.

```r
ggtree(tree) + geom_point(aes(shape=isTip, color=isTip), size=3)
```

```r
p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
```

Displaying labels

Users can use geom_text or geom_label to display both node labels (if available) and tip labels, or geom_tiplab to display only tip labels:

```r
p + geom_tiplab(size=3, color="purple")
```

geom_tiplab supports text or label geoms for displaying labels, and it also supports an image geom for labelling tips with image files. A corresponding geom, geom_nodelab, is provided for displaying node labels. For details on labelling nodes with images, please refer to the vignette Annotating phylogenetic tree with images.
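Internal node labels, such as bootstrap support values stored in a Newick string, end up in the label column of the non-tip rows. That is why geom_nodelab (or geom_text2 with subset=!isTip) can display them. This is a minimal sketch; the tree and its support values 90 and 75 are made up.

```r
library(ape)
library(ggtree)

# hypothetical tree: the numbers after ")" are read as internal node labels
tr <- read.tree(text = "((A:1,B:1)90:1,(C:1,D:1)75:1);")
p_lab <- ggtree(tr) +
    geom_tiplab() +                             # tip labels A-D
    geom_nodelab(hjust = -0.3, color = "red")   # internal labels 90 and 75

# internal-node rows of the plot data carry the support values
p_lab$data$label[!p_lab$data$isTip]
```
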
For circular and unrooted layouts, ggtree supports rotating node labels according to the angles of the branches.

```r
ggtree(tree, layout="circular") + geom_tiplab(aes(angle=angle), color='blue')
```

To make the labels easier to read, ggtree provides geom_tiplab2 for the circular layout (see post 1 and 2).

```r
ggtree(tree, layout="circular") + geom_tiplab2(color='blue')
```

By default, labels are positioned at the nodes, but they can be placed at the middle of the branch/edge instead.

```r
p + geom_tiplab(aes(x=branch), size=3, color="purple", vjust=-0.3)
```

Positioning labels at the middle of the branch is very useful when annotating a transition from a parent node to a child node.
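The branch column used above holds the x coordinate of each edge's midpoint, that is, the average of a node's x position and its parent's. The check below recomputes it by hand. It reflects how ggtree computes branch in the versions we know of, so read it as an illustration rather than a documented contract.

```r
library(ape)
library(ggtree)

set.seed(7)
td <- ggtree(rtree(8))$data

# x position of each node's parent (ggtree records the root as its own parent)
parent_x <- td$x[match(td$parent, td$node)]
manual_mid <- (td$x + parent_x) / 2

# compare with the column that geom_tiplab(aes(x = branch)) relies on
all.equal(as.numeric(td$branch), manual_mid)
```
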

Update tree view with a new tree

In the previous example, the p object stores a view of a 13-tip tree whose internal nodes are highlighted with large colored dots. You may want to apply this pattern (or a more complex one) to a new tree. You don't need to rebuild the plot step by step: ggtree provides an operator, %<%, that applies a visualization pattern to a new tree.
For example, the pattern in the p object will be applied to a new tree with 50 tips as shown below:

```r
p %<% rtree(50)
```

Theme

theme_tree() defines a completely blank canvas, while theme_tree2() adds phylogenetic distance (via the x-axis). Both themes accept a bgcolor parameter that sets the background color. Users can also pass any theme components to theme_tree() to modify them.

```r
ggtree(rtree(30), color="red") + theme_tree("steelblue")
ggtree(rtree(20), color="white") + theme_tree("black")
```

Visualize a list of trees

ggtree supports multiPhylo objects, so a list of trees can be viewed simultaneously.

```r
trees <- lapply(c(10, 20, 40), rtree)
class(trees) <- "multiPhylo"
ggtree(trees) + facet_wrap(~.id, scales="free") + geom_tiplab()
```

One hundred bootstrap trees can also be viewed simultaneously.

```r
btrees <- read.tree(system.file("extdata/RAxML", "RAxML_bootstrap.H3", package="treeio"))
ggtree(btrees) + facet_wrap(~.id, ncol=10)
```

Another way to view the bootstrap trees is to merge them into a density tree. We can then add a layer showing the best tree on top of the density tree.

```r
p <- ggtree(btrees, layout="rectangular", color="lightblue", alpha=.3)
best_tree <- read.tree(system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3", package="treeio"))
df <- fortify(best_tree, branch.length='none')
p + geom_tree(data=df, color='firebrick')
```

Rescale tree

Most phylogenetic trees are scaled by evolutionary distance (substitutions/site). In ggtree, users can re-scale a phylogenetic tree by any numerical variable inferred by evolutionary analysis (e.g. dN/dS).

```r
library("treeio")
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
beast_tree
## 'treedata' S4 object that stored information of
## 'C:/Users/biocbuild/bbs-3.8-bioc/tmpdir/RtmpojHSZc/Rinst13901e4f3fa/ggtree/examples/MCC_FluA_H3.tree'.
##
## ...@ phylo:
## Phylogenetic tree with 76 tips and 75 internal nodes.
##
## Tip labels:
## A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
```

```r
p1 <- ggtree(beast_tree, mrsd='2013-01-01') + theme_tree2() +
    ggtitle("Divergence time")
p2 <- ggtree(beast_tree, branch.length='rate') + theme_tree2() +
    ggtitle("Substitution rate")
library(cowplot)
plot_grid(p1, p2, ncol=2)
```
```r
mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio")
mlc_tree <- read.codeml_mlc(mlcfile)
p1 <- ggtree(mlc_tree) + theme_tree2() +
    ggtitle("nucleotide substitutions per codon")
p2 <- ggtree(mlc_tree, branch.length='dN_vs_dS') + theme_tree2() +
    ggtitle("dN/dS tree")
plot_grid(p1, p2, ncol=2)
```

You can also change the branch lengths stored in the tree object itself with the rescale_tree function, instead of specifying branch.length at visualization time.

```r
beast_tree2 <- rescale_tree(beast_tree, branch.length='rate')
ggtree(beast_tree2) + theme_tree2()
```

Zoom on a portion of tree

ggtree provides the gzoom function, which is similar to the zoom function in ape. It plots a whole phylogenetic tree and a portion of it simultaneously, and is intended for exploring very large trees.

```r
library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
```

Zooming in on a selected clade of a tree that has already been annotated with ggtree is also supported.

```r
groupInfo <- split(chiroptera$tip.label, gsub("_\\w+", "", chiroptera$tip.label))
chiroptera <- groupOTU(chiroptera, groupInfo)
p <- ggtree(chiroptera, aes(color=group)) + geom_tiplab() + xlim(NA, 23)
gzoom(p, grep("Plecotus", chiroptera$tip.label), xmax_adjust=2)
```

Color tree

In ggtree, coloring a phylogenetic tree is easy: use aes(color=VAR) to map the tree's color to a specific variable. Both numerical and categorical variables are supported.

```r
ggtree(beast_tree, aes(color=rate)) +
    scale_color_continuous(low='darkgreen', high='red') +
    theme(legend.position="right")
```

Users can use any available feature, such as clade posterior or dN/dS, to scale the color of the tree.
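Some variables are not stored in the tree object. You can attach them with ggtree's %<+% operator, which joins a data frame to the tree data by its first column (matched against the tip labels). This is a minimal sketch; the categorical trait host and its values are hypothetical. Here the trait colors tip points.

```r
library(ape)
library(ggtree)

set.seed(3)
tr <- rtree(6)   # tips are named t1..t6

# hypothetical per-tip metadata; the first column must match the tip labels
traits <- data.frame(label = tr$tip.label,
                     host  = rep(c("avian", "human"), each = 3))

p_host <- ggtree(tr) %<+% traits +
    geom_tippoint(aes(color = host), size = 3) +
    geom_tiplab(offset = 0.05) +
    theme(legend.position = "right")

# internal nodes get NA for columns that only describe tips
table(p_host$data$host, useNA = "ifany")
```
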

    References

    Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. 1st ed. Springer.
    Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.

    —————————————————————————-

Tree Manipulation

Guangchuang Yu

School of Basic Medical Sciences, Southern Medical University

2019-01-14

  • Internal node number
  • View Clade
  • Group Clades
  • Group OTUs
  • Collapse clade
  • Expand collapsed clade
  • Scale clade
  • Rotate clade
  • Flip clade
  • Open tree
  • Rotate tree
  • Interactive tree manipulation

    Internal node number

Some ggtree functions work on clades and take an internal node number as a parameter. To find internal node numbers, use geom_text2 to display them:

```r
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) + geom_tiplab()
```

Another way to get an internal node number is the MRCA() function, given a vector of taxa names. It returns the node number of the input taxa's most recent common ancestor (MRCA). It works with both tree objects and graphic objects.

```r
MRCA(tree, tip=c('A', 'E'))
## [1] 17
MRCA(tree, tip=c('H', 'G'))
## [1] 21
p <- ggtree(tree)
MRCA(p, tip=c('A', 'E'))
## [1] 17
```
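ape's getMRCA() does the same lookup directly on phylo objects. ape's extract.clade() can then confirm which tips descend from the returned node. This is a small sketch on a hypothetical five-tip tree; node_de and node_ae are illustrative names.

```r
library(ape)

tr <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")

node_de <- getMRCA(tr, c("D", "E"))   # MRCA of D and E
node_ae <- getMRCA(tr, c("A", "E"))   # A and E sit on opposite sides of the root

node_ae == Ntip(tr) + 1                      # in ape, the root is node Ntip + 1
sort(extract.clade(tr, node_de)$tip.label)   # tips below the D/E ancestor
```
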

    View Clade

ggtree provides a function viewClade to visualize a clade of a phylogenetic tree.

```r
viewClade(p+geom_tiplab(), node=21)
```

    Group Clades

The ggtree package defines several functions to manipulate the tree view. The groupClade and groupOTU methods are designed for clustering clades or related OTUs. groupClade accepts an internal node or a vector of internal nodes to cluster one or more clades.
Both groupClade and groupOTU work with tree objects as well as graphic objects.

```r
tree <- groupClade(tree, .node=21)
ggtree(tree, aes(color=group, linetype=group))
```

The following command will produce the same figure.

```r
ggtree(read.tree(nwk)) %>% groupClade(.node=21) + aes(color=group, linetype=group)
```

With groupClade and groupOTU, it is easy to highlight selected taxa and to select taxa for displaying related features.

```r
tree <- groupClade(tree, .node=c(21, 17))
ggtree(tree, aes(color=group, linetype=group)) + geom_tiplab(aes(subset=(group==2)))
```

    Group OTUs

groupOTU accepts a vector of OTUs (taxa names) or a list of OTUs. It traces back from the OTUs to their most recent common ancestor and clusters them together. Related OTUs do not have to form a clade. They can be monophyletic (a clade), polyphyletic or paraphyletic.

```r
tree <- groupOTU(tree, .node=c("D", "E", "F", "G"))
ggtree(tree, aes(color=group)) + geom_tiplab()
```
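To see that groupOTU does not require a clade, the sketch below groups C and D from a hypothetical tree. In this tree D's sister is E, so {C, D} is paraphyletic: their MRCA also leads to E. Only the paths from C and D back to that ancestor are grouped, and E stays in the background group. The names tr_grp, td and tip_group are illustrative.

```r
library(ape)
library(ggtree)

tr <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")

# C and D do not form a clade on their own
tr_grp <- groupOTU(tr, c("C", "D"))
p_grp <- ggtree(tr_grp, aes(color = group)) + geom_tiplab()

# group assignment of each tip, looked up by tip label
td <- p_grp$data
tip_group <- setNames(as.character(td$group[td$isTip]), td$label[td$isTip])
tip_group
```
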
groupOTU can also accept a list of tip groups.

```r
cls <- list(c1=c("A", "B", "C", "D", "E"),
            c2=c("F", "G", "H"),
            c3=c("L", "K", "I", "J"),
            c4="M")
tree <- groupOTU(tree, cls)
library("colorspace")
ggtree(tree, aes(color=group, linetype=group)) + geom_tiplab() +
    scale_color_manual(values=c("black", rainbow_hcl(4))) + theme(legend.position="right")
```

groupOTU also works with graphic objects.

```r
p <- ggtree(tree)
groupOTU(p, LETTERS[1:5]) + aes(color=group) + geom_tiplab() + scale_color_manual(values=c("black", "firebrick"))
```

The following example uses groupOTU to display a taxonomic classification.

```r
library("ape")
data(chiroptera)
groupInfo <- split(chiroptera$tip.label, gsub("_\\w+", "", chiroptera$tip.label))
chiroptera <- groupOTU(chiroptera, groupInfo)
ggtree(chiroptera, aes(color=group), layout='circular') + geom_tiplab(size=1, aes(angle=angle))
```

    Collapse clade

With the collapse function, users can collapse a selected clade.

```r
cp <- collapse(p, node=21)
cp + geom_point2(aes(subset=(node == 21)), size=5, shape=23, fill="steelblue")
```

    Expand collapsed clade

The collapsed clade can be expanded with the expand function.

```r
cp %>% expand(node=21)
```

```r
p1 <- ggtree(tree)
p2 <- collapse(p1, 21) + geom_point2(aes(subset=(node==21)), size=5, shape=23, fill="blue")
p3 <- collapse(p2, 17) + geom_point2(aes(subset=(node==17)), size=5, shape=23, fill="red")
p4 <- expand(p3, 17)
p5 <- expand(p4, 21)
library(cowplot)
plot_grid(p1, p2, p3, p4, p5, ncol=5)
```

    Scale clade

Collapsing selected clades saves space. Another approach is to zoom out on a clade, shrinking it to a smaller scale.

```r
plot_grid(ggtree(tree) + geom_hilight(21, "steelblue"),
          ggtree(tree) %>% scaleClade(21, scale=0.3) + geom_hilight(21, "steelblue"),
          ncol=2)
```

scaleClade also accepts a scale larger than 1, which zooms in on the selected portion.

```r
plot_grid(ggtree(tree) + geom_hilight(17, fill="steelblue") +
              geom_hilight(21, fill="darkgreen"),
          ggtree(tree) %>% scaleClade(17, scale=2) %>% scaleClade(21, scale=0.3) +
              geom_hilight(17, "steelblue") + geom_hilight(21, fill="darkgreen"),
          ncol=2)
```

    Rotate clade

A selected clade can be rotated by 180 degrees using the rotate function.

```r
tree <- groupClade(tree, c(21, 17))
p <- ggtree(tree, aes(color=group)) + scale_color_manual(values=c("black", "firebrick", "steelblue"))
p2 <- rotate(p, 21) %>% rotate(17)
plot_grid(p, p2, ncol=2)
```

The animation below rotates every internal node of a random 50-tip tree in turn. It uses its own variables (tr50, p_rot), so p still holds the plot above for the flip example that follows.

```r
set.seed(2016-05-29)
tr50 <- rtree(50)
p_rot <- ggtree(tr50) + geom_tiplab()
# visit internal nodes in postorder and rotate each one, highlighting it in red
for (n in unique(reorder(tr50, 'postorder')$edge[, 1])) {
    p_rot <- rotate(p_rot, n)
    print(p_rot + geom_point2(aes(subset=(node == n)), color='red'))
}
```
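The animation loop visits internal nodes through unique(reorder(tree, 'postorder')$edge[, 1]). With "postorder", ape's reorder() places the edges of each node after those of its descendants, and the root's edges come last. The parent column therefore yields every internal node once, children before parents, with the root at the end. The quick check below uses ape alone; internal_order is an illustrative name.

```r
library(ape)

set.seed(2016-05-29)
tr50 <- rtree(50)

# parent node of each edge in postorder; unique() keeps first appearances
internal_order <- unique(reorder(tr50, "postorder")$edge[, 1])

length(internal_order) == Nnode(tr50)       # every internal node visited once
tail(internal_order, 1) == Ntip(tr50) + 1   # the root comes last
```
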

    Flip clade

The positions of two selected clades that share the same parent node can be swapped using the flip function.

```r
plot_grid(p, flip(p, 17, 21), ncol=2)
```
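flip expects two sibling nodes. The sketch below defines a hypothetical helper, share_parent (not a ggtree function), that checks this condition with ape's edge matrix before you call flip. It is tried on a made-up five-tip tree, and node numbers are looked up with getMRCA() rather than hard-coded.

```r
library(ape)

# hypothetical helper (not part of ggtree): TRUE if two nodes have the same parent
share_parent <- function(tree, node1, node2) {
    parent_of <- function(n) tree$edge[tree$edge[, 2] == n, 1]
    p1 <- parent_of(node1)
    p2 <- parent_of(node2)
    # the root has no parent edge, so it never counts as anyone's sibling
    length(p1) == 1 && length(p2) == 1 && p1 == p2
}

tr <- read.tree(text = "((A:1,B:1):1,(C:1,(D:1,E:1):1):1);")
n_ab  <- getMRCA(tr, c("A", "B"))
n_cde <- getMRCA(tr, c("C", "E"))
n_de  <- getMRCA(tr, c("D", "E"))

share_parent(tr, n_ab, n_cde)   # both hang off the root: a valid pair for flip
share_parent(tr, n_ab, n_de)    # different parents: not a valid pair
```
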

    Open tree

ggtree supports the fan layout. It can also transform a circular-layout tree into a fan tree when an open angle is passed to the open_tree function.

```r
set.seed(123)
tr <- rtree(50)
p <- ggtree(tr, layout='circular') + geom_tiplab2()
for (angle in seq(0, 270, 10)) {
    print(open_tree(p, angle=angle) + ggtitle(paste("open angle:", angle)))
}
```

    Rotate tree

Rotating a circular tree is supported by the rotate_tree function.

```r
for (angle in seq(0, 270, 10)) {
    print(rotate_tree(p, angle) + ggtitle(paste("rotate angle:", angle)))
}
```

    Interactive tree manipulation

Interactive tree manipulation is also possible; please refer to https://guangchuangyu.github.io/2016/06/identify-method-for-ggtree.

—————————————————————————

Tree Annotation

Guangchuang Yu

School of Basic Medical Sciences, Southern Medical University

2019-01-14

  • Annotate clades
  • Labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)
  • Highlight clades
  • Taxa connection
  • Tree annotation with output from evolution software
  • Tree annotation with user specified annotation
  • Visualize tree with associated matrix
  • Visualize tree with multiple sequence alignment
  • Plot tree with associated data
  • Plot tree with images and subplots
  • References

    Annotate clades

ggtree (Yu et al. 2017) implements the geom_cladelabel layer, which annotates a selected clade with a bar and a corresponding label.
The geom_cladelabel layer accepts a selected internal node number. To find internal node numbers, please refer to the Tree Manipulation vignette.

```r
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 6)
p + geom_cladelabel(node=45, label="test label") +
    geom_cladelabel(node=34, label="another clade")
```

Users can set the parameter align = TRUE to align the clade labels, and use the parameter offset to adjust their position.

```r
p + geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)
```

Users can change the color of the clade label via the parameter color.

```r
p + geom_cladelabel(node=45, label="test label", align=TRUE, color='red') +
    geom_cladelabel(node=34, label="another clade", align=TRUE, color='blue')
```

Users can change the angle of the label text via the parameter angle. The parameter offset.text sets the position of the text relative to the bar.

```r
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45)
```

The size of the bar and text can be changed via the parameters barsize and fontsize, respectively.

```r
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45, fontsize=8)
```

Users can also draw the label with geom_label by setting geom='label'.

```r
p + geom_cladelabel(node=34, label="another clade", align=TRUE, geom='label', fill='lightblue')
```

    Annotate clades for unrooted tree

ggtree provides geom_cladelabel2 for labeling clades of unrooted layout trees.

```r
pg <- ggtree(tree, layout="daylight")
pg + geom_cladelabel2(node=45, label="test label", angle=10) +
    geom_cladelabel2(node=34, label="another clade", angle=305)
```

    Labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)

    geom_cladelabel is designed for labelling a monophyletic group (a clade), but related taxa do not always form a clade. ggtree provides geom_strip to add a strip/bar that indicates the association between taxa, with an optional label (see the issue).
        nwk <- system.file("extdata", "sample.nwk", package="treeio")
        tree <- read.tree(nwk)
        ggtree(tree) + geom_tiplab() +
            geom_strip(5, 7, barsize=2, color='red') +
            geom_strip(6, 12, barsize=2, color='blue')
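    Whether a set of taxa forms a clade can be checked before choosing between geom_cladelabel and geom_strip. A minimal sketch, assuming ape's is.monophyletic() and a small hypothetical tree written inline; it also assumes geom_strip accepts tip labels as well as node numbers, and uses its optional label.

```r
library(ape)
library(ggtree)

# A small hypothetical tree: A and B form a clade, B and C do not.
tr <- read.tree(text = "((A,B),(C,(D,E)));")

is.monophyletic(tr, c("A", "B"))  # TRUE: geom_cladelabel fits
is.monophyletic(tr, c("B", "C"))  # FALSE: use geom_strip instead

# Strip between two tips given by label, with a text label next to the bar.
p_strip <- ggtree(tr) + geom_tiplab() +
  geom_strip("B", "C", barsize = 2, color = "red",
             label = "B + C (not a clade)", offset.text = 0.1)
```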

    Highlight clades

    ggtree implements the geom_hilight layer, which accepts an internal node number and adds a rectangle layer to highlight the selected clade.
        ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
            geom_hilight(node=17, fill="darkgreen", alpha=.6)
    geom_hilight also works with the circular layout:
        ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
            geom_hilight(node=23, fill="darkgreen", alpha=.6)
    Another way to highlight selected clades is to give them different colors and/or line types, as demonstrated in the Tree Manipulation vignette.
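    A sketch of that alternative, assuming the groupClade() function from treeio (its argument names have changed between versions, so the node numbers are passed positionally here). Branches inside the clades under nodes 21 and 17 get their own group, which is then mapped to color and line type.

```r
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Assign the clades under nodes 21 and 17 to separate groups; branches
# outside both clades fall into the background group "0".
tree_grp <- groupClade(tree, c(21, 17))

p_grp <- ggtree(tree_grp, aes(color = group, linetype = group)) +
  geom_tiplab(aes(color = group))
```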

    Highlight balances

    In addition to geom_hilight, ggtree also implements geom_balance, which highlights the neighboring subclades of a given internal node.
        ggtree(tree) +
            geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
            geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)

    Highlight clades for unrooted tree

    ggtree provides geom_hilight_encircle to highlight clades in unrooted layout trees.
        pg + geom_hilight_encircle(node=45) + geom_hilight_encircle(node=34, fill='darkgreen')

    Taxa connection

    Some evolutionary events (e.g. reassortment, horizontal gene transfer) cannot be modeled by a simple tree. ggtree provides the geom_taxalink layer, which draws straight or curved lines between any two nodes in the tree, allowing such events to be represented by connecting taxa.
        ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') +
            geom_taxalink('F', 'K', color='red', arrow=grid::arrow(length=grid::unit(0.02, "npc")))

    Tree annotation with output from evolution software

    The treeio package implements several parser functions to parse output from commonly used software in evolutionary biology.
    Here, we use BEAST (Bouckaert et al. 2014) output as an example. For details, please refer to the Importer vignette.
        file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
        beast <- read.beast(file)
        ggtree(beast, aes(color=rate)) +
            geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
            geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
            scale_color_continuous(low="darkgreen", high="red") +
            theme(legend.position=c(.1, .8))
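    The variable names used above (rate, posterior, length_0.95_HPD) come from the annotations stored in the BEAST file. A minimal sketch for listing which fields are available for mapping, assuming treeio's get.fields() accessor:

```r
library(treeio)

file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(file)

# Names of the node attributes parsed from the BEAST file; any of them can be
# mapped in aes() or passed to layers such as geom_range()/geom_nodelab().
fields <- get.fields(beast)
print(fields)
```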

    Tree annotation with user specified annotation

    Integrating user data to annotate a phylogenetic tree can be done at different levels. The treeio package implements full_join methods to combine tree data with a phylogenetic tree object. The tidytree package supports linking tree data to a phylogeny using tidyverse verbs. ggtree supports mapping external data to a phylogeny for visualization and annotation on the fly.
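    As an illustration of the first route, the sketch below joins a hypothetical per-tip table (column region) to the tree with full_join(), assuming treeio's full_join method for phylo objects. The key column is named label so that it lines up with the tip labels; dplyr is loaded only to make the full_join generic available.

```r
library(treeio)
library(dplyr)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Hypothetical per-tip data; "label" is the join key matching the tip labels.
info <- data.frame(label  = LETTERS[1:13],
                   region = rep(c("north", "south"), length.out = 13))

# The result is a "treedata" object that carries the extra column.
tree_data <- full_join(tree, info, by = "label")

p_join <- ggtree(tree_data) + geom_tiplab(aes(color = region))
```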

    The %<+% operator

    Suppose we have the following data associated with the tree and would like to attach them to the tree.
        nwk <- system.file("extdata", "sample.nwk", package="treeio")
        tree <- read.tree(nwk)
        p <- ggtree(tree)
        dd <- data.frame(taxa  = LETTERS[1:13],
                         place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                         value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
        ## you don't need to order the data
        ## data was reshuffled just for demonstration
        dd <- dd[sample(1:13, 13), ]
        row.names(dd) <- NULL
        print(dd)
    | taxa | place | value |
    |------|-------|-------|
    | D    | GZ    | 78.4  |
    | K    | CZ    | 72.7  |
    | C    | GZ    | 83.0  |
    | H    | HK    | 102.6 |
    | E    | GZ    | 75.3  |
    | M    | NA    | 67.1  |
    | J    | CZ    | 70.4  |
    | A    | GZ    | 51.5  |
    | B    | GZ    | 56.6  |
    | L    | CZ    | 79.6  |
    | F    | HK    | 55.9  |
    | I    | CZ    | 68.0  |
    | G    | HK    | 86.1  |

We can imagine that the place column stores the location where each species was isolated and the value column stores numerical values (e.g. bootstrap values).
We have demonstrated using the %<% operator to update a tree view with a new tree. Here, we introduce another operator, %<+%, which attaches annotation data to a tree view. The only requirement on the input data is that its first column must match the node/tip labels of the tree.
After the annotation data are attached to the tree with %<+%, all of their columns are visible to ggtree. As an example, we attach the above annotation data to the tree view p and add layers that show the tip labels and tip points colored by the isolation site stored in the place column.

    p <- p %<+% dd + geom_tiplab(aes(color=place)) +
        geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
    p + theme(legend.position="right")

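Because %<+% matches on the first column only, it can help to move the key column to the front and check for keys that match no tip. A minimal sketch with a hypothetical table dd2 whose key column is not first; the name Z deliberately matches no tip and therefore cannot be placed on the tree.

```r
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
tree <- read.tree(nwk)

# Hypothetical data whose key column (taxa) is NOT the first column.
dd2 <- data.frame(value = c(1.5, 2.0, 3.2), taxa = c("A", "B", "Z"))

# %<+% matches on the FIRST column, so move the key column to the front.
dd2 <- dd2[, c("taxa", setdiff(names(dd2), "taxa"))]

# Keys that match no tip label are worth reporting before plotting.
unmatched <- setdiff(dd2$taxa, tree$tip.label)
print(unmatched)

p_check <- ggtree(tree) %<+% dd2 + geom_tippoint(aes(size = value))
```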
Once attached, the data remain available to every subsequent layer, so we can easily add other layers to display this information.

    p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3) +
        geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)

Visualize tree with associated matrix

The gheatmap function is designed to visualize a phylogenetic tree together with a heatmap of an associated matrix.
In the following example, we visualized a tree of H3 influenza viruses with their associated genotype.

    beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
    beast_tree <- read.beast(beast_file)
    genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
    genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
    colnames(genotype) <- sub("\\.$", "", colnames(genotype))
    p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_treescale(x=2008, y=1, offset=2)
    p <- p + geom_tiplab(size=2)
    gheatmap(p, genotype, offset=5, width=0.5, font.size=3, colnames_angle=-45, hjust=0) +
        scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"), values=c("steelblue", "firebrick", "darkgreen"))

The width parameter controls the width of the heatmap relative to the width of the tree. The offset parameter controls the distance between the tree and the heatmap, for instance to allocate space for tip labels.
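A minimal sketch with a hypothetical random tree and genotype table, showing the input shape gheatmap expects (one row per tip, row names equal to the tip labels) together with the offset and width parameters. The table contents are made up for illustration.

```r
library(ape)
library(ggtree)

set.seed(42)
tr_demo <- rtree(10)

# Hypothetical genotype table: one row per tip, row names = tip labels.
geno_demo <- data.frame(seg1 = sample(c("x", "y"), 10, replace = TRUE),
                        seg2 = sample(c("x", "y"), 10, replace = TRUE),
                        row.names = tr_demo$tip.label)

p_demo <- ggtree(tr_demo) + geom_tiplab(size = 3)

# offset shifts the heatmap right to leave room for the tip labels;
# width is the heatmap width relative to the width of the tree.
hm <- gheatmap(p_demo, geno_demo, offset = 0.5, width = 0.3)
```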
For time-scaled trees, as in this example, it is common to display the x axis with theme_tree2. However, the heatmap is just another layer and will change the x axis. To overcome this issue, we implemented scale_x_ggtree to set a more reasonable x axis.

    p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_tiplab(size=2, align=TRUE, linesize=.5) + theme_tree2()
    pp <- (p + scale_y_continuous(expand=c(0, 0.3))) %>%
        gheatmap(genotype, offset=8, width=0.6, colnames=FALSE) %>%
        scale_x_ggtree()
    pp + theme(legend.position="right")

Visualize tree with multiple sequence alignment

With the msaplot function, users can visualize a multiple sequence alignment together with a phylogenetic tree, as demonstrated below:

    fasta <- system.file("examples/FluA_H3_AA.fas", package="ggtree")
    msaplot(ggtree(beast_tree), fasta)

A specific slice of the alignment can also be displayed by specifying the window parameter.

    msaplot(ggtree(beast_tree), fasta, window=c(150, 200)) + coord_polar(theta='y')

Plot tree with associated data

To associate a phylogenetic tree with different types of plots produced from the user's data, ggtree provides the facet_plot function, which accepts an input data.frame and a geom function to draw the input data. The data are displayed in an additional panel of the plot.

    tr <- rtree(30)
    d1 <- data.frame(id=tr$tip.label, val=rnorm(30, sd=3))
    p <- ggtree(tr)
    p2 <- facet_plot(p, panel="dot", data=d1, geom=geom_point, aes(x=val), color='firebrick')
    d2 <- data.frame(id=tr$tip.label, value=abs(rnorm(30, mean=100, sd=50)))
    facet_plot(p2, panel='bar', data=d2, geom=geom_segment, aes(x=0, xend=value, y=y, yend=y), size=3, color='steelblue') + theme_tree2()
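Additional panels can be stacked by calling facet_plot again. A sketch, assuming the same input layout as above (the first column of the data holds the tip labels), that adds a hypothetical text panel printing the rounded values next to the dot panel; the y position of each row is taken from the tree.

```r
library(ape)
library(ggtree)

set.seed(1)
tr <- rtree(30)
# First column (id) holds the tip labels used to line rows up with tips.
d1 <- data.frame(id = tr$tip.label, val = rnorm(30, sd = 3))

p <- ggtree(tr)
p2 <- facet_plot(p, panel = "dot", data = d1, geom = geom_point,
                 aes(x = val), color = "firebrick")

# Hypothetical extra panel: the rounded values printed as text.
p3 <- facet_plot(p2, panel = "value", data = d1, geom = geom_text,
                 aes(x = 0, label = round(val, 1)), size = 2.5)
```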

Plot tree with images and subplots

Please refer to the following vignettes: