Author: Zuguang Gu
    Translator: Steven Shen
    Original: https://jokergoo.github.io/ComplexHeatmap-reference/book/oncoprint.html#complex-alteration-types


    When integrating results from multiple analyses, it is easy to end up with many different alteration types, and it can be difficult to design graphics and assign distinct colors for all of them (e.g. see the plot in this link). On the other hand, among these alteration types there are primary classes that are more important to distinguish and secondary classes that are less important. For example, we may have the alteration types “intronic snv”, “exonic snv”, “intronic indel” and “exonic indel”. We can classify them along two dimensions: “snv/indel” is more important and forms the primary class, while “intronic/exonic” is less important and forms the secondary class. On the oncoPrint, we want “intronic snv” and “exonic snv” to use similar graphics because both are snvs and should look visually similar, and we add slightly different symbols to mark “intronic” and “exonic”. E.g. we can use red rectangles for snv and, on top of the red rectangles, dots to represent “intronic” and crossed lines to represent “exonic”. On the barplot annotations, which summarize the number of different alteration types, we don’t want to separate “intronic snv” and “exonic snv”; we prefer to show only the total number of snvs, which avoids too many categories in the barplots.

    Let’s demonstrate this scenario with the following simulated data. To simplify the example, we assume that a single gene in a single sample has either an snv or an indel, and that the alteration is either intronic or exonic. If neither “intronic” nor “exonic” is attached to an alteration, it means we don’t have this gene-related information (maybe it is an intergenic snv/indel).

    set.seed(123)
    # primary class: each gene/sample pair carries an snv, an indel, or nothing
    x1 = sample(c("", "snv"), 100, replace = TRUE, prob = c(8, 2))
    x2 = sample(c("", "indel"), 100, replace = TRUE, prob = c(8, 2))
    x2[x1 == "snv"] = ""
    # secondary class: intronic or exonic, only where an alteration exists
    x3 = sample(c("", "intronic"), 100, replace = TRUE, prob = c(5, 5))
    x4 = sample(c("", "exonic"), 100, replace = TRUE, prob = c(5, 5))
    x3[x1 == "" & x2 == ""] = ""
    x4[x1 == "" & x2 == ""] = ""
    x4[x3 == "intronic"] = ""
    # join the non-empty labels of each pair into one ";"-separated string
    x = apply(cbind(x1, x2, x3, x4), 1, function(x) {
        x = x[x != ""]
        paste(x, collapse = ";")
    })
    m = matrix(x, nrow = 10, ncol = 10, dimnames = list(paste0("g", 1:10), paste0("s", 1:10)))
    m[1:4, 1:4]
    ##    s1    s2             s3             s4
    ## g1 ""    "snv;intronic" "snv;intronic" "snv"
    ## g2 ""    ""             ""             "snv;intronic"
    ## g3 ""    ""             ""             ""
    ## g4 "snv" "indel;exonic" "snv"          ""
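To check which alteration types such a matrix actually contains, each cell can be split on ";" and the pieces tabulated. The following is a minimal base-R sketch, not part of the original text: it rebuilds only the 4 x 4 excerpt printed above (so it does not depend on the random number generator), and the names `m_sub` and `split_types` are hypothetical.

```r
# 4 x 4 excerpt of m, copied from the printed output above
m_sub = matrix(c(
    "",    "snv;intronic", "snv;intronic", "snv",
    "",    "",             "",             "snv;intronic",
    "",    "",             "",             "",
    "snv", "indel;exonic", "snv",          ""
), nrow = 4, byrow = TRUE,
   dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))

# hypothetical helper: split one cell into its alteration types;
# an empty cell yields a zero-length vector
split_types = function(x) strsplit(x, ";")[[1]]

# count how often each alteration type occurs in the excerpt
type_counts = table(unlist(lapply(m_sub, split_types)))
type_counts
```

Every non-empty cell contributes one primary type (snv or indel) and at most one secondary type (intronic or exonic), which is the structure the rest of this section relies on.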

    Now in m, there are four different alteration types: snv, indel, intronic and exonic. Next we define alter_fun for the four alterations.

    # oncoPrint() comes from ComplexHeatmap; the drawing functions come from grid
    library(ComplexHeatmap)
    library(grid)

    alter_fun = list(
        # light grey cell drawn for every gene/sample pair
        background = function(x, y, w, h)
            grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#CCCCCC", col = NA)),
        # red rectangles
        snv = function(x, y, w, h)
            grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA)),
        # blue rectangles
        indel = function(x, y, w, h)
            grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "blue", col = NA)),
        # dots
        intronic = function(x, y, w, h)
            grid.points(x, y, pch = 16),
        # crossed lines
        exonic = function(x, y, w, h) {
            grid.segments(x - w*0.4, y - h*0.4, x + w*0.4, y + h*0.4, gp = gpar(lwd = 2))
            grid.segments(x + w*0.4, y - h*0.4, x - w*0.4, y + h*0.4, gp = gpar(lwd = 2))
        }
    )
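To preview how the layered glyphs look without building a full oncoPrint, the drawing functions can be called directly with grid. This is only a sketch under assumptions that are not in the original text: the cell size and positions are hypothetical values chosen for the preview, and the `draw_*` names are illustrative copies of the corresponding entries in alter_fun.

```r
library(grid)

# glyphs copied from alter_fun above so that this sketch runs on its own
draw_snv = function(x, y, w, h)
    grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA))
draw_intronic = function(x, y, w, h)
    grid.points(x, y, pch = 16)
draw_exonic = function(x, y, w, h) {
    grid.segments(x - w*0.4, y - h*0.4, x + w*0.4, y + h*0.4, gp = gpar(lwd = 2))
    grid.segments(x + w*0.4, y - h*0.4, x - w*0.4, y + h*0.4, gp = gpar(lwd = 2))
}

# hypothetical cell size and vertical position, chosen only for this preview
w = unit(0.3, "npc")
h = unit(0.6, "npc")
y = unit(0.5, "npc")

grid.newpage()
# left cell, "snv;intronic": red rectangle with a dot on top
draw_snv(unit(0.25, "npc"), y, w, h)
draw_intronic(unit(0.25, "npc"), y, w, h)
# right cell, "snv;exonic": red rectangle with crossed lines on top
draw_snv(unit(0.75, "npc"), y, w, h)
draw_exonic(unit(0.75, "npc"), y, w, h)
```

Both cells share the red rectangle of the primary class and differ only in the symbol drawn on top, which is the visual effect described in this section.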

    For the alteration types in the primary class (snv and indel), we use colored rectangles to represent them because rectangles are visually obvious, while for the alteration types in the secondary class (intronic and exonic), we only use simple symbols (dots for intronic and crossed diagonal lines for exonic). Since there is no color corresponding to intronic and exonic, we don’t need to define colors for these two types, and on the barplot annotations for genes and samples, only snv and indel are visualized (so the height for snv in the barplot corresponds to the number of intronic snvs plus exonic snvs).
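The totals that such a barplot summarizes can be sketched in base R. The example below is illustrative only and does not call oncoPrint(): it reuses the 4 x 4 excerpt of m printed earlier, the variable names are hypothetical, and it assumes that an snv without an intronic/exonic label also counts toward the snv total.

```r
# 4 x 4 excerpt of m, copied from the printed output earlier in this section
m_sub = matrix(c(
    "",    "snv;intronic", "snv;intronic", "snv",
    "",    "",             "",             "snv;intronic",
    "",    "",             "",             "",
    "snv", "indel;exonic", "snv",          ""
), nrow = 4, byrow = TRUE,
   dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))

# samples per gene that carry an snv, ignoring the intronic/exonic label
snv_total = apply(m_sub, 1, function(r) sum(grepl("snv", r, fixed = TRUE)))

# the same count broken down by the secondary class
snv_intronic = apply(m_sub, 1, function(r) sum(r == "snv;intronic"))
snv_exonic   = apply(m_sub, 1, function(r) sum(r == "snv;exonic"))
snv_plain    = apply(m_sub, 1, function(r) sum(r == "snv"))

rbind(snv_total, snv_intronic, snv_exonic, snv_plain)
```

For every gene, snv_total equals the sum of the three finer counts, so collapsing the secondary class loses no snv from the summary.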

    In the following code, which draws the oncoPrint, we add another legend for the intronic/exonic types. Note that a pch value of 16 corresponds to a dot and a value of 28 corresponds to crossed diagonal lines (see the last plot in Section 5.2 for pch 26, 27, 28).

    # we only define color for snv and indel, so barplot annotations only show snv and indel
    ht = oncoPrint(m, alter_fun = alter_fun, col = c(snv = "red", indel = "blue"))
    draw(ht, heatmap_legend_list = list(
        Legend(labels = c("intronic", "exonic"), type = "points", pch = c(16, 28))
    ))

    7.1.4 Complex alteration types - Figure 1