Author: Zuguang Gu
    Translator: Steven Shen
    Original: https://jokergoo.github.io/ComplexHeatmap-reference/book/oncoprint.html#input-data-format


    There are two different formats of input data. The first is a single matrix in which each value can encode multiple alterations as one delimited string. In the following example, ‘g1’ in ‘s1’ has two types of alterations, ‘snv’ and ‘indel’.

    mat = read.table(textConnection(
    "s1,s2,s3
    g1,snv;indel,snv,indel
    g2,,snv;indel,snv
    g3,snv,,indel;snv"), row.names = 1, header = TRUE, sep = ",", stringsAsFactors = FALSE)
    mat = as.matrix(mat)
    mat
    ##    s1          s2          s3
    ## g1 "snv;indel" "snv"       "indel"
    ## g2 ""          "snv;indel" "snv"
    ## g3 "snv"       ""          "indel;snv"

    In this case, we need to define a function that extracts the different alteration types from these strings. Such a function is usually simple: it accepts one encoded string and returns a vector of alteration types.

    For mat, we can define the function as:

    get_type_fun = function(x) strsplit(x, ";")[[1]]
    get_type_fun(mat[1, 1])
    ## [1] "snv"   "indel"
    get_type_fun(mat[1, 2])
    ## [1] "snv"

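    Similarly, if the alterations are encoded as snv|indel, you can define the function as function(x) strsplit(x, "|", fixed = TRUE)[[1]]. The fixed = TRUE argument is needed because strsplit() otherwise interprets | as a regular expression operator rather than a literal character. This self-defined function is assigned to the get_type argument of oncoPrint().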
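The |-separated case can be checked in plain R. This is a minimal sketch; the name get_type_pipe is made up for illustration and is not part of ComplexHeatmap.

```r
# Hypothetical helper: split on a literal "|" instead of a regular expression.
get_type_pipe = function(x) strsplit(x, "|", fixed = TRUE)[[1]]

get_type_pipe("snv|indel")
## [1] "snv"   "indel"

# For comparison: without fixed = TRUE the pattern "|" is read as a regular
# expression, and the string is not split at the separator as intended.
regex_split = strsplit("snv|indel", "|")[[1]]
```

Escaping the character as "[|]" would be an equivalent alternative to fixed = TRUE.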

    Because the separators are single characters in most cases, oncoPrint() splits the alteration strings automatically when the separator is one of ;:,| so you do not need to specify get_type explicitly in oncoPrint().

    For one gene in one sample, different alteration types may be drawn into the same grid cell of the heatmap, so we need to define how the graphics are added by providing a list of self-defined functions to the alter_fun argument. If the graphics have no transparency, the order in which they are added matters. In the following example, snv is drawn first and then indel. The rectangles for indels are smaller (0.4*h) than those for snvs (0.9*h), so both snvs and indels remain visible when they fall in the same grid cell. The names of the function list should correspond to the alteration types (here, snv and indel).

    Each self-defined graphic function (the functions in alter_fun) should take four arguments: the positions of the grids on the oncoPrint (x and y) and the widths and heights of the grids (w and h, which are measured in npc units). Proper values for the four arguments are passed to these functions automatically by oncoPrint().

    Colors for different alterations are defined in col. It should be a named vector for which names correspond to alteration types. It is used to generate the barplots and the legends.

    library(ComplexHeatmap)  # also attaches grid, which provides grid.rect() and gpar()
    col = c(snv = "red", indel = "blue")
    oncoPrint(mat,
        alter_fun = list(
            # snv is drawn first, as a tall rectangle
            snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9,
                gp = gpar(fill = col["snv"], col = NA)),
            # indel is drawn on top, as a shorter rectangle
            indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4,
                gp = gpar(fill = col["indel"], col = NA))
        ), col = col)

    7.1.1 Input data format - Figure 1

    The order of the alteration types in the barplots also corresponds to the order defined in alter_fun.
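Because both alter_fun and col are matched to alteration types by name, it can help to compare their names with the types that actually occur in the matrix before plotting. The following is a minimal base R sketch, not something oncoPrint() requires; the placeholder drawing functions and the variable names all_types, missing_fun and missing_col are assumptions made for illustration.

```r
# The example matrix from this section, rebuilt directly (column by column).
mat = matrix(c("snv;indel", "", "snv",
               "snv", "snv;indel", "",
               "indel", "snv", "indel;snv"),
             nrow = 3,
             dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

col = c(snv = "red", indel = "blue")

# Placeholder functions: only the list names matter for this check.
alter_fun = list(
    snv = function(x, y, w, h) NULL,
    indel = function(x, y, w, h) NULL
)

# Collect every alteration type that occurs anywhere in the matrix.
all_types = unique(unlist(strsplit(as.vector(mat), ";", fixed = TRUE)))
all_types
## [1] "snv"   "indel"

# Types without a drawing function or without a color; both should be empty.
missing_fun = setdiff(all_types, names(alter_fun))
missing_col = setdiff(all_types, names(col))
```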

    If you are unsure how to generate such a matrix, there is a second way. The second type of input data is a list of matrices in which each matrix contains binary values representing whether the alteration is absent or present. The list should have names that correspond to the alteration types.

    mat_list = list(snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
                    indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
    rownames(mat_list$snv) = rownames(mat_list$indel) = c("g1", "g2", "g3")
    colnames(mat_list$snv) = colnames(mat_list$indel) = c("s1", "s2", "s3")
    mat_list
    ## $snv
    ##    s1 s2 s3
    ## g1  1  1  0
    ## g2  0  1  1
    ## g3  1  0  1
    ##
    ## $indel
    ##    s1 s2 s3
    ## g1  1  0  1
    ## g2  0  1  0
    ## g3  0  0  0

    oncoPrint() expects all matrices in mat_list to have the same row names and column names.
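The two input formats carry the same kind of information, so a string matrix can be turned into a list of binary matrices with base R. The sketch below is one possible way to do this and is not a ComplexHeatmap function; the names types, mat_list_from_mat and same_names are assumptions made for illustration.

```r
# The string-encoded matrix from the beginning of this section.
mat = matrix(c("snv;indel", "", "snv",
               "snv", "snv;indel", "",
               "indel", "snv", "indel;snv"),
             nrow = 3,
             dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

types = c("snv", "indel")

# One 0/1 matrix per type: 1 where the type occurs in the cell's string.
mat_list_from_mat = lapply(setNames(types, types), function(type) {
    hit = vapply(strsplit(as.vector(mat), ";", fixed = TRUE),
                 function(v) type %in% v, logical(1))
    matrix(as.numeric(hit), nrow = nrow(mat), dimnames = dimnames(mat))
})

# Check that every matrix shares the row and column names of the first one.
same_names = all(vapply(mat_list_from_mat, function(m)
    identical(dimnames(m), dimnames(mat_list_from_mat[[1]])), logical(1)))
```

Every matrix inherits its dimnames from mat, so the names agree by construction. Note that the hand-written mat_list above records no indel for g3 in s3, whereas mat does ("indel;snv"), so the converted indel matrix differs from it in that one cell.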

    Pass mat_list to oncoPrint():

    # now you don't need `get_type`
    oncoPrint(mat_list,
        alter_fun = list(
            snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9,
                gp = gpar(fill = col["snv"], col = NA)),
            indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4,
                gp = gpar(fill = col["indel"], col = NA))
        ), col = col)

    7.1.1 Input data format - Figure 2

    In the following parts of this chapter, we continue to use the single-matrix form, mat, as the input data.