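In ggplot, we can draw bar charts with geom_bar.
By default, though, geom_bar does not need a y aesthetic: it uses stat = "count", so we only specify x (and optionally a grouping such as fill) and it counts the rows in each group for us.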
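As a quick illustration of that default counting, here is a minimal sketch. It uses the mpg demo dataset bundled with ggplot2 rather than the mutation data from this post, and the object name p_default is just a placeholder.

```r
library(ggplot2)

# Only x (and fill) are mapped; no y is given.
# geom_bar() defaults to stat = "count", so it tallies the rows per group itself.
p_default <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()

# layer_data() exposes the counts that ggplot computed internally
head(layer_data(p_default)[, c("x", "count")])
```

The computed counts add up to the number of rows in the input, which is the work we can take over ourselves when the table gets large.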
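In some cases, however, the data frame can be very large. Then we can do the counting ourselves and simply hand the result to ggplot.
There are two ways to supply y directly:
geom_col
geom_bar(stat = 'identity')
Both give the same result.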
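To make the equivalence concrete, the sketch below pre-counts a demo dataset (mpg from ggplot2, not the data used later in this post) and draws it both ways. The names counts, p_col and p_bar are placeholders chosen for the example.

```r
library(ggplot2)

# Count once, outside ggplot; the result has columns `class` and `n`
counts <- as.data.frame(table(class = mpg$class), responseName = "n")

# Both layers read y straight from the data instead of counting rows
p_col <- ggplot(counts, aes(x = class, y = n)) + geom_col()
p_bar <- ggplot(counts, aes(x = class, y = n)) + geom_bar(stat = "identity")

# Compare the bar heights that each plot will draw
all.equal(layer_data(p_col)$y, layer_data(p_bar)$y)
```

geom_col is simply the shorter spelling, so it is the one used for the final plot in this post.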
So how do we count by group?
That is easy too: the tidyverse provides group_by for grouping and summarise for aggregating, with n() doing the counting.
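A minimal sketch of the tidyverse route, assuming dplyr 1.0.0 or later for the .groups argument. The data frame toy is a made-up stand-in for mutation_number_order with only the two columns needed here.

```r
library(dplyr)

# Hypothetical toy data: one row per observed variant
toy <- data.frame(
  name = c("S1", "S1", "S1", "S2", "S2"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Frame_Shift_Del", "Missense_Mutation",
                             "Frame_Shift_Del")
)

# One row per (sample, variant class), with the number of rows in that group
toy_counts <- toy %>%
  group_by(name, Variant_Classification) %>%
  summarise(Freq = n(), .groups = "drop")

toy_counts
```

One difference worth knowing: group_by plus summarise only returns combinations that actually occur, whereas as.data.frame(table(...)) also lists combinations with a count of 0.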
Or just let base R's table do the job:
## count variant in each sample
tmp1 <- table(mutation_number_order$name, mutation_number_order$Variant_Classification)
tmp1 <- as.data.frame(tmp1)
colnames(tmp1)[1:2] <- c("name", "Variant_Classification")
> head(tmp1)
          name Variant_Classification Freq
1 S110011502DT        Frame_Shift_Del   38
2 S110011501DT        Frame_Shift_Del   41
3 S110020203DT        Frame_Shift_Del   36
4 S110030206DT        Frame_Shift_Del   43
5 S110030801DT        Frame_Shift_Del   50
6 S110020201DT        Frame_Shift_Del   49
Merge the counts back into the original table and plot directly:
library(ggplot2)

## count variant in each sample
tmp1 <- table(mutation_number_order$name, mutation_number_order$Variant_Classification)
tmp1 <- as.data.frame(tmp1)
colnames(tmp1)[1:2] <- c("name", "Variant_Classification")
head(tmp1)

## attach the counts to the original table and drop duplicated rows
tmp3 <- merge(mutation_number_order, tmp1, by = c("name", "Variant_Classification"))
mutation_number_final <- unique(tmp3)
colnames(mutation_number_final) <- c("Tumor_Sample_Barcode",
                                     "Variant_Classification",
                                     "Clinical_Type",
                                     "Total_Counts",
                                     "Counts")
max_counts <- max(mutation_number_final$Total_Counts)

# counts plot (barplot_theme is a custom theme object, not defined in this snippet)
p1 <- ggplot(data = mutation_number_final) +
  geom_col(mapping = aes(x = Tumor_Sample_Barcode, y = Counts, fill = Variant_Classification), position = "stack") +
  barplot_theme + labs(x = NULL) + scale_y_continuous(expand = c(0, 0)) +
  coord_cartesian(ylim = c(0, max_counts + 100))
(p1 <- p1 + labs(y = "Mutation Counts") + scale_x_discrete(expand = expansion(mult = c(0.03, 0.05))))
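Here y is the per-group count that table computed, fill is the grouping variable, and position = "stack" piles the groups on top of each other within each bar.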