R版本与运行环境信息

  1. Date:2021-7-22
  2. R version 4.0.3 (2020-10-10)
  3. Platform: x86_64-w64-mingw32/x64 (64-bit)
  4. Running under: Windows 10 x64 (build 18363)

veen图

使用ggVennDiagram绘制veen图

  1. #install.package("ggVennDiagram")
  2. library(tidyverse)
  3. library(ggVennDiagram)
  4. library(ggsci)
  5. #构建数据
  6. genes <- paste0("gene",1:1000)
  7. set.seed(20210502)
  8. #将维恩图的每个圈里的数据放到一个list中
  9. gene_list <- list(A = sample(genes,100),
  10. B = sample(genes,200),
  11. C = sample(genes,300),
  12. D = sample(genes,200))
  13. #绘制韦恩图
  14. ggVennDiagram(gene_list,
  15. #指定每个list的名字
  16. category.names = c("A","B","C","D"),
  17. #指定每个名字的颜色
  18. set_color = c("red1","red2","red3","red4"),
  19. #指定每个名字的大小
  20. set_size = 10,
  21. #指定线的类型,具体线条类型见下
  22. edge_lty = "dashed",
  23. #指定线的粗细
  24. edge_size = 1,
  25. #指定标签的内容: counts(仅仅计数结果),percent(百分比),both(全部显示),none
  26. label = "both",
  27. #指定label的颜色
  28. label_color = "black",
  29. #标签的透明度
  30. label_alpha = 0.5) +
  31. #具体配色代码见下方
  32. scale_fill_distiller(palette = "RdBu") +
  33. #自行指定外框的颜色
  34. scale_color_manual(values = c(rep("black",4)))

Veen图和upset图 - 图1

R的线条类型与点类型以及调色板

  1. #install.packages("ggpubr")
  2. a <- ggpubr::show_point_shapes()
  3. b <- ggpubr::show_line_types()
  4. cowplot::plot_grid(a,b)

Veen图和upset图 - 图2

  1. #调色板
  2. RColorBrewer::display.brewer.all()

Veen图和upset图 - 图3

Upset图

韦恩图不适合展示维度过高的数据,个人认为最多四个维度,当维度更高时,用Upset图会更好

基本使用

  • 基本用法, 使用数据集为package里自带的movies数据集
  1. #install.package("UpSetR")
  2. library(UpSetR)
  3. #载入数据集
  4. movies <- read.csv( system.file("extdata", "movies.csv", package = "UpSetR"), header=TRUE, sep=";" )
  5. #简单查看数据集,数据集第一列为电影的名称,第二列为发行年份,第三列往后为电影类型,1代表为对应类型,
  6. > head(movies)
  7. Name ReleaseDate Action Adventure Children Comedy Crime Documentary Drama Fantasy Noir Horror
  8. 1 Toy Story (1995) 1995 0 0 1 1 0 0 0 0 0 0
  9. 2 Jumanji (1995) 1995 0 1 1 0 0 0 0 1 0 0
  10. 3 Grumpier Old Men (1995) 1995 0 0 0 1 0 0 0 0 0 0
  11. 4 Waiting to Exhale (1995) 1995 0 0 0 1 0 0 1 0 0 0
  12. 5 Father of the Bride Part II (1995) 1995 0 0 0 1 0 0 0 0 0 0
  13. 6 Heat (1995) 1995 1 0 0 0 1 0 0 0 0 0
  14. Musical Mystery Romance SciFi Thriller War Western AvgRating Watches
  15. 1 0 0 0 0 0 0 0 4.15 2077
  16. 2 0 0 0 0 0 0 0 3.20 701
  17. 3 0 0 1 0 0 0 0 3.02 478
  18. 4 0 0 0 0 0 0 0 2.73 170
  19. 5 0 0 0 0 0 0 0 3.01 296
  20. 6 0 0 0 0 1 0 0 3.88 940
  21. #绘制Upset图
  22. upset(data = movies,
  23. sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"))
  • data =: 指定数据集
  • sets =: 指定作图的数据
  1. #图表的左下角为指定的六个数据集合的大小,即所有上映的电影中属于各个类型电影的数量
  2. > movies %>% select(c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary")) %>% colSums()
  3. Drama Comedy Action Thriller Western Documentary
  4. 1603 1200 503 492 68 127
  5. #右上角的图是对应右下方的集合中的元素的个数,如第一根柱子为1184,代表仅属于Daram类型的电影有1184部,第七根柱子则代表同时属于Drama和Comedy的电影为211部

Veen图和upset图 - 图4

展示指定部分

  • 展示指定部分intersections
    使用queries选项来对指定的数据进行展示,默认提供intersectionselements两种展示方式,首先是intersections,对指定的交集进行展示
    命令格式,
    query为指定展示数据的方式, 数据类型为列表,
    params, 指定展示哪个交集,数据类型为列表,列表内的元素为需要展示部分的数据名称,
    color为指定展示的颜色,
    active为是否对柱子着色,=T代表上色,=F则在柱子上用一个与color指定的一致颜色的小三角表示
    命令格式
    queries = list(query = intersects, params = list(a,b,c,d), color = "", active = T)
  1. #展示"Action","Comedy", "Drama"三种类型的交集
  2. #展示"Drama", "Comedy"两种类型的交集
  3. #展示"Western","Drama"两种类型的交集,但是柱子不填充颜色
  4. #展示"Action"特有的部分
  5. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  6. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  7. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  8. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F)))

Veen图和upset图 - 图5

展示指定部分在某因子上的水平

  • 展示某部分在某因子上的水平elements
    命令格式

queries = list(query = elements, params = list("factors",b,c,d), color = "", active = T)

  1. #展示各个交集在分布在1998和1995年的数量
  2. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  3. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  4. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  5. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F),
  6. list(query = intersects,params = list("Action"),color = "pink",active = T),
  7. list(query = elements, params = list("ReleaseDate", 1998,1995),color = "purple",active = T)))

Veen图和upset图 - 图6

增加箱线图

  • 可以通过指定因子来展示在不同因子上,每个交集的分布情况,如,通过箱线图展示在1920-2000年间,不同的intersects在每所有年份的分布情况
    boxplot.summary = "factors": 可以指定多个因子
  1. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  2. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  3. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  4. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F),
  5. list(query = intersects,params = list("Action"),color = "pink",active = T),
  6. list(query = elements, params = list("ReleaseDate", 1998,1995),color = "purple",active = T)),
  7. boxplot.summary = "ReleaseDate")

Veen图和upset图 - 图7

增加其他图表

  • 通过attribute.plots可以增加其他的图表,如直方图,点图,箱线图等等……
    选项用法:attribute.plots = list(),可以通过构建函数的方法进行绘图,每张图片放在一个list中,gridrows为图的高度,ncols用于控制列数

使用方法

  1. #添加一个直方图,表示每个年份的电影数量情况
  2. #首先构建一个绘制直方图的function
  3. plot1 <- function(data,x){
  4. plot.his <- (ggplot(data,aes_string(x=x,fill = "color"))) +
  5. geom_histogram(bins = 30)+
  6. scale_fill_identity() +
  7. labs(y = "Number of movies",x = "Years") +theme_bw()
  8. }
  9. #将直方图写入attribute.plots,其中x= "ReleaseDate"为传给x的参数,queries = TRUE表示将上面展示的数据同样展示在直方图中,
  10. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  11. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  12. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  13. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F),
  14. list(query = intersects,params = list("Action"),color = "pink",active = T),
  15. list(query = elements, params = list("ReleaseDate", 1998,1995),color = "purple",active = T)),
  16. attribute.plots = list(gridrows = 55,
  17. plots = list(list(plot = plot1,x= "ReleaseDate", queries = TRUE)),ncols = 1))

Veen图和upset图 - 图8

  1. #添加一个散点图(如,Avgrating和观看人数的关系),同样是先构建绘图函数
  2. plot2 <- function(data,x,y){
  3. polt.point <- (ggplot(data,aes_string(x=x,y=y,color = "color"))) +
  4. geom_point() +
  5. scale_color_identity() +
  6. theme_bw()
  7. }
  8. ##将散点图图写入attribute.plots
  9. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  10. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  11. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  12. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F),
  13. list(query = intersects,params = list("Action"),color = "pink",active = T),
  14. list(query = elements, params = list("ReleaseDate", 1998,1995),color = "purple",active = T)),
  15. attribute.plots = list(gridrows = 55,
  16. plots = list(list(plot = plot1,x= "ReleaseDate", queries = TRUE),
  17. list(plot = plot2,x = "AvgRating",y = "Watches",queries = TRUE)),ncols = 2))

Veen图和upset图 - 图9

  1. #增加一个抖动图
  2. plot3 <- function(data,x,y){
  3. polt.jit <- (ggplot(data,aes_string(x=x,y=y,color = "color"))) +
  4. geom_jitter() +
  5. scale_color_identity() +
  6. theme_bw()
  7. }
  8. ##将抖动图写入attribute.plots
  9. upset(data = movies,sets = c("Drama", "Comedy", "Action", "Thriller", "Western", "Documentary"),
  10. queries = list(list(query = intersects,params = list("Action","Comedy", "Drama"),color = "red1",active = T),
  11. list(query = intersects,params = list("Drama", "Comedy"),color = "blue2",active = T),
  12. list(query = intersects,params = list("Western","Drama"),color = "green2",active = F),
  13. list(query = intersects,params = list("Action"),color = "pink",active = T),
  14. list(query = elements, params = list("ReleaseDate", 1998,1995),color = "purple",active = T)),
  15. attribute.plots = list(gridrows = 55,
  16. plots = list(list(plot = plot1,x= "ReleaseDate", queries = TRUE),
  17. list(plot = plot2,x = "AvgRating",y = "Watches",queries = TRUE),
  18. list(plot = plot3,x = "Watches",y = "AvgRating")),ncols = 3))

Veen图和upset图 - 图10