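Data Science
Statistical charts often need P values added to them, so this post summarizes the ways to do that. Today's post covers the following:

  • A brief introduction to P values
  • Drawing P values in visualizations

Brief introduction to P values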

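A P value is the probability, under a given probability model (the null hypothesis), that a statistical summary (such as the difference between two sample means) is equal to or more extreme than the value actually observed. In other words, it measures how surprising the observed data would be if the null hypothesis were true; it is not the probability that the null hypothesis itself is true. If the P value is smaller than the chosen significance level (0.05 or 0.01), the null hypothesis is rejected. This does not, however, directly prove that the alternative hypothesis is correct. The P value is itself a random variable (when the null hypothesis is true and the test statistic is continuous, it is uniformly distributed on [0, 1], not normally distributed), so in practice it carries uncertainty from the sample and other factors. In many research fields, 0.05 is conventionally treated as the boundary for an acceptable error rate.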
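
To make the definition concrete, here is a minimal sketch that computes a two-sided P value for a Welch two-sample t-test on the built-in ToothGrowth data, once with base R's `t.test()` and once by hand from the t statistic. The variable names are illustrative, and 0.05 is only the conventional threshold, not a rule.

```r
# Welch two-sample t-test: does mean tooth length differ between supplements?
res <- t.test(len ~ supp, data = ToothGrowth)

# Two-sided P value by hand: the probability, under the null hypothesis,
# of a t statistic at least as extreme as the observed one
t_obs <- unname(res$statistic)
dof <- unname(res$parameter)
p_manual <- 2 * pt(-abs(t_obs), df = dof)

alpha <- 0.05  # conventional significance level (an assumption, not a rule)
cat("P value from t.test():", res$p.value, "\n")
cat("P value by hand:      ", p_manual, "\n")
cat("Reject the null at alpha = 0.05?", res$p.value < alpha, "\n")
```

The two values should agree, which shows that the P value is nothing more than a tail probability of the test statistic's null distribution.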

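Drawing P values in visualizations

Adding P values to a chart so that it conveys its meaning better is something to consider when plotting. Here the R-ggpubr and R-ggsignif packages are used to add and customize P values.

R-ggpubr: adding P values

Before adding P values with the ggpubr package, load the R-rstatix package to run the required statistical tests (t-test and so on). The examples below show how this works.

「Simple example」: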

```r
# Load the required packages
library(tidyverse)
library(ggtext)
library(hrbrthemes)
library(ggpubr)
library(rstatix)
library(ggsci)

# Load the data
df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Run the t-test
stat.test <- df %>%
  rstatix::t_test(len ~ supp) %>%
  rstatix::add_significance()

# Draw the plot
stat.test <- stat.test %>% rstatix::add_xy_position(x = "supp")
bxp <- ggpubr::ggboxplot(df, x = "supp", y = "len", fill = "supp", palette = "jco",
                         ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")) +
  labs(
    title = "Example of add P values in ggpubr",
    subtitle = "processed charts with ggpubr::ggboxplot + rstatix::t_test",
    caption = "Visualization by DataCharm") +
  # Add the P value
  stat_pvalue_manual(stat.test, label = "p") +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.1))) +
  theme(
    plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                  size = 20, margin = margin(t = 1, b = 12)),
    plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
    plot.caption = element_markdown(face = 'bold', size = 12))
```

![2021-05-07-09-47-36-281402.png](https://cdn.nlark.com/yuque/0/2021/png/396745/1620352979047-6d1baa6d-8c68-4b69-8d7c-9ef9d42f3df1.png#align=left&display=inline&height=810&id=u40c3cb2b&margin=%5Bobject%20Object%5D&name=2021-05-07-09-47-36-281402.png&originHeight=810&originWidth=1080&size=2629534&status=done&style=shadow&width=1080)
boxplot with P value

You can also display the significance level of the P value (p.signif) instead:

```r
# Same plot as above; only the label passed to stat_pvalue_manual() changes
stat_pvalue_manual(stat.test, label = "p.signif")
```

Figure: boxplot with P value in different form
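
The significance symbols are just a binning of the P value. The sketch below reproduces that binning in base R; the function name `p_to_symbol` is hypothetical, and the cut points are an assumption believed to mirror the defaults of `rstatix::add_significance()`, so check the package documentation before relying on them.

```r
# Map P values to significance symbols.
# ASSUMPTION: these cut points and symbols are believed to match the
# defaults of rstatix::add_significance(); verify against its documentation.
p_to_symbol <- function(p,
                        cutpoints = c(0, 1e-04, 0.001, 0.01, 0.05, 1),
                        symbols = c("****", "***", "**", "*", "ns")) {
  # Intervals are (lower, upper], with 0 included in the first bin
  as.character(cut(p, breaks = cutpoints, labels = symbols,
                   include.lowest = TRUE))
}

p_to_symbol(c(0.00001, 0.03, 0.2))
# "****" "*" "ns"
```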

「Grouped data example」:

For grouped data, the following code compares significance between groups. It uses the same data as above, with group_by doing the grouping:

    # Compute P values within each dose group
    stat.test <- df %>%
      group_by(dose) %>%
      rstatix::t_test(len ~ supp) %>%
      rstatix::adjust_pvalue() %>%
      rstatix::add_significance("p.adj")

    # Draw the plot
    stat.test <- stat.test %>% add_xy_position(x = "supp")
    bxp2 <- ggboxplot(df, x = "supp", y = "len", fill = "supp", palette = "jco",
                      facet.by = "dose",
                      ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")) +
      labs(
        title = "Example of <span style='color:#D20F26'> add P.adj values in ggpubr</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>ggpubr::ggboxplot + rstatix::t_test</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      stat_pvalue_manual(stat.test, label = "p.adj") +
      scale_y_continuous(expand = expansion(mult = c(0.05, 0.10))) +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Add P Values in group data
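
The adjustment step corrects for running several tests at once. As a rough sketch of what that means, the code below applies the Holm correction with base R's `p.adjust()` and reproduces it by hand. The three P values are made up for illustration, `holm_manual` is a hypothetical helper, and Holm is assumed here to be the default method of `rstatix::adjust_pvalue()`.

```r
# Hypothetical raw P values from three separate tests (illustration only)
p_raw <- c(0.04, 0.01, 0.02)

# Holm correction with base R
p_holm <- p.adjust(p_raw, method = "holm")

# The same correction by hand: sort ascending, multiply the i-th smallest
# value by (m - i + 1), enforce monotonicity, cap at 1, restore the order
holm_manual <- function(p) {
  m <- length(p)
  o <- order(p)
  adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
  adj[order(o)]
}

p_holm
holm_manual(p_raw)
```

Both calls should return the same adjusted values, each at least as large as the corresponding raw P value.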

If you don't like the ggsci color palettes, you can switch to a black-and-gray scheme by changing the code as follows:

    ggboxplot(df, x = "supp", y = "len", fill = "supp", palette = c("gray80", "gray20"),
              facet.by = "dose",
              ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed"))

Setting palette = c("gray80","gray20") is all it takes.

「Customizing the P value style」:

If the default P value display feels too plain, you can define your own label:

    # Define when the P value is shown: keep it if significant, otherwise "ns"
    stat.test$custom.label <- ifelse(stat.test$p.adj <= .05, stat.test$p.adj, "ns")

    # Draw the plot
    stat.test <- stat.test %>% add_xy_position(x = "supp")
    bxp4 <- ggboxplot(df, x = "supp", y = "len", fill = "supp", palette = "jco",
                      facet.by = "dose",
                      ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")) +
      labs(
        title = "Example of <span style='color:#D20F26'> add P.custom values in ggpubr</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>ggpubr::ggboxplot + rstatix::t_test</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      stat_pvalue_manual(stat.test, label = "custom.label") +
      scale_y_continuous(expand = expansion(mult = c(0.05, 0.10))) +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Set P Value form
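
Because `ifelse()` mixes numbers and the string "ns", the numeric values are coerced to text at full precision. One possible refinement, sketched here with made-up P values, is to round them while building the label; the variable names are illustrative only.

```r
# Hypothetical adjusted P values (illustration only)
p_adj <- c(0.0123, 0.5, 0.00004)

# Keep significant values, rounded to 2 significant digits; otherwise "ns"
custom_label <- ifelse(p_adj <= 0.05,
                       formatC(p_adj, format = "g", digits = 2),
                       "ns")
custom_label
# "0.012" "ns" "4e-05"
```

A character vector built this way can be stored as a column of the test result and passed to the label argument in the same way as custom.label.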

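To display the P value in scientific notation:

    # Add a column holding the P value in scientific notation
    stat.test$p.scient <- format(stat.test$p.adj, scientific = TRUE)
    # ... then use that column as the label; everything else is the same as above
    stat_pvalue_manual(stat.test, label = "p.scient")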

Figure: Add P Value in scientific form
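
If you want to control the number of digits in the scientific notation, base R's `formatC()` is one option. This is a small sketch with made-up values, not part of the original workflow.

```r
# Hypothetical P values (illustration only)
p_vals <- c(0.000123, 0.04)

# Fixed two decimal places in the mantissa
p_scient <- formatC(p_vals, format = "e", digits = 2)
p_scient
# "1.23e-04" "4.00e-02"
```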

You can also draw a paired-sample plot like the following:

    # Compute the P value with a paired t-test
    stat.test <- df %>%
      t_test(len ~ supp, paired = TRUE) %>%
      add_significance()

    # Draw the plot
    stat.test <- stat.test %>% add_xy_position(x = "supp")
    ggpaired(df, x = "supp", y = "len", fill = "supp", palette = "jco",
             line.color = "gray", line.size = 0.4,
             ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")) +
      labs(
        title = "Example of <span style='color:#D20F26'> add P.signif values in ggpubr</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>ggpubr::ggpaired + rstatix::t_test</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      stat_pvalue_manual(stat.test, label = "{p}{p.signif}") +
      scale_y_continuous(expand = expansion(mult = c(0.05, 0.10))) +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Add P Value in ggpaired example
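
A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The base R sketch below checks this on ToothGrowth. Pairing the rows by their order within each supplement group is an assumption made purely for illustration, since the animals in this dataset were not actually measured in pairs.

```r
# Split tooth length by supplement.
# ASSUMPTION: rows are paired by their order within each group,
# which is for illustration only (the data are not truly paired).
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Paired t-test versus one-sample t-test on the differences
p_paired <- t.test(oj, vc, paired = TRUE)$p.value
p_diff <- t.test(oj - vc, mu = 0)$p.value

all.equal(p_paired, p_diff)
```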

「Adding P values to a bar chart」:

The statistics and the plot are computed as follows:

    # Pairwise t-tests between dose levels, with bracket positions for mean + SD bars
    stat.test <- df %>%
      t_test(len ~ dose) %>%
      add_xy_position(fun = "mean_sd", x = "dose")

    # Draw the plot
    bp_p <- ggbarplot(df, x = "dose", y = "len", add = "mean_sd", fill = "dose", palette = "jco",
                      ggtheme = hrbrthemes::theme_ipsum(base_family = "Roboto Condensed")) +
      stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) +
      labs(
        title = "Example of <span style='color:#D20F26'> ggpubr::ggbarplot with p.adj.signif</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>ggpubr::ggbarplot + rstatix::t_test</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Add P Values in ggbarplot
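
With three dose levels, the t-test is run for every pair of levels and the P values are then adjusted. A comparable computation in base R is sketched below with `pairwise.t.test()`. Treating it as matching the rstatix result assumes Welch tests (`pool.sd = FALSE`) and Holm adjustment, which I believe are the rstatix defaults but have not confirmed here.

```r
# Same data preparation as in the post
df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Pairwise Welch t-tests between dose levels, Holm-adjusted
# ASSUMPTION: pool.sd = FALSE and "holm" are chosen to resemble rstatix defaults
pw <- pairwise.t.test(df$len, df$dose,
                      pool.sd = FALSE,
                      p.adjust.method = "holm")

# Lower-triangular matrix of adjusted P values (rows vs. columns are dose levels)
pw$p.value
```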
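
See the code above for how to change the P value style. That covers drawing P values quickly with the R-ggpubr package, with help from the R-rstatix package. Next comes an even simpler way to draw P values.

R-ggsignif: adding P values

R-ggsignif is a third-party package dedicated to drawing P values, and it is fairly simple to use. Three small examples follow: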

「Sample one」:

    # geom_signif() comes from ggsignif, so load it explicitly
    library(ggsignif)

    ggplot(mpg, aes(class, hwy)) +
      geom_boxplot() +
      geom_signif(
        comparisons = list(c("2seater", "midsize"), c("minivan", "suv")),
        textsize = 6, map_signif_level = function(p) sprintf("P = %.2g", p)
      ) +
      ylim(NA, 48) +
      labs(
        title = "Example of <span style='color:#D20F26'>ggsignif::geom_signif function</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>geom_signif()</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      hrbrthemes::theme_ipsum(base_family = "Roboto Condensed") +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Add and Custom P Values in geom_signif()

Note that this call:

    geom_signif(
      comparisons = list(c("2seater", "midsize"), c("minivan", "suv")),
      textsize = 6, map_signif_level = function(p) sprintf("P = %.2g", p)
    )

adds the P values and customizes how they are displayed.
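
The function passed to map_signif_level simply turns a numeric P value into the text drawn on the plot. To see what it produces, you can call the same formatter on its own; the name `format_p` and the input values below are made up for illustration.

```r
# The same formatter used in the plot, pulled out as a standalone function
format_p <- function(p) sprintf("P = %.2g", p)

# Hypothetical P values (illustration only)
format_p(c(0.000123, 0.034, 1.5e-7))
# "P = 0.00012" "P = 0.034" "P = 1.5e-07"
```

The `%.2g` format keeps two significant digits and switches to exponent notation only for very small values.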

「Sample two」:

    ggplot(iris, aes(Species, Sepal.Width)) +
      geom_boxplot(aes(fill = Species)) +
      geom_signif(
        comparisons = list(c("versicolor", "setosa"), c("versicolor", "virginica")),
        textsize = 6, map_signif_level = function(p) sprintf("P = %.3g", p),
        y_position = c(4.5, 4.0)) +
      scale_fill_jco() +
      ylim(NA, 5) +
      labs(
        title = "Example of <span style='color:#D20F26'>ggsignif::geom_signif function</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>geom_signif()</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      hrbrthemes::theme_ipsum(base_family = "Roboto Condensed") +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

Figure: Add and Custom P Values in geom_signif()

The argument:

    y_position = c(4.5, 4.0)

sets exactly where each P value is drawn.
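
To sanity-check the numbers on such a plot, you can rerun one comparison by hand. The sketch below assumes that `geom_signif()` uses `wilcox.test` as its default test (my understanding of the ggsignif defaults, not something stated in this post) and also derives a bracket height from the data instead of hard-coding it. The 5% headroom is an arbitrary choice.

```r
# Sepal widths for one of the compared pairs
versicolor <- iris$Sepal.Width[iris$Species == "versicolor"]
setosa <- iris$Sepal.Width[iris$Species == "setosa"]

# ASSUMPTION: wilcox.test is the default test behind geom_signif().
# exact = FALSE avoids the warning about ties in the data.
p_val <- wilcox.test(versicolor, setosa, exact = FALSE)$p.value

# A bracket height slightly above the taller of the two groups
y_pos <- max(versicolor, setosa) * 1.05

sprintf("P = %.3g", p_val)
y_pos
```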

「Sample three」:

    # Build a small example data set
    test_data <- data.frame(
      Group = c("S1", "S1", "S2", "S2"),
      Sub = c("A", "B", "A", "B"),
      Value = c(2, 5, 7, 8)
    )

    # Draw the plot
    ggplot(test_data, aes(x = Group, y = Value)) +
      geom_bar(aes(fill = Sub), stat = "identity", position = "dodge", width = .6) +
      # Manually placed brackets and labels within each group
      geom_signif(
        y_position = c(5.3, 8.3), xmin = c(0.8, 1.8), xmax = c(1.2, 2.2),
        annotation = c("**", "NS")) +
      # Computed comparison between the two groups
      geom_signif(
        comparisons = list(c("S1", "S2")), size = .7,
        y_position = 9.3, vjust = 0.2) +
      scale_fill_grey() +
      labs(
        title = "Example of <span style='color:#D20F26'>ggsignif::geom_signif function</span>",
        subtitle = "processed charts with <span style='color:#1A73E8'>geom_signif() in geom_bar</span>",
        caption = "Visualization by <span style='color:#DD6449'>DataCharm</span>") +
      hrbrthemes::theme_ipsum(base_family = "Roboto Condensed") +
      theme(
        plot.title = element_markdown(hjust = 0.5, vjust = .5, color = "black",
                                      size = 20, margin = margin(t = 1, b = 12)),
        plot.subtitle = element_markdown(hjust = 0, vjust = .5, size = 15),
        plot.caption = element_markdown(face = 'bold', size = 12))

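Figure: Add P Values on geom_bar() in geom_signif()

The arguments:

    # first geom_signif(): manually placed brackets and labels
    y_position = c(5.3, 8.3), xmin = c(0.8, 1.8), xmax = c(1.2, 2.2),
    annotation = c("**", "NS")
    # second geom_signif(): comparison between the two groups
    comparisons = list(c("S1", "S2")), size = .7,
    y_position = 9.3, vjust = 0.2

set the P value labels and their style (line thickness, position, and so on).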
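
The xmin and xmax values are picked by eye so that each bracket spans the two dodged bars of one group. As a rough guide, the sketch below computes where the bar centers fall, assuming the dodge width equals the bar width (0.6) and is split evenly among the sub-groups. The helper `dodge_centers` is hypothetical, not a ggplot2 or ggsignif function.

```r
# Centers of dodged bars for one group on a discrete x axis.
# ASSUMPTION: the total dodge width is shared equally by n_sub bars.
dodge_centers <- function(group_index, n_sub, width) {
  bar_width <- width / n_sub
  group_index + (seq_len(n_sub) - (n_sub + 1) / 2) * bar_width
}

dodge_centers(1, n_sub = 2, width = 0.6)  # group S1
dodge_centers(2, n_sub = 2, width = 0.6)  # group S2
```

For group S1 this gives centers at 0.85 and 1.15, so the hand-picked 0.8 and 1.2 extend the bracket slightly past the bar centers.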

Doesn't drawing P values with the R-ggsignif package feel more convenient? For more settings and other usages, consult the ggsignif package website.