You Need to Level Up Your R Skills (the plyr Package)

- Introduction
- Usage

Introduction

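The plyr package was developed by Hadley Wickham. Its core idea can be described as follows:

Working with data often takes three steps: (1) split the data into pieces, (2) apply a function to each piece, (3) combine the results. plyr does all three in a single call, which makes code quicker to write and shorter to read. It can be seen as a next-generation upgrade of the apply family of functions.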
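To make the three steps concrete, here is a minimal base R sketch that does them by hand, without plyr. The choice of the built-in mtcars data set and the variable names (pieces, results, by_hand) are my own, for illustration only.

```r
data("mtcars")

# 1. Split: one data frame per distinct value of cyl
pieces <- split(mtcars, mtcars$cyl)

# 2. Apply: compute a one-row summary for each piece
results <- lapply(pieces, function(df) {
  data.frame(cyl = df$cyl[1], mean = mean(df$mpg))
})

# 3. Combine: bind the per-piece summaries into one data frame
by_hand <- do.call(rbind, results)
rownames(by_hand) <- NULL
by_hand
```

A single plyr call is meant to replace all three of these steps.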

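The main functions follow a regular naming pattern, XYply:

X is the input data type: a (array), d (data frame), l (list), and so on.
Y is the output data type: a (array), d (data frame), l (list), or _. The underscore means the output is discarded.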

The full set is shown in the figure:

[Figure 2: plyr functions by input and output type]
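The XYply pattern can also be spelled out programmatically. The sketch below builds the grid of names for the three input types and four output types listed above; the variable names are illustrative, and the final check only runs if plyr happens to be installed.

```r
# Input types (X) and output types (Y) from the XYply naming pattern
inputs  <- c(a = "array", d = "data frame", l = "list")
outputs <- c(a = "array", d = "data frame", l = "list", `_` = "nothing")

# Build every X/Y combination: rows are inputs, columns are outputs
grid <- outer(names(inputs), names(outputs),
              function(x, y) paste0(x, y, "ply"))
dimnames(grid) <- list(input = unname(inputs), output = unname(outputs))
print(grid)

# Optional check: each generated name should be an exported plyr function
if (requireNamespace("plyr", quietly = TRUE)) {
  print(all(grid %in% getNamespaceExports("plyr")))
}
```

Reading the grid by row and column gives the function to reach for: a data frame in and a list out is dlply, a list in and nothing out is l_ply.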

Usage

As a baseline, here is a tidyverse example that computes the mean of mpg grouped by cyl:

library(tidyverse)
data("mtcars")
# mean of mpg, grouped by cyl
mtcars %>% group_by(cyl) %>%
  summarise(mean = mean(mpg))
# A tibble: 3 x 2
    cyl  mean
  <dbl> <dbl>
1     4  26.7
2     6  19.7
3     8  15.1

The same result can be obtained with ddply. Its signature is:

Usage
ddply(
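  .data, # input data: a data frame
  .variables, # variables to split by: quoted variables, a formula, or a character vector
  .fun = NULL, # function to apply to each piece
  ..., # other arguments passed on to .fun
  .progress = "none", # name of the progress bar to use
  .inform = FALSE, # produce informative error messages (slows processing)
  .drop = TRUE, # drop combinations of variables that do not appear in the data
  .parallel = FALSE, # if TRUE, apply in parallel using a foreach backend
  .paropts = NULL # list of additional options passed to foreach when .parallel = TRUE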
)

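In use, a single call does the job:

library(plyr) # not attached by library(tidyverse); attaching it after dplyr masks summarise() and others
ddply(mtcars,.(cyl),summarise,mean = mean(mpg),sum = sum(disp))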
  cyl     mean    sum
1   4 26.66364 1156.5
2   6 19.74286 1283.2
3   8 15.10000 4943.4
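The .variables argument accepts several forms. The sketch below checks that the quoted form .(cyl), a character vector, and a one-sided formula give the same result. The helper mean_mpg is a hypothetical name introduced here; it is used instead of summarise so the example does not depend on which package's summarise is currently attached.

```r
library(plyr)
data("mtcars")

# Hypothetical helper: one-row summary for a piece of the data frame
mean_mpg <- function(df) data.frame(mean = mean(df$mpg))

by_quoted  <- ddply(mtcars, .(cyl), mean_mpg)  # quoted variable
by_string  <- ddply(mtcars, "cyl", mean_mpg)   # character vector
by_formula <- ddply(mtcars, ~ cyl, mean_mpg)   # one-sided formula

# All three calls split on the same variable
all.equal(by_quoted, by_string)
all.equal(by_quoted, by_formula)
```

The character form is convenient when the grouping column name is held in a variable.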

Returning a list:

# return a list
dlply(mtcars,.(cyl),summarise,mean = mean(mpg))
$`4`
      mean
1 26.66364
$`6`
      mean
1 19.74286
$`8`
  mean
1 15.1
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  cyl
1   4
2   6
3   8

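Discarding the output:

# discard the output
d_ply(mtcars,.(cyl),summarise,mean = mean(mpg))
# nothing is printed; d_ply is useful when .fun is called only for its side effects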

Showing progress:

# show a text progress bar
ddply(mtcars,.(cyl),summarise,mean = mean(mpg),
      .progress = 'text')
  |==================================================================| 100%
  cyl     mean
1   4 26.66364
2   6 19.74286
3   8 15.10000

List input:

lst <- list(a = 1:3,b = 2:4,c = 1:5)
# return a data frame
ldply(lst,sum)
  .id V1
1   a  6
2   b  9
3   c 15
# return a list
llply(lst,sum)
$a
[1] 6
$b
[1] 9
$c
[1] 15

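The a*ply() functions are distinguished by their .margins argument, which works much like MARGIN in apply. For a 2-dimensional array, .margins can be 1, 2, or c(1, 2), which split by row, by column, and by individual element respectively. Here is the signature of one of them: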

adply(
  .data, # matrix, array, or data frame
  .margins, # dimension(s) to split over, as in apply
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .parallel = FALSE,
  .paropts = NULL,
  .id = NA
)

Example:

# build an array (1:10 is recycled to fill the 16 cells)
da <- array(1:10, c(4,4))
da
     [,1] [,2] [,3] [,4]
[1,]    1    5    9    3
[2,]    2    6   10    4
[3,]    3    7    1    5
[4,]    4    8    2    6

Row sums:

# sum of each row
adply(da,.margins = 1,sum)
  X1 V1
1  1 18
2  2 22
3  3 16
4  4 20

Column means:

# mean of each column
adply(da,.margins = 2,mean)
  X2  V1
1  1 2.5
2  2 6.5
3  3 5.5
4  4 4.5

Operating on each element:

# multiply each element by 10
adply(da,.margins = c(1,2),.fun = function(x){x*10})
   X1 X2  V1
1   1  1  10
2   2  1  20
3   3  1  30
4   4  1  40
5   1  2  50
6   2  2  60
7   3  2  70
8   4  2  80
9   1  3  90
10  2  3 100
11  3  3  10
12  4  3  20
13  1  4  30
14  2  4  40
15  3  4  50
16  4  4  60
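Because .margins mirrors the MARGIN argument of apply, the array results can be cross-checked in base R. This is a small sketch on the same array; the names row_sums and col_means are my own.

```r
da <- array(1:10, c(4, 4))  # same array as above; 1:10 is recycled

row_sums  <- apply(da, 1, sum)   # MARGIN = 1: one result per row
col_means <- apply(da, 2, mean)  # MARGIN = 2: one result per column

row_sums
col_means
```

The difference is in the shape of the result: apply returns a bare vector, while adply returns a data frame with the index of each slice in its own column.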

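Combining these functions with your own custom functions makes them even quicker and more convenient to use. The rest is left for you to explore.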
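As one possible starting point, here is a sketch of ddply with a custom function. The helper mpg_range and its column names are hypothetical, chosen only for illustration; any function that takes a data frame piece and returns a data frame should work the same way.

```r
library(plyr)
data("mtcars")

# Hypothetical helper: summarise one piece of the data frame
mpg_range <- function(df) {
  data.frame(n = nrow(df), min_mpg = min(df$mpg), max_mpg = max(df$mpg))
}

# Split mtcars by cyl, call mpg_range on each piece, and bind the results
res <- ddply(mtcars, .(cyl), mpg_range)
res
```

The grouping column cyl is added to the result automatically, so the helper only needs to return the summary columns.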