purrr 包之 list 处理系列函数

purrr 包之 list 处理系列函数
1、filter
2、summarise
3、transform

purrr 包还包含了一系列的 list 处理函数，主要包括 filter、summarise、transform 3 个类型。

上次介绍了 purrr 包的类似于 apply 家族的 map 系列，这次介绍这个包对 list 进行筛选的一系列函数。

1、filter

filter 系列有 6 个子函数来对 list 来进行筛选：

purrr 包之 list 处理系列函数 - 图1

pluck：

pluck 函数根据 list 的索引值，名称或属性来筛选结果：
- 示例：

library(purrr)
# 新建list
obj1 <- list("a", list(1, elt = "foo"))
obj2 <- list("b", list(2, elt = "bar"))
x <- list(obj1, obj2)
# 根据索引值取元素，类似于[[]]符号
# x[[1]]
pluck(x, 1)
[[1]]
[1] "a"
[[2]]
[[2]][[1]]
[1] 1
[[2]]$elt
[1] "foo"
# 取第一个列表的第二个元素
# x[[1]][[2]]
pluck(x, 1, 2)
[[1]]
[1] 1
$elt
[1] "foo"
# 根据第一个列表的第二个元素的名称取值
# x[[1]][[2]][["elt"]]
pluck(x, 1, 2, "elt")
[1] "foo"
# 如果元素不存在则返回null
pluck(x, 10)
NULL

keep：

keep 函数保存通过逻辑判断的 list 元素：
- 示例：

# 生成10组含有5个数的随机数，取出均值大于6的list
rep(10, 10) %>%
        map(sample, 5) %>%
        keep(function(x) mean(x) > 6)
[[1]]
[1]  6 10  5  8  2
[[2]]
[1] 5 3 9 7 8
[[3]]
[1] 10  5  3  7  6
# 筛选出NA
x <- list(1,2,3,NA,5,NA)
x %>% keep(is.na)
[[1]]
[1] NA
[[2]]
[1] NA

discard：

discard 函数保存没有通过逻辑判断的 list 元素：
- 示例：

# 筛选出不是NA的值
x <- list(1,2,3,NA,5,NA)
x %>% discard(is.na)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 5

compact：

mpact 函数丢弃为 null 的元素：
- 示例：

list(a = "a", b = NULL, c = integer(0), d = NA, e = list()) %>%
        compact()
$a
[1] "a"
$d
[1] NA

head/tail_while：

head/tail_while 函数返回没有通过条件判断之前或之后的元素：
- 示例：

# 输出直到大于0或小于0的元素
pos <- function(x) x >= 0
head_while(5:-5, pos)
[1] 5 4 3 2 1 0
tail_while(5:-5, negate(pos))
[1] -1 -2 -3 -4 -5

2、summarise

summarise 也有 6 个函数对 list 元素进行判断和筛选：

purrr 包之 list 处理系列函数 - 图2

every：

every 函数对 list 每个元素进行判断，返回逻辑值，所有元素都满足条件则返回 TRUE，否则返回 FALSE:
- 示例：

y <- list(0:10, 5.5)
y %>% every(is.numeric)
[1] TRUE
y <- list(0:10, 5.5,'test')
y %>% every(is.numeric)
[1] FALSE

some：

some 函数判断是否存在元素通过条件，如果存在则返回 TRUE，否则返回 FALSE:
- 示例：

# 判断是否存在整形
y <- list(0:10, 5.5,'test')
y %>% some(is.integer)
[1] TRUE
# 判断是否存在NA
y %>% some(is.na)
[1] FALSE

has_element：

has_element 函数判断是否存在某个元素，如果存在则返回 TRUE，否则返回 FALSE:
- 示例：

# 判断是否存在1：10
x <- list(1:10, 5, 9.9)
x %>% has_element(1:10)
[1] TRUE
# 判断是否存在3
x %>% has_element(3)
[1] FALSE

detect/detect_index：

detect/detect_index 函数寻找第一个满足条件的元素并返回其值（前者）或索引值（后者）
- 示例：

# 判断第一个能整除2的元素
is_even <- function(x) x %% 2 == 0
3:10 %>% detect(is_even)
[1] 4
# 返回索引值
3:10 %>% detect_index(is_even)
[1] 2
# 从后向前寻找
3:10 %>% detect(is_even, .dir = "backward")
[1] 10
# 索引值
3:10 %>% detect_index(is_even, .dir = "backward")
[1] 8

vec_depth：

vec_depth 函数返回索引级别的数量：
- 示例：

x <- list(
        list(),
        list(list()),
        list(list(list(1)))
)
vec_depth(x)
[1] 5

3、transform

transform 有 4 个子函数对 list 进行转换：

purrr 包之 list 处理系列函数 - 图3

modify：

modify 函数对 list 每个元素应用函数：map, map_chr, map_dbl, map_dfc, map_dfr, map_int, map_lgl 等：

modify_at：

modify_at 函数对指定位置或指定名称的元素应用函数：

modify_if：

modify_if 函数对通过条件判断的元素应用函数：

modify_depth：

modify_depth 函数对 list 特定等级的元素应用函数：
- 综合示例：

# 把factor转化为character
iris %>%
        modify_if(is.factor, as.character) %>%
        str()
'data.frame':    150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
 # 对指定列进行转换为character
 mtcars %>% modify_at(c(1, 4, 5), as.character) %>% str()
 'data.frame':    32 obs. of  11 variables:
 $ mpg : chr  "21" "21" "22.8" "21.4" ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : chr  "110" "110" "93" "110" ...
 $ drat: chr  "3.9" "3.9" "3.85" "3.08" ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
 # 用指定列名来转换为character
 mtcars %>% modify_at(c("cyl", "am"), as.character) %>% str()
 'data.frame':    32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : chr  "6" "6" "4" "6" ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : chr  "1" "1" "1" "0" ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
 # 构建多层嵌套list，pbj为第一层，pro为第二层，param为第三层
l1 <- list(
        obj1 = list(
                prop1 = list(param1 = 1:2, param2 = 3:4),
                prop2 = list(param1 = 5:6, param2 = 7:8)
        ),
        obj2 = list(
                prop1 = list(param1 = 9:10, param2 = 11:12),
                prop2 = list(param1 = 12:14, param2 = 15:17)
        )
)
# 对第三层元素进行加和
l1 %>% modify_depth(3, sum) %>% str()
List of 2
 $ obj1:List of 2
  ..$ prop1:List of 2
  .. ..$ param1: int 3
  .. ..$ param2: int 7
  ..$ prop2:List of 2
  .. ..$ param1: int 11
  .. ..$ param2: int 15
 $ obj2:List of 2
  ..$ prop1:List of 2
  .. ..$ param1: int 19
  .. ..$ param2: int 23
  ..$ prop2:List of 2
  .. ..$ param1: int 39
  .. ..$ param2: int 48
# 用名字取对应的值
l1 %>% modify(c("prop1", "param2")) %>% str()
List of 2
 $ obj1: int [1:2] 3 4
 $ obj2: int [1:2] 11 12
# 对第二层的list用指定名称取值
l1 %>% modify_depth(2, "param2") %>% str()
List of 2
 $ obj1:List of 2
  ..$ prop1: int [1:2] 3 4
  ..$ prop2: int [1:2] 7 8
 $ obj2:List of 2
  ..$ prop1: int [1:2] 11 12
  ..$ prop2: int [1:3] 15 16 17
# 对第二层级的每个元素结合pmap函数
l1 %>% modify_depth(2, ~ pmap(., paste, sep = " / ")) %>% str()
List of 2
 $ obj1:List of 2
  ..$ prop1:List of 2
  .. ..$ : chr "1 / 3"
  .. ..$ : chr "2 / 4"
  ..$ prop2:List of 2
  .. ..$ : chr "5 / 7"
  .. ..$ : chr "6 / 8"
 $ obj2:List of 2
  ..$ prop1:List of 2
  .. ..$ : chr "9 / 11"
  .. ..$ : chr "10 / 12"
  ..$ prop2:List of 3
  .. ..$ : chr "12 / 15"
  .. ..$ : chr "13 / 16"
  .. ..$ : chr "14 / 17"