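Basic functions

The five core functions
- Pick rows (observations): filter()
- Reorder rows: arrange()
- Pick columns (variables): select()
- Create columns: mutate()
- Collapse values into a summary: summarize()
What these functions have in common:
- All of them can be combined with group_by(), which splits the data into groups
- They all work in a similar way:
  - First: specify the input data frame
  - Second: say what to do with it
  - Third: the output is always a new data frame (meaning the original data is not changed)
- A standard set of comparison operators: >, >=, <, <=, != (not equal), and == (equal), which combine with Boolean logic: & is "and", | is "or", and ! is "not".
A classmate has already introduced these basic functions, so I won't repeat the details; let's get a feel for them through examples.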
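
The shared pattern described above can be sketched on the built-in iris data. This is only an illustrative sketch: the variable names (big_setosa, not_setosa, by_species, widest) are made up here and do not come from the original notes.

```
library(dplyr)

# First argument: the input data frame. Remaining arguments: what to do.
# & combines two conditions with "and".
big_setosa <- filter(iris, Species == "setosa" & Sepal.Length > 5.5)

# The result is a new data frame; iris itself is not modified.
nrow(iris)        # still 150

# ! negates a condition.
not_setosa <- filter(iris, !(Species == "setosa"))

# With group_by(), the same verb works within each group:
# here max() is computed per species, not over the whole table.
by_species <- group_by(iris, Species)
widest <- filter(by_species, Sepal.Width == max(Sepal.Width))
```

Assigning the result to a new name is what keeps it; calling filter() without assignment only prints the new data frame.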

1. filter(): select (filter) rows that satisfy given conditions
```
############# filter()
library(dplyr)

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# Keep all rows where Sepal.Width is at least 3
filter(iris, Sepal.Width >= 3)

# Keep all rows where Sepal.Width is at least 3 or Petal.Length equals 1.4
filter(iris, Sepal.Width >= 3 | Petal.Length == 1.4)
# First rows of the output:
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa

# filter() can also be used to drop missing values
df <- tibble(x = c(1, NA, 3, 4))
filter(df, !is.na(x))
#> # A tibble: 3 × 1
#>       x
#> 1     1
#> 2     3
#> 3     4
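
# Sketch (not part of the original notes): how filter() treats NA in a condition.
# Rows where the condition evaluates to NA are dropped, so NA rows have to be
# asked for explicitly. The names df_na, above_two and na_or_above_two are
# made up for this illustration.
library(dplyr)
df_na <- tibble(x = c(1, NA, 3, 4))
above_two <- filter(df_na, x > 2)                    # keeps 3 and 4; the NA row is dropped
na_or_above_two <- filter(df_na, is.na(x) | x > 2)   # keeps NA, 3 and 4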

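# Extra: %in%
# X %in% Y returns TRUE for every element of X that appears in Y (FALSE otherwise),
# so inside filter() it keeps the rows whose value is one of the values in Y.
filter(iris, Species %in% c("setosa"))
# First rows of the output as recorded (iris printed as a tibble):
#> # A tibble: 50 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa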
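
# Sketch (not part of the original notes): %in% on its own, then with several
# values inside filter(). The names in_result and two_species are made up here.
library(dplyr)
in_result <- c("a", "b", "c") %in% c("a", "c")    # TRUE FALSE TRUE
# Equivalent to Species == "setosa" | Species == "virginica"
two_species <- filter(iris, Species %in% c("setosa", "virginica"))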


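# 2. arrange(): like filter(), it operates on rows, but instead of selecting
#    rows it only reorders them. The order is ascending by default, and NA
#    values always sort last. Wrap a column in desc() for descending order.

############# arrange()

arrange(iris, Species, Sepal.Length)
# First rows of the output:
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          4.3         3.0          1.1         0.1  setosa
#> 2          4.4         2.9          1.4         0.2  setosa
#> 3          4.4         3.0          1.3         0.2  setosa
#> 4          4.4         3.2          1.3         0.2  setosa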

# 3. select(): pick columns
#    Feature: helper functions let you pick columns by name pattern, mostly
#    without having to write regular expressions:
#    - starts_with("abc") matches names that begin with "abc".
#    - ends_with("xyz") matches names that end with "xyz".
#    - contains("ijk") matches names that contain "ijk".
#    - matches("(.)\\1") selects variables that match a regular expression.
#    - num_range("x", 1:3) matches x1, x2, and x3.

select(iris, Sepal.Width)
# First rows of the output:
#>   Sepal.Width
#> 1         3.5
#> 2         3.0
#> 3         3.2
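
# Sketch (not part of the original notes): the select() helpers listed above,
# applied to iris. df_x is a made-up table, needed only because iris has no
# numbered columns; the result names are also made up.
library(dplyr)
sepal_cols <- select(iris, starts_with("Sepal"))   # Sepal.Length, Sepal.Width
width_cols <- select(iris, ends_with("Width"))     # Sepal.Width, Petal.Width
petal_cols <- select(iris, contains("etal"))       # Petal.Length, Petal.Width
s_cols <- select(iris, matches("^S"))              # Sepal.Length, Sepal.Width, Species
df_x <- tibble(x1 = 1, x2 = 2, x3 = 3, y = 4)
x_cols <- select(df_x, num_range("x", 1:2))        # x1, x2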

# 4. mutate(): add new columns

mutate(iris, total_length = Sepal.Length + Sepal.Width)
# First rows of the output as recorded (iris printed as a tibble):
#> # A tibble: 150 × 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_length
#> 1          5.1         3.5          1.4         0.2 setosa           8.6
#> 2          4.9         3            1.4         0.2 setosa           7.9
#> 3          4.7         3.2          1.3         0.2 setosa           7.9
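
# Sketch (not part of the original notes): a column created in mutate() can be
# used by later arguments of the same call, and iris itself stays unchanged.
# The names with_share and petal_share are made up for this illustration.
library(dplyr)
with_share <- mutate(
  iris,
  total_length = Sepal.Length + Sepal.Width,
  petal_share  = Petal.Length / total_length   # uses the column created just above
)
ncol(iris)         # still 5
ncol(with_share)   # 7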


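# 5. summarize(): collapse many values into a single summary

summarise(iris, mean_sepal = mean(Sepal.Width))
# Output as recorded (iris printed as a tibble):
#> # A tibble: 1 × 1
#>   mean_sepal
#> 1       3.06

# Combined with group_by(), the summary is computed once per group
summarise(group_by(iris, Species), mean_sepal = mean(Sepal.Width))
#> # A tibble: 3 × 2
#>   Species    mean_sepal
#> 1 setosa           3.43
#> 2 versicolor       2.77
#> 3 virginica        2.97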
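
# Sketch (not part of the original notes): several summaries per group in one
# call. n() counts the rows in each group. The names by_species, per_species,
# n_rows and max_petal are made up for this illustration.
library(dplyr)
by_species <- group_by(iris, Species)
per_species <- summarise(
  by_species,
  n_rows     = n(),                 # rows per species
  mean_sepal = mean(Sepal.Width),   # same summary as above
  max_petal  = max(Petal.Length)
)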

```

2021.12.2

Study lead, day 4: summary of dplyr functions
