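Basic functions
- The five basic functions
- Select rows (observations): filter()
- Reorder rows: arrange()
- Select columns (variables): select()
- Create columns: mutate()
- Collapse values into a summary: summarize()
- Common features:
- All of the functions above can be combined with group_by() to operate on grouped data
- The functions all work in a similar way:
- First: specify the input data frame
- Second: specify what to do with it
- Third: the output is always a new data frame (meaning the original data is not modified)
- A standard set of comparison operators: >, >=, <, <=, != (not equal), and == (equal), which combine with Boolean logic: & is "and", | is "or", and ! is "not".
- Other classmates have already introduced these basic functions, so I will not repeat the details; let's get a feel for them through examples
- filter(): selects (filters) rows that meet specific conditions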
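Before the full walkthrough, here is a minimal sketch of the points listed above: comparison and Boolean operators inside filter(), group_by() combined with a verb other than summarize(), and the fact that the input data frame is left untouched. It assumes the dplyr package is installed; the variable names `wide`, `wide_or_short`, `not_setosa`, and `widest` are made up for illustration.

```r
library(dplyr)

# Comma-separated conditions in filter() are combined with "and"
wide <- filter(iris, Sepal.Width >= 3, Species == "setosa")

# "or" needs an explicit |
wide_or_short <- filter(iris, Sepal.Width >= 3 | Petal.Length == 1.4)

# "not" with !
not_setosa <- filter(iris, !(Species == "setosa"))

# group_by() also works with filter(): the condition is evaluated per group,
# so this keeps the row(s) with the largest Sepal.Width within each species
widest <- filter(group_by(iris, Species), Sepal.Width == max(Sepal.Width))

# Each call returned a new data frame; iris itself is unchanged
nrow(iris)        # 150
nrow(not_setosa)  # 100
```

Assigning the result to a new name is what keeps it around; calling a verb without assignment only prints the result.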
```
############# filter function
library(dplyr)  # also provides tibble()

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# Keep all rows where Sepal.Width is at least 3
filter(iris, Sepal.Width >= 3)

# Keep all rows where Sepal.Width is at least 3 or Petal.Length equals 1.4
filter(iris, Sepal.Width >= 3 | Petal.Length == 1.4)
# first rows of the output:
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa

# filter() can also be used to drop missing values
df <- tibble(x = c(1, NA, 3, 4))
filter(df, !is.na(x))
#> # A tibble: 3 × 1
#>       x
#> 1     1
#> 2     3
#> 3     4

# Additional note: %in%
# X %in% Y returns a logical vector that is TRUE wherever an element of X appears in Y
filter(iris, Species %in% c("setosa"))
#> # A tibble: 50 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2 setosa
#> 2          4.9         3            1.4         0.2 setosa
#> 3          4.7         3.2          1.3         0.2 setosa

# 2. arrange(): like filter(), it operates on rows, but instead of selecting rows it only sorts them.
#    The default order is ascending, and NA values are always sorted last. Use desc() for descending order.
############# arrange function
arrange(iris, Species, Sepal.Length)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          4.3         3.0          1.1         0.1  setosa
#> 2          4.4         2.9          1.4         0.2  setosa
#> 3          4.4         3.0          1.3         0.2  setosa
#> 4          4.4         3.2          1.3         0.2  setosa

# 3. select(): selects columns
#    Feature: helper functions pick columns by name pattern, mostly without a regular expression:
#    - starts_with("abc") matches names that begin with "abc".
#    - ends_with("xyz") matches names that end with "xyz".
#    - contains("ijk") matches names that contain "ijk".
#    - matches("(.)\\1") selects variables that match a regular expression.
#    - num_range("x", 1:3) matches x1, x2, and x3.
select(iris, Sepal.Width)
#>   Sepal.Width
#> 1         3.5
#> 2         3.0
#> 3         3.2

# 4. mutate(): adds new columns
mutate(iris, total_length = Sepal.Length + Sepal.Width)
#> # A tibble: 150 × 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species total_length
#> 1          5.1         3.5          1.4         0.2 setosa           8.6
#> 2          4.9         3            1.4         0.2 setosa           7.9
#> 3          4.7         3.2          1.3         0.2 setosa           7.9
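
# 5. summarize(): collapses values into a summary (summarise() is the same function)
summarise(iris, mean_sepal = mean(Sepal.Width))
#> # A tibble: 1 × 1
#>   mean_sepal
#> 1       3.06

# Combined with group_by(): one summary row per species
summarise(group_by(iris, Species), mean_sepal = mean(Sepal.Width))
#> # A tibble: 3 × 2
#>   Species    mean_sepal
#> 1 setosa           3.43
#> 2 versicolor       2.77
#> 3 virginica        2.97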
```
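The walkthrough above mentions desc(), the NA-last rule, and the select() helpers without running them. The following is a minimal sketch of those three points, assuming dplyr is installed; the toy tibble `df` and the result names (`asc`, `dsc`, `sepal_cols`, `width_cols`, `petal_cols`) are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with one missing value
df <- tibble(x = c(2, NA, 3, 1))

# Ascending by default; NA is placed last
asc <- arrange(df, x)$x        # 1 2 3 NA

# Descending with desc(); NA is still placed last
dsc <- arrange(df, desc(x))$x  # 3 2 1 NA

# select() helpers pick columns by name pattern
sepal_cols <- names(select(iris, starts_with("Sepal")))  # "Sepal.Length" "Sepal.Width"
width_cols <- names(select(iris, ends_with("Width")))    # "Sepal.Width"  "Petal.Width"
petal_cols <- names(select(iris, contains("etal")))      # "Petal.Length" "Petal.Width"
```

To place NA first instead, one option is to sort on `!is.na(x)` before `x`, since arrange() accepts several sort keys.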
2021.12.2
