tidyfst package - Missing value handling - R language

Dump, replace and fill missing values in data.frame

Description

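A set of tools for dealing with missing values in data.frames. They can dump (drop), replace, or fill missing values (with the next or previous observation), or delete entries according to their missing values.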

Usage

drop_na_dt(.data, ...)

replace_na_dt(.data, ..., to)

delete_na_cols(.data, prop = NULL, n = NULL)

delete_na_rows(.data, prop = NULL, n = NULL)

fill_na_dt(.data, ..., direction = "down")

shift_fill(x, direction = "down")

Arguments

.data	data.frame
...	Columns to be replaced or filled. If not specified, all columns are used.
to	The value that NAs should be replaced with.
prop	If the proportion of NAs is larger than or equal to "prop", the column or row is deleted.
n	If the number of NAs is larger than or equal to "n", the column or row is deleted.
direction	Direction in which to fill missing values. Currently either "down" (the default) or "up".
x	A vector with missing values to be filled.

Details

drop_na_dt drops the entries (rows) that have NAs in the specified columns.
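For intuition, the row-dropping behavior can be approximated in base R. This is only a sketch of the semantics using `complete.cases` and `is.na`, not tidyfst's own implementation, and the names `toy`, `all_cols`, and `x_only` are illustrative.

```r
# Toy data with one NA in each column (illustrative only).
toy <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"))

# Roughly what dropping on all columns means: keep rows with no NA anywhere.
all_cols <- toy[complete.cases(toy), , drop = FALSE]

# Roughly what dropping on column x means: keep rows where x is not NA.
x_only <- toy[!is.na(toy$x), , drop = FALSE]

print(all_cols)
print(x_only)
```

Restricting the check to one column keeps rows that are incomplete elsewhere, which is why the second result retains the row whose `y` is missing.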

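fill_na_dt fills NAs with the observation before them ("down") or after them ("up"). These are also known as last observation carried forward (LOCF) and next observation carried backward (NOCB).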
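The two fill directions can be illustrated with a small base R sketch. The helpers `locf` and `nocb` below are hypothetical names written for this illustration. They are not tidyfst functions and do not reflect how the package implements filling.

```r
# Hypothetical helper: last observation carried forward ("down").
locf <- function(x) {
  idx <- cumsum(!is.na(x))      # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]  # leading NAs have no predecessor and stay NA
}

# Hypothetical helper: next observation carried backward ("up").
nocb <- function(x) rev(locf(rev(x)))

v <- c("a", NA, "b", NA, "c")
locf(v)
nocb(v)
```

Reversing the vector turns the "up" case into the "down" case, so a single routine covers both directions.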

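delete_na_cols drops the columns whose NA proportion is larger than or equal to "prop", or whose NA count is larger than or equal to "n". delete_na_rows works the same way but operates on rows.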
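The threshold rule can be sketched in base R with `colMeans` and `rowSums` over `is.na`. This is an illustration of the "larger than or equal to" comparison only, under the assumption that it mirrors the rule described above. It is not tidyfst's code, and `toy`, `na_prop`, `na_count`, `kept_cols`, and `kept_rows` are illustrative names.

```r
toy <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

# Proportion of NAs per column: x = 0.25, y = 0.5, z = 1.
na_prop <- colMeans(is.na(toy))

# With prop = 0.5, columns at or above the threshold (y and z) are removed.
kept_cols <- toy[, na_prop < 0.5, drop = FALSE]

# Number of NAs per row: 2, 2, 2, 1.
na_count <- rowSums(is.na(toy))

# With n = 2, rows at or above the threshold (the first three) are removed.
kept_rows <- toy[na_count < 2, , drop = FALSE]

print(kept_cols)
print(kept_rows)
```

Because the comparison is inclusive, a column with exactly half of its values missing is removed at prop = 0.5.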

shift_fill fills the missing values in a vector.

Examples

library(tidyfst)
df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))
# drop_na_dt
df %>% drop_na_dt()
df %>% drop_na_dt(x)
df %>% drop_na_dt(y)
df %>% drop_na_dt(x, y)
# replace_na_dt
df %>% replace_na_dt(to = 0)
df %>% replace_na_dt(x, to = 0)
df %>% replace_na_dt(y, to = 0)
df %>% replace_na_dt(x, y, to = 0)
# fill_na_dt
df %>% fill_na_dt(x)
df %>% fill_na_dt() # not specified, fill all columns
df %>% fill_na_dt(y, direction = "up")
# delete_na_cols and delete_na_rows
# NA proportion by column: x = 0.25, y = 0.5, z = 1
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))
x
x %>% delete_na_cols() # delete the columns whose values are all missing
x %>% delete_na_cols(prop = 0.75)
x %>% delete_na_cols(prop = 0.5)
x %>% delete_na_cols(prop = 0.24)
x %>% delete_na_cols(n = 2)
x %>% delete_na_rows(prop = 0.6)
x %>% delete_na_rows(n = 2)
# shift_fill
y <- c("a", NA, "b", NA, "c")
shift_fill(y) # same as shift_fill(y, "down")
shift_fill(y, "up")