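Drop, replace and fill missing values in data.frame
Description
A set of tools for handling missing values in data.frames. They can drop, replace, or fill (with the next or previous observation) missing values, or delete entries according to their missing values.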
Usage
drop_na_dt(.data, ...)
replace_na_dt(.data, ..., to)
delete_na_cols(.data, prop = NULL, n = NULL)
delete_na_rows(.data, prop = NULL, n = NULL)
fill_na_dt(.data, ..., direction = "down")
shift_fill(x, direction = "down")
Arguments
Argument | Description |
---|---|
.data | A data.frame. |
... | Columns to be replaced or filled. If not specified, all columns are used. |
to | The value that NAs should be replaced with. |
prop | Columns (or rows) whose proportion of NAs is larger than or equal to "prop" are deleted. |
n | Columns (or rows) whose number of NAs is larger than or equal to "n" are deleted. |
direction | Direction in which to fill missing values. Currently either "down" (the default) or "up". |
x | A vector with missing values to be filled. |
Details
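drop_na_dt drops the entries that have NAs in the specified columns.
fill_na_dt fills NAs with the previous ("down") or the next ("up") observation. These are also known as last observation carried forward (LOCF) and next observation carried backward (NOCB).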
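To make the two directions concrete, here is a minimal base-R sketch of LOCF and NOCB on a plain vector. The helper name locf is hypothetical and is not part of tidyfst. It illustrates the general technique only and makes no claim about how fill_na_dt is implemented internally.

```r
# Hypothetical helper (not part of tidyfst): base-R sketch of LOCF / NOCB.
locf <- function(v, direction = "down") {
  # "up" (NOCB) is LOCF applied to the reversed vector, then reversed back
  if (direction == "up") return(rev(locf(rev(v), "down")))
  idx <- cumsum(!is.na(v))   # position of the most recent non-NA value so far
  idx[idx == 0] <- NA        # leading NAs have nothing to carry forward
  v[!is.na(v)][idx]
}

v <- c(NA, 1, NA, NA, 2, NA)
locf(v, "down")  # NA  1  1  1  2  2
locf(v, "up")    #  1  1  2  2  2 NA
```

In this sketch, an NA with no observation on the filling side (a leading NA for "down", a trailing NA for "up") stays NA.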
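delete_na_cols deletes the columns whose proportion of NAs is larger than or equal to "prop", or whose number of NAs is larger than or equal to "n". delete_na_rows works the same way, but operates on rows.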
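The "prop" and "n" rules can be pictured with plain base R. The sketch below is an illustration under the "larger than or equal to" rule stated above, not tidyfst's own code. The variable names na_prop and na_n are introduced here for illustration only.

```r
# Base-R sketch (not tidyfst's implementation) of the "prop" and "n" rules.
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5), z = rep(NA, 4))

na_prop <- colMeans(is.na(x))  # proportion of NAs per column: x 0.25, y 0.50, z 1.00
na_n    <- colSums(is.na(x))   # number of NAs per column:     x 1,    y 2,    z 4

# Keep the columns that stay below the threshold; the others are deleted.
x[, na_prop < 0.5, drop = FALSE]  # keeps only column x
x[, na_n < 2, drop = FALSE]       # keeps only column x

# The same idea applied to rows: keep rows with less than 60% NAs.
x[rowMeans(is.na(x)) < 0.6, , drop = FALSE]  # keeps only row 4
```

Because the comparison is inclusive, a column with exactly half of its values missing is removed when the threshold is 0.5.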
shift_fill fills the missing values in a vector, in the "down" or "up" direction.
Examples
library(tidyfst)
df <- data.table(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% drop_na_dt()
df %>% drop_na_dt(x)
df %>% drop_na_dt(y)
df %>% drop_na_dt(x,y)
df %>% replace_na_dt(to = 0)
df %>% replace_na_dt(x,to = 0)
df %>% replace_na_dt(y,to = 0)
df %>% replace_na_dt(x,y,to = 0)
df %>% fill_na_dt(x)
df %>% fill_na_dt() # not specified, fill all columns
df %>% fill_na_dt(y,direction = "up")
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5),z = rep(NA,4))
x
x %>% delete_na_cols() # delete the columns whose values are all NA
x %>% delete_na_cols(prop = 0.75)
x %>% delete_na_cols(prop = 0.5)
x %>% delete_na_cols(prop = 0.24)
x %>% delete_na_cols(n = 2)
x %>% delete_na_rows(prop = 0.6)
x %>% delete_na_rows(n = 2)
# shift_fill
y = c("a",NA,"b",NA,"c")
shift_fill(y) # equivalent to shift_fill(y,"down")
shift_fill(y,"up")