impute_dt
Description
Impute the missing values in the columns of a data.frame with the column's mean, median or mode.
Usage
impute_dt(.data, ..., .func = "mode")
Arguments
Argument | Description |
---|---|
.data | A data.frame |
... | Columns to select |
.func | Character: "mode" (default), "mean" or "median". A user-defined function can also be supplied. |
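
To make the three built-in choices concrete, here is a minimal base R sketch of filling the missing values of a single vector. The helper `impute_vec` is hypothetical and is not part of tidyfst; in particular, how the package breaks ties for the mode is an assumption here (this sketch takes the first value to reach the highest count).

```r
# Hypothetical helper (not part of tidyfst): fill the NAs of one vector.
impute_vec <- function(x, how = c("mode", "mean", "median")) {
  how <- match.arg(how)
  observed <- x[!is.na(x)]
  fill <- switch(how,
    mode = {
      # Most frequent observed value; ties go to the value seen first.
      uniq <- unique(observed)
      uniq[which.max(tabulate(match(observed, uniq)))]
    },
    mean = mean(observed),
    median = median(observed)
  )
  x[is.na(x)] <- fill
  x
}

impute_vec(c(3, 1, 3, NA), "mode")        # NA becomes 3
impute_vec(c(1, 2, NA, 6), "mean")        # NA becomes 3
impute_vec(c("S", NA, "S", "Q"), "mode")  # NA becomes "S"
```

The mode works for any column type, while the mean and median only make sense for numeric columns, which is likely why the examples select columns with `is.numeric` when using them.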
Examples
library(tidyfst)

# Small Titanic-style data set with missing values in every column
Pclass <- c(3, 1, 3, 1, 3, 2, 2, 3, NA, NA)
Sex <- c('male', 'male', 'female', 'female', 'female',
         'female', NA, 'male', 'female', NA)
Age <- c(22, 38, 26, 35, NA,
         45, 25, 39, 28, 40)
SibSp <- c(0, 1, 3, 1, 2, 3, 2, 2, NA, 0)
Fare <- c(7.25, 71.3, 7.92, NA, 8.05, 8.46, 51.9, 60, 32, 15)
Embarked <- c('S', NA, 'S', 'Q', 'Q', 'S', 'C', 'S', 'C', 'S')
data <- data.frame('Pclass' = Pclass,
                   'Sex' = Sex, 'Age' = Age, 'SibSp' = SibSp,
                   'Fare' = Fare, 'Embarked' = Embarked)
data
data %>% impute_dt() # default uses "mode" as `.func`
data %>% impute_dt(is.numeric, .func = "mean")
data %>% impute_dt(is.numeric, .func = "median")
# User-defined imputation: fill NAs with half of the observed range
my_fun = function(x){
  x[is.na(x)] = (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))/2
  x
}
data %>% impute_dt(is.numeric, .func = my_fun)
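
Note that `my_fun` fills with half of the range, `(max - min)/2`, which is not the midpoint of the observed values. If the midpoint is what you want, a variant could look like the sketch below. The name `midrange_fill` is hypothetical and only illustrates the shape of a custom function: it takes a vector and returns it with the NAs replaced.

```r
# Hypothetical custom function: fill NAs with the midrange (min + max) / 2.
midrange_fill <- function(x) {
  rng <- range(x, na.rm = TRUE)  # c(min, max) of the observed values
  x[is.na(x)] <- sum(rng) / 2
  x
}

Age <- c(22, 38, 26, 35, NA, 45, 25, 39, 28, 40)
midrange_fill(Age)[5]  # 33.5, whereas my_fun above would give 11.5
```

Following the call pattern in the examples, it would be passed as `.func = midrange_fill`.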