impute_dt


    Description

    Impute the missing values in the columns of a data.frame with their mean, median or mode.

    Usage

    impute_dt(.data, ..., .func = "mode")

    Arguments

    .data A data.frame
    ... Columns to select
    .func Character, "mode" (default), "mean" or "median". A user-defined function can also be supplied.

    Examples

    library(tidyfst)  # assumed source package for impute_dt() and the %>% pipe

    # Small Titanic-style data set with missing values in every column
    Pclass <- c(3, 1, 3, 1, 3, 2, 2, 3, NA, NA)
    Sex <- c('male', 'male', 'female', 'female', 'female',
             'female', NA, 'male', 'female', NA)
    Age <- c(22, 38, 26, 35, NA,
             45, 25, 39, 28, 40)
    SibSp <- c(0, 1, 3, 1, 2, 3, 2, 2, NA, 0)
    Fare <- c(7.25, 71.3, 7.92, NA, 8.05, 8.46, 51.9, 60, 32, 15)
    Embarked <- c('S', NA, 'S', 'Q', 'Q', 'S', 'C', 'S', 'C', 'S')
    data <- data.frame('Pclass' = Pclass,
                       'Sex' = Sex, 'Age' = Age, 'SibSp' = SibSp,
                       'Fare' = Fare, 'Embarked' = Embarked)
    data

    data %>% impute_dt() # default uses "mode" as `.func`
    data %>% impute_dt(is.numeric, .func = "mean")
    data %>% impute_dt(is.numeric, .func = "median")

    # User-defined imputation: fill NAs with half the observed range
    my_fun = function(x){
      x[is.na(x)] = (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))/2
      x
    }
    data %>% impute_dt(is.numeric, .func = my_fun)
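
    For intuition about what the "mode" option does to a single column, the following is a minimal base-R sketch. The helper name impute_mode is hypothetical and is not part of the package; the tie-breaking rule (the first value to appear wins) is also an assumption and may differ from what impute_dt() actually does.

```r
# Hypothetical helper (not the package's implementation):
# replace NAs in a vector with its most frequent observed value.
impute_mode <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) == 0) return(x)  # nothing to impute from

  # Count occurrences per distinct value while keeping the original type
  uniq <- unique(observed)
  counts <- tabulate(match(observed, uniq))

  # which.max() returns the first maximum, so ties go to the
  # value that appears first in the vector (an assumed rule)
  x[is.na(x)] <- uniq[which.max(counts)]
  x
}

impute_mode(c(3, 1, 3, 1, 3, 2, 2, 3, NA, NA))
# 3 1 3 1 3 2 2 3 3 3
```

    Because the helper takes a vector and returns a vector of the same length, it has the same shape as my_fun above, so a function like it could presumably be passed as `.func` in the same way.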