Introduction

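Data normalization and standardization are both forms of scaling, usually referred to as Normalization or Standardization. This post records R implementations of several scaling methods. More posts are available at https://zouhua.top/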

Scaling implementations in R

  • Median scale normalization
  • Robust scale normalization
  • Unit scale normalization
  • z-scale normalization
  • Min-Max normalization
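
For reference, the five methods can be summarized by the formulas below, written to follow the R code in this post. The notation is an assumption introduced for this summary: x = (x_1, ..., x_n) is a feature vector, v = (v_1, ..., v_n) is a sample vector, Q_1 and Q_3 are the first and third quartiles of x, and the mean and standard deviation with subscript R are computed only over the values strictly between Q_1 and Q_3. The constant 1.4826 is the default scaling factor of R's `mad()`.

```latex
\begin{aligned}
\text{Median scale:}\quad & x_i' = \frac{x_i - \operatorname{median}(x)}{\operatorname{MAD}(x)},
  \qquad \operatorname{MAD}(x) = 1.4826 \cdot \operatorname{median}_j \bigl\lvert x_j - \operatorname{median}(x) \bigr\rvert \\
\text{Robust scale:}\quad & x_i' = \frac{x_i - \bar{x}_R}{s_R},
  \qquad R = \{\, x_j : Q_1 < x_j < Q_3 \,\} \\
\text{Unit scale:}\quad & v_i' = \frac{v_i}{\sqrt{\sum_{j=1}^{n} v_j^2}} \\
\text{z-score:}\quad & x_i' = \frac{x_i - \bar{x}}{s} \\
\text{Min-Max:}\quad & x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}
\end{aligned}
```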
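    # Method 1: Median scale normalization
    MDA_fun <- function(features){
      # features: one feature vector x = (x1, x2, ..., xn)
      value <- as.numeric(features)
      # mad() returns the median absolute deviation (scaled by 1.4826 by default)
      d_mad <- mad(value)
      x_scale <- (value - median(value))/d_mad
      return(x_scale)
    }
    # dat is not defined in this post; it is presumably a numeric table with features in rows
    # apply() over rows returns one column per row of dat, so the result is samples x features
    dat_s1_MDA <- apply(dat, 1, MDA_fun)
    rownames(dat_s1_MDA) <- colnames(dat)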
    # Method 2: Robust scale normalization
    Robust_fun <- function(features){
      # features: one feature vector x = (x1, x2, ..., xn)
      value <- as.numeric(features)
      # quantile() returns the 0%, 25%, 50%, 75% and 100% quantiles by default
      q_value <- as.numeric(quantile(value))
      # keep only the values strictly between the first and third quartiles
      remain_value <- value[value > q_value[2] & value < q_value[4]]
      mean_value <- mean(remain_value)
      sd_value <- sd(remain_value)
      x_scale <- (value - mean_value)/sd_value
      return(x_scale)
    }
    # Method 3: Unit scale normalization
    Unit_fun <- function(samples){
      # samples: one sample vector v = (v1, v2, ..., vn)
      value <- as.numeric(samples)
      # divide by the Euclidean length of the vector
      x_scale <- value / sqrt(sum(value^2))
      return(x_scale)
    }
    # Method 4: z-scale normalization
    Zscore_fun <- function(features){
      # features: one feature vector x = (x1, x2, ..., xn)
      value <- as.numeric(features)
      mean_value <- mean(value)
      sd_value <- sd(value)
      x_scale <- (value - mean_value)/sd_value
      return(x_scale)
    }
    # Method 5: Min-Max normalization
    Min_Max_fun <- function(features){
      # features: one feature vector x = (x1, x2, ..., xn)
      value <- as.numeric(features)
      min_value <- min(value)
      max_value <- max(value)
      # maps the smallest value to 0 and the largest to 1
      x_scale <- (value - min_value)/(max_value - min_value)
      return(x_scale)
    }
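
As a quick sanity check of the functions above, the sketch below applies compact re-definitions of two of them to a small made-up vector and compares the results with base R. The vector `x` and the shortened function bodies are illustrative assumptions, not data or code taken from this post.

```r
# Toy feature vector (made-up values, for illustration only)
x <- c(2, 4, 4, 6, 9)

# Compact versions of the z-score and Min-Max functions
Zscore_fun <- function(features) {
  value <- as.numeric(features)
  (value - mean(value)) / sd(value)
}
Min_Max_fun <- function(features) {
  value <- as.numeric(features)
  (value - min(value)) / (max(value) - min(value))
}

z <- Zscore_fun(x)
m <- Min_Max_fun(x)

# The z-scores agree with base R's scale(), which centers by the mean
# and divides by the standard deviation by default
print(all.equal(z, as.numeric(scale(x))))

# Min-Max values fall in the interval [0, 1]
print(range(m))

# mad() multiplies the raw median absolute deviation by 1.4826 by default
print(all.equal(mad(x), 1.4826 * median(abs(x - median(x)))))
```

Note that all of these functions divide by a spread statistic, so a constant vector (zero standard deviation, zero range, or zero MAD) produces `NaN` or `Inf` values and may need separate handling.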

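Methods 1, 2, 4 and 5 all scale by subtracting one statistic and then dividing by another, whereas method 3 divides a vector by its own length (its Euclidean norm). The former are better suited to row vectors (features) and the latter to column vectors (samples), although this is not a hard rule.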
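
The row/column distinction can be sketched as follows. The matrix `dat` here is a made-up example with features in rows and samples in columns; that layout is an assumption, since the post does not show its data. One detail worth knowing is that `apply(dat, 1, f)` returns each row's result as a column, so this sketch transposes the output with `t()` to keep the original layout.

```r
# Made-up matrix: 3 features (rows) x 4 samples (columns)
dat <- matrix(c( 1,  2,  3,  4,
                10, 20, 30, 50,
                 5,  5,  6,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("feature", 1:3),
                              paste0("sample", 1:4)))

Zscore_fun <- function(features) {
  value <- as.numeric(features)
  (value - mean(value)) / sd(value)
}
Unit_fun <- function(samples) {
  value <- as.numeric(samples)
  value / sqrt(sum(value^2))
}

# Feature-wise scaling: process each row, then transpose back
dat_z <- t(apply(dat, 1, Zscore_fun))
colnames(dat_z) <- colnames(dat)      # as.numeric() dropped the sample names

# Sample-wise scaling: scale each column to unit length
dat_unit <- apply(dat, 2, Unit_fun)
rownames(dat_unit) <- rownames(dat)   # as.numeric() dropped the feature names

print(round(rowMeans(dat_z), 10))     # approximately 0 for every feature
print(colSums(dat_unit^2))            # 1 for every sample
```

Both results keep the features x samples shape of `dat`, which makes it easy to compare them side by side.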

References

  1. Data Normalization With R
  2. Median Absolute Deviation

If any of the referenced articles raises a copyright concern, please contact me. Thank you.