```r
> library(data.table)
> # Columns of different lengths are recycled to the length of the longest one
> # (recent data.table versions require the shorter lengths to divide it evenly)
> dt = data.table(v1 = c(1,2), v2 = LETTERS[1:3], v3 = round(rnorm(12,2,2)),
+                 v4 = sample(1:20,12)); dt
    v1 v2 v3 v4
 1:  1  A  3 20
 2:  2  B  0 10
 3:  1  C  1 11
 4:  2  A  2  9
 5:  1  B  2 16
 6:  2  C  2  7
 7:  1  A -1 17
 8:  2  B  2 12
 9:  1  C  1  3
10:  2  A  1 18
11:  1  B  2  4
12:  2  C  4 19
> dt[3:6] # select rows by position
   v1 v2 v3 v4
1:  1  C  1 11
2:  2  A  2  9
3:  1  B  2 16
4:  2  C  2  7
> dt[v2 == 'B']
   v1 v2 v3 v4
1:  2  B  0 10
2:  1  B  2 16
3:  2  B  2 12
4:  1  B  2  4
> dt[v2 == 'B',] # row selection: the result is the same with or without the comma
   v1 v2 v3 v4
1:  2  B  0 10
2:  1  B  2 16
3:  2  B  2 12
4:  1  B  2  4
> dt[v2 %in% c("A","B")] # rows whose v2 is A or B
   v1 v2 v3 v4
1:  1  A  3 20
2:  2  B  0 10
3:  2  A  2  9
4:  1  B  2 16
5:  1  A -1 17
6:  2  B  2 12
7:  2  A  1 18
8:  1  B  2  4
```
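Because rnorm() and sample() are called without a seed, rerunning the code above gives different numbers each time. The following is a minimal sketch of the same construction and row filters, assuming a fixed seed (the value 42 and the name dt_demo are arbitrary choices, not from the original notes).

```r
library(data.table)

set.seed(42)  # assumption: arbitrary seed, used only so the numbers repeat
dt_demo <- data.table(v1 = c(1, 2),                 # length 2, recycled 6 times
                      v2 = LETTERS[1:3],            # length 3, recycled 4 times
                      v3 = round(rnorm(12, 2, 2)),  # length 12
                      v4 = sample(1:20, 12))        # length 12

n_rows <- nrow(dt_demo)   # 12, the length of the longest column

# With or without the trailing comma, i filters rows the same way
same_result <- isTRUE(all.equal(dt_demo[v2 == "B"], dt_demo[v2 == "B", ]))

# %in% keeps rows whose v2 is any of the listed values
n_ab <- dt_demo[v2 %in% c("A", "B"), .N]   # 8: four A rows and four B rows
```

The row counts do not depend on the seed, since v2 is always A, B, C repeated four times; only v3 and v4 vary.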
Selecting columns: use column names

```r
> # Select columns by name. Older data.table versions did not accept numeric
> # positions in j; since 1.9.8, dt[, 2] also works.
> dt[,list(v1,v2)]
    v1 v2
 1:  1  A
 2:  2  B
 3:  1  C
 4:  2  A
 5:  1  B
 6:  2  C
 7:  1  A
 8:  2  B
 9:  1  C
10:  2  A
11:  1  B
12:  2  C
> dt[,v1]
[1] 1 2 1 2 1 2 1 2 1 2 1 2
> dt[,sum(v4)] # compute on a column while selecting it, e.g. sum or mean
[1] 146
> dt[,list(sum_v3 = sum(v3),mean_v4 = mean(v4))]
   sum_v3  mean_v4
1:     19 12.16667
> dt[,.(sum_v3 = sum(v3),mean_v4 = mean(v4))] # another way: .() is an alias for list()
   sum_v3  mean_v4
1:     19 12.16667
```
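The forms of j above differ in what they return. The sketch below illustrates this on a small made-up table (dt_demo and cols are hypothetical names); it assumes data.table 1.10.2 or later for the `..` prefix and 1.9.8 or later for numeric positions.

```r
library(data.table)

dt_demo <- data.table(v1 = c(1, 2), v2 = c("A", "B"), v3 = c(10, 20))

# A bare column name in j returns a plain vector
as_vector <- dt_demo[, v1]

# Wrapping the name in list() or .() returns a data.table, even for one column
as_table <- dt_demo[, .(v1)]

# Column names stored in a character vector: prefix the variable with ..
cols <- c("v1", "v3")          # hypothetical selection
by_name <- dt_demo[, ..cols]

# Numeric positions also select columns in recent versions
by_position <- dt_demo[, c(1, 3)]
```

Using .() is the usual choice when the result feeds into further data.table operations, since it stays a data.table.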
```r
> dt1 = dt[,list(v5 = v4 + 1,v6 = v3 +1)];dt1
    v5 v6
 1: 21  4
 2: 11  1
 3: 12  2
 4: 10  3
 5: 17  3
 6:  8  3
 7: 18  0
 8: 13  3
 9:  4  2
10: 19  2
11:  5  3
12: 20  5
> dt[,list(print(v2),plot(1:12,v3,col = 'red'))]
[1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
    V1
 1:  A
 2:  B
 3:  C
 4:  A
 5:  B
 6:  C
 7:  A
 8:  B
 9:  C
10:  A
11:  B
12:  C
> dt[,{print(v2);plot(1:12,v3,col = 'red')}]
[1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
NULL
```

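[Figure: image.png, the plot of v3 against 1:12 drawn in red]
Both calls draw the same plot, but the second one also prints NULL, which puzzled me at first. The likely reason is that plot() returns NULL invisibly: inside { } the value of j is the value of its last expression, so the whole call returns NULL, whereas inside list() the NULL element is dropped and only the value returned by print(v2) remains as column V1.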
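The NULL printed by the brace form can be reproduced without a graphics device. This is a minimal sketch, assuming a hypothetical helper side_effect() that stands in for plot() by doing some work and returning NULL invisibly; dt_demo is likewise made up for the illustration.

```r
library(data.table)

dt_demo <- data.table(v2 = c("A", "B", "C"), v3 = c(3, 0, 1))

# Hypothetical stand-in for plot(): a side effect, then an invisible NULL
side_effect <- function(x) {
  message("drawing ", length(x), " points")
  invisible(NULL)
}

# Braces: j evaluates to its last expression, which is NULL here
res_braces <- dt_demo[, {print(v2); side_effect(v3)}]
is.null(res_braces)   # TRUE

# list(): the value returned by print(v2) becomes a column of the result
res_list <- dt_demo[, list(print(v2), side_effect(v3))]
is.data.table(res_list)   # TRUE
```

Assigning the result, as done here, is one way to keep the NULL from being printed at the console.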

```r
> dt[,list(sum_v3 = sum(v3),mean_v4 = mean(v4)),by = v2] # aggregate within each v2 group
   v2 sum_v3 mean_v4
1:  A      5    16.0
2:  B      6    10.5
3:  C      8    10.0
> dt[,list(sum_v3 = sum(v3),mean_v4 = mean(v4)),by = list(v2,v1)]
   v2 v1 sum_v3 mean_v4
1:  A  1      2    18.5
2:  B  2      2    11.0
3:  C  1      2     7.0
4:  A  2      3    13.5
5:  B  1      4    10.0
6:  C  2      6    13.0
> dt[1:6,list(sum_v3 = sum(v3),mean_v4 = mean(v4)),by = v2] # rows 1 to 6 only
   v2 sum_v3 mean_v4
1:  A      5    14.5
2:  B      2    13.0
3:  C      3     9.0
> dt[,.N,by =list(v1,v2)] # .N counts the rows in each group
   v1 v2 N
1:  1  A 2
2:  2  B 2
3:  1  C 2
4:  2  A 2
5:  1  B 2
6:  2  C 2
```
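In the grouped results above, the groups appear in order of first appearance rather than sorted. The sketch below shows this on a small made-up table (dt_demo is a hypothetical name) and contrasts by with keyby, which sorts the result by the grouping columns.

```r
library(data.table)

dt_demo <- data.table(v1 = c(1, 2, 1, 2, 1, 2),
                      v2 = c("B", "A", "B", "A", "A", "B"),
                      v4 = c(10, 20, 30, 40, 50, 60))

# by keeps the groups in order of first appearance: B comes before A
by_res <- dt_demo[, .(n = .N, mean_v4 = mean(v4)), by = v2]

# keyby sorts the result by the grouping column: A comes before B
keyby_res <- dt_demo[, .(n = .N, mean_v4 = mean(v4)), keyby = v2]

# Several grouping columns: .() is shorthand for list()
counts <- dt_demo[, .N, by = .(v1, v2)]
```

An expression in i, as in dt[1:6, ..., by = v2], is applied first, so only the selected rows are grouped.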

Adding columns: the special := operator

```r
> # Add a column with the special := operator (dt is modified in place)
> dt[,v5 := v4+1];head(dt)
   v1 v2 v3 v4 v5 v6
1:  1  A  3 20 21 21
2:  2  B  0 10 11 11
3:  1  C  1 11 12 12
4:  2  A  2  9 10 10
5:  1  B  2 16 17 17
6:  2  C  2  7  8  8
> # The v6 column shown above was presumably left over from an earlier run;
> # on a fresh dt this command adds only v5
> # Add two columns at once (v5 is overwritten)
> dt[,c("v5","v6") := list(v3 +1,v4+1)] ;head(dt)
   v1 v2 v3 v4 v5 v6
1:  1  A  3 20  4 21
2:  2  B  0 10  1 11
3:  1  C  1 11  2 12
4:  2  A  2  9  3 10
5:  1  B  2 16  3 17
6:  2  C  2  7  3  8
```
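Since := modifies the table by reference, any other name bound to the same object sees the change as well. A minimal sketch of this, using a small made-up table (dt_demo, alias and snapshot are hypothetical names) and copy() to take an independent copy first:

```r
library(data.table)

dt_demo <- data.table(v3 = c(3, 0, 1), v4 = c(20, 10, 11))

alias <- dt_demo            # a second name for the same object, not a copy
snapshot <- copy(dt_demo)   # an independent copy

dt_demo[, v5 := v4 + 1]                            # add one column
dt_demo[, c("v5", "v6") := list(v3 + 1, v4 + 1)]   # overwrite v5 and add v6

names(alias)      # includes v5 and v6: alias and dt_demo are the same table
names(snapshot)   # "v3" "v4": the copy is unaffected
```

Taking a copy() before experimenting with := is a simple way to keep the original table intact.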

setkey: setting a key column

```r
> setkey(dt,v2) # sorts dt by v2 in place and marks v2 as the key; this changes dt itself, so use with care
> dt[c("A","B")] # with the key set, values in i are matched against v2
   v1 v2 v3 v4 v5 v6
1:  1  A  3 20  4 21
2:  2  A  2  9  3 10
3:  1  A -1 17  0 18
4:  2  A  1 18  2 19
5:  2  B  0 10  1 11
6:  1  B  2 16  3 17
7:  2  B  2 12  3 13
8:  1  B  2  4  3  5
> # nomatch
> dt[c("A","D"),nomatch = 0] # unmatched values are dropped instead of returned as NA rows
   v1 v2 v3 v4 v5 v6
1:  1  A  3 20  4 21
2:  2  A  2  9  3 10
3:  1  A -1 17  0 18
4:  2  A  1 18  2 19
> dt[c("A","D")]
   v1 v2 v3 v4 v5 v6
1:  1  A  3 20  4 21
2:  2  A  2  9  3 10
3:  1  A -1 17  0 18
4:  2  A  1 18  2 19
5: NA  D NA NA NA NA
```
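The effect of setkey() on the table itself can be checked with key(). The sketch below uses a small made-up table (dt_demo is a hypothetical name) and also shows the on= argument, which performs the same lookup without setting a key. It assumes data.table 1.12.0 or later for nomatch = NULL; nomatch = 0, as used above, behaves the same way.

```r
library(data.table)

dt_demo <- data.table(v2 = c("C", "A", "B", "A"), v4 = c(7, 20, 10, 9))

# Ad hoc lookup with on=: no key is set and the row order of dt_demo is kept
unkeyed_hit <- dt_demo[.(c("A", "D")), on = "v2", nomatch = NULL]
key_before <- key(dt_demo)     # NULL: the table is still unkeyed

setkey(dt_demo, v2)            # sorts dt_demo by v2 by reference
key_after <- key(dt_demo)      # "v2"
order_after <- dt_demo$v2      # "A" "A" "B" "C"

keyed_hit <- dt_demo[c("A", "D"), nomatch = NULL]   # the unmatched D is dropped
keyed_na  <- dt_demo[c("A", "D")]                   # default: D appears as an NA row
```

Using on= is a reasonable choice when the original row order matters, since setkey() reorders the table.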

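Chaining with [][]

Still needs more study.

```r
> dt[,list(sum_v4 = sum(v4)),by = v2][sum_v4 >20] # the second [] filters the result of the first
   v2 sum_v4
1:  A     47
2:  B     37
> # Note: these sums do not match the dt printed earlier, whose v4 sums by v2 are
> # 64, 42 and 40; this output evidently comes from a different random draw of dt
```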
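Chaining applies the second [] to the data.table returned by the first. A minimal sketch on a small made-up table (dt_demo is a hypothetical name), comparing the chained form with the equivalent two-step version:

```r
library(data.table)

dt_demo <- data.table(v2 = c("A", "B", "C", "A", "B", "C"),
                      v4 = c(20, 10, 3, 9, 16, 7))

# Two steps with an intermediate table
sums <- dt_demo[, .(sum_v4 = sum(v4)), by = v2]   # A 29, B 26, C 10
two_step <- sums[sum_v4 > 20]

# The same computation chained: the second [] sees the column sum_v4
chained <- dt_demo[, .(sum_v4 = sum(v4)), by = v2][sum_v4 > 20]

# Chains can continue, here sorting by descending sum
sorted <- dt_demo[, .(sum_v4 = sum(v4)), by = v2][sum_v4 > 20][order(-sum_v4)]
```

The filter has to go in a second [] because sum_v4 does not exist until the first [] has been evaluated.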