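Sources of data frames

Data frames generally come from four sources: created from scratch in R; derived by converting or processing existing data; read from a file; or taken from a built-in dataset.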

Creating and reading data frames

To create a data frame, use the data.frame() function. Each argument becomes one column, and all columns must have the same length.

  > df <- data.frame(gene = paste0("gene", 1:4),
  +                  change = rep(c("up", "down"), each = 2),
  +                  score = c(5, 3, -2, -4))
  > df
     gene change score
  1 gene1     up     5
  2 gene2     up     3
  3 gene3   down    -2
  4 gene4   down    -4

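To read a data frame from a file, place the CSV file inside the Project folder and use the read.csv() function. A relative path such as "gene.csv" is resolved against the working directory.

  > df2 <- read.csv("gene.csv")
  > df2
     gene change score
  1 gene1     up     5
  2 gene2     up     3
  3 gene3   down    -2
  4 gene4   down    -4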
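
The file gene.csv itself is not included here, so the following is only a sketch of a round trip that can be run anywhere: it writes the example data frame to a temporary CSV file and reads it back. The variable csv_path and the use of tempfile() are assumptions made for illustration, not part of the original workflow.

```r
# Build the same data frame as in the example above.
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

# Write it to a temporary CSV file (a hypothetical stand-in for gene.csv).
# row.names = FALSE keeps the row names out of the file.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Read it back; the first line of the file supplies the column names.
df2 <- read.csv(csv_path)
df2
```

Writing with row.names = FALSE matters here: without it, the row names would be saved as an extra unnamed column and come back as a column called X.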

Data frame attributes

Check the number of rows and columns, and the row and column names.

  > df <- data.frame(gene = paste0("gene", 1:4),
  +                  change = rep(c("up", "down"), each = 2),
  +                  score = c(5, 3, -2, -4))
  > df
     gene change score
  1 gene1     up     5
  2 gene2     up     3
  3 gene3   down    -2
  4 gene4   down    -4
  > dim(df)        # number of rows, then number of columns
  [1] 4 3
  > nrow(df)
  [1] 4
  > ncol(df)
  [1] 3
  > rownames(df)
  [1] "1" "2" "3" "4"
  > colnames(df)
  [1] "gene"   "change" "score"

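Subsetting data frames

Subsetting by position

  > df[2, 2]            # row 2, column 2
  [1] "up"
  > df[2, ]             # row 2, all columns
     gene change score
  2 gene2     up     3
  > df[, 2]             # all rows, column 2 (returned as a vector)
  [1] "up"   "up"   "down" "down"
  > df[c(1, 3), 1:2]    # rows 1 and 3, columns 1 to 2
     gene change
  1 gene1     up
  3 gene3   down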
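
In the example above, selecting a single column with df[, 2] returns a plain vector rather than a data frame. When a one-column data frame is needed instead, the drop = FALSE argument of the `[` operator is one way to get it. A minimal sketch (the names col_vec and col_df are illustrative):

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

# Default behaviour: a single column is simplified to a vector.
col_vec <- df[, 2]

# drop = FALSE keeps the result as a data frame with one column.
col_df <- df[, 2, drop = FALSE]

is.data.frame(col_vec)  # FALSE
is.data.frame(col_df)   # TRUE
```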

Subsetting by name

  > df[, "gene"]
  [1] "gene1" "gene2" "gene3" "gene4"
  > df[, c("gene", "change")]
     gene change
  1 gene1     up
  2 gene2     up
  3 gene3   down
  4 gene4   down

Subsetting by condition (logical values)

  > df[df$score > 0, ]    # keep the rows where the condition is TRUE
     gene change score
  1 gene1     up     5
  2 gene2     up     3
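Taking the last column (or everything except the last column)

  > df[, 3]
  [1]  5  3 -2 -4
  > df[, ncol(df)]     # the last column, whatever the number of columns
  [1]  5  3 -2 -4
  > df[, -ncol(df)]    # a negative index drops that column
     gene change
  1 gene1     up
  2 gene2     up
  3 gene3   down
  4 gene4   down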

Selecting the genes with score > 0

  > df[df$score > 0, 1]
  [1] "gene1" "gene2"
  > df$gene[df$score > 0]
  [1] "gene1" "gene2"
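
The same logical-index pattern extends to several conditions at once. The sketch below combines two conditions with & and also shows base R's subset() as an alternative spelling; the threshold of 3 and the names up_high and pos_genes are chosen only for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

# Both conditions must hold: score above 3 AND change equal to "up".
up_high <- df[df$score > 3 & df$change == "up", ]
up_high

# subset() evaluates the condition inside the data frame,
# so columns can be written without the df$ prefix.
pos_genes <- subset(df, score > 0, select = gene)
pos_genes
```

Note that subset() with select returns a data frame, whereas df$gene[df$score > 0] returns a vector.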

Editing data frames

Changing a single cell

  > df <- data.frame(gene = paste0("gene", 1:4),
  +                  change = rep(c("up", "down"), each = 2),
  +                  score = c(5, 3, -2, -4))
  > df
     gene change score
  1 gene1     up     5
  2 gene2     up     3
  3 gene3   down    -2
  4 gene4   down    -4
  > df[3, 3] <- 5
  > df
     gene change score
  1 gene1     up     5
  2 gene2     up     3
  3 gene3   down     5
  4 gene4   down    -4

Changing a whole column

  > df$score <- c(12, 23, 50, 2)
  > df
     gene change score
  1 gene1     up    12
  2 gene2     up    23
  3 gene3   down    50
  4 gene4   down     2

Adding a new column

Assigning to a column name that does not exist yet creates the column.

  > df$p.value <- c(0.01, 0.02, 0.07, 0.05)
  > df
     gene change score p.value
  1 gene1     up    12    0.01
  2 gene2     up    23    0.02
  3 gene3   down    50    0.07
  4 gene4   down     2    0.05
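
A new column does not have to be typed in by hand; it can also be computed from existing columns. The sketch below derives two columns from p.value. The 0.05 cutoff and the column names significant and label are assumptions made for this example only.

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 score = c(12, 23, 50, 2),
                 p.value = c(0.01, 0.02, 0.07, 0.05))

# A logical column: TRUE where p.value is strictly below the cutoff.
df$significant <- df$p.value < 0.05

# A character column built element by element from the logical one.
df$label <- ifelse(df$significant, "sig", "ns")
df
```

Because the comparison is strict, the row with p.value equal to 0.05 is marked FALSE.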

Changing row names and column names

  > rownames(df) <- c("r1", "r2", "r3", "r4")
  > df
      gene change score p.value
  r1 gene1     up    12    0.01
  r2 gene2     up    23    0.02
  r3 gene3   down    50    0.07
  r4 gene4   down     2    0.05

Changing the name of a single row or column

  > colnames(df)[2] <- "CHANGE"
  > df
      gene CHANGE score p.value
  r1 gene1     up    12    0.01
  r2 gene2     up    23    0.02
  r3 gene3   down    50    0.07
  r4 gene4   down     2    0.05
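
Renaming by position, as above, silently changes the wrong column if the column order ever shifts. One possible alternative is to look the column up by its current name. A minimal sketch, which also renames a single row in the same way (the new names are illustrative):

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 CHANGE = rep(c("up", "down"), each = 2))

# Find the column by its current name instead of its position.
colnames(df)[colnames(df) == "CHANGE"] <- "change"

# The same indexing idea works for a single row name.
rownames(df)[2] <- "second"
df
```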

More on data frames

For a data frame with many rows, view only the first or last few rows

  > iris
      Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
  1            5.1         3.5          1.4         0.2     setosa
  2            4.9         3.0          1.4         0.2     setosa
  3            4.7         3.2          1.3         0.2     setosa
  4            4.6         3.1          1.5         0.2     setosa
  5            5.0         3.6          1.4         0.2     setosa
  6            5.4         3.9          1.7         0.4     setosa
  7            4.6         3.4          1.4         0.3     setosa
  8            5.0         3.4          1.5         0.2     setosa
  9            4.4         2.9          1.4         0.2     setosa
  10           4.9         3.1          1.5         0.1     setosa
  11           5.4         3.7          1.5         0.2     setosa
  12           4.8         3.4          1.6         0.2     setosa
  13           4.8         3.0          1.4         0.1     setosa
  14           4.3         3.0          1.1         0.1     setosa
  15           5.8         4.0          1.2         0.2     setosa
  16           5.7         4.4          1.5         0.4     setosa
  17           5.4         3.9          1.3         0.4     setosa
  18           5.1         3.5          1.4         0.3     setosa
  19           5.7         3.8          1.7         0.3     setosa
  20           5.1         3.8          1.5         0.3     setosa
  21           5.4         3.4          1.7         0.2     setosa
  22           5.1         3.7          1.5         0.4     setosa
  23           4.6         3.6          1.0         0.2     setosa
  24           5.1         3.3          1.7         0.5     setosa
  25           4.8         3.4          1.9         0.2     setosa
  26           5.0         3.0          1.6         0.2     setosa
  27           5.0         3.4          1.6         0.4     setosa
  28           5.2         3.5          1.5         0.2     setosa
  29           5.2         3.4          1.4         0.2     setosa
  30           4.7         3.2          1.6         0.2     setosa
  31           4.8         3.1          1.6         0.2     setosa
  32           5.4         3.4          1.5         0.4     setosa
  33           5.2         4.1          1.5         0.1     setosa
  34           5.5         4.2          1.4         0.2     setosa
  35           4.9         3.1          1.5         0.2     setosa
  36           5.0         3.2          1.2         0.2     setosa
  37           5.5         3.5          1.3         0.2     setosa
  38           4.9         3.6          1.4         0.1     setosa
  39           4.4         3.0          1.3         0.2     setosa
  40           5.1         3.4          1.5         0.2     setosa
  41           5.0         3.5          1.3         0.3     setosa
  42           4.5         2.3          1.3         0.3     setosa
  43           4.4         3.2          1.3         0.2     setosa
  44           5.0         3.5          1.6         0.6     setosa
  45           5.1         3.8          1.9         0.4     setosa
  46           4.8         3.0          1.4         0.3     setosa
  47           5.1         3.8          1.6         0.2     setosa
  48           4.6         3.2          1.4         0.2     setosa
  49           5.3         3.7          1.5         0.2     setosa
  50           5.0         3.3          1.4         0.2     setosa
  51           7.0         3.2          4.7         1.4 versicolor
  52           6.4         3.2          4.5         1.5 versicolor
  53           6.9         3.1          4.9         1.5 versicolor
  54           5.5         2.3          4.0         1.3 versicolor
  55           6.5         2.8          4.6         1.5 versicolor
  56           5.7         2.8          4.5         1.3 versicolor
  57           6.3         3.3          4.7         1.6 versicolor
  58           4.9         2.4          3.3         1.0 versicolor
  59           6.6         2.9          4.6         1.3 versicolor
  60           5.2         2.7          3.9         1.4 versicolor
  61           5.0         2.0          3.5         1.0 versicolor
  62           5.9         3.0          4.2         1.5 versicolor
  63           6.0         2.2          4.0         1.0 versicolor
  64           6.1         2.9          4.7         1.4 versicolor
  65           5.6         2.9          3.6         1.3 versicolor
  66           6.7         3.1          4.4         1.4 versicolor
  67           5.6         3.0          4.5         1.5 versicolor
  68           5.8         2.7          4.1         1.0 versicolor
  69           6.2         2.2          4.5         1.5 versicolor
  70           5.6         2.5          3.9         1.1 versicolor
  71           5.9         3.2          4.8         1.8 versicolor
  72           6.1         2.8          4.0         1.3 versicolor
  73           6.3         2.5          4.9         1.5 versicolor
  74           6.1         2.8          4.7         1.2 versicolor
  75           6.4         2.9          4.3         1.3 versicolor
  76           6.6         3.0          4.4         1.4 versicolor
  77           6.8         2.8          4.8         1.4 versicolor
  78           6.7         3.0          5.0         1.7 versicolor
  79           6.0         2.9          4.5         1.5 versicolor
  80           5.7         2.6          3.5         1.0 versicolor
  81           5.5         2.4          3.8         1.1 versicolor
  82           5.5         2.4          3.7         1.0 versicolor
  83           5.8         2.7          3.9         1.2 versicolor
  84           6.0         2.7          5.1         1.6 versicolor
  85           5.4         3.0          4.5         1.5 versicolor
  86           6.0         3.4          4.5         1.6 versicolor
  87           6.7         3.1          4.7         1.5 versicolor
  88           6.3         2.3          4.4         1.3 versicolor
  89           5.6         3.0          4.1         1.3 versicolor
  90           5.5         2.5          4.0         1.3 versicolor
  91           5.5         2.6          4.4         1.2 versicolor
  92           6.1         3.0          4.6         1.4 versicolor
  93           5.8         2.6          4.0         1.2 versicolor
  94           5.0         2.3          3.3         1.0 versicolor
  95           5.6         2.7          4.2         1.3 versicolor
  96           5.7         3.0          4.2         1.2 versicolor
  97           5.7         2.9          4.2         1.3 versicolor
  98           6.2         2.9          4.3         1.3 versicolor
  99           5.1         2.5          3.0         1.1 versicolor
  100          5.7         2.8          4.1         1.3 versicolor
  101          6.3         3.3          6.0         2.5  virginica
  102          5.8         2.7          5.1         1.9  virginica
  103          7.1         3.0          5.9         2.1  virginica
  104          6.3         2.9          5.6         1.8  virginica
  105          6.5         3.0          5.8         2.2  virginica
  106          7.6         3.0          6.6         2.1  virginica
  107          4.9         2.5          4.5         1.7  virginica
  108          7.3         2.9          6.3         1.8  virginica
  109          6.7         2.5          5.8         1.8  virginica
  110          7.2         3.6          6.1         2.5  virginica
  111          6.5         3.2          5.1         2.0  virginica
  112          6.4         2.7          5.3         1.9  virginica
  113          6.8         3.0          5.5         2.1  virginica
  114          5.7         2.5          5.0         2.0  virginica
  115          5.8         2.8          5.1         2.4  virginica
  116          6.4         3.2          5.3         2.3  virginica
  117          6.5         3.0          5.5         1.8  virginica
  118          7.7         3.8          6.7         2.2  virginica
  119          7.7         2.6          6.9         2.3  virginica
  120          6.0         2.2          5.0         1.5  virginica
  121          6.9         3.2          5.7         2.3  virginica
  122          5.6         2.8          4.9         2.0  virginica
  123          7.7         2.8          6.7         2.0  virginica
  124          6.3         2.7          4.9         1.8  virginica
  125          6.7         3.3          5.7         2.1  virginica
  126          7.2         3.2          6.0         1.8  virginica
  127          6.2         2.8          4.8         1.8  virginica
  128          6.1         3.0          4.9         1.8  virginica
  129          6.4         2.8          5.6         2.1  virginica
  130          7.2         3.0          5.8         1.6  virginica
  131          7.4         2.8          6.1         1.9  virginica
  132          7.9         3.8          6.4         2.0  virginica
  133          6.4         2.8          5.6         2.2  virginica
  134          6.3         2.8          5.1         1.5  virginica
  135          6.1         2.6          5.6         1.4  virginica
  136          7.7         3.0          6.1         2.3  virginica
  137          6.3         3.4          5.6         2.4  virginica
  138          6.4         3.1          5.5         1.8  virginica
  139          6.0         3.0          4.8         1.8  virginica
  140          6.9         3.1          5.4         2.1  virginica
  141          6.7         3.1          5.6         2.4  virginica
  142          6.9         3.1          5.1         2.3  virginica
  143          5.8         2.7          5.1         1.9  virginica
  144          6.8         3.2          5.9         2.3  virginica
  145          6.7         3.3          5.7         2.5  virginica
  146          6.7         3.0          5.2         2.3  virginica
  147          6.3         2.5          5.0         1.9  virginica
  148          6.5         3.0          5.2         2.0  virginica
  149          6.2         3.4          5.4         2.3  virginica
  150          5.9         3.0          5.1         1.8  virginica
  > head(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  1          5.1         3.5          1.4         0.2  setosa
  2          4.9         3.0          1.4         0.2  setosa
  3          4.7         3.2          1.3         0.2  setosa
  4          4.6         3.1          1.5         0.2  setosa
  5          5.0         3.6          1.4         0.2  setosa
  6          5.4         3.9          1.7         0.4  setosa
  > head(iris, 3)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  1          5.1         3.5          1.4         0.2  setosa
  2          4.9         3.0          1.4         0.2  setosa
  3          4.7         3.2          1.3         0.2  setosa
  > tail(iris)
      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  145          6.7         3.3          5.7         2.5 virginica
  146          6.7         3.0          5.2         2.3 virginica
  147          6.3         2.5          5.0         1.9 virginica
  148          6.5         3.0          5.2         2.0 virginica
  149          6.2         3.4          5.4         2.3 virginica
  150          5.9         3.0          5.1         1.8 virginica

For a data frame with many rows and many columns, view only the first few rows and columns

  > iris[1:3, 1:3]
    Sepal.Length Sepal.Width Petal.Length
  1          5.1         3.5          1.4
  2          4.9         3.0          1.4
  3          4.7         3.2          1.3

Viewing the data type and contents of each column

  > str(df)
  'data.frame': 4 obs. of  4 variables:
   $ gene   : chr  "gene1" "gene2" "gene3" "gene4"
   $ CHANGE : chr  "up" "up" "down" "down"
   $ score  : num  12 23 50 2
   $ p.value: num  0.01 0.02 0.07 0.05
  > str(iris)
  'data.frame': 150 obs. of  5 variables:
   $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
   $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
   $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
   $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
   $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
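
str() prints its report but is awkward to reuse in code. When the column types are needed as values, a few other base R functions may help; the following is a small sketch on the built-in iris dataset (the name col_classes is illustrative).

```r
# Class of every column, returned as a named character vector.
col_classes <- sapply(iris, class)
col_classes

# Five-number summary plus mean for one numeric column.
summary(iris$Sepal.Length)

# Number of rows per level of a factor column.
species_counts <- table(iris$Species)
species_counts
```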

Removing rows that contain missing values

  > df <- data.frame(X1 = LETTERS[1:5], X2 = 1:5)
  > df[2, 2] <- NA
  > df[4, 1] <- NA
  > df
      X1 X2
  1    A  1
  2    B NA
  3    C  3
  4 <NA>  4
  5    E  5
  > na.omit(df)    # drops every row with an NA in any column
    X1 X2
  1  A  1
  3  C  3
  5  E  5
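
na.omit() removes a row if any column is missing. When only one column matters, or when the aim is just to count the missing values, the sketch below shows some possible alternatives using is.na() and complete.cases() (the names only_x2, keep and na_per_column are illustrative).

```r
df <- data.frame(X1 = LETTERS[1:5], X2 = 1:5)
df[2, 2] <- NA
df[4, 1] <- NA

# Drop rows only where X2 is missing; row 4 (missing X1) is kept.
only_x2 <- df[!is.na(df$X2), ]
only_x2

# Logical vector marking the rows without any NA.
keep <- complete.cases(df)
keep

# Number of missing values in each column.
na_per_column <- colSums(is.na(df))
na_per_column
```

Indexing with df[keep, ] selects the same rows that na.omit(df) keeps.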

Joining two tables

merge() matches rows through a shared key column. Use by when the key has the same name in both tables, and by.x / by.y when the names differ.

  > test1 <- data.frame(name = c("jimmy", "nicker", "doodle"),
  +                     blood_type = c("A", "B", "O"))
  > test1
      name blood_type
  1  jimmy          A
  2 nicker          B
  3 doodle          O
  > test2 <- data.frame(name = c("doodle", "jimmy", "nicker", "tony"),
  +                     group = c("group1", "group1", "group2", "group2"),
  +                     vision = c(4.2, 4.3, 4.9, 4.5))
  > test2
      name  group vision
  1 doodle group1    4.2
  2  jimmy group1    4.3
  3 nicker group2    4.9
  4   tony group2    4.5
  > test3 <- data.frame(NAME = c("doodle", "jimmy", "lucy", "nicker"),
  +                     weight = c(140, 145, 110, 138))
  > tmp <- merge(test1, test2, by = "name")    # tony has no match in test1 and is dropped
  > merge(test1, test3, by.x = "name", by.y = "NAME")
      name blood_type weight
  1 doodle          O    140
  2  jimmy          A    145
  3 nicker          B    138
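
By default merge() keeps only the keys found in both tables, which is why lucy does not appear in the result above. The all, all.x and all.y arguments of merge() control whether unmatched rows are kept. A brief sketch using the same tables (the names left and full are illustrative):

```r
test1 <- data.frame(name = c("jimmy", "nicker", "doodle"),
                    blood_type = c("A", "B", "O"))
test3 <- data.frame(NAME = c("doodle", "jimmy", "lucy", "nicker"),
                    weight = c(140, 145, 110, 138))

# Keep every row of the first table, matched or not.
left <- merge(test1, test3, by.x = "name", by.y = "NAME", all.x = TRUE)
left

# Keep every row of both tables; gaps are filled with NA.
full <- merge(test1, test3, by.x = "name", by.y = "NAME", all = TRUE)
full
```

In full, lucy appears with a weight but an NA blood type, because she is present only in test3.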

Matrices and lists

Matrix

  > m <- matrix(1:9, nrow = 3)
  > colnames(m) <- c("a", "b", "c")
  > m
       a b c
  [1,] 1 4 7
  [2,] 2 5 8
  [3,] 3 6 9
  > m[2, ]
  a b c
  2 5 8
  > m[, 1]
  [1] 1 2 3
  > m[2, 3]
  c
  8
  > m[2:3, 1:2]
       a b
  [1,] 2 5
  [2,] 3 6
  > m
       a b c
  [1,] 1 4 7
  [2,] 2 5 8
  [3,] 3 6 9
  > t(m)              # transpose: rows become columns
    [,1] [,2] [,3]
  a    1    2    3
  b    4    5    6
  c    7    8    9
  > as.data.frame(m)  # convert the matrix to a data frame
    a b c
  1 1 4 7
  2 2 5 8
  3 3 6 9
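
The matrix above is filled column by column, which is the default for matrix(). As a small supplementary sketch, the byrow argument fills row by row instead, and row names allow indexing by name just as column names do (the names m_col, m_row and the labels r1 to r3 are illustrative).

```r
# Default: values fill the matrix column by column.
m_col <- matrix(1:9, nrow = 3)

# byrow = TRUE fills row by row instead.
m_row <- matrix(1:9, nrow = 3, byrow = TRUE)

# Filling by row gives the transpose of filling by column.
identical(m_row, t(m_col))  # TRUE

# With row names set, a row can be selected by name.
rownames(m_col) <- c("r1", "r2", "r3")
m_col["r2", ]
```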

List

A list can hold elements of different types, such as a matrix, a data frame and a vector.

  > l <- list(m = matrix(1:9, nrow = 3),
  +           df = data.frame(gene = paste0("gene", 1:3),
  +                           sam = paste0("sample", 1:3),
  +                           exp = c(32, 34, 45)),
  +           x = c(1, 3, 5))
  > l
  $m
       [,1] [,2] [,3]
  [1,]    1    4    7
  [2,]    2    5    8
  [3,]    3    6    9
  $df
     gene     sam exp
  1 gene1 sample1  32
  2 gene2 sample2  34
  3 gene3 sample3  45
  $x
  [1] 1 3 5
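
The listing above only prints the list. To pull individual elements back out, the usual tools are $ and [[ ]]; single brackets return a shorter list rather than the element itself. A minimal sketch on the same list (the names x_vec, exp_col and cell are illustrative):

```r
l <- list(m = matrix(1:9, nrow = 3),
          df = data.frame(gene = paste0("gene", 1:3),
                          sam = paste0("sample", 1:3),
                          exp = c(32, 34, 45)),
          x = c(1, 3, 5))

# Extract by name with $ or [[ ]].
x_vec <- l$x
exp_col <- l[["df"]]$exp

# Extract by position, then index into the extracted matrix.
cell <- l[[1]][2, 3]

# Single brackets keep the list wrapper; double brackets remove it.
class(l["x"])    # "list"
class(l[["x"]])  # "numeric"
```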

Naming elements

  > scores <- c(100, 59, 73, 95, 45)
  > names(scores) <- c("jimmy", "nicker", "lucy", "doodle", "tony")
  > scores
   jimmy nicker   lucy doodle   tony
     100     59     73     95     45
  > scores["jimmy"]
  jimmy
    100
  > scores[c("jimmy", "nicker")]
   jimmy nicker
     100     59
  > names(scores)[scores > 60]    # names of the elements above 60
  [1] "jimmy"  "lucy"   "doodle"
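
Names stay attached to their values through most vector operations, which makes a named vector work like a small lookup table. A short sketch with the same scores (the names ranked, top_name and failed, and the pass mark of 60, are illustrative):

```r
scores <- c(jimmy = 100, nicker = 59, lucy = 73, doodle = 95, tony = 45)

# Sorting keeps each name with its value.
ranked <- sort(scores, decreasing = TRUE)
ranked

# Name of the highest score.
top_name <- names(scores)[which.max(scores)]
top_name

# Logical indexing returns values together with their names.
failed <- scores[scores < 60]
failed
```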

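Deleting variables

  > rm(l)              # remove one variable
  > rm(df, m)          # remove several variables
  > rm(list = ls())    # remove everything that ls() lists in the workspace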
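
To confirm that a variable is really gone, exists() and ls() can be checked after rm(). The sketch below uses two throwaway variables, tmp_a and tmp_b, which are hypothetical names chosen so that nothing else is affected.

```r
tmp_a <- 1
tmp_b <- 2

# Remove only tmp_a.
rm(tmp_a)

exists("tmp_a")     # FALSE: the variable no longer exists
exists("tmp_b")     # TRUE: untouched
"tmp_b" %in% ls()   # TRUE: still listed in the current environment
```

Note that ls() does not list names beginning with a dot by default, so rm(list = ls()) leaves such hidden objects in place.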