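Sources of data frames
A data frame usually comes from one of four sources: it is created from scratch in R; it is converted or derived from existing data; it is read from a file; or it is a built-in dataset.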
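Creating and reading are demonstrated in the next part; the other two sources can be sketched briefly. This is a minimal illustration, assuming a small matrix called `m0` (a hypothetical name) as the existing data and the built-in `iris` dataset that this document uses later.

```r
# Source 2: convert existing data (here a small matrix) into a data frame.
# m0 is a hypothetical object made up for this illustration.
m0 <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df_from_matrix <- as.data.frame(m0)

# Source 4: built-in datasets are available by name;
# data() just makes the loading step explicit.
data(iris)
df_builtin <- iris

class(df_from_matrix)  # "data.frame"
dim(df_builtin)        # 150 rows, 5 columns
```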
Creating and reading data frames
Use the data.frame() function to create a new data frame.
> df <- data.frame(gene = paste0("gene",1:4),
+                  change = rep(c("up","down"),each = 2),
+                  score = c(5,3,-2,-4))
> df
   gene change score
1 gene1     up     5
2 gene2     up     3
3 gene3   down    -2
4 gene4   down    -4
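To read a data frame from a file, put the CSV file [inside the Project folder] and call read.csv(). A bare file name such as "gene.csv" is resolved against the working directory, which is why the file has to be there.
> df2 <- read.csv("gene.csv")
> df2
   gene change score
1 gene1     up     5
2 gene2     up     3
3 gene3   down    -2
4 gene4   down    -4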
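To try the reading step without an existing gene.csv, one option is to write a data frame out first and read it back. The sketch below assumes a temporary file created with `tempfile()` instead of the Project folder; `csv_path` is a name introduced only for this example.

```r
# Build the same example data frame used in this section.
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

# Assumption: a temporary file stands in for gene.csv in the Project folder.
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row names from being written as an extra column.
write.csv(df, csv_path, row.names = FALSE)

# Read the file back into a new data frame.
df2 <- read.csv(csv_path)

getwd()   # relative paths such as "gene.csv" are resolved against this directory
df2
```

Passing a full path, as here, works regardless of the working directory; a bare file name only works when the file sits in the directory reported by `getwd()`.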
Data frame attributes
Check the number of rows and columns, and the row and column names.
> df <- data.frame(gene = paste0("gene",1:4),
+                  change = rep(c("up","down"),each = 2),
+                  score = c(5,3,-2,-4))
> df
   gene change score
1 gene1     up     5
2 gene2     up     3
3 gene3   down    -2
4 gene4   down    -4
> dim(df)
[1] 4 3
> nrow(df)
[1] 4
> ncol(df)
[1] 3
> rownames(df)
[1] "1" "2" "3" "4"
> colnames(df)
[1] "gene"   "change" "score"
Subsetting data frames
Subsetting by position
> df[2,2]
[1] "up"
> df[2,]
   gene change score
2 gene2     up     3
> df[,2]
[1] "up"   "up"   "down" "down"
> df[c(1,3),1:2]
   gene change
1 gene1     up
3 gene3   down
Subsetting by name
> df[,"gene"]
[1] "gene1" "gene2" "gene3" "gene4"
> df[,c('gene','change')]
   gene change
1 gene1     up
2 gene2     up
3 gene3   down
4 gene4   down
Subsetting by condition (logical values)
> df[df$score>0,]
   gene change score
1 gene1     up     5
2 gene2     up     3
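The condition inside the brackets is just a logical vector with one value per row, so it can be built separately and combined with other conditions. A small sketch, reusing the example data frame; the names `keep`, `strong_up` and `extreme` and the cutoffs are made up for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

# The comparison alone returns one TRUE/FALSE per row.
keep <- df$score > 0
keep   # TRUE TRUE FALSE FALSE

# Combine conditions with & (and) or | (or).
strong_up <- df[df$change == "up" & df$score > 4, ]   # up-regulated and score above 4
extreme   <- df[abs(df$score) >= 4, ]                 # large score in either direction

strong_up$gene   # "gene1"
extreme$gene     # "gene1" "gene4"
```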
Taking the last column (or everything except the last column)
> df[,3]
[1]  5  3 -2 -4
> df[,ncol(df)]
[1]  5  3 -2 -4
> df[,-ncol(df)]
   gene change
1 gene1     up
2 gene2     up
3 gene3   down
4 gene4   down
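Selecting a single column this way returns a plain vector, while selecting several columns returns a data frame. When a one-column data frame is wanted instead, the `drop = FALSE` argument of `[` keeps the structure. A short sketch with the same example data (`col_vec` and `col_df` are illustrative names):

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(5, 3, -2, -4))

col_vec <- df[, ncol(df)]                 # last column as a numeric vector
col_df  <- df[, ncol(df), drop = FALSE]   # last column kept as a data frame

is.data.frame(col_vec)  # FALSE
is.data.frame(col_df)   # TRUE
dim(col_df)             # 4 rows, 1 column
```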
Selecting the genes with score > 0
> df[df$score>0,1]
[1] "gene1" "gene2"
> df$gene[df$score>0]
[1] "gene1" "gene2"
Editing data frames
Changing a single cell
> df <- data.frame(gene = paste0("gene",1:4),
+                  change = rep(c("up","down"),each = 2),
+                  score = c(5,3,-2,-4))
> df
   gene change score
1 gene1     up     5
2 gene2     up     3
3 gene3   down    -2
4 gene4   down    -4
> df[3,3] <- 5
> df
   gene change score
1 gene1     up     5
2 gene2     up     3
3 gene3   down     5
4 gene4   down    -4
Changing a whole column
> df$score <- c(12,23,50,2)
> df
   gene change score
1 gene1     up    12
2 gene2     up    23
3 gene3   down    50
4 gene4   down     2
Adding a column
> df$p.value <- c(0.01,0.02,0.07,0.05)
> df
   gene change score p.value
1 gene1     up    12    0.01
2 gene2     up    23    0.02
3 gene3   down    50    0.07
4 gene4   down     2    0.05
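A new column does not have to be typed in by hand; it can also be computed from columns that already exist. The sketch below assumes a cutoff of 0.05 and a column name `significant`, both chosen only for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(12, 23, 50, 2),
                 p.value = c(0.01, 0.02, 0.07, 0.05))

# Derived column: TRUE where the p-value is below the (assumed) 0.05 cutoff.
df$significant <- df$p.value < 0.05

df$significant   # TRUE TRUE FALSE FALSE
ncol(df)         # 5
```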
Changing row and column names
> rownames(df) <- c("r1","r2","r3","r4")
> df
    gene change score p.value
r1 gene1     up    12    0.01
r2 gene2     up    23    0.02
r3 gene3   down    50    0.07
r4 gene4   down     2    0.05
Renaming a single row or column
> colnames(df)[2] <- "CHANGE"
> df
    gene CHANGE score p.value
r1 gene1     up    12    0.01
r2 gene2     up    23    0.02
r3 gene3   down    50    0.07
r4 gene4   down     2    0.05
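Renaming by position depends on the column order staying the same. An alternative is to look the old name up, which is sketched here under the assumption that the old name occurs exactly once; "row3" is an arbitrary new row name used for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:4),
                 change = rep(c("up", "down"), each = 2),
                 score = c(12, 23, 50, 2))
rownames(df) <- c("r1", "r2", "r3", "r4")

# Match on the old name instead of a fixed position.
colnames(df)[colnames(df) == "change"] <- "CHANGE"
rownames(df)[rownames(df) == "r3"] <- "row3"

colnames(df)   # "gene" "CHANGE" "score"
rownames(df)   # "r1" "r2" "row3" "r4"
```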
More on data frames
For a data frame with many rows, view only the first or last few rows.
> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
11           5.4         3.7          1.5         0.2     setosa
12           4.8         3.4          1.6         0.2     setosa
13           4.8         3.0          1.4         0.1     setosa
14           4.3         3.0          1.1         0.1     setosa
15           5.8         4.0          1.2         0.2     setosa
16           5.7         4.4          1.5         0.4     setosa
17           5.4         3.9          1.3         0.4     setosa
18           5.1         3.5          1.4         0.3     setosa
19           5.7         3.8          1.7         0.3     setosa
20           5.1         3.8          1.5         0.3     setosa
21           5.4         3.4          1.7         0.2     setosa
22           5.1         3.7          1.5         0.4     setosa
23           4.6         3.6          1.0         0.2     setosa
24           5.1         3.3          1.7         0.5     setosa
25           4.8         3.4          1.9         0.2     setosa
26           5.0         3.0          1.6         0.2     setosa
27           5.0         3.4          1.6         0.4     setosa
28           5.2         3.5          1.5         0.2     setosa
29           5.2         3.4          1.4         0.2     setosa
30           4.7         3.2          1.6         0.2     setosa
31           4.8         3.1          1.6         0.2     setosa
32           5.4         3.4          1.5         0.4     setosa
33           5.2         4.1          1.5         0.1     setosa
34           5.5         4.2          1.4         0.2     setosa
35           4.9         3.1          1.5         0.2     setosa
36           5.0         3.2          1.2         0.2     setosa
37           5.5         3.5          1.3         0.2     setosa
38           4.9         3.6          1.4         0.1     setosa
39           4.4         3.0          1.3         0.2     setosa
40           5.1         3.4          1.5         0.2     setosa
41           5.0         3.5          1.3         0.3     setosa
42           4.5         2.3          1.3         0.3     setosa
43           4.4         3.2          1.3         0.2     setosa
44           5.0         3.5          1.6         0.6     setosa
45           5.1         3.8          1.9         0.4     setosa
46           4.8         3.0          1.4         0.3     setosa
47           5.1         3.8          1.6         0.2     setosa
48           4.6         3.2          1.4         0.2     setosa
49           5.3         3.7          1.5         0.2     setosa
50           5.0         3.3          1.4         0.2     setosa
51           7.0         3.2          4.7         1.4 versicolor
52           6.4         3.2          4.5         1.5 versicolor
53           6.9         3.1          4.9         1.5 versicolor
54           5.5         2.3          4.0         1.3 versicolor
55           6.5         2.8          4.6         1.5 versicolor
56           5.7         2.8          4.5         1.3 versicolor
57           6.3         3.3          4.7         1.6 versicolor
58           4.9         2.4          3.3         1.0 versicolor
59           6.6         2.9          4.6         1.3 versicolor
60           5.2         2.7          3.9         1.4 versicolor
61           5.0         2.0          3.5         1.0 versicolor
62           5.9         3.0          4.2         1.5 versicolor
63           6.0         2.2          4.0         1.0 versicolor
64           6.1         2.9          4.7         1.4 versicolor
65           5.6         2.9          3.6         1.3 versicolor
66           6.7         3.1          4.4         1.4 versicolor
67           5.6         3.0          4.5         1.5 versicolor
68           5.8         2.7          4.1         1.0 versicolor
69           6.2         2.2          4.5         1.5 versicolor
70           5.6         2.5          3.9         1.1 versicolor
71           5.9         3.2          4.8         1.8 versicolor
72           6.1         2.8          4.0         1.3 versicolor
73           6.3         2.5          4.9         1.5 versicolor
74           6.1         2.8          4.7         1.2 versicolor
75           6.4         2.9          4.3         1.3 versicolor
76           6.6         3.0          4.4         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
78           6.7         3.0          5.0         1.7 versicolor
79           6.0         2.9          4.5         1.5 versicolor
80           5.7         2.6          3.5         1.0 versicolor
81           5.5         2.4          3.8         1.1 versicolor
82           5.5         2.4          3.7         1.0 versicolor
83           5.8         2.7          3.9         1.2 versicolor
84           6.0         2.7          5.1         1.6 versicolor
85           5.4         3.0          4.5         1.5 versicolor
86           6.0         3.4          4.5         1.6 versicolor
87           6.7         3.1          4.7         1.5 versicolor
88           6.3         2.3          4.4         1.3 versicolor
89           5.6         3.0          4.1         1.3 versicolor
90           5.5         2.5          4.0         1.3 versicolor
91           5.5         2.6          4.4         1.2 versicolor
92           6.1         3.0          4.6         1.4 versicolor
93           5.8         2.6          4.0         1.2 versicolor
94           5.0         2.3          3.3         1.0 versicolor
95           5.6         2.7          4.2         1.3 versicolor
96           5.7         3.0          4.2         1.2 versicolor
97           5.7         2.9          4.2         1.3 versicolor
98           6.2         2.9          4.3         1.3 versicolor
99           5.1         2.5          3.0         1.1 versicolor
100          5.7         2.8          4.1         1.3 versicolor
101          6.3         3.3          6.0         2.5  virginica
102          5.8         2.7          5.1         1.9  virginica
103          7.1         3.0          5.9         2.1  virginica
104          6.3         2.9          5.6         1.8  virginica
105          6.5         3.0          5.8         2.2  virginica
106          7.6         3.0          6.6         2.1  virginica
107          4.9         2.5          4.5         1.7  virginica
108          7.3         2.9          6.3         1.8  virginica
109          6.7         2.5          5.8         1.8  virginica
110          7.2         3.6          6.1         2.5  virginica
111          6.5         3.2          5.1         2.0  virginica
112          6.4         2.7          5.3         1.9  virginica
113          6.8         3.0          5.5         2.1  virginica
114          5.7         2.5          5.0         2.0  virginica
115          5.8         2.8          5.1         2.4  virginica
116          6.4         3.2          5.3         2.3  virginica
117          6.5         3.0          5.5         1.8  virginica
118          7.7         3.8          6.7         2.2  virginica
119          7.7         2.6          6.9         2.3  virginica
120          6.0         2.2          5.0         1.5  virginica
121          6.9         3.2          5.7         2.3  virginica
122          5.6         2.8          4.9         2.0  virginica
123          7.7         2.8          6.7         2.0  virginica
124          6.3         2.7          4.9         1.8  virginica
125          6.7         3.3          5.7         2.1  virginica
126          7.2         3.2          6.0         1.8  virginica
127          6.2         2.8          4.8         1.8  virginica
128          6.1         3.0          4.9         1.8  virginica
129          6.4         2.8          5.6         2.1  virginica
130          7.2         3.0          5.8         1.6  virginica
131          7.4         2.8          6.1         1.9  virginica
132          7.9         3.8          6.4         2.0  virginica
133          6.4         2.8          5.6         2.2  virginica
134          6.3         2.8          5.1         1.5  virginica
135          6.1         2.6          5.6         1.4  virginica
136          7.7         3.0          6.1         2.3  virginica
137          6.3         3.4          5.6         2.4  virginica
138          6.4         3.1          5.5         1.8  virginica
139          6.0         3.0          4.8         1.8  virginica
140          6.9         3.1          5.4         2.1  virginica
141          6.7         3.1          5.6         2.4  virginica
142          6.9         3.1          5.1         2.3  virginica
143          5.8         2.7          5.1         1.9  virginica
144          6.8         3.2          5.9         2.3  virginica
145          6.7         3.3          5.7         2.5  virginica
146          6.7         3.0          5.2         2.3  virginica
147          6.3         2.5          5.0         1.9  virginica
148          6.5         3.0          5.2         2.0  virginica
149          6.2         3.4          5.4         2.3  virginica
150          5.9         3.0          5.1         1.8  virginica
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> head(iris,3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
> tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
For a data frame with many rows and many columns, view only the first few rows and columns.
> iris[1:3,1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
Inspecting the data type and contents of each column
> str(df)
'data.frame': 4 obs. of  4 variables:
 $ gene   : chr  "gene1" "gene2" "gene3" "gene4"
 $ CHANGE : chr  "up" "up" "down" "down"
 $ score  : num  12 23 50 2
 $ p.value: num  0.01 0.02 0.07 0.05
> str(iris)
'data.frame': 150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Removing rows that contain missing values
> df <- data.frame(X1 = LETTERS[1:5],X2 = 1:5)
> df[2,2] <- NA
> df[4,1] <- NA
> df
    X1 X2
1    A  1
2    B NA
3    C  3
4 <NA>  4
5    E  5
> na.omit(df)
  X1 X2
1  A  1
3  C  3
5  E  5
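Before dropping rows it can help to see where the missing values are, and sometimes only one column should decide which rows go. A sketch using the same small data frame; `complete` and `clean` are names introduced here for illustration.

```r
df <- data.frame(X1 = LETTERS[1:5], X2 = 1:5)
df[2, 2] <- NA
df[4, 1] <- NA

# Count missing values per column.
colSums(is.na(df))

# complete.cases() gives TRUE for rows without any NA;
# using it as a row filter keeps the same rows as na.omit(df).
complete <- complete.cases(df)
df[complete, ]

# Drop rows only when X2 is missing; the row with a missing X1 is kept.
clean <- df[!is.na(df$X2), ]
nrow(clean)   # 4
```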
Joining two tables
> test1 <- data.frame(name = c('jimmy','nicker','doodle'),
+                     blood_type = c("A","B","O"))
> test1
    name blood_type
1  jimmy          A
2 nicker          B
3 doodle          O
> test2 <- data.frame(name = c('doodle','jimmy','nicker','tony'),
+                     group = c("group1","group1","group2","group2"),
+                     vision = c(4.2,4.3,4.9,4.5))
> test2
    name  group vision
1 doodle group1    4.2
2  jimmy group1    4.3
3 nicker group2    4.9
4   tony group2    4.5
> test3 <- data.frame(NAME = c('doodle','jimmy','lucy','nicker'),
+                     weight = c(140,145,110,138))
> tmp <- merge(test1,test2,by="name")
> merge(test1,test3,by.x = "name",by.y = "NAME")
    name blood_type weight
1 doodle          O    140
2  jimmy          A    145
3 nicker          B    138
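By default merge() keeps only the keys found in both tables, which is why 'tony' and 'lucy' do not appear in the results above. The `all`, `all.x` and `all.y` arguments of merge() change that. A sketch with the same test1 and test2; the result names `inner`, `full` and `left` are illustrative.

```r
test1 <- data.frame(name = c("jimmy", "nicker", "doodle"),
                    blood_type = c("A", "B", "O"))
test2 <- data.frame(name = c("doodle", "jimmy", "nicker", "tony"),
                    group = c("group1", "group1", "group2", "group2"),
                    vision = c(4.2, 4.3, 4.9, 4.5))

# Default: only names present in both tables.
inner <- merge(test1, test2, by = "name")

# all = TRUE keeps every name from either table and fills the gaps with NA.
full <- merge(test1, test2, by = "name", all = TRUE)

# all.x = TRUE keeps every row of the first table only.
left <- merge(test1, test2, by = "name", all.x = TRUE)

nrow(inner)  # 3
nrow(full)   # 4
full[full$name == "tony", ]   # blood_type is NA for tony
```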
Matrices and lists
Matrices
> m <- matrix(1:9, nrow = 3)
> colnames(m) <- c("a","b","c")
> m
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> m[2,]
a b c 
2 5 8 
> m[,1]
[1] 1 2 3
> m[2,3]
c 
8 
> m[2:3,1:2]
     a b
[1,] 2 5
[2,] 3 6
> m
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> t(m)
  [,1] [,2] [,3]
a    1    2    3
b    4    5    6
c    7    8    9
> as.data.frame(m)
  a b c
1 1 4 7
2 2 5 8
3 3 6 9
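One practical difference between a matrix and a data frame is that a matrix holds a single data type, while each column of a data frame can have its own. The sketch below illustrates this with the same matrix; `m2` and `d` are names introduced for the example.

```r
m <- matrix(1:9, nrow = 3)
colnames(m) <- c("a", "b", "c")
typeof(m)    # "integer"

# Putting one string into a matrix converts every element to character.
m2 <- m
m2[1, 1] <- "x"
typeof(m2)   # "character"

# In a data frame only the affected column changes type.
d <- as.data.frame(m)
d$a <- as.character(d$a)
sapply(d, class)   # a: "character", b and c: "integer"
```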
Lists
> l <- list(m=matrix(1:9, nrow = 3),
+           df=data.frame(gene = paste0("gene",1:3),
+                         sam = paste0("sample",1:3),
+                         exp = c(32,34,45)),
+           x=c(1,3,5))
> l
$m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

$df
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45

$x
[1] 1 3 5

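The list above is created and printed but its elements are not taken out. A minimal sketch of the usual ways to extract them, using the same list: `$` and `[[ ]]` return the element itself, while single brackets return a shorter list.

```r
l <- list(m = matrix(1:9, nrow = 3),
          df = data.frame(gene = paste0("gene", 1:3),
                          sam = paste0("sample", 1:3),
                          exp = c(32, 34, 45)),
          x = c(1, 3, 5))

l$x          # element by name: the numeric vector 1 3 5
l[["df"]]    # element by name with double brackets: the data frame
l[[1]]       # element by position: the matrix

class(l[["x"]])  # "numeric": the element itself
class(l["x"])    # "list": single brackets keep the list wrapper

# Extraction can be chained: second value of column exp in the data frame.
l$df$exp[2]      # 34
```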
Naming elements
> scores <- c(100,59,73,95,45)
> names(scores) <- c("jimmy","nicker","lucy","doodle","tony")
> scores
 jimmy nicker   lucy doodle   tony 
   100     59     73     95     45 
> scores["jimmy"]
jimmy 
  100 
> scores[c("jimmy","nicker")]
 jimmy nicker 
   100     59 
> names(scores)[scores>60]
[1] "jimmy"  "lucy"   "doodle"
Removing objects
> rm(l)
> rm(df,m)
> rm(list = ls())
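The last command works because ls() returns the names of all objects in the current environment as a character vector, and the `list` argument of rm() accepts such a vector. A small sketch with throwaway objects (`tmp_a`, `tmp_b`, `tmp_c` are hypothetical names) that avoids clearing the whole workspace:

```r
# Hypothetical objects created only for this illustration.
tmp_a <- 1
tmp_b <- "two"
tmp_c <- 3

rm(tmp_a)                       # remove one object by its bare name
rm(list = c("tmp_b", "tmp_c"))  # remove objects whose names are given as strings

# None of the three names is left in the current environment.
remaining <- ls()
any(c("tmp_a", "tmp_b", "tmp_c") %in% remaining)  # FALSE
```

Passing a chosen vector of names to `list` is a gentler alternative to `rm(list = ls())` when only some objects should go.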
