Group (experimental grouping) and ids (probe annotation)

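  rm(list = ls())                   # clear the workspace
  load(file = "step1output.Rdata")  # objects saved in step 1 (exp, pd, gpl_number, gse_number are used below)
  library(stringr)                  # provides str_detect()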

Option 1: use a grouping column that already exists

  Group = pd$`disease state:ch1`
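The backticks are needed because the column name `disease state:ch1` contains a space and a colon, so it is not a syntactic R name. A minimal sketch with a hypothetical toy phenotype table (not real GEO data) shows the access pattern plus a quick count check:

```r
# Hypothetical toy phenotype table; check.names = FALSE keeps the GEO-style name intact
pd_toy <- data.frame(`disease state:ch1` = c("RA", "RA", "control"),
                     check.names = FALSE, stringsAsFactors = FALSE)
# Backticks let $ reach a column whose name is not a syntactic R name
Group_toy <- pd_toy$`disease state:ch1`
# Count samples per group to confirm the split looks right
table(Group_toy)
```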

Option 2: build the grouping by hand

  Group = c(rep("RA", times = 13),
            rep("control", times = 9))
  # equivalent, more compact form
  Group = rep(c("RA", "control"), times = c(13, 9))
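The two forms are interchangeable: when `times` is a vector as long as the input, `rep()` repeats each element by its matching count. Both assume the samples are ordered with all 13 RA samples first and the 9 controls after them. A quick check of the equivalence:

```r
# Long form: concatenate two separate repeats
g1 <- c(rep("RA", times = 13), rep("control", times = 9))
# Compact form: one repeat count per element
g2 <- rep(c("RA", "control"), times = c(13, 9))
identical(g1, g2)  # TRUE
length(g2)         # 22
```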

Recommended: option 3, match a keyword

  # samples whose source_name_ch1 contains "control" are controls; all others are RA
  Group = ifelse(str_detect(pd$source_name_ch1, "control"),
                 "control",
                 "RA")
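`str_detect()` returns one TRUE/FALSE per sample, and `ifelse()` turns that into labels. A base-R sketch of the same idea uses `grepl()`; the sample descriptions below are hypothetical, since real series word `source_name_ch1` in their own way. Matching is case-sensitive by default, so `ignore.case = TRUE` is added here as an assumption in case some entries say "Control":

```r
# Hypothetical source_name_ch1 values; real GEO series use their own wording
src <- c("synovial tissue, RA patient 1",
         "synovial tissue, control donor 1",
         "synovial tissue, RA patient 2",
         "Synovial tissue, Control donor 2")
# grepl() is the base-R counterpart of stringr::str_detect()
is_control <- grepl("control", src, ignore.case = TRUE)
Group_toy <- ifelse(is_control, "control", "RA")
table(Group_toy)
```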

Set the reference level by specifying levels: the control group comes first and the treatment group second.

  Group = factor(Group,
                 levels = c("control", "RA"))
  Group
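The first level acts as the baseline. In a design matrix built with `model.matrix()`, the intercept then represents control and the `GroupRA` column the RA-versus-control difference. A minimal check on a toy vector:

```r
# Toy grouping with the reference level set explicitly
Group_toy <- factor(c("RA", "control", "RA"), levels = c("control", "RA"))
levels(Group_toy)            # "control" "RA"
design <- model.matrix(~ Group_toy)
colnames(design)             # "(Intercept)" "Group_toyRA"
# Without explicit levels, factor() sorts the labels, and that order can depend on the locale
```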

2. Getting the probe annotation


Shortcut

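  library(tinyarray)                  # find_anno() comes from the tinyarray package
  find_anno(gpl_number)               # prints suggested annotation code for this platform
  ids <- AnnoProbe::idmap('GPL570')   # probe-to-symbol table for GPL570; change the ID for other platforms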

Method 1

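  # Method 1: Bioconductor annotation package (most common)
  gpl_number   # check the platform first; hgu133plus2.db corresponds to GPL570
  #http://www.bio-info-trainee.com/1399.html
  if (!require(hgu133plus2.db)) BiocManager::install("hgu133plus2.db")
  library(hgu133plus2.db)
  ls("package:hgu133plus2.db")        # list the mappings the package provides
  ids <- toTable(hgu133plus2SYMBOL)   # data frame with columns probe_id and symbol
  head(ids)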

Method 2

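  # Method 2: read the GPL platform's soft file and take a subset of its columns
  ##https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570
  if (FALSE) {   # skipped by default; run the lines inside manually when needed
    # Note: soft-file column names are not standardized, so adapt the code to each platform;
    # some tables have no symbol column, and some GPL platforms provide no annotation at all
    library(GEOquery)                  # provides getGEO()
    a = getGEO(gpl_number, destdir = ".")
    b = a@dataTable@table              # the platform annotation table (same as Table(a))
    colnames(b)                        # inspect the columns before choosing them
    ids2 = b[, c("ID", "Gene Symbol")]
    colnames(ids2) = c("probe_id", "symbol")
    # keep probes that have a symbol and drop probes mapped to several genes ("///")
    ids2 = ids2[ids2$symbol != "" & !str_detect(ids2$symbol, "///"), ]
  }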
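The last filter keeps only probes with a non-empty symbol and drops probes annotated to several genes, which the soft file joins with "///". A base-R sketch on a made-up table (probe IDs and symbols are illustrative only) shows the effect:

```r
# Made-up annotation rows in the same two-column shape as ids2
toy_ids <- data.frame(probe_id = c("p1", "p2", "p3", "p4"),
                      symbol   = c("TP53", "", "HLA-DRB1 /// HLA-DRB4", "GAPDH"),
                      stringsAsFactors = FALSE)
# fixed = TRUE matches "///" literally; grepl() stands in for str_detect()
keep <- toy_ids$symbol != "" & !grepl("///", toy_ids$symbol, fixed = TRUE)
toy_ids <- toy_ids[keep, ]
toy_ids$probe_id  # "p1" "p4"
```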

Method 3: download the annotation file from the vendor's website and read it


##http://www.affymetrix.com/support/technical/byproduct.affx?product=hg-u133-plus
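A minimal sketch of reading such a file, assuming the CSV layout Affymetrix has used for its NetAffx annotation files (header lines starting with `#%`, then columns such as `Probe Set ID` and `Gene Symbol`); confirm the real column names with `colnames()` before subsetting. The toy file is written to a temporary path, with made-up rows, only so the example runs on its own:

```r
# Toy stand-in for a downloaded annotation CSV (assumed NetAffx-style layout, made-up rows)
f <- tempfile(fileext = ".csv")
writeLines(c('#%chip_type=HG-U133_Plus_2',
             '"Probe Set ID","Species Scientific Name","Gene Symbol"',
             '"probe1_at","Homo sapiens","TP53"',
             '"probe2_at","Homo sapiens","GAPDH"'),
           f)
# Count the leading "#%" metadata lines and skip them
hdr <- readLines(f, n = 100)
n_skip <- sum(startsWith(hdr, "#%"))
# check.names = FALSE keeps column names with spaces as they are
anno <- read.csv(f, skip = n_skip, check.names = FALSE, stringsAsFactors = FALSE)
ids3 <- anno[, c("Probe Set ID", "Gene Symbol")]
colnames(ids3) <- c("probe_id", "symbol")
ids3
```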
# Method 4: annotate the probes yourself
#https://mp.weixin.qq.com/s/mrtjpN8yDKUdCSvSUuUwcA
Save

  save(exp, Group, ids, gse_number, file = "step2output.Rdata")
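`save()` stores each object under its own name, so a later `load()` brings back `exp`, `Group`, `ids` and `gse_number` as they were. A sketch of a round trip with toy objects and a temporary file; loading into a separate environment lets you check what a file contains without overwriting the current workspace:

```r
# Toy objects standing in for the real step-2 results (names are hypothetical)
exp_toy <- matrix(1:4, nrow = 2, dimnames = list(c("p1", "p2"), c("s1", "s2")))
Group_toy <- factor(c("control", "RA"), levels = c("control", "RA"))
ids_toy <- data.frame(probe_id = c("p1", "p2"), symbol = c("TP53", "GAPDH"))
gse_toy <- "GSE_example"   # hypothetical accession
f <- tempfile(fileext = ".Rdata")
save(exp_toy, Group_toy, ids_toy, gse_toy, file = f)
# load() into a fresh environment and get back the names of the restored objects
chk <- new.env()
loaded <- load(f, envir = chk)
loaded
identical(chk$Group_toy, Group_toy)  # TRUE
```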