For more R tips, follow the WeChat public account: 医学和生信笔记

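    The 医学和生信笔记 public account mainly shares: 1. general medical and proctology tips; 2. data analysis, visualization, and machine learning with R and Python; 3. bioinformatics learning materials and the author's own study notes.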

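    The TCGAbiolinks package makes it easy to download all kinds of TCGA data, and the data it retrieves are up to date; the only real obstacle is the network connection. These notes record how to use the package. The code below is excerpted from online sources.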
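Since network problems are the main obstacle, one option is to wrap each download call in a small retry loop. The following is only a minimal sketch under stated assumptions: `with_retry` and `flaky_download` are hypothetical names, not part of TCGAbiolinks, and the approach assumes that a failed download signals an ordinary R error. A simulated flaky function stands in for the real download so that the example runs without network access.

```r
# Hypothetical helper (not part of TCGAbiolinks): call `fun` up to
# `max_attempts` times, pausing `wait_seconds` between failed attempts.
with_retry <- function(fun, max_attempts = 3, wait_seconds = 5) {
  last_error <- NULL
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fun()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    message("Attempt ", attempt, " of ", max_attempts, " failed: ",
            conditionMessage(last_error))
    if (attempt < max_attempts) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("All ", max_attempts, " attempts failed; last error: ",
       conditionMessage(last_error))
}

# Stand-in for a flaky download: fails twice, then succeeds.
calls <- 0
flaky_download <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network timeout")
  "download finished"
}

status <- with_retry(flaky_download, max_attempts = 5, wait_seconds = 0)
print(status)
```

In the listing that follows, a bare download call could be wrapped in the same way, for example `with_retry(function() GDCdownload(query, method = "api", files.per.chunk = 50))`. Whether this helps depends on how the download fails in practice.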

    # Install once if needed:
    # if (!requireNamespace("BiocManager", quietly = TRUE))
    #     install.packages("BiocManager")
    # BiocManager::install("TCGAbiolinks")
    library(TCGAbiolinks)
    library(dplyr)
    library(DT)
    library(SummarizedExperiment)

    # List the projects available on GDC
    getGDCprojects()

    # Download clinical data
    clinical <- GDCquery_clinic(project = "TCGA-COAD", type = "clinical")
    write.csv(clinical, file = "TCGA-COAD-clinical.csv")
    save(clinical, file = "TCGA-COAD-clinical.RData")

    # Download RNA-seq counts data
    # Note: newer GDC data releases use "STAR - Counts";
    # "HTSeq - Counts" is kept here as in the original code.
    query <- GDCquery(project = "TCGA-COAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts")
    # save(query, file = "query_mrnaCounts.RData")
    GDCdownload(query, method = "api", files.per.chunk = 50)
    expdat <- GDCprepare(query = query)
    count_matrix <- assay(expdat)
    write.csv(count_matrix, file = "TCGA-COAD-Counts.csv")
    save(count_matrix, file = "expdat_mrna.RData")

    # Download miRNA data
    query <- GDCquery(project = "TCGA-COAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "miRNA Expression Quantification",
                      workflow.type = "BCGSC miRNA Profiling")
    GDCdownload(query, method = "api", files.per.chunk = 50)
    expdat_mirna <- GDCprepare(query = query)
    write.csv(expdat_mirna, file = "TCGA-COAD-miRNA.csv")
    save(expdat_mirna, file = "expdat_mirna.RData")

    # Download Copy Number Variation data
    query <- GDCquery(project = "TCGA-COAD",
                      data.category = "Copy Number Variation",
                      data.type = "Copy Number Segment")
    GDCdownload(query, method = "api", files.per.chunk = 50)
    expdat <- GDCprepare(query = query)
    save(expdat, file = "TCGA-COAD-Copy-Number-Variation.RData")
    write.csv(expdat, file = "TCGA-COAD-Copy-Number-Variation.csv")

    # Download Copy Number Variation GISTIC2 data
    query <- GDCquery(project = "TCGA-COAD",
                      data.category = "Copy Number Variation",
                      data.type = "Gene Level Copy Number Scores",
                      access = "open")
    GDCdownload(query, method = "api")
    GISTIC_cnv <- GDCprepare(query)
    save(GISTIC_cnv, file = "TCGA-COAD-GISTIC-cnv.RData")

    # Download methylation data (very large, more than 50 GB)
    query.met <- GDCquery(project = "TCGA-COAD",
                          # legacy = TRUE,
                          data.category = "DNA Methylation")
    GDCdownload(query.met, method = "api", files.per.chunk = 300)
    # Prepare the methylation query (the original passed `query` here by mistake)
    expdat <- GDCprepare(query = query.met)
    met_matrix <- assay(expdat)
    write.csv(met_matrix, file = "TCGA-COAD-methylation.csv")

    # Download SNV data
    # Note: GDCquery_Maf may be unavailable in recent TCGAbiolinks versions.
    acc.maf <- GDCquery_Maf("COAD", pipelines = "muse")
    save(acc.maf, file = "TCGA-COAD-acc.maf.RData")
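The workflow above writes each result both as a CSV file and as an `.RData` file, so it may help to see how such files can be read back later. The sketch below uses a made-up two-gene matrix with invented TCGA-style barcodes (`toy_counts` is a hypothetical stand-in, not real data). It illustrates two points of base R behavior: `read.csv` rewrites hyphens in column names unless `check.names = FALSE` is set, and `load` restores objects under the names they were saved with.

```r
# Made-up stand-in for a count matrix: genes in rows, barcodes in columns.
toy_counts <- matrix(
  c(10L, 0L, 25L, 7L),
  nrow = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005"),
    c("TCGA-XX-0001-01A", "TCGA-XX-0002-11A")
  )
)

csv_path <- tempfile(fileext = ".csv")
write.csv(toy_counts, file = csv_path)

# Default read: check.names = TRUE turns "-" into "." in column names.
default_read <- read.csv(csv_path, row.names = 1)

# check.names = FALSE keeps the barcodes exactly as written.
exact_read <- read.csv(csv_path, row.names = 1, check.names = FALSE)
restored <- as.matrix(exact_read)

print(colnames(default_read))
print(colnames(restored))

# .RData files restore objects under their original names;
# load() returns those names as a character vector.
rdata_path <- tempfile(fileext = ".RData")
save(toy_counts, file = rdata_path)
loaded_env <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)
print(loaded_names)
```

Applied to the files written above, this would suggest reading the counts with something like `read.csv("TCGA-COAD-Counts.csv", row.names = 1, check.names = FALSE)`, and expecting `load("TCGA-COAD-clinical.RData")` to bring back an object named `clinical`.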
