WGCNA简介

https://cloud.tencent.com/developer/article/1936465

hdWGCNA(scWGCNA)

单细胞WGCNA
github: https://github.com/smorabit/hdWGCNA
目前并非稳定版本!(2022-5-7)

安装

  1. # install BiocManager
  2. install.packages("BiocManager")
  3. # install Bioconductor core packages
  4. BiocManager::install()
  5. # install additional packages:
  6. install.packages(c("Seurat", "WGCNA", "igraph", "devtools"))
  7. devtools::install_github('smorabit/hdWGCNA', ref='dev')
  8. # test
  9. library(hdWGCNA)

测试数据

  1. # 作者提供测试数据
  2. wget https://swaruplab.bio.uci.edu/public_data/Zhou_2020.rds
  3. # pbmc3k测试数据(使用该数据)
  4. wget https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz

输入数据要求

输入数据必须经过seurat或scanpy流程分析,以seurat为例子,输入数据必须经过以下分析
image.png

plotting and data science packages

library(tidyverse) library(cowplot) library(patchwork)

co-expression network analysis packages:

library(WGCNA) library(hdWGCNA)

using the cowplot theme for ggplot

theme_set(theme_cowplot())

set random seed for reproducibility

set.seed(12345)

  1. <a name="QQfsL"></a>
  2. ### 读入数据
  3. <a name="aV2Qc"></a>
  4. #### 基本分析
  5. ```r
  6. pbmc.data <- Read10X(data.dir = "./data/pbmc3k")
  7. pbmc <- CreateSeuratObject(counts = pbmc.data,project = "pbmc3k",min.cells = 3,min.features = 200)
  8. pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc,pattern = "^MT-")
  9. pbmc <- subset(pbmc,subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
  10. pbmc <- NormalizeData(pbmc,normalization.method = "LogNormalize",scale.factor = 10000)
  11. pbmc <- FindVariableFeatures(pbmc,selection.method = "vst",nfeatures = 2000)
  12. pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt")
  13. pbmc <- ScaleData(pbmc,features = VariableFeatures(pbmc))
  14. pbmc <- RunPCA(pbmc)
  15. pbmc <- FindNeighbors(pbmc,dims = 1:13)
  16. pbmc <- FindClusters(pbmc,resolution = 0.5)
  17. pbmc <- RunTSNE(pbmc,dims = 1:13)
  18. pbmc <- RunUMAP(pbmc,dims = 1:13)
  19. new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono",
  20. "NK", "DC", "Platelet")
  21. names(new.cluster.ids) <- levels(pbmc)
  22. pbmc <- RenameIdents(pbmc, new.cluster.ids)
  23. pbmc@meta.data$cell_type <- Idents(pbmc)

测试数据

  1. p <- DimPlot(seurat_obj, group.by='cell_type', label=TRUE) +
  2. umap_theme() + ggtitle('Zhou et al Control Cortex') + NoLegend()
  3. p

image.png

为WGCNA设定seurat对象

hdWGCNA的结果储存在seurat对象的@miscslot中,一个seurat对象可以存在多个hdWGCNA experiment
使用SetupForWGCNA 设定seurat对象, 此步骤可以设定 scWGNCA experiment的名称和选择使用哪些符合条件的基因,主要有三种方法可以筛选基因,通过gene_select 选项指定,wgcna_name指定名称

  • variable: 使用seurat确定的VariableFeatures
  • fraction: use genes that are expressed in a certain fraction of cells for in the whole dataset or in each group of cells, specified by group.by.根据基因在多少比例的cell中表达的阈值筛选基因
  • custom: 自行指定gene列表
    1. pbmc <- SetupForWGCNA(
    2. pbmc,
    3. gene_select = "fraction", # the gene selection approach
    4. fraction = 0.05, # fraction of cells that a gene needs to be expressed in order to be included
    5. wgcna_name = "tutorial" # the name of the hdWGCNA experiment
    6. )
    查看一下@misc
    1. pbmc@misc # list
    2. #out
    3. $active_wgcna
    4. 'tutorial'
    5. $tutorial
    6. $wgcna_group
    7. 'all'
    8. $wgcna_genes
    9. 'AL627309.1''LINC01409'...

    构建metacell

    设定了seurat对象后,第一步是构建metacell, 所有metacell是来自相同生物样品的一小组相似细胞的聚类,scWGCNA使用了KNN算法识别相似的cell, 将cell聚类后再计算基因的平均表达量,得到metacell的基因表达矩阵,大大降低了原始矩阵的稀疏性
    使用MetacellsByGroups构建metacell, 该函数创建了一个新的 Seurat 对象 ```r

    construct metacells in each group

    pbmc <- MetacellsByGroups( seurat_obj = pbmc, group.by = “cell_type”, # specify the columns in seurat_obj@meta.data to group by k = 25, # nearest-neighbors parameter ident.group = ‘cell_type’ # set the Idents of the metacell seurat object )

normalize metacell expression matrix:

pbmc <- NormalizeMetacells(pbmc)

  1. `seurat_obj`: seurat对象<br />`**group.by**`**: **确定将在哪些组中构建metacell,可以指定多个,名称要和`seurat_obj@meta.data`一致<br />`k`: KNN算法的k值, 一般设定在25 - 75之间, 本例有40,039 cell,每个生物样本中的细胞数从 890 - 8188 不等<br />`ident.group`: Idents of the metacell seurat object
  2. <a name="i0FVQ"></a>
  3. #### 对构建好的metacell进行标准流程分析:可选
  4. 使用 `GetMetacellObject`可以提取metacell对象
  5. ```r
  6. metacell_obj <- GetMetacellObject(pbmc)
  7. ## 查看cell数量
  8. metacell_obj
  9. An object of class Seurat
  10. 36601 features across 21058 samples within 1 assay
  11. Active assay: RNA (36601 features, 0 variable features)
  12. pbmc
  13. An object of class Seurat
  14. 36601 features across 36671 samples within 1 assay
  15. Active assay: RNA (36601 features, 0 variable features)
  16. 2 dimensional reductions calculated: harmony, umap

hdWGCNA封装了部分标准流程的步骤可直接调用,支持%>%

  1. pbmc <- pbmc %>%
  2. NormalizeMetacells() %>%
  3. ScaleMetacells(features=VariableFeatures(seurat_obj)) %>%
  4. RunPCAMetacells(features=VariableFeatures(seurat_obj)) %>%
  5. RunHarmonyMetacells(group.by.vars='Sample') %>%
  6. RunUMAPMetacells(reduction='harmony', dims=1:15)
  7. p1 <- DimPlotMetacells(pbmc, group.by='cell_type') + umap_theme + ggtitle("Cell Type")
  8. p2 <- DimPlotMetacells(pbmc, group.by='Sample') + umap_theme + ggtitle("Sample")
  9. p1 | p2

基因共表达网络分析

对数据集中的抑制性神经元 (INH) 细胞执行基因共表达网络分析

设置表达矩阵

使用SetDatExpr()设置表达矩阵,该函数包括了提取子集和转置矩阵的功能, 关键参数如下

  • seurat_obj: seurat对象名称
  • group_name:需要分析哪部分cell(需要分析的cell的分组名),名称应在group.by指定的范围内
  • use_metacells: 是否使用metacell
  • group.by: 根据seurat@metada哪一列分组 ```r pbmc <- SetDatExpr( pbmc, group_name = “NK”, # the name of the group of interest in the group.by column group.by=’cell_type’ # the metadata column containing the cell type info. This same column should have also been used in MetacellsByGroups )

— 选择多个分组 —————-

pbmc <- SetDatExpr( pbmc, group_name = c(“CD8 T”, “B”), group.by=’cell_type’ )

  1. <a name="kz58Y"></a>
  2. #### 选择软阈值
  3. 使用`TestSoftPowers()`函数确定软阈值,`PlotSoftPowers()`可视化结果, 如果前面没有进行`SetDatExpr`,可以再此步骤执行,参数和`SetDatExpr()`函数一致,WGCNA 和 hdWGCNA 的一般模型拟合大于或等于 0.8时的最低软阈值。本例给出的是7
  4. ```r
  5. # Test different soft powers:
  6. pbmc <- TestSoftPowers(
  7. pbmc,
  8. setDatExpr = FALSE, # set this to FALSE since we did this above
  9. )
  10. # 可视化软阈值结果
  11. plot_list <- PlotSoftPowers(pbmc)
  12. # assemble with patchwork
  13. wrap_plots(plot_list, ncol=2)

image.png
使用[GetPowerTable](../reference/GetPowerTable.html)()可以获取具体的软阈值表格

  1. power_table <- GetPowerTable(pbmc)
  2. head(power_table, 3)
  3. ###
  4. Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
  5. 1 1 0.192964167 10.1742363 0.9508966 5887.9290 5901.7852 6501.0283
  6. 2 2 0.001621749 0.4801145 0.9828730 3092.6379 3092.7878 3835.2202
  7. 3 3 0.150812707 -3.2345892 0.9918393 1657.6198 1646.6169 2349.6967

构建共表达网络

使用ConstructNetwork()构建基因共表达网络

  1. # construct co-expression network:
  2. pbmc <- ConstructNetwork(
  3. pbmc,
  4. soft_power=12,
  5. setDatExpr=FALSE
  6. )

使用PlotDendrogram()可视化

  1. PlotDendrogram(pbmc, main='pbmc3k hdWGCNA')

image.png

模块特征基因(MEs)

相应模块的表达矩阵的第一主成分,代表了该模块内基因表达的整体水平。因此,模块特征基因并非某个具体的基因,而是一种“特征组合”
使用ModuleEigengenes()计算模块特征基因, 可能会报错

  1. Error in data.use %*% cell.embeddings : non-conformable arguments
  2. https://github.com/immunogenomics/harmony/issues/86
  3. 增加选项project.dim = F 解决
  1. # compute all MEs in the full single-cell dataset
  2. pbmc <- ModuleEigengenes(
  3. pbmc,
  4. group.by.vars="cell_type"
  5. )

使用GetMEs()可以得到模块特征基因的表达矩阵,默认使用harmonized矫正,即得到hMEs

  1. # harmonized module eigengenes:
  2. hMEs <- GetMEs(pbmc)
  3. turquoise blue brown grey yellow
  4. AAACATACAACCAC-1 4.885751 0.7222126 -2.035966 -4.5174804 -2.0320305
  5. AAACATTGAGCTAC-1 2.331174 -0.5567514 3.371447 0.4035175 -0.7455446
  6. # module eigengenes: 未矫正的数据
  7. MEs <- GetMEs(pbmc, harmonized=FALSE)

模块间基因连通度计算

在共表达网络分析中,需要得到每个模块内高连接度的基因。使用ModuleConnectivity()确定每个基因的基于特征基因的连接性,也称为kME
hdWGCNA includes the ModuleConnectivity to compute the kME values in the full single-cell dataset, rather than the metacell dataset. This function essentially computes pairwise correlations between genes and module eigengenes. kME can be computed for all cells in the dataset, but we recommend computing kME in the cell type or group that was previously used to run ConstructNetwork

  1. # compute eigengene-based connectivity (kME):
  2. seurat_obj <- ModuleConnectivity(
  3. seurat_obj,
  4. group.by = 'cell_type', group_name = 'NK'
  5. )

使用GetModules()获取每个基因的kME信息

  1. modules <- GetModules(pbmc)

image.png

计算gene signature score

使用ModuleExprScore计算每个模块的gene signature score, 支持UCellSeurat的算法

  1. # compute gene scoring for the top 25 hub genes by kME for each module
  2. # with Seurat method
  3. pbmc <- ModuleExprScore(
  4. pbmc,
  5. n_genes = 25,
  6. method='Seurat'
  7. )
  8. # compute gene scoring for the top 25 hub genes by kME for each module
  9. # with UCell method
  10. library(UCell) # R version >= 4.1
  11. pbmc <- ModuleExprScore(
  12. pbmc,
  13. n_genes = 25,
  14. method='UCell'
  15. )
  16. saveRDS(pbmc, file='./data/pbmc3kwgcna.rds')

模块间相关性

  1. library(corrplot)
  2. ModuleCorrelogram(pbmc)

image.png

  1. # use GGally to investigate 6 selected modules:
  2. GGally::ggpairs(GetMEs(pbmc)[,c(1:3,12:15)])

image.png