Monocle2 Study Notes (2): Classifying and Counting Cells - Figure 2


Classifying cells by type


# Load Monocle 2 (HSMM is assumed to be the CellDataSet prepared in part 1 of these notes)
library(monocle)
# Look up the feature IDs of the marker genes MYF5 and ANPEP
MYF5_id <- row.names(subset(fData(HSMM), gene_short_name == "MYF5"))
ANPEP_id <- row.names(subset(fData(HSMM), gene_short_name == "ANPEP"))
cth <- newCellTypeHierarchy()
# Cells expressing MYF5 are labeled as Myoblasts
cth <- addCellType(cth, "Myoblast", classify_func = function(x) {
  x[MYF5_id, ] >= 1
})
# Cells without MYF5 but with ANPEP expression are labeled as Fibroblasts
cth <- addCellType(cth, "Fibroblast", classify_func = function(x) {
  x[MYF5_id, ] < 1 & x[ANPEP_id, ] > 1
})
# Classify the cells with classifyCells (third argument: frequency_thresh = 0.1)
HSMM <- classifyCells(HSMM, cth, 0.1)
# Inspect the result
table(pData(HSMM)$CellType)
# Fibroblast   Myoblast    Unknown
#         56         85        121
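The printed counts can be turned into fractions with base R's prop.table(). A minimal sketch that simply re-enters the counts from the table above by hand (the names counts and fractions are illustrative):

```r
# Counts copied by hand from table(pData(HSMM)$CellType) above
counts <- c(Fibroblast = 56, Myoblast = 85, Unknown = 121)
# Fraction of all cells in each class
fractions <- round(prop.table(counts), 3)
fractions
# Fibroblast   Myoblast    Unknown
#      0.214      0.324      0.462
```

So roughly 46% of the cells (121 of 262) could not be assigned a type from the two markers alone.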


# Visualize the classification result as a pie chart
library(ggplot2)
pie <- ggplot(pData(HSMM),
              aes(x = factor(1), fill = factor(CellType))) +
  geom_bar(width = 1)
pie + coord_polar(theta = "y") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank())

[Figure 3: pie chart of the cell types assigned by classifyCells]

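As the figure above shows, many cells are labeled "Unknown". This is common, largely because most single-cell RNA-Seq experiments capture only a small fraction of each cell's mRNA: if a cell expresses only a little MYF5 mRNA, we may not capture any of it. A cell that satisfies none of the conditions in the classification functions is labeled "Unknown"; a cell that satisfies more than one is labeled "Ambiguous". We could simply exclude such cells, but that would discard a lot of data: here it would mean losing 121 of 262 cells, close to half of the dataset.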
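The Unknown/Ambiguous rule can be illustrated with a small base-R sketch. It is not Monocle's implementation: it ignores the tree structure of cth and the frequency_thresh argument, and the function label_cells, the gene IDs and the expression values are all hypothetical. It also shows why the table above has no "Ambiguous" cells: both rules split on MYF5 (>= 1 versus < 1), so no cell can satisfy both.

```r
# Minimal sketch (not Monocle's code): label each cell from a flat list of rules.
# Zero matching rules -> "Unknown", more than one -> "Ambiguous".
label_cells <- function(expr, rules) {
  # hits: one row per cell, one column per rule, TRUE where the cell passes the rule
  hits <- vapply(rules, function(f) as.logical(f(expr)), logical(ncol(expr)))
  hits <- matrix(hits, nrow = ncol(expr),
                 dimnames = list(colnames(expr), names(rules)))
  n_hits <- rowSums(hits)
  labels <- ifelse(n_hits == 0, "Unknown", "Ambiguous")
  single <- n_hits == 1
  # Cells matching exactly one rule get that rule's name
  labels[single] <- names(rules)[apply(hits[single, , drop = FALSE], 1, which)]
  labels
}

# Hypothetical genes x cells matrix (gene IDs are made up)
MYF5_id <- "gene_MYF5"
ANPEP_id <- "gene_ANPEP"
expr <- matrix(c(3, 0,    # cell1: MYF5 detected
                 0, 5,    # cell2: ANPEP only
                 0, 0.5,  # cell3: neither marker above its threshold
                 2, 4),   # cell4: both markers
               nrow = 2,
               dimnames = list(c(MYF5_id, ANPEP_id), paste0("cell", 1:4)))

# The same two rules as in cth above
rules <- list(
  Myoblast   = function(x) x[MYF5_id, ] >= 1,
  Fibroblast = function(x) x[MYF5_id, ] < 1 & x[ANPEP_id, ] > 1
)
label_cells(expr, rules)
# cell1 Myoblast, cell2 Fibroblast, cell3 Unknown, cell4 Myoblast

# A hypothetical overlapping rule, added only to show how "Ambiguous" arises
rules2 <- c(rules, list(ANPEP_high = function(x) x[ANPEP_id, ] > 1))
label_cells(expr, rules2)
# cell1 Myoblast, cell2 Ambiguous, cell3 Unknown, cell4 Ambiguous
```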

Clustering cells without marker genes


disp_table <- dispersionTable(HSMM)
# Keep genes whose mean expression across cells is at least 0.1
unsup_clustering_genes <- subset(disp_table, mean_expression >= 0.1)
# Use only these genes for the downstream clustering
HSMM <- setOrderingFilter(HSMM, unsup_clustering_genes$gene_id)
plot_ordering_genes(HSMM)

[Figure 4: plot_ordering_genes output, gene dispersion versus mean expression]

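plot_ordering_genes shows how the variability (dispersion) of each gene's expression relates to its mean expression across all cells. The red line is the expected dispersion that Monocle2 fits to this relationship; the genes selected for the subsequent clustering are drawn as black points, and all other genes as grey points.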
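The quantities behind this plot can be sketched in base R on simulated counts. This is only an illustration of the idea, not Monocle's estimateDispersions: it assumes negative-binomial counts, computes the empirical dispersion from var = mean + dispersion * mean^2, and fits the curve dispersion = a + b / mean by ordinary least squares (a DESeq-style form; Monocle's own fitting differs in detail). The column names mirror those of dispersionTable(); the dispersion filter at the end is an optional extra, not part of the code above.

```r
set.seed(1)
n_genes <- 200
n_cells <- 100
# Hypothetical negative-binomial counts with gene means between 0.05 and 20
true_mean <- exp(runif(n_genes, log(0.05), log(20)))
counts <- t(sapply(true_mean, function(mu) rnbinom(n_cells, mu = mu, size = 2)))
rownames(counts) <- paste0("gene", seq_len(n_genes))

gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
keep <- gene_mean > 0                   # dispersion is undefined for all-zero genes
# Empirical dispersion from var = mean + dispersion * mean^2
disp_emp <- (gene_var[keep] - gene_mean[keep]) / gene_mean[keep]^2

# Least-squares fit of dispersion = a + b / mean (the analogue of the red line)
inv_mean <- 1 / gene_mean[keep]
fit <- lm(disp_emp ~ inv_mean)
disp_table <- data.frame(
  gene_id = names(disp_emp),
  mean_expression = gene_mean[keep],
  dispersion_fit = unname(fitted(fit)),
  dispersion_empirical = unname(disp_emp)
)
# Mean filter as above, plus an optional "above the fitted curve" filter
selected <- subset(disp_table,
                   mean_expression >= 0.1 & dispersion_empirical >= dispersion_fit)
nrow(selected)
```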

# Variance explained by each principal component; used to choose num_dim below
# HSMM@auxClusteringData[["tSNE"]]$variance_explained <- NULL
plot_pc_variance_explained(HSMM, return_all = F) # norm_method='log'

[Figure 5: variance explained by each principal component]
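plot_pc_variance_explained reports the share of variance captured by each principal component, which is what guides the choice of num_dim (6 in the code that follows). A base-R sketch of that quantity using prcomp() on simulated log-scale data; the data and group structure are hypothetical, and Monocle computes this on its own normalized expression of the ordering genes, so details differ:

```r
set.seed(2)
n_cells <- 120
n_genes <- 50
# Hypothetical log-scale expression: two groups of cells differ in the first 10 genes
group <- rep(1:2, each = n_cells / 2)
log_expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)
log_expr[group == 2, 1:10] <- log_expr[group == 2, 1:10] + 4

# PCA on a cells x genes matrix, each gene centered and scaled
pca <- prcomp(log_expr, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained[1:6], 3)
# Cumulative share, useful for judging where additional PCs stop adding much
round(cumsum(var_explained)[1:6], 3)
```

One common reading is to keep the PCs before the curve flattens; here only PC1 carries the simulated group difference.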

# Dimensionality reduction and visualization
# Reduce the data to 2 tSNE dimensions, starting from the first 6 principal components
HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 6,
                        reduction_method = 'tSNE', verbose = T)
# Cluster the cells with clusterCells
HSMM <- clusterCells(HSMM, num_clusters = 2)
# Plot the clusters with plot_cell_clusters, colored by the marker-based CellType
plot_cell_clusters(HSMM, 1, 2, color = "CellType",
                   markers = c("MYF5", "ANPEP"))

[Figure 6: tSNE clusters colored by CellType, with the MYF5 and ANPEP markers]
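As far as I know, clusterCells in Monocle 2 defaults to density-peak clustering (through the densityClust package) on the tSNE coordinates. The base-R sketch below is a simplified version of that idea, not Monocle's code: the function name, the 2% cutoff rule and the choice of centres by the largest rho * delta are illustrative assumptions. Each point gets a local density rho and a distance delta to the nearest denser point; points with both large rho and large delta become cluster centres, and every other point inherits the cluster of its nearest denser neighbour.

```r
# Simplified density-peak clustering on 2-D coordinates (e.g. tSNE output).
density_peak_clusters <- function(coords, num_clusters, dc = NULL) {
  d <- as.matrix(dist(coords))
  n <- nrow(d)
  # Cutoff distance: 2% quantile of pairwise distances (a rule of thumb)
  if (is.null(dc)) dc <- quantile(d[upper.tri(d)], 0.02)
  # rho: Gaussian-kernel local density, excluding the point itself
  rho <- rowSums(exp(-(d / dc)^2)) - 1
  ord <- order(rho, decreasing = TRUE)
  delta <- numeric(n)
  parent <- integer(n)                  # nearest point with higher density
  delta[ord[1]] <- max(d[ord[1], ])
  parent[ord[1]] <- ord[1]
  for (k in seq_len(n)[-1]) {
    i <- ord[k]
    higher <- ord[seq_len(k - 1)]
    j <- higher[which.min(d[i, higher])]
    delta[i] <- d[i, j]
    parent[i] <- j
  }
  # Centres: largest rho * delta; the densest point always seeds a cluster
  gamma <- rho * delta
  gamma[ord[1]] <- Inf
  centres <- order(gamma, decreasing = TRUE)[seq_len(num_clusters)]
  cluster <- integer(n)
  cluster[centres] <- seq_len(num_clusters)
  # In order of decreasing density, each point inherits its parent's cluster
  for (i in ord) {
    if (cluster[i] == 0L) cluster[i] <- cluster[parent[i]]
  }
  cluster
}

# Two well-separated hypothetical groups of 50 points each
set.seed(3)
coords <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                matrix(rnorm(100, mean = 10), ncol = 2))
cl <- density_peak_clusters(coords, num_clusters = 2)
table(cl)
```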

# Color the same plot by culture medium (Media)
plot_cell_clusters(HSMM, 1, 2, color = "Media")

[Figure 7: tSNE clusters colored by Media]


# Redo tSNE after regressing out Media and the number of expressed genes,
# so that the clusters are less driven by these effects
HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 2,
                        reduction_method = 'tSNE',
                        residualModelFormulaStr = "~Media + num_genes_expressed",
                        verbose = T)
HSMM <- clusterCells(HSMM, num_clusters = 2)
plot_cell_clusters(HSMM, 1, 2, color = "CellType")

[Figure 8: tSNE clusters after removing the Media and num_genes_expressed effects, colored by CellType]
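With residualModelFormulaStr, reduceDimension fits each gene's expression against the listed covariates and subtracts the fitted effects before the PCA/tSNE step (my understanding is that Monocle uses limma's lmFit for this internally). A base-R stand-in for that step, using least squares via qr.coef(); the function remove_covariates and all data are hypothetical:

```r
set.seed(4)
n_cells <- 60
# Hypothetical cell annotations mirroring the pData columns used above
pdata <- data.frame(
  Media = factor(rep(c("GM", "DM"), each = n_cells / 2)),
  num_genes_expressed = rpois(n_cells, 3000)
)
# Hypothetical log-scale expression (5 genes x 60 cells) with a Media shift of 2
expr <- matrix(rnorm(5 * n_cells), nrow = 5)
expr <- expr + matrix(ifelse(pdata$Media == "DM", 2, 0),
                      nrow = 5, ncol = n_cells, byrow = TRUE)

remove_covariates <- function(expr, formula_str, pdata) {
  X <- model.matrix(as.formula(formula_str), data = pdata)  # cells x coefficients
  beta <- qr.coef(qr(X), t(expr))                           # coefficients x genes
  beta[is.na(beta)] <- 0
  # Subtract the fitted covariate effects, keeping each gene's intercept
  expr - t(X[, -1, drop = FALSE] %*% beta[-1, , drop = FALSE])
}

corrected <- remove_covariates(expr, "~Media + num_genes_expressed", pdata)
dm <- pdata$Media == "DM"
# Per-gene difference in mean expression between DM and GM cells, before and after
round(rowMeans(expr[, dm]) - rowMeans(expr[, !dm]), 2)
round(rowMeans(corrected[, dm]) - rowMeans(corrected[, !dm]), 10)
```

Because Media is in the model, the least-squares residuals average to zero within each medium, so the corrected group means coincide.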

HSMM <- clusterCells(HSMM, num_clusters = 2)
# Color by cluster and split the plot by the marker-based CellType
plot_cell_clusters(HSMM, 1, 2, color = "Cluster") +
  facet_wrap(~CellType)

[Figure 9: clusters faceted by CellType]

Clustering cells using marker genes

First, we select a different set of genes for clustering the cells. Previously we simply picked genes that were highly expressed and highly variable; now we pick genes that co-vary with our markers. In effect, we build a much larger list of marker genes, so that even a cell in which MYF5 was not detected may still be recognizable as a myoblast from its other genes.

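# Find genes that differ between the cell types defined by cth, controlling for
# Media and num_genes_expressed (expressed_genes is assumed to come from part 1)
marker_diff <- markerDiffTable(HSMM[expressed_genes,],
            cth,
            residualModelFormulaStr = "~Media + num_genes_expressed",
            cores = 1)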
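Conceptually, markerDiffTable tests each gene for a difference between the cth-defined cell types while accounting for the residual model terms, and reports a q-value. Monocle fits negative-binomial models for this; the base-R sketch below swaps in a Poisson glm likelihood-ratio test (full model CellType + Media versus reduced model Media only) and Benjamini-Hochberg adjustment, which is how I understand qval to be computed. The data, the function lrt_pval and the omission of num_genes_expressed are simplifications for illustration.

```r
set.seed(5)
n_cells <- 80
# Hypothetical annotations: cell type from the marker rules, plus culture medium
pdata <- data.frame(
  CellType = factor(rep(c("Myoblast", "Fibroblast"), each = n_cells / 2)),
  Media    = factor(rep(c("GM", "DM"), times = n_cells / 2))
)
# Hypothetical counts: gene1 differs by cell type, gene2 and gene3 do not
mu <- rbind(ifelse(pdata$CellType == "Myoblast", 10, 2),
            rep(5, n_cells),
            rep(5, n_cells))
counts <- matrix(rpois(length(mu), mu), nrow = nrow(mu),
                 dimnames = list(paste0("gene", 1:3), NULL))

# Likelihood-ratio test: full model (CellType + Media) vs reduced model (Media only)
lrt_pval <- function(y, pdata) {
  full    <- glm(y ~ CellType + Media, family = poisson(), data = pdata)
  reduced <- glm(y ~ Media, family = poisson(), data = pdata)
  stat <- deviance(reduced) - deviance(full)
  pchisq(stat, df = df.residual(reduced) - df.residual(full), lower.tail = FALSE)
}
pval <- apply(counts, 1, lrt_pval, pdata = pdata)
qval <- p.adjust(pval, method = "BH")
# Same cutoff as candidate_clustering_genes below
candidate_genes <- names(qval)[qval < 0.01]
candidate_genes
```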


# Genes used for clustering: those significant at q < 0.01
candidate_clustering_genes <- row.names(subset(marker_diff, qval < 0.01))
# Score how specifically each candidate gene marks each cell type
marker_spec <- calculateMarkerSpecificity(HSMM[candidate_clustering_genes,], cth)

# Show the top 3 markers per cell type
head(selectTopMarkers(marker_spec, 3))
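calculateMarkerSpecificity gives each candidate gene a score between 0 and 1 for each cell type, with values closer to 1 meaning the gene is more restricted to that type. To my understanding the score follows a Jensen-Shannon-based specificity metric; the base-R sketch below is a minimal version of such a score computed from a hypothetical table of mean expression per cell type, so the exact inputs Monocle uses may differ. Specificity is 1 minus the Jensen-Shannon distance between the gene's normalized profile and a profile expressed only in the target type.

```r
# Jensen-Shannon distance (base-2 logs, so it lies in [0, 1])
js_distance <- function(p, q) {
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
}

# mean_expr: genes x cell types matrix of (hypothetical) mean expression
marker_specificity <- function(mean_expr) {
  profiles <- mean_expr / rowSums(mean_expr)   # each gene's profile sums to 1
  out <- expand.grid(gene_id = rownames(mean_expr),
                     CellType = colnames(mean_expr),
                     stringsAsFactors = FALSE)
  out$specificity <- mapply(function(g, ct) {
    ideal <- as.numeric(colnames(mean_expr) == ct)  # expressed only in ct
    1 - js_distance(profiles[g, ], ideal)
  }, out$gene_id, out$CellType, USE.NAMES = FALSE)
  out
}

mean_expr <- rbind(
  MYF5_like    = c(Myoblast = 8, Fibroblast = 0),
  ANPEP_like   = c(Myoblast = 1, Fibroblast = 9),
  housekeeping = c(Myoblast = 5, Fibroblast = 5)
)
marker_specificity(mean_expr)
```

A gene seen only in myoblasts scores 1 for Myoblast and 0 for Fibroblast, while an evenly expressed gene scores about 0.44 for both.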

Here we take the top 500 marker genes of each cell type and use them for the subsequent clustering:

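# Take the top 500 markers per cell type; unique() keeps each gene only once
semisup_clustering_genes <- unique(selectTopMarkers(marker_spec, 500)$gene_id)
# Use only these genes for clustering
HSMM <- setOrderingFilter(HSMM, semisup_clustering_genes)
plot_ordering_genes(HSMM)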

[Figure 10: ordering genes selected from the top markers]
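selectTopMarkers(marker_spec, 500) keeps the highest-scoring genes for each cell type, and unique() then removes genes that rank high for more than one type. A small base-R sketch of that two-step selection on a made-up table; the function top_markers_per_type is hypothetical, and the gene_id / CellType / specificity layout is an assumption about what selectTopMarkers returns:

```r
# Sketch of "top n markers per cell type" (not Monocle's selectTopMarkers)
top_markers_per_type <- function(spec, n) {
  per_type <- lapply(split(spec, spec$CellType), function(d) {
    head(d[order(d$specificity, decreasing = TRUE), ], n)
  })
  do.call(rbind, per_type)
}

# Hypothetical specificity table
spec <- data.frame(
  gene_id     = c("g1", "g2", "g3", "g1", "g2", "g3"),
  CellType    = rep(c("Myoblast", "Fibroblast"), each = 3),
  specificity = c(0.90, 0.50, 0.70, 0.20, 0.80, 0.75)
)
top <- top_markers_per_type(spec, 2)
top
# g3 is in the top 2 of both cell types, so unique() keeps it once
unique(top$gene_id)
```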

plot_pc_variance_explained(HSMM, return_all = F)

[Figure 11: variance explained by each principal component]

# Reduce dimensions, cluster and visualize the cells
HSMM <- reduceDimension(HSMM, max_components = 2, num_dim = 3,
  norm_method = 'log',
  reduction_method = 'tSNE',
  residualModelFormulaStr = "~Media + num_genes_expressed",
  verbose = T)
HSMM <- clusterCells(HSMM, num_clusters = 2)
plot_cell_clusters(HSMM, 1, 2, color = "CellType")

[Figure 12: clusters built from the marker-based genes, colored by CellType]

Imputing cell type


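# Cluster again, passing cth so that whole clusters are assigned a cell type
# (frequency_thresh = 0.1); this imputes a type for many "Unknown" cells
HSMM <- clusterCells(HSMM,
              cell_type_hierarchy = cth,
              num_clusters = 2,
              frequency_thresh = 0.1)
plot_cell_clusters(HSMM, 1, 2, color = "CellType",
    markers = c("MYF5", "ANPEP"))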

[Figure 13: clusters colored by the imputed CellType, with the MYF5 and ANPEP markers]
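With cell_type_hierarchy and frequency_thresh, clusterCells labels whole clusters rather than single cells. As I understand Monocle's documentation: when one cell type makes up more than frequency_thresh of a cluster, every cell in that cluster gets that type; if more than one type passes, the cluster is "Ambiguous"; if none does, it stays "Unknown". The base-R sketch below encodes that rule; the function impute_cluster_types and the use of all cells in the cluster as the denominator are assumptions.

```r
# Sketch of cluster-level cell type imputation (not Monocle's code).
# Assumption: frequencies are taken over all cells in the cluster.
impute_cluster_types <- function(cell_type, cluster, frequency_thresh) {
  known <- setdiff(unique(cell_type), c("Unknown", "Ambiguous"))
  cluster_label <- sapply(split(cell_type, cluster), function(ct) {
    freq <- vapply(known, function(k) mean(ct == k), numeric(1))
    passing <- known[freq > frequency_thresh]
    if (length(passing) == 1) {
      passing
    } else if (length(passing) > 1) {
      "Ambiguous"
    } else {
      "Unknown"
    }
  })
  unname(cluster_label[as.character(cluster)])
}

# Hypothetical per-cell labels for four clusters
cell_type <- c(rep("Myoblast", 4), rep("Unknown", 6),                       # cluster 1
               rep("Fibroblast", 3), rep("Unknown", 7),                     # cluster 2
               rep("Myoblast", 3), rep("Fibroblast", 3), rep("Unknown", 4), # cluster 3
               "Myoblast", rep("Unknown", 19))                              # cluster 4
cluster <- rep(1:4, times = c(10, 10, 10, 20))
imputed <- impute_cluster_types(cell_type, cluster, frequency_thresh = 0.1)
table(cluster, imputed)
```

Cluster 4 stays "Unknown" because its single myoblast is only 5% of the cluster, below the 10% threshold.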

# Visualize the imputed cell types as a pie chart
pie <- ggplot(pData(HSMM), 
              aes(x = factor(1), 
              fill = factor(CellType))) +
       geom_bar(width = 1)

pie + coord_polar(theta = "y") +
      theme(axis.title.x = element_blank(), axis.title.y = element_blank())

[Figure 14: pie chart of the imputed cell types]


[Figure 15]
[Figure 16]