免疫组库重叠分析

免疫组库重叠(Repertoire overlap)是度量不同样本之间TCR或BCR库相似性的最常用方法。它是通过计算在给定的Repertoire之间共享的克隆型(也称为“公共”克隆型)的特定统计数据来实现的。immunarch提供了几个指标: - 公共克隆型的数量 ( .method = "public") - 一个重叠相似性的经典度量指标。

  • overlap coefficient,重叠系数 ( .method = "overlap") - 重叠相似性的标准化度量。它被定义为交集的大小除以两个集合中的较小者。
  • Jaccard index,Jaccard指数 ( .method = "jaccard") - 它衡量有限样本集之间的相似度,定义为交集的大小除以样本集并集的大小。
  • Tversky index,Tversky指数 ( .method = "tversky") - 一种集合上的非对称相似性度量,它将变体与原型进行比较。如果使用默认参数,则类似于 Dice 的系数。
  • cosine similarity,余弦相似度 ( .method = "cosine") - 两个非零向量之间相似度的度量
  • Morisita’s overlap index,Morisita重叠指数 ( .method = "morisita") - 一种用于计算个体在总体中的分散程度的统计量度。它用于比较样本之间的重叠。
  • incremental overlap,增量重叠 - 计算N个最丰富的克隆型与增量增长的N的重叠(.method = "inc+METHOD"例如,"inc+public""inc+morisita")。

我们可以使用repOverlap函数计算不同样本之间Repertoire的重叠情况。同样,我们可以将分析结果传递给vis()函数,以完成所有输出结果的可视化展示。

  1. library(immunarch) # Load the package into R
  2. data(immdata) # Load the test dataset
  3. # 使用不同的度量方法计算Repertoire的重叠情况
  4. imm_ov1 <- repOverlap(immdata$data, .method = "public", .verbose = F)
  5. head(imm_ov1)
  6. # A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
  7. #A2-i129 NA 63 74 69 46 55 30 41 24 35 44 54
  8. #A2-i131 63 NA 56 81 42 64 34 31 33 33 23 49
  9. #A2-i133 74 56 NA 87 49 61 41 44 31 31 44 65
  10. #A2-i132 69 81 87 NA 62 67 47 46 50 48 60 76
  11. #A4-i191 46 42 49 62 NA 55 42 34 41 29 37 49
  12. #A4-i192 55 64 61 67 55 NA 56 37 27 37 56 61
  13. imm_ov2 <- repOverlap(immdata$data, .method = "morisita", .verbose = F)
  14. head(imm_ov2)
  15. # A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192
  16. #A2-i129 NA 0.0024642881 0.0011511984 0.0044505612 0.0005804524 0.0024253356
  17. #A2-i131 0.0024642881 NA 0.0011475178 0.0088347844 0.0006212924 0.0019547325
  18. #A2-i133 0.0011511984 0.0011475178 NA 0.0043090343 0.0004456898 0.0016076124
  19. #A2-i132 0.0044505612 0.0088347844 0.0043090343 NA 0.0009178361 0.0023583418
  20. #A4-i191 0.0005804524 0.0006212924 0.0004456898 0.0009178361 NA 0.0005006889
  21. #A4-i192 0.0024253356 0.0019547325 0.0016076124 0.0023583418 0.0005006889 NA
  22. # MS1 MS2 MS3 MS4 MS5 MS6
  23. #A2-i129 0.0003009428 0.0011482287 0.0001797280 0.0014031280 0.0007196454 0.0027679140
  24. #A2-i131 0.0001927309 0.0014283644 0.0002328510 0.0021404462 0.0002198598 0.0034297172
  25. #A2-i133 0.0002194163 0.0018138252 0.0001618185 0.0007751521 0.0002272166 0.0017382456
  26. #A2-i132 0.0004486568 0.0032894737 0.0005874910 0.0073378655 0.0008173229 0.0106015902
  27. #A4-i191 0.0007469433 0.0002730513 0.0001892369 0.0004114056 0.0003530021 0.0008469919
  28. #A4-i192 0.0002977945 0.0007443917 0.0001358868 0.0016537104 0.0003422629 0.0544339382
  29. p1 <- vis(imm_ov1)
  30. p2 <- vis(imm_ov2, .text.size = 2)
  31. p1 + p2

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图1

  1. vis(imm_ov1, "heatmap2")

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图2

我们可以设置更改有效数字的位数:

  1. p1 <- vis ( imm_ov2 , .text.size = 2.5 , .signif.digits = 1 )
  2. p2 <- vis ( imm_ov2 , .text.size = 2 , .signif.digits = 2 )
  3. p1 + p2

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图3

我们还可以使用repOverlapAnalysis函数对计算得到的重叠指标进行分析。

  1. # Apply different analysis algorithms to the matrix of public clonotypes:
  2. # "mds" - Multi-dimensional Scaling
  3. # MDS降维
  4. repOverlapAnalysis(imm_ov1, "mds")
  5. ## Standard deviations (1, .., p=4):
  6. ## [1] 0 0 0 0
  7. ##
  8. ## Rotation (n x k) = (12 x 2):
  9. ## [,1] [,2]
  10. ## A2-i129 -18.7767715 -18.360817
  11. ## A2-i131 29.9586985 -7.870441
  12. ## A2-i133 28.1148594 22.629093
  13. ## A2-i132 -44.3435640 6.221812
  14. ## A4-i191 13.8586515 7.452149
  15. ## A4-i192 -14.0065477 27.068830
  16. ## MS1 -8.8469009 -8.151574
  17. ## MS2 -0.9712073 -1.297017
  18. ## MS3 -10.4398629 4.894354
  19. ## MS4 0.5131505 10.471309
  20. ## MS5 18.5153823 -12.628029
  21. ## MS6 6.4241122 -30.429669
  1. # "tsne" - t-Stochastic Neighbor Embedding
  2. # TSNE降维
  3. repOverlapAnalysis(imm_ov1, "tsne")
  4. ## DimI DimII
  5. ## A2-i129 141.757274 -1.875981
  6. ## A2-i131 -336.372028 177.099380
  7. ## A2-i133 82.395447 -42.878936
  8. ## A2-i132 -11.731661 -41.878646
  9. ## A4-i191 38.681498 -109.060994
  10. ## A4-i192 169.797839 36.512757
  11. ## MS1 116.225222 42.689482
  12. ## MS2 3.659358 -70.354712
  13. ## MS3 139.036548 18.337403
  14. ## MS4 21.703642 -81.842537
  15. ## MS5 -320.584081 165.745570
  16. ## MS6 -44.569057 -92.492786
  17. ## attr(,"class")
  18. ## [1] "immunr_tsne" "matrix"
  1. # Visualise the results
  2. # MDS降维可视化
  3. repOverlapAnalysis(imm_ov1, "mds") %>% vis()

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图4

  1. # Visualise the results
  2. # TSNE降维可视化
  3. repOverlapAnalysis(imm_ov1, "tsne") %>% vis()

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图5

  1. # Clusterise the MDS resulting components using K-means
  2. repOverlapAnalysis(imm_ov1, "mds+kmeans") %>% vis()

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图6

构建公共克隆型库

为了从repertoires列表中构建一个包含所有clonotypes的大型公共克隆型库,我们可以使用pubRep该函数。

  1. # Pass "nt" as the second parameter to build the public repertoire table using CDR3 nucleotide sequences
  2. # 使用CDR3区域的核苷酸序列计算构建公共克隆型库
  3. pr.nt <- pubRep(immdata$data, "nt", .verbose = F)
  4. pr.nt
  5. ## CDR3.nt Samples A2-i129
  6. ## 1: TGCGCCAGCAGCTTGGAAGAGACCCAGTACTTC 8 1
  7. ## 2: TGTGCCAGCAGCTTCCAAGAGACCCAGTACTTC 7 NA
  8. ## 3: TGTGCCAGCAGTTACCAAGAGACCCAGTACTTC 7 1
  9. ## 4: TGCGCCAGCAGCTTCCAAGAGACCCAGTACTTC 6 2
  10. ## 5: TGTGCCAGCAGCCAAGAGACCCAGTACTTC 6 4
  11. ## ---
  12. ## 75101: TGTGCTTCACAACTCTTATTGGACGAGACCCAGTACTTC 1 NA
  13. ## 75102: TGTGCTTCACAAGCCCTACAGGGCACTTTCCATAATTCACCCCTCCACTTT 1 NA
  14. ## 75103: TGTGCTTCAGGGCGGGCCTACGAGCAGTACTTC 1 NA
  15. ## 75104: TGTGCTTCCGCCGGACCGGACCGGGAGACCCAGTACTTC 1 NA
  16. ## 75105: TGTGCTTGCGGGACAGATAACTATGGCTACACCTTC 1 NA
  17. ## A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
  18. ## 1: NA 1 1 NA 1 NA NA 1 1 1 1
  19. ## 2: 1 1 2 1 NA 1 NA NA 2 NA 1
  20. ## 3: 1 1 NA 1 1 1 NA 2 NA NA NA
  21. ## 4: NA 1 1 NA NA NA 1 NA 1 NA 1
  22. ## 5: 2 NA 2 3 1 NA NA NA NA 4 NA
  23. ## ---
  24. ## 75101: 1 NA NA NA NA NA NA NA NA NA NA
  25. ## 75102: NA NA NA NA NA NA NA NA NA 1 NA
  26. ## 75103: NA NA NA NA NA 1 NA NA NA NA NA
  27. ## 75104: NA 1 NA NA NA NA NA NA NA NA NA
  28. ## 75105: NA NA NA NA 1 NA NA NA NA NA NA
  1. # Pass "aa+v" as the second parameter to build the public repertoire table using CDR3 aminoacid sequences and V alleles
  2. # In order to use only CDR3 aminoacid sequences, just pass "aa"
  3. # 使用CDR3区域的氨基酸序列和V等位基因序列计算构建公共克隆型库
  4. pr.aav <- pubRep(immdata$data, "aa+v", .verbose = F)
  5. pr.aav
  6. ## CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132
  7. ## 1: CASSLEETQYF TRBV5-1 8 1 NA 2 1
  8. ## 2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA
  9. ## 3: CASSFQETQYF TRBV5-1 6 3 NA 1 1
  10. ## 4: CASSLGETQYF TRBV12-4 6 2 NA NA 4
  11. ## 5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3
  12. ## ---
  13. ## 74440: CTSSRPTQGAYEQYF TRBV7-2 1 NA NA NA NA
  14. ## 74441: CTSSSRAGAGTDTQYF TRBV7-2 1 NA NA NA NA
  15. ## 74442: CTSSYPGLAGLKRKETQYF TRBV7-2 1 NA NA NA 1
  16. ## 74443: CTSSYRQRPYQETQYF TRBV7-2 1 NA NA NA NA
  17. ## 74444: CTSSYSTSGVGQFF TRBV7-2 1 NA NA NA NA
  18. ## A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
  19. ## 1: NA 2 NA NA 1 1 1 1
  20. ## 2: 3 NA NA NA 2 NA NA 12
  21. ## 3: NA NA NA 1 NA 1 NA 1
  22. ## 4: 3 NA 1 NA NA NA 2 1
  23. ## 5: NA 1 1 NA 1 NA NA 1
  24. ## ---
  25. ## 74440: NA NA NA NA NA NA NA 1
  26. ## 74441: NA NA NA NA 1 NA NA NA
  27. ## 74442: NA NA NA NA NA NA NA NA
  28. ## 74443: NA NA NA NA 1 NA NA NA
  29. ## 74444: NA NA NA NA NA 1 NA NA
  1. # You can also pass the ".coding" parameter to filter out all noncoding sequences first:
  2. # 也可以通过设置“.coding=T”参数,事先过滤掉所有非编码序列:
  3. pr.aav.cod <- pubRep(immdata$data, "aa+v", .coding = T)
  1. # Create a public repertoire with coding-only sequences using both CDR3 amino acid sequences and V genes
  2. # 使用 CDR3 氨基酸序列和 V 基因创建一个仅包含编码序列的公共库
  3. pr <- pubRep(immdata$data, "aa+v", .coding = T, .verbose = F)
  4. head(pr)
  5. # CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3
  6. #1: CASSLEETQYF TRBV5-1 8 1 NA 2 1 NA 2 NA NA 1
  7. #2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA 3 NA NA NA 2
  8. #3: CASSFQETQYF TRBV5-1 6 3 NA 1 1 NA NA NA 1 NA
  9. #4: CASSLGETQYF TRBV12-4 6 2 NA NA 4 3 NA 1 NA NA
  10. #5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3 NA 1 1 NA 1
  11. #6: CASSDSSGSTDTQYF TRBV6-4 5 NA NA NA 4 1 1 NA NA 1
  12. # MS4 MS5 MS6
  13. #1: 1 1 1
  14. #2: NA NA 12
  15. #3: 1 NA 1
  16. #4: NA 2 1
  17. #5: NA NA 1
  18. #6: NA NA 2
  19. # Apply the filter subroutine to leave clonotypes presented only in healthy individuals
  20. # 应用过滤子程序,过滤出只出现在健康个体中克隆型
  21. pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "C"))
  22. # Apply the filter subroutine to leave clonotypes presented only in diseased individuals
  23. # 应用过滤子程序,过滤出只出现在患病个体中克隆型
  24. pr2 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS"))
  25. # Divide one by another
  26. pr3 <- pubRepApply(pr1, pr2)
  27. # Plot it
  28. p <- ggplot() +
  29. geom_jitter(aes(x = "Treatment", y = Result), data = pr3)
  30. p

使用immunarch包进行单细胞免疫组库数据分析(五):Repertoire overlap and public clonotypes - 图7

参考来源:https://immunarch.com/articles/web_only/v4_overlap.html