估算双端测序的有效长度

使用picard软件的CollectInsertSizeMetrics功能进行估算

  1. ~/app/pichard/picard CollectInsertSizeMetrics \
  2. I=/zfsqd1/ST_LBI/PROJECT/P17Z10200N0166-2/bidui/normal/104.subjunc.bam \
  3. O=104_insert_size_metrics.txt \
  4. H=104_insert_size_histogram.pdf \
  5. M=0.5

然后计算片段的平均长度

如何通过readscount计算TPM? - 图1

然后计算有效长度

如何通过readscount计算TPM? - 图2

计算TPM

  1. countToTpm <- function(counts, effLen)
  2. {
  3. rate <- log(counts) - log(effLen)
  4. denom <- log(sum(exp(rate)))
  5. exp(rate - denom + log(1e6))
  6. }
  7. countToFpkm <- function(counts, effLen)
  8. {
  9. N <- sum(counts)
  10. exp( log(counts) + log(1e9) - log(effLen) - log(N) )
  11. }
  12. fpkmToTpm <- function(fpkm)
  13. {
  14. exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
  15. }
  16. countToEffCounts <- function(counts, len, effLen)
  17. {
  18. counts * (len / effLen)
  19. }
  1. ################################################################################
  2. # An example
  3. ################################################################################
  4. cnts <- c(4250, 3300, 200, 1750, 50, 0)
  5. lens <- c(900, 1020, 2000, 770, 3000, 1777)
  6. countDf <- data.frame(count = cnts, length = lens)
  7. # assume a mean(FLD) = 203.7
  8. countDf$effLength <- countDf$length - 203.7 + 1
  9. countDf$tpm <- with(countDf, countToTpm(count, effLength))
  10. countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
  11. with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
  12. countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))

说在最后:根据原理,我想大家一定都注意到了,tpm的计算是需要计算插入长度的,而这个矫正在想要比较多个样本的时候,我认为反而不利,究其原因是因为多个不同样本的插入片段长度会有所区别,而我们将其引入tpm的计算得到的结果就是,即使定量的是同一个基因,也会因为测序的不同而导致表达量的不同?

参考文章:FPKM与TPM计算