估算双端测序的有效长度
使用picard
软件的CollectInsertSizeMetrics
功能进行估算
~/app/pichard/picard CollectInsertSizeMetrics \
I=/zfsqd1/ST_LBI/PROJECT/P17Z10200N0166-2/bidui/normal/104.subjunc.bam \
O=104_insert_size_metrics.txt \
H=104_insert_size_histogram.pdf \
M=0.5
然后计算片段的平均长度
然后计算有效长度
计算TPM
countToTpm <- function(counts, effLen)
{
rate <- log(counts) - log(effLen)
denom <- log(sum(exp(rate)))
exp(rate - denom + log(1e6))
}
countToFpkm <- function(counts, effLen)
{
N <- sum(counts)
exp( log(counts) + log(1e9) - log(effLen) - log(N) )
}
fpkmToTpm <- function(fpkm)
{
exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
countToEffCounts <- function(counts, len, effLen)
{
counts * (len / effLen)
}
################################################################################
# An example
################################################################################
cnts <- c(4250, 3300, 200, 1750, 50, 0)
lens <- c(900, 1020, 2000, 770, 3000, 1777)
countDf <- data.frame(count = cnts, length = lens)
# assume a mean(FLD) = 203.7
countDf$effLength <- countDf$length - 203.7 + 1
countDf$tpm <- with(countDf, countToTpm(count, effLength))
countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
说在最后:根据原理,我想大家一定都注意到了,tpm的计算是需要计算插入长度的,而这个矫正在想要比较多个样本的时候,我认为反而不利,究其原因是因为多个不同样本的插入片段长度会有所区别,而我们将其引入tpm的计算得到的结果就是,即使定量的是同一个基因,也会因为测序的不同而导致表达量的不同?
参考文章:FPKM与TPM计算