Converting between TPM, RPKM, and FPKM

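After downloading RNA-seq data and processing it into a read counts matrix, the counts have to be normalized for gene length (TPM, RPKM, FPKM, etc.) before expression levels can be compared. TPM is currently the most widely used. Plenty has already been written online about how each of these three measures is derived, so that is not repeated here; this post focuses on their formulas and on converting between them.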

Background

The difference between RPKM, FPKM, and TPM: www.plob.org

Formulas

  • FPKM, RPKM

RPKM: Reads Per Kilobase of exon model per Million mapped reads (reads per kilobase of transcript per million mapped reads).

FPKM: Fragments Per Kilobase of exon model per Million mapped fragments (fragments per kilobase of transcript per million mapped fragments). It is computed the same way as RPKM, with one difference: RPKM counts reads, whereas FPKM counts fragments. A read count can be computed from either single-end or paired-end data, but a fragment count is only distinct from the read count for paired-end data. With paired-end data, when both reads of a pair map to the same region in opposite orientations, they are counted as 1 fragment; if only one read of the pair maps to the region, that single read is also counted as 1 fragment. The read count is therefore close to, and never more than, 2 × the fragment count.

Reference: http://www.lxweimin.com/p/c25e84383ae3
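The definition above can be written out as a formula. The symbols are notation introduced here for illustration: $N_i$ is the number of reads (fragments, for FPKM) mapped to gene $i$, $L_i$ is the length of that gene in bases, and $N$ is the total number of mapped reads (fragments) in the sample.

```latex
\[
\mathrm{RPKM}_i
  = \frac{N_i}{\dfrac{L_i}{10^{3}} \cdot \dfrac{N}{10^{6}}}
  = \frac{10^{9}\, N_i}{L_i\, N}
\]
```

This is the same quantity that the countToFpkm function further down computes in log space, with $N$ taken as the sum of all counts.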

  • TPM
    Transcripts Per Million (the number of transcripts of a given gene per million transcripts in the sample)

N_i is the number of reads mapped to the i-th exon; L_i is the length of the i-th exon; sum() is the sum, over all n exons, of the length-normalized values N_i / L_i.
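The formula these symbols describe can be reconstructed as follows. Here $N_i$ is the number of reads mapped to feature $i$ and $L_i$ is its length; the index $j$ and the FPKM form on the second line are notation added here, derived from the RPKM/FPKM definition above.

```latex
\[
\mathrm{TPM}_i = \frac{N_i / L_i}{\sum_{j=1}^{n} N_j / L_j} \times 10^{6}
\]
\[
\mathrm{TPM}_i = \frac{\mathrm{FPKM}_i}{\sum_{j=1}^{n} \mathrm{FPKM}_j} \times 10^{6}
\]
```

The second line follows because $\mathrm{FPKM}_i$ equals $N_i / L_i$ multiplied by a factor ($10^{9} / N$) that is the same for every feature in a sample, so it cancels between numerator and denominator. This is why FPKM can be converted to TPM without going back to the raw counts.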

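  • CPM

RPM/CPM: Reads/Counts of exon model per Million mapped reads (reads per million mapped reads). It is mostly used to compare the same gene between samples. Because it does not correct for gene length, it is not suitable for comparing the expression of different genes within one sample.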
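As a minimal sketch of the CPM definition above: each count is divided by the library total and scaled to one million. The helper name countToCpm is hypothetical (it is not one of the original post's functions), and the counts are the same toy values used in the example further down.

```r
# Hypothetical helper, added for illustration: counts per million.
# Only the sequencing depth is corrected; gene length is not used.
countToCpm <- function(counts) {
    counts / sum(counts) * 1e6
}

cnts <- c(4250, 3300, 200, 1750, 50, 0)
cpm <- countToCpm(cnts)

print(round(cpm, 1))
print(sum(cpm))   # CPM values of one sample sum to one million
```

Because no length term appears anywhere, a long gene and a short gene with the same CPM are not necessarily expressed at the same level.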

Conversion code

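# Convert raw counts to TPM. The arithmetic is done in log space.
countToTpm <- function(counts, effLen)
{
    rate <- log(counts) - log(effLen)   # log of the length-normalized counts
    denom <- log(sum(exp(rate)))        # log of the sum of all rates
    exp(rate - denom + log(1e6))
}

# Convert raw counts to FPKM: counts * 1e9 / (effLen * total counts).
countToFpkm <- function(counts, effLen)
{
    N <- sum(counts)
    exp( log(counts) + log(1e9) - log(effLen) - log(N) )
}

# Convert FPKM to TPM by rescaling so that the values sum to one million.
fpkmToTpm <- function(fpkm)
{
    exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}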

################################################################################
# An example
################################################################################
# count convert
cnts <- c(4250, 3300, 200, 1750, 50, 0)
lens <- c(900, 1020, 2000, 770, 3000, 1777)
countDf <- data.frame(count = cnts, length = lens)

## assume a mean(FLD) = 203.7 (FLD: fragment length distribution)
#countDf$effLength <- countDf$length - 203.7 + 1
## here the raw length is used as the effective length
countDf$effLength <- countDf$length
countDf$tpm <- with(countDf, countToTpm(count, effLength))
countDf$fpkm <- with(countDf, countToFpkm(count, effLength))

countDf (INPUT FORMAT)

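Strictly speaking there is one more step, the calculation of the effective length, that is, the sequence length after correcting for experimental bias; the effective length in turn yields effective counts. To keep things easy to follow, the raw values are treated as the effective values here and used in the subsequent calculation. See the English article linked below for details.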
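To give a feel for what the effective-length correction changes, here is a small sketch using the formula from the commented-out line in the example (length - mean(FLD) + 1) with the same assumed mean of 203.7. The helper tpmFrom and the variable names are hypothetical, written in plain arithmetic for illustration rather than taken from the original functions.

```r
# Toy data copied from the example above.
cnts <- c(4250, 3300, 200, 1750, 50, 0)
lens <- c(900, 1020, 2000, 770, 3000, 1777)

# Assumed mean fragment length, taken from the commented-out line above.
meanFld <- 203.7
effLens <- lens - meanFld + 1

# Hypothetical helper: TPM in plain arithmetic.
tpmFrom <- function(counts, len) {
    rate <- counts / len
    rate / sum(rate) * 1e6
}

comparison <- data.frame(
    length    = lens,
    effLength = effLens,
    tpmRaw    = tpmFrom(cnts, lens),
    tpmEff    = tpmFrom(cnts, effLens)
)
print(round(comparison, 1))

# Either way the TPM values sum to one million.
print(colSums(comparison[, c("tpmRaw", "tpmEff")]))
```

Subtracting a fixed amount removes a larger share of a short gene's length than of a long gene's, so the correction shifts TPM towards the shorter genes while the total stays at one million.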

Output

# fpkmToTpm
expMatrix <- read.table("fpkm_expr.txt", header = TRUE, row.names = 1)
tpms <- apply(expMatrix, 2, fpkmToTpm)
tpms[1:3,]
# Finally, use a property of TPM as a sanity check: every column should have the same sum
colSums(tpms)

fpkm_expr.txt: rows are genes, columns are samples, and the values are FPKM.

TPM after conversion
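Since fpkm_expr.txt is not included with the post, the same step can be tried on a small in-memory matrix. The values, gene names, and sample names below are made up purely for illustration, and fpkmToTpmDirect is a hypothetical plain-arithmetic equivalent of the rescaling that fpkmToTpm performs.

```r
# Made-up FPKM matrix standing in for fpkm_expr.txt (illustrative values only).
# matrix() fills column by column: the first four values are sampleA.
expMatrix <- matrix(
    c(10.5, 0.0, 250.0, 3.2,
       8.1, 1.4, 310.0, 0.0),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB"))
)

# Hypothetical helper: rescale one sample so its values sum to one million.
fpkmToTpmDirect <- function(fpkm) {
    fpkm / sum(fpkm) * 1e6
}

# Apply the conversion to each column (sample) separately.
tpms <- apply(expMatrix, 2, fpkmToTpmDirect)
print(round(tpms, 1))

# Every column should sum to one million.
print(colSums(tpms))
```

The conversion is done per sample, which is why apply is called over columns. A sample whose FPKM values are all zero would produce NaN here, so such columns are worth filtering out beforehand.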

A very thorough English reference:

https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/
Reposted from https://zhuanlan.zhihu.com/p/150300801
