Computing tumour purity with the ESTIMATE algorithm

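0. Tumour purity

Tumour purity is the proportion of cancer cells in the admixture. Source: https://www.nature.com/articles/ncomms9971#MOESM1235

I recently came across tumour purity in a paper, so I am writing it up here as background knowledge.

1. Method overview

The ESTIMATE algorithm uses expression data to estimate a stromal score and an immune score for each tumour sample. These scores represent the presence of stromal cells and immune cells, respectively. Their sum is the ESTIMATE score, which can be used to estimate tumour purity.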
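The 2013 paper computes the two scores with single-sample gene set enrichment analysis (ssGSEA) on a stromal and an immune gene signature. The sketch below shows a simplified ssGSEA-style score for one sample. It was written from the general description of the method, not from the estimate package's source, so treat it as an illustration only. The weight exponent `alpha = 0.25`, the helper name `ssgsea_score()`, the gene names and both gene sets are assumptions. The last line forms the ESTIMATE score as the sum of the two scores.

```r
# Simplified ssGSEA-style enrichment score for ONE sample.
# Illustration only: not the estimate package's exact code.
ssgsea_score <- function(expr, gene_set, alpha = 0.25) {
  ranks <- rank(expr, ties.method = "average")    # rank-normalise the sample
  ord <- order(ranks, decreasing = TRUE)          # walk from highest to lowest
  in_set <- names(expr)[ord] %in% gene_set
  if (!any(in_set) || all(in_set)) stop("gene set must partly overlap the genes")
  w <- ranks[ord]^alpha                           # rank-based weights for set members
  p_hit  <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])
  p_miss <- cumsum(!in_set) / sum(!in_set)
  sum(p_hit - p_miss)                             # integrate the running difference
}

# Toy sample with hypothetical genes: S* play "stromal", I* play "immune".
expr <- c(S1 = 12, S2 = 11, S3 = 10, I1 = 2, I2 = 3, I3 = 4, X1 = 6, X2 = 7)
stromal_score  <- ssgsea_score(expr, c("S1", "S2", "S3"))
immune_score   <- ssgsea_score(expr, c("I1", "I2", "I3"))
estimate_score <- stromal_score + immune_score    # ESTIMATE score = sum of the two
```

In this toy sample the "stromal" genes sit at the top of the ranking, so `stromal_score` is positive. The "immune" genes sit near the bottom, so `immune_score` is negative. The real package works on its StromalSignature and ImmuneSignature gene sets, the names that appear in the log output later in this post.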

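The algorithm was published in Nature Communications (NC) in 2013: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3826632/

A 2015 NC paper (https://www.nature.com/articles/ncomms9971#MOESM1235) computed the tumour purity of TCGA samples with four methods, and ESTIMATE was one of them.

If you only need tumour purity values, the supplementary files of the 2015 paper contain precomputed results. My goal was to learn how to compute tumour purity from ordinary RNA-seq count data.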

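2. The problem

However, the authors' help documentation only shows how to process microarray data. It says nothing about RNA-seq data. After some exploring, I found that RNA-seq data can be processed too. The authors also provide precomputed results for some TCGA projects on the ESTIMATE website.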

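3. Searching

After some searching, I found posts by Boss Zeng (曾老板) that cover everything in one place:

On the algorithm: https://mp.weixin.qq.com/s/LiL_TZiJztUClz86a-aHWQ, which introduces its basic principles and method.

On using the R package: https://mp.weixin.qq.com/s/JTD8ZmH2YYCIqcbs-97JzA, which shows how to obtain the three scores and tumour purity from microarray data.

On RNA-seq data: https://mp.weixin.qq.com/s/UehaaJZgARryH7P25v9wNQ, which shows how to obtain the three scores and tumour purity from RNA-seq data.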

4. Hands-on

library(utils)
rforge <- "http://r-forge.r-project.org"
if(!require("estimate"))install.packages("estimate", repos=rforge, dependencies=TRUE)
library(estimate)
#help(package="estimate")

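4.1 Computing the ESTIMATE score from count data

I used the TCGA ACC count data as example data. If you want my example data, reply "est766" to the 生信星球 WeChat official account. Your own count data works too.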

load("exprSet.Rdata")
exprSet[1:3,1:3]

##         TCGA-OR-A5JP-01A TCGA-OR-A5JG-01A TCGA-OR-A5K1-01A
## MT-RNR2           810396          1190259          1206077
## MT-CO1            579888          1298037          1400198
## MT-ND4            623896           768059          1050890

This is Boss Zeng's function. The key difference from the microarray workflow is that platform is set to "illumina".

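dat=log2(edgeR::cpm(exprSet)+1)   # raw counts -> log2(CPM + 1)
library(estimate)
estimate <- function(dat,pro){
  # names of the intermediate and output files, prefixed with the project name
  input.f=paste0(pro,'_estimate_input.txt')
  output.f=paste0(pro,'_estimate_gene.gct')
  output.ds=paste0(pro,'_estimate_score.gct')
  # write the genes x samples matrix as a tab-separated text file
  write.table(dat,file = input.f,sep = '\t',quote = FALSE)
  # keep only the common genes used by ESTIMATE and save them in GCT format
  filterCommonGenes(input.f=input.f,
                    output.f=output.f ,
                    id="GeneSymbol")
  # compute StromalScore, ImmuneScore and ESTIMATEScore
  estimateScore(input.ds = output.f,
                output.ds=output.ds,
                platform="illumina")   ## note the platform setting
  # read the scores back, skipping the two GCT header lines
  scores=read.table(output.ds,skip = 2,header = TRUE)
  rownames(scores)=scores[,1]
  # drop the NAME and Description columns, transpose to samples x scores
  scores=t(scores[,3:ncol(scores)])
  return(scores)
}
pro='ACC'
scores=estimate(dat,pro)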

## [1] "Merged dataset includes 10221 genes (191 mismatched)."
## [1] "1 gene set: StromalSignature  overlap= 139"
## [1] "2 gene set: ImmuneSignature  overlap= 141"
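The first line of the code converts raw counts to log2(CPM + 1) with `edgeR::cpm()`. The sketch below gives a base-R equivalent for readers without edgeR. It assumes plain column sums as library sizes, with no TMM normalisation factors. To my understanding, that is what `edgeR::cpm()` does when given a bare matrix. The names `cpm_base()` and `log2_cpm1()` are hypothetical, and the toy counts are made up.

```r
# Counts per million using raw column sums as library sizes (assumption:
# no normalisation factors, as for a bare matrix passed to edgeR::cpm()).
cpm_base <- function(counts) {
  lib_size <- colSums(counts)                 # total counts per sample
  sweep(counts, 2, lib_size, "/") * 1e6       # divide each column by its total
}

# The transform fed to the estimate() wrapper: log2(CPM + 1).
log2_cpm1 <- function(counts) log2(cpm_base(counts) + 1)

# Toy genes x samples count matrix (made-up numbers).
counts <- matrix(c(1, 3, 6, 10, 0, 10), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
cpm_base(counts)
log2_cpm1(counts)
```

Every CPM column sums to one million. The +1 keeps zero counts finite after the log: a zero count maps to 0.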

head(scores)

##                  StromalScore ImmuneScore ESTIMATEScore
## TCGA.OR.A5JP.01A    -773.8226  -1143.9749    -1917.7975
## TCGA.OR.A5JG.01A    -878.7773   -685.5286    -1564.3059
## TCGA.OR.A5K1.01A    -663.8511   -360.2218    -1024.0729
## TCGA.OR.A5JR.01A    -931.0601   -344.3306    -1275.3907
## TCGA.OR.A5KU.01A    -925.6045  -1222.4672    -2148.0717
## TCGA.OR.A5L9.01A    -247.1255    404.5509      157.4254
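The sample names in this output have dots instead of hyphens. The wrapper reads the score file with `read.table(..., header = TRUE)`, and its default `check.names = TRUE` passes column names through `make.names()`. The sketch below writes a small file in the layout that the wrapper's parsing implies: two lines to skip, then NAME, Description and one column per sample. The exact contents of ESTIMATE's real file, including the Description values, are assumptions. The numbers are copied from the `head(scores)` output above, and `parse_scores()` is a hypothetical helper.

```r
# Assumed layout of the score file: version line, dimension line, then a
# header row with NAME, Description and one column per sample.
gct_lines <- c(
  "#1.2",
  "3\t2",
  "NAME\tDescription\tTCGA-OR-A5JP-01A\tTCGA-OR-A5JG-01A",
  "StromalScore\tStromalScore\t-773.8226\t-878.7773",
  "ImmuneScore\tImmuneScore\t-1143.9749\t-685.5286",
  "ESTIMATEScore\tESTIMATEScore\t-1917.7975\t-1564.3059"
)
score_file <- tempfile(fileext = ".gct")
writeLines(gct_lines, score_file)

# Hypothetical helper mirroring the parsing step of the estimate() wrapper.
parse_scores <- function(path, keep_names = FALSE) {
  s <- read.table(path, skip = 2, header = TRUE, sep = "\t",
                  check.names = !keep_names)
  rownames(s) <- s[, 1]
  t(s[, 3:ncol(s), drop = FALSE])   # samples x scores
}

mangled <- parse_scores(score_file)                     # default check.names = TRUE
kept    <- parse_scores(score_file, keep_names = TRUE)  # original barcodes
rownames(mangled)  # "TCGA.OR.A5JP.01A" "TCGA.OR.A5JG.01A"
rownames(kept)     # "TCGA-OR-A5JP-01A" "TCGA-OR-A5JG-01A"
```

Either naming works, as long as every score table you later compare is read the same way.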

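4.2 No TumorPurity column in the output

For Affymetrix arrays, the output does include this column.

I checked the Methods section of the 2015 NC paper. They used level 3 RNA-seq profiles (RNAseqV2 normalized RSEM) and computed the scores with the estimate package. They then derived tumour purity with the formula from the 2013 NC paper.

The formula is:

Tumour purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score)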

Don't forget that R makes a good calculator:

TumorPurity = cos(0.6049872018+0.0001467884 * scores[,3])
head(TumorPurity)

## TCGA.OR.A5JP.01A TCGA.OR.A5JG.01A TCGA.OR.A5K1.01A TCGA.OR.A5JR.01A 
##        0.9481360        0.9303738        0.8984081        0.9139941 
## TCGA.OR.A5KU.01A TCGA.OR.A5L9.01A 
##        0.9583367        0.8091481
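A small helper keeps the formula identical across the CPM and RSEM pipelines. The name `estimate_purity()` is hypothetical. One mathematical property is worth knowing: cos peaks at 0. The formula therefore reaches a purity of exactly 1 at a score of about -4121.5, and for even lower scores it would decrease again. All scores in this post are well above that point.

```r
# Purity formula from the 2013 ESTIMATE paper, as quoted above.
estimate_purity <- function(estimate_score) {
  cos(0.6049872018 + 0.0001467884 * estimate_score)
}

# Score at which the cosine argument is zero, i.e. where purity peaks at 1.
peak_score <- -0.6049872018 / 0.0001467884

estimate_purity(c(-1917.7975, 157.4254))  # first and last samples shown above
```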

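4.3 Recomputing with the RSEM data used in the paper

The 2015 paper provides its computed results, so I tried to reproduce its calculation. The RNAseqV2 normalized RSEM data are hard to find. I got them through the Firehose browser and tidied them into a standard expression matrix, with genes as rows and samples as columns.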
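The tidying step itself is not shown in the post. The sketch below assumes the Firehose file follows the usual TCGA Level 3 layout:

- the first row holds sample barcodes;
- the second row holds column labels such as normalized_count;
- gene IDs take the form SYMBOL|EntrezID, with "?" for genes that lack a symbol.

`parse_rsem_lines()` and the toy lines are hypothetical. The numbers come from the `exprSet2` preview, and the "?" row is made up.

```r
# Toy lines mimicking the assumed Firehose RSEM "normalized" layout.
toy_lines <- c(
  "Hybridization REF\tTCGA-OR-A5J1-01A\tTCGA-OR-A5J2-01A\tTCGA-OR-A5J3-01A",
  "gene_id\tnormalized_count\tnormalized_count\tnormalized_count",
  "?|100130426\t0.0000\t0.0000\t0.0000",
  "A1BG|1\t16.3305\t9.5987\t20.7377",
  "A1CF|29974\t0.0000\t0.0000\t0.5925",
  "A2BP1|54715\t17.2911\t5.6368\t8.8876"
)

# Hypothetical helper: turn the raw lines into a genes x samples matrix.
parse_rsem_lines <- function(lines) {
  samples <- strsplit(lines[1], "\t", fixed = TRUE)[[1]][-1]  # barcodes from row 1
  body <- read.table(text = lines[-(1:2)], sep = "\t", quote = "",
                     comment.char = "", stringsAsFactors = FALSE)
  symbol <- sub("\\|.*$", "", body[[1]])          # "A1BG|1" -> "A1BG"
  keep <- symbol != "?" & !duplicated(symbol)     # drop unnamed and repeated genes
  m <- as.matrix(body[keep, -1, drop = FALSE])
  dimnames(m) <- list(symbol[keep], samples)
  m
}

exprSet_toy <- parse_rsem_lines(toy_lines)
exprSet_toy
```

For a real file, something like `parse_rsem_lines(readLines(path))` would apply. If the barcodes in your file are longer (aliquot level), `substr(x, 1, 16)` shortens them to the 16-character sample-level form used in this post.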

load("exprSet2.Rdata")
exprSet2[1:3,1:3]

##       TCGA-OR-A5J1-01A TCGA-OR-A5J2-01A TCGA-OR-A5J3-01A
## A1BG           16.3305           9.5987          20.7377
## A1CF            0.0000           0.0000           0.5925
## A2BP1          17.2911           5.6368           8.8876

dat2=log2(exprSet2+1)
scores2=estimate(dat2,pro)

## [1] "Merged dataset includes 10412 genes (0 mismatched)."
## [1] "1 gene set: StromalSignature  overlap= 141"
## [1] "2 gene set: ImmuneSignature  overlap= 141"

head(scores2)

##                  StromalScore ImmuneScore ESTIMATEScore
## TCGA.OR.A5J1.01A   -1161.6834  -524.37956    -1686.0629
## TCGA.OR.A5J2.01A    -569.1191  -765.96407    -1335.0831
## TCGA.OR.A5J3.01A   -1295.4628 -1070.18777    -2365.6506
## TCGA.OR.A5J5.01A   -1710.5108  -918.61256    -2629.1234
## TCGA.OR.A5J6.01A    -730.8294    64.28074     -666.5487
## TCGA.OR.A5J7.01A   -1191.5230 -1013.01517    -2204.5382

TumorPurity2 = cos(0.6049872018+0.0001467884 * scores2[,3])
head(TumorPurity2)

## TCGA.OR.A5J1.01A TCGA.OR.A5J2.01A TCGA.OR.A5J3.01A TCGA.OR.A5J5.01A 
##        0.9367771        0.9175140        0.9669692        0.9761016 
## TCGA.OR.A5J6.01A TCGA.OR.A5J7.01A 
##        0.8741344        0.9606713

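I compared these results with those in the 2015 NC paper, and they match exactly. Very satisfying.

4.4 Comparing the CPM and RSEM results

I put the results from the two pipelines side by side in one table: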

TumorPurity2 = TumorPurity2[match(names(TumorPurity),names(TumorPurity2))]
compare = cbind(TumorPurity,TumorPurity2)
round(compare,4)[1:10,]

##                  TumorPurity TumorPurity2
## TCGA.OR.A5JP.01A      0.9481       0.9493
## TCGA.OR.A5JG.01A      0.9304       0.9344
## TCGA.OR.A5K1.01A      0.8984       0.9003
## TCGA.OR.A5JR.01A      0.9140       0.9133
## TCGA.OR.A5KU.01A      0.9583       0.9603
## TCGA.OR.A5L9.01A      0.8091       0.8106
## TCGA.OR.A5JQ.01A      0.8303       0.8306
## TCGA.OR.A5K4.01A      0.9829       0.9852
## TCGA.OR.A5JL.01A      0.9416       0.9434
## TCGA.OR.A5LS.01A      0.9748       0.9774

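The two pipelines agree closely: for these ten samples, the purity estimates differ by at most 0.004. A very good result.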
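The `match()` call keeps the two vectors aligned by sample name. It returns NA for any sample missing from one side, so checking for NAs before comparing is worthwhile. The sketch below re-enters the ten rounded values from the table and lists the RSEM vector in reverse order on purpose. It then realigns the vectors and summarises their agreement. The variable names are illustrative.

```r
# Rounded purities copied from the comparison table (CPM pipeline).
cpm_purity <- c(
  TCGA.OR.A5JP.01A = 0.9481, TCGA.OR.A5JG.01A = 0.9304, TCGA.OR.A5K1.01A = 0.8984,
  TCGA.OR.A5JR.01A = 0.9140, TCGA.OR.A5KU.01A = 0.9583, TCGA.OR.A5L9.01A = 0.8091,
  TCGA.OR.A5JQ.01A = 0.8303, TCGA.OR.A5K4.01A = 0.9829, TCGA.OR.A5JL.01A = 0.9416,
  TCGA.OR.A5LS.01A = 0.9748
)
# Same samples from the RSEM pipeline, deliberately in reverse order.
rsem_purity <- rev(c(
  TCGA.OR.A5JP.01A = 0.9493, TCGA.OR.A5JG.01A = 0.9344, TCGA.OR.A5K1.01A = 0.9003,
  TCGA.OR.A5JR.01A = 0.9133, TCGA.OR.A5KU.01A = 0.9603, TCGA.OR.A5L9.01A = 0.8106,
  TCGA.OR.A5JQ.01A = 0.8306, TCGA.OR.A5K4.01A = 0.9852, TCGA.OR.A5JL.01A = 0.9434,
  TCGA.OR.A5LS.01A = 0.9774
))

# Reorder the RSEM values to follow the CPM sample order.
rsem_aligned <- rsem_purity[match(names(cpm_purity), names(rsem_purity))]
stopifnot(!anyNA(rsem_aligned))              # every CPM sample was found

diffs <- rsem_aligned - cpm_purity
max_abs_diff <- max(abs(diffs))              # largest disagreement
pearson_r <- cor(cpm_purity, rsem_aligned)   # linear agreement across samples
c(max_abs_diff = max_abs_diff, pearson_r = pearson_r)
```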

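5. A point of contention

With platform = "illumina", the output has no TumorPurity column; the package is designed that way.

On Biostars I came across a discussion about this. One participant argued that the formula converting the ESTIMATE score to tumour purity was derived from Affymetrix array data. In their view, it applies only to array data and should not be used for RNA-seq. They suggested computing only the ESTIMATE score and using it in place of an absolute purity value in downstream analyses.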

The original thread: https://www.biostars.org/p/279853/
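The practical cost of that suggestion depends on the analysis. The cosine argument runs from 0 at a score of about -4121.5 to π at about 17,280. Over that range, purity is a strictly decreasing function of the ESTIMATE score. Rank-based analyses, such as Spearman correlation or rank tests, therefore reach the same conclusions with either quantity, with the sign flipped. Only methods that use absolute values, such as linear models or purity thresholds, can differ. The sketch below checks the rank claim on the six scores printed in section 4.1.

```r
# ESTIMATE scores from the head(scores) output in section 4.1.
scores_demo <- c(-1917.7975, -1564.3059, -1024.0729, -1275.3907, -2148.0717, 157.4254)
purity_demo <- cos(0.6049872018 + 0.0001467884 * scores_demo)

# Spearman correlation depends only on ranks.
rho <- cor(scores_demo, purity_demo, method = "spearman")

# Higher score -> lower purity, so the two orderings mirror each other.
same_order <- identical(order(scores_demo), order(purity_demo, decreasing = TRUE))
c(rho = rho, same_order = same_order)
```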

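Still, the 2015 NC paper already applied this formula to RNA-seq data. As far as I know, no one has objected in the five years since, so I consider the approach usable and simply use it.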
