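I learned about this package thanks to an introduction from Teacher Zeng. The best way to get to know a package is to read its manual first, since all of its usage is documented there.
maftools package manual
1. Install and load the package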
source("http://bioconductor.org/biocLite.R")
biocLite("maftools")
library(maftools)
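During installation you may be told that some dependencies are missing; install them as prompted. Note that biocLite has since been replaced by BiocManager, so on current R versions the equivalent call is BiocManager::install("maftools").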
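Before installing anything, it can help to see which packages are actually missing. The following is a minimal base-R sketch; the vector `needed` is an illustrative assumption (maftools and NMF are the two packages loaded in this post), not an official dependency list.

```r
# Packages this walkthrough loads; adjust the list to your own session.
needed <- c("maftools", "NMF")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Anything reported as missing can then be installed by name before calling library().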
2. Read the MAF file
The MAF file used here is from TCGA-KIRC; there are many ways to download it.
options(stringsAsFactors = FALSE)
laml = read.maf(maf = 'GDC/TCGA.KIRC.mutect.somatic.maf.gz')
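A MAF file is a tab-delimited table with one row per variant, so it can be inspected with base R before handing it to read.maf. The sketch below uses two made-up rows, and the `required` column list reflects the fields the maftools documentation describes as mandatory; treat that list as an assumption and check it against your installed version.

```r
# Columns assumed to be mandatory for read.maf (verify against your maftools version).
required <- c("Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
              "Reference_Allele", "Tumor_Seq_Allele2",
              "Variant_Classification", "Variant_Type", "Tumor_Sample_Barcode")

# Two invented variants, for illustration only.
rows <- list(
  c("VHL",   "chr3", "100", "100", "C",  "T", "Missense_Mutation", "SNP", "sample_A"),
  c("PBRM1", "chr3", "200", "201", "AG", "-", "Frame_Shift_Del",   "DEL", "sample_B")
)

# Assemble the text of a tiny MAF file: a comment line, a header, then the rows.
maf_lines <- c("#version 2.4",
               paste(required, collapse = "\t"),
               vapply(rows, paste, character(1), collapse = "\t"))

# Lines starting with '#' are skipped as comments.
toy_maf <- read.delim(text = maf_lines, comment.char = "#",
                      stringsAsFactors = FALSE)

# Report any mandatory column that is absent.
missing_cols <- setdiff(required, colnames(toy_maf))
print(missing_cols)
```

The same setdiff() check can be pointed at a real file read with read.delim to catch a malformed download early.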
3. Summarize the MAF file
plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
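Figure: summary of the KIRC cohort
Missense_Mutation: missense mutation (changes one amino acid)
Frame_Shift_Del: frameshift deletion
Nonsense_Mutation: nonsense mutation (introduces a premature stop codon)
Frame_Shift_Ins: frameshift insertion
Splice_Site: splice-site mutation
In_Frame_Ins: in-frame insertion
In_Frame_Del: in-frame deletion
Translation_Start_Site: mutation at the translation start site
Nonstop_Mutation: nonstop mutation (the stop codon is lost)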
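The variant classes listed above are the values of the Variant_Classification column, and the summary plot essentially counts them. A minimal base-R sketch of that counting step, using an invented vector rather than the KIRC data:

```r
# Invented Variant_Classification values, for illustration only.
variant_class <- c("Missense_Mutation", "Missense_Mutation", "Nonsense_Mutation",
                   "Frame_Shift_Del", "Missense_Mutation", "Splice_Site")

# Count each class and put the most frequent one first.
counts <- sort(table(variant_class), decreasing = TRUE)
print(counts)

# Fraction of all variants that fall in each class.
fractions <- counts / sum(counts)
print(round(fractions, 3))
```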
4. Draw the oncoplot (waterfall plot)
oncoplot(maf = laml, top = 30, fontSize = 12, showTumorSampleBarcodes = FALSE)
Figure: oncoplot of the top 30 genes in TCGA-KIRC (oncoplot_top30_TCGA_KIRC.png)
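The `top` argument limits the plot to the most frequently mutated genes. One plausible way to rank genes, sketched here in base R on invented data, is by the fraction of samples carrying at least one mutation in the gene; whether oncoplot ranks in exactly this way is an assumption.

```r
# Invented gene/sample pairs, one row per mutation.
muts <- data.frame(
  gene   = c("VHL", "VHL", "PBRM1", "VHL", "SETD2", "PBRM1", "VHL"),
  sample = c("s1",  "s2",  "s1",    "s3",  "s2",    "s3",    "s3"),
  stringsAsFactors = FALSE
)
n_samples <- 4  # cohort size, including one sample (s4) without listed mutations

# Count each gene once per sample, even if the sample has several hits.
mutated_samples <- vapply(split(muts$sample, muts$gene),
                          function(s) length(unique(s)), integer(1))

# Fraction of the cohort mutated in each gene, highest first.
frac <- sort(mutated_samples / n_samples, decreasing = TRUE)
top_genes <- head(names(frac), 2)
print(frac)
print(top_genes)
```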
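5. Draw the Ti/Tv boxplot
The boxplot shows the overall distribution of the six single-base conversion classes, and the stacked barplot shows the fraction of each class within every sample.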
laml.titv = titv(maf = laml, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
Figure: Ti/Tv boxplot
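The six conversion classes arise because a substitution on one strand is equivalent to its complement on the other, so purine references are folded onto pyrimidines. A minimal base-R sketch of that folding and of the transition (Ti) versus transversion (Tv) split; `classify_snv` is a hypothetical helper, not a maftools function.

```r
classify_snv <- function(ref, alt) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  # Fold purine references onto the pyrimidine strand (G>A becomes C>T, etc.).
  purine <- ref %in% c("A", "G")
  ref2 <- ifelse(purine, unname(comp[ref]), ref)
  alt2 <- ifelse(purine, unname(comp[alt]), alt)
  conversion <- paste0(ref2, ">", alt2)
  # C>T and T>C are transitions; the other four classes are transversions.
  class <- ifelse(conversion %in% c("C>T", "T>C"), "Ti", "Tv")
  data.frame(conversion = conversion, class = class, stringsAsFactors = FALSE)
}

# Invented reference/alternate alleles, for illustration only.
res <- classify_snv(ref = c("G", "C", "A", "T", "G"),
                    alt = c("A", "A", "G", "G", "T"))
print(res)

# Per-class fractions, the quantity summarized across samples by the boxplot.
print(prop.table(table(res$conversion)))
```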
6. Analyze interactions between mutated genes (co-occurrence and mutual exclusivity)
somaticInteractions(maf = laml, top = 25, pvalue = c(0.05, 0.1))
Figure: somatic interactions plot (Rplot02.png)
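The idea behind this kind of plot can be sketched with a pair-wise Fisher's exact test on a binary sample-by-gene matrix: an odds ratio above 1 suggests co-occurrence, below 1 suggests mutual exclusivity. The matrix and the helper `pair_test` below are invented for illustration and are not the maftools implementation.

```r
# Toy binary matrix: rows are samples, columns are genes (1 = mutated).
mut <- cbind(
  geneA = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  geneB = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  geneC = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)
)

pair_test <- function(x, y) {
  # Fix the factor levels so the table is always 2 x 2.
  tab <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
  ft <- fisher.test(tab)
  est <- unname(ft$estimate)
  data.frame(odds_ratio = est,
             p_value = ft$p.value,
             event = if (est > 1) "co-occurrence" else "mutual exclusivity",
             stringsAsFactors = FALSE)
}

ab <- pair_test(mut[, "geneA"], mut[, "geneB"])  # genes mutated in the same samples
ac <- pair_test(mut[, "geneA"], mut[, "geneC"])  # genes mutated in different samples
print(rbind(A_vs_B = ab, A_vs_C = ac))
```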
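7. Mutational signatures
The first step is to extract the bases immediately flanking each mutated base and build the mutation matrix from them. The data here are aligned to hg38, whereas the example in the official documentation uses hg19.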
laml.tnm = trinucleotideMatrix(maf = laml, ref_genome = 'G:/ref/hg38/hg38.fa', add = TRUE, useSyn = TRUE)
reading G:/ref/hg38/hg38.fa (this might take few minutes)..
#Extracting 5' and 3' adjacent bases..
#Extracting +/- 20bp around mutated bases for background C>T estimation..
#Estimating APOBEC enrichment scores..
#Performing one-way Fisher's test for APOBEC enrichment..
#APOBEC related mutations are enriched in 4.167% of samples (APOBEC enrichment score > 2 ; 14 of 336 samples)
#Creating mutation matrix..
#matrix of dimension 336x96
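The 96 columns of the matrix reported above correspond to 6 substitution classes times 4 possible 5' bases times 4 possible 3' bases. A base-R sketch of how such context labels can be built, including the reverse-complement step for purine references; the label format and the helper `context_label` are assumptions made for illustration.

```r
bases <- c("A", "C", "G", "T")
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# All 4 x 6 x 4 = 96 combinations of 5' base, substitution and 3' base.
grid <- expand.grid(five = bases, sub = subs, three = bases,
                    stringsAsFactors = FALSE)
labels96 <- paste0(grid$five, "[", grid$sub, "]", grid$three)

comp <- c(A = "T", C = "G", G = "C", T = "A")

context_label <- function(five, ref, alt, three) {
  if (ref %in% c("A", "G")) {
    # Reverse-complement: the flanks swap sides and every base is complemented.
    new_five  <- comp[[three]]
    new_three <- comp[[five]]
    five  <- new_five
    three <- new_three
    ref   <- comp[[ref]]
    alt   <- comp[[alt]]
  }
  paste0(five, "[", ref, ">", alt, "]", three)
}

print(length(labels96))
print(context_label("T", "C", "T", "A"))  # pyrimidine reference, used as-is
print(context_label("T", "G", "A", "A"))  # purine reference, folded to the other strand
```

Both example calls describe the same event seen from opposite strands, so they map to the same label.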
Visualize the differences between APOBEC-enriched and non-enriched samples
plotApobecDiff(tnm = laml.tnm, maf = laml)
$results
      Hugo_Symbol Enriched nonEnriched        pval       or    ci.up      ci.low adjPval
   1:     CD163L1        2           0 0.001616915      Inf 4.503567         Inf       1
   2:      CCDC54        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   3:       CPXM1        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   4:      DHRS7C        2           1 0.004734561 50.87663 2.491826 3080.923605       1
   5:        OPA1        2           1 0.004734561 50.87663 2.491826 3080.923605       1
  ---
3934:      ZSWIM8        0           5 1.000000000  0.00000 0.000000   26.888784       1
3935:       ZUFSP        0           3 1.000000000  0.00000 0.000000   58.517895       1
3936:      ZYG11B        0           3 1.000000000  0.00000 0.000000   58.517895       1
3937:       AKAP9        0          11 1.000000000  0.00000 0.000000    9.914261       1
3938:       XIRP2        0          11 1.000000000  0.00000 0.000000    9.914261       1
$SampleSummary
        Cohort SampleSize   Mean Median
1:    Enriched         14 37.786   34.0
2: nonEnriched        322 55.220   47.5
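The pval column can be understood as a per-gene Fisher's exact test comparing mutated and non-mutated sample counts between the two groups. The sketch below rebuilds the 2 x 2 table for CD163L1 from the numbers printed above (2 of 14 enriched samples, 0 of 322 non-enriched samples); that plotApobecDiff computes the value in exactly this way is an assumption.

```r
# Group sizes from $SampleSummary and mutated-sample counts from the CD163L1 row.
n_enriched   <- 14
n_non        <- 322
mut_enriched <- 2
mut_non      <- 0

# Rows: mutated / not mutated in the gene. Columns: APOBEC group.
tab <- matrix(c(mut_enriched, n_enriched - mut_enriched,
                mut_non,      n_non - mut_non),
              nrow = 2,
              dimnames = list(c("mutated", "not_mutated"),
                              c("Enriched", "nonEnriched")))
print(tab)

ft <- fisher.test(tab)
print(ft$p.value)  # compare with the pval reported for CD163L1
```

With only 14 enriched samples and adjusted p-values of 1 throughout, none of these genes should be read as significantly associated with APOBEC enrichment.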
Signature analysis
library(NMF)
laml.sign = extractSignatures(mat = laml.tnm, nTry = 6, plotBestFitRes = FALSE)
Estimating best rank..
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#2 [r=3] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#3 [r=4] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#4 [r=5] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
-#5 [r=6] -> cannot open file 'E:/R-3.5.1/library/stringi/R/stringi.rdb': No such file or directory [in call to 'str_c']
In addition: Warning messages:
1: In str_c(names(x), "=") : restarting interrupted promise evaluation
2: In str_c(names(x), "=") : restarting interrupted promise evaluation
3: In str_c(names(x), "=") : restarting interrupted promise evaluation
4: In str_c(names(x), "=") : restarting interrupted promise evaluation
5: In str_c(names(x), "=") : restarting interrupted promise evaluation
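An error occurred.
The message "/stringi/R/stringi.rdb': No such file or directory" means that this file is missing from the stringi installation.
Try reinstalling the package: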
install.packages("stringi")
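After reinstalling, one way to confirm the fix is to check that the package's lazy-load database (the .rdb file named in the error) is present again. A small base-R sketch; `has_lazyload_db` is a hypothetical helper written for this post.

```r
has_lazyload_db <- function(pkg) {
  # system.file() returns "" when the package or the file cannot be found.
  nzchar(system.file("R", paste0(pkg, ".rdb"), package = pkg))
}

print(has_lazyload_db("stats"))    # a package that ships with R
print(has_lazyload_db("stringi"))  # the package from the error message
```

If the second call still reports FALSE, restarting R before reinstalling may help, since a loaded package cannot always be overwritten cleanly.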
Run it again:
laml.sign = extractSignatures(mat = laml.tnm, nTry = 6, plotBestFitRes = FALSE)
Estimating best rank..
method seed rng metric rank sparseness.basis sparseness.coef rss evar silhouette.coef silhouette.basis
1: brunet random 4 KL 2 0.3294052 0.1553251 23710.81 0.5991749 1.0000000 1.0000000
2: brunet random 2 KL 3 0.3255095 0.2536579 23227.82 0.6073398 0.6526353 0.7024022
3: brunet random 1 KL 4 0.3956116 0.3326660 21245.44 0.6408514 0.4571158 0.5157059
4: brunet random 2 KL 5 0.4523101 0.3241846 20710.76 0.6498899 0.3969265 0.5686192
5: brunet random 3 KL 6 0.4780410 0.3445228 20325.10 0.6564095 0.3227373 0.4817975
residuals niter cpu cpu.all nrun cophenetic dispersion silhouette.consensus
1: 15156.92 1760 NA NA 10 0.9797910 0.8606186 0.9152344
2: 14753.96 2000 NA NA 10 0.7632717 0.3654457 0.3744583
3: 14369.35 2000 NA NA 10 0.7531582 0.4183581 0.3061800
4: 14032.47 2000 NA NA 10 0.7078448 0.4989598 0.2487093
5: 13708.91 2000 NA NA 10 0.6732816 0.5472385 0.2090123
Using 3 as a best-fit rank based on decreasing cophenetic correlation coefficient.
Comparing against experimentally validated 30 signatures.. (See http://cancer.sanger.ac.uk/cosmic/signatures for details.)
Found Signature_1 most similar to validated Signature_5. Aetiology: Unknown [cosine-similarity: 0.757]
Found Signature_2 most similar to validated Signature_3. Aetiology: defects in DNA-DSB repair by HR [cosine-similarity: 0.848]
Found Signature_3 most similar to validated Signature_1. Aetiology: spontaneous deamination of 5-methylcytosine [cosine-similarity: 0.876]
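The message "Using 3 as a best-fit rank based on decreasing cophenetic correlation coefficient" can be reproduced from the cophenetic column printed above with a simple rule: take the first rank at which the coefficient is lower than at the previous rank. This rule is an assumption that happens to match the output; it is not confirmed here as the exact maftools logic.

```r
# Ranks tried and their cophenetic correlation coefficients, copied from the output.
ranks      <- 2:6
cophenetic <- c(0.9797910, 0.7632717, 0.7531582, 0.7078448, 0.6732816)

# Change relative to the previous rank; the first rank has no predecessor.
change <- c(0, diff(cophenetic))

# First rank where the coefficient has started to decrease.
best_rank <- ranks[which(change < 0)[1]]
print(best_rank)
```

Because the coefficient is already highest at rank 2 and only falls afterwards, it is worth also inspecting the cophenetic curve by eye before settling on a rank.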
Figure: Rplot03.png
Figure: Rplot04.png
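The "cosine-similarity" values in the output compare each extracted signature with a validated one, both treated as vectors of per-context weights. A minimal base-R sketch of the measure on short invented vectors (real signatures have 96 entries):

```r
cosine_similarity <- function(a, b) {
  # Dot product divided by the product of the vector lengths (Euclidean norms).
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Invented weight vectors, for illustration only.
extracted <- c(0.30, 0.10, 0.40, 0.05, 0.10, 0.05)
validated <- c(0.25, 0.15, 0.35, 0.10, 0.10, 0.05)
unrelated <- c(0.00, 0.00, 0.00, 0.00, 0.00, 1.00)

print(cosine_similarity(extracted, validated))  # close to 1: similar profiles
print(cosine_similarity(extracted, unrelated))  # much lower: different profiles
```

A value of 1 means identical profiles up to scaling, so the similarities of 0.757 to 0.876 reported above indicate resemblance rather than an exact match.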