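R package maftools: a powerful visualization tool

Background

With advances in cancer genomics, the Mutation Annotation Format (MAF) has become widely accepted for storing detected somatic variants. The Cancer Genome Atlas (TCGA) project has sequenced more than 30 different cancers, with more than 200 samples for each cancer type. The resulting somatic variant data are stored in MAF format. As long as the data are in MAF format, this package attempts to efficiently summarize, analyze, annotate, and visualize MAF files, whether they come from TCGA or from any in-house study.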

Preparation

Before using the package, convert your files to MAF format. VCF files can be converted with vcf2maf.

Contents of a MAF file:

  • Mandatory fields: Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Classification, Variant_Type and Tumor_Sample_Barcode.

  • Recommended optional fields: non-MAF-specific fields containing VAF (Variant Allele Frequency) and amino acid change information.
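
As a quick sanity check before handing a table to maftools, the mandatory columns listed above can be verified in base R. This is only a minimal sketch: the helper name `check_maf_fields` and the toy data frame are hypothetical and not part of maftools.

```r
# Mandatory MAF columns, as listed above
mandatory_fields <- c(
  "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
  "Reference_Allele", "Tumor_Seq_Allele2", "Variant_Classification",
  "Variant_Type", "Tumor_Sample_Barcode"
)

# Hypothetical helper: return the mandatory columns missing from a data frame
check_maf_fields <- function(maf_df) {
  setdiff(mandatory_fields, colnames(maf_df))
}

# Toy table (made-up values) that lacks two of the mandatory columns
toy <- data.frame(
  Hugo_Symbol = "DNMT3A", Chromosome = "2",
  Start_Position = 25457242, End_Position = 25457242,
  Reference_Allele = "C", Tumor_Seq_Allele2 = "T",
  Variant_Classification = "Missense_Mutation",
  stringsAsFactors = FALSE
)

missing_fields <- check_maf_fields(toy)
print(missing_fields)
```

An empty result means every mandatory column is present; for a real file the data frame could be loaded with `read.delim()` first.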

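Format conversion

# Annotate the mutation calls to obtain txt files
# (the loop variable $i is the input; the sample prefix is used as the output name so each sample gets its own <prefix>.hg19_multianno.txt)
for i in *.somatic.anno;do perl ~/software/Desktop/annovar/table_annovar.pl "$i" /home/yang.zou/database/humandb_new/ -buildver hg19 -out "${i%%.*}" --otherinfo -remove -protocol ensGene -operation g -nastring NA;done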
 
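# Then append the file-name prefix as an extra tab-separated column to every .hg19_multianno.txt file, concatenate all the txt files into one, and extract the exonic records
for i in *.hg19_multianno.txt;do sed '1d' "$i" | sed "s/$/\t${i%%.*}/" >> all_annovar;done 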
grep -P "\texonic\t" all_annovar > all_annovar2
 
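# Format conversion
perl to-maftools.pl all_annovar2         # add the header line so the file can be converted to MAF
  # to-maftools.pl
    use strict;
    use warnings;
     
    # Input file comes from the command line (default all_annovar2); output is all_annovar3
    my $in = shift // "all_annovar2";
    open(my $fa, "<", $in) or die "Cannot open $in: $!";
    open(my $fb, ">", "all_annovar3") or die "Cannot open all_annovar3: $!";
     
    # Header with the column names that annovarToMaf reads
    print $fb "Chr\tStart\tEnd\tRef\tAlt\tFunc.ensGene\tGene.ensGene\tGeneDetail.ensGene\tExonicFunc.ensGene\tAAChange.ensGene\tTumor_Sample_Barcode\n";
    while (<$fa>){
            chomp;
            my @l=split /\t/,$_;
            # Keep the ten annotation columns plus the sample name in the eleventh
            print $fb join("\t", @l[0..10]), "\n";
    }
    close $fa;
    close $fb;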

Overall analysis framework

[Figure: overall framework]

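Installing maftools

# biocLite was the Bioconductor installer for R versions before 3.5; on R 3.5 or later,
# use install.packages("BiocManager") followed by BiocManager::install("maftools") instead
source("http://bioconductor.org/biocLite.R")
biocLite("maftools")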
 
library(maftools)

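Note: installation was very troublesome and took me several days. R 3.3 or later is required, but avoid the very latest release as well, because some dependencies may not have caught up with it yet. I used: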

version.string R version 3.4.1 (2017-06-30)

Main workflow

Reading the ANNOVAR file and converting it to MAF: annovarToMaf

#read maf
var.annovar.maf = annovarToMaf(annovar = "all_annovar3", Center = 'NA', refBuild = 'hg19', tsbCol = 'Tumor_Sample_Barcode', table = 'ensGene',sep = "\t")
write.table(x=var.annovar.maf,file="var_annovar_maf",quote= F,sep="\t",row.names=F)

annovarToMaf function reference

Description

Converts variant annotations from Annovar into a basic MAF.

Usage

annovarToMaf(annovar, Center = NULL, refBuild = "hg19", tsbCol = NULL,
  table = "refGene", basename = NULL, sep = "\t", MAFobj = FALSE,
  sampleAnno = NULL)

Arguments

| Argument | Description |
| --- | --- |
| annovar | Input annovar annotation file. |
| Center | Center field in MAF file will be filled with this value. Default NA. |
| refBuild | NCBI_Build field in MAF file will be filled with this value. Default hg19. |
| tsbCol | Column name containing Tumor_Sample_Barcode or sample names in input file. |
| table | Reference table used for gene-based annotations. Can be 'ensGene' or 'refGene'. Default 'refGene'. |
| basename | If provided, writes resulting MAF file to an output file. |
| sep | Field separator for input file. Default tab separated. |
| MAFobj | If TRUE, returns results as an MAF object. |
| sampleAnno | Annotations associated with each sample/Tumor_Sample_Barcode in input annovar file. If provided it will be included in MAF object. Could be a text file or a data.frame. Ideally annotation would contain clinical data, survival information and other necessary features associated with samples. Default NULL. |

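Next, use Linux command-line tools to deal with uninformative records (rows whose gene symbol is NA or that contain an UNKNOWN field). These records can also be excluded later through function arguments.

# Option 1: relabel a missing gene symbol (NA in the first column) as "unknown"
sed 's/^NA\t/unknown\t/' var_annovar_maf > var_annovar_maf2
# Option 2: drop those rows, and rows with an UNKNOWN field, instead (this overwrites the output of option 1)
# Matching "^NA\t" rather than "^NA" keeps genes whose symbols merely start with NA, such as NAT1
grep -v -P "^NA\t" var_annovar_maf | grep -v -P "\tUNKNOWN\t"> var_annovar_maf2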
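
The same filtering can also be done in R before calling read.maf, which accepts an already loaded data frame. The sketch below uses a made-up toy table in place of the real var_annovar_maf file, so the values are purely illustrative.

```r
# Toy stand-in for the table written by annovarToMaf; real data could be loaded
# with read.delim("var_annovar_maf", stringsAsFactors = FALSE)
maf_df <- data.frame(
  Hugo_Symbol = c("TP53", NA, "NAT1", "KRAS", "IDH1"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Missense_Mutation",
                             "UNKNOWN", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Flag rows in which any field is exactly "UNKNOWN"
has_unknown <- apply(maf_df == "UNKNOWN", 1, any, na.rm = TRUE)

# Keep rows that have a gene symbol and no UNKNOWN field
keep <- !is.na(maf_df$Hugo_Symbol) & !has_unknown
maf_clean <- maf_df[keep, ]
print(maf_clean$Hugo_Symbol)
```

Because the comparison is on whole field values, a gene such as NAT1 is kept even though its symbol starts with "NA".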

Reading the MAF file: read.maf

var_maf = read.maf(maf ="var_annovar_maf2")
plotmafSummary(maf = var_maf, rmOutlier = TRUE, addStat = 'median')
oncoplot(maf = var_maf, top = 400, writeMatrix=T,removeNonMutated = F,showTumorSampleBarcodes=T)

Description

Takes tab delimited MAF (can be plain text or gz compressed) file as an input and summarizes it in various ways. Also creates oncomatrix - helpful for visualization.

Usage

read.maf(maf, clinicalData = NULL, removeDuplicatedVariants = TRUE,
  useAll = TRUE, gisticAllLesionsFile = NULL, gisticAmpGenesFile = NULL,
  gisticDelGenesFile = NULL, gisticScoresFile = NULL, cnLevel = "all",
  cnTable = NULL, isTCGA = FALSE, vc_nonSyn = NULL, verbose = TRUE)

Arguments

  • maf
    tab delimited MAF file. File can also be gz compressed. Required. Alternatively, you can also provide already read MAF file as a dataframe.

  • clinicalData
    Clinical data associated with each sample/Tumor_Sample_Barcode in MAF. Could be a text file or a data.frame. Default NULL.

  • removeDuplicatedVariants
    removes repeated variants in a particular sample, mapped to multiple transcripts of same Gene. See Description. Default TRUE.

  • useAll
    logical. Whether to use all variants irrespective of values in Mutation_Status. Defaults to TRUE. If FALSE, only uses variants with the value Somatic.

  • gisticAllLesionsFile
    All Lesions file generated by gistic, e.g. all_lesions.conf_XX.txt, where XX is the confidence level. Default NULL.

  • gisticAmpGenesFile
    Amplification Genes file generated by gistic, e.g. amp_genes.conf_XX.txt, where XX is the confidence level. Default NULL.

  • gisticDelGenesFile
    Deletion Genes file generated by gistic, e.g. del_genes.conf_XX.txt, where XX is the confidence level. Default NULL.

  • gisticScoresFile
    scores.gistic file generated by gistic. Default NULL.

  • cnLevel
    level of CN changes to use. Can be 'all', 'deep' or 'shallow'. Default uses all, i.e. genes with either 'shallow' or 'deep' CN changes.

  • cnTable
    Custom copynumber data if gistic results are not available. Input file or a data.frame should contain three columns with gene name, Sample name and copy number status (either 'Amp' or 'Del'). Default NULL.

  • isTCGA
    Is input MAF file from TCGA source. If TRUE uses only first 12 characters from Tumor_Sample_Barcode.

  • vc_nonSyn
    NULL. Provide manual list of variant classifications to be considered as non-synonymous. Rest will be considered as silent variants. Default uses Variant Classifications with High/Moderate variant consequences (see http://asia.ensembl.org/Help/Glossary?id=535): "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"

  • verbose
    TRUE logical. Default to be talkative and prints summary.
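
To try read.maf without your own data, the sketch below loads the TCGA LAML example files that, to my knowledge, ship with maftools (they are the ones used in its vignette) and prints a few summaries. It assumes maftools is installed and that the file names below match your version of the package.

```r
library(maftools)

# Example TCGA LAML files bundled with maftools (assumed file names, as in the vignette)
laml.maf  <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml.clin <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")

# Read the MAF together with its per-sample clinical annotations
laml <- read.maf(maf = laml.maf, clinicalData = laml.clin)

# Per-sample and per-gene summary tables
head(getSampleSummary(laml))
head(getGeneSummary(laml))

# Clinical annotations and the column names available in the MAF
head(getClinicalData(laml))
getFields(laml)
```

The resulting `laml` object is the kind of MAF object that the plotting functions in the rest of this post take as their `maf` argument.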


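Plotting a summary of the MAF file: plotmafSummary

This function shows the number of variants in each sample as a stacked bar plot and the variant types as a boxplot summarized by Variant_Classification. A mean or median line can be added to the stacked bar plot to show the mean/median number of variants across the cohort.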

plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)

Description

Plots maf summary.

Usage

plotmafSummary(maf, file = NULL, rmOutlier = TRUE, dashboard = TRUE,
  titvRaw = TRUE, width = 10, height = 7, addStat = NULL,
  showBarcodes = FALSE, fs = 10, textSize = 2, color = NULL,
  statFontSize = 3, titleSize = c(10, 8), titvColor = NULL, top = 10)

Arguments

  • maf
    an MAF object generated by read.maf

  • file
    If given pdf file will be generated.

  • rmOutlier
    If TRUE removes outlier from boxplot.

  • dashboard
    If FALSE plots simple summary instead of dashboard style.

  • titvRaw
    TRUE. If false instead of raw counts, plots fraction.

  • width
    plot parameter for output file.

  • height
    plot parameter for output file.

  • addStat
    Can be either mean or median. Default NULL.

  • showBarcodes
    include sample names in the top bar plot.

  • fs
    base size for text. Default 10.

  • color
    named vector of colors for each Variant_Classification.

  • titvColor
    colors for SNV classifications.

  • top
    include top n genes dashboard plot. Default 10.


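Drawing waterfall plots: oncoplots

The oncoplot function uses "ComplexHeatmap" to draw oncoplots. Specifically, oncoplot is a wrapper around ComplexHeatmap's OncoPrint function, with a few modifications and some automation to make plotting easier. The side bar plot and the top bar plot can be controlled with the drawRowBar and drawColBar arguments respectively.

The appropriate value of top depends on your data.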

oncoplot(maf = var_maf, top = 400, writeMatrix=T,removeNonMutated = F,showTumorSampleBarcodes=T)

takes output generated by read.maf and draws an oncoplot

Usage

oncoplot(maf, top = 20, genes = NULL, mutsig = NULL, mutsigQval = 0.1,
  drawRowBar = TRUE, drawColBar = TRUE, clinicalFeatures = NULL,
  annotationDat = NULL, annotationColor = NULL, genesToIgnore = NULL,
  showTumorSampleBarcodes = FALSE, removeNonMutated = TRUE, colors = NULL,
  sortByMutation = FALSE, sortByAnnotation = FALSE,
  annotationOrder = NULL, keepGeneOrder = FALSE, GeneOrderSort = TRUE,
  sampleOrder = NULL, writeMatrix = FALSE, fontSize = 10,
  SampleNamefontSize = 10, titleFontSize = 15, legendFontSize = 12,
  annotationFontSize = 12, annotationTitleFontSize = 12)

Arguments

  • maf
    an MAF object generated by read.maf
  • top
    how many top genes to be drawn. Defaults to 20.
  • genes
    Just draw oncoplot for these genes. Default NULL.
  • mutsig
    Mutsig results if available. Usually a file named sig_genes.txt. If provided, plots significant genes and corresponding Q-values as side row-bar. Default NULL.
  • mutsigQval
    Q-value to choose significant genes from mutsig results. Default 0.1.
  • genesToIgnore
    do not show these genes in Oncoplot. Default NULL.
  • showTumorSampleBarcodes
    logical to include sample names.
  • removeNonMutated
    Logical. If TRUE removes samples with no mutations in the oncoplot for better visualization. Default TRUE.
  • sortByMutation
    Force sort matrix according to mutations. Helpful in case the MAF was read along with copy number data. Default FALSE.
  • sortByAnnotation
    logical sort oncomatrix (samples) by provided 'clinicalFeatures'. Sorts based on first 'clinicalFeatures'. Defaults to FALSE. column-sort
  • annotationOrder
    Manually specify order for annotations. Works only for first 'clinicalFeatures'. Default NULL.
  • keepGeneOrder
    logical whether to keep order of given genes. Default FALSE, order according to mutation frequency.
  • GeneOrderSort
    logical this is applicable when 'keepGeneOrder' is TRUE. Default TRUE.
  • sampleOrder
    Manually specify sample names for oncoplot ordering. Default NULL.
  • writeMatrix
    writes character coded matrix used to generate the plot to an output file. This can be used as an input for ComplexHeatmap oncoPrint function if you wish to customize the plot.

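Oncoplots can be further improved by including annotations associated with the samples (clinical features), changing the colors used for variant classifications, and including q-values for significance (generated by MutSig or similar programs).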

col = RColorBrewer::brewer.pal(n = 8, name = 'Paired')
names(col) = c('Frame_Shift_Del','Missense_Mutation', 'Nonsense_Mutation', 'Multi_Hit', 'Frame_Shift_Ins','In_Frame_Ins', 'Splice_Site', 'In_Frame_Del')

#Color coding for FAB classification; try getAnnotations(x = laml) to see available annotations.
fabcolors = RColorBrewer::brewer.pal(n = 8,name = 'Spectral')
names(fabcolors) = c("M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7")
fabcolors = list(FAB_classification = fabcolors)

#MutSig results
laml.mutsig <- system.file("extdata", "LAML_sig_genes.txt.gz", package = "maftools")

oncoplot(maf = laml, colors = col, mutsig = laml.mutsig, mutsigQval = 0.01, clinicalFeatures = 'FAB_classification', sortByAnnotation = TRUE, annotationColor = fabcolors)


The oncostrip function visualizes any set of genes, drawing the mutations in each sample much like the OncoPrinter tool on cBioPortal. oncostrip can plot any number of genes using the top or genes arguments.

# Show specific genes
oncostrip(maf = laml, genes = c('DNMT3A','NPM1', 'RUNX1'))

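Drawing boxplots: titv

The titv function classifies SNPs into transitions and transversions and returns a list of summary tables. The summarized data can also be visualized as a boxplot showing the overall distribution of the six different conversions, and as a stacked bar plot showing the fraction of conversions in each sample.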


Description

takes output generated by read.maf and classifies Single Nucleotide Variants into Transitions and Transversions.

Usage

titv(maf, useSyn = FALSE, plot = TRUE, file = NULL)

Arguments

  • maf
    an MAF object generated by read.maf

  • useSyn
    Logical. Whether to include synonymous variants in analysis. Defaults to FALSE.

  • plot
    plots titv fractions. Default TRUE.
  • file
    basename for output file name. If given writes summaries to output file. Default NULL.
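
To make the "six conversions" concrete, here is a base R sketch of how a single-nucleotide change can be collapsed onto the pyrimidine strand and labelled as a transition (Ti) or transversion (Tv). The helper `classify_snv` is hypothetical and only illustrates the idea; it is not the code maftools runs internally.

```r
# Hypothetical helper: collapse ref>alt onto the pyrimidine (C/T) strand and
# label each change as a transition (Ti) or transversion (Tv)
classify_snv <- function(ref, alt) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  purine_ref <- ref %in% c("A", "G")
  # When the reference base is a purine, use the complementary strand instead
  ref2 <- ifelse(purine_ref, unname(comp[ref]), ref)
  alt2 <- ifelse(purine_ref, unname(comp[alt]), alt)
  conv <- paste0(ref2, ">", alt2)
  # C>T and T>C are transitions; the other four classes are transversions
  titv <- ifelse(conv %in% c("C>T", "T>C"), "Ti", "Tv")
  data.frame(ref = ref, alt = alt, conv = conv, titv = titv,
             stringsAsFactors = FALSE)
}

res <- classify_snv(ref = c("G", "A", "C", "T"), alt = c("A", "C", "G", "C"))
print(res)
```

Every substitution falls into one of C>A, C>G, C>T, T>A, T>C or T>G, which are the six classes summarized in the boxplot.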

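Drawing lollipop plots: lollipopPlot

Lollipop plots are a simple and very effective way of showing mutation sites on a protein structure. Many oncogenes have preferential sites that are mutated more frequently than any other locus. These sites are considered mutational hotspots, and lollipop plots can be used to display them along with the other mutations. Such plots can be drawn with the lollipopPlot function. The function requires amino acid change information in the MAF file. However, MAF files have no clear guideline on how to name the amino acid change field, and different studies use different field (column) names for it. By default, lollipopPlot looks for a column named AAChange; if it cannot be found in the MAF file, the function prints all available fields along with a warning message. In the laml example data used above, the amino acid changes are stored under the field/column name "Protein_Change", so it is specified manually with the AACol argument. The function also returns the plot as a ggplot object, which the user can modify later if needed.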
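
A minimal call might look like the sketch below. It assumes maftools is installed and uses the TCGA LAML example file that, to my knowledge, is bundled with the package; the choice of DNMT3A is only an illustration.

```r
library(maftools)

# Example TCGA LAML MAF bundled with maftools (assumed file name, as in the vignette)
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)

# This MAF stores amino acid changes under 'Protein_Change', so name the column via AACol
lollipopPlot(maf = laml, gene = "DNMT3A", AACol = "Protein_Change")
```

If your own MAF uses a different column name, pass that name to AACol instead.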


maftools can also produce many other kinds of plots.

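The geneCloud function can also draw a word cloud of mutated genes. The size of each gene is proportional to the total number of samples in which it is mutated/altered.

Many disease-causing genes in cancer co-occur or show strong exclusiveness in their mutation pattern. Such mutually exclusive or co-occurring sets of genes can be detected with the somaticInteractions function, which performs pairwise Fisher's exact tests to detect significant pairs of genes. somaticInteractions also uses cometExactTest to identify potentially altered gene sets involving more than two genes.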
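
The pairwise test itself is easy to illustrate in base R. The sketch below runs Fisher's exact test on made-up mutation calls for three genes; the data and the helper `pair_test` are hypothetical, and this is not the internal code of somaticInteractions.

```r
# Toy mutation calls for 20 samples (TRUE = gene mutated in that sample)
n <- 20
gene_a <- seq_len(n) <= 8               # mutated in samples 1-8
gene_b <- seq_len(n) %in% c(1:7, 20)    # mostly the same samples as gene_a
gene_c <- seq_len(n) %in% 9:18          # never mutated together with gene_a

# Hypothetical helper: Fisher's exact test on the 2x2 table of mutation status
pair_test <- function(x, y) {
  tab <- table(factor(x, levels = c(FALSE, TRUE)),
               factor(y, levels = c(FALSE, TRUE)))
  ft <- fisher.test(tab)
  # An odds ratio above 1 points to co-occurrence, below 1 to mutual exclusivity
  list(p = ft$p.value,
       direction = if (ft$estimate > 1) "co-occurrence" else "mutual exclusivity")
}

ab <- pair_test(gene_a, gene_b)
ac <- pair_test(gene_a, gene_c)
print(ab)
print(ac)
```

With real data, somaticInteractions applies this kind of test to gene pairs taken from the MAF object rather than to hand-built vectors.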


maftools is a very powerful package; for details, see:

http://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html
