Author: 椰子糖
Reviewer: 童蒙
Editor: amethyst

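Alternative splicing lets a single gene produce multiple mRNA isoforms, and therefore multiple distinct proteins, which greatly increases the diversity of mRNAs and proteins. It is a post-transcriptional process with important and wide-ranging effects on cellular activity and disease. Studies indicate that 90-95% of multi-exon genes in the human genome undergo alternative splicing. Many tools can detect it; this article looks at the latest release of one widely used tool, rMATS.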

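1. Software overview

rMATS is one of the most commonly used tools for detecting alternative splicing events. From RNA-seq data it detects several types of alternative splicing event, quantifies them, and tests for differences between groups, including groups that contain biological replicates. Version 4.1, released in June 2020, extended the software as follows: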

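1. Added --task, --tmp and related parameters so that parts of the computation can run on different machines;
2. Added --variable-read-length so that reads of differing lengths can be analyzed;
3. Added --paired-stats for paired statistical analysis;
4. Added --novelSS, --mil and --mel for detecting novel (unannotated) splice sites;
5. The output files fromGTF.novelJunction and fromGTF.novelSpliceSite replace fromGTF.novelEvents;
6. The release is compatible with both Python 2 and Python 3;
7. --statoff must be added when a group contains only one sample or when there is only one group;
8. Fixed a number of bugs from earlier versions.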

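Software homepage: http://rnaseq-mats.sourceforge.net/

It detects the following types of alternative splicing event: SE (skipped exon), MXE (mutually exclusive exons), A3SS (alternative 3' splice site), A5SS (alternative 5' splice site), and RI (retained intron).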

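2. Installation

rMATS turbo is the C/Cython implementation of rMATS. The main differences are speed and storage: according to its documentation, rMATS turbo is about 100 times faster than the original rMATS and its output files are about 1000 times smaller. See https://github.com/Xinglab/rmats-turbo/blob/v4.1.0/README.md for details. We therefore install rMATS turbo.

Dependencies: Python (either 2.7 or 3.6), BLAS, LAPACK, GNU Scientific Library, GCC, gfortran, CMake, and so on. Once all of these are present, installation can proceed. In practice, installing through conda takes care of these basic packages.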

conda create --name py2 python=2.7
conda activate py2
conda install -c bioconda rmats

Once installation is complete, the software can be tested.

3. Usage and testing

Parameters:

python rmats.py -h
usage: rmats.py [options]
optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --gtf GTF             An annotation of genes and transcripts in GTF format
  --b1 B1               A text file containing a comma separated list of the
                        BAM files for sample_1. (Only if using BAM)
  --b2 B2               A text file containing a comma separated list of the
                        BAM files for sample_2. (Only if using BAM)
  --s1 S1               A text file containing a comma separated list of the
                        FASTQ files for sample_1. If using paired reads the
                        format is ":" to separate pairs and "," to separate
                        replicates. (Only if using fastq)
  --s2 S2               A text file containing a comma separated list of the
                        FASTQ files for sample_2. If using paired reads the
                        format is ":" to separate pairs and "," to separate
                        replicates. (Only if using fastq)
  --od OD               The directory for final output
  --tmp TMP             The directory for intermediate output such as ".rmats"
                        files from the prep step
  -t {paired,single}    Type of read used in the analysis: either "paired" for
                        paired-end data or "single" for single-end data.
                        Default: paired
  --libType {fr-unstranded,fr-firststrand,fr-secondstrand}
                        Library type. Use fr-firststrand or fr-secondstrand
                        for strand-specific data. Default: fr-unstranded
  --readLength READLENGTH
                        The length of each read
  --variable-read-length
                        Allow reads with lengths that differ from --readLength
                        to be processed. --readLength will still be used to
                        determine IncFormLen and SkipFormLen
  --anchorLength ANCHORLENGTH
                        The anchor length. Default is 1
  --tophatAnchor TOPHATANCHOR
                        The "anchor length" or "overhang length" used in the
                        aligner. At least "anchor length" NT must be mapped to
                        each end of a given junction. The default is 6. (Only
                        if using fastq)
  --bi BINDEX           The directory name of the STAR binary indices (name of
                        the directory that contains the SA file). (Only if
                        using fastq)
  --nthread NTHREAD     The number of threads. The optimal number of threads
                        should be equal to the number of CPU cores. Default: 1
  --tstat TSTAT         The number of threads for the statistical model.
                        Default: 1
  --cstat CSTAT         The cutoff splicing difference. The cutoff used in the
                        null hypothesis test for differential splicing. The
                        default is 0.0001 for 0.01% difference. Valid: 0 <=
                        cutoff < 1. Does not apply to the paired stats model
  --task {prep,post,both,inte}
                        Specify which step(s) of rMATS to run. Default: both.
                        prep: preprocess BAMs and generate a .rmats file.
                        post: load .rmats file(s) into memory, detect and
                        count alternative splicing events, and calculate P
                        value (if not --statoff). both: prep + post. inte
                        (integrity): check that the BAM filenames recorded by
                        the prep task(s) match the BAM filenames for the
                        current command line
  --statoff             Skip the statistical analysis
  --paired-stats        Use the paired stats model
  --novelSS             Enable detection of novel splice sites (unannotated
                        splice sites). Default is no detection of novel splice
                        sites
  --mil MIL             Minimum Intron Length. Only impacts --novelSS
                        behavior. Default: 50
  --mel MEL             Maximum Exon Length. Only impacts --novelSS behavior.
                        Default: 500

Running a single sample

Write the full path of the NA12878 BAM file into /path/to/b1.txt.

condadir/envs/py2/bin/python condadir/envs/py2/rMATS/rmats.py --nthread 4 --b1 /path/to/b1.txt --gtf Homo_sapiens.hg19_ucsc.gtf --od NA12878 -t paired --readLength 101 --libType fr-unstranded --statoff

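--b1: a text file listing the BAM file paths; the paths of biological replicates are separated by commas. With a single group, supply only --b1 (or --s1);
--gtf: GTF file of known genes and transcripts; --od: output directory; -t: sequencing type, single-end or paired-end;
--readLength: the length of each read. If read lengths vary, add --variable-read-length so that reads whose length differs from --readLength are still processed; the reads are not trimmed, and --readLength is still used to compute IncFormLen and SkipFormLen; --libType: library type, i.e. whether the data are strand-specific;
--statoff: skips the statistical analysis; use it when there is a single sample or a single group.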
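The invocation above can also be assembled from a script. The following Python 3 sketch writes the BAM list file and builds the argument list using only flags that appear in the help text; the helper names `write_bam_list` and `build_rmats_command` and every path are hypothetical, and the command is only constructed, not executed.

```python
def write_bam_list(bam_paths, list_path):
    """Write BAM paths as one comma-separated line, the layout used for --b1/--b2."""
    with open(list_path, "w") as handle:
        handle.write(",".join(bam_paths) + "\n")
    return list_path


def build_rmats_command(b1_list, gtf, out_dir, tmp_dir, read_length,
                        b2_list=None, threads=1, read_type="paired"):
    """Assemble an rmats.py argument list (hypothetical helper, not part of rMATS)."""
    cmd = [
        "python", "rmats.py",
        "--b1", b1_list,
        "--gtf", gtf,
        "--od", out_dir,
        "--tmp", tmp_dir,
        "-t", read_type,
        "--readLength", str(read_length),
        "--nthread", str(threads),
    ]
    if b2_list is None:
        # Only one group: skip the statistical model, as described above.
        cmd.append("--statoff")
    else:
        cmd.extend(["--b2", b2_list])
    return cmd


# Placeholder paths; replace with real locations before running anything.
command = build_rmats_command("/path/to/b1.txt", "Homo_sapiens.hg19_ucsc.gtf",
                              "NA12878", "NA12878_tmp", 101, threads=4)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` once the paths point at real files.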

Running a two-group comparison

##/path/to/b1.txt
/path/to/1_1.bam,/path/to/1_2.bam
##/path/to/b2.txt
/path/to/2_1.bam,/path/to/2_2.bam
python rmats.py --b1 /path/to/b1.txt --b2 /path/to/b2.txt --gtf /path/to/the.gtf -t paired --readLength 50 --nthread 4 --od /path/to/output --tmp /path/to/tmp_output --paired-stats

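--b1: a text file listing the BAM file paths for group 1; the paths of biological replicates are separated by commas. With a single group, supply only --b1 (or --s1);
--b2: a text file listing the BAM file paths for group 2; the paths of biological replicates are separated by commas;
--gtf: GTF file of known genes and transcripts;
--od: output directory;
-t: sequencing type, single-end or paired-end;
--readLength: the length of each read; reads are not trimmed to this value, and reads of other lengths are processed only when --variable-read-length is also given;
--libType: library type, i.e. whether the data are strand-specific;
--tmp: directory for intermediate files;
--paired-stats: use the paired statistical model.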
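Before running with --paired-stats it can help to confirm that the two list files line up. The sketch below assumes, without confirmation from the help text, that the paired model matches the i-th replicate of group 1 with the i-th replicate of group 2 and therefore needs equal replicate counts; `parse_bam_list` and `pair_replicates` are hypothetical helper names.

```python
def parse_bam_list(text):
    """Split the single comma-separated line of a --b1/--b2 list file."""
    return [path.strip() for path in text.strip().split(",") if path.strip()]


def pair_replicates(b1_text, b2_text):
    """Pair replicates by position (assumed pairing rule, see lead-in)."""
    group1 = parse_bam_list(b1_text)
    group2 = parse_bam_list(b2_text)
    if len(group1) != len(group2):
        raise ValueError(
            "replicate counts differ: %d vs %d" % (len(group1), len(group2)))
    return list(zip(group1, group2))


pairs = pair_replicates("/path/to/1_1.bam,/path/to/1_2.bam\n",
                        "/path/to/2_1.bam,/path/to/2_2.bam\n")
for first, second in pairs:
    print(first, "<->", second)
```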

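Besides BAM files, FASTQ files can also be used as input through --s1 and --s2. The two read files of a paired-end sample are separated by a colon, and biological replicates are separated by commas. As the help text above notes, FASTQ input also needs the STAR index directory, given with --bi.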
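The colon and comma convention for --s1/--s2 is easy to get wrong by hand. As a minimal sketch, the Python 3 function below builds the line from a list of replicates; `format_fastq_list` and the file names are hypothetical.

```python
def format_fastq_list(replicates):
    """Build the --s1/--s2 line.

    replicates: list of tuples, (r1,) for single-end or (r1, r2) for paired-end.
    Mates are joined with ":" and replicates with ",".
    """
    entries = []
    for files in replicates:
        if len(files) not in (1, 2):
            raise ValueError("each replicate needs one or two FASTQ files")
        entries.append(":".join(files))
    return ",".join(entries)


line = format_fastq_list([("rep1_R1.fq", "rep1_R2.fq"),
                          ("rep2_R1.fq", "rep2_R2.fq")])
print(line)  # rep1_R1.fq:rep1_R2.fq,rep2_R1.fq:rep2_R2.fq
```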

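4. Output files

Each type of alternative splicing event has its own set of output files, and the event name forms part of each file name. Below, [AS_Event] stands for any one of SE (skipped exon), MXE (mutually exclusive exons), A3SS (alternative 3' splice site), A5SS (alternative 5' splice site), and RI (retained intron):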

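[AS_Event].MATS.JC.txt: results based only on reads that span junctions (Junction Counts);
[AS_Event].MATS.JCEC.txt: results based on reads that span junctions (Junction Counts) plus reads on the exon that do not cross a junction (Exon Counts); this file deserves particular attention when examining known alternative splicing events;
fromGTF.[AS_Event].txt: all alternative splicing events derived from the RNA data and the GTF;
fromGTF.novelJunction.[AS_Event].txt: alternative splicing events identified only once the RNA data are considered, as opposed to analyzing the GTF alone; events with unannotated splice sites are not included;
fromGTF.novelSpliceSite.[AS_Event].txt: only those events that involve an unannotated splice site; produced only when --novelSS is used;
JC.raw.input.[AS_Event].txt: the raw input file behind [AS_Event].MATS.JC.txt;
JCEC.raw.input.[AS_Event].txt: the raw input file behind [AS_Event].MATS.JCEC.txt.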
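The naming scheme above can be turned into a quick completeness check on an output directory. This Python 3 sketch only expands the file name patterns listed in this section; whether every file is written in every run mode is not verified here, and `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
EVENT_TYPES = ("SE", "MXE", "A3SS", "A5SS", "RI")


def expected_outputs(event, novel_ss=False):
    """File names listed in this section for one event type."""
    names = [
        f"{event}.MATS.JC.txt",
        f"{event}.MATS.JCEC.txt",
        f"fromGTF.{event}.txt",
        f"fromGTF.novelJunction.{event}.txt",
        f"JC.raw.input.{event}.txt",
        f"JCEC.raw.input.{event}.txt",
    ]
    if novel_ss:
        # Listed above as produced only when --novelSS is used.
        names.append(f"fromGTF.novelSpliceSite.{event}.txt")
    return names


def missing_outputs(present_names, novel_ss=False):
    """Return expected file names that are absent from present_names."""
    present = set(present_names)
    return [name
            for event in EVENT_TYPES
            for name in expected_outputs(event, novel_ss)
            if name not in present]


print(missing_outputs(["SE.MATS.JC.txt", "SE.MATS.JCEC.txt"])[:3])
```

In practice `present_names` would come from `os.listdir` on the directory passed to --od.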

01. Columns shared by all event files

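ID: rMATS event ID;
GeneID: gene ID;
geneSymbol: gene name;
chr: chromosome;
strand: strand of the gene (+ or -);
IJC_SAMPLE_1: inclusion junction counts for sample 1, i.e. reads supporting the inclusion form; biological replicates are comma-separated;
SJC_SAMPLE_1: skipping junction counts for sample 1, i.e. reads supporting the skipping form; biological replicates are comma-separated;
IJC_SAMPLE_2: inclusion junction counts for sample 2; biological replicates are comma-separated;
SJC_SAMPLE_2: skipping junction counts for sample 2; biological replicates are comma-separated;
IncFormLen: length of the inclusion form, used for normalization;
SkipFormLen: length of the skipping form, used for normalization;
PValue: significance of the splicing difference between the two groups (present only when the statistical model is run);
FDR: false discovery rate computed from the p-value (present only when the statistical model is run);
IncLevel1: inclusion level of sample 1, computed from the length-normalized read counts; biological replicates are comma-separated;
IncLevel2: inclusion level of sample 2, computed from the length-normalized read counts; biological replicates are comma-separated;
IncLevelDifference: average(IncLevel1) - average(IncLevel2).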
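To make the relation between the count columns and the inclusion level concrete, the sketch below recomputes IncLevel and IncLevelDifference from one row. It assumes the usual length-normalized form, IncLevel = (IJC / IncFormLen) / (IJC / IncFormLen + SJC / SkipFormLen), which this article does not state explicitly; the row values are invented for illustration.

```python
def inclusion_level(ijc, sjc, inc_len, skip_len):
    """Length-normalized inclusion level; None when no reads support either form."""
    inc = ijc / inc_len
    skip = sjc / skip_len
    total = inc + skip
    if total == 0:
        return None
    return inc / total


def group_levels(row, sample):
    """Per-replicate inclusion levels for sample 1 or 2 of one event row (a dict)."""
    ijc = [int(v) for v in row[f"IJC_SAMPLE_{sample}"].split(",")]
    sjc = [int(v) for v in row[f"SJC_SAMPLE_{sample}"].split(",")]
    inc_len = float(row["IncFormLen"])
    skip_len = float(row["SkipFormLen"])
    return [inclusion_level(i, s, inc_len, skip_len) for i, s in zip(ijc, sjc)]


def inc_level_difference(row):
    """average(IncLevel1) - average(IncLevel2), ignoring undefined replicates."""
    means = []
    for sample in (1, 2):
        levels = [v for v in group_levels(row, sample) if v is not None]
        if not levels:
            return None
        means.append(sum(levels) / len(levels))
    return means[0] - means[1]


# Invented example row with two replicates per group.
row = {"IJC_SAMPLE_1": "30,30", "SJC_SAMPLE_1": "10,10",
       "IJC_SAMPLE_2": "10,10", "SJC_SAMPLE_2": "10,10",
       "IncFormLen": "200", "SkipFormLen": "100"}
print(round(inc_level_difference(row), 3))
```

Rows for real data can be read with `csv.DictReader(handle, delimiter="\t")`, assuming the event files are tab-separated.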

02. Event-specific columns

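SE: exonStart_0base, exonEnd, upstreamES, upstreamEE, downstreamES, downstreamEE
The inclusion form contains the target exon (exonStart_0base, exonEnd).
MXE: 1stExonStart_0base, 1stExonEnd, 2ndExonStart_0base, 2ndExonEnd, upstreamES, upstreamEE, downstreamES, downstreamEE
On the + strand, the inclusion form includes the 1st exon (1stExonStart_0base, 1stExonEnd) and skips the 2nd exon.
On the - strand, the inclusion form includes the 2nd exon (2ndExonStart_0base, 2ndExonEnd) and skips the 1st exon.
A3SS, A5SS: longExonStart_0base, longExonEnd, shortES, shortEE, flankingES, flankingEE
The inclusion form uses the long exon (longExonStart_0base, longExonEnd) instead of the short exon (shortES, shortEE).
RI: riExonStart_0base, riExonEnd, upstreamES, upstreamEE, downstreamES, downstreamEE
The inclusion form retains the intron, which spans (upstreamEE, downstreamES), i.e. from the end of the upstream exon to the start of the downstream exon.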
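The `_0base` suffix matters when coordinates are passed to other tools. The sketch below assumes that start columns are 0-based and end columns are 1-based inclusive (a half-open interval, as in BED), which the column names suggest but this article does not spell out; the helper names and the example row are hypothetical.

```python
def exon_length(start_0base, end):
    """Length of a half-open interval [start_0base, end)."""
    return end - start_0base


def to_one_based_region(chrom, start_0base, end):
    """Format as chrom:start-end with a 1-based inclusive start."""
    return f"{chrom}:{start_0base + 1}-{end}"


def se_target_exon(row):
    """Region string and length of the skipped exon in one SE row (a dict)."""
    start = int(row["exonStart_0base"])
    end = int(row["exonEnd"])
    return to_one_based_region(row["chr"], start, end), exon_length(start, end)


# Invented coordinates for illustration.
print(se_target_exon({"chr": "chr1", "exonStart_0base": "100", "exonEnd": "200"}))
# ('chr1:101-200', 100)
```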

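5. Summary

Overall, rMATS 4.1 handles single-end or paired-end sequencing, reads of varying length, data with or without biological replicates, runs with or without a comparison group, optional detection of novel splice sites, and stranded or unstranded libraries. It can also split the computation into steps and across machines. It is a full-featured tool that covers the main types of alternative splicing event, and it stands out among the tools for detecting alternative splicing from second-generation sequencing data. We hope this introduction helps with your alternative splicing analyses.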

6. References

Mehmood A, Laiho A, Venäläinen M S, et al. Systematic evaluation of differential splicing tools for RNA-seq studies[J]. Briefings in Bioinformatics, 2019.
Shen S, Park J W, Lu Z, et al. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data[J]. Proc Natl Acad Sci U S A, 2014, 111(51): E5593-E5601.
Park J W, Tokheim C, Shen S, et al. Identifying Differential Alternative Splicing Events from RNA Sequencing Data Using RNASeq-MATS[M]// Deep Sequencing Data Analysis. Humana Press, 2013.
Shen S, Park J W, Huang J, et al. MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data[J]. Nucleic Acids Research, 2012, 40(8): e61.
http://rnaseq-mats.sourceforge.net/
https://github.com/Xinglab/rmats-turbo/blob/v4.1.0
