This post records how I used the psRobot_tar module of PsRobot to identify target genes. I ran into quite a few pitfalls along the way, so I am writing them down for the junior members of our lab. For reference, PsRobot provides four modules:
- psRobot_tar is designed to find potential small RNA targets.
- psRobot_map is designed to find all perfect matching locations of short sequences (less than 40 bp) in longer reference sequences.
- psRobot_mir is designed to find small RNAs with stem-loop precursors (e.g. miRNAs or shRNAs) for a batch of input sequences from high-throughput sequencing data.
- psRobot_deg is designed to identify which small RNA targets are supported by user-specified degradome data.
Below we use the psRobot_tar module to identify the target genes of miRNAs. Let's go.
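1. Download and process the mature.fa file
- Download the mature.fa file from miRBase.
  Note: a download manager such as Xunlei worked best for me. For reasons I could not work out, downloading directly in the browser failed.
- In Notepad++, delete the miRNAs of all other species and keep only the wheat (tae) entries, then save the result as tae_miR.fa.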
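The manual filtering in Notepad++ can also be done on the command line. The following is a minimal sketch, assuming that every wheat record in mature.fa has a header starting with ">tae-" (as in the examples shown in step 4); the function name extract_species_mirnas is made up for illustration.

```shell
# Hypothetical helper: keep only the FASTA records whose header starts
# with ">PREFIX-" (for example "tae" for Triticum aestivum).
extract_species_mirnas() {
    prefix="$1"   # species prefix, e.g. tae
    input="$2"    # miRBase FASTA file, e.g. mature.fa
    awk -v p="^>${prefix}-" '
        /^>/ { keep = ($0 ~ p) }   # decide once per record, at the header line
        keep                       # print header and sequence lines of kept records
    ' "$input"
}

# Usage (file names as in this post):
# extract_species_mirnas tae mature.fa > tae_miR.fa
```

Running this on the server rather than in a Windows editor also means the output is written with Unix line endings.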
2. Download the cDNA file from Ensembl Plants:
Triticum_aestivum.IWGSC.cdna.all.fa
3. Upload both files to the server with Xftp
- Triticum_aestivum.IWGSC.cdna.all.fa
- tae_miR.fa
4. Simplify the headers of Triticum_aestivum.IWGSC.cdna.all.fa and tae_miR.fa
On every line that starts with ">", delete "cdna" and everything after it:
sed -ri '/^>/s/ *cdna.*$//' Triticum_aestivum.IWGSC.cdna.all.fa
Simplify tae_miR.fa
tae_miR.fa before processing:
less -SN tae_miR.fa
=================== before processing ===================
>tae-miR159a MIMAT0005343 Triticum aestivum miR159a
UUUGGAUUGAAGGGAGCUCUG
>tae-miR159b MIMAT0005344 Triticum aestivum miR159b
UUUGGAUUGAAGGGAGCUCUG
>tae-miR160 MIMAT0005345 Triticum aestivum miR160
UGCCUGGCUCCCUGUAUGCCA
>tae-miR164 MIMAT0005346 Triticum aestivum miR164
UGGAGAAGCAGGGCACGUGCA
=================== before processing ===================
Process tae_miR.fa; the headers are much cleaner afterwards:
sed -ri '/^>/s/ *MIMAT.*$//' tae_miR.fa
less -SN tae_miR.fa
=================== after processing ====================
>tae-miR159a
UUUGGAUUGAAGGGAGCUCUG
>tae-miR159b
UUUGGAUUGAAGGGAGCUCUG
>tae-miR160
=================== after processing ====================
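Because sed -i rewrites the files in place, a quick sanity check afterwards can be useful. The sketch below counts the records and lists any sequence ID that occurs more than once; both function names are hypothetical, and the ID is assumed to be the first whitespace-delimited word of each header.

```shell
# Hypothetical helper: count the records (header lines) in a FASTA file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: print every sequence ID (first word of a header,
# without the leading ">") that occurs more than once.
# No output means all IDs are unique.
duplicate_fasta_ids() {
    grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d
}

# Usage (file names as in this post):
# count_fasta_records tae_miR.fa
# duplicate_fasta_ids Triticum_aestivum.IWGSC.cdna.all.fa
```

The record count should be the same before and after the header clean-up, and the duplicate list should be empty.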
5. Install the dependency mfold 3.5 (requires administrator privileges)
wget http://omicslab.genetics.ac.cn/psRobot/program/WebServer/mfold.tar.gz
tar xvzf mfold.tar.gz
cd mfold-3.5/
./configure
make
sudo make install
6. Install PsRobot (requires administrator privileges)
wget http://omicslab.genetics.ac.cn/psRobot/program/WebServer/psRobot_v1.2.tar.gz
tar xvzf psRobot_v1.2.tar.gz
cd psRobot_v1.2
sudo ./configure
make
sudo make install
source ~/.bashrc
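After installation it is worth confirming that the programs are actually on the PATH before moving on. A minimal sketch follows; check_tools is a made-up helper name, and the program names are the four modules listed at the top of this post.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 only when every command is found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_tools psRobot_tar psRobot_map psRobot_mir psRobot_deg
```

If a module is reported as missing, opening a new shell or re-running source ~/.bashrc is the first thing to try.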
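7. Run PsRobot
psRobot_tar accepts several parameters; the command I used was:
psRobot_tar -s tae_miR.fa -t Triticum_aestivum.IWGSC.cdna.all.fa -p 8 -o target_results.gTP
I search cDNA rather than genomic sequence because a miRNA binds its target transcript in the cytoplasm, where the transcript still has its UTRs but no longer contains introns. (Given the sequence characteristics of UTRs, CDS sequences would also work.)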
Parameters of psRobot_tar:
- -s: input file name, smRNA sequences (FASTA format), i.e. the miRNAs to predict targets for; default = smRNA
- -t: input file name, target sequences (FASTA format), here the cDNA sequences to search; default = target
- -o: output file name. Note: default = smRNA-target.gTP
- -ts: target penalty score, lower is better (0-5); this is the threshold for reporting a hit; default = 2.5
- -fp: 5 prime boundary of essential sequence (1-2); default = 2
- -tp: 3 prime boundary of essential sequence (7-31); default = 17
- -gl: position after which a gap/bulge is permitted (0-30), 0 means no gap/bulge permitted; default = 17
- -p: number of processors to use; default = 1. Note: increase this to suit your machine.
- -gn: number of gaps/bulges permitted (0-5); default = 1
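To make a run reproducible, it can help to spell out every option instead of relying on defaults. The wrapper below is a sketch under stated assumptions: the option values are the defaults listed above, run_psrobot_tar is a made-up function name, and PSROBOT_TAR is a hypothetical environment variable that only exists here so the binary can be swapped out (for example with echo, to preview the command).

```shell
# Hypothetical wrapper: check the inputs, then call psRobot_tar with
# every option written out (values are the documented defaults).
run_psrobot_tar() {
    smrna="$1"          # miRNA FASTA file
    target="$2"         # target (cDNA) FASTA file
    out="$3"            # output file name
    threads="${4:-1}"   # number of processors, default 1

    # Refuse to start if an input file is missing or empty.
    for f in "$smrna" "$target"; do
        [ -s "$f" ] || { echo "missing or empty input: $f" >&2; return 1; }
    done

    # PSROBOT_TAR is an assumption of this sketch, not a PsRobot feature.
    "${PSROBOT_TAR:-psRobot_tar}" -s "$smrna" -t "$target" -o "$out" \
        -ts 2.5 -fp 2 -tp 17 -gl 17 -gn 1 -p "$threads"
}

# Usage (file names as in this post):
# run_psrobot_tar tae_miR.fa Triticum_aestivum.IWGSC.cdna.all.fa target_results.gTP 8
```

Setting PSROBOT_TAR=echo before the call prints the argument list without running anything, which is a cheap way to review the options.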
8. Inspect the results
less -SN target_results.gTP
======================================================
1 >tae-miR159a Score: 2.5 TraesCS7A02G377100.1
2
3 Query: 1 TTTGGATTGAAGGGAGCTCTG^M 22
4 *|||||*||||||||||||::*
5 Sbjct: 1095 TAACCTTACTTCCCTCGAGGTA 1074
6
7
8 >tae-miR159a Score: 2.5 TraesCS7D02G446700.1
9
10 Query: 1 TTTGGATTGAAGGGAGCTCTG^M 22
11 |||||*|:|||||||||||*|*
12 Sbjct: 952 AAACCAAGCTTCCCTCGAG-CG 932
13
14
15 >tae-miR159a Score: 2.5 TraesCS1D02G307500.2
16
17 Query: 1 TTTGGATTGAAGGGAGCTCTG^M 22
18 *|||||*||||||||||||::*
19 Sbjct: 1156 TAACCTTACTTCCCTCGAGGTA 1135
======================================================
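The ^M at the end of each Query line is how less displays a carriage return, and the query is reported as ending at position 22 although tae-miR159a is 21 nt long. A plausible explanation, not verified here, is that tae_miR.fa was saved with Windows (CRLF) line endings when it was edited in Notepad++. The sketch below detects and removes carriage returns; the function names and the output file name tae_miR.unix.fa are hypothetical.

```shell
# Hypothetical helper: succeed if the file contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: write a copy of the file with all carriage returns removed.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

# Usage (tae_miR.unix.fa is an illustrative name):
# has_crlf tae_miR.fa && strip_crlf tae_miR.fa tae_miR.unix.fa
```

If carriage returns are found, rerunning psRobot_tar on the cleaned file and comparing the two outputs would show whether they affected the alignments.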
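9. Save the miRNA-target pairs to miRNA_mRNA.txt
# Keep the miRNA name (column 1) and the target transcript (column 3) of each header line.
grep "^>" target_results.gTP | cut -f 1,3 | sed 's/^>//' > miRNA_mRNA.txt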
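Once the pairs are in a two-column table, a per-miRNA summary is a natural next step. The following sketch assumes the table is tab-separated with the miRNA name in column 1 and the target transcript in column 2, as produced by cut -f 1,3 above; count_targets_per_mirna is a made-up name.

```shell
# Hypothetical helper: print each miRNA with its number of distinct
# target transcripts, tab-separated.
count_targets_per_mirna() {
    sort -u "$1" |                      # drop repeated miRNA/target pairs
        cut -f 1 |                      # keep the miRNA column
        sort | uniq -c |                # count rows per miRNA
        awk '{ print $2 "\t" $1 }'      # reorder to: miRNA <TAB> count
}

# Usage (file name as in this post):
# count_targets_per_mirna miRNA_mRNA.txt
```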