Comparative genomics: installing and using the synteny tool JCVI
Installing jcvi and its dependencies
$ conda create -n jcvi -c bioconda -c conda-forge jcvi last
Workflow
1. Convert the GFF files to BED files
$ python -m jcvi.formats.gff bed --type=mRNA --key=Parent Puya_raimondii.rep.gff3 -o Puya_raimondii.bed
$ python -m jcvi.formats.gff bed --type=mRNA --key=Parent CB5.v20190123.re.gff3 -o CB5.bed
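# Many genome annotation files list several transcripts per gene. MCscan does not know that these belong to the same gene and treats them as separate genes that look like tandem duplications. If there are too many transcripts, add --primary_only to the BED command above to keep only one transcript per gene.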
$ python -m jcvi.formats.gff bed --type=mRNA --key=Name --primary_only Puya_raimondii.rep.gff3 -o Puya_raimondii.bed
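To make the idea behind --primary_only concrete, here is a minimal Python sketch that keeps one transcript per gene. How jcvi actually chooses the transcript is not covered in this post; keeping the longest span is my assumption, and the function name and the sample records are invented for illustration.

```python
def keep_longest_transcript(records):
    """records: iterable of (gene_id, transcript_id, start, end).

    Returns {gene_id: record}, keeping the transcript with the largest span.
    Ties keep the first transcript seen. The "longest wins" rule is an
    assumption, not necessarily what jcvi does internally.
    """
    best = {}
    for rec in records:
        gene, _transcript, start, end = rec
        if gene not in best:
            best[gene] = rec
            continue
        kept = best[gene]
        if (end - start) > (kept[3] - kept[2]):
            best[gene] = rec
    return best


# Invented records: geneA has two isoforms, geneB has one
records = [
    ("geneA", "geneA.1", 100, 900),
    ("geneA", "geneA.2", 100, 1500),
    ("geneB", "geneB.1", 3000, 3600),
]
primary = keep_longest_transcript(records)
for gene in sorted(primary):
    print(gene, primary[gene][1])
```

Without such a filter, geneA.1 and geneA.2 would both enter the comparison and could be reported as a tandem pair.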
2. Prepare the CDS or protein sequence files
# Here I extracted the CDS sequences directly with TBtools
# or
$ python -m jcvi.formats.fasta format Puya_raimondii.cds.fa Puya_raimondii.cds
$ python -m jcvi.formats.fasta format CB5.v20190123.cds.fa CB5.cds
# Note: the sequence IDs in the CDS files must match the IDs in the .bed files; adjust the value after --key= in python -m jcvi.formats.gff bed --type=mRNA --key=Parent to suit your annotation
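A quick way to check that the IDs line up before running the comparison is sketched below. It assumes the BED name is in the 4th column and that the FASTA ID is the first token of each header line; the helper names and the tiny sample data are made up for illustration.

```python
def bed_ids(bed_text):
    """Collect the name column (4th field) from BED-formatted text."""
    ids = set()
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4:
            ids.add(fields[3])
    return ids


def fasta_ids(fasta_text):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def compare_ids(bed_text, fasta_text):
    """Return (IDs only in the BED, IDs only in the FASTA), each sorted."""
    bed, fasta = bed_ids(bed_text), fasta_ids(fasta_text)
    return sorted(bed - fasta), sorted(fasta - bed)


# Invented coordinates and sequences; only the ID handling matters here
example_bed = (
    "Chr17\t100\t900\tCB5.17G0008100\t0\t+\n"
    "Chr17\t1200\t2000\tCB5.17G0008110\t0\t-\n"
)
example_cds = (
    ">CB5.17G0008100 some description\nATGAAATAG\n"
    ">CB5.17G0008110.1\nATGCCCTAA\n"
)
only_bed, only_fasta = compare_ids(example_bed, example_cds)
print("only in BED:", only_bed)
print("only in FASTA:", only_fasta)
```

In the example the second gene carries a ".1" suffix in the FASTA but not in the BED, which is the kind of mismatch that a different --key= value usually resolves.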
3. Search for syntenic blocks
$ ls *.???
$ python -m jcvi.compara.catalog ortholog CB5 Puya_raimondii --no_strip_names
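The run above failed because lastal could not use multiple threads, so no syntenic block information was generated.
After trying many approaches, I finally solved it by adding --cpu=1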
$ python -m jcvi.compara.catalog ortholog CB5 Puya_raimondii --no_strip_names --cpu=1
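Once the run finishes, it is worth confirming that the anchors file really contains blocks. The sketch below counts gene pairs per block, assuming the anchors file separates blocks with lines starting with "###" and lists one gene pair per line; the function name and the sample text are invented for illustration.

```python
def block_sizes(anchors_text):
    """Return the number of gene pairs in each block of an anchors file.

    Assumes blocks are delimited by lines starting with '#'
    (for example '###') and that every other non-empty line is one pair.
    """
    sizes = []
    current = 0
    for line in anchors_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A separator closes the block collected so far
            if current:
                sizes.append(current)
            current = 0
            continue
        current += 1
    if current:
        sizes.append(current)
    return sizes


# Invented anchors content: two blocks with 2 and 1 pairs
example = "###\nA1\tB1\t100\nA2\tB2\t90\n###\nA5\tB9\t80\n"
sizes = block_sizes(example)
print(len(sizes), "blocks,", sum(sizes), "pairs")
```

An empty list here would mean the comparison still did not produce any syntenic blocks.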
4. Visualize synteny
$ python -m jcvi.graphics.dotplot CB5.Puya_raimondii.anchors
5. Visualize local synteny
$ python -m jcvi.compara.synteny mcscan CB5.bed CB5.Puya_raimondii.lifted.anchors --iter=1 -o CB5.Puya_raimondii.i1.blocks
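# Extract the local synteny rows you want to display from CB5.Puya_raimondii.i1.blocks
# Here I took the region from 21 genes upstream to 20 genes downstream of CB5.17G0008100, CB5.17G0008090, CB5.17G0008050 and CB5.17G0008110 and saved it as CB5.Puya_raimondii.block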
CB5.17G0007820 PY_026228
CB5.17G0007830 PY_026226
CB5.17G0007840 PY_026225
CB5.17G0007850 PY_026221
CB5.04G0002850 .
CB5.17G0007870 .
CB5.17G0007880 PY_026226
CB5.17G0007890 .
CB5.17G0007900 .
CB5.17G0007910 PY_026217
CB5.05G0004350 .
CB5.17G0007930 PY_026215
CB5.17G0007940 PY_026214
CB5.17G0007950 PY_026213
CB5.17G0007960 PY_026213
CB5.17G0007970 PY_026213
CB5.17G0008000 PY_026212
CB5.17G0008010 PY_026211
CB5.17G0008020 PY_026209
CB5.17G0008030 PY_026205
CB5.17G0008110 PY_026205
CB5.17G0008050 PY_026206
CB5.17G0008060 PY_026207
CB5.17G0008070 PY_026208
CB5.17G0008080 PY_026207
CB5.17G0008090 PY_026206
CB5.17G0008100 PY_026205
CB5.17G0008120 .
CB5.17G0008130 PY_026204
CB5.17G0008140 PY_026203
CB5.17G0008150 PY_026203
CB5.17G0008160 PY_026202
CB5.17G0008170 .
CB5.17G0008180 PY_026200
CB5.17G0008190 PY_026199
CB5.17G0008200 PY_026199
CB5.17G0008210 PY_026198
CB5.17G0008220 PY_026198
CB5.17G0008230 .
CB5.17G0008240 .
CB5.17G0008250 PY_026197
CB5.17G0008260 PY_026197
CB5.17G0008270 .
CB5.17G0008280 .
CB5.17G0008290 PY_026196
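The extraction step above can be scripted. This is a minimal sketch that treats the blocks file as ordered rows whose first column is the reference gene, and slices a window around the target genes; the function name, parameters and sample rows are hypothetical.

```python
def extract_window(block_lines, targets, upstream, downstream):
    """Return rows from `upstream` rows before the first target gene
    to `downstream` rows after the last target gene, in file order."""
    rows = [line for line in block_lines if line.strip()]
    genes = [line.split()[0] for line in rows]
    hits = [i for i, gene in enumerate(genes) if gene in targets]
    if not hits:
        return []
    start = max(0, min(hits) - upstream)
    end = min(len(rows), max(hits) + downstream + 1)
    return rows[start:end]


# Invented rows standing in for the lines of an .i1.blocks file
rows = [f"gene{i:02d}\tpartner{i:02d}" for i in range(10)]
window = extract_window(rows, {"gene04", "gene05"}, upstream=2, downstream=1)
print("\n".join(window))
```

For the region shown above, the call would read the lines of CB5.Puya_raimondii.i1.blocks and pass the four CB5 gene IDs as targets, then write the result to CB5.Puya_raimondii.block.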
# Prepare the layout file CB5.Puya_raimondii.layout with the following contents
# x, y, rotation, ha, va, color, ratio, label
0.5, 0.6, 0, left, center, m, 1, CB5 Chr17
0.5, 0.4, 0, left, center, #fc8d62, 1, Puya_raimondii Scaffold4
# edges
e, 0, 1
# Merge the .bed files
$ cat CB5.bed Puya_raimondii.bed >CB5.Puya_raimondii.bed
# Generate the local synteny plot
$ python -m jcvi.graphics.synteny CB5.Puya_raimondii.block CB5.Puya_raimondii.bed CB5.Puya_raimondii.layout
# Show labels for the specified genes
$ python -m jcvi.graphics.synteny CB5.Puya_raimondii.block CB5.Puya_raimondii.bed CB5.Puya_raimondii.layout --genelabelsize=4 --genelabels=CB5.17G0008100,CB5.17G0008090,CB5.17G0008050,CB5.17G0008110,PY_026206,PY_026205
Next, repeat the steps above to show the local synteny between CB5 and Acomosus
$ python -m jcvi.formats.gff bed --type=mRNA --key=Parent Acomosus_321_v3.re.gene.gff3 -o Acomosus.bed
$ python -m jcvi.formats.fasta format Acomosus_321_v3.re.gene.cds Acomosus.cds
$ python -m jcvi.compara.catalog ortholog CB5 Acomosus --no_strip_names --cpu=1
$ python -m jcvi.compara.synteny mcscan CB5.bed CB5.Acomosus.lifted.anchors --iter=1 -o CB5.Acomosus.i1.blocks
$ cat CB5.bed Acomosus.bed >CB5_Acmosus.bed
$ python -m jcvi.graphics.synteny CB5.Ac.blocks CB5_Acmosus.bed CB5.Ac.layout
$ python -m jcvi.graphics.synteny CB5.Ac.blocks CB5_Acmosus.bed CB5.Ac.layout --genelabelsize=4 --genelabels=CB5.17G0008100,CB5.17G0008090,CB5.17G0008050,CB5.17G0008110,Aco023267.1,Aco023266.1,Aco023263.1,Aco023262.1
Next, repeat the steps above to show the local synteny between Puya_raimondii and rice
$ python -m jcvi.formats.gff bed --type=mRNA --key=Parent Osativa_323_v7.0.re.gene.gff3 -o Osativa.bed
$ python -m jcvi.formats.fasta format Osativa_323_v7.0.re.gene.cds Osativa.cds
$ python -m jcvi.compara.catalog ortholog Puya_raimondii Osativa --no_strip_names --cpu=1
$ python -m jcvi.compara.synteny mcscan Puya_raimondii.bed Puya_raimondii.Osativa.lifted.anchors --iter=1 -o Puya_raimondii.Osativa.i1.blocks
$ cat Puya_raimondii.bed Osativa.bed >Puya_raimondii.Osativa.bed
$ python -m jcvi.graphics.synteny Puya_raimondii.Osativa.blocks Puya_raimondii.Osativa.bed Puya_raimondii.Osativa.layout
$ python -m jcvi.graphics.synteny Puya_raimondii.Osativa.blocks Puya_raimondii.Osativa.bed Puya_raimondii.Osativa.layout --genelabelsize=4 --genelabels=PY_026206,PY_026205
Next, repeat the steps above to show the local synteny between rice and bananas
$ python -m jcvi.formats.gff bed --type=mRNA --key=Parent Musa_acuminata_pahang_v4.re.gff3 -o Musa_acuminata.bed
$ python -m jcvi.formats.fasta format Musa_acuminata_pahang_v4.gene.cds Musa_acuminata.cds
$ python -m jcvi.compara.catalog ortholog Osativa Musa_acuminata --no_strip_names --cpu=1
$ python -m jcvi.compara.synteny mcscan Osativa.bed Osativa.Musa_acuminata.lifted.anchors --iter=1 -o Osativa.Musa_acuminata.i1.blocks
$ cat Osativa.bed Musa_acuminata.bed>Osativa.Musa_acuminata.bed
# Plot with the rice/banana files; the PY_ IDs below come from the Puya run, so replace them with the rice or banana gene IDs you want to label
$ python -m jcvi.graphics.synteny Osativa.Musa_acuminata.blocks Osativa.Musa_acuminata.bed Osativa.Musa_acuminata.layout
$ python -m jcvi.graphics.synteny Osativa.Musa_acuminata.blocks Osativa.Musa_acuminata.bed Osativa.Musa_acuminata.layout --genelabelsize=4 --genelabels=PY_026206,PY_026205
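2. Synteny across multiple genomes
For convenience, several pairwise synteny relationships can be shown in a single figure. As before, first use python -m jcvi.compara.synteny mcscan to build the blocks for each pair, then edit the blocks.layout file to describe the additional regions and the edges between them.
This time I use Acomosus, CB5, MD2, PY, At, Musa, rice and Atr as an example. I build a blocks file covering all 8 genomes with Acomosus as the reference, so CB5, MD2, PY, At, Musa, rice and Atr are each compared against pineapple (Acomosus).
2.1 Comparing Acomosus and MD2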
$ python -m jcvi.compara.catalog ortholog Acomosus ACMD2 --cscore=.99
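# The command from the official documentation failed with an error saying the listed IDs do not exist, so no result file was produced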
# Fix: add --no_strip_names
$ python -m jcvi.compara.catalog ortholog Acomosus ACMD2 --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.ACMD2.lifted.anchors --iter=1 -o Acomosus.ACMD2.i1.blocks
2.2 Comparing Acomosus and CB5
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus CB5 --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.CB5.lifted.anchors --iter=1 -o Acomosus.CB5.i1.blocks
2.3 Comparing Acomosus and PY
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus Puya_raimondii --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.Puya_raimondii.lifted.anchors --iter=1 -o Acomosus.Puya_raimondii.i1.blocks
2.4 Comparing Acomosus and At
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus Athaliana --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.Athaliana.lifted.anchors --iter=1 -o Acomosus.Athaliana.i1.blocks
2.5 Comparing Acomosus and Musa
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus Musa_acuminata --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.Musa_acuminata.lifted.anchors --iter=1 -o Acomosus.Musa_acuminata.i1.blocks
2.6 Comparing Acomosus and rice
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus Osativa --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.Osativa.lifted.anchors --iter=1 -o Acomosus.Osativa.i1.blocks
2.7 Comparing Acomosus and ATR
# Find homologous gene pairs
$ python -m jcvi.compara.catalog ortholog Acomosus Amborella_trichopoda --cpu=1 --no_strip_names --cscore=.99
# Generate the blocks file from the homology relationships
$ python -m jcvi.compara.synteny mcscan Acomosus.bed Acomosus.Amborella_trichopoda.lifted.anchors --iter=1 -o Acomosus.Amborella_trichopoda.i1.blocks
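Steps 2.1 to 2.7 repeat the same two commands with a different query species each time, so they are easy to generate in a loop. The sketch below only builds and prints the command lines using the flags already shown in this post; the helper name pair_commands is hypothetical, and the commands are printed rather than executed.

```python
def pair_commands(reference, query):
    """Build the ortholog and mcscan command lines for one species pair."""
    prefix = f"{reference}.{query}"
    ortholog = [
        "python", "-m", "jcvi.compara.catalog", "ortholog",
        reference, query, "--cpu=1", "--no_strip_names", "--cscore=.99",
    ]
    mcscan = [
        "python", "-m", "jcvi.compara.synteny", "mcscan",
        f"{reference}.bed", f"{prefix}.lifted.anchors",
        "--iter=1", "-o", f"{prefix}.i1.blocks",
    ]
    return " ".join(ortholog), " ".join(mcscan)


# File prefixes used in sections 2.1 to 2.7
queries = [
    "ACMD2", "CB5", "Puya_raimondii", "Athaliana",
    "Musa_acuminata", "Osativa", "Amborella_trichopoda",
]
commands = []
for query in queries:
    commands.extend(pair_commands("Acomosus", query))
print("\n".join(commands))
```

The printed lines can be pasted into a shell script, or run one by one once each species has its .bed and .cds files in the working directory.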
2.8 Merge the blocks files from all comparisons
$ python -m jcvi.formats.base join Acomosus.ACMD2.i1.blocks Acomosus.CB5.i1.blocks Acomosus.Puya_raimondii.i1.blocks Acomosus.Athaliana.i1.blocks Acomosus.Musa_acuminata.i1.blocks Acomosus.Osativa.i1.blocks Acomosus.Amborella_trichopoda.i1.blocks --noheader > Acomosus.ACMD2.CB5.Puya_raimondii.Athaliana.Musa_acuminata.Osativa.Amborella_trichopoda.blocks
$ python -m jcvi.formats.base join Acomosus.ACMD2.i1.blocks Acomosus.CB5.i1.blocks Acomosus.Puya_raimondii.i1.blocks Acomosus.Athaliana.i1.blocks Acomosus.Musa_acuminata.i1.blocks Acomosus.Osativa.i1.blocks Acomosus.Amborella_trichopoda.i1.blocks --noheader | cut -f1,2,4,6,8,10,12,14 > Acomosus.blocks
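The field list passed to cut follows a simple pattern: every joined file contributes two columns (the reference gene and its partner), and only the first reference column plus each partner column is kept. A small sketch of that arithmetic, assuming each input blocks file has exactly two columns (the function names are made up):

```python
def columns_to_keep(n_files):
    """1-based columns to keep after joining n two-column blocks files:
    column 1 (reference gene) plus every even column (the partners)."""
    return [1] + [2 * i for i in range(1, n_files + 1)]


def cut_field_list(n_files):
    """Format the columns as the argument for `cut -f`."""
    return ",".join(str(col) for col in columns_to_keep(n_files))


# Seven pairwise blocks files were joined above
print(cut_field_list(7))
```

For the seven files joined here this reproduces the 1,2,4,6,8,10,12,14 list used in the command, and the same pattern applies if a species is added or dropped.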
2.9 Prepare the layout file
# The file contents are as follows
# x, y, rotation, ha, va, color, ratio, label
0.5, 0.6, 30, center, top, , 20, Acomosus LG02
0.3, 0.4, 0, center, bottom, , 5, MD2 LSRQ01005221.1
0.4, 0.4, 0, center, bottom, , 5, MD2 LSRQ01000111.1
0.7, 0.4, 0, center, bottom, , 20, CB5 chr17
0.5, 0.8, 0, center, top, , 2, Puya_raimondii Scaffold4
0.7, 0.8, 0, center, bottom, , .2, Oryza_sativa Chr9
0.3, 0.6, 90, center, bottom, , 10, Musa_acuminata chr08
0.7, 0.6, 90, center, bottom, , 10, Musa_acuminata chr07
0.3, 0.8, 0, center, bottom, , 5, Arabidopsis_thaliana Chr2
0.4, 0.7, 0, center, bottom, , .2, Amborella_trichopoda scaffold00024
# edges
e, 0, 1
e, 0, 2
e, 0, 3
e, 0, 4
e, 0, 5
e, 0, 6
e, 0, 7
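# x and y give the position of each species' syntenic region; both values must lie between 0 and 1, otherwise no figure is produced. rotation is the angle by which that region is rotated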
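Since an out-of-range coordinate means no figure, a small pre-flight check on the layout file can save a failed run. The sketch below assumes comment lines start with "#", edge lines start with "e", every other line is a track whose first two fields are x and y, and edges refer to tracks by 0-based position; the function name check_layout is hypothetical.

```python
def check_layout(layout_text):
    """Return (number of tracks, number of edges, list of problems)."""
    tracks, edges, problems = [], [], []
    for lineno, raw in enumerate(layout_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == "e":
            edges.append((lineno, int(fields[1]), int(fields[2])))
            continue
        x, y = float(fields[0]), float(fields[1])
        if not (0 <= x <= 1 and 0 <= y <= 1):
            problems.append(f"line {lineno}: x={x}, y={y} outside 0-1")
        tracks.append(fields)
    for lineno, a, b in edges:
        for idx in (a, b):
            if idx < 0 or idx >= len(tracks):
                problems.append(
                    f"line {lineno}: edge uses track {idx}, "
                    f"but only {len(tracks)} tracks are defined"
                )
    return len(tracks), len(edges), problems


# The two-track layout used earlier for CB5 and Puya_raimondii
good = """# x, y, rotation, ha, va, color, ratio, label
0.5, 0.6, 0, left, center, m, 1, CB5 Chr17
0.5, 0.4, 0, left, center, #fc8d62, 1, Puya_raimondii Scaffold4
# edges
e, 0, 1
"""
# A deliberately broken layout: x is too large and the edge points nowhere
bad = """1.5, 0.4, 0, left, center, m, 1, CB5 Chr17
# edges
e, 0, 2
"""
good_result = check_layout(good)
bad_result = check_layout(bad)
print(good_result)
for problem in bad_result[2]:
    print(problem)
```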
2.10 Merge all the bed files
$ cat Acomosus.bed ACMD2.bed CB5.bed Puya_raimondii.bed Athaliana.bed Musa_acuminata.bed Osativa.bed Amborella_trichopoda.bed > Acomosus.ACMD2.CB5.Puya_raimondii.Athaliana.Musa_acuminata.Osativa.Amborella_trichopoda.bed
2.11 Generate the multi-species synteny plot
$ python -m jcvi.graphics.synteny Acomosus.ACMD2.CB5.Puya_raimondii.Athaliana.Musa_acuminata.Osativa.Amborella_trichopoda.blocks Acomosus.ACMD2.CB5.Puya_raimondii.Athaliana.Musa_acuminata.Osativa.Amborella_trichopoda.bed Acomosus.ACMD2.CB5.Puya_raimondii.Athaliana.Musa_acuminata.Osativa.Amborella_trichopoda.blocks.layout