RIdeogram: an R package for visualizing chromosome data

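Existing tools such as Circos can only draw curved chromosomes. The R packages chromoMap, IdeoViz, karyoploteR, and ggbio, and the online tool Idiographica, can draw straight chromosomes, but they support only a few species such as human, mouse, rat, and fruit fly, and do not accept custom species, which limits their flexibility. The R package RIdeogram visualizes genome-wide data on chromosomes and writes the result as an SVG file, which can then be converted to pdf, png, tiff, or jpg.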

Installation

install.packages('RIdeogram')
require(RIdeogram)

Input files

data(human_karyotype, package="RIdeogram")
data(gene_density, package="RIdeogram")
data(Random_RNAs_500, package="RIdeogram")
head(human_karyotype)
#>   Chr Start       End  CE_start    CE_end
#> 1   1     0 248956422 122026459 124932724
#> 2   2     0 242193529  92188145  94090557
#> 3   3     0 198295559  90772458  93655574
#> 4   4     0 190214555  49712061  51743951
#> 5   5     0 181538259  46485900  50059807
#> 6   6     0 170805979  58553888  59829934

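Karyotype file: either five columns (with centromere positions) or three columns (without centromere positions)
Column 1: chromosome ID
Column 2: start position
Column 3: end position
Column 4: centromere start position
Column 5: centromere end position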
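
As a minimal sketch of the layout described above, a karyotype table for a custom species can be built directly as a data frame. The species, chromosome lengths, and centromere positions below are invented for illustration, and check_karyotype() is a helper defined here, not an RIdeogram function.

```r
# Hypothetical three-chromosome species; every number here is made up.
my_karyotype <- data.frame(
  Chr      = c("1", "2", "3"),
  Start    = c(0, 0, 0),
  End      = c(30000000, 25000000, 18000000),
  CE_start = c(14000000, 11000000, 8000000),
  CE_end   = c(15000000, 12000000, 9000000),
  stringsAsFactors = FALSE
)

# check_karyotype() is defined here for illustration only.
check_karyotype <- function(k) {
  stopifnot(ncol(k) %in% c(3, 5))   # three or five columns
  stopifnot(all(k[[2]] < k[[3]]))   # Start comes before End
  if (ncol(k) == 5) {
    # The centromere must lie inside the chromosome.
    stopifnot(all(k[[4]] >= k[[2]]),
              all(k[[5]] <= k[[3]]),
              all(k[[4]] < k[[5]]))
  }
  invisible(TRUE)
}

check_karyotype(my_karyotype)          # five-column form
check_karyotype(my_karyotype[, 1:3])   # three-column form, no centromeres
```

A table that passes these checks has the same shape as human_karyotype, so it should be usable wherever the examples pass karyotype = human_karyotype.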

head(gene_density)
#>   Chr   Start     End Value
#> 1   1       1 1000000    65
#> 2   1 1000001 2000000    76
#> 3   1 2000001 3000000    35
#> 4   1 3000001 4000000    30
#> 5   1 4000001 5000000    10
#> 6   1 5000001 6000000    10

Gene density file:
Column 1: chromosome ID
Column 2: start position
Column 3: end position
Column 4: gene density value
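
To show how a table in this format can arise, the sketch below counts gene start positions in fixed 1 Mb windows using base R only. The gene coordinates are invented, and this is an illustration of the format rather than a description of how RIdeogram computes densities.

```r
# Hypothetical gene start positions on two chromosomes (made-up values).
genes <- data.frame(
  Chr   = c("1", "1", "1", "1", "2", "2"),
  Start = c(150000, 900000, 1200000, 2500000, 10, 1999999),
  stringsAsFactors = FALSE
)
window <- 1000000

# Zero-based index of the window each gene start falls into.
# Windows are 1..1000000, 1000001..2000000, and so on, as in gene_density.
genes$bin <- (genes$Start - 1) %/% window
genes$n   <- 1

# Count genes per chromosome and window.
counts <- aggregate(n ~ Chr + bin, data = genes, FUN = sum)

my_density <- data.frame(
  Chr   = counts$Chr,
  Start = counts$bin * window + 1,
  End   = (counts$bin + 1) * window,
  Value = counts$n,
  stringsAsFactors = FALSE
)
my_density <- my_density[order(my_density$Chr, my_density$Start), ]
my_density
```

Two simplifications are worth noting: windows that contain no genes are simply absent from the result, and the End of the last window is not clipped to the chromosome length.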

head(Random_RNAs_500)
#>    Type    Shape Chr    Start      End  color
#> 1  tRNA   circle   6 69204486 69204568 6a3d9a
#> 2  rRNA      box   3 68882967 68883091 33a02c
#> 3  rRNA      box   5 55777469 55777587 33a02c
#> 4  rRNA      box  21 25202207 25202315 33a02c
#> 5 miRNA triangle   1 86357632 86357687 ff7f00
#> 6 miRNA triangle  11 74399237 74399333 ff7f00

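Marker file (labels drawn beside the chromosomes):
Column 1: marker type
Column 2: marker shape
Column 3: chromosome ID
Column 4: start position
Column 5: end position
Column 6: color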
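
A marker table in this layout can be assembled by joining a list of features to a small style table that fixes one shape and one color per type. The sketch below reuses a few rows from the sample output above; the names features, style, and my_label are placeholders chosen here.

```r
# Feature coordinates, copied from the sample rows shown above.
features <- data.frame(
  Type  = c("tRNA", "rRNA", "miRNA", "rRNA"),
  Chr   = c(6, 3, 1, 5),
  Start = c(69204486, 68882967, 86357632, 55777469),
  End   = c(69204568, 68883091, 86357687, 55777587),
  stringsAsFactors = FALSE
)

# One shape and one color per marker type.
# Colors are hex codes without a leading "#", as in Random_RNAs_500.
style <- data.frame(
  Type  = c("tRNA", "rRNA", "miRNA"),
  Shape = c("circle", "box", "triangle"),
  color = c("6a3d9a", "33a02c", "ff7f00"),
  stringsAsFactors = FALSE
)

idx <- match(features$Type, style$Type)
stopifnot(!anyNA(idx))   # every feature type needs a style entry

my_label <- data.frame(
  Type  = features$Type,
  Shape = style$Shape[idx],
  Chr   = features$Chr,
  Start = features$Start,
  End   = features$End,
  color = style$color[idx],
  stringsAsFactors = FALSE
)
my_label
```

Keeping the styles in a separate table means a type's shape or color only has to be changed in one place.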

You can also load your own data.

karyotype <- read.table("karyotype.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
density <- read.table("data_1.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
label <- read.table("data_2.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

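The package also provides a function, GFFex, that extracts the information needed for the heatmap drawn on the chromosomes (for example, gene density) from a GFF file.
First, prepare a karyotype file for the species in the format described above, and make sure the chromosome IDs in its first column match those in the GFF file. Then use GFFex to extract the gene density.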

gene_density <- GFFex(input = "gencode.v32.annotation.gff3.gz", karyotype = "human_karyotype.txt", feature = "gene", window = 1000000)

Here the feature option selects the feature type to count, and the window option sets the size of the counting window.
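
Because the chromosome IDs in the karyotype must match those in the GFF file, it can help to compare the two before extracting densities. The following is a base R sketch on a tiny in-memory GFF; the GFF lines, the karyotype, and the "chr" prefix mismatch are all invented for illustration.

```r
# A tiny made-up GFF3 fragment (nine tab-separated columns per record).
gff_lines <- c(
  "##gff-version 3",
  "chr1\tsrc\tgene\t1000\t5000\t.\t+\t.\tID=g1",
  "chr1\tsrc\tmRNA\t1000\t5000\t.\t+\t.\tID=m1;Parent=g1",
  "chr2\tsrc\tgene\t200\t900\t.\t-\t.\tID=g2"
)
gff <- read.table(text = gff_lines, sep = "\t", comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)

# Made-up karyotype whose IDs lack the "chr" prefix.
karyotype <- data.frame(Chr = c("1", "2"), Start = 0,
                        End = c(10000, 8000), stringsAsFactors = FALSE)

# Sequence IDs present in the GFF but absent from the karyotype.
missing_before <- setdiff(unique(gff$V1), karyotype$Chr)
missing_before

# One possible fix: strip the prefix so both sides use the same IDs.
gff$V1 <- sub("^chr", "", gff$V1)
missing_after <- setdiff(unique(gff$V1), karyotype$Chr)

# The third column lists the feature types available for counting.
feature_counts <- table(gff$V3)
feature_counts
```

The table of feature types in the third column is also a quick way to see which values are available for the feature option.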

Usage

Basic chromosome plot

ideogram(karyotype = human_karyotype)
convertSVG("chromosome.svg", device = "png")
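
Every example in this tutorial converts a file named chromosome.svg, which suggests that each plot is written under that same name and that a later plot replaces an earlier one. That is an inference from the examples rather than something stated here. Under that assumption, a small helper can keep a copy of each plot; keep_plot() and the file names below are defined here for illustration and are not part of RIdeogram.

```r
# keep_plot() is a helper defined here, not an RIdeogram function.
keep_plot <- function(new_name, svg = "chromosome.svg") {
  if (!file.exists(svg)) stop("no such file: ", svg)
  ok <- file.copy(svg, new_name, overwrite = TRUE)
  if (!ok) stop("could not copy to ", new_name)
  invisible(new_name)
}

# Stand-in for a real plot so the sketch runs without RIdeogram:
# write a minimal SVG into a temporary directory and keep a copy of it.
tmp_dir <- tempdir()
svg <- file.path(tmp_dir, "chromosome.svg")
writeLines("<svg xmlns='http://www.w3.org/2000/svg'/>", svg)

kept <- keep_plot(file.path(tmp_dir, "basic_ideogram.svg"), svg = svg)
file.exists(kept)
```

In a real session the helper would be called right after a plot is drawn, for example keep_plot("basic_ideogram.svg"), so that the copy survives the next plot.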

Gene density heatmap

ideogram(karyotype = human_karyotype, overlaid = gene_density)
convertSVG("chromosome.svg", device = "png")

Marker labels

ideogram(karyotype = human_karyotype, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")

Chromosomes, gene density, and markers together

ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")

Changing the colors of the gene density heatmap

ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", colorset1 = c("#fc8d59", "#ffffbf", "#91bfdb"))
convertSVG("chromosome.svg", device = "png")
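
To preview what a three-color set like the one passed to colorset1 looks like as a gradient, base R's colorRampPalette() can interpolate between the same colors. This only illustrates the idea of a color gradient; how RIdeogram itself maps values to colors is not described here.

```r
# The three colors from the example above.
cols <- c("#fc8d59", "#ffffbf", "#91bfdb")

# colorRampPalette() (grDevices, attached by default) returns a function
# that interpolates n colors along the given sequence.
ramp <- colorRampPalette(cols)
gradient <- ramp(5)
gradient

# With five steps, the first, middle, and last colors are the inputs.
toupper(cols)
```

Calling ramp() with a larger n, such as ramp(100), gives a smoother gradient that can be inspected with image() or barplot() before settling on a color set.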

Chromosomes without centromeres

human_karyotype <- human_karyotype[,1:3]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")

Changing the chromosome width (useful when there are few chromosomes)

# default width is 170
human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")
# change width to 100
human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", width = 100)
convertSVG("chromosome.svg", device = "png")

Moving the legend

# change Lx and Ly to 80, 25
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", width = 100, Lx = 80, Ly = 25)
convertSVG("chromosome.svg", device = "png")

Heatmap labels

data(human_karyotype, package="RIdeogram") #reload the karyotype data
data(LTR_density, package="RIdeogram") #load the LTR density data used as the label
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = LTR_density, label_type = "heatmap", colorset1 = c("#f7f7f7", "#e34a33"), colorset2 = c("#f7f7f7", "#2c7fb8")) #use 'colorset1' and 'colorset2' to set the colors of the gene and LTR heatmaps separately
convertSVG("chromosome.svg", device = "png")

Line labels

# one-line label
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE, package="RIdeogram") #load the Pi data for one-line label
head(Pi_for_CE) #same format as the heatmap data, plus a "Color" column that sets the color of the line
#>   Chr   Start     End      Value  Color
#> 1   1       1 2000000 0.00273566 fc8d62
#> 2   1 1000001 3000000 0.00239580 fc8d62
#> 3   1 2000001 4000000 0.00319407 fc8d62
#> 4   1 3000001 5000000 0.00286900 fc8d62
#> 5   1 4000001 6000000 0.00186596 fc8d62
#> 6   1 5000001 7000000 0.00186182 fc8d62

ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE, label_type = "line", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
# two-line label
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE_and_CW, package="RIdeogram") #load the Pi data for two-line label
head(Pi_for_CE_and_CW) #same format as the one-line data, plus two columns for the second feature; when preparing your own data, keep exactly these column names
#>   Chr   Start     End    Value_1 Color_1    Value_2 Color_2
#> 1   1       1 2000000 0.00273566  fc8d62 0.00385702  8da0cb
#> 2   1 1000001 3000000 0.00239580  fc8d62 0.00331109  8da0cb
#> 3   1 2000001 4000000 0.00319407  fc8d62 0.00374530  8da0cb
#> 4   1 3000001 5000000 0.00286900  fc8d62 0.00339141  8da0cb
#> 5   1 4000001 6000000 0.00186596  fc8d62 0.00305246  8da0cb
#> 6   1 5000001 7000000 0.00186182  fc8d62 0.00323655  8da0cb

ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE_and_CW, label_type = "line", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
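
Since the two-line format requires exactly the column names Value_1, Color_1, Value_2, and Color_2, one way to produce it is to merge two one-line tables that share the same windows. The values below are invented, and pi_a, pi_b, and both are names chosen here.

```r
# Two one-line tables over the same sliding windows (made-up values).
pi_a <- data.frame(
  Chr   = 1,
  Start = c(1, 1000001, 2000001),
  End   = c(2000000, 3000000, 4000000),
  Value = c(0.0027, 0.0024, 0.0032),
  Color = "fc8d62",
  stringsAsFactors = FALSE
)
pi_b <- data.frame(
  Chr   = 1,
  Start = c(1, 1000001, 2000001),
  End   = c(2000000, 3000000, 4000000),
  Value = c(0.0039, 0.0033, 0.0037),
  Color = "8da0cb",
  stringsAsFactors = FALSE
)

# merge() joins on the window coordinates; the suffixes produce the
# required Value_1/Color_1 and Value_2/Color_2 names.
both <- merge(pi_a, pi_b, by = c("Chr", "Start", "End"),
              suffixes = c("_1", "_2"))
both <- both[, c("Chr", "Start", "End",
                 "Value_1", "Color_1", "Value_2", "Color_2")]
both
```

Windows present in only one of the two tables are dropped by this merge, so it is worth comparing nrow(both) with the inputs.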

Polygon labels

# one-polygon label
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE, package="RIdeogram") #load the Pi data for one-polygon label
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE, label_type = "polygon", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
# two-polygon label
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE_and_CW, package="RIdeogram") #load the Pi data for two-polygon label
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE_and_CW, label_type = "polygon", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")

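You can also change the device argument to convert the image to other formats such as tiff, pdf, or jpg, and set the dpi argument to control the resolution (300 by default).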

convertSVG("chromosome.svg", device = "tiff", dpi = 600)

The package also provides four shortcut functions for format conversion.

svg2tiff("chromosome.svg")
svg2pdf("chromosome.svg")
svg2jpg("chromosome.svg")
svg2png("chromosome.svg")

Visualizing syntenic regions between genomes

Syntenic regions between two genomes:

data(karyotype_dual_comparison, package="RIdeogram")
head(karyotype_dual_comparison)
#>   Chr Start      End   fill species size  color
#> 1  I      1 23037639 969696   Grape   12 252525
#> 2  II     1 18779884 969696   Grape   12 252525
#> 3 III     1 19341862 969696   Grape   12 252525
#> 4  IV     1 23867706 969696   Grape   12 252525
#> 5   V     1 25021643 969696   Grape   12 252525
#> 6  VI     1 21508407 0ab276   Grape   12 252525
table(karyotype_dual_comparison$species)
#> 
#>   Grape Populus 
#>      19      19

data(synteny_dual_comparison, package="RIdeogram")
head(synteny_dual_comparison)
#>   Species_1  Start_1    End_1 Species_2 Start_2   End_2   fill
#> 1         1 12226377 12267836         2 5900307 5827251 cccccc
#> 2        15  5635667  5667377        17 4459512 4393226 cccccc
#> 3         9  7916366  7945659         3 8618518 8486865 cccccc
#> 4         2  8214553  8242202        18 5964233 6027199 cccccc
#> 5        13  2330522  2356593        14 6224069 6138821 cccccc
#> 6        11 10861038 10886821        10 8099058 8011502 cccccc

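Format of karyotype_dual_comparison
Chr: chromosome ID
Start: start position
End: end position
fill: fill color of the chromosome
species: species name
size: font size of the species name
color: font color of the species name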

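Format of synteny_dual_comparison
Species_1: chromosome number in species 1
Start_1, End_1: position of the region on the species 1 chromosome
Species_2: chromosome number in species 2
Start_2, End_2: position of the region on the species 2 chromosome
fill: fill color of the syntenic region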

ideogram(karyotype = karyotype_dual_comparison, synteny = synteny_dual_comparison)
convertSVG("chromosome.svg", device = "png")
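
In the sample data the karyotype names chromosomes I, II, III, and so on, while the synteny table refers to them by number. The sketch below assumes that this number is the position of the chromosome within its own species in the karyotype table; that reading is an assumption based on the sample output. All coordinates are invented, and chr_index() is a helper defined here.

```r
# Made-up karyotype for two species in the seven-column layout.
karyo <- data.frame(
  Chr     = c("I", "II", "III", "A", "B"),
  Start   = 1,
  End     = c(23000000, 18000000, 19000000, 50000000, 25000000),
  fill    = "969696",
  species = c("Grape", "Grape", "Grape", "Populus", "Populus"),
  size    = 12,
  color   = "252525",
  stringsAsFactors = FALSE
)

# Made-up syntenic blocks that refer to chromosomes by name.
blocks <- data.frame(
  chr_1   = c("II", "III"),
  Start_1 = c(8214553, 7916366),
  End_1   = c(8242202, 7945659),
  chr_2   = c("B", "A"),
  Start_2 = c(5964233, 8618518),
  End_2   = c(6027199, 8486865),
  stringsAsFactors = FALSE
)

# chr_index(): position of each chromosome within its species (assumed
# to be what the Species_1 and Species_2 columns expect).
chr_index <- function(karyo, species, chr) {
  match(chr, karyo$Chr[karyo$species == species])
}

my_synteny <- data.frame(
  Species_1 = chr_index(karyo, "Grape", blocks$chr_1),
  Start_1   = blocks$Start_1,
  End_1     = blocks$End_1,
  Species_2 = chr_index(karyo, "Populus", blocks$chr_2),
  Start_2   = blocks$Start_2,
  End_2     = blocks$End_2,
  fill      = "cccccc",
  stringsAsFactors = FALSE
)
my_synteny
```

An NA in Species_1 or Species_2 would mean that a block names a chromosome missing from the karyotype, which is worth checking before plotting.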

Syntenic regions among three genomes:

data(karyotype_ternary_comparison, package="RIdeogram")
head(karyotype_ternary_comparison)
#>   Chr Start      End   fill   species size  color
#> 1  NA     1 15980527 fcb06b Amborella   10 fcb06b
#> 2  NA     1 11522362 fcb06b Amborella   10 fcb06b
#> 3  NA     1 11085951 fcb06b Amborella   10 fcb06b
#> 4  NA     1 10537363 fcb06b Amborella   10 fcb06b
#> 5  NA     1  9585472 fcb06b Amborella   10 fcb06b
#> 6  NA     1  9414115 fcb06b Amborella   10 fcb06b
table(karyotype_ternary_comparison$species)
#> 
#>    Amborella        Grape Liriodendron 
#>          100           19           19

data(synteny_ternary_comparison, package="RIdeogram")
head(synteny_ternary_comparison)
#>   Species_1 Start_2   End_2 Species_2  Start_1    End_1   fill type
#> 1         1 4761181 2609697         1   342802   981451 cccccc    1
#> 2         6 6344197 8074393         1 15387184 16716190 cccccc    1
#> 3        10 6457890 9052487         1 11224953 14959548 cccccc    1
#> 4        13 6318795 1295413         1 20564870 21386271 cccccc    1
#> 5        16 1398101 2884119         1 21108654 22221088 cccccc    1
#> 6        16 1482529 2093625         1 21864494 22364888 cccccc    1
tail(synteny_ternary_comparison, n = 20)
#>     Species_1  Start_2    End_2 Species_2  Start_1    End_1   fill type
#> 571        16 19278042 20828694         2 95267449 93334736 cccccc    3
#> 572        12 20546006 22461088         2 22647943 18365764 cccccc    3
#> 573         4 22259262 23453956         2 15068249 17839485 cccccc    3
#> 574        14 22377895 23821929         2 97299880 96033346 cccccc    3
#> 575         6  1538773  2808373         1 91285578 95681546 cccccc    3
#> 576        11  3381792  4954528         1 67689752 75286468 cccccc    3
#> 577         9  4814481  6975840         1 69506847 76015710 cccccc    3
#> 578        10  7091825  9742616         1 19333526 24516133 cccccc    3
#> 579        13 22063957 23402389         1 95843870 92195256 cccccc    3
#> 580         7   679765  1881756         6  7365421  7531534 e41a1c    1
#> 581         7   679765  2752867        13   501561   766473 e41a1c    1
#> 582         7   679765  3012501         8  7406703  8222490 e41a1c    1
#> 583         7  2049369  2942034        14 29350547 34369929 e41a1c    2
#> 584         7  2075095  1538540        10 28985737 30815217 e41a1c    2
#> 585        13   531939   834472        14 28866243 35278211 e41a1c    3
#> 586         8  7427221  8894821        14 28632063 34805893 e41a1c    3
#> 587         6  7567597  7690342        14 32050301 34913801 e41a1c    3
#> 588        13   501561   876423        10 30496700 27874100 e41a1c    3
#> 589         6  7171014  7815454        10 31408837 27660041 e41a1c    3
#> 590         8  5773528  9346871        10 31408837 26585934 e41a1c    3
ideogram(karyotype = karyotype_ternary_comparison, synteny = synteny_ternary_comparison)
convertSVG("chromosome.svg", device = "png")
