How to get all the gene names in the KEGG pathway hsa04650, Natural killer cell mediated cytotoxicity

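  • KEGG is a database resource for understanding the high-level functions and utilities of biological systems (such as the cell, the organism and the ecosystem) from molecular-level information, especially the large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. It was established in 1995 by the Kanehisa Laboratory at the Kyoto University Bioinformatics Center in Japan and is one of the most widely used bioinformatics databases in the world.

  • Exercise: get all the gene names in the KEGG pathway hsa04650, Natural killer cell mediated cytotoxicity. (The hsa prefix stands for Homo sapiens.)

There are two ways to do this: search on Google and read the answer off the web page, or use R packages and code.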


Method 1: browse the web page


1. Search Google for: hsa04650

2. Open the KEGG entry for the pathway (https://www.genome.jp/dbget-bin/www_bget?hsa04650).

3. Scroll down to the Gene entry; the answer is listed there.


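Method 2: use R packages and code


Idea: the web page shows that the goal is to turn the Gene entry into a matrix and extract its second column, the gene symbols.

Reference article: http://www.bio-info-trainee.com/3533.html
The article contains this code: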

library(clusterProfiler)   # load the package; what is it for?
# https://www.kegg.jp/dbget-bin/www_bget?pathway+hsa05169
# library(KEGG.db) library(KEGGREST)  # what are these two packages for?

kg <- download_KEGG('hsa')   # downloads the data directly; the article does not say which command then extracts the genes
head(kg[[1]])
head(kg[[2]])
ps <- c('hsa04660', 'hsa04659',
        'hsa04658', 'hsa04657', 'hsa04662',
        'hsa04650')
  • clusterProfiler: This package implements methods to analyze and visualize functional profiles (GO and KEGG) of gene and gene clusters.
  • KEGGREST: A package that provides a client interface to the KEGG REST server.
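
The reference code stops after defining ps, so here is a minimal sketch of how the pathways in ps might be looked up in the downloaded data. It assumes that kg[[1]] is a two-column data frame mapping pathway IDs (column from) to Entrez gene IDs (column to); that layout is an assumption, so check it with head(kg[[1]]) first. The path2gene data frame below is a small hand-made stand-in with illustrative rows, not real download_KEGG output, so the snippet runs without a network connection.

```r
# Stand-in for kg[[1]] (assumed layout: pathway ID -> Entrez gene ID).
# The rows are illustrative only; real data comes from download_KEGG('hsa').
path2gene <- data.frame(
  from = c("hsa04650", "hsa04650", "hsa04660", "hsa00010"),
  to   = c("3105", "3106", "916", "10327"),
  stringsAsFactors = FALSE
)

ps <- c("hsa04660", "hsa04659", "hsa04658", "hsa04657", "hsa04662", "hsa04650")

# Keep only the rows that belong to the pathways of interest.
hits <- path2gene[path2gene$from %in% ps, ]

# Group the gene IDs by pathway: a named list, one element per pathway found.
genes_by_pathway <- split(hits$to, hits$from)

print(genes_by_pathway[["hsa04650"]])
```

Note that this route yields Entrez gene IDs rather than gene symbols, so a separate ID conversion step (for example with clusterProfiler's bitr()) would still be needed to answer the exercise.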

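With the direction settled, install the package first:


The usual three steps for installing a Bioconductor package:
1. source("http://bioconductor.org/biocLite.R") installs BiocInstaller.

2. options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/") switches to a mirror.

3. BiocInstaller::biocLite('KEGGREST') installs the package (KEGGREST is a Bioconductor package).

These commands apply to Bioconductor 3.7 and earlier, as in the session below; from Bioconductor 3.8 on, biocLite is replaced by BiocManager::install('KEGGREST').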

> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
A newer version of Bioconductor is available for this version of R, ?BiocUpgrade for
  help
> options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/") 
> BiocInstaller::biocLite('KEGGREST')
BioC_mirror: http://mirrors.ustc.edu.cn/bioc/
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.2 (2018-12-20).
Installing package(s) ‘KEGGREST’
also installing the dependency ‘png’

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/png_0.1-7.zip'
Content type 'application/zip' length 292639 bytes (285 KB)
downloaded 285 KB

trying URL 'http://mirrors.ustc.edu.cn/bioc//packages/3.7/bioc/bin/windows/contrib/3.5/KEGGREST_1.20.2.zip'
Content type 'application/zip' length 124626 bytes (121 KB)
downloaded 121 KB

package ‘png’ successfully unpacked and MD5 sums checked
package ‘KEGGREST’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages
Old packages: 'gplots', 'purrr'
Update all/some/none? [a/s/n]: 
a
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/gplots_3.0.1.1.zip'
Content type 'application/zip' length 657011 bytes (641 KB)
downloaded 641 KB

trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/purrr_0.3.0.zip'
Content type 'application/zip' length 413820 bytes (404 KB)
downloaded 404 KB

package ‘gplots’ successfully unpacked and MD5 sums checked
package ‘purrr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages

Learning how to use the package:


Commands:

> ?KEGGREST
No documentation for ‘KEGGREST’ in specified packages and libraries:
you could try ‘??KEGGREST’
> ??KEGGREST

Open the package documentation from the search results to learn the basic commands:

  • KEGG exposes a number of databases. To get an idea of what is available, run listDatabases(), which lists the databases that KEGGREST can query.
  • You can obtain the list of organisms available in KEGG with the keggList() function, for example keggList("organism").

> gs<-keggGet('hsa04650')
> View(gs)
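
The entry names shown in the viewer are the same as those on the web page, but gs is clearly not a matrix yet: keggGet() returns a list. The plan is to pull out the GENE entry and then extract the gene symbols from it.

In the RStudio viewer, hover next to an entry and an icon appears; click it and a line of code is sent to the console (for the GENE entry, something like gs[[1]][["GENE"]]). Press Enter to run it and print the contents of that entry. The commands below assume that this GENE entry has been saved as a, for example with a <- gs[[1]][["GENE"]].

The output matches the web page.

The extraction relies on three base R functions:
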
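  • strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

x is the character vector to split.
split is the separator to split on.
If fixed is TRUE, split is matched literally.
If perl is TRUE, split is treated as a Perl-compatible regular expression.
If both fixed and perl are FALSE, split is treated as a POSIX 1003.2 extended regular expression.
If useBytes is TRUE, matching is done byte by byte rather than character by character.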
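
A short sketch of how these arguments behave may help. The first string is the HLA-A entry shown in the output further down; the "a.b.c" string is an added illustration of why fixed matters when the separator is a regular-expression metacharacter.

```r
entry <- "HLA-A; major histocompatibility complex, class I, A [KO:K06751]"

# ';' has no special meaning in a regular expression, so the default works.
parts <- strsplit(entry, ";")[[1]]
symbol <- parts[1]
description <- trimws(parts[2])   # drop the space left after the ';'

# '.' is a regex metacharacter that matches any character, so every
# character is treated as a separator and only empty strings remain.
dot_regex <- strsplit("a.b.c", ".")[[1]]

# With fixed = TRUE the '.' is matched literally.
dot_fixed <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

print(symbol)
print(dot_fixed)
```

strsplit always returns a list with one element per input string, which is why the [[1]] is needed even when x has length one.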

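  • lapply(X, FUN, ...)
    lapply returns a list of the same length as X, in which each element is the result of applying FUN to the corresponding element of X. X may be an atomic vector or a list; any other object is coerced to a list with as.list().

  • unlist(x) flattens a list into a single vector containing all of its elements. Here it turns the list returned by lapply into a plain character vector of IDs and gene symbols.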
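
As a small illustration of how lapply and unlist fit together, the sketch below uses a two-element stand-in for a (the first ID and gene entry from the output further down). The vapply line is an added alternative, not something the original walkthrough uses.

```r
# Two-element stand-in for the GENE entry: an ID followed by "SYMBOL; description".
a <- c("3105", "HLA-A; major histocompatibility complex, class I, A [KO:K06751]")

# lapply returns a list with one element per element of a.
first_fields <- lapply(a, function(x) strsplit(x, ";")[[1]][1])

# unlist flattens that list into a plain character vector.
flat <- unlist(first_fields)

# vapply does both steps at once and checks that each result is one string.
flat_vapply <- vapply(a, function(x) strsplit(x, ";")[[1]][1],
                      character(1), USE.NAMES = FALSE)

# unlist coerces mixed element types to one common type (here character).
mixed <- unlist(list(1, "a", TRUE))

print(flat)
print(mixed)
```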

> lapply(a,function(x) strsplit(x,';'))
[[1]]
[[1]][[1]]
[1] "3105"


[[2]]
[[2]][[1]]
[1] "HLA-A"                                                    
[2] " major histocompatibility complex, class I, A [KO:K06751]"
...
> unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
  [1] "3105"        "HLA-A"       "3106"        "HLA-B"       "3107"        "HLA-C"      
  [7] "3135"        "HLA-G"       "3133"        "HLA-E"       "3812"        "KIR3DL2"    
 [13] "3811"        "KIR3DL1"     "3803"        "KIR2DL2"     "3802"        "KIR2DL1"    

> b<- unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
> b[1:length(b)%%2 ==0]  # 1:length(b) gives the positions in b; keep the even positions, which hold the gene symbols
  [1] "HLA-A"       "HLA-B"       "HLA-C"       "HLA-G"       "HLA-E"       "KIR3DL2"    
  [7] "KIR3DL1"     "KIR2DL2"     "KIR2DL1"     "KIR2DL3"     "KIR2DL4"     "KIR2DL5A"   
 [13] "KLRC1"       "KLRC2"       "KLRC3"       "KLRD1"       "PTPN6"       "PTPN11"     
 [19] "ICAM1"       "ICAM2"       "ITGAL"       "ITGB2"       "PTK2B"       "VAV3"       
 [25] "VAV1"        "VAV2"        "RAC1"        "RAC2"        "RAC3"        "PAK1"       
 [31] "MAP2K1"      "MAP2K2"      "MAPK1"       "MAPK3"       "TNF"         "CSF2"       
 [37] "IFNG"        "KIR2DS1"     "KIR2DS3"     "KIR2DS4"     "KIR2DS5"     "KIR2DS2"    
 [43] "NCR2"        "TYROBP"      "LCK"         "IGH"         "FCGR3A"      "FCGR3B"     
 [49] "NCR1"        "NCR3"        "FCER1G"      "CD247"       "ZAP70"       "SYK"        
 [55] "LCP2"        "LAT"         "PLCG1"       "PLCG2"       "SH3BP2"      "PIK3CA"     
 [61] "PIK3CD"      "PIK3CB"      "PIK3R1"      "PIK3R2"      "PIK3R3"      "FYN"        
 [67] "SHC1"        "SHC2"        "SHC3"        "SHC4"        "GRB2"        "SOS1"       
 [73] "SOS2"        "HRAS"        "KRAS"        "NRAS"        "ARAF"        "BRAF"       
 [79] "RAF1"        "MICB"        "MICA"        "ULBP1"       "ULBP2"       "ULBP3"      
 [85] "RAET1G"      "RAET1L"      "RAET1E"      "KLRK1"       "KLRC4-KLRK1" "HCST"       
 [91] "CD48"        "CD244"       "PPP3CA"      "PPP3CB"      "PPP3CC"      "PPP3R1"     
 [97] "PPP3R2"      "NFATC1"      "NFATC2"      "PRKCA"       "PRKCB"       "PRKCG"      
[103] "SH2D1B"      "SH2D1A"      "IFNGR1"      "IFNGR2"      "IFNA1"       "IFNA2"      
[109] "IFNA4"       "IFNA5"       "IFNA6"       "IFNA7"       "IFNA8"       "IFNA10"     
[115] "IFNA13"      "IFNA14"      "IFNA16"      "IFNA17"      "IFNA21"      "IFNB1"      
[121] "IFNAR1"      "IFNAR2"      "TNFSF10"     "TNFRSF10A"   "TNFRSF10B"   "FASLG"      
[127] "FAS"         "GZMB"        "PRF1"        "CASP3"       "BID"  
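
The even-position filter above and the matrix idea from the start of Method 2 can be compared in a short sketch. It assumes the GENE entry alternates strictly between an ID and a "SYMBOL; description" string, as in the output shown. The sample vector a below is a three-gene stand-in: the IDs, symbols and first description come from the output above, while the other two descriptions are placeholders.

```r
# Three-gene stand-in for the GENE entry (alternating ID / "SYMBOL; description").
a <- c(
  "3105", "HLA-A; major histocompatibility complex, class I, A [KO:K06751]",
  "3106", "HLA-B; placeholder description",
  "3107", "HLA-C; placeholder description"
)

# Approach used above: take the text before ';' and keep the even positions.
b <- unlist(lapply(a, function(x) strsplit(x, ";")[[1]][1]))
symbols_even <- b[seq_along(b) %% 2 == 0]

# Matrix approach: one row per gene, column 1 = ID, column 2 = symbol and description.
gene_matrix <- matrix(a, ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("id", "symbol_desc")))

# Strip everything from the first ';' onwards to leave only the symbol.
symbols_matrix <- sub(";.*$", "", gene_matrix[, "symbol_desc"])

print(symbols_matrix)
```

The matrix form keeps each ID next to its symbol, which is convenient if both are needed later; seq_along(b) is used in place of 1:length(b) only because it also behaves sensibly when b is empty.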
