In the earlier post on how GO and KEGG fold enrichment is calculated, I briefly explained how to read GO enrichment results. As a reminder, the columns are:
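ONTOLOGY: which GO ontology the term belongs to, BP (biological process), MF (molecular function) or CC (cellular component)
ID: the ID of the GO term
Description: the description of the GO term
GeneRatio: a fraction. The numerator is the number of input genes annotated to this GO term; the denominator is the number of input genes (for example, the DEGs from a differential expression analysis) that have GO annotation at all. That is why the denominator in the output below is 832 even though 1000 genes were submitted.
BgRatio: Background Ratio, also a fraction. The denominator is the number of background genes with GO annotation (by default, the human genes annotated in org.Hs.eg.db; 20610 in the run below, and the exact number depends on the annotation version); the numerator is how many of those background genes are annotated to this GO term.
pvalue: the p-value of the enrichment test
p.adjust: the p-value adjusted for multiple testing (Benjamini-Hochberg here, pAdjustMethod = "BH")
qvalue: the q-value
geneID: the input genes annotated to this GO term, joined by "/"
Count: the number of input genes annotated to this GO term (the numerator of GeneRatio)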
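The two ratio columns hold the raw counts behind the enrichment statistics. The base-R sketch below takes the first GO term from the output further down (GeneRatio 50/832, BgRatio 300/20610), parses both fractions, computes the fold enrichment as GeneRatio / BgRatio, and computes a one-sided hypergeometric p-value with phyper(). It assumes the over-representation test is the upper tail of the hypergeometric distribution; `parse_ratio()` is a hypothetical helper, not a clusterProfiler function.

```r
# Hypothetical helper: turn ratio strings such as "50/832" into numbers
parse_ratio <- function(s) {
  parts <- strsplit(s, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Values copied from the first GO term in the enrichment output
gene_ratio <- "50/832"     # k / n: input genes in the term / annotated input genes
bg_ratio   <- "300/20610"  # M / N: background genes in the term / annotated background genes

# Fold enrichment = GeneRatio / BgRatio
fold_enrichment <- parse_ratio(gene_ratio) / parse_ratio(bg_ratio)
print(fold_enrichment)  # about 4.13

# Assumed test: P(X >= k) when drawing n genes from N background genes,
# M of which are annotated to the term (hypergeometric upper tail)
k <- 50; n <- 832; M <- 300; N <- 20610
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p_hyper)
```

Under that assumption, p_hyper should be of the same tiny order of magnitude as the value in the pvalue column.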
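Sometimes the geneID column of an enrichment result shows gene names (symbols), and sometimes it shows a string of numbers (Entrez gene IDs) or Ensembl gene IDs. What we really want is gene symbols: only then can you tell at a glance which genes are enriched in a GO term or KEGG pathway, whereas the other IDs are not readable by eye. So how do we make sure the enrichment result shows gene symbols?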
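If you are not sure which kind of ID a geneID column holds, the shape of the IDs usually tells you. A small base-R sketch, assuming human Ensembl gene IDs are "ENSG" followed by digits and Entrez gene IDs are purely numeric; `guess_id_type()` is hypothetical, and its return values simply mirror the keyType names used in this post.

```r
# Hypothetical helper: guess the ID type of a character vector of gene IDs.
# Assumptions: human Ensembl gene IDs match "ENSG" + digits,
# Entrez gene IDs are all digits, anything else is treated as a symbol.
guess_id_type <- function(ids) {
  if (all(grepl("^ENSG[0-9]+$", ids))) return("ENSEMBL")
  if (all(grepl("^[0-9]+$", ids))) return("ENTREZID")
  "SYMBOL"
}

# Split one geneID cell on "/" before checking it
ids <- strsplit("ENSG00000198650/ENSG00000248098/ENSG00000078070", "/", fixed = TRUE)[[1]]
guess_id_type(ids)                # "ENSEMBL"
guess_id_type(c("7157", "672"))   # "ENTREZID"
guess_id_type(c("TAT", "BCKDHA")) # "SYMBOL"
```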
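Today I'll show three different ways to get the same result. Suppose differential expression analysis gave us 1000 differentially expressed genes (DEGs), identified by Ensembl gene IDs.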
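library(org.Hs.eg.db)
library(clusterProfiler)
# load() restores the saved object(s); ls() shows their names (here: DEG, a vector of Ensembl IDs).
# If the file was written with saveRDS() rather than save(), use DEG <- readRDS("DEG.rds") instead.
load("DEG.rds")
ls()
ego <- enrichGO(gene = DEG,
                OrgDb = org.Hs.eg.db,
                ont = "ALL",
                pAdjustMethod = "BH",
                minGSSize = 10,
                pvalueCutoff = 0.01,
                qvalueCutoff = 0.01,
                keyType = "ENSEMBL")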
The enrichment result looks like this; the geneID column contains Ensembl gene IDs:
>ego
# over-representation test
#
#...@organism Homo sapiens
#...@ontology GOALL
#...@keytype ENSEMBL
#...@gene chr [1:1000] "ENSG00000259250" "ENSG00000255717" ...
#...pvalues adjusted by 'BH' with cutoff <0.01
#...81 enriched terms found
'data.frame': 81 obs. of 10 variables:
$ ONTOLOGY : Factor w/ 3 levels "BP","CC","MF": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0016054" "GO:0046395" "GO:0044282" "GO:0009063" ...
$ Description: chr "organic acid catabolic process" "carboxylic acid catabolic process" "small molecule catabolic process" "cellular amino acid catabolic process" ...
$ GeneRatio : chr "50/832" "50/832" "63/832" "30/832" ...
$ BgRatio : chr "300/20610" "300/20610" "476/20610" "140/20610" ...
$ pvalue : num 1.06e-17 1.06e-17 8.27e-17 4.25e-14 1.33e-11 ...
$ p.adjust : num 2.69e-14 2.69e-14 1.39e-13 5.38e-11 1.34e-08 ...
$ qvalue : num 2.51e-14 2.51e-14 1.30e-13 5.01e-11 1.25e-08 ...
$ geneID : chr "ENSG00000198650/ENSG00000248098/ENSG00000078070/ENSG00000169738/ENSG00000113492/ENSG00000113790/ENSG00000111271"| __truncated__ "ENSG00000198650/ENSG00000248098/ENSG00000078070/ENSG00000169738/ENSG00000113492/ENSG00000113790/ENSG00000111271"| __truncated__ "ENSG00000198650/ENSG00000248098/ENSG00000171903/ENSG00000173597/ENSG00000078070/ENSG00000169738/ENSG00000113492"| __truncated__ "ENSG00000198650/ENSG00000248098/ENSG00000078070/ENSG00000113492/ENSG00000008311/ENSG00000139631/ENSG00000140905"| __truncated__ ...
$ Count : int 50 50 63 30 50 35 24 22 27 43 ...
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
Method 1: use the readable = TRUE argument
ego1 <- enrichGO(gene = DEG,
                 OrgDb = org.Hs.eg.db,
                 ont = "ALL",
                 pAdjustMethod = "BH",
                 minGSSize = 10,
                 pvalueCutoff = 0.01,
                 qvalueCutoff = 0.01,
                 keyType = "ENSEMBL",
                 readable = TRUE)  # report geneID as gene symbols
Now the geneID column of the result has been converted to gene symbols:
>ego1
# over-representation test
#
#...@organism Homo sapiens
#...@ontology GOALL
#...@keytype ENSEMBL
#...@gene chr [1:1000] "ENSG00000259250" "ENSG00000255717" ...
#...pvalues adjusted by 'BH' with cutoff <0.01
#...81 enriched terms found
'data.frame': 81 obs. of 10 variables:
$ ONTOLOGY : Factor w/ 3 levels "BP","CC","MF": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0016054" "GO:0046395" "GO:0044282" "GO:0009063" ...
$ Description: chr "organic acid catabolic process" "carboxylic acid catabolic process" "small molecule catabolic process" "cellular amino acid catabolic process" ...
$ GeneRatio : chr "50/832" "50/832" "63/832" "30/832" ...
$ BgRatio : chr "300/20610" "300/20610" "476/20610" "140/20610" ...
$ pvalue : num 1.06e-17 1.06e-17 8.27e-17 4.25e-14 1.33e-11 ...
$ p.adjust : num 2.69e-14 2.69e-14 1.39e-13 5.38e-11 1.34e-08 ...
$ qvalue : num 2.51e-14 2.51e-14 1.30e-13 5.01e-11 1.25e-08 ...
$ geneID : chr "TAT/BCKDHA/MCCC1/DCXR/AGXT2/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMID/GPT/ABAT/MLYCD/ABHD10/PRODH2/DD"| __truncated__ "TAT/BCKDHA/MCCC1/DCXR/AGXT2/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMID/GPT/ABAT/MLYCD/ABHD10/PRODH2/DD"| __truncated__ "TAT/BCKDHA/CYP4F11/SULT1B1/MCCC1/DCXR/AGXT2/PGM2L1/DPYD/DERA/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMI"| __truncated__ "TAT/BCKDHA/MCCC1/AGXT2/AASS/CSAD/GCSH/AFMID/GPT/ABAT/PRODH2/DDAH1/ARG1/ACMSD/BCKDK/SHMT1/HGD/CDO1/DBT/GSTZ1/HSD"| __truncated__ ...
$ Count : int 50 50 63 30 50 35 24 22 27 43 ...
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
Method 2: use the setReadable() function
library(DOSE)  # provides setReadable()
# keyType is the ID type of the original result; if it holds Entrez gene IDs, set keyType = "ENTREZID"
ego2 <- setReadable(ego, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL")
The converted result:
>ego2
# over-representation test
#
#...@organism Homo sapiens
#...@ontology GOALL
#...@keytype ENSEMBL
#...@gene chr [1:1000] "ENSG00000259250" "ENSG00000255717" "ENSG00000163328" ...
#...pvalues adjusted by 'BH' with cutoff <0.01
#...81 enriched terms found
'data.frame': 81 obs. of 10 variables:
$ ONTOLOGY : Factor w/ 3 levels "BP","CC","MF": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0016054" "GO:0046395" "GO:0044282" "GO:0009063" ...
$ Description: chr "organic acid catabolic process" "carboxylic acid catabolic process" "small molecule catabolic process" "cellular amino acid catabolic process" ...
$ GeneRatio : chr "50/832" "50/832" "63/832" "30/832" ...
$ BgRatio : chr "300/20610" "300/20610" "476/20610" "140/20610" ...
$ pvalue : num 1.06e-17 1.06e-17 8.27e-17 4.25e-14 1.33e-11 ...
$ p.adjust : num 2.69e-14 2.69e-14 1.39e-13 5.38e-11 1.34e-08 ...
$ qvalue : num 2.51e-14 2.51e-14 1.30e-13 5.01e-11 1.25e-08 ...
$ geneID : chr "TAT/BCKDHA/MCCC1/DCXR/AGXT2/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMID/GPT/ABAT/MLYCD/ABHD10/PRODH2/DD"| __truncated__ "TAT/BCKDHA/MCCC1/DCXR/AGXT2/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMID/GPT/ABAT/MLYCD/ABHD10/PRODH2/DD"| __truncated__ "TAT/BCKDHA/CYP4F11/SULT1B1/MCCC1/DCXR/AGXT2/PGM2L1/DPYD/DERA/EHHADH/ACAD10/ACADS/ACOX1/AASS/PCCB/CSAD/GCSH/AFMI"| __truncated__ "TAT/BCKDHA/MCCC1/AGXT2/AASS/CSAD/GCSH/AFMID/GPT/ABAT/PRODH2/DDAH1/ARG1/ACMSD/BCKDK/SHMT1/HGD/CDO1/DBT/GSTZ1/HSD"| __truncated__ ...
$ Count : int 50 50 63 30 50 35 24 22 27 43 ...
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
Method 3: do it yourself
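library(clusterProfiler)  # provides bitr()
library(org.Hs.eg.db)
ego3 <- as.data.frame(ego)
# Split each "ENSG.../ENSG.../..." string into a character vector, one per GO term
ensembl <- strsplit(ego3$geneID, "/")
symbol <- sapply(ensembl, function(x) {
  y <- bitr(x, fromType = "ENSEMBL", toType = "SYMBOL", OrgDb = org.Hs.eg.db)
  # One Ensembl ID can map to several symbols: keep the first one
  y <- y[!duplicated(y$ENSEMBL), -1]
  paste(y, collapse = "/")
})
ego3$geneID <- symbol
ego3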
Printing ego3 gives the same table, now with gene symbols in the geneID column.
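Method 3 calls bitr() once per GO term, and IDs with no symbol are dropped from the string (bitr() only warns about them). A possible variation, sketched in base R below, builds one named lookup vector and converts every geneID string in a single pass, keeping unmapped IDs unchanged. The mapping table is a toy stand-in for bitr()'s output (columns ENSEMBL and SYMBOL): the first three pairs follow how IDs and symbols line up in the ego and ego1 printouts above, while the "FAKE_ALIAS" row and the ID ENSG00000000000 are made up for illustration. In real use, map_df would presumably come from a single call such as bitr(unique(unlist(ensembl)), fromType = "ENSEMBL", toType = "SYMBOL", OrgDb = org.Hs.eg.db); `ids_to_symbols()` is a hypothetical helper.

```r
# Toy stand-in for bitr() output; "FAKE_ALIAS" is invented to show one-to-many handling
map_df <- data.frame(
  ENSEMBL = c("ENSG00000198650", "ENSG00000248098", "ENSG00000078070", "ENSG00000078070"),
  SYMBOL  = c("TAT", "BCKDHA", "MCCC1", "FAKE_ALIAS"),
  stringsAsFactors = FALSE
)

# One-to-many: keep the first symbol per Ensembl ID, as Method 3 does
map_df <- map_df[!duplicated(map_df$ENSEMBL), ]
lookup <- setNames(map_df$SYMBOL, map_df$ENSEMBL)

# Hypothetical helper: convert all "/"-joined geneID strings at once;
# IDs missing from the lookup are kept as they are instead of being dropped
ids_to_symbols <- function(gene_id, lookup) {
  vapply(strsplit(gene_id, "/", fixed = TRUE), function(x) {
    sym <- unname(lookup[x])
    sym[is.na(sym)] <- x[is.na(sym)]
    paste(sym, collapse = "/")
  }, character(1))
}

# ENSG00000000000 is a made-up ID with no mapping
gene_id <- c("ENSG00000198650/ENSG00000248098/ENSG00000078070",
             "ENSG00000078070/ENSG00000000000")
ids_to_symbols(gene_id, lookup)
# returns c("TAT/BCKDHA/MCCC1", "MCCC1/ENSG00000000000")
```

With the real data, the same call would be ego3$geneID <- ids_to_symbols(ego3$geneID, lookup), replacing the per-term sapply() loop.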
See the article below for how to obtain the DEG.rds file.