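Preface:
Weibo topic: #Even with four years you still won't master bioinformatics#
My earlier enrichment analysis tutorial [1] used a model species, human, as its running example. Today I watched Meng Haowei's video tutorial [2] on Bilibili, picked up a new skill, and things suddenly clicked, so I am happily writing it down.
This post is aimed at non-model species for which a reference genome is nevertheless available.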
1. Installing the R packages and downloading the database
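# non-model species, but a reference genome is available
# (biocLite() was retired with Bioconductor 3.8; on newer installations use
#  install.packages("BiocManager"), then BiocManager::install("AnnotationHub"), etc.)
> source("https://bioconductor.org/biocLite.R")
> biocLite("AnnotationHub")
> biocLite("biomaRt")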
# load the packages
> library(AnnotationHub)
> library(biomaRt)
# connect to AnnotationHub to look for an OrgDb
> hub <- AnnotationHub::AnnotationHub()
Here we use the oriental fruit fly as the example.
# oriental fruit fly = Bactrocera dorsalis
> query(hub, "bactrocera")
The search returns the following:
> query(hub, "bactrocera")
AnnotationHub with 9 records
# snapshotDate(): 2018-04-30
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Bactrocera (Bactrocera)_dorsalis, Bactrocera (Bactrocera)_latifrons, Bactrocera (Dacul...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
# rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH62538"]]'
title
AH62538 | org.Bactrocera_(Bactrocera)_latifrons.eg.sqlite
AH62539 | org.Bactrocera_latifrons.eg.sqlite
AH62542 | org.Bactrocera_(Daculus)_oleae.eg.sqlite
AH62543 | org.Bactrocera_(Dacus)_oleae.eg.sqlite
AH62544 | org.Bactrocera_oleae.eg.sqlite
AH62568 | org.Bactrocera_(Zeugodacus)_cucurbitae.eg.sqlite
AH62569 | org.Bactrocera_cucurbitae.eg.sqlite
AH62581 | org.Bactrocera_(Bactrocera)_dorsalis.eg.sqlite
AH62582 | org.Bactrocera_dorsalis.eg.sqlite
We pick AH62582 | org.Bactrocera_dorsalis.eg.sqlite
and download it:
> Bactrocera.OrgDb <- hub[["AH62582"]]
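If this throws an error, a dependency is probably missing. Install each package the message names, using either of these two methods (replace package_name with the real name):
- install.packages("package_name")
- source("https://bioconductor.org/biocLite.R")
biocLite("package_name")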
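Rather than waiting for an error, the missing dependencies can be listed up front. The following is a minimal sketch in base R. The helper name `missing_packages` is hypothetical, and the package list is an assumption: AnnotationHub and biomaRt come from this post, and clusterProfiler is added because it provides the `enrichGO()` and `enrichKEGG()` functions used later.

```r
# Return the names in `pkgs` that cannot be loaded in this R installation.
# (Hypothetical helper, not part of any package.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed list of packages needed for this tutorial.
needed <- c("AnnotationHub", "biomaRt", "clusterProfiler")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # On current Bioconductor releases these would be installed with:
  # install.packages("BiocManager")
  # BiocManager::install(to_install)
} else {
  message("All required packages are available.")
}
```

The install commands are left commented out so that running the sketch only reports what is missing and changes nothing.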
2. Inspecting the annotation
> columns(Bactrocera.OrgDb)
[1] "ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[8] "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ"
[15] "SYMBOL"
> Bactrocera.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Bactrocera dorsalis
| SPECIES: Bactrocera dorsalis
| CENTRALID: GID
| Taxonomy ID: 27457
| Db type: OrgDb
| Supporting package: AnnotationDbi
Please see: help('select') for usage information
# look at the keys stored in a given annotation column
> head(keys(Bactrocera.OrgDb,keytype = "ALIAS"))
[1] "AAA62341.1" "AAA62342.1" "AAA62343.1" "AAA62344.1" "AAF22478.1" "AAL17758.1"
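head() shows only the first six. ALIAS actually holds far more keys: printing the whole vector ends with the note "omitted 17518 entries".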
3. GO enrichment analysis
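# Enrichment analysis for BP (biological process).
# enrichGO() comes from clusterProfiler; simply pass the downloaded non-model OrgDb as OrgDb.
# The IDs in gene must match keyType: Entrez IDs for 'ENTREZID'. If DEG.gene_symbol
# holds gene symbols, use keyType = 'SYMBOL' instead.
> library(clusterProfiler)
> enrich.go.BP = enrichGO(gene = DEG.gene_symbol,
                          OrgDb = Bactrocera.OrgDb,
                          keyType = 'ENTREZID', ont = "BP",
                          pvalueCutoff = 0.01,
                          qvalueCutoff = 0.05,
                          readable = TRUE)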
> barplot(enrich.go.BP)
> dotplot(enrich.go.BP)
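p_value: the significance of the enrichment; here it is required to be below 0.01;
q_value: a correction of p_value, needed because many statistical tests are run at once;
after correction the value is, as a rule, larger than the raw p_value; for the Benjamini-Hochberg p.adjust column this always holds (p.adjust >= p_value)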
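To make the p-value and its correction concrete, here is a minimal base-R sketch of an over-representation test on toy numbers. All counts and term names are hypothetical. It uses a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment, which is the general approach behind this kind of enrichment analysis; it is not a reproduction of the internals of `enrichGO()`.

```r
# Toy numbers (hypothetical).
N <- 10000   # background genes that have any annotation
n <- 200     # input (differentially expressed) genes that have any annotation

# One row per term: M = background genes in the term, k = input genes in the term.
terms <- data.frame(
  term = c("term_A", "term_B", "term_C", "term_D"),
  M    = c(100, 300, 50, 500),
  k    = c(12, 8, 1, 11)
)

# One-sided over-representation test: P(X >= k) when n genes are drawn
# from N, of which M belong to the term.
terms$pvalue <- phyper(terms$k - 1, terms$M, N - terms$M, n, lower.tail = FALSE)

# Benjamini-Hochberg correction across all tested terms.
terms$p.adjust <- p.adjust(terms$pvalue, method = "BH")

# Terms ordered from most to least significant.
print(terms[order(terms$pvalue), ])
```

In this toy table term_A has 12 hits where about 2 would be expected by chance (200 × 100 / 10000), so its p-value is very small, while term_C has exactly the single hit expected by chance and is not significant.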
4. KEGG enrichment analysis
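# Unlike enrichGO(), enrichKEGG() does not take an OrgDb (nor ont or readable);
# the species is selected with a KEGG organism code instead.
# Look the code up first, e.g. search_kegg_organism("Bactrocera dorsalis", by = "scientific_name");
# "bdr" below is the code expected for Bactrocera dorsalis and should be confirmed that way.
> enrich.kegg = enrichKEGG(gene = DEG.gene_symbol,
                           organism = "bdr",
                           keyType = "ncbi-geneid",
                           pvalueCutoff = 0.01,
                           qvalueCutoff = 0.05)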
5. Reading the GO plot
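The vertical axis lists each enriched term (a GO term, or for KEGG results a pathway such as Legionellosis);
the horizontal axis is GeneRatio: the number of input genes annotated to the term divided by the number of input genes that have any annotation;
the dot size shows Count, the number of input genes in the term;
p.adjust is shown by the dot colour, not its size: the smaller p.adjust is, the more reliable the result;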
[Figure: Rplot22.png]
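The GeneRatio on the horizontal axis is stored in the result table as a text fraction such as "12/200", alongside a matching BgRatio for the background. The sketch below shows how those strings map to the numbers on the axis. The table is made up, and `ratio_to_numeric` is a hypothetical helper name; only the column names GeneRatio, BgRatio and Count follow the clusterProfiler result layout.

```r
# Hypothetical rows in the shape of an enrichment result table.
res <- data.frame(
  Description = c("term_A", "term_B"),
  GeneRatio   = c("12/200", "8/200"),
  BgRatio     = c("100/10000", "300/10000"),
  Count       = c(12, 8),
  stringsAsFactors = FALSE
)

# Convert "k/n" strings into the numeric fraction k / n.
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

# Fraction of input genes in each term (the value plotted on the x axis).
res$GeneRatioNum <- ratio_to_numeric(res$GeneRatio)
# Fraction of background genes in each term, for comparison.
res$BgRatioNum   <- ratio_to_numeric(res$BgRatio)

print(res)
```

A term is over-represented when its GeneRatio is clearly larger than its BgRatio, as for term_A here (0.06 against 0.01).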