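Foreword
Most of us have wished for the same thing when running enrichment analysis: if we knew the regulatory relationships among the genes in a pathway of interest, we could pinpoint the hub genes, or trace the upstream and downstream regulators of a gene we care about within that pathway and then follow up with experimental validation. Such things are often perfect only in imagination, but as long as you dare to imagine, there is a chance of seeing it realized, and this particular idea was implemented this year.
Most bioinformaticians in China have heard of Y叔 ("Uncle Y"), whose series of bioinformatics packages arguably holds up half the sky of the Chinese bioinformatics community. The package Immugent introduces today, CBNplot, was recently developed by Y叔 together with Yasushi Okuno of Kyoto University. The accompanying paper was published in Bioinformatics under the title "CBNplot: Bayesian network plots for enrichment analysis".
生信寶庫 will introduce the main features of CBNplot in three posts, each with hands-on code. This post covers the first part.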
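Code walkthrough
First, we download an example dataset from GEO, identify the differentially expressed genes, and then run enrichment analysis.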
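## Load the packages used throughout this walkthrough
library(DESeq2)
library(ggplot2)          # theme_bw()
library(clusterProfiler)  # bitr(), enrichGO(), setReadable()
library(org.Hs.eg.db)     # human annotation used for ID conversion
library(CBNplot)          # bngeneplot()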
## Load dataset and make metadata
counts = read.table("GSE133624_reads-count-all-sample.txt", header=TRUE, row.names=1)
meta = sapply(colnames(counts), function (x) substring(x,1,1))
meta = data.frame(meta)
colnames(meta) = c("Condition")
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = meta,
                              design = ~ Condition)
## Prefiltering
filt <- rowSums(counts(dds) < 10) > dim(meta)[1]*0.9
dds <- dds[!filt,]
## Perform DESeq2()
dds = DESeq(dds)
res = results(dds, pAdjustMethod = "bonferroni")
## apply variance stabilizing transformation
v = vst(dds, blind=FALSE)
vsted = assay(v)
## Plot PCA of VST values
DESeq2::plotPCA(v, intgroup="Condition") +
  theme_bw()
## Define the input genes, and use clusterProfiler::bitr to convert the ID.
sig = subset(res, padj<0.05)
cand.entrez = clusterProfiler::bitr(rownames(sig), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)$ENTREZID
## Perform enrichment analysis (ORA)
pway = ReactomePA::enrichPathway(gene = cand.entrez)
pwayGO = clusterProfiler::enrichGO(cand.entrez, ont = "BP", OrgDb = org.Hs.eg.db)
## Convert to SYMBOL
pway = setReadable(pway, OrgDb=org.Hs.eg.db)
pwayGO = setReadable(pwayGO, OrgDb=org.Hs.eg.db)
## Store the similarity
pway = enrichplot::pairwise_termsim(pway)
## Define including samples
incSample = rownames(subset(meta, Condition=="T"))
allEntrez = clusterProfiler::bitr(rownames(res), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
res$ENSEMBL <- rownames(res)
lfc <- merge(data.frame(res), allEntrez, by="ENSEMBL")
lfc <- lfc[order(lfc$log2FoldChange, decreasing=TRUE),]
geneList <- lfc$log2FoldChange
names(geneList) <- lfc$ENTREZID
pwayGSE <- ReactomePA::gsePathway(geneList)
sigpway <- subset(pway@result, p.adjust<0.05)
paste(mean(sigpway$Count), sd(sigpway$Count))
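A side note on the prefiltering step in the listing above: the expression `rowSums(counts(dds) < 10) > dim(meta)[1]*0.9` flags a gene for removal when its count is below 10 in more than 90% of the samples. The following is a minimal base-R sketch of that same rule on a toy matrix. The gene names and count values are made up purely for illustration and have nothing to do with GSE133624.

```r
# Toy count matrix: 4 hypothetical genes x 10 samples (values are invented)
toy <- rbind(
  geneA = rep(0, 10),          # below 10 in all 10 samples
  geneB = c(rep(5, 9), 200),   # below 10 in 9 of 10 samples
  geneC = rep(150, 10),        # never below 10
  geneD = rep(9, 10)           # below 10 in all 10 samples
)

n_samples <- ncol(toy)

# Number of samples in which each gene has a count below 10
low_in <- rowSums(toy < 10)

# Same rule as in the walkthrough: drop genes that are low in MORE than 90% of samples
drop <- low_in > n_samples * 0.9
kept <- toy[!drop, , drop = FALSE]

print(low_in)
print(rownames(kept))
```

Note that the comparison is strict, so a gene that is low in exactly 90% of the samples (geneB here) is kept.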
Based on the enrichment results, we can now use CBNplot to visualize the pathways we are interested in.
barplot(pway, showCategory = 15)
# Plot the gene-level network with bngeneplot()
bngeneplot(results = pway, exp = vsted, pathNum = 17)
# Change the label size and style for better readability
bngeneplot(results = pway, exp = vsted, pathNum = 17, labelSize=7, shadowText=TRUE)
# Show the confidence of direction
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 50, showDir = TRUE,
           convertSymbol = TRUE,
           expRow = "ENSEMBL",
           strThresh = 0.7)
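To give some intuition for `R`, `showDir` and `strThresh` in the call above, here is a small base-R sketch of the general idea behind bootstrap-based edge confidence: learn a network on each of `R` resampled datasets, then summarize how often each edge appears (strength) and how often it points in a given direction (direction). This is not CBNplot's internal code. The replicate networks, the helper `edge_confidence()` and the way the 0.7 threshold is applied are all assumptions made for illustration.

```r
# Hypothetical outcome of R = 4 bootstrap replicates; each replicate is a
# learned network written as a data frame of directed edges (from -> to).
boot_nets <- list(
  data.frame(from = c("A", "B"), to = c("B", "C")),
  data.frame(from = c("A", "C"), to = c("B", "B")),
  data.frame(from = c("B", "B"), to = c("A", "C")),
  data.frame(from = c("A", "A"), to = c("B", "C"))
)

# Illustrative helper (not a CBNplot function):
#   strength  = fraction of replicates containing the edge in either direction
#   direction = among those replicates, fraction where it points from -> to
edge_confidence <- function(nets, from, to) {
  fwd <- sapply(nets, function(n) any(n$from == from & n$to == to))
  bwd <- sapply(nets, function(n) any(n$from == to & n$to == from))
  present <- fwd | bwd
  strength <- mean(present)
  direction <- if (any(present)) sum(fwd) / sum(present) else 0
  c(strength = strength, direction = direction)
}

conf <- rbind(
  "A->B" = edge_confidence(boot_nets, "A", "B"),
  "B->C" = edge_confidence(boot_nets, "B", "C"),
  "A->C" = edge_confidence(boot_nets, "A", "C")
)

# Keep only edges whose strength reaches the threshold (0.7, as in the call above)
kept <- conf[conf[, "strength"] >= 0.7, , drop = FALSE]

print(conf)
print(rownames(kept))
```

In this toy example the A-C edge shows up in only one of the four replicates, so it falls below the threshold and would not be drawn.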
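By setting compareRef = TRUE and specifying pathDb, the inferred relationships between genes can be compared with a reference network. By default, the intersection of the two directed networks is shown, expressed as the number of overlapping edges.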
library(parallel)
## Start 4 worker processes so the bootstrap replicates can run in parallel
cl = makeCluster(4)
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 30, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome",
           expRow = "ENSEMBL", cl = cl)
## Show the difference from the reference network instead of the intersection
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "difference",
           expRow = "ENSEMBL")
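What "intersection" and "difference" mean for two directed networks can be sketched in a few lines of base R. The edge lists and the helper `edge_keys()` below are hypothetical. In particular, treating "difference" as "edges inferred from the data but absent from the reference" is an assumption of this sketch, not a statement about how CBNplot defines it.

```r
# Illustrative helper: encode each directed edge as a "from->to" string
edge_keys <- function(df) paste(df$from, df$to, sep = "->")

# Hypothetical network inferred from expression data
inferred <- data.frame(from = c("G1", "G2", "G3"),
                       to   = c("G2", "G3", "G4"))

# Hypothetical reference network (e.g. taken from a pathway database)
reference <- data.frame(from = c("G1", "G3", "G4"),
                        to   = c("G2", "G2", "G5"))

# Edges present in both networks with the SAME direction
shared <- intersect(edge_keys(inferred), edge_keys(reference))

# Edges present only in the inferred network
only_inferred <- setdiff(edge_keys(inferred), edge_keys(reference))

cat("Overlapping edges:", length(shared), "\n")
print(shared)
print(only_inferred)
```

Because the networks are directed, G2->G3 in the inferred network and G3->G2 in the reference do not count as an overlap.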
A barplot describing the strength and direction (probability) of the edges can also be added by setting strengthPlot = TRUE and nStrength.
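## NOTE: `dep` is not defined anywhere in this walkthrough; it must already
## exist in the session before this call is run.
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "intersection",
           expRow = "ENSEMBL", sizeDep = TRUE, dep = dep, strengthPlot = TRUE, nStrength = 10)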
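## Plot several pathways together by passing a vector to pathNum
stopCluster(cl)        # release the 4-worker cluster created earlier
cl = makeCluster(8)
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = c(15, 16), R = 10,
           convertSymbol = TRUE,
           expRow = "ENSEMBL", cl = cl)
stopCluster(cl)        # shut the workers down when finished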
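Outlook
In this post, we downloaded an example dataset from GEO, ran differential expression and enrichment analysis on it, and then showed how to use CBNplot to display the regulatory relationships among the genes in a pathway of interest. These relationships, however, are only predictions that CBNplot makes from gene expression levels across samples; they do not necessarily reflect regulation that actually exists. In practice, experimental data such as ChIP-seq or ATAC-seq are still needed to confirm that two genes really interact. Even so, an imperfect prediction is better than none: starting from it, we can combine relevant biological knowledge with a literature search to set up a few hypotheses and then test them experimentally.
That's all for this post. In the next one, Immugent will show how to use CBNplot for pathway-level visualization.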