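I. Processing the input data with the limma package
1. Normalization
Before running a differential analysis with limma, the data must first be normalized (here, log2-transformed):
Input file 1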
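rm(list = ls())
setwd('/lab412C/LSM/RNA-SEQ/Volcant')
# The first column holds the gene IDs; use it as row names
data1 <- read.csv(file = "data2.csv", header = TRUE, sep = ",")
rownames(data1) <- data1[, 1]
data2 <- data1[, -1]
# quantile() needs a numeric matrix or vector, not a data frame
qx <- as.numeric(quantile(as.matrix(data2), c(0, 0.25, 0.5, 0.75, 0.99, 1.0), na.rm = TRUE))
# Heuristic check: TRUE suggests the values are not yet log-transformed
LogC <- (qx[5] > 100) ||
  (qx[6] - qx[1] > 50 && qx[2] > 0) ||
  (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
LogC
# log2(x + 1) transform; apply it only when LogC is TRUE
data <- log2(data2 + 1)
# Drop genes whose values are zero in every sample
data <- data[which(rowSums(data) > 0), ]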
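If you skip this normalization step, the largest values swamp the smallest ones and the volcano plot loses its characteristic eruption shape: the two arms in the middle do not separate, and the plot looks terrible!
Example of the data after normalization:
Data after normalization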
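To make the effect of the transform concrete, here is a minimal sketch on simulated counts. The matrix `counts` and the names `needs_log` and `logged` are illustrative assumptions, not the author's data2.csv; the quantile check is the same heuristic used in the code above.

```r
# Toy count matrix (simulated values, not the author's data)
set.seed(1)
counts <- matrix(rnbinom(600, mu = 200, size = 0.5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))

# Same quantile heuristic as above: large upper quantiles or a wide range
# suggest the values are still on the raw (non-log) scale
qx <- as.numeric(quantile(counts, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
needs_log <- (qx[5] > 100) ||
  (qx[6] - qx[1] > 50 && qx[2] > 0) ||
  (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)

# Adding 1 keeps zero counts finite, since log2(1) = 0
logged <- log2(counts + 1)

range(counts)   # raw scale: very wide
range(logged)   # log scale: compressed to a few units
```

On the raw scale a handful of highly expressed genes dominate the range, which is what flattens the volcano plot; after log2 the values are comparable across genes.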
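2. Differential analysis with limma
group_list <- c('c','c','c','E','E','E')
library(limma)
# Set the level order explicitly so that 'c' (control) is the reference group
group <- factor(group_list, levels = c('c', 'E'))
design <- model.matrix(~ group)
fit <- lmFit(data, design)
fit <- eBayes(fit)
# coef = 2 is the E-versus-c comparison; number = Inf returns every gene
deg <- topTable(fit, coef = 2, number = Inf)
write.csv(x = deg, file = '/lab412C/LSM/RNA-SEQ/Volcant/deg1.csv')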
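The design matrix decides what `coef = 2` means, so it can help to inspect it on its own. The following is a minimal base-R sketch (no limma needed); the variable name `group` is an assumption introduced here for illustration.

```r
group_list <- c('c', 'c', 'c', 'E', 'E', 'E')

# Fix the level order so the control group 'c' is the reference;
# without levels =, the order follows the locale's sort order
group <- factor(group_list, levels = c('c', 'E'))
design <- model.matrix(~ group)

colnames(design)   # "(Intercept)" "groupE"
design[, 2]        # 0 for the control samples, 1 for the E samples
```

With this layout the second coefficient is the E-minus-c difference, so a positive logFC in the results means higher expression in the E group.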
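Preview of the resulting deg table
Note:
The gene IDs are written out as row names, so when the CSV is read back in, the first column is named X. It should be named gene: either export the CSV, edit the header, and re-import it, or rename the column in R. The later steps work on the deg$gene column, so using a name like X directly is not recommended!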
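The renaming can be done entirely in R. Below is a small sketch of the round trip on a toy table; the data frame `toy`, the temporary file, and the values are hypothetical and only illustrate where the X column comes from.

```r
# Toy result table with gene IDs as row names (hypothetical values)
toy <- data.frame(logFC = c(1.8, -2.1), P.Value = c(0.001, 0.02),
                  row.names = c("geneA", "geneB"))

tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp)     # row names go into an unnamed first column
back <- read.csv(tmp)
colnames(back)                 # the unnamed first column is read back as "X"

colnames(back)[1] <- "gene"    # rename it in R instead of editing the CSV
back$gene
```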
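deg1 <- read.csv(file = "deg1.csv", header = TRUE, sep = ",")
deg <- deg1
# Rename the first column (the row names, read back as X) to gene
colnames(deg)[1] <- "gene"
logFC_t <- 1  # Different thresholds select different numbers of differential genes, and the later hypergeometric test results can differ widely.
# Not significant -> stable; otherwise up or down depending on logFC
change <- ifelse(deg$P.Value > 0.05, 'stable',
          ifelse(deg$logFC > logFC_t, 'up',
          ifelse(deg$logFC < -logFC_t, 'down', 'stable')))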
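The nested ifelse rule above can be checked on a toy table. This is a minimal sketch with made-up genes and values (the data frame `toy` is an assumption, not the author's deg table).

```r
# Toy table (hypothetical values)
toy <- data.frame(
  gene    = c("g1", "g2", "g3", "g4"),
  logFC   = c(2.3, -1.7, 0.4, 3.0),
  P.Value = c(0.001, 0.010, 0.002, 0.200),
  stringsAsFactors = FALSE
)
logFC_t <- 1

# Significance is tested first, then the direction of the fold change
toy$change <- ifelse(toy$P.Value > 0.05, "stable",
              ifelse(toy$logFC > logFC_t, "up",
              ifelse(toy$logFC < -logFC_t, "down", "stable")))

toy   # g1 up, g2 down, g3 stable (small effect), g4 stable (not significant)
table(toy$change)
```

Note that g4 has the largest fold change but is still labeled stable, because the p-value filter is applied before the logFC filter.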
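The deg table should now look like this:
II. Plotting
This part has two steps: take -log10 of adj.P.Val, and label each gene as up- or down-regulated according to the logFC threshold.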
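deg$logP <- -log10(deg$adj.P.Val)
library(ggpubr)
library(ggthemes)
# Plain volcano plot, no coloring yet
ggscatter(deg, x = 'logFC', y = 'logP') + theme_base()
# Attach the up/down/stable labels (base R, so dplyr is not required)
deg$change <- change
table(deg$change)
# Colors are assigned in alphabetical order of the labels: down, stable, up
ggscatter(deg, x = "logFC", y = "logP", color = "change", palette = c("#9999FF", "gray", "#FF9999"), size = 1) + theme_base()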
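## Add threshold lines:
# 0.43 is the cutoff used in this post; on this axis adj.P.Val = 0.05 would sit at -log10(0.05), about 1.30
ggscatter(deg, x = "logFC", y = "logP", color = "change", palette = c("#9999FF", "gray", "#FF9999"), size = 1) + theme_base() +
  geom_hline(yintercept = 0.43, linetype = "dashed") +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed")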
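Since the y-axis is -log10(adj.P.Val), the position of a horizontal cutoff line is simple arithmetic. The sketch below is only illustrative; the variable names are assumptions, and the post does not say how 0.43 was chosen.

```r
# y-axis position of a cutoff on adj.P.Val
y_05 <- -log10(0.05)      # about 1.30
y_01 <- -log10(0.01)      # 2

# adj.P.Val that a dashed line at y = 0.43 corresponds to
p_at_line <- 10^(-0.43)   # about 0.37

# x-axis: |logFC| = 1 is a two-fold change in either direction
fold <- 2^c(-1, 1)        # 0.5 and 2

c(y_05 = y_05, y_01 = y_01, p_at_line = p_at_line)
```

If the intent is a strict adjusted p-value cutoff of 0.05, the line would go at about 1.30 rather than 0.43.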
## Add gene names
deg$label <- ""
deg <- deg[order(deg$adj.P.Val), ]
# Top 10 up- and down-regulated genes by adjusted p-value
up.gene <- head(deg$gene[which(deg$change == "up")], 10)
down.gene <- head(deg$gene[which(deg$change == "down")], 10)
deg.top10.genes <- c(as.character(up.gene), as.character(down.gene))
deg$label[match(deg.top10.genes, deg$gene)] <- deg.top10.genes
ggscatter(deg, x = "logFC", y = "logP", color = "change", palette = c("#9999FF", "gray", "#FF9999"),
  size = 1,
  label = deg$label,   # blank for every gene outside the top 10
  font.label = 8,
  repel = TRUE,
  xlab = "log2FoldChange",
  ylab = "-log10(Adjust P-value)") + theme_base() +
  geom_hline(yintercept = 0.43, linetype = "dashed") +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed")
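The label selection used above (sort by adjusted p-value, take the top genes per direction, leave everything else blank) can be traced on a toy table. This sketch uses hypothetical genes and `n_top = 2` instead of 10 so the result is easy to read.

```r
# Toy table (hypothetical values)
toy <- data.frame(
  gene      = paste0("g", 1:6),
  adj.P.Val = c(0.04, 0.001, 0.20, 0.01, 0.03, 0.002),
  change    = c("up", "up", "stable", "down", "down", "up"),
  stringsAsFactors = FALSE
)
n_top <- 2   # the post uses 10

toy <- toy[order(toy$adj.P.Val), ]   # most significant first
top_up   <- head(toy$gene[toy$change == "up"],   n_top)
top_down <- head(toy$gene[toy$change == "down"], n_top)

toy$label <- ""                      # blank label = no text drawn for that gene
toy$label[match(c(top_up, top_down), toy$gene)] <- c(top_up, top_down)
toy[, c("gene", "change", "label")]
```

Only g2, g6, g4 and g5 receive a label here; g1 is up-regulated but falls outside the top two, so it stays blank.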
Resulting volcano plot:
Remember to like and share!
The above is based on a very interesting post:
https://cloud.tencent.com/developer/article/1512442