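This post uses gene expression data to draw box plots and then overlays violin plots and dot plots on them (geom_boxplot draws the box plot, geom_violin draws the violin plot, and geom_dotplot and geom_jitter draw the dot plots).
First, a look at the box plot terminology used in R and what each term means: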
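The same terms can be worked out by hand on a small vector. The sketch below is a minimal illustration in base R, assuming the default 1.5 × IQR whisker rule that stat_boxplot documents; the vector `x` is made-up toy data, not part of the gene expression set.

```r
# Toy data (hypothetical): nine ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# lower, middle, upper: the 25th, 50th and 75th percentiles (the box and its median line)
q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
lower  <- q[1]
middle <- q[2]
upper  <- q[3]
iqr    <- upper - lower

# ymin / ymax: the most extreme observations still within 1.5 * IQR of the box
ymin <- min(x[x >= lower - 1.5 * iqr])
ymax <- max(x[x <= upper + 1.5 * iqr])

# Anything beyond the whiskers is drawn as an outlier point
outliers <- x[x < ymin | x > ymax]

print(c(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax))
print(outliers)
```

Here the whiskers run from 1 to 9, the box from 3.25 to 7.75, and the value 30 is plotted as an outlier.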
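Import the data:
genefpkm <- read.csv(file = "clipboard", header = TRUE, sep = "\t")  # read the tab-separated table copied to the clipboard
head(genefpkm)
x_d <- genefpkm  # copy the data frame so that a later mistake does not require re-importing the data
x_d <- as.matrix(x_d)  # convert to a matrix, which the next step requires
x_d <- matrix(log10(as.numeric(x_d)), dimnames = list(row.names(x_d), colnames(x_d)), nrow = dim(x_d)[1])  # take log10 of every value to compress the range; expression values of 0 become -Inf here and are dropped when plotting
group <- c(rep("LPE", 4 * dim(genefpkm)[1]), rep("LPF", 4 * dim(genefpkm)[1]))  # group labels: the first four sample columns are LPE, the next four are LPF
data <- data.frame(expression = c(x_d), sample = rep(colnames(x_d), each = nrow(x_d)), group = group)  # long format, one row per gene and sample, with the group attached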
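To see what this reshaping produces, here is a minimal sketch on a toy table. The data frame `toy`, its gene names, and the two sample columns are made up for illustration; only the transformation mirrors the steps above.

```r
# Hypothetical 3-gene x 2-sample FPKM table (one sample per group)
toy <- data.frame(S1 = c(10, 0, 1000),
                  S2 = c(100, 1, 10),
                  row.names = c("geneA", "geneB", "geneC"))

# log10 of a numeric matrix keeps its dimensions; log10(0) is -Inf
m <- log10(as.matrix(toy))

# c(m) unrolls the matrix column by column, so each sample name
# must be repeated once per gene to stay aligned with the values
long <- data.frame(expression = c(m),
                   sample = rep(colnames(m), each = nrow(m)),
                   group = rep(c("LPE", "LPF"), each = nrow(m)))
print(long)

# The -Inf row is the one ggplot2 drops as a non-finite value
finite_only <- long[is.finite(long$expression), ]
print(nrow(finite_only))
```

Because zero-expression genes vanish from the plot, each box summarizes only the genes that are actually expressed in that sample.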
Start plotting:
library(ggplot2)
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE) +  # add the error bars
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE) +  # notch cuts a notch into each box at the median line; whether the notches of two boxes overlap indicates whether their medians differ. linetype accepts many values, each a different line style (run vignette("ggplot2-specs") in the R console for details)
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))  # set the fill colors manually
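At this point the box in the middle is dashed and the whiskers at both ends are solid. (In fact the whole box plot is dashed; the error bars are solid and simply cover the dashed whiskers.)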
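The notch comparison can also be checked numerically. The sketch below assumes the formula given in the ggplot2 documentation, where a notch extends 1.58 × IQR / sqrt(n) on each side of the median; the helper `notch_limits` and the two toy vectors are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: notch limits around the median,
# using median +/- 1.58 * IQR / sqrt(n) on the finite values only
notch_limits <- function(x) {
  x <- x[is.finite(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  half <- 1.58 * (q[3] - q[1]) / sqrt(length(x))
  c(lower = q[2] - half, upper = q[2] + half)
}

# Two toy samples whose medians are far apart
a <- 1:100
b <- 51:150

notch_a <- notch_limits(a)
notch_b <- notch_limits(b)

# Two intervals overlap when each one starts before the other ends
notches_overlap <- unname(notch_a["upper"] >= notch_b["lower"] &&
                          notch_b["upper"] >= notch_a["lower"])

print(notch_a)
print(notch_b)
print(notches_overlap)
```

When the notches do not overlap, as here, the medians are likely to differ; overlapping notches give no such evidence.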
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE, linetype = 2) +
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, alpha = 1, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +  # this layer draws only the box in the middle, without the whiskers above and below, because ymin = ..lower.. and ymax = ..upper.. give the whiskers zero length; see the first figure in this post for what ymin and ymax mean
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
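Now the box in the middle is solid and the whiskers at both ends are dashed. (In fact everything is still dashed; a solid box has been drawn on top of the middle and covers the dashed one.)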
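The overlay trick can be tried on its own with toy data. This is a minimal sketch, not the plot from this post: the data frame `toy` is randomly generated, and it assumes a recent ggplot2 (3.4.0 or later, as far as I know), where `after_stat(lower)` is the preferred spelling of `..lower..` and line width is set with `linewidth` rather than `size`.

```r
library(ggplot2)

# Hypothetical toy data: two samples with 50 values each
set.seed(42)
toy <- data.frame(
  sample = rep(c("S1", "S2"), each = 50),
  expression = c(rnorm(50, mean = 1), rnorm(50, mean = 2))
)

p <- ggplot(toy, aes(x = sample, y = expression)) +
  # Layer 1: a complete box plot, drawn entirely with dashed lines
  geom_boxplot(linetype = 2) +
  # Layer 2: the same statistics, but with the whisker ends moved onto
  # the box edges, so only a solid box is drawn over the dashed one
  stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
               outlier.shape = NA, linewidth = 1)

print(p)
```

Layer order matters here: the solid box has to be added after the dashed box plot so that it is drawn on top.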
Overlaying the box plot on a violin plot:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_violin(linetype = "dashed", na.rm = TRUE) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE, linetype = 2) +
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE, width = 0.3) +  # narrow the box plot so that it does not coincide with the violin outline
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, na.rm = TRUE) +
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
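What the violin adds over the box is the shape of the distribution: its outline is a mirrored kernel density estimate. The base R sketch below shows roughly the quantity being drawn, assuming the default behavior of trimming the density to the range of the data; the bimodal vector `x` is made-up toy data.

```r
# Hypothetical bimodal data: two clusters that a box plot alone would hide
set.seed(1)
x <- c(rnorm(100, mean = 1, sd = 0.3), rnorm(100, mean = 3, sd = 0.3))

# Kernel density estimate, restricted to the observed range of the data
d <- density(x, from = min(x), to = max(x))

# Scale the density so the widest point of the violin has half-width 1
half_width <- d$y / max(d$y)

# Density at one of the cluster centers versus the gap between the clusters
at_mode <- approx(d$x, d$y, xout = 1)$y
at_gap  <- approx(d$x, d$y, xout = 2)$y
print(c(at_mode = at_mode, at_gap = at_gap))
```

The violin is wide around the two clusters and pinched in between, while the box plot of the same data would show a single box spanning both.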
Removing the error bars and the outliers:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_violin(na.rm = TRUE) +
  geom_boxplot(linetype = 2, na.rm = TRUE, notch = TRUE, width = 0.3, outlier.shape = NA) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
A dot plot can convey the same information as a violin plot:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_boxplot(linetype = 2, na.rm = TRUE, notch = TRUE, width = 0.3, outlier.shape = NA) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.11, method = "histodot", stackratio = 0.01, na.rm = TRUE) +  # there are many points, so shrink the dot size and the stacking ratio to show all of them
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))