R in Practice | RNA-Seq Analysis You Can Do Entirely in R, Part 3

Differential Expression of RNA-seq data

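This post walks through hands-on R code for differential gene expression analysis of RNA-seq data.

Original tutorial: https://bioinformatics-core-shared-training.github.io/RNAseq-R/rna-seq-de.nb.html

It picks up where the previous post left off (RNA-Seq Downstream Analysis You Can Do Entirely in R, Part 2).

With the expression matrix built and normalised during preprocessing, the next step is the differential expression analysis itself.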


Packages needed for this workflow

We also load preprocessing.Rdata, the data saved at the end of the previous preprocessing step.

library(edgeR)
library(limma)
library(Glimma)
library(gplots)
library(org.Mm.eg.db)
load("preprocessing.Rdata")

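In the edgeR workflow, once the expression matrix has been built, we use the model.matrix function to create the design matrix, which holds the grouping information encoded as 0/1 indicator values. The grouping in this analysis covers two factors: the status of the mouse and the cell type. For now we fit a linear model under the assumption that the two factors do not interact.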

> group <- as.character(group)
> group
 [1] "basal.virgin"     "basal.virgin"     "basal.pregnant"   "basal.pregnant"   "basal.lactate"   
 [6] "basal.lactate"    "luminal.virgin"   "luminal.virgin"   "luminal.pregnant" "luminal.pregnant"
[11] "luminal.lactate"  "luminal.lactate" 

Each entry of group encodes both the cell type and the status, so we use strsplit with '.' as the separator to extract the two pieces of information separately.

> type <- sapply(strsplit(group, ".", fixed=TRUE), function(x) x[1])
> status <- sapply(strsplit(group, ".", fixed=TRUE), function(x) x[2])
> type
 [1] "basal"   "basal"   "basal"   "basal"   "basal"   "basal"   "luminal" "luminal" "luminal"
[10] "luminal" "luminal" "luminal"
> status
 [1] "virgin"   "virgin"   "pregnant" "pregnant" "lactate"  "lactate"  "virgin"   "virgin"  
 [9] "pregnant" "pregnant" "lactate"  "lactate" 

With both pieces extracted, we build the design matrix.

> design <- model.matrix(~ type + status)
> design
   (Intercept) typeluminal statuspregnant statusvirgin
1            1           0              0            1
2            1           0              0            1
3            1           0              1            0
4            1           0              1            0
5            1           0              0            0
6            1           0              0            0
7            1           1              0            1
8            1           1              0            1
9            1           1              1            0
10           1           1              1            0
11           1           1              0            0
12           1           1              0            0
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$type
[1] "contr.treatment"

attr(,"contrasts")$status
[1] "contr.treatment"
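
The design matrix above has no column for basal or lactate because R picks the alphabetically first level of each factor as the reference. As a minimal base-R sketch (the names type_f, status_f, design_additive and design_interaction are illustrative and not part of the original tutorial, and choosing virgin as the baseline is an assumption made here for demonstration), the reference level can be set explicitly, and the additive model can be compared with one that includes an interaction term:

```r
# Rebuild the sample annotation used in this post
type   <- rep(c("basal", "luminal"), each = 6)
status <- rep(rep(c("virgin", "pregnant", "lactate"), each = 2), times = 2)

# Assumption for illustration: make "virgin" the baseline status
type_f   <- factor(type, levels = c("basal", "luminal"))
status_f <- factor(status, levels = c("virgin", "pregnant", "lactate"))

# Additive model: cell type and status are assumed not to interact
design_additive <- model.matrix(~ type_f + status_f)
colnames(design_additive)

# Model with a type-by-status interaction, for comparison
design_interaction <- model.matrix(~ type_f * status_f)
ncol(design_interaction)
```

The additive design has four columns and the interaction design has six, so with twelve samples the interaction model leaves fewer residual degrees of freedom for estimating dispersion.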

Estimating the dispersion

# First estimate a common dispersion shared by all genes
> dgeObj <- estimateCommonDisp(dgeObj)
# Then estimate gene-wise dispersions, allowing a possible trend with average count size
> dgeObj <- estimateGLMTrendedDisp(dgeObj)
> dgeObj <- estimateTagwiseDisp(dgeObj)
# Plot the estimated dispersions
> plotBCV(dgeObj)
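
plotBCV displays the biological coefficient of variation (BCV), which is the square root of the negative binomial dispersion. The sketch below is a base-R illustration only: the helper nb_variance and the dispersion value 0.16 are made up for this example and do not come from the dataset. It uses the negative binomial mean-variance relationship, variance = mu + phi * mu^2, to show how the coefficient of variation of the counts approaches the BCV as the mean count grows:

```r
# Variance of a negative binomial count with mean mu and dispersion phi
nb_variance <- function(mu, phi) mu + phi * mu^2

phi <- 0.16            # hypothetical dispersion, not estimated from the data
bcv <- sqrt(phi)       # biological coefficient of variation
mu  <- c(10, 100, 1000)

# Coefficient of variation of the counts at each mean
cv <- sqrt(nb_variance(mu, phi)) / mu
round(cv, 3)
```

For low counts the Poisson-like sampling noise dominates, while for high counts the variation is governed almost entirely by the dispersion.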

Testing for differential expression

> fit <- glmFit(dgeObj, design)
# Take a quick look at the coefficients
> head(coef(fit))
       (Intercept) typeluminal statuspregnant statusvirgin
497097  -11.187922 -7.58804851     -0.7085514  -0.09305118
20671   -12.715063 -1.85287334      0.2269001   0.49554506
27395   -11.221391  0.56368066     -0.1415910  -0.29221577
18777   -10.146793  0.08280255     -0.1845489  -0.48795441
21399    -9.909825 -0.24195503      0.1753606   0.13494615
58175   -16.310131  3.09936215      1.1975518   0.84742701
# Conduct likelihood ratio tests for luminal vs basal and show the top genes:
> lrt.BvsL <- glmLRT(fit, coef=2)
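
One detail worth keeping in mind when reading the coefficients: edgeR documents the glmFit coefficients as being on the natural log scale, whereas the logFC values in its result tables are log2 fold changes. A minimal sketch of the conversion, using the typeluminal coefficient of gene 497097 printed above (the variable names coef_ln and log2_fc are illustrative):

```r
# typeluminal coefficient of gene 497097 from head(coef(fit)), natural log scale
coef_ln <- -7.58804851

# Convert to a log2 fold change by dividing by log(2)
log2_fc <- coef_ln / log(2)
round(log2_fc, 2)
```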

# Show the 10 most significant differentially expressed genes
> topTags(lrt.BvsL)
Coefficient:  typeluminal 
           logFC    logCPM       LR       PValue          FDR
110308 -8.940579 10.264297 24.89789 6.044844e-07 0.0004377961
50916  -8.636503  5.749781 24.80037 6.358512e-07 0.0004377961
12293  -8.362247  6.794788 24.68526 6.749827e-07 0.0004377961
56069  -8.419433  6.124377 24.41532 7.764861e-07 0.0004377961
24117  -9.290691  6.757163 24.32506 8.137331e-07 0.0004377961
12818  -8.216790  8.172247 24.24233 8.494462e-07 0.0004377961
22061  -8.034712  7.255370 24.16987 8.820135e-07 0.0004377961
12797  -9.001419  9.910795 24.12854 9.011487e-07 0.0004377961
50706  -7.697022 10.809629 24.06926 9.293193e-07 0.0004377961
237979 -8.167451  5.215921 24.03528 9.458678e-07 0.0004377961
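
The PValue and FDR columns can be related back to base R. When a single coefficient is tested, the LR statistic is compared against a chi-squared distribution with one degree of freedom, and topTags adjusts p-values with the Benjamini-Hochberg method by default. The sketch below checks the first row of the table and then shows, on made-up p-values (p_demo is hypothetical, not taken from the dataset), why several top genes can share the same FDR:

```r
# LR statistic of the top gene (110308) from the table above
lr <- 24.89789
p  <- pchisq(lr, df = 1, lower.tail = FALSE)
signif(p, 3)   # about 6.04e-07, in line with the PValue column

# Benjamini-Hochberg adjustment on made-up p-values
p_demo   <- c(0.001, 0.002, 0.003, 0.5)
fdr_demo <- p.adjust(p_demo, method = "BH")
fdr_demo
```

Because the adjustment enforces monotonicity, the three smallest made-up p-values all end up with the same adjusted value, which mirrors the identical FDR entries in the table.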

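Finally, save the differential expression results (in a real analysis you can skip this step and simply carry on working in the same session).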

save(lrt.BvsL,dgeObj,group,file="DE.Rdata")