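The puzzling Seurat::DoHeatmap warning
I have recently been working through the Seurat tutorial (Seurat version 3.0.2), and the second-to-last step, the heatmap of marker genes, kept producing a warning.
The call in question: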
DoHeatmap(pbmc, features = top10$gene) + NoLegend()
Output:
Warning message:
In DoHeatmap(pbmc, features = top10$gene) :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: NEAT1, SLC25A39, AC090498.1, CTSS, DNAJB1, HSP90AA1, PRF1, NKG7, KLRD1, CD8B, DUSP4, CD8A, CCL5, CD27, GZMK, CD69, TNFAIP3, ZFP36L2, IL7R
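The plot was generated, but when I counted the genes, many were indeed missing. Clusters 0 and 1 in particular had hardly any left.
I tried tuning the plot myself, and after a whole evening this was as far as I got:
It is ugly, so back to studying the code in the package /(ㄒoㄒ)/~~
DoHeatmap
Looking at the source, I found this block: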
if (any(!features %in% possible.features)) {
  bad.features <- features[!features %in% possible.features]
  features <- features[features %in% possible.features]
  if (length(x = features) == 0) {
    stop("No requested features found in the ",
         slot, " slot for the ", assay, " assay.")
  }
  warning("The following features were omitted as they were not found in the ",
          slot, " slot for the ", assay, " assay: ",
          paste(bad.features, collapse = ", "))
}
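bad.features comes from comparing features against possible.features. If none of the requested features survive the filter, the function stops with an error; if at least one remains, it only warns: "The following features were omitted as they were not found in the ..."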
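The filtering step described above can be reproduced in base R without Seurat. This is only a minimal sketch: the gene names are made up for illustration, and possible.features here stands in for whatever rows the chosen slot actually contains.

```r
# Hypothetical inputs for illustration only
features <- c("CD8A", "NKG7", "LYZ", "MS4A1")     # genes requested for the heatmap
possible.features <- c("LYZ", "MS4A1", "PPBP")    # stand-in for the rows of scale.data

# Same set logic as in the excerpt: split requested genes into missing and present
bad.features <- features[!features %in% possible.features]
kept.features <- features[features %in% possible.features]

print(bad.features)   # genes that would be listed in the warning
print(kept.features)  # genes that would still be plotted
```

Any gene that is not a row of the slot being plotted is dropped silently from the figure and only mentioned in the warning text.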
function (object, features = NULL, cells = NULL, group.by = "ident",
group.bar = TRUE, disp.min = -2.5, disp.max = NULL, slot = "scale.data",
assay = NULL, label = TRUE, size = 5.5, hjust = 0, angle = 45,
raster = TRUE, draw.lines = TRUE, lines.width = NULL, group.bar.height = 0.02,
combine = TRUE)
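Looking back at the signature, DoHeatmap() has a default argument slot = "scale.data", so it plots from scale.data automatically.
Now look at the structure of this Seurat object:
Aha, there is the problem: counts (presumably the raw read counts) and data both have 20647 rows, one per gene, but scale.data has only 2000 rows.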
> pbmc <- ScaleData(
+ object = pbmc,
+ do.scale = TRUE,
+ do.center = FALSE,
+ vars.to.regress = c("percent.mt"))
I re-ran ScaleData() and the data did not change at all...
Never mind, let's try plotting from data instead:
DoHeatmap(pbmc, features = top10$gene,slot = "data")
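Still ugly, which indirectly shows how good the Seurat developers are at visualization. But there was no warning, and the genes that had disappeared are all back.
Next, try assigning to scale.data directly: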
pbmc@assays$RNA@scale.data <- scale(pbmc@assays$RNA@data, scale = TRUE)
DoHeatmap(pbmc, features = top10$gene,size = 0.5,slot = "scale.data") + NoLegend()
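This time there was no warning either. The colors are still a bit off, probably because the scaling settings differ; fully reproducing the tutorial figure would likely require reading the ScaleData code. Still, it is basically usable. I increased the height so the labels do not crowd together, although it is better to save the plot as a ggplot object and adjust it when assembling the final figure.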
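One likely reason the colors differ: base R's scale() standardizes each column, which in a genes-by-cells matrix means each cell, whereas ScaleData() standardizes each gene across cells (and by default also clips values at scale.max = 10). The sketch below uses a small made-up matrix, not the pbmc data, to show the difference between the two directions.

```r
set.seed(1)
# Toy genes-by-cells matrix; the values are hypothetical
m <- matrix(rnorm(15, mean = 2), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))

by.column <- scale(m)        # standardizes each cell across genes (what scale() does)
by.row <- t(scale(t(m)))     # standardizes each gene across cells

print(round(colMeans(by.column), 10))  # every cell has mean 0
print(round(rowMeans(by.row), 10))     # every gene has mean 0
```

Transposing before and after scale() gives the per-gene z-scores that a heatmap of marker genes normally expects.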
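Summary:
The main cause of the problem is that the scale.data produced by ScaleData() (stored inside the Seurat object) contains fewer genes than counts and data: 2000 rows here instead of 20647.
Because the Seurat object is nested several layers deep and I am not very familiar with working with S4 objects, the cause was hard to spot at first.
Also, when working with several Seurat objects, remember to run rm() + gc() now and then to free memory. Even with 24 GB of RAM I hit 99% memory usage several times (╯‵□′)╯︵┻━┻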
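On the scale.data issue summarized above: ScaleData() takes a features argument that defaults to the variable features, which would explain both the 2000 rows and why re-running it without that argument changed nothing. A minimal sketch of scaling every gene follows. It assumes the Seurat package is installed and that obj is a Seurat object such as the tutorial's pbmc; rescale_all_genes is a hypothetical helper name, not part of Seurat.

```r
# Hypothetical helper: scale all genes, not only the variable features.
# Requires the Seurat package when it is actually called.
rescale_all_genes <- function(obj) {
  all.genes <- rownames(obj)                    # every gene in the default assay
  Seurat::ScaleData(obj, features = all.genes)  # fills scale.data for all of them
}

# Usage (not run here, needs the tutorial objects):
# pbmc <- rescale_all_genes(pbmc)
# DoHeatmap(pbmc, features = top10$gene) + NoLegend()
```

Scaling all genes makes scale.data a dense matrix with one row per gene, so it uses noticeably more memory than scaling only the variable features.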