本教程介紹了Seurat包與sctransform 包一起分析時候的一般方法。首先你也許會問:sctransform包是用來干嘛的?
這是一個值得關心和探討的問題,我也將做一個簡單的回答。最好的辦法當然是到包所在的官網去讀人家的自我介紹了sctransform: Variance Stabilizing Transformations for Single Cell UMI Data
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial regression model with regularized parameters. As part of the same regression framework, this package also provides functions for batch correction, and data correction. See Hafemeister and Satija 2019 <doi:10.1101/576827> for more details.
有沒有感覺很強大,在Seurat里面被封為SCTransform()函數(shù)可以:
- 代替 NormalizeData, ScaleData, and FindVariableFeatures.三個函數(shù)
- 轉換完了在SCT assay 里
- 在均一化的同時可以移除線粒體細胞等的影響
library(Seurat)
packageVersion("Seurat")
# https://satijalab.org/seurat/mca.html
library(dplyr)
library(ggsci)
library(ggplot2)
library(sctransform)
# Load the PBMC dataset
list.files("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
?Read10X
pbmc.data <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
# Initialize the Seurat object with the raw (non-normalized data).
?CreateSeuratObject
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
一步數(shù)據均一化,也是比較慢的。
# store mitochondrial percentage in object meta data
pbmc <- PercentageFeatureSet(pbmc, pattern = "^MT-", col.name = "percent.mt")
# run sctransform
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
pbmc
An object of class Seurat
26286 features across 2700 samples within 2 assays
Active assay: SCT (12572 features)
1 other assay present: RNA
然后是標準操作。
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindClusters(pbmc, verbose = FALSE)
DimPlot(pbmc, label = TRUE) + NoLegend()
為什么在轉化之后PC軸要比之前我們用NormalizeData 的時候要多呢?
- 更有效地消除技術影響
- 更高的PC維度可能代表一些微妙的生物學差異
- In addition, sctransform returns 3,000 variable features by default, instead of 2,000. The rationale is similar, the additional variable features are less likely to be driven by technical differences across cells, and instead may represent more subtle biological fluctuations.
我們還可以像下面這樣去運行:
pbmc <- CreateSeuratObject(pbmc_data) %>% PercentageFeatureSet(pattern = "^MT-", col.name = "percent.mt") %>%
SCTransform(vars.to.regress = "percent.mt") %>% RunPCA() %>% FindNeighbors(dims = 1:30) %>%
RunUMAP(dims = 1:30) %>% FindClusters()
Where are normalized values stored for sctransform?
- pbmc[["SCT"]]@scale.data
- pbmc[["SCT"]]@data 用于可視化的數(shù)據
- pbmc[["SCT"]]@counts 校正后的count
- You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the scale.data slot) themselves. This is not currently supported in Seurat v3, but will be soon.
第四條很容易讓人困惑,先不說技術上的,僅字面意思:這里給你的并不是最好的,最好的目前還沒有。這句話不能和最好的方法出來的時候再說嗎?以后的方法比現(xiàn)在的好,這很容易理解啊,但是你要告訴我最好的總在未來,我就糾結了。而且,FAQ4中建議使用RNA assay做差異分析,而在這里有建議使用SCT assay的scale.data(而且This is not currently supported in Seurat v3, but will be soon.)。
官方的解釋是這樣的:
Hi,
Thanks for the question, and I apologize for the confusion. We're working on allowing for DE to be performed on pearson residuals from SCTransform in an optimal way. Until then, its easiest for us to advise users just to use the RNA assay. But if you're really excited to give it a try, it is not invalid to do so. Still, in the interest of simplicity, we'll keep the FAQ as-is.
best,
Rahul
https://github.com/satijalab/seurat/issues/1421
# These are now standard steps in the Seurat workflow for visualization and clustering Visualize
# canonical marker genes as violin plots.
VlnPlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7", "ISG15", "CD3D"),
pt.size = 0.2, ncol = 4)
# Visualize canonical marker genes on the sctransform embedding.
FeaturePlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7"), pt.size = 0.2,
ncol = 3)
FeaturePlot(pbmc, features = c("CD3D", "ISG15", "TCL1A", "FCER2", "XCL1", "FCGR3A"), pt.size = 0.2,
ncol = 3)