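Seurat is one of the most widely used packages for single-cell analysis. Working with the Seurat object is one of the harder parts of an analysis, so here I have collected, based on my own understanding, some common operations on Seurat objects. Corrections are welcome if anything is missing or wrong.
The first step is to create a Seurat object from 10X data or from other data. The code below is copied directly from the official tutorial (https://satijalab.org/seurat/essential_commands.html); other code works as well.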
pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
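When no 10X directory is at hand, the same constructor also accepts a plain gene-by-cell count matrix. The following is a minimal sketch under that assumption: the simulated Poisson counts, the gene and cell names, and the project name "toy" are all made up for illustration.

```r
library(Seurat)
library(Matrix)

set.seed(1)
# Simulated counts: 50 genes (rows) x 30 cells (columns); illustrative only
toy_counts <- matrix(rpois(50 * 30, lambda = 2), nrow = 50, ncol = 30)
rownames(toy_counts) <- paste0("Gene", 1:50)
colnames(toy_counts) <- paste0("Cell", 1:30)

# Convert to a sparse matrix, the format Read10X also returns
toy_counts <- Matrix(toy_counts, sparse = TRUE)

# Build the object; "toy" is a hypothetical project name
toy <- CreateSeuratObject(counts = toy_counts, project = "toy")
dim(toy)  # genes x cells
```
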
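Next, run ?seurat in RStudio to open the help page. The slot list below is the help text for the seurat class of Seurat v2; objects created by Seurat v3 and later belong to the Seurat class and have a different set of slots.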
Each Seurat object has a number of slots which store information. Key slots to access are listed below.
Slots:
raw.data
The raw project data
data
The normalized expression matrix (log-scale)
scale.data
scaled (default is z-scoring each gene) expression matrix; used for dimensional reduction and heatmap visualization
var.genes
Vector of genes exhibiting high variance across single cells
is.expr
Expression threshold to determine if a gene is expressed (0 by default)
ident
The 'identity class' for each cell
meta.data
Contains meta-information about each cell, starting with number of genes detected (nGene) and the original identity class (orig.ident); more information is added using AddMetaData
project.name
Name of the project (for record keeping)
dr
List of stored dimensional reductions; named by technique
assay
List of additional assays for multimodal analysis; named by technique
hvg.info
The output of the mean/variability analysis for all genes
imputed
Matrix of imputed gene scores
cell.names
Names of all single cells (column names of the expression matrix)
cluster.tree
List where the first element is a phylo object containing the phylogenetic tree relating different identity classes
snn
Sparse matrix object representation of the SNN graph
calc.params
Named list to store all calculation-related parameter choices
kmeans
Stores output of gene-based clustering from DoKMeans
spatial
Stores internal data and calculations for spatial mapping of single cells
misc
Miscellaneous spot to store any data alongside the object (for example, gene lists)
version
Version of package used in object creation
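In a real analysis, though, an object does not carry this many variables. You can use @ (for slots, e.g. pbmc@meta.data) or $ (for metadata columns, e.g. pbmc$orig.ident) to access the ones that exist.
Of these, the variables I use in the later analysis are orig.ident, group, and seurat_clusters, which store the sample name, the group, and the cluster assignment, respectively.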
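The difference between the two accessors can be sketched on pbmc_small, the small example object shipped with Seurat. This object is an assumption made for illustration; it has orig.ident but not necessarily the group or seurat_clusters columns mentioned above.

```r
library(Seurat)
data("pbmc_small")  # small example object shipped with Seurat

slot_names <- slotNames(pbmc_small)  # every slot that can be reached with @
meta <- pbmc_small@meta.data         # @ returns a whole slot (here a data frame)
samples <- pbmc_small$orig.ident     # $ returns one metadata column, named by cell

head(meta)
table(samples)
```
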
1. Getting basic information
First, print the Seurat object directly:
> pbmc # test data with PCA and UMAP already run
An object of class Seurat
25540 features across 46636 samples within 2 assays
Active assay: integrated (2000 features, 2000 variable features)
1 other assay present: RNA
2 dimensional reductions calculated: pca, umap
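The pieces of that printout can also be queried one at a time. A minimal sketch, again assuming the bundled pbmc_small object rather than the test data printed above:

```r
library(Seurat)
data("pbmc_small")

assay_names <- Assays(pbmc_small)          # names of all assays in the object
active_assay <- DefaultAssay(pbmc_small)   # the assay reported as "Active assay"
reduction_names <- Reductions(pbmc_small)  # names of the stored dimensional reductions

assay_names
active_assay
reduction_names
```
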
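Some basic information that can be queried and extracted:
colnames(x = pbmc) # the ID (barcode) of each cell
Cells(pbmc) # same as above: the ID of each cell
rownames(x = pbmc) # gene names
ncol(x = pbmc) # number of columns (cells)
nrow(x = pbmc) # number of rows (genes)
dim(pbmc) # number of rows and columns
# Get the cell types (identity classes)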
Idents(object = pbmc)
levels(pbmc)
table(Idents(pbmc)) # table of the number of cells in each cell type
# Other operations on cell identities
# Stash cell identity classes
pbmc[["old.ident"]] <- Idents(object = pbmc)
pbmc <- StashIdent(object = pbmc, save.name = "old.ident")
# Set identity classes
Idents(object = pbmc) <- "CD4 T cells"
Idents(object = pbmc, cells = 1:10) <- "CD4 T cells"
# Set identity classes to an existing column in meta data
Idents(object = pbmc, cells = 1:10) <- "orig.ident"
Idents(object = pbmc) <- "orig.ident"
# Rename identity classes
pbmc <- RenameIdents(object = pbmc, `CD4 T cells` = "T Helper cells")
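A common pattern built from these commands is to stash the current identities, switch to a metadata column for one step, and then switch back. The sketch below assumes the bundled pbmc_small object and its groups metadata column; with your own data, substitute a column that actually exists.

```r
library(Seurat)
data("pbmc_small")
obj <- pbmc_small

# Stash the current identities in a metadata column
obj[["old.ident"]] <- Idents(obj)

# Switch identities to an existing metadata column and count cells per class
Idents(obj) <- "groups"
cells_per_group <- table(Idents(obj))
cells_per_group

# Restore the stashed identities
Idents(obj) <- "old.ident"
```
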
We can get all the cell types directly with levels(pbmc).
2. Filtering with the subset function
# Select one or more cell types
subset(x = pbmc, idents = "B cells")
subset(x = pbmc, idents = c("CD4 T cells", "CD8 T cells"), invert = TRUE)
# Subset on the expression level of a gene/feature
subset(x = pbmc, subset = MS4A1 > 3)
# Subset on a combination of criteria
subset(x = pbmc, subset = MS4A1 > 3 & PC_1 > 5)
subset(x = pbmc, subset = MS4A1 > 3, idents = "B cells")
# Subset on a value in the object meta data
subset(x = pbmc, subset = orig.ident == "Replicate1")
# Downsample the number of cells per identity class
subset(x = pbmc, downsample = 100)
# Subset on genes (features)
subset(x = pbmc_small, features = VariableFeatures(object = pbmc_small))
# Matrix-style indexing also works
pbmc_small_sub <- pbmc_small[, pbmc_small@meta.data$seurat_clusters %in% c(0, 2)]
# The next line requires Idents(pbmc_small) to currently hold the cell types
pbmc_small_sub <- pbmc_small[, Idents(pbmc_small) %in% c("T cell", "B cell")]
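The two styles should select the same cells. As a quick check, here is a sketch that assumes the bundled pbmc_small object and takes its first two identity levels, whatever they are called, instead of the cell type names used above:

```r
library(Seurat)
data("pbmc_small")

# Take the first two identity classes present in the object
keep <- levels(pbmc_small)[1:2]

sub_by_function <- subset(x = pbmc_small, idents = keep)
sub_by_index <- pbmc_small[, Idents(pbmc_small) %in% keep]

# Both objects should contain exactly the same cells
setequal(colnames(sub_by_function), colnames(sub_by_index))
```
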
3. Retrieving data
# View the metadata data frame, stored in object@meta.data
pbmc[[]]
# Retrieve specific values from the metadata
pbmc$nCount_RNA
pbmc[[c("percent.mito", "nFeature_RNA")]]
# Add group information as metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels
# Use GetAssayData / SetAssayData to retrieve or set data in an expression matrix ('counts', 'data', and 'scale.data')
GetAssayData(object = pbmc, slot = "counts")
pbmc <- SetAssayData(object = pbmc, slot = "scale.data", new.data = new.data)
# Get cell embeddings and feature loadings
Embeddings(object = pbmc, reduction = "pca")
Loadings(object = pbmc, reduction = "pca")
Loadings(object = pbmc, reduction = "pca", projected = TRUE)
# FetchData can pull anything from expression matrices, cell embeddings, or metadata
FetchData(object = pbmc, vars = c("PC_1", "percent.mito", "MS4A1"))
Because variable names can differ between versions, the prefixes that FetchData expects can be looked up with Key(pbmc).
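One way to avoid hard-coding a prefix is to build the variable name from the key itself. This is a sketch that assumes the bundled pbmc_small object, which has a reduction named "pca" and an nCount_RNA metadata column; the gene is simply the first one in the object.

```r
library(Seurat)
data("pbmc_small")

keys <- Key(pbmc_small)                # keys of the assays and reductions in the object
pc_prefix <- Key(pbmc_small[["pca"]])  # key of the PCA reduction only

vars <- c(paste0(pc_prefix, 1),        # first principal component
          "nCount_RNA",                # a metadata column
          rownames(pbmc_small)[1])     # the first gene in the object
fetched <- FetchData(object = pbmc_small, vars = vars)

keys
head(fetched)
```
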
4. Computation
# Get the average expression
# First choose the grouping to average over: cell type ("CellType"), cluster ("seurat_clusters"), or sample ("orig.ident").
# These column names are only examples; use the ones actually present in your data.
Idents(scRNA_data) <- "seurat_clusters"
AverageExp <- AverageExpression(scRNA_data)
expr <- AverageExp$RNA
# Add a group prefix to the column names; here the prefix is "Cluster"
colnames(expr) <- paste0("Cluster", colnames(expr))
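The same steps can be tried end to end on the bundled pbmc_small object. This is a sketch under that assumption: it averages over the object's current identities, and the exact column names returned by AverageExpression may differ between Seurat versions.

```r
library(Seurat)
data("pbmc_small")

# Average expression per identity class; returns a list with one matrix per assay
avg <- AverageExpression(pbmc_small)
expr <- avg$RNA

# Prefix every column name with "Cluster"
colnames(expr) <- paste0("Cluster", colnames(expr))
head(expr)
```
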
5. Replacing/modifying data
Sometimes the data in a Seurat object needs to be replaced or modified.
library(Seurat)
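# Rename cell IDs, changing a trailing -1 to _1 (the $ anchors the match to the end of the name)
new_obj <- RenameCells(obj, new.names = sub("-1$", "_1", colnames(obj)))
# With multiple samples, pick out the cells of one sample
barcode_names <- obj$orig.ident # vector of sample names, named by cell barcode
sampleA_barcode_name <- names(barcode_names[barcode_names == "sampleA"])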
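Both operations can be tried on the bundled pbmc_small object. This is a sketch with two assumptions: the "-1" suffix is added artificially, because the barcodes in pbmc_small do not carry one, and the sample name is read from the data rather than being "sampleA".

```r
library(Seurat)
data("pbmc_small")

# Add "-1" first to mimic 10X-style barcodes (illustrative only)
obj <- RenameCells(pbmc_small, new.names = paste0(colnames(pbmc_small), "-1"))

# Rewrite only the trailing "-1" as "_1"
obj <- RenameCells(obj, new.names = sub("-1$", "_1", colnames(obj)))

# Barcodes belonging to one sample; the sample name is taken from the first cell
sample_name <- as.character(obj$orig.ident[1])
sample_cells <- colnames(obj)[obj$orig.ident == sample_name]
head(sample_cells)
```
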
References:
1、https://satijalab.org/seurat/essential_commands.html
2、https://satijalab.org/seurat/v3.0/interaction_vignette.html
3、http://www.lxweimin.com/p/d43f16bdfed9