單細胞轉錄組基礎分析六:偽時間分析

本文是參考學習單細胞轉錄組基礎分析六:偽時間分析的學習筆記。可能根據學習情況有所改動。
Monocle進行偽時間分析的核心技術是一種機器學習算法——反向圖形嵌入 (Reversed Graph Embedding)。它分析的前提需要一張展現細胞轉錄特征相似性關系的圖,Monocle2使用DDTree降維圖,Monocle3使用UMAP降維圖。Monocle的機器學習算法可以依據上述降維圖形,學習描述細胞如何從一種狀態過渡到另一種狀態的軌跡。Monocle假設軌跡是樹狀結構,一端是“根”,另一端是“葉”。一個細胞在生物過程的開始,從根開始沿著主干進行,直到它到達第一個分支。然后,該細胞必須選擇一條路徑,并沿著樹移動越來越遠,直到它到達一片葉子。一個細胞的假時間值是它返回根所需的距離。降維方面monocle與seurat的過程大同小異,首先進行數據標準化,其次選擇部分基因代表細胞轉錄特征 ,最后選用適當的算法降維。對Monocle原理感興趣的同學可以登錄官網查看:

http://cole-trapnell-lab.github.io/monocle-release/

數據導入與處理

軌跡分析的前提是待分析的細胞有緊密的發育關系,PBMC細胞不是很好的的示例數據,我們選擇T細胞群體演示一下。Monocle建議導入原始表達矩陣,由它完成數據標準化和其他預處理。

dir.create("pseudotime")

expressionFamily參數用于指定表達矩陣的數據類型,有幾個選項可以選擇:

  • 稀疏矩陣用negbinomial.size(),

  • FPKM值用tobit(),

  • logFPKM值用gaussianff()

mycds是Monocle為我們的數據生成的對象,相當于我們在seurat使用的scRNA對象。數據導入后需要進行標準化和其他預處理:

mycds <- estimateSizeFactors(mycds)

與seurat把標準化后的表達矩陣保存在對象中不同,monocle只保存一些中間結果在對象中,需要用時再用這些中間結果轉化。經過上面三個函數的計算,mycds對象中多了SizeFactors、Dipersions、num_cells_expressed和num_genes_expressed等信息。

選擇代表性基因

完成數據導入和預處理后,就可以考慮選擇哪些基因代表細胞的發育特征,Monocle官網教程提供了4個選擇方法:

  • 選擇發育差異表達基因

  • 選擇clusters差異表達基因

  • 選擇離散程度高的基因

  • 自定義發育marker基因

前三種都是無監督分析方法,細胞發育軌跡生成完全不受人工干預;最后一種是半監督分析方法,可以使用先驗知識輔助分析。第一種方法要求實驗設計有不同的時間點,對起點和終點的樣本做基因表達差異分析,挑選顯著差異的基因進行后續分析。對于沒有時序設計的實驗樣本,可以使用第2、3種方法挑選基因。第2種方法要先對細胞降維聚類,然后用clusters之間差異表達的基因開展后續分析。Monocle有一套自己的降維聚類方法,與seurat的方法大同小異,很多教程直接使用seurat的差異分析結果。第3種方法使用離散程度高的基因開展分析,seurat有挑選高變基因的方法,monocle也有自己選擇的算法。本案例數據不具備使用第1、4種方法的條件,因此這里只演示2、3種方法的使用。

##使用clusters差異表達基因
圖片

選擇不同的基因集,擬時分析的結果不同,實踐中可以幾種方法都試一下。

降維及****細胞排序

使用disp.genes開展后續分析

#降維
圖片

使用diff.genes分析的結果

圖片

軌跡圖分面顯示

p1 <- plot_cell_trajectory(mycds, color_by = "State") + facet_wrap(~State, nrow = 1)
圖片

Monocle基因可視化

s.genes <- c("ITGB1","CCR7","KLRB1","GNLY")
圖片

擬時相關基因聚類熱圖

Monocle中differentialGeneTest()函數可以按條件進行差異分析,將相關參數設為fullModelFormulaStr = "~sm.ns(Pseudotime)"時,可以找到與擬時先關的差異基因。我們可以按一定的條件篩選基因后進行差異分析,全部基因都輸入會耗費比較長的時間。建議使用cluster差異基因或高變基因輸入函數計算。分析結果主要依據qval區分差異的顯著性,篩選之后可以用plot_pseudotime_heatmap函數繪制成熱圖。

#cluster差異基因
圖片

BEAM分析

單細胞軌跡中通常包括分支,它們的出現是因為細胞的表達模式不同。當細胞做出命運選擇時,或者遺傳、化學或環境擾動時,就會表現出不同的基因表達模式。BEAM(Branched expression analysis modeling)是一種統計方法,用于尋找以依賴于分支的方式調控的基因。

disp_table <- dispersionTable(mycds)
圖片
> dir.create("pseudotime")
> scRNAsub <- readRDS("scRNAsub.rds")  #scRNAsub是上一節保存的T細胞子集seurat對象
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'scRNAsub.rds', probable reason 'No such file or directory'
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
 [1] splines   parallel  stats4    stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] monocle_2.18.0              DDRTree_0.1.5               irlba_2.3.3                
 [4] VGAM_1.1-5                  Matrix_1.2-18               patchwork_1.1.1            
 [7] celldex_1.0.0               SingleR_1.4.1               SummarizedExperiment_1.20.0
[10] Biobase_2.50.0              GenomicRanges_1.42.0        GenomeInfoDb_1.26.2        
[13] IRanges_2.24.1              S4Vectors_0.28.1            BiocGenerics_0.36.0        
[16] MatrixGenerics_1.2.0        matrixStats_0.57.0          forcats_0.5.0              
[19] stringr_1.4.0               dplyr_1.0.2                 purrr_0.3.4                
[22] readr_1.4.0                 tidyr_1.1.2                 tibble_3.0.4               
[25] ggplot2_3.3.3               tidyverse_1.3.0             Seurat_3.2.3               

loaded via a namespace (and not attached):
  [1] reticulate_1.18               tidyselect_1.1.0             
  [3] RSQLite_2.2.2                 AnnotationDbi_1.52.0         
  [5] htmlwidgets_1.5.3             docopt_0.7.1                 
  [7] grid_4.0.2                    combinat_0.0-8               
  [9] BiocParallel_1.24.1           Rtsne_0.15                   
 [11] munsell_0.5.0                 codetools_0.2-18             
 [13] ica_1.0-2                     future_1.21.0                
 [15] miniUI_0.1.1.1                withr_2.3.0                  
 [17] fastICA_1.2-2                 colorspace_2.0-0             
 [19] rstudioapi_0.13               ROCR_1.0-11                  
 [21] tensor_1.5                    listenv_0.8.0                
 [23] labeling_0.4.2                slam_0.1-48                  
 [25] GenomeInfoDbData_1.2.4        polyclip_1.10-0              
 [27] bit64_4.0.5                   farver_2.0.3                 
 [29] pheatmap_1.0.12               parallelly_1.23.0            
 [31] vctrs_0.3.6                   generics_0.1.0               
 [33] BiocFileCache_1.14.0          R6_2.5.0                     
 [35] rsvd_1.0.3                    bitops_1.0-6                 
 [37] spatstat.utils_1.20-2         DelayedArray_0.16.0          
 [39] assertthat_0.2.1              promises_1.1.1               
 [41] scales_1.1.1                  gtable_0.3.0                 
 [43] beachmat_2.6.4                globals_0.14.0               
 [45] goftest_1.2-2                 rlang_0.4.9                  
 [47] lazyeval_0.2.2                broom_0.7.3                  
 [49] BiocManager_1.30.10           yaml_2.2.1                   
 [51] reshape2_1.4.4                abind_1.4-5                  
 [53] modelr_0.1.8                  backports_1.2.0              
 [55] httpuv_1.5.4                  tools_4.0.2                  
 [57] ellipsis_0.3.1                RColorBrewer_1.1-2           
 [59] sessioninfo_1.1.1             ggridges_0.5.3               
 [61] Rcpp_1.0.5                    plyr_1.8.6                   
 [63] sparseMatrixStats_1.2.1       zlibbioc_1.36.0              
 [65] RCurl_1.98-1.2                densityClust_0.3             
 [67] rpart_4.1-15                  deldir_0.2-3                 
 [69] viridis_0.5.1                 pbapply_1.4-3                
 [71] cowplot_1.1.1                 zoo_1.8-8                    
 [73] haven_2.3.1                   ggrepel_0.9.0                
 [75] cluster_2.1.0                 fs_1.5.0                     
 [77] magrittr_2.0.1                RSpectra_0.16-0              
 [79] data.table_1.13.6             scattermore_0.7              
 [81] lmtest_0.9-38                 reprex_0.3.0                 
 [83] RANN_2.6.1                    fitdistrplus_1.1-3           
 [85] hms_0.5.3                     mime_0.9                     
 [87] xtable_1.8-4                  sparsesvd_0.2                
 [89] readxl_1.3.1                  gridExtra_2.3                
 [91] HSMMSingleCell_1.10.0         compiler_4.0.2               
 [93] KernSmooth_2.23-18            crayon_1.3.4                 
 [95] htmltools_0.5.1.1             mgcv_1.8-33                  
 [97] later_1.1.0.1                 lubridate_1.7.9.2            
 [99] DBI_1.1.0                     ExperimentHub_1.16.0         
[101] dbplyr_2.0.0                  MASS_7.3-53                  
[103] rappdirs_0.3.1                cli_2.2.0                    
[105] igraph_1.2.6                  pkgconfig_2.0.3              
[107] plotly_4.9.3                  xml2_1.3.2                   
[109] XVector_0.30.0                rvest_0.3.6                  
[111] digest_0.6.27                 sctransform_0.3.2            
[113] RcppAnnoy_0.0.18              spatstat.data_1.7-0          
[115] cellranger_1.1.0              leiden_0.3.6                 
[117] uwot_0.1.10                   DelayedMatrixStats_1.12.3    
[119] curl_4.3                      shiny_1.5.0                  
[121] lifecycle_0.2.0               nlme_3.1-151                 
[123] jsonlite_1.7.2                BiocNeighbors_1.8.2          
[125] viridisLite_0.3.0             limma_3.46.0                 
[127] fansi_0.4.1                   pillar_1.4.7                 
[129] lattice_0.20-41               fastmap_1.0.1                
[131] httr_1.4.2                    survival_3.2-7               
[133] interactiveDisplayBase_1.28.0 glue_1.4.2                   
[135] qlcMatrix_0.9.7               FNN_1.1.3                    
[137] spatstat_1.64-1               png_0.1-7                    
[139] BiocVersion_3.12.0            bit_4.0.4                    
[141] stringi_1.5.3                 blob_1.2.1                   
[143] BiocSingular_1.6.0            AnnotationHub_2.22.0         
[145] memoise_1.1.0                 future.apply_1.7.0           
> #圖片
> ##保存數據
> saveRDS(scRNAsub, file="scRNAsub.rds")
> scRNAsub <- readRDS("scRNAsub.rds")  #scRNAsub是上一節保存的T細胞子集seurat對象
> data <- as(as.matrix(scRNAsub@assays$RNA@counts), 'sparseMatrix')
> fd <- new('AnnotatedDataFrame', data = fData)
Error in value[[3L]](cond) : 
  AnnotatedDataFrame 'data' is class 'standardGeneric' but should be or extend 'data.frame'
  AnnotatedDataFrame 'initialize' could not update varMetadata:
  perhaps pData and varMetadata are inconsistent?
> data <- as(as.matrix(scRNAsub@assays$RNA@counts), 'sparseMatrix')
> pd <- new('AnnotatedDataFrame', data = scRNAsub@meta.data)
> scRNAsub <- readRDS("scRNAsub.rds")  #scRNAsub是上一節保存的T細胞子集seurat對象
> data <- as(as.matrix(scRNAsub@assays$RNA@counts), 'sparseMatrix')
> pd <- new('AnnotatedDataFrame', data = scRNAsub@meta.data)
> fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
> fd <- new('AnnotatedDataFrame', data = fData)
> mycds <- newCellDataSet(data,
+                         phenoData = pd,
+                         featureData = fd,
+                         expressionFamily = negbinomial.size())
> mycds <- estimateSizeFactors(mycds)
> mycds <- estimateDispersions(mycds, cores=4, relative_expr = TRUE)
Removing 276 outliers
Warning messages:
1: `group_by_()` is deprecated as of dplyr 0.7.0.
Please use `group_by()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: `select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
> ##使用clusters差異表達基因
> diff.genes <- read.csv('subcluster/diff_genes_wilcox.csv')
> diff.genes <- subset(diff.genes,p_val_adj<0.01)$gene
> mycds <- setOrderingFilter(mycds, diff.genes)
> p1 <- plot_ordering_genes(mycds)
> ##使用seurat選擇的高變基因
> var.genes <- VariableFeatures(scRNAsub)
> mycds <- setOrderingFilter(mycds, var.genes)
> p2 <- plot_ordering_genes(mycds)
> ##使用monocle選擇的高變基因
> disp_table <- dispersionTable(mycds)
> disp.genes <- subset(disp_table, mean_expression >= 0.1 & dispersion_empirical >= 1 * dispersion_fit)$gene_id
> mycds <- setOrderingFilter(mycds, disp.genes)
> p3 <- plot_ordering_genes(mycds)
> ##結果對比
> p1|p2|p3
Warning messages:
1: Transformation introduced infinite values in continuous y-axis 
2: Transformation introduced infinite values in continuous y-axis 
3: Transformation introduced infinite values in continuous y-axis 
4: Transformation introduced infinite values in continuous y-axis 
5: Transformation introduced infinite values in continuous y-axis 
> #降維
> mycds <- reduceDimension(mycds, max_components = 2, method = 'DDRTree')
> #排序
> mycds <- orderCells(mycds)
There were 50 or more warnings (use warnings() to see the first 50)
> #State軌跡分布圖
> plot1 <- plot_cell_trajectory(mycds, color_by = "State")
> plot1
> ggsave("pseudotime/State.pdf", plot = plot1, width = 6, height = 5)
> ggsave("pseudotime/State.png", plot = plot1, width = 6, height = 5)
> ##Cluster軌跡分布圖
> plot2 <- plot_cell_trajectory(mycds, color_by = "seurat_clusters")
> ggsave("pseudotime/Cluster.pdf", plot = plot2, width = 6, height = 5)
> ggsave("pseudotime/Cluster.png", plot = plot2, width = 6, height = 5)
> plot2
> ##Pseudotime軌跡圖
> plot3 <- plot_cell_trajectory(mycds, color_by = "Pseudotime")
> plot3
> ggsave("pseudotime/Pseudotime.pdf", plot = plot3, width = 6, height = 5)
> ggsave("pseudotime/Pseudotime.png", plot = plot3, width = 6, height = 5)
> ##合并作圖
> plotc <- plot1|plot2|plot3
> plotc
> ggsave("pseudotime/Combination.pdf", plot = plotc, width = 10, height = 3.5)
> ggsave("pseudotime/Combination.png", plot = plotc, width = 10, height = 3.5)
> ##保存結果
> write.csv(pData(mycds), "pseudotime/pseudotime.csv")
> p1 <- plot_cell_trajectory(mycds, color_by = "State") + facet_wrap(~State, nrow = 1)
> p2 <- plot_cell_trajectory(mycds, color_by = "seurat_clusters") + facet_wrap(~seurat_clusters, nrow = 1)
> plotc <- p1/p2
> plotc <- p1/p2
> p1
> p2
> plotc
> ggsave("pseudotime/trajectory_facet.png", plot = plotc, width = 6, height = 5)
> #cluster差異基因
> diff.genes <- read.csv('subcluster/diff_genes_wilcox.csv')
> sig_diff.genes <- subset(diff.genes,p_val_adj<0.0001&abs(avg_logFC)>0.75)$gene
> sig_diff.genes <- unique(as.character(sig_diff.genes))
> diff_test <- differentialGeneTest(mycds[sig_diff.genes,], cores = 1, 
+                                   fullModelFormulaStr = "~sm.ns(Pseudotime)")
> sig_gene_names <- row.names(subset(diff_test, qval < 0.01))
> p1 = plot_pseudotime_heatmap(mycds[sig_gene_names,], num_clusters=3,
+                              show_rownames=T, return_heatmap=T)
> p1
> ggsave("pseudotime/pseudotime_heatmap1.png", plot = p1, width = 5, height = 8)
> #高變基因
> disp_table <- dispersionTable(mycds)
> disp.genes <- subset(disp_table, mean_expression >= 0.5&dispersion_empirical >= 1*dispersion_fit)
> disp.genes <- as.character(disp.genes$gene_id)
> diff_test <- differentialGeneTest(mycds[disp.genes,], cores = 4, 
+                                   fullModelFormulaStr = "~sm.ns(Pseudotime)")
> sig_gene_names <- row.names(subset(diff_test, qval < 1e-04))
> p2 = plot_pseudotime_heatmap(mycds[sig_gene_names,], num_clusters=5,
+                              show_rownames=T, return_heatmap=T)
Warning messages:
1: In slot(family, "validparams") :
  closing unused connection 7 (<-DESKTOP-2F2KC96:11566)
2: In slot(family, "validparams") :
  closing unused connection 6 (<-DESKTOP-2F2KC96:11566)
3: In slot(family, "validparams") :
  closing unused connection 5 (<-DESKTOP-2F2KC96:11566)
4: In slot(family, "validparams") :
  closing unused connection 4 (<-DESKTOP-2F2KC96:11566)
> ggsave("pseudotime/pseudotime_heatmap2.png", plot = p2, width = 5, height = 10)
> disp_table <- dispersionTable(mycds)
> disp.genes <- subset(disp_table, mean_expression >= 0.5&dispersion_empirical >= 1*dispersion_fit)
> disp.genes <- as.character(disp.genes$gene_id)
> mycds_sub <- mycds[disp.genes,]
> plot_cell_trajectory(mycds_sub, color_by = "State")
> beam_res <- BEAM(mycds_sub, branch_point = 1, cores = 8)
Warning messages:
1: In if (progenitor_method == "duplicate") { :
  the condition has length > 1 and only the first element will be used
2: In if (progenitor_method == "sequential_split") { :
  the condition has length > 1 and only the first element will be used
> beam_res <- beam_res[order(beam_res$qval),]
> beam_res <- beam_res[,c("gene_short_name", "pval", "qval")]
> mycds_sub_beam <- mycds_sub[row.names(subset(beam_res, qval < 1e-4)),]
> plot_genes_
Error: object 'plot_genes_' not found
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容