空間轉錄組去卷積分析工具介紹——CARD

簡介

??目前,單細胞技術發展迅速,單細胞轉錄組測序可以幫助我們獲得組織中每個細胞的基因表達水平,但卻無法獲得對應空間位置上的信息。空間轉錄組技術(spatial transcriptomics technologies)使我們能夠利用空間定位信息對組織位置進行基因表達譜分析,從而表征空間組織結構,更好的幫助我們去理解組織異質性。然而,基于高通量測序的空轉技術還達不到單細胞轉錄組的分辨率,測得的每個孔(spot)可能包含10個以上的細胞,這會導致我們收集的特定位置的基因表達是同質或者異質細胞類型的混合狀態。因此,需要使用細胞類型去卷積(deconvolution)的分析工具去解析每個孔中的細胞類型組成,從而實現對細胞類型的空間定位?,F在此類分析工具出來很多,像SPOTlight、RCTD、Cell2location、STdeconvolve。也有文章對此類工具進行了測試比較,感興趣的可以看看這篇文章:A comprehensive comparison on cell-type composition inference for spatial transcriptomics data,今天我們給大家介紹一款發表在Nature Biotechnology雜志上的軟件——CARD,它是由美國密歇根大學、生物統計系周翔課題組開發的,是一種對空間轉錄組數據實現細胞類型去卷積分析的方法。

軟件介紹

??CARD設計用于去卷積空間轉錄組數據,并基于參考scRNA-seq數據推斷每個空間位置上的細胞類型組成,如下圖所示。CARD需要帶有細胞類型特異性基因表達信息的scRNA-seq數據(左圖)以及帶有定位信息的空間轉錄組學數據(右圖)。有了這兩個數據輸入,CARD通過一個非負矩陣執行去卷積分解框架,并輸出跨空間位置的估計細胞類型組成。CARD的一個獨特之處在于,它能夠通過CAR模型解釋跨空間位置的細胞類型組成的空間相關性。通過計算空間位置上細胞類型組成的空間相關性,CARD還能夠在原始研究中沒有測量到的位置上輸入細胞類型組成和基因表達水平,從而促進在組織上構建精細的高分辨率空間地圖。此外,如果相應單細胞轉錄組數據缺失,無法構建參考矩陣時,我們可以輸入marker基因,CARD 可以進行無細胞類型特異性參考矩陣的去卷積分析。


CARD分析步驟示意圖

軟件實操

??下面我們使用代碼來進行CARD軟件分析,具體示例代碼見CARD分析流程。

數據準備

??我們首先需要準備兩個數據,一個是單細胞轉錄組數據(原始counts表達矩陣+每個細胞的注釋矩陣),還有一個空間轉錄組數據(原始counts表達矩陣+每個spot的空間位置矩陣)。示例數據獲取方式:

wget https://github.com/YingMa0107/CARD/raw/master/data/spatial_location.RData ###空轉的空間位置矩陣
wget https://github.com/YingMa0107/CARD/raw/master/data/spatial_count.RData ###空轉的counts表達矩陣
wget https://github.com/YingMa0107/CARD/raw/master/data/sc_meta.RData ###單細胞轉錄組的細胞注釋矩陣
wget https://github.com/YingMa0107/CARD/raw/master/data/sc_count.RData ###細胞轉錄組的counts表達矩陣

??接下來我們安裝和加載CARD包,導入所需數據:

### 安裝CARD包
devtools::install_github('YingMa0107/CARD')
###加載CARD包
library(CARD)
###導入數據
load("./spatial_count.RData")###導入空轉的counts表達矩陣
spatial_count[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
           10x10 10x13 10x14 10x15
X5S_rRNA       .     .     .     .
X5_8S_rRNA     .     .     .     .
X7SK           .     .     .     .
A1BG.AS1       .     .     .     .
load("./spatial_location.RData")###導入空轉的空間位置矩陣
spatial_location[1:4,]
       x  y
10x10 10 10
10x13 10 13
10x14 10 14
10x15 10 15
load("./sc_count.RData")
sc_count[1:4,1:4]###導入單細胞轉錄組的counts表達矩陣
4 x 4 sparse Matrix of class "dgCMatrix"
      Cell1 Cell2 Cell3 Cell4
A1BG      .     .     .     .
A1CF      .     .     .     1
A2M       .     .     .     .
A2ML1     .     .     .     .
load("./sc_meta.RData")###導入單細胞轉錄組的細胞注釋矩陣
sc_meta[1:4,]
      cellID                             cellType sampleInfo
Cell1  Cell1                         Acinar_cells    sample1
Cell2  Cell2          Ductal_terminal_ductal_like    sample1
Cell3  Cell3          Ductal_terminal_ductal_like    sample1
Cell4  Cell4 Ductal_CRISP3_high-centroacinar_like    sample1
##創建對象

??然后我們用導入的數據創建CARD對象,創建完成后空轉數據儲存在CARD_obj@spatial_countMat和CARD_obj@spatial_location,單細胞轉錄組數據儲存在CARD_obj@sc_eset中:

CARD_obj = createCARDObject(
    sc_count = sc_count,
    sc_meta = sc_meta,
    spatial_count = spatial_count,
    spatial_location = spatial_location,
    ct.varname = "cellType",
    ct.select = unique(sc_meta$cellType),
    sample.varname = "sampleInfo",
    minCountGene = 100,
    minCountSpot = 5) 
## QC on scRNASeq dataset! ...
## QC on spatially-resolved dataset! ..
##去卷積分析

??現在我們把所有東西都存儲在CARD對象中,我們可以使用CARD去卷積空間轉錄組數據,完成后結果儲存在CARD_obj@Proportion_CARD中,獲得每個spot中各種細胞類型的組成占比。

CARD_obj = CARD_deconvolution(CARD_object = CARD_obj)
## create reference matrix from scRNASeq...
## Select Informative Genes! ...
## Deconvolution Starts! ...
## Deconvolution Finish! ...
print(CARD_obj@Proportion_CARD[1:2,])
      Acinar_cells Ductal_terminal_ductal_like
10x10 6.370860e-02                  0.02391251
10x13 7.818278e-08                  0.02996182
      Ductal_CRISP3_high-centroacinar_like Cancer_clone_A Ductal_MHC_Class_II
10x10                            0.1801753   4.229928e-04         0.021557706
10x13                            0.9620440   1.910394e-07         0.006437371
      Cancer_clone_B      mDCs_A Ductal_APOL1_high-hypoxic   Tuft_cells
10x10   4.339480e-05 0.011136707              4.967235e-04 2.025089e-03
10x13   2.319262e-05 0.001013068              3.864917e-06 1.796433e-06
            mDCs_B         pDCs Endocrine_cells Endothelial_cells Macrophages_A
10x10 0.0792525811 7.432979e-07    6.848627e-03      1.722855e-01  9.909662e-02
10x13 0.0004695018 1.566377e-11    3.925412e-11      4.198468e-11  2.766705e-05
        Mast_cells Macrophages_B T_cells_&_NK_cells    Monocytes        RBCs
10x10 7.411147e-11  3.090675e-02       4.976256e-06 2.663846e-06 3.84187e-10
10x13 2.387132e-11  9.499908e-06       1.172387e-11 2.000116e-06 1.00967e-06
       Fibroblasts
10x10 3.081225e-01
10x13 4.898874e-06

可視化

??CARD軟件提供兩種比較好的可視化函數,CARD.visualize.pie和CARD.visualize.prop,前者是在空轉位置上繪制每個spot的細胞類型組成拼圖;后者是在空轉位置上繪制每種細胞類型在所有spot中的占比情況,用顏色代表占比的高低。

colors = c("#FFD92F","#4DAF4A","#FCCDE5","#D9D9D9","#377EB8","#7FC97F","#BEAED4",
    "#FDC086","#FFFF99","#386CB0","#F0027F","#BF5B17","#666666","#1B9E77","#D95F02",
    "#7570B3","#E7298A","#66A61E","#E6AB02","#A6761D")#可以自定義顏色
p1 <- CARD.visualize.pie(proportion = CARD_obj@Proportion_CARD,spatial_location = CARD_obj@spatial_location, colors = colors)
print(p1)

??這個函數不能設置每個spot餅圖的大小,如果想自己調整,可以手動更改函數內部再繪圖,如下所示:

CARD.visualize.pie2 <- function(proportion,spatial_location,colors = NULL,round_size){
  res_CARD = as.data.frame(proportion)
  res_CARD = res_CARD[,mixedsort(colnames(res_CARD))]
  location = as.data.frame(spatial_location)
  if(sum(rownames(res_CARD)==rownames(location))!= nrow(res_CARD)){
    stop("The rownames of proportion data does not match with the rownames of spatial location data")
  }
  colorCandidate = c("#1e77b4","#ff7d0b","#ceaaa3","#2c9f2c","#babc22","#d52828","#9267bc",
                     "#8b544c","#e277c1","#d42728","#adc6e8","#97df89","#fe9795","#4381bd","#f2941f","#5aa43a","#cc4d2e","#9f83c8","#91675a",
                     "#da8ec8","#929292","#c3c237","#b4e0ea","#bacceb","#f7c685",
                     "#dcf0d0","#f4a99f","#c8bad8",
                     "#F56867", "#FEB915", "#C798EE", "#59BE86", "#7495D3",
                     "#D1D1D1", "#6D1A9C", "#15821E", "#3A84E6", "#997273",
                     "#787878", "#DB4C6C", "#9E7A7A", "#554236", "#AF5F3C",
                     "#93796C", "#F9BD3F", "#DAB370", "#877F6C", "#268785")
  if(is.null(colors)){
    #colors = brewer.pal(11, "Spectral")
    if(ncol(res_CARD) > length(colorCandidate)){
      colors = colorRampPalette(colorCandidate)(ncol(res_CARD))
    }else{
      colors = colorCandidate[sample(1:length(colorCandidate),ncol(res_CARD))]
    }
  }else{
    colors = colors
  }
  data = cbind(res_CARD,location)
  ct.select = colnames(res_CARD)
  p = suppressMessages(ggplot() + geom_scatterpie(aes(x=x, y=y,r = round_size),data=data, ###r默認是0.52
                                                  cols=ct.select,color=NA) + coord_fixed(ratio = 1) + 
                         scale_fill_manual(values =  colors)+
                         theme(plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
                               panel.background = element_blank(),
                               plot.background = element_blank(),
                               panel.border = element_rect(colour = "grey89", fill=NA, size=0.5),
                               axis.text =element_blank(),
                               axis.ticks =element_blank(),
                               axis.title =element_blank(),
                               legend.title=element_text(size = 16,face="bold"),
                               legend.text=element_text(size = 15),
                               legend.key = element_rect(colour = "transparent", fill = "white"),
                               legend.key.size = unit(0.45, 'cm'),
                               strip.text = element_text(size = 16,face="bold"),
                               legend.position="bottom")+
                         guides(fill=guide_legend(title="Cell Type")))
  return(p)
}
p1 <- CARD.visualize.pie2(proportion = CARD_obj@Proportion_CARD,spatial_location = CARD_obj@spatial_location, colors = colors, round_size=5)

??我們同時可以選擇一些感興趣的細胞類型分別進行可視化,查看細胞類型比例的空間分布:

ct.visualize = c("Acinar_cells","Cancer_clone_A","Cancer_clone_B","Ductal_terminal_ductal_like","Ductal_CRISP3_high-centroacinar_like","Ductal_MHC_Class_II","Ductal_APOL1_high-hypoxic","Fibroblasts")
p2 <- CARD.visualize.prop(
    proportion = CARD_obj@Proportion_CARD,        
    spatial_location = CARD_obj@spatial_location, 
    ct.visualize = ct.visualize,                 ### 選擇要可視化的細胞類型
    colors = c("lightblue","lightyellow","red"), ###可以手動設置顏色
    NumCols = 4)                                 ###圖中展示的細胞類型的列數
print(p2)

精細化空間圖譜

??同時,CARD還可以將細胞類型組成和基因表達水平歸到未測量的組織位置上,構建更精細的空間組織圖。也就是如果空轉上沒測到的孔,可以用單細胞轉錄組數據將其還原,分析結果儲存在CARD_obj@refined_prop和CARD_obj@refined_expression中。

CARD_obj = CARD.imputation(CARD_obj,NumGrids = 2000,ineibor = 10,exclude = NULL)
## The rownames of locations are matched ...
## Make grids on new spatial locations ...
p5 <- CARD.visualize.prop(
    proportion = CARD_obj@refined_prop,                         
    spatial_location = location_imputation,            
    ct.visualize = ct.visualize,                    
    colors = c("lightblue","lightyellow","red"),    
    NumCols = 4)                                  
print(p5)

??CARD還可以提供在增強分辨率下的細胞類型比例后,預測增強分辨率下的空間基因表達,與原始分辨率相比,CARD有助于構建精細的空間組織圖,其分辨率遠遠高于原始研究中測量到的分辨率。

###增強分辨率下的基因表達
p6 <- CARD.visualize.gene(
   spatial_expression = CARD_obj@refined_expression,
   spatial_location = location_imputation,
   gene.visualize = c("Tm4sf1","S100a4","Tff3","Apol1","Crisp3","CD248"),
   colors = NULL,
   NumCols = 6)
print(p6)
###原始的基因表達
p7 <- CARD.visualize.gene(
   spatial_expression = CARD_obj@spatial_countMat,
   spatial_location = CARD_obj@spatial_location,
   gene.visualize = c("Tm4sf1","S100a4","Tff3","Apol1","Crisp3","CD248"),
   colors = NULL,
   NumCols = 6)
print(p7)

不使用單細胞轉錄組數據進行去卷積

??開發者同時擴展了CARD用以支持無參考細胞類型的去卷積分析,簡稱為CARDfree。CARDfree不再需要外部單細胞參考數據集,只需要用戶輸入之前已知的細胞類型標記的基因名稱列表。

wget https://github.com/YingMa0107/CARD/raw/master/data/markerList.RData#軟件提供的參考marker基因列表,可以自行提供
###導入參考marker基因列表
load("./markerList.RData")
CARDfree_obj = createCARDfreeObject(
    markerList = markerList,
    spatial_count = spatial_count,
    spatial_location = spatial_location,
    minCountGene = 100,
    minCountSpot =5) ###創建CARDfree對象
CARDfree_obj = CARD_refFree(CARDfree_obj)
colors = c("#FFD92F","#4DAF4A","#FCCDE5","#D9D9D9","#377EB8","#7FC97F","#BEAED4","#FDC086","#FFFF99","#386CB0","#F0027F","#BF5B17","#666666","#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E","#E6AB02","#A6761D")
CARDfree_obj@Proportion_CARD = CARDfree_obj@Proportion_CARD[,c(8,10,14,2,1,6,12,18,7,13,20,19,16,17,11,15,4,9,3,5)]
colnames(CARDfree_obj@Proportion_CARD) = paste0("CT",1:20)
p8 <- CARD.visualize.pie(CARDfree_obj@Proportion_CARD,CARDfree_obj@spatial_location,colors = colors)
print(p8)

繪制單細胞分辨率的空間圖譜

??此外,CARD還可以在非單細胞分辨率的空間轉錄組上構建單細胞分辨率空間轉錄組。根據用于去卷積的參考單細胞轉錄組數據,從非單細胞分辨率空間轉錄組數據推斷出每個測量空間位置的單細胞分辨率基因表達。

scMapping = CARD_SCMapping(CARD_obj,shapeSpot="Square",numCell=20,ncore=10)###shapeSpot:一個字符,指示單個細胞的采樣空間坐標是位于類方形區域還是類圓形區域。該區域的中心是在非單細胞分辨率空間轉錄組數據中測量的空間位置。默認為“Square”,另一個選項為“Circle”
library(SingleCellExperiment)
MapCellCords = as.data.frame(colData(scMapping))
count_SC = assays(scMapping)$counts
library(ggplot2)
df = MapCellCords
colors = c("#8DD3C7","#CFECBB","#F4F4B9","#CFCCCF","#D1A7B9","#E9D3DE","#F4867C","#C0979F",
    "#D5CFD6","#86B1CD","#CEB28B","#EDBC63","#C59CC5","#C09CBF","#C2D567","#C9DAC3","#E1EBA0",
    "#FFED6F","#CDD796","#F8CDDE")
p9 = ggplot(df, aes(x = x, y = y, colour = CT)) + 
    geom_point(size = 3.0) +
    scale_colour_manual(values =  colors) +
    #facet_wrap(~Method,ncol = 2,nrow = 3) + 
        theme(plot.margin = margin(0.1, 0.1, 0.1, 0.1, "cm"),
              panel.background = element_rect(colour = "white", fill="white"),
              plot.background = element_rect(colour = "white", fill="white"),
    legend.position="bottom",
    panel.border = element_rect(colour = "grey89", fill=NA, size=0.5),
    axis.text =element_blank(),
    axis.ticks =element_blank(),
    axis.title =element_blank(),
    legend.title=element_text(size = 13,face="bold"),
    legend.text=element_text(size = 12),
    legend.key = element_rect(colour = "transparent", fill = "white"),
    legend.key.size = unit(0.45, 'cm'),
    strip.text = element_text(size = 15,face="bold"))+
                                guides(color=guide_legend(title="Cell Type"))###可視化每個細胞的細胞類型及其空間位置信息
print(p9)

總結

??與其他去卷積分析軟件相比,CARD優勢在于除了傳統基于單細胞轉錄組數據去卷積分析外,還可以基于細胞類型的marker基因進行分析,最后還可以繪制單細胞層次的空間圖譜。

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容