單細胞空間轉錄分析之Scanpy

將空間位置信息和轉錄組分析相結合,對于癌癥、免疫、腫瘤免疫相互作用,組織微環境,神經和發育等領域,有著令人期待的應用前景。

單細胞的一切分析,加前綴Spatial 都是一個新的分析點,因此Scanpy 擴展后也可用于空間轉錄組數據分析。 https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html


單細胞轉錄數據分析之Scanpy:http://www.lxweimin.com/p/e22a947e6c60
單細胞轉錄組之Scanpy - 軌跡推斷/擬時序分析:http://www.lxweimin.com/p/0b2ca0e0b544
單細胞轉錄組之Scanpy - 樣本整合分析:http://www.lxweimin.com/p/beef8a8be360
單細胞空間轉錄分析之Scanpy:http://www.lxweimin.com/p/8dc231c06932
單細胞空間轉錄分析之Scanpy-多樣本整合:http://www.lxweimin.com/p/caa98aeac191
單細胞空間轉錄分析之Seurat:http://www.lxweimin.com/p/c9a601ced91f
單細胞空間轉錄分析之Seurat-多樣本整合(淺談空間批次):http://www.lxweimin.com/p/609b04096b79

python 包Scanpy很多函數是借鑒了R包Seurat,所以這兩種方法分析結果差異不大,大家可以對照Seurat分析,上面網址也提供了Seurat包處理單細胞空間轉錄分析過程。

和分析單細胞轉錄組數據一樣,單細胞空間轉錄組主要包括了:質控(QC),標準化(Normalization),降維聚類(Dimensional reduction and clustering),Cluster marker genes, Spatially variable genes

為了和Seurat結果比較,我們使用了相同的一套數據集:https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior, 新鮮的冷凍小鼠腦組織, 前牙矢狀切面,可以參考前面講述的ABA大腦圖譜:http://www.lxweimin.com/p/5d087fffeb35
導入相關包

import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
import os ##設置輸入路徑
os.chdir('/home/wucheng/jianshu/scanpy/spatial') ##修改路徑
results_file = 'Anterior.h5ad' ##設置結果文件保存路徑

讀取數據

adata = sc.read_visium("/home/wucheng/data_set/Spatial/Mouse/Brain_Section1_Sagittal_Anterior/Brain_anterior1/outs")
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 2695 × 32285
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

預處理

adata.var["mt"] = adata.var_names.str.match("^MT-|^mt-") #計算線粒體基因比例 ,這兒沒有用作者推薦的str.startswith
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
adata.var["RP"] = adata.var_names.str.match("^RPS|^RPL|^Rps|^Rpl") #同樣我們也可以計算核糖體基因比例
sc.pp.calculate_qc_metrics(adata, qc_vars=["RP"], inplace=True)

我們根據總counts和表達的genes對spots進行一些基本的過濾:

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"], kde=False,bins=60, ax=axs[0]) #bins柱子的數目
sns.distplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[1])
sns.distplot(adata.obs["pct_counts_mt"], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_RP"], kde=False, bins=60, ax=axs[3])
plt.savefig("QC_plot.pdf")
QC

這兒和Seurat得到的QC小提琴圖一樣,只是形式不同。

# 進一步細化,用于過濾spots
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=40, ax=axs[0]) 
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] > 40000], kde=False, bins=40, ax=axs[1]) 
sns.distplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_mt"][adata.obs["pct_counts_mt"] >10], kde=False, bins=30, ax=axs[3])
plt.savefig("QC_plot1.pdf")

QC1

過濾 可以看到更加質量評估我們是可以去除一些低質量的細胞的,但是因為Seurat對此數據集未作過濾,我們這兒也不過濾,實際可以按照下面過濾細胞

sc.pp.filter_cells(adata, min_counts=5000) #filtered out 80 cells that have less than 5000 counts
sc.pp.filter_cells(adata, max_counts=50000) #filtered out 39 cells that have more than 50000 counts
adata = adata[adata.obs["pct_counts_mt"] < 25]
print(f"#cells after MT filter: {adata.n_obs}") #cells after MT filter: 2502
sc.pp.filter_genes(adata, min_cells=10)

標準化,HVG

sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)

PCA,聚類,可視化

sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, key_added="clusters")
sc.pl.umap(adata, color=["total_counts",  "clusters"], wspace=0.4)
plt.savefig("umap_cluster.pdf")
sc.pl.spatial(adata, img_key="hires", color=["total_counts", "clusters"])
plt.savefig("spatial_cluster.pdf")

Umap_Cluster

spatial_Cluster

顯示每簇位置

sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["0"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C0.pdf")
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["1"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C1.pdf")

spatial_cluster_sub_C0

關鍵基因的表達可視化

sc.pl.umap(adata, color=["Hpca", "Ttr"])
plt.savefig("umap_gene_exp.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Hpca", "Ttr"])
plt.savefig("spatial_gene_exp.pdf")
umap_gene_exp
spatial_gene_exp.pdf

每一簇marker genes

sc.tl.rank_genes_groups(adata, "clusters", method="t-test")
sc.pl.rank_genes_groups_heatmap(adata, groups="0", n_genes=20, groupby="clusters") #熱圖可視化
plt.savefig("Cluster0_marker_heatmap.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Ppp1r1b", "Gpr88","Penk"], alpha=0.7)
plt.savefig("Cluster0_spatial_marker.pdf")
#輸出每一簇marker genes
pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
pd.DataFrame(    {group + '_' + key[:1]: result[key][group]    for group in groups for key in ['names', 'pvals']}).head(5)
res = pd.DataFrame(    {group + '_' + key: result[key][group]    for group in groups for key in ['names', 'pvals','logfoldchanges','pvals_adj','scores']})
res.to_csv("dif_markers.csv") #輸出保存
res
       0_names        0_pvals  0_logfoldchanges    0_pvals_adj   0_scores  1_names        1_pvals  ...   16_pvals_adj   16_scores  17_names       17_pvals  17_logfoldchanges   17_pvals_adj  17_scores
0      Ppp1r1b   0.000000e+00          3.872787   0.000000e+00  87.917786  Camk2n1   0.000000e+00  ...   8.478769e-43  100.765488   Neurod6   9.957260e-22           4.388139   4.848720e-20  30.808397
1        Gpr88   0.000000e+00          4.403049   0.000000e+00  85.114883     Nrgn   0.000000e+00  ...   8.057953e-47   90.942802    Shisa6   3.514023e-18           4.730407   1.345792e-16  23.194458
2         Penk   0.000000e+00          4.161317   0.000000e+00  81.762627      Cck  5.144804e-319  ...   3.267212e-36   61.300213    Kcnab2   5.501745e-19           1.907810   2.231455e-17  21.838722
3        Pde1b   0.000000e+00          3.090780   0.000000e+00  79.073380    Nptxr  4.326942e-262  ...   3.116630e-38   54.045643     Nell2   1.468953e-18           2.391361   5.804790e-17  21.818993
4       Pde10a   0.000000e+00          3.938498   0.000000e+00  77.363052   Camk2a  1.998454e-264  ...   3.185207e-28   42.650623   Slc17a7   6.004060e-21           2.086003   2.793099e-19  21.763395
...        ...            ...               ...            ...        ...      ...            ...  ...            ...         ...       ...            ...                ...            ...        ...
32280  Slc17a7  5.392274e-138         -2.638275  3.108743e-135 -31.638594   Pcp4l1  4.422776e-130  ...  1.257510e-237  -36.938381     Htr1b  7.360363e-161         -28.121906  4.752586e-157 -28.987629
32281      Cck  2.454884e-151         -2.442926  1.617468e-148 -33.223202      Mal  3.544791e-138  ...  7.927483e-259  -38.930893    Shisa8  5.588661e-167         -28.738472  4.510748e-163 -29.622454
32282    Stmn1  1.575607e-143         -1.639276  9.782401e-141 -34.924587     Plp1  8.282063e-151  ...  5.406098e-260  -39.055321      Dlx6  8.146516e-176         -28.056419  8.767009e-172 -30.527283
32283    Olfm1  1.527035e-160         -2.071354  1.173817e-157 -35.563046     Mobp  1.650645e-178  ...  1.075188e-280  -40.989178    Cdkn1a  6.932729e-189         -28.244112  1.119116e-184 -31.842714
32284     Nrn1  6.550518e-207         -3.461612  9.194933e-204 -37.689751      Mbp  1.564585e-190  ...   5.941411e-33  -46.059914      Rxrg  4.265618e-220         -29.022448  1.377155e-215 -34.893322

[32285 rows x 90 columns]
Cluster0_marker_heatmap

Cluster0_spatial_marker

空間特異性genes

import SpatialDE #導入包pip install spatialde
counts = pd.DataFrame(adata.X.todense(), columns=adata.var_names, index=adata.obs_names)
coord = pd.DataFrame(adata.obsm['spatial'], columns=['x_coord', 'y_coord'], index=adata.obs_names)
results = SpatialDE.run(coord, counts)
results.index = results["g"]
adata.var = pd.concat([adata.var, results.loc[adata.var.index.values, :]], axis=1)
results.sort_values("qval").head(10)
             FSV  M       g           l  max_delta       max_ll  max_mu_hat  max_s2_t_hat model     n    s2_FSV  s2_logdelta      time          BIC  max_ll_null         LLR  pval  qval
g                                                                                                                                                                                       
Dgkb    0.443220  4    Dgkb  456.331505   1.223975 -2400.272142    0.960307      0.261535    SE  2695  0.000034     0.000653  0.006909  4832.140898 -3068.090390  667.818248   0.0   0.0
Dlg4    0.250555  4    Dlg4  456.331505   2.914374 -2323.810810    1.374468      0.131051    SE  2695  0.000026     0.000777  0.007951  4679.218235 -2650.952950  327.142139   0.0   0.0
Eif5a   0.183140  4   Eif5a  456.331505   4.345842 -1760.856838    2.325085      0.125122    SE  2695  0.000025     0.001113  0.008964  3553.310290 -1946.308672  185.451834   0.0   0.0
Nlgn2   0.157334  4   Nlgn2  456.331505   5.218447 -2322.206831    1.041536      0.074196    SE  2695  0.000024     0.001289  0.009007  4676.010275 -2486.621146  164.414315   0.0   0.0
Atp1b2  0.209184  4  Atp1b2  456.331505   3.683460 -2412.570465    1.691318      0.129979    SE  2695  0.000022     0.000824  0.008972  4856.737545 -2670.858217  258.287751   0.0   0.0
Efnb3   0.298024  4   Efnb3  456.331505   2.294977 -2520.276506    0.884368      0.160919    SE  2695  0.000022     0.000557  0.007954  5072.149626 -3041.920377  521.643871   0.0   0.0
Naa38   0.119160  4   Naa38  456.331505   7.202339 -2284.186889    1.567088      0.075790    SE  2695  0.000027     0.002230  0.007951  4599.970392 -2377.750298   93.563409   0.0   0.0
Apcdd1  0.409983  4  Apcdd1  858.638612   1.321026 -1339.982577    0.374860      0.112883    SE  2695  0.000240     0.004741  0.006995  2711.561768 -1548.322641  208.340064   0.0   0.0
Rnf227  0.281065  4  Rnf227  456.331505   2.492253 -2386.449172    1.251820      0.148585    SE  2695  0.000029     0.000776  0.007941  4804.494957 -2690.556304  304.107132   0.0   0.0
Kcnab3  0.361125  4  Kcnab3  456.331505   1.723719 -2047.492804    0.551128      0.142294    SE  2695  0.000023     0.000479  0.007971  4126.582221 -2618.187562  570.694759   0.0   0.0
res = results.sort_values("qval")
res.to_csv("dif_spatial.csv") #輸出保存
sc.pl.spatial(adata, img_key="hires", color=["Dgkb", "Dlg4","Eif5a"], alpha=0.7)
plt.savefig("spatial_Var_feature.pdf")
spatial_Var_feature

保存數據

adata.write(results_file)

我們可以發現與Seurat相比,分類結果還是有差異,不過大的區域識別兩種方法都沒有什么問題。

最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容