將空間位置信息和轉錄組分析相結合,對于癌癥、免疫、腫瘤免疫相互作用,組織微環境,神經和發育等領域,有著令人期待的應用前景。
單細胞的一切分析,加前綴Spatial 都是一個新的分析點,因此Scanpy 擴展后也可用于空間轉錄組數據分析。 https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html
單細胞轉錄數據分析之Scanpy:http://www.lxweimin.com/p/e22a947e6c60
單細胞轉錄組之Scanpy - 軌跡推斷/擬時序分析:http://www.lxweimin.com/p/0b2ca0e0b544
單細胞轉錄組之Scanpy - 樣本整合分析:http://www.lxweimin.com/p/beef8a8be360
單細胞空間轉錄分析之Scanpy:http://www.lxweimin.com/p/8dc231c06932
單細胞空間轉錄分析之Scanpy-多樣本整合:http://www.lxweimin.com/p/caa98aeac191
單細胞空間轉錄分析之Seurat:http://www.lxweimin.com/p/c9a601ced91f
單細胞空間轉錄分析之Seurat-多樣本整合(淺談空間批次):http://www.lxweimin.com/p/609b04096b79
python 包Scanpy很多函數是借鑒了R包Seurat,所以這兩種方法分析結果差異不大,大家可以對照Seurat分析,上面網址也提供了Seurat包處理單細胞空間轉錄分析過程。
和分析單細胞轉錄組數據一樣,單細胞空間轉錄組主要包括了:質控(QC),標準化(Normalization),降維聚類(Dimensional reduction and clustering),Cluster marker genes, Spatially variable genes
為了和Seurat結果比較,我們使用了相同的一套數據集:https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Mouse_Brain_Sagittal_Anterior, 新鮮的冷凍小鼠腦組織, 前牙矢狀切面,可以參考前面講述的ABA大腦圖譜:http://www.lxweimin.com/p/5d087fffeb35
導入相關包
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
import os ##設置輸入路徑
os.chdir('/home/wucheng/jianshu/scanpy/spatial') ##修改路徑
results_file = 'Anterior.h5ad' ##設置結果文件保存路徑
讀取數據
adata = sc.read_visium("/home/wucheng/data_set/Spatial/Mouse/Brain_Section1_Sagittal_Anterior/Brain_anterior1/outs")
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 2695 × 32285
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
預處理
adata.var["mt"] = adata.var_names.str.match("^MT-|^mt-") #計算線粒體基因比例 ,這兒沒有用作者推薦的str.startswith
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
adata.var["RP"] = adata.var_names.str.match("^RPS|^RPL|^Rps|^Rpl") #同樣我們也可以計算核糖體基因比例
sc.pp.calculate_qc_metrics(adata, qc_vars=["RP"], inplace=True)
我們根據總counts和表達的genes對spots進行一些基本的過濾:
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"], kde=False,bins=60, ax=axs[0]) #bins柱子的數目
sns.distplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[1])
sns.distplot(adata.obs["pct_counts_mt"], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_RP"], kde=False, bins=60, ax=axs[3])
plt.savefig("QC_plot.pdf")
這兒和Seurat得到的QC小提琴圖一樣,只是形式不同。
# 進一步細化,用于過濾spots
fig, axs = plt.subplots(1, 4, figsize=(16, 4))
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] < 10000], kde=False, bins=40, ax=axs[0])
sns.distplot(adata.obs["total_counts"][adata.obs["total_counts"] > 40000], kde=False, bins=40, ax=axs[1])
sns.distplot(adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000], kde=False, bins=60, ax=axs[2])
sns.distplot(adata.obs["pct_counts_mt"][adata.obs["pct_counts_mt"] >10], kde=False, bins=30, ax=axs[3])
plt.savefig("QC_plot1.pdf")
過濾 可以看到更加質量評估我們是可以去除一些低質量的細胞的,但是因為Seurat對此數據集未作過濾,我們這兒也不過濾,實際可以按照下面過濾細胞
sc.pp.filter_cells(adata, min_counts=5000) #filtered out 80 cells that have less than 5000 counts
sc.pp.filter_cells(adata, max_counts=50000) #filtered out 39 cells that have more than 50000 counts
adata = adata[adata.obs["pct_counts_mt"] < 25]
print(f"#cells after MT filter: {adata.n_obs}") #cells after MT filter: 2502
sc.pp.filter_genes(adata, min_cells=10)
標準化,HVG
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000)
PCA,聚類,可視化
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata, key_added="clusters")
sc.pl.umap(adata, color=["total_counts", "clusters"], wspace=0.4)
plt.savefig("umap_cluster.pdf")
sc.pl.spatial(adata, img_key="hires", color=["total_counts", "clusters"])
plt.savefig("spatial_cluster.pdf")
顯示每簇位置
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["0"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C0.pdf")
sc.pl.spatial(adata, img_key="hires", color="clusters", groups=["1"], alpha=0.5, size=1.3)
plt.savefig("spatial_cluster_sub_C1.pdf")
關鍵基因的表達可視化
sc.pl.umap(adata, color=["Hpca", "Ttr"])
plt.savefig("umap_gene_exp.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Hpca", "Ttr"])
plt.savefig("spatial_gene_exp.pdf")
每一簇marker genes
sc.tl.rank_genes_groups(adata, "clusters", method="t-test")
sc.pl.rank_genes_groups_heatmap(adata, groups="0", n_genes=20, groupby="clusters") #熱圖可視化
plt.savefig("Cluster0_marker_heatmap.pdf")
sc.pl.spatial(adata, img_key="hires", color=["Ppp1r1b", "Gpr88","Penk"], alpha=0.7)
plt.savefig("Cluster0_spatial_marker.pdf")
#輸出每一簇marker genes
pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
pd.DataFrame( {group + '_' + key[:1]: result[key][group] for group in groups for key in ['names', 'pvals']}).head(5)
res = pd.DataFrame( {group + '_' + key: result[key][group] for group in groups for key in ['names', 'pvals','logfoldchanges','pvals_adj','scores']})
res.to_csv("dif_markers.csv") #輸出保存
res
0_names 0_pvals 0_logfoldchanges 0_pvals_adj 0_scores 1_names 1_pvals ... 16_pvals_adj 16_scores 17_names 17_pvals 17_logfoldchanges 17_pvals_adj 17_scores
0 Ppp1r1b 0.000000e+00 3.872787 0.000000e+00 87.917786 Camk2n1 0.000000e+00 ... 8.478769e-43 100.765488 Neurod6 9.957260e-22 4.388139 4.848720e-20 30.808397
1 Gpr88 0.000000e+00 4.403049 0.000000e+00 85.114883 Nrgn 0.000000e+00 ... 8.057953e-47 90.942802 Shisa6 3.514023e-18 4.730407 1.345792e-16 23.194458
2 Penk 0.000000e+00 4.161317 0.000000e+00 81.762627 Cck 5.144804e-319 ... 3.267212e-36 61.300213 Kcnab2 5.501745e-19 1.907810 2.231455e-17 21.838722
3 Pde1b 0.000000e+00 3.090780 0.000000e+00 79.073380 Nptxr 4.326942e-262 ... 3.116630e-38 54.045643 Nell2 1.468953e-18 2.391361 5.804790e-17 21.818993
4 Pde10a 0.000000e+00 3.938498 0.000000e+00 77.363052 Camk2a 1.998454e-264 ... 3.185207e-28 42.650623 Slc17a7 6.004060e-21 2.086003 2.793099e-19 21.763395
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
32280 Slc17a7 5.392274e-138 -2.638275 3.108743e-135 -31.638594 Pcp4l1 4.422776e-130 ... 1.257510e-237 -36.938381 Htr1b 7.360363e-161 -28.121906 4.752586e-157 -28.987629
32281 Cck 2.454884e-151 -2.442926 1.617468e-148 -33.223202 Mal 3.544791e-138 ... 7.927483e-259 -38.930893 Shisa8 5.588661e-167 -28.738472 4.510748e-163 -29.622454
32282 Stmn1 1.575607e-143 -1.639276 9.782401e-141 -34.924587 Plp1 8.282063e-151 ... 5.406098e-260 -39.055321 Dlx6 8.146516e-176 -28.056419 8.767009e-172 -30.527283
32283 Olfm1 1.527035e-160 -2.071354 1.173817e-157 -35.563046 Mobp 1.650645e-178 ... 1.075188e-280 -40.989178 Cdkn1a 6.932729e-189 -28.244112 1.119116e-184 -31.842714
32284 Nrn1 6.550518e-207 -3.461612 9.194933e-204 -37.689751 Mbp 1.564585e-190 ... 5.941411e-33 -46.059914 Rxrg 4.265618e-220 -29.022448 1.377155e-215 -34.893322
[32285 rows x 90 columns]
空間特異性genes
import SpatialDE #導入包pip install spatialde
counts = pd.DataFrame(adata.X.todense(), columns=adata.var_names, index=adata.obs_names)
coord = pd.DataFrame(adata.obsm['spatial'], columns=['x_coord', 'y_coord'], index=adata.obs_names)
results = SpatialDE.run(coord, counts)
results.index = results["g"]
adata.var = pd.concat([adata.var, results.loc[adata.var.index.values, :]], axis=1)
results.sort_values("qval").head(10)
FSV M g l max_delta max_ll max_mu_hat max_s2_t_hat model n s2_FSV s2_logdelta time BIC max_ll_null LLR pval qval
g
Dgkb 0.443220 4 Dgkb 456.331505 1.223975 -2400.272142 0.960307 0.261535 SE 2695 0.000034 0.000653 0.006909 4832.140898 -3068.090390 667.818248 0.0 0.0
Dlg4 0.250555 4 Dlg4 456.331505 2.914374 -2323.810810 1.374468 0.131051 SE 2695 0.000026 0.000777 0.007951 4679.218235 -2650.952950 327.142139 0.0 0.0
Eif5a 0.183140 4 Eif5a 456.331505 4.345842 -1760.856838 2.325085 0.125122 SE 2695 0.000025 0.001113 0.008964 3553.310290 -1946.308672 185.451834 0.0 0.0
Nlgn2 0.157334 4 Nlgn2 456.331505 5.218447 -2322.206831 1.041536 0.074196 SE 2695 0.000024 0.001289 0.009007 4676.010275 -2486.621146 164.414315 0.0 0.0
Atp1b2 0.209184 4 Atp1b2 456.331505 3.683460 -2412.570465 1.691318 0.129979 SE 2695 0.000022 0.000824 0.008972 4856.737545 -2670.858217 258.287751 0.0 0.0
Efnb3 0.298024 4 Efnb3 456.331505 2.294977 -2520.276506 0.884368 0.160919 SE 2695 0.000022 0.000557 0.007954 5072.149626 -3041.920377 521.643871 0.0 0.0
Naa38 0.119160 4 Naa38 456.331505 7.202339 -2284.186889 1.567088 0.075790 SE 2695 0.000027 0.002230 0.007951 4599.970392 -2377.750298 93.563409 0.0 0.0
Apcdd1 0.409983 4 Apcdd1 858.638612 1.321026 -1339.982577 0.374860 0.112883 SE 2695 0.000240 0.004741 0.006995 2711.561768 -1548.322641 208.340064 0.0 0.0
Rnf227 0.281065 4 Rnf227 456.331505 2.492253 -2386.449172 1.251820 0.148585 SE 2695 0.000029 0.000776 0.007941 4804.494957 -2690.556304 304.107132 0.0 0.0
Kcnab3 0.361125 4 Kcnab3 456.331505 1.723719 -2047.492804 0.551128 0.142294 SE 2695 0.000023 0.000479 0.007971 4126.582221 -2618.187562 570.694759 0.0 0.0
res = results.sort_values("qval")
res.to_csv("dif_spatial.csv") #輸出保存
sc.pl.spatial(adata, img_key="hires", color=["Dgkb", "Dlg4","Eif5a"], alpha=0.7)
plt.savefig("spatial_Var_feature.pdf")
保存數據
adata.write(results_file)
我們可以發現與Seurat相比,分類結果還是有差異,不過大的區域識別兩種方法都沒有什么問題。