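A new year, a new beginning: having chosen the road ahead, we press on through wind and rain.
This time, we continue our review of squidpy, an analysis package for spatial transcriptomics. As usual, we focus mainly on the analysis of 10X spatial transcriptomics data.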
Import packages & data
import scanpy as sc
import anndata as ad
import squidpy as sq
import numpy as np
import pandas as pd
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
# load the pre-processed dataset
img = sq.datasets.visium_hne_image()
adata = sq.datasets.visium_hne_adata()
First, let's visualize the cluster annotation in spatial context with scanpy.pl.spatial(). (The spatial transcriptomics data appears to have been analyzed in advance, so pay attention to the spatial annotation here.)
sc.pl.spatial(adata, color="cluster")
Image features
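Visium datasets contain high-resolution images of the tissue that was used for gene extraction. With the function squidpy.im.calculate_image_features() you can compute image features for each Visium spot and create an obs x features matrix in adata, which can then be analyzed together with the obs x gene expression matrix.
By extracting image features, we aim to obtain information that is both similar and complementary to the gene expression values. Similar information is present, for example, in a tissue with two cell types that differ in morphology. Such cell type information is then contained in both the gene expression values and the tissue image features.
Squidpy contains several feature extractors and a flexible pipeline for computing features at different scales and sizes. There are several detailed examples of how to use squidpy.im.calculate_image_features(). Extracting image features is a good starting point for learning more. (We will come back to this later.)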
# calculate features for different scales (higher value means more context)
for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )
# combine features in one dataframe
adata.obsm["features"] = pd.concat(
    [adata.obsm[f] for f in adata.obsm.keys() if "features_summary" in f], axis="columns"
)
# make sure that we have no duplicated feature names in the combined table
adata.obsm["features"].columns = ad.utils.make_index_unique(adata.obsm["features"].columns)
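We can use the extracted image features to compute a new cluster annotation. This may help us understand the similarities between spots based on image morphology.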
# helper function returning a clustering
def cluster_features(features: pd.DataFrame, like=None) -> pd.Series:
    """
    Calculate leiden clustering of features.
    Specify filter of features using `like`.
    """
    # filter features
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)
    return adata.obs["leiden"]
# calculate feature clusters
adata.obs["features_cluster"] = cluster_features(adata.obsm["features"], like="summary")
# compare feature and gene clusters
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.pl.spatial(adata, color=["features_cluster", "cluster"])
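Comparing the gene and feature clusters, we notice that in some regions they look very similar, such as the cluster Fiber_tract, or the clusters around the Hippocampus, which seem to be roughly recapitulated by the clusters in image feature space. In other regions the feature clusters look different: in the cortex, for example, the gene clusters show the layered structure of the cortex, while the feature clusters seem to show different regions of the cortex.
This is only a simple comparative analysis of the image features. Note that image features could also be used to compute a joint image and gene clustering, for example by computing a shared neighbors graph (e.g. on concatenated PCAs of both feature spaces).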
Spatial statistics and graph analysis
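As with other spatial data, we can use spatial and graph statistics on Visium data to study the spatial organization of the tissue.
Neighborhood enrichment
Computing a neighborhood enrichment can help us identify clusters of spots that share a common neighborhood structure across the tissue. Such a score can be computed with squidpy.gr.nhood_enrichment(). In short, it is an enrichment score on the spatial proximity of clusters: if spots belonging to two different clusters are often close to each other, they will have a high score and can be defined as enriched. If, on the other hand, they are far apart and therefore seldom neighbors, the score will be low and they can be defined as depleted. The score is based on a permutation test, and you can set the number of permutations with the n_perms argument (default is 1000).
Since the function works on a connectivity matrix, we need to compute that as well. This can be done with squidpy.gr.spatial_neighbors(). For more details on how this function works, see Building spatial neighbors graph, covered in detail later in this article.
Finally, we directly visualize the results with squidpy.pl.nhood_enrichment().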
sq.gr.spatial_neighbors(adata)
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")
Given the spatial organization of the mouse brain coronal section, it is not surprising that we find high neighborhood enrichment in the Hippocampus region: the Pyramidal_layer_dentate_gyrus and Pyramidal_layer clusters seem to often be neighbors of the larger Hippocampus cluster.
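To make the permutation idea behind the enrichment score concrete, here is a minimal sketch in plain Python. It is not Squidpy's implementation: the helper names count_pair_edges and enrichment_zscore, the toy line graph, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def count_pair_edges(edges, labels, a, b):
    """Count undirected edges joining a node labelled `a` to one labelled `b`."""
    return sum(1 for i, j in edges if (labels[i], labels[j]) in {(a, b), (b, a)})


def enrichment_zscore(edges, labels, a, b, n_perms=1000, seed=0):
    """z-score of the observed a-b edge count against shuffled cluster labels."""
    rng = random.Random(seed)
    observed = count_pair_edges(edges, labels, a, b)
    shuffled = list(labels)
    permuted = []
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break the link between position and label
        permuted.append(count_pair_edges(edges, shuffled, a, b))
    return (observed - mean(permuted)) / pstdev(permuted)


# Toy tissue (hypothetical): 8 spots on a line, cluster "A" on the left, "B" on the right.
edges = [(i, i + 1) for i in range(7)]
labels = ["A"] * 4 + ["B"] * 4

z_same = enrichment_zscore(edges, labels, "A", "A")   # A spots sit next to A spots
z_cross = enrichment_zscore(edges, labels, "A", "B")  # A and B touch at one edge only
print(f"A-A z-score: {z_same:.2f}, A-B z-score: {z_cross:.2f}")
```

A positive z-score corresponds to what the text calls enriched, a negative one to depleted.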
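Co-occurrence across spatial dimensions
In addition to the neighborhood enrichment score, we can visualize cluster co-occurrence across spatial dimensions. This is a similar analysis to the one above, but instead of operating on the connectivity matrix it operates on the original spatial coordinates. The co-occurrence score is defined as:
$$\frac{p(exp \mid cond)}{p(exp)}$$
where $p(exp \mid cond)$ is the conditional probability of observing a cluster $exp$ conditioned on the presence of a cluster $cond$, whereas $p(exp)$ is the probability of observing $exp$ in the radius size of interest. The score is computed across increasing radii around each observation (i.e. each spot here) in the tissue.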
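The ratio can be worked through by hand on a few points. The sketch below is a simplified reading of the definition, counting ordered pairs of observations closer than a given radius; the function name co_occurrence_ratio and the toy coordinates are assumptions, and Squidpy's own implementation works over distance intervals and may differ in detail.

```python
from math import dist


def co_occurrence_ratio(coords, labels, exp, cond, radius):
    """Ratio p(exp | cond) / p(exp), estimated over ordered pairs within `radius`."""
    near_any = near_any_exp = 0      # pairs (i, j) within the radius, any label on i
    near_cond = near_cond_exp = 0    # same pairs, restricted to i labelled `cond`
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i == j or dist(ci, cj) > radius:
                continue
            near_any += 1
            near_any_exp += labels[j] == exp
            if labels[i] == cond:
                near_cond += 1
                near_cond_exp += labels[j] == exp
    p_exp = near_any_exp / near_any
    p_exp_given_cond = near_cond_exp / near_cond
    return p_exp_given_cond / p_exp


# Hypothetical layout: three "A" observations on the left, three "B" far to the right.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
labels = ["A", "A", "A", "B", "B", "B"]

for radius in (2.5, 20.0):
    ratio = co_occurrence_ratio(coords, labels, exp="A", cond="A", radius=radius)
    print(f"radius={radius}: p(A|A) / p(A) = {ratio:.2f}")
```

At the small radius only same-cluster pairs are in range, so the ratio is above 1; once the radius covers the whole toy tissue it drops below 1.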
We compute the co-occurrence score with squidpy.gr.co_occurrence() and set the cluster annotation for the conditional probability with the argument clusters. Then, we visualize the results with squidpy.pl.co_occurrence().
sq.gr.co_occurrence(adata, cluster_key="cluster")
sq.pl.co_occurrence(
    adata,
    cluster_key="cluster",
    clusters="Hippocampus",
    figsize=(8, 4),
)
The result largely recapitulates the previous analysis: the Pyramidal_layer cluster seems to co-occur at short distances with the larger Hippocampus cluster. Note that the distance units are pixels of the Visium source_image, the same units as the spatial coordinates saved in adata.obsm['spatial'].
Ligand-receptor interaction analysis
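We continue the analysis with a couple of feature-level methods that are highly relevant to spatial molecular data. For instance, after quantifying pairwise cluster co-occurrence, we might be interested in finding molecular instances that could drive cellular communication. This naturally translates into a ligand-receptor interaction analysis. Squidpy provides a fast re-implementation of the popular method CellPhoneDB and extends its database of annotated ligand-receptor interaction pairs with a popular database (Omnipath). The analysis can be run for all cluster pairs and all genes with squidpy.gr.ligrec().
In addition, we directly visualize the results, filtering out lowly expressed genes (with the means_range argument) and using a stricter threshold for the adjusted p-value (with the alpha argument).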
sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Hippocampus",
    target_groups=["Pyramidal_layer", "Pyramidal_layer_dentate_gyrus"],
    means_range=(3, np.inf),
    alpha=1e-4,
    swap_axes=True,
)
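The dot plot provides an interesting set of candidate ligand-receptor annotations that could be involved in cellular interactions in the Hippocampus. A more refined analysis would be, for instance, to integrate these results with those of a deconvolution method, to understand what proportion of single-cell cell types is present in this region of the tissue.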
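The permutation test behind a CellPhoneDB-style analysis can be sketched in a few lines. This is a simplified illustration rather than Squidpy's implementation: the helper names pair_score and permutation_pvalue, the expression values, and the cluster names "A" and "B" are all made up, and the real method additionally filters on expression thresholds and corrects for multiple testing.

```python
import random


def pair_score(ligand, receptor, clusters, source, target):
    """Average of the mean ligand expression in `source` and the mean receptor expression in `target`."""
    lig = [x for x, c in zip(ligand, clusters) if c == source]
    rec = [x for x, c in zip(receptor, clusters) if c == target]
    return (sum(lig) / len(lig) + sum(rec) / len(rec)) / 2


def permutation_pvalue(ligand, receptor, clusters, source, target, n_perms=200, seed=0):
    """Share of label permutations whose score is at least as high as the observed one."""
    rng = random.Random(seed)
    observed = pair_score(ligand, receptor, clusters, source, target)
    shuffled = list(clusters)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # cluster sizes stay fixed, only the assignment changes
        hits += pair_score(ligand, receptor, shuffled, source, target) >= observed
    return observed, hits / n_perms


# Hypothetical data: six spots, ligand high in cluster "A", receptor high in cluster "B".
clusters = ["A", "A", "A", "B", "B", "B"]
ligand = [5, 6, 7, 0, 0, 0]
receptor = [0, 0, 0, 4, 5, 6]

observed, pvalue = permutation_pvalue(ligand, receptor, clusters, source="A", target="B")
print(f"observed mean = {observed}, permutation p-value = {pvalue}")
```

A small p-value here means that few random cluster assignments reach the observed mean, which is the sense in which an interaction is called significant.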
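Spatially variable genes with Moran's I
Finally, we might be interested in finding genes that show spatial patterns. Several methods aim to address this explicitly, based on point processes or Gaussian process regression frameworks:
- SPARK
- SpatialDE
- trendsceek
- HMRF
Here we use a simple approach based on the well-known Moran's I statistic, which is in fact used as a baseline method in the spatially variable gene papers listed above. The function in Squidpy is called squidpy.gr.spatial_autocorr(), and it stores both the test statistic and the adjusted p-values in adata.uns['moranI']. For time reasons, we evaluate only a subset of the highly variable genes.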
genes = adata[:, adata.var.highly_variable].var_names.values[:1000]
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)
The results are saved in the adata.uns['moranI'] slot. Genes are already sorted by their Moran's I statistic.
sc.pl.spatial(adata, color=["Olfm1", "Plp1", "Itpka", "cluster"])
Next, we go through each of these analyses step by step.
Compute centrality scores
This example shows how to compute centrality scores, given a spatial graph and cell type annotation.
The scores calculated are closeness centrality, degree centrality and clustering coefficient with the following properties:
- closeness centrality - measure of how close the group is to other nodes.
- clustering coefficient - measure of the degree to which nodes cluster together.
- degree centrality - fraction of non-group members connected to group members.
All scores are descriptive statistics of the spatial graph.
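As an illustration of the degree centrality definition in the list above (the fraction of non-group members connected to group members), here is a small sketch on a toy graph. The function name group_degree_centrality and the five-node path are assumptions made for the example; this is not how Squidpy computes the score internally.

```python
def group_degree_centrality(edges, labels, group):
    """Fraction of non-group nodes that have at least one neighbour inside `group`."""
    members = {i for i, lab in enumerate(labels) if lab == group}
    touched = set()  # non-members adjacent to at least one member
    for i, j in edges:
        if i in members and j not in members:
            touched.add(j)
        elif j in members and i not in members:
            touched.add(i)
    return len(touched) / (len(labels) - len(members))


# Hypothetical spatial graph: five cells on a path, two of type "A" and three of type "B".
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = ["A", "A", "B", "B", "B"]

for group in ("A", "B"):
    print(group, round(group_degree_centrality(edges, labels, group), 3))
```

Group "A" touches one of the three "B" cells, while group "B" touches one of the two "A" cells, so the smaller group is connected to a smaller share of the remaining graph.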
import squidpy as sq
adata = sq.datasets.imc()
adata
This dataset contains cell type annotations in anndata.AnnData.obs, which are used for calculation of centrality scores. First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.
sq.gr.spatial_neighbors(adata)
Centrality scores are calculated with squidpy.gr.centrality_scores().
sq.gr.centrality_scores(adata, "cell type")
The results can then be visualized with squidpy.pl.centrality_scores().
sq.pl.centrality_scores(adata, "cell type")
Compute co-occurrence probability
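In addition to the neighborhood enrichment score, we can visualize cluster co-occurrence across spatial dimensions. The analysis is similar, but it operates on the original spatial coordinates instead of the connectivity matrix. The co-occurrence score is defined as:
$$\frac{p(exp \mid cond)}{p(exp)}$$
where $p(exp \mid cond)$ is the conditional probability of observing a cluster $exp$ conditioned on the presence of a cluster $cond$, whereas $p(exp)$ is the probability of observing $exp$ in the radius size of interest. The score is computed across increasing radii around each observation in the tissue.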
import scanpy as sc
import squidpy as sq
adata = sq.datasets.imc()
adata
We can compute the co-occurrence score with squidpy.gr.co_occurrence(). Results can be visualized with squidpy.pl.co_occurrence().
sq.gr.co_occurrence(adata, cluster_key="cell type")
sq.pl.co_occurrence(adata, cluster_key="cell type", clusters="basal CK tumor cell")
We can further visualize tissue organization in spatial coordinates with scanpy.pl.spatial().
Compute interaction matrix
This example shows how to compute the interaction matrix.
The interaction matrix quantifies the number of edges that nodes belonging to a given annotation share with nodes of the other annotations. It is a descriptive statistic of the spatial graph.
import squidpy as sq
adata = sq.datasets.imc()
adata
First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.
sq.gr.spatial_neighbors(adata)
We can compute the interaction matrix with squidpy.gr.interaction_matrix(). Specify normalized = True if you want a row-normalized matrix. Results can be visualized with squidpy.pl.interaction_matrix().
sq.gr.interaction_matrix(adata, cluster_key="cell type")
sq.pl.interaction_matrix(adata, cluster_key="cell type")
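To see what the interaction matrix and its row-normalized variant contain, here is a minimal sketch that counts edges between annotations on a toy graph. The helper name interaction_matrix, the toy path graph, and the convention that every undirected edge is counted once from each endpoint are assumptions for illustration, not a statement about Squidpy's internals.

```python
def interaction_matrix(edges, labels, normalized=False):
    """Count edges between annotation categories; optionally normalize each row to sum to 1."""
    cats = sorted(set(labels))
    index = {c: k for k, c in enumerate(cats)}
    counts = [[0] * len(cats) for _ in cats]
    for i, j in edges:
        # an undirected edge is seen once from each of its two endpoints
        counts[index[labels[i]]][index[labels[j]]] += 1
        counts[index[labels[j]]][index[labels[i]]] += 1
    if normalized:
        counts = [[v / sum(row) for v in row] for row in counts]
    return cats, counts


# Hypothetical spatial graph: five cells on a path, two of type "A" and three of type "B".
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = ["A", "A", "B", "B", "B"]

cats, raw = interaction_matrix(edges, labels)
_, rows = interaction_matrix(edges, labels, normalized=True)
print(cats, raw, rows)
```

Row normalization turns the counts into the share of a category's edges that lead to each other category, which makes categories of different size comparable.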
Receptor-ligand analysis
import squidpy as sq
adata = sq.datasets.seqfish()
adata
To get started, we just need an anndata.AnnData object with some clustering information. Below are some useful parameters of squidpy.gr.ligrec():
- n_perms - number of permutations for the permutation test.
- interactions - list of interactions; by default, all available interactions are fetched from [Türei et al., 2016].
- {interactions,transmitter,receiver}_params - parameters used when downloading the interactions; see omnipath.interactions.import_intercell_network() for more information.
- threshold - percentage of cells in a given cluster that must express the gene.
- corr_method - false discovery rate (FDR) correction method to use.
Since we're interested in receptors and ligands in this example, we specify these categories in receiver_params and transmitter_params, respectively. If desired, we can also restrict the resources to just a select few. For example, in order to only use [Efremova et al., 2020], set interactions_params={'resources': 'CellPhoneDB'}.
res = sq.gr.ligrec(
    adata,
    n_perms=1000,
    cluster_key="celltype_mapped_refined",
    copy=True,
    use_raw=False,
    transmitter_params={"categories": "ligand"},
    receiver_params={"categories": "receptor"},
)
First, we inspect the calculated means. The resulting object is a pandas.DataFrame, with rows corresponding to interacting pairs and columns to cluster combinations.
res["means"].head()
Next, we take a look at the p-values. If corr_method != None, this will contain the corrected p-values. The p-values marked as NaN correspond to interactions that did not pass the filtering threshold specified above.
res["pvalues"].head()
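For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg procedure, one common choice for such a correction. Whether it matches the method selected through corr_method in a given run is an assumption, and the function name benjamini_hochberg and the p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])  # indices from smallest to largest p
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest p-value down so each adjusted value is monotone in the rank
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * n / rank)
        adjusted[k] = running_min
    return adjusted


# Hypothetical raw p-values for four interaction/cluster-pair combinations.
raw_pvalues = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw_pvalues)
print([round(p, 4) for p in adjusted])
```

Each adjusted value is at least as large as its raw p-value, so fewer interactions pass a fixed threshold after correction.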
In order to plot the results, we can run squidpy.pl.ligrec(). Some useful parameters are:
- {source,target}_groups - only plot specific source/target clusters.
- dendrogram - whether to hierarchically cluster the rows, columns or both.
- means_range - plot only interactions whose means are in this range.
- pvalue_threshold - plot only interactions whose p-values are below this threshold.
In the plot below, to highlight significance, we've marked all p-values <= 0.005 with tori.
sq.pl.ligrec(res, source_groups="Erythroid", alpha=0.005)
Compute Moran’s I score
This example shows how to compute the Moran's I global spatial auto-correlation statistic.
The Moran's I global spatial auto-correlation statistic evaluates whether features (i.e. genes) show a pattern that is clustered, dispersed or random in the tissue under consideration.
import scanpy as sc
import squidpy as sq
adata = sq.datasets.visium_hne_adata()
genes = adata[:, adata.var.highly_variable].var_names.values[:100]
sq.gr.spatial_neighbors(adata)
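# sq.gr.moran() comes from older Squidpy releases;
# spatial_autocorr(mode="moran") is the call used earlier in this post
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)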
adata.uns["moranI"].head(10)
sc.pl.spatial(adata, color=["Resp18", "Tuba4a"])
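To build some intuition for the statistic itself, Moran's I can be written as I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where w_ij are the spatial weights and W is their sum. The sketch below evaluates it on a toy path graph with binary weights; the function name morans_i and the expression values are assumptions for illustration, and no permutation test is included.

```python
def morans_i(values, edges):
    """Moran's I for binary, symmetric weights given as a list of undirected edges."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # every undirected edge contributes in both directions (w_ij = w_ji = 1)
    cross = 2 * sum(dev[i] * dev[j] for i, j in edges)
    total_weight = 2 * len(edges)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical tissue: four spots on a line, each connected to its direct neighbour.
edges = [(0, 1), (1, 2), (2, 3)]

clustered = morans_i([1, 1, 0, 0], edges)     # similar values sit next to each other
alternating = morans_i([1, 0, 1, 0], edges)   # neighbours always differ
print(round(clustered, 3), round(alternating, 3))
```

The clustered pattern gives a positive value and the alternating pattern a negative one, which corresponds to the clustered and dispersed cases mentioned above.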
Neighbors enrichment analysis
This example shows how to run the neighbors enrichment analysis routine.
It calculates an enrichment score based on the proximity of cell clusters on the connectivity graph. The number of observed events is compared against N permutations (set with n_perms) and a z-score is computed.
import squidpy as sq
adata = sq.datasets.visium_fluo_adata()
This dataset contains cell type annotations in anndata.AnnData.obs, which are used for calculation of the neighborhood enrichment. First, we need to compute a connectivity matrix from spatial coordinates.
sq.gr.spatial_neighbors(adata)
Then we can calculate the neighborhood enrichment score with squidpy.gr.nhood_enrichment() (https://squidpy.readthedocs.io/en/latest/api/squidpy.gr.nhood_enrichment.html#squidpy.gr.nhood_enrichment).
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")
Building spatial neighbors graph
This example shows how to compute a spatial neighbors graph.
A spatial graph is a graph of spatial neighbors with observations as nodes and neighborhood relations between observations as edges. We use the spatial coordinates of spots/cells to identify neighbors among them. Different approaches to defining a neighborhood relation among observations are used for different types of spatial datasets.
import scanpy as sc
import squidpy as sq
import numpy as np
First, we show how to compute the spatial neighbors graph for a Visium dataset.
adata = sq.datasets.visium_fluo_adata()
We use squidpy.gr.spatial_neighbors() for this. For Visium data the function defaults to a grid coordinate system; we set coord_type = 'grid' explicitly here for clarity. n_rings should be used only for Visium datasets. It specifies, for each spot, how many hexagonal rings of surrounding spots are considered neighbors.
sq.gr.spatial_neighbors(adata, n_rings=2, coord_type="grid", n_neighs=6)
The function builds a spatial graph and by default saves its adjacency matrix to adata.obsp['spatial_connectivities'] and its weighted adjacency matrix to adata.obsp['spatial_distances']. Note that it can also build a graph from a square grid: just set n_neighs = 4.
adata.obsp["spatial_connectivities"]
The weights of the weighted adjacency matrix are the ordinal numbers of the hexagonal rings in the Visium grid case (coord_type = 'grid' in the call above).
adata.obsp["spatial_distances"]
We can plot the neighbors of a single spot to better understand what n_rings means:
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    neighbors_key="spatial_neighbors",
    edges=True,
    edges_width=1,
    img_key=None,
)
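Next, we compute the spatial neighbors graph for a non-grid dataset.
# assumption: the IMC dataset used earlier in this post, since the plots below color by "cell type"
adata = sq.datasets.imc()
sq.gr.spatial_neighbors(adata, n_neighs=10, coord_type="generic")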
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)
We use the same function for this with coord_type = 'generic' and delaunay = True. You can appreciate that the neighbor graph is slightly different than before.
sq.gr.spatial_neighbors(adata, delaunay=True, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)
In order to get all spots within a specified radius (in units of the spatial coordinates) from each spot as neighbors, the parameter radius should be used.
sq.gr.spatial_neighbors(adata, radius=0.3, coord_type="generic")
adata.obsp["spatial_connectivities"]
adata.obsp["spatial_distances"]
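The two generic strategies used above, a fixed number of nearest neighbors (n_neighs) and a distance cutoff (radius), can be compared on a handful of points. The sketch below is plain Python with hypothetical helper names (knn_neighbors, radius_neighbors) and made-up coordinates; it only illustrates the idea and does not reproduce the matrices Squidpy stores.

```python
from math import dist


def knn_neighbors(coords, n_neighs):
    """For each point, the indices of its `n_neighs` closest other points."""
    result = {}
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: dist(ci, coords[j]),
        )
        result[i] = others[:n_neighs]
    return result


def radius_neighbors(coords, radius):
    """For each point, the indices of all other points within `radius`."""
    return {
        i: [j for j, cj in enumerate(coords) if j != i and dist(ci, cj) <= radius]
        for i, ci in enumerate(coords)
    }


# Hypothetical coordinates: three nearby observations and one far away.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]

knn = knn_neighbors(coords, n_neighs=1)
rad = radius_neighbors(coords, radius=2.5)
print("kNN:   ", knn)
print("radius:", rad)
```

With a fixed neighbor count the isolated observation still gets a neighbor, however distant, whereas with a radius it gets none; this is the practical difference between the two parameters.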
Life is good, and it is better with you.