A review of 10X single-cell and spatial joint analysis: squidpy

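A new year, a new beginning. Having chosen the road ahead, we press on whatever the weather.

This time we continue our review of spatial transcriptomics analysis software with squidpy, focusing mainly on the analysis of 10X (Visium) spatial transcriptomics data.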

Import packages & data

import scanpy as sc
import anndata as ad
import squidpy as sq

import numpy as np
import pandas as pd

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
img = sq.datasets.visium_hne_image()
adata = sq.datasets.visium_hne_adata()
First, let's visualize the cluster annotation in spatial context with scanpy.pl.spatial(). (The dataset has evidently been pre-processed, so pay attention to the spatial annotation here.)
sc.pl.spatial(adata, color="cluster")

Image features

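Visium datasets contain a high-resolution image of the tissue that was used for gene extraction. With the function squidpy.im.calculate_image_features() you can compute image features for each Visium spot and create an obs x features matrix in adata, which can then be analyzed together with the obs x gene expression matrix.

By extracting image features, we aim to obtain information that is both similar and complementary to the gene expression values. Similar information is present, for example, in a tissue containing two cell types with distinct morphology: the cell type information is then contained in both the gene expression values and the tissue image features.

Squidpy contains several feature extractors and a flexible pipeline for computing features at different scales and sizes. There are several detailed examples of how to use squidpy.im.calculate_image_features(), and image feature extraction is a good starting point for learning more. (We return to this later.)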
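As a rough illustration of what a summary-style feature extractor produces, the sketch below computes per-channel statistics for one image crop in plain Python. It assumes that summary features amount to a mean, a standard deviation and a median per channel. The function name summary_features and the feature naming scheme are hypothetical and are not squidpy's API.

```python
from statistics import mean, median, pstdev


def summary_features(crop):
    """Per-channel summary statistics of a crop given as rows of (c0, c1, ...) pixels.

    Hypothetical helper for illustration only; the chosen statistics and
    feature names are assumptions, not squidpy's implementation.
    """
    pixels = [pixel for row in crop for pixel in row]
    n_channels = len(pixels[0])
    features = {}
    for ch in range(n_channels):
        values = [pixel[ch] for pixel in pixels]
        features[f"summary_ch-{ch}_mean"] = mean(values)
        features[f"summary_ch-{ch}_std"] = pstdev(values)
        features[f"summary_ch-{ch}_median"] = median(values)
    return features


# a toy 2 x 2 crop with three channels per pixel
crop = [
    [(10, 0, 5), (20, 0, 5)],
    [(30, 0, 5), (40, 0, 5)],
]
features = summary_features(crop)
print(features["summary_ch-0_mean"], features["summary_ch-0_median"])
```

Running such a function once per spot crop would give one row of the obs x features matrix.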

# calculate features for different scales (higher value means more context)
for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )


# combine features in one dataframe
adata.obsm["features"] = pd.concat(
    [adata.obsm[f] for f in adata.obsm.keys() if "features_summary" in f], axis="columns"
)
# make sure that we have no duplicated feature names in the combined table
adata.obsm["features"].columns = ad.utils.make_index_unique(adata.obsm["features"].columns)

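The extracted image features can be used to compute a new cluster annotation. This can help reveal similarities between spots based on image morphology.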

# helper function returning a clustering
def cluster_features(features: pd.DataFrame, like=None) -> pd.Series:
    """
    Calculate leiden clustering of features.

    Specify filter of features using `like`.
    """
    # filter features
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]


# calculate feature clusters
adata.obs["features_cluster"] = cluster_features(adata.obsm["features"], like="summary")

# compare feature and gene clusters
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.pl.spatial(adata, color=["features_cluster", "cluster"])

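Comparing the gene clusters and feature clusters, we notice that in some regions they look very similar, such as the cluster Fiber_tract, or the clusters around the hippocampus, which seem to be roughly recapitulated by the clusters in image feature space. In other regions the feature clusters look different: in the cortex, for example, the gene clusters show the layered structure of the cortex, whereas the feature clusters seem to show different regions of the cortex.

This is only a simple comparative analysis of the image features. Note that the image features could also be used to compute a joint image and gene clustering, for example by computing a shared neighbors graph (e.g. on concatenated PCAs of the two feature spaces).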

Spatial statistics and graph analysis

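As with other spatial data, we can use spatial and graph statistics on Visium data to investigate spatial organization.

Neighborhood enrichment

Computing a neighborhood enrichment score helps us identify clusters of spots that share a common neighborhood structure across the tissue. Such a score can be computed with squidpy.gr.nhood_enrichment(). In short, it is an enrichment score on the spatial proximity of clusters: if spots belonging to two different clusters are often close to each other, they receive a high score and can be described as enriched. If instead they are far apart and therefore rarely neighbors, the score is low and they can be described as depleted. The score is based on a permutation test, and the number of permutations can be set with the n_perms argument (default 1000).

Since the function works on a connectivity matrix, we need to compute that as well. This can be done with squidpy.gr.spatial_neighbors(). For more details on how this function works, see the section on building the spatial neighbors graph later in this article.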

Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment().

sq.gr.spatial_neighbors(adata)
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")

Given the spatial organization of the mouse brain coronal section, it is not surprising that we find high neighborhood enrichment in the Hippocampus region: the Pyramidal_layer_dentate_gyrus and Pyramidal_layer clusters seem to be frequent neighbors of the larger Hippocampus cluster.

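Co-occurrence across spatial dimensions

Besides the neighborhood enrichment score, we can also visualize cluster co-occurrence across spatial dimensions. This analysis is similar to the one above, but instead of operating on the connectivity matrix it operates on the original spatial coordinates. The co-occurrence score is defined as:

$$\frac{p(exp \mid cond)}{p(exp)}$$

where p(exp|cond) is the conditional probability of observing a cluster exp conditioned on the presence of a cluster cond, whereas p(exp) is the probability of observing exp in the radius size of interest. The score is computed across increasing radii around each observation (i.e. each spot here) in the tissue.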
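To make the ratio concrete, here is a minimal plain-Python sketch for a single radius on toy coordinates. It is a simplification under stated assumptions: probabilities are estimated from all ordered pairs of points closer than the radius, and the helper name co_occurrence_ratio is hypothetical. It is not how squidpy.gr.co_occurrence() is implemented internally.

```python
from math import dist


def co_occurrence_ratio(coords, labels, cond, exp, radius):
    """Estimate p(exp | cond) / p(exp) from ordered pairs closer than `radius`.

    Illustrative helper, not part of squidpy.
    """
    pairs_total = pairs_exp = pairs_cond = pairs_cond_exp = 0
    n = len(coords)
    for i in range(n):
        for j in range(n):
            if i == j or dist(coords[i], coords[j]) > radius:
                continue
            pairs_total += 1
            pairs_exp += labels[j] == exp
            if labels[i] == cond:
                pairs_cond += 1
                pairs_cond_exp += labels[j] == exp
    if pairs_cond == 0 or pairs_exp == 0:
        return float("nan")  # ratio undefined at this radius
    p_exp_given_cond = pairs_cond_exp / pairs_cond
    p_exp = pairs_exp / pairs_total
    return p_exp_given_cond / p_exp


# toy tissue: A and B sit together, C forms a separate group far away
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = ["A", "B", "B", "C", "C", "C"]

ratio_ab = co_occurrence_ratio(coords, labels, cond="A", exp="B", radius=1.5)
ratio_ac = co_occurrence_ratio(coords, labels, cond="A", exp="C", radius=1.5)
print(ratio_ab, ratio_ac)
```

A ratio above 1 means exp is seen near cond more often than expected from its overall frequency at that radius, and a ratio below 1 means it is seen less often.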

We compute this score with squidpy.gr.co_occurrence() and select the cluster used for the conditional probability with the clusters argument. We then visualize the results with squidpy.pl.co_occurrence().
sq.gr.co_occurrence(adata, cluster_key="cluster")
sq.pl.co_occurrence(
    adata,
    cluster_key="cluster",
    clusters="Hippocampus",
    figsize=(8, 4),
)
The result largely recapitulates the previous analysis: the Pyramidal_layer cluster seems to co-occur at short distances with the larger Hippocampus cluster. Note that the distances are given in pixels of the Visium source_image, the same unit as the spatial coordinates saved in adata.obsm['spatial'].

Ligand-receptor interaction analysis

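We continue with a few feature-level methods that are highly relevant to the analysis of spatial molecular data. For instance, after quantifying pairwise cluster co-occurrence, we may be interested in finding molecular instances that could drive cell-cell communication. This naturally translates into a ligand-receptor interaction analysis. Squidpy provides a fast re-implementation of the popular method CellPhoneDB and extends its database of annotated ligand-receptor interaction pairs using a popular database. The analysis can be run for all cluster pairs and all genes with squidpy.gr.ligrec(). We then visualize the results directly, filtering out lowly expressed genes (with the means_range argument) and raising the threshold on the adjusted p-value (with the alpha argument).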

sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Hippocampus",
    target_groups=["Pyramidal_layer", "Pyramidal_layer_dentate_gyrus"],
    means_range=(3, np.inf),
    alpha=1e-4,
    swap_axes=True,
)

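The dot plot gives us a set of interesting candidate ligand-receptor annotations that may be involved in cellular interactions in the hippocampus. A more refined analysis would, for example, combine these results with those of a deconvolution method to estimate the proportions of single-cell cell types present in this region of the tissue.

Spatially variable genes with Moran's I

Finally, we may be interested in finding genes that show spatial patterns. Several methods address this problem explicitly, based on point processes or Gaussian process regression frameworks: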

  • SPARK
  • SpatialDE
  • trendsceek
  • HMRF

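Here we use a simple approach based on the well-known Moran's I statistic, which is in fact used as a baseline method in the spatially variable gene papers listed above. The function in Squidpy is squidpy.gr.spatial_autocorr(); it computes both the test statistic and adjusted p-values and stores them in anndata.AnnData.uns. To save time, we evaluate only a subset of the highly variable genes.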

genes = adata[:, adata.var.highly_variable].var_names.values[:1000]
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)

The results are saved in the adata.uns['moranI'] slot. Genes have already been sorted by Moran's I statistic.

adata.uns["moranI"].head(10)
sc.pl.spatial(adata, color=["Olfm1", "Plp1", "Itpka", "cluster"])

Next, we go through each analysis step by step.

Compute centrality scores

This example shows how to compute centrality scores, given a spatial graph and cell type annotation.

The scores calculated are closeness centrality, degree centrality and clustering coefficient with the following properties:

  • closeness centrality - measure of how close the group is to other nodes.
  • clustering coefficient - measure of the degree to which nodes cluster together.
  • degree centrality - fraction of non-group members connected to group members.

All scores are descriptive statistics of the spatial graph.
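The three scores can be sketched on a toy graph in plain Python. This follows the group-level definitions in the list above and assumes a connected, undirected graph stored as a dict of neighbor sets. The function names are illustrative, and this is not a claim about how squidpy.gr.centrality_scores() is implemented.

```python
from collections import deque


def group_degree_centrality(graph, group):
    """Fraction of non-group nodes that are connected to at least one group member."""
    group = set(group)
    outside_neighbors = {nbr for node in group for nbr in graph[node]} - group
    return len(outside_neighbors) / (len(graph) - len(group))


def group_closeness_centrality(graph, group):
    """Number of non-group nodes divided by their summed distance to the group."""
    group = set(group)
    distance = {node: 0 for node in group}
    queue = deque(group)
    while queue:  # multi-source breadth-first search starting from the group
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    outside = [d for node, d in distance.items() if node not in group]
    return len(outside) / sum(outside)


def average_clustering(graph, group):
    """Mean local clustering coefficient over the nodes of the group."""
    coefficients = []
    for node in group:
        nbrs = list(graph[node])
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(
            1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in graph[nbrs[a]]
        )
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# toy spatial graph: a triangle 0-1-2 with a tail 2-3-4
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
group = [0, 1]  # nodes sharing one annotation

degree = group_degree_centrality(graph, group)
closeness = group_closeness_centrality(graph, group)
clustering = average_clustering(graph, group)
print(degree, closeness, clustering)
```

In this toy graph only node 2 touches the group, so degree centrality is 1/3, while both group nodes sit in a closed triangle and reach the maximal clustering coefficient.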

import squidpy as sq

adata = sq.datasets.imc()
adata

This dataset contains cell type annotations in anndata.AnnData.obs, which are used for calculation of centrality scores. First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.

sq.gr.spatial_neighbors(adata)
Centrality scores are calculated with squidpy.gr.centrality_scores():
sq.gr.centrality_scores(adata, "cell type")
The results can be visualized with squidpy.pl.centrality_scores():
sq.pl.centrality_scores(adata, "cell type")

Compute co-occurrence probability

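This example shows how to compute the co-occurrence probability. As with neighborhood enrichment, the goal is to see which clusters appear together in space, but the score is computed on the original spatial coordinates rather than on the connectivity matrix. The co-occurrence score is defined as:

$$\frac{p(exp \mid cond)}{p(exp)}$$

where p(exp|cond) is the conditional probability of observing a cluster exp conditioned on the presence of a cluster cond, whereas p(exp) is the probability of observing exp in the radius size of interest. The score is computed across increasing radii around each observation (i.e. each cell here) in the tissue.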

import scanpy as sc
import squidpy as sq

adata = sq.datasets.imc()
adata

We can compute the co-occurrence score with squidpy.gr.co_occurrence(). Results can be visualized with squidpy.pl.co_occurrence().

sq.gr.co_occurrence(adata, cluster_key="cell type")
sq.pl.co_occurrence(adata, cluster_key="cell type", clusters="basal CK tumor cell")

We can further visualize tissue organization in spatial coordinates with scanpy.pl.spatial().


Compute interaction matrix

This example shows how to compute the interaction matrix.

The interaction matrix quantifies the number of edges that nodes belonging to a given annotation share with nodes of the other annotations. It is a descriptive statistic of the spatial graph.

import squidpy as sq

adata = sq.datasets.imc()
adata
First, we need to compute a connectivity matrix from spatial coordinates. We can use squidpy.gr.spatial_neighbors() for this purpose.
sq.gr.spatial_neighbors(adata)

We can compute the interaction matrix with squidpy.gr.interaction_matrix(). Specify normalized = True if you want a row-normalized matrix. Results can be visualized with squidpy.pl.interaction_matrix().

sq.gr.interaction_matrix(adata, cluster_key="cell type")
sq.pl.interaction_matrix(adata, cluster_key="cell type")
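To make the counting explicit, the following plain-Python sketch builds such a matrix from an edge list. It assumes an undirected graph in which every edge is counted once from each endpoint, so an edge inside one cluster adds 2 to the diagonal. The function name and the toy data are illustrative only.

```python
def interaction_matrix(edges, labels, categories, normalized=False):
    """Count edges between annotation categories; optionally row-normalize."""
    index = {category: k for k, category in enumerate(categories)}
    matrix = [[0.0] * len(categories) for _ in categories]
    for i, j in edges:
        a, b = index[labels[i]], index[labels[j]]
        matrix[a][b] += 1  # edge seen from node i
        matrix[b][a] += 1  # same edge seen from node j
    if normalized:
        for row in matrix:
            total = sum(row)
            if total > 0:
                row[:] = [value / total for value in row]
    return matrix


# toy chain graph 0-1-2-3-4 with three annotations
labels = ["a", "a", "b", "b", "c"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
categories = ["a", "b", "c"]

counts = interaction_matrix(edges, labels, categories)
fractions = interaction_matrix(edges, labels, categories, normalized=True)
print(counts)
print(fractions)
```

With normalized=True each row sums to 1, so the rows can be read as the fraction of a cluster's edges going to each other cluster.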

Receptor-ligand analysis

import squidpy as sq

adata = sq.datasets.seqfish()
adata

To get started, we just need an anndata.AnnData object with some clustering information. Below are some useful parameters of squidpy.gr.ligrec():

  • n_perms - number of permutations for the permutation test.
  • interactions - list of interactions; by default we fetch all available interactions from [Türei et al., 2016].
  • {interactions,transmitter,receiver}_params - parameters used if downloading the interactions, see omnipath.interactions.import_intercell_network() for more information.
  • threshold - minimum fraction of cells in a given cluster that must express a gene for the interaction to be tested.
  • corr_method - false discovery rate (FDR) correction method to use.

Since we’re interested in receptors and ligands in this example, we specify these categories in receiver_params and transmitter_params, respectively. If desired, we can also restrict the resources to just a select few. For example, in order to only use [Efremova et al., 2020], set interactions_params={'resources': 'CellPhoneDB'}.

res = sq.gr.ligrec(
    adata,
    n_perms=1000,
    cluster_key="celltype_mapped_refined",
    copy=True,
    use_raw=False,
    transmitter_params={"categories": "ligand"},
    receiver_params={"categories": "receptor"},
)
First, we inspect the calculated means. The resulting object is a pandas.DataFrame, with rows corresponding to interacting pairs and columns to cluster combinations.
res["means"].head()
Next, we take a look at the p-values. If corr_method != None, these will be the corrected p-values. The p-values marked as NaN correspond to interactions that did not pass the filtering threshold specified above.
res["pvalues"].head()
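The idea behind these means and p-values can be sketched in plain Python for a single ligand-receptor pair. This is a simplified, CellPhoneDB-style permutation test under stated assumptions: the statistic is the average of the mean ligand expression in the source cluster and the mean receptor expression in the target cluster, cluster labels are shuffled to build the null distribution, and the expression threshold and multiple-testing correction are left out. The function name and toy data are hypothetical.

```python
import random
from statistics import mean


def ligrec_pvalue(ligand, receptor, clusters, source, target, n_perms=1000, seed=0):
    """Return (observed mean, permutation p-value) for one ligand-receptor pair."""

    def score(labels):
        lig = [x for x, c in zip(ligand, labels) if c == source]
        rec = [x for x, c in zip(receptor, labels) if c == target]
        return (mean(lig) + mean(rec)) / 2

    observed = score(clusters)
    rng = random.Random(seed)
    shuffled = list(clusters)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break the link between cells and clusters
        hits += score(shuffled) >= observed
    return observed, hits / n_perms


# toy data: the ligand is expressed only in cluster A, the receptor only in cluster B
clusters = ["A"] * 5 + ["B"] * 5 + ["C"] * 10
ligand = [5] * 5 + [0] * 15
receptor = [0] * 5 + [5] * 5 + [0] * 10

mean_ab, p_ab = ligrec_pvalue(ligand, receptor, clusters, source="A", target="B")
mean_cc, p_cc = ligrec_pvalue(ligand, receptor, clusters, source="C", target="C")
print(mean_ab, p_ab)
print(mean_cc, p_cc)
```

The A to B pair has a high mean that shuffled labels essentially never reach, whereas the C to C pair has a mean of 0 that every permutation matches, giving a p-value of 1.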

In order to plot the results, we can run squidpy.pl.ligrec(). Some useful parameters are:

  • {source,target}_groups - only plot specific source/target clusters.
  • dendrogram - whether to hierarchically cluster the rows, columns or both.
  • means_range - plot only interactions whose means are in this range.
  • pvalue_threshold - plot only interactions whose p-values are below this threshold.

In the plot below, to highlight significance, we’ve marked all p-values <= 0.005 with tori.

sq.pl.ligrec(res, source_groups="Erythroid", alpha=0.005)

Compute Moran’s I score

This example shows how to compute the Moran's I global spatial auto-correlation statistic.

The Moran's I global spatial auto-correlation statistic evaluates whether the features (i.e. genes) under consideration show a pattern that is clustered, dispersed or random in the tissue.
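For intuition, the statistic itself can be written in a few lines of plain Python using the textbook formula I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where w is the spatial weight matrix and W is the sum of all weights. The sketch uses a toy chain graph and leaves out the permutation test and p-value correction; the function name morans_i is illustrative.

```python
def morans_i(values, weights):
    """Moran's I for one feature given a dense spatial weight matrix."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / total_weight) * numerator / denominator


# adjacency matrix of a chain of four spots: 0 - 1 - 2 - 3
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 0, 0], weights)    # similar values sit next to each other
alternating = morans_i([1, 0, 1, 0], weights)  # neighbors always differ
print(clustered, alternating)
```

Positive values indicate that neighboring spots carry similar expression (a clustered pattern), while negative values indicate a dispersed pattern.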

import scanpy as sc
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
genes = adata[:, adata.var.highly_variable].var_names.values[:100]
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(
    adata,
    mode="moran",
    genes=genes,
    n_perms=100,
    n_jobs=1,
)
adata.uns["moranI"].head(10)
sc.pl.spatial(adata, color=["Resp18", "Tuba4a"])

Neighbors enrichment analysis

This example shows how to run the neighbors enrichment analysis routine.

It calculates an enrichment score based on proximity on the connectivity graph of cell clusters. The number of observed events is compared against N permutations and a z-score is computed.
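The permutation logic can be sketched in plain Python as follows. It assumes that an "event" is an edge of the spatial graph connecting the two clusters of interest, and that the null distribution comes from shuffling cluster labels over the fixed graph. The helper names are hypothetical and this is not squidpy's implementation.

```python
import random
from statistics import mean, pstdev


def count_links(edges, labels, a, b):
    """Number of graph edges joining a node of cluster `a` with a node of cluster `b`."""
    return sum(1 for i, j in edges if {labels[i], labels[j]} == {a, b})


def nhood_zscore(edges, labels, a, b, n_perms=1000, seed=0):
    """z-score of the observed link count against label permutations."""
    observed = count_links(edges, labels, a, b)
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # keep the graph, permute the annotation
        null.append(count_links(edges, shuffled, a, b))
    return (observed - mean(null)) / pstdev(null)


# toy chain of 12 spots: the first six are cluster x, the last six cluster y
labels = ["x"] * 6 + ["y"] * 6
edges = [(i, i + 1) for i in range(11)]

z_xx = nhood_zscore(edges, labels, "x", "x")
z_xy = nhood_zscore(edges, labels, "x", "y")
print(z_xx, z_xy)
```

Here x spots neighbor each other more often than chance (positive z-score, enriched), while x and y spots meet at a single edge, fewer than expected (negative z-score, depleted).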

import squidpy as sq
adata = sq.datasets.visium_fluo_adata()
This dataset contains cell type annotations in anndata.AnnData.obs, which are used to calculate the neighborhood enrichment. First, we need to compute a connectivity matrix from spatial coordinates.
sq.gr.spatial_neighbors(adata)
Then we can calculate the neighborhood enrichment score with squidpy.gr.nhood_enrichment() (https://squidpy.readthedocs.io/en/latest/api/squidpy.gr.nhood_enrichment.html) and visualize it with squidpy.pl.nhood_enrichment().
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster")

Building spatial neighbors graph

This example shows how to compute a spatial neighbors graph.

A spatial graph is a graph of spatial neighbors with observations as nodes and neighborhood relations between observations as edges. We use the spatial coordinates of spots/cells to identify neighbors among them. Different approaches to defining a neighborhood relation among observations are used for different types of spatial datasets.

import scanpy as sc
import squidpy as sq

import numpy as np
First, we show how to compute the spatial neighbors graph for a Visium dataset.
adata = sq.datasets.visium_fluo_adata()
We use squidpy.gr.spatial_neighbors() for this. Visium spots lie on a grid, so we set coord_type = 'grid' explicitly for clarity. n_rings should be used only for Visium datasets: it specifies, for each spot, how many hexagonal rings of surrounding spots are considered neighbors.
sq.gr.spatial_neighbors(adata, n_rings=2, coord_type="grid", n_neighs=6)
The function builds a spatial graph and by default saves its adjacency matrix to adata.obsp['spatial_connectivities'] and its weighted adjacency matrix to adata.obsp['spatial_distances']. Note that it can also build a graph from a square grid; just set n_neighs = 4.
adata.obsp["spatial_connectivities"]
The weights of the weighted adjacency matrix are ordinal numbers of hexagonal rings in the case of coord_type = 'grid'.
adata.obsp["spatial_distances"]
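The notion of "ring number as edge weight" can be illustrated on an idealized hexagonal grid. The sketch below assumes axial hex coordinates (q, r), which is a textbook convention and not necessarily the layout of Visium's own array coordinates. The helper names are hypothetical, and squidpy's implementation is not claimed to work this way.

```python
def hex_distance(a, b):
    """Ring distance between two hexagons given in axial coordinates (q, r)."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def ring_graph(spots, n_rings):
    """Map each ordered neighbor pair (i, j) to the ring in which j lies around i."""
    graph = {}
    for i, a in enumerate(spots):
        for j, b in enumerate(spots):
            ring = hex_distance(a, b)
            if i != j and ring <= n_rings:
                graph[(i, j)] = ring
    return graph


# all hexagonal spots within three rings of the origin (37 spots in total)
spots = [
    (q, r)
    for q in range(-3, 4)
    for r in range(-3, 4)
    if hex_distance((q, r), (0, 0)) <= 3
]
center = spots.index((0, 0))

graph = ring_graph(spots, n_rings=2)
rings_of_center = [ring for (i, _), ring in graph.items() if i == center]
print(rings_of_center.count(1), rings_of_center.count(2))
```

With n_rings=2 the central spot gets 6 neighbors in the first ring and 12 in the second, and the stored weight of each edge is the ring index rather than a Euclidean distance.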
We can plot the neighbors of a single spot to better see what n_rings means:
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    neighbors_key="spatial_neighbors",
    edges=True,
    edges_width=1,
    img_key=None,
)
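Next, we compute the spatial neighbors graph for a non-grid dataset. The code below colors cells by "cell type", an annotation carried by the IMC dataset used earlier, so we load that dataset first and request a fixed number of nearest neighbors with n_neighs.
adata = sq.datasets.imc()
sq.gr.spatial_neighbors(adata, n_neighs=10, coord_type="generic")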
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)

Alternatively, a Delaunay triangulation graph can be built with the same function by setting coord_type = 'generic' and delaunay = True. You can see that the neighbor graph differs slightly from the previous one.

sq.gr.spatial_neighbors(adata, delaunay=True, coord_type="generic")
_, idx = adata.obsp["spatial_connectivities"][420, :].nonzero()
idx = np.append(idx, 420)
sc.pl.spatial(
    adata[idx, :],
    color="cell type",
    neighbors_key="spatial_neighbors",
    spot_size=1,
    edges=True,
    edges_width=1,
    img_key=None,
)
In order to get all spots within a specified radius (in units of the spatial coordinates) from each spot as neighbors, use the radius parameter.
sq.gr.spatial_neighbors(adata, radius=0.3, coord_type="generic")

adata.obsp["spatial_connectivities"]
adata.obsp["spatial_distances"]
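As a minimal plain-Python sketch of what a radius-based graph contains, the snippet below builds a dense connectivity matrix and a matching distance matrix from toy coordinates. It assumes Euclidean distance and dense lists instead of the sparse matrices stored in adata.obsp; the function name radius_neighbors is illustrative.

```python
from math import dist


def radius_neighbors(coords, radius):
    """Return (connectivities, distances) for all pairs closer than `radius`."""
    n = len(coords)
    connectivities = [[0] * n for _ in range(n)]
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = dist(coords[i], coords[j])
            if d <= radius:
                connectivities[i][j] = 1
                distances[i][j] = d
    return connectivities, distances


# two spots close together and one spot far away
coords = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0)]
connectivities, distances = radius_neighbors(coords, radius=0.3)
print(connectivities)
print(distances[0][1])
```

Spots with no other spot inside the radius, like the third one here, end up with an empty row, so the choice of radius directly controls how many isolated nodes the graph has.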

Life is good, and better with you.
