10X single-cell (10X spatial transcriptomics) communication analysis with CytoTalk (de novo construction of signal transduction networks)

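Hello everyone. Today we look at a very good tool for cell-cell communication analysis. It goes a step beyond the existing communication tools by using single-cell data to construct signaling networks de novo. Let's work through it and see what the software mainly does and where it applies.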

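Quite a few cell-cell communication tools have already been covered in this series. Each has its own characteristics, strengths, and weaknesses. They are listed here for anyone interested: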

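10X single-cell (10X spatial transcriptomics) communication analysis with NicheNet

10X single-cell (10X spatial transcriptomics) communication analysis with CellChat: differential communication analysis across multiple samples

10X single-cell (10X spatial transcriptomics) communication analysis with CellChat

10X single-cell communication analysis with scMLnet (network analysis of ligand-receptor pairs, TFs, and differentially expressed (target) genes)

10X single-cell cell communication chapter: Connectome

10X single-cell communication analysis with CrosstalkR (changes in both specificity and communication strength matter)

10X single-cell communication analysis with ICELLNET

10X spatial transcriptomics communication analysis, chapter 3

Spatial communication analysis, chapter 2

How to approach cell communication with 10X spatial transcriptomics

The cell communication tool RNAMagnet

NATMI, a cell communication tool for single-cell data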

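Now to today's topic. The paper is "CytoTalk: De novo construction of signal transduction networks using single-cell RNA-Seq data", published this year in Science Advances (impact factor around 13). We go through the paper first and look at the example code at the end.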

Abstract

Single-cell technology has opened the door for studying signal transduction in a complex tissue at unprecedented resolution. However, there is a lack of analytical methods for de novo construction of signal transduction pathways using single-cell omics data. (The problem is stated up front: de novo construction of signal transduction pathways.) The authors therefore developed a new method, CytoTalk.

  • CytoTalk first constructs intracellular and intercellular gene-gene interaction networks (the intercellular part refers to ligand-receptor pairs) using an information-theoretic measure between two cell types.
  • Candidate signal transduction pathways in the integrated network are identified using the prize-collecting Steiner forest algorithm (covered in the Method section).
    We applied CytoTalk to a single-cell RNA-Seq data set on mouse visual cortex and evaluated predictions using high-throughput spatial transcriptomics data generated from the same tissue (note that both single-cell and spatial transcriptomics data are used). Compared to published methods, genes in our inferred signaling pathways have significantly higher spatial expression correlation only in cells that are spatially closer to each other, suggesting improved accuracy of CytoTalk (a good result: the selected ligand-receptor pairs are clearly spatially localized, with communication happening between neighboring regions). Furthermore, using single-cell RNA-Seq data with receptor gene perturbation, we found that predicted pathways are enriched for differentially expressed genes between the receptor knockout and wild type cells, further validating the accuracy of CytoTalk (see Results). In summary, CytoTalk enables de novo construction of signal transduction pathways and facilitates comparative analysis of these pathways across tissues and conditions.

Introduction (key points)

  • Signal transduction is the primary mechanism for cell-cell communication.
  • Signaling pathways are highly dynamic and crosstalk among them is prevalent.
    The key point: Due to these two features, simply examining expression levels of ligand and receptor genes cannot reliably capture the overall activities of signaling pathways and interactions among them.
    NicheNet is mentioned here (network analysis of ligand-receptor pairs and target genes).
  • The weakness of these methods: However, these methods are based on known annotations of signaling pathways.
    To our knowledge, currently no method exists to perform de novo prediction of the entire signal transduction pathways emanating from the ligand-receptor pairs. The idea appears similar to scMLnet (network analysis of ligand-receptor pairs, TFs, and target genes), covered in an earlier post.
    Here we describe the CytoTalk algorithm for de novo construction of signaling network (union of multiple signaling pathways) between two cell types using scRNA-Seq data.
  • The algorithm first constructs an integrated network consisting of intracellular and intercellular functional gene interactions.
  • It then identifies the signaling network by solving a prize-collecting Steiner forest problem (the term is explained in the Method section).
  • We demonstrate the performance of the algorithm using high-throughput spatial transcriptomics data and scRNA-Seq data with perturbation to the receptor genes in a signaling pathway.

Results

Result 1: Wiring of signaling pathways is highly cell type-dependent

A hallmark of signal transduction pathways is their high level of cell-type specific wiring pattern (the term "hallmark" should be familiar). Single-cell transcriptome data allows us to examine the cell type-specific activity of individual signaling pathways beyond just ligand and receptor genes. (Note: pathway activity can be estimated by enrichment-style scoring, but how strongly a pathway is expressed is regulated by ligand-receptor signaling.) To this end, we examined the canonical fibroblast growth factor receptor 2 (FGFR2) signaling pathway in two tissue types, mammary gland and skin. We will not dwell on the physiology here; the focus is on what the software offers. What matters is that activation of certain receptors leads to upregulation of pathway genes, which in turn changes biological function.

For a public single-cell data set (already annotated, of course), an expression specificity score, the preferential expression measure (PEM), is computed for each pathway gene in each involved cell type (PEM is discussed in the Method section). Four canonical sub-pathways downstream of the same receptor (FGFR2) show striking cell type-specific activity, depending on the cell types involved. In other words, the same receptor activates different signaling pathways in different cell types. The PI3K/AKT pathway is most active for signaling between fibroblasts and luminal epithelial cells in the mammary gland. In contrast, the JAK-STAT pathway is most active for signaling between keratinocyte stem cells and basal cells in skin.
To evaluate the extent of cell type-specific wiring of signaling pathways, we examined all manually annotated signaling pathways in the Reactome database. For each pathway, we computed its cell type-specific activity score. We found that the majority of pathways exhibit a high degree of cell type-specific activity (this is arguably what one would expect, not really a new finding).
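As a rough illustration of the PEM score mentioned above, the sketch below uses the commonly cited definition PEM = log10(observed / expected), where the expected expression of a gene in a cell type is the gene's total expression across cell types scaled by that cell type's share of overall expression. This definition is an assumption here; the exact variant used by CytoTalk is given in the paper's Method section and may differ. The function name `pem_scores` and the toy numbers are hypothetical.

```python
import math

def pem_scores(mean_expr):
    """mean_expr: {gene: {cell_type: mean expression}} -> {gene: {cell_type: PEM}}.

    Assumed definition: PEM = log10(observed / expected).
    """
    cell_types = sorted({ct for per_ct in mean_expr.values() for ct in per_ct})
    # Total expression of all genes within each cell type, and the grand total.
    type_totals = {ct: sum(per_ct.get(ct, 0.0) for per_ct in mean_expr.values())
                   for ct in cell_types}
    grand_total = sum(type_totals.values())
    scores = {}
    for gene, per_ct in mean_expr.items():
        gene_total = sum(per_ct.get(ct, 0.0) for ct in cell_types)
        scores[gene] = {}
        for ct in cell_types:
            observed = per_ct.get(ct, 0.0)
            expected = gene_total * type_totals[ct] / grand_total
            if observed > 0 and expected > 0:
                scores[gene][ct] = math.log10(observed / expected)
            else:
                scores[gene][ct] = float("-inf")  # gene not expressed here
    return scores

# Toy (made-up) mean expression values for two genes in two cell types.
toy = {"Fgfr2": {"Fibroblast": 8.0, "Luminal": 2.0},
       "Actb": {"Fibroblast": 5.0, "Luminal": 5.0}}
pem = pem_scores(toy)
```

A positive score means the gene is expressed above expectation in that cell type; a negative score means below expectation.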

This is true even for the same cell types located in different tissues (worth particular attention). In summary, these results highlight the need for analytical tools for de novo construction of complete signaling pathways (instead of ligand-receptor pairs) using single-cell transcriptome data (agreed).

Result 2: Overview of the CytoTalk algorithm (key points)

CytoTalk is designed for de novo construction of a signal transduction network between two cell types, which is defined as the union of multiple signal transduction pathways.

  • It first constructs a weighted integrated gene network comprised of both intracellular and intercellular functional gene-gene interactions (the intercellular part being the ligand-receptor network). Intracellular functional gene interactions are computed and weighted using mutual information between two genes. Two intracellular networks are connected via crosstalk edges. Ligand-receptor pairs with higher cell-type-specific gene expression but lower correlated expression within the same cell type (thus more likely to be involved in crosstalk instead of self-talk) are assigned higher crosstalk weights. (This is worth understanding: a ligand or receptor gene that is strongly cell type-specific, yet only weakly correlated with its partner inside the same cell type, is more likely to take part in crosstalk between cell types than in self-talk, so it gets a higher weight. Reasonable.)
  • Nodes in the integrated network are weighted by combining their cell-type-specific gene expression with their closeness to the ligand/receptor genes in the network. We use a network propagation procedure to determine the closeness of a gene to the ligand/receptor gene.
  • With the integrated network as the input, we formulate the identification of signaling network as a prize-collecting Steiner forest (PCSF) problem (unfamiliar territory; see the paper "PRODIGY: personalized prioritization of driver genes" for background). The rationale for using the PCSF algorithm is to find an optimal subnetwork that includes genes with high levels of cell-type-specific expression that are closely connected to high-scoring ligand-receptor pairs. This optimal subnetwork is defined as the signaling network between the two cell types.
  • The statistical significance of the candidate signaling network is computed using a null score distribution of signaling networks generated using degree-preserving randomized networks (significance testing; see the Method section for details).

Result 3: Performance evaluation using spatial transcriptomics data (mouse cortex data)

We identified signaling networks between three pairs of cell types: endothelial-microglia (EndoMicro), endothelial-astrocyte (EndoAstro), and astrocyte-neuron (AstroNeuro). The predicted cell-type-specific signaling networks consist of 481, 404, and 1051 genes and involve 51, 44, and 35 ligand-receptor interactions (crosstalk edges), respectively. Compared to PCSFs identified using 1000 randomized input networks (a permutation test), all predicted signaling networks have significantly smaller objective function scores and larger fractions of crosstalk edges (empirical p-values < 0.001).

Several predicted ligand-receptor pairs are known to mediate signal transduction between the three cell types.
Next, spatial data is brought in; at this point the analysis takes the distance between cells into account.

Our rationale is that cells that are close together are more likely to signal to each other (the same holds for 10X spatial transcriptomics). Therefore, signaling pathway genes are expected to have higher spatial expression correlation in these cells than in cells that are further apart.

First, a comparison between methods.
We first asked what fractions of the predicted ligand-receptor pairs are shared among the six methods. We reason that a more accurate method will have on average a larger fraction of overlapped predictions with all other methods (by this criterion, the authors' tool performs best).

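The spatial data then show that neighboring cell types are more likely to communicate while distant cells communicate less; the other methods do not show this pattern as clearly.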

However, pathways predicted by NicheNet and SoptSC also show significantly larger PCCs compared to random gene pairs among intermediate and distant cell pairs, suggesting that those predictions are false positive predictions.

Taken together, these results demonstrate that CytoTalk has significant improvement over published methods.

Result 4: Performance evaluation using scRNA-Seq data without receptor gene expression (receptor gene knocked out)

Under this condition the authors found new signaling pathways and, as expected, report that their tool has the highest prediction accuracy.

Discussion

We introduce a computational method, CytoTalk, for the construction of cell-type-specific signal transduction pathways using scRNA-Seq data. The inputs to CytoTalk are scRNA-Seq data and known ligand-receptor interactions. Unlike previous methods using known pathway annotations, CytoTalk constructs full pathways de novo.
In short, it performs well.
In summary, CytoTalk provides a much-needed means for de novo construction of complete cell-type-specific signaling pathways. Comparative analysis of signaling pathways will lead to a better understanding of cell-cell communication in healthy and diseased tissues.

Method

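Method 1: Construction of intracellular functional gene interaction network

This is a gene co-expression network that captures pairwise relationships between genes. The algorithm is less familiar; readers may want to look it up.

[Formula image]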
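To make the idea of weighting a gene pair by mutual information concrete, here is a minimal sketch that discretizes two expression vectors into equal-width bins and computes the plug-in mutual information estimate. The binning scheme, the number of bins, and the function names are assumptions chosen for illustration; CytoTalk's own estimator may well be different.

```python
import math
from collections import Counter

def discretize(values, n_bins=3):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=3):
    """Plug-in mutual information (in nats) between two genes across cells."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# Toy vectors: a gene compared with itself, and two unrelated genes.
gene_x = [0, 1, 2, 3, 4, 5]
mi_self = mutual_information(gene_x, gene_x)
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2)
```

Computing this for every gene pair is quadratic in the number of genes, which is consistent with the long runtime reported for the intracellular network step later in this post.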

Method 2: Crosstalk score of a ligand-receptor pair between two cell types

We define a crosstalk score between gene i in cell type A and gene j in cell type B as below. Genes i and j encode a ligand and a receptor or vice versa.

[Formula images: crosstalk score and its component terms]

Method 3: Construction of an integrated network between two cell types

We construct an integrated network consisting of two intracellular networks connected through known ligand-receptor interactions. We collected 1,941 manually annotated ligand-receptor interactions. If the ligand gene and the receptor gene are present in the two intracellular networks, we connect them and denote the edge as a crosstalk edge.
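The connection rule above can be sketched in a few lines: a known ligand-receptor pair becomes a crosstalk edge when one gene is present in the intracellular network of cell type A and the other in that of cell type B. The function name, the ("gene", "A"/"B") node labels, and the toy gene sets are hypothetical and only serve the illustration.

```python
def crosstalk_edges(lr_pairs, genes_a, genes_b):
    """Return crosstalk edges between cell types A and B.

    lr_pairs: iterable of (ligand, receptor) gene symbols.
    genes_a, genes_b: genes present in the two intracellular networks.
    Nodes are labelled (gene, cell_type) so the same gene can appear in both types.
    """
    genes_a, genes_b = set(genes_a), set(genes_b)
    edges = set()
    for ligand, receptor in lr_pairs:
        # Ligand from A signalling to a receptor on B.
        if ligand in genes_a and receptor in genes_b:
            edges.add(((ligand, "A"), (receptor, "B")))
        # Ligand from B signalling to a receptor on A.
        if ligand in genes_b and receptor in genes_a:
            edges.add(((ligand, "B"), (receptor, "A")))
    return sorted(edges)

# Toy input: three ligand-receptor pairs and two small gene sets.
pairs = [("Fgf7", "Fgfr2"), ("Wnt5a", "Fzd4"), ("Tgfb1", "Tgfbr1")]
edges = crosstalk_edges(pairs, genes_a={"Fgf7", "Fzd4"}, genes_b={"Fgfr2", "Wnt5a"})
```

The third pair is dropped because neither gene is present in the networks, so only two crosstalk edges remain.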

Method 4 (key step): De novo identification of signaling network between two cell types

We formulate the identification of a signaling network between two cell types as a prize-collecting Steiner forest (PCSF) problem. Because the forest is a disjoint set of trees, PCSF problem is a generalization of the classical prize-collecting Steiner tree (PCST) problem. The individual signaling pathways are represented as trees, the collection of which (forest) represents the entire signaling network between two cell types.
We define edge costs and node prizes in the integrated network as follows. The z-score normalized edge weights of the integrated network are first scaled to the range of [0, 1]. Edge cost is then defined as 1 − (scaled edge weight). Node prize is defined based on both PEM value of a gene and its closeness to the ligand/receptor genes in the network in order to identify signaling networks centered around the crosstalk edges. To capture the closeness, we use a network propagation procedure to calculate a relevance coefficient for each gene in an intracellular network.

[Equation image: network propagation update]

where $r^{t}$ is the relevance coefficient vector for all genes in the intracellular network at iteration t. $r^{0}$ is the initial value of the relevance coefficient vector such that $r^{0}(i) = 1$ if gene i is a ligand or receptor. Otherwise, $r^{0}(i) = 0$. $W'$ is a normalized edge weight matrix for an intracellular network, which is defined as $W' = D^{-1/2} W D^{-1/2}$. Here, W is set to the original mutual information matrix and D is defined as a diagonal matrix such that $D(i, i)$ is the sum of row i of the matrix W. This network propagation procedure is equivalent to a random walk with restart on the network. $\alpha$ is a tuning parameter that controls the balance between prior information (known ligands or receptors) and network smoothing. Node prize of a gene is defined as the product of its PEM value and the relevance coefficient to capture both the cell-type-specificity and the closeness of this gene to the ligand or receptor gene in the network. To avoid extremely large node prizes for ligand or receptor genes, we used $\alpha = 0.9$ in this study.
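The propagation step can be sketched as follows. The update rule is assumed to be the standard random-walk-with-restart form, r(t+1) = α·W'·r(t) + (1 − α)·r(0), which matches the definitions above (a larger α means more network smoothing and less weight on the ligand/receptor seeds) but is not copied from the paper. The function names, the convergence tolerance, and the toy network are likewise illustrative assumptions.

```python
import math

def propagate(W, seeds, alpha=0.9, tol=1e-9, max_iter=1000):
    """Assumed update: r <- alpha * W' r + (1 - alpha) * r0, W' = D^-1/2 W D^-1/2.

    W: symmetric weight matrix (list of lists); seeds: indices of ligand/receptor genes.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetric normalisation; isolated nodes (degree 0) get zero weights.
    Wn = [[W[i][j] / math.sqrt(degree[i] * degree[j])
           if degree[i] > 0 and degree[j] > 0 else 0.0
           for j in range(n)] for i in range(n)]
    r0 = [1.0 if i in seeds else 0.0 for i in range(n)]
    r = r0[:]
    for _ in range(max_iter):
        new_r = [alpha * sum(Wn[i][j] * r[j] for j in range(n)) + (1 - alpha) * r0[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r

def node_prizes(pem, relevance):
    """Node prize = PEM value times relevance coefficient."""
    return [p * r for p, r in zip(pem, relevance)]

# Toy chain network 0-1-2-3 with gene 0 as the only ligand/receptor seed.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
relevance = propagate(W, seeds={0})
prizes = node_prizes([1.2, 0.4, 0.9, 0.1], relevance)
```

In this toy chain the relevance coefficient decreases with distance from the seed gene, which is the behaviour the node prize is meant to capture.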
The PCSF algorithm identifies an optimal forest in a network that maximizes the total amount of node prizes and minimizes the total amount of edge costs in the forest. While the PCSF problem is NP-hard and often needs a high computational cost, we employ a previously established PCSF formulation and use a highly efficient prize-collecting Steiner tree (PCST) algorithm to identify the PCSF. The objective function of the PCSF problem is defined as below.
[Equation image: PCSF objective function]

where F represents a forest (i.e. multiple disconnected trees) in the integrated network. $C(F)$ denotes the sum of edge costs in the forest F and $P(F^{c})$ denotes the sum of node prizes of the remaining subnetwork excluding the forest F from the network. We modify the integrated network by introducing an artificial node and a number of artificial edges to the original network. The artificial edges connect the artificial node to all genes in the original network. The costs of all artificial edges are the same and are defined as $\omega$, which influences the number of trees, k, in the resulting PCSF. $\beta$ is a parameter for balancing the edge costs and node prizes, which influences the size of the resulting PCSF. By tuning parameters $\beta$ and $\omega$, multiple PCSTs can be identified with the artificial node as the root node. For each identified PCST, a PCSF can be obtained by removing the artificial node and artificial edges from the PCST.
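The graph manipulation described above (edge costs from scaled weights, an artificial root connected to every gene, and recovery of the forest by deleting the root) can be sketched as below. The PCST solver itself is not shown. The function names, the root label `__root__`, and the toy weights are hypothetical. Because min-max scaling is unchanged by a positive affine transform, the z-score step is skipped in this sketch for a single set of weights.

```python
def edge_costs(weights):
    """Scale edge weights to [0, 1] and convert them to costs (1 - scaled weight)."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    return {edge: 1.0 - (w - lo) / span for edge, w in weights.items()}

def add_artificial_root(costs, omega, root="__root__"):
    """Connect an artificial root node to every gene with the same cost omega."""
    nodes = {node for edge in costs for node in edge}
    augmented = dict(costs)
    for node in sorted(nodes):
        augmented[(root, node)] = omega
    return augmented

def forest_from_tree(tree_edges, root="__root__"):
    """Drop the artificial root and its edges; what remains is the forest."""
    return [edge for edge in tree_edges if root not in edge]

# Toy network with three weighted edges.
weights = {("a", "b"): 2.0, ("b", "c"): 4.0, ("c", "d"): 3.0}
costs = edge_costs(weights)
augmented = add_artificial_root(costs, omega=0.7)
forest = forest_from_tree([("__root__", "b"), ("a", "b"), ("b", "c")])
```

Each artificial edge kept by the solver corresponds to one tree in the final forest, which is why the artificial edge cost controls the number of trees.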
We identify the signaling network between two cell types by searching for a robust PCSF across the full parameter space. For each identified PCSF, we compute the occurrence of each edge in all identified PCSFs to construct a background distribution of edge occurrence frequency. Next, we calculate a p-value for each PCSF by comparing the edge occurrence frequency distribution of this PCSF to the distribution of all other identified PCSFs using a one-sided Kolmogorov-Smirnov test. The PCSF with the minimum p-value is considered as the most robust signaling network predicted by CytoTalk.
To further evaluate the statistical significance of the identified PCSF, we construct null distributions for the objective function and for the fraction of crosstalk edges in a PCSF using 1000 null PCSFs identified from randomized integrated networks. To generate the randomized networks, we separately shuffle the edges of the two intracellular networks while preserving the node degree distribution, node prizes and crosstalk edges as the original integrated network.
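A minimal sketch of the two ingredients of this significance test: a degree-preserving shuffle of one intracellular network, and an empirical p-value against the resulting null scores. The shuffle uses double-edge swaps, a common way to preserve node degrees; whether CytoTalk uses exactly this procedure is an assumption, as is the "+1" correction in the p-value. All function names are hypothetical.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomise an undirected simple graph by double-edge swaps.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), so every node keeps its degree.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try the other orientation of the second edge
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 == new2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

def empirical_p_value(observed, null_scores, smaller_is_better=True):
    """Fraction of null scores at least as extreme as the observed one (+1 corrected)."""
    if smaller_is_better:
        extreme = sum(s <= observed for s in null_scores)
    else:
        extreme = sum(s >= observed for s in null_scores)
    return (extreme + 1) / (len(null_scores) + 1)

# Toy graph: a 6-node ring plus one chord.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
shuffled = degree_preserving_shuffle(original, n_swaps=10, seed=1)
```

With 1000 null networks, an observed objective score smaller than every null score gives p = 1/1001, in line with the "empirical p-values < 0.001" reported in the Results.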

The algorithm takes some effort to follow.

Now let's look at the example code.

The scripts are already packaged and can be run directly.

Input files

  • A comma-delimited “.csv” file containing scRNA-Seq data for each cell type under study. Each file contains the ln-transformed normalized scRNA-Seq data for a cell type with rows as genes (GENE SYMBOL) and columns as cells. The files should be named as: scRNAseq_Fibroblasts.csv, scRNAseq_Macrophages.csv, scRNAseq_EndothelialCells.csv, scRNAseq_CellTypeName.csv

  • A “TwoCellTypes.txt” file indicating the two cell types between which the signaling network is predicted. Please make sure that the cell type names are consistent with the scRNA-Seq data file names above.

  • A “LigandReceptor_Human.txt” or "LigandReceptor_Mouse.txt" file listing all known ligand-receptor pairs. The first column (ligand) and the second column (receptor) are separated by a tab (\t). Currently, 1942 and 1855 ligand-receptor pairs are provided for human and mouse, respectively.

  • A “Species.txt” file indicating the species from which the scRNA-Seq data are generated. Currently, “Human” and “Mouse” are supported.

  • A “Cutoff_GeneFilter.txt” file indicating the cutoff for removing lowly-expressed genes in the processing of scRNA-Seq data. The default cutoff value is 0.1, which means that genes expressed in less than 10% of all cells of a given type are removed.

  • A “BetaUpperLimit.txt” file indicating the upper limit of the test values of the algorithm parameter β, which is inversely proportional to the total number of genes in a given cell-type pair after removing lowly-expressed genes in the processing of scRNA-Seq data. Based on preliminary tests, the upper limit of β value is suggested to be 100 (default) if the total number of genes in a given cell-type pair is above 10,000. However, if the total number of genes is below 5000, it is necessary to increase the upper limit of β value to 500.
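The gene filter and the β upper limit guidance from the list above can be expressed as a small sketch. The in-memory data layout (a dict from gene symbol to per-cell values) and both function names are assumptions made for illustration; CytoTalk itself reads these settings from the text files described above.

```python
def filter_genes(expr, cutoff=0.1):
    """Keep genes expressed (> 0) in at least `cutoff` of the cells of one cell type.

    expr: {gene symbol: [expression value per cell]}.
    """
    kept = {}
    for gene, values in expr.items():
        fraction = sum(v > 0 for v in values) / len(values)
        if fraction >= cutoff:
            kept[gene] = values
    return kept

def suggested_beta_upper_limit(n_genes):
    """Upper limit of beta suggested in the text for a given total gene count."""
    if n_genes > 10000:
        return 100  # default
    if n_genes < 5000:
        return 500
    return None  # the text gives no explicit suggestion for this range

# Toy data: 20 cells, one gene expressed in 2 cells (10%), one in 1 cell (5%).
toy_expr = {"GeneA": [1.5, 0.7] + [0.0] * 18,
            "GeneB": [2.0] + [0.0] * 19}
kept = filter_genes(toy_expr, cutoff=0.1)
```

Here GeneA sits exactly at the 10% threshold and is kept, while GeneB falls below it and is removed.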

Please download "CytoTalk_package_v2.0.zip". All example input files are in the /Input/ folder and should be customized and copied into the /CytoTalk/ folder before running. The /CytoTalk/ folder can only be used ONCE for a given cell-type pair. Please use a new /CytoTalk/ folder for analysis of other cell-type pairs.

Run CytoTalk

Copy the “/CytoTalk/” folder, with the input files added, to your working directory and execute the following script:

bash InferSignalingNetwork.sh

[Alternative way] The whole computation above may take 5.5 hours (2.3 GHz 8-Core Intel Core i9, 14 logical cores for parallel computation), of which 4 hours are used for computing pair-wise mutual information between genes in the construction of intracellular networks for the given two cell types. Considering that users may have alternative ways for constructing cell-type-specific intracellular networks, we divide the whole computation into two steps below.

bash InferIntracellularNetwork_part1.sh  # around 4 hours
bash InferIntercellularNetwork_part2.sh  # around 1.5 hours

The outputs of the script "part1.sh" are two comma-delimited files "IntracellularNetwork_TypeA.txt" and "IntracellularNetwork_TypeB.txt", containing the adjacency matrices of two intracellular networks for the given two cell types, respectively. These two files are the inputs of the script "part2.sh", which can generate the final predicted signaling network.

CytoTalk output

The output folder, “/CytoTalk/IllustratePCSF/”, contains a network topology file and six attribute files that are ready for import into Cytoscape for visualization and further analysis of the predicted signaling network between the given two cell types.

Network topology: PCSF_edgeSym.sif
Edge attribute: PCSF_edgeCellType.txt, PCSF_edgeCost.txt
Node attribute: PCSF_geneCellType.txt, PCSF_geneExp.txt, PCSF_genePrize.txt, PCSF_geneRealName.txt
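For a quick look at the predicted network outside Cytoscape, the topology file can be read with a few lines of code. The sketch assumes the general SIF layout used by Cytoscape (source node, interaction type, then one or more target nodes per line) and node names without spaces. The node names in the toy example are made up, and the actual naming used in PCSF_edgeSym.sif has not been checked here.

```python
def parse_sif(lines):
    """Parse SIF lines ("source interaction target [target ...]") into edge triples."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or a node without edges
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges

# Toy input with hypothetical node names; for a real run one would pass an open
# file handle instead, e.g. open("PCSF_edgeSym.sif").
sif_edges = parse_sif(["Fgf7_A pp Fgfr2_B", "", "Orphan", "Fgfr2_B pp Grb2_B Frs2_B"])
```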

Give it a try. Life is good, and even better with you.
