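Hello everyone. This time I want to look at whether the results of different cell-cell communication tools agree with each other. There are a lot of tools for cell-cell communication analysis, and I have covered many of them here, but covering them is not the goal; actually using them is. Which tool should we use, which one is better, and what are the strengths and weaknesses of each? Let's take a look today.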
Comparison of Resources and Methods to infer Cell-Cell Communication from Single-cell RNA Data
Abstract
1. There are many tools for cell-cell communication analysis. Each of them consists of a resource of intercellular interactions prior knowledge and a method to predict potential cell-cell communication events (every tool has its own ligand-receptor database and its own algorithm). Yet the impact of the choice of resource and method on the resulting predictions is largely unknown.
2. Comparing the tools: We found few unique interactions and a varying degree of overlap among the resources (differences between the ligand-receptor databases), and observed uneven coverage in terms of pathways and biological categories.
3. When testing on the same dataset: We found major differences among the highest ranked intercellular interactions inferred by each method even when using the same resources (the methods also differ a lot).
4. The varying predictions lead to fundamentally different biological interpretations, highlighting the need to benchmark resources and methods (different tools give different results, so which one should we use?).
Main conclusions
1. Different methods and resources provided notably different results (as expected; anyone who has run enough projects noticed this long ago).
2. The observed disagreement among the methods could have a considerable impact on the interpretation of results (different results naturally lead to different biological interpretations, so which one do we use?).
Introduction
1. What cell-cell communication (CCC) means: CCC commonly refers to interactions between secreted ligands and plasma membrane receptors. This picture can be broadened to include secreted enzymes, extracellular matrix proteins, transporters, and interactions that require the physical contact between cells, such as cell-cell adhesion proteins and gap junctions. CCC events are essential for homeostasis, development, and disease, and their estimation is becoming a routine approach in scRNA-seq data analysis (studying cell-cell communication really is important).
2.1. How the tools predict cell-cell communication: These CCC tools typically use gene expression information obtained by scRNA-Seq. In general, single cells are clustered by their gene expression profile and cell type identities are assigned to the clusters based on known gene markers (first, the single-cell data are clustered and the clusters are annotated).
2.2. CCC tools can predict intercellular crosstalk between any pair of clusters, one cluster being the source and the other the target of a CCC event.
3. Every tool ships its own ligand-receptor database: The information about which transmitter binds to which receiver is extracted from diverse sources of prior knowledge (the ligand-receptor databases are all accumulated prior knowledge).
4. Roughly, CCC tools then estimate the likelihood of crosstalk based on the expression level of the transmitter and the receiver in the source and target clusters, respectively (this is basically what they all do).
5. Each tool has two main components: a resource of prior knowledge on CCC (interactions), and a method to estimate CCC from the known interactions and the dataset at hand.
6. Although every tool has its own ligand-receptor resource and its own method, in principle any resource could be combined with any method.
7. Differences between the methods (6 tools): In turn, these different approaches result in diverse scoring systems that are difficult to compare and evaluate (there are many methods, so which one to choose? A good standard is missing).
For CellChat, see my posts "10X single-cell (10X spatial transcriptomics) communication analysis with CellChat" and "10X single-cell (10X spatial transcriptomics) communication analysis: multi-sample differential communication analysis with CellChat". For Squidpy, see "Distance analysis of cell types in spatial transcriptomics, part 2: code implementation" and "10X spatial transcriptomics communication analysis, chapter 3". For Connectome, see "10X single-cell cell communication series: Connectome". For iTALK, see "Cell communication: how to use iTALK". For NATMI, see "NATMI, a cell communication analysis tool for single-cell data".
8. Differences between the ligand-receptor resources: The available prior knowledge resources are typically distinct but often show partial overlap. Some of these resources also provide additional details for the interactions such as information about protein complexes, subcellular localisation, and classification into signalling pathways and categories. CCC resources are often manually curated and/or built from other resources, with varying proportions of expert curation and literature support. Some databases gather and harmonize the information contained in the individual resources (the databases come from all kinds of sources, and the effect of the database is one of the main points examined today).
Table note: We defined unique and shared interactions, receivers and transmitters between the CCC resources if they could be found in only one or at least two of the resources, respectively.
9.1. Comparing and testing the tools: First, we explored the degree of overlap among resources and whether certain resources are biased toward specific biological terms, such as pathways and functional cancer states.
9.2. We analysed how different combinations of resources and methods influence CCC inference, by decoupling the methods from their corresponding resources (the databases and the methods are taken apart and recombined; the strategy is laid out in the paper's workflow figure).
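To make the decoupling idea concrete, here is a minimal Python sketch. It is not the paper's pipeline. The six tool names are the ones mentioned in this post, the resource list is only a small illustrative subset, and `run_method` is a hypothetical placeholder standing in for a real call to each tool.

```python
from itertools import product

# The six methods named in this post.
methods = ["CellChat", "Connectome", "iTALK", "NATMI", "SingleCellSignalR", "Squidpy"]
# Illustrative subset of resources only; the paper compares many more.
resources = ["CellChatDB", "CellPhoneDB", "OmniPath"]


def run_method(method, resource):
    """Hypothetical placeholder: a real pipeline would run the tool here."""
    return f"{method} + {resource}"


# Every method is paired with every resource, not only with its own default.
combinations = [run_method(m, r) for m, r in product(methods, resources)]
print(len(combinations))  # 6 methods x 3 resources = 18
```

The point of the full cross product is that a difference between two runs can then be attributed either to the method (same resource, different method) or to the resource (same method, different resource).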
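Now let's look at the results. Honestly, I was surprised: I knew every tool gives different results, but I did not expect the differences to be this large.
Result 1. Resource Uniqueness and Overlap (a comparison of the databases; the conclusion is that no two are the same, and how similar they are to each other also varies a lot).
First, where each tool's database comes from: Many of these resources share the same original data sources, including general biological databases such as KEGG, Reactome, and STRING; of course, there are many other source databases as well.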
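Now the differences in ligand-receptor pairs: As a consequence of their common origins, we noted limited uniqueness across the resources, with mean percentages of 4.6 unique receivers, 5.3 unique transmitters, and 16.8% unique interactions, for all resources (so the unique share is small on average, but each resource still has some ligand-receptor pairs of its own, and that share varies a lot from resource to resource).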
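The "unique" percentages above follow the definition in the table note: an interaction is unique if it appears in only one resource. A minimal sketch of that calculation, using three invented toy resources (the names and pairs are made up for illustration and are not taken from any real database):

```python
from collections import Counter

# Toy resources: each is a set of (transmitter, receiver) pairs. Invented data.
resources = {
    "A": {("L1", "R1"), ("L2", "R2"), ("L3", "R3")},
    "B": {("L1", "R1"), ("L2", "R2")},
    "C": {("L1", "R1"), ("L4", "R4")},
}


def unique_fraction(resources):
    """Fraction of each resource's interactions found in no other resource."""
    # Count in how many resources each interaction occurs.
    counts = Counter(pair for pairs in resources.values() for pair in pairs)
    return {
        name: sum(1 for pair in pairs if counts[pair] == 1) / len(pairs)
        for name, pairs in resources.items()
    }


fractions = unique_fraction(resources)
print(fractions["A"], fractions["B"], fractions["C"])
```

In this toy case resource B contributes nothing unique, while half of resource C is found nowhere else, which is the kind of spread the paper reports across real resources.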
Despite the sparse uniqueness among the resources, the pairwise overlap between them varied; some resources are very similar to each other.
About the Jaccard index: see the Baidu Baike entry on the Jaccard coefficient. Put simply, it is the size of the intersection of two sets divided by the size of their union.
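For readers who prefer code to definitions, here is a minimal sketch of the Jaccard index in Python. Returning 0.0 for two empty sets is a convention chosen for this example, not something the paper specifies.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Two items shared out of four distinct items in total.
print(jaccard({"x", "y", "z"}, {"y", "z", "w"}))  # 2 / 4 = 0.5
```

A value of 1 means the two sets are identical and 0 means they share nothing.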
Figure caption: Upset plots representing the shared Interactions, Receivers, and Transmitters between all resources (A-C) and all resources except OmniPath (D-F).
Each ligand-receptor database contained on average more than 65% of the interactions present in the other resources.
Figure caption: A) Interactions, B) Receivers and C) Transmitters present in each resource when taken from the rest of the resources. Note these plots are asymmetric and represent the % of interactions from the resources on the X axis found in each resource on the Y axis.
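The asymmetry mentioned in the caption is easy to miss, so here is a small sketch of how such a percentage could be computed. The two toy resources are invented, and `percent_contained` is a name made up for this example.

```python
def percent_contained(source, target):
    """Percentage of the items in `source` that are also present in `target`."""
    if not source:
        return 0.0
    return 100.0 * len(source & target) / len(source)


# Invented toy resources: `small` is a strict subset of `large`.
resources = {"small": {1, 2}, "large": {1, 2, 3, 4}}

# One value per ordered pair, so (x, y) and (y, x) can differ.
matrix = {
    (x, y): percent_contained(resources[x], resources[y])
    for x in resources
    for y in resources
    if x != y
}
print(matrix[("small", "large")])  # 100.0: everything in small is in large
print(matrix[("large", "small")])  # 50.0: only half of large is in small
```

Unlike the Jaccard index, this measure depends on the direction of the comparison, which is why a large aggregated resource can contain most of a small one without the reverse being true.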
Summary of the resource differences: In summary, our results indicate that many of the transmitters, receivers, and interactions are not unique to any single resource, due to their common origins. However, different resources include varying proportions of the collective CCC prior knowledge (so they all differ, just to different degrees).
Result 2. Resource Prior Knowledge Bias
First, Subcellular Localisation: On average 90% of transmitters and 79% of receivers were annotated as secreted and transmembrane proteins, respectively (so secreted-type signalling looks dominant). We further used the localisations of transmitters and receivers to categorize the interactions as secreted or direct-contact signaling.
Figure caption: Numbers and Percentages of Subcellular locations annotations of Receivers (A-B) and Transmitters (C-D) for each CCC resource. S, P and T stand for Secreted, Peripheral plasma membrane, and Transmembrane plasma membrane proteins, respectively.
We observed that all resources were predominantly (74% on average) composed of interactions associated with secreted signalling, while direct-contact signalling constituted a substantially smaller (16% on average) proportion of interactions (secreted signalling dominates).
Figure caption: Interactions categorized as neither secreted nor direct-contact were labeled as 'Other' and made up the remainder of the interactions.
The share of secreted versus direct-contact signalling differs from database to database: CellChatDB showed an overrepresentation of interactions matched to the category Other.
Conclusion on subcellular localisation: Our results suggest that localisations of transmitters and receivers were largely uniformly distributed and that secreted signalling was predominant across all resources. Yet, differences were noted between the relative abundance of secreted and direct-contact signalling interactions (every database has a different ratio of secreted to direct-contact interactions).
Functional Term Enrichment (differences in pathway coverage): which pathways are covered, and how many, differs between databases.
Interactions associated with innate immune pathways and T-cell receptor categories were under-represented in Guide to Pharmacology, Baccin2019, EMBRACE, Kirouac2010, ICELLNET, CellPhoneDB, and HPMR (immune-related pathways differ a lot, and some databases do not have them at all, although this also depends on which annotation database is used).
Figure caption: Number of matches to A) Interactions, B) Receivers and C) Transmitters, Enrichment Scores for their Receivers and Transmitters (D-E), and the Percentages of Interactions, Receivers and Transmitters (F-H) matched to the NetPath database per resource.
These observations for the WNT pathway were further supported by the relative abundance of HGNC. (The choice of annotation database also introduces large differences.)
Figure caption: Number of matches to A) Merged Sets of Receivers and Transmitters, B) Receivers and C) Transmitters, their corresponding Enrichment Scores (D-F), and Percentages (G-I) per resource matched to the HGNC database.
Functional cancer cell states from CancerSEA were also unevenly represented in sets of receivers and transmitters across the resources (the differences are so large that I am at a loss).
Figure caption: Number of matches to A) Merged Sets of Receivers and Transmitters, B) Receivers and C) Transmitters and their corresponding (D-F) Enrichment Scores, and Percentages (G-I) per resource matched to the CancerSEA database.
Next, an annotated dataset is used to check how consistent the tools' results are. Here the focus is on the interactions between tumour cells subclassified by their resemblance of CRC consensus molecular subtypes (CMS) and immune cells from tumour samples, reasoning that this subset of cell types represents a complex example where CCC events are known to have an important role.
First result: Interaction overlap
We then used each method-resource combination to infer CCC interactions, assuming that different methods should generally agree on the most relevant CCC events for the same resource and expression data (quite an assumption, isn't it?). To measure the agreement between method-resource combinations, we looked at the overlap between the 500 highest ranked interactions as predicted by each method. Whenever available, author recommendations were used to filter out the false-positive interactions.
Conclusion 1. Our analysis showed considerable differences in the interactions predicted by each of the methods regardless of the resource used, as the mean Jaccard index per resource ranged from 0.01 to 0.06 (mean = 0.024) when using different methods (really low). These large discrepancies in the results were further supported by the pairwise comparisons between methods using the same resource, with mean Jaccard indices ranging from 0.063 (CellChat-SingleCellSignalR) to 0.110 (Connectome-NATMI) (also very low). The overlap among the top predicted interactions was slightly higher when using the same method but with different resources, as Jaccard indices ranged from 0.113 to 0.203 per method (mean = 0.167) (with the same method and different databases the agreement improves a little, but in absolute terms it is still low; it makes me wonder whether my earlier analyses were right at all).
Figure caption: Jaccard indices for the 500 highest ranked interactions obtained from each method-resource combination.
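The comparison behind these Jaccard values can be sketched in a few lines. This is a rough illustration, not the paper's code: the scores are invented, the cut-off is 2 instead of 500, and it assumes a higher score means a higher rank, which is not true for every tool (some report p-values, where lower is better).

```python
from itertools import combinations


def top_n(scores, n):
    """Return the n highest-scoring interactions as a set (higher = better)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def mean_pairwise_jaccard(results, n):
    """Average Jaccard index of the top-n sets over all pairs of methods."""
    tops = {name: top_n(scores, n) for name, scores in results.items()}
    values = [
        len(tops[a] & tops[b]) / len(tops[a] | tops[b])
        for a, b in combinations(tops, 2)
    ]
    return sum(values) / len(values)


# Invented scores from three hypothetical methods run on the same resource.
results = {
    "method1": {"i1": 0.9, "i2": 0.8, "i3": 0.1},
    "method2": {"i1": 0.7, "i3": 0.6, "i2": 0.2},
    "method3": {"i4": 0.9, "i5": 0.8, "i1": 0.1},
}
print(mean_pairwise_jaccard(results, 2))  # (1/3 + 0 + 0) / 3, about 0.111
```

Even in this tiny example, methods that all score the interaction "i1" can still disagree on whether it belongs in the top list, which is exactly the kind of disagreement the low Jaccard values reflect.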
Conclusion 2. Consequently, the highest ranked interactions for each method-resource combination largely showed stronger clustering by method than resource, suggesting that the overlap between these combinations occurs predominantly when using the same method regardless of the resource (the method has the larger influence on the results).
Figure caption: Overlap in the 500 highest ranked CCC interactions between different combinations of methods and resources. Method-resource combinations were clustered according to binary (Jaccard index) distances. SCA refers to the SingleCellSignalR method.
On ligand-receptor complexes: This analysis showed that the proportion of complexes among the highest ranked hits was 2-23% for CellChat and 10-38% for Squidpy, largely reflecting the relative complex content in each resource (again a wide spread).
Summary 1 of conclusion 2: Our results suggest that the overlap between methods when using the same resource was low.
Figure caption: Upset plot showing the overlap between the 500 highest ranked interactions using the same method with all resources.
Summary 2 of conclusion 2: The overlap when using the same method with different resources, albeit higher than that between different methods, was also modest (same method, different databases: the differences are still fairly large). Hence, our results indicate that both the method and the resource had a considerable impact on the predicted interactions.
Figure caption: Upset plot showing overlap of most relevant interactions for each method with the same resource.
Conclusion 3. Next, we asked whether the discrepancies observed between the methods stem from the differences in the cell types inferred as most active in terms of CCC interactions. To this end, we used the 500 highest ranked interactions to examine the cell type activities, defined as the proportion of interactions per cell type, separately as a source and a target of CCC events. The conclusion is much the same as above: the choice of method matters a lot, as each method largely clustered by itself, regardless of the resource used, including the reshuffled resource. As a consequence, the disagreement between the methods in which cell types are the most active is expected to have a major impact on the biological interpretation of CCC communication predictions.
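"Cell type activity" as defined above is just a proportion, so it can be sketched very simply. The interaction list below is invented, and the tuple layout (source cell type, target cell type, transmitter, receiver) is an assumption made for this example rather than the output format of any particular tool.

```python
from collections import Counter

# Invented top-ranked interactions: (source, target, transmitter, receiver).
top_interactions = [
    ("Tumour", "T cell", "L1", "R1"),
    ("Tumour", "Macrophage", "L2", "R2"),
    ("T cell", "Tumour", "L3", "R3"),
    ("Tumour", "T cell", "L4", "R4"),
]


def activity(interactions, position):
    """Proportion of interactions per cell type at the given tuple position."""
    counts = Counter(row[position] for row in interactions)
    total = len(interactions)
    return {cell: count / total for cell, count in counts.items()}


source_activity = activity(top_interactions, 0)  # cell type as sender
target_activity = activity(top_interactions, 1)  # cell type as receiver
print(source_activity)
print(target_activity)
```

Here the tumour cells send 75% of the top interactions but receive only 25% of them. If two methods produce different top lists, these proportions shift, and so does the story about which cell type drives the communication.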
Figure caption: PCA of normalized average interaction rank frequencies per cell pair.
Current limitations of cell-cell communication inference
1. CCC events are mainly predicted based on the average gene expression at the cluster or cell type/state level. Such an assumption inherently suggests that gene expression is informative of the activity of transmitters and receivers. However, gene expression provided by scRNA-Seq is typically limited to protein coding genes and the cells within the dataset, and hence does not capture secreted signalling events driven by non-protein molecules or long-distance endocrine signalling events (this limitation cannot be solved with single-cell data alone).
2. CCC inference from scRNA-Seq data assumes that the product of the gene expression of a transmitter and a receiver is a good proxy for their joint activity, and thus does not consider any of the processes preceding transmitter-receiver interactions, including protein translation and processing, secretion, and diffusion (even harder to solve).
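To see how little information goes into such a score, here is a minimal sketch of the "product of average expression" idea described in the two points above. It is a simplified illustration and not the scoring function of any specific tool; the gene names `LIG` and `REC`, the cluster names, and the expression values are all invented.

```python
from statistics import mean

# expression[cluster][gene] = per-cell expression values (invented numbers).
expression = {
    "source": {"LIG": [2.0, 4.0], "REC": [0.0, 0.0]},
    "target": {"LIG": [0.0, 0.0], "REC": [1.0, 3.0]},
}


def product_score(expression, source, target, transmitter, receiver):
    """Mean transmitter expression in source times mean receiver expression in target."""
    return mean(expression[source][transmitter]) * mean(expression[target][receiver])


score = product_score(expression, "source", "target", "LIG", "REC")
reverse = product_score(expression, "target", "source", "LIG", "REC")
print(score)    # 3.0 * 2.0 = 6.0
print(reverse)  # 0.0 * 0.0 = 0.0
```

Nothing in this calculation knows whether the transmitter is actually translated, secreted, or able to reach the target cell, which is the limitation the paper points out.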
Conclusion
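The methods are not yet mature, and we still have work to do.
So which method should we actually use for cell-cell communication analysis? I would like to hear what readers think.
Life is good, and it's even better with you~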