ICELLNET for 10X single-cell communication analysis

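Today we share a new method for 10X single-cell cell–cell communication analysis called ICELLNET. The paper, "Dissection of intercellular communication using the transcriptome-based framework ICELLNET", was published in Nature Communications (NC) in February 2021 (impact factor about 12). We first go through the paper and then look at the reference code.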

abstract

This part first lays out the challenges facing cell–cell communication analysis:
1. global integration of cell-to-cell communication
2. biological interpretation
3. application to individual cell population transcriptomic profiles
These are widely recognized problems in the field; of course, solutions always come along.
It then presents the strengths of ICELLNET:
1. an original expert-curated database of ligand–receptor interactions accounting for multiple subunits expression (a ligand–receptor database that handles multi-subunit complexes, similar to CellPhoneDB);
2. quantification of communication scores (communication strength; most tools offer this);
3. the possibility to connect a cell population of interest with 31 reference human cell types (I didn't quite understand this point at first; let's read on);
4. three visualization modes to facilitate biological interpretation.
Finally, a bit of self-praise: "ICELLNET is a global, versatile, biologically validated, and easy-to-use framework to dissect cell communication from individual or multiple cell-based transcriptomic profiles" (sounds familiar).

introduction

Let's look at the key points of this part:
Most studies in the past decades have focused on a limited number of communication molecules in a given anatomical site or physiological process. The availability of large-scale transcriptomic datasets from several cell types, tissue locations, and cell activation states opened the possibility of reconstructing cell-to-cell interactions based on the expression of specific ligand–receptor pairs on sender and target cells, respectively (this is what we usually do today). Many of them exploit single-cell RNA-seq datasets to infer communication between groups of cells within the same dataset (this has since improved, and ligand–receptor databases have been expanded considerably). Despite leading to interesting and often innovative hypotheses, these methods do not integrate putative signals that may come from more distant cells (here the authors point out that communication depends on distance; of course it does, since secreted ligands have a half-life, but let's see how the authors handle it). Also, they cannot be applied to bulk transcriptomic data derived from a given cell population (certainly true). Such datasets are numerous in public databases, and can be a source of novel insights into how each cell type may send or receive communication signals.
Another important aspect when inferring cell-to-cell communication is the use of databases of ligand–receptor interactions (i.e. which ligand–receptor database to choose). Some are very broad with over 2000 ligand–receptor pairs (e.g. NATMI), but lack systematic manual or expert curation (in some databases many pairs have never been validated), which may impact the quality and biological relevance of the annotation. Others include lower numbers of ligand–receptor pairs and provide manually curated information from the literature, without necessarily providing systematic combinatorial rules for the association of protein subunits into multimeric ligands or receptors (i.e. handling of complexes; for this, CellPhoneDB is the obvious first choice).
The last point relates to the granularity that is structuring the biological information into families and subfamilies of functionally and structurally related molecules (indeed, many tools skip this: we only get ligand–receptor pairs, with no higher-level family grouping).
In this study we develop ICELLNET, a versatile computational framework to infer cell-to-cell communication from a wide range of bulk and single-cell transcriptomic datasets. Each family of communication molecules is expert curated and organized into biologically relevant sub-families (this is the tool's selling point, and it does look like progress). ICELLNET offers an array of visualization tools in order to facilitate biological interpretation and discoveries (a tool without visualization isn't playing fair).

Result

Results part 1: Expert-curated database of ligand–receptor interactions

The first part introduces the database.
The database integrates literature and public databases (think of it as an expansion of existing databases).
Features of the database:
1. robustness of the findings
2. consistency with international classifications and nomenclature
3. experimental validation of the functionality of the ligand–receptor interaction
4. We also used consensus reviews from leaders in the field (i.e. endorsement by authorities)
5. We did not include putative interactions based on protein–protein interaction predictions, as it is done in some other databases (predicted pairs are excluded)
These criteria are quite comprehensive.
This led to the integration of 380 ligand–receptor interactions into the ICELLNET database (a genuinely small database). Whenever relevant, we took into account the multiple subunits of the ligands and the receptors.



Interactions were classified into 6 major families of communication molecules, with a strong emphasis on inflammatory and immune processes: Growth factors, Cytokines, Chemokines, Immune Checkpoints, Notch signaling, and Antigen binding (the classification is quite nice).



Cytokine–receptor pairs were mapped in an exhaustive manner, by exploiting a series of reference articles and consensus classifications. They represent 50% of the total interactions included in the database (194 interactions), and were further classified into 7 sub-families according to structural protein motifs: type 1 cytokines, type 2 cytokines, IL-1 family, IL-17 family, TNF family, TGF-β family, and RTK cytokines (cytokines are classified in detail).
This database is integrating information on both multiple subunits of ligands and receptors, and a classification into molecular families/subfamilies (the database's advantages are clear, though it is a bit small).
Results part 2: Development of a computational pipeline to dissect intercellular communication (the new algorithm)
In the ICELLNET framework, we developed an automatized tool in R script to infer communication between multiple cell types by integrating:
(1) prior knowledge on ligand–receptor interactions
(2) computation of a communication score between pairs of cells based on their transcriptomic profiles (fairly standard)
(3) several visualization modes to guide results interpretation (visualization, also standard).
Quantification of intercellular communication was achieved by scoring the intensity of each ligand–receptor interaction between two cell types from their expression profiles.



Here we need to pay particular attention to how the ligand–receptor score is computed; we analyze it in the Methods section. Also note that your own samples are compared against reference data from the database.
From each transcriptomic profile, all genes or only differentially expressed genes could be used (note this point; differentially expressed genes are preferable), and no filtering threshold was applied to gene expression. The genes coding for ligands/receptors were selected from all 380 interactions to compute the score, but it is also possible to restrict the database to specific families of molecules, depending on the biological question (standard practice).
A unique feature and strength of ICELLNET is its ability to infer cell-to-cell communication even from an individual cell population-based transcriptome of interest (this is no longer unique; many tools can do it now). ICELLNET separately considers other cell types with known transcriptomic profiles (hereafter called “partner cells”) that can connect to the central cell. These can be cell types coming from the same dataset as the central cell, or from any other transcriptomic dataset (worth noting: ligand–receptor inference between your target cells and other transcriptomic data; the Human Primary Cell Atlas is mentioned here). This public dataset includes transcriptomic profiles of 31 human cell types including immune cells, stromal cells, neural cells, and tissue-specific cell types, all generated with the same Affymetrix technology (this answers the question raised in the abstract).

Results part 3: Establishment of a score to assess the communication between cells.
Since cell-to-cell communication is directional, we considered ligand expression from the central cell, and receptor expression from the partner cells in order to assess outward communication. Conversely, we then selected receptor expression from the central cell, and ligand expression from partner cells in order to assess inward communication. (Simple enough.) Now for the bombshell: For each gene, expression levels were scaled by maximum of gene expression in the dataset, in order to avoid a communication score predominantly driven by highly expressed genes. Indeed, bioactivity of communication molecules varies a lot. Some cytokines, such as IL-12 and IL-4, are very bioactive at low concentrations, and often expressed at very low levels both in transcript and protein. Conversely, many chemokines are produced at much higher levels, without necessarily having a higher bioactivity. Not scaling the data before inferring a communication score would systematically favor a few highly expressed molecules, and would not allow detecting the contribution of important molecules expressed at much lower levels. (This deserves some thought. I believe most people run cell communication analysis on normalized data and rarely on scaled data, but the authors' rationale is convincing. Note that "scaling" here means per-gene rescaling to a 0–10 range by the gene's maximum, not z-scoring as done by Seurat's ScaleData.)
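To make the direction and scaling arguments concrete, here is a small Python sketch (not ICELLNET's R code). The gene names CYT_L/CYT_R (a low-abundance cytokine pair) and CHEM_L/CHEM_R (an abundant chemokine pair) and all values are made up for illustration. For simplicity it uses plain max scaling to 0–10; the paper's exact rule (mean of the top 5% values) is given in the Methods.

```python
# Toy sketch: outward vs inward scoring, with and without per-gene scaling.
# All gene names and values are hypothetical, chosen only for illustration.

# Average expression per cell type (central = cell of interest, partner = other cell)
EXPR = {
    "central": {"CYT_L": 0.5, "CHEM_L": 50.0, "CYT_R": 1.0, "CHEM_R": 20.0},
    "partner": {"CYT_L": 0.1, "CHEM_L": 10.0, "CYT_R": 0.8, "CHEM_R": 40.0},
}
PAIRS = [("CYT_L", "CYT_R"), ("CHEM_L", "CHEM_R")]  # (ligand, receptor)


def max_scale(expr):
    """Rescale every gene to 0-10 by its maximum across cell types."""
    genes = next(iter(expr.values())).keys()
    gene_max = {g: max(cells[g] for cells in expr.values()) for g in genes}
    return {
        cell: {g: (10.0 * v / gene_max[g] if gene_max[g] > 0 else 0.0)
               for g, v in values.items()}
        for cell, values in expr.items()
    }


def lr_scores(expr, pairs, central, partner, direction):
    """Product of ligand and receptor expression for each pair.

    direction="out": ligand from the central cell, receptor from the partner.
    direction="in":  ligand from the partner, receptor from the central cell.
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    sender, receiver = (central, partner) if direction == "out" else (partner, central)
    return {(lig, rec): expr[sender][lig] * expr[receiver][rec] for lig, rec in pairs}


if __name__ == "__main__":
    for label, data in (("raw", EXPR), ("scaled", max_scale(EXPR))):
        print(label, lr_scores(data, PAIRS, "central", "partner", "out"))
```

With raw values the chemokine pair scores 2000 against 0.4 for the cytokine pair; after scaling they score 100 and 80, so the low-abundance cytokine is no longer drowned out.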

Results part 4: ICELLNET offers different graphical representations allowing multiple layers of interpretation (visualization).

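The remaining results are application case studies. Without even reading them you can guess the method performs well; otherwise the paper would never have been published.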

Methods: the part we focus on

Gene expression matrix scaling method. After selecting the genes corresponding to the ligands and/or receptors from the transcriptional profiles, each ligand/receptor gene expression is scaled by maximum of gene expression among all the conditions and then multiplied by 10, to have values ranging from 0 to 10 (whether to adopt this in your own analyses is worth trying out). For each gene, the maximum value (10) is defined as the mean of expression of the 5% highest values of expression for RNA-seq and microarray datasets. Outliers are rescaled at 10 if above maximum value.
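A minimal Python sketch of this scaling rule, written from the description above rather than from the package source. The function names are hypothetical. The tutorial code later calls gene.scaling(..., n=1); we read n as the number of top values averaged (so n=1 would amount to plain max scaling), but that is our interpretation, and the package documentation should be checked.

```python
import math


def scale_gene(values, top_fraction=0.05):
    """Scale one gene's expression across conditions to the 0-10 range.

    The reference maximum is the mean of the top `top_fraction` of values
    (at least one value). Values above it are capped at 10.
    """
    n_top = max(1, math.ceil(top_fraction * len(values)))
    ref_max = sum(sorted(values, reverse=True)[:n_top]) / n_top
    if ref_max <= 0:
        return [0.0] * len(values)  # gene never expressed: all zeros
    return [min(10.0, 10.0 * v / ref_max) for v in values]


def scale_matrix(matrix, top_fraction=0.05):
    """Apply scale_gene to every gene (dict: gene -> values per condition)."""
    return {gene: scale_gene(vals, top_fraction) for gene, vals in matrix.items()}


if __name__ == "__main__":
    # With top_fraction=0.5 the reference is the mean of the two largest values
    # (7.0), so 8.0 exceeds it and is capped at 10.
    print(scale_gene([2.0, 4.0, 6.0, 8.0], top_fraction=0.5))
```

Averaging the top few values instead of taking the single maximum makes the reference less sensitive to one outlier sample, which is why values above it are simply capped at 10.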
Next comes the hard part, the score computation: Intercellular communication score computation.
To score the intensity of a particular ligand–receptor interaction between a central cell and a given partner cell, we considered the product of the expression of the ligand in the central cell and of the cognate receptor in the partner cells. Formally, if $l_{ij}$ is the average expression level of ligand i by the central cell in the experimental condition j, and $r_{ik}$ is the average expression of the corresponding receptor by cell type k, the intensity $s_{ij,k}$ of the corresponding interaction was quantified by $s_{ij,k} = l_{ij} \times r_{ik}$ (again, simply a product of mean expression values). For interactions requiring multiple components of the ligand and/or of the receptor, we considered a geometric average of the receptor components. (Complexes are summarized by the geometric mean of their subunits, which differs from CellPhoneDB, which uses the minimum expression among the subunits.) To assign a global score $S_{j,k}$ to the communication between the central cell in the condition j and cell type k, a composite score was defined by summing up the intensity of all the possible ligand–receptor interactions:

$$S_{j,k} = \sum_{i} s_{ij,k} = \sum_{i} l_{ij} \times r_{ik}$$
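The scoring rule can be sketched in a few lines of Python. This is not ICELLNET's implementation: the gene names and values are hypothetical, and applying the geometric mean to both multi-subunit ligands and receptors is our assumption based on the Methods text. With expression scaled to 0–10, a single interaction tops out at 10 × 10 = 100.

```python
from math import prod  # requires Python 3.8+


def geo_mean(values):
    """Geometric mean of subunit expression values; 0 if any subunit is 0."""
    if min(values) <= 0:
        return 0.0
    return prod(values) ** (1.0 / len(values))


def interaction_score(ligand_subunits, receptor_subunits, central, partner):
    """s_ij,k = l_ij * r_ik for one outward interaction (central -> partner).

    Each side may list several subunit genes; they are combined by
    geometric mean (an assumption, see the lead-in).
    """
    l_ij = geo_mean([central[g] for g in ligand_subunits])
    r_ik = geo_mean([partner[g] for g in receptor_subunits])
    return l_ij * r_ik


def global_score(db, central, partner):
    """S_j,k: sum of s_ij,k over all (ligand subunits, receptor subunits) in db."""
    return sum(interaction_score(lig, rec, central, partner) for lig, rec in db)


if __name__ == "__main__":
    # Hypothetical scaled (0-10) expression values and a two-interaction database
    central = {"L1": 10.0, "L2a": 4.0, "L2b": 9.0}
    partner = {"R1": 5.0, "R2": 10.0}
    db = [(["L1"], ["R1"]), (["L2a", "L2b"], ["R2"])]
    print(global_score(db, central, partner))  # 10*5 + sqrt(4*9)*10
```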

There are other details worth noting, but we won't go into them here.

Finally, let's look at the example code.

First, load the packages:

library(BiocGenerics)
library("org.Hs.eg.db")
library("hgu133plus2.db")
library(jetset)
library(ggplot2)
library(dplyr)
library(icellnet)
library(gridExtra)
library(Seurat)

Take a look at the ligand–receptor database:

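db=as.data.frame(read.csv(curl::curl(url="https://raw.githubusercontent.com/soumelis-lab/ICELLNET/master/data/ICELLNETdb.tsv"), sep="\t",header = T, check.names=FALSE, stringsAsFactors = FALSE, na.strings = ""))
db.name.couple=name.lr.couple(db, type="Family") # interaction names with their family; used later by LR.family.score and LR.balloon.plot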

The interactions are indeed classified in detail, which is worth learning from.
1 - Load Seurat object

#Load data
seurat <- readRDS(file = "Lupus_Seurat_SingleCell_Landscape.Rds")
seurat <- NormalizeData(seurat)
seurat <- ScaleData(seurat)

#only for UMAP visualization, not for ICELLNET purpose
seurat <- FindVariableFeatures(seurat, selection.method = "vst", nfeatures = 2000)
seurat <- RunPCA(seurat)
seurat <- RunUMAP(seurat, dims = 1:50)
DimPlot(seurat, reduction = 'umap', group.by = 'author_annotation', label = T)

If you have already run a Seurat analysis, you don't need to repeat all of these processing steps here.

2 - Retrieve gene expression matrix

a - Compute manually average gene expression per cluster without filtering

# Taking into account the total nb of cells in each cluster
filter.perc=0
average.clean= sc.data.cleaning(object = seurat, db = db, filter.perc = filter.perc, save_file = T, path="path/", force.file = F)

b - Compute manually average gene expression per cluster with filtering for gene expression by a defined cell percentage at a cluster level

ICELLNET offers the possibility to filter the initial gene expression matrix so as to keep only genes expressed by at least a defined percentage of cells in their respective cluster (2% in this example); genes below that percentage are set to 0:

filter.perc=2
average.clean= sc.data.cleaning(object = seurat, db = db, filter.perc = filter.perc, save_file = T, path="path/", force.file = F)

This filtering removes genes that are expressed by a very low number of cells in some clusters, to avoid false-positive cell–cell interaction scores. If you are applying ICELLNET to your dataset for the first time, we advise running ICELLNET first without filtering, and then with filtering at 2%, to see the differences and which genes are filtered out. This helps a lot in the analysis and in the biological interpretation of the data (this is where ligands/receptors get filtered).
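To see what the two options (a) and (b) do to the matrix, here is a hedged Python sketch of per-cluster averaging with the percentage filter. It is not sc.data.cleaning itself: we assume a plain arithmetic mean of the input values per cluster and a strict "below the percentage" cutoff, as described in the remarks on filtering; the real function may differ in details (for example which assay or slot is averaged). The function name and data are hypothetical.

```python
from collections import defaultdict


def cluster_averages(expr, clusters, filter_perc=0.0):
    """Average expression per cluster, optionally zeroing poorly detected genes.

    expr:        dict gene -> list of per-cell values (same order as `clusters`)
    clusters:    list with one cluster label per cell
    filter_perc: if fewer than this % of a cluster's cells have a value > 0,
                 the gene's average for that cluster is set to 0.
    """
    members = defaultdict(list)
    for idx, label in enumerate(clusters):
        members[label].append(idx)

    result = {}
    for gene, values in expr.items():
        result[gene] = {}
        for label, idxs in members.items():
            vals = [values[i] for i in idxs]
            pct_expressing = 100.0 * sum(v > 0 for v in vals) / len(vals)
            if pct_expressing < filter_perc:
                result[gene][label] = 0.0
            else:
                result[gene][label] = sum(vals) / len(vals)
    return result


if __name__ == "__main__":
    # Hypothetical data: cluster A has 100 cells (1 expressing), cluster B has 10
    clusters = ["A"] * 100 + ["B"] * 10
    expr = {"G1": [10.0] + [0.0] * 99 + [1.0] * 10}
    print(cluster_averages(expr, clusters, filter_perc=0))
    print(cluster_averages(expr, clusters, filter_perc=2))
```

Without filtering, G1 keeps a small nonzero average in cluster A (driven by a single cell), which max scaling could later inflate; with filter_perc=2 it is zeroed because only 1% of A's cells express it.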

3 - Apply icellnet pipeline on cluster of interest

In this example, we investigate cDC to T cell communication from CM3 cluster (= conventional dendritic cells, 82 cells), to either CT3b or CT0a clusters (CT3b=TFH-like cells, 50 cells ; CT0a = effector memory CD4+ T cells, 220 cells).

Format CC.data (central cell), PC.data (partner cells) and PC.target


data.icell=as.data.frame(gene.scaling(as.data.frame(average.clean), n=1, db=db))

PC.data=as.data.frame(data.icell[,c("CT3b","CT0a", "Symbol")], row.names = rownames(data.icell))

PC.target=data.frame("Class"=c("CT3b","CT0a"), "ID"= c("CT3b","CT0a"), "Cell_type"=c("CT3b","CT0a"))
rownames(PC.target)=c("CT3b","CT0a")

my.selection=c("CT3b","CT0a")

Compute intercellular communication scores

We investigate conventional dendritic cells (cDCs, CM3 cluster) to T cell (either CT3b or CT0a clusters) outward communication, so this means that we consider ligands expressed by cDCs and receptors expressed by T cells to compute intercellular communication scores. Outward communication -> direction = "out"

score.computation.1= icellnet.score(direction="out", PC.data=PC.data, 
                                    CC.data= as.data.frame(data.icell[,c("CM3")], row.names = rownames(data.icell)),  
                                    PC.target = PC.target, PC=my.selection, CC.type = "RNAseq", 
                                    PC.type = "RNAseq",  db = db)
score1=as.data.frame(score.computation.1[[1]])
lr1=score.computation.1[[2]]

Visualisation of contribution of family of molecules to communication scores

# label and color label if you are working with families of molecules already present in the database
my.family=c("Growth factor","Chemokine","Checkpoint","Cytokine","Notch family","Antigen binding")
family.col = c( "Growth factor"= "#AECBE3", "Chemokine"= "#66ABDF", "Checkpoint"= "#1D1D18"  ,
            "Cytokine"="#156399", "Notch family" ="#676766", "Antigen binding" = "#12A039",  "other" = "#908F90",  "NA"="#908F90")

ymax=round(max(score1))+1 #to define the y axis range of the barplot

LR.family.score(lr=lr1, my.family=my.family, db.couple=db.name.couple, plot=F) # table of contribution of each family of molecule to the scores

LR.family.score(lr=lr1, my.family=my.family, db.couple=db.name.couple, plot=T, title="DC-T comm", family.col=family.col) #display barplot
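Conceptually, the family barplot aggregates per-interaction scores by molecular family. A minimal Python sketch of that aggregation, assuming (not confirmed from the package source) that each family's contribution is a simple sum of its lr1 scores per condition, with unannotated interactions grouped as "other". Interaction names and numbers are hypothetical.

```python
from collections import defaultdict


def family_contribution(lr_scores, pair_family):
    """Sum interaction scores by molecular family, separately for each condition.

    lr_scores:   dict interaction -> dict condition -> score
    pair_family: dict interaction -> family; unannotated ones go to "other".
    """
    totals = defaultdict(lambda: defaultdict(float))
    for pair, by_condition in lr_scores.items():
        family = pair_family.get(pair, "other")
        for condition, score in by_condition.items():
            totals[condition][family] += score
    return {cond: dict(fams) for cond, fams in totals.items()}


if __name__ == "__main__":
    # Hypothetical interaction names, families and scores
    lr = {
        "L1 / R1": {"CM3_to_CT3b": 10.0, "CM3_to_CT0a": 5.0},
        "L2 / R2": {"CM3_to_CT3b": 20.0, "CM3_to_CT0a": 0.0},
        "L3 / R3": {"CM3_to_CT3b": 1.0, "CM3_to_CT0a": 2.0},
    }
    families = {"L1 / R1": "Cytokine", "L2 / R2": "Cytokine"}
    print(family_contribution(lr, families))
```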


Visualisation of highest and most different interactions between the two conditions (selection of topn=30 interactions):

30 first most contributing interactions (sort.by="sum")

colnames(lr1)=c("CM3_to_CT3b", "CM3_to_CT0a")
LR.balloon.plot(lr = lr1, thresh = 0 , topn=30 , sort.by="sum",  db.name.couple=db.name.couple, family.col=family.col, title="Most contributing interactions")


30 first most different interactions between the conditions (sort.by="var")

colnames(lr1)=c("CM3_to_CT3b", "CM3_to_CT0a")
LR.balloon.plot(lr = lr1, thresh = 0 , topn=30 , sort.by="var",  db.name.couple=db.name.couple, family.col=family.col, title="Most different interactions")
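The two sort.by options rank interactions differently: "sum" favours interactions that are strong overall, "var" favours those that differ most between conditions. A hedged Python sketch of that ranking idea; top_interactions is a hypothetical helper, not LR.balloon.plot's code, and the thresh argument is not modelled. With a fixed number of conditions, population and sample variance differ only by a constant factor, so that choice does not change the ranking.

```python
from statistics import pvariance

RANKERS = {"sum": sum, "var": pvariance}


def top_interactions(lr_scores, topn=30, sort_by="sum"):
    """Return the `topn` interactions ranked by total score or by variance.

    lr_scores: dict interaction -> list of scores, one per condition.
    sort_by="sum": interactions contributing most overall;
    sort_by="var": interactions differing most between conditions.
    """
    if sort_by not in RANKERS:
        raise ValueError("sort_by must be 'sum' or 'var'")
    rank = RANKERS[sort_by]
    ranked = sorted(lr_scores.items(), key=lambda item: rank(item[1]), reverse=True)
    return [pair for pair, _ in ranked[:topn]]


if __name__ == "__main__":
    lr = {"P1": [50.0, 50.0], "P2": [90.0, 0.0], "P3": [10.0, 12.0]}  # hypothetical
    print(top_interactions(lr, topn=2, sort_by="sum"))
    print(top_interactions(lr, topn=2, sort_by="var"))
```

P1 is strong in both conditions, so it leads by sum but comes last by variance; P2 is condition-specific and leads by variance.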


Remarks on biological interpretation:
ICELLNET always sets, for each gene, the maximum gene expression value at 10. Hence the maximum score you can obtain for an individual interaction is 100 (10 for the ligand × 10 for the receptor). (Keep this in mind when reading ligand–receptor scores.)

This means that a high interaction score does not necessarily mean high expression. You should go back to the initial Seurat object to look at individual gene expression and check that the ligand/receptor of interest is effectively expressed by the cluster. (Results need to be checked manually.)

Filtering genes per cluster by the percentage of cells expressing them (counts > 0) can be an option to remove false-positive interaction scores. This can be done with the sc.data.cleaning function by setting filter.perc to a defined value (2 for 2%, 5 for 5%, etc.). Filtered genes (expressed by a fraction of the cluster's cells below that percentage) are set to 0. (Gene filtering.)

Finally, how to draw the network (circle) plot:

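# `color` is not defined in the original snippet; it is assumed to be a vector of colours for the cell types (e.g. named by cluster)
network.create(score1,PC.col = color)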

Finally, a question for fellow readers: should the input matrix be scaled for cell communication analysis? I wonder what your own answer is.

Life is good, and better with you.
