Hello everyone. This time your frontline bioinformatics worker shares another cell-cell communication tool, CrossTalkeR. As usual, we will first walk through the paper and then share the code. The paper is CrossTalkeR: Analysis and Visualisation of Ligand Receptor Networks. Let me stress once again: when analysing cell-cell communication, do not focus only on the ligand-receptor pairs that are specific to one condition in a comparison; it is just as important to look at how the communication strength of the shared pairs changes. A method that also analyses changes in communication strength was shared previously, in the post on the single-cell communication tool NATMI. I hope this point gets the attention it deserves. Now, on to the paper.
Abstract
Motivation: Ligand-receptor (LR) analysis allows the characterization of cellular crosstalk from single cell RNA-seq data. However, current LR methods provide limited approaches for prioritization of cell types, ligands or receptors or characterizing changes in crosstalk between two biological conditions. (As noted above, condition-specific LR pairs and changes in LR strength are equally important.)
Results: CrossTalkeR is a framework for network analysis and visualisation of LR networks. CrossTalkeR identifies relevant ligands, receptors and cell types contributing to changes in cell communication when contrasting two biological states: disease vs. homeostasis. A case study on scRNA-seq of human myeloproliferative neoplasms reinforces the strengths of CrossTalkeR for characterisation of changes in cellular crosstalk in disease state. (This is exactly the analysis logic we usually follow, so it should feel familiar.)
Introduction
Understanding cellular crosstalk is vital for uncovering molecular mechanisms associated with cell differentiation and disease progression. This part mainly makes the following points:
(1) Information on cellular proximity and crosstalk is not captured directly (a limitation of single-cell technology).
(2) Current LR inference methods usually predict hundreds of potential LR pairs for a given scRNA-seq dataset, making their interpretation challenging (a shortcoming of current methods; most tools predict LR pairs this way).
(3) Most LR methods focus on the analysis of a single scRNA-seq experiment and are not able to characterize changes in cellular crosstalk between pairs of conditions (this is where changes in communication strength across conditions come in, rather than simply listing condition-specific LR pairs).
(4) CrossTalkeR explores network-based measures to rank cell types, receptors, ligands, cell-receptor and cell-ligand pairs regarding their importance in cellular crosstalk in a single or a pair of biological conditions. (The analysis strategy is spot on.) It is also compatible with the CellPhoneDB database; it seems CellPhoneDB's position as the leading LR resource remains unshaken.
Overview and implementation
Let us look at the main features of this software.
1. Disease and normal samples are analysed separately. (Whether the data are integrated beforehand is something we will check below.)
2. LR-pair strength is still computed as the product of the mean ligand and mean receptor expression (see the short sketch after this list).
3. For LR pairs shared across states, the change in LR strength (up- or down-regulation) and the resulting regulatory network are compared.
4. LR pairs are ranked by priority, which is quite innovative.
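To make point 2 concrete, here is a minimal sketch of the mean-product score (not CrossTalkeR's internal code); the toy expression matrix, the cluster labels and the FN1 -> ITGB1 pair are all invented for illustration:

set.seed(1)
# Toy genes x cells matrix and cluster labels (all invented)
expr <- matrix(rpois(4 * 6, lambda = 2), nrow = 4,
               dimnames = list(c("FN1", "ITGB1", "GAPDH", "ACTB"), paste0("cell", 1:6)))
cluster <- c("MSC", "MSC", "MSC", "Mono", "Mono", "Mono")

# Mean ligand expression in the sender cluster times mean receptor expression in the receiver cluster
mean_lr <- function(ligand, receptor, sender, receiver) {
  mean(expr[ligand, cluster == sender]) * mean(expr[receptor, cluster == receiver])
}
mean_lr("FN1", "ITGB1", sender = "MSC", receiver = "Mono")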
Let us go through each step in detail.
Network Construction
CrossTalkeR constructs two representations of the LR network.
(1) In the cell-cell interaction (CCI) network, the nodes are defined by each cell type and the edges are weighted by characteristics of the interactions: the number of LR pairs and the sum of the weights of the LR pairs (count and strength).
(2) In the cell-gene interaction (CGI) network, the nodes represent gene and cell pairs (ligand and sender cell, or receptor and receiving cell) and the weights are given by the mean LR expression (a cell-gene interaction network).
(3) Differential CCI and CGI networks are obtained by subtracting the control state network from the condition state network, e.g. disease. In this way, interactions with positive values are up-regulated in the condition and interactions with negative values are down-regulated in the experimental (disease) state (the differential analysis; a small sketch follows below).
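To illustrate how the CCI edges and the differential network in (3) can be assembled, here is a minimal sketch under my own assumptions (not CrossTalkeR's code); the two tiny LR tables and their MeanLR values are invented:

library(igraph)
library(dplyr)
library(tidyr)

# Toy per-condition LR tables: one row per significant LR pair (invented values)
ctr <- data.frame(sender   = c("MSC", "MSC", "Mono"),
                  receiver = c("Mono", "Mono", "MSC"),
                  MeanLR   = c(0.8, 0.5, 0.3))
cnd <- data.frame(sender   = c("MSC", "Mono"),
                  receiver = c("Mono", "MSC"),
                  MeanLR   = c(2.1, 0.2))

# CCI edges: per sender/receiver pair, count the LR pairs and sum their weights
cci <- function(tab) {
  tab %>% group_by(sender, receiver) %>%
    summarise(n_pairs = n(), weight = sum(MeanLR), .groups = "drop")
}

# Differential CCI: condition minus control, so positive = up-regulated in the condition
diff_cci <- full_join(cci(cnd), cci(ctr), by = c("sender", "receiver"),
                      suffix = c("_cnd", "_ctr")) %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)),
         d_weight = weight_cnd - weight_ctr)

g <- graph_from_data_frame(select(diff_cci, sender, receiver, d_weight), directed = TRUE)
E(g)$d_weight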
Network-based analysis
(1) Node-ranking approaches (let us translate this part in detail):
We apply distinct network property methods to rank both the CCI and the CGI networks. By default, the measures take the weights of the LR pairs into account (a short sketch of such weighted measures follows below).
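As a hedged example of such weighted rankings, here are standard igraph centralities computed on a toy CCI graph; the edge weights below are invented and this is not CrossTalkeR's exact code:

library(igraph)

# Hypothetical single-condition CCI graph; edge weight = summed LR strength (invented)
edges <- data.frame(from   = c("MSC", "MSC", "Mono", "Megakaryocyte"),
                    to     = c("Mono", "Megakaryocyte", "MSC", "MSC"),
                    weight = c(1.3, 0.7, 0.4, 0.9))
g <- graph_from_data_frame(edges, directed = TRUE)

# Weight-aware measures commonly used to rank nodes
rank_tab <- data.frame(
  node         = V(g)$name,
  pagerank     = page_rank(g, weights = E(g)$weight)$vector,
  strength_out = strength(g, mode = "out", weights = E(g)$weight),
  # igraph treats betweenness weights as distances, so invert the strengths
  betweenness  = betweenness(g, weights = 1 / E(g)$weight)
)
rank_tab[order(-rank_tab$pagerank), ]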
I wonder whether you can spot in detail how this differs from the analysis methods shared previously.
Methods:
This part is explanatory and relatively straightforward.
This part describes the construction of the gene-cell network.
This part explains the CCI network, which does not differ much from the networks we usually build.
Computing the differences this way is certainly convenient.
It is emphasised here that the networks are built on integrated data, and the recommended integration method is Seurat's (frankly, nothing special).
Same as above: the ranking analysis.
The last part deserves our attention:
PC analysis of network properties
CrossTalkeR employs PCA as a means to combine distinct network measures. By default, it provides the scores of the first two PCs as a means to rank either gene-cell pairs in CGI networks or cells in CCI networks (interestingly, the ranking is done via PCA). CrossTalkeR also provides the importance of the variables within the PCs to make them interpretable. We interpret that the first PC captures global interactions, while the second PC captures local interactions. Top up-regulated cell-gene pairs predicted by PC2 include matrix (FN1/MSC, COL1A1/MSC) and TGFB-signalling-associated genes (TGFB1/Megakaryocyte), as previously discussed.
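To illustrate the idea (my own sketch, not CrossTalkeR's implementation; the per-node measure values are invented placeholders), one can stack several network measures per node into a matrix, run prcomp, rank the nodes by their PC scores, and read the rotation to see which measures drive each PC:

# Toy table of per-node network measures (invented values)
measures <- data.frame(
  row.names   = c("MSC", "Mono", "Megakaryocyte", "Neutrophil"),
  pagerank    = c(0.42, 0.25, 0.20, 0.13),
  strength    = c(2.0, 0.4, 0.9, 0.1),
  betweenness = c(2, 0, 0, 1)
)

# PCA over the standardised measures; PC scores give the ranking,
# the rotation (loadings) makes the ranking interpretable
pca <- prcomp(scale(as.matrix(measures)))
sort(pca$x[, "PC1"], decreasing = TRUE)
pca$rotation[, 1:2]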
Let us look at the reference code.
Step 1 - Extract data from a Seurat object
require(Seurat)
require(EWCE)
require(tibble)
require(biomaRt)
require(tidyr)
require(dplyr)
# Load the Seurat object and log-normalise the counts
alldata <- readRDS('path_to_data')
alldata <- NormalizeData(alldata, normalization.method = "LogNormalize", scale.factor = 10000)
allgenes <- rownames(alldata)
# Pull the normalised expression matrix and drop genes that are zero in all cells
matrix1 <- as.data.frame(alldata@assays$RNA@data)
matrix1 <- matrix1[rowSums(matrix1) != 0, ]
### If you are using mouse data, the gene names need to be converted to human orthologs
human <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
mouse <- useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")
genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = rownames(alldata@assays$RNA@data), mart = mouse, attributesL = c("hgnc_symbol", "hgnc_id", "ensembl_gene_id"), martL = human, uniqueRows = TRUE)
print(head(genesV2))
# Keep only mouse genes with a human ortholog, then attach the human Ensembl IDs
genesV2 <- genesV2[genesV2$MGI.symbol %in% rownames(matrix1), ]
matrix1 <- matrix1[genesV2$MGI.symbol, ]
matrix1$gene <- genesV2$Gene.stable.ID
#rownames(matrix1) <- genesV2$Gene.stable.ID
### Subsetting the matrix by condition ('cond' is the metadata column holding the condition labels)
s1 <- grepl('state1', alldata@meta.data$cond)
s2 <- grepl('state2', alldata@meta.data$cond)
# Keep the trailing 'gene' column in both subsets
s1[match('gene', colnames(matrix1))] <- TRUE
s2[match('gene', colnames(matrix1))] <- TRUE
## Checking the dimensions
print(dim(matrix1[, s1]))
print(dim(matrix1[, s2]))
## If the cluster names are categorical, you will need to convert it to numerical
alldata@meta.data$sub_cell_type <- as.factor(alldata@meta.data$sub_cell_type)
print(levels(alldata@meta.data$sub_cell_type))
levels(alldata@meta.data$sub_cell_type) <- 1:length(levels(alldata@meta.data$sub_cell_type))
print(1:length(levels(alldata@meta.data$sub_cell_type)))
alldata@meta.data$sub_cell_type <- as.numeric(alldata@meta.data$sub_cell_type)
write.table(matrix1[, s1], 's1_filtered_hcount.csv', row.names = TRUE, sep = ',')
write.table(matrix1[, s2], 's2_filtered_hcount.csv', row.names = TRUE, sep = ',')
# Build the per-condition metadata (cell barcode + cluster)
metadata_s1 <- data.frame(cells = rownames(alldata@meta.data[grepl('state1', alldata@meta.data$cond), ]), cluster = alldata@meta.data$sub_cell_type[grepl('state1', alldata@meta.data$cond)])
metadata_s2 <- data.frame(cells = rownames(alldata@meta.data[!grepl('state1', alldata@meta.data$cond), ]), cluster = alldata@meta.data$sub_cell_type[!grepl('state1', alldata@meta.data$cond)]) ## just negate grepl() for the second condition
print('Writing Metadata')
write.csv(metadata_s1, 's1_filtered_meta.csv', row.names = FALSE)
write.csv(metadata_s2, 's2_filtered_meta.csv', row.names = FALSE)
Note that the gene IDs need to be unique; if multiple mouse genes map to the same human ortholog, you will need an approach to combine them (e.g. sum or max), as in the sketch below.
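For example, a minimal dplyr sketch that sums the rows mapping to the same human gene ID (assuming `matrix1` with its `gene` column from the script above; this would have to run before the subsetting and write.table calls, and summing is only one of several reasonable choices):

library(dplyr)

# Collapse rows that share the same human gene ID by summing their values,
# keeping the 'gene' column at the end as in the script above
matrix1 <- matrix1 %>%
  group_by(gene) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop") %>%
  relocate(gene, .after = last_col()) %>%
  as.data.frame()
rownames(matrix1) <- matrix1$gene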
Run CellPhoneDB
The full parameter list is available at https://github.com/Teichlab/cellphonedb
Ensembl-based gene IDs
#! /bin/bash
mkdir s1 s2 # creating the output folders
cellphonedb method statistical_analysis s1_filtered_meta.csv s1_filtered_hcount.csv --threads 30 --output-path s1/
cellphonedb method statistical_analysis s2_filtered_meta.csv s2_filtered_hcount.csv --threads 30 --output-path s2/
HGNC (HUGO) symbol based IDs
#! /bin/bash
mkdir s1 s2 # creating the output folders
cellphonedb method statistical_analysis s1_filtered_meta.csv s1_filtered_hcount.csv --counts-data hgnc_symbol --threads 30 --output-path s1/
cellphonedb method statistical_analysis s2_filtered_meta.csv s2_filtered_hcount.csv --counts-data hgnc_symbol --threads 30 --output-path s2/
Extracting LR
import pandas as pd

def correct_lr(data):
    '''
    Invert RL pairs to LR, and R1R2 pairs to R2>R1
    '''
    def swap(a, b):
        return b, a
    data = data.to_dict('index')
    for k, v in data.items():
        if v['isReceptor_fst'] and v['isReceptor_scn']:
            v['isReceptor_fst'], v['isReceptor_scn'] = swap(v['isReceptor_fst'], v['isReceptor_scn'])
            v['Ligand'], v['Receptor'] = swap(v['Ligand'], v['Receptor'])
            v['Ligand.Cluster'], v['Receptor.Cluster'] = swap(v['Ligand.Cluster'], v['Receptor.Cluster'])
        elif v['isReceptor_fst'] and not v['isReceptor_scn']:
            v['isReceptor_fst'], v['isReceptor_scn'] = swap(v['isReceptor_fst'], v['isReceptor_scn'])
            v['Ligand'], v['Receptor'] = swap(v['Ligand'], v['Receptor'])
            v['Ligand.Cluster'], v['Receptor.Cluster'] = swap(v['Ligand.Cluster'], v['Receptor.Cluster'])
    res_df = pd.DataFrame.from_dict(data, orient='index')
    return res_df

def cpdb2df(data, clsmapping):
    '''
    Convert the CellPhoneDB significant_means table to a long LR table,
    mapping numeric cluster labels to names via clsmapping
    '''
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:, 12:].shape[1]):
            c_pair = list(data.columns)[j + 12].split('|')
            if float(data.iloc[i, j + 12]) != 0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(clsmapping[c_pair[0]])
                df_data['Receptor.Cluster'].append(clsmapping[c_pair[1]])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i, j + 12])
    data_final = pd.DataFrame.from_dict(df_data)
    return data_final

def cpdb2df_nocls(data):
    '''
    Same conversion, for when the cluster names were already used in CellPhoneDB
    '''
    data = data.fillna(0)
    df_data = {}
    df_data['Ligand'] = []
    df_data['Receptor'] = []
    df_data['Ligand.Cluster'] = []
    df_data['Receptor.Cluster'] = []
    df_data['isReceptor_fst'] = []
    df_data['isReceptor_scn'] = []
    df_data['MeanLR'] = []
    for i in range(data.shape[0]):
        pair = list(data['interacting_pair'])[i].split('_')
        for j in range(data.iloc[:, 12:].shape[1]):
            c_pair = list(data.columns)[j + 12].split('|')
            if float(data.iloc[i, j + 12]) != 0.0:
                df_data['Ligand'].append(pair[0])
                df_data['Receptor'].append(pair[1])
                df_data['Ligand.Cluster'].append(c_pair[0])
                df_data['Receptor.Cluster'].append(c_pair[1])
                df_data['isReceptor_fst'].append(list(data['receptor_a'])[i])
                df_data['isReceptor_scn'].append(list(data['receptor_b'])[i])
                df_data['MeanLR'].append(data.iloc[i, j + 12])
    data_final = pd.DataFrame.from_dict(df_data)
    return data_final
s1 = pd.read_csv('./s1/significant_means.txt', sep='\t')
s2 = pd.read_csv('./s2/significant_means.txt', sep='\t')
# dict mapping the numeric cluster labels back to cluster names
num_to_clust = {'1': 'Cluster 1',
                '2': 'Cluster 2',
                '3': 'Cluster 3',
                '4': 'Cluster 4',
                '5': 'Cluster 5'}
s1_filtered = cpdb2df(s1,num_to_clust)
s2_filtered = cpdb2df(s2,num_to_clust)
s1_filtered = correct_lr(s1_filtered)
s2_filtered = correct_lr(s2_filtered)
s1_filtered.to_csv('s1_filtered_corrected.csv')
s2_filtered.to_csv('s2_filtered_corrected.csv')
CrossTalkeR
library('CrossTalkeR')

# The method always considers the first path to be the control; the multiple-control case will be handled soon.
# Additional experimental conditions can be appended as further named entries (e.g. 'EXP1' = ...).
paths <- c('CTR' = 's1_filtered_corrected.csv',
           'EXP' = 's2_filtered_corrected.csv')

# Selected gene list
genes <- c('TGFB1', 'PF4')

# Generating the report and the object
data <- generate_report(paths = paths,                     # named paths, first entry = control
                        selected_genes = genes,            # selected gene list
                        output = '/home/nagai/Documents/', # output path
                        threshold = 0,                     # threshold for pruning edges; 0 = keep all
                        out_file = 'All_Nils.html')        # report name
As for the report, it is the HTML file written to the output path above.
Life is good, and even better with you.