使用immunarch包進行單細胞免疫組庫數據分析(五):Repertoire overlap and public clonotypes

免疫組庫重疊分析

免疫組庫重疊(Repertoire overlap)是度量不同樣本之間TCR或BCR庫相似性的最常用方法。它是通過計算在給定的Repertoire之間共享的克隆型(也稱為“公共”克隆型)的特定統計數據來實現的。immunarch提供了幾個指標: - 公共克隆型的數量 ( .method = "public") - 一個重疊相似性的經典度量指標。

  • overlap coefficient,重疊系數 ( .method = "overlap") - 重疊相似性的標準化度量。它被定義為交集的大小除以兩個集合中的較小者。
  • Jaccard index,Jaccard指數 ( .method = "jaccard") - 它衡量有限樣本集之間的相似度,定義為交集的大小除以樣本集并集的大小。
  • Tversky index,Tversky指數 ( .method = "tversky") - 一種集合上的非對稱相似性度量,它將變體與原型進行比較。如果使用默認參數,則類似于 Dice 的系數。
  • cosine similarity,余弦相似度 ( .method = "cosine") - 兩個非零向量之間相似度的度量
  • Morisita’s overlap index,Morisita重疊指數 ( .method = "morisita") - 一種用于計算個體在總體中的分散程度的統計量度。它用于比較樣本之間的重疊。
  • incremental overlap,增量重疊 - 計算N個最豐富的克隆型與增量增長的N的重疊(.method = "inc+METHOD"例如,"inc+public""inc+morisita")。

我們可以使用repOverlap函數計算不同樣本之間Repertoire的重疊情況。同樣,我們可以將分析結果傳遞給vis()函數,以完成所有輸出結果的可視化展示。

library(immunarch)  # Load the package into R
data(immdata)  # Load the test dataset

# 使用不同的度量方法計算Repertoire的重疊情況
imm_ov1 <- repOverlap(immdata$data, .method = "public", .verbose = F)
head(imm_ov1)
#        A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
#A2-i129      NA      63      74      69      46      55  30  41  24  35  44  54
#A2-i131      63      NA      56      81      42      64  34  31  33  33  23  49
#A2-i133      74      56      NA      87      49      61  41  44  31  31  44  65
#A2-i132      69      81      87      NA      62      67  47  46  50  48  60  76
#A4-i191      46      42      49      62      NA      55  42  34  41  29  37  49
#A4-i192      55      64      61      67      55      NA  56  37  27  37  56  61

imm_ov2 <- repOverlap(immdata$data, .method = "morisita", .verbose = F)
head(imm_ov2)
#             A2-i129      A2-i131      A2-i133      A2-i132      A4-i191      A4-i192
#A2-i129           NA 0.0024642881 0.0011511984 0.0044505612 0.0005804524 0.0024253356
#A2-i131 0.0024642881           NA 0.0011475178 0.0088347844 0.0006212924 0.0019547325
#A2-i133 0.0011511984 0.0011475178           NA 0.0043090343 0.0004456898 0.0016076124
#A2-i132 0.0044505612 0.0088347844 0.0043090343           NA 0.0009178361 0.0023583418
#A4-i191 0.0005804524 0.0006212924 0.0004456898 0.0009178361           NA 0.0005006889
#A4-i192 0.0024253356 0.0019547325 0.0016076124 0.0023583418 0.0005006889           NA
#                MS1          MS2          MS3          MS4          MS5          MS6
#A2-i129 0.0003009428 0.0011482287 0.0001797280 0.0014031280 0.0007196454 0.0027679140
#A2-i131 0.0001927309 0.0014283644 0.0002328510 0.0021404462 0.0002198598 0.0034297172
#A2-i133 0.0002194163 0.0018138252 0.0001618185 0.0007751521 0.0002272166 0.0017382456
#A2-i132 0.0004486568 0.0032894737 0.0005874910 0.0073378655 0.0008173229 0.0106015902
#A4-i191 0.0007469433 0.0002730513 0.0001892369 0.0004114056 0.0003530021 0.0008469919
#A4-i192 0.0002977945 0.0007443917 0.0001358868 0.0016537104 0.0003422629 0.0544339382

p1 <- vis(imm_ov1)
p2 <- vis(imm_ov2, .text.size = 2)

p1 + p2
image.png
vis(imm_ov1, "heatmap2")
image.png

我們可以設置更改有效數字的位數:

p1  <-  vis ( imm_ov2 , .text.size  =  2.5 , .signif.digits  =  1 )
p2  <-  vis ( imm_ov2 , .text.size  =  2 , .signif.digits  =  2 )

p1 + p2
image.png

我們還可以使用repOverlapAnalysis函數對計算得到的重疊指標進行分析。

# Apply different analysis algorithms to the matrix of public clonotypes:
# "mds" - Multi-dimensional Scaling
# MDS降維
repOverlapAnalysis(imm_ov1, "mds")
## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
## 
## Rotation (n x k) = (12 x 2):
##                [,1]       [,2]
## A2-i129 -18.7767715 -18.360817
## A2-i131  29.9586985  -7.870441
## A2-i133  28.1148594  22.629093
## A2-i132 -44.3435640   6.221812
## A4-i191  13.8586515   7.452149
## A4-i192 -14.0065477  27.068830
## MS1      -8.8469009  -8.151574
## MS2      -0.9712073  -1.297017
## MS3     -10.4398629   4.894354
## MS4       0.5131505  10.471309
## MS5      18.5153823 -12.628029
## MS6       6.4241122 -30.429669
# "tsne" - t-Stochastic Neighbor Embedding
# TSNE降維
repOverlapAnalysis(imm_ov1, "tsne")
##                DimI       DimII
## A2-i129  141.757274   -1.875981
## A2-i131 -336.372028  177.099380
## A2-i133   82.395447  -42.878936
## A2-i132  -11.731661  -41.878646
## A4-i191   38.681498 -109.060994
## A4-i192  169.797839   36.512757
## MS1      116.225222   42.689482
## MS2        3.659358  -70.354712
## MS3      139.036548   18.337403
## MS4       21.703642  -81.842537
## MS5     -320.584081  165.745570
## MS6      -44.569057  -92.492786
## attr(,"class")
## [1] "immunr_tsne" "matrix"
# Visualise the results
# MDS降維可視化
repOverlapAnalysis(imm_ov1, "mds") %>% vis()
image.png
# Visualise the results
# TSNE降維可視化
repOverlapAnalysis(imm_ov1, "tsne") %>% vis()
image.png
# Clusterise the MDS resulting components using K-means
repOverlapAnalysis(imm_ov1, "mds+kmeans") %>% vis()
image.png

構建公共克隆型庫

為了從repertoires列表中構建一個包含所有clonotypes的大型公共克隆型庫,我們可以使用pubRep該函數。

# Pass "nt" as the second parameter to build the public repertoire table using CDR3 nucleotide sequences
# 使用CDR3區域的核苷酸序列計算構建公共克隆型庫
pr.nt <- pubRep(immdata$data, "nt", .verbose = F)
pr.nt
##                                                    CDR3.nt Samples A2-i129
##     1:                   TGCGCCAGCAGCTTGGAAGAGACCCAGTACTTC       8       1
##     2:                   TGTGCCAGCAGCTTCCAAGAGACCCAGTACTTC       7      NA
##     3:                   TGTGCCAGCAGTTACCAAGAGACCCAGTACTTC       7       1
##     4:                   TGCGCCAGCAGCTTCCAAGAGACCCAGTACTTC       6       2
##     5:                      TGTGCCAGCAGCCAAGAGACCCAGTACTTC       6       4
##    ---                                                                    
## 75101:             TGTGCTTCACAACTCTTATTGGACGAGACCCAGTACTTC       1      NA
## 75102: TGTGCTTCACAAGCCCTACAGGGCACTTTCCATAATTCACCCCTCCACTTT       1      NA
## 75103:                   TGTGCTTCAGGGCGGGCCTACGAGCAGTACTTC       1      NA
## 75104:             TGTGCTTCCGCCGGACCGGACCGGGAGACCCAGTACTTC       1      NA
## 75105:                TGTGCTTGCGGGACAGATAACTATGGCTACACCTTC       1      NA
##        A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
##     1:      NA       1       1      NA       1  NA  NA   1   1   1   1
##     2:       1       1       2       1      NA   1  NA  NA   2  NA   1
##     3:       1       1      NA       1       1   1  NA   2  NA  NA  NA
##     4:      NA       1       1      NA      NA  NA   1  NA   1  NA   1
##     5:       2      NA       2       3       1  NA  NA  NA  NA   4  NA
##    ---                                                                
## 75101:       1      NA      NA      NA      NA  NA  NA  NA  NA  NA  NA
## 75102:      NA      NA      NA      NA      NA  NA  NA  NA  NA   1  NA
## 75103:      NA      NA      NA      NA      NA   1  NA  NA  NA  NA  NA
## 75104:      NA       1      NA      NA      NA  NA  NA  NA  NA  NA  NA
## 75105:      NA      NA      NA      NA       1  NA  NA  NA  NA  NA  NA
# Pass "aa+v" as the second parameter to build the public repertoire table using CDR3 aminoacid sequences and V alleles
# In order to use only CDR3 aminoacid sequences, just pass "aa"
# 使用CDR3區域的氨基酸序列和V等位基因序列計算構建公共克隆型庫
pr.aav <- pubRep(immdata$data, "aa+v", .verbose = F)
pr.aav
##                    CDR3.aa   V.name Samples A2-i129 A2-i131 A2-i133 A2-i132
##     1:         CASSLEETQYF  TRBV5-1       8       1      NA       2       1
##     2:     CASSDSSGGANEQFF  TRBV6-4       6       1       1       2      NA
##     3:         CASSFQETQYF  TRBV5-1       6       3      NA       1       1
##     4:         CASSLGETQYF TRBV12-4       6       2      NA      NA       4
##     5:     CASSDSGGSYNEQFF  TRBV6-4       5      NA      NA      NA       3
##    ---                                                                     
## 74440:     CTSSRPTQGAYEQYF  TRBV7-2       1      NA      NA      NA      NA
## 74441:    CTSSSRAGAGTDTQYF  TRBV7-2       1      NA      NA      NA      NA
## 74442: CTSSYPGLAGLKRKETQYF  TRBV7-2       1      NA      NA      NA       1
## 74443:    CTSSYRQRPYQETQYF  TRBV7-2       1      NA      NA      NA      NA
## 74444:      CTSSYSTSGVGQFF  TRBV7-2       1      NA      NA      NA      NA
##        A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
##     1:      NA       2  NA  NA   1   1   1   1
##     2:       3      NA  NA  NA   2  NA  NA  12
##     3:      NA      NA  NA   1  NA   1  NA   1
##     4:       3      NA   1  NA  NA  NA   2   1
##     5:      NA       1   1  NA   1  NA  NA   1
##    ---                                        
## 74440:      NA      NA  NA  NA  NA  NA  NA   1
## 74441:      NA      NA  NA  NA   1  NA  NA  NA
## 74442:      NA      NA  NA  NA  NA  NA  NA  NA
## 74443:      NA      NA  NA  NA   1  NA  NA  NA
## 74444:      NA      NA  NA  NA  NA   1  NA  NA
# You can also pass the ".coding" parameter to filter out all noncoding sequences first:
# 也可以通過設置“.coding=T”參數,事先過濾掉所有非編碼序列:
pr.aav.cod <- pubRep(immdata$data, "aa+v", .coding = T)
# Create a public repertoire with coding-only sequences using both CDR3 amino acid sequences and V genes
# 使用 CDR3 氨基酸序列和 V 基因創建一個僅包含編碼序列的公共庫
pr <- pubRep(immdata$data, "aa+v", .coding = T, .verbose = F)
head(pr)
#          CDR3.aa   V.name Samples A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3
#1:     CASSLEETQYF  TRBV5-1       8       1      NA       2       1      NA       2  NA  NA   1
#2: CASSDSSGGANEQFF  TRBV6-4       6       1       1       2      NA       3      NA  NA  NA   2
#3:     CASSFQETQYF  TRBV5-1       6       3      NA       1       1      NA      NA  NA   1  NA
#4:     CASSLGETQYF TRBV12-4       6       2      NA      NA       4       3      NA   1  NA  NA
#5: CASSDSGGSYNEQFF  TRBV6-4       5      NA      NA      NA       3      NA       1   1  NA   1
#6: CASSDSSGSTDTQYF  TRBV6-4       5      NA      NA      NA       4       1       1  NA  NA   1
#   MS4 MS5 MS6
#1:   1   1   1
#2:  NA  NA  12
#3:   1  NA   1
#4:  NA   2   1
#5:  NA  NA   1
#6:  NA  NA   2

# Apply the filter subroutine to leave clonotypes presented only in healthy individuals
# 應用過濾子程序,過濾出只出現在健康個體中克隆型
pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "C"))

# Apply the filter subroutine to leave clonotypes presented only in diseased individuals
# 應用過濾子程序,過濾出只出現在患病個體中克隆型
pr2 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS"))

# Divide one by another
pr3 <- pubRepApply(pr1, pr2)

# Plot it
p <- ggplot() +
  geom_jitter(aes(x = "Treatment", y = Result), data = pr3)
p
image.png

參考來源:https://immunarch.com/articles/web_only/v4_overlap.html

?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容