Immune repertoire overlap analysis
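Repertoire overlap is the most common way to measure the similarity of TCR or BCR repertoires between samples. It works by computing specific statistics on the clonotypes shared between the given repertoires (also called "public" clonotypes). immunarch provides several metrics:
 - number of public clonotypes (.method = "public") - a classic measure of overlap similarity.
 - overlap coefficient (.method = "overlap") - a normalised measure of overlap similarity, defined as the size of the intersection divided by the size of the smaller of the two sets.
 - Jaccard index (.method = "jaccard") - measures the similarity between finite sample sets, defined as the size of the intersection divided by the size of the union of the sample sets.
 - Tversky index (.method = "tversky") - an asymmetric similarity measure on sets that compares a variant to a prototype. With the default parameters it is similar to Dice's coefficient.
 - cosine similarity (.method = "cosine") - a measure of similarity between two non-zero vectors.
 - Morisita's overlap index (.method = "morisita") - a statistical measure of the dispersion of individuals in a population, used to compare overlap among samples.
 - incremental overlap (.method = "inc+METHOD", for example "inc+public" or "inc+morisita") - overlaps of the N most abundant clonotypes, computed for incrementally growing N.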
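To make the set-based definitions above concrete, here is a minimal base-R sketch on two made-up clonotype vectors. It does not call immunarch; the helper name tversky and its alpha/beta arguments are illustrative assumptions, and immunarch's own implementations may differ in detail (for example in which columns define a clonotype).

```r
# Two toy repertoires, each a vector of CDR3 amino acid sequences,
# assumed to be sorted by decreasing abundance (made-up example data).
a <- c("CASSLEETQYF", "CASSFQETQYF", "CASSLGETQYF", "CASSDSSGGANEQFF")
b <- c("CASSLEETQYF", "CASSFQETQYF", "CASSYSTSGVGQFF")

# Number of public (shared) clonotypes.
n_public <- length(intersect(a, b))

# Overlap coefficient: intersection size over the size of the smaller set.
overlap_coef <- n_public / min(length(a), length(b))

# Jaccard index: intersection size over union size.
jaccard <- n_public / length(union(a, b))

# Tversky index; with alpha = beta = 0.5 it equals Dice's coefficient.
# (Hypothetical helper, not an immunarch function.)
tversky <- function(a, b, alpha = 0.5, beta = 0.5) {
  n_ab <- length(intersect(a, b))
  n_ab / (n_ab + alpha * length(setdiff(a, b)) + beta * length(setdiff(b, a)))
}

# Incremental overlap idea: public count among the top-N clonotypes for growing N.
inc_public <- sapply(1:3, function(n) length(intersect(head(a, n), head(b, n))))

print(c(public = n_public, overlap = overlap_coef,
        jaccard = jaccard, tversky = tversky(a, b)))
print(inc_public)
```

Here the two vectors share 2 clonotypes, so the overlap coefficient is 2/3 and the Jaccard index is 2/5.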
We can compute the overlap between the repertoires of different samples with the repOverlap function. As before, the result can be passed to the vis() function to visualise it.
library(immunarch) # Load the package into R
data(immdata) # Load the test dataset
# Compute the repertoire overlap using different metrics
imm_ov1 <- repOverlap(immdata$data, .method = "public", .verbose = FALSE)
head(imm_ov1)
# A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
#A2-i129 NA 63 74 69 46 55 30 41 24 35 44 54
#A2-i131 63 NA 56 81 42 64 34 31 33 33 23 49
#A2-i133 74 56 NA 87 49 61 41 44 31 31 44 65
#A2-i132 69 81 87 NA 62 67 47 46 50 48 60 76
#A4-i191 46 42 49 62 NA 55 42 34 41 29 37 49
#A4-i192 55 64 61 67 55 NA 56 37 27 37 56 61
imm_ov2 <- repOverlap(immdata$data, .method = "morisita", .verbose = FALSE)
head(imm_ov2)
# A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192
#A2-i129 NA 0.0024642881 0.0011511984 0.0044505612 0.0005804524 0.0024253356
#A2-i131 0.0024642881 NA 0.0011475178 0.0088347844 0.0006212924 0.0019547325
#A2-i133 0.0011511984 0.0011475178 NA 0.0043090343 0.0004456898 0.0016076124
#A2-i132 0.0044505612 0.0088347844 0.0043090343 NA 0.0009178361 0.0023583418
#A4-i191 0.0005804524 0.0006212924 0.0004456898 0.0009178361 NA 0.0005006889
#A4-i192 0.0024253356 0.0019547325 0.0016076124 0.0023583418 0.0005006889 NA
# MS1 MS2 MS3 MS4 MS5 MS6
#A2-i129 0.0003009428 0.0011482287 0.0001797280 0.0014031280 0.0007196454 0.0027679140
#A2-i131 0.0001927309 0.0014283644 0.0002328510 0.0021404462 0.0002198598 0.0034297172
#A2-i133 0.0002194163 0.0018138252 0.0001618185 0.0007751521 0.0002272166 0.0017382456
#A2-i132 0.0004486568 0.0032894737 0.0005874910 0.0073378655 0.0008173229 0.0106015902
#A4-i191 0.0007469433 0.0002730513 0.0001892369 0.0004114056 0.0003530021 0.0008469919
#A4-i192 0.0002977945 0.0007443917 0.0001358868 0.0016537104 0.0003422629 0.0544339382
p1 <- vis(imm_ov1)
p2 <- vis(imm_ov2, .text.size = 2)
p1 + p2
vis(imm_ov1, "heatmap2")
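The "cosine" and "morisita" methods listed earlier work on clonotype abundances rather than on presence or absence alone, which is why their values look so different from the public counts. The following base-R sketch shows the textbook formulas on two made-up count vectors aligned on the same clonotypes. The function names are hypothetical, and this is not necessarily how immunarch computes these indices internally.

```r
# Clone counts for five clonotypes in two toy samples (made-up data);
# position i refers to the same clonotype in both vectors.
x <- c(10, 5, 0, 3, 2)
y <- c(4, 0, 6, 3, 7)

# Cosine similarity between the two abundance vectors.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Morisita's overlap index in its classic form:
# C = 2 * sum(x * y) / ((Dx + Dy) * X * Y),
# where X, Y are total counts and Dx, Dy are Simpson-type dispersion terms.
morisita_overlap <- function(x, y) {
  X <- sum(x)
  Y <- sum(y)
  Dx <- sum(x * (x - 1)) / (X * (X - 1))
  Dy <- sum(y * (y - 1)) / (Y * (Y - 1))
  2 * sum(x * y) / ((Dx + Dy) * X * Y)
}

print(cosine_sim(x, y))
print(morisita_overlap(x, y))
```

Both measures grow when the same clonotypes are abundant in both samples, so two repertoires can share many rare clonotypes and still score low.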
We can change the number of significant digits shown:
p1 <- vis(imm_ov2, .text.size = 2.5, .signif.digits = 1)
p2 <- vis(imm_ov2, .text.size = 2, .signif.digits = 2)
p1 + p2
We can also analyse the computed overlap measures with the repOverlapAnalysis function.
# Apply different analysis algorithms to the matrix of public clonotypes:
# "mds" - multi-dimensional scaling
repOverlapAnalysis(imm_ov1, "mds")
## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
##
## Rotation (n x k) = (12 x 2):
## [,1] [,2]
## A2-i129 -18.7767715 -18.360817
## A2-i131 29.9586985 -7.870441
## A2-i133 28.1148594 22.629093
## A2-i132 -44.3435640 6.221812
## A4-i191 13.8586515 7.452149
## A4-i192 -14.0065477 27.068830
## MS1 -8.8469009 -8.151574
## MS2 -0.9712073 -1.297017
## MS3 -10.4398629 4.894354
## MS4 0.5131505 10.471309
## MS5 18.5153823 -12.628029
## MS6 6.4241122 -30.429669
# "tsne" - t-distributed stochastic neighbor embedding
repOverlapAnalysis(imm_ov1, "tsne")
## DimI DimII
## A2-i129 141.757274 -1.875981
## A2-i131 -336.372028 177.099380
## A2-i133 82.395447 -42.878936
## A2-i132 -11.731661 -41.878646
## A4-i191 38.681498 -109.060994
## A4-i192 169.797839 36.512757
## MS1 116.225222 42.689482
## MS2 3.659358 -70.354712
## MS3 139.036548 18.337403
## MS4 21.703642 -81.842537
## MS5 -320.584081 165.745570
## MS6 -44.569057 -92.492786
## attr(,"class")
## [1] "immunr_tsne" "matrix"
# Visualise the MDS results
repOverlapAnalysis(imm_ov1, "mds") %>% vis()
# Visualise the t-SNE results
repOverlapAnalysis(imm_ov1, "tsne") %>% vis()
# Cluster the MDS components using k-means
repOverlapAnalysis(imm_ov1, "mds+kmeans") %>% vis()
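As a rough illustration of what "mds+kmeans" combines, the sketch below runs classical MDS and then k-means with base R only, on the public-clonotype counts of the first six samples shown in the repOverlap output earlier. It is an analogue under stated assumptions, not immunarch's internal code: in particular, converting counts into dissimilarities by subtracting them from (max + 1), and choosing two clusters, are arbitrary choices made for the example.

```r
# Public-clonotype counts for six samples, copied from the repOverlap output above.
# The diagonal (NA in the original output) is set to 0 as a placeholder.
samples <- c("A2-i129", "A2-i131", "A2-i133", "A2-i132", "A4-i191", "A4-i192")
sim <- matrix(c(
   0, 63, 74, 69, 46, 55,
  63,  0, 56, 81, 42, 64,
  74, 56,  0, 87, 49, 61,
  69, 81, 87,  0, 62, 67,
  46, 42, 49, 62,  0, 55,
  55, 64, 61, 67, 55,  0
), nrow = 6, byrow = TRUE, dimnames = list(samples, samples))

# Assumption: more shared clonotypes means "closer", so use (max + 1) - count
# as a dissimilarity and force the diagonal to zero.
dis <- (max(sim) + 1) - sim
diag(dis) <- 0

# Classical multi-dimensional scaling into two dimensions.
coords <- cmdscale(as.dist(dis), k = 2)

# k-means on the MDS coordinates; the seed only makes the run reproducible.
set.seed(42)
clusters <- kmeans(coords, centers = 2)$cluster

print(coords)
print(clusters)
```

Cluster labels from k-means are arbitrary, so only the grouping of samples is meaningful, not the label numbers.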
Building a public repertoire
To build one large public repertoire table containing all clonotypes from a list of repertoires, we can use the pubRep function.
# Pass "nt" as the second parameter to build the public repertoire table using CDR3 nucleotide sequences
pr.nt <- pubRep(immdata$data, "nt", .verbose = FALSE)
pr.nt
## CDR3.nt Samples A2-i129
## 1: TGCGCCAGCAGCTTGGAAGAGACCCAGTACTTC 8 1
## 2: TGTGCCAGCAGCTTCCAAGAGACCCAGTACTTC 7 NA
## 3: TGTGCCAGCAGTTACCAAGAGACCCAGTACTTC 7 1
## 4: TGCGCCAGCAGCTTCCAAGAGACCCAGTACTTC 6 2
## 5: TGTGCCAGCAGCCAAGAGACCCAGTACTTC 6 4
## ---
## 75101: TGTGCTTCACAACTCTTATTGGACGAGACCCAGTACTTC 1 NA
## 75102: TGTGCTTCACAAGCCCTACAGGGCACTTTCCATAATTCACCCCTCCACTTT 1 NA
## 75103: TGTGCTTCAGGGCGGGCCTACGAGCAGTACTTC 1 NA
## 75104: TGTGCTTCCGCCGGACCGGACCGGGAGACCCAGTACTTC 1 NA
## 75105: TGTGCTTGCGGGACAGATAACTATGGCTACACCTTC 1 NA
## A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
## 1: NA 1 1 NA 1 NA NA 1 1 1 1
## 2: 1 1 2 1 NA 1 NA NA 2 NA 1
## 3: 1 1 NA 1 1 1 NA 2 NA NA NA
## 4: NA 1 1 NA NA NA 1 NA 1 NA 1
## 5: 2 NA 2 3 1 NA NA NA NA 4 NA
## ---
## 75101: 1 NA NA NA NA NA NA NA NA NA NA
## 75102: NA NA NA NA NA NA NA NA NA 1 NA
## 75103: NA NA NA NA NA 1 NA NA NA NA NA
## 75104: NA 1 NA NA NA NA NA NA NA NA NA
## 75105: NA NA NA NA 1 NA NA NA NA NA NA
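The table above has one row per clonotype, a Samples column counting how many repertoires contain it, and one count column per sample, with NA where the clonotype is absent. A minimal base-R sketch of that layout on made-up data could look like this. It mimics the shape of the output only and is not the pubRep implementation; the sample names and counts are invented.

```r
# Toy repertoires (made-up counts); real immunarch tables have many more columns.
reps <- list(
  S1 = data.frame(CDR3.aa = c("CASSLEETQYF", "CASSFQETQYF"), Clones = c(1, 3)),
  S2 = data.frame(CDR3.aa = c("CASSLEETQYF", "CASSLGETQYF"), Clones = c(2, 4)),
  S3 = data.frame(CDR3.aa = c("CASSLEETQYF", "CASSFQETQYF"), Clones = c(1, 1))
)

# Rename each Clones column to its sample name, so the merged table
# ends up with one count column per sample.
named <- Map(function(df, nm) setNames(df, c("CDR3.aa", nm)), reps, names(reps))

# Full outer join on the CDR3 sequence; clonotypes missing from a sample get NA.
pub <- Reduce(function(x, y) merge(x, y, by = "CDR3.aa", all = TRUE), named)

# Samples = number of repertoires in which the clonotype was observed.
pub$Samples <- rowSums(!is.na(pub[names(reps)]))

# Most widely shared clonotypes first, columns ordered as in the output above.
pub <- pub[order(-pub$Samples), c("CDR3.aa", "Samples", names(reps))]
print(pub)
```

In this toy table CASSLEETQYF appears in all three samples and therefore sorts to the top.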
# Pass "aa+v" as the second parameter to build the public repertoire table using CDR3 amino acid sequences and V alleles
# To use only CDR3 amino acid sequences, pass "aa" instead
pr.aav <- pubRep(immdata$data, "aa+v", .verbose = FALSE)
pr.aav
## CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132
## 1: CASSLEETQYF TRBV5-1 8 1 NA 2 1
## 2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA
## 3: CASSFQETQYF TRBV5-1 6 3 NA 1 1
## 4: CASSLGETQYF TRBV12-4 6 2 NA NA 4
## 5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3
## ---
## 74440: CTSSRPTQGAYEQYF TRBV7-2 1 NA NA NA NA
## 74441: CTSSSRAGAGTDTQYF TRBV7-2 1 NA NA NA NA
## 74442: CTSSYPGLAGLKRKETQYF TRBV7-2 1 NA NA NA 1
## 74443: CTSSYRQRPYQETQYF TRBV7-2 1 NA NA NA NA
## 74444: CTSSYSTSGVGQFF TRBV7-2 1 NA NA NA NA
## A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
## 1: NA 2 NA NA 1 1 1 1
## 2: 3 NA NA NA 2 NA NA 12
## 3: NA NA NA 1 NA 1 NA 1
## 4: 3 NA 1 NA NA NA 2 1
## 5: NA 1 1 NA 1 NA NA 1
## ---
## 74440: NA NA NA NA NA NA NA 1
## 74441: NA NA NA NA 1 NA NA NA
## 74442: NA NA NA NA NA NA NA NA
## 74443: NA NA NA NA 1 NA NA NA
## 74444: NA NA NA NA NA 1 NA NA
# You can also set the ".coding" parameter to filter out all noncoding sequences first:
pr.aav.cod <- pubRep(immdata$data, "aa+v", .coding = TRUE)
# Create a public repertoire with coding-only sequences using both CDR3 amino acid sequences and V genes
pr <- pubRep(immdata$data, "aa+v", .coding = TRUE, .verbose = FALSE)
head(pr)
# CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1 MS2 MS3
#1: CASSLEETQYF TRBV5-1 8 1 NA 2 1 NA 2 NA NA 1
#2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA 3 NA NA NA 2
#3: CASSFQETQYF TRBV5-1 6 3 NA 1 1 NA NA NA 1 NA
#4: CASSLGETQYF TRBV12-4 6 2 NA NA 4 3 NA 1 NA NA
#5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3 NA 1 1 NA 1
#6: CASSDSSGSTDTQYF TRBV6-4 5 NA NA NA 4 1 1 NA NA 1
# MS4 MS5 MS6
#1: 1 1 1
#2: NA NA 12
#3: 1 NA 1
#4: NA 2 1
#5: NA NA 1
#6: NA NA 2
# Apply the filter subroutine to keep clonotypes present only in healthy individuals
pr1 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "C"))
# Apply the filter subroutine to keep clonotypes present only in diseased individuals
pr2 <- pubRepFilter(pr, immdata$meta, .by = c(Status = "MS"))
# Divide one by another
pr3 <- pubRepApply(pr1, pr2)
# Plot it
p <- ggplot() +
  geom_jitter(aes(x = "Treatment", y = Result), data = pr3)
p