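Hello everyone. Today we're starting a new series on TCR data analysis, which is considerably harder than transcriptome analysis. We'll begin with something basic: computing the distance between TCRs. For background on TCRdist, see the paper "Quantifiable predictive features define epitope-specific T cell receptor repertoires" (Nature, impact factor about 49). For TCR basics, see the earlier post "10X single-cell (10X spatial transcriptomics) TCR data analysis: the TCR-intrinsic regulatory potential (TiRP) system". OK, let's get started.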
Today we'll work through a few concepts.
(1)TCR distances
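Weighted multi-CDR distances between TCRs were computed using tcrdist3. (You have probably heard of this software. The original was TCRdist, published in Nature; tcrdist3 improves on it and is mainly used for TCR repertoire analysis and visualization.) The package has also been extended to accommodate γδ TCRs. We don't care much about the software development side, though; what matters to us is the algorithm.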
Briefly, the distance metric in this study is based on comparing TCR β-chain sequences (that is, the sequence of the β chain). The tcrdist3 default settings compare TCRs at the CDR1, CDR2, CDR2.5, and CDR3 positions (with our single-cell data we can only compare distances in the CDR3 region, but that is enough for our purposes). By default, IMGT-aligned CDR1, CDR2, and CDR2.5 amino acids are inferred from TRBV gene names (so the sequences here are amino acid sequences), using the *01 allele sequences when allele-level information is not available. The CDR3 junction sequences are trimmed by 3 amino acids on the N-terminal side and 2 amino acids on the C-terminal side, positions that are highly conserved and less crucial for mediating antigen recognition (this really is the key point of the analysis; for single-cell data I also recommend working with amino acid sequences). For two CDR3s with different lengths, a set of consecutive gaps is inserted at the position in the shorter sequence that minimizes the summed substitution penalties based on a BLOSUM62 substitution matrix (not something we need to worry about with single-cell data). Distances are then the weighted sum of substitution penalties across all CDRs, with the CDR3 penalty weighted three times that of the other CDRs. (So with no substitutions, there is no distance.)
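To make the description above concrete, here is a minimal Python sketch of a weighted multi-CDR distance. It is not the tcrdist3 implementation. The flat `toy_penalty` (0 for a match, 4 for a mismatch) is a hypothetical stand-in for a BLOSUM62-derived penalty, the gap penalty of 4 is an assumption, and the CDR strings in the example are made up. Only the trimming (3 residues at the N-terminus, 2 at the C-terminus), the single block of gaps in the shorter CDR3, and the 3x CDR3 weight come from the text.

```python
from typing import Callable, Dict

Penalty = Callable[[str, str], int]


def toy_penalty(a: str, b: str) -> int:
    """Hypothetical stand-in for a BLOSUM62-derived substitution penalty."""
    return 0 if a == b else 4


def trim_cdr3(cdr3: str, n_trim: int = 3, c_trim: int = 2) -> str:
    """Drop 3 N-terminal and 2 C-terminal residues, as described in the text."""
    return cdr3[n_trim:len(cdr3) - c_trim]


def cdr3_distance(s1: str, s2: str, penalty: Penalty = toy_penalty,
                  gap_penalty: int = 4) -> int:
    """Distance between two CDR3s after trimming.

    One block of consecutive gaps is placed in the shorter sequence at the
    position that minimizes the summed substitution penalties. The per-gap
    penalty is an assumption of this sketch.
    """
    a, b = trim_cdr3(s1), trim_cdr3(s2)
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n_gaps = len(long_) - len(short)
    best = min(
        sum(penalty(x, y) for x, y in zip(short[:pos], long_[:pos]))
        + sum(penalty(x, y) for x, y in zip(short[pos:], long_[pos + n_gaps:]))
        for pos in range(len(short) + 1)
    )
    return best + n_gaps * gap_penalty


def aligned_distance(s1: str, s2: str, penalty: Penalty = toy_penalty,
                     gap_penalty: int = 4) -> int:
    """Distance between two pre-aligned CDR strings of equal length ('-' = gap)."""
    total = 0
    for x, y in zip(s1, s2):
        if x == y:
            continue
        total += gap_penalty if "-" in (x, y) else penalty(x, y)
    return total


def tcr_distance(t1: Dict[str, str], t2: Dict[str, str],
                 penalty: Penalty = toy_penalty, cdr3_weight: int = 3) -> int:
    """Weighted sum over CDR1, CDR2, CDR2.5 and CDR3 (CDR3 counted three times)."""
    d = sum(aligned_distance(t1[k], t2[k], penalty)
            for k in ("cdr1", "cdr2", "cdr2.5"))
    return d + cdr3_weight * cdr3_distance(t1["cdr3"], t2["cdr3"], penalty)


if __name__ == "__main__":
    # Made-up CDR strings, for illustration only.
    a = {"cdr1": "SGH-RS", "cdr2": "YFSETQ", "cdr2.5": "NNVP",
         "cdr3": "CASSLGQAYEQYF"}
    b = dict(a, cdr3="CASSLGRAYEQYF")   # one substitution inside the trimmed CDR3
    c = dict(a, cdr3="CATSLGQAYEQYF")   # substitution in the trimmed-off N-terminus
    print(tcr_distance(a, b), tcr_distance(a, c))
```

In this sketch a substitution in the first three or last two CDR3 residues contributes nothing, because those positions are trimmed before comparison. To get closer to the published metric, pass a BLOSUM62-based function as `penalty`.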
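In one sentence: the distance between TCRs is computed from the sequence features they share. The analysis produces a result like the following:
[Figure: image.png]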
(2)Optimized TCR-specific radius
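Now that we have a distance between TCRs, we naturally need a TCR-specific radius: TCR sequences that fall within the radius are taken to share the same specificity. This concept is worth a look too.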
To find biochemically similar TCRs while maintaining a high level of specificity, we used the packages … and … to generate an appropriate set of unenriched antigen-naïve background TCRs (first, bring in a background). A background repertoire was created for each MIRA (a TCR database) set, with each consisting of two parts. First, we combined a set of 100,000 synthetic TCRs generated using the software OLGA (synthetic TCRs), whose TRBV- and TRBJ-gene frequencies match those in the antigen-enriched repertoire (this is artificially simulated TCR data mimicking the antigen-enriched set). Second, we used 100,000 umbilical cord blood TCRs sampled evenly from 8 subjects (real data). This mix balances dense sampling of background sequences near the biochemical neighborhoods of interest with broad sampling of common TCRs representative of the antigen-naïve repertoire. We then adjust for the biased sampling by using the TRBV- and TRBJ-gene frequencies observed in the cord-blood data (so the data are corrected to some extent). The adjustment is a weighting based on the inverse of each TCR's sampling probability. Because we oversampled regions of the "TCR space" near the candidate centroids, we were able to estimate the density of the meta-clonotype neighborhoods well below 1 in 200,000. This is important because ideal meta-clonotypes would be highly specific even in repertoires larger than 200,000 sequences. (It seems disease has a profound effect on TCR clonality here.) With each candidate centroid, a meta-clonotype was engineered by selecting the maximum distance radius that still controlled the number of neighboring TCRs in the weighted unenriched background to 1 in 10^6 (this is how the distance radius is defined). Candidate centroids using TRBV genes absent from the cord-blood repertoire were excluded from further analysis, because an estimate of gene frequency is needed to apply the inverse weighting described above. (This concept is still a bit fuzzy, honestly.) In fact, the radius is defined in order to find motif structures that are specific to an antigen, which is also why the single most specific TCR sequence is not used directly.
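The radius selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical, each weight is taken to be the cord-blood gene frequency divided by the frequency observed in the oversampled background, and the distances from a centroid to each background TCR are assumed to be computed already. The example uses a toy threshold of 0.3 instead of 1 in 10^6, because a four-sequence toy background cannot resolve a fraction that small.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence


def inverse_probability_weights(background_genes: Sequence[str],
                                reference_freq: Dict[str, float]) -> List[float]:
    """Weight each background TCR by reference frequency / observed frequency.

    Assumption of this sketch: the sampling bias is captured by gene usage.
    A gene missing from `reference_freq` raises KeyError, which mirrors the
    exclusion of centroids whose TRBV gene is absent from the cord-blood data.
    """
    counts = Counter(background_genes)
    n = len(background_genes)
    return [reference_freq[g] / (counts[g] / n) for g in background_genes]


def weighted_neighbor_fraction(distances: Sequence[float],
                               weights: Sequence[float],
                               radius: float) -> float:
    """Weighted share of the background that lies within `radius` of the centroid."""
    total = sum(weights)
    hits = sum(w for d, w in zip(distances, weights) if d <= radius)
    return hits / total


def select_radius(distances: Sequence[float], weights: Sequence[float],
                  candidate_radii: Iterable[float],
                  max_fraction: float = 1e-6) -> Optional[float]:
    """Largest candidate radius whose weighted neighbor fraction stays within bounds.

    Returns None when even the smallest candidate radius is too permissive.
    """
    best = None
    for r in sorted(candidate_radii):
        if weighted_neighbor_fraction(distances, weights, r) <= max_fraction:
            best = r
        else:
            break  # the fraction can only grow with the radius
    return best


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    dists = [12, 24, 36, 48]            # centroid-to-background distances
    weights = [0.5, 0.5, 1.0, 2.0]      # inverse-probability weights
    print(select_radius(dists, weights, range(0, 60, 12), max_fraction=0.3))
```

The early `break` relies on the neighbor fraction never decreasing as the radius grows, so the first radius that exceeds the threshold ends the search.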
(3)Basic background
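γδ T cells are T cells that carry out innate immune functions; their TCR is made up of a γ chain and a δ chain. These cells are found mainly in mucosal and subcutaneous tissues such as the intestinal, respiratory, and urogenital tracts, and make up only 0.5%-1% of CD3+ T cells in peripheral blood. γδ T cells have anti-infective and anti-tumor activity: they can kill target cells infected by viruses or intracellular bacteria, and they also exert immunoregulatory effects and mediate inflammatory responses by secreting a variety of cytokines.
αβ T cells account for more than 95% of all T cells in peripheral blood. They recognize protein antigens presented by MHC molecules, are MHC-restricted, and are the main cells mediating cellular immunity and immune regulation in the adaptive immune response.
When people say "T cells", they usually mean αβ T cells.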
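Of course there are many more concepts, analysis points, and pieces of code. Let's not be greedy: learn a little each day, master it, and then move on. It only gets harder from here, so the foundation has to be solid.
Life is good, and even better with you.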