10X single-cell (10X spatial transcriptomics) TCR data analysis with TCRdist (2)

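Hello everyone. We continue our series on TCR data analysis. This topic covers a lot of ground, so we will work through it gradually. The paper is "Quantifiable predictive features define epitope-specific T cell receptor repertoires" (Nature, impact factor 49). Today's task is still to learn more of the basic concepts and algorithms.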

TCRs from T cells that recognize the same pMHC epitope often share conserved sequence features, suggesting that it may be possible to predictively model epitope specificity. (For background on gene rearrangement and epitopes, see my article "10X single-cell (10X spatial transcriptomics) TCR data analysis: the TCR-intrinsic regulatory potential (TiRP) system".) The point emphasized here is that, for the same pMHC, the enriched TCR sequences share common motifs. Many experiments have confirmed this, so predictive modelling of epitope specificity may be possible (which is also the ultimate goal of this series).
This is where the content of the previous post comes in. To model antigen-enriched TCRs, we first need a distance measure on the space of TCRs that permits clustering and visualization (clustering and visualization here differ from those used in single-cell transcriptomics), a robust repertoire diversity metric that accommodates the low number of paired public receptors observed when compared to single-chain analyses (that is, a metric that stays informative even though identical paired receptors are rarely shared, since the goal is to find motifs), and a distance-based classifier (classifiers are very common in machine learning) that can assign previously unobserved TCRs to characterized repertoires with robust sensitivity and specificity.
[Image]
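Of course, the TCR sequences enriched for a specific epitope contain a clustered group of receptors that share core sequence similarities, together with a dispersed set of diverse ‘outlier’ sequences (this is natural: the similar sequences presumably share a common motif and thereby bind the epitope specifically). By identifying shared motifs in the core sequences, we can highlight the key conserved residues that drive the essential elements of TCR recognition. (So the sequences here are amino acid sequences.)
For the TCR sequences we obtain by sequencing, the features to summarize and analyse include length, charge, and hydrophobicity of the CDR3 regions, clonal diversity (within individuals), and amino acid sequence sharing (across individuals), following well-established approaches to repertoire analysis. (We will cover the established methods later. In short, many metrics need in-depth analysis beyond the gene sequence alone, and our single-cell TCR analysis needs to be upgraded accordingly.)
[Image]
Mean values for CDR3 length, charge, and hydrophobicity tightly clustered for the majority of the epitopes, and all CDR3 features showed substantially overlapping ranges (so these bulk features alone do not separate the epitopes well, which is why we look for the motifs that do the work in the antigen-enriched sequences).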
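The CDR3 summary features mentioned above (length, charge, hydrophobicity) can be sketched in a few lines. This is only an illustration under stated assumptions: the Kyte-Doolittle hydropathy scale and the simple +1/-1 charge rule are my choices, not necessarily what the paper used, and the function names `cdr3_features` and `repertoire_means` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper's choice may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude net charge: +1 for K/R, -1 for D/E, histidine treated as neutral (assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def cdr3_features(cdr3):
    """Return length, net charge and mean hydropathy of one CDR3 amino acid sequence."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return {
        "length": len(cdr3),
        "charge": sum(CHARGE.get(aa, 0) for aa in cdr3),
        "hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in cdr3) / len(cdr3),
    }


def repertoire_means(cdr3_list):
    """Average each feature over all CDR3 sequences of one repertoire."""
    feats = [cdr3_features(seq) for seq in cdr3_list]
    return {key: sum(f[key] for f in feats) / len(feats) for key in feats[0]}


if __name__ == "__main__":
    # Illustrative sequences only, not data from the paper.
    print(repertoire_means(["CASSIRSSYEQYF", "CASSGGANTGQLYF"]))
```

Comparing `repertoire_means` across epitopes is one way to see the overlapping ranges described above.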
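Here is a brief review of the authors' findings. (1) They found negative correlations between CDR3 charge and peptide charge, and between CDR3 length and peptide length. This suggests that charge and length complementarity may play a role in pMHC recognition for some epitopes (background knowledge, good to be aware of). (2) Whereas substantial levels of sharing or publicity were observed for individual chains (many single chains are identical across individuals), lower levels of sharing between individuals were observed when paired αβ receptors were considered (this is interesting: single chains are shared extensively, but identical paired chains are rare).
What single-cell TCR sequencing adds: By using paired single-cell TCRαβ sequencing, we were able to determine whether V and J segment usage was correlated both within a chain (for example, Vα–Jα, Vβ–Jβ) and across chains (for example, Vα–Vβ, Vα–Jβ). (That is, looking for correlations.)
Compared with TCR sequences that were not enriched for an epitope, the TCR sequences selected by viral epitope recognition showed varying degrees of dominance of single and pairwise gene associations. (This is as expected.)
[Image]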
  • Figure caption: V and J gene segment usage and covariation in epitope-specific responses. a, Gene segment usage and gene–gene pairing landscapes are illustrated using four vertical stacks (one for each V and J segment) connected by curved paths whose thickness is proportional to the number of TCR clones with the respective gene pairing (that is, a Sankey diagram) (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows with an arrowhead number equal to the log2 of the fold change. b, Jensen–Shannon divergence (for JS divergence, see the article "KL divergence, JS divergence, Wasserstein distance") between the observed gene frequency distributions and background frequencies, normalized by the mean Shannon entropy of the two distributions (higher values reflect stronger gene preferences). c, Adjusted mutual information of gene usage correlations between regions (higher values indicate more strongly covarying gene usage). The lower limits of the colour ranges in b and c were chosen to highlight significant changes. A summary of the number of subjects, total number of TCR sequences
[Image]
  • Figure caption: Gene segment usage and gene–gene pairing landscapes are illustrated graphically using four vertical stacks (one for each V and J segment) connected by curved segments with thickness proportional to the number of TCRs with the respective gene pairing (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. Clonally expanded TCRs were reduced to a single data point for this analysis. The number of clones is indicated to the left of each panel. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows, with each successive arrowhead corresponding to an additional twofold deviation (for example, one arrowhead = twofold enrichment, two arrowheads = fourfold enrichment). (Same representation as the figure above.)

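Each epitope-specific response is characterized by over-representation of individual genes as well as significant gene-pairing preferences, which gives us a rationale for modelling each epitope separately to find motifs. The Jensen–Shannon divergence between each epitope-specific gene frequency distribution and the background distribution is used to quantify the total magnitude of gene preference (this requires a little algorithmic background). We quantified the degree of gene usage covariation between pairs of segments using the adjusted mutual information score (also an important step).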
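To make the gene-preference score concrete, here is a minimal sketch of a Jensen–Shannon divergence between an observed gene frequency distribution and a background one, divided by the mean Shannon entropy of the two distributions as the figure caption describes. The function names and the toy gene lists are hypothetical, and the adjusted mutual information step is not covered here.

```python
import math
from collections import Counter


def frequencies(genes):
    """Turn a list of gene names into a frequency distribution."""
    counts = Counter(genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions (dicts)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL divergence from distribution a to the mixture m.
        return sum(x * math.log2(x / m[k]) for k, x in a.items() if x > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def normalized_jsd(p, q):
    """JSD divided by the mean Shannon entropy of the two distributions."""
    mean_h = 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    if mean_h == 0:
        raise ValueError("both distributions have zero entropy")
    return js_divergence(p, q) / mean_h


if __name__ == "__main__":
    # Toy gene lists, not data from the paper.
    observed = frequencies(["TRBV13", "TRBV13", "TRBV13", "TRBV19"])
    background = frequencies(["TRBV13", "TRBV19", "TRBV29", "TRBV4"])
    print(normalized_jsd(observed, background))
```

Higher values mean the epitope-specific repertoire departs more strongly from background gene usage.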

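To find motifs, the TCR distance definition now comes into play. (The concept and the calculation were covered in the previous post; CDR3 is penalized more heavily.)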
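As a reminder of the idea, here is a simplified, TCRdist-like distance. It is a sketch under loud assumptions and not the published implementation: flat mismatch and gap costs stand in for the similarity-weighted amino acid scoring, shorter loops are padded with gaps in the middle, and the default costs and the CDR3 weight are placeholder values I chose for illustration. All function names are hypothetical.

```python
LOOPS = ("cdr1", "cdr2", "cdr2.5", "cdr3")


def pad_middle(a, b):
    """Pad the shorter sequence with '-' in the middle so both have equal length."""
    if len(a) == len(b):
        return a, b
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    mid = len(short) // 2
    padded = short[:mid] + "-" * (len(long_) - len(short)) + short[mid:]
    return (padded, long_) if len(a) < len(b) else (long_, padded)


def loop_distance(a, b, mismatch_cost, gap_cost):
    """Sum flat per-position costs over two aligned loop sequences."""
    a, b = pad_middle(a, b)
    total = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        total += gap_cost if "-" in (x, y) else mismatch_cost
    return total


def chain_distance(c1, c2, mismatch_cost=4, gap_cost=4,
                   cdr3_gap_cost=8, cdr3_weight=3):
    """Distance between two chains; CDR3 is weighted more heavily (assumed values)."""
    total = 0
    for loop in LOOPS:
        if loop == "cdr3":
            total += cdr3_weight * loop_distance(
                c1[loop], c2[loop], mismatch_cost, cdr3_gap_cost)
        else:
            total += loop_distance(c1[loop], c2[loop], mismatch_cost, gap_cost)
    return total


def paired_distance(t1, t2):
    """Paired distance as the sum of the alpha-chain and beta-chain distances."""
    return (chain_distance(t1["alpha"], t2["alpha"])
            + chain_distance(t1["beta"], t2["beta"]))
```

Each chain is a dict with the keys in `LOOPS`, and a paired receptor is a dict with `"alpha"` and `"beta"` chains. With the default values, one CDR3 mismatch costs three times as much as one mismatch in another loop.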

[Image]
  • Figure caption: 2D kernel principal components analysis (PCA) projection of the TCRdist landscape coloured by Vα (left panel) and Vβ (right panel) gene usage. Three groups of receptors that correspond to TCR logos and clusters depicted in c are indicated with dashed ellipses. (A method that is also common in single-cell analysis.)
[Image]
  • Figure caption: Epitope-specific TCR landscapes were projected into two dimensions (2D) using kernel PCA analysis applied to the TCRdist distance matrix: TCRs with small TCRdist values tend to project to nearby points in 2D. The same 2D projection is shown in the four panels of each row, coloured by Vα, Jα, Vβ and Jβ gene segment usage (left to right, respectively). The colours are based on gene frequency in the projected repertoire and follow the same sequence used throughout the manuscript: in decreasing order, 1, red; 2, green; 3, blue; 4, cyan; 5, magenta; 6, black; followed by assorted colours for rare frequencies. A summary of number of subjects,
To complement these landscape projections, we performed TCRdist-based clustering of the epitope-specific receptors and constructed hierarchical distance trees (TCRdist is a good analysis tool). It is important to note that clonal expansions are not reflected in these repertoire landscape analyses, as each unique receptor is included only once (that is, duplicates are not counted). The authors also developed a TCR logo representation that summarizes the gene frequencies, CDR3 amino acid sequences, and inferred rearrangement structures of a set of TCRs as a tool to further annotate these clusters (this point also deserves attention; anyone who has done biochemistry experiments should be familiar with it). Each repertoire is made up mainly of one cluster whose sequences share a similar structure, which makes motif finding easier. Beyond the core cluster of similar receptors, each repertoire also contains multiple regions of receptors that are clearly distinct from one another.
[Image]
  • Figure caption: Average-linkage dendrogram of TCRdist receptor clusters coloured by generation probability, with TCR logos for selected receptor subsets (the branches enclosed in dashed boxes labelled with size of the TCR clusters). Each logo depicts the V- (left side) and J- (right side) gene frequencies, CDR3 amino acid sequences (middle), and inferred rearrangement structure (bottom bars coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions) of the grouped receptors. (n = 13 mice, 291 TCR clones.)
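The caption above mentions an average-linkage dendrogram. As a rough sketch of what average linkage does with a precomputed distance matrix, the naive agglomerative loop below merges the two clusters with the smallest mean pairwise distance until a requested number of clusters remains. The function name and the stopping rule are my own simplifications; it does not build the full tree, and for real data a library routine would be the better choice.

```python
def average_linkage(dist, n_clusters):
    """Naive average-linkage clustering on a symmetric distance matrix.

    dist is a list of lists; returns a list of clusters, each a sorted list
    of row indices. Runs in roughly cubic time, so only for small examples.
    """
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        _, a, b = min(
            (mean_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy matrix: receptors 0 and 1 are close, 2 and 3 are close.
    toy = [
        [0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0],
    ]
    print(average_linkage(toy, 2))
```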
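Although CDR3 sequence conservation is evident in the TCRdist cluster logos, many of these shared CDR3 residues come directly from the genomic sequences of the V and J regions and therefore reflect the observed gene usage biases. To find CDR3 motifs, the authors used a recursive search algorithm that identified sequence patterns that occur significantly more often in the observed receptors than in two V- and J-gene-matched background sets of receptor sequences (this calls for structural biology knowledge, of which I have too little, I am afraid).
[Image]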
  • Note: Enriched CDR3 sequence motifs define key features of epitope specificity. The top-scoring CDR3α (left TCR logo) and CDR3β (right TCR logo) sequence motifs are shown for each repertoire. The motif sequence logo is shown at full height (top) and scaled (bottom) by per-column relative entropy to background frequencies derived from TCRs with matching gene-segment composition in order to highlight motif positions under selection. For three epitopes with solved ternary TCR–pMHC structures, the enriched motif positions are mapped onto the 3D structure: motif positions shown in green sticks; peptide in magenta; alpha (beta) chain in yellow (blue) cartoons; selected hydrogen bonds shown as dotted green lines.
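The authors propose that these statistically enriched, non-germline-encoded motifs have a critical role in mediating TCR recognition (which seems right), and their analysis of TCR protein structures supports this. So sequence analysis of TCRs can identify the key conserved residues that drive the essential elements of TCR (antigen) recognition. This analysis is very important.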
Next, the TCRdist measure is applied to quantitatively assess receptor diversity and density within epitope-specific repertoires. The authors use a new diversity metric (TCRdiv) that generalizes Simpson’s diversity index (you can look this index up) by capturing similarity among receptors in addition to exact identity, as Simpson’s diversity index is highly sensitive to sampling noise because of the relative rarity of observing identical αβ pairs among individuals.
Examination of TCRdiv scores for the analysed repertoires for single chains as well as paired receptors clarified trends seen in the earlier analyses (for example, the PB1 repertoire exhibited low diversity in the α-chain and high β-chain diversity).
[Image]
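The idea of generalizing Simpson's index can be sketched as follows. Simpson's index is the probability that two receptors drawn without replacement are identical; the smoothed version replaces the 0/1 identity test with a similarity that decays with distance. The Gaussian kernel, the bandwidth `sigma`, and the choice to report the inverse of the mean similarity are assumptions made for this illustration, and the function names are hypothetical; the exact TCRdiv definition should be taken from the paper.

```python
import math
from collections import Counter


def simpson_index(receptors):
    """Probability that two receptors drawn without replacement are identical."""
    n = len(receptors)
    counts = Counter(receptors)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def smoothed_diversity(dist, sigma):
    """Inverse of the mean pairwise Gaussian similarity (assumed kernel form).

    dist is a symmetric distance matrix over the sampled receptors; sigma is a
    hypothetical bandwidth controlling how quickly similarity decays.
    """
    n = len(dist)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.exp(-((dist[i][j] / sigma) ** 2))
    mean_similarity = total / (n * (n - 1) / 2)
    return 1.0 / mean_similarity


if __name__ == "__main__":
    # Receptors 0 and 1 are identical; receptor 2 is very far from both.
    toy = [
        [0, 0, 1000],
        [0, 0, 1000],
        [1000, 1000, 0],
    ]
    print(simpson_index(["a", "a", "b"]), smoothed_diversity(toy, sigma=10))
```

When distances are either zero or very large, the smoothed value reduces to the inverse of Simpson's index; near-identical receptors then lower the diversity gradually instead of counting as completely different.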
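As described above, our landscape analysis shows that each repertoire consists of one or more groups of clustered receptors that share similar sequence features, plus more diverse outlier receptors. To account for the contributions of both clustered and divergent TCRs, the authors developed a repertoire-specific nearest-neighbour score (NN-distance) that captures the density of receptors around each receptor (computed as the average TCRdist between a receptor and its nearest-neighbour receptors in the repertoire). Although variation across repertoires was apparent in the NN-distance distributions, most epitopes showed an approximately bimodal distribution: one peak of receptors with low NN-distance represents the main, densely sampled clusters, and a second peak of receptors with larger NN-distance reflects the outlier receptors.
[Image]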
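A minimal sketch of the NN-distance idea, assuming a precomputed distance matrix: for each receptor, average the distances to its `k` nearest neighbours in the same repertoire, excluding itself. The fixed `k` and the unweighted mean are simplifications of my own, and the function name is hypothetical.

```python
def nn_distances(dist, k):
    """Mean distance from each receptor to its k nearest neighbours.

    dist is a symmetric distance matrix (list of lists). Low scores mean the
    receptor sits in a dense cluster; high scores flag outlier receptors.
    """
    n = len(dist)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and n - 1")
    scores = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(sum(others[:k]) / k)
    return scores


if __name__ == "__main__":
    # Toy matrix: receptors 0-2 form a cluster, receptor 3 is an outlier.
    toy = [
        [0, 1, 2, 50],
        [1, 0, 1, 50],
        [2, 1, 0, 50],
        [50, 50, 50, 0],
    ]
    print(nn_distances(toy, k=2))
```

A histogram of these scores over a repertoire is what would show the roughly bimodal shape described above.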
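To confirm the antigen specificity of these non-clustered receptors, receptors from both peaks were selected and tested experimentally for their ability to bind the specific tetramer (that is, to recognize the corresponding antigen). Reactivity was confirmed in every case, indicating that at least some of these divergent outlier receptors are legitimate, if unconventional, solutions to the problem of epitope specificity, which partly explains this phenomenon.
The software also has a classifier function that helps identify the motifs of specific T cells, such as tumour-infiltrating TCR sequences, which is very valuable. That is all for today's basics. In the next post we will share the algorithm and code of TCRdist.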

Life is good, and better with you.
