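Research approach
The post on early diagnostic markers for autism gives a brief introduction to the basic approach behind this kind of study.
Statistical analysis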
Partial least squares discriminant analysis (PLS-DA) was
performed to examine whether different combinations of
multiple cytokines could be used to differentiate between
child developmental outcomes. Initially, linear regression
analysis was performed on each transformed immune marker
individually using the covariates stated above to generate
residuals for use in the PLS-DA. Eotaxin-2, epithelial
neutrophil-activating protein 78, granulocyte macrophage
colony-stimulating factor, eotaxin-1, interferon-γ (IFN-γ),
IL-4, monocyte chemoattractant protein 4 (MCP-4), and IL-13
all violated assumptions of linearity in the linear regression
model and were therefore excluded from the PLS-DA. The
PLS-DA was computed using the web-based MetaboAnalyst
software in accordance with the protocol by Xia and Wishart
(24). Analysis was performed using leave-one-out cross-
validation and prediction accuracy performance measure for
determining the number of latent variables. The permutation
statistic was performed using prediction accuracy during
training with 2000 permutations.
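In plain terms: the authors used partial least squares discriminant analysis (PLS-DA) to test whether combinations of several cytokines could distinguish between child developmental outcomes. Each transformed immune marker was first regressed on the covariates, one marker at a time, and the residuals were passed on to the PLS-DA. Eight markers (eotaxin-2, epithelial neutrophil-activating protein 78, granulocyte macrophage colony-stimulating factor, eotaxin-1, IFN-γ, IL-4, MCP-4, and IL-13) violated the linearity assumption of the regression model and were left out of the PLS-DA. The PLS-DA itself was run in the web-based MetaboAnalyst software, following the protocol by Xia and Wishart (24). The number of latent variables was chosen by leave-one-out cross-validation with prediction accuracy as the performance measure, and the permutation test used prediction accuracy during training with 2000 permutations.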
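The residualization step described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the paper adjusts for several covariates, whereas this sketch fits a simple least-squares line on a single covariate. The function name `residualize` and the toy values for `marker` and `age` are hypothetical and not taken from the paper.

```python
def residualize(marker, covariate):
    """Return the residuals of `marker` after a least-squares fit on one covariate."""
    n = len(marker)
    mean_x = sum(covariate) / n
    mean_y = sum(marker) / n
    # Slope and intercept of the ordinary least-squares line.
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, marker))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from the covariate.
    return [y - (intercept + slope * x) for x, y in zip(covariate, marker)]


# Hypothetical toy data: one transformed marker and one covariate.
marker = [2.0, 4.1, 5.9, 8.2, 9.8]
age = [1.0, 2.0, 3.0, 4.0, 5.0]
residuals = residualize(marker, age)
print(residuals)
```

The residuals sum to zero and are uncorrelated with the covariate, so whatever the later PLS-DA picks up cannot be explained by a linear effect of that covariate.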
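Partial least squares discriminant analysis (PLS-DA)
Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method used for discriminant analysis. Discriminant analysis is a common statistical approach that assigns a subject to a class based on the values of several observed or measured variables. The idea is to fit a model to training samples from each group (for example, observed samples and control samples) and then check how reliable the fitted model is.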
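Partial least squares regression (PLS regression) is related to principal component regression. Instead of finding the hyperplanes of maximum variance between the response and the independent variables, it fits a linear regression model by projecting both the predictor variables and the observed (response) variables into a new space. Because both X and Y are projected into the new space, the PLS family of methods are known as bilinear factor models. When Y is categorical, the method is called partial least squares discriminant analysis (PLS-DA).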
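The bilinear structure can be written out explicitly. The notation below is the standard textbook form, not something taken from the quoted paper: T and U are the score matrices of X and Y, P and Q are the loading matrices, and E and F are the residual terms.

```latex
X = T P^{\top} + E, \qquad Y = U Q^{\top} + F
```

The projections are chosen so that the covariance between the scores T and U is as large as possible, which is what separates PLS from principal component regression, where the components are chosen from X alone.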
My understanding: fit a linear regression model and use it to predict class membership.
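To make that reading concrete, here is a minimal sketch of a two-class PLS-DA with a single latent variable, written in plain Python rather than R so it runs without extra packages. It codes the classes as 0 and 1, fits one PLS component, and assigns class 1 when the predicted value is at least 0.5. It then estimates leave-one-out accuracy and a permutation p-value. Everything here is an assumption for illustration: the function names, the toy data, the 0.5 threshold, the use of leave-one-out accuracy as the permutation statistic, and the 200 permutations. MetaboAnalyst and the paper's exact settings are not reproduced.

```python
import random


def fit_plsda(X, y):
    """Fit a one-component PLS model on 0/1 labels."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    # Regression coefficient of centered y on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / tt
    return x_mean, y_mean, w, q


def predict_plsda(model, row):
    """Predict class 1 if the fitted regression value reaches 0.5, else class 0."""
    x_mean, y_mean, w, q = model
    t = sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
    return 1 if y_mean + q * t >= 0.5 else 0


def loo_accuracy(X, y):
    """Leave-one-out cross-validated prediction accuracy."""
    hits = 0
    for i in range(len(X)):
        model = fit_plsda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict_plsda(model, X[i]) == y[i]
    return hits / len(X)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Share of label shuffles whose accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    count = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if loo_accuracy(X, shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical toy data: two markers, four samples per class.
X = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [1.1, 0.0],
     [3.0, 0.2], [3.2, 0.1], [2.9, 0.3], [3.1, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy, p_value = permutation_p_value(X, y)
print(accuracy, p_value)
```

A real analysis would normally autoscale the variables, allow more than one latent variable, and pick the number of components by cross-validation, as the quoted methods section describes.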
How to run PLS-DA in R
ropls: PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data