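Introduction
Gene set enrichment analysis starts from one idea: examine the functional differences of genes of the same type as a set. A single gene on its own is weak evidence for a biological function and is not very convincing.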
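Over-representation analysis (ORA), the first generation of methods, interprets functional changes by testing which gene categories are enriched among the selected (differentially expressed) genes, usually with a test based on the hypergeometric distribution (e.g. DAVID).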
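The hypergeometric test behind ORA can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the procedure of DAVID or any other tool; the function name and its four parameters are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes in the background
    n_in_set:   number of background genes annotated to the gene set
    n_selected: number of selected (e.g. differentially expressed) genes
    n_overlap:  number of selected genes that fall in the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of seeing n_overlap or more set genes by chance.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value suggests the gene set is over-represented among the selected genes. In practice the p-values of many gene sets would also need a multiple-testing correction, which this sketch leaves out.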
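The second generation uses Functional Class Scoring (FCS): pick a specific gene set and compute a score, called the enrichment score, for how the genes of that set behave between the groups. This avoids the ORA step of selecting differentially expressed genes first. Methods include GSEA and GLOBALTEST.
The third generation computes an enrichment score for each individual sample; the association between enrichment scores and phenotype can then be assessed with conventional statistical analysis. Methods include PLAGE, Z-score, ssGSEA and GSVA.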
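As an illustration of a per-sample score, here is a minimal sketch in the spirit of the Z-score method. It assumes the score is the sum of each gene's standardised expression divided by the square root of the number of genes used; the function name and the dict-based input layout are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def combined_z_scores(expr, gene_set):
    """Return one gene set score per sample.

    expr:     dict mapping gene name -> list of expression values,
              one value per sample (same sample order for every gene)
    gene_set: iterable of gene names
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no gene of the gene set is present in expr")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for g in genes:
        values = expr[g]
        sd = stdev(values)
        if sd == 0:
            continue  # a constant gene carries no information; skip it
        mu = mean(values)
        for j, v in enumerate(values):
            totals[j] += (v - mu) / sd
        used += 1
    if used == 0:
        raise ValueError("all genes of the gene set are constant")
    return [t / sqrt(used) for t in totals]
```

The resulting list can be fed into an ordinary test or regression against the phenotype, which is the point of the single-sample approach.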
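A first look at the methods
Gene Set Enrichment Analysis (GSEA): 1. rank all genes by their association with the phenotype; 2. test whether the ranks of the genes in the gene set differ from a uniform distribution (weighted Kolmogorov-Smirnov test).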
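The two GSEA steps can be sketched as a weighted running sum over the ranked gene list. This is a simplified illustration, not the GSEA software itself: it computes only the enrichment score and omits the permutation step used to assess significance. The function name, the argument names and the weight exponent p are assumptions made for this example.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum statistic.

    ranked_genes: gene names, already sorted by association with phenotype
    correlations: ranking metric of each gene, in the same order
    gene_set:     set of gene names to test
    p:            exponent applied to the ranking metric (p=0 is unweighted)
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_total = sum(abs(r) ** p for r, h in zip(correlations, hits) if h)
    if hit_total == 0:
        raise ValueError("all ranking metrics in the gene set are zero")

    running = 0.0
    best = 0.0
    for r, h in zip(correlations, hits):
        if h:
            running += abs(r) ** p / hit_total  # step up at a set gene
        else:
            running -= 1.0 / n_miss             # step down otherwise
        if abs(running) > abs(best):
            best = running                      # keep the largest deviation
    return best
```

A score near +1 means the set genes cluster at the top of the ranking, and a score near -1 means they cluster at the bottom.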
GLOBALTEST uses a logistic regression model to determine if samples with similar profiles have similar phenotype by testing if the variance of the coefficients of genes in the gene set is different from 0.
Gene Set Analysis (GSA) uses the maxmean statistic to determine if either up- or down-regulation of genes is the trend for which the evidence is the strongest for a particular gene set.
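A minimal sketch of the maxmean statistic, assuming each gene in the set already has a signed score such as a t-like statistic. Returning the result with a sign to show the winning direction is a choice made for this example, and the function name is hypothetical.

```python
def maxmean(gene_scores):
    """Signed maxmean statistic for the scores of one gene set.

    The positive parts and the negative parts are each averaged over
    ALL genes in the set, and the larger of the two averages wins.
    """
    n = len(gene_scores)
    if n == 0:
        raise ValueError("gene_scores must not be empty")
    pos_mean = sum(max(z, 0.0) for z in gene_scores) / n
    neg_mean = sum(max(-z, 0.0) for z in gene_scores) / n
    return pos_mean if pos_mean >= neg_mean else -neg_mean
```

Because both averages divide by the full set size, a few strongly regulated genes in one direction are not cancelled out by weak scores in the other direction, as they would be with a plain mean.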
Single Sample GSEA (ssGSEA) calculates a sample-level gene set score by comparing the distribution of gene expression ranks inside and outside the gene set.
The Gene Set Variation Analysis (GSVA) uses a non-parametric kernel to estimate the distribution of the gene expression level across all samples in order to bring the expression profiles to a common scale and then computes the Kolmogorov-Smirnov statistic similar to GSEA.
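The first GSVA step, bringing one gene's expression values to a common scale, can be sketched with a Gaussian kernel estimate of the cumulative distribution. This covers only that step, not the later ranking and Kolmogorov-Smirnov-like statistic. The Gaussian kernel and the bandwidth of one quarter of the gene's sample standard deviation are assumptions of this sketch, and the function name is hypothetical.

```python
from math import erf, sqrt
from statistics import stdev


def kernel_cdf_scores(values):
    """Kernel-estimated CDF of one gene, evaluated at each sample.

    values: expression of a single gene across all samples
    Returns one value in (0, 1) per sample; a higher value means the
    gene is relatively highly expressed in that sample.
    """
    n = len(values)
    h = stdev(values) / 4.0  # assumed bandwidth, see the note above
    if h == 0:
        raise ValueError("gene is constant across samples")

    def normal_cdf(x):
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    # Average a Gaussian CDF centred on every sample's value.
    return [sum(normal_cdf((x - y) / h) for y in values) / n for x in values]
```

Applying this gene by gene gives values that are comparable across genes with very different expression ranges, which is what allows them to be ranked within each sample afterwards.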
Methods
Sensitivity comparison
False positive comparison
Reference
A Comparison of Gene Set Analysis Methods in Terms of Sensitivity, Prioritization and Specificity