
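Author: ahworld
Link: Single-cell data quality control: doublet prediction with Scrublet, a tutorial
Source: WeChat official account seqyuan
Copyright belongs to the author. Please contact the author before reposting in any form.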

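Before analyzing scRNA-seq data, we must make sure that every cell barcode corresponds to a viable cell. Cell QC (quality control) is usually performed on three QC covariates:

Count depth per barcode (the number of counts)
Number of genes detected per barcode
Fraction of counts from mitochondrial genes per barcode

By examining the distributions of these QC covariates, outlier peaks can be filtered out by thresholding.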

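These outlier barcodes can correspond to:

Dead cells
Cells with broken membranes
Doublets

For example, barcodes with low count depth, few detected genes, and a high fraction of mitochondrial counts suggest cells whose cytoplasmic mRNA has leaked out through a broken membrane, so that only the mRNA located in the mitochondria remains inside the cell. Conversely, cells with unexpectedly high counts and a large number of detected genes may represent doublets.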
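As a rough illustration of the three covariates, the sketch below computes them for a few made-up barcodes. The counts, the gene list, and the "MT-" prefix used to recognize mitochondrial genes are assumptions for this toy example (the prefix matches human gene symbols), not values from the dataset used later.

```python
# Toy counts: one row per barcode, one column per gene (all values are made up).
genes = ["MT-CO1", "MT-ND1", "ACTB", "CD3E", "LYZ"]
counts = {
    "healthy": [2, 1, 30, 12, 0],
    "leaky": [8, 6, 1, 0, 0],
    "doublet_like": [5, 4, 90, 40, 70],
}


def qc_covariates(row, gene_names, mito_prefix="MT-"):
    """Return count depth, number of detected genes, and mitochondrial fraction."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    mito_fraction = mito / total if total else 0.0
    return {"count_depth": total, "n_genes": n_genes, "mito_fraction": mito_fraction}


qc = {barcode: qc_covariates(row, genes) for barcode, row in counts.items()}
for barcode, covariates in qc.items():
    print(barcode, covariates)
```

In this toy data the "leaky" barcode shows the low-depth, few-genes, high-mitochondrial pattern described above, while "doublet_like" has the highest depth and the most detected genes.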

Tools for detecting doublets in scRNA-seq data include the following:

scrublet (python)
DoubletDetection (python)
DoubletDecon (R)
DoubletFinder (R)

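These doublet detection tools are also recommended in the 2019 best-practices paper on single-cell data analysis (Luecken M D et al, 2019).

This post walks through how to use Scrublet in detail.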

Using Scrublet

Scrublet paper: Wolock S L, Lopez R, Klein A M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data[J]. Cell systems, 2019, 8(4): 281-291. e9.
Source link for the Scrublet tutorial referenced here

Installation

Scrublet is a Python package and can be installed with the following pip command.

pip install scrublet

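Usage notes

When working with data from multiple samples, run Scrublet on each sample separately. Scrublet is designed to detect technical doublets formed by the random co-encapsulation of two cells, so it may perform poorly on merged datasets, where the cell type proportions are not representative of any single sample;
Check that the doublet score threshold is reasonable and adjust it manually if necessary. The doublet score histogram is not always cleanly bimodal;
In a UMAP or t-SNE visualization, predicted doublets should mostly co-localize (possibly in several clusters). If they do not, you may need to adjust the doublet score threshold, or change the preprocessing parameters to better resolve the cell states present in the data.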
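The per-sample advice can be sketched as a small bookkeeping pattern: group cell indices by sample, score each group on its own, and write the scores back in the original cell order. Here score_fn is a hypothetical placeholder for "subset the counts matrix to these rows and run Scrublet on it"; it is not part of the Scrublet API.

```python
from collections import defaultdict


def split_rows_by_sample(sample_labels):
    """Map each sample label to the row indices of its cells."""
    rows = defaultdict(list)
    for i, label in enumerate(sample_labels):
        rows[label].append(i)
    return dict(rows)


def run_per_sample(sample_labels, score_fn):
    """Apply score_fn to each sample's rows, then restore the original cell order."""
    scores = [None] * len(sample_labels)
    for label, idx in split_rows_by_sample(sample_labels).items():
        for i, score in zip(idx, score_fn(idx)):
            scores[i] = score
    return scores


labels = ["s1", "s2", "s1", "s2", "s1"]


def placeholder(idx):
    # Placeholder scorer: returns the sample size for every cell in that sample.
    return [len(idx)] * len(idx)


result = run_per_sample(labels, placeholder)
print(result)
```

Keeping the original order makes it easy to attach the per-sample scores back to a single barcode table afterwards.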

Preparation

Data preparation

Download the 8k PBMC dataset from 10x Genomics and extract it.

wget http://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc8k/pbmc8k_filtered_gene_bc_matrices.tar.gz
tar xfz pbmc8k_filtered_gene_bc_matrices.tar.gz

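Fixing the NumPy compatibility error

A fix for the error cannot import name '_validate_lengths' caused by newer NumPy versions

Scrublet was developed against a fairly old NumPy and relies (directly or through a dependency) on the private _validate_lengths function in arraypad.py. That function is no longer available in newer NumPy versions, so with a recent NumPy installed, running Scrublet may abort with the error: cannot import name '_validate_lengths'.

Since other packages more often depend on a recent NumPy, I did not want to downgrade NumPy. The workaround below lets Scrublet run anyway:

Open a terminal, start Python, and run the following code to see where Python packages are installed.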

import sys
print(sys.path)

Locate arraypad.py, for example at ~/anaconda3/lib/python3.6/site-packages/numpy/lib/arraypad.py. Open the file, append the following code at the end, then save and exit. This resolves the error.

def _normalize_shape(ndarray, shape, cast_to_int=True):
    """
    Private function which does some checks and normalizes the possibly
    much simpler representations of 'pad_width', 'stat_length',
    'constant_values', 'end_values'.

    Parameters
    ----------
    narray : ndarray
        Input ndarray
    shape : {sequence, array_like, float, int}, optional
        The width of padding (pad_width), the number of elements on the
        edge of the narray used for statistics (stat_length), the constant
        value(s) to use when filling padded regions (constant_values), or the
        endpoint target(s) for linear ramps (end_values).
        ((before_1, after_1), ... (before_N, after_N)) unique number of
        elements for each axis where `N` is rank of `narray`.
        ((before, after),) yields same before and after constants for each
        axis.
        (constant,) or val is a shortcut for before = after = constant for
        all axes.
    cast_to_int : bool, optional
        Controls if values in ``shape`` will be rounded and cast to int
        before being returned.

    Returns
    -------
    normalized_shape : tuple of tuples
        val                               => ((val, val), (val, val), ...)
        [[val1, val2], [val3, val4], ...] => ((val1, val2), (val3, val4), ...)
        ((val1, val2), (val3, val4), ...) => no change
        [[val1, val2], ]                  => ((val1, val2), (val1, val2), ...)
        ((val1, val2), )                  => ((val1, val2), (val1, val2), ...)
        [[val ,     ], ]                  => ((val, val), (val, val), ...)
        ((val ,     ), )                  => ((val, val), (val, val), ...)

    """
    ndims = ndarray.ndim

    # Shortcut shape=None
    if shape is None:
        return ((None, None), ) * ndims

    # Convert any input `info` to a NumPy array
    shape_arr = np.asarray(shape)

    try:
        shape_arr = np.broadcast_to(shape_arr, (ndims, 2))
    except ValueError:
        fmt = "Unable to create correctly shaped tuple from %s"
        raise ValueError(fmt % (shape,))

    # Cast if necessary
    if cast_to_int is True:
        shape_arr = np.round(shape_arr).astype(int)

    # Convert list of lists to tuple of tuples
    return tuple(tuple(axis) for axis in shape_arr.tolist())


def _validate_lengths(narray, number_elements):
    """
    Private function which does some checks and reformats pad_width and
    stat_length using _normalize_shape.

    Parameters
    ----------
    narray : ndarray
        Input ndarray
    number_elements : {sequence, int}, optional
        The width of padding (pad_width) or the number of elements on the edge
        of the narray used for statistics (stat_length).
        ((before_1, after_1), ... (before_N, after_N)) unique number of
        elements for each axis.
        ((before, after),) yields same before and after constants for each
        axis.
        (constant,) or int is a shortcut for before = after = constant for all
        axes.

    Returns
    -------
    _validate_lengths : tuple of tuples
        int                               => ((int, int), (int, int), ...)
        [[int1, int2], [int3, int4], ...] => ((int1, int2), (int3, int4), ...)
        ((int1, int2), (int3, int4), ...) => no change
        [[int1, int2], ]                  => ((int1, int2), (int1, int2), ...)
        ((int1, int2), )                  => ((int1, int2), (int1, int2), ...)
        [[int ,     ], ]                  => ((int, int), (int, int), ...)
        ((int ,     ), )                  => ((int, int), (int, int), ...)

    """
    normshp = _normalize_shape(narray, number_elements)
    for i in normshp:
        chk = [1 if x is None else x for x in i]
        chk = [1 if x >= 0 else -1 for x in chk]
        if (chk[0] < 0) or (chk[1] < 0):
            fmt = "%s cannot contain negative values."
            raise ValueError(fmt % (number_elements,))
    return normshp

Scrublet tutorial

I ran these tests in a Jupyter notebook on macOS. The code below loads the Python packages and sets up plotting:

%matplotlib inline
import scrublet as scr
import scipy.io
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = 'Arial'
plt.rc('font', size=14)
plt.rcParams['pdf.fonttype'] = 42

Load the 10x scRNA-seq matrix. The raw counts matrix is read as a scipy sparse matrix, with cells as rows and genes as columns:

input_dir = '/Users/yuanzan/Desktop/doublets/filtered_gene_bc_matrices/GRCh38/'
counts_matrix = scipy.io.mmread(input_dir + '/matrix.mtx').T.tocsc()
genes = np.array(scr.load_genes(input_dir + '/genes.tsv', delimiter='\t', column=1))
out_df = pd.read_csv(input_dir + '/barcodes.tsv', header = None, index_col=None, names=['barcode'])


print('Counts matrix shape: {} rows, {} columns'.format(counts_matrix.shape[0], counts_matrix.shape[1]))
print('Number of genes in gene list: {}'.format(len(genes)))

Counts matrix shape: 8381 rows, 33694 columns
Number of genes in gene list: 33694

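Initializing the Scrublet object

The relevant parameters are:

expected_doublet_rate: the expected fraction of doublets, typically 0.05-0.1. Results are not particularly sensitive to this parameter. For this example dataset, the expected doublet rate is taken from the Chromium User Guide.
sim_doublet_ratio: the number of doublets to simulate, relative to the number of observed transcriptomes. It should be high enough that all doublet states are well represented by simulated doublets. Setting it too high increases the computational cost. The default is 2, although values as low as 0.5 give very similar results on the datasets tested.
n_neighbors: the number of neighbors used to build the KNN classifier of observed transcriptomes and simulated doublets. The default, round(0.5 * sqrt(n_cells)), generally works well.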
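To make these defaults concrete, the arithmetic below plugs in the 8381 cells of this dataset. It only evaluates the formulas quoted above; truncating the simulated-doublet count to an integer is an assumption of this sketch, not a statement about Scrublet's internals.

```python
import math

n_cells = 8381                # rows in the counts matrix loaded above
sim_doublet_ratio = 2.0       # default quoted above
expected_doublet_rate = 0.06  # value used in this tutorial

# Default neighbor count: round(0.5 * sqrt(n_cells))
n_neighbors = round(0.5 * math.sqrt(n_cells))

# Number of simulated doublets implied by the ratio (integer truncation assumed)
n_simulated = int(sim_doublet_ratio * n_cells)

# Rough number of doublets one would expect at this rate
expected_doublets = round(expected_doublet_rate * n_cells)

print(n_neighbors, n_simulated, expected_doublets)
```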

scrub = scr.Scrublet(counts_matrix, expected_doublet_rate=0.06)

Computing doublet scores

Run the code below to compute doublet scores. Internally, this involves:

Doublet simulation
Normalization, gene filtering, rescaling, PCA
Doublet score calculation
Doublet score threshold detection and doublet calling

doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, min_cells=3, min_gene_variability_pctl=85, n_prin_comps=30)
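The doublet simulation step can be pictured as adding together the raw counts of two randomly chosen observed cells. The sketch below is a simplified, pure-Python illustration of that idea, not Scrublet's own code; the function name, the toy counts, and the choice to sample pairs with replacement (so a cell can occasionally pair with itself) are assumptions.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of random pairs of cells."""
    rng = random.Random(seed)
    n_cells = len(counts)
    doublets, parents = [], []
    for _ in range(n_doublets):
        i, j = rng.randrange(n_cells), rng.randrange(n_cells)
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
        parents.append((i, j))
    return doublets, parents


# Three toy cells with three genes each (made-up counts).
observed = [[5, 0, 1], [0, 7, 2], [3, 3, 0]]
sim, parents = simulate_doublets(observed, n_doublets=6)
for doublet, pair in zip(sim, parents):
    print(pair, doublet)
```

Each simulated doublet keeps a record of its parent cells, which helps when checking which cell-type combinations the simulated doublets cover.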

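Plotting the doublet score histogram

The doublet score histogram covers both the observed transcriptomes and the simulated doublets. The histogram for simulated doublets is typically bimodal.

The left mode corresponds to "embedded" doublets, formed by two cells with similar gene expression;
the right mode corresponds to "neotypic" doublets, formed by cells with distinct gene expression (for example, different cell types), which introduce more artifacts in downstream analyses. Scrublet can only detect "neotypic" doublets.

To call doublets vs. singlets, we must set a doublet score threshold. Ideally, the threshold sits at the minimum between the two modes of the simulated doublet histogram. The scrub_doublets() function tries to find this point automatically, and it does reasonably well on this test dataset. If automatic threshold detection performs poorly, the threshold can be adjusted with the call_doublets() function, for example: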

scrub.call_doublets(threshold=0.25)

# Plot the doublet score histogram
scrub.plot_histogram()
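To see how the threshold choice affects the calls, the sketch below applies two thresholds to a handful of made-up scores. It assumes a cell is called a doublet when its score is strictly above the threshold; the helper shares its name with Scrublet's method only for readability and is not the library function.

```python
def call_doublets(scores, threshold):
    """Return per-cell doublet calls and the resulting detected doublet rate."""
    calls = [score > threshold for score in scores]
    rate = sum(calls) / len(calls) if calls else 0.0
    return calls, rate


# Made-up doublet scores for eight cells.
toy_scores = [0.02, 0.01, 0.09, 0.31, 0.04, 0.55, 0.03, 0.02]
for threshold in (0.25, 0.40):
    calls, rate = call_doublets(toy_scores, threshold)
    print(threshold, sum(calls), rate)
```

Raising the threshold from 0.25 to 0.40 halves the number of called doublets in this toy example, which is why the histogram is worth inspecting before accepting the automatic value.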

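Dimensionality reduction and visualization

Computing the embedding

This example uses UMAP for dimensionality reduction. t-SNE is also available, but the authors do not recommend it because it runs slowly.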

print('Running UMAP...')
scrub.set_embedding('UMAP', scr.get_umap(scrub.manifold_obs_, 10, min_dist=0.3))
print('Done.')

UMAP visualization

scrub.plot_embedding('UMAP', order_points=True)

In the left panel below, the black points are the predicted doublets.

# Fraction of cells called as doublets
print(scrub.detected_doublet_rate_)
# 0.043789523923159525

Save the doublet predictions to a file. When processing the data later with Seurat or similar tools, the predictions can be imported to filter barcodes.

out_df['doublet_scores'] = doublet_scores
out_df['predicted_doublets'] = predicted_doublets
out_df.to_csv(input_dir + '/doublet.txt', index=False,header=True)
out_df.head()

barcode	doublet_scores	predicted_doublets
AAACCTGAGCATCATC-1	0.020232985898221900	FALSE
AAACCTGAGCTAACTC-1	0.009746972531259230	FALSE
AAACCTGAGCTAGTGG-1	0.013493253373313300	FALSE
AAACCTGCACATTAGC-1	0.087378640776699	FALSE
AAACCTGCACTGTTAG-1	0.02405046655276650	FALSE
AAACCTGCATAGTAAG-1	0.03969184391224250	FALSE
AAACCTGCATGAACCT-1	0.030082836796977200	FALSE

The Jupyter notebook for this test has been uploaded to my GitHub (seqyuan). Feel free to download it and try it yourself.
