[Legacy Spatial Transcriptomics] (2) Notes from a Trial Run of the Pipeline

My old account was banned for no apparent reason, so I am reposting this from a secondary account.

More spatial transcriptomics articles:

1. New 10X Visium
2. Legacy Spatial

I. Running st_pipeline

[Figure: workflow overview]

[Figure: detailed workflow]

1.1 Required input files

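  • FASTQ files (Read 1 contains the spatial information and the UMI; Read 2 contains the genomic sequence)
  • A genome index generated with STAR
  • An annotation file in GTF or GFF3 format (optional when using a transcriptome)
  • A file containing the barcodes and array coordinates (look in the "ids" folder and pick the correct one). Basically, this file has 3 columns (BARCODE, X and Y). It is also optional if the data is not barcoded (e.g. RNA-Seq data).
  • A name for the dataset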
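
As a rough illustration of the barcode file described in the list above, the sketch below reads a three-column BARCODE/X/Y file into a dictionary. It assumes whitespace-separated columns and no header line, which is a guess about the layout rather than something stated here. The function name `read_ids` and the sample barcodes are made up.

```python
from typing import Dict, Iterable, Tuple


def read_ids(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each barcode to its (x, y) array coordinate.

    Assumption: one spot per line, three whitespace-separated columns
    (BARCODE, X, Y) and no header line.
    """
    spots: Dict[str, Tuple[int, int]] = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        barcode, x, y = fields
        if barcode in spots:
            raise ValueError(f"line {line_no}: duplicate barcode {barcode}")
        spots[barcode] = (int(x), int(y))
    return spots


# Made-up example rows, not taken from a real ids file.
example = ["ACGTACGTACGTACGTAC 1 1", "TTGCATTGCATTGCATTG 2 1"]
spots = read_ids(example)
print(len(spots), "barcodes loaded")
```

For a real file, the same function can be fed an open file handle, e.g. `with open(path) as fh: spots = read_ids(fh)`.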

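The ST Pipeline has many parameters, mostly related to trimming, mapping and annotation, but the defaults are usually good enough. Once the ST Pipeline is installed, you can see a full description of the parameters by typing "st_pipeline_run.py --help".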

(base) [Robin@SC-201910280935 pipl_test]$ st_pipeline_run.py --help
usage: st_pipeline_run.py [-h] [--ids [FILE]] --ref-map [FOLDER]
                          [--ref-annotation [FILE]] --expName [STRING]
                          [--allowed-missed [INT]] [--allowed-kmer [INT]]
                          [--overhang [INT]]
                          [--min-length-qual-trimming [INT]]
                          [--mapping-rv-trimming [INT]]
                          [--contaminant-index [FOLDER]] [--qual-64]
                          [--htseq-mode [STRING]] [--htseq-no-ambiguous]
                          [--start-id [INT]] [--no-clean-up] [--verbose]
                          [--mapping-threads [INT]]
                          [--min-quality-trimming [INT]] [--bin-path [FOLDER]]
                          [--log-file [STR]] [--output-folder [FOLDER]]
                          [--temp-folder [FOLDER]]
                          [--umi-allowed-mismatches [INT]]
                          [--umi-start-position [INT]]
                          [--umi-end-position [INT]] [--keep-discarded-files]
                          [--remove-polyA [INT]] [--remove-polyT [INT]]
                          [--remove-polyG [INT]] [--remove-polyC [INT]]
                          [--remove-polyN [INT]] [--filter-AT-content [INT%]]
                          [--filter-GC-content [INT%]] [--disable-multimap]
                          [--disable-clipping]
                          [--umi-cluster-algorithm [STRING]]
                          [--min-intron-size [INT]] [--max-intron-size [INT]]
                          [--umi-filter] [--umi-filter-template [STRING]]
                          [--compute-saturation]
                          [--saturation-points SATURATION_POINTS [SATURATION_POINTS ...]]
                          [--include-non-annotated]
                          [--inverse-mapping-rv-trimming [INT]]
                          [--two-pass-mode] [--strandness [STRING]]
                          [--umi-quality-bases [INT]]
                          [--umi-counting-offset [INT]]
                          [--demultiplexing-metric [STRING]]
                          [--demultiplexing-multiple-hits-keep-one]
                          [--demultiplexing-trim-sequences DEMULTIPLEXING_TRIM_SEQUENCES [DEMULTIPLEXING_TRIM_SEQUENCES ...]]
                          [--homopolymer-mismatches [INT]]
                          [--star-genome-loading [STRING]]
                          [--star-sort-mem-limit STAR_SORT_MEM_LIMIT]
                          [--disable-barcode] [--disable-umi]
                          [--transcriptome] [--version]
                          fastq_files fastq_files
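
Reading the usage message above, `--ref-map` and `--expName` are the only options not wrapped in square brackets, and the call ends with two FASTQ files. The sketch below assembles such a call from Python. The helper name `build_st_command` and the example paths are hypothetical, and only flags that appear in the help output are used.

```python
import shlex
from typing import List, Optional


def build_st_command(
    exp_name: str,
    ref_map: str,
    fastq_r1: str,
    fastq_r2: str,
    ids: Optional[str] = None,
    ref_annotation: Optional[str] = None,
    output_folder: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for st_pipeline_run.py (hypothetical helper)."""
    cmd = ["st_pipeline_run.py", "--expName", exp_name, "--ref-map", ref_map]
    optional = [
        ("--ids", ids),
        ("--ref-annotation", ref_annotation),
        ("--output-folder", output_folder),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    # The two FASTQ files are positional and come last (R1, then R2).
    cmd += [fastq_r1, fastq_r2]
    return cmd


# Placeholder paths for illustration only.
cmd = build_st_command("demo", "./index", "R1.fastq", "R2.fastq", ids="ids.txt")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the pipeline. Printing it with `shlex.join` first makes it easy to check the command by eye.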

1.2 Basic syntax

1.3 Run the test dataset to check that the pipeline works

$ cp -r test test2
$ cd test2
$ mkdir index
$ cd /opt/st_pipeline/test2/config
$ gzip -d Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz
# Go back to test2 so the relative paths below resolve
$ cd ..
# Build the STAR genome index
$ STAR --runThreadN 10 --runMode genomeGenerate --genomeDir ./index \
--genomeFastaFiles ./config/Homo_sapiens.GRCh38.dna.chromosome.19.fa \
--sjdbGTFfile ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf
# Run st_pipeline_run.py
$ mkdir results
$ st_pipeline_run.py --expName test2 \
     --ids ./config/idfiles/150204_arrayjet_1000L2_probes.txt \
     --ref-map ./index --log-file log.txt --output-folder ./results \
     --ref-annotation ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf \
     ./input/arrayjet_1002/testdata_R1.fastq \
     ./input/arrayjet_1002/testdata_R2.fastq

This produces the following results:

$ cd results/
$ ls
test2_reads.bed  test2_stdata.tsv
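
For a first look at `test2_stdata.tsv`, a small reader like the one below can help. It assumes the file is a tab-separated matrix whose header row lists gene names after an empty first cell, and whose rows start with a spot label of the form `XxY` followed by counts. Both points are assumptions about the file layout, and the function name and toy data are invented for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

Spot = Tuple[float, float]


def read_counts(handle) -> Tuple[List[str], Dict[Spot, List[float]]]:
    """Read a spots-by-genes count matrix from a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # assumed: first header cell sits above the row labels
    counts: Dict[Spot, List[float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        x, y = row[0].split("x")  # assumed "XxY" spot label
        counts[(float(x), float(y))] = [float(v) for v in row[1:]]
    return genes, counts


# Toy matrix standing in for a real stdata file.
toy = "\tGeneA\tGeneB\n10x12\t3\t0\n11x12\t1\t5\n"
genes, counts = read_counts(io.StringIO(toy))
for spot, row in counts.items():
    print(spot, "total counts:", sum(row))
```

With a real file, pass an open handle instead of the `io.StringIO` object, e.g. `with open("test2_stdata.tsv", newline="") as fh: genes, counts = read_counts(fh)`.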

II. Running Spatial Transcriptomics Analysis

(base) [Robin@SC-201910280935 data]$ unsupervised.py --help
usage: unsupervised.py [-h] --counts-table-files COUNTS_TABLE_FILES
                       [COUNTS_TABLE_FILES ...] [--normalization [STR]]
                       [--num-clusters [INT]] [--num-exp-genes [FLOAT]]
                       [--num-exp-spots [FLOAT]] [--min-gene-expression [INT]]
                       [--num-genes-keep [INT]] [--clustering [STR]]
                       [--dimensionality [STR]] [--use-log-scale]
                       [--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]]
                       [--image-files IMAGE_FILES [IMAGE_FILES ...]]
                       [--num-dimensions [INT]] [--spot-size [INT]]
                       [--top-genes-criteria [STR]] [--use-adjusted-log]
                       [--tsne-perplexity [INT]] [--tsne-theta [FLOAT]]
                       [--outdir OUTDIR] [--color-space-plots]


optional arguments:
  -h, --help            show this help message and exit
  --counts-table-files COUNTS_TABLE_FILES [COUNTS_TABLE_FILES ...]
                        One or more matrices with gene counts per feature/spot (genes as columns)
  --normalization [STR]
                        Normalize the counts using:
                        RAW = absolute counts
                        DESeq2 = DESeq2::estimateSizeFactors(counts)
                        DESeq2PseudoCount = DESeq2::estimateSizeFactors(counts + 1)
                        DESeq2Linear = DESeq2::estimateSizeFactors(counts, linear=TRUE)
                        DESeq2SizeAdjusted = DESeq2::estimateSizeFactors(counts + lib_size_factors)
                        RLE = EdgeR RLE * lib_size
                        TMM = EdgeR TMM * lib_size
                        Scran = Deconvolution Sum Factors (Marioni et al)
                        REL = Each gene count divided by the total count of its spot
                        (default: DESeq2)
  --num-clusters [INT]  The number of clusters/regions expected to be found.
                        If not given the number of clusters will be computed.
                        Note that this parameter has no effect with DBSCAN clustering.
  --num-exp-genes [FLOAT]
                        The percentage of number of expressed genes (>= --min-gene-expression) a spot
                        must have to be kept from the distribution of all expressed genes (default: 1)
  --num-exp-spots [FLOAT]
                        The percentage of number of expressed spots a gene
                        must have to be kept from the total number of spots (default: 1)
  --clustering [STR]    What clustering algorithm to use after the dimensionality reduction:
                        Hierarchical = Hierarchical Clustering (Ward)
                        KMeans = Suitable for small number of clusters
                        DBSCAN = Number of clusters will be automatically inferred
                        Gaussian = Gaussian Mixtures Model
                        (default: KMeans)
  --dimensionality [STR]
                        What dimensionality reduction algorithm to use:
                        tSNE = t-distributed stochastic neighbor embedding
                        PCA = Principal Component Analysis
                        ICA = Independent Component Analysis
                        SPCA = Sparse Principal Component Analysis
                        (default: tSNE)
  --use-log-scale       Use log2(counts + 1) values in the dimensionality reduction step
  --alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]
                        One or more tab delimited files containing and alignment matrix for the images as
                                 a11 a12 a13 a21 a22 a23 a31 a32 a33
                        Only useful is the image has extra borders, for instance not cropped to the array corners
                        or if you want the keep the original image size in the plots.
  --image-files IMAGE_FILES [IMAGE_FILES ...]
                        When provided the data will plotted on top of the image
                        It can be one ore more, ideally one for each input dataset
                         It is desirable that the image is cropped to the array
                        corners otherwise an alignment file is needed
  --num-dimensions [INT]
                        The number of dimensions to use in the dimensionality reduction (2 or 3). (default: 2)
  --spot-size [INT]     The size of the spots when generating the plots. (default: 20)
  --top-genes-criteria [STR]
                        What criteria to use to keep top genes before doing
                        the dimensionality reduction (Variance or TopRanked) (default: Variance)
  --use-adjusted-log    Use adjusted log normalized counts (R Scater::normalized())
                        in the dimensionality reduction step (recommended with SCRAN normalization)
  --tsne-perplexity [INT]
                        The value of the perplexity for the t-sne method. (default: 30)
  --tsne-theta [FLOAT]  The value of theta for the t-sne method. (default: 0.5)
  --outdir OUTDIR       Path to output dir

unsupervised.py --counts-table-files test2_stdata.tsv --normalization DESeq2 --num-clusters 5 \
     --clustering KMeans --dimensionality tSNE --image-files HE_Rep6_MOB.jpg --use-log-scale 
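
Two of the options in the help text are simple enough to write out by hand: REL divides each gene count by the total count of its spot, and --use-log-scale uses log2(counts + 1). The sketch below applies those two formulas to a toy matrix (rows are spots, columns are genes). It is not the tool's own code and says nothing about how unsupervised.py combines the steps internally. The function names are invented.

```python
import math
from typing import List

Matrix = List[List[float]]


def rel_normalize(counts: Matrix) -> Matrix:
    """Divide each gene count by the total count of its spot (row)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # A spot with no counts stays all-zero to avoid dividing by zero.
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def log2_plus_one(counts: Matrix) -> Matrix:
    """Apply log2(count + 1) to every entry."""
    return [[math.log2(v + 1) for v in row] for row in counts]


# Toy counts: three spots, three genes.
toy = [[3, 1, 0], [0, 0, 0], [2, 2, 4]]
print(rel_normalize(toy))
print(log2_plus_one(toy))
```

After REL normalization every non-empty spot sums to 1, so spots with very different sequencing depth become comparable. The log transform then compresses the range of highly expressed genes.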