My old account was banned for no apparent reason, so I am reposting this from a secondary account.
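More articles on spatial transcriptomics:
1. New 10X Visium
- [10X Spatial Transcriptomics Visium] (1) Space Ranger 1.0.0 (updated 2019-12-05)
- [10X Spatial Transcriptomics Visium] (2) Loupe Browser 4.0.0
- [10X Spatial Transcriptomics Visium] (3) Notes on running the full Visium workflow
- [10X Spatial Transcriptomics Visium] (4) Exploratory code examples for downstream analysis in R
- [10X Spatial Transcriptomics Visium] (5) Visium principles, workflow, and products
- [10X Spatial Transcriptomics Visium] (6) Hands-on code for analyzing Visium results with the new Seurat v3.2
- [10X Spatial Transcriptomics Visium] (7) Reflections on the answers the Seurat v3.2 authors gave on GitHub
2. Legacy Spatial
- [Legacy Spatial Transcriptomics] (1) ST Spot Detector user guide
- [Legacy Spatial Transcriptomics] (2) Notes on a trial run of the pipeline
- [Legacy Spatial Transcriptomics] (3) ST Spot Detector hands-on notes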
I. Running st_pipeline
[Figure: workflow overview]
[Figure: detailed workflow]
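1.1 Required input files
- FASTQ files (read 1 contains the spatial information and the UMI; read 2 contains the genomic sequence)
- A genome index generated with STAR
- An annotation file in GTF or GFF3 format (optional when a transcriptome is used)
- A file containing the barcodes and array coordinates (look in the "ids" folder and pick the correct one). This file essentially contains three columns (BARCODE, X and Y). It is optional if the data is not barcoded (for example, RNA-Seq data).
- A name for the dataset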
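To make the three-column barcode file concrete, here is a minimal Python sketch that loads such a file into a dictionary. It assumes whitespace-separated columns, integer coordinates and no header line; the function name read_ids and the two barcodes are made up for illustration and are not taken from the real "ids" folder.

```python
import io


def read_ids(handle):
    """Map each barcode to its (x, y) array coordinate.

    Assumes three whitespace-separated columns per line: BARCODE X Y.
    """
    coords = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        barcode, x, y = fields[0], int(fields[1]), int(fields[2])
        coords[barcode] = (x, y)
    return coords


# Hypothetical content standing in for a real ids file.
sample = "ACGTACGTACGTACGTAC\t1\t1\nTTGCAATTGCAATTGCAA\t1\t2\n"
ids = read_ids(io.StringIO(sample))
print(ids)
```

With a real file, the same function could be called as read_ids(open(path)), where path points at the ids file chosen for the array.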
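The ST Pipeline has many parameters, mostly related to trimming, mapping and annotation, but the defaults are usually sufficient. Once the ST Pipeline is installed, type "st_pipeline_run.py --help" to see a full description of the parameters.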
(base) [Robin@SC-201910280935 pipl_test]$ st_pipeline_run.py --help
usage: st_pipeline_run.py [-h] [--ids [FILE]] --ref-map [FOLDER]
[--ref-annotation [FILE]] --expName [STRING]
[--allowed-missed [INT]] [--allowed-kmer [INT]]
[--overhang [INT]]
[--min-length-qual-trimming [INT]]
[--mapping-rv-trimming [INT]]
[--contaminant-index [FOLDER]] [--qual-64]
[--htseq-mode [STRING]] [--htseq-no-ambiguous]
[--start-id [INT]] [--no-clean-up] [--verbose]
[--mapping-threads [INT]]
[--min-quality-trimming [INT]] [--bin-path [FOLDER]]
[--log-file [STR]] [--output-folder [FOLDER]]
[--temp-folder [FOLDER]]
[--umi-allowed-mismatches [INT]]
[--umi-start-position [INT]]
[--umi-end-position [INT]] [--keep-discarded-files]
[--remove-polyA [INT]] [--remove-polyT [INT]]
[--remove-polyG [INT]] [--remove-polyC [INT]]
[--remove-polyN [INT]] [--filter-AT-content [INT%]]
[--filter-GC-content [INT%]] [--disable-multimap]
[--disable-clipping]
[--umi-cluster-algorithm [STRING]]
[--min-intron-size [INT]] [--max-intron-size [INT]]
[--umi-filter] [--umi-filter-template [STRING]]
[--compute-saturation]
[--saturation-points SATURATION_POINTS [SATURATION_POINTS ...]]
[--include-non-annotated]
[--inverse-mapping-rv-trimming [INT]]
[--two-pass-mode] [--strandness [STRING]]
[--umi-quality-bases [INT]]
[--umi-counting-offset [INT]]
[--demultiplexing-metric [STRING]]
[--demultiplexing-multiple-hits-keep-one]
[--demultiplexing-trim-sequences DEMULTIPLEXING_TRIM_SEQUENCES [DEMULTIPLEXING_TRIM_SEQUENCES ...]]
[--homopolymer-mismatches [INT]]
[--star-genome-loading [STRING]]
[--star-sort-mem-limit STAR_SORT_MEM_LIMIT]
[--disable-barcode] [--disable-umi]
[--transcriptome] [--version]
fastq_files fastq_files
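The usage text above reduces to two required options (--ref-map and --expName), a set of optional flags, and the two FASTQ files at the end. The following Python sketch assembles such a command line as an argument list. The helper name build_st_pipeline_cmd and the file names in the example call are hypothetical; only flags that appear in the help output are used.

```python
import shlex


def build_st_pipeline_cmd(exp_name, ref_map, r1, r2, ids=None,
                          ref_annotation=None, output_folder=None,
                          log_file=None):
    """Return the st_pipeline_run.py command as a list of arguments."""
    cmd = ["st_pipeline_run.py", "--expName", exp_name, "--ref-map", ref_map]
    optional = [
        ("--ids", ids),
        ("--ref-annotation", ref_annotation),
        ("--output-folder", output_folder),
        ("--log-file", log_file),
    ]
    for flag, value in optional:
        if value is not None:  # only emit the flags that were actually set
            cmd += [flag, value]
    cmd += [r1, r2]  # the two positional FASTQ files come last
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_st_pipeline_cmd("demo", "./index", "R1.fastq", "R2.fastq",
                            ids="ids.txt")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the pipeline is installed; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.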
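1.2 Basic syntax
The general form is "st_pipeline_run.py [options] fastq_files fastq_files": the options come first, followed by the two FASTQ files.
1.3 Running the test dataset to check that the pipeline works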
$ cp -r tests test2
$ cd test2
$ mkdir index
$ cd /opt/st_pipeline/test2/config
$ gzip -d Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz
# Return to test2 so that the relative paths below resolve
$ cd /opt/st_pipeline/test2
# Build the STAR genome index
$ STAR --runThreadN 10 --runMode genomeGenerate --genomeDir ./index \
--genomeFastaFiles ./config/Homo_sapiens.GRCh38.dna.chromosome.19.fa \
--sjdbGTFfile ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf
# Run st_pipeline_run.py
$ mkdir results
$ st_pipeline_run.py --expName test2 \
--ids ./config/idfiles/150204_arrayjet_1000L2_probes.txt \
--ref-map ./index --log-file log.txt --output-folder ./results \
--ref-annotation ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf \
./input/arrayjet_1002/testdata_R1.fastq \
./input/arrayjet_1002/testdata_R2.fastq
This produces:
$ cd results/
$ ls
test2_reads.bed test2_stdata.tsv
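As a quick sanity check on a counts table such as test2_stdata.tsv, the sketch below reads a tab-separated matrix and sums the counts per spot. It assumes spots as rows and genes as columns, a header line whose first cell is empty, and row labels of the form "XxY"; none of this is shown in the listing above, so check it against the real file. The function name and the sample values are made up.

```python
import csv
import io


def read_counts(handle):
    """Read a spots-by-genes TSV into {spot_label: {gene: count}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # assumed: first header cell is the empty corner cell
    counts = {}
    for row in reader:
        if not row:
            continue
        counts[row[0]] = {g: float(v) for g, v in zip(genes, row[1:])}
    return genes, counts


# Made-up table standing in for a real stdata file.
sample = "\tGENE_A\tGENE_B\n1x1\t3\t0\n1x2\t1\t4\n"
genes, counts = read_counts(io.StringIO(sample))

# Total count per spot.
totals = {spot: sum(row.values()) for spot, row in counts.items()}

# Assumed label format "XxY": split it into integer array coordinates.
positions = {spot: tuple(int(v) for v in spot.split("x")) for spot in counts}
print(totals, positions)
```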
II. Running Spatial Transcriptomics Analysis
(base) [Robin@SC-201910280935 data]$ unsupervised.py --help
usage: unsupervised.py [-h] --counts-table-files COUNTS_TABLE_FILES
[COUNTS_TABLE_FILES ...] [--normalization [STR]]
[--num-clusters [INT]] [--num-exp-genes [FLOAT]]
[--num-exp-spots [FLOAT]] [--min-gene-expression [INT]]
[--num-genes-keep [INT]] [--clustering [STR]]
[--dimensionality [STR]] [--use-log-scale]
[--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]]
[--image-files IMAGE_FILES [IMAGE_FILES ...]]
[--num-dimensions [INT]] [--spot-size [INT]]
[--top-genes-criteria [STR]] [--use-adjusted-log]
[--tsne-perplexity [INT]] [--tsne-theta [FLOAT]]
[--outdir OUTDIR] [--color-space-plots]
optional arguments:
-h, --help show this help message and exit
--counts-table-files COUNTS_TABLE_FILES [COUNTS_TABLE_FILES ...]
One or more matrices with gene counts per feature/spot (genes as columns)
--normalization [STR]
Normalize the counts using:
RAW = absolute counts
DESeq2 = DESeq2::estimateSizeFactors(counts)
DESeq2PseudoCount = DESeq2::estimateSizeFactors(counts + 1)
DESeq2Linear = DESeq2::estimateSizeFactors(counts, linear=TRUE)
DESeq2SizeAdjusted = DESeq2::estimateSizeFactors(counts + lib_size_factors)
RLE = EdgeR RLE * lib_size
TMM = EdgeR TMM * lib_size
Scran = Deconvolution Sum Factors (Marioni et al)
REL = Each gene count divided by the total count of its spot
(default: DESeq2)
--num-clusters [INT] The number of clusters/regions expected to be found.
If not given the number of clusters will be computed.
Note that this parameter has no effect with DBSCAN clustering.
--num-exp-genes [FLOAT]
The percentage of number of expressed genes (>= --min-gene-expression) a spot
must have to be kept from the distribution of all expressed genes (default: 1)
--num-exp-spots [FLOAT]
The percentage of number of expressed spots a gene
must have to be kept from the total number of spots (default: 1)
--clustering [STR] What clustering algorithm to use after the dimensionality reduction:
Hierarchical = Hierarchical Clustering (Ward)
KMeans = Suitable for small number of clusters
DBSCAN = Number of clusters will be automatically inferred
Gaussian = Gaussian Mixtures Model
(default: KMeans)
--dimensionality [STR]
What dimensionality reduction algorithm to use:
tSNE = t-distributed stochastic neighbor embedding
PCA = Principal Component Analysis
ICA = Independent Component Analysis
SPCA = Sparse Principal Component Analysis
(default: tSNE)
--use-log-scale Use log2(counts + 1) values in the dimensionality reduction step
--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]
One or more tab delimited files containing and alignment matrix for the images as
a11 a12 a13 a21 a22 a23 a31 a32 a33
Only useful is the image has extra borders, for instance not cropped to the array corners
or if you want the keep the original image size in the plots.
--image-files IMAGE_FILES [IMAGE_FILES ...]
When provided the data will plotted on top of the image
It can be one ore more, ideally one for each input dataset
It is desirable that the image is cropped to the array
corners otherwise an alignment file is needed
--num-dimensions [INT]
The number of dimensions to use in the dimensionality reduction (2 or 3). (default: 2)
--spot-size [INT] The size of the spots when generating the plots. (default: 20)
--top-genes-criteria [STR]
What criteria to use to keep top genes before doing
the dimensionality reduction (Variance or TopRanked) (default: Variance)
--use-adjusted-log Use adjusted log normalized counts (R Scater::normalized())
in the dimensionality reduction step (recommended with SCRAN normalization)
--tsne-perplexity [INT]
The value of the perplexity for the t-sne method. (default: 30)
--tsne-theta [FLOAT] The value of theta for the t-sne method. (default: 0.5)
--outdir OUTDIR Path to output dir
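Two of the transformations named in the help text are simple enough to write out: REL ("each gene count divided by the total count of its spot") and the log2(counts + 1) transform behind --use-log-scale. The Python sketch below applies each to a tiny made-up matrix with spots as rows. It only illustrates the formulas; how unsupervised.py combines them internally is not shown in the help text, and the function names are hypothetical.

```python
import math


def rel_normalize(counts):
    """Divide every gene count by the total count of its spot (row)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # An all-zero spot stays all-zero instead of dividing by zero.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def log2_plus_one(counts):
    """Apply log2(count + 1) to every entry."""
    return [[math.log2(c + 1) for c in row] for row in counts]


# Made-up counts: 2 spots (rows) x 3 genes (columns).
counts = [[3, 1, 0], [0, 0, 4]]
rel = rel_normalize(counts)
logged = log2_plus_one(counts)
print(rel)
print(logged)
```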
unsupervised.py --counts-table-files test2_stdata.tsv --normalization DESeq2 --num-clusters 5 \
--clustering KMeans --dimensionality tSNE --image-files HE_Rep6_MOB.jpg --use-log-scale
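The command above asks for KMeans clustering on a tSNE embedding. To show what a k-means step does conceptually, here is a plain Lloyd's algorithm in Python on made-up 2-D points that stand in for embedding coordinates. This is not the implementation used by unsupervised.py; the function name, the deterministic initialization from the first k points, and the sample points are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2-D points.

    Centroids start at the first k points so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up 2-D coordinates standing in for an embedding.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

In practice the number of clusters passed with --num-clusters plays the role of k here, and a library implementation with better initialization would normally be preferred over this sketch.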