Linux Learning in 100 Posts, No. 39: Installing trim-galore, a Tool for Transcriptome Analysis

Installation

(rnaseq) root 11:58:44 ~
$ conda install -y trim-galore
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /root/miniconda3/envs/rnaseq

  added / updated specs:
    - trim-galore


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cutadapt-3.4               |   py39h38f01e4_1         198 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    dnaio-0.5.1                |   py39h38f01e4_0         140 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    isa-l-2.30.0               |       ha770c72_4         192 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    pigz-2.6                   |       h27826a3_0          87 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    python-isal-0.10.0         |   py39h3811e60_0         117 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    trim-galore-0.6.6          |       hdfd78af_1          42 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
    xopen-1.1.0                |   py39hf3d152e_2          20 KB  https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
    ------------------------------------------------------------
                                           Total:         797 KB

The following NEW packages will be INSTALLED:

  cutadapt           anaconda/cloud/bioconda/linux-64::cutadapt-3.4-py39h38f01e4_1
  dnaio              anaconda/cloud/bioconda/linux-64::dnaio-0.5.1-py39h38f01e4_0
  isa-l              anaconda/cloud/conda-forge/linux-64::isa-l-2.30.0-ha770c72_4
  pigz               anaconda/cloud/conda-forge/linux-64::pigz-2.6-h27826a3_0
  python-isal        anaconda/cloud/conda-forge/linux-64::python-isal-0.10.0-py39h3811e60_0
  trim-galore        anaconda/cloud/bioconda/noarch::trim-galore-0.6.6-hdfd78af_1
  xopen              anaconda/cloud/conda-forge/linux-64::xopen-1.1.0-py39hf3d152e_2



Downloading and Extracting Packages
python-isal-0.10.0   | 117 KB    | ######################################## | 100% 
trim-galore-0.6.6    | 42 KB     | ######################################## | 100% 
xopen-1.1.0          | 20 KB     | ######################################## | 100% 
dnaio-0.5.1          | 140 KB    | ######################################## | 100% 
pigz-2.6             | 87 KB     | ######################################## | 100% 
isa-l-2.30.0         | 192 KB    | ######################################## | 100% 
cutadapt-3.4         | 198 KB    | ######################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(rnaseq) root 12:00:42 ~
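To confirm that the executables really landed on your `PATH` in the active `rnaseq` environment, a small check like the one below can help. `check_tools` is a hypothetical helper written for this post, not part of conda or Trim Galore; it relies only on the shell builtin `command -v`. Note that the conda package is named `trim-galore`, while the command it installs is `trim_galore`.

```bash
#!/usr/bin/env bash
# check_tools CMD...
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 if every command is found, 1 if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, inside the activated rnaseq environment:
# check_tools trim_galore cutadapt fastqc
```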

Checking the installation

(rnaseq) root 12:05:52 ~
$ cutadapt --help
cutadapt version 3.4

Copyright (C) 2010-2021 Marcel Martin <marcel.martin@scilifelab.se>

cutadapt removes adapter sequences from high-throughput sequencing reads.

Usage:
    cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

For paired-end reads:
    cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard
characters are supported. All reads from input.fastq will be written to
output.fastq with the adapter sequence removed. Adapter matching is
error-tolerant. Multiple adapter sequences can be given (use further -a
options), but only the best-matching adapter will be removed.

Input may also be in FASTA format. Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. Without the -o option, output is sent to standard output.

Citation:

Marcel Martin. Cutadapt removes adapter sequences from high-throughput
sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.
http://dx.doi.org/10.14806/ej.17.1.200

Run "cutadapt --help" to see all command-line options.
See https://cutadapt.readthedocs.io/ for full documentation.

Options:
  -h, --help            Show this help message and exit
  --version             Show version number and exit
  --debug               Print debug log. Use twice to also print DP matrices
  -j CORES, --cores CORES
                        Number of CPU cores to use. Use 0 to auto-detect. Default:
                        1

Finding adapters:
  Parameters -a, -g, -b specify adapters to be removed from each read (or from
  the first read in a pair if data is paired). If specified multiple times, only
  the best matching adapter is trimmed (but see the --times option). When the
  special notation 'file:FILE' is used, adapter sequences are read from the given
  FASTA file.

  -a ADAPTER, --adapter ADAPTER
                        Sequence of an adapter ligated to the 3' end (paired data:
                        of the first read). The adapter and subsequent bases are
                        trimmed. If a '$' character is appended ('anchoring'), the
                        adapter is only found if it is a suffix of the read.
  -g ADAPTER, --front ADAPTER
                        Sequence of an adapter ligated to the 5' end (paired data:
                        of the first read). The adapter and any preceding bases are
                        trimmed. Partial matches at the 5' end are allowed. If a
                        '^' character is prepended ('anchoring'), the adapter is
                        only found if it is a prefix of the read.
  -b ADAPTER, --anywhere ADAPTER
                        Sequence of an adapter that may be ligated to the 5' or 3'
                        end (paired data: of the first read). Both types of matches
                        as described under -a and -g are allowed. If the first base
                        of the read is part of the match, the behavior is as with
                        -g, otherwise as with -a. This option is mostly for
                        rescuing failed library preparations - do not use if you
                        know which end your adapter was ligated to!
  -e E, --error-rate E, --errors E
                        Maximum allowed error rate (if 0 <= E < 1), or absolute
                        number of errors for full-length adapter match (if E is an
                        integer >= 1). Error rate = no. of errors divided by length
                        of matching region. Default: 0.1 (10%)
  --no-indels           Allow only mismatches in alignments. Default: allow both
                        mismatches and indels
  -n COUNT, --times COUNT
                        Remove up to COUNT adapters from each read. Default: 1
  -O MINLENGTH, --overlap MINLENGTH
                        Require MINLENGTH overlap between read and adapter for an
                        adapter to be found. Default: 3
  --match-read-wildcards
                        Interpret IUPAC wildcards in reads. Default: False
  -N, --no-match-adapter-wildcards
                        Do not interpret IUPAC wildcards in adapters.
  --action {trim,retain,mask,lowercase,none}
                        What to do if a match was found. trim: trim adapter and up-
                        or downstream sequence; retain: trim, but retain adapter;
                        mask: replace with 'N' characters; lowercase: convert to
                        lowercase; none: leave unchanged. Default: trim
  --rc, --revcomp       Check both the read and its reverse complement for adapter
                        matches. If match is on reverse-complemented version,
                        output that one. Default: check only read

Additional read modifications:
  -u LENGTH, --cut LENGTH
                        Remove bases from each read (first read only if paired). If
                        LENGTH is positive, remove bases from the beginning. If
                        LENGTH is negative, remove bases from the end. Can be used
                        twice if LENGTHs have different signs. This is applied
                        *before* adapter trimming.
  --nextseq-trim 3'CUTOFF
                        NextSeq-specific quality trimming (each read). Trims also
                        dark cycles appearing as high-quality G bases.
  -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff [5'CUTOFF,]3'CUTOFF
                        Trim low-quality bases from 5' and/or 3' ends of each read
                        before adapter removal. Applied to both reads if data is
                        paired. If one value is given, only the 3' end is trimmed.
                        If two comma-separated cutoffs are given, the 5' end is
                        trimmed with the first cutoff, the 3' end with the second.
  --quality-base N      Assume that quality values in FASTQ are encoded as
                        ascii(quality + N). This needs to be set to 64 for some old
                        Illumina FASTQ files. Default: 33
  --length LENGTH, -l LENGTH
                        Shorten reads to LENGTH. Positive values remove bases at
                        the end while negative ones remove bases at the beginning.
                        This and the following modifications are applied after
                        adapter trimming.
  --trim-n              Trim N's on ends of reads.
  --length-tag TAG      Search for TAG followed by a decimal number in the
                        description field of the read. Replace the decimal number
                        with the correct length of the trimmed read. For example,
                        use --length-tag 'length=' to correct fields like
                        'length=123'.
  --strip-suffix STRIP_SUFFIX
                        Remove this suffix from read names if present. Can be given
                        multiple times.
  -x PREFIX, --prefix PREFIX
                        Add this prefix to read names. Use {name} to insert the
                        name of the matching adapter.
  -y SUFFIX, --suffix SUFFIX
                        Add this suffix to read names; can also include {name}
  --rename TEMPLATE     Rename reads using TEMPLATE containing variables such as
                        {id}, {adapter_name} etc. (see documentation)
  --zero-cap, -z        Change negative quality values to zero.

Filtering of processed reads:
  Filters are applied after above read modifications. Paired-end reads are always
  discarded pairwise (see also --pair-filter).

  -m LEN[:LEN2], --minimum-length LEN[:LEN2]
                        Discard reads shorter than LEN. Default: 0
  -M LEN[:LEN2], --maximum-length LEN[:LEN2]
                        Discard reads longer than LEN. Default: no limit
  --max-n COUNT         Discard reads with more than COUNT 'N' bases. If COUNT is a
                        number between 0 and 1, it is interpreted as a fraction of
                        the read length.
  --max-expected-errors ERRORS, --max-ee ERRORS
                        Discard reads whose expected number of errors (computed
                        from quality values) exceeds ERRORS.
  --discard-trimmed, --discard
                        Discard reads that contain an adapter. Use also -O to avoid
                        discarding too many randomly matching reads.
  --discard-untrimmed, --trimmed-only
                        Discard reads that do not contain an adapter.
  --discard-casava      Discard reads that did not pass CASAVA filtering (header
                        has :Y:).

Output:
  --quiet               Print only error messages.
  --report {full,minimal}
                        Which type of report to print: 'full' or 'minimal'.
                        Default: full
  -o FILE, --output FILE
                        Write trimmed reads to FILE. FASTQ or FASTA format is
                        chosen depending on input. Summary report is sent to
                        standard output. Use '{name}' for demultiplexing (see
                        docs). Default: write to standard output
  --fasta               Output FASTA to standard output even on FASTQ input.
  -Z                    Use compression level 1 for gzipped output files (faster,
                        but uses more space)
  --info-file FILE      Write information about each read and its adapter matches
                        into FILE. See the documentation for the file format.
  -r FILE, --rest-file FILE
                        When the adapter matches in the middle of a read, write the
                        rest (after the adapter) to FILE.
  --wildcard-file FILE  When the adapter has N wildcard bases, write adapter bases
                        matching wildcard positions to FILE. (Inaccurate with
                        indels.)
  --too-short-output FILE
                        Write reads that are too short (according to length
                        specified by -m) to FILE. Default: discard reads
  --too-long-output FILE
                        Write reads that are too long (according to length
                        specified by -M) to FILE. Default: discard reads
  --untrimmed-output FILE
                        Write reads that do not contain any adapter to FILE.
                        Default: output to same file as trimmed reads

Paired-end options:
  The -A/-G/-B/-U options work like their -a/-b/-g/-u counterparts, but are
  applied to the second read in each pair.

  -A ADAPTER            3' adapter to be removed from second read in a pair.
  -G ADAPTER            5' adapter to be removed from second read in a pair.
  -B ADAPTER            5'/3 adapter to be removed from second read in a pair.
  -U LENGTH             Remove LENGTH bases from second read in a pair.
  -p FILE, --paired-output FILE
                        Write second read in a pair to FILE.
  --pair-adapters       Treat adapters given with -a/-A etc. as pairs. Either both
                        or none are removed from each read pair.
  --pair-filter (any|both|first)
                        Which of the reads in a paired-end read have to match the
                        filtering criterion in order for the pair to be filtered.
                        Default: any
  --interleaved         Read and/or write interleaved paired-end reads.
  --untrimmed-paired-output FILE
                        Write second read in a pair to this FILE when no adapter
                        was found. Use with --untrimmed-output. Default: output to
                        same file as trimmed reads
  --too-short-paired-output FILE
                        Write second read in a pair to this file if pair is too
                        short.
  --too-long-paired-output FILE
                        Write second read in a pair to this file if pair is too
                        long.
(rnaseq) root 12:06:01 ~
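Of the options in the help output above, `-q` is the one that does the "trim a bit of low quality" job (Trim Galore's own quality cutoff defaults to 20). The cutadapt documentation says its quality trimming uses the same algorithm as BWA. In running-sum form: walk in from the 3' end adding (cutoff minus quality) for each base, stop once that sum turns negative, and cut just before the base where the sum peaked. Below is a minimal awk sketch of that rule; `qtrim_len` is a hypothetical helper, quality strings are assumed to be Phred+33 unless you pass 64 (compare `--quality-base`), and this is a teaching re-implementation, not cutadapt's own code.

```bash
#!/usr/bin/env bash
# qtrim_len CUTOFF QUALS [BASE]
# Hypothetical helper: print how many bases of one read survive 3'-end
# quality trimming under a BWA-style running-sum rule.
# BASE defaults to 33 (Phred+33); use 64 for some old Illumina files.
qtrim_len() {
  QUAL="$2" awk -v cutoff="$1" -v base="${3:-33}" 'BEGIN {
    qual = ENVIRON["QUAL"]        # read via the environment so backslashes stay literal
    for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c   # char -> ASCII code
    n = length(qual); s = 0; best = 0; keep = n
    for (i = n; i >= 1; i--) {                  # walk from the 3-prime end inward
      s += cutoff - (ord[substr(qual, i, 1)] - base)
      if (s < 0) break                          # good bases outweigh bad ones: stop
      if (s > best) { best = s; keep = i - 1 }  # new peak: cut just before base i
    }
    print keep
  }'
}

# Qualities 42 40 26 27 8 7 11 4 2 3 encoded as Phred+33, cutoff 10:
qtrim_len 10 'KI;<)(,%#$'   # prints 4 (the last six bases are trimmed)
```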

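The `--max-expected-errors` (`--max-ee`) filter in the help output above rests on a simple formula: a base with Phred quality Q is wrong with probability 10^(-Q/10), and a read's expected number of errors is the sum of those probabilities over all of its bases. With `--max-ee 1`, a read would be discarded only if that sum exceeds 1. The awk sketch below computes the sum for one quality string; `expected_errors` is a hypothetical helper, Phred+33 is assumed by default, and cutadapt's own implementation may differ in details such as rounding.

```bash
#!/usr/bin/env bash
# expected_errors QUALS [BASE]
# Hypothetical helper: print the sum of per-base error probabilities
# 10^(-Q/10) for one quality string. BASE defaults to 33 (Phred+33).
expected_errors() {
  QUAL="$1" awk -v base="${2:-33}" 'BEGIN {
    qual = ENVIRON["QUAL"]
    for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c   # char -> ASCII code
    ee = 0
    for (i = 1; i <= length(qual); i++) {
      q = ord[substr(qual, i, 1)] - base   # Phred score of base i
      ee += 10 ^ (-q / 10)                 # probability that base i is wrong
    }
    printf "%.4f\n", ee
  }'
}

expected_errors '++++'   # four Q10 bases, 4 x 0.1: prints 0.4000
```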
Trim Galore itself is just a simple Perl wrapper that bundles FastQC and cutadapt, yet it is extremely practical.

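That's because cutadapt's options really are complicated: there are five kinds of adapter alone, plus all sorts of other parameters. Come on, all I want is to strip the adapters and trim off low-quality bases. Can't you just do that automatically? Don't throw this many choices at someone as indecisive as me.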

Want automation? trim_galore fits the bill perfectly: no need to look up your adapters yourself, and quality filtering is fully automatic. Oh yeah.
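The adapter auto-detection is easy to picture. According to the Trim Galore documentation, it scans the first 1 million reads of the (first) input file for the first 12 to 13 bp of the Illumina universal (`AGATCGGAAGAGC`), Nextera (`CTGTCTCTTATA`) and small RNA (`TGGAATTCTCGG`) adapters and uses whichever occurs most often. The awk sketch below re-creates just that counting step on an uncompressed FASTQ; `count_adapters` is a hypothetical helper, and Trim Galore's real logic also handles ties and fallbacks that this ignores.

```bash
#!/usr/bin/env bash
# count_adapters FASTQ [MAX_READS]
# Hypothetical helper: within the first MAX_READS records (default 1000000)
# of an uncompressed FASTQ, count reads containing each adapter prefix,
# then print "name<TAB>count" sorted by count, highest first.
count_adapters() {
  awk -v max="${2:-1000000}" '
    BEGIN {
      adapter["Illumina"] = "AGATCGGAAGAGC"
      adapter["Nextera"]  = "CTGTCTCTTATA"
      adapter["smallRNA"] = "TGGAATTCTCGG"
    }
    NR % 4 == 2 {                       # line 2 of each 4-line record is the sequence
      for (name in adapter)
        if (index($0, adapter[name])) hits[name]++
      if (++reads >= max) exit          # exit still runs the END block
    }
    END {
      for (name in adapter) printf "%s\t%d\n", name, hits[name] + 0
    }' "$1" | sort -k2,2nr
}

# For gzipped input, decompress on the fly (bash process substitution):
# count_adapters <(gzip -dc sample_R1.fq.gz)
```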

It also plugs straight into MultiQC to produce an HTML report.

Usage is fairly simple and direct.
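A hedged sketch of a typical paired-end run: assuming the mates are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz` (adjust the suffixes to your data), the hypothetical helper `make_trim_cmds` prints one `trim_galore --paired --fastqc -o OUTDIR R1 R2` command per sample into a script you can review before running. `--paired`, `--fastqc` and `-o` are real Trim Galore options; everything else, such as the quality cutoff of 20, the minimum length of 20 bp and adapter auto-detection, is left at Trim Galore's defaults.

```bash
#!/usr/bin/env bash
# make_trim_cmds RAW_DIR OUT_DIR
# Hypothetical helper: print one trim_galore command per paired-end sample.
# Assumes mates are named <sample>_1.fq.gz and <sample>_2.fq.gz.
make_trim_cmds() {
  local dir="$1" outdir="$2" r1 r2
  for r1 in "$dir"/*_1.fq.gz; do
    [ -e "$r1" ] || continue              # no match: glob left unexpanded
    r2="${r1%_1.fq.gz}_2.fq.gz"           # swap the _1 suffix for _2
    if [ -e "$r2" ]; then
      # Only --paired, --fastqc and -o are set; quality cutoff, minimum
      # length and adapter auto-detection stay at Trim Galore defaults.
      printf 'trim_galore --paired --fastqc -o "%s" "%s" "%s"\n' "$outdir" "$r1" "$r2"
    else
      echo "skipping $r1: mate $r2 not found" >&2
    fi
  done
}

# Usage: generate the commands, look them over, then run them in the background.
# make_trim_cmds raw trimmed > trim.sh
# nohup bash trim.sh > trim.log 2>&1 &
```

Once the jobs finish, and assuming MultiQC is installed, pointing it at the output directory (for example `multiqc trimmed`) should collect the FastQC and trimming reports into a single HTML page.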

No other parameters need choosing; the defaults are fine. Pretty automated, isn't it?

See the documentation for details.

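© Copyright belongs to the author. For reposting or content collaboration, please contact the author.
Platform notice: The content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.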