
Note: later I may also build the mouse-derived x
References:

I use hg19 and hg38 here, and built indexes for several tools: bwa, hisat2, subread, bowtie2, and STAR.
For STAR, running on a server is recommended.

Getting the sequence source files

hg19

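UCSC download link for hg19
In the end I simply downloaded the single complete .fa file. Some guides suggest downloading other files as well, but I took the lazy route.
Download the following files: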

hg19.fa.gz - "Soft-masked" assembly sequence in one file. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case.
md5sum.txt - checksums of files in this directory

Check file integrity. This is a hash check: md5sum compares the file's MD5 digest with the published value.

cat md5sum.txt
# Extract hg19.fa.gz and its checksum
echo "806c02398f5ac5da8ffd6da2d1d5d1a9  hg19.fa.gz" > check_md5sum.txt
# Verify with md5sum
md5sum -c check_md5sum.txt
# If the check succeeds, it prints hg19.fa.gz: OK

Note that the command uses >, not >>. The > operator overwrites the file, while >> appends to the end of it.
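Instead of retyping the checksum by hand, the matching line can be pulled out of md5sum.txt with grep. The sketch below is only an illustration: it works in a temporary directory on made-up stand-in files (demo.fa.gz and other.fa.gz are hypothetical names, not the real downloads) and also shows the difference between > and >>.

```shell
set -eu
export LC_ALL=C                 # keep md5sum's "OK" message untranslated

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the real downloads (hypothetical contents).
printf 'ACGT\n' > demo.fa.gz
printf 'TTTT\n' > other.fa.gz
md5sum demo.fa.gz other.fa.gz > md5sum.txt

# '>' truncates the target first, so running this twice still leaves one line.
grep ' demo\.fa\.gz$' md5sum.txt > check_md5sum.txt
grep ' demo\.fa\.gz$' md5sum.txt > check_md5sum.txt
lines_after_overwrite=$(wc -l < check_md5sum.txt | tr -d ' ')

# '>>' appends, so the same line is now present twice.
grep ' demo\.fa\.gz$' md5sum.txt >> check_md5sum.txt
lines_after_append=$(wc -l < check_md5sum.txt | tr -d ' ')
echo "after >: $lines_after_overwrite line(s), after >>: $lines_after_append line(s)"

# Rebuild the one-line file and verify the download against it.
grep ' demo\.fa\.gz$' md5sum.txt > check_md5sum.txt
check_result=$(md5sum -c check_md5sum.txt)
echo "$check_result"
```

For the real files, the same grep pattern with hg19.fa.gz in place of demo.fa.gz should select the right line from the UCSC md5sum.txt.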

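If the check passes, decompress:

gzip -dk hg19.fa.gz
-d, --decompress  decompress  # without -d, gzip compresses instead
-k, --keep        keep (don't delete) input files  # keeps the source file; without -k the source file is deleted

Decompressing yields hg19.fa; then build the bwa index on this file.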
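To see what -k changes, here is a small sketch on a placeholder file (demo.fa is a made-up name, and the -k flag assumes a reasonably recent gzip). Plain gzip removes its input, while gzip -dk leaves the .gz next to the decompressed file.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Tiny placeholder FASTA, not real genome data.
printf '>chrDemo\nACGTacgt\n' > demo.fa

gzip demo.fa            # compress: demo.fa is replaced by demo.fa.gz
if [ -e demo.fa ]; then source_removed=no; else source_removed=yes; fi

gzip -dk demo.fa.gz     # -d decompress, -k keep demo.fa.gz as well
ls
```

Keeping the .gz is handy when the checksum may need to be rechecked later without downloading again.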

hg38

UCSC download link for hg38

The steps are exactly the same as for hg19. Download the following files:

hg38.fa.gz - "Soft-masked" assembly sequence in one file. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case.
md5sum.txt - checksums of files in this directory

GTF files

Various GTF files for hg38
Various GTF files for hg19
You can probably use the ens or known files; I downloaded knownGene.

bwa

hg19

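Syntax:

bwa index [ -p prefix ] [ -a algoType ] <in.db.fasta>
bwa index -a bwtsw hg19.fa
-p STR   Prefix of the output database. It defaults to the input file name, so the index is written to the directory containing the input file, with that file name as the prefix.
-a [is|bwtsw]   Index construction algorithm. is is relatively fast but needs more memory, and it stops working when the database is larger than 2 GB (older documentation lists it as the default; bwa 0.7.x chooses automatically when -a is omitted). bwtsw does not work for short references, which must be at least 10 MB, but it handles large genomes such as the whole human genome.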
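The size rule in the -a description can be written down as a small helper. This is only a sketch of that rule, not something bwa requires: pick_bwa_algo is a hypothetical function, the thresholds are the ones quoted above, and the file size in bytes is used as a rough stand-in for the sequence length.

```shell
pick_bwa_algo() {
    # $1: size of the reference FASTA in bytes.
    pba_two_gb=$((2 * 1024 * 1024 * 1024))
    if [ "$1" -gt "$pba_two_gb" ]; then
        echo bwtsw      # "is" is described as failing above 2 GB
    else
        echo is         # faster choice for smaller references
    fi
}

# Illustrative sizes: roughly a human genome, then a small 5 MB reference.
echo "3.2 GB reference -> $(pick_bwa_algo 3200000000)"
echo "5 MB reference   -> $(pick_bwa_algo 5000000)"
```

With a real file the size could be supplied as `pick_bwa_algo "$(wc -c < hg19.fa)"`.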

The output looks like this:

[BWTIncConstructFromPacked] 680 iterations done. 6241341392 characters processed.
[BWTIncConstructFromPacked] 690 iterations done. 6264217232 characters processed.
[bwt_gen] Finished constructing BWT in 695 iterations.
[bwa_index] 1878.26 seconds elapse.
[bwa_index] Update BWT... 15.69 sec
[bwa_index] Pack forward-only FASTA... 15.37 sec
[bwa_index] Construct SA from BWT and Occ... 941.87 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index -a bwtsw hg19.fa
[main] Real time: 2905.553 sec; CPU: 2872.593 sec

Five files are generated at the end: hg19.fa.amb, hg19.fa.ann, hg19.fa.bwt, hg19.fa.pac, and hg19.fa.sa
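Since indexing takes the better part of an hour, it can be worth checking that all five files exist and are non-empty before starting an alignment. A minimal sketch, where bwa_index_complete is a hypothetical helper and the demo files are empty-ish placeholders rather than a real index:

```shell
bwa_index_complete() {
    # $1: the FASTA path that was passed to "bwa index".
    for bic_ext in amb ann bwt pac sa; do
        if [ ! -s "$1.$bic_ext" ]; then
            echo "missing or empty: $1.$bic_ext" >&2
            return 1
        fi
    done
    return 0
}

# Demo on placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
for ext in amb ann bwt pac sa; do
    printf 'x' > "$demo_dir/demo.fa.$ext"
done

if bwa_index_complete "$demo_dir/demo.fa"; then
    echo "index looks complete"
fi
```

This only confirms that the files are present; it says nothing about whether their contents are valid.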

I won't go into what each of these five files is. Here is a reference link: BWA source code reading notes (2): what are the amb/ann/pac index files?

hg38

The steps are exactly the same as for hg19.

bwa index -a bwtsw hg38.fa

hisat2

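I downloaded this one directly from the official site. Link: hisat2 index download page

The ones I took correspond to UCSC hg19 and UCSC hg38. For how they differ from the other two, GRCh38 and GRCh37, see the link below:

Introduction to and differences between hg19, GRCh37, b37, and hs37d5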

hg19

The problem is that the sizes of the extracted files look a bit odd.

-rw-r--r-- 1 dick dick 926M 11月 20  2015 genome.1.ht2
-rw-r--r-- 1 dick dick 691M 11月 20  2015 genome.2.ht2
-rw-r--r-- 1 dick dick 4.8K 11月 20  2015 genome.3.ht2
-rw-r--r-- 1 dick dick 691M 11月 20  2015 genome.4.ht2
-rw-r--r-- 1 dick dick 1.2G 11月 20  2015 genome.5.ht2
-rw-r--r-- 1 dick dick 704M 11月 20  2015 genome.6.ht2
-rw-r--r-- 1 dick dick    8 11月 20  2015 genome.7.ht2
-rw-r--r-- 1 dick dick    8 11月 20  2015 genome.8.ht2
-rwxr-xr-x 1 dick dick 1.3K 11月 20  2015 make_hg19.sh

These sizes are not very convincing, so I decided to rebuild the index with my own hisat2.

hisat2-build [options]* <reference_in> <ht2_index_base>
reference_in            comma-separated list of files with ref sequences
hisat2_index_base       write ht2 data to files with this dir/basename
-p <int>                number of threads
hisat2-build -p 16 hg19.fa genome

Final output:

-rw-r--r-- 1 dick dick 926M 6月  22 17:05 genome.1.ht2
-rw-r--r-- 1 dick dick 691M 6月  22 17:05 genome.2.ht2
-rw-r--r-- 1 dick dick 4.8K 6月  22 16:52 genome.3.ht2
-rw-r--r-- 1 dick dick 691M 6月  22 16:52 genome.4.ht2
-rw-r--r-- 1 dick dick 1.2G 6月  22 17:07 genome.5.ht2
-rw-r--r-- 1 dick dick 704M 6月  22 17:07 genome.6.ht2
-rw-r--r-- 1 dick dick   12 6月  22 16:52 genome.7.ht2
-rw-r--r-- 1 dick dick    8 6月  22 16:52 genome.8.ht2

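The sizes match the downloaded index (apart from genome.7.ht2, 12 bytes versus 8), so this evidently is what the index should look like.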
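Comparing two listings by eye is error-prone, so the size comparison can be scripted. The sketch below assumes the downloaded and rebuilt indexes sit in two separate directories. compare_index_sizes is a hypothetical helper, and the demo files are dummies that only mimic the 8 versus 12 byte difference seen above.

```shell
compare_index_sizes() {
    # $1 and $2: the two index directories to compare.
    for cis_file in "$1"/*.ht2; do
        cis_name=$(basename "$cis_file")
        cis_size_a=$(wc -c < "$cis_file" | tr -d ' ')
        if [ -f "$2/$cis_name" ]; then
            cis_size_b=$(wc -c < "$2/$cis_name" | tr -d ' ')
        else
            cis_size_b=missing
        fi
        if [ "$cis_size_a" = "$cis_size_b" ]; then
            echo "$cis_name $cis_size_a $cis_size_b same"
        else
            echo "$cis_name $cis_size_a $cis_size_b DIFF"
        fi
    done
}

# Dummy files standing in for the two indexes.
downloaded=$(mktemp -d)
rebuilt=$(mktemp -d)
printf '12345678'     > "$downloaded/genome.7.ht2"
printf '123456789012' > "$rebuilt/genome.7.ht2"
printf '12345678'     > "$downloaded/genome.8.ht2"
printf '12345678'     > "$rebuilt/genome.8.ht2"

size_report=$(compare_index_sizes "$downloaded" "$rebuilt")
echo "$size_report"
```

Equal sizes do not prove equal contents. A byte-level check with cmp would be stricter, though two builds made with different hisat2 versions need not be byte-identical.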

hg38

Basically the same, so I'll just record the final output.

-rw-r--r-- 1 dick 974M 11月 20  2015 genome.1.ht2
-rw-r--r-- 1 dick 728M 11月 20  2015 genome.2.ht2
-rw-r--r-- 1 dick  15K 11月 20  2015 genome.3.ht2
-rw-r--r-- 1 dick 728M 11月 20  2015 genome.4.ht2
-rw-r--r-- 1 dick 1.3G 11月 20  2015 genome.5.ht2
-rw-r--r-- 1 dick 741M 11月 20  2015 genome.6.ht2
-rw-r--r-- 1 dick    8 11月 20  2015 genome.7.ht2
-rw-r--r-- 1 dick    8 11月 20  2015 genome.8.ht2
-rwxr-xr-x 1 dick 1.3K 11月 20  2015 make_hg38.sh

Total time for call to driver() for forward index: 00:16:53

subread

This tool seems fairly simple; a single command is enough.

subread-buildindex -o hg19 hg19.fa

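The interface is not very intuitive: what follows -o is the name of the index to generate. At first I assumed it was an output directory, and the documentation is not clear about it either.

The generated files are: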

-rw-r--r-- 1 dick dick 749M 6月  24 22:59 hg19.00.b.array
-rw-r--r-- 1 dick dick 4.9G 6月  24 22:59 hg19.00.b.tab
-rw-r--r-- 1 dick dick 5.5K 6月  24 22:57 hg19.files
-rw-r--r-- 1 dick dick    0 6月  24 22:46 hg19.log
-rw-r--r-- 1 dick dick 2.3K 6月  24 22:59 hg19.reads

hg38 works the same way.

subread-buildindex -o hg38 hg38.fa

bowtie2

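Prebuilt indexes seem to be downloadable online: Bowtie 2: Manual. The downloads are in the right-hand sidebar, and the documentation can be found there too. This tool's documentation is very well written.

However, only hg19 and GRCh38 are offered, and since I use hg38, I built the index manually.

The commands are: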

bowtie2-build --threads 16 hg19.fa hg19
bowtie2-build --threads 16 hg38.fa hg38
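The command should print many lines of output before it finishes. When it completes, the current directory will contain six new files whose names all start with hg19 and end with .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. These files make up the index, and you're done!
--threads 10: this option sets the number of threads and is definitely worth using; I have not looked into the others. (On my modest machine the maximum should be 16.)
Note: the command has more options, which are described in the documentation; I did not bother to study them.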
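Rather than hard-coding the thread count, it can be taken from the machine and capped. The sketch below only prints the commands it would run (a dry run), so it is safe to try without the FASTA files present. plan_bowtie2_builds is a hypothetical helper, and nproc is assumed to be available, with a fallback to one thread if it is not.

```shell
plan_bowtie2_builds() {
    # $1: maximum thread count; remaining arguments: genome basenames.
    pbb_threads=$(nproc 2>/dev/null || echo 1)
    if [ "$pbb_threads" -gt "$1" ]; then
        pbb_threads=$1
    fi
    shift
    for pbb_genome in "$@"; do
        echo "bowtie2-build --threads $pbb_threads $pbb_genome.fa $pbb_genome"
    done
}

# Cap at 16 threads, the figure mentioned above.
planned=$(plan_bowtie2_builds 16 hg19 hg38)
echo "$planned"
```

Once the printed commands look right, they can be run by piping them to a shell, for example `echo "$planned" | sh`.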

Amazingly, the index I built is slightly larger. Never mind, I'll use my own index for now.

star

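Note: STAR is very memory-hungry, so sparse mode (which I take to mean the --genomeSAsparseD option) may help. In the end I finished the run on a server.

Code: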

STAR  --runMode genomeGenerate \
    --genomeDir ref \
    --runThreadN 20 \
    --genomeFastaFiles reference.fa \
    --sjdbGTFfile reference.gtf

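Common parameters

--runThreadN <threads> : number of threads
--runMode genomeGenerate : set the mode to index building
--genomeDir <index output path> : the directory must be created first
--genomeFastaFiles <genome FASTA path> : multiple file paths are supported
--sjdbGTFfile <GTF path> : optional but highly recommended; improves alignment accuracy
--sjdbOverhang <read length - 1> : based on the length of the reads to be aligned later; for PE 100 reads, set it to 100-1=99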
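The parameters above can be assembled in one place, with the index directory created up front. This is a dry-run sketch: star_generate_cmd is a hypothetical helper that only prints the command, reference.fa and reference.gtf are the placeholder names from the example above, and paths containing spaces are not handled.

```shell
star_generate_cmd() {
    # $1 genome dir, $2 FASTA, $3 GTF, $4 read length, $5 threads
    mkdir -p "$1"    # create the index directory before STAR is called
    echo "STAR --runMode genomeGenerate --genomeDir $1 --runThreadN $5" \
         "--genomeFastaFiles $2 --sjdbGTFfile $3 --sjdbOverhang $(($4 - 1))"
}

# Demo with a temporary directory and PE 100 reads.
demo_root=$(mktemp -d)
star_cmd=$(star_generate_cmd "$demo_root/STAR/demo" reference.fa reference.gtf 100 20)
echo "$star_cmd"
```

Printing the command first makes it easy to confirm the overhang value and paths before committing to a long run.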

Pitfall 1: if --sjdbGTFfile or --sjdbFileChrStartEnd is set, --sjdbOverhang should be set as well. The value is the read length minus 1. As I recall my reads are 50 bp, so I'll try 49.
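When the read length is only remembered rather than known, it can be measured from the FASTQ itself. A minimal sketch, assuming an uncompressed four-line-per-record FASTQ; the demo file contains two made-up 50 bp reads.

```shell
workdir=$(mktemp -d)
fastq="$workdir/demo.fastq"

# Build two fake 50 bp reads (10 bases repeated 5 times).
seq50=$(printf 'ACGTACGTAC%.0s' 1 2 3 4 5)
qual50=$(printf 'IIIIIIIIII%.0s' 1 2 3 4 5)
{
    printf '@read1\n%s\n+\n%s\n' "$seq50" "$qual50"
    printf '@read2\n%s\n+\n%s\n' "$seq50" "$qual50"
} > "$fastq"

# Sequence lines are lines 2, 6, 10, ...; keep the longest one.
max_read_len=$(awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
                    END { print max + 0 }' "$fastq")
sjdb_overhang=$((max_read_len - 1))
echo "read length: $max_read_len, --sjdbOverhang $sjdb_overhang"
```

For a gzipped FASTQ the file would need to be decompressed into the awk command first, for example with `gzip -dc`.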

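Pitfall 2: the tool is extremely memory-hungry. With no memory limit, indexing may use upward of 30 GB, and then at the "sorting Suffix Array chunks and saving them to disk" step the process is reported as killed without any further message, so the memory needs to be limited.

Try limiting RAM with --limitGenomeGenerateRAM 10000000000. The option is awkward because the limit is given in bytes and has to be converted. My machine has 16 GB, which converts to roughly 17000000000 bytes. That worked, although with the limit in place the run seems to become extremely slow.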
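The byte conversion is easy to script. A small sketch: gib_to_bytes is a hypothetical helper that treats "16g" as 16 GiB (1024^3 bytes each), and the optional second part reads MemAvailable from /proc/meminfo, which assumes a Linux system.

```shell
gib_to_bytes() {
    # $1: size in GiB; prints the size in bytes.
    echo $(($1 * 1024 * 1024 * 1024))
}

ram_limit=$(gib_to_bytes 16)
echo "--limitGenomeGenerateRAM $ram_limit"

# Optional, Linux only: how much memory is available right now.
if [ -r /proc/meminfo ]; then
    avail_bytes=$(awk '/^MemAvailable:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo)
    echo "available now: ${avail_bytes:-unknown} bytes"
fi
```

16 GiB comes out as 17179869184 bytes, which matches the "roughly 17000000000" estimate. A limit chosen this way is the machine's total memory, so it should still be lowered to whatever is actually free.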

Pitfall 3: the memory limit must be set below the amount of free memory; otherwise you get terminate called after throwing an instance of 'std::bad_alloc'. From what I found, the fix seems to be --genomeChrBinNbits 11.

The number here depends on the number of scaffolds/sequences and the assembly size, and is calculated using the formula given in the manual: min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)])
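The formula can be evaluated with awk. In this sketch chr_bin_nbits is a hypothetical helper, the inputs are illustrative numbers rather than measured values for hg19 or hg38, and rounding the logarithm to the nearest integer is my assumption, since the formula itself does not say how to round.

```shell
chr_bin_nbits() {
    # $1 genome length (bases), $2 number of reference sequences, $3 read length
    awk -v g="$1" -v n="$2" -v r="$3" 'BEGIN {
        x = g / n                           # average reference length
        if (x < r) x = r                    # max(GenomeLength/NumberOfReferences, ReadLength)
        bits = int(log(x) / log(2) + 0.5)   # log2, rounded to nearest (assumption)
        if (bits > 18) bits = 18            # min(18, ...)
        print bits
    }'
}

# A 3 Gb assembly split into 100,000 scaffolds, 100 bp reads.
echo "fragmented assembly: $(chr_bin_nbits 3000000000 100000 100)"
# A 3 Gb assembly with only 25 sequences, 50 bp reads.
echo "few long chromosomes: $(chr_bin_nbits 3000000000 25 50)"
```

With few, long sequences the result is capped at 18, so lowering the value mainly matters for assemblies with very many scaffolds.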

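Note: here is a tutorial: Four ways to download genome annotation files (GFF, GTF)

In the end I downloaded from UCSC. The file obtained by following the tutorial was not very usable, so I suggest downloading directly from each genome's download page.

The hg38 download page has a genes folder, and so does the hg19 download page.

There are four kinds of GTF files: ens, known, ncbi, and ref. I downloaded knownGene, which I take to be the GTF of known genes.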

hg19

STAR  --runMode genomeGenerate \
    --limitGenomeGenerateRAM 10000000000 \
    --genomeDir ~/Circ/index/STAR/hg19/ \
    --genomeFastaFiles hg19.fa \
    --sjdbGTFfile ./gtf/hg19.knownGene.gtf \
    --sjdbOverhang 49 --runThreadN 16

A record of the output:

Workflow:
The tool first generates many chunked SA files, then merges all of them into one complete index.
Jul 06 09:39:06 ..... started STAR run
Jul 06 09:39:06 ... starting to generate Genome files
Jul 06 09:40:22 ..... processing annotations GTF
Jul 06 09:40:47 ... starting to sort Suffix Array. This may take a long time...
Jul 06 09:41:07 ... sorting Suffix Array chunks and saving them to disk...
        This step is very slow and generates many chunked SA files.
Jul 06 09:58:03 ... loading chunks from disk, packing SA...
Jul 06 10:00:27 ... finished generating suffix array
Jul 06 10:00:27 ... generating Suffix Array index
Jul 06 10:05:45 ... completed Suffix Array index
Jul 06 10:05:45 ..... inserting junctions into the genome indices
Jul 06 10:08:27 ... writing Genome to disk ...
Jul 06 10:08:30 ... writing Suffix Array to disk ...
Jul 06 10:10:51 ... writing SAindex to disk
Jul 06 10:11:02 ..... finished successfully

Output files

-rw-rw-r-- 1 dick dick  688 7月   6 09:40 chrLength.txt
-rw-rw-r-- 1 dick dick 2.0K 7月   6 09:40 chrNameLength.txt
-rw-rw-r-- 1 dick dick 1.3K 7月   6 09:40 chrName.txt
-rw-rw-r-- 1 dick dick 1021 7月   6 09:40 chrStart.txt
-rw-rw-r-- 1 dick dick  25M 7月   6 09:40 exonGeTrInfo.tab
-rw-rw-r-- 1 dick dick  11M 7月   6 09:40 exonInfo.tab
-rw-rw-r-- 1 dick dick 2.2M 7月   6 09:40 geneInfo.tab
-rw-rw-r-- 1 dick dick 3.0G 7月   6 10:08 Genome
-rw-rw-r-- 1 dick dick  640 7月   6 10:08 genomeParameters.txt
-rw-rw-r-- 1 dick dick  21K 7月   6 10:11 Log.out
-rw-rw-r-- 1 dick dick  23G 7月   6 10:10 SA
-rw-rw-r-- 1 dick dick 1.5G 7月   6 10:10 SAindex
-rw-rw-r-- 1 dick dick 7.4M 7月   6 10:05 sjdbInfo.txt
-rw-rw-r-- 1 dick dick 9.7M 7月   6 09:40 sjdbList.fromGTF.out.tab
-rw-rw-r-- 1 dick dick 6.6M 7月   6 10:05 sjdbList.out.tab
-rw-rw-r-- 1 dick dick 4.8M 7月   6 09:40 transcriptInfo.tab

hg38

STAR  --runMode genomeGenerate \
    --genomeDir ~/Circ/index/STAR/hg38/ \
    --genomeFastaFiles hg38.fa \
    --sjdbGTFfile ./gtf/hg38.knownGene.gtf \
    --sjdbOverhang 49 --runThreadN 25

Output files:

-rw-rw-r-- 1 dick dick 3.0K 7月   6 10:24 chrLength.txt
-rw-rw-r-- 1 dick dick  12K 7月   6 10:24 chrNameLength.txt
-rw-rw-r-- 1 dick dick 8.5K 7月   6 10:24 chrName.txt
-rw-rw-r-- 1 dick dick 4.9K 7月   6 10:24 chrStart.txt
-rw-rw-r-- 1 dick dick  51M 7月   6 10:24 exonGeTrInfo.tab
-rw-rw-r-- 1 dick dick  21M 7月   6 10:24 exonInfo.tab
-rw-rw-r-- 1 dick dick 9.1M 7月   6 10:24 geneInfo.tab
-rw-rw-r-- 1 dick dick 3.1G 7月   6 11:08 Genome
-rw-rw-r-- 1 dick dick  640 7月   6 11:08 genomeParameters.txt
-rw-rw-r-- 1 dick dick 6.6M 7月   6 11:10 Log.out
-rw-rw-r-- 1 dick dick  24G 7月   6 11:09 SA
-rw-rw-r-- 1 dick dick 1.5G 7月   6 11:10 SAindex
-rw-rw-r-- 1 dick dick  12M 7月   6 11:05 sjdbInfo.txt
-rw-rw-r-- 1 dick dick  17M 7月   6 10:24 sjdbList.fromGTF.out.tab
-rw-rw-r-- 1 dick dick  11M 7月   6 11:05 sjdbList.out.tab
-rw-rw-r-- 1 dick dick  16M 7月   6 10:24 transcriptInfo.tab

In the end the run finished on a server. A server really does matter: on a home computer the job may crash halfway through, or it may fill up the memory.

200826 Circ Journey 3: Building Human Genome Indexes
