What are the Illumina BCL and FASTQ formats?
Reference: https://zhuanlan.zhihu.com/p/26506787
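Data coming off an Illumina sequencer are usually in BCL format, in which the reads of all samples sequenced in the same flow cell lane are mixed together. After conversion, each lane yields one fastq.gz file for each of its n samples plus one Undetermined fastq.gz file for reads whose index matches none of them. Samples are told apart by their index sequences.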
Illumina's official bcl2fastq software splits the data by index sequence and converts it into one FASTQ file per sample. A FASTQ file is laid out as follows:
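Each read occupies four lines. Line 1 is the read ID; when a sample sheet is used, its last few characters are the index sequence. Line 2 is the DNA sequence from your library (the index and barcode primer sequences should already have been removed). Line 3 is a separator starting with "+", sometimes followed by the read ID again; it does not indicate strand and carries no real information. Line 4 gives the sequencing quality of each base, one ASCII character per base. That is what a FASTQ file looks like.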
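The following is a minimal Python sketch of the four-line structure. It splits FASTQ text into records, pulls the index from the end of the header, and decodes the quality line. It assumes Phred+33 quality encoding, which is what current Illumina software writes. The example read, its header, and the names `FastqRecord` and `parse_fastq` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FastqRecord:
    header: str     # line 1 without the leading "@"
    sequence: str   # line 2: the called bases
    separator: str  # line 3: "+", optionally followed by the read ID again
    quality: str    # line 4: one ASCII quality character per base

    def index_sequence(self) -> str:
        """Last ':'-separated header field (assumes a sample sheet was used)."""
        return self.header.rsplit(":", 1)[-1]

    def phred_scores(self, offset: int = 33) -> List[int]:
        """Decode the quality line; Phred+33 is assumed by default."""
        return [ord(ch) - offset for ch in self.quality]


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield FASTQ records four lines at a time, with basic sanity checks."""
    lines = text.rstrip("\n").splitlines() if text.strip() else []
    if len(lines) % 4:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield FastqRecord(header[1:], seq, sep, qual)


# A made-up read with a bcl2fastq-style header ending in the index ATCACG.
EXAMPLE = (
    "@M00528:300:000000000-AP0TP:1:1101:15589:1331 1:N:0:ATCACG\n"
    "GATTTGGGGTTCAAAGCAGT\n"
    "+\n"
    "IIIIIIIIIIHHHHHGGGG#\n"
)

if __name__ == "__main__":
    read = next(parse_fastq(EXAMPLE))
    print(read.index_sequence())                           # ATCACG
    print(read.phred_scores()[0], read.phred_scores()[-1])  # 40 2
```

Under Phred+33, "I" (ASCII 73) is Q40 and "#" (ASCII 35) is Q2. Real files are usually gzipped and large, so in practice you would stream them line by line rather than load the whole text.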
#Common software#
I used to combine cutadapt with the FASTX-Toolkit until colleagues recommended Trim Galore (a wrapper around cutadapt and FastQC); for quality assessment I use FastQC.
BCL files are created by Illumina DNA sequencing instruments (HiSeq or MiSeq). They can be analyzed with the CASAVA pipeline and can also be converted into QSEQ-format files.
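The following Python sketch shows what a classic per-cycle BCL file contains. It assumes the layout described in Illumina's bcl2fastq documentation: a little-endian 32-bit cluster count, followed by one byte per cluster. In each byte, bits 0-1 encode the base (A, C, G, T = 0-3) and bits 2-7 encode the quality, and a zero byte means no call. `.bcl.gz` files are assumed to be plain gzip. The newer CBCL format is not covered, and the function names are hypothetical.

```python
import gzip
import struct
from typing import List, Tuple

BASES = "ACGT"  # 2-bit base codes 0, 1, 2, 3


def decode_bcl(data: bytes) -> List[Tuple[str, int]]:
    """Decode one uncompressed per-cycle BCL buffer into (base, quality) pairs.

    Assumed layout: little-endian uint32 cluster count, then one byte per
    cluster with the base in bits 0-1 and the quality in bits 2-7.
    A byte of 0 is a no-call, reported here as ('N', 0).
    """
    if len(data) < 4:
        raise ValueError("BCL data too short for the header")
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError("truncated BCL data")  # e.g. a corrupted file
    return [("N", 0) if b == 0 else (BASES[b & 0b11], b >> 2) for b in body]


def read_bcl(path: str) -> List[Tuple[str, int]]:
    """Read a .bcl file, or a gzip-compressed .bcl.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return decode_bcl(fh.read())


if __name__ == "__main__":
    # Three made-up clusters: A with Q40, T with Q30, and a no-call.
    demo = struct.pack("<I", 3) + bytes([(40 << 2) | 0, (30 << 2) | 3, 0])
    print(decode_bcl(demo))  # [('A', 40), ('T', 30), ('N', 0)]
```

Each BCL file holds only one base per cluster for one cycle. Converting to FASTQ therefore means collecting the same cluster position across all cycle files, which is essentially what bcl2fastq does.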
Location of the BCL files:
On a MiSeq, the BCL files are located at, e.g., /sequencedata/MiSeq/170808_M00528_0300_000000000-AP0TP/Data/Intensities/BaseCalls/L001/C1.1
Our MiSeq data are transferred to the server automatically, so after connecting to the server we can go straight into this folder.
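The example path encodes several pieces of run information. The sketch below splits the run folder name and rebuilds the per-lane, per-cycle directory. It assumes that run folders follow the usual `YYMMDD_<instrument>_<run number>_<flowcell>` convention and that cycle directories are named `C<cycle>.1` under `L<lane, 3 digits>`, as in the example path. `RunInfo`, `parse_run_folder` and `cycle_dir` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass
class RunInfo:
    run_date: date
    instrument: str
    run_number: int
    flowcell: str


def parse_run_folder(name: str) -> RunInfo:
    """Split a name of the assumed form YYMMDD_<instrument>_<run>_<flowcell>."""
    date_part, instrument, run_number, flowcell = name.split("_", 3)
    yy, mm, dd = int(date_part[:2]), int(date_part[2:4]), int(date_part[4:6])
    return RunInfo(date(2000 + yy, mm, dd), instrument, int(run_number), flowcell)


def cycle_dir(run_dir: str, lane: int, cycle: int) -> PurePosixPath:
    """Directory holding the BCL files of one lane and one cycle, e.g. L001/C1.1."""
    return (PurePosixPath(run_dir) / "Data" / "Intensities" / "BaseCalls"
            / f"L{lane:03d}" / f"C{cycle}.1")


if __name__ == "__main__":
    run_dir = "/sequencedata/MiSeq/170808_M00528_0300_000000000-AP0TP"
    print(parse_run_folder(PurePosixPath(run_dir).name))
    print(cycle_dir(run_dir, lane=1, cycle=1))
```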
Installing bcl2fastq2 and its dependencies (gcc, boost, cmake, etc.)
bcl2fastq2 Conversion v2.19 user guide
Common issues with bcl2fastq:
KNOWN ISSUES:
• Corrupted *.bcl or *.bcl.gz files may cause bcl2fastq to stall indefinitely.
• No index sequences are included in the header for each read in the resulting FASTQ files if bcl2fastq is run without providing a sample sheet file.
• The HTML report files will not display statistics for samples and projects named “default”, “all”, “unknown”, and “undetermined”.
• The HTML report, Stats.json, and ConversionStats.xml files incorrectly report the % ≥ Q30 metric by excluding bases with quality score 30 (i.e., the number reported is actually % > Q30).
• 5' adapter trimming is not supported.
• “N” is incorrectly allowed as an index sequence character in the sample sheet. When used, this will cause a mismatch for any sequence character other than “N”.
• No warnings or errors are displayed when bcl2fastq is used to process run folders that are missing control files.
• Sample sheet files generated from Illumina Experiment Manager may cause bcl2fastq to abort if they contain non-ASCII characters. Only alphanumeric characters, dashes, and underscores are allowed in the sample sheet.
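Two of the issues above are easy to check yourself. The sketch below has two parts. The first computes both versions of the Q30 metric, showing how excluding bases scored exactly 30 lowers the reported value. The second flags sample names that break the ASCII-only rule or that the HTML report hides. It assumes the reserved-name check is case-insensitive, and all names in it are hypothetical.

```python
import re
from typing import Iterable, List

# Sample/project names the HTML report does not display (case-insensitivity assumed).
RESERVED_NAMES = {"default", "all", "unknown", "undetermined"}
# Only ASCII letters, digits, dashes and underscores, per the last known issue.
SAFE_FIELD = re.compile(r"[A-Za-z0-9_-]+")


def percent_q30(scores: Iterable[int], inclusive: bool = True) -> float:
    """Percentage of bases passing the Q30 threshold.

    inclusive=True gives the intended '% >= Q30'; inclusive=False mimics the
    affected reports, which exclude bases scored exactly 30 ('% > Q30').
    """
    scores = list(scores)
    if not scores:
        return 0.0
    passing = sum(1 for q in scores if (q >= 30 if inclusive else q > 30))
    return 100.0 * passing / len(scores)


def problem_names(names: Iterable[str]) -> List[str]:
    """Names with disallowed characters, or names hidden from the HTML report."""
    return [n for n in names
            if not SAFE_FIELD.fullmatch(n) or n.lower() in RESERVED_NAMES]


if __name__ == "__main__":
    qualities = [30, 30, 31, 20]
    print(percent_q30(qualities), percent_q30(qualities, inclusive=False))  # 75.0 25.0
    print(problem_names(["Sample_1", "sample-2", "樣品3", "Undetermined"]))
    # ['樣品3', 'Undetermined']
```

The gap between the two Q30 numbers depends on how many bases sit exactly at Q30. On instruments with binned quality scores that can be a large fraction, so the difference is not just a rounding detail.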
Correct sample sheet format for bcl2fastq:
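Freshly generated Illumina data are per-cycle BCL basecall files, but downstream analysis generally needs FASTQ files. So before any downstream analysis, configureBclToFastq.pl from the CASAVA software is used to split the BCL data by the index attached to each sample and convert it into FASTQ files (in bcl2fastq2 this script is replaced by the single bcl2fastq command). The bcl2fastq documentation often uses the term demultiplexing: separating multiplexed reads, from the same lane or from different lanes, according to their index to produce each sample's FASTQ file. This is the step that requires a correct SampleSheet.csv.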
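The demultiplexing idea can be sketched in a few lines of Python. Each read's index is compared with every sample's index, allowing up to one mismatch (bcl2fastq's default for `--barcode-mismatches`). Reads that match no sample go to Undetermined. This is a simplification, not bcl2fastq's actual implementation: a read matching two samples is also sent to Undetermined here. The sample names and index sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, samples: Dict[str, str], max_mismatches: int = 1) -> str:
    """Sample whose index is within max_mismatches of the read's index.

    Reads matching no sample, or (a simplification here) more than one,
    go to 'Undetermined'.
    """
    hits = [name for name, idx in samples.items()
            if hamming(read_index, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "Undetermined"


def demultiplex(reads: Iterable[Tuple[str, str]], samples: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Group (read_id, index) pairs by sample, one bin per output FASTQ."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, index in reads:
        bins[assign_sample(index, samples, max_mismatches)].append(read_id)
    return dict(bins)


if __name__ == "__main__":
    sheet = {"SampleA": "ATCACG", "SampleB": "CGATGT"}  # hypothetical sample sheet rows
    reads = [("r1", "ATCACG"), ("r2", "ATCACT"), ("r3", "CGATGA"), ("r4", "TTTTTT")]
    print(demultiplex(reads, sheet))
    # {'SampleA': ['r1', 'r2'], 'SampleB': ['r3'], 'Undetermined': ['r4']}
```

This also explains why index sequences in the sample sheet must be well separated. If two sample indexes differ by only one base, a one-mismatch tolerance can no longer tell them apart.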
The whole process can be done with a single command; first, the code:
Reference: chen_amiao's blog
The following is based on:
bcl2fastq is Illumina's official software for converting BCL files into FASTQ files.
Search Google or the official website for the latest version: https://support.illumina.com/downloads/bcl2fastq-conversion-software-v217.html
Download:
bcl2fastq2 Conversion Software v2.17 Installer (Linux tarball): installation source
bcl2fastq2 Conversion Software v2.17 Guide (15051736 G): documentation (PDF)
Preparing the environment on Ubuntu 14.04:
To build bcl2fastq2 Conversion Software v2.17, you need the following software. Versions listed are tested and supported; newer versions are untested.
• gcc 4.7 (with support for C++11)
• boost 1.54 (with its dependencies)
• CMake 2.8.9
• zlib
• librt
• libpthread
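Because only the listed versions are tested, it can help to see how your installed tools compare before building. The following Python sketch reads the first line of `gcc --version` and `cmake --version` and classifies each version against the list. It assumes the first dotted number in that line is the tool's version. Boost has no `--version` command, so its entry is only there for manual comparison. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Tested versions from the requirements list above.
TESTED = {"gcc": "4.7", "cmake": "2.8.9", "boost": "1.54"}


def version_tuple(text: str) -> Tuple[int, ...]:
    """First dotted number in text, e.g. 'cmake version 2.8.12.2' -> (2, 8, 12, 2)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def classify(found: str, tested: str) -> str:
    """Compare found with tested, using only as many fields as tested has."""
    want = version_tuple(tested)
    have = (version_tuple(found) + (0,) * len(want))[:len(want)]
    if have < want:
        return "older than tested"
    return "tested" if have == want else "newer (untested)"


def tool_version(cmd: str) -> Optional[str]:
    """First line of `<cmd> --version`, or None when cmd is not on PATH."""
    if shutil.which(cmd) is None:
        return None
    out = subprocess.run([cmd, "--version"], capture_output=True, text=True).stdout
    return out.splitlines()[0] if out.strip() else None


if __name__ == "__main__":
    for tool in ("gcc", "cmake"):  # boost has no --version command
        line = tool_version(tool)
        status = "not found" if line is None else classify(line, TESTED[tool])
        print(f"{tool}: {status}")
```

On Ubuntu 14.04 you would typically see gcc 4.8 and CMake 2.8.12 reported as "newer (untested)", which the build steps below appear to work with.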
System: Bio-Linux 8 (based on Ubuntu 14.04)
1. Update the software (prepare the build environment)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install zlib1g-dev # zlib headers; the zlibc package is an unrelated library
sudo apt-get install libc6 # provides librt and libpthread
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install libboost1.54-all-dev
sudo apt-get install cmake
# Set variables
export TMP=/tmp
export SOURCE=${TMP}/bcl2fastq
export BUILD=${TMP}/bcl2fastq2-v2.17.1.14-build
export INSTALL_DIR=/usr/local/bcl2fastq2-v2.17.1.14
cd ${TMP}
# The package is stored in /home/me/下載/bcl2fastq2/ (下載 is the Downloads folder)
tar -xvzf /home/me/下載/bcl2fastq2/bcl2fastq2-v2.17.1.14.tar.gz
mkdir ${BUILD}
cd ${BUILD}
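${SOURCE}/src/configure --prefix=${INSTALL_DIR}
# If the previous step succeeded, continue; if it failed, some packages are probably missing, so update the dependencies and try again
make
sudo make install # INSTALL_DIR is under /usr/local, which needs root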
################# Test ##################
/usr/local/bcl2fastq2-v2.17.1.14/bin/bcl2fastq -v
2. Show the run options (-h)
/usr/local/bcl2fastq2-v2.17.1.14/bin/bcl2fastq -h
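Once `-h` works, a conversion run mostly comes down to a few options. The following Python sketch builds the argument list for such a run using `--runfolder-dir`, `--output-dir`, `--sample-sheet` and `--barcode-mismatches` from bcl2fastq2. Check them against the `-h` output of your own version. The binary path matches the install step above. The output directory and the helper name `bcl2fastq_command` are hypothetical.

```python
from typing import List, Optional

# Install location used in the build steps above.
BCL2FASTQ = "/usr/local/bcl2fastq2-v2.17.1.14/bin/bcl2fastq"


def bcl2fastq_command(run_dir: str, output_dir: str,
                      sample_sheet: Optional[str] = None,
                      barcode_mismatches: int = 1,
                      binary: str = BCL2FASTQ) -> List[str]:
    """Build the argument list for a basic conversion/demultiplexing run."""
    if barcode_mismatches not in (0, 1, 2):
        raise ValueError("--barcode-mismatches takes 0, 1 or 2")
    cmd = [binary,
           "--runfolder-dir", run_dir,
           "--output-dir", output_dir,
           "--barcode-mismatches", str(barcode_mismatches)]
    if sample_sheet is not None:  # otherwise bcl2fastq looks for <run_dir>/SampleSheet.csv
        cmd += ["--sample-sheet", sample_sheet]
    return cmd


if __name__ == "__main__":
    run = "/sequencedata/MiSeq/170808_M00528_0300_000000000-AP0TP"
    cmd = bcl2fastq_command(run, "/tmp/fastq_out")  # hypothetical output directory
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a string avoids shell-quoting problems with unusual paths.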
Reference: http://nhoffman.github.io/borborygmi/compiling-bcl2fastq-on-ubuntu.html#sec-2