This is Jia'ao!
On the last day of 2022, let's continue learning ATAC-Seq!
1 Calculating the insert (fragment) length
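ATAC-Seq QC metrics to check: the number of non-redundant, non-mitochondrial aligned fragments, the alignment rate, NRF, PBC1, PBC2, the number of peaks, the nucleosome-free region (NFR), TSS enrichment, FRiP, and IDR consistency between replicates.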
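NRF, PBC1 and PBC2 measure library complexity. Following the ENCODE definitions (Landt et al., 2012), NRF is the number of distinct read positions divided by the total number of reads, PBC1 is the number of positions seen exactly once divided by the number of distinct positions, and PBC2 is the number of positions seen exactly once divided by the number of positions seen exactly twice. The function below is a minimal sketch, assuming a BED6 file with one line per read (e.g. from bedtools bamtobed) in which reads sharing chromosome, start, end and strand count as duplicates; the name library_complexity is hypothetical. ENCODE's own pipeline keys paired-end data on the positions of both mates instead.

```shell
#!/usr/bin/env bash
# Minimal sketch: NRF, PBC1 and PBC2 from a BED6 read file on stdin.
# Assumption: one line per read; reads with the same chrom, start, end and
# strand (columns 1, 2, 3, 6) are treated as duplicates of one position.
library_complexity() {
  awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $6 }' \
    | sort | uniq -c \
    | awk '
        {
          total += $1              # all reads
          distinct++               # distinct positions
          if ($1 == 1) one++       # positions seen exactly once
          if ($1 == 2) two++       # positions seen exactly twice
        }
        END {
          if (total == 0) { print "no reads"; exit 1 }
          # PBC2 is undefined when no position occurs exactly twice
          pbc2 = (two > 0) ? sprintf("%.2f", one / two) : "NA"
          printf "NRF=%.2f PBC1=%.2f PBC2=%s\n", distinct / total, one / distinct, pbc2
        }'
}

# Usage (file name is illustrative):
# bedtools bamtobed -i 2-cell-1.last.bam | library_complexity
```

For four toy reads where two share the same position, the function prints NRF=0.75 PBC1=0.67 PBC2=2.00.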
The insert length is stored in column 9 (TLEN) of the BAM file; extract it, then summarize and plot it in R:
samtools view 2-ce11-2.last.bam | cut -f 9 >1.txt
sudo apt install r-base-core
$ R
> a=read.table('1.txt')
> dim(a)
[1] 7292144 1
> png('hist.png')
> hist(as.numeric(a[,1]))
> dev.off()
> q()
Because TLEN is negative for one mate of each pair, plot the absolute values with finer bins:
hist(abs(as.numeric(a[,1])), breaks=100)
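Taking abs() of TLEN counts every pair twice (once as +n for the leftmost mate and once as -n for its partner), which doubles the counts while leaving the shape of the histogram roughly unchanged. A minimal sketch of an alternative that counts each pair once, assuming SAM records on stdin and a hypothetical function name tlen_histogram, keeps only positive TLEN values and bins them with awk:

```shell
#!/usr/bin/env bash
# Sketch: fragment-length histogram from SAM records on stdin (e.g. the
# output of samtools view). Column 9 (TLEN) is positive for the leftmost
# mate and negative for its partner, so keeping TLEN > 0 counts each pair
# once; TLEN = 0 (mate unmapped or on another chromosome) is dropped.
tlen_histogram() {
  local bin=${1:-10}   # bin width in bp (assumed default: 10)
  awk -v bin="$bin" '
    /^@/ { next }                 # skip header lines if present
    $9 > 0 {
      b = int($9 / bin) * bin     # lower edge of the bin
      count[b]++
    }
    END { for (b in count) print b "\t" count[b] }
  ' | sort -k1,1n
}

# Usage (properly paired reads only, via samtools view -f 2):
# samtools view -f 2 2-cell-1.last.bam | tlen_histogram 10 > 2-cell-1.tlen.txt
```

The output has two columns, the bin's lower edge and the number of pairs, which can be plotted directly with barplot() in R.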
Batch script
## Create a plain-text file config.last.bam with two columns: the BAM file name and the sample name
2-cell-1.last.bam 2-cell-1.last
2-cell-2.last.bam 2-cell-2.last
2-cell-4.last.bam 2-cell-4.last
2-cell-5.last.bam 2-cell-5.last
## Extract column 9 (TLEN, the insert length) of each BAM file
cat config.last.bam | while read id;
do
arr=($id)
sample=${arr[0]}
sample_name=${arr[1]}
samtools view $sample | awk '{print $9}' > ${sample_name}.length.txt
done
## Prepare config.indel.length.distribution, the input file for batch plotting of the insert-length distributions in R
2-cell-1.last.length.txt 2-cell-1.last.length
2-cell-2.last.length.txt 2-cell-2.last.length
2-cell-4.last.length.txt 2-cell-4.last.length
2-cell-5.last.length.txt 2-cell-5.last.length
## With this file, the BAM files can be plotted in batch. Create the shell loop for the batch run:
cat config.indel.length.distribution | while read id;
do
arr=($id)
input=${arr[0]}
output=${arr[1]}
Rscript indel.length.distribution.R $input $output
done
##indel.length.distribution.R
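cmd=commandArgs(trailingOnly=TRUE);
input=cmd[1]; output=cmd[2];
a=abs(as.numeric(read.table(input)[,1]));
# the last break must reach max(a), otherwise hist() stops with
# "some 'x' not counted; maybe 'breaks' do not span range of 'x'"
top=ceiling(max(a)/10)*10;
png(file=paste0(output,".png"));
hist(a,
main="Insertion Size distribution",
ylab="Read Count",xlab="Insert Size",
xaxt="n",
breaks=seq(0,top,by=10)
);
axis(side=1,
at=seq(0,top,by=100),
labels=seq(0,top,by=100)
);
dev.off()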
2 Calculating the FRiP value
Fraction of reads in called peak regions
Fraction of reads in peaks (FRiP) - Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)
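Without -u, bedtools intersect prints one line per read-peak overlap, so a read that overlaps two peaks is counted twice; bedtools intersect -u -a reads.bed -b peaks.narrowPeak reports each read at most once, which matches the definition above. The toy awk function below (hypothetical name frip) implements the same definition on small files so the arithmetic is visible; it is an illustration under these assumptions, not a replacement for bedtools.

```shell
#!/usr/bin/env bash
# Toy sketch of FRiP: reads (BED, 2nd argument) overlapping at least one peak
# (BED/narrowPeak, 1st argument) divided by all reads. A read overlapping
# several peaks is still counted once. The nested loop is O(reads x peaks),
# so this is for illustration only.
frip() {
  local peaks=$1 reads=$2
  awk '
    NR == FNR {                                   # first file: store peaks per chromosome
      k = ++n[$1]
      ps[$1, k] = $2 + 0; pe[$1, k] = $3 + 0
      next
    }
    {                                             # second file: one read per line
      total++
      for (i = 1; i <= n[$1] + 0; i++)
        if ($2 + 0 < pe[$1, i] && $3 + 0 > ps[$1, i]) { hit++; break }   # half-open overlap
    }
    END {
      if (total == 0) { print "no reads"; exit 1 }
      printf "%d %d %.2f%%\n", hit, total, 100 * hit / total
    }
  ' "$peaks" "$reads"
}

# Usage: frip peaks.bed reads.bed   # prints: reads_in_peaks total_reads FRiP%
```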
bedtools intersect -a ../align/2-ceLL-1.bed -b 2-ceLL-1_peaks.narrowPeak |wc -l
148210
wc -l < ../align/2-ceLL-1.bed
5105844
wc -l < ../align/2-ceLL-1.raw.bed
5105844
ls *narrowPeak|while read id;
do
echo $id
bed=../align/$(basename $id "_peaks.narrowPeak").raw.bed
#ls -lh $bed
Reads=$(bedtools intersect -a $bed -b $id |wc -l|awk '{print $1}')
totalReads=$(wc -l $bed|awk '{print $1}')
echo $Reads $totalReads
echo '==> FRiP value:' $(bc <<< "scale=2;100*$Reads/$totalReads")'%'
done
2-ce11-2_peaks.narrowPeak
3420904 95149325
==> FRiP value: 3.59%
2-ce11-4_peaks.narrowPeak
1126859 29866961
==> FRiP value: 3.77%
2-ce11-5_peaks.narrowPeak
4259835 103697403
==> FRiP value: 4.10%
2-ceLL-1_peaks.narrowPeak
2488167 62365958
==> FRiP value: 3.98%
List only the matching .bam files and nothing else (the ? wildcard matches any single character):
$ ls 2-ce11-?.raw.bam
2-ce11-2.raw.bam 2-ce11-4.raw.bam 2-ce11-5.raw.bam
R packages can be used to check the overlap between different peak files:
if(F){
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/")
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
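# biocLite.R is deprecated; install BiocManager from CRAN if needed
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")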
BiocManager::install('ChIPseeker')
BiocManager::install('ChIPpeakAnno')
}
library(ChIPseeker)
library(ChIPpeakAnno)
list.files('D:/ATAC-Seq/數據/',"\\.narrowPeak$")
tmp=lapply(list.files('D:/ATAC-Seq/數據/',"\\.narrowPeak$"),function(x){
return(readPeakFile(file.path('D:/ATAC-Seq/數據/', x)))
})
ol <- findOverlapsOfPeaks(tmp[[1]],tmp[[4]])
png('overlapVenn.png')
makeVennDiagram(ol)
dev.off()
3 IDR calculation
The dedicated IDR software can also be used for this calculation; it takes into account both the overlap between peaks and the consistency of their enrichment.
Detailed tutorial:
http://www.lxweimin.com/p/d8a7056b4294
source activate atac
# search for the package first
conda search -c bioconda idr
source deactivate
## make sure all the software is installed in the py3 environment
conda create -n py3 -y python=3
conda activate py3
conda install -y -c bioconda idr
idr -h
idr --samples 2-ceLL-1_peaks.narrowPeak 2-ce11-2_peaks.narrowPeak --plot
idr --samples 2-ceLL-1_peaks.narrowPeak 2-ce11-2_peaks.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--output-file sample-idr \
--plot \
--log-output-file sample.idr.log
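According to the IDR documentation, column 5 of the output (sample-idr here) holds a scaled IDR, min(int(-125 * log2(IDR)), 1000), so an IDR threshold of 0.05 corresponds to a score of 540. A sketch with hypothetical helper names (idr_score, filter_idr) converts a threshold to that score and keeps the peaks that pass it; because of the int() rounding the cut is approximate.

```shell
#!/usr/bin/env bash
# Sketch: convert an IDR threshold into the scaled score the idr tool writes
# to column 5, i.e. min(int(-125 * log2(IDR)), 1000).
idr_score() {
  awk -v idr="$1" 'BEGIN {
    s = int(-125 * log(idr) / log(2)) + 0   # "+ 0" turns -0 into 0 when IDR = 1
    if (s > 1000) s = 1000
    print s
  }'
}

# Keep peaks whose IDR is (approximately) at or below the threshold.
filter_idr() {
  local threshold=$1 file=$2
  awk -v min="$(idr_score "$threshold")" '$5 >= min' "$file"
}

# Usage: filter_idr 0.05 sample-idr > sample-idr.0.05.narrowPeak
```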
4 Visualization with deepTools
The .bam files first need to be converted to .bw (bigWig):
http://www.bio-info-trainee.com/1815.html
cd ~/project/atac/align
source activate atac
# ls *.bam |xargs -i samtools index {}
ls *last.bam |while read id;do
nohup bamCoverage -p 5 --normalizeUsing CPM -b $id -o ${id%%.*}.last.bw &
done
cd dup
ls *.bam | xargs -I {} samtools index {}
ls *.bam |while read id;do
nohup bamCoverage --normalizeUsing CPM -b $id -o ${id%%.*}.rm.bw &
done
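With --normalizeUsing CPM, bamCoverage scales each bin as the reads in the bin divided by the number of mapped reads in millions (its default bin size is 50 bp). A toy awk sketch, assuming a bedGraph-like input of chrom, start, end and raw count plus a known total of mapped reads (the function name to_cpm is hypothetical), shows the arithmetic:

```shell
#!/usr/bin/env bash
# Toy sketch of CPM scaling: CPM = count / (total mapped reads / 1e6).
# Input on stdin: chrom, start, end, raw read count per bin.
to_cpm() {
  local total=$1   # total number of mapped reads in the library
  awk -v total="$total" 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4 * 1e6 / total }'
}

# Usage: a bin with 50 reads in a library of 25 million mapped reads
# printf 'chr1\t0\t50\t50\n' | to_cpm 25000000   # -> chr1 0 50 2 (tab-separated)
```

Because every sample is divided by its own library size, CPM tracks from deeply and shallowly sequenced libraries can be compared on the same scale in IGV or computeMatrix.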
Visualizing the .bw files in IGV
Signal intensity around the TSS
## both -R and -S can accept multiple files
mkdir -p ~/project/atac/tss
cd ~/project/atac/tss
source activate atac
computeMatrix reference-point --referencePoint TSS -p 15 \
-b 10000 -a 10000 \
-R /home/kaoku/refer/mm10/ucsc.refseq.bed \
-S /home/kaoku/project/atac/align/*.bw \
--skipZeros -o matrix1_test_TSS.gz \
--outFileSortedRegions regions1_test_genes.bed
## both plotHeatmap and plotProfile will use the output from computeMatrix
plotHeatmap -m matrix1_test_TSS.gz -out test_Heatmap.png
plotHeatmap -m matrix1_test_TSS.gz -out test_Heatmap.pdf --plotFileFormat pdf --dpi 720
plotProfile -m matrix1_test_TSS.gz -out test_Profile.png
plotProfile -m matrix1_test_TSS.gz -out test_Profile.pdf --plotFileFormat pdf --perGroup --dpi 720
Download the reference .bed:
http://genome.ucsc.edu/cgi-bin/hgTables
## Conversion details:
http://www.lxweimin.com/p/5d078d517770
The resulting heatmap
Signal intensity across gene bodies
source activate atac
computeMatrix scale-regions -p 15 \
-R /home/kaoku/refer/mm10/ucsc.refseq.bed \
-S /home/kaoku/project/atac/align/*.bw \
-b 10000 -a 10000 \
--skipZeros -o matrix1_test_body.gz
plotHeatmap -m matrix1_test_body.gz -out ExampleHeatmap1.png
plotHeatmap -m matrix1_test_body.gz -out test_body_Heatmap.png
plotProfile -m matrix1_test_body.gz -out test_body_Profile.png
The resulting heatmap
ngsplot can do this as well.
The batch commands above profile the genome-wide signal relative to gene features: computeMatrix computes the signal matrix, plotHeatmap visualizes the coverage as a heatmap, and plotProfile shows it as a line plot.
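computeMatrix has two modes, scale-regions and reference-point. The former shows how the signal is distributed across a region (every region is scaled to the same length), the latter how it is distributed relative to a single point. In either mode two parameters are required: -S supplies the bigWig file(s) and -R supplies the gene annotation (regions in BED format).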
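To make reference-point mode concrete: each region in -R is reduced to one anchor (with --referencePoint TSS, the start of a + strand region and the end of a - strand region), and -b/-a define a strand-aware window around it that computeMatrix then splits into bins (10 bp by default) to average the signal. The sketch below only computes those windows from a BED6 file; the function name tss_windows is hypothetical, the windows are clipped at coordinate 0, and deepTools' exact off-by-one handling is not reproduced.

```shell
#!/usr/bin/env bash
# Sketch: strand-aware windows around the TSS, mirroring the idea behind
# computeMatrix reference-point. Input: BED6 on stdin. Args: upstream (-b)
# and downstream (-a) distances in bp. Assumes the TSS is column 2 for "+"
# regions and column 3 for "-" regions.
tss_windows() {
  local up=$1 down=$2
  awk -v up="$up" -v down="$down" 'BEGIN { OFS = "\t" } {
    if ($6 == "-") { tss = $3; s = tss - down; e = tss + up }   # upstream lies to the right
    else           { tss = $2; s = tss - up;   e = tss + down }
    if (s < 0) s = 0                                            # clip at chromosome start
    print $1, s, e, $4, $5, $6
  }'
}

# Usage: tss_windows 10000 10000 < /home/kaoku/refer/mm10/ucsc.refseq.bed
```

For a minus-strand gene the upstream flank sits at higher coordinates, which is why -b and -a swap sides; this is what lets the profiles of both strands line up at the TSS.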
## deepTools official documentation
https://deeptools.readthedocs.io/en/develop/content/tools/computeMatrix.html#id10
Addendum:
Check running processes:
top
Colored interface:
htop
Next comes peak annotation.
See you in the next post!