一、先准备一个小数据集(这里用E.coli),将各个过程文件都生成出来
#!/bin/bash
wk_dir=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice
index=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice/index/E.coli_K12_MG1655.fa
genome=/media/whq/282A932A2A92F3D2/282A932A2A92F3D2/WHQ/WGS_practice/genome/E.coli_K12_MG1655.fa
#bwa+samtools
cd $wk_dir/fastq
bwa mem -t 3 -R '@RG\tID:foo\tPL:illumina\tSM:E.coli_K12' $index SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz |samtools sort -@ 3 -o SRR1770413.sorted.bam -
#markdup
gatk MarkDuplicates -I SRR1770413.sorted.bam -O SRR1770413.sorted.markdup.bam -M SRR1770413.sorted.bam.matrix
samtools index SRR1770413.sorted.markdup.bam
#gatk
gatk CreateSequenceDictionary -R ../genome/E.coli_K12_MG1655.fa -O ../genome/E.coli_K12_MG1655.dict
gatk HaplotypeCaller -R $genome --emit-ref-confidence GVCF -I SRR1770413.sorted.markdup.bam -O SRR1770413.g.vcf
gatk GenotypeGVCFs -R $genome -V SRR1770413.g.vcf -O SRR1770413.vcf
文件系统:
image.png
二、统计比对结果
https://www.jianshu.com/p/2107fc22e4a8
https://zhuanlan.zhihu.com/p/56430358
三、统计比对bam文件的深度和覆盖度
https://www.jianshu.com/p/2107fc22e4a8
https://zhuanlan.zhihu.com/p/56430358
四、统计bam上不同区间的深度和覆盖度
https://www.jianshu.com/p/dd9d015e090a?utm_campaign=haruki&utm_content=note&utm_medium=reader_share&utm_source=weixin
image.png
image.png
image.png
BamDeal_Linux statistics Coverage 的帮助文档
Usage: Coverage -l <bam.list> -r <Ref.fa> -o <outPrefix>
Usage: Coverage -i <A.bam B.bam> -r <Ref.fa> -o <outPrefix>
-i <str> input SAM/BAM files, delimited by space
-l <str> input list of SAM/BAM files
-o <str> prefix of output file
-r <str> input reference FASTA to get Depth-GC wig
-w <int> windows size for Depth-GC wig, default [10000]
-b <str> list of the regions of which the coverage and mean of depth would be given
-q <int> the quality to filter reads, default [10]
-d <int> Filter the duplicated read
-h show more details for help [hewm2008 v1.32]
一些计算coverage的示范操作
1. Coverage -i <A.bam B.bam> -r <Ref.fa> -o AAA -q Q
Coverage -l <bam.list> -r <Ref.fa> -o AAA -q Q
This will generate four files (GC with depth, depth frequency, depth along reads in reference FASTA and basic statistics) and output the result to current directory. Output files are named with the prefix AAA.
(1.1) the reads with quality lower than Q will be removed from analysis, default value of Q is 10.
2. Coverage -i <A.bam B.bam> -r <Ref.fa> -o AAA -b <bed.region>
Coverage -l <bam.list> -r <Ref.fa> -o AAA -b <bed.region>
This will generate the same outputs as the example above with an extra file giving coverage and mean of depth information for the regions listed in the <bed.region>. The <bed.region> is formatted as [Chr_name Start_site End_site]. For example:
Chr1 1 100
Chr1 1000 2000
Chr2 3 50
...
计算bam文件的区间覆盖度
mkdir BamDeal.res&&cd BamDeal.res
BamDeal_Linux statistics Coverage -i ../SRR1770413.sorted.bam -r ../../genome/E.coli_K12_MG1655.fa -o ./bamdeal.res -w 1000
#注意,如果输入时个bam的list,那计算的深度是所有bam在该位点的深度和!!!
结果展示
image.png
文件说明
less bamdeal.res.DepthGC.gz |head -4
#Depth Number
0 47037
1 2660
2 1561
less bamdeal.res.DepthGC.gz |head -10
##Bigin Depth(X) GC(%)
#>NC_000913.3 WindowsSite: 1000
1 63.95 50.70
1001 58.17 53.20
2001 60.37 51.90
3001 59.04 56.10
4001 62.20 53.20
5001 61.35 47.00
6001 53.03 51.40
7001 54.13 54.00
less bamdeal.res.depthsite.fa.gz |head -2#应该是每个位点的覆盖信息
>NC_000913.3
43 44 46 47 47 47 47 47 48 49 49 49 49 49 49 49 49 50 50 50 50 50 51 52 52 54 54 54 54 54 54 54 54 54 53 53 53 53 52 52 52 52 53 53 52 53 53 53 54 53 53 53 53 53 53 53 53 53 53 53 53 54 54 55 55 54 54 54 54 54 55 55 53 53 53 54 54 53 54 54 54 55 57 57 58 58 58 59 60 61 61 60 60 60 60 61 61 62 62 62 63 64 64 64 64 64 65 65 65 65 65 65 66 66 66 66 66 66 65 65 66 67 67 67 67 68 68 68 68 67 68 68 68 68 67 67 68 69 70 70 71 71 72 72 71 71 71 70 69 69 70 70 69 69 70 72 72 72
less bamdeal.res.stat
#ID Length CoveredBase TotalDepth Coverage% MeanDepth
NC_000913.3 4641652 4594615 267566554 98.99 57.64
#Genome 4641652 4594615 267566554 98.99 57.64
四:画circle图
https://www.jianshu.com/p/799ef8f2bfcb












网友评论