美文网首页高质量生信文章收录
不同的比对软件基因组索引构建

不同的比对软件基因组索引构建

作者: 佳名 | 来源:发表于2019-04-30 13:14 被阅读30次

1 Hisat2

hisat2-build -p 2 genome.fa genome

hisat2-build不支持压缩文件,运行完成后,生成后缀名为ht2的8个文件。

Usage: hisat2-build [options]* <reference_in> <ht2_index_base>
    reference_in            comma-separated list of files with ref sequences
    hisat2_index_base       write ht2 data to files with this dir/basename
Options:
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p                      number of threads
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4.ht2 (packed reference) portion
    -3/--justref            just build .3/.4.ht2 (packed reference) portion
    -o/--offrate <int>      SA is sampled every 2^offRate BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --localoffrate <int>    SA (local) is sampled every 2^offRate BWT chars (default: 3)
    --localftabchars <int>  # of chars consumed in initial lookup in a local index (default: 6)
    --snp <path>            SNP file name
    --haplotype <path>      haplotype file name
    --ss <path>             Splice site file name
    --exon <path>           Exon file name
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit

2 BWA

bwa index -p genome genome.fa

索引建立好之后,会生成5个文件,后缀分别为bwt,pac,ann,amb,sa
https://www.jianshu.com/p/1552cc6ac3be

3 Bowtie

bowtie1出现的早,所以对于测序长度在50bp以下的序列效果不错

bowtie-build GENOME.fa GENOME.fa

4 Bowtie2

bowtie2-build mm10.fa
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    --debug                 use the debug binary; slower, assertions enabled
    --sanitized             use sanitized binary; slower, uses ASan and/or UBSan
    --verbose               log the issued command
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --threads <int>         # of threads
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit

bowtie1和2的差别:

1,bowtie1出现的早,所以对于测序长度在50bp以下的序列效果不错,而bowtie2主要针对的是长度在50bp以上的测序的。
2,Bowtie 2支持有空位的比对
3,Bowtie 2支持局部比对,也可以全局比对
4,Bowtie 2对最长序列没有要求,但是Bowtie 1最长不能超过1000bp
5, Bowtie 2 allows alignments to [overlap ambiguous characters] (e.g. Ns) in the reference. Bowtie 1 does not.
6,Bowtie 2不能比对colorspace reads.

相关文章

网友评论

    本文标题:不同的比对软件基因组索引构建

    本文链接:https://www.haomeiwen.com/subject/qqomnqtx.html