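NCBI BLAST is a widely used sequence search tool that covers both protein and nucleotide sequences. The web interface is usually enough for queries, but development work sometimes needs a local database and local programs. NCBI provides the BLAST+ toolkit, which bundles a number of BLAST tools; for an introduction, see the two documents (books) that NCBI provides.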
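Download and installation
Downloading BLAST+
- BLAST+ program download: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- Pick the installer for your platform, for example the `dmg` build for Mac or the `-win64.exe` build for Windows.
- On Mac, the toolkit is installed in `/usr/local/ncbi/blast`.
- The install directory contains two subfolders, `bin` and `doc`. `bin` holds the executable programs, which are described below: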
| Program | Function |
|---|---|
| blastdbcheck | Checks the integrity of a BLAST database |
| blastdbcmd | Retrieves sequences or other information from a BLAST database |
| blastdb_aliastool | Creates database alias (to tie volumes together for example) |
| blastn | Searches a nucleotide query against a nucleotide database |
| blastp | Searches a protein query against a protein database |
| blastx | Searches a nucleotide query, dynamically translated in all six frames, against a protein database |
| blast_formatter | Formats a blast result using its assigned request ID (RID) or its saved archive |
| convert2blastmask | Converts lowercase masking into makeblastdb readable data |
| deltablast | Searches a protein query against a protein database, using a more sensitive algorithm |
| dustmasker | Masks the low complexity regions in the input nucleotide sequences |
| legacy_blast.pl | Converts a legacy blast search command line into blast+ counterpart and execute it |
| makeblastdb | Formats input FASTA file(s) into a BLAST database |
| makembindex | Indexes an existing nucleotide database for use with megablast |
| makeprofiledb | Creates a conserved domain database from a list of input position specific scoring matrix (scoremats) generated by psiblast |
| psiblast | Finds members of a protein family, identifies proteins distantly related to the query, or builds position specific scoring matrix for the query |
| rpsblast | Searches a protein against a conserved domain database to identify functional domains present in the query |
| rpstblastn | Searches a nucleotide query, by dynamically translating it in all six-frames first, against a conserved domain database |
| segmasker | Masks the low complexity regions in input protein sequences |
| tblastn | Searches a protein query against a nucleotide database dynamically translated in all six frames |
| tblastx | Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated |
| update_blastdb.pl | Downloads preformatted blast databases from NCBI |
| windowmasker | Masks repeats found in input nucleotide sequences |
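
To show how two of the programs in the table fit together, here is a minimal sketch that formats a FASTA file into a database with `makeblastdb` and searches it with `blastn`. The file names (`toy.fa`, `query.fa`, `toy_db`) and both sequences are made up for illustration, and the BLAST+ steps are skipped if the programs are not on `PATH`.

```shell
# Work in a throwaway directory so nothing is left behind in the current one.
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up nucleotide sequences in FASTA format (illustrative only).
cat > toy.fa <<'EOF'
>seq1
ATGGCGTACGTTAGCTAGCTAGGCTAGCTAGGATCGATCGATCGTAGCTAGCTAGCTAG
>seq2
TTGACCGGTTAACCGGTTAAGGCCTTAAGGCCAATTGGCCAATTGGCCTTAAGGCCTTA
EOF

# Use the first record (header line + sequence line) as the query.
head -n 2 toy.fa > query.fa

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    # Format the FASTA file into a nucleotide BLAST database named toy_db.
    makeblastdb -in toy.fa -dbtype nucl -out toy_db
    # Search the query against it; -outfmt 6 prints tab-separated hits.
    blastn -query query.fa -db toy_db -outfmt 6
else
    echo "BLAST+ not found on PATH; skipping makeblastdb/blastn" >&2
fi
```

For a protein database the same pattern would use `-dbtype prot` and `blastp` instead.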
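Besides BLAST+, the `executables` folder on the FTP server also provides other tools:
- magic-blast: maps large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. See Magic-BLAST.
- IgBlast: analyzes immunoglobulin and T cell receptor variable domain sequences. See IgBlast and the related literature.
- rmblast:
- remote-fuser: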
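Configuration
- Export the BLAST install directory to PATH, for example `export PATH=$PATH:$HOME/ncbi-blast-2.8.1+/bin`. This lets you run the programs directly.
- Manage the databases:
  - Create a folder to hold the databases: `mkdir $HOME/blastdb`
  - Set the `BLASTDB` environment variable: `export BLASTDB=$HOME/blastdb`
  - Download and unpack the sequence databases yourself, or
  - use `update_blastdb.pl` to manage the databases.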
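
The database steps above can be sketched as two small shell functions. The function names `setup_blastdb` and `fetch_blastdb` are hypothetical helpers, not part of BLAST+; the `--decompress` option of `update_blastdb.pl` downloads a preformatted database and unpacks it, and it needs network access.

```shell
# Create the database folder (if needed) and export BLASTDB to point at it.
# Defaults to $HOME/blastdb when no argument is given.
setup_blastdb() {
    dir="${1:-$HOME/blastdb}"
    mkdir -p "$dir" || return 1     # -p: no error if the folder already exists
    export BLASTDB="$dir"
    echo "BLASTDB=$BLASTDB"
}

# Download and unpack one preformatted database into $BLASTDB.
# Requires network access and update_blastdb.pl on PATH.
fetch_blastdb() {
    name="$1"
    if ! command -v update_blastdb.pl >/dev/null 2>&1; then
        echo "update_blastdb.pl not found on PATH" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$BLASTDB" && update_blastdb.pl --decompress "$name")
}
```

A typical session would call `setup_blastdb "$HOME/blastdb"` once and then, for example, `fetch_blastdb refseq_rna`. Running `update_blastdb.pl --showall` lists the database names that are available.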
Downloading databases
The NCBI FTP server has a dedicated BLAST folder, ftp://ftp.ncbi.nlm.nih.gov/blast/, which holds the BLAST programs as well as the databases. It contains the following subfolders:
- `db`: the databases; this is the important one
- `executables`: executable programs, including BLAST+
- `documents`: documentation
- `demo`: demonstration packages for developers
- `matrices`: the supported and experimental scoring matrices
- `WGS_TOOLS`: tools for generating databases for WGS projects
- `temp`: miscellaneous files
- `windowmasker_files`: a collection of windowmasker files for various organisms/genomes, each in its own subdirectory named by taxonomic ID
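Configuration
Add the executables to your environment: append the path of the `bin` folder inside the BLAST install directory to the PATH environment variable. The exact method depends on your shell; in Bash, for example: export PATH=$PATH:/usr/local/ncbi/blast/bin
The other important setting is the BLASTDB environment variable, which tells BLAST where to look for databases when searching. Set it to wherever your databases live, for example: export BLASTDB=$HOME/blastdb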
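
If these lines go into a startup file such as `~/.bashrc`, it helps to avoid appending the same directory every time the file is sourced. Here is a minimal sketch; `add_to_path` is a hypothetical helper name, and the path is the Mac install location mentioned earlier.

```shell
# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

add_to_path /usr/local/ncbi/blast/bin

# Report whether a BLAST+ program is now reachable.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
else
    echo "blastn not found; check the bin path" >&2
fi
```

Wrapping `PATH` in colons before matching means a directory is recognized whether it sits at the start, middle, or end of the list.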
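Examples
Official simple example 1
- Use `blastdbcmd` to extract the nm_000122 sequence from an installed database (refseq_rna.00) into the file test_query.fa
- Run `blastn` to do a nucleotide search with that query, again against the same local database.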
$ blastdbcmd -db refseq_rna.00 -entry nm_000122 -out test_query.fa
$ blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2
# BLASTN 2.2.29+
# Query: gi|263191547|ref|NM_000122.3| Homo sapiens mutL homolog 1 (MLH1), transcript variant 1, mRNA
# Database: refseq_rna.00
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3| gi|263191547|ref|NM_000122.3| 0.0 4801
gi|263191547|ref|NM_000122.3| gi|332816398|ref|XM_001170433.2| 0.0 4758
# BLAST processed 1 queries
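
Output in format 7 is easy to post-process because every non-hit line starts with `#`. As a rough sketch, the snippet below feeds the hits shown above to `awk`, which skips the comment lines and prints the accession and bit score of each hit. `parse_hits` is a hypothetical helper name, and it assumes the four columns requested in the `-outfmt` string above.

```shell
# Read format-7 output on stdin; print "<accession> <bit score>" per hit.
# Assumes columns: query id, subject id, evalue, bit score.
parse_hits() {
    awk '!/^#/ && NF >= 4 {
        split($2, id, "[|]")   # subject id looks like gi|...|ref|ACCESSION|
        print id[4], $4
    }'
}

parse_hits <<'EOF'
# BLASTN 2.2.29+
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3| gi|263191547|ref|NM_000122.3| 0.0 4801
gi|263191547|ref|NM_000122.3| gi|332816398|ref|XM_001170433.2| 0.0 4758
# BLAST processed 1 queries
EOF
```

On the sample above this prints `NM_000122.3 4801` and `XM_001170433.2 4758`. In practice the same function can be fed directly, as in `blastn ... -outfmt "7 qseqid sseqid evalue bitscore" | parse_hits`.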











