Conterminator: detecting contamination in nucleotide and protein sequence sets

Author: 寒山梦绮 | Published: 2022-03-08 19:20

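Update: I recently came across a newer tool for detecting contamination in high-throughput sequencing reads, fastq_screen (https://github.com/StevenWingett/FastQ-Screen). Compared with Conterminator, it is much faster and easier to use.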

1. This tool is designed to detect contamination in nucleotide and protein sequence sets.

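2. Install

There are three options: a prebuilt Linux binary for CPUs with SSE4.1, a prebuilt Linux binary for CPUs with AVX2, or conda.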

# SSE4.1
wget https://mmseqs.com/conterminator/conterminator-linux-sse41.tar.gz; tar xvfz conterminator-linux-sse41.tar.gz; export PATH=$(pwd)/conterminator/:$PATH
# AVX2
wget https://mmseqs.com/conterminator/conterminator-linux-avx2.tar.gz; tar xvfz conterminator-linux-avx2.tar.gz; export PATH=$(pwd)/conterminator/:$PATH
# conda
conda install -c bioconda conterminator
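
If you are not sure which prebuilt binary matches your CPU, the flags listed in /proc/cpuinfo on Linux can tell you. The snippet below is only a minimal sketch: `pick_build` is a hypothetical helper name (not part of Conterminator), and it assumes the usual Linux flag names `avx2` and `sse4_1`.

```shell
# Suggest a Conterminator build from the CPU flags (Linux only).
# pick_build is a hypothetical helper; it reads a cpuinfo-style file,
# defaulting to /proc/cpuinfo when no path is given.
pick_build() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw avx2 "$cpuinfo" 2>/dev/null; then
        echo avx2          # matches conterminator-linux-avx2.tar.gz
    elif grep -qw sse4_1 "$cpuinfo" 2>/dev/null; then
        echo sse41         # matches conterminator-linux-sse41.tar.gz
    else
        echo unsupported   # neither flag found (or file not readable)
    fi
}

echo "Suggested build: $(pick_build)"
```

The printed name corresponds to the suffix of the archive names used in the wget commands above. If the result is `unsupported`, the conda package is the remaining option listed here.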

3. Getting started
Conterminator requires two input files:
(1) a FASTA file containing all sequences (example/dna.fna or example/prots.faa) and
(2) a mapping file (example/dna.mapping or example/prots.mapping), which maps FASTA identifiers to NCBI taxon identifiers.
The program produces two output files whose names start with the prefix ${RESULT_PREFIX}.

Example:
To process nucleotide sequences:

conterminator dna example/dna.fna example/dna.mapping ${RESULT_PREFIX} tmp     

To process protein sequences:

conterminator protein example/prots.faa example/prots.mapping ${RESULT_PREFIX} tmp  

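4. Mapping file
Conterminator needs a mapping file that assigns each FASTA identifier to a taxonomic identifier.

Here we use the NCBI NT/NR database; the commands below extract the sequences and the accession-to-taxid mapping from NT with blastdbcmd, then run Conterminator on them: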

blastdbcmd -db nt -entry all > nt.fna
blastdbcmd -db nt -entry all -outfmt "%a %T" > nt.fna.taxidmapping
conterminator dna nt.fna nt.fna.taxidmapping nt.result tmp
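
Before starting a long run, it may be worth checking that every FASTA identifier actually has an entry in the mapping file. The following is a rough sketch rather than anything shipped with Conterminator: `missing_ids` is a hypothetical helper, and it assumes that the identifier is the first whitespace-delimited token after ">" in each FASTA header and the first column of the mapping file.

```shell
# Print FASTA identifiers that have no entry in the mapping file.
# missing_ids is a hypothetical helper. Assumptions: the identifier is the
# first token after ">" in the header and the first column of the mapping file.
missing_ids() {
    fasta="$1"
    mapping="$2"
    awk '
        # Lines from the mapping file: remember each mapped identifier.
        FILENAME == ARGV[1] { if (NF > 0) mapped[$1] = 1; next }
        # Header lines from the FASTA file: report identifiers never seen above.
        /^>/ {
            id = substr($1, 2)      # drop the leading ">"
            if (!(id in mapped)) print id
        }
    ' "$mapping" "$fasta"
}
```

For the files created above, `missing_ids nt.fna nt.fna.taxidmapping | head` would list the first few unmapped identifiers; empty output means every header was found in the mapping file.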
