Pan-genome analysis with PEPPAN

Author: 胡童远 | Published: 2021-10-08 09:24

Paper: Accurate reconstruction of bacterial pan- and core genomes with PEPPAN. Genome Res. 2020
Citations: 5
GitHub: https://github.com/zheminzhou/PEPPAN

Installing with conda and pip3

conda create -n peppan
conda activate peppan
# dependency
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install mmseqs2
conda install blast diamond rapidnj fasttree

# main procedure
pip3 install peppan
# miniconda3/envs/peppan/lib/python3.7/site-packages
PEPPAN --help
PEPPAN_parser --help
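
Before running anything, it can help to confirm that the dependencies ended up on PATH inside the activated environment. The helper below is a minimal sketch; `check_tools` is a hypothetical name, and the executable names in the commented usage line are assumptions based on the conda packages installed above, not something PEPPAN documents.

```shell
# Report which of the given executables are missing from PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions, adjust to your install):
# check_tools mmseqs blastn diamond rapidnj FastTree PEPPAN PEPPAN_parser
```

If the function prints nothing and returns 0, all listed tools were found.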

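Alternatively, download the PEPPAN files from GitHub; combined with the dependencies installed through conda above, this works as well.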

# download master.zip (here on Windows): https://github.com/zheminzhou/PEPPAN/archive/refs/heads/master.zip
./PEPPAN --help

Testing PEPPAN

# peppan
cd /hutongyuan/software/PEPPAN-master/
PEPPAN -p examples/ST131 \
-P examples/GCF_000010485.combined.gff.gz examples/*.gff.gz

PEPPAN parameters:
p: [Default: PEPPAN] prefix for the outputs
t: [Default: 8] Number of threads

Testing PEPPAN_parser (using the PEPPAN.gff file produced by PEPPAN as input)

# peppan_parser
PEPPAN_parser -g examples/ST131.PEPPAN.gff \
-s examples/PEPPAN_out \
-t -c

PEPPAN_parser parameters:
g: [REQUIRED] generated PEPPA.gff file from PEPPA.py
s: [optional] A folder for splitted GFF files
t: [Default: False] Flag to generate the gene present/absent tree
c: [Default: False] Flag to generate a rarefraction curve
a: [Default: -1] Set to an integer between 0 and 100, % of presence for a gene to be included in a Core Gene Allelic Variation tree

Test run output

# peppan
Run MMSeqs linclust to get exemplar sequences
Iterative clustering. 5995 exemplars left with identity = 0.9
Run BLASTn
Run diamond
Obtained 5987 exemplar gene sequences from examples/ST131.clust.exemplar
...

# peppan_parser
GFF files are saved under folder examples/PEPPAN_out
Summary of the pan-genome is saved in examples/ST131.PEPPAN.gene_content.summary_statistics.txt
Gene content matrix is saved in examples/ST131.PEPPAN.gene_content.csv
Gene presence matrix is saved in examples/ST131.PEPPAN.gene_content.Rtab
Gene content tree is saved in examples/ST131.PEPPAN.gene_content.nwk
Curves for all genes are saved in examples/ST131.PEPPAN.gene_content.curve

Test results (entries marked in red are produced by the parser)

Running PEPPAN: a worked example

PEPPAN -t 8 \
-p result_peppan/peppan \
./gff/*.gff

PEPPAN_parser -g result_peppan/peppan.PEPPAN.gff \
-s result_peppan/PEPPAN_out \
-t -c
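
The commands above rely on the shell glob `./gff/*.gff` matching at least one file. As a small safeguard, the sketch below lists the GFF files in a directory and fails when none are found. `list_gff` is a hypothetical helper, and creating `result_peppan` beforehand is an assumption (it is not confirmed here whether PEPPAN creates the output directory itself).

```shell
# Print every *.gff file in the given directory, one per line.
# Returns non-zero when the directory holds no such file.
list_gff() {
    local dir=$1
    local found=0
    local f
    for f in "$dir"/*.gff; do
        [ -e "$f" ] || continue   # unmatched glob stays literal; skip it
        printf '%s\n' "$f"
        found=1
    done
    [ "$found" -eq 1 ]
}

# Example use before the run shown above:
# list_gff ./gff >/dev/null || { echo "no GFF files in ./gff" >&2; exit 1; }
# mkdir -p result_peppan
```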

PEPPAN results

peppan.PEPPAN.gene_content.Rtab is the gene presence/absence variation (PAV) matrix.
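
A quick way to get a feel for the PAV matrix is to count how many genes occur in every genome and how many are missing from at least one. The sketch below assumes the Rtab file is tab-delimited, with a header row, the gene ID in the first column, and one numeric column per genome where 0 means absent; check this layout against your own file first. `pav_summary` is a hypothetical name.

```shell
# Count core genes (present in every genome) and accessory genes
# (absent from at least one genome) in a presence/absence table.
pav_summary() {
    awk -F'\t' '
        NR == 1 { next }                 # skip the header row
        NF < 2  { next }                 # skip blank or malformed rows
        {
            present = 0
            for (i = 2; i <= NF; i++) if ($i + 0 > 0) present++
            if (present == NF - 1) core++; else accessory++
        }
        END { printf "core\t%d\naccessory\t%d\n", core, accessory }
    ' "$1"
}

# Example call:
# pav_summary result_peppan/peppan.PEPPAN.gene_content.Rtab
```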

Note that the Rtab table also contains multi-copy genes.
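
If a downstream tool expects a strict 0/1 matrix, the copy counts can be collapsed to presence/absence. This is a minimal sketch under the same layout assumption (tab-delimited, header row, gene ID in column 1, numeric counts per genome); both function names are hypothetical.

```shell
# Collapse copy counts to 0/1 and write the table to stdout.
binarize_rtab() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        NR == 1 { print; next }          # keep the header unchanged
        { for (i = 2; i <= NF; i++) $i = ($i + 0 > 0) ? 1 : 0; print }
    ' "$1"
}

# Count the table cells that hold more than one copy.
count_multicopy() {
    awk -F'\t' 'NR > 1 { for (i = 2; i <= NF; i++) if ($i + 0 > 1) n++ }
        END { print n + 0 }' "$1"
}

# Example calls:
# count_multicopy result_peppan/peppan.PEPPAN.gene_content.Rtab
# binarize_rtab result_peppan/peppan.PEPPAN.gene_content.Rtab > pav.binary.Rtab
```

Writing the binary table to a new file keeps the original counts available for later inspection.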

peppan.PEPPAN.gene_content.curve
