美文网首页
2023-07-20 转录组数据比对-hisat2(需补充)

2023-07-20 转录组数据比对-hisat2(需补充)

作者: 麦冬花儿 | 来源:发表于2023-08-07 10:20 被阅读0次

软件下载与安装

# Installing HISAT2 (https://daehwankimlab.github.io/hisat2/download/)
#wget https://cloud.biohpc.swmed.edu/index.php/s/oTtGWbWjaxsQ2Ho/download -O ~/software/hisat2-2.2.1-Linux_x86_64.zip
unzip ~/software/hisat2-2.1.0-Linux_x86_64.zip -d /opt/biosoft/
echo 'PATH=$PATH:/opt/biosoft/hisat2-2.1.0/' >> ~/.bashrc
source ~/.bashrc

文件准备

train@MiWiFi-R3P-srv hisat2]$ ls
A.1.fastq  A.2.fastq  B.1.fastq  B.2.fastq  C.1.fastq  C.2.fastq  D.1.fastq  D.2.fastq  E.1.fastq  E.2.fastq  F.1.fastq  F.2.fastq  G.1.fastq  G.2.fastq  genome.fasta  genome.gtf
genome.gtf#该文件包含详细的基因结构信息
#构建hisat2数据库文件
perl -e 'while (<>) { if (m/^#/) { print; next } $number ++; $num = "ms" . "0" x (7 - length($number)) . $number; s/\t\.\t/\t$num\t/; print; }' ~/00.incipient_data/data_for_variants_calling/variants.vcf > variants.vcf #可选
hisat2_extract_splice_sites.py genome.gtf > splitSites.txt #记录剪切位点信息,相当于intron信息
hisat2_extract_exons.py genome.gtf > exonSites.txt #记录外显子位点信息
hisat2_extract_snps_VCF.pl variants.vcf > variants.snp #记录snp位点信息
hisat2-build -p 8 --snp variants.snp --ss splitSites.txt --exon exonSites.txt genome.fasta genome #事实上以上文件都是可选项
hisat2 -x genome -p 8 --min-intronlen 20 --max-intronlen 4000 --dta --dta-cufflinks -1 A.1.fastq -2 A.2.fastq -S A.sam --new-summary --summary-file A.hisat2.summary" #参考基因组超过500M,建议将--max-intronlen 20000-100000之间,
也可以利用脚本计算intron长度
perl -e 'while ($_=<>{ @aa = split /t/,$_; push @intron_length, abs ($aa[1]) - $aa[2]) - })'
然后批量操作
for i in `ls *.1.fastq`
do
    i=${i/.1.fastq/}
    echo "hisat2 -x genome -p 8 --min-intronlen 20 --max-intronlen 4000 --dta --dta-cufflinks -1 $i.1.fastq -2 $i.2.fastq -S $i.sam --new-summary --summary-file $i.hisat2.summary --score-min L,-0.4,-0.4"
done > command.hisat2.list
sh command.hisat2.list # 可以使用ParaFly -c command.hisat2.list -CPU 8进行并行化计算

j结果如下

[train@MiWiFi-R3P-srv hisat2]$ ParaFly -c command.hisat2.list -CPU 8
Number of  Commands: 7
HISAT2 summary stats:
    Total pairs: 1287752
        Aligned concordantly or discordantly 0 time: 29414 (2.28%)
        Aligned concordantly 1 time: 997143 (77.43%)
        Aligned concordantly >1 times: 257239 (19.98%)
        Aligned discordantly 1 time: 3956 (0.31%)
    Total unpaired reads: 58828
        Aligned 0 time: 43806 (74.46%)
        Aligned 1 time: 10321 (17.54%)
        Aligned >1 times: 4701 (7.99%)
    Overall alignment rate: 98.30%
HISAT2 summary stats:
    Total pairs: 1417604
        Aligned concordantly or discordantly 0 time: 376301 (26.54%)
        Aligned concordantly 1 time: 213476 (15.06%)
        Aligned concordantly >1 times: 827431 (58.37%)
        Aligned discordantly 1 time: 396 (0.03%)
    Total unpaired reads: 752602
        Aligned 0 time: 742719 (98.69%)
        Aligned 1 time: 1667 (0.22%)
        Aligned >1 times: 8216 (1.09%)
    Overall alignment rate: 73.80%
HISAT2 summary stats:
    Total pairs: 1422090
        Aligned concordantly or discordantly 0 time: 405241 (28.50%)
        Aligned concordantly 1 time: 198821 (13.98%)
        Aligned concordantly >1 times: 817669 (57.50%)
        Aligned discordantly 1 time: 359 (0.03%)
    Total unpaired reads: 810482
        Aligned 0 time: 800919 (98.82%)
        Aligned 1 time: 1552 (0.19%)
        Aligned >1 times: 8011 (0.99%)
    Overall alignment rate: 71.84%
HISAT2 summary stats:
    Total pairs: 1423417
        Aligned concordantly or discordantly 0 time: 399863 (28.09%)
        Aligned concordantly 1 time: 203001 (14.26%)
        Aligned concordantly >1 times: 820177 (57.62%)
        Aligned discordantly 1 time: 376 (0.03%)
    Total unpaired reads: 799726
        Aligned 0 time: 790007 (98.78%)
        Aligned 1 time: 1598 (0.20%)
        Aligned >1 times: 8121 (1.02%)
    Overall alignment rate: 72.25%
HISAT2 summary stats:
    Total pairs: 1421496
        Aligned concordantly or discordantly 0 time: 347667 (24.46%)
        Aligned concordantly 1 time: 201568 (14.18%)
        Aligned concordantly >1 times: 871884 (61.34%)
        Aligned discordantly 1 time: 377 (0.03%)
    Total unpaired reads: 695334
        Aligned 0 time: 684191 (98.40%)
        Aligned 1 time: 1814 (0.26%)
        Aligned >1 times: 9329 (1.34%)
    Overall alignment rate: 75.93%
HISAT2 summary stats:
    Total pairs: 1369754
        Aligned concordantly or discordantly 0 time: 21871 (1.60%)
        Aligned concordantly 1 time: 468062 (34.17%)
        Aligned concordantly >1 times: 878564 (64.14%)
        Aligned discordantly 1 time: 1257 (0.09%)
    Total unpaired reads: 43742
        Aligned 0 time: 28517 (65.19%)
        Aligned 1 time: 3481 (7.96%)
        Aligned >1 times: 11744 (26.85%)
    Overall alignment rate: 98.96%
HISAT2 summary stats:
    Total pairs: 1378862
        Aligned concordantly or discordantly 0 time: 19572 (1.42%)
        Aligned concordantly 1 time: 459700 (33.34%)
        Aligned concordantly >1 times: 898631 (65.17%)
        Aligned discordantly 1 time: 959 (0.07%)
    Total unpaired reads: 39144
        Aligned 0 time: 27029 (69.05%)
        Aligned 1 time: 2831 (7.23%)
        Aligned >1 times: 9284 (23.72%)
    Overall alignment rate: 99.02%


All commands completed successfully. :-)
[train@MiWiFi-R3P-srv hisat2]$ ls
A.1.fastq         B.2.fastq         C.hisat2.summary               D.2.fastq         E.hisat2.summary  F.hisat2.summary  genome.2.ht2  genome.7.ht2      G.sam
A.2.fastq         B.hisat2.summary  command.hisat2.list            D.hisat2.summary  E.sam             F.sam             genome.3.ht2  genome.8.ht2      splitSites.txt
A.hisat2.summary  B.sam             command.hisat2.list.completed  D.sam             exonSites.txt     G.1.fastq         genome.4.ht2  genome.fasta      variants.snp
A.sam             C.1.fastq         C.sam                          E.1.fastq         F.1.fastq         G.2.fastq         genome.5.ht2  genome.gtf        variants.vcf
B.1.fastq         C.2.fastq         D.1.fastq                      E.2.fastq         F.2.fastq         genome.1.ht2      genome.6.ht2  G.hisat2.summary

相关文章

网友评论

      本文标题:2023-07-20 转录组数据比对-hisat2(需补充)

      本文链接:https://www.haomeiwen.com/subject/axwoudtx.html