1.准备文件
1)需要蛋白文件和编码蛋白的核酸序列,文件格式fasta格式,对应的蛋白和核酸的id要一样
==> all.cds1 <==
>Zm_accD
ATGTTCAAATATCAATGCAATTTTTTATTTTCCAACGAGAAGGTAGAACA
TCATAGATGTGGGCCGAGTAAATCAATAGATAGTGTTGATAGTATTGGAC
ATACAGATAGAAGTGAACAACCTATTCTAAACGATATGGAGAAAAAGGTT
CCTAGTTGGAATCGTATTAGTAATTATAGTTTCAATAATGTTGATTATTT
ATTTGATATCAGGAATATTTGGAGTTTGATCTCTGATGACACTTTTTTTG
TTAGGGATAGTAATGGTGATCGTTACTCTATATTTTTTGATATTGAAAAT
CATATTTTTGAGGTTGACAATGATAGTTCCTTTATGAGTGAACTAGAAAT
TATTGTTTCTAGTTATTTGAATAGGGGGTCTAAGTCTAAGAAAAAGAATC
ATACATGTAATGATACTGAATCCAGTTGGAAAAAGAACATTATTAGTAGC
==> all.pep1 <==
>Zm_accD
MFKYQCNFLFSNEKVEHHRCGPSKSIDSVDSIGHTDRSEQPILNDMEKKV
PSWNRISNYSFNNVDYLFDIRNIWSLISDDTFFVRDSNGDRYSIFFDIEN
HIFEVDNDSSFMSELEIIVSSYLNRGSKSKKKNHTCNDTESSWKKNIISS
IDSYLRFEVSINSSISSSTNESYIYNFICTENKNSSESDRSSIRTSQNID
DLDIRVEESNHNDNPFYKFRHLWVQCENCYGAHYKQFFGEKMYICEFCGY
HLKMSSSDRIELSIDPGTWDPMDEYMVSVDPIEFDSPVEFDEEYEDEEPD
SDRDHIDFSRDQPEDDDSYIDRIDSYQRETGLNEAVQTGIGQLNGIPVAF
GVMDFQFMGGSMGSVVGEKITRLIEYATNRSLPIIIVCASGGARMQEGSL
SLMQMAKISSVLYNYQLNKKLFYVAILTDPTTGGVTASFAMLGDIIIAEP
2)同源蛋白分组信息,一个分组的蛋白会进行比对,计算值
for i in `fastalength all.pep | cut -f 2 -d '_' | sort | uniq -c |awk '$1==3' | grep -v '\-D' | awk '{print $2}'`;do fastalength all.pep | grep -v '\-D' | grep "$i\>" | awk '{printf"%s\t",$2}END{printf"\n"}'>> all.homo; done #生成分组信息表
###
Kg_accD Az_accD
Kg_atpA Az_atpA
Kg_atpB Az_atpB
Kg_atpE Az_atpE
Kg_atpF Az_atpF
Kg_atpH Az_atpH
Kg_atpI Az_atpI
###
3.ParaAT对齐并计算Ka Ks值
ParaAT会先对齐氨基酸序列,然后根据氨基酸序列对齐核酸序列。并且有参数可以直接计算Ka Ks值。
perl /home/chenyw/bin/ParaAT.pl -h all.homo -n all.cds1 -a all.pep1 -p proc -o output -f axt -c 11 -k
ParaAT计算Ka Ks是用KaKs_Calculator计算的,采用的是默认参数,及-c 1 -m MA,所以如果要选用其它密码子表或者模型就要自己手动计算,接下来是手动计算的过程。
4.用KaKs_Calculator计算Ka Ks
/home/chenyw/bin/KaKs_Calculator -i Kg_psbN-Zs_psbN.cds_aln.axt -o Kg_psbN-Zs_psbN.cds_aln.axt.kaks -c 11 -m MS
网友评论