Merging the count files produced by run-featurecounts.R


Author: 余绕 | Published 2023-04-21 16:44

I have several files, each with four columns: gene_id, counts, fpkm, and tpm. Can you write a Perl script to merge the tpm columns of these files based on gene_id? The script should determine the number of files from the command-line arguments.


#!/usr/bin/perl
use strict;
use warnings;

# Get file names from command line arguments
my @files = @ARGV;
die "Usage: $0 sample1.count sample2.count ... > merged.tsv\n" unless @files;

# Derive one column name per file: the file name without its ".count" suffix.
# Fall back to the full file name if the suffix is missing.
my @column_names = map { /(\S+)\.count$/ ? $1 : $_ } @files;

# Hash of hashes: gene ID -> column name -> TPM value
my %tpm_values;

# Process each file
for my $i (0 .. $#files) {
    my ($file, $column_name) = ($files[$i], $column_names[$i]);
    open my $fh, '<', $file or die "Can't open $file: $!";
    # Skip header line
    my $header = <$fh>;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        # Split line into columns; only gene_id (1st) and tpm (4th) are used
        my ($gene_id, $counts, $fpkm, $tpm) = split /\t/, $line;
        $tpm_values{$gene_id}{$column_name} = $tpm;
    }
    close $fh;
}

# Print header row, then one row per gene; join avoids a trailing tab
print join("\t", 'Gene_ID', @column_names), "\n";

foreach my $gene_id (sort keys %tpm_values) {
    # Use NA where a gene is absent from a file (the // operator needs Perl 5.10+)
    my @row = map { $tpm_values{$gene_id}{$_} // 'NA' } @column_names;
    print join("\t", $gene_id, @row), "\n";
}
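
The script above always takes the fourth column. If the same merge is needed for counts or fpkm, the column can be looked up by its header name instead. This is a minimal sketch, assuming every file begins with a tab-separated header line containing gene_id, counts, fpkm and tpm, and that gene_id is the first column. The helper names read_column and merge_column are hypothetical and not part of the original script. Unlike the script above, this sketch labels the output columns with the full file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one tab-separated file and return
# { gene_id => value } for the column whose header equals $wanted
# (for example "counts", "fpkm" or "tpm").
sub read_column {
    my ($file, $wanted) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $header = <$fh>;
    die "$file is empty\n" unless defined $header;
    chomp $header;
    my @names = split /\t/, $header;
    # Find the index of the requested column in the header line
    my ($idx) = grep { $names[$_] eq $wanted } 0 .. $#names;
    die "No column '$wanted' in $file\n" unless defined $idx;
    my %values;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my @fields = split /\t/, $line;
        # Assumption: the gene ID is always the first column
        $values{ $fields[0] } = $fields[$idx];
    }
    close $fh;
    return \%values;
}

# Hypothetical helper: merge the chosen column from several files into
# a list of tab-joined rows (header row first), with NA for missing genes.
sub merge_column {
    my ($wanted, @files) = @_;
    my @tables = map { read_column($_, $wanted) } @files;
    my %seen;
    $seen{$_} = 1 for map { keys %$_ } @tables;
    my @rows = (join "\t", 'Gene_ID', @files);
    for my $gene (sort keys %seen) {
        push @rows, join "\t", $gene, map { $_->{$gene} // 'NA' } @tables;
    }
    return @rows;
}

# Command-line entry point: first argument is the column name,
# the remaining arguments are the files to merge.
if (@ARGV >= 2) {
    my ($wanted, @files) = @ARGV;
    print "$_\n" for merge_column($wanted, @files);
}
else {
    warn "Usage: $0 column file1 [file2 ...]\n";
}
```

Saved under a name of your choice (merge_column.pl here is only an example), it would be run as `perl merge_column.pl tpm BPT_0d_RNA_TPM.count BPT_1d_RNA_TPM.count > merged_tpm.tsv`, and passing `fpkm` or `counts` instead of `tpm` merges those columns.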

Contents of one input file before merging:
gene_id         counts  fpkm                    tpm
LOC_Os01g01010  248     10.6260353400409        17.4281762622683
LOC_Os01g01019  1       0.115196235653905       0.18893785268724
LOC_Os01g01030  31      1.63336480724046        2.67894551921027
LOC_Os01g01040  275     13.4319764240168        22.0303100053021
LOC_Os01g01050  362     23.0490775108712        37.8036937284081
LOC_Os01g01060  179     25.2596545730101        41.4293476479946
LOC_Os01g01070  200     13.7092035461406        22.4850010536979
LOC_Os01g01080  713     44.8696317769905        73.5924384220478
LOC_Os01g01090  1       0.0538921368127652      0.0883906019005892
LOC_Os01g01100  0       0       0
LOC_Os01g01110  1       0.14570836990118        0.238981997731223
LOC_Os01g01115  10      0.53360525105611        0.875186847425071
LOC_Os01g01120  232     24.9748495514202        40.9622277902293
LOC_Os01g01130  53      3.43695621970201        5.63708635307769
Run (the script and output file names mention FPKM, but the merged values come from the tpm column):
perl Merge_files_FPKM.pl   BPT_0d_RNA_TPM.count   BPT_1d_RNA_TPM.count   BPT_2d_RNA_TPM.count BPT_5d_RNA_TPM.count >ALL_FPKM
After merging:
Gene_ID         BPT_0d_RNA_TPM   BPT_1d_RNA_TPM  BPT_2d_RNA_TPM    BPT_5d_RNA_TPM  
ChrSy.fgenesh.gene.1    0       0       0       0       
ChrSy.fgenesh.gene.10   0       0       0       0       
ChrSy.fgenesh.gene.11   0       0       0       0       
ChrSy.fgenesh.gene.12   0.0312652903977292      0       0.066672911473648       0.0229874436731277      
ChrSy.fgenesh.gene.13   0.181866321164529       0       0       0       
ChrSy.fgenesh.gene.14   0.433521572989911       0.405008007567322       0.462240156579073       1.27496691870357        
ChrSy.fgenesh.gene.15   0       0       0       0       
ChrSy.fgenesh.gene.16   0       0       0.177648218071233       0       
ChrSy.fgenesh.gene.17   0       0       0       0.022022270106722       
ChrSy.fgenesh.gene.18   0.255806921435966       0       0       0       
ChrSy.fgenesh.gene.19   0.139531048055982       0.195530725416455       0.223161397907665       0.102588591599082       
ChrSy.fgenesh.gene.2    0       0       0       0       
ChrSy.fgenesh.gene.20   0       0       0       0       
ChrSy.fgenesh.gene.21   0       0       0.0705026870674347      0       
ChrSy.fgenesh.gene.22   0       0       0       0       
