美文网首页走进转录组
bulk RNA-Seq (3)表达定量

bulk RNA-Seq (3)表达定量

作者: Bioinfor生信云 | 来源:发表于2022-06-28 10:31 被阅读0次

欢迎关注Bioinfor 生信云微信公众号!

上一篇推文讲了bulk RNA-Seq 的比对到参考基因组部分,接下来就是基因表达定量了。计算表达定量可以用StringTie、Htseq-cout、featureCounts,这里推荐使用featureCounts。featureCounts 是Rsubread 软件包里的一个命令,所以安装R版本Rsubread的即可。这里使用了一个脚本,在运行featureCounts 的同时,计算FPKM、TPM。

安装相关的R包

if(!requireNamespace("BiocManager", quietly = TURE))
    install.packages("BiocManager")
BiocManager::install("Rsubread")
BiocManager::install("limma")
BiocManager::install("edgeR")

featureCounts.R

# Create a parser
p <- arg_parser("run featureCounts and calculate FPKM/TPM")

# Add command line arguments
p <- add_argument(p, "--bam", help="input: bam file", type="character")
p <- add_argument(p, "--gtf", help="input: gtf file", type="character")
p <- add_argument(p, "--output", help="output prefix", type="character")

# Parse the command line arguments
argv <- parse_args(p)

library(Rsubread)
library(limma)
library(edgeR)

bamFile <- argv$bam
gtfFile <- argv$gtf
nthreads <- 1
outFilePref <- argv$output

outStatsFilePath  <- paste(outFilePref, '.log',  sep = '');
outCountsFilePath <- paste(outFilePref, '.count', sep = '');

fCountsList = featureCounts(bamFile, annot.ext=gtfFile, isGTFAnnotationFile=TRUE, nthreads=nthreads, isPairedEnd=TRUE)
dgeList = DGEList(counts=fCountsList$counts, genes=fCountsList$annotation)
fpkm = rpkm(dgeList, dgeList$genes$Length)
tpm = exp(log(fpkm) - log(sum(fpkm)) + log(1e6))

write.table(fCountsList$stat, outStatsFilePath, sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)

脚本用法

输入:bam文件,gtf文件


结果输出

.count结尾的表达定量文件


.log结尾的日志文件

Assigned        17323283  #基因结构上的reads
Unassigned_Unmapped     2909094  #未比对上的reads
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit     0
Unassigned_NoFeatures   408001  #未落在基因结构上
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    1180806  #无法确定分配到哪个基因

喜欢就点点赞哟

相关文章

网友评论

    本文标题:bulk RNA-Seq (3)表达定量

    本文链接:https://www.haomeiwen.com/subject/ityavrtx.html