发表于Cancer Cell. 2018 Sep 10;文章是:Overcoming Resistance to Dual Innate Immune and MEK Inhibition Downstream of KRAS. 主要探究的是耐药性,用的是细胞系,但是这篇文章我最关心的是研究团队对转录组和表观数据的结合,还有对CCLE和TCGA这样的公共数据库的挖掘利用。
需要补充很多细胞系知识和药物知识,反映在下面这张图上面:
image
数据都在NCBI:
- The accession number for the RNA-seq data reported in this paper is GEO: GSE96779.
- The accession number for the ChIP-seq data reported in this paper is GEO: GSE96780.
不幸的是:Accession "GSE96779" is currently private and is scheduled to be released on Mar 16, 2020. 虽然现在还不能下载,但是只要这个数据集正式公开,就是一个绝佳的例子。
ChIP-seq 数据处理流程:
很中规中矩的流程,没什么好说的,重点就是找到peaks即可: Sequenced reads from histone H3K27 acetylation ChIPed DNA were aligned to hg19 using Bowtie2 with -k mode 1 and enriched ChIP peaks were called by MACS algorithm at the threshold of p < 1x10-5.
找到peaks后续的下游分析也比较简单:Peak signals from two sets of the ChIP peaks (MSR-A549 cells vs. A549 cells) were normalized using MAnorm algorithm and 4,251 peaks were identified to be significantly increased on MSR-A549 cells at the arbitrary threshold at p < 1x10-25.
主要是和转录组数据结合,如下;
image
RNA-seq 数据处理流程:
同样是很中规中矩的流程,而且是非常老旧的流程,可能是因为惯性的力量,不过通常我们认为老旧的流程并没有什么定量准确性的问题,只不过是速度慢了一点而已。当然我还是推荐大家转为 hisat2+featureCounts 流程: Sequenced reads from RNA-seq were aligned with Tophat aligner to UCSC Refgene hg19. Transcripts were quantified using Cufflinks and differentially expressed transcripts were identified by Cuffdiff.
组学联合分析
前面提到的ChIP-seq数据得到peaks后面就可以跟RNA-seq结合,分析如下:
Those 4,251 peaks with increased signal upon resistance were assigned to genes potentially contributed for its expression using Citrome BETA tool. Of those genes, 986 genes were identified to have their expression also significantly increased in MSR-A549 cells to momelotinib/selumetinib treatment at FDR q < 0.25.
From the resulting genes, 659 genes were identified by selecting genes with the average of RPKM in MSR-A549 cells > 0.1 and fold change of average RPKM (MSR-A549 cells/A549 cells) > 1.5, and shown in Table S1 as resistance-related genes.
表达矩阵最直接的利用是热图看感兴趣的基因集的上调下调情况,比如:
image
CCLE 数据库辅助:
Lung adenocarcinoma cell lines in CCLE repository are subdivided into 2 classes
- KP (n = 9, H2009, H358, H1792, CALU6, H441, RERFLCAD2, HCC2108, HCC1171 and H2291) harboring KRAS and TP53 mutation/deletion with intact LKB1
- KL (n = 10, H1734, HCC44, H647, H2122, H1573, H1355, A549, H2030, H23 and H1944) harboring KRAS and LKB1 mutation
The RPKM values for each cell line was obtained from CCLE repository, and then 386 differentially expressed genes between KP and KL cell lines (p < 0.05 and FDR q < 0.25) were identified using R platform and TCC package (Sun et al., 2013). Of those genes, 168 genes were upregulated specifically in KP cell lines (M > 0, A > 0), and shown in Figure 7B (Top 30 genes) and Table S2 (168 genes).
CCLE数据库的差异分析
image
CCLE数据库的差异分析的热图:
image
CCLE数据库的GSEA分析:
image
TCGA数据库辅助:
The level3 RNA-seq V2 datasets for lung adenocarcinoma samples were downloaded from TCGA data portal and classified into KP samples or KL samples according to their mutation status (See Table S3). Samples having mutations on both LKB1 and TP53 were excluded. TCGA ID: TCGA-78-7160, TCGA-78-7166 and TCGA-78-7540 were further eliminated because they were previously identified as a different subtype from KP or KL (Skoulidis et al., 2015). Then differentially expressed genes between KP samples (n = 21) and KL samples (n = 17) were analyzed by Gene Set Enrichment Analysis (GSEA) with YAP1 signature (Dupont et al., 2011) and MSigDB C3 signature TEAD1 target.
TCGA数据库里面的病人分组如下:
image
可以做各种各样的分析。
可以说是很中规中矩啦!!!
(文章转自jimmy的2018年阅读文献笔记)
生信基础知识大全系列:生信基础知识100讲
史上最强的生信自学环境准备课来啦!! 7次改版,11节课程,14K的讲稿,30个夜晚打磨,100页PPT的课程。
如果需要组装自己的服务器;代办生物信息学服务器
如果需要帮忙下载海外数据(GEO/TCGA/GTEx等等),点我?
如果需要线下辅导及培训,看招学徒
如果需要个人电脑:个人计算机推荐
如果需要置办生物信息学书籍,看:生信人必备书单
如果需要实习岗位:实习职位发布
如果需要售后:点我














网友评论