美文网首页试读
R| Agilent芯片分析-limma

R| Agilent芯片分析-limma

作者: 高大石头 | 来源:发表于2021-04-22 08:04 被阅读0次

Aglient的芯片在科研界也是一大宠儿,通常根据其染色分为单通道多通道两种。最为奇葩的是Aglient芯片的许多表达矩阵下载后发现有空值、负值,因此就要求我们从原始数据开始着手。下面就一起学习下吧。

核心函数:

read.maimages(raw_datas, source = "agilent", green.only = T, other.columns = "gIsWellAboveBG")

1.单通道芯片

以下以GSE23558为例,是《aglient芯片原始数据处理》的学习笔记。

1.1 数据下载及读取

rm(list = ls())
library(tidyverse)
library(limma)
library(GEOquery)
library(AnnoProbe)
gse="GSE23558"
#setwd(gse)
#geoChina(gse)
load("D:/jianshu/microarry-analysis/GSE23558/GSE23558_eSet.Rdata") #提取原始数据

# 分组信息
pd <- pData(gset[[1]])
raw_dir <- "D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW"
raw_datas <- paste0(raw_dir,"/",list.files(raw_dir))
raw_order <- str_extract(raw_datas,"GSM\\d*")
pd <- pd[match(raw_order,rownames(pd)),]
pd <- pd %>% 
  select(geo_accession,`tissue:ch1`)
colnames(pd) <- c("id","type")
pd$type <- case_when(pd$type=="Oral Tumor"~"tumor",
                     T~"normal")
pd$type <- factor(pd$type,levels = c("normal","tumor"))
group_list <- pd$type
names(group_list) <- pd$id

#原始数据读取
data.raw <- read.maimages(raw_datas,
                          source = "agilent",
                          green.only = T,
                          other.columns = "gIsWellAboveBG")
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577914.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577915.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577916.txt.gz 
......
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577943.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577944.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE23558/GSE23558_RAW/GSM577945.txt.gz

1.2 背景矫正和标准化

data.bg <- backgroundCorrect(data.raw,method = "normexp")
## Array 1 corrected
## Array 2 corrected
## Array 3 corrected
.....
## Array 29 corrected
## Array 30 corrected
## Array 31 corrected
## Array 32 corrected
data.norm <- normalizeBetweenArrays(data.bg,method = "quantile")

1.3 基因过滤

去掉对照探针、未匹配到genesymbol探针、去表达探针(至少在一般样本中高于背景)、重复探针。

ctrl <- data.norm$genes$ControlType==1L
Nosymbol <- is.na(data.norm$genes$GeneName)
IsExpr <- rowSums(data.norm$other$gIsWellAboveBG>0)>= nrow(pd)/2
Isdup <- duplicated(data.norm$genes$GeneName)
data.filt <- data.norm[!ctrl&!Nosymbol&IsExpr&!Isdup,]
dim(data.filt)
## [1] 20650    32

过滤后剩余2万零650个探针。

1.4 表达矩阵

data.exp <- data.filt@.Data[[1]]
library(RColorBrewer)
colors <- brewer.pal(12,"Set3")
boxplot(data.exp,col=colors,las=3)
image.png
colnames(data.exp) <- str_extract(colnames(data.exp),"GSM\\d*")

1.5 获得基因名

anno <- data.filt$genes
nrow(anno);nrow(data.exp)
## [1] 20650
## [1] 20650
rownames(data.exp)=anno$GeneName
ids <- unique(anno$GeneName)
data.exp <- data.exp[!duplicated(anno$GeneName),]

其实,整个过程相当于对作者上传的标准化矩阵进行了修复。

1.6 差异分析

design <- model.matrix(~group_list)
fit <- lmFit(data.exp,design)
fit1 <- eBayes(fit,trend = T,robust=T)
summary(decideTests(fit))
##        (Intercept) group_listtumor
## Down             0            2090
## NotSig           0           16887
## Up           20650            1673
options(digits = 4)
deg <- topTable(fit1,coef = 2,n=dim(data.exp)[1])
boxplot(data.exp[rownames(deg)[1],]~group_list)
image.png

2.双通道芯片

rm(list = ls())
gse="GSE29609"
library(limma)
library(AnnoProbe)
#setwd(gse)
#geoChina(gse)
load("D:/jianshu/microarry-analysis/GSE29609_eSet.Rdata")
pd <- Biobase::pData(gset[[1]])
raw_dir <- "D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW"

raw_datas <- paste0(raw_dir,"/",list.files(raw_dir,pattern = "GSM\\d*"))
raw_order <- str_extract(raw_datas,"GSM\\d*")
pd <- pd[match(raw_order,rownames(pd)),]

#原始数据读取
data.raw <- read.maimages(raw_datas,
                          source = "agilent",
                          green.only = F,
                          other.columns = "gIsWellAboveBG")
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733579_US22502565_251239115211_S01_A01_GE2_44k_1005.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733580_US22502565_251239125482_S01_A01_GE2_44k_1005.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733581_US22502565_251239144561_S01_A01_GE2_44k_1005.txt.gz 
......
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733615_US22502565_251239125485_S01_A01_GE2_44k_1005.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733616_US22502565_251239115213_S01_A01_GE2_44k_1005.txt.gz 
## Read D:/jianshu/microarry-analysis/GSE29609/GSE29609_RAW/GSM733617_US22502565_251239144552_S01_A01_GE2_44k_1005.txt.gz

2.1 背景矫正、标准化

data.bg <- backgroundCorrect(data.raw,method = "normexp")
## Array 1 corrected
## Array 2 corrected
## Array 3 corrected
......
## Array 37 corrected
## Array 38 corrected
## Array 39 corrected
## Array 1 corrected
## Array 2 corrected
## Array 3 corrected
......
## Array 37 corrected
## Array 38 corrected
## Array 39 corrected
data.norm <- normalizeBetweenArrays(data.bg,method = "quantile")
ctrl <- data.norm$genes$ControlType==1L
Nosymbol <- is.na(data.norm$genes$GeneName)
#IsExpr <- rowSums(data.norm$other$gIsWellAboveBG>0)>= nrow(pd)/2
Isdup <- duplicated(data.norm$genes$GeneName)
data.filt <- data.norm[!ctrl&!Nosymbol&!Isdup,]
dim(data.filt)
## [1] 31036    39
data.exp <- data.filt@.Data[[4]]
library(RColorBrewer)
colors <- brewer.pal(12,"Set3")
boxplot(data.exp,col=colors,las=3)
image.png
疑问:双通道芯片,我是按照单通道的处理的,不知道是否正确?希望得到大神的指导,万分感谢。

参考链接:

aglient芯片原始数据处理

双通道芯片MA和density

相关文章

  • R| Agilent芯片分析-limma

    Aglient的芯片在科研界也是一大宠儿,通常根据其染色分为单通道和多通道两种。最为奇葩的是Aglient芯片的许...

  • 三种差异分析的整理

    针对测序数据和芯片数据,目前常用差异分析的R包有edgeR、limma、DESeq2,做一简单比较,方便平时分析。...

  • R|FPKM、RPKM差异分析

    芯片数据差异分析,常规用limma进行差异分析,而对于RNA-seq数据,常用edgeR、DEseq2和limma...

  • 用R语言分析:RNAseq表达矩阵样本的差异性

    我们之前介绍了limma包,limma包是对基因芯片表达矩阵的分析,不能对逆转录RNAseq表达矩阵进行分析(因为...

  • R实例:limma包分析Agilent 4x44K Arrays

    目前网页上常见的是使用Affy的包对Affymetrix的芯片分析教程,而忽略了对其他平台芯片分析。limma包是...

  • Agilent芯片的差异分析

    在进行目的芯片的差异分析之前,有两大拦路虎,一只是混杂因素;另一只,就是今天的问题,目的芯片的Agilent平台的...

  • Agilent芯片表达矩阵处理

    Agilent的芯片同样也是扫描得到图片,然后图像处理(主要是Agilent Feature Extraction...

  • 转录组数据差异分析

    差异分析• limma• edgeR• DESeq2 title: "三大R包差异分析"output: html_...

  • 芯片表达谱分析(四)

    本文补充些芯片数据分析注意事项。 GEOquery 与 limma前面文章的分析案例都是从原始数据开始分析,如果想...

  • GSEA分析笔记

    之前对芯片数据的分析,基本上就是limma包进行差异分析,然后对差异基因进行GO富集分析。GSEA(Gene Se...

网友评论

    本文标题:R| Agilent芯片分析-limma

    本文链接:https://www.haomeiwen.com/subject/ozpclltx.html