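Data import
read.table is R's basic function for reading tabular data from text files, and it can handle many table layouts.
Its default parameter settings are shown below. In this post I only note the parameters I use most often.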
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
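file: the file to read. This can be an absolute path, or just a file name once the working directory has been changed with setwd().
header: a logical value. When TRUE, the first line is treated as the column names.
sep: the character that separates columns. The default, sep = "", treats any run of whitespace as a separator. What looks like spaces in a file may actually be tabs (\t), so check which character really separates the columns.
comment.char: by default, read.table uses # as the comment character. When this character is encountered (except inside a quoted string), the rest of that line is ignored. Lines containing only whitespace and a comment are treated as blank lines. If you are sure the data file contains no comments, comment.char = "" is safer and can also speed up reading.
row.names: accepts a vector to use as the row names. To take the row names from the first column, pass row.names = 1.
colClasses: accepts a vector giving the class of each column as it is read. To set the type of one specific column, name the vector element after that column, for example colClasses = c('x' = 'character').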
> a <- read.table("c27-ha_go_bp.txt", sep = "\t", header = T)
> str(a)
'data.frame': 233 obs. of 8 variables:
$ GO.biological.process.complete: Factor w/ 233 levels "adaptive immune response (GO:0002250)",..: 13 19 55 33 52 164 136 88 120 85 ...
$ Homo.sapiens...REFLIST..20996.: int 1315 959 538 7634 482 7492 7841 678 6985 8389 ...
$ upload_1..1992. : int 244 196 1 934 0 917 950 152 861 997 ...
$ upload_1..expected. : num 124.8 91 51 724.3 45.7 ...
$ upload_1..over.under. : Factor w/ 2 levels "-","+": 2 2 1 2 1 2 2 2 2 2 ...
$ upload_1..fold.Enrichment. : Factor w/ 135 levels " < 0.01","0.02",..: 71 80 2 31 1 31 30 95 32 27 ...
$ upload_1..raw.P.value. : num 1.37e-20 1.64e-20 3.55e-20 4.85e-20 1.15e-19 ...
$ upload_1..FDR. : num 2.17e-16 1.29e-16 1.87e-16 1.92e-16 3.64e-16 ...
# Read a specific column as character instead of factor
> a <- read.table("c27-ha_go_bp.txt", sep = "\t", header = T,
+ colClasses = c("upload_1..over.under." = "character")
+ )
> str(a)
'data.frame': 233 obs. of 8 variables:
$ GO.biological.process.complete: Factor w/ 233 levels "adaptive immune response (GO:0002250)",..: 13 19 55 33 52 164 136 88 120 85 ...
$ Homo.sapiens...REFLIST..20996.: int 1315 959 538 7634 482 7492 7841 678 6985 8389 ...
$ upload_1..1992. : int 244 196 1 934 0 917 950 152 861 997 ...
$ upload_1..expected. : num 124.8 91 51 724.3 45.7 ...
$ upload_1..over.under. : chr "+" "+" "-" "+" ...
$ upload_1..fold.Enrichment. : Factor w/ 135 levels " < 0.01","0.02",..: 71 80 2 31 1 31 30 95 32 27 ...
$ upload_1..raw.P.value. : num 1.37e-20 1.64e-20 3.55e-20 4.85e-20 1.15e-19 ...
$ upload_1..FDR. : num 2.17e-16 1.29e-16 1.87e-16 1.92e-16 3.64e-16 ...
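The parameters described above can also be tried without a file on disk by passing the data through the text argument. The following is only a minimal sketch: the sample data and the names raw and df are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Hypothetical tab-separated data held in a string.
# The first line is a comment and will be skipped because comment.char = "#".
raw <- "# made-up sample values
id\tgene\tcount\tdirection
r1\tTP53\t12\t+
r2\tBRCA1\t7\t-
r3\tEGFR\t30\t+"

df <- read.table(
  text = raw,
  sep = "\t",                               # columns are separated by tabs
  header = TRUE,                            # first non-comment line holds column names
  comment.char = "#",                       # the default, shown here for clarity
  row.names = 1,                            # use the first column (id) as row names
  colClasses = c(direction = "character"),  # force one named column to character
  stringsAsFactors = FALSE
)

str(df)
rownames(df)
```

With row.names = 1 the id column no longer appears as a variable, so df has three columns (gene, count, direction).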
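Importing Excel data
1. Prepare the data in Excel.
2. Save it as a tab-delimited text file. In Excel this is usually the "Text (Tab delimited)" option.
3. Read it into R with the read.table function.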
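The three steps above can be sketched as follows. Since no real Excel export is available here, the code first writes a small stand-in file to a temporary path. The file contents and the names path and excel_df are assumptions made for illustration only.

```r
# Stand-in for a file saved from Excel as "Text (Tab delimited)".
# It is written to a temporary path so that the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "sample\tgroup\tvalue",
  "s1\tcontrol\t1.5",
  "s2\ttreated\t2.25",
  "s3\ttreated\t"            # last cell left empty, like a blank cell in Excel
), path)

excel_df <- read.table(
  path,
  sep = "\t",                # Excel's tab-delimited export uses tabs
  header = TRUE,             # first row holds the column names
  quote = "\"",              # only double quotes delimit strings
  comment.char = "",         # do not treat # inside cells as a comment
  na.strings = c("NA", ""),  # read empty cells as NA
  stringsAsFactors = FALSE
)

str(excel_df)
```

read.delim() is a shortcut for the same task: its defaults already use sep = "\t", header = TRUE and comment.char = "".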
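Data export
After processing data in R, we usually need to export it for further analysis or to hand it to someone else. Exporting the data frame to a CSV file that Excel can open is a good choice.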
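write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = TRUE,
col.names = TRUE, qmethod = c("escape", "double"),
fileEncoding = "")
write.csv() exports a data frame in CSV format. It is a wrapper around write.table(), whose defaults are shown above, with the separator fixed to a comma.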
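row.names, col.names: both are usually given as logical values. If TRUE, the row names are written as the first column and the column names as the first row of the output. Note that write.csv() ignores attempts to change col.names and issues a warning.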
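As a small illustration of the export step, the sketch below writes a data frame with write.csv() and reads it back. The data frame results and the temporary path out are made up for the example. In practice you would pass a real file name.

```r
# Made-up data frame used only for illustration.
results <- data.frame(
  gene = c("TP53", "BRCA1"),
  fold = c(2.5, 0.4),
  stringsAsFactors = FALSE
)

# Temporary path; replace with a real file name such as "results.csv".
out <- tempfile(fileext = ".csv")

# row.names = FALSE leaves the automatic 1, 2, ... row labels out of the file.
write.csv(results, file = out, row.names = FALSE)

# Read the file back to check that the contents survived the round trip.
back <- read.csv(out, stringsAsFactors = FALSE)
str(back)
```

Leaving row.names at its default of TRUE would add an unnamed first column holding the row labels, which is rarely wanted when the file is opened in Excel.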
The end.










