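Raw data usually cannot be used directly for visualization, so we first need to transform it, for example by creating new variables, renaming variables, or reordering the observations.
1. Setup
This post uses the nycflights13 and tidyverse packages, and relies mainly on functions from the dplyr package: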
library(nycflights13)
library(tidyverse)
The flights data object in nycflights13 contains information on 336,776 flights that departed New York City in 2013:
flights.png
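As a quick check of the size described above, a minimal sketch (assuming both packages are already installed) could look like this:

```r
library(nycflights13)
library(dplyr)

# Number of rows and columns in the flights tibble.
dim(flights)      # 336776 19

# Compact overview of every column together with its type.
glimpse(flights)
```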
Note: the abbreviations under the column names describe the type of each variable:
- int stands for integers.
- dbl stands for doubles, or real numbers.
- chr stands for character vectors, or strings.
- dttm stands for date-times (a date + a time).
- lgl stands for logical, vectors that contain only TRUE or FALSE.
- fctr stands for factors, which R uses to represent categorical variables with fixed possible values.
- date stands for dates.
dplyr basics:
- filter(): Pick observations by their values.
- arrange(): Reorder the rows.
- select(): Pick variables by their names.
- mutate(): Create new variables with functions of existing variables.
- summarise(): Collapse many values down to a single summary.
2. filter(): pick observations (rows)
Select the rows that match specific values:
filter1.png
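One plausible call of this kind, sketched here on the flights data loaded earlier (the choice of January 1 and the name jan1 are only illustrative):

```r
library(nycflights13)
library(dplyr)

# Keep only the flights that departed on January 1.
# Conditions separated by commas are combined with "and".
jan1 <- filter(flights, month == 1, day == 1)
jan1
```

filter() returns a new data frame and leaves flights unchanged, so the result has to be assigned if it is needed later.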
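2.1 Comparisons:
near() tests whether two numbers are equal within a small tolerance, which is safer than == for floating point values: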
near().png
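The difference between == and near() can be seen in a short sketch like the following, which only assumes that dplyr is installed:

```r
library(dplyr)

# Floating point arithmetic is approximate, so == can be surprising.
sqrt(2)^2 == 2        # FALSE
1 / 49 * 49 == 1      # FALSE

# near() compares within a small tolerance instead.
near(sqrt(2)^2, 2)    # TRUE
near(1 / 49 * 49, 1)  # TRUE
```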
2.2 Logical operators:
& is "and", | is "or", and ! is "not".
x %in% y: This will select every row where x is one of the values in y.
!(x & y) is the same as !x | !y, and !(x | y) is the same as !x & !y.
logical operators.png
filter(flights, month == 11 | month == 12)
filter(flights, month %in% c(11, 12))
filter(flights, !(arr_delay > 120 | dep_delay > 120))
filter(flights, arr_delay <= 120, dep_delay <= 120)
filter2.png
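To see that each pair of calls above selects the same rows, one could compare the results directly. This is only a sketch; the variable names are illustrative:

```r
library(nycflights13)
library(dplyr)

# "or" written out versus the %in% shorthand.
nov_dec_or <- filter(flights, month == 11 | month == 12)
nov_dec_in <- filter(flights, month %in% c(11, 12))
nrow(nov_dec_or) == nrow(nov_dec_in)

# De Morgan's law: !(x | y) is the same as !x & !y.
not_late_a <- filter(flights, !(arr_delay > 120 | dep_delay > 120))
not_late_b <- filter(flights, arr_delay <= 120, dep_delay <= 120)
nrow(not_late_a) == nrow(not_late_b)
```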
2.3 Missing values:
NA represents an unknown value so missing values are “contagious”: almost any operation involving an unknown value will also be unknown.
To check whether a value is NA, use is.na().
Try:
filter(df, is.na(x) | x > 1)
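filter() keeps only the rows where the condition is TRUE, so rows where it evaluates to FALSE or NA are both dropped. A minimal sketch, where df is a small hypothetical tibble made up for illustration:

```r
library(dplyr)

# Almost any operation involving NA returns NA.
NA > 5      # NA
NA + 10     # NA
NA == NA    # NA

# df is a hypothetical example tibble, not part of nycflights13.
df <- tibble(x = c(1, NA, 3))

# The NA row is dropped because NA > 1 is NA, not TRUE.
filter(df, x > 1)

# Asking for missing values explicitly keeps the NA row as well.
filter(df, is.na(x) | x > 1)
```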
3. arrange(): reorder rows
Rows are sorted in ascending order by default. Use desc() to sort in descending order. NA values are always placed at the end:
desc.png
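The handling of NA can be checked with a small sketch; df here is a hypothetical tibble made up for illustration:

```r
library(dplyr)

# df is a hypothetical example tibble.
df <- tibble(x = c(5, 2, NA))

arrange(df, x)        # x is ordered 2, 5, NA
arrange(df, desc(x))  # x is ordered 5, 2, NA (NA still last)
```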
4. select(): pick variables (columns)
The flights object has 19 variables. You can select just the ones you need for later analysis:
select.png
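A few common ways of choosing columns, sketched on the flights data (the particular columns are only an illustration):

```r
library(nycflights13)
library(dplyr)

# Select columns by name.
by_name <- select(flights, year, month, day)

# Select every column from year through day (inclusive).
by_range <- select(flights, year:day)

# Select every column except those from year through day.
without_date <- select(flights, -(year:day))
```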
There are a number of helper functions you can use within select():
- starts_with("abc"): matches names that begin with “abc”.
- ends_with("xyz"): matches names that end with “xyz”.
- contains("ijk"): matches names that contain “ijk”.
- matches("(.)\\1"): selects variables that match a regular expression. This one matches any variables that contain repeated characters.
- num_range("x", 1:3): matches x1, x2 and x3.
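The helpers can be tried out on a small tibble. In this sketch, df and all of its column names are hypothetical, chosen so that each helper matches something:

```r
library(dplyr)

# df is a hypothetical one-row tibble; only the column names matter here.
df <- tibble(abc_id = 1, size_xyz = 2, hijkl = 3,
             x1 = 4, x2 = 5, x3 = 6, address = 7)

names(select(df, starts_with("abc")))     # "abc_id"
names(select(df, ends_with("xyz")))       # "size_xyz"
names(select(df, contains("ijk")))        # "hijkl"
names(select(df, matches("(.)\\1")))      # "address" ("dd" and "ss" repeat)
names(select(df, num_range("x", 1:3)))    # "x1" "x2" "x3"
```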
rename() can be used to rename variables while keeping all the other columns:
rename.png
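rename() takes arguments of the form new_name = old_name. A minimal sketch on the flights data, where the new name tail_num is just an example:

```r
library(nycflights13)
library(dplyr)

# Rename tailnum to tail_num; every other column is kept unchanged.
renamed <- rename(flights, tail_num = tailnum)
names(renamed)
```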
5. mutate(): add new variables
By default, new variables are added at the end of the dataset:
mutate.png
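As a sketch, the following first narrows flights down to a few columns so the new ones are easy to see. The names flights_sml, gain and speed are illustrative choices:

```r
library(nycflights13)
library(dplyr)

# Work on a narrower tibble so the added columns are visible when printed.
flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)

# gain: minutes made up in the air; speed: miles per hour
# (air_time is recorded in minutes, hence the factor of 60).
with_new <- mutate(flights_sml,
  gain = dep_delay - arr_delay,
  speed = distance / air_time * 60
)
with_new
```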
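6. summarise(): grouped summaries
summarise() collapses a whole data frame into a single row. It is usually combined with group_by() so that the summary is computed once per group: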
image.png
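The difference between an ungrouped and a grouped summary can be sketched as follows. The summary column name delay is illustrative, and na.rm = TRUE is used here because dep_delay contains missing values:

```r
library(nycflights13)
library(dplyr)

# Without grouping: one row with the overall mean departure delay.
overall <- summarise(flights, delay = mean(dep_delay, na.rm = TRUE))

# With grouping: one row per day, each with that day's mean delay.
by_day <- group_by(flights, year, month, day)
daily <- summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
daily
```

Without na.rm = TRUE, any group containing a missing dep_delay would get NA as its mean, since missing values propagate through mean().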













