purrr 是什么 ,官方介绍时Functional Programming Tools(函数式编程工具),可以让代码变得更加简洁清晰。初学的朋友可以不用管这个东西,这只是锦上添花。
然后看purrr官方的一个例子:
library(purrr)
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
## 4 6 8
## 0.5086326 0.4645102 0.4229655
这里给出了三个回归模型的R方。purrr的核心函数式map,其相当于循环。将上面的代码拆开。
head(mtcars,3)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
mtcars 是数据集
mtcars %>%
split(.$cyl)
## $`4`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
##
## $`6`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
##
## $`8`
## mpg cyl disp hp drat wt qsec vs am gear carb
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
split(.$cyl)将数据集根据cyl划分成为三份,返回一个list列表。split这个函数来自于base R,也就是基础R中的函数。
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .))
## $`4`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Coefficients:
## (Intercept) wt
## 39.571 -5.647
##
##
## $`6`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Coefficients:
## (Intercept) wt
## 28.41 -2.78
##
##
## $`8`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Coefficients:
## (Intercept) wt
## 23.868 -2.192
利用 map 函数对于列表中的每一个数据集构建回归模型,这里会构建三个回归模型。然后是对构建好的模型调用summary函数。
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary)
## $`4`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1513 -1.9795 -0.6272 1.9299 5.2523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.571 4.347 9.104 7.77e-06 ***
## wt -5.647 1.850 -3.052 0.0137 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.332 on 9 degrees of freedom
## Multiple R-squared: 0.5086, Adjusted R-squared: 0.454
## F-statistic: 9.316 on 1 and 9 DF, p-value: 0.01374
##
##
## $`6`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Mazda RX4 Mazda RX4 Wag Hornet 4 Drive Valiant Merc 280
## -0.1250 0.5840 1.9292 -0.6897 0.3547
## Merc 280C Ferrari Dino
## -1.0453 -1.0080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.409 4.184 6.789 0.00105 **
## wt -2.780 1.335 -2.083 0.09176 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.165 on 5 degrees of freedom
## Multiple R-squared: 0.4645, Adjusted R-squared: 0.3574
## F-statistic: 4.337 on 1 and 5 DF, p-value: 0.09176
##
##
## $`8`
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1491 -1.4664 -0.8458 1.5711 3.7619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.8680 3.0055 7.942 4.05e-06 ***
## wt -2.1924 0.7392 -2.966 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.024 on 12 degrees of freedom
## Multiple R-squared: 0.423, Adjusted R-squared: 0.3749
## F-statistic: 8.796 on 1 and 12 DF, p-value: 0.01179
然后是选取summary结果中的r.squared属性。
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
## 4 6 8
## 0.5086326 0.4645102 0.4229655
构建一个回归模型,然后使用str可以查看对象中有哪些属性。
l <- lm(mpg ~ wt, data = mtcars)
ls <- summary(l)
str(ls)
## List of 11
## $ call : language lm(formula = mpg ~ wt, data = mtcars)
## $ terms :Classes 'terms', 'formula' language mpg ~ wt
## .. ..- attr(*, "variables")= language list(mpg, wt)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "mpg" "wt"
## .. .. .. ..$ : chr "wt"
## .. ..- attr(*, "term.labels")= chr "wt"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(mpg, wt)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
## $ residuals : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
## ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## $ coefficients : num [1:2, 1:4] 37.285 -5.344 1.878 0.559 19.858 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "wt"
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:2] FALSE FALSE
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
## $ sigma : num 3.05
## $ df : int [1:3] 2 30 2
## $ r.squared : num 0.753
## $ adj.r.squared: num 0.745
## $ fstatistic : Named num [1:3] 91.4 1 30
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:2, 1:2] 0.38 -0.1084 -0.1084 0.0337
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "(Intercept)" "wt"
## .. ..$ : chr [1:2] "(Intercept)" "wt"
## - attr(*, "class")= chr "summary.lm"
$residuals可以调用回归模型的残差:
ls$residuals
## Mazda RX4 Mazda RX4 Wag Datsun 710
## -2.2826106 -0.9197704 -2.0859521
## Hornet 4 Drive Hornet Sportabout Valiant
## 1.2973499 -0.2001440 -0.6932545
## Duster 360 Merc 240D Merc 230
## -3.9053627 4.1637381 2.3499593
## Merc 280 Merc 280C Merc 450SE
## 0.2998560 -1.1001440 0.8668731
## Merc 450SL Merc 450SLC Cadillac Fleetwood
## -0.0502472 -1.8830236 1.1733496
## Lincoln Continental Chrysler Imperial Fiat 128
## 2.1032876 5.9810744 6.8727113
## Honda Civic Toyota Corolla Toyota Corona
## 1.7461954 6.4219792 -2.6110037
## Dodge Challenger AMC Javelin Camaro Z28
## -2.9725862 -3.7268663 -3.4623553
## Pontiac Firebird Fiat X1-9 Porsche 914-2
## 2.4643670 0.3564263 0.1520430
## Lotus Europa Ford Pantera L Ferrari Dino
## 1.2010593 -4.5431513 -2.7809399
## Maserati Bora Volvo 142E
## -3.2053627 -1.0274952
然后计算三个模型的平均残差:
mtcars %>%
split(.$cyl) %>% # from base R
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary) %>% map("residuals") %>% map_dbl(mean)
## 4 6 8
## 1.714616e-16 7.140246e-17 1.269136e-16
这里就返回出了每个模型的平均残差。map与map_dbl的区别是map返回的结果是一个list,而map_dbl的返回结果是一个向量。
需要注意的是,这里数据集会使用.,来代替。更详细的资料参见https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf
网友评论