Reference: https://r-pkgs.org/package-structure-state.html#binary-package
Package structure and state
Package states
An R package can be in one of five states:
- source
- bundled
- binary
- installed
- in-memory
install.packages() and devtools::install_github() move a package from the source, bundled, or binary state into the installed state. library(), in turn, takes an installed package and loads it into memory (the in-memory state).
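The installed-to-in-memory step can be sketched with base R alone. This is a minimal illustration, not part of the original notes. It uses the base package tools only because it is guaranteed to be installed, and it checks the search path before and after library(). install.packages() is left out because it needs network access.

```r
# Sketch: move an already-installed package into memory (attach it).
# "tools" is only an example; it ships with every R installation.
pkg <- "tools"

# Installed state: system.file() returns "" when a package is not installed
is_installed <- nzchar(system.file(package = pkg))

# In-memory (attached) state: attached packages appear on the search path
on_search_path <- function(p) paste0("package:", p) %in% search()

attached_before <- on_search_path(pkg)
library(pkg, character.only = TRUE)  # installed -> in-memory
attached_after <- on_search_path(pkg)

print(c(installed = is_installed,
        attached_before = attached_before,
        attached_after = attached_after))
```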
Source package
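A source package is simply a directory with a particular structure. It looks like the directory you start with when developing your own package: a DESCRIPTION file, an R/ directory holding the .R files that define the functions, and so on.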
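To make this structure concrete, here is a minimal sketch that lays out a source package by hand with base R. The package name mypkg, the function hello(), and the DESCRIPTION values are all hypothetical. A real package would normally be scaffolded with a tool such as usethis::create_package().

```r
# Sketch: a minimal source package created by hand in a temporary directory.
# The package name "mypkg" and the function hello() are hypothetical.
root <- file.path(tempdir(), "mypkg")
dir.create(file.path(root, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: the package's metadata
writeLines(c(
  "Package: mypkg",
  "Title: A Minimal Example Package",
  "Version: 0.0.1",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "Description: Illustrates the layout of a source package.",
  "License: GPL-3"
), file.path(root, "DESCRIPTION"))

# NAMESPACE: which functions the package exports
writeLines("export(hello)", file.path(root, "NAMESPACE"))

# R/: the .R files that define the package's functions
writeLines('hello <- function(name = "world") paste("Hello,", name)',
           file.path(root, "R", "hello.R"))

print(list.files(root, recursive = TRUE))
```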
To look at a package's source, start from its CRAN landing page (for a Bioconductor package, go to its GitHub repository instead), e.g.:
- forcats: https://cran.r-project.org/package=forcats
- readxl: https://cran.r-project.org/package=readxl
One of the links given there is usually the package's public GitHub repository:
- forcats: https://github.com/tidyverse/forcats
- readxl: https://github.com/tidyverse/readxl
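Some authors forget to add this URL, but the source can still be found.
Even for packages that are not published on a public platform, you can browse the source on unofficial, read-only mirrors such as METACRAN.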
Bundled package
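A bundled package is a package compressed into a single file. By convention, inherited from Linux, the bundle is a .tar.gz file: many files are archived together (.tar) and then compressed with gzip (.gz). This state exists mainly to make the package easy to transfer and is usually an intermediate form.
To create a bundle from a package you are developing locally, use devtools::build(). It calls pkgbuild::build(), which ultimately runs R CMD build. For details, see: https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs
In practice, though, a bundle is not made by simply archiving the source directory with tar and compressing it with gzip. When R creates a .tar.gz file, it does more work than that.
For example, download forcats_0.4.0.tar.gz and extract it in a terminal: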
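devtools::build() ultimately runs R CMD build, so the same step can be sketched from base R by calling R CMD build through system2(). This sketch rests on two assumptions: the package mypkg is a hypothetical throwaway, and R's own executable is located via R.home("bin"). The sketch also drops a development-only file through .Rbuildignore. It then lists the bundle with untar(list = TRUE), which plays the role of inspecting the archive with tar in a terminal.

```r
# Sketch: build a bundle with R CMD build from R, then list its contents.
# "mypkg" is a hypothetical throwaway package created here.
root <- file.path(tempdir(), "buildpkg", "mypkg")
dir.create(file.path(root, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: mypkg", "Title: A Minimal Example Package", "Version: 0.0.1",
  "Author: Jane Doe", "Maintainer: Jane Doe <jane@example.com>",
  "Description: Illustrates building a package bundle.", "License: GPL-3"
), file.path(root, "DESCRIPTION"))
writeLines("export(hello)", file.path(root, "NAMESPACE"))
writeLines('hello <- function() "hello"', file.path(root, "R", "hello.R"))

# A development-only file, excluded through .Rbuildignore
writeLines("scratch notes", file.path(root, "notes"))
writeLines("^notes$", file.path(root, ".Rbuildignore"))

# R CMD build writes the tarball into the current working directory
out_dir <- file.path(tempdir(), "bundles")
dir.create(out_dir, showWarnings = FALSE)
old_wd <- setwd(out_dir)
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "build", shQuote(root)))
setwd(old_wd)

# List the archive contents without extracting (like `tar tzf` in a shell)
bundle <- file.path(out_dir, "mypkg_0.0.1.tar.gz")
contents <- untar(bundle, list = TRUE)
print(contents)
```

The listing shows the files under a top-level mypkg/ directory, and the notes file is absent. R CMD build also adds build metadata, such as a Packaged field, to the DESCRIPTION inside the bundle.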
tar xvf forcats_0.4.0.tar.gz
Once extracted, it looks very much like a source package. The figure below compares the contents of the main states:
[Figure: package_states.png, comparison of package contents across states]
In summary, the main differences between a source package and an uncompressed bundle are:
- Vignettes have been built, so rendered outputs, such as HTML, appear below inst/doc/ and a vignette index appears in the build/ directory, usually alongside a PDF package manual.
- A local source package might contain temporary files used to save time during development, like compilation artefacts in src/. These are never found in a bundle.
- Any files listed in .Rbuildignore are not included in the bundle. These are typically files that facilitate your development process, but that should be excluded from the distributed product.
.Rbuildignore
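This file plays a role similar to .gitignore in version control tools such as Git. It decides which files are carried forward into downstream forms (such as the bundle) and which are left out.
The file consists of regular expressions, one per line, for example: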
^foofactors\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^README\.Rmd$
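A file or directory is excluded from the bundle if its path, relative to the package root, matches any of these patterns. The ^ and $ anchors make each pattern match the whole path rather than just part of it. Files excluded this way only matter during development. Because it is easy to get a regular expression wrong, the safest way to exclude a specific file is: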
usethis::use_build_ignore("notes")
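Here is a minimal sketch of how these patterns are applied. It assumes the rule described in Writing R Extensions: each line is a Perl-like regular expression, matched case-insensitively against file and directory paths relative to the package root. The paths are hypothetical. The last lines show why the ^ and $ anchors matter. By default, usethis::use_build_ignore() escapes dots and adds these anchors for you.

```r
# Patterns as they would appear in .Rbuildignore (one per element)
patterns <- c("^foofactors\\.Rproj$", "^\\.Rproj\\.user$",
              "^LICENSE\\.md$", "^README\\.Rmd$")

# Hypothetical paths relative to the package root
paths <- c("DESCRIPTION", "foofactors.Rproj", ".Rproj.user", "LICENSE.md",
           "LICENSE", "README.Rmd", "README.md", "R/utils.R")

# A path is excluded if any pattern matches it (Perl-like, case-insensitive)
is_ignored <- function(path, patterns) {
  any(vapply(patterns,
             function(p) grepl(p, path, ignore.case = TRUE, perl = TRUE),
             logical(1)))
}
ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)
print(ignored)

# Pitfall: an unanchored pattern matches anywhere in the path
unanchored <- grepl("notes", c("notes", "R/footnotes.R"),
                    ignore.case = TRUE, perl = TRUE)
print(unanchored)  # TRUE TRUE

# Anchored with ^ and $, only the exact path matches
anchored <- grepl("^notes$", c("notes", "R/footnotes.R"),
                  ignore.case = TRUE, perl = TRUE)
print(anchored)    # TRUE FALSE
```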
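In short, this file makes development more convenient. You constantly test and modify your package, yet some of the intermediate files involved should not be uploaded to CRAN.
.Rbuildignore is a way to resolve some of the tension between the practices that support your development process and CRAN’s requirements for submission and distribution.
Such intermediate files include:
- Files that help you generate package contents programmatically. Examples:
  - Using README.Rmd to generate an informative and current README.md.
  - Storing .R scripts to create and update internal or exported data.
- Files that drive package development, checking, and documentation, outside of CRAN’s purview. Examples:
  - Files relating to the RStudio IDE.
  - Using the pkgdown package to generate a website.
  - Configuration files related to continuous integration/deployment and monitoring test coverage.
Example: .Rbuildignore entries of the kind found in tidyverse packages (for illustration only; a real package's file may differ). The trailing # comments are annotations for the reader. In an actual .Rbuildignore, each line should contain only the pattern, because text after it would be treated as part of the regular expression.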
^.*\.Rproj$ # Designates the directory as an RStudio Project
^\.Rproj\.user$ # Used by RStudio for temporary files
^README\.Rmd$ # An Rmd file used to generate README.md
^LICENSE\.md$ # Full text of the license
^cran-comments\.md$ # Comments for CRAN submission
^\.travis\.yml$ # Used by Travis-CI for continuous integration testing
^data-raw$ # Code used to create data included in the package
^pkgdown$ # Resources used for the package website
^_pkgdown\.yml$ # Configuration info for the package website
^\.github$ # Contributing guidelines, CoC, issue templates, etc.
Binary package
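If you want to share a package with R users who do not have package development tools, you need to provide a binary package. Binary packages are platform-specific: Windows and macOS, for example, each have their own binary format. To make a binary package, use: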
devtools::build(binary = TRUE)
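Because binary packages are platform-specific, R records which package type the current build prefers. Here is a small base R sketch. .Platform$pkgType is "source" on typical Linux builds. On Windows and macOS builds from CRAN it is a binary type: "win.binary", or a "mac.binary" variant. The forcats line is commented out because it needs network access.

```r
# Sketch: which package type does this build of R prefer?
pkg_type <- .Platform$pkgType
print(pkg_type)

# TRUE where pre-built binaries are the default (e.g. CRAN's Windows/macOS
# builds), FALSE where packages are built from source (typical on Linux)
prefers_binary <- pkg_type != "source"
print(prefers_binary)

# Explicitly request a CRAN binary where one exists (needs network access):
# install.packages("forcats", type = "binary")
```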
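In practice, though, binary packages are usually built and distributed by CRAN rather than by the package author. You submit a package bundle to CRAN, and CRAN builds and publishes the binaries for you.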
Installed package
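An installed package is a binary package that has been decompressed into a package library. The figure below shows some of the ways a package can be installed; the reality is considerably more complicated: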
[Figure: installation.png, ways a package can be installed]
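You can try the source-to-installed and installed-to-in-memory steps offline by installing a local source directory into a throwaway library. This minimal sketch assumes a hypothetical package mypkg created on the fly. Passing repos = NULL tells install.packages() that it is given a local path rather than the name of a CRAN package.

```r
# Sketch: install a hypothetical local source package into a throwaway
# library, then load it. No network access is needed.
root <- file.path(tempdir(), "installpkg", "mypkg")
dir.create(file.path(root, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: mypkg", "Title: A Minimal Example Package", "Version: 0.0.1",
  "Author: Jane Doe", "Maintainer: Jane Doe <jane@example.com>",
  "Description: Illustrates installing a package.", "License: GPL-3"
), file.path(root, "DESCRIPTION"))
writeLines("export(hello)", file.path(root, "NAMESPACE"))
writeLines('hello <- function() "hello"', file.path(root, "R", "hello.R"))

# A separate library directory, so nothing touches the user's real library
lib <- file.path(tempdir(), "throwaway-lib")
dir.create(lib, showWarnings = FALSE)

# source -> installed: repos = NULL means `root` is a local path
install.packages(root, repos = NULL, type = "source", lib = lib)

# The installed package is now a directory inside that library
print(list.files(file.path(lib, "mypkg")))

# installed -> in-memory
library(mypkg, lib.loc = lib)
print(hello())
```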
Package libraries
To list the package libraries and the packages installed in each:
# on Windows
.libPaths()
#> [1] "C:/Users/jenny/Documents/R/win-library/3.6"
#> [2] "C:/Program Files/R/R-3.6.0/library"
lapply(.libPaths(), list.dirs, recursive = FALSE, full.names = FALSE)
#> [[1]]
#> [1] "abc" "anytime" "askpass" "assertthat"
#> ...
#> [145] "zeallot"
#>
#> [[2]]
#> [1] "base" "boot" "class" "cluster"
#> [5] "codetools" "compiler" "datasets" "foreign"
#> [9] "graphics" "grDevices" "grid" "KernSmooth"
#> [13] "lattice" "MASS" "Matrix" "methods"
#> [17] "mgcv" "nlme" "nnet" "parallel"
#> [21] "rpart" "spatial" "splines" "stats"
#> [25] "stats4" "survival" "tcltk" "tools"
#> [29] "translations" "utils"
So R's libraries fall into two kinds:
- A user library
- A system-level or global library
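The first holds add-on packages that the user installed later, from CRAN, Bioconductor, and elsewhere. The second holds the packages that ship with R itself, such as base, plus recommended packages like MASS and lattice (as the listing shows). Keeping the two apart makes management easier: installing or removing add-on packages does not interfere with the packages that come with R.
The paths also show that the user library is tied to R's minor version (win-library/3.6 above). When you upgrade R to a new minor version, say from 3.5 to 3.6, you need to reinstall your add-on packages. A patch release, such as 3.6.0 to 3.6.1, does not require this.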
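The split between the system library and user libraries can also be inspected from base R. In this sketch, .Library is the library that ships inside the R installation. installed.packages() reports a Priority of "base" or "recommended" for packages bundled with R. The "x.y" minor version is computed to show the path component that user libraries are usually keyed on.

```r
# Sketch: separate R's system library from user libraries (base R only).
libs <- .libPaths()   # every library R searches, in search order
sys_lib <- .Library   # the system library inside R's installation
print(libs)
print(sys_lib)

# Packages bundled with R carry a Priority: "base" (core packages such as
# base, stats, utils) or "recommended" (e.g. MASS, lattice); others are NA.
ip <- installed.packages(lib.loc = sys_lib)
print(table(ip[, "Priority"], useNA = "ifany"))
base_pkgs <- sort(unname(ip[ip[, "Priority"] %in% "base", "Package"]))
print(base_pkgs)

# The "x.y" (major.minor) version of this R. User library paths usually end
# in this component (e.g. win-library/3.6), so it changes on a minor upgrade
# (3.5 -> 3.6) but not on a patch release (3.6.0 -> 3.6.1).
minor_version <- paste(R.version$major,
                       strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1],
                       sep = ".")
print(minor_version)
```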