Reference: https://r-pkgs.org/package-structure-state.html#binary-package
Package structure and state
Package states
An R package can be in one of five states:
- source
- bundled
- binary
- installed
- in-memory
install.packages() and devtools::install_github() both take a package from the source, bundled, or binary state to the installed state. library() then loads an installed package into memory.
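The difference between "installed" and "in-memory" can be observed from the R console. The following is a minimal sketch using only base R. It uses stats4 because that package ships with R but is normally not attached at startup; the variable names are illustrative.

```r
# stats4 ships with every R installation but is normally not attached
# at startup, so it is convenient for observing the two states.
pkg <- "stats4"

# Installed: the package has a directory in one of the libraries.
is_installed <- nzchar(system.file(package = pkg))

# In memory (attached): the package appears on the search path.
attached_before <- paste0("package:", pkg) %in% search()

library(pkg, character.only = TRUE)

attached_after <- paste0("package:", pkg) %in% search()

cat("installed:                ", is_installed, "\n")
cat("attached before library():", attached_before, "\n")
cat("attached after library(): ", attached_after, "\n")
```

system.file() returns an empty string when the package is not found in any library, which is why nzchar() works as an "is it installed" check here.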
Source package
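A source package is simply a directory with a specific structure, the same one you get when you first start developing a package yourself: a DESCRIPTION file, an R/ directory holding the .R files that define the functions, and so on.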
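To make the layout concrete, here is a small sketch that writes a throwaway source package into a temporary directory and lists its files. The package name demopkg, its DESCRIPTION fields, and the hello() function are all hypothetical; a real package would normally be scaffolded with a tool such as usethis.

```r
# Create a throwaway source package skeleton in a temporary directory.
# "demopkg" and everything inside it is made up for illustration.
pkg_dir <- file.path(tempfile("src-"), "demopkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

# Package metadata lives in DESCRIPTION.
writeLines(c(
  "Package: demopkg",
  "Title: A Minimal Demonstration Package",
  "Version: 0.0.1",
  "Description: Shows the layout of a source package.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# Function definitions live in .R files below R/.
writeLines("hello <- function() \"hello\"", file.path(pkg_dir, "R", "hello.R"))

# NAMESPACE declares what the package exports.
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

src_files <- list.files(pkg_dir, recursive = TRUE)
print(src_files)
```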
To browse the source of a package, start from its CRAN landing page (for a Bioconductor package, look for the corresponding GitHub repository), for example:
- forcats: https://cran.r-project.org/package=forcats
- readxl: https://cran.r-project.org/package=readxl
One of the URLs listed there usually points to the public repository on GitHub:
- forcats: https://github.com/tidyverse/forcats
- readxl: https://github.com/tidyverse/readxl
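Some maintainers forget to list this URL, but the repository can usually still be found by searching.
If a package is not developed on a public platform, its source can still be browsed on unofficial, read-only mirrors such as METACRAN.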
Bundled package
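A bundled package is a package that has been compressed into a single file. On Linux this is conventionally a .tar.gz file: many files are combined into one archive (.tar) and then compressed with gzip (.gz). This state mainly exists to make the package easy to transfer, and it is usually an intermediate form.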
To create a bundle from a package you are developing locally, use devtools::build(), which calls pkgbuild::build() and ultimately R CMD build. For more detail, see: https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs
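In practice, however, a bundle is not produced by simply running tar and then gzip. When R creates the .tar.gz file, it performs several additional steps.
For example, after downloading forcats_0.4.0.tar.gz, unpack it in a terminal: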
tar xvf forcats_0.4.0.tar.gz
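After unpacking, the result looks almost the same as the source package. [Figure: comparison of the main contents of each package state]
In summary, the main differences between a source package and an uncompressed bundle are: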
- Vignettes have been built, so rendered outputs, such as HTML, appear below inst/doc/ and a vignette index appears in the build/ directory, usually alongside a PDF package manual.
- A local source package might contain temporary files used to save time during development, like compilation artefacts in src/. These are never found in a bundle.
- Any files listed in .Rbuildignore are not included in the bundle. These are typically files that facilitate your development process, but that should be excluded from the distributed product.
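One way to check these differences is to list a bundle's contents from R without unpacking it. The sketch below defines a small helper, bundle_contents() (a hypothetical name), around utils::untar(). So that the example is self-contained, it runs the helper on a stand-in archive made with plain tar + gzip from a tiny made-up directory; that stand-in is not a real bundle produced by R CMD build.

```r
# Return the file listing of a .tar.gz without unpacking it.
bundle_contents <- function(path) utils::untar(path, list = TRUE)

# Stand-in archive (NOT a real bundle): plain tar + gzip of a tiny,
# hypothetical package directory, created only to exercise the helper.
old_wd <- setwd(tempdir())
dir.create(file.path("demopkg", "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("Package: demopkg", file.path("demopkg", "DESCRIPTION"))
writeLines("hello <- function() \"hello\"", file.path("demopkg", "R", "hello.R"))
utils::tar("demopkg_0.0.1.tar.gz", files = "demopkg", compression = "gzip")

contents <- bundle_contents("demopkg_0.0.1.tar.gz")
print(contents)

# In a real bundle with vignettes, built outputs appear below inst/doc/.
has_built_vignettes <- any(grepl("/inst/doc/", contents, fixed = TRUE))
cat("built vignettes present:", has_built_vignettes, "\n")

setwd(old_wd)
```

Assuming forcats_0.4.0.tar.gz has been downloaded into the working directory, bundle_contents("forcats_0.4.0.tar.gz") would list the real bundle in the same way.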
.Rbuildignore
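This file is similar to the .gitignore used by version control tools such as Git: it decides which files are carried forward into downstream forms (such as the bundle) and which are left out.
Each line of the file is a regular expression, for example: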
^foofactors\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^README\.Rmd$
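Each pattern is anchored with ^ at the start and $ at the end, and any file or directory whose path matches a pattern is left out of the bundle. Such files only matter during development. To avoid mistakes in hand-written regular expressions, the safest way to exclude a specific file is: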
usethis::use_build_ignore("notes")
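In short, this file makes development more convenient: you need to test and revise constantly, yet some of the intermediate files this produces cannot be uploaded to CRAN.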
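To see how the anchored patterns select files, the sketch below applies the four example patterns to a made-up list of top-level paths. It assumes the matching rule described in Writing R Extensions (Perl-style regular expressions, matched case-insensitively against paths relative to the package root); is_ignored() is a hypothetical helper, not a function from R or usethis.

```r
# Ignore patterns as they would appear in .Rbuildignore, one per line.
patterns <- c(
  "^foofactors\\.Rproj$",
  "^\\.Rproj\\.user$",
  "^LICENSE\\.md$",
  "^README\\.Rmd$"
)

# Hypothetical top-level paths of a source package.
paths <- c("DESCRIPTION", "R", "README.Rmd", "README.md",
           "foofactors.Rproj", ".Rproj.user", "LICENSE.md", "LICENSE")

# A path is dropped when at least one pattern matches it.
is_ignored <- function(path, patterns) {
  any(vapply(patterns, grepl, logical(1), x = path,
             perl = TRUE, ignore.case = TRUE))
}

ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)
kept <- paths[!ignored]
print(kept)
```

Because of the anchors and the escaped dots, README.Rmd is dropped while README.md is kept, and LICENSE.md is dropped while LICENSE is kept.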
.Rbuildignore is a way to resolve some of the tension between the practices that support your development process and CRAN’s requirements for submission and distribution.
The intermediate files involved may include:
- Files that help you generate package contents programmatically. Examples:
  - Using README.Rmd to generate an informative and current README.md.
  - Storing .R scripts to create and update internal or exported data.
- Files that drive package development, checking, and documentation, outside of CRAN’s purview. Examples:
  - Files relating to the RStudio IDE.
  - Using the pkgdown package to generate a website.
  - Configuration files related to continuous integration/deployment and monitoring test coverage.
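Example: a typical .Rbuildignore for a tidyverse package. The trailing comments are for illustration only and must not appear in a real .Rbuildignore file.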
^.*\.Rproj$ # Designates the directory as an RStudio Project
^\.Rproj\.user$ # Used by RStudio for temporary files
^README\.Rmd$ # An Rmd file used to generate README.md
^LICENSE\.md$ # Full text of the license
^cran-comments\.md$ # Comments for CRAN submission
^\.travis\.yml$ # Used by Travis-CI for continuous integration testing
^data-raw$ # Code used to create data included in the package
^pkgdown$ # Resources used for the package website
^_pkgdown\.yml$ # Configuration info for the package website
^\.github$ # Contributing guidelines, CoC, issue templates, etc.
Binary package
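To share a package with R users who are not set up for package development, you need a binary package. This form is platform specific: there are separate binaries for Windows and macOS, for example. To build a binary package, run: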
devtools::build(binary = TRUE)
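Usually, though, it is CRAN that builds and distributes binary packages, so you do not need to do this yourself. You submit the package bundle to CRAN, and CRAN builds and publishes the binaries for you.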
Installed package
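An installed package is a binary package that has been unpacked into a package library. [Figure: several ways a package can be installed; the real picture is considerably more complicated]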
Package libraries
To list the libraries and the packages installed in each:
# on Windows
.libPaths()
#> [1] "C:/Users/jenny/Documents/R/win-library/3.6"
#> [2] "C:/Program Files/R/R-3.6.0/library"
lapply(.libPaths(), list.dirs, recursive = FALSE, full.names = FALSE)
#> [[1]]
#> [1] "abc" "anytime" "askpass" "assertthat"
#> ...
#> [145] "zeallot"
#>
#> [[2]]
#> [1] "base" "boot" "class" "cluster"
#> [5] "codetools" "compiler" "datasets" "foreign"
#> [9] "graphics" "grDevices" "grid" "KernSmooth"
#> [13] "lattice" "MASS" "Matrix" "methods"
#> [17] "mgcv" "nlme" "nnet" "parallel"
#> [21] "rpart" "spatial" "splines" "stats"
#> [25] "stats4" "survival" "tcltk" "tools"
#> [29] "translations" "utils"
As the output shows, R's libraries fall into two categories:
- A user library
- A system-level or global library
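The first holds the add-on packages the user has installed, from CRAN, Bioconductor, and elsewhere. The second holds the core packages that ship with R, such as base. The separation makes management easier: adding or removing add-on packages does not interfere with the packages that came with R.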
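The split can also be inspected through the Priority field that R records for each installed package. The following is a minimal sketch using installed.packages() from base R; which libraries and add-on packages it reports depends entirely on the machine it runs on.

```r
# Packages with priority "base" or "recommended" ship with R itself.
shipped <- installed.packages(priority = c("base", "recommended"))

# Packages without an assigned priority were added later
# (from CRAN, Bioconductor, GitHub, ...).
addons <- installed.packages(priority = "NA")

# Count the shipped packages per library and priority.
by_library <- table(
  library  = shipped[, "LibPath"],
  priority = shipped[, "Priority"]
)
print(by_library)

cat("add-on packages found:", nrow(addons), "\n")
```

On a setup like the Windows example above, the base and recommended packages would all be counted under the system-level library, while the add-on packages would sit in the user library.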
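The paths also show why upgrading R across minor versions, for example from 3.5 to 3.6, means reinstalling add-on packages: the user library path includes the major.minor version, so the new R looks in a different directory. A patch release, such as R 3.6.0 to 3.6.1, keeps the same path and does not require reinstalling.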
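A quick way to check this on a given machine is to build the major.minor string of the running R and see which library paths contain it. This is a rough sketch: it uses a plain substring match, and whether any path matches depends on how the libraries were configured.

```r
# R.version$minor holds "minor.patch", e.g. "6.0"; keep only the minor part.
major_minor <- paste(
  R.version$major,
  sub("\\..*$", "", R.version$minor),
  sep = "."
)

# Flag library paths that embed the major.minor version as a substring.
uses_version <- grepl(major_minor, .libPaths(), fixed = TRUE)

print(data.frame(library = .libPaths(), has_major_minor = uses_version))
```

A library whose path contains the major.minor string is the one that will effectively be left behind after a minor-version upgrade.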