参见:https://mp.weixin.qq.com/s/67rjY7w-Uh0AfnaxNoik8Q
先前我们介绍过在后台运行R 脚本,对于耗时较长的代码运行,或者复杂的包的安装,我们可以使用该方法,从而不占用前台:
直接安装一下:
remotes::install_github("lindeloev/job")
ps: 这里发现在win 下安装会发生报错:
> remotes::install_github("lindeloev/job")
错误: Failed to install 'unknown package' from GitHub:
畸形'Config/testthat/edit ...'开头行!
现在我们有更方便的方法了,只需要在代码使用job 包中的函数,就可以实现后台操作了:
job::job(
{ tmp <- matrix(sample(letters, 1000, replace = T), ncol = 10) }
)
使用方式为:
job::job({<your code>})
其实只是从手动操作,变成了代码:
如果我们想要将后台运行的结果和前台运行的结果分离,不相互污染,还可以将变量保存在一个新的环境中:
job::job(brm_result = {
fit = brm(model, data)
fit = add_criterion(fit, "loo")
print(summary(fit)) # Show a summary in the job
the_test = hypothesis(fit, "hp > 0")
})
比如有多个任务:
此外还有一些有用的信息:
Finer control
RStudio jobs spin up a new session, i.e., a new environment. By default, job::job() will make this environment identical to your current one. But you can fine control this:
import: the default "auto" setting imports all objects that are referenced by the code into the job. Control this using job::job({}, import = c(model, data)). You can also import everything (import = "all") or nothing (import = NULL).
packages: by default, all attached packages are attached in the job. Control this using job::job({}, packages = c("brms")) or set packages = NULL to load nothing. If brms is not loaded in your current session, adding library("brms") to the job code may be more readable.
options: by default, all options are overwritten/inserted to the job. Control this using, e.g., job::job({}, opts = list(mc.cores = 2) or set opts = NULL to use default options. If you want to set job-specific options, adding options(mc.cores = 2) to the job code may be more readable.
export: in the example above, we assigned the job environment to brm_result upon completion. Naturally, you can choose any name, e.g., job::job(fancy_name = {a = 5}). To return nothing, use an unnamed code chunk (insert results to globalenv() and remove everything before return: (job::job({a = 5; rm(list=ls())}). Returning nothing is useful when
your main result is a text output or a file on the disk, or
when the return is a very large object. The underlying rstudioapi::jobRunScript() is slow in the back-transfer so it's usually faster to saveRDS(obj, filename) them in the job and readRDS(filename) into your current session.
Some use cases
Model training, cross validation, or hyperparameter tuning: train multiple models simultaneously, each in their own job. If one fails, the others continue.
Heavy I/O tasks, like processing large files. Save the results to disk and return nothing.
Run unit tests and other code in an empty environment. By default, devtools::test() runs in the current environment, including manually defined variables (e.g., from the last test-run) and attached packages. Call job::job({devtools::test()}, import = NULL, packages = NULL, opts = NULL) to run the test in complete isolation.
Upgrading packages
See also
job::job() is aimed at easing interactive development within RStudio. For larger problems, production code, and solutions that work outside of RStudio, check out:
The future package's %<- operator combined with plan(multisession).
The callr package is a general tool to run code in new R sessions.