024. 解决问题篇 - 06. 用job 包在命令行后台运行命令 - 《011_R 语言学习》

参见：https://mp.weixin.qq.com/s/67rjY7w-Uh0AfnaxNoik8Q

先前我们介绍过在后台运行R 脚本，对于耗时较长的代码运行，或者复杂的包的安装，我们可以使用该方法，从而不占用前台：

直接安装一下：

remotes::install_github("lindeloev/job")

ps: 这里发现在win 下安装会发生报错：

> remotes::install_github("lindeloev/job")
错误: Failed to install 'unknown package' from GitHub:
  畸形'Config/testthat/edit ...'开头行!

现在我们有更方便的方法了，只需要在代码使用job 包中的函数，就可以实现后台操作了：

job::job(
  { tmp <- matrix(sample(letters, 1000, replace = T), ncol = 10) }
)

使用方式为：

job::job({<your code>})

其实只是从手动操作，变成了代码：

06. 用job 包在命令行后台运行命令 - 图1

如果我们想要将后台运行的结果和前台运行的结果分离，不相互污染，还可以将变量保存在一个新的环境中：

job::job(brm_result = {
  fit = brm(model, data)
  fit = add_criterion(fit, "loo")
  print(summary(fit))  # Show a summary in the job
  the_test = hypothesis(fit, "hp > 0")
})

比如有多个任务：

06. 用job 包在命令行后台运行命令 - 图2

此外还有一些有用的信息：

Finer control
RStudio jobs spin up a new session, i.e., a new environment. By default, job::job() will make this environment identical to your current one. But you can fine control this:
import: the default "auto" setting imports all objects that are referenced by the code into the job. Control this using job::job({}, import = c(model, data)). You can also import everything (import = "all") or nothing (import = NULL).
packages: by default, all attached packages are attached in the job. Control this using job::job({}, packages = c("brms")) or set packages = NULL to load nothing. If brms is not loaded in your current session, adding library("brms") to the job code may be more readable.
options: by default, all options are overwritten/inserted to the job. Control this using, e.g., job::job({}, opts = list(mc.cores = 2) or set opts = NULL to use default options. If you want to set job-specific options, adding options(mc.cores = 2) to the job code may be more readable.
export: in the example above, we assigned the job environment to brm_result upon completion. Naturally, you can choose any name, e.g., job::job(fancy_name = {a = 5}). To return nothing, use an unnamed code chunk (insert results to globalenv() and remove everything before return: (job::job({a = 5; rm(list=ls())}). Returning nothing is useful when
your main result is a text output or a file on the disk, or
when the return is a very large object. The underlying rstudioapi::jobRunScript() is slow in the back-transfer so it's usually faster to saveRDS(obj, filename) them in the job and readRDS(filename) into your current session.
Some use cases
Model training, cross validation, or hyperparameter tuning: train multiple models simultaneously, each in their own job. If one fails, the others continue.
Heavy I/O tasks, like processing large files. Save the results to disk and return nothing.
Run unit tests and other code in an empty environment. By default, devtools::test() runs in the current environment, including manually defined variables (e.g., from the last test-run) and attached packages. Call job::job({devtools::test()}, import = NULL, packages = NULL, opts = NULL) to run the test in complete isolation.
Upgrading packages
See also
job::job() is aimed at easing interactive development within RStudio. For larger problems, production code, and solutions that work outside of RStudio, check out:
The future package's %<- operator combined with plan(multisession).
The callr package is a general tool to run code in new R sessions.