R - Parallel Computing

This article introduces parallel computing using the parallel package on a computer with multiple cores.

Parallel computing proceeds in these steps:

  1. Load a parallel computing package, e.g. library(parallel).
  2. Create several "workers", usually one worker per CPU core.
  3. These workers start out knowing nothing: their global environments contain no variables and no R packages are loaded, so whatever you want the workers to do, you must first provide them with the corresponding objects and libraries.
  4. Use functions that run loops in parallel, such as parApply, parLapply, and parSapply.
  5. When you are done with the parallel backend and no longer need the workers, stop them; otherwise they will keep hanging in memory.
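The five steps above can be sketched end to end in a minimal example (the object name threshold and the toy data here are made up for illustration):

```r
library(parallel)                    # step 1: load the package

cl <- makeCluster(2)                 # step 2: create workers (2 here)

threshold <- 5                       # an object the workers will need
clusterExport(cl, varlist = "threshold")  # step 3: ship it to the workers

# step 4: run a loop in parallel; each worker counts values above threshold
res <- parSapply(cl, list(1:10, 4:9, 7:8), function(v) sum(v > threshold))

stopCluster(cl)                      # step 5: release the workers
res                                  # returns 5 4 2
```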

Notes:

1. About the parallel versions of the loop functions: base R offers for and while loops as well as the apply family of functions; correspondingly, parallel computing provides parallel counterparts of the apply family.

library(parallel)


# Start the workers (a cluster), choosing how many to use; here one worker
# per available CPU core. The variable cl holds the cluster object.
cl <- makeCluster(detectCores())


# Export variables from the current R session (here named object1 and object2;
# they can be any R objects) into the global environments of the newly created
# workers so that the workers can use them. Note that the first argument is
# the cluster of workers.
clusterExport(cl, varlist = c("object1", "object2"))


# Apply FUN to each element of some.vector; the result is returned as a vector.
# The first argument of parSapply is the cluster of workers.
# parSapply behaves like sapply, so review sapply's usage first if needed.
# Store the returned result in the result object.
result <- parSapply(cl, some.vector, FUN = function(i) {some.function1; some.function2})


# Stop the workers
stopCluster(cl)
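Step 3 mentioned that workers also have no packages loaded, but clusterExport() only copies objects. A common way to load a package on every worker is clusterEvalQ() from the same parallel package; a small sketch (stats is loaded by default in R, so it is used here only to keep the example self-contained):

```r
library(parallel)
cl <- makeCluster(2)

# clusterEvalQ() evaluates an expression on each worker; here it loads a
# package into every worker's session.
clusterEvalQ(cl, library(stats))

# Each worker can now call functions from that package, e.g. median().
res <- parSapply(cl, list(1:5, 1:9), median)

stopCluster(cl)
res   # returns 3 5
```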


Example:

Generate 1e6 random numbers from a standard normal distribution, calculate the mean of these random numbers, and repeat the process 100 times.

Non-parallel version code:

lapply(1:100, FUN = function(x) mean(rnorm(1000000)))

Parallel version code:

library(parallel)
cl <- makeCluster(4)
res <- parLapply(cl, X = 1:100, fun = function(x) mean(rnorm(1000000)))
stopCluster(cl)

Note on the lapply and parLapply used here: sapply is a simplified version of lapply. sapply returns a vector (sapply = simplify + apply), while lapply returns a list (lapply = list + apply).
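The difference in return types can be checked directly:

```r
res_l <- lapply(1:3, function(x) x^2)   # a list: list(1, 4, 9)
res_s <- sapply(1:3, function(x) x^2)   # simplified to a vector: 1 4 9

is.list(res_l)                     # TRUE
is.vector(res_s)                   # TRUE
identical(unlist(res_l), res_s)    # TRUE: same values, different containers
```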

While the computer runs these two versions, open the Task Manager (shortcut: Ctrl+Shift+Esc, or via Ctrl+Alt+Del). The non-parallel program uses only part of the machine's capacity (in this example, about 39% of the CPU), while the parallel version uses 100%.


The microbenchmark package is used for performance testing in R. Its microbenchmark() function measures the execution time of a code block and returns a data frame containing the timing of each run together with summary statistics such as the mean, minimum, and maximum time. The code below uses microbenchmark() to compare the execution times of the two versions and evaluate their performance.

mb <- microbenchmark::microbenchmark(
  {
    lapply(1:100, FUN = function(x) mean(rnorm(1e6)))
  },
  {
    library(parallel)
    cl <- makeCluster(4L)
    res <- parLapply(cl, X = 1:100, fun = function(x) mean(rnorm(1e6)))
    stopCluster(cl)
  },
  times = 10)
mb

Output:

Unit: seconds
...
      min       lq     mean   median       uq      max neval cld
 7.389548 7.522466 7.566548 7.585431 7.605311 7.703006    10   b
 2.853429 2.890022 2.954747 2.943975 2.968527 3.114184    10  a 

Comparing the running times of the two versions shows that the parallel version is not 4 times faster than the non-parallel one, even though we used 4 cores and might naively expect a 4x speedup. The reason this expectation is not met is that managing the parallelism itself takes time: splitting the data, sending it to the individual workers, collecting the results, and merging them back together.

Therefore, parallel computing pays off only when the time spent on the computation itself is much higher than the time R spends communicating with the workers.

In fact, if the task is scaled up from the mean of 1e6 random numbers to the mean of 1e7 random numbers, still repeated 100 times, the parallel version becomes almost 4 times faster (83.8 s non-parallel vs. 21.5 s parallel).
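The overhead point can also be observed directly with system.time(): for a trivial per-element task, communication dominates and the parallel version is often no faster (or even slower) than the serial one. A minimal sketch (timings vary by machine, so none are shown):

```r
library(parallel)
cl <- makeCluster(2)

# Trivial per-element work: the cost of shipping data to the workers and
# collecting results dwarfs the computation itself.
t_serial <- system.time(res_serial <- sapply(1:1000, sqrt))
t_par    <- system.time(res_par    <- parSapply(cl, 1:1000, sqrt))

stopCluster(cl)

# Both versions compute the same values; only the timings differ.
identical(res_serial, res_par)   # TRUE
```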

NOTE: Unless you have a reasonably powerful computer, do not run the code below, as it will take a while to finish. Note that in the code below the number of benchmark repetitions has been reduced to 5; otherwise it would take even longer.

mb <- microbenchmark::microbenchmark(
  {
    lapply(1:100, FUN = function(x) mean(rnorm(1e7)))
  },
  {
    library(parallel)
    cl <- makeCluster(4L)
    res <- parLapply(cl, X = 1:100, fun = function(x) mean(rnorm(1e7)))
    stopCluster(cl)
  },
  times = 5)
mb
Unit: seconds
...
      min       lq     mean   median       uq      max neval cld
 83.08273 83.82933 83.95855 83.97395 84.39401 84.51273     5   b
 21.42050 21.43552 21.58001 21.49912 21.58116 21.96373     5  a 

Reference:

Parallelization in R [David Zelený]

Origin blog.csdn.net/u011375991/article/details/131982285