When parallel operations encounter C++ functions in R, let foreach+Rcpp work together

Table of contents

Solution 1: C++ function in R package

Solution 2: The C++ function is local and used through Rcpp::sourceCpp("fun_name.cpp")

Solution 3: Write the C++ function in the current script

Digression: Why study foreach+Rcpp?

References in this article:


Problem: Functions using Rcpp in foreach not working

Links to related problem descriptions

cl <- makePSOCKcluster(8)
registerDoParallel(cl)
rows <- foreach(i = 1:8, .combine = rbind, .packages = "myPackage") %dopar%
  multiGenerateCSVrow(scoreMatrix = NIsample,
                      validMatrix = matrix(1, nrow = 10, ncol = 10),
                      cutoffVector = rep(0, 10),
                      factorVector = randomsCutPlus1[i, ],
                      actualVector = rep(1, 10),
                      scaleSample = 1)
stopCluster(cl)

Error in multiGenerateCSVrow(scoreMatrix = NIsample, validMatrix = matrix(1,  : 
  task 1 failed - "NULL value passed as symbol address"      

Under a Chinese locale R prints this message in Chinese (roughly, "a NULL value cannot be used as a symbol address"). The English message is: "NULL value passed as symbol address"

How can foreach and Rcpp be combined? There are the following solutions:

Solution 1: C++ function in R package

As Patrick McCarthy suggested, put the C++ functions in a package, install the package, and pass its name to foreach through the .packages argument: foreach(..., .packages = "..."). Each worker then loads the package, compiled code included.

This method requires the C++ function to live in an R package. If the C++ function comes from someone else's package, the method can be used directly. If you wrote the C++ function yourself, it is more work than it is worth, because the function first has to be wrapped in a package of your own.

Solution 2: The C++ function is local and used through Rcpp::sourceCpp("fun_name.cpp")

There is no need to put the C++ function in an R package. Just add two statements to the body of the foreach loop:

  • library(Rcpp) # load the Rcpp package, because sourceCpp() is a function in Rcpp
  • sourceCpp("fun_name.cpp") # compile and load the C++ function stored in fun_name.cpp

Note: The same approach works for parLapply. Put the two statements above inside the fun argument of parLapply(, fun), so that each node loads the C++ function itself.

The library(Rcpp) call in the body of the foreach loop can be replaced with foreach(..., .packages = "Rcpp").
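A minimal sketch of the parLapply variant, assuming Rcpp and a working C++ compiler are available on the machine. The function name sum_cpp and the temporary file are hypothetical and only keep the example self-contained; in practice the path would point to your own fun_name.cpp.

```r
library(parallel)

# Hypothetical C++ source, written to a temporary file for illustration.
cpp_file <- tempfile(fileext = ".cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "// [[Rcpp::export]]",
  "double sum_cpp(Rcpp::NumericVector x) {",
  "  double total = 0;",
  "  for (int k = 0; k < x.size(); k++) total += x[k];",
  "  return total;",
  "}"
), cpp_file)

cl <- makePSOCKcluster(2)

res <- parLapply(cl, 1:4, function(i, cpp_file) {
  # Compile and load the function on the worker itself,
  # into this function's own environment.
  Rcpp::sourceCpp(cpp_file, env = environment())
  sum_cpp(as.numeric(seq_len(i)))
}, cpp_file = cpp_file)

stopCluster(cl)
```

Here sourceCpp() is called once per task. Rcpp caches the build within a session, so later calls on the same worker should reuse the compiled library instead of compiling again.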

Here is an example:

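library(doParallel)  # also attaches foreach and parallel

# n_cores (number of workers) and N (number of iterations) are set by the user
cl <- makeCluster(n_cores, outfile = "")
registerDoParallel(cl)

foreach(n = 1:N, .packages = "Rcpp", .noexport = "<name of Rcpp function>") %dopar% {
  # Each worker sources the script that defines the C++ functions
  source("Scripts/Rccp_functions.R")
  ### do stuff with functions scripted in Rccp_functions.R
}

# cl was created explicitly, so stop it with stopCluster()
stopCluster(cl)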

Solution 3: Write the C++ function in the current script

Add the argument .noexport = c(<functions that were implemented in C++>) to the foreach call, so that foreach does not export these functions from the global environment to the workers.

Possible cause: foreach exports the C++ functions from the global environment to the workers, but they are not ordinary R functions. Each one is a thin wrapper around a pointer to compiled code loaded in the master process, and that pointer is not valid in a worker process, so the exported copy does not work there. This means the C++ functions have to be loaded individually on each node; in my case, this was a SNOW clusterCall() call, which fetched various files, including the C++ code.
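The failure can be reproduced in a few lines. This is a sketch, assuming Rcpp and a C++ compiler are installed; add_one is a hypothetical function used only for illustration. It is compiled on the master, exported to a single worker, and then called there. The call is expected to fail with the "NULL value passed as symbol address" message, which is captured in msg.

```r
library(parallel)

# Compile a hypothetical C++ function on the master only.
add_one <- Rcpp::cppFunction("int add_one(int x) { return x + 1; }")

cl <- makePSOCKcluster(1)

# Export the R wrapper to the worker. The compiled code itself is not
# transferred, only the wrapper holding a pointer into the master process.
clusterExport(cl, "add_one", envir = environment())

# Calling the exported wrapper on the worker raises an error,
# which is captured here as a character string.
msg <- tryCatch(
  {
    clusterEvalQ(cl, add_one(1L))
    "no error"
  },
  error = function(e) conditionMessage(e)
)

stopCluster(cl)
print(msg)
```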

The following example uses the inline package to compile the function on every worker before the loop starts:

worker.init <- function() {
  library(inline)
  sigFunc <- signature(x="numeric", size_x="numeric")
  code <- ' double tot =0;
  for(int k = 0; k < INTEGER(size_x)[0]; k++){
  tot += REAL(x)[k];
  };
  return ScalarReal(tot);
  '
  assign('cFunc', cxxfunction(sigFunc, code), .GlobalEnv)
  NULL
}

f1 <- function(){
  x <- rnorm(100)
  a <- cFunc(x=x, size_x=as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
cl <- makePSOCKcluster(3)
# Compile cFunc on every worker before the loop starts
clusterCall(cl, worker.init)
registerDoParallel(cl)
res <- foreach(counter=1:100) %dopar% f1()
stopCluster(cl)
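The same pattern can be written with Rcpp instead of inline. This is a sketch under a few assumptions: Rcpp and a C++ compiler are available on the machine, and sum_cpp is a hypothetical function name. Each worker compiles the function into its own global environment, and .noexport keeps foreach from shipping a master copy of the function if one happens to exist.

```r
library(foreach)
library(doParallel)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Compile the hypothetical C++ function on every worker.
# Returning NULL avoids sending the compiled wrapper back to the master.
invisible(clusterEvalQ(cl, {
  Rcpp::cppFunction(
    "double sum_cpp(NumericVector x) {
       double total = 0;
       for (int k = 0; k < x.size(); k++) total += x[k];
       return total;
     }",
    env = globalenv()
  )
  NULL
}))

res <- foreach(i = 1:4, .combine = c, .noexport = "sum_cpp") %dopar% {
  # sum_cpp is looked up in the worker's own global environment
  sum_cpp(as.numeric(seq_len(i)))
}

stopCluster(cl)
```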

Digression: Why study foreach+Rcpp?

parLapply+Rcpp obviously works too, and it is enough when all you need is parallel execution. Sometimes, however, a single iteration of the loop fails. In that case I want the program to skip the failed iteration and keep going, so that the work already done is not thrown away and the time spent is not wasted because of one bad iteration. foreach has a useful argument for this: .errorhandling = c("stop", "remove", "pass"). With .errorhandling = "pass", a failed iteration does not stop the loop: the error object is returned in place of that iteration's result, and the results of all other iterations are returned as usual. With .errorhandling = "remove", the result of the failed iteration is dropped instead.
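A small sketch of the two non-default settings. It uses the sequential %do% operator so that it runs without a cluster; the failing iteration is simulated with stop(). The same arguments apply to %dopar%.

```r
library(foreach)

# "pass": the error object takes the place of the failed iteration's result
res_pass <- foreach(i = 1:4, .errorhandling = "pass") %do% {
  if (i == 3) stop("iteration 3 failed")
  i^2
}

# Find which iterations failed by checking for error objects
failed <- vapply(res_pass, inherits, logical(1), what = "error")

# "remove": the failed iteration is dropped from the result
res_remove <- foreach(i = 1:4, .errorhandling = "remove") %do% {
  if (i == 3) stop("iteration 3 failed")
  i^2
}
```

With "pass" the result keeps one entry per iteration, so the failed ones can be located and rerun later. With "remove" the positions of the remaining results no longer match the iteration numbers.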

For an introduction to the foreach function, you can see: Using the foreach function

References in this article:

r - Can't run Rcpp function in foreach - Stack Overflow


Origin blog.csdn.net/u011375991/article/details/132525957