Table of contents
Solution 1: C++ function in R package
Solution 2: The C++ function is local and used through Rcpp::sourceCpp("fun_name.cpp")
Solution 3: Write the C++ function in the current script
Digression: Why study foreach+Rcpp?
Problem: Functions using Rcpp in foreach not working
Links to related problem descriptions
cl <- makePSOCKcluster(8)
registerDoParallel(cl)
rows <- foreach(i = 1:8, .combine = rbind, .packages = "myPackage") %dopar%
  multiGenerateCSVrow(scoreMatrix = NIsample,
                      validMatrix = matrix(1, nrow = 10, ncol = 10),
                      cutoffVector = rep(0, 10),
                      factorVector = randomsCutPlus1[i, ],
                      actualVector = rep(1, 10),
                      scaleSample = 1)
stopCluster(cl)
Error in multiGenerateCSVrow(scoreMatrix = NIsample, validMatrix = matrix(1,  :
  task 1 failed - "NULL value passed as symbol address"
Under a Chinese locale R prints this message in Chinese; the English text of the error is: "NULL value passed as symbol address"
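To see where the error comes from, here is a minimal sketch that would be expected to reproduce it. It assumes Rcpp, foreach and doParallel are installed and a C++ compiler is available; sumCpp is a made-up function name for illustration, not something from the original question.

```r
library(Rcpp)
library(foreach)
library(doParallel)

# Hypothetical C++ function, compiled in the master session only
cppFunction("double sumCpp(NumericVector x) { return sum(x); }")

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# foreach exports sumCpp to the workers automatically, but the exported copy
# refers to compiled code that exists only in the master session
err <- tryCatch(
  foreach(i = 1:2, .combine = c) %dopar% sumCpp(as.numeric(1:i)),
  error = function(e) e
)
stopCluster(cl)

# Show the captured error message
print(conditionMessage(err))
```

The tryCatch() is only there so that the script keeps running and the message can be inspected; without it the foreach call stops with the error.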
How to combine foreach and Rcpp? There are the following solutions:
Solution 1: C++ function in R package
As Patrick McCarthy suggested, put the C++ functions in a package, install the package, and pass the package name to foreach through its .packages argument: .packages = c("...")
This method requires the C++ function to live in an R package. If the function comes from someone else's package, you can use this method directly. If you wrote the C++ function yourself, this method is more cumbersome, because you first have to build a package around the function and expose it as an R function.
Solution 2: The C++ function is local and used through Rcpp::sourceCpp("fun_name.cpp")
There is no need to put the C++ function in an R package; just add two statements at the start of the foreach loop body:
- library(Rcpp) # load the Rcpp package, which provides sourceCpp()
- sourceCpp("fun_name.cpp") # compile and load the C++ function on this worker
Note: The same approach works with parLapply. Put the two statements above inside the fun passed to parLapply(, fun), so that every node loads the C++ function itself.
The library(Rcpp) call in the foreach loop body can be replaced with foreach(..., .packages = "Rcpp").
Here is an example:
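library(foreach)
library(doParallel)
cl <- makeCluster(n_cores, outfile = "")   # outfile = "" leaves worker output unredirected
registerDoParallel(cl)
foreach(n = 1:N, .packages = "Rcpp", .noexport = "<name of Rcpp function>") %dopar% {
  # Script that defines the C++ functions on this worker
  # (presumably via sourceCpp() or cppFunction())
  source("Scripts/Rccp_functions.R")
  ### do stuff with functions scripted in Rccp_functions.R
}
stopCluster(cl)   # the cluster was created with makeCluster(), so stop it explicitly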
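The same idea with sourceCpp() called directly in the loop body can be sketched as follows. This is only an illustration under some assumptions: the C++ file is written to a temporary file so the example is self-contained, and sumCpp is a hypothetical function name.

```r
library(foreach)
library(doParallel)

# Hypothetical C++ source, written to a temporary file for the sake of the example
cpp_file <- tempfile(fileext = ".cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "// [[Rcpp::export]]",
  "double sumCpp(Rcpp::NumericVector x) {",
  "  return Rcpp::sum(x);",
  "}"
), cpp_file)

cl <- makeCluster(2)
registerDoParallel(cl)

res <- foreach(i = 1:4, .combine = c, .packages = "Rcpp") %dopar% {
  sourceCpp(cpp_file)        # compile and load the function on this worker
  sumCpp(as.numeric(1:i))    # call the freshly loaded C++ function
}
stopCluster(cl)

print(res)
```

sourceCpp() caches its build within an R session, so each worker should only compile the file the first time it reaches this line.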
Solution 3: Write the C++ function in the current script
Add the argument .noexport = c(<functions that were implemented in C++>) to the foreach call.
Possible cause: foreach exports the C++ functions from the global environment to the workers, but since they are not ordinary R functions (each one wraps a pointer to compiled code that is only valid in the session that compiled it), the exported copies do not work. This means the C++ functions have to be compiled or loaded separately on each node; in my case, this was done with a SNOW clusterCall() call, which fetched various files, including the C++ code.
For reference:
# Run once on every worker: compile the C++ code there and store it as cFunc
worker.init <- function() {
  library(inline)
  sigFunc <- signature(x = "numeric", size_x = "numeric")
  code <- ' double tot = 0;
            for (int k = 0; k < INTEGER(size_x)[0]; k++) {
              tot += REAL(x)[k];
            }
            return ScalarReal(tot);
          '
  assign('cFunc', cxxfunction(sigFunc, code), .GlobalEnv)
  NULL
}

# Ordinary R function; cFunc is looked up on the worker when f1 runs
f1 <- function() {
  x <- rnorm(100)
  a <- cFunc(x = x, size_x = as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
cl <- makePSOCKcluster(3)
clusterCall(cl, worker.init)   # compile cFunc on each node
registerDoParallel(cl)
res <- foreach(counter = 1:100) %dopar% f1()
stopCluster(cl)
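For a function written with Rcpp in the current script, a similar pattern might look like the sketch below. It assumes Rcpp and a C++ compiler are available on every node; sumCpp and src are hypothetical names chosen for the example. The function is compiled once in the master session and once on each worker, and .noexport keeps foreach from shipping the master's unusable copy.

```r
library(Rcpp)
library(foreach)
library(doParallel)

# Hypothetical C++ function kept as a string in the current script
src <- "double sumCpp(NumericVector x) { return sum(x); }"
cppFunction(src)   # compiled copy for the master session

cl <- makeCluster(2)
registerDoParallel(cl)

# Compile the same source on every worker and define it in the worker's global environment
clusterCall(cl, function(code) {
  Rcpp::cppFunction(code, env = globalenv())
  NULL
}, src)

# .noexport stops foreach from exporting the master's sumCpp,
# so each worker uses the copy it compiled itself
res <- foreach(i = 1:4, .combine = c, .noexport = "sumCpp") %dopar%
  sumCpp(as.numeric(1:i))
stopCluster(cl)

print(res)
```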
Digression: Why study foreach+Rcpp?
Of course parLapply+Rcpp works too, and you can simply use it for parallel execution. However, sometimes one iteration of the loop fails. In that case I want the program to skip the failed iteration and keep going, so that a single bad iteration does not throw away all the work done so far and waste the time already spent. foreach has a useful argument for this, .errorhandling = c("stop", "remove", "pass"). With .errorhandling = "pass", when an iteration fails for some reason, the program carries on with the remaining iterations and finally returns the results of all iterations, with the error object in place of the failed iteration's value.
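A small sketch of this behaviour, using plain R in the loop body so that it runs without a compiler. The failing iteration is forced on purpose, and the cluster size of 2 is an arbitrary choice.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Iteration 3 fails on purpose; "pass" keeps its error object in the result list
res <- foreach(i = 1:5, .errorhandling = "pass") %dopar% {
  if (i == 3) stop("iteration 3 is broken")
  i^2
}
stopCluster(cl)

# Find which iterations returned an error object instead of a value
failed <- vapply(res, inherits, logical(1), what = "error")
print(which(failed))
```

With "remove" instead of "pass", the failed iteration would simply be dropped from the result, which makes it harder to tell afterwards which iteration went wrong.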
For an introduction to the foreach function, you can see: Using the foreach function