Julia Parallel Computing Notes (3)

Five, remote calls (part two)

This section covers three macros that make remote calls more convenient.

Macro command @everywhere

A special case of a remote call is to have a declaration, function, or expression executed on all processes. For example: start 1 Worker, then declare a variable beta on each process (main process + 1 Worker) and evaluate the expression beta+1. Using the method described earlier, this can be written as:

julia> for pid = 1:2
           r = @fetchfrom pid (beta=0;beta+1)
           println(r)
       end
1
1

println() is used to print the results because the REPL by default only displays the value of the last expression, and here the last line is end, so nothing would be displayed. A special reminder: wrap multiple expressions in parentheses when calling them remotely. Otherwise the ; ends the macro call, so only the first expression (beta=0) is passed to the macro and beta+1 runs outside it, where beta is undefined, like this:

julia> for pid = 1:2
           r = @fetchfrom pid beta=0;beta+1
           println(r)
       end
ERROR: UndefVarError: beta not defined
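
To see why the parentheses matter, both spellings can be parsed without running them. The following is a minimal sketch that only uses Meta.parse from Base; the variable names unparenthesized, parenthesized, statements and macro_call are made up for this illustration.

```julia
# Parse both spellings without executing them; no workers are needed for this.
unparenthesized = Meta.parse("r = @fetchfrom pid beta=0;beta+1")
parenthesized   = Meta.parse("r = @fetchfrom pid (beta=0;beta+1)")

# Without parentheses the `;` ends the statement, so the parser produces
# separate top-level statements: the assignment with the macro call, then `beta+1`.
statements = filter(a -> a isa Expr, unparenthesized.args)
println(unparenthesized.head)   # toplevel
println(statements[end])        # beta + 1

# With parentheses the whole `(beta=0;beta+1)` block is a single macro argument.
macro_call = parenthesized.args[2]
println(macro_call.head)            # macrocall
println(macro_call.args[end].head)  # block
```

A begin ... end block around the expressions should work the same way as the parentheses, since both parse to a single block.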

Can it be done in one step? Yes: that is what @everywhere is for. The loop above can be shortened to:

julia> @everywhere (beta=0;beta+1)

It broadcasts an expression, which is executed on all processes. However, it does not return a Future object and the result is not visible, so it is generally used to broadcast declarations, while later expressions still use a "single command" remote call such as @fetch to return the result. Of course, we can use a trick: broadcast with @everywhere the expressions that do not need to return results and are shared by all processes, and then use a "single command" only for the final expressions whose results must be returned. This is more concise to write.
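
The trick might look like the sketch below. It assumes a single local worker added with addprocs(1); the names offset, shifted, w and result are hypothetical and only serve the illustration.

```julia
using Distributed

# Assumption: one local worker is enough for the demonstration.
nprocs() == 1 && addprocs(1)

# Broadcast the shared declarations; no result is needed from these.
@everywhere offset = 10
@everywhere shifted(x) = x + offset

# Only the final expression must return something, so fetch it from one worker.
w = first(workers())
result = @fetchfrom w shifted(3)
println(result)  # 13
```

The declarations are written once, and only the expression whose value matters goes through @fetchfrom.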

Macro command @eval

@eval performs "value substitution" on an expression. "Value substitution" here means first evaluating the expression and then replacing the expression with its value. It is also possible to substitute only one variable in the expression by putting a $ in front of that variable. Since the substitution takes effect before the rest of the expression is evaluated, we can use it for some tricks, such as using a local variable on a Worker:

julia> beta = 0;

julia> @eval @everywhere $beta+1

@eval takes effect before @everywhere: $beta is replaced by its value 0, so what gets broadcast is 0+1 and @everywhere reports no error. To check whether @eval still works when it is written after the other macro, you can try:

julia> p = 0;  @fetch @eval $p+1
1
# or
julia> p = 0;  @everywhere @eval $p+1

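It can be seen that the result is the same whether @eval is written before or after the other macro. Note that in these two forms the substitution of $p happens where @eval itself runs, so p has to be available on that process.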

For comparison, let's take a look at the error report:

julia>  kappa = 0; @everywhere kappa+1
ERROR: On worker 2:
UndefVarError: kappa not defined

Be careful not to use beta here: it was already broadcast above, so beta is already declared on every process.
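
The substitution itself can be observed locally, without any Worker. This is a small sketch using ordinary expression quoting; beta_local, delta, ex, ex_by_name and v are hypothetical names chosen so as not to clash with the variables above.

```julia
beta_local = 0

# Quoting with `$` copies the current value into the expression itself.
ex = :($beta_local + 1)
println(ex)        # 0 + 1
println(eval(ex))  # 1

# Without `$` the expression only carries the name, which must be looked up later.
ex_by_name = :(beta_local + 1)
println(ex_by_name)  # beta_local + 1

# @eval with `$` can therefore carry the value of a local variable along.
v = let delta = 5
    @eval $delta * 2
end
println(v)  # 10
```

This is why the version with $ does not depend on the name being defined on the Worker: only the value travels.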

Macro command @distributed

Next is @distributed, a macro specifically for for loops. Its usage is:

@distributed (aggregate function) for var = range
    expression
end

If there are multiple expressions, the value of the last expression is the one that takes part in the aggregation. The aggregate function is optional; if it is omitted, no aggregation is performed. For example:

julia> s = @distributed (+) for i = 1:10
           2*i
           3*i
       end
165

It can be seen that it aggregates the results of the last line (3*i summed over 1:10 gives 165). If no aggregate function is written, the return value is no longer a number. In that case @distributed runs asynchronously: it starts the iterations on the available workers and returns a Task immediately, without waiting for them to finish, as follows:

julia> s = @distributed for i = 1:10
           2*i
           3*i
       end
Task (queued) @0x00000000079d99f0

Note that the Task is shown as queued because @distributed returns as soon as the Task is created. With or without an aggregate function, Julia's scheduler runs the loop automatically at the right time. We can check on its execution at any time:

julia> istaskdone(s)
true

The book does not clearly explain how @distributed chooses where the iterations run, in particular on a computer cluster. The Julia documentation describes the range as being partitioned across all available worker processes, but this has not been tested here.
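
The two forms can be compared in a short script. This is a sketch under the assumption of one local worker; total and loop_task are hypothetical names. It uses wait on the returned Task, and the Julia documentation also mentions prefixing the loop with @sync for the same purpose.

```julia
using Distributed

# Assumption: a single local worker is enough to illustrate the behaviour.
nprocs() == 1 && addprocs(1)

# With an aggregate function, the call blocks until the reduced value is ready.
total = @distributed (+) for i = 1:10
    3 * i
end
println(total)  # 165

# Without an aggregate function, a Task is returned right away.
loop_task = @distributed for i = 1:10
    3 * i
end
wait(loop_task)                  # block until all iterations have finished
println(istaskdone(loop_task))   # true
```

Calling istaskdone right after the loop, without wait, may return either true or false depending on timing.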

Six, remote references

All of the remote calls above are based on cross-process data transfer: the data of each process is isolated from the others, and cooperation is achieved through remote references. A remote reference is an object, either a Future or a RemoteChannel. A Future is held by the main process and refers to the result of a call made on a Worker, while a RemoteChannel is created and stored on a Worker and is visible to all processes.

Here is an example to illustrate:

julia> c = Channel(2)
Channel{Any}(sz_max:2,sz_curr:0)

julia> @fetchfrom 2 put!(c,10)
10

julia> isready(c)
false

julia> @fetchfrom 2 isready(c)
true

This kind of remote reference is implemented with Future objects. As shown in the figure, when c from the main process (red) is used as an argument of a remote call, a copy (green) is created on the Worker, the expression is executed there to produce the result (blue), and the result is fetched and stored in the Future on the main process (so this Future actually has nothing to do with the local c). Operations on the Worker only change the state of the copy and do not affect the main process.
[Figure: c on the main process (red), its copy on the Worker (green), and the result of the expression (blue)]
Note that @fetchfrom fetches the result (blue) of the expression put!(c,10), so that result is removed from the Worker. The copy of c (green) that was passed as an argument is not removed and still exists on the Worker, so we can continue to operate on it remotely.
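
The copy semantics can also be checked with a plain array instead of a Channel. The sketch below assumes one local worker and passes the array explicitly as an argument; local_data, fut and remote_value are hypothetical names.

```julia
using Distributed

# Assumption: one local worker is enough for the demonstration.
nprocs() == 1 && addprocs(1)
w = first(workers())

local_data = [0]

# The array is serialized and sent as an argument; the worker mutates its own copy.
fut = remotecall(w, local_data) do x
    x[1] = 10
    x[1]
end
println(fut isa Future)  # true

# Fetching brings back the result of the expression, not the worker's copy.
remote_value = fetch(fut)
println(remote_value)    # 10
println(local_data[1])   # 0, the array on the main process is untouched
```

Only the returned value crosses back to the main process, which matches the red, green and blue roles described above.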

If we want to modify c from the main process at any time while a Worker is operating on it, the Future-based remote reference described above will not work. For this purpose Julia provides RemoteChannel. An example of how to create one:

julia> f = ()->Channel{Int}(10)
#47 (generic function with 1 method)

julia> r = RemoteChannel(f,2)
RemoteChannel{Channel{Int64}}(2, 1, 45)

The first step declares a function f (see the syntax for anonymous functions), which must return a Channel. The second step, RemoteChannel(f, 2), creates a RemoteChannel on the Worker with PID=2 with the same properties as the Channel returned by f. Of course, writing it this way leaves a redundant f in the main process, so the book combines the two statements into one:

julia> r = RemoteChannel(()->Channel{Int}(10),2)
RemoteChannel{Channel{Int64}}(2, 1, 47)

r is the handle of this RemoteChannel and lives in the main process. We can modify the RemoteChannel by operating on r, like flying a drone by remote control: for example isready(r) and put!(r,100), as well as the extraction operations fetch(r) and take!(r). Moreover, we can pass it as an argument to any Worker and operate on it there, for example:

julia> @fetchfrom 3 put!(r,100)
RemoteChannel{Channel{Int64}}(2, 1, 47)

julia> take!(r)
100

julia> isready(r)
false

Here we put an element into r on the Worker with PID=3, then take it out in the main process, and finally check in the main process that r is empty. This shows that r is shared.

If no PID is specified when creating a RemoteChannel, it is created on the current process (here, the main process) by default. No matter where it is created, operations on the handle cause data to be transferred between the process performing the operation and the process where the channel was created. If the data is large, this transfer takes a lot of time. The concept of a "shared array", introduced next, addresses this problem.
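
Putting the pieces of this section together, a self-contained version of the RemoteChannel example might look like this. It is a sketch assuming a single local worker, so the same Worker both stores the channel and writes to it; w, rc, ready_before, value and ready_after are hypothetical names.

```julia
using Distributed

# Assumption: one local worker; it both owns the channel and writes to it.
nprocs() == 1 && addprocs(1)
w = first(workers())

# The channel lives on worker `w`; `rc` on the main process is only a handle.
rc = RemoteChannel(() -> Channel{Int}(10), w)

# Pass the handle to the worker and fill the channel there.
remotecall_wait(w, rc) do ch
    put!(ch, 100)
end

# The main process sees the element through the same handle.
ready_before = isready(rc)
value = take!(rc)
ready_after = isready(rc)
println((ready_before, value, ready_after))  # (true, 100, false)
```

Each isready and take! on the main process is itself a round trip to the Worker that stores the channel, which is the transfer cost mentioned above.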

Origin blog.csdn.net/iamzhtr/article/details/91359782