Julia Parallel Computing Notes (2)

(Under continuous revision; last updated August 10, 2020.)

4. Remote Calls (Part 1)

The previous section covered Julia's task-level parallelism; this section covers process-level parallelism. Coroutines can only run in parallel on a single computer, whereas processes can run in parallel across multiple computers. Before anything else, you still need using Distributed, and then addprocs(n) or julia -p n, to open multiple workers. As a reminder, "worker" refers specifically to a remote process.
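As a quick reference, here is a minimal setup sketch (the worker count 3 is just an example):

using Distributed

addprocs(3)        # add 3 worker processes
println(nprocs())  # 4: the master plus the 3 workers
println(workers()) # the worker PIDs, e.g. [2, 3, 4]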

A remote call means launching a function or an expression on a worker (remote process) from the master process. For functions, remote calls are made with remotecall():

remotecall(function, worker PID, function arguments...)

For example, calling rand() on the master process to create a 2x3 random array is written rand(2,3). To do the same on the worker with PID 4, write:

remotecall(rand,4,2,3)

A remote call does not immediately return its result to the local process; you must use fetch() to extract the result, for example:

julia> r = remotecall(rand,4,2,3)
Future(4, 1, 5, nothing)

julia> fetch(r)
2×3 Array{Float64,2}:
 0.466937  0.761268  0.975553
 0.754082  0.025674  0.824383

Note that this fetch() differs from the task-parallel fetch() of the previous section: here the data on the worker is removed. If the result is not yet ready, fetch() blocks until there is one. The extracted result is cached on the master process, specifically inside the Future object r.
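A small sketch of the blocking behavior, assuming worker 2 exists (the sleep is illustrative):

r = remotecall(x -> (sleep(3); x + 1), 2, 41)
fetch(r)   # blocks for about 3 seconds, then returns 42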

Future objects deserve a closer look. Future(remote PID, local PID, Future ID, remote result) is an object that stores the information of a remote call; it is returned immediately when the call is made, but at that point its last field holds only nothing. When fetch() retrieves the result from the remote process, the result fills the nothing slot. In fact, we can construct a Future object ourselves, for example:

julia> Future(100)
Future(100, 1, 34, nothing)

The remote PID of this Future is 100, the local PID is 1, and its ID is 34 (indicating it is the 34th one created). However, this Future does not belong to any remote call, so no data will ever fill the nothing slot.

Objects like Future that store remote-call information are called "remote references" (so a remote reference is an object, not an operation). Another remote reference is RemoteChannel, the cross-process version of the Channel from the previous section, used to exchange data between processes. In other words, a Channel is a pipeline between coroutines, while a RemoteChannel is a pipeline between processes.
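Here is a minimal RemoteChannel sketch, assuming at least one worker (PID 2) exists; the buffer size 10 is arbitrary:

chan = RemoteChannel(() -> Channel{Int}(10))  # a buffered pipeline owned by the master
remotecall(c -> put!(c, myid()), 2, chan)     # worker 2 writes its PID into it
println(take!(chan))                          # the master reads 2 back out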

Now look back at fetch(). After the result has been extracted, you can call fetch(r) again on the master process to reuse the result any number of times, or simply assign the result to a new variable:

julia> result = fetch(r);

julia> result
2×3 Array{Float64,2}:
 0.466937  0.761268  0.975553
 0.754082  0.025674  0.824383

Naturally, there is also a one-step shortcut, remotecall_fetch(); the example above can be abbreviated as:

result = remotecall_fetch(rand,4,2,3)

Likewise, it blocks until the result is successfully extracted. Since creating a Future object takes time, if you do not need to extract a result you can use remote_do() instead of remotecall(), which creates no Future. One detail here: multiple remote calls issued with remotecall() are executed in order, but those issued with remote_do() may be executed out of order.
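A fire-and-forget sketch of remote_do(), assuming worker 2 exists; note that no Future is created, so nothing can be fetched afterwards:

remote_do(println, 2, "hello from worker 2")  # runs on worker 2; returns nothing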

For expressions (note that a function with its arguments filled in is an expression), we can make remote calls with the macro @spawnat. For example:

julia> s = @spawnat 4 rand(2,3)
Future(4, 1, 7, nothing)

julia> fetch(s)
2×3 Array{Float64,2}:
 0.375975  0.844135  0.257647
 0.057513  0.169291  0.0544206

Of course, you can also write:

julia> fetch(@spawnat 4 rand(2,3))
2×3 Array{Float64,2}:
 0.57577   0.500889  0.228997
 0.268749  0.295895  0.0822172

Make it a little more complicated:

julia> fetch(@spawnat 3 (1).+fetch(s))
2×3 Array{Float64,2}:
 1.37598  1.84413  1.25765
 1.05751  1.16929  1.05442

The expression (1).+fetch(s) adds 1 to each element of fetch(s). Note that it should be written (1).+fetch(s), not 1 .+fetch(s) as in the book, otherwise an error is reported. This seems to be one of the changes in version 1.1.

Another macro, @spawn, does not require a PID, which makes it much more convenient; it is generally used instead of @spawnat or remotecall(). For example:

julia> fetch(@spawn (1).+fetch(@spawn rand(2,3)))
2×3 Array{Float64,2}:
 1.14194  1.57693  1.90071
 1.88392  1.31092  1.51812

Tip: use myid() to query the PID of the current process. Called directly, it returns 1; called remotely, it returns the PID of the remote process.
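A small sketch of that behavior, assuming worker 2 exists:

println(myid())                     # 1 on the master process
println(remotecall_fetch(myid, 2))  # 2: the same call, executed on worker 2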

However, @spawn cannot replace remote_do(), because the latter returns no result. In addition, if a worker is known to be idle, specifying its PID is slightly faster (unfortunately, in most cases we do not know which workers are idle).

The macros have one-step shortcuts too! fetch(@spawn expression) can be abbreviated as @fetch expression, and fetch(@spawnat pid expression) as @fetchfrom pid expression. For example:

julia> @fetch rand(2,3)
2×3 Array{Float64,2}:
 0.0622937  0.93881   0.471734
 0.576323   0.621816  0.713404

As with the earlier one-step shortcut, because the call and the extraction are combined into a single command, the master process blocks and waits for the worker to return the result before continuing. This calling style is a "synchronous call". If the call and the extraction are separated, the master process can do other things after the call and extract the result once the worker notifies it. This style is an "asynchronous call".

Tip: an asynchronous call is like inviting a friend to dinner: the friend says "got it", you go off and do other things, and when the friend is done they come to find you. A synchronous call is like inviting a friend to dinner while the friend is busy: you just wait there, and when the friend finishes you go together.
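A sketch of the asynchronous pattern (the workloads are illustrative):

r = @spawn sum(rand(10^7))      # returns a Future immediately
local_part = sum(rand(10^6))    # the master keeps working in the meantime
total = local_part + fetch(r)   # blocks here only if the worker is not done yet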

Now let's look at a set of examples to deepen our understanding of synchronization:

# Example 1
julia> @time @spawn sleep(3)
  0.000288 seconds (112 allocations: 5.891 KiB)
Future(4, 1, 24, nothing)

# Example 2
julia> @time @sync @spawn sleep(3)
  3.014166 seconds (2.96 k allocations: 173.319 KiB)
Future(2, 1, 25, nothing)

# Example 3
julia> @time @sync @fetch rand(2,3)
  0.015145 seconds (152 allocations: 7.703 KiB)
2×3 Array{Float64,2}:
 0.665148  0.607531  0.563096
 0.506471  0.748635  0.588137

@time is a macro for timing. As you can see, in Example 1 the Future object is returned immediately after the call, before the worker has finished executing. Example 2 adds the @sync macro, which forces the Future object to be returned only after the worker has finished. Example 3 shows that @sync also works on @fetch. In fact, @sync can be applied to @spawn, @spawnat, @fetch, @async, and @distributed (described later), but not to asynchronous operations such as remotecall() and remote_do(). And @fetchfrom, as described earlier, is necessarily a synchronous call.
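A small sketch of @sync gating local @async tasks:

@time @sync begin
    @async sleep(1)
    @async sleep(1)
end  # roughly 1 second in total: the two tasks overlap, and @sync waits for both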

"Synchronization" is a very important concept. Let's look at a set of examples to understand how to use @syncmulti-process synchronization:

# Bring the total number of processes to 4.
julia> addprocs(3); procs()
4-element Array{Int64,1}:
 1
 2
 3
 4
 
# Example 1
julia> @time for pid in procs()
           @spawnat pid (sleep(pid); println(myid()))
       end
  0.060870 seconds (35.07 k allocations: 1.694 MiB)


# Example 2
julia> @time fetch(@sync (
           for pid in procs()
               @spawnat pid (sleep(pid); println(myid()))
           end
       ))
1
      From worker 2:    2
      From worker 3:    3
      From worker 4:    4
  4.066450 seconds (35.37 k allocations: 1.705 MiB)

# Example 3
julia> @time for pid in procs()
           fetch(@spawnat pid (sleep(pid); println(myid())))
       end
1
      From worker 2:    2
      From worker 3:    3
      From worker 4:    4
 10.113555 seconds (35.43 k allocations: 1.710 MiB)

Example 1 launches the action (sleep(pid); println(myid())) on every process, then times the whole loop. Observe the output: the timing ends before any process has completed its action (no output has appeared yet). Example 2 adds @sync, which ensures all processes finish before the timing ends; the processes run in parallel, so the total time is roughly that of the slowest process (about 4 seconds here). Example 3 runs fetch(@spawnat pid (sleep(pid); println(myid()))) for each process, which is equivalent to @fetchfrom pid (sleep(pid); println(myid())); it takes about 10 seconds, i.e. the sum 1+2+3+4, proving that the processes actually ran serially rather than in parallel. This is because @fetchfrom contains a synchronization. Therefore, when sending parallel tasks to the processes, do not rush to extract the results immediately: @sync first, then fetch the results to the local process one by one. Even this is a little cumbersome, which is why packages like DistributedArrays appeared; it is introduced briefly after the following sketch, and a detailed post on the DistributedArrays package will come later.
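A minimal sketch of the "@sync first, fetch afterwards" pattern (the sleep(1) workload is illustrative):

futures = Future[]
@sync for pid in workers()
    push!(futures, @spawnat pid (sleep(1); myid()))
end                        # total wait is roughly the slowest worker, not the sum
results = fetch.(futures)  # the workers are already done, so each fetch returns at once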

The DistributedArrays package provides the DArray type, a distributed array: an array composed of sub-blocks stored on multiple processes. The purpose of creating a DArray is convenient cross-process indexing. Simply put, after you modify a DArray remotely with @spawnat, you can skip the step of fetching the data locally and directly access the modified elements of the DArray on the local process. For example, let's create a DArray d (the package must first be loaded on all processes, e.g. with @everywhere using DistributedArrays):

julia> d = dzeros(4,3)
4×3 DArray{Float64,2,Array{Float64,2}}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

Then use localindices to get the indices of the sub-array stored on process 2:

julia> @fetchfrom 2 localindices(d)
(1:2, 1:3)

This shows that the element d[1,1] is stored on process 2. To modify it, we must operate on process 2:

julia> @spawnat 2 localpart(d)[1,1] = 666
Future(2, 1, 397, nothing)

Note that localpart must be used here; you cannot write d[1,1] directly. On process 2, localpart(d)[1,1] corresponds to d[1,1]. The [1,1] in localpart(d)[1,1] is an index into the sub-array, not into the whole array; here the two merely coincide.
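As a sketch, the same idea extends to modifying every block of d in parallel, each worker touching only its own localpart:

@sync for pid in workers()
    @spawnat pid (localpart(d) .= myid())
end
# afterwards, every element of d holds the PID of the process that owns its block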

Other processes (including the master process) can only read d[1,1], not modify it, otherwise an error is reported:

# Read
julia> d
4×3 DArray{Float64,2,Array{Float64,2}}:
 666.0  0.0  0.0
   0.0  0.0  0.0
   0.0  0.0  0.0
   0.0  0.0  0.0

# Modify
julia> d[1,1] = 666
ERROR: setindex! not defined for DArray{Float64,2,Array{Float64,2}}
Stacktrace:
 [1] error(::String, ::Type) at ./error.jl:42
 [2] error_if_canonical_setindex(::IndexCartesian, ::DArray{Float64,2,Array{Float64,2}}, ::Int64, ::Int64) at ./abstractarray.jl:1084
 [3] setindex!(::DArray{Float64,2,Array{Float64,2}}, ::Int64, ::Int64, ::Int64) at ./abstractarray.jl:1073
 [4] top-level scope at REPL[40]:1

DistributedArrays reference material: link. But the Julia version it targets is quite old, and some of its commands are outdated.

Finally, let's discuss data transfer across processes. When an expression is called remotely, the parameters in the expression are automatically transferred from the master process to the worker, for example:

julia> A = rand(1000,1000);

julia> @time @spawn A^2
  0.251465 seconds (496.78 k allocations: 24.047 MiB, 6.12% gc time)
Future(3, 1, 31, nothing)

The data transmitted to the worker here is A. Written another way:

julia> @time @spawn rand(1000,1000)^2
  0.000250 seconds (123 allocations: 6.588 KiB)
Future(4, 1, 32, nothing)

Here only 1000,1000 is transmitted, obviously much less data than A, so the Future object returns faster. Each style has its advantages: if the cost of transferring A across processes is much smaller than the cost of the computation on A, choose the first; otherwise choose the second. This difference in writing can have a significant impact on Julia's efficiency.
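To see the full cost rather than just the launch cost, one might time the complete round trip with fetch(), as in this sketch (assuming worker 2 exists):

A = rand(1000, 1000)
@time fetch(@spawnat 2 A^2)                # ships A over, computes, ships A^2 back
@time fetch(@spawnat 2 rand(1000,1000)^2)  # only the 1000x1000 result travels back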
