Julia parallel computing notes (1)

This article is my reading notes for Chapter 14 of "Julia Language Programming" (Wei Kun), with many self-tests and excerpts from the official documentation added. The chapter's content is covered essentially in full, though reorganized in my own words. The full text is quite long and is divided into five parts; this is the first.

1. Processes and threads

A process is the parent of its threads. A process is an execution instance of a program; it obtains resources from the operating system and is the smallest unit of resource allocation. A process can contain multiple threads, one of which must be the main thread. Threads are subordinate to their process and obtain resources from it. Threads are different execution paths within a process, like a river splitting into several branches, so the threads of a process all share the resources of that same process.

In terms of hardware, a single CPU core can only run one process at a time. To meet the need of executing multiple processes, the operating system uses "process scheduling": it maintains a schedule that arranges for multiple processes to execute in turn, which makes them look as if they run simultaneously. This is called "pseudo-concurrency", and it is why a single-core computer can chat on QQ while editing a Word document instead of waiting for the QQ process to end before Word can open. To further improve concurrency (which helps make full use of the CPU), modern operating systems introduced threads, which is roughly like splitting a core into several "virtual cores", each running one thread. Once a process is decomposed into multiple threads, those threads can be distributed across multiple cores rather than being confined to one. Note that this also requires support from the CPU itself. Intel CPUs support this kind of splitting, called hyper-threading: one physical core is presented as two virtual cores. In the early years AMD CPUs could not do this and had to stack more physical cores to achieve comparable concurrency, but the recent Ryzen series also supports hyper-threading, which looks promising.

In Julia, the Distributed package provides the parallel and distributed functionality. So before everything else, run using Distributed.

What Julia currently provides is mainly process-level and coroutine-level (next section) parallelism. Processes are divided into the "local process" and "remote processes", each identified by a PID. There is exactly one local process, called the main process, with PID = 1. Remote processes are called Workers. By default, the REPL opened with the julia command has only the main process. addprocs(N) adds N Workers, numbered PID = 2, 3, ..., N+1, and rmprocs(k) removes the Worker with PID k. Note that whether or not existing Workers have been deleted, newly added Workers continue the numbering from N+2 (PIDs are never reused). You can also start the REPL with n Workers directly from the terminal with julia -p n; the -p option automatically loads the Distributed package.

Use procs() to view the PIDs of all processes and workers() to view the PIDs of all Workers. Use nprocs() and nworkers() to view their counts. Generally nprocs() = nworkers() + 1. However, when there are no remote processes, the main process is itself counted as a Worker, so nworkers() = nprocs() = 1.
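As a quick self-test of these functions (the exact output depends on your machine; the PIDs shown are illustrative):

julia> using Distributed

julia> addprocs(2);          # add two Workers

julia> procs()
3-element Array{Int64,1}:
 1
 2
 3

julia> workers()
2-element Array{Int64,1}:
 2
 3

julia> nprocs(), nworkers()
(3, 2)

julia> rmprocs(3);           # remove the Worker with PID 3

julia> nworkers()
1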

Julia also provides an experimental version of thread-level parallelism. Multiple threads need a "coordinator" between them, the so-called "condition variable". A condition variable is a global object visible to all threads whose state can change; it acts like a public broadcast signal telling each thread to wait or to proceed. Note that a condition variable may be modified by any thread, so conflicts can still occur, which is why it is usually combined with a mutex (mutual-exclusion lock). The mutex acts as the condition variable's "secretary", receiving every thread that comes to visit and ensuring that only one thread touches the condition variable at a time. The inner workings of mutual exclusion are beyond the scope of these notes.
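As a minimal sketch of the mutex idea (not from the book; it assumes Julia was started with several threads, e.g. by setting JULIA_NUM_THREADS=4 before launching), a lock ensures that only one thread at a time updates a shared counter:

julia> using Base.Threads

julia> counter = 0; lk = ReentrantLock();

julia> @threads for i in 1:1000
           lock(lk) do              # only one thread at a time may enter this block
               global counter += 1
           end
       end

julia> counter
1000

Without the lock, the threads could interleave their read-modify-write steps and the final count could come out smaller than 1000.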

2. Coroutines

A coroutine is neither a process nor a thread, but it can be understood as a kind of "virtual, lightweight small thread". If a process is compared to the flow of traffic, then a thread is a car. Break a hundred cars into a thousand "small cars", and these small cars operate more efficiently than the original vehicles; such a "small car" is a coroutine. Generally speaking, coroutines are smaller than threads, execute more efficiently, and are easier to use, so they are also called "user-mode lightweight threads" or "green threads".

To sum up, processes, threads, and coroutines are three levels of parallelism. Each mechanism incurs extra time cost when switching, and the costs are ordered as process > thread > coroutine. We should choose the appropriate level of parallelism based on our needs. At present, Julia does not recommend relying on threads.

In Julia, a coroutine is called a Task. There are two ways to create a Task:

  • taskname=Task(f)
    Task(f) uses the constructor to wrap the function object f into a Task named taskname. Here f must be callable without arguments, i.e. it takes no parameters or all of its parameters have default values. If you write f with arguments, then f(args) is an expression that gets evaluated before being passed to Task(), so what is passed in is no longer a function object but the result of the call. To work around this, define f1() = f(args) and then use taskname = Task(f1).
  • taskname=@task expression
    @task is a macro that wraps an expression into a Task named taskname. Pay attention to capitalization (Task vs. @task).

Once created, istaskstarted(taskname) and istaskdone(taskname) tell you whether the Task has started or finished. A Task has five states: runnable (can be started), waiting (blocked), queued (in the scheduler's queue), done (finished), and failed (finished with an exception). Julia has an internal scheduler responsible for maintaining the task run queue. The user calls schedule(taskname) to add the task to the queue and start it; once it finishes it enters the done state.
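A small self-test putting the two creation methods together with schedule (the task address shown is illustrative):

julia> f() = sum(1:100);

julia> t1 = Task(f)                 # wrap a zero-argument function
Task (runnable) @0x0000000007d14230

julia> t2 = @task sum(1:100);       # wrap an expression

julia> istaskstarted(t1)
false

julia> schedule(t1); wait(t1);      # add t1 to the run queue and wait for it

julia> istaskdone(t1)
true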

For expressions, there is a "combined" macro: @async expression. It creates a Task and starts it immediately. For example:

julia> using Distributed

julia> a=zeros(1,5)
1×5 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0

julia> @async fill!(a,4)
Task (done) @0x00000000063059f0

julia> a
1×5 Array{Float64,2}:
 4.0  4.0  4.0  4.0  4.0

Inside the function object f passed to Task(), certain commands can be used to force the Task to change state, including:

sleep(N)            sleep for N seconds
yield()             request a switch to another task
yieldto(taskname)   request a switch to the specified task; generally not recommended
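For example, sleep() inside a task started with @async puts that task into the waiting state while the rest of the program keeps running (a minimal sketch, not from the book):

julia> t = @async begin
           sleep(2)                 # this task waits for 2 seconds
           println("woke up")
       end;

julia> istaskdone(t)                # still sleeping
false

julia> sleep(3); istaskdone(t)      # meanwhile the task woke up and finished
woke up
true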

3. Channels

A Channel is a first-in, first-out queue with blocking behavior, like a pipe with openings at both ends. It is declared as:

channelname=Channel{T}(size)

We call the objects placed in a Channel its "elements". If the element type T is not specified when the Channel is declared, it defaults to Any.
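For example, a Channel holding at most 3 elements of type Int (the printed form below matches the older Julia versions used in the rest of the examples):

julia> Channel{Int}(3)
Channel{Int64}(sz_max:3,sz_curr:0)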

After creating a Channel, use put!(channelname, element) to place an element into it and take!(channelname) to extract the element at the front. take!() removes the element from the Channel; if you do not want it removed, use fetch() instead (note that fetch has no exclamation mark). When the Channel is full/empty, put!()/take!() blocks (i.e. suspends execution); the block is released once an element is taken out of / put into the Channel, respectively. You can also traverse a Channel with a for loop to extract its elements, for example:

julia> c=Channel(2)
Channel{Any}(sz_max:2,sz_curr:0)
   
julia> put!(c,10)
10
    
julia> put!(c,11)
11
    
julia> for x in c
           print(x," ")
       end
10 11

Here the for loop is essentially a series of automatic take!() calls, and take!() removes the elements as it goes. When the Channel becomes empty, the for loop blocks as well, so the loop above will hang after printing the two elements until more elements are put in or the Channel is closed.

close(channelname) forcibly closes the Channel's entrance and releases all put!() and take!() blocks. No more elements can be added after that, but existing elements can still be extracted until the Channel is drained.
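A minimal sketch of this behavior (not from the book):

julia> c = Channel(2);

julia> put!(c,1); put!(c,2);

julia> close(c)                     # the entrance is now closed

julia> take!(c)                     # remaining elements can still be taken
1

julia> take!(c)
2

julia> isopen(c)
false

Attempting another put!(c, 3) at this point would throw an InvalidStateException.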

A Channel is a shared buffer that can be read and written safely and concurrently by multiple Tasks. The order in which multiple Tasks put/take elements is arbitrary. For example, you can have one function or task put 10 elements into a Channel and then start 2 tasks to extract them in parallel; under the scheduler's arrangement, the two tasks take turns extracting in no particular order until the Channel is drained. Because the order is arbitrary, each Task does not necessarily get exactly 5 elements, unless the number of elements to extract is explicitly specified inside each Task's function.
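Below is a minimal sketch of the one-producer/two-consumer setup just described (the names prod_task and consume are mine, not from the book; the printed interleaving varies from run to run):

julia> c = Channel(1);

julia> prod_task = @async begin
           foreach(i -> put!(c, i), 1:10)    # each put! waits until a consumer takes
           close(c)                          # close so the consumers' loops terminate
       end;

julia> consume(name) = @async for x in c
           println(name, " took ", x)
       end;

julia> t1 = consume("A"); t2 = consume("B");

julia> wait(t1); wait(t2)
A took 1
B took 2
A took 3
(remaining elements are split between A and B in whatever order the scheduler chooses)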

In this example we call the function or task that puts elements the producer, and the task that extracts them the consumer; this is the classic "producer/consumer problem". The Channel acts as a marketplace. In fact it is not limited to one-way trade: a consumer can also be a producer at the same time, with tasks producing data and exchanging it with each other. So a Channel is a convenient and safe data-exchange area designed for parallelism.

Of course, the implicit problem here is how to ensure the proper order of putting and extracting. First, a Channel is ordered, so Tasks can be designed around the first-in, first-out principle; the blocking behavior makes insertion and extraction wait automatically, which can be exploited to arrange a suitable order. Second, the wait() family of functions can help us adjust the order:

wait([x])

Block the current task until some event occurs, depending on the type of the argument:
  * Channel: Wait for a value to be appended to the channel.
  * Condition: Wait for notify on a condition.
  * Process: Wait for a process or process chain to exit. The exitcode field of a process can be used to determine success or failure.
  * Task: Wait for a Task to finish. If the task fails with an exception, the exception is propagated (re-thrown in the task that called wait).
  * RawFD: Wait for changes on a file descriptor (see the FileWatching package).
If no argument is passed, the task blocks for an undefined period. A task can only be restarted by an explicit call to schedule or yieldto.
Often wait is called within a while loop to ensure a waited-for condition is met before proceeding.

wait(r::Future)

Wait for a value to become available for the specified Future.

wait(r::RemoteChannel, args...)

Wait for a value to become available on the specified RemoteChannel.
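For instance, waiting on a producer Task is a simple way to guarantee that all values are in place before a consumer proceeds (a minimal sketch, not from the book):

julia> c = Channel(5);

julia> t = @async foreach(i -> put!(c, i^2), 1:5);   # producer task

julia> wait(t)                                       # block until the producer has finished

julia> [take!(c) for _ in 1:5]
5-element Array{Int64,1}:
  1
  4
  9
 16
 25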

Finally, we introduce a "one-step" technique called "channel binding". It has two steps:

  1. Declare a producer function p whose one and only parameter (the formal parameter) is a Channel.
  2. Create a Channel whose one and only argument (the actual argument) is the producer function p, i.e. Channel(p).

When a Channel is created this way, p is wrapped into a Task and started, and when that Task finishes, the Channel is automatically closed. In other words, the whole process of producing, placing, and closing is condensed into one step: the market is stocked and handed directly to the consumers. For example:

julia> function producer(c::Channel)
       put!(c,"start")
       for n=1:4
           put!(c,n)
       end
       put!(c,"stop")
       end;

julia> for x in Channel(producer)
           println(x)
       end
start
1
2
3
4
stop

Notice that the put!() calls inside the producer do not get stuck: each put! is released as soon as the for loop takes the corresponding element, so producer and consumer alternate under the scheduler, and once the producer task finishes, the Channel is closed automatically. This association of life cycles is called "binding". You can also bind an already-created Channel and Task with bind(channelname, taskname). In this case the Channel can be of any size, and the Task's function must still have one and only one formal parameter of Channel type. Note, however, that if the Task wraps an expression instead, the channel referenced by put!(c, element) inside that expression must be the channel that was actually created, i.e. c = channelname. For example:

julia> c0 = Channel(0)
Channel{Any}(sz_max:0,sz_curr:0)

julia> task = @async foreach(i->put!(c0,i),1:4)
Task (runnable) @0x0000000007cbd150

julia> bind(c0,task);

julia> for i in c0
       @show i
       end;
i = 1
i = 2
i = 3
i = 4

After the bound task finishes, c0 is closed, so any further attempt to put elements into it will raise an error.

In particular, bind() can be called repeatedly to bind multiple producer tasks to one channel. Note, though, that according to the Julia documentation, when a channel is bound to multiple tasks, the first task to terminate closes the channel.


Origin blog.csdn.net/iamzhtr/article/details/91348595