Multiprocessing (multi-core computing)


Table of Contents

1. What is Multiprocessing
2. Adding a Process
3. Storing process output with Queue
4. Efficiency comparison: threading vs. multiprocessing
5. Process pool: Pool
6. Shared memory
7. Process lock: Lock

1. What is Multiprocessing

Multiprocessing assigns tasks to multiple CPU cores. Each core has its own computing space and computing power, so the parts of a task truly execute at the same time, achieving real parallelism rather than the pseudo-parallelism of multithreading, and letting your multi-core computer realize its true potential.

Multiprocessing is similar to multithreading: both are used for parallel computation in Python. But if threading already exists, why does Python also provide multiprocessing? The reason is simple: it makes up for some of threading's disadvantages, such as the GIL mentioned in the threading tutorial.
Multiprocessing is also very simple to use. If you already have some understanding of threading, this is where you get to enjoy it: Python's multiprocessing and threading APIs are almost identical, which makes it easy to get started and to put the power of your computer's multiple cores to work.

2. Adding a Process

# import the threading and multiprocessing standard modules
import multiprocessing as mp
import threading as td

# define a function to be called by both a thread and a process
def job(a,d):
    print('aaaaa')

# create a thread and a process; this only defines what they will do,
# which arguments they take, and what they are called -- nothing has
# started working yet
t1 = td.Thread(target=job,args=(1,2))
p1 = mp.Process(target=job,args=(1,2))

# start the thread and the process; they begin working
t1.start()
p1.start()

# join the thread and the process; join() behaves the same for both
t1.join()
p1.join()

Note: the first letter of Thread and Process is capitalized. The called function is passed without parentheses; if parentheses are added, the function is called immediately instead of being handed to the new process or thread. The arguments for the called function go into args=(...). As the comparison above shows, threads and processes are used in almost the same way.
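As a quick sketch of that pitfall (this snippet is illustrative and not part of the original tutorial):

import multiprocessing as mp

def job(a, d):
    print('aaaaa')

if __name__ == '__main__':
    # WRONG: job(1, 2) runs immediately in the parent process, and its
    # return value (None) would be handed to Process as the target
    # p_bad = mp.Process(target=job(1, 2))

    # RIGHT: pass the function object and its arguments separately
    p_good = mp.Process(target=job, args=(1, 2))
    p_good.start()
    p_good.join()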

Multi-process application

Complete application code:

import multiprocessing as mp

def job(a,d):
    print('aaaaa')

if __name__=='__main__':
    p1 = mp.Process(target=job,args=(1,2))
    p1.start()
    p1.join()

If you want to use multiprocessing, the processes must be started under the main guard (if __name__ == '__main__':); running the code directly at module level will cause an error. This is a special format requirement. On macOS, run it from the terminal: some editing tools may not print any results after the run ends. Running it directly on Windows and Linux should be no problem. The printed result after running in the terminal is:

aaaaa

3. Storing process output with Queue

The function of Queue is to let each core or thread put its computation result into a queue, so that after every process or thread has finished, the results can be taken out of the queue for further computation. The reason we need it is simple: a function run as a multiprocessing target has no usable return value, so a Queue is used to store the results instead.

Putting results into the Queue: we define the function that each process will call; the parameter q acts as the queue used to save the result of each run.

import multiprocessing as mp

# note: this function has no return value -- the result goes into the queue
def job(q):
    res=0
    for i in range(1000):
        res+=i+i**2+i**3
    q.put(res)    # queue

# main: define a queue to store the results
if __name__=='__main__':
    q = mp.Queue()
    # define two processes handling the same task; when args holds only one
    # value, a trailing comma is required so that args is a tuple (an
    # iterable) -- omitting the comma causes an error
    p1 = mp.Process(target=job,args=(q,))
    p2 = mp.Process(target=job,args=(q,))
    # start and join the two processes
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    # the task was handled in two batches, so fetch the two results separately
    res1 = q.get()
    res2 = q.get()
    # print the final result
    print(res1+res2)

When running this, you still need to be in a terminal; the final result is

499667166000

Summary

We first define a queue and pass it to each process as a parameter. Each process puts its result into the queue when it finishes running. Once all processes have ended, the queue holds the values produced by every process; we can then read them out one by one to get each process's result and do further computation on them. Here, that further step is summing the outputs.
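One detail worth knowing (a small sketch; the 5-second timeout is an arbitrary value chosen for illustration): q.get() blocks until a result is available, and accepts an optional timeout after which it raises queue.Empty.

import multiprocessing as mp
import queue  # provides the Empty exception raised on timeout

q = mp.Queue()
try:
    res = q.get(timeout=5)  # wait at most 5 seconds for a result
except queue.Empty:
    print('no result arrived within 5 seconds')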

4. Efficiency comparison: threading vs. multiprocessing

Let's compare whether the same task runs faster with multiple processes, with multiple threads, or with neither.

Creating the multiprocessing version

As in the previous section, first import multiprocessing and define the job() to be run. For a meaningful comparison, we increase the number of iterations to 1000000.

import multiprocessing as mp

def job(q):
    res = 0
    for i in range(1000000):
        res += i + i**2 + i**3
    q.put(res) # queue
    
# since multiprocessing uses multiple cores, we name this version multicore()
def multicore():
    q = mp.Queue()
    p1 = mp.Process(target=job, args=(q,))
    p2 = mp.Process(target=job, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    res1 = q.get()
    res2 = q.get()
    print('multicore:',res1 + res2)

Creating the multithreaded version

Next, create the multithreaded version. Creating multithreading is very similar to creating multiprocessing: first import threading, then define multithread() to accomplish the same task.

import threading as td

def multithread():
    q = mp.Queue() # threads can put results into the same kind of queue as processes
    t1 = td.Thread(target=job, args=(q,))
    t2 = td.Thread(target=job, args=(q,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    res1 = q.get()
    res2 = q.get()
    print('multithread:', res1 + res2)

Creating an ordinary function

Finally, we define an ordinary serial function. Note that in the examples above we created two processes or threads, so job() was executed twice; in normal() we therefore loop twice as well.

def normal():
    res = 0
    for _ in range(2):
        for i in range(1000000):
            res += i + i**2 + i**3
    print('normal:', res)

Timing the runs

Finally, in order to compare the running time of each function, we need to import time, and then run the defined functions in turn:

import time

if __name__ == '__main__':
    st = time.time()
    normal()
    st1 = time.time()
    print('normal time:', st1 - st)
    multithread()
    st2 = time.time()
    print('multithread time:', st2 - st1)
    multicore()
    print('multicore time:', time.time() - st2)
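A side note on the timing call (a minimal alternative sketch, not what the original code uses): time.time() has limited resolution on some platforms, and time.perf_counter() is generally preferred for benchmarking.

import time

st = time.perf_counter()
normal()  # the serial version defined above
print('normal time:', time.perf_counter() - st)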

All done. Let's look at the actual comparison.
Results:

# range(1000000)
normal: 499999666667166666000000
normal time: 1.1306169033050537
multithread: 499999666667166666000000
multithread time: 1.3054230213165283
multicore: 499999666667166666000000
multicore time: 0.646507978439331

Result analysis:

The running times of normal, multithread, and multiprocess are about 1.13, 1.31, and 0.64 seconds respectively. Multi-core/multi-process is the fastest, which shows the subtasks really do run at the same time. Multithreading is actually slower than the plain serial version: because of the GIL, only one thread executes Python bytecode at a time, so for a CPU-bound task the thread-switching overhead is pure cost. This shows that multithreading still has real shortcomings.

Let's increase the number of operations tenfold and look at the running times of the three methods again:

# range(10000000)
normal: 4999999666666716666660000000
normal time: 40.041773080825806
multithread: 4999999666666716666660000000
multithread time: 41.777158975601196
multicore: 4999999666666716666660000000
multicore time: 22.4337899684906
This time the running times still rank multiprocess < normal < multithread, so we can clearly see which method is more efficient.

5. Process pool: Pool

This time we talk about the process pool, Pool. A process pool means we put everything we want to run into a pool, and Python handles the multi-process details by itself, such as how to allocate tasks to the processes and how to collect their results.

First, import multiprocessing and define job():

import multiprocessing as mp

def job(x):
    return x*x

Pool() and map()

Then we define a Pool:

def multicore():
    pool = mp.Pool()
    res = pool.map(job, range(10))
    print(res)

if __name__ == '__main__':
    multicore()

With a process pool, we can bind a function to the pool, throw data in, and get back the values the function returns. Recall that a function run as a Process target has no return value: its output has to be placed into a Queue and read back from there. In a Pool, however, the function executed by the worker processes does have a return value, and it is returned directly into res as the result.

So how do we make the process pool execute the function we specify? We use map() to pair the function job with its inputs, here the values 0 to 9 produced by range(10).

The key difference between Pool and the earlier Process, then, is that a function given to Pool has a return value, while a Process target does not.

Next, use map() to get the result. Put the function and the iterable of values into map(), and the work will automatically be assigned across the CPU cores, with the results returned.
Let's run it.

Running result:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Analysis: we define a process pool and put into it a function and the values to be computed; the pool automatically assigns the tasks to the cores, each process computes its share, and the computed results come back as the processes' return values.

Customizing the number of cores

How do we know whether the Pool really uses multiple cores? We can increase the number of iterations and then watch the CPU load.

View CPU load (Mac): Activity Monitor > CPU > CPU Load (click once)

The default pool size is the number of CPU cores, that is, tasks are distributed across all of our cores. We can also customize the number of workers by passing the processes parameter to Pool.

def multicore():
    pool = mp.Pool(processes=3) # set the number of worker processes to 3
    res = pool.map(job, range(10))
    print(res)
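To see what the default corresponds to, multiprocessing exposes cpu_count() (a small sketch):

import multiprocessing as mp

print(mp.cpu_count())  # number of CPU cores visible to the OS
pool = mp.Pool()       # with no argument, Pool creates that many workers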
apply_async() 

Besides map(), Pool offers another way to get results: apply_async(), which submits a single call of a function with one set of arguments, run on only one worker. Note that the arguments must be passed as an iterable, so when passing a single value you need a trailing comma to make it a tuple, and you retrieve the return value with the get() method.

def multicore():
    pool = mp.Pool()
    res = pool.map(job, range(10))
    print(res)
    res = pool.apply_async(job, (2,))
    # use get() to obtain the result
    print(res.get())

Running result:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
4

apply_async() combined with multiple values

apply_async() submits only a single set of arguments for a single call. If you try to hand it several values the way map() does, an error is raised:

res = pool.apply_async(job, (2,3,4))

The result is an error:

TypeError: job() takes exactly 1 argument (3 given)

That is, apply_async() accepts only one set of parameters per call.
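As an aside, if the goal really is a function of several parameters, Pool.starmap() unpacks argument tuples across the workers. A minimal sketch, using a hypothetical two-argument function job2() that is not part of the original example:

def job2(x, y):
    return x * y

# inside multicore(): each tuple is unpacked into job2's two parameters
res = pool.starmap(job2, [(1, 2), (3, 4), (5, 6)])
print(res)  # [2, 12, 30]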

Using apply_async() to produce multiple results
So how do we use apply_async() to iterate over multiple inputs? We put apply_async() into a list comprehension and define a new multi_res:

multi_res = [pool.apply_async(job, (i,)) for i in range(10)]

The values likewise need to be taken out one by one:

print([res.get() for res in multi_res])

Combined code:

def multicore():
    pool = mp.Pool()
    res = pool.map(job, range(10))
    print(res)
    res = pool.apply_async(job, (2,))
    # use get() to obtain the result
    print(res.get())
    # list comprehension: apply once for i=0, once for i=1, and so on
    multi_res = [pool.apply_async(job, (i,)) for i in range(10)]
    # take the results out of the list one by one
    print([res.get() for res in multi_res])

Running result:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] # map()
4
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] # multi_res
As you can see, applying apply_async() over an iterator gives the same result as using map().

Summary

Pool defaults to using the total number of CPU cores; you can customize it by passing the processes parameter.
map() takes many iterated parameters, automatically assigns them to multiple processes across multiple cores, and returns multiple results.
apply_async() takes only one set of parameters, runs it on one core, and returns one result; if you want the effect of map(), you need to iterate it yourself.

6. Shared memory

[Figure: a typical 4-core CPU, with shared memory sitting between the cores.]

In ordinary single-process or multithreaded code, a global variable can be shared between threads. In multiprocessing, however, each process gets its own copy: even if a global variable A is passed to every process, the computed values of A cannot be exchanged between processes, so a plain global variable cannot be shared. For example, suppose A starts at 0, the first core adds 1, the second core adds 2, and the result should then be passed on to the next core; this simply does not work with a global variable. For the CPU cores to communicate, we need shared memory. In this section we learn how to define shared memory to achieve communication between different processes. Shared memory sits in the middle of the cores: each core can read the shared content, process it, and then let the next core work on the data, so that multiple cores can operate on the same data and share information.

Shared Value

We can put data into shared memory by using Value.

import multiprocessing as mp

value1 = mp.Value('i', 0) 
value2 = mp.Value('d', 3.14)

The i and d arguments set the data type: i represents a signed integer and d a double-precision float. For more type codes, see the table at the end of this section.
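A shared Value is read and written through its .value attribute (a minimal sketch):

import multiprocessing as mp

value1 = mp.Value('i', 0)
value1.value += 1      # read-modify-write through .value
print(value1.value)    # 1
# note: += on a shared Value is not atomic across processes;
# see the Lock section below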

Shared Array

Python's multiprocessing also provides an Array class backed by shared memory, which can be used to share data between processes.

array = mp.Array('i', [1, 2, 3, 4])

The Array here is different from a numpy array: it can only be one-dimensional, not multi-dimensional. As with Value, the data type must be specified, otherwise an error is raised. We will illustrate the use of these two in the next section.
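In the meantime, here is a small sketch of correct Array usage; indexing works like a list, and a slice copies the elements out as a plain Python list:

import multiprocessing as mp

array = mp.Array('i', [1, 2, 3, 4])
array[0] = 9          # element assignment, list-style
print(array[:])       # [9, 2, 3, 4]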

Wrong form

array = mp.Array('i', [[1, 2], [3, 4]]) # a 2-dimensional list

Running result:
"""
TypeError: an integer is required
"""

Reference: data type codes

The data type represented by each type code:

Type code   C Type              Python Type         Minimum size in bytes
'b'         signed char         int                 1
'B'         unsigned char       int                 1
'u'         Py_UNICODE          Unicode character   2
'h'         signed short        int                 2
'H'         unsigned short      int                 2
'i'         signed int          int                 2
'I'         unsigned int        int                 2
'l'         signed long         int                 4
'L'         unsigned long       int                 4
'q'         signed long long    int                 8
'Q'         unsigned long long  int                 8
'f'         float               float               4
'd'         double              float               8

(Source: https://docs.python.org/3/library/array.html)

7. Process lock: Lock

This section uses the application of a lock to shared memory as an example.

No process lock

Let's see what happens when we don't add a process lock.

import multiprocessing as mp
import time

def job(v, num):
    for _ in range(5):
        # pause for 0.1 seconds so the output is easier to follow
        time.sleep(0.1)
        # v.value accesses the shared variable's value
        v.value += num
        print(v.value)

def multicore():
    # define a shared variable
    v = mp.Value('i', 0)
    p1 = mp.Process(target=job, args=(v,1))
    p2 = mp.Process(target=job, args=(v,3)) # different increments show how the processes fight over the memory
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    multicore()

In the code above we define a shared variable v that both processes can operate on. In job(), each process adds num to v every 0.1 seconds and prints the accumulated result, but different increments are set in the two processes p1 and p2. Let's see whether the two processes conflict.

Run it:

1
4
5
8
9
12
13
16
17
20

We can see that process 1 and process 2 are competing for the shared memory v: each grabs access and adds its increment, and sometimes both read and write the shared value at the same time, so updates collide. We can solve this by adding a process lock.

Adding a process lock

To solve the problem of different processes grabbing the shared resource, we add a process lock.

# use the lock in job() so that one process has exclusive access to the
# locked section while it runs
def job(v, num, l):
    l.acquire() # lock
    for _ in range(5):
        time.sleep(0.1)
        v.value += num # v.value accesses the shared memory
        print(v.value)
    l.release() # release

def multicore():
    # define a process lock and pass it into each process
    l = mp.Lock()
    v = mp.Value('i', 0) # define the shared memory
    p1 = mp.Process(target=job, args=(v,1,l)) # the lock must be passed in
    p2 = mp.Process(target=job, args=(v,3,l))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    multicore()

Run it and see whether the processes still fight over the resource:

1
2
3
4
5
8
11
14
17
20

Clearly, after the process lock is added, no other process grabs the shared memory while the first process is accumulating, which guarantees that p1 runs to completion; p2 then carries on the calculation from the value p1 produced. The process lock ensures that multiple processes do not interfere with each other.
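A side note on style: multiprocessing's Lock also works as a context manager, which releases the lock even if an exception occurs. A minimal sketch of the same job() rewritten this way:

def job(v, num, l):
    # 'with l:' is equivalent to l.acquire() ... l.release(),
    # but the lock is always released, even on an exception
    with l:
        for _ in range(5):
            time.sleep(0.1)
            v.value += num
            print(v.value)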

Source: blog.csdn.net/lockhou/article/details/114156553