Python multiprocessing: running programs with multiple processes

Reference links:
multiprocessing official documentation
https://blog.csdn.net/cityzenoldwang/article/details/78584175 (blogger's write-up)
https://blog.csdn.net/quqiuzhu/article/details/51156454 (blogger's write-up)

Process class

The Process class describes a process object. To create a child process, you only need to pass in the function to execute and that function's arguments, and the Process instance is complete.

  • The start() method starts the process.
  • The join() method implements synchronization between processes: it waits for the process to exit before the code that follows is executed.
  • close() is used on a process pool (Pool) to prevent any more processes from being submitted to it.

The constructor signature is:

multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, daemon=None)

  • target is the function to be called.
  • args holds the parameters required by the function, passed in as a tuple. When passing a single parameter, add a trailing comma, because (1) is not a tuple while (1,) is (see the short sketch below).
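
For example, a minimal sketch of the single-argument case (the show function here is a hypothetical helper, not from the original example):

import multiprocessing as mp


def show(msg):
    print(msg)


if __name__ == '__main__':
    # args='hello' (no tuple) would be unpacked character by character
    # and raise a TypeError; the trailing comma makes a one-element tuple
    p = mp.Process(target=show, args=('hello',))
    p.start()
    p.join()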

Assign a process instance to the function to create a single process:

import multiprocessing as mp


def job(a, b):
    print(a+b)


if __name__ == '__main__':  # the process-spawning code must go under the __main__ guard
    p1 = mp.Process(target=job, args=(1, 2))  
    p1.start()
    p1.join()

Create multiple processes:

import multiprocessing
import os


def run_proc(name):
    print('Child process {0} {1} Running '.format(name, os.getpid()))


if __name__ == '__main__':
    print('Parent process {0} is Running'.format(os.getpid()))
    for i in range(5):
        p = multiprocessing.Process(target=run_proc, args=(str(i),))
        print('process start')
        p.start()
    p.join()  # note: this only waits for the last child created in the loop
    print('Process close')

Output:

Parent process 6296 is Running
process start
process start
process start
process start
process start
Child process 0 9428 Running 
Child process 1 8444 Running 
Child process 2 7852 Running 
Child process 3 6540 Running 
Child process 4 14472 Running 
Process close

If p.join() is removed, the output is:

Parent process 8712 is Running
process start
process start
process start
process start
process start
Child process 0 10516 Running 
Process close
Child process 1 3172 Running 
Child process 2 10748 Running 
Child process 3 11636 Running 
Child process 4 5484 Running 
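
As shown above, p.join() only waited for the last child. To wait for every child, a common pattern (a minimal sketch reusing run_proc from above) is to keep the Process objects in a list and join each one:

import multiprocessing
import os


def run_proc(name):
    print('Child process {0} {1} Running '.format(name, os.getpid()))


if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=run_proc, args=(str(i),))
        p.start()
        processes.append(p)
    for p in processes:  # join every child, not just the last one created
        p.join()
    print('Process close')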

Queue

Use a Queue to store process output, enabling communication between multiple processes.
The role of Queue is to let each process (or thread) put its result into the queue; once every process has finished running, the results are taken back out of the queue for further processing. The reason is simple: the return value of a function run in another process cannot be retrieved directly by the parent, so a Queue is used to collect the results of the work done by multiple processes.
put method: inserts data into the queue.
get method: reads and removes an element from the queue.

import multiprocessing as mp


def job(q):
    for i in range(10):
        q.put(i)  # put the value into the queue


if __name__ == '__main__':
    q = mp.Queue()  # create the queue
    # with a single argument, args needs a trailing comma,
    # otherwise an "argument is not iterable" error is raised
    p1 = mp.Process(target=job, args=(q,))
    p2 = mp.Process(target=job, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    res1 = q.get()  # fetch a value from the queue
    res2 = q.get()
    res3 = q.get()
    print(res1)  # 0
    print(res2)  # 1
    print(res3)  # 2
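
Each job call above puts ten values, so the queue ends up holding twenty items in total. A minimal sketch for draining everything, assuming the same job function; note that get() raises queue.Empty when its timeout expires on an empty queue:

import multiprocessing as mp
import queue


def job(q):
    for i in range(10):
        q.put(i)


if __name__ == '__main__':
    q = mp.Queue()
    p1 = mp.Process(target=job, args=(q,))
    p2 = mp.Process(target=job, args=(q,))
    p1.start()
    p2.start()
    p1.join()  # fine for small payloads; for large ones, drain before joining
    p2.join()
    results = []
    try:
        while True:
            results.append(q.get(timeout=1))  # raises queue.Empty once drained
    except queue.Empty:
        pass
    print(len(results), results)  # 20 values: two interleaved runs of 0..9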

Pool process pool

With a process pool, we put the jobs we want to run into the pool and Python handles the multi-process scheduling itself. The default size of a Pool is the number of CPU cores; we can also customize the number of workers by passing the processes argument to Pool. Once the pool is defined, we can bind it to a function and get return values back by feeding data to the pool. The difference between Pool and the earlier Process is that the function handed to a Pool returns a value to the caller, while the function given to a Process does not.
map method: put the function and the iterable of values into map(); the work is distributed across the CPU cores automatically and the results come back as a list.
apply_async method: apply_async() runs a single call on one worker. The arguments must be passed as a tuple, so a trailing comma is needed for a single value, and the return value is fetched with the get() method. To achieve the effect of map(), build a list of apply_async() calls.
The join method must be called on the pool at the end, so that execution only continues once the pool has finished; close() must be called before join().
The code is shown below.

import multiprocessing as mp


def job(x):
    return x * x


def multicore():
    pool = mp.Pool(processes=mp.cpu_count() - 1)
    res = pool.map(job, range(10))
    print(res)
    res = pool.apply_async(job, (2,))
    print(res.get())
    multi_res = [pool.apply_async(job, (i,)) for i in range(10)]
    pool.close()  # close the pool so no more tasks can be submitted
    pool.join()  # wait for all workers to finish; close() must be called first
    print([res.get() for res in multi_res])


if __name__ == '__main__':
    multicore()

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
4
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
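
Since Python 3.3 a Pool can also be used as a context manager, which saves the explicit close()/join() bookkeeping; a minimal sketch with the same job function:

import multiprocessing as mp


def job(x):
    return x * x


if __name__ == '__main__':
    # leaving the with-block calls pool.terminate(), so collect results
    # inside the block (map() is synchronous, so this is safe)
    with mp.Pool(processes=4) as pool:
        print(pool.map(job, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]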

Comparison of multiprocessing and multi-threading:

Test program:

import multiprocessing as mp
import threading as td
import time


MAX = 10000000


def job(q):
    res = 0
    for i in range(MAX):
        res += i+i**2+i**3
    q.put(res)


def multicore():
    q = mp.Queue()
    p1 = mp.Process(target=job, args=(q,))
    p2 = mp.Process(target=job, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    res1 = q.get()
    res2 = q.get()
    print('multicore:', res1+res2)


def normal():
    res = 0
    for _ in range(2):
        for i in range(MAX):
            res += i+i**2+i**3
    print('normal:', res)


def multithread():
    q = mp.Queue()
    t1 = td.Thread(target=job, args=(q,))
    t2 = td.Thread(target=job, args=(q,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    res1 = q.get()
    res2 = q.get()
    print('multithreading:', res1+res2)


if __name__ == '__main__':
    st = time.time()
    normal()
    st1 = time.time()
    print('normal time:', st1 - st)
    multithread()
    st2 = time.time()
    print('multithreading time:', st2 - st1)
    multicore()
    print('multicore time:', time.time()-st2)

Output:

normal time: 23.027679920196533
multithreading: 4999999666666716666660000000
multithreading time: 24.3942768573761
multicore: 4999999666666716666660000000
multicore time: 19.363178968429565

From the results above, the multi-process version takes less time than both the multi-threaded and the plain versions, while the multi-threaded time is almost the same as the plain time. The reason is that the Python interpreter has a Global Interpreter Lock (GIL), which allows each Python process to run at most one thread at a time. Multi-threaded Python programs therefore cannot speed up CPU-bound work or take advantage of multiple CPU cores, but multi-process programs are unaffected. Python 2.6 introduced multiprocessing to solve this, and the multiprocessing API almost replicates the threading API.
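
As a rough illustration of that API parity, the same function (worker here is a hypothetical helper) can be driven by either module just by swapping the class:

import multiprocessing as mp
import threading as td


def worker(n):
    print('working on', n)


if __name__ == '__main__':
    t = td.Thread(target=worker, args=(1,))   # threading version
    p = mp.Process(target=worker, args=(2,))  # same constructor signature
    t.start()
    p.start()
    t.join()
    p.join()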
