Elegantly use multiprocessing in Python: process pool Pool, pipe communication Pipe, queue communication Queue, shared memory Manager Value

Python's built-in multiprocessing library supports multiprocessing. I want to use a few short examples to demonstrate how to use multiprocessing elegantly. Many Chinese-language articles simply translate the multiprocessing documentation of an old version of the official Python website; in this article I will cover the points in bold below.

  • Creating a process Process: fork lets the child inherit the parent's resources directly, so it starts faster; spawn passes only the necessary resources to the child, so it uses less memory; and the program entry if __name__ == '__main__'
  • Process pool Pool: Pool can only pass one argument to each task, but there are ways to pass in several
  • Pipe communication Pipe: the most basic functionality, runs fast
  • Queue communication Queue: the most commonly used functionality, runs a little slower
  • Shared memory Manager Value, plus the real shared memory shared_memory added in Python 3.8

Many Chinese-language articles on Python multiprocessing leave out a lot of important material (after all, they only translate the old multiprocessing documentation from the official Python website). They don't cover the bold items above, yet these are exactly the things you need to know once you go beyond the basics.

1. The difference between multithreading and multiprocessing

Multithreading (threading): a person has two things to do, chatting with someone and watching a show. As a single-threaded person she could only start chatting after finishing the show, but by then nobody would still be waiting to chat. Treat her as one CPU core and give her multithreading: watch the show for a while, occasionally check for new messages, and switch back and forth between the two things (threads). Multithreading lets a single CPU core handle several things at once, so that it does not get stuck waiting at one step.

Typical uses: crawling websites (scrapers), waiting for input from multiple users.

Multiprocessing (multiprocessing): a person has a pile of bricks to move. He puts on gloves and fetches his equipment (applies to the operating system for resources) and starts moving bricks. But there are plenty of people standing around, so we ask them to help ("one core struggles while eight cores watch"). They divide up the work and the bricks are moved in no time. Multiprocessing lets multiple CPU cores work together, instead of one person working while everyone else stands around.

Typical use: high-performance computing. The computation is only accelerated when the multiprocessing scheme is designed properly.

2. Global lock and multi-process

Why is multiprocessing in Python so troublesome? Because Python threads are operating-system threads governed by the Python global interpreter lock. A Python interpreter process has one main thread plus the user program's execution threads, and even on a multi-core CPU the GIL prevents multiple threads from executing in parallel (paraphrased from the Baidu Encyclopedia entry on the Global Interpreter Lock). The development path:

  1. The Python global lock. The GIL was reworked for Python 3.2. In the early days, because of the Python GIL (Global Interpreter Lock), Python could not do multiprocessing on its own
  2. External multi-process communication (around Python 3.5, 2015). You either called C from Python (e.g. NumPy and other third-party libraries whose multiprocessing is implemented in another language under the hood), or you relied on external tooling (such as MPI)
  3. Built-in multi-process communication. Not until Python 3.6 did multiprocessing mature into a usable built-in multi-process library that supports inter-process communication and limited memory sharing
  4. Shared memory. Python 3.8 added the new shared_memory feature in 2019

3. Subprocess Process

The main process of a multiprocessing program must be written inside the program entry if __name__ == '__main__':

def function1(id):  # this function runs in the child process
    print(f'id {id}')

def run__process():  # this is the main process
    from multiprocessing import Process
    process = [Process(target=function1, args=(1,)),
               Process(target=function1, args=(2,)), ]
    [p.start() for p in process]  # start the two child processes
    [p.join() for p in process]   # wait for both processes to finish

# run__process()  # don't call the main process outside the if; this simple example might not crash, but don't do it
if __name__ == '__main__':
    run__process()  # correct: call the main process only inside the if

Although nothing goes wrong in this simple example if the main process run__process() is called outside the program-entry if, you had better do as I ask. The full explanation is too long for this article, so I wrote it up separately → "Python's program entry has an important function (multithreading), it is not just a coding habit".

The example above only uses Process to start multiple processes; it does not involve inter-process communication. When I want to split a serial task across multiple processes, I also need the processes to communicate. The process pool Pool lets the main program collect the results computed by subprocesses (not very flexible, suited to simple tasks), while the pipe Pipe and the queue Queue allow processes to talk to each other (flexible enough). The shared value Value, shared array Array, and shared memory shared_memory (the latter added in Python 3.8 and not yet mature) are discussed below.

Python multiprocessing can create processes in two ways, spawn and fork. Fork creation: fork copies the current process directly into the child and lets the child inherit handles to all of its resources, so creation is very fast but uses more memory. Spawn creation: spawn hands only the necessary resource handles to the child, so creation is slightly slower. See Stack Overflow, "multiprocessing fork vs spawn", for a detailed explanation.

multiprocessing.set_start_method('spawn')  # default on WinOS or MacOS
multiprocessing.set_start_method('fork')   # default on Linux (UnixOS)

Please note: I said that fork is faster than spawn when the processes are first created, not that high-performance computing will be faster. High-performance computing usually keeps the program running for a long time, so to save memory and for process safety I recommend choosing spawn.
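As a minimal sketch of how this choice is usually made (the worker function below is just an illustrative name, not taken from the examples above), you can either set the global default once inside the program entry, or request a specific start method through multiprocessing.get_context without touching the global default:

import multiprocessing as mp

def worker(x):  # hypothetical child-process task
    print(f'worker got {x}')

if __name__ == '__main__':
    # Option 1: set the global default once, inside the program entry
    # mp.set_start_method('spawn')

    # Option 2: ask for a specific start method through a context,
    # which leaves the global default untouched for other code
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=worker, args=(42,))
    p.start()
    p.join()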

4. Process pool Pool

Most Python multiprocessing code requires you to create Process objects explicitly, whereas the process pool Pool manages the child processes for us automatically. Python's Pool is inconvenient when a task needs several parameters; here I offer two solutions:

Idea 1: the function func2 needs several parameters, so turn them into a single argument, whether you make args a tuple, a dictionary dict, or a class instance.

Idea 2: use functools.partial, as in "Passing multiple parameters to pool.map() function in Python" (Stack Overflow). This is inflexible because it fixes the other parameters and needs an extra import from the standard library, so I do not recommend it; a sketch of it (together with Pool.starmap) follows the code below.

import time

def func2(args):  # multiple parameters (arguments)
    # x, y = args
    x = args[0]  # write in this way, easier to locate errors
    y = args[1]  # write in this way, easier to locate errors

    time.sleep(1)  # pretend it is a time-consuming operation
    return x - y


def run__pool():  # main process
    from multiprocessing import Pool

    cpu_worker_num = 3
    process_args = [(1, 1), (9, 9), (4, 4), (3, 3), ]

    print(f'| inputs:  {process_args}')
    start_time = time.time()
    with Pool(cpu_worker_num) as p:
        outputs = p.map(func2, process_args)
    print(f'| outputs: {outputs}    TimeUsed: {time.time() - start_time:.1f}    \n')

    '''Another way (which I don't recommend):
    use functools.partial, see https://stackoverflow.com/a/25553970/9293137
    from functools import partial
    pool.map(partial(f, a, b), iterable)
    '''

if __name__ =='__main__':
    run__pool()
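If you prefer not to pack the arguments into one tuple yourself, here is a minimal sketch of the two alternatives mentioned above, Pool.starmap and functools.partial (func3 is a hypothetical task of my own, not part of the example above):

import time
from functools import partial
from multiprocessing import Pool

def func3(x, y, z=0):  # a hypothetical task that takes several parameters
    time.sleep(0.1)
    return x - y + z

if __name__ == '__main__':
    pairs = [(1, 1), (9, 9), (4, 4), (3, 3)]
    with Pool(3) as p:
        out1 = p.starmap(func3, pairs)  # starmap unpacks each tuple into func3(x, y)
        out2 = p.map(partial(func3, 1, z=10), [y for _, y in pairs])  # partial fixes x and z
    print(out1, out2)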

5. Pipeline

As the name suggests, a pipe Pipe has two ends: main_conn, child_conn = Pipe(). Either end can live in the main process or in a child process; in my experiments I found no difference between the "main" end main_conn and the "child" end child_conn. Both ends can send and receive at the same time, and the objects that pass through are deep-copied: put something in with conn.send() at one end and take it out with conn.recv() at the other, and each end of the pipe can be handed to several processes at once. conn is short for connection.

import time

def func_pipe1(conn, p_id):
    print(p_id)

    time.sleep(0.1)
    conn.send(f'{p_id}_send1')
    print(p_id, 'send1')

    time.sleep(0.1)
    conn.send(f'{p_id}_send2')
    print(p_id, 'send2')

    time.sleep(0.1)
    rec = conn.recv()
    print(p_id, 'recv', rec)

    time.sleep(0.1)
    rec = conn.recv()
    print(p_id, 'recv', rec)


def func_pipe2(conn, p_id):
    print(p_id)

    time.sleep(0.1)
    conn.send(p_id)
    print(p_id, 'send')
    time.sleep(0.1)
    rec = conn.recv()
    print(p_id, 'recv', rec)


def run__pipe():
    from multiprocessing import Process, Pipe

    conn1, conn2 = Pipe()

    process = [Process(target=func_pipe1, args=(conn1, 'I1')),
               Process(target=func_pipe2, args=(conn2, 'I2')),
               Process(target=func_pipe2, args=(conn2, 'I3')), ]

    [p.start() for p in process]
    print('| Main', 'send')
    conn1.send(None)
    print('| Main', conn2.recv())
    [p.join() for p in process]

if __name__ =='__main__':
    run__pipe()

If you are after speed, prefer the pipe Pipe over the queue Queue introduced below. For details, see "Python pipes and queues performance" (Stack Overflow):

So yes, pipes are faster than queues - but only by 1.5 to 2 times, what did surprise me was that Python 3 is MUCH slower than Python 2 - most other tests I have done have been a bit up and down (as long as it is Python 3.4 - Python 3.2 seems to be a bit of a dog - especially for memory usage).
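To get a feel for the gap on your own machine, here is a rough benchmark sketch of my own (the helper functions and the message count N are arbitrary choices I made for illustration; absolute numbers will vary a lot with platform and message size):

import time
from multiprocessing import Process, Pipe, Queue

N = 100_000  # number of small messages to push through each channel

def pipe_sender(send_end, n):
    for i in range(n):
        send_end.send(i)

def queue_sender(q, n):
    for i in range(n):
        q.put(i)

if __name__ == '__main__':
    recv_end, send_end = Pipe(duplex=False)  # a one-way pipe is enough here
    p = Process(target=pipe_sender, args=(send_end, N))
    t0 = time.time()
    p.start()
    for _ in range(N):
        recv_end.recv()
    p.join()
    print(f'Pipe : {time.time() - t0:.2f} s')

    q = Queue()
    p = Process(target=queue_sender, args=(q, N))
    t0 = time.time()
    p.start()
    for _ in range(N):
        q.get()
    p.join()
    print(f'Queue: {time.time() - t0:.2f} s')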

I once wrote a practical example using this kind of Python queue ↓; if you want to chase maximum performance, you can also change the Queue in it to a Pipe.

Pipe also has a duplex parameter and a poll() method worth knowing. By default duplex==True; if the two-way pipe is disabled (duplex=False), data can only flow in one direction, conn1 ← conn2. conn2.poll()==True means conn2.recv() can immediately fetch the data that has arrived; conn2.poll(n) waits up to n seconds for data before returning.

from multiprocessing import Pipe

conn1, conn2 = Pipe(duplex=True)  # two-way pipe: both ends can send and receive data (the default)

conn1.send('A')
print(conn1.poll())  # prints False, because nothing is waiting for conn1 to receive
print(conn2.poll())  # prints True , because conn1 sent an 'A' that is waiting for conn2 to receive
print(conn2.recv(), conn2.poll(2))  # poll(2) waits up to 2 seconds for more data, so this prints 'A False'
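For completeness, a minimal sketch of the one-way case with duplex=False (again just an interactive snippet, not wrapped in the program entry):

from multiprocessing import Pipe

conn1, conn2 = Pipe(duplex=False)  # one-way pipe: conn1 can only receive, conn2 can only send
conn2.send('B')
print(conn1.recv())  # prints 'B'
# conn1.send('C')    # would fail, because this end of the one-way pipe is read-only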

Although these simple snippets will not raise an error, that is only because they are too simple: they do not actually start other processes and are not written inside the program-entry if. In many cases Pipe runs faster, but its features are too limited, so Queue has to be used. The most obvious difference is:

conn1, conn2 = multiprocessing.Pipe()  # a pipe has two ends: what goes in at one end can only be taken out at the other
queue = multiprocessing.Queue()        # there is a single queue: what is put in can be taken out anywhere

6. Queue Queue

import queue gives you Python's built-in queue for threads, while from multiprocessing import Queue gives you the queue for processes. Everything below uses the multiprocessing Queue.

The queue Queue behaves much like the pipe Pipe described above: both the main process and the child processes can access it, and the objects put in are deep-copied. The difference is that a pipe Pipe has only two endpoints, while a queue Queue has the usual queue semantics and is more flexible. For details see Stack Overflow, "Multiprocessing - Pipe vs Queue".

import time

def func1(i):
    time.sleep(1)
    print(f'args {i}')

def run__queue():
    from multiprocessing import Process, Queue

    queue = Queue(maxsize=4)  # the queue can be accessed from anywhere
    queue.put(True)
    queue.put([0, None, object])  # the objects you put in are deep-copied
    queue.qsize()  # the current length of the queue
    print(queue.get())  # First In First Out
    print(queue.get())  # First In First Out
    queue.qsize()  # the current length of the queue

    process = [Process(target=func1, args=(queue,)),
               Process(target=func1, args=(queue,)), ]
    [p.start() for p in process]
    [p.join() for p in process]

if __name__ =='__main__':
    run__queue()
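The example above mainly shows that the Queue object itself can be passed to child processes. Below is a minimal sketch of the more typical pattern, where the workers push their results back to the main process through the queue (the worker function is an illustrative name of my own):

import time
from multiprocessing import Process, Queue

def worker(task_id, result_queue):  # each worker reports back through the shared queue
    time.sleep(0.1)  # pretend to compute something
    result_queue.put((task_id, task_id ** 2))

if __name__ == '__main__':
    result_queue = Queue()
    workers = [Process(target=worker, args=(i, result_queue)) for i in range(4)]
    [p.start() for p in workers]
    results = [result_queue.get() for _ in workers]  # collect one result per worker
    [p.join() for p in workers]
    print(results)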

Besides the example mentioned above of reading the video streams of multiple (Hikvision/Dahua) network cameras, my open-source reinforcement learning library ElegantRL also used Queue for multi-CPU and multi-GPU training. To speed things up, I have since changed that Queue to a Pipe.

7. Shared Memory Manager

The Pipe and Queue described above implement inter-process communication by deep-copying the data that needs to be shared (the more processes it is distributed to, the more memory it costs). With shared memory, the interpreter maintains a single shared region that every process can read without deep copies, and reads and writes are coordinated by the manager (so don't assume that using shared memory is automatically faster).

Manager can create a shared memory area, but the data stored there must use specific types: Value stores single values and Array stores arrays, as shown below. If you are not confident in your code, I don't recommend experimenting with this. The following example comes from the official Python documentation.

# https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing%20array#multiprocessing.Array

from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Value, Array
from ctypes import Structure, c_double

class Point(Structure):
    _fields_ = [('x', c_double), ('y', c_double)]

def modify(n, x, s, A):
    n.value **= 2
    x.value **= 2
    s.value = s.value.upper()
    for a in A:
        a.x **= 2
        a.y **= 2

if __name__ == '__main__':
    lock = Lock()

    n = Value('i', 7)
    x = Value(c_double, 1.0/3.0, lock=False)
    s = Array('c', b'hello world', lock=lock)
    A = Array(Point, [(1.875,-6.25), (-5.75,2.0), (2.375,9.5)], lock=lock)

    p = Process(target=modify, args=(n, x, s, A))
    p.start()
    p.join()

    print(n.value)
    print(x.value)
    print(s.value)
    print([(a.x, a.y) for a in A])
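The sharedctypes example above is taken straight from the documentation. For simpler needs, a Manager can also hand out ordinary shared containers such as dict and list through proxy objects; here is a minimal sketch of my own (the worker name is illustrative):

from multiprocessing import Process, Manager

def fill(shared_dict, shared_list, key):  # hypothetical worker writing into managed objects
    shared_dict[key] = key * 2
    shared_list.append(key)

if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict()  # proxies: reads and writes go through the manager process
        shared_list = manager.list()
        workers = [Process(target=fill, args=(shared_dict, shared_list, i)) for i in range(4)]
        [p.start() for p in workers]
        [p.join() for p in workers]
        print(dict(shared_dict), list(shared_list))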

I removed the introduction to Python 3.8's shared_memory, because that part still has a bug.

The following is from thuzhf's answer (2021-01) on Stack Overflow, under the question "Shared memory in multiprocessing":

For those interested in using Python3.8 's shared_memory module, it still has a bug which hasn’t been fixed and is affecting Python3.8/3.9/3.10 by now (2021-01-15). The bug is about resource tracker destroys shared memory segments when other processes should still have valid access. So take care if you use it in your code.

PyTorch also has its own multiprocessing module, torch.multiprocessing.

How to share a list of tensors in PyTorch multiprocessing? rozyang's answer is very simple, the core code is as follows:

import torch
import torch.multiprocessing as mp
tensor = torch.zeros(8)  # any CPU tensor
tensor.share_memory_()   # moves the tensor's storage into shared memory, in place
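A minimal end-to-end sketch of the same idea (the worker name is mine; it assumes PyTorch is installed):

import torch
import torch.multiprocessing as mp

def add_one(shared_tensor):  # the child modifies the tensor in place, no copy is sent back
    shared_tensor += 1

if __name__ == '__main__':
    tensor = torch.zeros(4)
    tensor.share_memory_()  # put the tensor's storage into shared memory
    p = mp.Process(target=add_one, args=(tensor,))
    p.start()
    p.join()
    print(tensor)  # the parent sees the child's in-place update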

That is the end of the main text; I put part of the multiprocessing code on GitHub. I hope everyone can write multiprocessing code they are satisfied with. When I design high-performance multiprocessing, I follow these rules:

  • Pass as little data as possible
  • Minimize the load on the main process
  • Try not to let any process sit idle waiting
  • Minimize the frequency of inter-process communication

Among open-source Deep Reinforcement Learning (DRL) libraries, UC Berkeley's Ray project RLlib trains fast but is too complicated, while OpenAI's SpinningUp is simple but not fast. I just happen to know a bit about multiprocessing, NumPy, deep learning frameworks, and bi-level optimization algorithms such as deep reinforcement learning, so I felt that writing a DRL library would not be too hard, and I open-sourced the reinforcement learning library ElegantRL, to let people see that a DRL library can be quite simple. Why make something so complicated?

Although the library will always keep its framework small and its code elegant to help people who are getting started with deep reinforcement learning, ElegantRL puts training efficiency first (which is why ElegantRL and SpinningUp are positioned differently), so I need Python multiprocessing to speed up DRL training. That is how I came to write this article, "Elegantly use multiprocessing in Python".
