Python basic multi-process

Table of contents

1. What is multi-process

1.2 Status of the process

2. Process creation-multiprocessing

2.1 Process class syntax description

2.2 process pid

2.3 Pass parameters to the function specified by the child process

2.4 Do not share global variables between processes

3. Synchronization between processes - Queue

3.1 Queue class syntax description

3.2 Use of Queue

3.3 Queue instance

4. Synchronization between processes - Lock

5. Process pool Pool

5.2 Pool instance

5.3 Queue in the process pool

6. Process and thread comparison

6.1 Function

6.2 Differences

6.3 Advantages and disadvantages


1. What is multi-process

A process is a running program together with the resources it uses; it is the basic unit of resource allocation in the operating system. Multitasking can be achieved not only with threads but also with processes.

1.2 Status of the process

In practice, the number of tasks is often greater than the number of CPU cores, so some tasks are executing while others wait for the CPU. This gives rise to different process states:

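ready state: the process has everything it needs to run and is waiting for the CPU

running state: the CPU is executing the process

waiting state: the process is waiting for some condition to be met, for example while a sleep or an I/O operation is in progress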

2. Process creation-multiprocessing

2.1 Process class syntax description

The multiprocessing module creates a process by constructing a Process object and then calling its start() method. Process has the same API as threading.Thread.

Syntax format: multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

Parameter Description:

group: should always be None; the parameter exists only for compatibility with threading.Thread

target: the callable that the child process executes, passed as a function reference

name: a name for the process; optional

args: positional arguments for the function specified by target, passed as a tuple

kwargs: keyword arguments for the function specified by target, passed as a dictionary

The multiprocessing.Process object has the following methods and attributes:

run() # The method that performs the process's work; start() arranges for it to be called in the child process

start() # Start the child process

join([timeout]) # If the optional parameter timeout is the default value None, block until the process whose join() method is called terminates; if timeout is a positive number, block for at most timeout seconds

name # The name of the process; the default is Process-N, where N is an integer counting up from 1

pid # The process ID (process number); None before the process is started

is_alive() # Whether the child process is still alive

daemon # The daemon flag of the process, a Boolean value; it must be set before start() is called

authkey # The authentication key of the process

sentinel # A numeric handle of a system object that becomes ready when the process ends

terminate() # Terminate the child process immediately, whether or not its task is complete

kill() # Same as terminate(), but uses the SIGKILL signal on Unix

close() # Close the Process object and release all resources associated with it

from multiprocessing import Process
import time


def run_proc():
    """Code to be executed by the child process"""
    while True:
        print("---2---")
        time.sleep(1)


if __name__ == '__main__':
    p = Process(target=run_proc)
    p.start()
    while True:
        print("---1---")
        time.sleep(1)
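
The example above never stops on its own. As a rough sketch of how join(timeout), is_alive() and terminate() from the list above fit together, the following bounds how long the parent waits for a child. The names slow_task and run_with_timeout are hypothetical helpers introduced only for this illustration.

```python
from multiprocessing import Process
import time


def slow_task(seconds):
    """Hypothetical worker: just sleeps for the given number of seconds."""
    time.sleep(seconds)


def run_with_timeout(seconds, timeout):
    """Run slow_task in a child process and stop it if it outlives timeout.

    Returns True if the child finished on its own, False if it was terminated.
    """
    p = Process(target=slow_task, args=(seconds,), name="slow-task")
    p.start()
    p.join(timeout)        # wait at most `timeout` seconds
    if p.is_alive():       # still running, so the join timed out
        p.terminate()
        p.join()           # wait for the terminated child to be cleaned up
        return False
    return p.exitcode == 0


if __name__ == "__main__":
    print(run_with_timeout(0.1, 2))    # the child finishes well within the timeout
    print(run_with_timeout(5, 0.5))    # the child is still asleep and gets terminated
```

Calling join() again after terminate() is a deliberate choice here: it lets the parent collect the child's exit status before moving on.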

2.2 process pid

from multiprocessing import Process
import os
import time


def run_proc():
    """Code to be executed by the child process"""
    print("Child process running, pid=%d..." % os.getpid())
    print("Child process is about to exit...")


if __name__ == '__main__':
    print("Parent process pid: %d" % os.getpid())
    p = Process(target=run_proc)
    p.start()

2.3 Pass parameters to the function specified by the child process

from multiprocessing import Process
import os
from time import sleep


def run_proc(name, age, **kwargs):
    for i in range(10):
        print("Child process running, name=%s, age=%d, pid=%d..." % (name, age, os.getpid()))
        print(kwargs)
        sleep(0.2)


if __name__ == '__main__':
    p = Process(target=run_proc, args=('test', 18), kwargs={"m": 20})
    p.start()
    sleep(1)
    p.terminate()
    p.join()

2.4 Do not share global variables between processes

from multiprocessing import Process
import os
import time


nums = [11, 22]


def work1():
    """Code to be executed by the child process"""
    print("in process1 pid=%d, nums=%s" % (os.getpid(), nums))
    for i in range(3):
        nums.append(i)
        time.sleep(1)
        print("in process1 pid=%d, nums=%s" % (os.getpid(), nums))


def work2():
    """Code to be executed by the child process"""
    print("in process2 pid=%d, nums=%s" % (os.getpid(), nums))


if __name__ == '__main__':
    p1 = Process(target=work1)
    p1.start()
    p1.join()

    p2 = Process(target=work2)
    p2.start()
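
Each child process works on its own copy of nums, so the parent never sees a child's changes. The following is a minimal sketch of one way to observe this and to send data back explicitly. It assumes a hypothetical worker named append_and_report and uses a multiprocessing Queue as the return channel.

```python
from multiprocessing import Process, Queue

nums = [11, 22]


def append_and_report(result_queue):
    """Append to the module-level list and send a snapshot to the parent."""
    nums.append(33)                  # changes only this process's copy
    result_queue.put(list(nums))


if __name__ == "__main__":
    q = Queue()
    p = Process(target=append_and_report, args=(q,))
    p.start()
    child_view = q.get()             # read before join() so the child can flush its data
    p.join()
    print("child sees: ", child_view)    # [11, 22, 33]
    print("parent sees:", nums)          # [11, 22]
```

If the parent needs the result, it has to be sent back through a channel like this; assigning to a global in the child has no effect on the parent.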

3. Synchronization between processes - Queue

Processes sometimes need to communicate with each other, and the operating system provides many mechanisms for inter-process communication.

3.1 Queue class syntax description

q = Queue() initializes a Queue object. If the maximum number of messages is not given in the parentheses, or the number is negative, there is no upper limit on the number of messages the queue can hold (until memory runs out)

Queue.qsize() returns the approximate number of messages currently in the queue; on platforms such as macOS it may raise NotImplementedError

Queue.empty() returns True if the queue is empty, otherwise False

Queue.full() returns True if the queue is full, otherwise False

Queue.get([block[, timeout]]) gets a message from the queue and removes it from the queue. The default value of block is True. 1. If block is True and timeout (in seconds) is not set, the call blocks while the queue is empty until a message can be read. If timeout is set, the call waits at most timeout seconds and raises queue.Empty (from the standard queue module) if no message has arrived by then. 2. If block is False and the queue is empty, queue.Empty is raised immediately

Queue.get_nowait() is equivalent to Queue.get(False)

Queue.put(item[, block[, timeout]]) writes item to the queue. The default value of block is True. 1. If block is True and timeout is not set, the call blocks while the queue is full until space becomes available. If timeout is set, the call waits at most timeout seconds and raises queue.Full (from the standard queue module) if there is still no space. 2. If block is False and the queue is full, queue.Full is raised immediately

Queue.put_nowait(item) is equivalent to Queue.put(item,False)

3.2 Use of Queue

You can use the Queue class of the multiprocessing module to transfer data between processes. Queue itself is a message queue.

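from multiprocessing import Queue
import queue  # provides the Full and Empty exceptions


q = Queue(3)  # create a Queue object that holds at most three messages
q.put("message 1")
q.put("message 2")
print(q.full())
q.put("message 3")
print(q.full())


# The queue is full, so a blocking put() raises queue.Full after the 2-second timeout
try:
    q.put("message 4", True, 2)
except queue.Full:
    print("The queue is full, current number of messages: %s" % q.qsize())

# put_nowait() raises queue.Full immediately
try:
    q.put_nowait("message 4")
except queue.Full:
    print("The queue is full, current number of messages: %s" % q.qsize())

# Recommended: check whether the queue is full before writing
if not q.full():
    q.put_nowait("message 4")

# When reading, check whether the queue is empty first, then read
if not q.empty():
    for i in range(q.qsize()):
        print(q.get_nowait())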
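
The reading side can also rely on the timeout behaviour of get() rather than on empty() and qsize(), whose answers can be out of date as soon as another process touches the queue. A small sketch, assuming a hypothetical helper named drain that collects messages until none arrives within the timeout:

```python
from multiprocessing import Queue
import queue  # the Empty and Full exceptions live in the standard queue module


def drain(q, timeout=0.5):
    """Collect messages until the queue stays empty for `timeout` seconds."""
    items = []
    while True:
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            return items


if __name__ == "__main__":
    q = Queue(3)
    for msg in ("message 1", "message 2", "message 3"):
        q.put(msg)
    print(drain(q))    # the three messages, in the order they were put
    print(drain(q))    # nothing left, so an empty list after the timeout
```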

3.3 Queue instance

from multiprocessing import Process, Queue
import os, time, random


# Code executed by the writer process
def write(q):
    for value in ['A', 'B', 'C']:
        print("Put %s to queue..." % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process
def read(q):
    while True:
        if not q.empty():
            value = q.get(True)
            print("Get %s from queue..." % value)
            time.sleep(random.random())
        else:
            break


if __name__ == '__main__':
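    # The parent process creates the Queue and passes it to each child process
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw, which writes
    pw.start()
    # Wait for pw to finish
    pw.join()
    # Start the child process pr, which reads
    pr.start()
    # read() returns once the queue is empty, so pr can simply be joined
    pr.join()
    print('')
    print("All data has been written and read")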
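
The example above only works because the writer has finished before the reader starts; a reader that checks empty() while the writer is still running could exit too early. One common alternative, sketched here under the assumption that the writer never sends None as real data, is to have the writer put an end marker on the queue (SENTINEL is a name chosen for this illustration).

```python
from multiprocessing import Process, Queue

SENTINEL = None  # assumed end-of-stream marker; any value the writer never sends would do


def write(q, values):
    for value in values:
        q.put(value)
    q.put(SENTINEL)          # tell the reader that nothing more is coming


def read(q):
    """Consume messages until the sentinel arrives; return them as a list."""
    received = []
    while True:
        value = q.get()      # blocks until a message is available
        if value is SENTINEL:
            return received
        print("Get %s from queue..." % value)
        received.append(value)


if __name__ == "__main__":
    q = Queue()
    pw = Process(target=write, args=(q, ["A", "B", "C"]))
    pr = Process(target=read, args=(q,))
    pr.start()               # the reader may start first; it just waits in q.get()
    pw.start()
    pw.join()
    pr.join()
```

With this arrangement both processes can run at the same time, and the reader stops exactly when the writer says it is done.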

4. Synchronization between processes - Lock

Locks are used to keep data consistent. Suppose each process adds 1 to a shared variable: if one process has read the variable but not yet written it back, and another process reads and writes the same variable in the meantime, the value written last is wrong. A lock is needed to keep the data consistent.

A Lock ensures that a piece of code is executed by only one process at a time. The Lock object has two methods: acquire() acquires the lock and release() releases it. When a process calls acquire() and the lock is unlocked, the call returns immediately and the process holds the lock. If the lock is already locked, the process calling acquire() blocks until the lock is released.

1. Lock syntax description

lock = multiprocessing.Lock(): Create a lock

lock.acquire(): acquire a lock

lock.release(): Release the lock

with lock: automatically acquire and release locks, similar to with open() as f

2. When the program is not locked:

import multiprocessing
import time


def add(num, value):
    print("add{0}:num={1}".format(value, num))
    for i in range(0, 2):
        num += value
        print("add{0}:num={1}".format(value, num))
        time.sleep(1)
        

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    num = 0
    p1 = multiprocessing.Process(target=add, args=(num, 1))
    p2 = multiprocessing.Process(target=add, args=(num, 2))
    p1.start()
    p2.start()

3. When the program is locked

import multiprocessing
import time


def add(num, value, lock):
    # "with lock" acquires the lock on entry and releases it on exit,
    # even if an exception is raised inside the block
    with lock:
        print("add{0}:num={1}".format(value, num))
        for i in range(0, 2):
            num += value
            print("add{0}:num={1}".format(value, num))
            time.sleep(1)


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    num = 0
    p1 = multiprocessing.Process(target=add, args=(num, 1, lock))
    p2 = multiprocessing.Process(target=add, args=(num, 2, lock))
    p1.start()
    p2.start()

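A process runs the locked code only after the process holding the lock has finished, and whichever process grabs the lock first executes first. Note that num is passed to each process as a copy, so in this example the lock only serializes the output; the two processes do not actually share the variable.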
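
To see a lock protect data that really is shared, the counter has to live in shared memory. The sketch below uses multiprocessing.Value for that, which this article does not otherwise cover; add_many and the iteration counts are assumptions made for the illustration.

```python
from multiprocessing import Process, Value, Lock


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        with lock:                # acquire() on entry, release() on exit
            counter.value += 1    # read-modify-write, done while holding the lock


if __name__ == "__main__":
    counter = Value("i", 0, lock=False)   # raw shared integer; we supply our own Lock
    lock = Lock()
    workers = [Process(target=add_many, args=(counter, lock, 10000))
               for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)   # 40000
```

Without the with lock: line, two processes could read the same old value and both write back old + 1, so the final total could come out lower than expected.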

5. Process pool Pool

When only a few child processes are needed, you can use Process from multiprocessing directly to create them. But when there are hundreds or even thousands of tasks, creating processes by hand is a huge amount of work; in that case, use the Pool class provided by the multiprocessing module.

multiprocessing.pool.Pool([processes[,initializer[,initargs[,maxtasksperchild[,context]]]]])

Parameter Description:

processes: the number of worker processes, if processes is None, use the value returned by os.cpu_count()

initializer: If initializer is not None, each worker process will call initializer(*initargs) at startup

maxtasksperchild: The number of tasks a worker process can complete before it exits or is replaced by a new worker process, in order to release unused resources

context: used to specify the context of the started worker process

There are two ways to submit tasks to the process pool:

apply(func[, args[, kwds]]): blocking method. Calls func and blocks until the result is ready

apply_async(func[, args[, kwds]]): non-blocking method. Submits func and returns immediately, so tasks can run in parallel, whereas the blocking method must wait for one task to finish before the next can be submitted. args is the tuple of positional arguments passed to func, and kwds is the dictionary of keyword arguments passed to func

Common multiprocessing.Pool methods:

close() closes the Pool so that it no longer accepts new tasks

terminate() stops the worker processes immediately, whether or not their tasks are complete

join() blocks the calling process until the worker processes exit; it must be called after close() or terminate()
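
As a sketch of the difference between the two submission methods: apply() returns the function's result directly, while apply_async() returns a result object whose get() method is called later to fetch the value. The function names square and squares_async are hypothetical.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def squares_async(pool, numbers):
    """Submit every task without blocking, then collect the results in order."""
    pending = [pool.apply_async(square, (n,)) for n in numbers]
    return [r.get(timeout=10) for r in pending]


if __name__ == "__main__":
    with Pool(3) as pool:
        print(pool.apply(square, (4,)))         # blocks until the result is ready: 16
        print(squares_async(pool, range(5)))    # [0, 1, 4, 9, 16]
```

The with statement calls terminate() on exit, which is safe here because every result has already been collected by then.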

5.2 Pool instance

When creating a Pool, you can specify the maximum number of worker processes. When a task is submitted to the Pool, an idle worker process executes it; if every worker in the pool is busy, the task waits until one of them finishes its current task, and that same process is then reused to run the new task.

from multiprocessing import Pool
import os, time, random


def worker(msg):
    t_start = time.time()
    print("%s started, pid %d" % (msg, os.getpid()))
    # sleep for a random 0 to 2 seconds (random.random() returns a float in [0, 1))
    time.sleep(random.random()*2)
    t_stop = time.time()
    print(msg, "finished, took %0.2f seconds" % (t_stop-t_start))


if __name__ == '__main__':
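    po = Pool(3)  # create a process pool with at most 3 worker processes
    for i in range(0, 10):
        # Pool().apply_async(target to call, (tuple of arguments passed to the target,))
        # each task is run by whichever worker process becomes idle
        po.apply_async(worker, (i,))

    print("-----start-----")
    po.close()  # close the pool; po accepts no new tasks after this
    po.join()  # wait for all worker processes in po to finish; must come after close()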
    print("-----stop-----")

5.3 Queue in the process pool

If you create processes with Pool and they need a queue, use Queue() from multiprocessing.Manager() instead of multiprocessing.Queue(); otherwise you will get the error: RuntimeError: Queue objects should only be shared between processes through inheritance

from multiprocessing import Manager, Pool
import os,time,random


def reader(q):
    print("reader started (%s), parent process (%s)" % (os.getpid(), os.getppid()))
    for i in range(q.qsize()):
        print("reader got message from Queue: %s" % q.get(True))

def writer(q):
    print("writer started (%s), parent process (%s)" % (os.getpid(), os.getppid()))
    for i in "itcast":
        q.put(i)


if __name__ == "__main__":
    print("(%s)start" % os.getpid())
    q = Manager().Queue()
    po = Pool()
    po.apply_async(writer, (q,))

    time.sleep(1)

    po.apply_async(reader, (q,))
    po.close()
    po.join()
    print("(%s) End" % os.getpid())

6. Process and thread comparison

6.1 Function

Process: able to complete multitasking, such as running multiple QQs on one computer at the same time

Thread: able to complete multitasking, such as multiple chat windows in one QQ

The definitions differ: a process is an independent unit of resource allocation and scheduling by the system

A thread is an entity within a process and the basic unit of CPU scheduling and dispatch. It is smaller than a process and can run independently. A thread owns almost no system resources of its own, only the few that are essential for running (such as a program counter, a set of registers, and a stack), but it shares all the resources owned by the process with the other threads of the same process

6.2 Differences

A program has at least one process, and a process has at least one thread

Threads are a finer-grained unit than processes (they hold fewer resources), which gives multi-threaded programs high concurrency

A process has its own independent memory space during execution, while the threads of a process share memory, which greatly improves the operating efficiency of the program

Threads cannot be executed independently and must depend on the process

A process can be thought of as an assembly line in a factory, and its threads as the workers on that line
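
The memory difference in the list above can be checked with a short experiment. This is only an illustrative sketch; items and add_item are names made up for it. A thread appends to the very list the main program sees, while a child process appends to its own copy.

```python
import threading
from multiprocessing import Process

items = []


def add_item():
    items.append("x")


if __name__ == "__main__":
    t = threading.Thread(target=add_item)
    t.start()
    t.join()
    print("after thread: ", items)    # ['x'], the thread shares the main program's memory

    p = Process(target=add_item)
    p.start()
    p.join()
    print("after process:", items)    # still ['x'], the child changed only its own copy
```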

6.3 Advantages and disadvantages

Thread: thread execution overhead is small, but it is not conducive to resource management and protection

Process: Process execution overhead is high, but it is good for resource management and protection


Origin blog.csdn.net/xiao__dashen/article/details/125435084