Python processes and threads: multiprocessing

These are my Python study notes, recorded and shared here in the hope that they will be helpful to you.

Multiprocessing

To make a Python program multi-process (multiprocessing), we first need some background knowledge about the operating system.

The Unix/Linux operating system provides a fork() system call, which is very special. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system automatically makes a copy of the current process (called the parent process), producing a new process (called the child process), and then returns in both the parent and the child.

fork() always returns 0 in the child process, while in the parent process it returns the child's process ID. The reason for this is that a parent process can fork many child processes, so the parent must record the ID of each child, whereas a child only needs to call getppid() to get its parent's ID.

Python's os module wraps common system calls, including fork, which makes it easy to create a child process in a Python program:

import os

print('Process (%s) start...' % os.getpid())

# os.fork() only exists on Unix/Linux/macOS
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

The results are as follows:

Process (45321) start...
I (45321) just created a child process (45322).
I am child process (45322) and my parent is 45321.

Process finished with exit code 0

Since Windows has no fork call, the code above cannot run on Windows. macOS is BSD-derived (BSD is a flavor of Unix), so it runs there without any problem. It is recommended that you learn Python on a Mac!

With the fork call, a process can clone a child process to handle each new task it receives. The Apache web server commonly works this way: the parent process listens on the port, and every time a new HTTP request arrives, it forks a child process to handle that request.
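The fork-per-task pattern can be sketched in a few lines. This is a Unix-only illustration, not Apache's actual code: handle_request is a hypothetical stand-in for real request handling, and for simplicity the parent waits for each child before taking the next task (a real server would keep accepting connections while children run):

```python
import os

def handle_request(task):
    # hypothetical stand-in for real work (e.g. serving one HTTP request)
    return task.upper()

results = []
for task in ['ping', 'pong']:
    pid = os.fork()
    if pid == 0:
        # child: handle exactly one task, then exit immediately
        handle_request(task)
        os._exit(0)
    else:
        # parent: reap the child, then move on to the next task
        os.waitpid(pid, 0)
        results.append(task)

print('handled %d tasks' % len(results))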

The multiprocessing module

If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. But since Windows has no fork call, does that make it impossible to write multi-process Python programs on Windows?

Since Python is cross-platform, it naturally provides cross-platform multi-process support as well: the multiprocessing module is a cross-platform version of the multi-process mechanism.

The multiprocessing module provides a Process class to represent a process object. The following example demonstrates starting a child process and waiting for it to finish:

from multiprocessing import Process
import os

# Code to be executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')

The execution results are as follows:

Parent process 45409.
Child process will start.
Run child process test (45410)...
Child process end.


To create a child process, you only need to pass in a function and its arguments, create a Process instance, and start it with the start() method. This makes creating a process simpler than calling fork() directly.

The join() method waits until the child process has finished before execution continues; it is usually used to synchronize processes.

Pool

If you want to start a large number of child processes, you can use a process pool to create them in batches:

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

The execution results are as follows:

Parent process 45546.
Waiting for all subprocesses done...
Run task 0 (45547)...
Run task 1 (45548)...
Run task 2 (45549)...
Run task 3 (45550)...
Task 0 runs 0.51 seconds.
Run task 4 (45547)...
Task 2 runs 0.72 seconds.
Task 4 runs 0.38 seconds.
Task 1 runs 2.15 seconds.
Task 3 runs 2.58 seconds.
All subprocesses done.


Code interpretation:

Calling join() on a Pool object waits for all worker processes to finish. You must call close() before calling join(); after close(), no new tasks can be submitted to the pool.

Note the output: tasks 0, 1, 2 and 3 start immediately, while task 4 starts only after one of the earlier tasks finishes. This is because the pool was created with a size of 4, so at most 4 processes run at the same time. This is a deliberate restriction of Pool, not a limitation of the operating system. If you change it to:

p = Pool(5)

You can run 5 processes at the same time.

Since Pool's default size is the number of CPU cores, if you happen to have an 8-core CPU you would have to submit at least 9 tasks to see the waiting effect above.

Subprocess

In many cases, the child process is not the Python program itself but an external program. After creating such a subprocess, we also need to control its input and output.

The subprocess module allows us to start a subprocess very conveniently and then control its input and output.

The following example demonstrates how to run the command nslookup www.python.org from Python code, which has the same effect as running it directly on the command line:

import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)

The result is as follows:

$ nslookup www.python.org
Server:		192.168.111.1
Address:	192.168.111.1#53

Non-authoritative answer:
Name:	www.python.org
Address: 151.101.108.223

Exit code: 0

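subprocess.call waits for the command to finish and returns its exit code; if you also want to capture the command's output, check_output does that. Since nslookup may not be installed everywhere, this sketch spawns a child Python interpreter instead, which works on any machine with Python:

```python
import subprocess
import sys

# run a child Python process and capture its stdout as bytes
out = subprocess.check_output([sys.executable, '-c', 'print("hello from child")'])
print(out.decode().strip())

# call() returns the child's exit code (non-zero here on purpose)
rc = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])
print('Exit code:', rc)
```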

If the subprocess also needs input, you can supply it through the communicate() method:

import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)

The above code is equivalent to executing the command nslookup on the command line, and then manually typing:

set q=mx
python.org
exit

The results are as follows:

$ nslookup
Server:		192.168.19.4
Address:	192.168.19.4#53

Non-authoritative answer:
python.org	mail exchanger = 50 mail.python.org.

Authoritative answers can be found from:
mail.python.org	internet address = 82.94.164.166
mail.python.org	has AAAA address 2001:888:2000:d::a6


Exit code: 0


Inter-process communication

Processes need to communicate with each other, and the operating system provides many mechanisms for this. Python's multiprocessing module wraps the low-level mechanisms and provides several ways to exchange data, such as Queue and Pipe.

Let's take Queue as an example: create two child processes in the parent, one writing data to the Queue and one reading data from it:

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__ == '__main__':
    # The parent process creates the Queue and passes it to each child:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the writer process pw:
    pw.start()
    # Start the reader process pr:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop and can never finish on its own, so terminate it:
    pr.terminate()

The results are as follows:

Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.


Under Unix/Linux, the multiprocessing module wraps the fork() call, so we don't need to worry about the details of fork(). Since Windows has no fork call, multiprocessing must "simulate" its effect: all Python objects in the parent process are serialized with pickle and then passed to the child process. So if a multiprocessing call fails on Windows, first consider whether pickling might have failed.


Origin blog.csdn.net/qq_36478920/article/details/99626626