In-depth understanding of processes and threads in Python

Foreword

The computers we use today have multiple CPUs or multiple CPU cores, and the operating systems we run are all "multi-tasking" operating systems. This lets us run several programs at the same time, or decompose one program into several relatively independent subtasks and have those subtasks execute concurrently, which shortens the program's execution time and gives the user a better experience. Therefore, no matter which language you develop in today, making a program perform multiple tasks at the same time, commonly called "concurrent programming", is a skill every programmer should have. To that end we first need to discuss two concepts: the process and the thread.

1. Multi-process in Python

Unix and Linux operating systems provide the fork() system call to create processes. The parent process calls fork(), and a child process is created. The child process is a copy of the parent, but it has its own PID. The fork() function is special in that it returns twice: in the parent process its return value is the PID of the child, while in the child process the return value is always 0. Python's os module exposes fork(), but since Windows has no fork() call, cross-platform multi-process programming should use the Process class of the multiprocessing module to create child processes. That module also provides higher-level facilities, such as a process pool (Pool) for starting processes in batches, and queues (Queue) and pipes (Pipe) for inter-process communication.
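As a quick illustration, here is a minimal sketch of what a fork()-based child process looks like (it only runs on Unix-like systems, which is exactly why the cross-platform examples below use multiprocessing instead):

import os

# fork() only exists on Unix-like systems; this will not run on Windows
pid = os.fork()
if pid == 0:
    # In the child process, fork() returns 0
    print('Child process, PID: %d, parent PID: %d' % (os.getpid(), os.getppid()))
else:
    # In the parent process, fork() returns the PID of the new child
    print('Parent process, PID: %d, child PID: %d' % (os.getpid(), pid))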

Let's use a file-downloading example to illustrate the difference between using multiple processes and not using them. Take a look at the following code:

from random import randint
from time import time, sleep


def download_task(filename):
    print('开始下载%s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s下载完成! 耗费了%d秒' % (filename, time_to_download))


def main():
    start = time()
    download_task('Python从入门到大神.pdf')
    download_task('Peking Hot.avi')
    end = time()
    print('总共耗费了%.2f秒.' % (end - start))


if __name__ == '__main__':
    main()

The following is the result of running the program.

开始下载Python从入门到大神.pdf...
Python从入门到大神.pdf下载完成! 耗费了6秒
开始下载Peking Hot.avi...
Peking Hot.avi下载完成! 耗费了7秒
总共耗费了13.01秒.

As the example above shows, if the code in a program can only execute sequentially, one statement at a time, then even two completely unrelated download tasks have to wait for each other: one file must finish downloading before the next download can start, which is obviously unreasonable and inefficient. Next, let's use multiple processes to put the two download tasks into different processes. The code is as follows.

from multiprocessing import Process
from os import getpid
from random import randint
from time import time, sleep


def download_task(filename):
    print('启动下载进程,进程号[%d].' % getpid())
    print('开始下载%s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s下载完成! 耗费了%d秒' % (filename, time_to_download))


def main():
    start = time()
    p1 = Process(target=download_task, args=('Python从入门到住院.pdf', ))
    p1.start()
    p2 = Process(target=download_task, args=('Peking Hot.avi', ))
    p2.start()
    p1.join()
    p2.join()
    end = time()
    print('总共耗费了%.2f秒.' % (end - start))


if __name__ == '__main__':
    main()

In the code above we created a process object with the Process class. The target parameter is the function to execute after the process starts, and args is a tuple of arguments to pass to that function. The start method of a Process object starts the process, and the join method waits for the process to finish. Running this code, you can clearly see that the two download tasks start "at the same time", and the program's execution time is greatly shortened: it is no longer the sum of the times of the two tasks. Below is the output of one run of the program.

启动下载进程,进程号[1530].
开始下载Python从入门到住院.pdf...
启动下载进程,进程号[1531].
开始下载Peking Hot.avi...
Peking Hot.avi下载完成! 耗费了7秒
Python从入门到住院.pdf下载完成! 耗费了10秒
总共耗费了10.01秒.

We can also use the classes and functions in the subprocess module to create and start child processes and communicate with them through pipes. We won't cover that here; interested readers can explore it on their own. Instead, let's focus on how to get two processes to communicate. We start two processes, one printing Ping and the other printing Pong, and the Ping and Pong output by the two processes should add up to 10 in total. It sounds simple, but writing it like this would be wrong.

from multiprocessing import Process
from time import sleep

counter = 0


def sub_task(string):
    global counter
    while counter < 10:
        print(string, end='', flush=True)
        counter += 1
        sleep(0.01)

        
def main():
    Process(target=sub_task, args=('Ping', )).start()
    Process(target=sub_task, args=('Pong', )).start()


if __name__ == '__main__':
    main()

It looks fine, but the final result is that Ping and Pong are each printed 10 times. Why? When we create a process, the child process gets a copy of the parent process and all of its data structures, and each child process has its own independent memory space. That means each of the two child processes has its own counter variable, so the result above is to be expected. The simplest fix is to use the Queue class from the multiprocessing module. It is a queue that can be shared between multiple processes, implemented underneath with pipes and semaphores. Interested readers should give it a try.
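As one possible sketch of that idea (not the only way to do it), the shared counter can be replaced by a multiprocessing.Queue holding 10 "tokens"; each process keeps printing only as long as it can still take a token from the queue, so Ping and Pong together print exactly 10 times:

from multiprocessing import Process, Queue
from queue import Empty
from time import sleep


def sub_task(string, queue):
    while True:
        try:
            # Each token taken from the shared queue allows one word to be printed
            queue.get(timeout=1)
        except Empty:
            break
        print(string, end='', flush=True)
        sleep(0.01)


def main():
    queue = Queue()
    # Put 10 tokens into the queue; the two processes consume exactly 10 between them
    for _ in range(10):
        queue.put(None)
    processes = [Process(target=sub_task, args=('Ping', queue)),
                 Process(target=sub_task, args=('Pong', queue))]
    for p in processes:
        p.start()
    for p in processes:
        p.join()


if __name__ == '__main__':
    main()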

2. Multithreading in Python

Early versions of Python introduced the thread module (now named _thread) for multi-threaded programming. However, that module is too low-level and leaves out many features, so for multi-threaded development we now recommend the threading module, which provides a better object-oriented wrapper around multi-threaded programming. Let's implement the file-downloading example from above with multiple threads.

from random import randint
from threading import Thread
from time import time, sleep


def download(filename):
    print('开始下载%s...' % filename)
    time_to_download = randint(5, 10)
    sleep(time_to_download)
    print('%s下载完成! 耗费了%d秒' % (filename, time_to_download))


def main():
    start = time()
    t1 = Thread(target=download, args=('Python从入门到住院.pdf',))
    t1.start()
    t2 = Thread(target=download, args=('Peking Hot.avi',))
    t2.start()
    t1.join()
    t2.join()
    end = time()
    print('总共耗费了%.3f秒' % (end - start))


if __name__ == '__main__':
    main()

We can create threads directly with the Thread class of the threading module, but we have also covered a very important concept called "inheritance": we can create new classes from existing ones. So we can also derive a custom thread class from Thread, then create an object of that class and start the thread. The code is shown below.

from random import randint
from threading import Thread
from time import time, sleep


class DownloadTask(Thread):

    def __init__(self, filename):
        super().__init__()
        self._filename = filename

    def run(self):
        print('开始下载%s...' % self._filename)
        time_to_download = randint(5, 10)
        sleep(time_to_download)
        print('%s下载完成! 耗费了%d秒' % (self._filename, time_to_download))


def main():
    start = time()
    t1 = DownloadTask('Python从入门到住院.pdf')
    t1.start()
    t2 = DownloadTask('Peking Hot.avi')
    t2.start()
    t1.join()
    t2.join()
    end = time()
    print('总共耗费了%.2f秒.' % (end - start))


if __name__ == '__main__':
    main()

Because multiple threads share the memory space of their process, communication between threads is relatively simple. The most direct way you can think of is to set up a global variable that all threads share. But when multiple threads share the same variable (usually called a "resource"), it is very likely to produce uncontrolled results that make the program fail or even crash. A resource that multiple threads compete for is usually called a "critical resource", and access to a critical resource must be protected, otherwise the resource ends up in a "chaotic" state. The following example demonstrates 100 threads each transferring 1 yuan into the same bank account. Here the bank account is the critical resource, and without protection we are very likely to get the wrong result.

from time import sleep
from threading import Thread


class Account(object):

    def __init__(self):
        self._balance = 0

    def deposit(self, money):
        # Compute the balance after the deposit
        new_balance = self._balance + money
        # Simulate that processing the deposit takes 0.01 seconds
        sleep(0.01)
        # Update the account balance
        self._balance = new_balance

    @property
    def balance(self):
        return self._balance


class AddMoneyThread(Thread):

    def __init__(self, account, money):
        super().__init__()
        self._account = account
        self._money = money

    def run(self):
        self._account.deposit(self._money)


def main():
    account = Account()
    threads = []
    # Create 100 deposit threads that all deposit into the same account
    for _ in range(100):
        t = AddMoneyThread(account, 1)
        threads.append(t)
        t.start()
    # Wait for all deposit threads to finish
    for t in threads:
        t.join()
    print('账户余额为: ¥%d元' % account.balance)


if __name__ == '__main__':
    main()

Running the program above gives a surprising result: 100 threads each transfer 1 yuan into the account, yet the final balance is far less than 100 yuan. The reason is that we did not protect the "critical resource", the bank account. When several threads deposit into the account at the same time, they all execute the line new_balance = self._balance + money together; each of them reads a balance that is still 0 in its initial state and adds 1 to that 0, so we get the wrong result. This is where a "lock" comes in handy. We can protect a critical resource with a lock: only the thread that holds the lock may access the critical resource, while threads that fail to acquire it are blocked until the holder releases it, at which point another thread gets a chance to acquire the lock and access the protected resource. The following code shows how to use a lock to protect the operations on the bank account and obtain the correct result.

from time import sleep
from threading import Thread, Lock


class Account(object):

    def __init__(self):
        self._balance = 0
        self._lock = Lock()

    def deposit(self, money):
        # The lock must be acquired before the code below can run
        self._lock.acquire()
        try:
            new_balance = self._balance + money
            sleep(0.01)
            self._balance = new_balance
        finally:
            # Release the lock in finally so it is released whether or not an exception occurs
            self._lock.release()

    @property
    def balance(self):
        return self._balance


class AddMoneyThread(Thread):

    def __init__(self, account, money):
        super().__init__()
        self._account = account
        self._money = money

    def run(self):
        self._account.deposit(self._money)


def main():
    account = Account()
    threads = []
    for _ in range(100):
        t = AddMoneyThread(account, 1)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print('账户余额为: ¥%d元' % account.balance)


if __name__ == '__main__':
    main()

Unfortunately, Python's multi-threading cannot take advantage of multiple CPU cores, which you can confirm by starting a few threads that each run an endless loop. The reason is that the CPython interpreter has a "Global Interpreter Lock" (GIL): a thread must acquire the GIL before it can execute, and the interpreter periodically forces the running thread to release it (in old versions after every 100 bytecode instructions, in modern versions after a small time interval) so that other threads get a chance to run. This is a historical legacy, but even so, as the earlier examples show, multi-threading still has real value for improving execution efficiency and user experience.
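You can see the effect for yourself by timing a CPU-bound function, first run once in the main thread and then run twice in two threads "at the same time". The sketch below (the iteration count of 50,000,000 is arbitrary) shows that on CPython the two-thread version takes roughly as long as doing the work twice sequentially, because only one thread can execute Python bytecode at any moment:

from threading import Thread
from time import time


def count_down(n):
    while n > 0:
        n -= 1


def main():
    # Run the task once in the main thread
    start = time()
    count_down(50000000)
    print('Single thread: %.2fs' % (time() - start))

    # Run the same task twice in two threads "at the same time"
    start = time()
    t1 = Thread(target=count_down, args=(50000000,))
    t2 = Thread(target=count_down, args=(50000000,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    # On CPython this is roughly twice the single-thread time, not the same,
    # because the GIL prevents the two threads from running in parallel
    print('Two threads: %.2fs' % (time() - start))


if __name__ == '__main__':
    main()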

3. Multi-process or multi-thread?

Whether you use multiple processes or multiple threads, once the number of tasks gets large enough, efficiency will certainly not improve. Why? Consider an analogy. Suppose you are unfortunate enough to be preparing for the high school entrance exam, and every evening you have homework in five subjects: Chinese, mathematics, English, physics, and chemistry, each taking 1 hour. If you spend 1 hour on the Chinese homework, then 1 hour on the math homework, and so on, finishing each subject in turn, it takes 5 hours in total. This is the single-task model. If you switch to a multi-tasking model instead, you do Chinese for 1 minute, switch to math for 1 minute, then switch to English, and so on; as long as you switch fast enough, this is the same way a single-core CPU runs multiple tasks, and from an outsider's point of view you are doing all five assignments at the same time.

However, switching homework has a price. When you switch from Chinese to math, you first have to clear the Chinese books and pens off the desk (this is called saving the context), then open the math textbook and find the compass and ruler (this is called preparing the new environment) before you can start the math homework. The operating system does the same when it switches processes or threads: it has to save the current execution environment (CPU register state, memory pages, and so on) and then prepare the execution environment for the new task (restore its register state, switch memory pages, and so on) before execution continues. This switching is fast, but it still takes time. If thousands of tasks are running at once, the operating system may spend most of its time switching between them and have little time left to actually run them. The most familiar symptoms are a thrashing hard disk, windows that stop responding to clicks, and a system that appears frozen. So once multitasking passes a certain limit, system performance drops sharply and in the end none of the tasks gets done well.

The second thing to consider before using multitasking is the type of task, which can be divided into computation-intensive and I/O-intensive. Computation-intensive tasks involve a lot of calculation and consume CPU resources, for example video encoding/decoding or format conversion. Such tasks depend entirely on the CPU's computing power: the more of them run concurrently, the more time is spent switching between them and the lower the CPU's efficiency becomes. Because computation-intensive tasks mainly consume CPU resources, a scripting language such as Python usually executes them very inefficiently; C is best suited to this kind of work, and as mentioned earlier Python provides mechanisms for embedding C/C++ code.

Besides computation-intensive tasks, any task involving network or storage I/O can be considered I/O-intensive. Such tasks consume very little CPU; most of their time is spent waiting for I/O operations to complete (because I/O is far slower than the CPU and memory). For I/O-intensive tasks, running multiple tasks lets the CPU keep working during the I/O waits and therefore run efficiently. A large class of tasks, including network applications and web applications, is I/O-intensive; we will cover them shortly.

4. Single thread + asynchronous I/O

The most important improvement modern operating systems made to I/O is support for asynchronous I/O. If you take full advantage of the asynchronous I/O support the operating system provides, you can use a single-process, single-thread model to perform multiple tasks. This model is called the event-driven model. Nginx is a web server that supports asynchronous I/O: on a single-core CPU it uses a single-process model to handle many requests efficiently, and on a multi-core CPU it can run multiple processes (as many as there are cores) to make full use of the hardware. Server-side programs written with Node.js use the same working model, which is also a popular approach to concurrent programming.

In Python, the single-thread + asynchronous I/O programming model is called the coroutine. With coroutine support you can write efficient, event-driven multi-tasking programs. The biggest advantage of coroutines is extremely high execution efficiency: switching between subroutines is not a thread switch but is controlled by the program itself, so there is no thread-switching overhead. The second advantage is that no multi-thread locking mechanism is needed: there is only one thread, so there are no conflicting writes to shared variables, and instead of locking shared resources you only need to check state, which makes execution much more efficient than multi-threading. If you want to make full use of multiple CPU cores, the simplest approach is multi-process + coroutine, which exploits the cores while keeping the coroutine's efficiency, giving very high performance. This topic will be covered in later lessons.
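To give a taste of what this looks like, here is a minimal sketch of the earlier download example rewritten with the standard asyncio module (coroutines are covered properly in later lessons; this only shows the shape of the code):

import asyncio
from random import randint
from time import time


async def download(filename):
    print('开始下载%s...' % filename)
    time_to_download = randint(5, 10)
    # await hands control back to the event loop instead of blocking the thread
    await asyncio.sleep(time_to_download)
    print('%s下载完成! 耗费了%d秒' % (filename, time_to_download))


async def main():
    start = time()
    # Both "downloads" run concurrently in a single thread
    await asyncio.gather(
        download('Python从入门到住院.pdf'),
        download('Peking Hot.avi'),
    )
    print('总共耗费了%.2f秒.' % (time() - start))


if __name__ == '__main__':
    asyncio.run(main())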

5. Examples

1. Put time-consuming tasks into a separate thread for a better user experience.

The interface shown below has two buttons, "Download" and "About". Clicking the "Download" button simulates downloading a file online with a sleep that takes 10 seconds. Without multi-threading, you will find that once "Download" is clicked, the rest of the program is blocked by this time-consuming task and cannot respond, which is obviously a very poor user experience. The code is as follows.

import time
import tkinter
import tkinter.messagebox


def download():
    # Simulate a download task that takes 10 seconds
    time.sleep(10)
    tkinter.messagebox.showinfo('提示', '下载完成!')


def show_about():
    tkinter.messagebox.showinfo('关于', '作者: 霸哥')


def main():
    top = tkinter.Tk()
    top.title('单线程')
    top.geometry('200x150')
    top.wm_attributes('-topmost', True)

    panel = tkinter.Frame(top)
    button1 = tkinter.Button(panel, text='下载', command=download)
    button1.pack(side='left')
    button2 = tkinter.Button(panel, text='关于', command=show_about)
    button2.pack(side='right')
    panel.pack(side='bottom')

    tkinter.mainloop()


if __name__ == '__main__':
    main()

If we use multi-threading and run the time-consuming task in a separate thread, the main thread is no longer blocked by it. The modified code is as follows.

import time
import tkinter
import tkinter.messagebox
from threading import Thread


def main():

    class DownloadTaskHandler(Thread):

        def run(self):
            time.sleep(10)
            tkinter.messagebox.showinfo('提示', '下载完成!')
            # Re-enable the download button
            button1.config(state=tkinter.NORMAL)

    def download():
        # Disable the download button
        button1.config(state=tkinter.DISABLED)
        # daemon=True makes this a daemon thread (it does not keep running once the main program exits)
        # Handle the time-consuming download task in a separate thread
        DownloadTaskHandler(daemon=True).start()

    def show_about():
        tkinter.messagebox.showinfo('关于', '作者: 霸哥')

    top = tkinter.Tk()
    top.title('单线程')
    top.geometry('200x150')
    top.wm_attributes('-topmost', 1)

    panel = tkinter.Frame(top)
    button1 = tkinter.Button(panel, text='下载', command=download)
    button1.pack(side='left')
    button2 = tkinter.Button(panel, text='关于', command=show_about)
    button2.pack(side='right')
    panel.pack(side='bottom')

    tkinter.mainloop()


if __name__ == '__main__':
    main()

2. Use multiple processes to "divide and conquer" complex tasks

Let's complete a computation-intensive task: summing the numbers from 1 to 100000000. The problem itself is very simple and can be solved with a little knowledge of loops. The code is as follows.

from time import time


def main():
    total = 0
    number_list = [x for x in range(1, 100000001)]
    start = time()
    for number in number_list:
        total += number
    print(total)
    end = time()
    print('Execution time: %.3fs' % (end - start))


if __name__ == '__main__':
    main()

In the code above I deliberately created a list container first and filled it with 100,000,000 numbers. That step is itself time-consuming, so to keep the comparison fair, when we split the task across 8 processes we will ignore the time spent on the list slicing operations and only measure the time spent doing the computation and merging the results. The code is as follows.

from multiprocessing import Process, Queue
from time import time


def task_handler(curr_list, result_queue):
    total = 0
    for number in curr_list:
        total += number
    result_queue.put(total)


def main():
    processes = []
    number_list = [x for x in range(1, 100000001)]
    result_queue = Queue()
    index = 0
    # Start 8 processes, each computing the sum of one slice of the data
    for _ in range(8):
        p = Process(target=task_handler,
                    args=(number_list[index:index + 12500000], result_queue))
        index += 12500000
        processes.append(p)
        p.start()
    # Start timing how long it takes all processes to finish
    start = time()
    for p in processes:
        p.join()
    # Merge the partial results
    total = 0
    while not result_queue.empty():
        total += result_queue.get()
    print(total)
    end = time()
    print('Execution time: ', (end - start), 's', sep='')


if __name__ == '__main__':
    main()

Comparing the results of the two pieces of code (on the MacBook I am currently using, the first version takes about 6 seconds, while the multi-process version takes less than 1 second; again, we are only comparing the computation time, not the time spent creating and slicing the list), the multi-process version gets more CPU time and makes better use of the CPU's multiple cores, so the program's execution time drops significantly, and the larger the amount of computation, the more pronounced the effect. If you want, you can also distribute the processes across different computers to build a distributed program. The basic approach is to share the Queue object over the network through the managers provided by the multiprocessing.managers module (registering it on the network so that other computers can access it). We will come back to this when we discuss web crawlers.
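As a rough sketch of that idea (the address, port, and authkey below are placeholder values, not ones used in this article), the multiprocessing.managers module lets one machine expose a Queue and other machines connect to it:

# server.py - runs on the machine that owns the queue
from multiprocessing.managers import BaseManager
from queue import Queue

task_queue = Queue()


class QueueManager(BaseManager):
    pass


# Register the queue under a name so that other machines can request it
QueueManager.register('get_task_queue', callable=lambda: task_queue)
manager = QueueManager(address=('', 50000), authkey=b'secret')
server = manager.get_server()
server.serve_forever()

A client on another machine registers the same name, connects, and then uses the queue as if it were local:

# client.py - runs on any machine that can reach the server
from multiprocessing.managers import BaseManager


class QueueManager(BaseManager):
    pass


QueueManager.register('get_task_queue')
manager = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
manager.connect()
task_queue = manager.get_task_queue()
task_queue.put('a task from another machine')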
