Python Notes: Introduction to Multithreading and Multiprocessing

0. Preface

Multithreading and multiprocessing are probably among the most frequently asked topics in back-end engineering interviews. Many online materials already introduce them in detail, so we won't go into the details here.

In short, using multithreading and multiprocessing well can greatly improve a program's efficiency, especially in scenarios such as web crawlers or calls to online models. As a result, they are very common tools in practice.

Below, we briefly introduce how to implement multithreading and multiprocessing in Python.

1. Multithreading

1. Definition and application scenarios of multithreading

The definition of multithreading will not be repeated here. In short, it runs multiple threads inside a single process so that a batch of tasks is executed concurrently.

In essence, all threads live inside a single process and share its resources, so multithreading does not add computing resources. Moreover, in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. Multithreading is therefore not suitable for CPU-intensive tasks; for such tasks it may even slow the program down because of the extra thread-switching overhead.

Conversely, for tasks that are not computationally intensive but spend most of their time waiting on external services (I/O-bound tasks such as network requests), multithreading is a very effective way to improve resource utilization: while one thread is waiting, the others can run.
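To make the I/O-bound case concrete, here is a minimal sketch in which `time.sleep()` stands in for waiting on an external service. The function names `fake_io_call` and `run_with_threads` are hypothetical, and the timing is only indicative (start() and join() are explained in the next subsection).

```python
import threading
import time


def fake_io_call(results, i):
    # hypothetical task: time.sleep() stands in for waiting on an external
    # service; a sleeping thread releases the GIL, so the others keep running
    time.sleep(0.2)
    results[i] = i * 10


def run_with_threads(n_tasks=5):
    results = [None] * n_tasks
    threads = [threading.Thread(target=fake_io_call, args=(results, i))
               for i in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every task to finish
    return results, time.perf_counter() - start


results, elapsed = run_with_threads()
print(results, f"{elapsed:.2f}s")
```

Because the five waits overlap, the elapsed time on a typical machine is close to 0.2 s rather than the roughly 1 s a sequential loop would need.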

2. Basic usage of multithreading

Next, let's look at the basic usage of multithreading in Python.

Implementing multithreading in Python is quite simple: the built-in threading module is all you need.

The simplest multi-threaded implementation can consist of the following parts:

  1. Define the tasks that the thread needs to perform;
  2. Create a thread;
  3. Start a thread;

Note, however, that start() only launches the thread, which then runs independently of the main thread. The main thread moves on to the subsequent code as soon as start() returns, while the new thread acquires the resources it needs and executes its task concurrently.

This causes a problem: by default, the main thread does not know the running status of each thread, yet some operations must run only after the threads have finished. Therefore, in most cases we add a join operation that merges the child thread back into the main thread: the main thread waits at join() until the child thread has finished, and only then executes the subsequent code.

A sample of the most basic multithreading usage is as follows:

import threading

def job():
    print("hello world!")
    return

def main():
    thread = threading.Thread(target=job)
    thread.start()
    thread.join()
    return

main()
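To see what join() changes, the following sketch uses `is_alive()` to check the thread's status right after start() and again after join(). The names `slow_job` and `log` are hypothetical, and the `args` parameter used here is covered just below.

```python
import threading
import time


def slow_job(log):
    time.sleep(0.1)  # simulate some work
    log.append("job finished")


def main():
    log = []
    thread = threading.Thread(target=slow_job, args=(log,))
    thread.start()
    # start() returns immediately: the thread is still running here
    running_after_start = thread.is_alive()
    log.append("main continues")
    thread.join()  # block until slow_job has returned
    running_after_join = thread.is_alive()
    return log, running_after_start, running_after_join


log, before, after = main()
print(log, before, after)
```

The main thread's entry is logged first because it does not wait for the thread; only after join() is the thread guaranteed to have finished.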

In addition, if the thread's task needs extra arguments, we can pass them through the args parameter as follows.

import threading

def job(message):
    print(message)
    return

def main():
    thread = threading.Thread(target=job, args=("hello world!", ))
    thread.start()
    thread.join()
    return

main()
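`args` must be a tuple, hence the trailing comma in `("hello world!", )`: without it, the string itself would be unpacked into the call character by character. Keyword arguments go through the `kwargs` dict. A small sketch with a hypothetical `greet` task:

```python
import threading


def greet(results, name, punctuation="!"):
    # each thread writes to its own key, so no two threads touch the same entry
    results[name] = f"hello {name}{punctuation}"


def main():
    results = {}
    threads = [
        threading.Thread(target=greet, args=(results, "alice")),
        threading.Thread(target=greet, args=(results, "bob"),
                         kwargs={"punctuation": "?"}),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(main())  # {'alice': 'hello alice!', 'bob': 'hello bob?'}
```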

3. The use of queues in multithreading

As mentioned earlier, a thread runs independently of the main thread. Even though we can merge it back into the main thread with join(), we cannot get any return value from it: threading.Thread simply discards whatever the target function returns.

To get the result of running from the thread, we need to write it into a publicly accessible storage space in some way.
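One simple form of such storage is the thread object itself. The sketch below subclasses threading.Thread and saves the target's return value in an attribute; `ResultThread` and `square` are hypothetical names, not part of the threading module.

```python
import threading


class ResultThread(threading.Thread):
    """Hypothetical helper: a Thread that keeps its target's return value."""

    def __init__(self, func, *args):
        super().__init__()
        self.func = func
        self.args_ = args  # avoid clashing with Thread's internal attributes
        self.result = None

    def run(self):
        # run() is what start() executes in the new thread
        self.result = self.func(*self.args_)


def square(x):
    return x * x


threads = [ResultThread(square, i) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # after join(), each result attribute is filled in
print([t.result for t in threads])  # [0, 1, 4, 9, 16]
```

This works for one-shot tasks; for a stream of work items, a shared queue is usually the better fit.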

On the other hand, when using threads to implement the producer-consumer model, we also need to store the data in a shared storage space.

A quick-and-dirty approach is to use global variables, but it is not elegant and may cause hidden problems under high concurrency. A more elegant approach is to use the thread-safe Queue class from Python's built-in queue module.

Below, we give a typical example as follows:

import threading
from queue import Queue, Empty

def job(q_in, q_out):
    while True:
        try:
            # get_nowait() raises Empty instead of blocking forever when another
            # thread takes the last item between an empty() check and get()
            n = q_in.get_nowait()
        except Empty:
            break
        q_out.put(n ** 2)

def main():
    q_in = Queue()
    q_out = Queue()
    for i in range(10000):
        q_in.put(i)

    thread_list = [threading.Thread(target=job, args=(q_in, q_out)) for _ in range(5)]
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()

    # all threads have finished, so q_out holds every result
    ans = []
    while not q_out.empty():
        ans.append(q_out.get())
    return ans

main()

Note, however, that because the threads run concurrently and their execution is interleaved, the order of the results in the output queue is not guaranteed to match the order of the inputs in the input queue. If order matters, it must be preserved by some other means.
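One simple way to preserve order is to tag each input with its position and carry that index along with the result. The sketch below assumes the same squaring task as above; the index-tagging scheme is an illustrative choice, not the only option.

```python
import threading
from queue import Queue, Empty


def job(q_in, q_out):
    while True:
        try:
            idx, n = q_in.get_nowait()  # each item carries its input position
        except Empty:
            break
        q_out.put((idx, n ** 2))  # keep the position next to the result


def main(values, n_threads=5):
    q_in, q_out = Queue(), Queue()
    for idx, n in enumerate(values):
        q_in.put((idx, n))
    threads = [threading.Thread(target=job, args=(q_in, q_out))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = [None] * len(values)
    while not q_out.empty():
        idx, value = q_out.get()
        results[idx] = value  # place each result back in input order
    return results


print(main([3, 1, 4, 1, 5]))  # [9, 1, 16, 1, 25]
```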

4. Application of locks in multithreading

Finally, let's look at how locks are used in multithreading.

As mentioned earlier, a common use of multithreading is to run the same task in several concurrent threads to improve execution efficiency. However, if all the threads read and write the same variable, nothing coordinates their accesses by default, so reads and writes can conflict (a race condition).

For example:

import threading
from time import sleep

def job():
    for i in range(5):
        print(i)
        sleep(1)
    return

def main():
    thread_list = [threading.Thread(target=job) for _ in range(5)]
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    return

main()

In the above code, all threads write to the same standard output without any coordination, so the output of different threads may be interleaved (for example, two numbers ending up on the same line), which is not what we expect.

Therefore, we need to introduce locks to protect shared resources.

The basic usage of a thread lock is as follows:

import threading
from time import sleep

def job(lock):
    for i in range(10):
        # "with lock" calls lock.acquire() on entry and lock.release() on exit,
        # even if print() raises
        with lock:
            print(i)
        sleep(1)

def main():
    lock = threading.Lock()
    thread_list = [threading.Thread(target=job, args=(lock, )) for _ in range(10)]
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()

main()
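Printing is not the only case. The read-write conflict described earlier shows up most clearly with a shared counter: `counter += 1` is a read, an add and a write, so two threads can read the same old value and one increment is lost. Without the lock the final total may come out lower than expected (not necessarily on every run or interpreter version); with the lock it is exact. The names `counter` and `add_many` are hypothetical.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # the lock makes the read-add-write sequence happen as one step
        with lock:
            counter += 1


def main(n_threads=4, times=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(main())  # 400000
```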

2. Multiprocessing

1. Definition and application scenarios of multiprocessing

Unlike multithreading, multiprocessing runs tasks in separate processes. Each process gets its own resources (its own memory space and, in CPython, its own interpreter and GIL) instead of sharing one process's resources as threads do, so multiple processes can run truly in parallel on multiple CPU cores. For CPU-intensive tasks, multiprocessing can therefore effectively improve execution efficiency, whereas multithreading often fails to improve it or even lowers it.
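The difference in resource sharing is easy to observe: a thread that modifies a global variable changes the one the main thread sees, while a child process only changes its own copy. A minimal sketch, with hypothetical names `counter`, `increment` and `run`:

```python
import multiprocessing
import threading

counter = 0


def increment():
    global counter
    counter += 1


def run(worker_cls):
    global counter
    counter = 0
    worker = worker_cls(target=increment)  # Thread and Process share this API
    worker.start()
    worker.join()
    return counter  # the value as seen by the main process


if __name__ == "__main__":
    print("thread:", run(threading.Thread))          # thread: 1
    print("process:", run(multiprocessing.Process))  # process: 0
```

The child process increments its own `counter`, so the parent's value stays 0; this is also why data must be passed between processes explicitly, for example through a multiprocessing queue.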

2. Basic usage of multiprocessing

In Python, the multiprocessing implementation is very similar to the multithreading one: just replace the built-in threading module with the multiprocessing module, whose Process class mirrors threading.Thread.

The most basic multi-process implementation code example is as follows:

import multiprocessing

def job():
    print("hello world!")
    return

def main():
    process = multiprocessing.Process(target=job)
    process.start()
    process.join()
    return

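# The guard is required wherever child processes use the "spawn" start method
# (the default on Windows and macOS): each child re-imports this module, and
# without the guard it would call main() again and fail with a RuntimeError
if __name__ == "__main__":
    main()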

Similarly, when parameters need to be passed in, we only need to pass the parameters through the args parameter.

import multiprocessing

def job(n):
    print(sum([i**2 for i in range(1, n+1)]))
    return

def main():
    process = multiprocessing.Process(target=job, args=(10, ))
    process.start()
    process.join()
    return

if __name__ == "__main__":  # required under the spawn start method (Windows/macOS default)
    main()

3. The use of queues in multiple processes

Now, let's examine the use of queues in multiple processes.

Queues are used across processes in basically the same way as across threads. The difference is that threads can use either the Queue class from the multiprocessing module or the one from the queue module, whereas processes do not share memory, so the two must not be mixed up: with processes, you must use the Queue class implemented in the multiprocessing module.

The code examples are as follows:

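import multiprocessing
from multiprocessing import Queue

def job(q_in, q_out):
    while True:
        # get() blocks until an item arrives; None is the stop signal
        n = q_in.get()
        if n is None:
            break
        q_out.put(n ** 2)

def main():
    n_workers = 5
    q_in = Queue()
    q_out = Queue()
    for i in range(1, 11):
        q_in.put(i)
    # one stop signal per worker, because q_in.empty() is not reliable across processes
    for _ in range(n_workers):
        q_in.put(None)
    process_list = [multiprocessing.Process(target=job, args=(q_in, q_out)) for _ in range(n_workers)]
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()

    # every worker has exited, so all results have been written to q_out
    while not q_out.empty():
        print(q_out.get())

if __name__ == "__main__":  # required under the spawn start method (Windows/macOS default)
    main()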

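If you mistakenly use the Queue class from the general queue module instead, the code above will not work. Under the fork start method, each child process works on its own copy of the queue, so the q_out of the main process never actually receives any element and nothing is printed. Under the spawn start method (the default on Windows and macOS), starting the processes fails outright, because a queue.Queue cannot be pickled and sent to a child process.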
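A related pitfall, described in the "Joining processes that use queues" guideline of the multiprocessing documentation: a process that has put items on a multiprocessing.Queue does not terminate until its buffered items have been flushed to the underlying pipe, so joining it before the parent reads from the queue can deadlock when the results are large. A minimal sketch of the safer order (read first, then join), assuming each hypothetical `worker` reports exactly one result:

```python
import multiprocessing


def worker(idx, q_out):
    # hypothetical large payload, big enough to exceed the pipe buffer
    q_out.put((idx, list(range(100_000))))


def main(n_workers=3):
    q_out = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, q_out))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    # read exactly one result per worker BEFORE joining;
    # q_out.get() blocks until a result arrives
    results = dict(q_out.get() for _ in range(n_workers))
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    results = main()
    print(sorted(results), len(results[0]))  # [0, 1, 2] 100000
```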

4. Application of locks in multiple processes

Now, let's examine the use of locks in multiple processes.

Similar to queues, the use of locks in multi-processes is almost the same as in multi-threading. You only need to replace the Lock class in the threading library with the Lock class in the multiprocessing library.

The code examples are as follows:

import multiprocessing
from multiprocessing import Lock

def job(lock):
    for i in range(5):
        with lock:  # acquire() on entry, release() on exit
            print(i)

def main():
    lock = Lock()
    process_list = [multiprocessing.Process(target=job, args=(lock, )) for _ in range(5)]
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()

if __name__ == "__main__":  # required under the spawn start method (Windows/macOS default)
    main()
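A lock alone cannot protect a plain Python global across processes, because each process has its own copy of it. For a shared counter, `multiprocessing.Value` places the number in shared memory and bundles a lock that `get_lock()` returns. A minimal sketch with a hypothetical `add_many` task:

```python
import multiprocessing


def add_many(counter, times):
    for _ in range(times):
        # counter.value += 1 is not atomic across processes;
        # get_lock() returns the lock that guards this shared Value
        with counter.get_lock():
            counter.value += 1


def main(n_procs=4, times=1000):
    counter = multiprocessing.Value("i", 0)  # shared C int, initially 0
    procs = [multiprocessing.Process(target=add_many, args=(counter, times))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(main())  # 4000
```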

5. Using a process pool

A process pool is a common tool in multiprocessing programs. It manages a group of worker processes for us: it starts them, allocates the resources they need, distributes tasks among them, and collects the results.

Rather than defining each process and then starting it by hand, the workflow with a pool is:

  1. Define the task function
  2. Create a process pool with a given number of worker processes
  3. Submit the tasks to the pool, whose worker processes execute them and return the results

The code example is as follows:

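import multiprocessing

def job(x):
    return x**2

# Approach 1: map() splits range(10) among the workers and returns results in input order
def multi_process_pool_test_v1():
    # use two worker processes (roughly, the resources of two cores)
    with multiprocessing.Pool(processes=2) as pool:
        ans = pool.map(job, range(10))
    print(ans)

# Approach 2: apply_async() submits one task at a time; get() waits for each result
def multi_process_pool_test_v2():
    with multiprocessing.Pool(processes=2) as pool:  # two worker processes
        multi_p = [pool.apply_async(job, (i,)) for i in range(10)]
        # collect results inside the with-block: leaving it terminates the pool
        ans = [it.get() for it in multi_p]
    print(ans)

if __name__ == "__main__":  # required under the spawn start method (Windows/macOS default)
    multi_process_pool_test_v1()
    multi_process_pool_test_v2()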
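When the task takes several arguments, `Pool.starmap()` unpacks each tuple of the iterable into positional arguments, which avoids wrapping the function by hand. A minimal sketch with a hypothetical `power` task:

```python
import multiprocessing


def power(base, exponent):
    return base ** exponent


def main():
    pairs = [(2, 3), (3, 2), (10, 0)]
    # starmap blocks until all results are ready, so returning from inside
    # the with-block (which terminates the pool on exit) loses nothing
    with multiprocessing.Pool(processes=2) as pool:
        # each tuple is unpacked into positional arguments: power(2, 3), ...
        return pool.starmap(power, pairs)


if __name__ == "__main__":
    print(main())  # [8, 9, 1]
```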


Origin blog.csdn.net/codename_cys/article/details/108395215