Python learning: multithreading

Multitasking can be completed by multiple processes, or by multiple threads within a process.

We mentioned earlier that a process consists of one or more threads; every process has at least one thread.

Because threads are the unit of execution directly supported by the operating system, high-level languages usually have built-in multithreading support, and Python is no exception. Moreover, Python threads are real operating-system threads (POSIX threads on Unix-like systems), not simulated threads.

Python's standard library provides two modules: _thread and threading. _thread is a low-level module, and threading is a high-level module that wraps _thread. In most cases, we only need the high-level threading module.

To start a thread, pass a function in to create a Thread instance, then call start() to begin execution:

import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    # Runs once, after the loop has finished
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()  # wait for LoopThread to finish
print('thread %s ended.' % threading.current_thread().name)

The results are as follows:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

Since every process starts one thread by default, we call that thread the main thread, and the main thread can start new threads. Python's threading module has a current_thread() function, which always returns the Thread instance of the currently running thread. The main thread's instance is named MainThread; a child thread's name is specified when it is created, and here we named it LoopThread. The name is used only for display when printing and has no other meaning. If no name is given, Python automatically names threads Thread-1, Thread-2, and so on.
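A small sketch of the naming rules just described: the main thread's name, and the automatic name Python picks when none is given. The exact default format varies by version (newer versions append the target function, e.g. "Thread-1 (work)"), so only the "Thread-" prefix should be relied on. The function work and the list names are hypothetical.

```python
import threading

names = []

def work():
    # Record the name of whichever thread is running this function
    names.append(threading.current_thread().name)

print(threading.current_thread().name)  # MainThread

t = threading.Thread(target=work)  # no name given, so Python assigns one
t.start()
t.join()
print(names[0])  # e.g. Thread-1, or Thread-1 (work) on newer versions
```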

Lock

The biggest difference between multithreading and multiprocessing is that with multiple processes, each process has its own copy of every variable, and the copies do not affect each other; with multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. Therefore, the greatest danger of sharing data between threads is that several threads modify the same variable at the same time and corrupt its contents. (Hence the need for a lock.)
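To see the sharing in practice, the sketch below has three threads append to one module-level list; after they finish, the main thread sees every item, because no copy was ever made. (Appending to a list happens to be safe in CPython; the point here is only that all threads touch the same object.) The names shared and add are illustrative.

```python
import threading

shared = []  # one list object, visible to every thread in the process

def add(item):
    # No per-thread copy: every thread appends to the same list
    shared.append(item)

threads = [threading.Thread(target=add, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # [0, 1, 2]
```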

Let's see how multiple threads operating on one variable at the same time can corrupt its contents:

import time, threading

# Assume this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, then withdraw; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(2000000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

We define a shared variable balance with an initial value of 0 and start two threads, each of which deposits and then withdraws the same amount, so in theory the result should be 0. However, thread scheduling is decided by the operating system, so when t1 and t2 run interleaved, given enough iterations, the final balance is not necessarily 0.

The reason is that a single statement in a high-level language becomes several steps when the CPU executes it, even a simple calculation such as:

balance = balance + n

involves two steps:

  1. Calculate balance + n and store it in a temporary variable;
  2. Assign the value of the temporary variable to balance.
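You can observe this split directly with the standard dis module, which disassembles a function into CPython bytecode. Opcode names differ across versions (for example BINARY_ADD before 3.11, BINARY_OP from 3.11 on), but the read, the addition, and the write always appear as separate instructions, and the interpreter may switch threads between them. The function name change is hypothetical.

```python
import dis

balance = 0

def change(n):
    global balance
    balance = balance + n

# Print the bytecode: one Python assignment becomes a load of the global,
# a load of n, an add, and a separate store back to the global
dis.dis(change)

ops = [ins.opname for ins in dis.get_instructions(change)]
print(ops)
```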

It can be seen as:

x = balance + n
balance = x

Since x is a local variable, each of the two threads has its own x. When the code runs normally, one thread after the other, the steps are as follows. (So that is how it works: a single statement is split into several steps for execution.)

Initial value: balance = 0

t1: x1 = balance + 5 # x1 = 0 + 5 = 5
t1: balance = x1     # balance = 5
t1: x1 = balance - 5 # x1 = 5 - 5 = 0
t1: balance = x1     # balance = 0

t2: x2 = balance + 8 # x2 = 0 + 8 = 8
t2: balance = x2     # balance = 8
t2: x2 = balance - 8 # x2 = 8 - 8 = 0
t2: balance = x2     # balance = 0
    
Result: balance = 0

But t1 and t2 run interleaved. Suppose the operating system executes their steps in the following order:

Initial value: balance = 0

t1: x1 = balance + 5  # x1 = 0 + 5 = 5

t2: x2 = balance + 8  # x2 = 0 + 8 = 8
t2: balance = x2      # balance = 8

t1: balance = x1      # balance = 5
t1: x1 = balance - 5  # x1 = 5 - 5 = 0
t1: balance = x1      # balance = 0

t2: x2 = balance - 8  # x2 = 0 - 8 = -8
t2: balance = x2      # balance = -8

Result: balance = -8
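Because the order of the steps is the whole story, the bad interleaving can be replayed deterministically in a single thread. The sketch below is a simulation, not real threads: each line is one step of the trace above, executed in the same order, with x1 and x2 standing in for each thread's temporary variable.

```python
# Replay the interleaved schedule step by step
balance = 0

x1 = balance + 5   # t1 reads 0
x2 = balance + 8   # t2 reads 0
balance = x2       # t2 writes 8
balance = x1       # t1 writes 5, overwriting t2's update (a "lost update")
x1 = balance - 5   # t1 reads 5
balance = x1       # t1 writes 0
x2 = balance - 8   # t2 reads 0
balance = x2       # t2 writes -8

print(balance)  # -8
```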

The reason is that modifying balance takes several steps, and a thread may be interrupted between them, so multiple threads can corrupt the contents of the same object.

When two threads deposit and withdraw at the same time, the balance can end up wrong, and you certainly don't want your bank balance to mysteriously turn negative. Therefore, we must ensure that while one thread is modifying balance, no other thread can change it. (Hence the lock.)

To make sure balance is computed correctly, we need to give change_it() a lock. When a thread starts executing change_it(), we say it has acquired the lock, so other threads cannot execute change_it() at the same time; they must wait until the lock is released and then acquire it before making their own changes. Since there is only one lock, no matter how many threads there are, at most one thread holds it at any moment, so modifications never conflict. A lock is created with threading.Lock(): (Locking is a convention the programmer must follow, and atomic operations don't need a lock. Which operations are atomic in Python?)

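import threading

balance = 0
lock = threading.Lock()

# change_it() is the same function as in the previous example
def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify:
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()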
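Putting the pieces together, here is a complete, runnable version of the lock-protected program. It uses the with statement, which threading.Lock supports as a context manager and which is equivalent to the acquire()/try/finally/release() pattern in the snippet above; the iteration count is the same.

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit then withdraw; the result should be 0
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # "with lock:" acquires on entry and releases on exit,
        # even if change_it() raises, just like try...finally
        with lock:
            change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # 0
```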

When multiple threads call lock.acquire() at the same time, only one of them acquires the lock and continues executing; the others keep waiting until they get the lock.

**The thread that acquired the lock must release it when it is finished; otherwise the threads waiting for the lock will wait forever and become dead threads.** That is why we use try...finally to make sure the lock is always released. (In effect, this is a deadlock.)
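To see why the release matters, the sketch below uses a hypothetical thread function, forgetful, that acquires the lock and exits without releasing it. Lock.acquire() accepts a timeout argument, used here only so the demonstration terminates; with a plain acquire(), the main thread would wait forever.

```python
import threading

lock = threading.Lock()

def forgetful():
    # Buggy on purpose: acquires the lock and never releases it
    lock.acquire()

t = threading.Thread(target=forgetful)
t.start()
t.join()  # the thread has exited, but the lock is still held

# With a timeout, acquire() gives up after 0.5 s and returns False
got_it = lock.acquire(timeout=0.5)
print(got_it)  # False
```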

The advantage of a lock is that it guarantees a critical section of code is executed from start to finish by only one thread at a time. The disadvantages are significant, though. First, it prevents concurrent execution: code protected by a lock can only run in single-threaded mode, which greatly reduces efficiency. Second, because there can be multiple locks, different threads may each hold one lock while trying to acquire a lock held by the other. This causes a deadlock: the threads all hang, unable either to proceed or to finish, and can only be forcibly terminated by the operating system.
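One common way to avoid this kind of deadlock (a standard technique, not something the code in this article uses) is to make every thread acquire multiple locks in the same global order. In the sketch, the function transfer, the shared counter, and the id()-based ordering are illustrative choices.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
counter = 0

def transfer(first, second, times):
    global counter
    # Always take the two locks in one fixed order (here: by id()),
    # regardless of the order the caller passed them in, so two threads
    # can never each hold one lock while waiting for the other
    low, high = sorted((first, second), key=id)
    for _ in range(times):
        with low:
            with high:
                counter += 1

# The threads name the locks in opposite orders: the classic deadlock setup
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, 10000))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, 10000))
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)  # 20000
```

Without the sorted() call, t1 could hold lock_a while t2 holds lock_b, and both would wait forever.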

Multi-core CPU

If you are "unfortunate" enough to own a multi-core CPU, you are surely thinking that multiple cores should be able to run multiple threads at the same time.

What happens if you write an infinite loop?

Open Activity Monitor on Mac OS X or Task Manager on Windows, and you can monitor the CPU usage of a process.

You will see that one thread running an infinite loop occupies 100% of a CPU core.

With two infinite-loop threads on a multi-core CPU, you will see 200% CPU usage, that is, two CPU cores fully occupied.

To keep all cores of an N-core CPU busy, you must start N infinite-loop threads.

Try writing an infinite loop in Python: (I didn't try it myself.)

import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

Start as many threads as there are CPU cores. On a 4-core CPU, you will see CPU usage of only about 102%, meaning only one core is being used.

But if you rewrite the same infinite loop in C, C++, or Java, it can keep all cores busy: 400% on 4 cores, 800% on 8 cores. Why doesn't this work in Python?

**Although Python threads are real threads, the interpreter holds a lock while executing code: the GIL (Global Interpreter Lock). Before any Python thread can execute, it must first acquire the GIL. Then, periodically, the interpreter automatically releases the GIL so that other threads get a chance to run (Python 2 did this every 100 bytecode instructions; since Python 3.2 it happens after a time-based switch interval, 5 ms by default).** This global lock effectively serializes the execution of every thread's code, so in Python multiple threads can only run alternately. Even 100 threads on a 100-core CPU can use only one core.
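In Python 3.2 and later, the GIL is handed between threads based on a time interval rather than an instruction count. You can inspect that interval with sys.getswitchinterval() and change it with sys.setswitchinterval(); a short sketch:

```python
import sys

# Seconds a thread may run before CPython asks it to release the GIL
# so another thread can take a turn
print(sys.getswitchinterval())  # 0.005 unless it has been changed

# Adjusting it changes how often threads take turns, not whether
# they run in parallel, so it cannot add a core
old = sys.getswitchinterval()
sys.setswitchinterval(0.001)
new = sys.getswitchinterval()
sys.setswitchinterval(old)  # restore the previous setting
print(new)
```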

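The GIL is a historical legacy of the Python interpreter's design. The interpreter we usually use is the official implementation, CPython; to truly use multiple cores with threads, you would need an interpreter without a GIL. (CPython 3.13 added an optional, experimental free-threaded build without the GIL, described in PEP 703.)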

So, in Python you can use multithreading, but don't expect it to make effective use of multiple cores. If you must use multiple cores through threads, the only way is via C extensions, which can release the GIL while they run, but that sacrifices Python's simplicity and ease of use.
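A rough way to check this on your own machine is to time pure-Python, CPU-bound work done twice in a row versus in two threads. The function count_down and the size N are arbitrary choices, and timings depend on the machine and Python build; on a standard CPython build with the GIL, expect the two numbers to be about the same rather than the threaded run being twice as fast.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound work: holds the GIL while it runs
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# The threads take turns holding the GIL instead of using two cores at once
print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```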

Don't worry too much, though. Although Python cannot use multithreading to run tasks on multiple cores, it can do so with multiple processes: each Python process has its own independent GIL, and they do not affect each other.
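A minimal sketch of the multi-process approach, using ProcessPoolExecutor from the standard concurrent.futures module (the multiprocessing module would work too). The functions count_primes and parallel_count, the naive primality test, and the limit of 100,000 are hypothetical choices made only to produce CPU-bound work.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def count_primes(lo, hi):
    # Deliberately naive CPU-bound work: count primes in [lo, hi)
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count(limit, workers):
    # Split [0, limit) into one chunk per worker process. Each process
    # has its own interpreter and its own GIL, so chunks can run on
    # separate cores at the same time.
    step = limit // workers + 1
    ranges = [(i, min(i + step, limit)) for i in range(0, limit, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, *zip(*ranges)))

# The guard is required: with the "spawn" start method (Windows, macOS),
# each worker re-imports this module and must not start a pool itself
if __name__ == "__main__":
    print(parallel_count(100_000, os.cpu_count() or 1))
```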

Summary

Multithreaded programming has a complex model and is prone to conflicts, so shared data must be isolated with locks, and at the same time you must be careful to avoid deadlocks.

Because the Python interpreter is designed with a global lock, the GIL, multiple threads cannot use multiple cores. Truly parallel multithreading in Python remains a beautiful dream.


Source: blog.csdn.net/qq_44787943/article/details/112590047