Python Advanced Tutorial Notes: Multithreading Basics

0x01 Threads and processes: preliminaries

A process is the smallest unit of resource allocation in the operating system; a running program has at least one process.

A thread is the smallest unit that the OS can schedule. It is contained within a process and is the actual unit of execution within it.

A process can contain multiple threads. On a single-core CPU, only one thread of a process can run at any given moment; true thread parallelism is only possible on a multi-core CPU.

Each process has its own independent (virtual) address space, memory, data stack, and so on, so a process occupies more resources. Because process resources are isolated, communication between processes is inconvenient and must go through inter-process communication (IPC) mechanisms. Threads in the same process, by contrast, all share the process's virtual address space, including its code segment, its global data, its open files, and so on. Thanks to this shared data, communication between threads is easy.

In addition, when the operating system creates a process it must allocate a separate memory space and many related resources for it, whereas creating a thread is much cheaper.

Therefore, in terms of creation and switching overhead, achieving concurrency with multiple threads is much cheaper than with multiple processes.

0x02 Thread creation

One common way to create a thread is to subclass threading.Thread; the subclass must override the run method.

import threading
import time

class Mythread(threading.Thread):
    def __init__(self, n):
        super(Mythread, self).__init__()
        self.n = n
        
    def run(self):
        print("task", self.n)
        time.sleep(1)
        print('2s')
        time.sleep(1)
        print('1s')
        time.sleep(1)
        print('0s')
        time.sleep(1)
        print("finish", self.n)
        
if __name__ == "__main__":
    t1 = Mythread("t1")
    t1.start()
    t2 = Mythread("t2")
    t3 = Mythread("t3")
    t2.start()
    t3.start()
    t4 = Mythread("t4")
    t5 = Mythread("t5")
    
    t4.start()
    t5.start()

The results are as follows:

task t1
task t2
tasktask t4
 t3
task t5
2s
2s
2s2s

2s
1s
1s
1s1s

1s
0s
0s
0s0s

0s
finish t1
finish t2
finishfinish  t3
t4
finish t5

The output is interleaved because each thread runs only for a time slice: when a thread's slice is used up, the scheduler switches the CPU to another thread.

For example, thread i may be suspended halfway through a loop (or even halfway through a print call), after which thread j runs, so output from different threads gets mixed together.
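
Subclassing threading.Thread is not the only option; the other common style, used in later sections of these notes, is to pass a target callable directly to the Thread constructor. A minimal sketch:

```python
import threading
import time

results = []

def task(n):
    time.sleep(0.1)       # simulate some work
    results.append(n)     # list.append is atomic in CPython

threads = [threading.Thread(target=task, args=(f"t{i}",)) for i in range(1, 6)]
for t in threads:
    t.start()
for t in threads:
    t.join()              # wait for every thread to finish
print(sorted(results))    # → ['t1', 't2', 't3', 't4', 't5']
```

This style avoids a subclass entirely when the thread only needs to run one function.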

0x03 Daemon thread

If child threads are created in the main thread, one of the following two things happens when the main thread ends, depending on each child thread's daemon attribute:

  • If a child thread's daemon attribute is True, the main thread exits without checking on it: all child threads whose daemon value is True are terminated together with the main thread, whether or not their work is finished.
  • If a child thread's daemon attribute is False (the default), the main thread checks at exit whether the child thread has finished; if it is still running, the main thread waits for it to complete before exiting.
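
Note that Thread.setDaemon() is deprecated in recent Python versions; the daemon attribute can be passed to the constructor or assigned directly. A minimal sketch:

```python
import threading
import time

def background():
    time.sleep(10)   # long-running work that would be abandoned at exit

# daemon can be passed to the constructor...
t1 = threading.Thread(target=background, daemon=True)
# ...or assigned as an attribute before start()
t2 = threading.Thread(target=background)
t2.daemon = True

print(t1.daemon, t2.daemon)   # → True True
```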

The experimental code is as follows:

from threading import Thread, Lock
import os, time

def work():
	global n
	#lock.acquire()
	temp = n
	time.sleep(.1)
	n = temp - 1
	print(n)
	#lock.release()

if __name__ == "__main__":
	lock = Lock()
	n = 100
	for i in range(10):
		p = Thread(target=work)
		p.daemon = True  # setDaemon(True) is deprecated since Python 3.10
		p.start()

The result is as follows:

9999
99
9999
99
99
99

99

99

As the output shows, the main thread exits as soon as its own code finishes, without waiting for the daemon child threads to complete.

0x04 Multiple threads sharing global variables

Multiple threads in the same process share resources such as variables and open files.

import threading
import time

g_num = 100.0

def work1():
	global g_num
	for i in range(5):
		g_num += 1
		time.sleep(2)
		print(f"work 1 :g_num is {g_num}")
		
def work2():
	global g_num
	for i in range(10):
		g_num -= .5
		time.sleep(1)
		print(f"work 2 :g_num is {g_num}")

if __name__ == "__main__":
	t1 = threading.Thread(target=work1)
	t2 = threading.Thread(target=work2)
	t1.start()
	t2.start()

The results are as follows:

work 2 :g_num is 100.5
work 1 :g_num is 100.0
work 2 :g_num is 101.0
work 2 :g_num is 100.5
work 1 :g_num is 100.0
work 2 :g_num is 101.0
work 2 :g_num is 100.5
work 1 :g_num is 100.0
work 2 :g_num is 101.0
work 2 :g_num is 100.5
work 1 :g_num is 100.0
work 2 :g_num is 101.0
work 2 :g_num is 100.5
work 1 :g_num is 100.0
work 2 :g_num is 100.0

0x05 Mutex lock

To fix such race conditions, we need to guarantee one thing for certain resources (variables, critical sections of code): at most one thread may access the resource at any moment. The solution is a mutex, threading.Lock.

from threading import Thread, Lock
import os, time

def work():
	global n
	lock.acquire()
	temp = n
	time.sleep(.1)
	n = temp - 1
	print(n)
	lock.release()

if __name__ == "__main__":
	lock = Lock()
	n = 100
	l = []
	for i in range(10):
		p = Thread(target=work)
		l.append(p)
		p.start()
	
	# join makes the main thread wait for the newly started threads to finish before it exits
	for p in l:
		p.join()

The running result after adding thread synchronization:

99
98
97
96
95
94
93
92
91
90

For comparison, here is the same code with the locking commented out:
from threading import Thread, Lock
import os, time

def work():
	global n
	#lock.acquire()
	temp = n
	time.sleep(.1)
	n = temp - 1
	print(n)
	#lock.release()

if __name__ == "__main__":
	lock = Lock()
	n = 100
	l = []
	for i in range(10):
		p = Thread(target=work)
		l.append(p)
		p.start()

	# join makes the main thread wait for the newly started threads to finish before it exits
	for p in l:
		p.join()

The result of running without thread synchronization:

99
99
99
99
99
99
99
99
99
99
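
Lock also works as a context manager, which guarantees the release even if the critical section raises an exception. A sketch of the same experiment using `with`:

```python
from threading import Thread, Lock
import time

lock = Lock()
n = 100

def work():
    global n
    with lock:            # acquire() on entry, release() on exit
        temp = n
        time.sleep(0.01)
        n = temp - 1

threads = [Thread(target=work) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # → 90
```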

0x06 RLock (reentrant lock, recursive lock)

If an ordinary Lock is used, the following code deadlocks (the second acquire blocks forever):

import threading

lock = threading.Lock()

lock.acquire()
for i in range(10):
    print('acquiring the second lock')
    lock.acquire()    # deadlock: an ordinary Lock is not reentrant
    print(f'test.......{i}')
    lock.release()
lock.release()

However, if the Lock is replaced by an RLock, no deadlock occurs.

An RLock records which thread currently holds it. When the lock is already held:

  • if the same thread acquires it again, the RLock keeps a counter for that thread and increments it; each release() decrements the counter, and the lock is only truly freed when the counter drops back to zero;
  • if a different thread tries to acquire it, that thread blocks.
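
A sketch of the earlier loop rewritten with RLock, which runs to completion because nested acquire/release calls from the owning thread are just counted:

```python
import threading

rlock = threading.RLock()

rlock.acquire()
for i in range(3):
    rlock.acquire()      # same thread: counter goes up instead of blocking
    print(f'test.......{i}')
    rlock.release()      # counter goes back down
rlock.release()          # counter reaches zero; the lock is actually freed
print("done")
```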

0x07 Semaphore (Semaphore)

A mutex allows only one thread at a time to modify the data, while a semaphore allows up to a fixed number of threads to do so simultaneously.

For example, think of a restroom with only three stalls: at most three people can use it at the same time.

import threading
import time

def run(n, semaphore):
    semaphore.acquire()   # acquire a slot
    time.sleep(1)
    print("run the thread:%s\n" % n)
    semaphore.release()     # release the slot

if __name__ == '__main__':
    num = 0
    semaphore = threading.BoundedSemaphore(5)  # allow at most 5 threads to run at the same time
    for i in range(22):
        t = threading.Thread(target=run, args=("t-%s" % i, semaphore))
        t.start()
    while threading.active_count() != 1:
        pass  # print threading.active_count()
    else:
        print('-----all threads done-----')
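
The busy `while threading.active_count() != 1` loop above burns CPU while waiting; joining the threads is a cleaner way to wait for completion. A sketch (the thread count and sleep time are illustrative):

```python
import threading
import time

done = []

def run(n, semaphore):
    with semaphore:              # at most 3 threads inside at once
        time.sleep(0.05)
        done.append(n)

semaphore = threading.BoundedSemaphore(3)
threads = [threading.Thread(target=run, args=(i, semaphore)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # block until each thread finishes
print('-----all threads done-----', len(done))
```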

0x08 Event class (Event)

The Event class lets one thread (here, the main thread) control the execution of others. An event is a simple thread-synchronization object that provides the following methods:

  • clear() sets the flag to False
  • set() sets the flag to True
  • is_set() returns whether the flag is currently set
  • wait() blocks until the flag becomes True (it returns immediately if the flag is already set)

Event's processing mechanism: a global flag is maintained. While the flag is False, event.wait() blocks; once the flag is True, event.wait() no longer blocks.
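
A minimal sketch of this flag mechanism, with one thread blocked in wait() until another calls set():

```python
import threading
import time

event = threading.Event()
order = []

def waiter():
    order.append("waiting")
    event.wait()              # blocks until the flag becomes True
    order.append("released")

t = threading.Thread(target=waiter)
t.start()
time.sleep(0.2)               # let the waiter block first
order.append("setting")
event.set()                   # flip the flag; wait() returns
t.join()
print(order)  # → ['waiting', 'setting', 'released']
```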

# Simulate a traffic light with the Event class
import threading
import time

event = threading.Event()

def lighter():
    count = 0
    event.set()     # start with the green light on
    while True:
        if 5 < count <= 10:
            event.clear()  # red light: clear the flag
            print("\33[41;1mred light is on...\033[0m")
        elif count > 10:
            event.set()  # green light: set the flag
            count = 0
        else:
            print("\33[42;1mgreen light is on...\033[0m")

        time.sleep(1)
        count += 1

def car(name):
    while True:
        if event.is_set():      # check whether the flag is set
            print("[%s] running..."%name)
            time.sleep(1)
        else:
            print("[%s] sees red light,waiting..."%name)
            event.wait()
            print("[%s] green light is on,start going..."%name)

light = threading.Thread(target=lighter)
light.start()

car_thread = threading.Thread(target=car, args=("MINI",))
car_thread.start()

0x09 Global Interpreter Lock (GIL)

Outside Python, a single core can execute only one task at a time, while multiple cores can run multiple threads truly simultaneously.

But in Python, no matter how many cores there are, only one thread can execute at any given moment. The reason is the existence of the GIL.

That's why people say that Python is pseudo-multithreaded.
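
A quick way to see this for yourself (timings vary by machine, so treat the numbers as illustrative, not definitive): a CPU-bound loop takes roughly as long on two threads as it does serially under CPython:

```python
import threading
import time

def count(n):
    # pure-Python, CPU-bound busy loop
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# Under CPython's GIL the threaded run is not ~2x faster; it is
# usually about the same as, or slower than, the serial run.
print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")
```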

GIL stands for Global Interpreter Lock. It originates from early design decisions in Python, made for the sake of data safety.

If a thread wants to execute, it must first obtain the GIL. Think of the GIL as a "passport": there is only one per Python process, and a thread that does not hold the passport is not allowed onto the CPU. The GIL exists only in CPython: CPython calls the native threads of the C library and cannot schedule the CPU directly, so it uses the GIL to guarantee that only one thread can work on the interpreter's data at a time.

Jython (and IronPython) have no GIL; PyPy, however, has one just like CPython.

How Python multithreading works: when Python uses multiple threads, it relies on the C library's native threads.

  • fetch the shared data
  • acquire the GIL
  • the Python interpreter hands the work to a native OS thread
  • the OS schedules that thread onto a CPU
  • when the thread's execution time is up, it must release the GIL whether or not its work is finished
  • another thread then repeats the same process
  • once the other thread finishes (or its time is up), execution switches back to the earlier thread, which resumes from the context it recorded. The whole process is each thread doing its own work and switching when its time slice expires (a context switch).

To make full use of a multi-core CPU in Python, use multiple processes. Each process has its own interpreter and its own GIL, so they do not interfere with one another and can run in parallel in the true sense. For CPU-bound work on a multi-core machine, multiprocessing therefore outperforms multithreading in Python.

In Python 3.x the GIL is released on a timer (the running thread gives up the GIL once a switch interval has elapsed), which is friendlier to CPU-intensive programs than the old bytecode-counting scheme, but it still does not change the fact that only one thread can execute at a time, so efficiency remains unsatisfactory for CPU-bound code.
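
That switch interval is exposed through the sys module; it defaults to 5 ms and can be tuned (whether tuning it helps is workload-dependent):

```python
import sys

print(sys.getswitchinterval())   # default is 0.005 seconds (5 ms)

sys.setswitchinterval(0.001)     # ask the interpreter to switch threads more often
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)     # restore the default
```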

When a single Python process runs on a multi-core CPU, it truly is a case of "one core toils while the other cores look on".


Origin blog.csdn.net/weixin_43466027/article/details/119818066