Thread synchronization locks, deadlocks, recursive locks, semaphores, GIL

First, the synchronization lock

When several threads read and write the same data at the same time, one thread can read a value that another thread is in the middle of modifying. It then works with the old value rather than the updated one, and the result is wrong. A synchronization lock solves this problem: it ensures that only one thread at a time can read and write the protected data.

Locks are typically used to synchronize access to shared resources. Import the Lock class from the threading module and create one Lock object for each shared resource. When a thread needs to access the resource, it calls acquire() to obtain the lock (if another thread already holds the lock, the current thread waits until it is released), accesses the resource, and then calls release() to release the lock.
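The acquire/release cycle described above can be sketched as follows. The names balance, balance_lock, and withdraw are hypothetical and exist only for this illustration; the try/finally is one common way to make sure release() is always called, not the only one.

```python
from threading import Lock

balance_lock = Lock()            # one Lock object for one shared resource
balance = {"amount": 100}        # hypothetical shared resource

def withdraw(value):
    balance_lock.acquire()       # waits here if another thread holds the lock
    try:
        balance["amount"] -= value
    finally:
        balance_lock.release()   # released even if the update raises

withdraw(30)
print(balance["amount"])

# While the lock is held, a non-blocking acquire fails immediately
# instead of waiting.
balance_lock.acquire()
second_try = balance_lock.acquire(blocking=False)
balance_lock.release()
print(second_try)
```

A plain Lock is not re-entrant, which is why the second acquire() here has to be non-blocking: a blocking call from the same thread would wait forever.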

The following example shows code that needs a synchronization lock:

from threading import Thread
x = 0
def task():
    global x
    for i in range(20000):
        x=x+1
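        # t1 reads x == 0 and is switched out before it can store the result
        # t2 reads x == 0, adds 1, and stores 1
        # t1 runs again, still holding 0, adds 1, and stores 1
        # Question: how many increments happened? Two, so x should have grown by 2, but it only grew by 1
        # This is the data safety problem.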
if __name__ == '__main__':
    t1 = Thread(target=task)
    t2 = Thread(target=task)
    t3 = Thread(target=task)
    t1.start()
    t2.start()
    t3.start()

    t1.join()
    t2.join()
    t3.join()
    print(x)

The code prints 60,000, so the program seems to be fine. But when the loop count is raised well above 20,000, wrong results start to appear. With 200,000 iterations, for example, the result comes out below 600,000 (how readily this reproduces depends on the Python version and on thread scheduling). This is the data safety problem, and it is what the synchronization lock is for: only one thread at a time may operate on the variable x, and the other threads must wait until that operation is finished. It is enough to wrap the code that modifies x in a lock, as the following code shows:

from threading import Thread,Lock
x = 0
mutex = Lock()  
def task():
    global x
    mutex.acquire()    # Lock: from here on, only one thread at a time can run the code below
    for i in range(100000):
        x += 1
    mutex.release()    # Release the lock

if __name__ == '__main__':
    t1 = Thread(target=task)
    t2 = Thread(target=task)
    t3 = Thread(target=task)

    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()
    print(x)
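A Lock can also be used as a context manager, which releases it automatically when the block exits. The sketch below is a variant of the code above, not the author's version: it locks each increment separately rather than the whole loop, and the parameter n is an assumption added to keep the run short.

```python
from threading import Thread, Lock

x = 0
mutex = Lock()

def task(n):
    global x
    for _ in range(n):
        with mutex:    # acquired here, released when the block exits
            x += 1

threads = [Thread(target=task, args=(10000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(x)
```

Locking per increment lets the threads interleave but pays the locking cost on every iteration; locking the whole loop, as in the code above, is cheaper but makes the threads run one after another.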

Second, deadlock

When several threads share several resources, a deadlock occurs if two threads each hold part of the resources and each waits for the other to release what it holds. Because the system sees those resources as in use, both threads keep waiting (blocked) indefinitely unless something external intervenes.

When a deadlock occurs, the program raises no exception and gives no warning; the threads involved simply stay blocked and cannot continue. Deadlocks arise easily, especially when a program uses several synchronization locks. The following code can deadlock:

from threading import Thread, Lock
import time

mutex1 = Lock()  # a synchronization lock is also called a mutex
mutex2 = Lock()

class MyThreada(Thread):  # create a thread class by subclassing Thread
    def run(self):
        self.task1()
        self.task2()
    def task1(self):
        mutex1.acquire()
        print(f'{self.name} acquired lock 1')
        mutex2.acquire()
        print(f'{self.name} acquired lock 2')
        mutex2.release()
        print(f'{self.name} released lock 2')
        mutex1.release()
        print(f'{self.name} released lock 1')

    def task2(self):
        mutex2.acquire()
        print(f'{self.name} acquired lock 2')
        time.sleep(1)
        mutex1.acquire()
        print(f'{self.name} acquired lock 1')
        mutex1.release()
        print(f'{self.name} released lock 1')
        mutex2.release()
        print(f'{self.name} released lock 2')

for i in range(3):
    t = MyThreada()
    t.start()

------------------------------------------------------------------------------
Thread-1 acquired lock 1
Thread-1 acquired lock 2
Thread-1 released lock 2
Thread-1 released lock 1
Thread-1 acquired lock 2
Thread-2 acquired lock 1

The program above deadlocks: thread 1 and thread 2 block each other. Thread 1 holds lock 2 and needs lock 1 to continue; thread 2 holds lock 1 and needs lock 2 to continue. Each thread holds what the other needs and neither releases its own lock, which is a deadlock.

A program should never deadlock, so multithreaded code has to take measures to avoid it. Here are several common ways to deal with deadlock.

Avoid multiple locking: try not to have one thread hold more than one Lock at a time.
Use the same locking order: if several threads need to lock several Locks, make sure they all request the locks in the same order.
Use timed locking: acquire() accepts a timeout parameter. If the lock cannot be obtained within timeout seconds, acquire() gives up and returns False, so the thread can release the locks it already holds and retry instead of blocking forever.
Deadlock detection: a deadlock-prevention mechanism that relies on an algorithm to detect deadlocks; it is mainly for scenarios where neither ordered locking nor timed locking can be used.
Recursive lock (RLock): to let the same thread request the same resource several times, Python provides a recursive lock, which avoids the deadlock in that case.
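Two of the measures above, a consistent locking order and timed locking, can be sketched as follows. The names lock_a, lock_b, ordered_worker, and try_with_timeout are hypothetical and chosen only for this illustration.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
finished = []

def ordered_worker(name):
    # Same locking order in every thread: always lock_a first, then lock_b.
    with lock_a:
        with lock_b:
            finished.append(name)

def try_with_timeout(lock, seconds):
    # acquire() returns False if the lock was not obtained within `seconds`.
    got_it = lock.acquire(timeout=seconds)
    if got_it:
        lock.release()
    return got_it

workers = [threading.Thread(target=ordered_worker, args=(f"worker-{i}",))
           for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(sorted(finished))

# Timed locking: the main thread holds lock_a, so the helper thread gives up.
attempts = []
lock_a.acquire()
helper = threading.Thread(
    target=lambda: attempts.append(try_with_timeout(lock_a, 0.1)))
helper.start()
helper.join()
lock_a.release()
print(attempts)
```

A thread whose timed acquire() fails still has to decide what to do next, typically releasing whatever it already holds and retrying later.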

Third, recursive lock (RLock)

To let the same thread request the same resource several times, Python provides a "recursive lock": threading.RLock.

Internally, RLock maintains a Lock and a counter. The counter records how many times acquire() has been called, so the same thread can acquire the resource repeatedly. Other threads can obtain the resource only after the owning thread has called release() once for every acquire().
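The difference from a plain Lock can be sketched as follows. The functions outer and inner are hypothetical; the timeout on the plain Lock is there only so the demonstration returns instead of hanging.

```python
import threading

rlock = threading.RLock()

def inner():
    with rlock:          # second acquire by the same thread: counter goes to 2
        return "ok"

def outer():
    with rlock:          # first acquire: counter goes to 1
        return inner()   # re-entering does not block

result = outer()
print(result)

# A plain Lock is not re-entrant: the second acquire by the same thread
# cannot succeed while the first is still held.
plain = threading.Lock()
plain.acquire()
second = plain.acquire(timeout=0.1)
plain.release()
print(second)
```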

The following code uses a recursive lock to solve the deadlock above:

# A recursive lock can be acquired several times within the same thread
# How it is released: internally it keeps a counter, so a thread must call release() as many times as it called acquire()

from threading import Thread, RLock
import time

mutex1 = RLock()  # recursive lock
mutex2 = mutex1   # both names refer to the same RLock

class MyThreada(Thread):
    def run(self):
        self.task1()
        self.task2()
    def task1(self):
        mutex1.acquire()
        print(f'{self.name} acquired lock 1')
        mutex2.acquire()
        print(f'{self.name} acquired lock 2')
        mutex2.release()
        print(f'{self.name} released lock 2')
        mutex1.release()
        print(f'{self.name} released lock 1')

    def task2(self):
        mutex2.acquire()
        print(f'{self.name} acquired lock 2')
        time.sleep(1)
        mutex1.acquire()
        print(f'{self.name} acquired lock 1')
        mutex1.release()
        print(f'{self.name} released lock 1')
        mutex2.release()
        print(f'{self.name} released lock 2')


for i in range(3):
    t = MyThreada()
    t.start()

------------------------------------------------------------------------------
Thread-1 acquired lock 1
Thread-1 acquired lock 2
Thread-1 released lock 2
Thread-1 released lock 1
Thread-1 acquired lock 2
Thread-1 acquired lock 1
Thread-1 released lock 1
Thread-1 released lock 2
Thread-2 acquired lock 1
Thread-2 acquired lock 2
Thread-2 released lock 2
Thread-2 released lock 1
Thread-2 acquired lock 2
Thread-2 acquired lock 1
Thread-2 released lock 1
Thread-2 released lock 2
Thread-3 acquired lock 1
Thread-3 acquired lock 2
Thread-3 released lock 2
Thread-3 released lock 1
Thread-3 acquired lock 2
Thread-3 acquired lock 1
Thread-3 released lock 1
Thread-3 released lock 2

Above, a recursive lock resolved the deadlock caused by using several synchronization locks. Note that mutex1 and mutex2 now refer to the same RLock, so the two locks have in effect become one. You can think of an RLock as a big lock containing small locks: other threads can enter the shared resource only after all of the inner small locks have been released.

Question: if code under a lock effectively runs single-threaded, what is the point of using multiple threads?

Whenever shared resources are accessed, a lock is needed.
But a program is not accessing shared resources all the time; the rest of its logic can still run in multiple threads.
So when adding locks, lock only where it is needed so that the performance impact is minimal. Doing this well depends on understanding the program's logic.
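One way to keep the locked region small is to do the thread-local work outside the lock and protect only the update of the shared variable. The sketch below assumes a hypothetical task that sums squares; total, total_lock, and chunks are illustrative names.

```python
from threading import Thread, Lock

total = 0
total_lock = Lock()

def task(numbers):
    global total
    # Thread-local work: no shared data is touched, so no lock is needed.
    partial = sum(n * n for n in numbers)
    # Only the update of the shared variable is protected.
    with total_lock:
        total += partial

chunks = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
threads = [Thread(target=task, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)
```

Each thread holds the lock for a single addition, so the threads spend almost no time waiting for one another.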

Fourth, the semaphore (Semaphore)

A semaphore limits how many threads can access the same resource at the same time.

Principle:

The limit is specified when the semaphore is instantiated.
It has a built-in counter: acquire() decrements it by 1, release() increments it by 1, and acquire() blocks while the counter is 0.
acquire(blocking=True, timeout=None) acquires the semaphore
release() releases the semaphore
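The counter behavior described above can be checked with a small sketch that records how many threads are inside the protected region at once. LIMIT, active, peak, and stats_lock are hypothetical names for this illustration; the extra Lock only protects the bookkeeping.

```python
import time
from threading import Thread, Semaphore, Lock

LIMIT = 2
sm = Semaphore(LIMIT)     # counter starts at LIMIT
stats_lock = Lock()       # protects the two bookkeeping variables below
active = 0
peak = 0

def task():
    global active, peak
    with sm:              # acquire(): counter - 1, blocks while it is 0
        with stats_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stands in for the real work
        with stats_lock:
            active -= 1
    # leaving the block calls release(): counter + 1

threads = [Thread(target=task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"peak concurrency: {peak} (limit {LIMIT})")
```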

from threading import Thread, current_thread, Semaphore
import time

def task():
    sm.acquire()
    print(f'{current_thread().name} is running')
    time.sleep(3)
    sm.release()

sm = Semaphore(5)  # at most 5 threads may run at the same time
for i in range(15):
    t = Thread(target=task)
    t.start()

------------------------------------------------------------------------------
Thread-1 is running
Thread-2 is running
Thread-3 is running
Thread-4 is running
Thread-5 is running

Thread-7 is running
Thread-6 is running
Thread-8 is running
Thread-9 is running
Thread-10 is running

Thread-12 is running
Thread-15 is running
Thread-14 is running
Thread-13 is running
Thread-11 is running

Fifth, GIL (Global Interpreter Lock)

The CPython interpreter has a lock called the GIL (Global Interpreter Lock), which is essentially a mutex. Because of it, only one thread within a process can run at any given moment, so threads cannot take advantage of multiple cores. Multiple threads in the same process can achieve concurrency but not parallelism.

Why can't multiple threads run truly in parallel?

GIL: no matter how many threads you start or how many CPUs you have, Python allows only one thread to execute at a time. The threads compete for the GIL, and only the thread that holds it runs.
When a thread hits an I/O wait or its time slice runs out, it gives up the GIL and another thread gets to run.
Switching costs time and resources, so compute-intensive work (such as arithmetic) is not suited to multithreading, because the switches are pure overhead; IO-intensive work is better suited to multithreading.

Why does the GIL exist?

Because CPython's own garbage collection is not thread-safe: as soon as an object's reference count drops to 0, the object is reclaimed. The GIL keeps threads from interfering with this bookkeeping, so that an object is not reclaimed while another thread is still about to use it.
The cost is that only one thread in a process can run at a time, so multiple cores cannot be exploited.

Once a thread has the GIL, it is forced to release it when it blocks on I/O or has been running for too long, so that other threads can compete for the GIL.
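In CPython 3, "running for too long" is governed by a switch interval, which can be read with sys.getswitchinterval(). The sketch below only reads the value; the variable name interval is illustrative, and the number printed depends on the interpreter's configuration.

```python
import sys

# How long a thread may keep running before the interpreter asks it
# to give other threads a chance to take the GIL.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} seconds")
```

The value can be changed with sys.setswitchinterval(), though the default is rarely worth tuning.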

Analysis: suppose we have four tasks to process and want them handled concurrently. There are two options:

Option one: start four processes
Option two: start four threads within one process

Compute-intensive: multiple processes recommended
Each of the four tasks needs 10 s of computation.
Multithreading
Only one thread executes at any moment, so none of the 10 s can be saved; each task's 10 s is computed separately, about 40 s in total.
Multiple processes
The processes can execute in parallel, so the total is about 10 s plus the time to start the processes.

IO-intensive: multithreading recommended
Each of the four tasks spends 90% or more of its time on I/O.
Multithreading
The threads run concurrently, and a thread waiting on I/O does not occupy the CPU, so the total is about 10 s plus the computation time of the four tasks.
Multiple processes
The processes run in parallel, so the total is about 10 s plus the computation time of one task plus the time to start the processes.

Look at the following concrete examples:

IO-intensive

'''Timing with multiple processes'''
from threading import Thread
from multiprocessing import Process
import time

def work1():
    x = 1+1
    time.sleep(5)

if __name__ == '__main__':
    t_list = []
    start = time.time()
    for i in range(4):
        t = Process(target=work1)
        t_list.append(t)
        t.start()
    for t in t_list:
        t.join()
    end = time.time()
    print('Multi-process', end-start)

Multi-process 5.499674558639526

'''Timing with multiple threads'''
from threading import Thread
from multiprocessing import Process
import time

def work1():
    x = 1+1
    time.sleep(5)

if __name__ == '__main__':
    t_list = []
    start = time.time()
    for i in range(4):
        t = Thread(target=work1)
        # t = Process(target=work1)
        t_list.append(t)
        t.start()
    for t in t_list:
        t.join()
    end = time.time()
    print('Multithreading', end-start)

Multithreading 5.004202604293823

Summary: multithreading took less time, by about 0.5 seconds.

Why is multithreading faster?

Both versions spend almost all of their time waiting in time.sleep(5), and the four waits overlap in either case. The difference comes from the cost of creating the workers: starting a process is more expensive than starting a thread.
Threads in the same process simply switch to one another while they wait, so the multithreaded version finishes slightly sooner.
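The overlap of I/O waits can also be sketched with a thread pool from concurrent.futures, used here in place of the hand-written Thread loop. The function fake_io is hypothetical, and time.sleep stands in for a real network or disk wait; the durations are shortened so the sketch runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    time.sleep(seconds)    # the GIL is released while the thread sleeps
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

# Four waits of 0.2 s overlap, so the total is close to 0.2 s, not 0.8 s.
print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
```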

Compute-intensive

'''Timing with multiple processes'''
from threading import Thread
from multiprocessing import Process
import time

def work1():
    res=0
    for i in range(100000000):  # 1 followed by 8 zeros
        res*=i

if __name__ == '__main__':
    t_list = []
    start = time.time()
    for i in range(4):
        t = Process(target=work1)
        t_list.append(t)
        t.start()
    for t in t_list:
        t.join()
    end = time.time()
    print('Multi-process', end-start)

Multi-process 18.062480211257935

'''Timing with multiple threads'''
from threading import Thread
from multiprocessing import Process
import time

def work1():
    res=0
    for i in range(100000000): 
        res*=i

if __name__ == '__main__':
    t_list = []
    start = time.time()
    for i in range(4):
        t = Thread(target=work1)
        # t = Process(target=work1)
        t_list.append(t)
        t.start()
    for t in t_list:
        t.join()
    end = time.time()
    print('Multithreading', end-start)

Multithreading 33.27059483528137

The difference this time is about 15 seconds. Why is multiprocessing faster?

The processes run in parallel, each doing its own computation at the same time, so the total is roughly the time of one task.
With multithreading, the GIL lets only one thread compute at a time. The threads keep switching, but the switching gains nothing and only adds overhead, so the total is roughly the time of all the tasks added together.

To sum up:

IO-intensive
- Each thread spends much of its time waiting, so multithreading is the better fit
- Multi-process + coroutines can also be used
Compute-intensive
- Threads never have to wait while computing, so switching between them is wasted effort; Python threads are not suited to this kind of work
- Multiple processes are recommended