Day 34: the GIL lock, synchronous and asynchronous calls, and Event

The GIL lock

Official explanation:
'''
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple 
native threads from executing Python bytecodes at once. This lock is necessary mainly 
because CPython’s memory management is not thread-safe. (However, since the GIL 
exists, other features have grown to depend on the guarantees that it enforces.)
'''

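Paraphrase:
In CPython, the global interpreter lock (GIL) is a mutex that prevents multiple threads from executing Python bytecode at the same time. The lock is necessary because CPython's memory management is not thread-safe. Many other features have come to depend on the GIL, so it cannot simply be removed even though it hurts efficiency.

Summary:
In CPython, the GIL turns what would be parallel thread execution into serial execution, which reduces efficiency.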

Questions raised by the GIL

  1. What is the GIL?

    The GIL, short for Global Interpreter Lock, is a mutex added at the interpreter level.

  2. Why do we need this lock?

    The main concern is thread safety. A Python program is essentially a pile of text, so running it requires starting an interpreter, and a process has only one. All code has to be handed to that interpreter to be interpreted and executed, so when several threads want to execute code at the same time, thread-safety problems arise. Does the problem go away if we never start a thread ourselves? Of course not, as the next item explains.

  3. The CPython interpreter and GC

    When programming in Python, programmers do not have to manage memory themselves, because Python has a built-in memory management mechanism, referred to as GC. So how are the GC and the GIL related?

    To answer that, first look at how the GC works. Python manages memory with reference counting: each piece of data carries an integer counter recording how many times it is referenced. When the counter drops to 0, the data is no longer in use and becomes garbage.

    When the number of allocated objects passes a certain threshold, the GC pauses normal execution and performs a garbage cleanup. The cleanup is itself a piece of code, and it has to run in some thread.

    Put simply, Python cleans up garbage for us automatically, and that cleanup is code running inside the interpreter. So even if a program never starts a thread of its own, memory management is going on alongside our code, and once several threads run, they and the GC all modify the same reference counts, which is a thread-safety problem.

    The GIL is a mutex, and a mutex reduces efficiency. Concretely, in CPython, even with multiple threads and a multi-core CPU, tasks cannot run in parallel: there is only one interpreter, and only one task executes at any moment.
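
    The reference counter described above can be observed from Python code. The following is a minimal sketch, assuming CPython; the names data and alias are only illustrative. It shows the counter going up by one when a second name is bound to the same object.

```python
import sys

data = []                       # a fresh object, referenced only by the name `data`
before = sys.getrefcount(data)  # the call itself holds one temporary reference
alias = data                    # bind a second name to the same object
after = sys.getrefcount(data)

# `after` is exactly one higher than `before`
print("before:", before, "after:", after)

del alias                       # dropping the extra name lowers the counter again
restored = sys.getrefcount(data)
print("after del:", restored)
```

    Every thread that binds or unbinds a name changes such a counter, which is why unsynchronized access from several threads would be unsafe.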

  4. How do we solve this problem?

    Because the problem lies in the underlying implementation, there is no real fix. All we can do is keep the GIL from hurting efficiency as much as possible.

    1. Use multiple processes to achieve parallelism and make better use of multi-core CPUs

    2. Distinguish between kinds of tasks

      Tasks fall into two categories

      1. Compute-intensive: almost no IO, with most of the time spent computing, e.g. face recognition or image processing. Because threads cannot run in parallel, use multiple processes to spread the work across CPU cores

      2. IO-intensive: very little computation, with most of the time spent waiting for IO operations

        Because network IO is very slow compared with CPU processing, using multithreading costs little

        Also, for a service with a large number of client connections, that many processes simply cannot be started, so multithreading is the only option

  5. How does the GIL differ from a custom lock?

    The GIL locks interpreter-level data

    A custom lock protects shared resources outside the interpreter, e.g. files on disk or the console

    Resources that do not belong to the interpreter have to be locked by our own code

    Having studied the GIL, you should know which approach to use for which kind of task in order to improve efficiency
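
    To illustrate a custom lock, here is a minimal sketch, assuming a shared counter as the program-level resource (the names counter, counter_lock and add_many are hypothetical). The GIL protects the interpreter's own data, but a read-modify-write such as counter += 1 still has to be guarded by our own lock.

```python
from threading import Lock, Thread

counter = 0            # shared program-level state (illustrative)
counter_lock = Lock()  # our own lock; the GIL does not make `counter += 1` atomic

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread at a time updates the counter
            counter += 1

threads = [Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every increment was applied
```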

  6. When is the GIL acquired and released?

    • Acquire: as soon as a thread starts executing code in the interpreter
    • Release: ① when the current thread hits an IO operation ② when the current thread's execution time exceeds a set value
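
    In CPython 3 the "set value" in ② is exposed as the switch interval. A minimal sketch, assuming CPython 3, that only reads the current value:

```python
import sys

# How long a thread may keep running before the interpreter asks it to
# give up the GIL so that another thread can be scheduled (in seconds).
interval = sys.getswitchinterval()
print("switch interval in seconds:", interval)
```

    sys.setswitchinterval() changes the value, but the default is rarely worth tuning.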

  7. A word about performance

    The lock exists to solve thread-safety problems

    Because of the lock, multithreading in CPython can only be concurrent, not parallel

    But that is no reason to write Python off

    1. Python is a language, and the GIL is a problem of the CPython interpreter. Other interpreters exist, such as Jython and PyPy

    2. On a single-core CPU the GIL has no impact

    3. Most programs are network-based, and the network is very slow compared with the CPU, so even a multi-core CPU would not improve their efficiency much

    4. For IO-intensive tasks the GIL has little impact

    5. Without the lock, we programmers would have to solve the safety problems ourselves

      The following code tests performance

      from multiprocessing import Process
      from threading import Thread
      import time
      # # Compute-intensive task
      #
      # def task():
      #     for i in range(100000000):
      #         1+1
      #
      #
      # if __name__ == '__main__':
      #     start_time = time.time()
      #
      #     ps = []
      #     for i in range(5):
      #         p = Process(target=task)
      #         # p = Thread(target=task)
      #         p.start()
      #         ps.append(p)
      #
      #     for p in ps:
      #         p.join()
      #
      #     print("Total time:", time.time() - start_time)

      # Multiprocessing wins


      # IO-intensive task

      def task():
          for i in range(100):
              # the file name is the one used in the original; any readable text file will do
              with open(r"1.死锁现象.py", encoding="utf-8") as f:
                  f.read()

      if __name__ == '__main__':
          start_time = time.time()

          ps = []
          for i in range(10):
              p = Process(target=task)
              # p = Thread(target=task)
              p.start()
              ps.append(p)

          for p in ps:
              p.join()
          print("Total time:", time.time() - start_time)

      # Multithreading wins

Semaphore

  • A semaphore limits the number of threads that can run a shared piece of code concurrently

  • If the limit is 1, it is no different from an ordinary mutex

  • Note: a semaphore is not meant to solve safety problems; it only caps the amount of concurrency

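    from threading import Semaphore, current_thread, Thread
    import time

    s = Semaphore(5)  # at most 5 threads inside the guarded block at once

    def task():
        with s:  # acquire on entry, release on exit, even if an exception is raised
            time.sleep(1)
            print(current_thread().name)

    for i in range(10):
        Thread(target=task).start()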
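
    To check that the cap really holds, the following sketch records the peak number of threads inside the guarded block. The names LIMIT, active and max_active are hypothetical, and a separate Lock protects the bookkeeping, since the semaphore itself does not make it safe.

```python
from threading import Lock, Semaphore, Thread
import time

LIMIT = 3
sem = Semaphore(LIMIT)   # caps concurrency at LIMIT
state_lock = Lock()      # protects the two counters below
active = 0               # threads currently inside the guarded block
max_active = 0           # highest value `active` ever reached

def task():
    global active, max_active
    with sem:
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)  # simulated work
        with state_lock:
            active -= 1

threads = [Thread(target=task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", max_active)  # never more than LIMIT
```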

Thread pool and process pool

  • What is a process/thread pool?

    A pool is a container, essentially a list that stores processes or threads

  • Should the pool hold threads or processes?

    Use a thread pool for IO-intensive tasks and a process pool for compute-intensive tasks

  • Why do we need a process/thread pool?

    In many cases the number of processes or threads has to be kept within a reasonable range. Take a TCP server where each client gets its own thread: a thread is cheap, but you still cannot start an unlimited number of them, or system resources will be exhausted sooner or later. The solution is to control the number of threads.

    A thread/process pool not only controls the number of threads/processes for us, it also handles creating them, destroying them, and assigning tasks to them

    import os
    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    from threading import active_count, enumerate, current_thread

    # # Create a thread pool that can hold at most 20 threads
    # pool = ThreadPoolExecutor(20)
    #
    # def task():
    #     print(current_thread().name)
    #
    # # Submit tasks to the pool
    # pool.submit(task)
    # pool.submit(task)
    #
    # print(enumerate())

    # Using a process pool

    def task():
        time.sleep(1)
        print(os.getpid())


    if __name__ == '__main__':
        pool = ProcessPoolExecutor(2)
        pool.submit(task)
        pool.submit(task)
        pool.submit(task)
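
    A pool can also be used as a context manager, which shuts it down automatically when the block ends. A minimal sketch with a thread pool; the function square is only an illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

# Leaving the `with` block calls pool.shutdown(wait=True) for us.
with ThreadPoolExecutor(max_workers=2) as pool:
    # map() submits one task per item and yields results in input order
    results = list(pool.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```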

Synchronous and asynchronous

A program can be in one of two states: blocking and non-blocking

Blocking: when the program hits an IO operation during execution, it cannot go on to run other code while the IO is in progress. This is called blocking.

Non-blocking: the program runs normally without hitting any IO, or it uses some means so that when it does hit IO it does not stop and wait but goes on to do other work, which improves CPU utilization

Synchronous and asynchronous describe how a task is submitted

Synchronous call: after starting a task, the caller must wait in place until the task finishes before it can continue

Asynchronous call: after starting a task, the caller does not wait for it to finish and can move on to other operations immediately

A synchronous call also has a waiting effect, but it is completely different from blocking: a blocked program loses its right to the CPU, while a program in a synchronous call does not.
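
The difference between the two kinds of call can be measured. The following is a rough sketch, assuming a hypothetical slow_double function that sleeps to simulate work: the direct call keeps the caller waiting for the whole task, while submit() hands the task to a pool and returns almost at once.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_double(n):
    time.sleep(0.2)  # simulated work
    return n * 2

# Synchronous call: the caller waits in place until the task is done.
start = time.perf_counter()
sync_result = slow_double(1)
sync_wait = time.perf_counter() - start

# Asynchronous call: submit() returns a Future without waiting for the task.
with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    future = pool.submit(slow_double, 1)
    submit_wait = time.perf_counter() - start
    async_result = future.result()  # collect the result when it is needed

print("direct call waited %.3fs, submit returned after %.3fs" % (sync_wait, submit_wait))
```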

Asynchronous calls are clearly more efficient, but how do we get the result of the task?

Asynchronous call, way 1 of getting the results:

from concurrent.futures import ThreadPoolExecutor
from threading import current_thread
import time

pool = ThreadPoolExecutor(3)
def task(i):
    time.sleep(0.01)
    print(current_thread().name,"working..")
    return i ** i

if __name__ == '__main__':
    objs = []
    for i in range(3):
        res_obj = pool.submit(task,i) # submit asynchronously; returns an object that represents the task's result
        objs.append(res_obj)

    # shutdown blocks by default: it waits until every task in the pool has finished
    pool.shutdown(wait=True)

    # take the results out of the result objects
    for res_obj in objs:
        print(res_obj.result())
    print("over")

Asynchronous call, way 2 of getting the result. Calling result() right after each submit makes the tasks run one after another:

from concurrent.futures import ThreadPoolExecutor
from threading import current_thread
import time

pool = ThreadPoolExecutor(3)
def task(i):
    time.sleep(0.01)
    print(current_thread().name,"working..")
    return i ** i

if __name__ == '__main__':
    for i in range(3):
        res_obj = pool.submit(task,i) # returns an object that represents the task's result
        print(res_obj.result()) # result() is synchronous: once called, it waits until the task has finished and the result is available
    print("over")
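
A third option, not covered in the original text, is as_completed from the standard concurrent.futures module, which yields each Future as soon as it finishes. A minimal sketch, reusing the same task body without the sleep:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(i):
    return i ** i

with ThreadPoolExecutor(3) as pool:
    futures = [pool.submit(task, i) for i in range(3)]
    # as_completed yields futures in completion order, not submission order
    results = [f.result() for f in as_completed(futures)]

print(sorted(results))  # [1, 1, 4]
```

Completion order can vary from run to run, so the results are sorted before printing.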

Asynchronous callbacks

  1. What is an asynchronous callback?

    An asynchronous callback means specifying a function at the moment an asynchronous task is started; when the task completes, that function is called automatically

  2. Why do we need asynchronous callbacks?

    So far, after submitting a task to a thread pool or process pool, the only way to get its result was to call result() or shutdown(). Both block: they wait until the task has finished before execution continues, so no other work can be done during the wait, and efficiency drops. We need a solution that spares the thread from waiting and still makes sure the data is processed promptly. That solution is the asynchronous callback

  3. Using asynchronous callbacks

    Start with a case:

    A crawler usually works in two steps:

    1. Download a file from a web server

    2. Read and parse the file's contents, extracting the useful data

    Following these steps we can write a simple crawler

    Requesting web page data requires the third-party library requests, which can be installed with pip or through PyCharm: open Settings -> Interpreter, click +, search for requests, and install it

    import requests,re,os,random,time
    from concurrent.futures import ProcessPoolExecutor

    def get_data(url):
        print("%s is requesting %s" % (os.getpid(),url))
        time.sleep(random.randint(1,2))
        response = requests.get(url)
        print(os.getpid(),"request succeeded, data length",len(response.content))
        #parser(response) # 3. Call the parser directly: whichever process finishes the request also parses the data, which forcibly couples the two operations
        return response

    def parser(obj):
        data = obj.result()  # obj is the Future; result() returns the response
        htm = data.content.decode("utf-8")
        ls = re.findall("href=.*?com",htm)
        print(os.getpid(),"parsed",len(ls),"links")

    if __name__ == '__main__':
        pool = ProcessPoolExecutor(3)
        urls = ["https://www.baidu.com",
                "https://www.sina.com",
                "https://www.python.org",
                "https://www.tmall.com",
                "https://www.mysql.com",
                "https://www.apple.com.cn"]
        # objs = []
        for url in urls:
            # res = pool.submit(get_data,url).result() # 1. Getting the result synchronously prevents the request tasks from running concurrently
            # parser(res)

            obj = pool.submit(get_data,url)
            obj.add_done_callback(parser) # 4. With an asynchronous callback the data is processed promptly, and requesting and parsing are decoupled
            # objs.append(obj)

        # pool.shutdown() # 2. Wait for all tasks to finish, then parse everything together
        # for obj in objs:
        #     res = obj.result()
        #     parser(res)
        # Drawbacks of approach 2:
        # 1. The requests can run concurrently, but results cannot be parsed promptly; parsing has to wait until all requests complete
        # 2. The parsing tasks become serial

    Summary: to use an asynchronous callback, take the Future object returned when the task is submitted and call its add_done_callback method to specify a callback function.

    Compare the task to boiling water. Without a callback you can only stand guard over the kettle until the water boils. With a callback it is like switching to a whistling kettle: you can do other things while you wait, and the kettle sounds by itself when the water boils, at which point you come back and deal with it. The whistle is the callback.

    Note:

    1. With a process pool, the callback function is executed in the main process
    2. With a thread pool, which thread executes the callback is not fixed; it goes to whichever thread is free
    3. The callback function receives one argument by default, the task object itself; call result() on it to get the task's result
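
    The same pattern can be tried without network access. The following is a minimal sketch with a thread pool; fetch is a hypothetical stand-in for the download step and on_done plays the role of parser:

```python
from concurrent.futures import ThreadPoolExecutor

lengths = []  # filled in by the callback

def fetch(text):
    # hypothetical stand-in for downloading a page
    return text.upper()

def on_done(future):
    # the callback receives the Future itself; result() gives the task's return value
    lengths.append(len(future.result()))

with ThreadPoolExecutor(2) as pool:
    for text in ["a", "bb", "ccc"]:
        pool.submit(fetch, text).add_done_callback(on_done)

# leaving the `with` block waits for all tasks, so every callback has run by now
print(sorted(lengths))  # [1, 2, 3]
```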

Event

What is an event

An event is a notification signal indicating that something has happened at some moment; it is used to coordinate work between threads.

Threads run independently and their state is unpredictable, so the data in one thread is not synchronized with the data in another. When a thread needs to decide its next step based on the state of another thread, the two have to be kept in sync, and Event makes that synchronization possible

Introduction to Event

An Event object contains a signal flag that a thread can set, which lets other threads wait for something to happen. Initially the flag is false. If a thread waits on an Event object whose flag is false, the thread blocks until the flag becomes true. When a thread sets the flag of an Event object to true, all threads waiting on that Event are woken up. If a thread waits on an Event whose flag is already true, it does not block and simply continues

Available methods:

event.is_set(): returns the state of the event (isSet() is a deprecated alias);
event.wait(): blocks the thread until the state of the event is True;
event.set(): sets the state of the event to True; all blocked threads are activated and become ready, waiting to be scheduled by the operating system;
event.clear(): resets the state of the event to False.
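
The four methods can be tried in a single thread. A minimal sketch; the variable names are only illustrative:

```python
from threading import Event

e = Event()
initial = e.is_set()       # the flag starts out False
timed_out = e.wait(0.01)   # with a timeout, wait() returns False if the flag is still unset
e.set()                    # flag becomes True and any waiting threads are woken up
after_set = e.wait()       # returns True at once because the flag is already set
e.clear()                  # flag goes back to False
after_clear = e.is_set()

print(initial, timed_out, after_set, after_clear)  # False False True False
```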

Use case:

# MySQL must already be running before a client connects to it, and startup takes some time, so the client cannot connect right away; it has to wait until MySQL has finished starting and then connect immediately
from threading import Thread
import time

boot = False
def start():
    global boot
    print("Starting the server.....")
    time.sleep(5)
    print("Server started!")
    boot = True

def connect():
    while True:
        if boot:
            print("Connected")
            break
        else:
            print("Connection failed")
        time.sleep(1)

Thread(target=start).start()
Thread(target=connect).start()
Thread(target=connect).start()

After rewriting with Event:

from threading import Event,Thread
import time

e = Event()
def start():
    print("Starting the server.....")
    time.sleep(3)
    print("Server started!")
    e.set()  # wake up every thread waiting on e

def connect():
    e.wait()  # block until the server thread calls e.set()
    print("Connected")

Thread(target=start).start()
Thread(target=connect).start()
Thread(target=connect).start()

A further requirement: each connection attempt waits 1 second, and at most 3 attempts are made

from threading import Event,Thread
import time

e = Event()
def start():
    print("Starting the server.....")
    time.sleep(5)
    print("Server started!")
    e.set()

def connect():
    for i in range(1,4):
        print("Connection attempt %s" % i)
        e.wait(1)  # wait at most 1 second for the server to come up
        if e.is_set():
            print("Connected")
            break
        else:
            print("Attempt %s failed" % i)
    else:
        # the loop finished without a break: all 3 attempts failed
        print("Server is not running!")

Thread(target=start).start()
Thread(target=connect).start()
# Thread(target=connect).start()

Origin www.cnblogs.com/bladecheng/p/11142114.html