<Reprint> Why is multiprocessing recommended over multithreading in Python?

I have been looking into Python multithreading recently, and I often hear veterans say: "Multithreading in Python is of little use; use multiprocessing instead!" But why do they say that?
        
It is not enough to know that something is so; we should also know why. Hence the following closer look.

First, some background:

1. What is the GIL? GIL stands for Global Interpreter Lock. It comes from a decision made early in Python's design for the sake of data safety.
2. Each CPU core can execute only one thread at a time. (Multithreading on a single-core CPU is really just concurrency, not parallelism. Seen from a distance, both concurrency and parallelism mean handling multiple requests "at once", but they are different: parallelism means two or more events happen at the same instant, while concurrency means two or more events happen within the same time interval.)

Under Python multithreading, each thread executes as follows:

1. Acquire the GIL.
2. Execute code until it sleeps or the Python virtual machine suspends it.
3. Release the GIL.

So a thread must hold the GIL before it can run. We can think of the GIL as a "pass", and a Python process has only one of them. A thread that does not hold the pass is not allowed onto the CPU.

In Python 2.x, the GIL is released when the current thread hits an IO operation or when the ticks count reaches 100. (Ticks can be regarded as a counter internal to Python, used specifically for the GIL and reset to zero after each release. The threshold can be adjusted with sys.setcheckinterval.)
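The acquire, execute, release cycle described above can be imitated with an ordinary lock. The sketch below is only a toy model, not how CPython implements the GIL: `run_demo`, the `gil` lock, and the `state` counters are hypothetical names made up for illustration. It shows that when every worker must take the same "pass" before running a slice of work, at most one worker is ever running at a time.

```python
import threading

def run_demo(n_threads=4, slices=50):
    """Toy model: one shared lock plays the role of the GIL 'pass'."""
    gil = threading.Lock()  # hypothetical stand-in, NOT the real GIL
    state = {"holders": 0, "max_holders": 0, "slices_run": 0}

    def worker():
        for _ in range(slices):
            with gil:                          # 1. get the pass
                state["holders"] += 1
                state["max_holders"] = max(state["max_holders"],
                                           state["holders"])
                state["slices_run"] += 1       # 2. run one slice of work
                state["holders"] -= 1
            # 3. the pass is released here; any waiting thread may take it

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_holders"], state["slices_run"]

if __name__ == "__main__":
    max_holders, slices_run = run_demo()
    print("most workers holding the pass at once:", max_holders)
    print("total slices executed:", slices_run)
```

All the work still gets done, but it is serialized: the number of workers holding the pass never exceeds one, no matter how many threads are started.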
        










        
        
        

        
Every time the GIL is released, the threads compete for it and a thread switch may follow, which costs resources. And because of the GIL, a Python process can execute only one thread at any moment (the one that holds the GIL). This is why Python multithreading is inefficient on multi-core CPUs.

So is Python multithreading completely useless? Let us look at two cases:

1. CPU-intensive code (loops, counting, and so on). Because there is a lot of computation, the ticks count quickly reaches the threshold, which triggers a release of the GIL and a new round of competition for it. Switching back and forth between threads costs resources, so Python multithreading is not friendly to CPU-intensive code.
2. IO-intensive code (file processing, web crawlers, and so on). Here multithreading can effectively improve efficiency. With a single thread, every IO operation means waiting for IO, which wastes time. With multiple threads, the interpreter can switch to thread B while thread A is waiting, so CPU time is not wasted and the program runs more efficiently. Python multithreading is therefore friendlier to IO-intensive code.

In Python 3.x, the GIL no longer uses the ticks count. It uses a timer instead: once the running time reaches a threshold, the current thread releases the GIL. This is friendlier to CPU-intensive programs, but it still does not solve the underlying problem that the GIL allows only one thread to execute at a time, so efficiency remains unsatisfactory.

Please note: multithreading on multiple cores performs worse than multithreading on a single core. On a single core, whenever the GIL is released, the thread that is woken up can acquire it and run seamlessly. On multiple cores, after the thread on CPU0 releases the GIL, threads on other CPUs compete for it, but the GIL may be reacquired immediately by the thread on CPU0. The threads woken on the other CPUs then wait until the switch time and go back to the pending state. This causes thread thrashing and lowers efficiency further.
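The IO-intensive case can be tried out with a small experiment. The sketch below is a rough illustration under stated assumptions: `fake_io`, `timed`, and `compare_io` are hypothetical helpers, and `time.sleep` stands in for a real blocking IO call (a sleeping thread does not hold the GIL, which is what lets the other threads proceed). `sys.getswitchinterval()` reports the Python 3 timer threshold mentioned above.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Assumed stand-in for blocking IO; the GIL is released while sleeping."""
    time.sleep(delay)
    return delay

def timed(fn):
    """Return how many seconds fn() takes to run."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def compare_io(n_tasks=4, delay=0.2):
    """Run the same waits one after another, then on n_tasks threads."""
    sequential = timed(lambda: [fake_io(delay) for _ in range(n_tasks)])
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        threaded = timed(
            lambda: list(pool.map(fake_io, [delay] * n_tasks)))
    return sequential, threaded

if __name__ == "__main__":
    print("GIL switch interval in seconds:", sys.getswitchinterval())
    seq, thr = compare_io()
    print(f"sequential: {seq:.2f}s  threaded: {thr:.2f}s")
```

Because the waits overlap, the threaded run should take roughly as long as one wait rather than the sum of all of them. Replacing `fake_io` with a pure computation would be one way to observe the CPU-intensive case, where threads are not expected to help.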
                                 
        

        

        
        

        
                                

        

                                 
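Back to the original question: we often hear veterans say, "If you want to make full use of a multi-core CPU in Python, use multiprocessing." Why is that?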

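The reason is that each process has its own independent GIL, and the processes do not interfere with one another, so they can run in parallel in the true sense. That is why, in Python, multiprocessing is more efficient than multithreading (on multi-core CPUs only).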
                                
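So the conclusion is: on a multi-core machine, the more general way to run work in parallel and improve execution efficiency is to use multiple processes.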
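As a minimal sketch of that conclusion, the example below splits a CPU-intensive job across worker processes using the standard library's `concurrent.futures.ProcessPoolExecutor`. The prime-counting workload and the helper names `count_primes`, `split_range`, and `parallel_count` are assumptions chosen for illustration; they are not part of the original discussion.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """CPU-bound work: count the primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to the square root of n.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks."""
    step = max(1, -(-limit // parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]

def parallel_count(limit, workers):
    """Count primes below `limit`, one chunk per worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))

if __name__ == "__main__":
    workers = os.cpu_count() or 2
    print("primes below 200000:", parallel_count(200_000, workers))
```

Each worker is a separate Python process with its own interpreter and its own GIL, so the chunks can run on different cores at the same time. The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.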

 

Reprinted from: http://bbs.51cto.com/thread-1349105-1.html