Python concurrent programming (1): Understanding concurrent programming from a performance perspective

1. Basic concepts

Before getting into the theory, let's go through a few basic concepts. Although this is an advanced course, I still hope to write it in a simple, easy-to-understand way.

  • Serial: a person can do only one thing at a time, such as watching TV after eating.
  • Parallel: a person can do several things at the same time, such as watching TV while eating.

In Python, multithreading and coroutines are, strictly speaking, still serial, but they run far more efficiently than an ordinary serial program. When an ordinary serial program blocks, it can only wait; it cannot do anything else in the meantime. It is like watching TV: when the main show ends and the commercials begin, we cannot use the commercial break to have a meal. For a program, this is obviously inefficient and unreasonable.

Of course, after finishing this course, we will know how to use the commercial break to do other things and arrange our time flexibly. This is exactly what multithreading and coroutines do for us: they schedule tasks sensibly behind the scenes to maximize the program's efficiency.
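
As a minimal sketch of this idea, the example below schedules two coroutines with the standard `asyncio` module. The names `watch_tv`, `eat`, and `evening` and the short sleep durations are assumptions chosen to mirror the TV analogy; `asyncio.sleep` stands in for any wait during which the program would otherwise be blocked.

```python
import asyncio
import time


async def watch_tv(log):
    log.append("TV: main show")
    await asyncio.sleep(0.3)  # commercial break: control goes back to the event loop
    log.append("TV: show resumes")


async def eat(log):
    log.append("eat: start meal")
    await asyncio.sleep(0.2)  # the meal fits inside the commercial break
    log.append("eat: finish meal")


async def evening():
    log = []
    start = time.perf_counter()
    # Run both coroutines on one thread; each yields while it is waiting
    await asyncio.gather(watch_tv(log), eat(log))
    return log, time.perf_counter() - start


log, elapsed = asyncio.run(evening())
print(log)
print("elapsed: {:.2f} s".format(elapsed))
```

Run one after the other, the two waits would add up to about 0.5 s; interleaved, the total should be close to the longer wait alone.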

Multithreading and coroutines are already quite smart, but they are still not efficient enough. The most efficient way is to do several things truly at once: watching TV, eating, and chatting at the same time. This is what multiprocessing can do.

To make this more intuitive, I found two pictures on the Internet that illustrate the difference between multithreading and multiprocessing. (They will be removed on request if they infringe any rights.)

  • Multithreading: alternating execution, which is serial in another sense.
  • Multiprocessing: parallel execution, which is concurrency in the true sense.

2. Single thread vs. multithreading vs. multiprocessing

Words alone are weak; a thousand of them are not as convincing as a few lines of code.

First of all, my experimental environment is configured as follows

Note that understanding the code below requires some background knowledge from beginners:

  1. Use of decorators
  2. Basic use of multithreading
  3. Basic use of multiple processes

Of course, it doesn't matter if you don't understand all of it. What matters is the final conclusion, which gives a clear picture of how single-threaded, multithreaded, and multiprocess execution compare. If it achieves that, this article has done its job. Once you have finished the whole series, you may want to come back to this code; you will probably understand it more deeply.

Let's see which is stronger in practice: single-threaded, multithreaded, or multiprocess execution.


Before starting the comparison, let's define four types of scenarios:

  • CPU-intensive (computation)
  • Disk IO-intensive
  • Network IO-intensive
  • [Simulated] IO-intensive

Why these scenarios? The choice has to do with the situations that multithreading and multiprocessing are each suited to, which I explain in the conclusion.

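import time

import requests  # third-party package: pip install requests


# CPU-intensive
def count(x=1, y=1):
    # 500,000 iterations x 3 operations = 1.5 million calculations
    c = 0
    while c < 500000:
        c += 1
        x += x  # doubles on every pass, so x and y become very large integers
        y += y


# Disk read/write IO-intensive
def io_disk():
    with open("file.txt", "w") as f:
        for x in range(5000000):
            f.write("python-learning\n")


# Network IO-intensive
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'}
url = "https://www.tieba.com/"

def io_request():
    try:
        webPage = requests.get(url, headers=header)
        html = webPage.text  # read the body; the content itself is not used
        return
    except Exception as e:
        return {"error": e}


# [Simulated] IO-intensive
def io_simulation():
    time.sleep(2)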
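
One detail worth noting about `count`: because `x += x` doubles `x` on every pass, the integer gains one bit per iteration, so much of the work is likely big-integer arithmetic rather than simple counting. The sketch below illustrates the growth with smaller, hypothetical iteration counts; `doubling_bits` is not part of the benchmark code.

```python
def doubling_bits(iterations, x=1):
    """Return the bit length of x after doubling it `iterations` times."""
    for _ in range(iterations):
        x += x  # same operation as in count()
    return x.bit_length()


for n in (10, 1000, 50000):
    # Starting from 1, n doublings give 2**n, which needs n + 1 bits
    print(n, doubling_bits(n))
```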

We compare the approaches by elapsed time: the less time a run takes, the more efficient it is.

To keep the code concise, I define a simple timer decorator here. If you don't know much about decorators, that's fine; you only need to know that it measures how long a function takes to run.

def timer(mode):
    def wrapper(func):
        def deco(*args, **kw):
            # Scenario label; it stays in kw, so func receives it as well
            label = kw.setdefault('type', None)
            t1 = time.time()
            func(*args, **kw)
            t2 = time.time()
            cost_time = t2 - t1
            print("{}-{} time taken: {} s".format(mode, label, cost_time))
        return deco
    return wrapper
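
For readers new to decorators, here is a minimal sketch of the same pattern in isolation. It is not the `timer` above: the names `timed` and `nap` are hypothetical, and the use of `functools.wraps` and `time.perf_counter` is a choice made for this sketch.

```python
import functools
import time


def timed(label):
    """Decorator factory: report how long the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print("{} took {:.3f} s".format(label, elapsed))
            return result  # pass the wrapped function's result through
        return wrapper
    return decorator


@timed("nap")
def nap(seconds):
    time.sleep(seconds)
    return seconds


# The @timed("nap") line is shorthand for: nap = timed("nap")(nap)
value = nap(0.1)
```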

First, let's look at single-threaded execution.

@timer("[Single thread]")
def single_thread(func, type=""):
    for i in range(10):
        func()

# Single thread
single_thread(count, type="CPU-intensive")
single_thread(io_disk, type="Disk IO-intensive")
single_thread(io_request, type="Network IO-intensive")
single_thread(io_simulation, type="Simulated IO-intensive")

Here is the result:

[Single thread]-CPU-intensive time taken: 83.42633867263794 s
[Single thread]-Disk IO-intensive time taken: 15.641993284225464 s
[Single thread]-Network IO-intensive time taken: 1.1397218704223633 s
[Single thread]-Simulated IO-intensive time taken: 20.020972728729248 s

Second, let's look at multithreaded execution.

from threading import Thread

@timer("[Multi-thread]")
def multi_thread(func, type=""):
    thread_list = []
    for i in range(10):
        t = Thread(target=func, args=())
        thread_list.append(t)
        t.start()

    # Block until every thread has finished, so the timer covers all of them
    for th in thread_list:
        th.join()

# Multi-thread
multi_thread(count, type="CPU-intensive")
multi_thread(io_disk, type="Disk IO-intensive")
multi_thread(io_request, type="Network IO-intensive")
multi_thread(io_simulation, type="Simulated IO-intensive")

Here is the result:

[Multi-thread]-CPU-intensive time taken: 93.82986998558044 s
[Multi-thread]-Disk IO-intensive time taken: 13.270896911621094 s
[Multi-thread]-Network IO-intensive time taken: 0.1828296184539795 s
[Multi-thread]-Simulated IO-intensive time taken: 2.0288875102996826 s
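
As an alternative sketch, the standard library's `concurrent.futures.ThreadPoolExecutor` can start the threads and wait for them in one step. The helper name `run_in_threads`, the function `io_simulation_short`, and the shortened 0.2-second sleep are assumptions for illustration, not part of the benchmark above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_simulation_short():
    time.sleep(0.2)  # stand-in for an IO wait; shorter than the article's 2 s


def run_in_threads(func, n=10):
    """Run func n times in n threads and return the elapsed wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(func) for _ in range(n)]
        for future in futures:
            future.result()  # waits, and re-raises any exception from the thread
    return time.perf_counter() - start


elapsed = run_in_threads(io_simulation_short)
print("10 sleeps of 0.2 s in threads took {:.2f} s".format(elapsed))
```

Run one after another, the ten sleeps would take about 2 s; because the threads wait at the same time, the total should stay close to a single sleep.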

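Third, and finally, let's look at multiprocess execution.

from multiprocessing import Process

@timer("[Multi-process]")
def multi_process(func, type=""):
    process_list = []
    for x in range(10):
        p = Process(target=func, args=())
        process_list.append(p)
        p.start()

    # Block until every process has finished, so the timer covers all of them
    for pr in process_list:
        pr.join()

# Multi-process
# Where processes are started with spawn (e.g. Windows, macOS), run the
# benchmark calls under `if __name__ == "__main__":`.
multi_process(count, type="CPU-intensive")
multi_process(io_disk, type="Disk IO-intensive")
multi_process(io_request, type="Network IO-intensive")
multi_process(io_simulation, type="Simulated IO-intensive")

Here is the result:

[Multi-process]-CPU-intensive time taken: 9.082211017608643 s
[Multi-process]-Disk IO-intensive time taken: 1.287339448928833 s
[Multi-process]-Network IO-intensive time taken: 0.13074755668640137 s
[Multi-process]-Simulated IO-intensive time taken: 2.0076842308044434 s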
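
A comparable sketch for processes uses `concurrent.futures.ProcessPoolExecutor`. The names `cpu_task` and `run_tasks` and the workload size are hypothetical. The `if __name__ == "__main__":` guard is there because platforms that start workers by spawning a fresh interpreter import the main module again in each worker.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """A small CPU-bound job: the sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_tasks(executor_cls, n_tasks=4, n=200000):
    """Run cpu_task n_tasks times on the given executor class and collect results."""
    with executor_cls(max_workers=n_tasks) as pool:
        return list(pool.map(cpu_task, [n] * n_tasks))


if __name__ == "__main__":
    # Each task runs in a worker process, so tasks can use separate CPU cores.
    results = run_tasks(ProcessPoolExecutor)
    print(len(results), results[0])
    # run_tasks(ThreadPoolExecutor) would run the same jobs in threads instead.
```

Passing the executor class as a parameter keeps the task code identical for both approaches, which makes a thread-versus-process comparison straightforward.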

3. Summary of performance comparison results

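Put the results together in a table (times in seconds, rounded to two decimals).

Scenario                 Single thread   Multi-thread   Multi-process
CPU-intensive                    83.43          93.83            9.08
Disk IO-intensive                15.64          13.27            1.29
Network IO-intensive              1.14           0.18            0.13
Simulated IO-intensive           20.02           2.03            2.01

Let's analyze this table.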

First, CPU-intensive work: multithreading has no advantage over a single thread here. On the contrary, it is slower, because time is spent constantly acquiring and releasing the GIL (global interpreter lock) and switching between threads. With multiprocessing, several CPUs compute at the same time, which is like ten people doing one person's homework, so efficiency is multiplied (about 9 times faster in this test).

Next, IO-intensive work. It can be disk IO, network IO, database IO, and so on; these all belong to the same class: the amount of computation is small, and most of the time is spent waiting for IO. Looking at the disk IO and network IO numbers, multithreading does not show a dramatic advantage over a single thread in absolute terms: disk IO drops from 15.64 s to 13.27 s, and network IO from 1.14 s to 0.18 s. This is because the IO tasks in our program are not heavy enough, so the advantage is not obvious enough.

So I also added a "simulated IO-intensive" scenario that uses sleep to simulate the IO wait time. It shows the advantage of multithreading and gives a more intuitive picture of how multiple threads work. In the single-threaded run, each call has to finish its sleep(2) before the next one starts, so 10 calls take 20 s. In the multithreaded run, when one thread reaches sleep(2), execution switches to another thread, so all 10 threads sleep at the same time and together finish in only about 2 s.
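
The ratios behind this analysis can be worked out from the timings reported earlier, rounded to two decimals. This is only arithmetic on the article's own numbers; the dictionary layout and the name `speedup` are assumptions for the sketch.

```python
# Timings in seconds, rounded from the reported results:
# (single thread, multi-thread, multi-process)
timings = {
    "CPU-intensive": (83.43, 93.83, 9.08),
    "Disk IO-intensive": (15.64, 13.27, 1.29),
    "Network IO-intensive": (1.14, 0.18, 0.13),
    "Simulated IO-intensive": (20.02, 2.03, 2.01),
}


def speedup(baseline, other):
    """How many times faster `other` is than `baseline` (below 1 means slower)."""
    return baseline / other


for scenario, (single, threads, processes) in timings.items():
    print("{:<24} threads x{:.2f}  processes x{:.2f}".format(
        scenario, speedup(single, threads), speedup(single, processes)))
```

A ratio below 1 for threads in the CPU-intensive row means the multithreaded run was slower than the single-threaded one.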

The following conclusions can be drawn:

  • In these tests, multiprocessing was always the fastest, and a single thread was the slowest in every scenario except the CPU-intensive one, where multithreading was slower still.
  • Multithreading suits IO-intensive scenarios, such as crawlers and website development.
  • Multiprocessing suits scenarios with heavy CPU computation, such as big data analysis and machine learning.
  • Although multiprocessing was the fastest here, it is not necessarily the best choice, because it needs enough CPU resources to show its advantage.
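
To illustrate the last point, one common approach is to cap the number of worker processes at the number of CPU cores reported by `os.cpu_count()`. The helper `pick_worker_count` is hypothetical, and the capping rule is a rule of thumb assumed for this sketch, not something the benchmark above measured.

```python
import os


def pick_worker_count(n_tasks, cpu_count=None):
    """Return a worker count no larger than the task count or the core count."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined
        cpu_count = os.cpu_count() or 1
    return max(1, min(n_tasks, cpu_count))


print(pick_worker_count(10))     # depends on the machine
print(pick_worker_count(10, 4))  # 4
print(pick_worker_count(2, 8))   # 2
```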


Origin blog.csdn.net/weixin_36338224/article/details/109190331