1. Concurrent programming in Python from a performance perspective

  • Preface
  • The basic concept of concurrent programming
  • Single-threaded vs. multi-threaded vs. multi-process
  • Summary of the performance comparison results

Preface

Concurrent programming is one branch of this advanced series, and I think it is something every programmer should master.

I spent nearly a week preparing this concurrent programming series: sorting out the material and thinking about which examples would make it easier to understand thoroughly. I hope the result turns out as friendly to beginners as I imagined.

Yesterday I roughly settled on what this series will cover (later parts may be adjusted):

[Image: course outline]

Python offers roughly three ways to do concurrent programming:

  • Multithreading
  • Multiprocessing
  • Coroutines (generators)

The following chapters cover these three topics one after another.
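
As a small preview of the third approach, here is a minimal sketch of generator-based cooperative scheduling. The names `task` and `round_robin` are made up for illustration and do not come from any library; each `yield` marks the point where one task hands control to the next.

```python
from collections import deque


def task(name, steps):
    """A generator-based task: every `yield` gives control back to the scheduler."""
    for i in range(steps):
        yield f"{name}-{i}"


def round_robin(tasks):
    """Advance each generator one step at a time until all are exhausted."""
    log = []
    pending = deque(tasks)
    while pending:
        current = pending.popleft()
        try:
            log.append(next(current))   # run the task up to its next yield
            pending.append(current)     # not finished yet: queue it again
        except StopIteration:
            pass                        # task finished: drop it
    return log


print(round_robin([task("A", 2), task("B", 3)]))
# → ['A-0', 'B-0', 'A-1', 'B-1', 'B-2']
```

The two tasks take turns inside a single thread, which is the "alternating" behaviour that coroutines build on.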

The basic concept of concurrent programming

Before getting into the theory, let us look at a few basic concepts. Although this is an advanced course, I still want to write it in a way that is beginner-friendly.

Serial: a person can do only one thing at a time, for example watching TV only after finishing dinner;
Parallel: a person can do several things in the same period of time, for example eating while watching TV;

In Python, multithreading and coroutines are, strictly speaking, still serial, but they run much more efficiently than an ordinary serial program.
When an ordinary serial program blocks, it can only wait and cannot do anything else. It is like watching a TV drama: when the commercials come on, we do not use the commercial break to go and eat. For a program, this is clearly very inefficient and unreasonable.

Of course, after this course we will know how to use the commercial break to do other things and schedule our time flexibly. This is what multithreading and coroutines do for us: they manage tasks sensibly inside the program so that it runs as efficiently as possible.

Multithreading and coroutines are quite smart, but they are still not efficient enough. The most efficient approach is true multitasking: eating, watching TV and chatting all at the same time. This is what multiprocessing can do.

To make this more intuitive, I found two pictures online that vividly illustrate the difference between multithreading and multiprocessing. (They will be removed on request if they infringe.)

  • Multithreading: tasks execute alternately, which is serial in another sense.

  • Multiprocessing: tasks run in parallel, which is concurrency in the true sense.
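
One way to see this difference is to ask each worker which operating-system process it runs in. The sketch below is only an illustration: `record_pid` and `run_workers` are hypothetical helper names. Threads all report the ID of the main process, while each `Process` reports its own.

```python
import os
import queue
from multiprocessing import Process, Queue
from threading import Thread


def record_pid(out):
    """Put the ID of the process this call runs in onto a queue."""
    out.put(os.getpid())


def run_workers(worker_cls, out, n=3):
    """Start n workers (Thread or Process) and collect the PIDs they report."""
    workers = [worker_cls(target=record_pid, args=(out,)) for _ in range(n)]
    for w in workers:
        w.start()
    pids = [out.get() for _ in workers]   # read results before joining
    for w in workers:
        w.join()
    return pids


if __name__ == "__main__":
    print("main process:", os.getpid())
    print("threads     :", run_workers(Thread, queue.Queue()))   # same PID as main
    print("processes   :", run_workers(Process, Queue()))        # a new PID each
```

Because threads live inside one process, they share its interpreter; separate processes each get their own.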

Single-threaded vs. multi-threaded vs. multi-process

Words are always pale; a thousand of them are less convincing than a few lines of code.

First of all, my test environment is configured as follows:

Operating system    CPU cores    Memory (GB)    Hard disk
CentOS 7.2          24           32             Mechanical hard drive

Note: to understand the code below, beginners need some knowledge of the following:

  1. Using decorators
  2. Basic use of multithreading
  3. Basic use of multiprocessing

Of course, it does not matter if you do not understand the code. What matters is the conclusion: to give everyone a clear picture of how multithreading and multiprocessing perform compared with a single thread. Once that is achieved, this article has done its job. After finishing the whole series, you may want to come back and reread it; you will probably understand it more deeply.

Let us see how single-threaded, multi-threaded and multi-process code compare in practice.

Before starting the comparison, let us first define four types of scenarios:

  • CPU-intensive computation
  • Disk IO intensive
  • Network IO intensive
  • Simulated IO intensive

Why these scenarios? It has to do with which scenarios multithreading and multiprocessing are suited to. I will explain in the conclusion.

import time

import requests  # third-party package: pip install requests


# CPU-intensive computation
def count(x=1, y=1):
    # Make the program perform 1.5 million operations (500,000 iterations x 3)
    c = 0
    while c < 500000:
        c += 1
        x += x
        y += y


# Disk read/write IO intensive
def io_disk():
    with open("file.txt", "w") as f:
        for x in range(5000000):
            f.write("python-learning\n")


# Network IO intensive
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'}
url = "https://www.tieba.com/"

def io_request():
    try:
        webPage = requests.get(url, headers=header)
        html = webPage.text
        return
    except Exception as e:
        return {"error": e}


# [Simulated] IO intensive
def io_simulation():
    time.sleep(2)
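
One detail worth noting in the code above: every call to `io_disk` opens the same `file.txt` in write mode, so concurrent workers write over each other. A minimal sketch of a variant that gives each call its own file is shown here; `io_disk_unique` and its parameters are hypothetical names, and it is not the function used for the timings in this article.

```python
import os
import tempfile


def io_disk_unique(lines=1000, directory=None):
    """Write `lines` lines to a fresh temporary file and return its path."""
    # mkstemp creates a new, uniquely named file, so workers never collide
    fd, path = tempfile.mkstemp(suffix=".txt", dir=directory)
    with os.fdopen(fd, "w") as f:
        for _ in range(lines):
            f.write("python-learning\n")
    return path


path = io_disk_unique(lines=3)
with open(path) as f:
    print(len(f.readlines()))   # number of lines written
os.remove(path)                 # clean up the temporary file
```

The caller is responsible for deleting the file afterwards, as the last line does.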

We use elapsed time as the metric for the comparison. The less time spent, the higher the efficiency.

For convenience, and to keep the code concise, I first define a simple timer decorator.
If you are not familiar with decorators, that is fine; you only need to know that it measures how long a function takes to run.

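import time


def timer(mode):
    """Decorator factory: print how long the decorated function takes to run."""
    def wrapper(func):
        def deco(*args, **kw):
            # Read the scenario label without shadowing the built-in `type`
            label = kw.setdefault('type', None)
            t1 = time.time()
            func(*args, **kw)
            t2 = time.time()
            cost_time = t2 - t1
            # Prints "<mode>-<scenario>花费时间:<seconds>秒" (time spent, in seconds)
            print("{}-{}花费时间:{}秒".format(mode, label, cost_time))
        return deco
    return wrapper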
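
The timer above discards the wrapped function's return value and replaces its name with `deco`. If that matters in your own code, a variant along the following lines may help. This is only a sketch: `timed` is a hypothetical name, and it uses `functools.wraps` and `time.perf_counter` from the standard library.

```python
import functools
import time


def timed(label):
    """Decorator factory: print the elapsed time and pass the result through."""
    def decorator(func):
        @functools.wraps(func)            # keep func.__name__ and its docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so the timing is always reported
                elapsed = time.perf_counter() - start
                print(f"{label}: {func.__name__} took {elapsed:.3f}s")
        return wrapper
    return decorator


@timed("demo")
def add(a, b):
    return a + b


print(add(2, 3))
```

`perf_counter` is intended for measuring short durations, while `time.time` follows the wall clock, which can be adjusted while the program runs.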

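Step one: the single-threaded version

@timer("【单线程】")  # label: "single-threaded"
def single_thread(func, type=""):
    # Run the task ten times, one after another
    for i in range(10):
        func()

# Single-threaded runs
single_thread(count, type="CPU计算密集型")          # CPU-intensive computation
single_thread(io_disk, type="磁盘IO密集型")         # disk IO intensive
single_thread(io_request, type="网络IO密集型")      # network IO intensive
single_thread(io_simulation, type="模拟IO密集型")   # simulated IO intensive

Here are the results (each line shows the mode, the scenario, and the time spent in seconds):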

【单线程】-CPU计算密集型花费时间:83.42633867263794秒
【单线程】-磁盘IO密集型花费时间:15.641993284225464秒
【单线程】-网络IO密集型花费时间:1.1397218704223633秒
【单线程】-模拟IO密集型花费时间:20.020972728729248秒

Step two: the multi-threaded version

from threading import Thread


@timer("【多线程】")  # label: "multi-threaded"
def multi_thread(func, type=""):
    thread_list = []
    for i in range(10):
        t = Thread(target=func, args=())
        thread_list.append(t)
        t.start()

    # Block until every thread has finished, so the timer covers all ten tasks
    for th in thread_list:
        th.join()

# Multi-threaded runs
multi_thread(count, type="CPU计算密集型")          # CPU-intensive computation
multi_thread(io_disk, type="磁盘IO密集型")         # disk IO intensive
multi_thread(io_request, type="网络IO密集型")      # network IO intensive
multi_thread(io_simulation, type="模拟IO密集型")   # simulated IO intensive

Here are the results (mode, scenario, time spent in seconds):

【多线程】-CPU计算密集型花费时间:93.82986998558044秒
【多线程】-磁盘IO密集型花费时间:13.270896911621094秒
【多线程】-网络IO密集型花费时间:0.1828296184539795秒
【多线程】-模拟IO密集型花费时间:2.0288875102996826秒
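
The manual thread list above can also be written with the standard library's `concurrent.futures.ThreadPoolExecutor`, which waits for all workers when the `with` block exits and hands back their return values. A minimal sketch, assuming a stand-in task `fake_io` (a hypothetical name, with a short sleep instead of the 2-second one):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.1):
    """Stand-in for an IO-bound task: wait, then return the task's ID."""
    time.sleep(delay)
    return task_id


def run_in_threads(n=10):
    """Run n fake IO tasks in a pool of n threads and collect the results."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() yields results in the order the tasks were submitted
        return list(pool.map(fake_io, range(n)))


print(run_in_threads(4))
# → [0, 1, 2, 3]
```

The pool takes care of starting and joining the threads, so there is no hand-written loop to get wrong.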

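Step three: the multi-process version

from multiprocessing import Process


@timer("【多进程】")  # label: "multi-process"
def multi_process(func, type=""):
    process_list = []
    for x in range(10):
        p = Process(target=func, args=())
        process_list.append(p)
        p.start()

    # Block until every process has exited, so the timer covers all ten tasks
    for pr in process_list:
        pr.join()

# Multi-process runs
# On platforms that start processes with spawn (Windows, macOS),
# put these calls under an `if __name__ == "__main__":` guard.
multi_process(count, type="CPU计算密集型")          # CPU-intensive computation
multi_process(io_disk, type="磁盘IO密集型")         # disk IO intensive
multi_process(io_request, type="网络IO密集型")      # network IO intensive
multi_process(io_simulation, type="模拟IO密集型")   # simulated IO intensive

Here are the results (mode, scenario, time spent in seconds):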

【多进程】-CPU计算密集型花费时间:9.082211017608643秒
【多进程】-磁盘IO密集型花费时间:1.287339448928833秒
【多进程】-网络IO密集型花费时间:0.13074755668640137秒
【多进程】-模拟IO密集型花费时间:2.0076842308044434秒
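
For CPU-bound work, the process list above can likewise be replaced by `concurrent.futures.ProcessPoolExecutor`. The sketch below is an illustration under assumptions: `count_up` and `run_in_processes` are hypothetical names, and the workload is far smaller than the one timed in this article.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_up(limit):
    """Small CPU-bound task: count from 0 up to limit."""
    c = 0
    while c < limit:
        c += 1
    return c


def run_in_processes(limits, workers=None):
    """Run count_up for each limit in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_up, limits))


if __name__ == "__main__":
    # No point in starting more workers than there are CPU cores
    workers = min(4, os.cpu_count() or 1)
    print(run_in_processes([100_000] * 4, workers=workers))
```

The `if __name__ == "__main__":` guard keeps child processes from re-running the pool setup when they import the module.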

Summary of the performance comparison results

The results are summarized in the table below (times in seconds).

Mode               CPU-intensive    Disk IO intensive    Network IO intensive    Simulated IO intensive
Single-threaded    83.42            15.64                1.13                    20.02
Multi-threaded     93.82            13.27                0.18                    2.02
Multi-process      9.08             1.28                 0.13                    2.01
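
To make the ratios in this table easier to see, the following sketch divides each single-threaded time by the corresponding multi-threaded or multi-process time. The numbers are copied from the table; the dictionary layout and the `speedup` helper are just one hypothetical way to organize them.

```python
# Timings in seconds, copied from the table above
timings = {
    "single-threaded": {"cpu": 83.42, "disk": 15.64, "network": 1.13, "simulated": 20.02},
    "multi-threaded":  {"cpu": 93.82, "disk": 13.27, "network": 0.18, "simulated": 2.02},
    "multi-process":   {"cpu": 9.08,  "disk": 1.28,  "network": 0.13, "simulated": 2.01},
}


def speedup(mode, scenario):
    """How many times faster `mode` is than single-threaded (below 1 means slower)."""
    base = timings["single-threaded"][scenario]
    return round(base / timings[mode][scenario], 1)


for mode in ("multi-threaded", "multi-process"):
    for scenario in ("cpu", "disk", "network", "simulated"):
        print(f"{mode:15} {scenario:10} x{speedup(mode, scenario)}")
```

A value below 1, as for multi-threaded CPU work (0.9), means that mode was slower than the single-threaded run.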

Let us analyze this table.

First, CPU-intensive work. Compared with a single thread, multithreading not only has no advantage, it is actually slower (93.82 s versus 83.42 s), apparently because the threads keep acquiring and releasing the global interpreter lock (GIL), and switching between threads costs time. With multiple processes, several CPU cores compute at the same time, which is like ten people doing one person's job, so the run is about nine times faster (9.08 s versus 83.42 s).
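
The GIL effect can be observed on a small scale with the sketch below. It is an illustration under assumptions: `busy`, `time_serial` and `time_threaded` are hypothetical names and the workload is tiny. On a standard CPython build with the GIL, the two timings usually come out close to each other; on a free-threaded build they may differ.

```python
import sys
import time
from threading import Thread


def busy(n):
    """Pure-Python CPU work: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


def time_serial(n, repeats):
    """Run busy(n) `repeats` times in a row and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        busy(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run busy(n) in `repeats` threads and return the elapsed seconds."""
    threads = [Thread(target=busy, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # How often the interpreter asks the running thread to give up the GIL
    print("switch interval:", sys.getswitchinterval(), "s")
    print("serial  :", round(time_serial(200_000, 4), 3), "s")
    print("threaded:", round(time_threaded(200_000, 4), 3), "s")
```

Only one thread executes Python bytecode at a time under the GIL, so adding threads to pure-Python computation adds switching cost without adding throughput.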

Next, IO-intensive work. Disk IO, network IO, database IO and so on all belong to this category: the amount of computation is very small, and most of the time is spent waiting for IO. In our disk IO and network IO tests, multithreading does not show a large advantage over a single thread in absolute terms (13.27 s versus 15.64 s for disk, 0.18 s versus 1.13 s for network). This is because the IO tasks in our program are not heavy, so the advantage is not obvious.

That is why I also added a simulated IO-intensive scenario, which uses sleep to simulate IO wait time. It shows the advantage of multithreading and gives a more intuitive picture of how multithreading works. With a single thread, each task has to sleep(2) in turn, so 10 tasks take 20 s. With multithreading, when one thread enters sleep(2) the program switches to another thread, so all 10 threads sleep(2) at the same time and the 10 tasks finish in only about 2 s.
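
The same arithmetic can be checked with a scaled-down sketch: 5 tasks that each sleep 0.1 s instead of 10 tasks that each sleep 2 s. The function names `elapsed_serial` and `elapsed_threaded` are hypothetical.

```python
import time
from threading import Thread


def elapsed_serial(n, delay):
    """Sleep n times in a row; total time is roughly n * delay."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(delay)
    return time.perf_counter() - start


def elapsed_threaded(n, delay):
    """Sleep in n threads at once; total time is roughly one delay."""
    threads = [Thread(target=time.sleep, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


serial = elapsed_serial(5, 0.1)
threaded = elapsed_threaded(5, 0.1)
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```

A sleeping thread releases the GIL, which is why the waits overlap instead of adding up.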

The following conclusions can be drawn:

  • Single-threaded is the slowest in the IO scenarios (for CPU-intensive work, multithreading is even slower), and multi-process is the fastest in every test here.
  • Multithreading suits IO-intensive scenarios, such as web crawlers and website development.
  • Multiprocessing suits scenarios with heavy CPU computation, such as big data analysis and machine learning.
  • Although multi-process is the fastest in these tests, it is not necessarily the best choice, because it needs enough CPU resources to deliver that advantage.
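
These rules of thumb can be written down as a small helper. This is only a sketch of one possible policy, with hypothetical names (`pick_executor`, `suggested_workers`); real projects may need to measure before deciding.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor(workload):
    """Return an executor class for the workload: 'io' or 'cpu'."""
    if workload == "io":
        return ThreadPoolExecutor      # waiting on IO: threads are enough
    if workload == "cpu":
        return ProcessPoolExecutor     # heavy computation: use several cores
    raise ValueError(f"unknown workload: {workload!r}")


def suggested_workers(workload, tasks):
    """Cap CPU-bound workers at the core count; IO-bound tasks can all wait at once."""
    if workload == "cpu":
        return min(tasks, os.cpu_count() or 1)
    return tasks


print(pick_executor("io").__name__, suggested_workers("io", 10))
print(pick_executor("cpu").__name__, suggested_workers("cpu", 10))
```

The cap on CPU-bound workers reflects the last point above: extra processes only help while there are free cores to run them.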

Origin blog.csdn.net/qq_30007885/article/details/102565667