Python distributed and parallel computing: implementing the producer-consumer model with asyncio


Because of the epidemic, this semester is exceptionally short. The closer it gets to the end, the more precious time becomes, and the more learning efficiency matters. This distributed and parallel computing experiment is the last assignment in the series, and it requires implementing the producer-consumer model. As a Python enthusiast, I could not get around the hurdle of asynchronous programming and learning the asyncio library. This article shares how I implemented it and what I learned along the way.

1. Topic introduction

This experiment allows everyone to experience the producer-consumer model and parallelize it.

1. Producer

Randomly generate a positive integer greater than 2 billion

2. Consumer

Determine whether a number is prime (look up a primality-testing algorithm yourself, for example on Baidu)

3. Buffer

Use a queue (FIFO)

Basic requirements:

The first experiment: Use 1 producer to generate 1,000,000 (1 million) numbers as required, and at the same time, a consumer judges whether the generated numbers are prime. Note the time required.

The second experiment: Use 1 producer to generate 1,000,000 (1 million) numbers as required, and at the same time, 2 consumers judge whether the generated numbers are prime. Note the time required.

The third experiment: Use 2 producers to generate 1,000,000 (1 million) numbers as required, while 4 consumers judge whether the generated numbers are prime. Note the time required.

2. Topic analysis

A quick search online shows that there are two options for implementing the producer-consumer model:
1. Use the yield statement (advantage: simple to implement; disadvantages: the style is dated and it does not suit this assignment)
2. Use asyncio asynchronous programming (advantages: modern and powerful; disadvantage: a relatively high learning cost)
Obviously, only the second option is viable.
For the first requirement (the producer), no upper bound is given, so set one yourself. Python integers have arbitrary precision, so the random module can generate the large random integers directly.
For the second requirement (the consumer), an efficient primality test is recommended to keep the running time down.
For the third requirement (the buffer), coroutines need a queue to exchange data, so use the Queue class from the asyncio module directly.
On the whole, the core of the assignment is to master entry-level asyncio asynchronous programming.
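To make the buffer idea concrete, here is a minimal sketch of asyncio.Queue used as a FIFO buffer. The function name fifo_demo and the sample values are only illustrative assumptions, not part of the assignment.

```python
import asyncio


async def fifo_demo(items):
    """Put items into an asyncio.Queue and read them back in FIFO order."""
    buffer = asyncio.Queue()  # unbounded FIFO buffer shared by coroutines
    for item in items:
        await buffer.put(item)  # producer side
    received = []
    while not buffer.empty():
        received.append(await buffer.get())  # consumer side
    return received


if __name__ == "__main__":
    # Items come out in the same order they went in
    print(asyncio.run(fifo_demo([3, 1, 2])))
```

Both put() and get() are coroutines, so a consumer waiting on an empty queue is suspended without blocking the other coroutines.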

3. Getting started with the asyncio module

1. Where it began: "Fluent Python"

Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.
Not the same, but related.
One is about structure, one is about execution.
Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.
(Rob Pike, co-creator of the Go language)
True parallelism requires multiple cores. A modern laptop has 4 CPU cores, yet it routinely runs more than 100 processes at the same time in ordinary use. In practice, therefore, most processing happens concurrently rather than in parallel. The computer keeps juggling those 100+ processes so that each one has a chance to make progress, even though the CPU itself cannot do more than four things at once. Machines from ten years ago could also handle 100 processes concurrently, but all on a single core. This is why Rob Pike titled that talk "Concurrency Is Not Parallelism (It's Better)".

This passage from the book gives a first sense of where asyncio stands and of the essential difference between concurrency and parallelism. The book is a hefty volume, so interested readers can go on to read the rest themselves. Consider this a prologue.
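A small sketch can make the distinction tangible: two coroutines take turns making progress, yet every step runs on the same thread. The names worker and run_workers are hypothetical, chosen only for this illustration.

```python
import asyncio
import threading


async def worker(name, log):
    """Record two steps, yielding to the event loop after each one."""
    for step in range(2):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # hand control back so the other coroutine can run


async def run_workers():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(run_workers()):
        print(name, step, thread_id)
```

The steps of A and B alternate (concurrency), but the thread identifier never changes, so nothing here executes in parallel.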
2. Searching online materials
Recently, classmates around me have complained that there is too little high-quality, directly reusable code on the Internet. Most articles are near-copies of one another and cover only the basics, and the poor search ability of some search engines makes it even harder to find useful knowledge.
Here is an article that I found highly reusable: "python asyncio producer-consumer model implementation". Its class design, its exception handling for terminating the consumer loop, and its use of the asyncio.wait_for() method inspired me a lot.
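For readers who have not met asyncio.wait_for() yet, here is a minimal sketch of how it behaves on a queue. The helper get_or_none and the 0.1 second timeout are assumptions made for this illustration, not code from the referenced article.

```python
import asyncio


async def get_or_none(buffer, timeout):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return await asyncio.wait_for(buffer.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # the wait was cancelled because the queue stayed empty


async def demo():
    buffer = asyncio.Queue()
    await buffer.put(42)
    first = await get_or_none(buffer, timeout=0.1)   # an item is available
    second = await get_or_none(buffer, timeout=0.1)  # queue is empty, so this times out
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A consumer loop can use the timeout as its signal to stop, which is the idea borrowed in the implementation later in this article.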
3. Quick tutorials on Bilibili
As everyone knows, Bilibili ("Station B") is a learning website, with more high-quality courses than I can list. Since the goal is to learn quickly, overly long videos are out (doge). Searching for "python asyncio" brings up a pile of videos.
"Just because I gave you one more look in the crowd, I decided you were the one" (Crayon Shin-chan).
I picked this one because it was concise and ranked highly overall, and it did not disappoint when I clicked in. Right at the start of the lesson, the instructor said exactly what I had been thinking. The examples given help you further understand how to program with the asyncio module.
Below is part of the code from the lesson for readers to study:

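import asyncio  # import the module


# Coroutines (a coroutine is not an ordinary function:
# calling it returns a coroutine object instead of running the body)
async def print_hello():
    while True:
        print("hello world")
        await asyncio.sleep(1)  # suspend this coroutine for 1 second


async def print_goodbye():
    while True:
        print("goodbye world")
        await asyncio.sleep(2)  # suspend this coroutine for 2 seconds


# Create the coroutine objects
co1 = print_hello()
co2 = print_goodbye()
# Get the event loop (written for Python 3.7; newer versions prefer asyncio.run())
loop = asyncio.get_event_loop()  # epoll
# Run the event loop until both coroutines finish (they never do here; stop with Ctrl+C)
loop.run_until_complete(asyncio.gather(co1, co2))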

The two main keywords of asyncio programming are async and await: async declares a coroutine, and await suspends the current coroutine until the awaited operation finishes, handing control back to the event loop in the meantime. When you want to start one job and carry on with another without disturbing the work already in progress, consider coroutines. In the example above, the two coroutines print their messages alternately at their own intervals without interfering with each other.
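Because the lesson code loops forever, a finite variant is easier to experiment with. The sketch below records the messages in a list instead of printing them; the names repeat and demo and the intervals 0.1 and 0.35 seconds are assumptions chosen for illustration.

```python
import asyncio


async def repeat(message, interval, times, log):
    """Record `message` `times` times, pausing `interval` seconds after each."""
    for _ in range(times):
        log.append(message)
        await asyncio.sleep(interval)  # suspends only this coroutine


async def demo():
    log = []
    await asyncio.gather(
        repeat("hello", 0.1, 3, log),     # records at about 0.0 s, 0.1 s, 0.2 s
        repeat("goodbye", 0.35, 2, log),  # records at about 0.0 s, 0.35 s
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The two message streams interleave according to their own intervals, and the whole run takes about as long as the slower coroutine rather than the sum of both.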
The next example simulates an asynchronous web crawler to introduce another way of managing coroutines:

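import asyncio
import random

"""
----- Simulated asynchronous web crawler -----
Requirement: a crontab-style scheduler starts one job every second,
and these jobs must be able to crawl pages concurrently
"""


async def cron_scheduler():
    page = 1
    while True:
        url = '{}/{}'.format('https://www.xxx.com', page)
        job = cron_job(url)  # a new coroutine must be split off so that it runs concurrently with this one
        asyncio.create_task(job)  # register it with the event loop
        await asyncio.sleep(1)  # wait one second before the next job; only the scheduler is suspended, so running jobs can print their logs
        page += 1


async def cron_job(url):
    tick = random.randint(1, 3)  # simulated download delay
    await asyncio.sleep(tick)  # suspend this coroutine to simulate the download
    print("Download finished:", url)


if __name__ == '__main__':
    asyncio.run(cron_scheduler())  # start the scheduler; this step is mandatory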

After running this example, you will see how fascinating asynchronous programming is, and you won't want to stop!
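The crawler pattern can also be tried in a finite form that returns its results. This is a minimal sketch under stated assumptions: fetch, crawl, timed_crawl and the fixed 0.05 second delay are hypothetical stand-ins, and no real network request is made.

```python
import asyncio
import time


async def fetch(url, delay):
    """Pretend to download `url`; the sleep stands in for network I/O."""
    await asyncio.sleep(delay)
    return url


async def crawl(base_url, pages, delay=0.05):
    # create_task() schedules every job immediately, so they all wait concurrently
    tasks = [
        asyncio.create_task(fetch("{}/{}".format(base_url, page), delay))
        for page in range(1, pages + 1)
    ]
    return await asyncio.gather(*tasks)  # results come back in task order


def timed_crawl(pages=10):
    start = time.perf_counter()
    urls = asyncio.run(crawl("https://www.xxx.com", pages))
    return urls, time.perf_counter() - start


if __name__ == "__main__":
    urls, elapsed = timed_crawl()
    print(len(urls), "pages in {:.2f} s".format(elapsed))
```

Keeping the task objects in a list, instead of discarding the return value of create_task(), also makes it possible to wait for the jobs and collect what they return.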
4. The trump card: the official Python documentation
Earlier, although my teacher repeatedly recommended consulting the official documentation when learning to program, I was young and impatient and could not settle down to chew on such a hard bone. I have to say that the official documentation is the last hole card, and this hole card happens to be an ace. Follow the official Python documentation and you will never get lost, so read it whenever you have time. The documentation also explains the syntax of the different Python versions, so you can pick your own version to learn; here I chose the Python 3.7 series.
While writing this case, I also looked up the parameter lists of the asyncio methods I needed. Just press Ctrl+F on the relevant page and enter the method name to find it quickly, which is far more efficient than searching aimlessly.
5. How to efficiently judge whether a number is prime
Mention prime numbers and the sieve of Eratosthenes comes to mind, but it is too simple for this task and needs too many iterations for numbers this large, so I set it aside for now. I searched a lot online for this part, and in the end I referred to the article "Judging whether a number is prime: from the ordinary algorithm to an efficient algorithm".
The time complexity of this algorithm is O(sqrt(n)), which is efficient enough here.
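The idea behind this kind of test is that every prime greater than 3 has the form 6k-1 or 6k+1, so only those candidates need to be tried as divisors up to sqrt(n). Below is a minimal sketch that compares such a test with plain trial division; the function names is_prime_naive and is_prime_6k are my own for this illustration and are not taken from the referenced article.

```python
def is_prime_naive(n):
    """Plain trial division by every integer from 2 up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_prime_6k(n):
    """Trial division that only tries divisors of the form 6k-1 and 6k+1."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False
    d = 5
    while d * d <= n:
        if n % d == 0 or n % (d + 2) == 0:  # d = 6k-1, d + 2 = 6k+1
            return False
        d += 6
    return True


if __name__ == "__main__":
    # 2**31 - 1 is a known prime above 2 billion
    print(is_prime_6k(2 ** 31 - 1))
```

Both functions are O(sqrt(n)), but the second one tries only about a third as many divisors, which adds up over a million numbers.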

4. The implementation code

It's finally the most exciting time, let's see how I achieved it!

# -*- coding:utf8 -*-
import time
import random
import asyncio
import math


def isPrime(num: int):
    # 6k±1 trial division: every prime above 3 has the form 6k-1 or 6k+1
    if num < 2:
        return False  # 0, 1 and negative numbers are not prime
    if num == 2 or num == 3:
        return True
    if num % 6 != 1 and num % 6 != 5:
        return False
    for i in range(5, int(math.sqrt(num)) + 1, 6):
        if num % i == 0 or num % (i + 2) == 0:
            return False
    return True


def big_number():
    # Random integer above 2 billion (the lower bound here is 2 * 10**10); the upper bound is chosen freely
    return random.randint(2 * 10 ** 10, 2 * 10 ** 15)


class Producer_Consumer_Model:
    def __init__(self, c_num=1, p_num=1, size=1000000, is_print=False):
        """
        Producer-consumer model
        :param c_num: number of consumers
        :param p_num: number of producers
        :param size: total amount of data to process
        :param is_print: whether to print logs
        """
        self.consumer_num = c_num
        self.producer_num = p_num
        self.size = size
        self.print_log = is_print

    async def consumer(self, buffer, name):
        for _ in iter(int, 1):  # infinite loop, a bit of Python black magic: int() returns 0 and never equals the sentinel 1
            try:
                # Take a number from the buffer; if none arrives within the timeout, the work is considered finished
                value = await asyncio.wait_for(buffer.get(), timeout=0.5)
                if isPrime(value):
                    if self.print_log:
                        print('[{}]{} is Prime'.format(name, value))
                else:
                    if self.print_log:
                        print('[{}]{} is not Prime'.format(name, value))
            except asyncio.TimeoutError:
                break
            await asyncio.sleep(0)  # yield so that other coroutines can run

    async def producer(self, buffer, name):
        for i in range(self.size // self.producer_num):  # split the total amount of data across the producers
            big_num = big_number()  # generate a large random number
            await buffer.put(big_num)  # put it into the buffer
            if self.print_log:
                print('[{}] {} is Produced'.format(name, big_num))
            await asyncio.sleep(0)  # yield so that consumers can run

    async def main(self):
        buffer = asyncio.Queue()  # define the buffer
        jobers = []  # list of jobs
        # Add both the producers and the consumers to the job list
        for i in range(self.consumer_num):
            # Pass the shared buffer and this consumer's name to the consumer
            jobers.append(asyncio.create_task(self.consumer(buffer, 'Consumer' + str(i + 1))))
        for i in range(self.producer_num):
            # Pass the shared buffer and this producer's name to the producer
            jobers.append(asyncio.create_task(self.producer(buffer, 'Producer' + str(i + 1))))

        for j in jobers:
            # The workers clock in: wait for each task to finish
            await asyncio.gather(j)


if __name__ == '__main__':
    start_time = time.perf_counter()  # start timing
    # size=100 is a quick trial; use size=1000000 for the actual experiments
    pc_model = Producer_Consumer_Model(c_num=2, p_num=2, size=100, is_print=False)
    asyncio.run(pc_model.main())  # start the coroutine service
    end_time = time.perf_counter()
    print("Elapsed time: [{:.3f}] seconds".format(end_time - start_time))

There are a few details to pay attention to. The asyncio.run() call at the program entry point is mandatory. The tasks can be awaited one at a time in a loop, as in main(), because asyncio.create_task() has already scheduled all of them; asyncio.gather(*jobers) would be the more common way to wait for them together. In await asyncio.wait_for(buffer.get(), timeout=0.5), the timeout is up to you, but try not to set it too small: otherwise a consumer that finds the buffer temporarily empty (consumer starvation) may exit early. The right value really depends on the data size and the primality algorithm. The code here only provides an overall framework and approach that you can adapt by analogy.
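If tuning the timeout feels fragile, one common alternative is to end the consumers with a sentinel value placed in the queue once production is done. The following is a minimal sketch of that idea, not the approach used in this article; STOP, producer, consumer and run are hypothetical names, and small integers stand in for the random numbers.

```python
import asyncio

STOP = None  # hypothetical sentinel meaning "no more data"


async def producer(buffer, count):
    for n in range(count):
        await buffer.put(n)


async def consumer(buffer, results):
    while True:
        item = await buffer.get()
        if item is STOP:  # the sentinel tells this consumer to finish
            break
        results.append(item)


async def run(consumers=2, count=10):
    buffer = asyncio.Queue()
    results = []
    workers = [asyncio.create_task(consumer(buffer, results)) for _ in range(consumers)]
    await producer(buffer, count)  # produce everything first
    for _ in range(consumers):     # then send one sentinel per consumer
        await buffer.put(STOP)
    await asyncio.gather(*workers)
    return results


if __name__ == "__main__":
    print(sorted(asyncio.run(run())))
```

With this design no consumer can exit while data is still on its way, and the measured time does not include a trailing timeout.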
I'm glad to finish it just before midnight.

PS: My sincere thanks to the authors of the articles and videos mentioned in this article for their support and inspiration!
Readers are welcome to discuss with me in the comment area! I humbly accept criticism and corrections from all the experts.

Origin blog.csdn.net/weixin_43594279/article/details/111243453