Using Tqdm with Asyncio in Python


Introduction

The problem

Data scientists often use concurrent programming in Python to improve efficiency. It's always satisfying to watch child processes or concurrent threads work in the background and keep my compute-bound or IO-bound tasks in order.

But one thing bothers me: when I process hundreds or thousands of files concurrently, or run hundreds or thousands of processes in the background, I always worry that a few tasks will silently hang and the whole program will never finish. I also have a hard time knowing how far the code has gotten.

Worst of all, when I'm looking at a blank screen, it's hard to tell how long my code will take to execute or what the ETA is. That makes it hard to plan my work schedule.

So I want a way to see how far the code execution has progressed.

Existing method

The more traditional method is to share a memory area between tasks, put a counter in this memory area, let this counter +1 when a task ends, and then use a thread to print the value of this counter continuously.

This is never a good solution. On the one hand, I need to add counting code to my existing business logic, which violates the principle of "low coupling, high cohesion". On the other hand, I have to be very careful with locking because of thread safety, and the locks add unnecessary performance overhead.
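To make those drawbacks concrete, here is a minimal sketch of the counter approach using threads. The names Progress, work, report, and run are hypothetical and exist only for this illustration; a real project would look different, but the coupling and the lock would still be there.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Progress:
    """Shared counter guarded by a lock (hypothetical helper)."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.lock = threading.Lock()

    def increment(self):
        # Every task has to take the lock just to report that it finished.
        with self.lock:
            self.done += 1

    def snapshot(self):
        with self.lock:
            return self.done


def work(progress):
    time.sleep(0.01)      # stand-in for the real task
    progress.increment()  # counting code mixed into the business logic


def report(progress, stop, interval=0.05):
    # Print the counter periodically until the stop event is set.
    while not stop.wait(interval):
        print(f"{progress.snapshot()}/{progress.total} done")


def run(total=20):
    progress = Progress(total)
    stop = threading.Event()
    reporter = threading.Thread(target=report, args=(progress, stop))
    reporter.start()
    # Leaving the with block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(total):
            pool.submit(work, progress)
    stop.set()
    reporter.join()
    return progress.done


if __name__ == "__main__":
    print(f"finished {run()} tasks")
```

Note how work has to know about Progress even though counting has nothing to do with the task itself.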

tqdm


One day, I discovered the tqdm library, which uses a progress bar to visualize the progress of my code. Can I use a progress bar to visualize the completion and ETA of my asyncio tasks?

In this article [1], I will share this method so that every programmer can monitor the progress of their concurrent tasks.

asyncio

Before we get started, I want you to have some background on Python asyncio. My earlier article describes the usage of some common asyncio APIs [2], which will help us better understand the design of tqdm.

tqdm overview

As stated on the official site, tqdm is a tool that displays a progress bar for loops. It is easy to use, highly customizable, and has a low resource footprint.

A typical usage is to pass an iterable to the tqdm constructor, and you'll get a progress bar like this:

from time import sleep
from tqdm import tqdm


def main():
    for _ in tqdm(range(100)):
        # do something in the loop
        sleep(0.1)


if __name__ == "__main__":
    main()

Or you can create the progress bar with a known total and update it manually, for example while reading a file:

import os
from tqdm import tqdm


def main():
    filename = "../data/large-dataset"
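    # Parenthesized context managers need Python 3.10 or later.
    # len(line) counts characters, so the bar is exact only for ASCII text.
    with (tqdm(total=os.path.getsize(filename)) as bar,
            open(filename, "r", encoding="utf-8") as f):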
        for line in f:
            bar.update(len(line))


if __name__ == "__main__":
    main()
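The example above opens the file in text mode, so len(line) is a character count while the total is a byte count. As a hedged variant, the sketch below reads in binary mode so both sides use bytes. The function name count_lines and the temporary demo file are assumptions made for this illustration; unit and unit_scale are standard tqdm options that format the counter as a byte size.

```python
import os
import tempfile

from tqdm import tqdm


def count_lines(filename):
    """Read a file line by line, advancing the bar by bytes read."""
    total = os.path.getsize(filename)
    lines = 0
    # Binary mode: len(line) is a byte count, matching os.path.getsize.
    with tqdm(total=total, unit="B", unit_scale=True) as bar, \
            open(filename, "rb") as f:
        for line in f:
            bar.update(len(line))
            lines += 1
        # bar.n holds the amount accumulated through update().
        return lines, bar.n


if __name__ == "__main__":
    # Hypothetical demo data written to a temporary file.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8",
                                     suffix=".txt", delete=False) as tmp:
        tmp.write("first line\nsecond line\n")
    print(count_lines(tmp.name))
    os.remove(tmp.name)
```

With this variant the bar should reach exactly 100% even when the file contains multi-byte characters.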

Integrating tqdm with asyncio

Overall, tqdm is very easy to use. However, I could find little information on GitHub about integrating tqdm with asyncio. So I dug into the source code to see if tqdm supports asyncio.

Fortunately, recent versions of tqdm provide the module tqdm.asyncio, which contains the class tqdm_asyncio.

The tqdm_asyncio class has two related methods. One is tqdm_asyncio.as_completed. As can be seen from the source code, it is a wrapper for asyncio.as_completed:

    @classmethod
    def as_completed(cls, fs, *, loop=None, timeout=None, total=None, **tqdm_kwargs):
        """
        Wrapper for `asyncio.as_completed`.
        """

        if total is None:
            total = len(fs)
        kwargs = {}
        if version_info[:2] < (3, 10):
            kwargs['loop'] = loop
        yield from cls(asyncio.as_completed(fs, timeout=timeout, **kwargs),
                       total=total, **tqdm_kwargs)

The other is tqdm_asyncio.gather. As can be seen from the source code, it is built on tqdm_asyncio.as_completed and emulates asyncio.gather: each awaitable is tagged with its index, and the results are sorted by that index at the end so they come back in the original order:

    @classmethod
    async def gather(cls, *fs, loop=None, timeout=None, total=None, **tqdm_kwargs):
        """
        Wrapper for `asyncio.gather`.
        """

        async def wrap_awaitable(i, f):
            return i, await f

        ifs = [wrap_awaitable(i, f) for i, f in enumerate(fs)]
        res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
                                                 total=total, **tqdm_kwargs)]
        return [i for _, i in sorted(res)]
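To see why the index tagging matters, here is a small sketch of the same technique written with plain asyncio and no progress bar. It is not tqdm's code: gather_in_order, echo_after, and demo are hypothetical names, and the delays are arbitrary values chosen so the tasks finish out of order.

```python
import asyncio


async def gather_in_order(*aws):
    # Tag each awaitable with its position so the order can be restored.
    async def wrap_awaitable(i, aw):
        return i, await aw

    tagged = [wrap_awaitable(i, aw) for i, aw in enumerate(aws)]
    # Collect (index, result) pairs in completion order ...
    pairs = [await fut for fut in asyncio.as_completed(tagged)]
    # ... then sort by index to recover the submission order.
    return [result for _, result in sorted(pairs, key=lambda p: p[0])]


async def echo_after(delay, value):
    # Stand-in task: sleep, then return the given value.
    await asyncio.sleep(delay)
    return value


async def demo():
    return await gather_in_order(
        echo_after(0.03, "slow"),
        echo_after(0.01, "fast"),
        echo_after(0.02, "medium"),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Even though "fast" finishes first, the returned list is ['slow', 'fast', 'medium'], the order in which the coroutines were passed in.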

Next, I will describe how to use these two APIs. Before we start, we need to do some preparatory work. Here I wrote a simple coroutine that simulates a concurrent task with a random sleep time:

import asyncio
import random

from tqdm.asyncio import tqdm_asyncio


class AsyncException(Exception):
    def __init__(self, message):
        super().__init__(message)


async def some_coro(simu_exception=False):
    delay = round(random.uniform(1.0, 5.0), 2)

    # We will simulate throwing an exception if simu_exception is True
    if delay > 4 and simu_exception:
        raise AsyncException("something wrong!")

    await asyncio.sleep(delay)

    return delay

Next, we'll create 2000 concurrent tasks, and then use tqdm_asyncio.gather instead of the familiar asyncio.gather method to see if the progress bar is working:

async def main():
    tasks = []
    for _ in range(2000):
        tasks.append(some_coro())
    await tqdm_asyncio.gather(*tasks)

    print("All tasks done.")


if __name__ == "__main__":
    asyncio.run(main())

Or let's replace tqdm_asyncio.gather with tqdm_asyncio.as_completed and try again:

async def main():
    tasks = []
    for _ in range(2000):
        tasks.append(some_coro())

    for done in tqdm_asyncio.as_completed(tasks):
        await done

    print("The tqdm_asyncio.as_completed also works fine.")


if __name__ == "__main__":
    asyncio.run(main())
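The some_coro helper accepts a simu_exception flag that the examples above leave at False. As a rough sketch of what handling failures could look like, the code below awaits each finished item inside try/except so that one failing task does not stop the loop. To keep the outcome predictable it uses a hypothetical flaky_coro, which fails on every third index, instead of the random some_coro; run_all is also a name made up for this illustration.

```python
import asyncio

from tqdm.asyncio import tqdm_asyncio


class AsyncException(Exception):
    pass


async def flaky_coro(i):
    # Hypothetical stand-in for some_coro: every third task fails.
    await asyncio.sleep(0.01 * (i % 5))
    if i % 3 == 0:
        raise AsyncException(f"task {i} failed")
    return i


async def run_all(n=30):
    tasks = [flaky_coro(i) for i in range(n)]
    results, errors = [], []
    for done in tqdm_asyncio.as_completed(tasks):
        try:
            results.append(await done)
        except AsyncException as exc:
            # Record the failure and keep consuming the remaining tasks.
            errors.append(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = asyncio.run(run_all())
    print(f"{len(results)} succeeded, {len(errors)} failed")
```

The progress bar still advances once per finished task, whether it succeeded or raised.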

References

[1] Source: https://towardsdatascience.com/using-tqdm-with-asyncio-in-python-5c0f6e747d55

[2] asynchronous: https://github.com/Jwindler/Ice_story


Origin blog.csdn.net/swindler_ice/article/details/130569351