[Crawler Series] How to implement a progress bar in Python

1. Requirements

While crawling data, I found that unless I read the output log I had no idea how far the crawl had progressed, and the console log alone is not a convenient way to judge it. I therefore looked for a way to add a progress bar that shows the crawling progress in real time.

With the requirement clear, how can it be implemented? There are currently two approaches to displaying a progress bar. One uses only built-in Python modules, such as the time module; the other uses dedicated third-party modules, such as tqdm and alive-progress.

2. Implementing a progress bar with built-in modules

1. Simple progress bar

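import sys
import time

def test_simple():
    for i in range(1, 101):
        # "\r" returns the cursor to the start of the line, so the bar is redrawn in place
        print("\r", end="")
        print(f"Current crawl progress: {i}%:", "▋" * (i // 2), end="")
        sys.stdout.flush()
        time.sleep(0.05)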
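The function above depends on the carriage return `\r`: it moves the cursor back to the start of the current line without starting a new one, so each print overwrites the previous frame. As a minimal sketch of the same idea, the string formatting can be separated from the printing, which makes the bar easier to check. The names `render_simple_bar`, `show_simple_bar`, `width` and `delay` are introduced here for illustration and are not part of the original code.

```python
import sys
import time


def render_simple_bar(percent, width=50):
    """Return one frame of the bar for an integer percent in 0..100."""
    filled = percent * width // 100  # number of blocks to draw
    return f"\rCurrent crawl progress: {percent}% " + "▋" * filled


def show_simple_bar(delay=0.05):
    """Animate the bar from 1% to 100% on a single console line."""
    for percent in range(1, 101):
        # The leading "\r" in each frame makes it overwrite the previous one.
        sys.stdout.write(render_simple_bar(percent))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")  # move to a fresh line once the bar is complete
```

Calling `show_simple_bar()` animates the bar in the console; with the default width of 50, one block is drawn for every 2%, as in `test_simple()`.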


2. Progress bar with elapsed time

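import time

def test_with_time():
    scale = 50  # total number of steps, also the width of the bar
    start = time.perf_counter()  # starting point for timing
    for i in range(scale + 1):
        progress = "▋" * i  # completed part
        point = "." * (scale - i)  # remaining part
        c = (i / scale) * 100  # percentage complete
        during = time.perf_counter() - start  # elapsed seconds
        print("\r{:^3.0f}%【{}->{}】{:.2f}s".format(c, progress, point, during), end="")
        time.sleep(0.1)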
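The format string in the print call does most of the work: `{:^3.0f}` prints the percentage with no decimals, centered in a field three characters wide, and `{:.2f}` prints the elapsed seconds with two decimals. The sketch below isolates that formatting in a helper so a single frame can be inspected; `format_frame` is a name made up for this example, and the elapsed value passed in is arbitrary.

```python
def format_frame(i, scale, elapsed):
    """Build one frame: percentage, completed part, remaining part, elapsed seconds."""
    progress = "▋" * i          # completed steps
    point = "." * (scale - i)   # remaining steps
    percent = (i / scale) * 100
    # {:^3.0f} -> no decimals, centered in a field of width 3
    # {:.2f}   -> elapsed seconds with two decimals
    return "\r{:^3.0f}%【{}->{}】{:.2f}s".format(percent, progress, point, elapsed)


# repr() makes the leading "\r" and the padding spaces visible
print(repr(format_frame(1, 4, 1.5)))
```

Because the percentage field is always three characters wide, the bar that follows it stays aligned whether the value is 0, 25 or 100.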


3. Implementing a progress bar with third-party modules

1. tqdm module

tqdm is a module designed for generating progress bars quickly. Install it before use:

pip install tqdm

Implementation code:

import time
from tqdm import tqdm

def test_tqdm():
    # pass any iterable object to the tqdm constructor
    for i in tqdm(range(1, 101)):
        # do something
        time.sleep(0.1)


2. alive-progress module

According to its official description, alive-progress is a progress bar tool that displays progress in real time and has very cool animation effects.

Install the module first:

pip install alive-progress

Implementation code:

import time

from alive_progress import alive_bar

def test_alive_progress(task_num, totals, sleep_time):
    for i in range(task_num):  # number of tasks
        with alive_bar(totals, bar='blocks', title=f'Task {i + 1}') as bar:
            for _ in range(totals):  # separate loop variable, so the task index i is not shadowed
                time.sleep(sleep_time)
                bar()  # advance the bar by one step

test_alive_progress(5, 150, 0.02)


According to the official documentation, the bar should be animated while it runs, but I did not see any animation in my actual run. One possible cause is that the output console was not detected as a terminal, as happens in some IDEs.

3. Other modules (for reference)

3.1 progress module: Easy progress reporting for Python!

Official website address: progress · PyPI

3.2 PySimpleGUI module: a powerful GUI-based toolkit that can also be used to display a progress bar.

Official website address: PySimpleGUI · PyPI

4. Summary

In fact, the third-party modules that display a progress bar work, at bottom, in much the same way as the second built-in method: they keep redrawing a single line of console output. For that reason I decided not to use a third-party library here. Instead, I took the second built-in version (the progress bar with elapsed time) and embedded it directly into my crawler code.
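To make the shared principle concrete, here is a rough sketch of a wrapper that can be placed around any iterable, in the spirit of `tqdm(...)`, using only the carriage-return technique from section 2. It is an illustration under simplifying assumptions and not how tqdm or alive-progress are actually implemented; `with_progress`, `demo` and the `out` parameter are names invented for this example.

```python
import io
import sys
import time


def with_progress(items, out=sys.stdout):
    """Yield each item and redraw a one-line progress bar once it has been processed."""
    items = list(items)  # the total count must be known up front
    total = len(items)
    start = time.perf_counter()
    for done, item in enumerate(items, start=1):
        yield item  # the caller's loop body runs here
        elapsed = time.perf_counter() - start
        bar = "▋" * done + "." * (total - done)
        out.write("\r{:^3.0f}%[{}]{:.2f}s".format(done / total * 100, bar, elapsed))
        out.flush()
    if total:
        out.write("\n")  # leave the finished bar on its own line


def demo():
    """Run the wrapper over four placeholder items and return the captured frames."""
    buffer = io.StringIO()
    for _ in with_progress(range(4), out=buffer):
        pass  # real work for each item would go here
    return buffer.getvalue()
```

Because the frame is written after `yield` returns control to the generator, the bar advances only once the caller has finished its work on the current item. Writing to an `out` stream instead of calling `print` directly is a design choice that lets the frames be captured and inspected, as `demo()` does.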

Next, I modified common_selenium(), the function that calls operate_selenium() to crawl the weather data. The code is as follows:

import time

from selenium import webdriver

# operate_selenium() and write_log() are assumed to be defined elsewhere in the crawler script
def common_selenium(url, citys, path):
    scale = len(citys)
    start = time.perf_counter()  # starting point for timing
    for i in range(scale + 1):
        # progress bar calculations
        progress = "▋" * i
        point = "." * (scale - i)
        c = (i / scale) * 100
        # the final iteration has no matching list element, so skip the crawl to avoid an exception
        if i < scale:
            browser = webdriver.Chrome()  # use the Chrome browser
            browser.maximize_window()  # maximize the window
            browser.get(url)  # GET request to the weather site
            time.sleep(2)
            # search for the weather of the given city
            today_weathers = operate_selenium(browser, citys, pos=i, path=path)
            # print(f'city:{citys[i]},today_weathers:{today_weathers}')
            write_log(today_weathers, citys[i], path)
            # sleep for 5 s, then close the browser
            time.sleep(5)
            browser.quit()
        # display the progress bar in real time
        during = time.perf_counter() - start
        print("\r{:^3.0f}%[{}->{}]{:.2f}s".format(c, progress, point, during), end="")
        time.sleep(0.01)

The list of cities is crawled in a loop. Each time the weather information for one city has been crawled, the progress bar is redrawn. The output looks like this:

25 %[▋->...]34.66s

50 %[▋▋->..]51.63s

100%[▋▋▋▋->]69.58s

The progress is much clearer now~


Origin blog.csdn.net/qq_29119581/article/details/128800272