Several common methods for scheduling Python crawler tasks

Preface



I remember that scheduling tasks with Windows Task Scheduler used to work normally, but when I tried it today it no longer did: the scheduled task kept hanging. Below I record several ways to schedule Python crawler tasks.

Method one, while True

The simplest approach comes first: keep the program alive with a while True loop. Straight to the code:

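import time
from datetime import datetime

def One_Plan():
    # Run period: once a day, in seconds
    Second_update_time = 24 * 60 * 60

    # Current time
    now_Time = datetime.now()
    # Today's planned start time (09:00:00)
    plan_Time = now_Time.replace(hour=9, minute=0, second=0, microsecond=0)
    # The difference is a timedelta such as "-1 day, 21:48:53.246576" once 09:00 has passed.
    # .total_seconds() turns it into a float number of seconds, which time.sleep() accepts;
    # the modulo wraps a negative difference around to 09:00 tomorrow.
    delta = plan_Time - now_Time
    first_plan_Time = delta.total_seconds() % Second_update_time
    print("Sleeping %d seconds until the next run" % first_plan_Time)
    return first_plan_Time

# Placeholder for the author's own crawler (exe_file and D_list are not shown in the
# original); replace it with your own task, e.g. a hello-world function
D_list = []

def exe_file(d_list):
    print("hello world", d_list)

# while True block: keep the program alive, sleep until the planned time, then run the task
while True:
    s1 = One_Plan()
    time.sleep(s1)
    print("Running the scheduled update")
    exe_file(D_list)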

In my view, this method is fine for starting a scheduled job when there is a single program that runs once a day. Once you need several tasks, or several runs per day, its shortcomings become obvious.

Real work involves many more factors. For example, the crawlers may need to run four times a day (at midnight, 6 a.m., 9 a.m. and 3 p.m.) with 4 crawlers running at the same time, and you also have to consider whether the network is stable, what to do if the program hangs, and so on.
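Method one can be stretched to several run times per day by computing the wait until the nearest upcoming slot instead of a single fixed 09:00. A minimal sketch, assuming the four times from the scenario above; the helper name seconds_until_next_run and the RUN_TIMES list are hypothetical:

```python
from datetime import datetime, timedelta, time as dtime

# Hypothetical daily run times taken from the scenario above
RUN_TIMES = [dtime(0, 0), dtime(6, 0), dtime(9, 0), dtime(15, 0)]


def seconds_until_next_run(now, run_times=RUN_TIMES):
    """Return the number of seconds from `now` until the next run time."""
    candidates = []
    for t in run_times:
        run_at = datetime.combine(now.date(), t)
        # A slot that has already passed (or is exactly now) moves to tomorrow
        if run_at <= now:
            run_at += timedelta(days=1)
        candidates.append(run_at)
    # The earliest upcoming slot wins
    return (min(candidates) - now).total_seconds()
```

Inside method one's loop, time.sleep(seconds_until_next_run(datetime.now())) would then replace the single 09:00 calculation.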

Method two, threading.Timer

The previous method is the simplest way to start a timed job; one might even say the crudest. Life is short and Python is elegant: is there something really simple that takes only a few lines of code? Of course! As a brief example of the other factors mentioned at the end of the previous method, consider this:

Suppose you need to start a selenium crawler using the Firefox driver plus multithreading. Task Manager shows CPU usage at 20%. Once selenium starts, it keeps opening browsers across threads, and within 5 minutes CPU usage shoots to 90%+. The computer freezes; the scheduling program is technically still running, but it is effectively in standby. Your first reaction is probably to curse the machine: after all the code you wrote, the program still will not run!

Here is the code; for further details of each function, please consult the relevant documentation:

from datetime import datetime
from threading import Timer
import time

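# The scheduled task
def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def timedTask():
    '''
    First argument: delay in seconds before the task runs
    Second argument: the function to call
    Third argument: arguments for that function (a tuple)
    '''
    Timer(5, task, ()).start()

# Start a new one-shot Timer every 5 seconds
while True:
    timedTask()
    time.sleep(5)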

Seven lines of code: isn't that elegant? Elegant or not, the main point is that there is little code and little effort. Running it prints:

2020-06-05 14:06:39
2020-06-05 14:06:44
2020-06-05 14:06:49
2020-06-05 14:06:54
2020-06-05 14:06:59
2020-06-05 14:07:04
2020-06-05 14:07:09
2020-06-05 14:07:14
2020-06-05 14:07:19
2020-06-05 14:07:24
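The loop above starts a new Timer every 5 seconds from the main thread, regardless of whether the previous run has finished. A variation, sketched here under the assumption that you prefer the task to re-arm itself, starts the next Timer only after the current run returns, so a slow crawler cannot overlap with its own next run. The helper name run_every and its stop_event parameter are hypothetical:

```python
import threading


def run_every(interval, func, stop_event):
    """Call func every `interval` seconds on Timer threads until stop_event is set."""
    def _tick():
        if stop_event.is_set():
            return
        func()
        # A Timer fires only once, so arm a fresh one after this run finishes
        _schedule()

    def _schedule():
        timer = threading.Timer(interval, _tick)
        timer.daemon = True  # daemon threads do not keep the interpreter alive on exit
        timer.start()

    _schedule()
```

For example, stop = threading.Event() followed by run_every(5, task, stop) prints the time roughly every 5 seconds until stop.set() is called. Because the timers are daemon threads, the main program must stay alive (for instance in a sleep loop) for them to keep firing.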

Method three, sched module

This time, straight to the module: sched.

The code is as follows:

from datetime import datetime
import sched
import time


def timedTask():
    # Create a scheduler from the sched module, passing time.time and time.sleep
    scheduler = sched.scheduler(time.time, time.sleep)
    # Add an event: enter(delay in seconds, priority, function to call)
    scheduler.enter(5, 1, task)
    # Run all queued events; returns once the queue is empty
    scheduler.run()

# The scheduled task
def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

if __name__ == '__main__':
    timedTask()
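Because scheduler.run() returns as soon as its queue is empty, a job that should repeat has to put its next run back on the queue itself. A minimal sketch of that pattern; periodic and its runs_left parameter are hypothetical names, not part of the sched module:

```python
import sched
import time

# One shared scheduler: sched.scheduler(timefunc, delayfunc)
scheduler = sched.scheduler(time.time, time.sleep)


def periodic(interval, action, runs_left=None):
    """Call action, then queue the next call `interval` seconds later.

    runs_left=None repeats forever; an int stops after that many calls.
    """
    action()
    if runs_left is None or runs_left > 1:
        next_left = None if runs_left is None else runs_left - 1
        # enter(delay, priority, action, argument) puts the next run on the queue
        scheduler.enter(interval, 1, periodic, (interval, action, next_left))
```

To print the time every 5 seconds indefinitely, call scheduler.enter(5, 1, periodic, (5, task)) and then scheduler.run(), which blocks for as long as events keep re-entering the queue.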

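This module is also very easy to use. Note that scheduler.run() returns once every queued event has run, so in this form the task executes only once and then the program ends. To keep it running, wrap the call in a while True loop under if __name__ == '__main__':, or have the task add its next run to the scheduler. Besides sched, there is another way to write it, using the third-party schedule package (a different library from the standard-library sched). The code: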

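# schedule is a third-party package (pip install schedule), not the standard-library sched
import schedule
import time

def hello():
    print('hello')

def Timer():
    # Register one job per daily run time
    schedule.every().day.at("09:00").do(hello)
    schedule.every().day.at("18:00").do(hello)

    while True:
        # Run any job whose time has come
        schedule.run_pending()
        # Sleep period between checks, in seconds
        time.sleep(1)

Timer()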

As you can see, schedule lets you express days, hours and minutes, which makes timed tasks very convenient. Set the sleep interval inside the while True loop, and register one schedule.every() line for each run you want inside the function.
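For the scenario described under method one (four crawlers started together, and a scheduler that must survive a crawler that crashes), one option is to have the scheduled job launch the crawlers on a thread pool and catch each crawler's exception separately. A minimal sketch using only the standard library; run_crawlers and the crawler functions you pass in are hypothetical:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO)


def run_crawlers(crawlers, max_workers=4):
    """Run every crawler concurrently; return {name: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the crawler's name
        futures = {pool.submit(func): name for name, func in crawlers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing crawler must not stop the others
                logging.exception("crawler %s failed", name)
                results[name] = exc
    return results
```

With schedule, such a job could be registered as schedule.every().day.at("09:00").do(run_crawlers, crawlers_dict). Keeping max_workers small also caps how many selenium browsers are open at once, which helps with the CPU overload described in method two.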


Origin blog.csdn.net/pythonxuexi123/article/details/112838112