8 "Python program" timing execution methods

In daily work, we often have tasks that need to run periodically. One approach is to use the crond service that ships with Linux together with command-line tools; another is to implement the scheduling directly in Python.

I recently organized the common ways of implementing scheduled tasks in Python; this article summarizes them for future reference.

 

Use while True: + sleep() to implement scheduled tasks

The sleep(secs) function in the time module pauses the calling thread for secs seconds before execution continues. Pausing means the current thread enters the blocked state; when the time specified by sleep() elapses, the thread moves back to the ready state and waits for CPU scheduling.

Based on this behavior, we can implement a simple scheduled task with an infinite while loop plus sleep().

Code example:

import datetime
import time

def time_printer():
    now = datetime.datetime.now()
    ts = now.strftime('%Y-%m-%d %H:%M:%S')
    print('do func time :', ts)

def loop_monitor():
    while True:
        time_printer()
        time.sleep(5)  # pause for 5 seconds

if __name__ == "__main__":
    loop_monitor()

Main disadvantages:

Only the interval can be set; a specific time of day, such as 8:00 every morning, cannot be specified directly (a workaround is sketched after this list).

sleep() is a blocking call, which means that while the program is sleeping it cannot do anything else.
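To work around the first limitation, here is a minimal sketch (not from the original article) that computes how long to sleep until the next 8:00 AM:

import datetime
import time

def sleep_until(hour, minute):
    # sleep until the next occurrence of hour:minute
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:  # today's target has already passed, aim for tomorrow
        target += datetime.timedelta(days=1)
    time.sleep((target - now).total_seconds())

while True:
    sleep_until(8, 0)  # wait until 8:00 every morning
    print('run the daily task')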

 

Run scheduled tasks using the Timeloop library

Timeloop is a library for running tasks with multiple periods. It is a simple library that uses the decorator pattern to run the decorated functions in separate threads.

Sample code:

import time
from timeloop import Timeloop
from datetime import timedelta

tl = Timeloop()

@tl.job(interval=timedelta(seconds=2))
def sample_job_every_2s():
    print("2s job current time : {}".format(time.ctime()))

@tl.job(interval=timedelta(seconds=5))
def sample_job_every_5s():
    print("5s job current time : {}".format(time.ctime()))

@tl.job(interval=timedelta(seconds=10))
def sample_job_every_10s():
    print("10s job current time : {}".format(time.ctime()))

if __name__ == "__main__":
    tl.start(block=True)  # start the scheduler and keep the main thread alive

 

Use the built-in module sched to implement scheduled tasks

The sched module implements a general-purpose event scheduler. The scheduler class uses a delay function to wait for a specified time and then execute the task. It also supports multi-threaded applications: the delay function is called after each task is executed, so other threads get a chance to run.

class sched.scheduler(timefunc, delayfunc) defines a generic interface for scheduling events. It takes two parameters: timefunc is a parameterless function that returns a number representing the current time (commonly time.time from the time module), and delayfunc is a one-argument function, compatible with timefunc's output, that delays for that many time units (commonly time.sleep).

Code example:

import datetime
import time
import sched

def time_printer():
    now = datetime.datetime.now()
    ts = now.strftime('%Y-%m-%d %H:%M:%S')
    print('do func time :', ts)
    loop_monitor()  # reschedule the next run

def loop_monitor():
    s = sched.scheduler(time.time, time.sleep)  # create the scheduler
    s.enter(5, 1, time_printer, ())
    s.run()

if __name__ == "__main__":
    loop_monitor()

The main methods of the scheduler object:

enter(delay, priority, action, argument): schedule an event to run after delay time units.

cancel(event): remove the event from the queue. If the event is not currently in the queue, this method raises a ValueError.

run(): run all scheduled events. This function waits (using the delayfunc() passed to the constructor), then executes each event until there are no more scheduled events.
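A minimal sketch (not from the original article) showing cancel() in action with the standard-library API described above:

import sched
import time

s = sched.scheduler(time.time, time.sleep)
kept = s.enter(2, 1, print, ('this event runs',))
dropped = s.enter(1, 1, print, ('this event never runs',))
s.cancel(dropped)  # remove the second event before it fires
s.run()            # only the first event executes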

Personal comment: this is more convenient than threading.Timer, which has to be re-armed after every run (see the comparison sketch below).
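For comparison, a hedged sketch of the threading.Timer approach that comment refers to: a Timer fires only once, so the callback must re-arm it manually to repeat.

import datetime
import threading

def time_printer():
    now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print('do func time :', now)
    threading.Timer(5, time_printer).start()  # re-arm for the next run

threading.Timer(5, time_printer).start()  # first run after 5 seconds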

 

Use the scheduling module schedule to implement scheduled tasks

schedule is a lightweight third-party task scheduling module that can run jobs by the second, minute, hour, day, or on custom events. It lets users run Python functions (or other callables) periodically at predetermined intervals using a simple, human-friendly syntax.

Let's look at some code first. Can you tell what it does without reading the documentation?

import schedule
import time
def job():
    print("I'm working...")
schedule.every(10).seconds.do(job)
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)
while True:
    schedule.run_pending()
    time.sleep(1)

Decorator: decorate the function to be scheduled with @repeat()

import time
from schedule import every, repeat, run_pending

@repeat(every().second)
def job():
    print('working...')

while True:
    run_pending()
    time.sleep(1)

Pass parameters:

import time
import schedule

def greet(name):
    print('Hello', name)

schedule.every(2).seconds.do(greet, name='Alice')
schedule.every(4).seconds.do(greet, name='Bob')

while True:
    schedule.run_pending()
    time.sleep(1)

Decorators can also pass parameters:

import time
from schedule import every, repeat, run_pending

@repeat(every().second, 'World')
@repeat(every().minute, 'Mars')
def hello(planet):
    print('Hello', planet)

while True:
    run_pending()
    time.sleep(1)

Cancel task:

import time
import schedule

i = 0

def some_task():
    global i
    i += 1
    print(i)
    if i == 10:
        schedule.cancel_job(job)  # remove the job from the scheduler
        print('cancel job')
        exit(0)

job = schedule.every().second.do(some_task)

while True:
    schedule.run_pending()
    time.sleep(1)

Run the task once:

import time
import schedule

def job_that_executes_once():
    print('Hello')
    return schedule.CancelJob

schedule.every().minute.at(':34').do(job_that_executes_once)

while True:
    schedule.run_pending()
    time.sleep(1)

Retrieve tasks by tag:

import schedule

def greet(name):
    print('Hello {}'.format(name))

schedule.every().day.do(greet, 'Andrea').tag('daily-tasks', 'friend')
schedule.every().hour.do(greet, 'John').tag('hourly-tasks', 'friend')
schedule.every().hour.do(greet, 'Monica').tag('hourly-tasks', 'customer')
schedule.every().day.do(greet, 'Derek').tag('daily-tasks', 'guest')

schedule.get_jobs()  # retrieve all tasks
friends = schedule.get_jobs('friend')  # retrieve tasks tagged 'friend'
print(friends)

Cancel tasks based on tags:

import schedule

def greet(name):
    print('Hello {}'.format(name))
    if name == 'Cancel':
        schedule.clear('second-tasks')  # cancel all tasks tagged 'second-tasks'
        print('cancel second-tasks')

# schedule.clear() with no argument cancels all tasks
schedule.every().second.do(greet, 'Andrea').tag('second-tasks', 'friend')
schedule.every().second.do(greet, 'John').tag('second-tasks', 'friend')
schedule.every().hour.do(greet, 'Monica').tag('hourly-tasks', 'customer')
schedule.every(5).seconds.do(greet, 'Cancel').tag('daily-tasks', 'guest')

while True:
    schedule.run_pending()

Run a task until a certain time:

import schedule
from datetime import datetime, timedelta, time

def job():
    print('working...')

schedule.every().second.until('23:59').do(job)  # stop at 23:59 today
schedule.every().second.until('2030-01-01 18:30').do(job)  # stop at 2030-01-01 18:30
schedule.every().second.until(timedelta(hours=8)).do(job)  # stop after 8 hours
schedule.every().second.until(time(23, 59, 59)).do(job)  # stop at 23:59:59 today
schedule.every().second.until(datetime(2030, 1, 1, 18, 30, 0)).do(job)  # stop at 2030-01-01 18:30

while True:
    schedule.run_pending()

Run all tasks at once (mainly for testing):

import schedule

def job():
    print('working...')

def job1():
    print('Hello...')

schedule.every().monday.at('12:40').do(job)
schedule.every().tuesday.at('16:40').do(job1)

schedule.run_all()
schedule.run_all(delay_seconds=3)  # 3-second delay between tasks

Running tasks in parallel: schedule itself runs jobs sequentially in a single thread, so the example below hands each job off to its own thread using the threading module (a queue-based variant is sketched after it):

import threading
import time
import schedule

def job1():
    print("I'm running on thread %s" % threading.current_thread())

def job2():
    print("I'm running on thread %s" % threading.current_thread())

def job3():
    print("I'm running on thread %s" % threading.current_thread())

def run_threaded(job_func):
    job_thread = threading.Thread(target=job_func)
    job_thread.start()

schedule.every(10).seconds.do(run_threaded, job1)
schedule.every(10).seconds.do(run_threaded, job2)
schedule.every(10).seconds.do(run_threaded, job3)

while True:
    schedule.run_pending()
    time.sleep(1)
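If starting a new thread for every run is undesirable, the same effect can be achieved with a single worker thread plus Python's built-in queue module, the approach the original text alludes to; a minimal sketch:

import queue
import threading
import time
import schedule

jobqueue = queue.Queue()

def job():
    print("I'm working")

def worker_main():
    # a single worker thread executes queued jobs one by one
    while True:
        job_func = jobqueue.get()
        job_func()
        jobqueue.task_done()

schedule.every(10).seconds.do(jobqueue.put, job)  # enqueue the job every 10 seconds
threading.Thread(target=worker_main, daemon=True).start()

while True:
    schedule.run_pending()
    time.sleep(1)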

 

Use the task framework APScheduler to implement scheduled tasks

APScheduler (Advanced Python Scheduler) is a Python scheduled-task framework inspired by Quartz. It implements all of Quartz's features and is very convenient to use. It provides date-based, fixed-interval, and crontab-style tasks, and can persist jobs. With these features, we can easily build a Python scheduled task system.

It has the following three characteristics:

A Linux cron-style scheduler (with optional start/end times)

Interval-based execution (periodic scheduling, with optional start/end times)

One-shot tasks (run a task once at a set date/time)

APScheduler has four components:

Trigger: contains the scheduling logic. Each job has its own trigger, which determines when the job should run next. Beyond their initial configuration, triggers are completely stateless.

Job store: stores the scheduled jobs. The default job store simply keeps jobs in memory; the other job stores keep them in a database. A job's data is serialized when saved to a persistent job store and deserialized when loaded back. Schedulers cannot share a job store.

Executor: handles the execution of jobs, usually by submitting the callable specified in the job to a thread or process pool. When the job completes, the executor notifies the scheduler.

Scheduler: the remaining component. You usually have only one scheduler in an application, and application developers normally do not deal with job stores, executors, or triggers directly; instead, the scheduler provides the interface for handling them, so adding, modifying, and removing jobs, as well as configuring job stores and executors, is all done through the scheduler. By configuring the executor, jobstore, and trigger, it can run jobs on a thread pool (ThreadPoolExecutor, default size 20) or a process pool (ProcessPoolExecutor, default size 5), with at most 3 instances of the same job running concurrently by default (max_instances), giving full control over adding, deleting, modifying, and querying jobs.

Sample code:

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler

def my_job():
    # print the current time
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

sched = BlockingScheduler()
sched.add_job(my_job, 'interval', seconds=5, id='my_job_id')  # run my_job every 5 seconds
sched.start()
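Since the text above also mentions crontab-style and one-shot scheduling, here is a hedged sketch of those two trigger types, reusing the same my_job (illustrative ids; these calls would go before sched.start()):

sched.add_job(my_job, 'cron', hour=8, minute=0, id='daily_8am')  # crontab-style: every day at 08:00
sched.add_job(my_job, 'date', run_date='2030-01-01 18:30:00', id='one_shot')  # one-shot: run once at a set date/time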

 

Use the distributed messaging system Celery to implement scheduled tasks

Celery is a simple, flexible, and reliable distributed system for processing large volumes of messages. It provides operations teams with the tools needed to maintain such a system, and it can also be used for task scheduling. Celery's configuration is cumbersome, however; if you just need a lightweight scheduling tool, Celery is not a good choice.

Celery is a powerful distributed task queue that lets task execution run completely separately from the main program, even on other hosts. We usually use it to implement asynchronous tasks (async tasks) and scheduled tasks (crontab). Asynchronous tasks are time-consuming operations such as sending emails, uploading files, or processing images; scheduled tasks are tasks that need to run at a specific time.

Note that Celery itself provides no task storage; tasks must be stored somewhere when they are scheduled. Therefore, Celery has to be used together with a tool that can store and retrieve tasks, such as a message queue, a Redis cache, or a database. Officially, the message queue RabbitMQ is recommended, and Redis is sometimes a good choice too.

Celery's architecture adopts a typical producer-consumer model and mainly consists of the following parts:

Celery Beat: the task scheduler. The beat process reads the configuration and periodically sends the tasks that are due for execution to the task queue.

Producer: puts the tasks to be executed into the queue, usually triggered by users, triggers, or other operations, and hands them to the workers for processing. Task producers call the APIs, functions, or decorators provided by Celery to produce tasks and submit them to the task queue.

Broker: the message middleware, i.e. the task queue itself. Celery plays the roles of producer and consumer; the broker is the place (the queue) where producers and consumers store and fetch products.

Celery Worker: the consumer that executes tasks. It takes tasks from the queue and runs them. Usually multiple consumers are run on several servers to improve throughput.

Result Backend: saves the status information and results after a task has been processed, so they can be queried later. Celery supports Redis, RabbitMQ, MongoDB, Django ORM, SQLAlchemy, and other backends out of the box.

In a practical application, when a user initiates a request from the web front end, we simply drop the task to be processed into the task queue (the broker) and let an idle worker handle it; the result is stored temporarily in the backend database. Multiple worker processes can run simultaneously on one or several machines, processing tasks in parallel in a distributed fashion.
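A minimal sketch of a Celery periodic task, assuming a local Redis broker (the broker URL, the module name tasks.py, and the add task are illustrative, not from the original article):

# tasks.py
from celery import Celery
from celery.schedules import crontab

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

app.conf.beat_schedule = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,  # every 30 seconds
        'args': (16, 16),
    },
    'add-every-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=8, minute=0),  # every day at 08:00
        'args': (1, 2),
    },
}

Starting a worker (celery -A tasks worker --loglevel=info) and the scheduler (celery -A tasks beat) in separate processes lets beat enqueue the tasks on schedule while the worker executes them.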

Celery scheduled task examples:

Python Celery & RabbitMQ Tutorial

Celery configuration practice notes

 

Use the data flow tool Apache Airflow to implement scheduled tasks

Apache Airflow is a data workflow tool open-sourced by Airbnb that has since graduated from the Apache incubator to become a top-level Apache project. It supports data ETL processes in a very flexible way and offers many plug-ins for functions such as HDFS monitoring and email notification. Airflow supports stand-alone and distributed modes, supports master-slave deployment and resource schedulers such as Mesos, scales very well, and is used by a large number of companies.

Airflow is developed in Python. It uses DAGs (Directed Acyclic Graphs) to express the tasks in a workflow along with the relationships and dependencies between them. For example, consider a workflow in which T1 must finish before T2 and T3 can run, and both T2 and T3 must finish before T4 can run (sketched below).
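A hedged sketch of that dependency graph as an Airflow DAG (the dag_id, schedule, and bash commands are illustrative; the import paths assume Airflow 2.x):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='example_t1_t2_t3_t4',
    start_date=datetime(2023, 1, 1),
    schedule_interval='@daily',
    catchup=False,
) as dag:
    t1 = BashOperator(task_id='t1', bash_command='echo T1')
    t2 = BashOperator(task_id='t2', bash_command='echo T2')
    t3 = BashOperator(task_id='t3', bash_command='echo T3')
    t4 = BashOperator(task_id='t4', bash_command='echo T4')

    t1 >> [t2, t3] >> t4  # T2/T3 depend on T1; T4 depends on both T2 and T3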

Airflow provides a variety of Operator implementations to handle different kinds of tasks:

BashOperator – executes bash commands or scripts.

SSHOperator – executes bash commands or scripts on a remote host (similar to the paramiko module).

PythonOperator – executes a Python function.

EmailOperator – sends email.

HTTPOperator – sends an HTTP request.

MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. – execute SQL tasks.

DockerOperator, HiveOperator, S3FileTransformOperator, PrestoToMySqlOperator, SlackOperator…

In addition to the Operators above, you can also easily write custom Operators to meet specific task requirements.

In some cases, we need to run different downstream tasks depending on an execution result, which makes the workflow branch. This requirement can be implemented with BranchPythonOperator, sketched below.
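A minimal sketch, assuming Airflow 2.x import paths and an illustrative branching condition (all names here are hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import BranchPythonOperator
from airflow.operators.bash import BashOperator

def choose_branch(**context):
    # return the task_id of the branch that should run
    return 'task_a' if datetime.now().hour < 12 else 'task_b'

with DAG(dag_id='example_branch', start_date=datetime(2023, 1, 1),
         schedule_interval='@daily', catchup=False) as dag:
    branch = BranchPythonOperator(task_id='branch', python_callable=choose_branch)
    task_a = BashOperator(task_id='task_a', bash_command='echo A')
    task_b = BashOperator(task_id='task_b', bash_command='echo B')

    branch >> [task_a, task_b]  # only the chosen branch runs; the other is skipped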



Source: blog.csdn.net/Rocky006/article/details/130685687