A Detailed Guide to Scheduled Tasks in Python with the APScheduler Library

In day-to-day work you often need tasks that run periodically. One way is to combine the crond daemon that ships with Linux with command-line scripts; another is to do the scheduling directly in Python.

When a program has to run at regular intervals, or a task has to repeat on a fixed cycle, you need a scheduled task. The scheduling approaches commonly used in Python fall into the following 8 methods:

  1. while True: + sleep()
  2. threading.Timer
  3. The Timeloop library
  4. The sched scheduling module
  5. The schedule scheduling module
  6. The APScheduler task framework
  7. Celery, a distributed task queue, running periodic tasks
  8. The Windows built-in Task Scheduler

For details on all eight methods, see the blog post "Detailed explanation of 8 ways to implement scheduled tasks in python".

This article focuses on the sixth method: how to use the APScheduler task framework.

1. Introduction to APScheduler

APScheduler is a Python framework for running periodic or one-off scheduled tasks. Besides adding and removing jobs at runtime, it can store jobs in a database so that they persist across restarts, which makes it very convenient to use.

APScheduler, short for Advanced Python Scheduler, runs specified jobs according to specified time rules. Its development was heavily influenced by the Java scheduler Quartz, and it provides most of Quartz's major features. It supports date-based, fixed-interval and crontab-style scheduling, and jobs can be persisted.

2. APScheduler library installation

First install the apscheduler library:

pip install apscheduler


3. APScheduler components

  • Trigger: contains the scheduling logic. Each job has its own trigger, which determines when the job should run next. Apart from their initial configuration, triggers are completely stateless.
  • Job store: stores the scheduled jobs. The default job store simply keeps jobs in memory; other job stores save them in a database. A job's data is serialized when it is saved to a persistent job store and deserialized when it is loaded back. Job stores must never be shared between schedulers.
  • Executor: handles running the jobs, typically by submitting the job's callable to a thread pool or a process pool. When the job finishes, the executor notifies the scheduler.
  • Scheduler: ties the other components together. An application usually runs only one scheduler, and the developer normally does not deal with job stores, executors or triggers directly; the scheduler provides the interface for all of them. Job stores and executors are configured through the scheduler, and jobs are added, modified and removed through it as well.

3.1 Trigger

Contains the scheduling logic. Each job has its own trigger, which determines when the job should run next. Apart from their initial configuration, triggers are completely stateless.

APScheduler has three built-in triggers:

  • date: fires once, at a specific point in time
  • interval: fires repeatedly at a fixed time interval
  • cron: fires periodically at specific times (for example, a given time of day on given weekdays), in the style of crontab

Put simply, the trigger encodes the rule you choose for a job: whether it fires by time interval or by cron expression, and under which conditions. Every job has its own trigger.

3.2 Job store

If your application recreates its jobs every time it starts, the default job store (MemoryJobStore) is enough. If jobs must survive a scheduler restart or an application crash, choose a persistent job store that fits your environment, for example MongoDBJobStore or SQLAlchemyJobStore (the latter supports most RDBMSs).

In other words, a job store is where jobs are kept. Jobs live in memory by default but can also be saved to various databases. A job is serialized when it is stored and deserialized when it is loaded back, so it can continue to be scheduled.

3.3 Executor

Executors are configured when the scheduler is created, and more can be added later with the scheduler's add_executor() method.

Each executor is registered under an alias. A job records the alias of the executor it should use (the executor argument of add_job(), "default" if omitted). When the job is due, the scheduler looks up the executor object by that alias and lets it run the job.

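Which executor to use depends on the scheduler and the workload: BlockingScheduler and BackgroundScheduler use a thread pool executor by default, a process pool executor suits CPU-bound jobs, and AsyncIOScheduler uses an asyncio-based executor by default.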

An executor runs a job by submitting the job's callable to a thread pool or process pool, and notifies the scheduler when the job has finished.

3.4 Scheduler

The scheduler is the core of APScheduler, and all the other components are configured through it. Once started, it runs jobs according to their configuration. It wakes up at the next fire time computed from the triggers of all its jobs, and it also re-evaluates the schedule whenever job information changes (for example, when a job is added, modified or removed).

Pick the scheduler class that matches your application: for example, AsyncIOScheduler for asyncio applications and TornadoScheduler for Tornado applications.

The scheduler is the overall coordinator: it directs the job stores, executors and triggers, and it handles adding and removing jobs. An application usually has only one scheduler, and developers rarely manipulate triggers, job stores or executors directly, because the scheduler does that for them.

4. Two common schedulers

APScheduler offers several scheduler types; BlockingScheduler and BackgroundScheduler are the two most commonly used. What is the difference between them? Put simply, BlockingScheduler blocks the thread that starts it, while BackgroundScheduler does not. Choose between them as follows:

  • BlockingScheduler: start() blocks the current thread. Use it when the scheduler is the only thing running in your application (see the example in 4.1).
  • BackgroundScheduler: start() returns immediately and jobs run in a background thread. Use it when you are not running under any other framework and want the scheduler to run in the background of your application.

4.1 BlockingScheduler

Sample code:

import time
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def job():
    print('job 3s')
 
 
if __name__ == '__main__':
 
    sched = BlockingScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()  # blocks here: the while loop below is never reached
 
    while True:
        print('main 1s')
        time.sleep(1)

Running results: "job 3s" is printed every 3 seconds, and "main 1s" never appears.
As the example shows, BlockingScheduler blocks the current thread once start() is called, so the while loop in the main program is never executed.

4.2 BackgroundScheduler

Sample code:

import time
from apscheduler.schedulers.background import BackgroundScheduler
 
 
def job():
    print('job 3s')
 
 
if __name__ == '__main__':
 
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()  # returns immediately; jobs run in a background thread
 
    while True:
        print('main 1s')
        time.sleep(1)

Running result: "main 1s" is printed every second, and "job 3s" appears every 3 seconds in between.
The example also shows that job() does not run as soon as start() is called; its first run comes only after the first 3-second interval has elapsed. The simplest way to make the job run right away is to call job() once yourself before starting the scheduler.

4.3 What happens if a job runs longer than its interval?

Sample code:

import time
from apscheduler.schedulers.background import BackgroundScheduler
 
 
def job():
    print('job 3s')
    time.sleep(5)
 
 
if __name__ == '__main__':
 
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()
 
    while True:
        print('main 1s')
        time.sleep(1)

Running result (APScheduler logs this warning):
Execution of job "job (trigger: interval[0:00:03], next run at: 2022-12-21 07:04:52 MST)" skipped: maximum number of running instances reached (1)

As the log shows, when the next 3-second mark arrives while the previous run is still in progress, APScheduler does not start a second instance of the job. It skips that run and schedules job() again at the next cycle, 3 seconds later.

To allow several instances of job() to run at the same time, set max_instances. In the following example it is set through the scheduler's job_defaults, so that up to 2 instances of job() can run concurrently:

Example code:

import time
from apscheduler.schedulers.background import BackgroundScheduler
 
 
def job():
    print('job 3s')
    time.sleep(5)
 
 
if __name__ == '__main__':
 
    job_defaults = {'max_instances': 2}
    sched = BackgroundScheduler(timezone='MST', job_defaults=job_defaults)
    sched.add_job(job, 'interval', id='3_second_job', seconds=3)
    sched.start()
 
    while True:
        print('main 1s')
        time.sleep(1)


The job_defaults setting above applies to every job of the scheduler. max_instances can also be set for a single job:

Sample code:

import time
from apscheduler.schedulers.background import BackgroundScheduler
 
 
def job():
    print('job 3s')
    time.sleep(5)
 
 
if __name__ == '__main__':
 
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3, max_instances=2)
    sched.start()
 
    while True:
        print('main 1s')
        time.sleep(1)


4.4 How is each job scheduled?

Will the job() function be scheduled to run as a process or as a thread?

Example code:

import time
import os
import threading
from apscheduler.schedulers.background import BackgroundScheduler
 
 
def job():
    print('job 3s')
    print('job thread_id-{0}, process_id-{1}'.format(threading.get_ident(), os.getpid()))
    time.sleep(5)
 
 
if __name__ == '__main__':
 
    sched = BackgroundScheduler(timezone='MST')
    sched.add_job(job, 'interval', id='3_second_job', seconds=3, max_instances=2)
    sched.start()
 
    while True:
        print('main 1s')
        time.sleep(1)

Running results: every run of job() prints the same process ID, while the thread IDs differ. So job() is executed in worker threads of a single process rather than in separate processes, because the default executor of BackgroundScheduler is a thread pool.

5. Usage details

Parameter description (the main arguments of add_job()):

  • id: the unique ID of the job
  • name: the name of the job
  • trigger: the trigger that determines when the job runs. From the trigger's rules the scheduler computes the job's next run time and runs the job when that time arrives.
  • executor: the alias of the executor that runs the job. When the job is due, the scheduler uses this alias to look up the executor, which then calls the job's function.
  • max_instances: the maximum number of instances of the job that may run at the same time (default: 1). Before submitting a run, the executor counts the running instances of the job by its ID and skips the run if the limit has been reached.
  • next_run_time: the job's next run time. You can pass a datetime when creating the job; if omitted, it is computed from the trigger.
  • misfire_grace_time: how many seconds late a run may start and still be executed (default: 1). For example, if a job is planned for 21:00:00 but, because of a service restart or a similar reason, is only picked up at 21:00:31, it still runs if misfire_grace_time is 40; otherwise that run is discarded.
  • coalesce: a bool that says whether missed runs are merged into one (default: True). For example, if the scheduler is stopped for 20 seconds and the job's trigger fires every 5 seconds, the job misses 4 runs. With coalesce=True they are merged into a single run; with False each is run in turn, as long as it is still within misfire_grace_time.
  • func: the function the job executes
  • args: positional arguments passed to func
  • kwargs: keyword arguments passed to func

5.1 interval trigger

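Fires at a fixed time interval. The interval trigger accepts the following parameters:
  • weeks (int): number of weeks to wait
  • days (int): number of days to wait
  • hours (int): number of hours to wait
  • minutes (int): number of minutes to wait
  • seconds (int): number of seconds to wait
  • start_date (datetime|str): starting point for the interval calculation
  • end_date (datetime|str): latest possible date/time to trigger on
  • timezone (datetime.tzinfo|str): time zone to use for the date/time calculations
  • jitter (int|None): delay each run by at most this many seconds
Sample code: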

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def task():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts)
 
 
def task2():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts + ' 666!')
 
 
def task3():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts + ' 888!')
 
 
def func():
    # Create a BlockingScheduler
    scheduler = BlockingScheduler()
    # Run task every 3 seconds
    scheduler.add_job(task, 'interval', seconds=3, id='test_job1')
    # Run task2 every 5 seconds
    scheduler.add_job(task2, 'interval', seconds=5, id='test_job2')
    # Run task3 every 6 seconds, but only between 2022-10-27 21:53:00 and 2022-10-27 21:53:30
    scheduler.add_job(task3, 'interval', seconds=6, start_date='2022-10-27 21:53:00', end_date='2022-10-27 21:53:30', id='test_job3')
    # Run task every hour; jitter adds a random delay of up to 20 seconds to each run,
    # which keeps several servers from all firing at the same moment
    scheduler.add_job(task, 'interval', hours=1, jitter=20, id='test_job4')
    scheduler.start()
 
 
func()


5.2 date trigger

date is the most basic kind of scheduling: the job runs only once, at a specific point in time. Its parameters are:
  • run_date (datetime|str): the date/time to run the job at
  • timezone (datetime.tzinfo|str): time zone for run_date if it doesn't have one already
Note: run_date can be a date, a datetime, or a string.

Sample code:

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def task():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts)
 
 
def task2():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts + '666!')
 
 
def func():
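    # Create a BlockingScheduler
    scheduler = BlockingScheduler()
    # Each job runs once at its run_date. The dates must lie in the future: a run_date that has
    # already passed (by more than misfire_grace_time) is logged as missed and the job never runs.
    scheduler.add_job(task, 'date', run_date=datetime(2022, 10, 27, 21, 39, 0), id='test_job1')
    scheduler.add_job(task2, 'date', run_date=datetime(2022, 10, 27, 21, 39, 50), id='test_job2')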
    scheduler.start()
 
 
func()


5.3 cron trigger

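Fires periodically at specific times, in a style similar to Linux crontab. It is the most powerful of the built-in triggers.

cron parameters:
  • year (int|str): 4-digit year
  • month (int|str): month (1-12)
  • day (int|str): day of the month (1-31)
  • week (int|str): ISO week (1-53)
  • day_of_week (int|str): number or name of the weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
  • hour (int|str): hour (0-23)
  • minute (int|str): minute (0-59)
  • second (int|str): second (0-59)
  • start_date (datetime|str): earliest possible date/time to trigger on (inclusive)
  • end_date (datetime|str): latest possible date/time to trigger on (inclusive)
  • timezone (datetime.tzinfo|str): time zone to use for the date/time calculations
  • jitter (int|None): delay each run by at most this many seconds

Unlike crontab, fields you don't need can be omitted. Fields larger than the smallest field you set default to *, while smaller fields default to their minimum value, except week and day_of_week, which default to *.

Expressions accepted in the fields:
  • * (any field): fire on every value
  • */a (any field): fire every a values, starting from the minimum
  • a-b (any field): fire on any value within the a-b range (a must be smaller than b)
  • a-b/c (any field): fire every c values within the a-b range
  • xth y (day): fire on the x-th occurrence of weekday y within the month
  • last x (day): fire on the last occurrence of weekday x within the month
  • last (day): fire on the last day of the month
  • x,y,z (any field): fire on any matching expression; any number of the above expressions can be combined

Example code 1: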

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def task():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts)
 
 
def task2():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts + ' 666!')
 
 
def task3():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts + ' 888!')
 
 
def func():
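    # Create a BlockingScheduler
    scheduler = BlockingScheduler()
    # Run task at 00:00, 01:00, 02:00 and 03:00 on every Tuesday and Wednesday in January-March and July-September
    # (day_of_week counts from 0 = Monday, so '1-2' means Tuesday and Wednesday)
    scheduler.add_job(task, 'cron', month='1-3,7-9', day_of_week='1-2', hour='0-3', id='test_job1')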
    scheduler.start()
 
 
func()

Sample code 2: [Note: in day_of_week, the numbers 0-6 stand for mon-sun, so 0 is Monday, whereas in Linux crontab 0 means Sunday]

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def task():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts)
 
 
def func():
    # Create a BlockingScheduler
    scheduler = BlockingScheduler()
    # Run task every Wednesday at 23:02
    scheduler.add_job(task, 'cron', day_of_week='2', hour='23', minute='2')
    scheduler.start()
 
 
if __name__ == '__main__':
    func()


Sample code 3: [Note: day_of_week can cover several days, and hour, minute and second can be given as strings or as integers]

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler
 
 
def task():
    now = datetime.now()
    ts = now.strftime("%Y-%m-%d %H:%M:%S")
    print(ts)
 
 
def func():
    # Create a BlockingScheduler
    scheduler = BlockingScheduler()
    # Run task every Tuesday and Wednesday at 23:11:00
    scheduler.add_job(task, 'cron', day_of_week='1-2', hour='23', minute='11', second='00')
    # Run task every Tuesday and Wednesday at 23:11:03
    scheduler.add_job(task, 'cron', day_of_week='1-2', hour=23, minute=11, second=3)
    # Run task every Wednesday and Thursday at 23:11:05
    scheduler.add_job(task, 'cron', day_of_week='2-3', hour='23', minute='11', second='05')
    # Run task every Wednesday and Thursday at 23:11:08
    scheduler.add_job(task, 'cron', day_of_week='2-3', hour=23, minute=11, second=8)
    scheduler.start()
 
 
if __name__ == '__main__':
    func()



Source: blog.csdn.net/u012856866/article/details/132365438