Manage scheduled tasks with Django and Celery


Simultaneously published on personal site: http://panzhixiang.cn/article/2023/3/16/68.html

1. Background introduction

We previously used Kubernetes (k8s) CronJobs to manage scheduled tasks: the code for each scheduled task was packaged separately into its own pod, and a CronJob triggered it.

Although this method is very simple to operate and has no dependencies on third-party resources (such as Redis), it also has an obvious disadvantage.

The scheduled-task code is separate from the Django code, so many Django features are unavailable to it. It can only communicate with the Django server through APIs built with DRF.
A single scheduled task sometimes needs many such APIs, and issues such as authentication also have to be handled, which is quite troublesome. So for a new project I decided to manage scheduled tasks a different way.

Engineers who use both Python and Django probably know Celery, a good framework for asynchronous tasks. I last used it in 2020, and I found that the way Celery is used has changed in recent years. I searched online and found no good Chinese-language material, so I wrote this post myself, hoping it will be of some help to anyone who needs to look this up in the future.

2. Celery configuration

Before configuring Celery, you need to install it with pip install celery.

Before getting into the configuration itself, we need to make a few assumptions so that the rest of the text is clearer.

Let's create a Django project with django-admin startproject proj. The Django version should be >=3.0. After successful creation, we get the following directory structure:

proj
├── manage.py
└── proj
    ├── asgi.py
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

Anyone familiar with Django will recognize this directory tree. Everything below is based on it, so keep it in mind.

1. Define the Celery instance

In order to define a Celery instance, a file needs to be created in the directory tree above: proj/proj/celery.py.
The file is named celery.py and sits in the same directory as settings.py.

The content is as follows. I have put the important points in the code as comments, so read them carefully.

import os
from celery import Celery


# Setting this avoids having to initialize the Django settings in each tasks.py.
# It is not strictly required, but it is strongly recommended.
os.environ.setdefault(
    'DJANGO_SETTINGS_MODULE', 'proj.settings'
)

# Read the Redis address from an environment variable; Redis is used as the broker here
REDIS_HOST = os.getenv('REDIS_HOST', 'localhost:6379')
app = Celery(
    'proj',  # the first argument gives the Celery instance a name, here proj
    backend='redis://' + REDIS_HOST + '/1',
    broker='redis://' + REDIS_HOST + '/0',
)

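# This method sets several Celery options at once.
# These few options are enough for typical scenarios.
# There are several other ways to configure Celery, but I think this one is enough for projects that are not very large.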
app.conf.update(
    task_serializer='json',
    accept_content=['json'],  # Ignore other content
    result_serializer='json',
    enable_utc=True,
)

# This line loads Celery settings from Django's settings file.
# namespace='CELERY' means that settings whose names start with "CELERY_" are treated as Celery settings.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Automatically discover tasks in all Django apps
app.autodiscover_tasks()


@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
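To make app.autodiscover_tasks() a little more concrete: with its default arguments, Celery looks for a module named tasks inside each installed Django app. The snippet below is only an illustration of that naming convention, not Celery's own implementation, and the app names orders and reports are hypothetical.

```python
def candidate_task_modules(app_packages, related_name="tasks"):
    """Return the module paths that task autodiscovery would look for.

    Illustration only: the real lookup happens inside Celery.
    """
    return [f"{package}.{related_name}" for package in app_packages]


# Hypothetical Django apps that live next to manage.py.
print(candidate_task_modules(["orders", "reports"]))
```

In other words, a task defined in orders/tasks.py should be found without being imported explicitly in celery.py.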

In addition to the above configuration, there are two other places that need to be configured.
The first is to add the following to proj/proj/__init__.py:

from .celery import app as celery_app


__all__ = ('celery_app',)

This makes sure the Celery app is loaded whenever Django starts.

The other is the Django settings file, where the Celery configuration needs to be added. These are the settings that app.config_from_object('django.conf:settings', namespace='CELERY') reads in the code above.

CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60  # maximum run time of a single task, in seconds
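As a rough illustration of what namespace='CELERY' means: a Django setting such as CELERY_TASK_TIME_LIMIT corresponds to Celery's lowercase task_time_limit option. The helper below only mimics that naming rule for demonstration. It is not Celery's code, and the settings dictionary is made up for the example.

```python
def celery_options(django_settings, namespace="CELERY"):
    """Pick out namespaced settings and show the option name each maps to."""
    prefix = namespace + "_"
    return {
        name[len(prefix):].lower(): value
        for name, value in django_settings.items()
        if name.startswith(prefix)
    }


# Hypothetical excerpt of a Django settings module.
settings = {
    "DEBUG": True,  # no CELERY_ prefix, so it is ignored
    "CELERY_TASK_TRACK_STARTED": True,
    "CELERY_TASK_TIME_LIMIT": 30 * 60,  # 1800 seconds
}
print(celery_options(settings))
```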

2. Record task results

When using Celery for task scheduling, it is best to record the result of every task for later reference. This matters most when a task does not run as expected.

The official documentation recommends using django-celery-results to record task results.

  1. Install
    pip install django-celery-results
  2. Register
    django-celery-results is a separate Django app, so you need to register it in settings.py
    INSTALLED_APPS = (
        ...,
        'django_celery_results',
    )
    
    After registration, migrate the database:
    python manage.py migrate django_celery_results
  3. Configure
    django-celery-results is just a package that stores task results automatically; the data itself still has to be stored somewhere. Many places can hold task results, such as a database, the local file system, or Redis. I use the database here, and I recommend doing the same.
    Add the following to Django's settings.py:
    CELERY_RESULT_BACKEND = 'django-db'  # use the database as the backend
    CELERY_CACHE_BACKEND = 'django-cache'  # To be honest, I don't know what this cache setting actually does, but the official documentation recommends it, so I kept it
    CELERY_CACHE_BACKEND = 'default'  # note: this later assignment overrides the line above
    
  4. Start the worker
    Note that this command should be run in the top-level proj directory (the one containing manage.py); otherwise an error will be reported saying that the configuration file cannot be found.
    celery -A proj worker --loglevel=INFO
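For reference, the result-related settings from step 3 can be sketched as a single settings.py fragment. Here 'default' is assumed to name an entry in Django's CACHES setting, as the official documentation describes. The DatabaseCache backend and the table name my_cache_table are placeholders chosen for illustration, not something this project requires.

```python
# settings.py (fragment)

# Store task results in the Django database via django-celery-results.
CELERY_RESULT_BACKEND = "django-db"

# Per the official docs, this can name a cache defined in CACHES below.
CELERY_CACHE_BACKEND = "default"

# Assumed cache definition; adjust the backend and location to your setup.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",  # placeholder table name
    }
}
```

If you do use DatabaseCache, Django needs the cache table to exist first, which python manage.py createcachetable takes care of.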
    

3. Scheduled task configuration

The previous sections covered how to configure Celery. Now that Celery is available, how do we manage scheduled tasks? This is where django-celery-beat comes in, and it is fairly simple to use.

1. Configure django-celery-beat

  1. Install
    pip install django-celery-beat
  2. Register
    Register it in Django's settings.py
    INSTALLED_APPS = (
        ...,
        'django_celery_beat',
    )
    
    Similarly, migrate the database after registration:
    python manage.py migrate django_celery_beat
  3. Start beat
    Note that this command should be run in the top-level proj directory (the one containing manage.py); otherwise an error will be reported saying that the configuration file cannot be found.
    celery -A proj beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
    

One thing I want to point out separately: many people who use django_celery_beat to manage scheduled tasks like to define the schedules in code, cron-style, but I prefer to configure them in the database through the Django Admin page.

When the schedule is defined in code, any later change to a scheduled task means modifying the code and redeploying it to the environment, which is not very friendly. For non-technical staff, the chance of being able to configure a scheduled task themselves is then almost zero.
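For comparison, the code-based approach typically looks something like the following in settings.py. This is a sketch only: the entry name and the task path myapp.tasks.cleanup are hypothetical, and the schedule is given as a plain number of seconds rather than a cron expression.

```python
# settings.py (fragment): a schedule hard-coded in the project
CELERY_BEAT_SCHEDULE = {
    "cleanup-every-hour": {  # hypothetical entry name
        "task": "myapp.tasks.cleanup",  # hypothetical task path
        "schedule": 60 * 60,  # run every 3600 seconds
    },
}
```

Changing the interval here means editing this file and redeploying, which is exactly the friction that managing schedules in the database avoids.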

2. Set up specific scheduled tasks through Django Admin

This part is relatively simple: start Django, log in to the Admin page, and click the create button on the page. It is not difficult, but writing it up would require a lot of screenshots, so I will skip it here.

4. Reference

  1. First Steps with Django
  2. Task result backend settings

Origin blog.csdn.net/u013117791/article/details/129584654