Using the Celery distributed task queue

I. Introduction

  Celery is a simple, flexible, and reliable distributed task queue written in Python. At its core it is a producer-consumer model: producers send tasks to a message queue and consumers (workers) process them. Celery focuses on real-time processing, but its support for scheduled tasks is also very good, and it can handle millions of tasks per day. Features:

  • Simple: Celery has a familiar workflow and simple configuration
  • Highly available: if the connection is lost or a failure occurs while a task is executing, Celery automatically attempts to re-execute the task
  • Fast: a single Celery process can handle millions of tasks per minute
  • Flexible: almost every Celery component can be extended and customized

Typical scenarios:

  1. Web applications: when a user triggers an operation on the site that takes a long time to complete, we can hand the operation off to Celery and return to the user immediately; Celery notifies the user once the work is done. This greatly improves the site's concurrency and the user experience.

  2. Batch jobs: for example, operations and maintenance scenarios where certain commands or batch tasks must run on hundreds of machines; Celery handles this easily.

  3. Scheduled tasks: scenarios such as periodically sending data or report notifications. Although Linux cron jobs can do this, they are not easy to manage, whereas Celery provides a management interface and a rich API.

 

II. Architecture and How It Works

  Celery consists of three parts: the message middleware (Broker), the task execution unit (Worker), and the result store (Backend).

  

How it works:

  1. The Task module includes both asynchronous tasks and periodic tasks. Asynchronous tasks are usually triggered by business logic and sent to the message queue, while periodic tasks are sent to the message queue periodically by the Celery Beat process;
  2. The Worker, the task execution unit, monitors the message queue in real time, fetches tasks from the queue, and executes them;
  3. The Worker stores the results of executed tasks in the Backend.

Broker (message middleware)

  Celery officially supports many broker options, including RabbitMQ, Redis, Amazon SQS, MongoDB, Memcached, and others; RabbitMQ is the officially recommended choice.
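
The examples in this article use Redis as the broker, but switching to the recommended RabbitMQ only means changing the BROKER_URL setting used later in config.py. A minimal sketch (the host, port, and credentials below are placeholders):

BROKER_URL = 'amqp://user:password@127.0.0.1:5672//'  # hypothetical RabbitMQ broker URL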

Worker (task execution unit)

  The Worker is the task execution unit; it is responsible for taking tasks from the message queue and executing them. One or more workers can be started, and they can be started on different machine nodes, which is the core of how Celery achieves distribution.
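
For example, a minimal sketch of running this in a distributed way: start one worker on each of two machines, both configured with the same broker (the node names after -n are illustrative; the startup command itself is explained later in this article):

celery worker -A project -l info -n worker1@%h   # on machine 1
celery worker -A project -l info -n worker2@%h   # on machine 2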

Backend (result store)

  The Backend, where task results are stored, also officially supports many options: RabbitMQ, Redis, Memcached, SQLAlchemy, Django ORM, Apache Cassandra, Elasticsearch, and more.

 

III. Installation

  Here I use Redis as the message middleware; for installing Redis you can refer to https://www.cnblogs.com/wdliu/p/9360286.html.

Celery installation: 

pip3 install celery

Basic usage

  Directory Structure:

project/
├── __init__.py  
├── config.py
└── tasks.py

Description of each file:

__init__.py: initializes Celery and loads the configuration file

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from celery import Celery
app = Celery('project')                                # create the Celery instance
app.config_from_object('project.config')               # load the configuration module

config.py: the Celery configuration file; for more configuration options see: http://docs.celeryproject.org/en/latest/userguide/configuration.html

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd

BROKER_URL = 'redis://10.1.210.69:6379/0'  # broker configuration; Redis is used as the message middleware

CELERY_RESULT_BACKEND = 'redis://10.1.210.69:6379/0'  # backend configuration; Redis is used here

CELERY_RESULT_SERIALIZER = 'json'  # result serialization scheme

CELERY_TASK_RESULT_EXPIRES = 60 * 60 * 24  # task result expiration time

CELERY_TIMEZONE = 'Asia/Shanghai'  # time zone configuration

CELERY_IMPORTS = (  # task modules to import; multiple modules can be specified
    'project.tasks',
)

tasks.py: task definition file

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd

from project import app
@app.task
def show_name(name):
    return name

Start the worker:

celery worker -A project -l debug

Meaning of each parameter:

  worker: the role to start; besides worker, other roles such as beat can also be started;

  -A: the application/project path; here it is my project directory

  -l: the log level; for more options run celery --help

Looking at the log output, you will see the tasks we defined and the related configuration:

 

  Although the worker has started, we still need to send tasks to it with delay or apply_async. Here we add a task interactively; it returns an AsyncResult object, and we obtain the result from that AsyncResult object, as in the sketch below:
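
A minimal sketch of adding the task from a Python shell, assuming the project layout above (run from the directory that contains the project package):

from project.tasks import show_name

res = show_name.delay('wd')                      # enqueue the task; returns an AsyncResult
# equivalently: res = show_name.apply_async(args=('wd',))
print(res.id)                                    # the task id
print(res.get(timeout=10))                       # block until the result is available
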

Besides the commonly used get method for obtaining the result, AsyncResult also provides the following common methods and attributes; see the sketch after this list:

  • state: returns the task state;
  • task_id: returns the task id;
  • result: returns the task result, same as the get() method;
  • ready(): returns whether the task has finished and a result is available: True if so, otherwise False;
  • info(): gets task information; by default this is the result;
  • wait(t): waits t seconds for the result; if the task has already finished, the result is returned immediately without waiting; if the task is still running, the call blocks for the wait period and then raises a timeout error;
  • successful(): returns True if the task succeeded, otherwise False.
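
A minimal sketch of inspecting these attributes, assuming the show_name task defined above:

from project.tasks import show_name

res = show_name.delay('wd')
print(res.task_id)       # the task id
print(res.state)         # e.g. PENDING, STARTED, SUCCESS
print(res.ready())       # True once the task has finished and a result is available
print(res.successful())  # True if the task finished without raising an exception
if res.ready():
    print(res.result)    # the same value that get() would return
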

IV. Advanced Usage

 Ordinary tasks may not satisfy every requirement, so it is worth knowing some advanced usage. Celery provides many scheduling facilities, such as task orchestration, performing different operations depending on task state, and retry mechanisms. The commonly used advanced features are described below.

Periodic and scheduled tasks

  Celery's periodic tasks are driven mainly by schedules; the beat component periodically sends the tasks to the worker for execution. In this example, create a new file period_task.py and add the tasks to the configuration file:

period_task.py:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from project import app
from celery.schedules import crontab

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10.0, add.s(1,3), name='1+3=')  # run add every 10 seconds
    sender.add_periodic_task(
        crontab(hour=16, minute=56, day_of_week=1),           # run sayhi every Monday at 16:56
        sayhi.s('wd'), name='say_hi'
    )
    )



@app.task
def add(x,y):
    print(x+y)
    return x+y


@app.task
def sayhi(name):
    return 'hello %s' % name

config.py

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd

BROKER_URL = 'redis://10.1.210.69:6379/0'  # broker configuration; Redis is used as the message middleware

CELERY_RESULT_BACKEND = 'redis://10.1.210.69:6379/0'  # backend configuration; Redis is used here

CELERY_RESULT_SERIALIZER = 'json'  # result serialization scheme

CELERY_TASK_RESULT_EXPIRES = 60 * 60 * 24  # task result expiration time

CELERY_TIMEZONE = 'Asia/Shanghai'  # time zone configuration

CELERY_IMPORTS = (  # task modules to import; multiple modules can be specified
    'project.tasks',
    'project.period_task',  # periodic tasks
)


We can observe the worker log:

Periodic and scheduled tasks can also be specified through the configuration file; in that case the configuration file looks like this:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd

from project import app
from celery.schedules import crontab

BROKER_URL = 'redis://10.1.210.69:6379/0'  # broker configuration; Redis is used as the message middleware

CELERY_RESULT_BACKEND = 'redis://10.1.210.69:6379/0'  # backend configuration; Redis is used here

CELERY_RESULT_SERIALIZER = 'json'  # result serialization scheme

CELERY_TASK_RESULT_EXPIRES = 60 * 60 * 24  # task result expiration time

CELERY_TIMEZONE = 'Asia/Shanghai'  # time zone configuration

CELERY_IMPORTS = (  # task modules to import; multiple modules can be specified
    'project.tasks',
    'project.period_task',
)

app.conf.beat_schedule = {
    'period_add_task': {    # scheduled task
        'task': 'project.period_task.add',     # task path
        'schedule': crontab(hour=18, minute=16, day_of_week=1),
        'args': (3, 4),
    },
    'add-every-30-seconds': {                  # run every 10 seconds
        'task': 'project.period_task.sayhi',   # task path
        'schedule': 10.0,
        'args': ('wd',)
    },
}

In this case period_task.py only needs to have its tasks registered with the worker, as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from project import app

@app.task
def add(x,y):
    print(x+y)
    return x+y


@app.task
def sayhi(name):
    return 'hello %s' % name

 Start the worker and beat in the same way (a sketch of the commands follows), and the result is the same as with the first approach. For more details see: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#crontab-schedules
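
A minimal sketch of the startup commands, assuming the project layout above and the same Celery 4.x command style used earlier in this article (run each in its own terminal):

celery worker -A project -l debug     # the task execution unit
celery beat -A project -l debug       # the scheduler that periodically sends tasks to the broker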

Task binding

Celery can bind a task to the task instance, giving access to the task's context. This lets us obtain the task's state while it is running, record related logs, and so on.

Modify period_task.py as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from project import app
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)
@app.task(bind=True)  # bind the task
def add(self,x,y):
    logger.info(self.request.__dict__)  # log the task context
    try:
        a=[]
        a[10]==1   # deliberately raise an IndexError
    except Exception as e:
        raise self.retry(exc=e, countdown=5, max_retries=3) # on error, retry every 5 seconds, at most 3 times
    return x+y

In the code above, the bind parameter binds the task; self refers to the task's context, through which we can get the task state, and the task is retried when it fails. Observe the log:
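
For reference, a minimal sketch of a few commonly used fields on the bound task's context (the task name show_context is illustrative, not part of the project above):

@app.task(bind=True)
def show_context(self):
    print(self.request.id)        # the current task id
    print(self.request.retries)   # how many times this task has been retried so far
    print(self.request.args)      # positional arguments the task was called with
    print(self.request.hostname)  # name of the worker node executing the task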

Built-in hook functions

  When executing tasks, Celery provides hook methods that run when a task reaches a given state. The Task source code provides many state hooks, such as on_success (runs after success), on_failure (runs on failure), on_retry (runs when the task is retried), and after_return (runs when the task returns). To use them we simply override these methods and implement the corresponding operations.

In the following example we continue to modify period_task.py and define three tasks to demonstrate what runs on task failure, retry, and success:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from project import app
from celery.utils.log import get_task_logger
from celery import Task

logger = get_task_logger(__name__)

class demotask(Task):

    def on_success(self, retval, task_id, args, kwargs):   # runs when the task succeeds
        logger.info('task id:{} , arg:{} , successful !'.format(task_id,args))

    def on_failure(self, exc, task_id, args, kwargs, einfo):  # runs when the task fails
        logger.info('task id:{} , arg:{} , failed ! errors : {}' .format(task_id,args,exc))

    def on_retry(self, exc, task_id, args, kwargs, einfo):    # runs when the task is retried
        logger.info('task id:{} , arg:{} , retry !  einfo: {}'.format(task_id, args, exc))

@app.task(base=demotask,bind=True)
def add(self,x,y):
    try:
        a=[]
        a[10]==1   # deliberately raise an IndexError
    except Exception as e:
        raise self.retry(exc=e, countdown=5, max_retries=1) # on error, retry every 5 seconds, at most once
    return x+y

@app.task(base=demotask)
def sayhi(name):
    a=[]
    a[10]==1
    return 'hi {}'.format(name)

@app.task(base=demotask)
def sum(a,b):
    return 'a+b={} '.format(a+b)

The configuration file config.py at this point:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd

from project import app
from celery.schedules import crontab

BROKER_URL = 'redis://10.1.210.69:6379/0'  # broker configuration; Redis is used as the message middleware

CELERY_RESULT_BACKEND = 'redis://10.1.210.69:6379/0'  # backend configuration; Redis is used here

CELERY_RESULT_SERIALIZER = 'json'  # result serialization scheme

CELERY_TASK_RESULT_EXPIRES = 60 * 60 * 24  # task result expiration time

CELERY_TIMEZONE = 'Asia/Shanghai'  # time zone configuration

CELERY_IMPORTS = (  # task modules to import; multiple modules can be specified
    'project.tasks',
    'project.period_task',
)

app.conf.beat_schedule = {
    'add': {          # run every 10 seconds
        'task': 'project.period_task.add',     # task path
        'schedule': 10.0,
        'args': (10,12),
    },
    'sayhi': {        # run every 10 seconds
        'task': 'project.period_task.sayhi',   # task path
        'schedule': 10.0,
        'args': ('wd',),
    },
    'sum': {          # run every 10 seconds
        'task': 'project.period_task.sum',     # task path
        'schedule': 10.0,
        'args': (1,3),
    },
}

Then restart the worker and beat, and check the log:

 A few crontab examples

# Run once every minute during every hour divisible by 2
crontab(hour='*/2')

# Run once at minute 0 of every third hour,
# i.e. at minute 0 of hours [0,3,6,9,12,15,18,21]
crontab(minute=0, hour='*/3')

# Run once at minute 0 of every third hour, and also of hours 8 through 12,
# i.e. at minute 0 of hours [0,3,6,9,12,15,18,21] plus [8,9,10,11,12]
crontab(minute=0, hour='*/3,8-12')

# Run once every minute of every day during the first month of each quarter;
# months range from 1 to 12, so every third month is [1,4,7,10]
crontab(month_of_year='*/3')

# Run once at 00:00 on every even-numbered day of the month
crontab(minute=0, hour=0, day_of_month='2-31/2')

# Run once at 00:00 on May 11 every year
crontab(0, 0, day_of_month='11', month_of_year='5')

 

 

Task orchestration

  In many cases a task is made up of multiple subtasks, or a task needs many steps to complete. Celery can handle this too; such tasks are built with the following primitives:

  • group: schedules tasks in parallel

  • chain: chains tasks so they run one after another

  • chord: similar to group, but split into a header and a body; the header can be a group of tasks, and when it finishes, the body task is called

  • map: maps arguments onto a task, calling the same task once per input argument

  • starmap: similar to map, but each input is unpacked into the task's arguments like *args

  • chunks: splits a long list of task arguments into groups of a given size

Modify tasks.py:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author:wd
from project import app

@app.task
def add(x,y):
    return x+y


@app.task
def mul(x,y):
    return x*y


@app.task
def sum(data_list):
    res=0
    for i in data_list:
        res+=i
    return res

group: a group of tasks; every task in the group runs in parallel

Create consumer.py in the directory that contains the project package, as follows:

from celery import group
from project.tasks import add,mul,sum
res = group(add.s(1,2),add.s(1,2))()  # tasks [1+2, 1+2]
while True:
    if res.ready():
        print('res:{}'.format(res.get()))
        break

Result:

 

chain: chained tasks

In a chain, by default the return value of the previous task is passed as an argument to the next task.

from celery import chain
from project.tasks import add,mul,sum
res = chain(add.s(1,2),add.s(3),mul.s(3))()  # task ((1+2)+3)*3
while True:
    if res.ready():
        print('res:{}'.format(res.get()))
        break
# result
# res:18

Chained tasks can also be written with the | operator; the task above can also be expressed as:

res = (add.s(1,2) | add.s(3) | (mul.s(3)))()
res.get()

chord: splits work into a header and a body; the body runs after the header has finished, and the header's return values are passed to the body as its argument

 

from celery import chord
from project.tasks import add,mul,sum
res = chord(header=[add.s(1,2),mul.s(3,4)],body=sum.s())()  # task (1+2)+(3*4)
while True:
    if res.ready():
        print('res:{}'.format(res.get()))
        break

# result:
# res:15
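
map and starmap are not demonstrated above; the following is a minimal sketch using the same tasks.py, under the assumption that each primitive runs as a single task whose result is the list of return values:

from project.tasks import add, sum

# map: each element of the list is passed as the single argument of sum
res = sum.map([list(range(3)), list(range(5))]).apply_async()
print(res.get())   # expected: [3, 10]

# starmap: each tuple is unpacked into the arguments of add, like *args
res = add.starmap(zip(range(5), range(5))).apply_async()
print(res.get())   # expected: [0, 2, 4, 6, 8]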

chunks: splits the task arguments into groups of a given size

from project.tasks import add,mul,sum
res = add.chunks(zip(range(5),range(5)),4)()  # 4 is the number of tasks in each group
while True:
    if res.ready():
        print('res:{}'.format(res.get()))
        break

Result:

 

 V. Management and Monitoring

  Celery's management and monitoring features are provided by the flower component. flower not only provides monitoring, but also an HTTP API for managing workers and tasks.

Installation and usage

pip3 install flower

Start

 flower -A project --port=5555
# -A: the project directory
# --port: the port to listen on

Then visit http://ip:5555

Using the API, for example to get worker information:

curl http://127.0.0.1:5555/api/workers

 
