Celery-UserGuide-Tasks

Tasks are the building blocks of Celery applications.
A task is a class that can be created out of any callable. It performs a dual role: it defines both what happens when the task is called (a message is sent) and what happens when a worker receives that message.
Every task class has a unique name, and this name is included in the message so that the worker can find the right function to execute.
A task message is not removed from the queue until it has been acknowledged by a worker. A worker can reserve many messages in advance, and even if the worker is killed by a power failure or for some other reason, the message will be redelivered to another worker.
Ideally, task functions should be idempotent: calling the function several times with the same arguments should have no unintended effect. Because the worker cannot detect whether your tasks are idempotent, the default behavior is to acknowledge the message in advance, just before it is executed, so that a task invocation that has already started is never executed again.
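The idea of idempotency can be sketched in plain Python. The operation-id bookkeeping below is a hypothetical stand-in for something like the correlation_id mentioned later in this chapter, not Celery API:

```python
def apply_credit(balances, seen_ops, op_id, account, amount):
    """Apply a credit exactly once, keyed by a unique operation id.

    Redelivering the same message (same op_id) has no further effect,
    which is what makes the operation safe to run more than once.
    """
    if op_id in seen_ops:          # message was already processed
        return balances[account]
    seen_ops.add(op_id)
    balances[account] = balances.get(account, 0) + amount
    return balances[account]

balances, seen = {}, set()
apply_credit(balances, seen, 'op-1', 'alice', 10)
apply_credit(balances, seen, 'op-1', 'alice', 10)  # redelivered: ignored
# balances == {'alice': 10}
```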
If your task is idempotent, you can set the acks_late option to have the worker acknowledge the message after the task returns instead.
Note that even when acks_late is set, the worker will still acknowledge the message if the child process executing the task is terminated (either by a signal or by calling sys.exit()).

This chapter describes how to define tasks.

Basics

You can create a task from any callable by decorating it with the task() decorator, for example:

from .models import User

@app.task
def create_user(username, password):
    User.objects.create(username=username, password=password)

Task options can be set by passing arguments to the decorator:

@app.task(serializer='json')
def create_user(username, password):
    User.objects.create(username=username, password=password)

The task decorator is available on the Celery application instance. If you are writing reusable apps with Django, or a library, you can use the shared_task() decorator instead:

from celery import shared_task

@shared_task
def add(x, y):
    return x + y

When using multiple decorators in combination, make sure the task decorator is applied last (which in Python means it must come first in the list):

@app.task
@decorator2
@decorator1
def add(x, y):
    return x + y

Task binding means that the first argument of the task is always the task instance itself (self), much like Python bound methods:

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(bind=True)
def add(self, x, y):
    logger.info(self.request.id)

Task inheritance: use the base argument of the task decorator to specify a custom base class for the task:

import celery

class MyTask(celery.Task):

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        print('{0!r} failed: {1!r}'.format(task_id, exc))

@app.task(base=MyTask)
def add(x, y):
    raise KeyError()

Name

Each task must have a unique name.
If no explicit name is given, the decorator generates one for you from (1) the module the task is defined in and (2) the name of the task function.
Specifying a name explicitly:

>>> @app.task(name='sum-of-two-numbers')
... def add(x, y):
...     return x + y

>>> add.name
'sum-of-two-numbers'

It is best to use the module name as a namespace, so the name won't collide with a task of the same name in another module:

>>> @app.task(name='tasks.add')
... def add(x, y):
...     return x + y

You can then tell tasks apart by their fully qualified names:

>>> add.name
'tasks.add'

In the following example the task is defined in a module named tasks.py, so tasks.add is automatically generated as the task name:
tasks.py

@app.task
def add(x, y):
    return x + y
>>> from tasks import add
>>> add.name
'tasks.add'

Automatic naming does not mix well with relative imports, so if you use relative imports you should set the task name explicitly. For example, if the client imports the module as '.tasks' while the worker imports it as 'myapp.tasks', the generated names won't match and the worker will raise a NotRegistered error.
The same problem can occur in Django when using the project.myapp style in INSTALLED_APPS:

INSTALLED_APPS = ['project.myapp']

If the app is installed as project.myapp, the tasks module must also be imported as project.myapp.tasks:

>>> from project.myapp.tasks import mytask   # << GOOD

>>> from myapp.tasks import mytask    # << BAD!!!

The second case gives the task a different name, because the client and the worker import the module under different names:

>>> from project.myapp.tasks import mytask
>>> mytask.name
'project.myapp.tasks.mytask'

>>> from myapp.tasks import mytask
>>> mytask.name
'myapp.tasks.mytask'

Task request

app.Task.request contains information and state about the currently executing task.
The request defines the following attributes:

id The unique id of the executing task.
group The unique id of the task's group, if this task is a member of one.
chord The unique id of the chord this task belongs to (if the task is part of the header).
correlation_id Custom id used for things like de-duplication.
args Positional arguments.
kwargs Keyword arguments.
origin Name of the host that sent this task.
retries How many times the current task has been retried, starting at 0.
is_eager Set to True if the task is executed locally in the client and not by a worker.
eta The original ETA of the task (in UTC).
expires The original expiry time of the task.
hostname Node name of the worker instance executing the task.
delivery_info Additional message delivery information. This is a mapping containing the exchange and routing key used to deliver this task; for example, app.Task.retry() uses it to resend the task to the same destination queue. The available keys depend on the message broker used.
reply_to The name of the queue to send replies back to.
called_directly Flag set to True if the task was not executed by a worker.
timelimit A tuple of the current (soft, hard) time limits active for this task.
callbacks A list of signatures to be called if this task returns successfully.
errbacks A list of signatures to be called if this task fails.
utc Set to True if the caller has UTC enabled.

Example:

@app.task(bind=True)
def dump_context(self, x, y):
    print('Executing task id {0.id}, args: {0.args!r} kwargs: {0.kwargs!r}'.format(self.request))

Setting bind=True makes the function a bound method, giving you access to the attributes and methods of the task type instance.

Log

The worker sets up logging automatically, or you can configure logging yourself.
Celery provides a special logger named celery.task; loggers that inherit from it automatically include the task name and unique id in their records.
The best practice is to create a common logger for all of your tasks at the top of the module:

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task
def add(x, y):
    logger.info('Adding {0} + {1}'.format(x, y))
    return x + y

Celery uses the standard Python logging library.
You can also use print(), or write to standard output/standard error in some other way: the output will be redirected to the logging system.

Argument checking

Celery verifies the arguments when you call the task:

>>> @app.task
... def add(x, y):
...     return x + y

# Calling the task with two arguments works:
>>> add.delay(8, 8)
<AsyncResult: f59d71ca-1549-43e0-be41-4e8821a83c0c>

# Calling the task with only one argument fails:
>>> add.delay(8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "celery/app/task.py", line 376, in delay
    return self.apply_async(args, kwargs)
  File "celery/app/task.py", line 485, in apply_async
    check_arguments(*(args or ()), **(kwargs or {}))
TypeError: add() takes exactly 2 arguments (1 given)

Set the typing attribute to False to disable argument checking for a task:

>>> @app.task(typing=False)
... def add(x, y):
...     return x + y

# Works locally, but the worker receiving the task will raise an error.
>>> add.delay(8)
<AsyncResult: f59d71ca-1549-43e0-be41-4e8821a83c0c>

With task protocol version 2 or higher, you can override how positional arguments and keyword arguments are represented in logs and monitoring events via the argsrepr and kwargsrepr calling arguments:

>>> add.apply_async((2, 3), argsrepr='(<secret-x>, <secret-y>)')

>>> charge.s(account, card='1234 5678 1234 5678').set(
...     kwargsrepr=repr({'card': '**** **** **** 5678'})
... ).delay()

Retry

Use app.Task.retry() to re-execute the task.
When retry is called, it sends a new message, using the same task id, to the queue the original task came from.
When a task is retried, its state is set to RETRY.
Example:

@app.task(bind=True)
def send_twitter_status(self, oauth, tweet):
    try:
        twitter = Twitter(oauth)
        twitter.update_status(tweet)
    except (Twitter.FailWhaleError, Twitter.LoginError) as exc:
        raise self.retry(exc=exc)

The bind=True argument gives the task access to the task instance (self).
The exc argument is used to pass exception information that's used in logs and when storing task results. If a result backend is configured, the task's exception and traceback can be accessed later.
If the task has a max_retries value and the maximum number of retries has been exceeded, the currently raised exception will be re-raised, unless:

  • the exc argument was not given, in which case MaxRetriesExceededError is raised instead;
  • there is no current exception, in which case the exception specified by exc is raised, e.g. self.retry(exc=Twitter.LoginError()).

Retry with a custom delay

You can make the task wait for a period of time before it is retried. The default delay is set by the default_retry_delay attribute, which is 3 minutes by default (the value is expressed in seconds).
Use the countdown argument of retry() to override this default:

@app.task(bind=True, default_retry_delay=30 * 60)  # retry in 30 minutes.
def add(self, x, y):
    try:
        something_raising()
    except Exception as exc:
        # overrides the default delay to retry after 1 minute
        raise self.retry(exc=exc, countdown=60)

Task decorator parameters

Arguments passed to the task decorator are ultimately set as attributes of the resulting task class.

General parameters

Task.name: The name the task is registered under. You can set this name manually, or it will be generated automatically.
Task.request: If the task is being executed, this contains information about the current request. Thread-local storage is used.
Task.max_retries: Only applies if the task calls self.retry, or if the task decorator is given the autoretry_for argument. It is the maximum number of retry attempts before the task fails; once exceeded, a MaxRetriesExceededError exception is raised.
Note that retry() must be called manually, as a task won't retry on exception by itself.
Task.throws: A tuple of expected exception classes. Raising an exception in this tuple is still reported as a task failure, but the worker won't log it as an error, and no traceback is kept.
Example:

@app.task(throws=(KeyError, HttpNotFound))
def get_foo():
    something()

Error types:

  • Expected errors (listed in Task.throws): logged with severity INFO, traceback excluded.
  • Unexpected errors: logged with severity ERROR, traceback included.
Task.default_retry_delay: The number of seconds to wait before retrying the task. Defaults to 3 minutes (180 seconds).
Task.rate_limit: Set a rate limit for this task type (limiting the number of tasks that can be run in a given time frame). When a rate limit is in effect, tasks will still complete, but it may take some time before they are allowed to start.
If set to None, no rate limit applies. If set to an integer or float, it is interpreted as "tasks per second".
Append "/s", "/m" or "/h" to specify tasks per second, minute or hour; for example, "100/m" enforces a minimum delay of 600 ms between any two tasks.
The default value is the task_default_rate_limit setting.
Note that this is a per-worker-instance rate limit, not a global one.
Task.time_limit: The hard time limit of the current task, in seconds.
Task.soft_time_limit: The soft time limit of the task.
Task.ignore_result: Don't store task state. If set to True, you can't check whether the task is ready or retrieve its result via AsyncResult.
Task.store_errors_even_if_ignored: When set to True, even if ignore_result is set to True, exceptions will still be retained.
Task.serializer: A string identifying the default serialization method. Defaults to the task_serializer setting. Can be pickle, json, yaml, or any custom serialization method registered with kombu.serialization.registry.
Task.compression: A string identifying the default compression scheme. Defaults to the task_compression setting. Can be gzip, bzip2, or any custom compression scheme registered with kombu.compression.registry.
Task.backend: The result store backend for this task: an instance of one of the backends in celery.backends. Defaults to app.backend, which is defined by the result_backend setting.
Task.acks_late: If set to True, the message for this task will be acknowledged after the task has been executed, rather than right before (the default behavior). This means the task may be executed more than once if the worker crashes in the middle of execution.
Task.track_started: If set to True, the task reports its status as "started" when executed by a worker. The default is False, since the normal behavior is not to report that level of granularity: tasks are either pending, finished, or waiting to be retried. The host name and process id of the worker executing the task are available in the state meta-data.

State

Celery keeps track of a task's current state, including the result of a successful task and the exception and traceback of a failed one.
During its lifetime a task passes through several states, and each state may carry different meta-data. When a task moves into a new state, the previous state is discarded.
There are also sets of states, such as the set of failure states and the set of ready states.
The client uses state membership to decide, for example, whether to re-raise an exception or whether a result can be cached.

Built-in state

PENDING

The task is waiting for execution, or is unknown (any task id that isn't known is implied to be in the pending state).

STARTED

The task has started executing. Not reported by default; requires app.Task.track_started to be enabled.
meta-data: the host name and process id of the worker executing the task.

SUCCESS

The task was executed successfully.
meta-data: the result returned by the task

FAILURE

The task execution failed.
meta-data: the exception raised, and the traceback.

RETRY

The task is retrying.
meta-data: The exception and the exception traceback that caused the retry.

REVOKED

The task has been revoked.

Custom status

A custom state only needs a unique name; state names are usually uppercase strings. As an example, see the ABORTED state defined by abortable tasks.
Use update_state() to update a task's state:

@app.task(bind=True)
def upload_files(self, filenames):
    for i, file in enumerate(filenames):
        if not self.request.called_directly:
            self.update_state(state='PROGRESS',
                meta={'current': i, 'total': len(filenames)})

This example defines a custom PROGRESS state, telling any application aware of this state that the task is currently in progress, and where it is in the process, by including the current and total counts in the state meta-data. This can be used, for example, to create progress bars.
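On the client side, that meta dict can be turned into a display string. render_progress here is a hypothetical helper; a real client would read the dict from AsyncResult(task_id).info when the state is PROGRESS (this requires a result backend):

```python
def render_progress(meta):
    """Format the custom PROGRESS meta-data as 'current/total (pct%)'."""
    current, total = meta['current'], meta['total']
    pct = 100 * current // total if total else 100
    return f'{current}/{total} ({pct}%)'

# A client polling the task might do something like:
#   result = upload_files.delay(filenames)
#   if result.state == 'PROGRESS':
#       print(render_progress(result.info))
```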

Origin blog.csdn.net/JosephThatwho/article/details/111313289