A Memory Leak in Celery: Problem and Solution

Problem Description

Our Celery workers execute asynchronous tasks on a timed loop. After about half a month of operation we received a memory alert: memory usage had been climbing slowly for a week.
Checking memory usage with the top command showed it was very high, as in the figures below.

[Figure: Celery memory usage, growing linearly over time]
[Figure: top output showing the Celery worker's memory usage]

Identifying the Problem

Debugging with the memory_profiler package showed that most of the memory growth in the asynchronous tasks occurred where HTTP requests were sent with the requests library. For example:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
5  23.94531 MiB  23.94531 MiB           1   @profile(precision=5)
6                                         def my_func(url):
7                                         
8  25.59375 MiB   1.64844 MiB           1       resp = requests.get(url)
9  25.59375 MiB   0.00000 MiB           1       print(resp)
10  25.59375 MiB   0.00000 MiB           1       return resp
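The profile above was produced with memory_profiler. A similar per-call measurement can be sketched with the standard library's tracemalloc; the function and sizes below are illustrative stand-ins, not from the original post:

```python
import tracemalloc

def my_func():
    # Stand-in for the requests.get call profiled above: allocate a buffer
    # roughly the size of a response body so the increment is visible.
    body = b"x" * (1 << 21)  # 2 MiB placeholder "response body"
    return len(body)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
size = my_func()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"my_func returned {size} bytes; peak traced increment ~{peak - before} bytes")
```

Unlike memory_profiler, tracemalloc needs no third-party install, which makes it convenient for a quick check inside a running worker.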

Resolution Process

Searching online for Celery memory leaks turned up a similar issue on the Celery project on GitHub: it describes memory leaking when Celery reconnects to RabbitMQ. That issue has not been resolved and is still open.

I also looked into memory leak reports for requests itself: memory allocated while making requests is sometimes not released after the request completes, and reusing a requests.Session() is commonly suggested as a fix.
However, requests.Session() is not guaranteed to be thread-safe. The usual advice is one Session per thread, but a Session does not automatically end when its thread ends, so this does not completely solve the problem.
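One common workaround for the thread-safety concern (a sketch of a well-known pattern, not from the original post) is to keep one Session per thread via threading.local, so the connection pool is reused within a thread but never shared across threads:

```python
import threading
import requests

_local = threading.local()

def get_session() -> requests.Session:
    # Lazily create one Session per thread: repeated calls from the same
    # thread return the same Session; other threads get their own.
    if not hasattr(_local, "session"):
        _local.session = requests.Session()
    return _local.session

def fetch(url: str) -> requests.Response:
    # All calls from the same thread share one pooled Session.
    return get_session().get(url, timeout=10)
```

Note that, as mentioned above, the per-thread Sessions are not closed automatically when their threads end, so long-lived thread pools are the natural fit for this pattern.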

This was clearly not the answer I was looking for. The official Celery documentation, under the Max Tasks per Child setting,
addresses exactly this case: if a task leaks memory and the leak is outside your control, you can use this option to configure the worker so that each child process is closed and replaced by a fresh one after executing a maximum number of tasks.

At present, because the program makes heavy use of the requests package, it is not feasible to quickly replace it or to find an approach with no memory leak at all (and, judging by reports online, urllib3, urllib, and requests can all leak memory).

So the stopgap solution is to use CELERYD_MAX_TASKS_PER_CHILD so that each worker child process is rebuilt after executing 30 tasks:

CELERYD_CONCURRENCY = 1  # number of concurrent worker processes
CELERYD_PREFETCH_MULTIPLIER = 1  # messages prefetched per process, times the concurrency (default 4); 1 disables prefetching, 0 lets the worker take as many messages as it wants
CELERYD_MAX_TASKS_PER_CHILD = 30  # max tasks a pool worker process runs before it is replaced by a new one (default: no limit)
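As a side note, Celery 4+ prefers lowercase setting names; the same mitigation in the newer style would look like this (a config sketch with the same values as above):

```python
# celeryconfig.py -- Celery 4+ lowercase equivalents of the settings above
worker_concurrency = 1
worker_prefetch_multiplier = 1
worker_max_tasks_per_child = 30
```

The old uppercase names still work via the compatibility layer, but mixing the two styles in one config is not allowed.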

But rebuilding processes is a workaround, not a real fix. How to solve the memory leak in requests still needs to be studied; I will leave that for later research.


  • From: xaohuihui

Origin blog.51cto.com/14612701/2543766