Troubleshooting PHP memory leaks

Background

This document was written in 2008. I had just taken over as technical director of a website that received about 600,000 unique IPs and 3 million page views a day. The product was very complicated, the code structure was poor, and development engineers came and went, so the code was effectively read-only. Then one day the php-fpm processes suddenly started running out of memory, CPU usage soared, and 504 errors appeared frequently on the front end.

php-fpm process running out of memory

This is the legendary memory leak. A memory leak is when a process's memory usage keeps growing while it runs because memory is never released, leaving the system with less and less available memory.

Strictly speaking, this is not a fatal error. A memory leak only threatens long-running programs; a script that runs a single task and exits is not a concern. The simplest remedy is to restart the process at regular intervals. php-fpm has a max_requests setting (pm.max_requests in current versions) that defines how many requests a FastCGI process handles before it exits, so that the system can reclaim its memory. However, if memory usage grows very quickly, the processes have to be restarted so often that the stability of the service suffers, so the problem has to be tackled head-on.

Memory leak troubleshooting is very difficult

Because the codebase was very large, finding the leak by code review was practically impossible. PHP has no JVM-style runtime with official monitoring tools (comparable to Java's hprof, JVM Monitor, and so on), and searching the Internet turned up no answers on how to investigate and solve the problem.

The valgrind approach

Using valgrind to debug the php-cgi process
Valgrind is a widely used memory-debugging and code-analysis tool on Linux. It is very helpful for finding memory leaks in C/C++ programs. It works by instrumenting and counting calls to allocation functions such as malloc/free. Using valgrind on php-cgi requires a debug build of php-cgi. In practice this did not work:

  1. A debug build of php-cgi cannot be run in production at all; its performance is too poor

  2. Memory leaks in PHP programs come from things such as circular references or GC logic errors, which valgrind cannot detect. Valgrind is suited to checking whether the PHP interpreter itself leaks memory

Remarks: The core code of the PHP interpreter is called Zend Core, and it has its own mechanism for detecting memory leaks.

PHP's built-in check

To debug the php-cgi process, I read the php-cgi source code and found that Zend Core already implements its own memory-leak checking. But for the same reason as above, a debug build of php-cgi could not be run in production, so no debugging information could be obtained.

DTrace for FreeBSD

DTrace is the kernel-level debugger supported by FreeBSD. It can attach counting probes to every system function call, and Twitter has used it. In the end I did not use this method, for the following reasons:

  1. Finding a server, installing FreeBSD on it, and then deploying it in production or simulating load is very cumbersome
  2. I pored over the DTrace documentation and concluded that it is essentially an enhanced valgrind, so it would not solve our problem either

None of these three methods worked, and I was stuck. But consider the problem from another angle:

Although there is no convenient tool for tracking down memory leaks in PHP programs, web programs are naturally divided up by request: for each HTTP request, the corresponding PHP process executes one PHP file. A natural idea is to record the difference in the PHP process's memory usage before and after each HTTP request, then sort the results. The files that increase process memory the most are the ones most likely to be leaking.

There are two ways to measure the memory usage of a process

  1. The PHP built-in function memory_get_usage
    reads a counter inside Zend Core that reports how much memory Zend Core is using. But a PHP memory leak may itself be caused by a logic error in Zend Core, so memory_get_usage is not necessarily reliable

  2. On Linux, the file /proc/{$pid}/status records the running state of a process, and its VmRSS field
    records the resident physical memory (resident set size) used by the process, that is, the physical memory the process actually occupies. This figure is more reliable, and it is easy to extract in a program

With the idea settled, it was time to write the program.
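Reading VmRSS from /proc can be sketched in a few lines of shell. This is only an illustration of the second approach, not the code that was added to php-cgi; the function name vmrss_kb is made up for this example, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# vmrss_kb is a hypothetical helper name, not part of PHP or php-fpm.
# Print the resident set size (VmRSS) of a process in kB.
# Usage: vmrss_kb <pid>
vmrss_kb() {
    # The line of interest looks like: "VmRSS:      1234 kB"
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: resident memory of the current shell process
vmrss_kb "$$"
```

Sampling this value before and after a request and subtracting the two gives the per-request growth discussed above.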

I modified the php-cgi source code directly: I added the measurement code before and after the handling of each FastCGI request in main.c, wrote the results to a log file, recompiled, and deployed to production.
After it had run for 30 minutes, I executed

awk '{print $3 "\t" $7 "\t" $6 "\t" $4$5}' short.log | sort -r -n | head -n 100
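The layout of short.log is not shown here, so the column numbers in that command cannot be explained from this text. As a rough sketch of the same kind of analysis, assume a hypothetical log with one line per request: the script path, VmRSS before the request, and VmRSS after it (both in kB). The function name rank_by_growth and this three-column format are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical log format (an assumption, not the real short.log layout):
#   <script path> <VmRSS before, kB> <VmRSS after, kB>
# Sum the memory growth per script and list the largest totals first.
# Output columns: total growth in kB, number of requests, script path.
rank_by_growth() {
    awk '{ growth[$1] += $3 - $2; hits[$1]++ }
         END { for (s in growth) printf "%d\t%d\t%s\n", growth[s], hits[s], s }' "$1" |
        LC_ALL=C sort -k1,1nr | head -n 100
}

# Small made-up sample to show the output shape
sample=$(mktemp)
cat > "$sample" <<'EOF'
/index.php 20000 20040
/search.php 20040 21500
/index.php 21500 21510
/search.php 21510 23000
EOF
rank_by_growth "$sample"
# 2950	2	/search.php
# 50	2	/index.php
rm -f "$sample"
```

Summing per script rather than sorting raw lines keeps a single noisy request from dominating the list; scripts that grow the process on every request rise to the top.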

This made it easy to find the code files most likely to be leaking memory, and then to investigate further and refactor them. The fixes themselves were simple: do not load files that do not need to be loaded, and unset large arrays as soon as they are no longer needed... Done!

Afterword

Later I discovered that there had been no need to modify the PHP source code at all. The php.ini configuration file has two settings, auto_prepend_file and auto_append_file, which inject code before and after each request... What a tragedy.

The same idea can be used to optimize the performance of web programs, and there it is much simpler. You do not need to write any code: just add $request_time to the nginx log format, and use awk/sort to find the bottleneck.
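As a sketch of that awk/sort step, assume $request_time has been appended as the last field of nginx's combined log format, so the request path is field 7 and the time is the final field. The function name slowest_uris and this field layout are assumptions for the example; a different log_format needs different field numbers.

```shell
#!/bin/sh
# Assumed log line (combined format with $request_time appended at the end):
#   1.2.3.4 - - [date zone] "GET /path?query HTTP/1.1" 200 512 "-" "agent" 0.123
# Print average request time, request count, and path, slowest first.
slowest_uris() {
    awk '{ sub(/\?.*/, "", $7)          # drop the query string
           total[$7] += $NF; hits[$7]++ }
         END { for (u in total)
                   printf "%.3f\t%d\t%s\n", total[u] / hits[u], hits[u], u }' "$1" |
        LC_ALL=C sort -k1,1nr | head -n 20
}

# Small made-up sample to show the output shape
sample=$(mktemp)
cat > "$sample" <<'EOF'
1.2.3.4 - - [01/Jan/2008:00:00:00 +0800] "GET /a.php?x=1 HTTP/1.1" 200 512 "-" "Mozilla/5.0" 0.100
1.2.3.4 - - [01/Jan/2008:00:00:01 +0800] "GET /b.php HTTP/1.1" 200 512 "-" "Mozilla/5.0" 2.000
1.2.3.4 - - [01/Jan/2008:00:00:02 +0800] "GET /a.php HTTP/1.1" 200 512 "-" "Mozilla/5.0" 0.300
EOF
slowest_uris "$sample"
# 2.000	1	/b.php
# 0.200	2	/a.php
rm -f "$sample"
```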


Origin blog.csdn.net/victorc666/article/details/124561263