Every PHP developer has heard complaints like these: "The loading spinner never stops!" "Why is your site so slow?" When this happens, a request is stuck (blocked) somewhere in production. Most PHP developers are left completely in the dark, and then recall the famous saying: "the performance bottleneck is always in the database."
So they pass the blame to the DBA and rush off to look for slow SQL. This is the wrong approach, because many different things can cause a request to hang. Some common causes are listed below.
1. Infinite loop
The most common cause is code that contains an infinite loop.
Such code relies on a variable like `$condition` to exit the loop. If the logic is not checked rigorously, there are cases where `$condition` never becomes true, and the request hangs.
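The pattern can be sketched as follows. This is only an illustration: `waitUntil()` and its `$maxAttempts` cap are hypothetical names introduced here, not part of the original code. The cap is one simple way to make sure a condition that never becomes true cannot hang the request.

```php
<?php
// Hypothetical helper: poll $isDone until it returns true, but give up after
// $maxAttempts so that a condition that never holds cannot hang the request.
function waitUntil(callable $isDone, int $maxAttempts = 1000): bool
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        if ($isDone()) {
            return true;
        }
    }
    return false; // the condition never became true
}

// The unguarded form of the same loop would spin forever here:
//     while (true) { if ($condition) { break; } }
$neverTrue = function (): bool {
    return false;
};
var_dump(waitUntil($neverTrue, 10)); // bool(false)
```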
2. session_start() blocks
This is PHP's session lock wait (PS: many people call it a session deadlock, but it does not meet the definition of a deadlock), and I believe most PHP developers have run into it. By default, PHP stores session data in files whose names start with `/tmp/sess_`. When `session_start()` is called, it uses the `flock` system call to lock the session file. If the previous request in the same session has not finished, or has not released the lock manually, later requests cannot get the lock and hang in `session_start()`.
A typical scenario: front-end JavaScript periodically sends an Ajax request to a back-end PHP endpoint (`/ajax/doSomething`) that does something time-consuming. The author may take it for granted that the second request is unaffected even if the first has not finished, because there are many FPM processes and each request goes to a different one. What they do not realize is that the second request hangs in `session_start()`.
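The "manual release" mentioned above is done in PHP with `session_write_close()`. Below is a minimal sketch of a handler that releases the session lock before the slow work starts. `doSomethingSlow()` and the `user_id` session key are hypothetical stand-ins, and the sketch assumes the slow part does not need to write to the session.

```php
<?php
// Hypothetical stand-in for the time-consuming work behind /ajax/doSomething.
function doSomethingSlow(): string
{
    usleep(100000); // pretend to work for 0.1 s
    return 'done';
}

session_start();                         // takes the lock on the session file
$userId = $_SESSION['user_id'] ?? null;  // read whatever the request needs
session_write_close();                   // save the session and release the lock

// Other requests in the same session can now get past session_start()
// while this slow work is still running.
$result = doSomethingSlow();
```

After `session_write_close()`, changes to `$_SESSION` are no longer saved unless the session is started again, so this only fits handlers that finish their session writes early.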
3. flock() blocks
The most common scenario is writing logs. If the PHP code can guarantee that every `fwrite` call writes less than 8 KB of log content, the log can be written in append mode, where each write is atomic. If that cannot be guaranteed, the file has to be locked before each write so that two log entries are not interleaved.
If process A gets the lock and then blocks because of some problem, process B hangs at its `flock` call until process A is killed, at which point the system releases the file lock automatically.
Note that many other kinds of locks are not released automatically even when the process is killed.
The 8 KB limit can be changed, and many of the details differ from `fwrite` in glibc.
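A minimal sketch of the lock-then-write pattern described above. `appendLog()` is a hypothetical helper name, and the temporary file is only there to make the example self-contained.

```php
<?php
// Append one log entry under an exclusive lock so that entries written by
// concurrent processes are not interleaved.
function appendLog(string $path, string $entry): bool
{
    $fp = fopen($path, 'ab');
    if ($fp === false) {
        return false;
    }
    // Blocks here for as long as another process holds the lock on this file.
    if (!flock($fp, LOCK_EX)) {
        fclose($fp);
        return false;
    }
    $written = fwrite($fp, $entry . "\n");
    fflush($fp);          // flush buffered data before the lock is released
    flock($fp, LOCK_UN);
    fclose($fp);
    return $written !== false;
}

$logFile = tempnam(sys_get_temp_dir(), 'log');
appendLog($logFile, 'first entry');
appendLog($logFile, 'second entry');
echo file_get_contents($logFile);
```

If waiting is not acceptable, `flock($fp, LOCK_EX | LOCK_NB)` returns immediately with `false` when the lock is held elsewhere, and the caller can then decide how to proceed.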
4. Network client without a timeout
Clients such as MySQL, cURL and `Swoole\Client` can block the process if no timeout is set. When `Swoole\Client` establishes a TCP connection, its `connect` method takes a timeout parameter, and `-1` means never time out. Note that this setting applies not only to `connect` itself but also to every later `send` and `recv`. In the synchronous blocking model, if the peer machine goes down or the network is otherwise cut off, every `send` and `recv` call blocks and the business appears stuck.
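The effect of a read timeout can be shown with a small self-contained sketch. It uses PHP's built-in stream functions rather than `Swoole\Client`, so that it runs without extensions. The local server that never replies is a stand-in for a peer that has gone down, and the 1-second values are arbitrary.

```php
<?php
// A local server that accepts connections but never replies.
// Port 0 lets the operating system pick a free port.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
$address = stream_socket_get_name($server, false);

// Connect with a 1-second connect timeout.
$client = stream_socket_client('tcp://' . $address, $errno, $errstr, 1.0);
// Read timeout; without this call the read falls back to the
// default_socket_timeout ini setting.
stream_set_timeout($client, 1);

$start = microtime(true);
$data = fread($client, 1024);   // gives up after about 1 s instead of hanging
$timedOut = stream_get_meta_data($client)['timed_out'];
$elapsed = microtime(true) - $start;

var_dump($timedOut);
fclose($client);
fclose($server);
```

The same idea applies to the other clients: set both a connect timeout and a read/write timeout explicitly instead of relying on defaults.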
5. Using locks in Swoole coroutines
In Swoole coroutine mode, using a lock incorrectly can leave every coroutine stuck at once. Suppose two coroutines are created with `go` (readers unfamiliar with coroutines can think of this as creating two threads). The first coroutine acquires the lock and then yields the CPU at `co::sleep`. The second coroutine starts running and blocks when it tries to acquire the same lock. Because that call blocks the whole process, the first coroutine can never be resumed to release the lock.
How to find where a request is stuck
These are only a few examples; in real business code a request can hang in all sorts of places. Faced with such a problem, experienced PHP developers use the `strace -p` command to see which system call the PHP process is blocked in, and locate the problem from there. This approach has several problems:
- The location is unclear
  For example, with the lock problem above, strace only shows something like `futex(0x7f4c8d567128, FUTEX_WAIT, 2, NULL)`. This is not very intuitive: many people have no idea which PHP code triggers the `futex` system call. The same goes for the `session_start` problem mentioned earlier, where many people do not know that it triggers `flock`. In other words, it is hard to pin down the specific problem from a system call alone.
- You do not know which process to pass to -p
  A production environment often runs dozens or even hundreds of PHP processes. Some requests hang while others are fine, so which process should you `strace -p`? It seems to come down to luck.
- Infinite loops cannot be diagnosed
  strace works by tracing system calls, so in the first case above, an infinite loop, it yields no useful information at all. The only option then is to use `gdb` to find where the loop currently is:
  First, run `gdb attach` followed by the process ID.
  Then:
  `p (char *)executor_globals.current_execute_data.func.op_array.filename.val` prints the PHP file currently being executed.
  `p (char *)executor_globals.current_execute_data.func.op_array.function_name.val` prints the name of the function currently being executed.
  `p executor_globals.current_execute_data.opline.lineno` prints the line number currently being executed.
  The call stack can be obtained as well, but that is not covered here.
But this is very low-level and there are many details to watch out for. Without a solid knowledge of PHP internals, it is very hard to track down problems this way (PS: `.gdbinit` lowers the difficulty a little, but many other problems remain).
Using Swoole Tracker to find where a request is stuck
To address these problems, the Swoole team has released a solution: the stack tool in Swoole Tracker, which supports both FPM and Swoole.
It is very simple to use:
- First, click the link above and register an account.
- Then install the `swoole_tracker` extension.
- Finally, log in to the console, go to 调试器 (Debugger) => 进程列表 (Process list), and click the 堆栈 (Stack) button to see where the process is currently stuck, as shown in the figure.
Conclusion
Besides the hangs described above, there is another situation: calls that become slow. For example, a system call that used to take 5 ms now takes 100 ms to return because of the network or other reasons. The business becomes slower rather than stuck. The Tracker stack tool cannot locate this kind of problem, because each stall is very short and the call stack is hard to capture. This calls for another tool in the Swoole toolchain, the blocking I/O detection tool (阻塞IO检测工具), which we will cover later.