Foreword
A few days ago, I opened my mailbox in the morning to find a monitoring alert e-mail: the CPU load on a certain server was high, please investigate and resolve it as soon as possible. It had been sent just before dawn.
In fact, I dealt with a similar issue as early as last year and wrote it up in "A Practical Record of Troubleshooting 100% CPU in Production".
But the cause this time was different from last time, so read on.
Problem Analysis
After receiving the e-mail I immediately logged onto the server; the crime scene was still intact (the load was still high).
So I ran through my usual troubleshooting routine for this kind of problem once more.
First, `top -c` shows system resource usage in real time (the `-c` flag displays the full command line). Then press uppercase `P` to sort processes by CPU usage, highest first. Sure enough, the top entry was one of our Java applications.
This application simply runs some scheduled report jobs; the scheduler triggers them every morning, and under normal circumstances they finish within a few hours.
The second step is also routine: find out what the threads consuming the most CPU inside this application are actually doing.
Use `top -Hp pid` and press `P` again to sort this process's threads by CPU usage.
At this point, note down the busiest thread's ID and convert it to hexadecimal. Then generate a thread dump with `jstack pid > pid.log` and search the file for that hex ID to see what the CPU-hungry thread is doing.
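The decimal-to-hex conversion is trivial but easy to fumble at 6 a.m.; as a minimal sketch in Java (the thread ID 12345 is just an example value, and jstack prints native thread IDs as `nid=0x...`):

```java
public class TidToHex {
    // jstack reports each thread's native ID as a hex "nid" value
    public static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        // Example: thread ID 12345 from `top -Hp` becomes 0x3039
        System.out.println(toNid(12345L));
    }
}
```

Searching `pid.log` for the resulting `0x...` string lands you on the right thread's stack trace.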
If you find that tedious, I also strongly recommend arthas, Alibaba's open-source troubleshooting tool, for locating the problem.
For example, the operations above can be condensed into the single command `thread -n 3`, which prints stack snapshots of the three busiest threads. Very efficient.
For more arthas tutorials, please refer to the official documentation.
Because I forgot to take screenshots at the time, I'll go straight to the conclusion here:
the busiest thread (highlighted in green in the output) was a GC thread, which means it was busy doing garbage collection.
GC View
At this point in the investigation, experienced hands will already suspect that the application's memory usage is the problem.
So I ran `jstat -gcutil pid 200 50` (print every 200 ms, 50 times) to watch memory usage and GC activity.
From the output, the following could be seen:

- The Eden and old areas were almost full, so memory reclamation was clearly struggling.
- Full GC was firing at a high frequency: 8 full GCs within the 10-second window (866493 - 866485 = 8 over 50 samples at 200 ms).
- And it had been going on for a long time: the FGC counter showed roughly 80,000 full GCs in total.
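To make the arithmetic above concrete, here is a tiny sketch (the helper name is mine; the counter values are the ones read from the jstat output) that derives the full-GC rate from two samples of the FGC column:

```java
public class FgcRate {
    // Full GCs per second between two samples of jstat's FGC counter
    static double fgcPerSecond(long fgcStart, long fgcEnd, long intervalMs, int samples) {
        double windowSeconds = intervalMs * samples / 1000.0;
        return (fgcEnd - fgcStart) / windowSeconds;
    }

    public static void main(String[] args) {
        // 866485 -> 866493 over 50 samples at 200 ms: 8 FGCs in 10 s = 0.8/s
        System.out.println(fgcPerSecond(866485L, 866493L, 200L, 50));
    }
}
```

Nearly one full GC per second, sustained for hours, is a clear signal that the heap is being filled as fast as it can be collected.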
Memory Analysis
Since the initial diagnosis points to memory, we still need a heap snapshot to pin down the root cause.
The command `jmap -dump:live,format=b,file=dump.hprof pid` exports a heap snapshot file.
Then we turn to an analysis tool such as MAT (Eclipse Memory Analyzer).
Identifying the Problem
From the MAT view it was obvious: there was a huge string in memory, and it was referenced precisely by the thread running this scheduled task.
This string occupied roughly 258 MB, which is an enormous object for a single string.
So where did this string come from?
Looking at the reference chain and the string's content, it was not hard to see that it was an `insert` SQL statement.
Here I have to praise MAT: from a heap snapshot it can point out not only the likely problem spots but also the threads involved.
The snapshot finally led to the specific business code:
it calls a method that writes to the database, and that method splices together an `insert` statement whose `values` section is generated in a loop, roughly like this:
```xml
<insert id="insert" parameterType="java.util.List">
    insert into xx (files)
    values
    <foreach collection="list" item="item" separator=",">
        xxx
    </foreach>
</insert>
```
So once this list is large, the spliced SQL statement becomes very long.
And the memory analysis showed that this `List` was indeed very large, which is why the final `insert` statement occupied so much memory.
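As a rough sketch of why the spliced statement blows up (the row-rendering logic here is hypothetical, not the project's actual code), each element of the list appends another values tuple to the same ever-growing string:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SpliceDemo {
    // Builds one giant INSERT whose VALUES section grows linearly with the list
    static String buildInsert(List<String> files) {
        String values = files.stream()
                .map(f -> "('" + f + "')")
                .collect(Collectors.joining(","));
        return "insert into xx (files) values " + values;
    }

    public static void main(String[] args) {
        List<String> rows = IntStream.range(0, 100_000)
                .mapToObj(i -> "file-" + i)
                .collect(Collectors.toList());
        // 100k rows already yield a megabyte-scale statement held in one String
        System.out.println(buildInsert(rows).length());
    }
}
```

The whole statement lives in the heap as a single `String` (plus the intermediate buffers used to build it), which matches the 258 MB object seen in MAT.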
Optimization Strategy
Having found the cause, the fix follows naturally. There are a few directions:

- Control the size of the source `List`. This `List` is fetched from a data table, so it can be fetched in pages; the subsequent `insert` statements then shrink accordingly.
- Control the batch size of each write; essentially this also keeps the spliced `SQL` length down.
- Reassess the overall write efficiency.
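The batch-size direction can be sketched like this (the `writer` callback and batch size are illustrative, assuming plain JDK collections; the real code would hand each chunk to the MyBatis insert above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter {
    // Split the big list into fixed-size chunks so each spliced INSERT stays small
    static <T> void writeInBatches(List<T> all, int batchSize, Consumer<List<T>> writer) {
        for (int from = 0; from < all.size(); from += batchSize) {
            int to = Math.min(from + batchSize, all.size());
            writer.accept(all.subList(from, to)); // one bounded INSERT per chunk
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        // With batchSize = 4 this produces 3 writes: sizes 4, 4, 2
        writeInBatches(data, 4, batch -> System.out.println(batch));
    }
}
```

Each spliced statement is now bounded by `batchSize`, so memory use no longer scales with the total row count.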
Summary
The problem itself was not hard to fix; more time went into the analysis, which followed a fairly typical process worth summarizing:

- First locate the process consuming the CPU.
- Then locate the specific threads consuming the CPU.
- For memory problems, `dump` a snapshot and analyze it.
- Draw a conclusion, adjust the code, and verify the result.
Finally, here's hoping I receive no more production alerts.
Your likes and shares are the biggest support for me.