Troubleshooting High CPU Load in Production, Once Again

Foreword

A few days ago I opened my mailbox in the morning and found a monitoring alert email: the CPU load on a server at a certain IP was high, please investigate and resolve it as soon as possible. It had been sent just before dawn.

In fact, I dealt with a similar issue as early as last year and wrote it up in "A Production CPU 100% Troubleshooting and Optimization in Practice".

But the cause this time was different from last time, so read on.

Problem Analysis

After receiving the email I immediately logged on to the server and took a look at the scene of the crime (the load was still high).

So I went through the usual troubleshooting routine for this kind of problem once more.


First, top -c displays system resource usage in real time (the -c flag shows the full command line of each process).

Then press uppercase P to sort the processes by CPU usage; the one at the top is the process with the highest usage.

Sure enough, it was one of our Java applications.

This application simply runs some scheduled report jobs; the task scheduler triggers them in the early morning, and under normal circumstances they finish within a few hours.


The second step is routine as well: we need to find out what the threads consuming the most CPU in this application are actually doing.

Use top -Hp pid and press P again to sort that process's threads by CPU usage.

At this point we only need to note the busiest thread's ID and convert it to hexadecimal, generate a thread dump with jstack pid > pid.log, and then search the dump for that hexadecimal ID to see what the CPU-hungry thread is doing.

If that feels cumbersome, I also strongly recommend arthas, Alibaba's open-source troubleshooting tool, for locating the problem.

For example, the operations above can be condensed into a single command, thread -n 3, which prints snapshots of the three busiest threads; very efficient.

For more about arthas, please refer to the official documentation.

Since I forgot to take screenshots at the time, I will just give the conclusion here directly:

The busiest thread was a GC thread, which means it was busy doing garbage collection.

GC View

At this point in the investigation, experienced hands will already suspect that the application's memory usage is most likely the problem.

So I used jstat -gcutil pid 200 50 (print every 200 ms, 50 times in total) to print the memory usage and GC statistics.

The following information can be read from the output:

  • The Eden space and the old generation are both almost full, so memory is clearly not being reclaimed properly.
  • Full GC is happening at a very high frequency: the FGC count rose from 866485 to 866493, i.e. 8 full GCs within the 10-second sampling window (200 ms × 50).
  • This has been going on for quite a while: full GC has already happened more than 80,000 times.

Memory Analysis

Since the initial diagnosis points to a memory problem, we still need to take a memory snapshot and analyze it to finally pin down the cause.

The command jmap -dump:live,format=b,file=dump.hprof pid exports a heap snapshot file.

Then we can turn to an analysis tool such as MAT to dig into it.

Identify the Problem

It is actually quite obvious from the MAT screenshot: there is an extremely large string in memory, and it is referenced precisely by the thread running this scheduled task.

A rough look shows that this string occupies about 258 MB of memory; for a single string that is already a very large object.

So how exactly was this string produced?

In fact, from the reference chain in the screenshot and the content of the string, it is not hard to see that this is an insert SQL statement.

Here I have to praise MAT: based on the heap snapshot it can point out the likely problem spots and at the same time give snapshots of the suspicious threads.

Through this thread snapshot I finally found the specific business code:

It calls a method that writes to the database, and that method splices together an insert statement whose values clause is generated in a loop, roughly like this:

    <insert id="insert" parameterType="java.util.List">
        insert into xx (files)
        values
        <foreach collection="list" item="item" separator=",">
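            <!-- each element of "list" appends one more values tuple (elided here as xxx),
                 so the spliced statement grows linearly with the list size -->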
            xxx
        </foreach>
    </insert>

So once this list becomes very large, the spliced SQL statement becomes very long as well.

From the memory analysis we can indeed see that this List is very large, which is exactly why the final insert statement is so huge and occupies so much memory.
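
To make the cause more concrete, here is a rough illustration of the kind of string splicing the mapping above produces (this is only a sketch of the splicing, not MyBatis internals; the table and column names are copied from the anonymized mapper above):

    import java.util.List;

    public class SpliceSketch {

        // Rough illustration only: every element appends another tuple to the
        // VALUES clause, so the final String grows linearly with the list size.
        static String spliceInsert(List<String> files) {
            StringBuilder sql = new StringBuilder("insert into xx (files) values ");
            for (int i = 0; i < files.size(); i++) {
                if (i > 0) {
                    sql.append(",");
                }
                sql.append("('").append(files.get(i)).append("')");
            }
            return sql.toString();
        }
    }

With enough rows, a string built this way can grow to the hundreds of megabytes seen in the snapshot.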

Optimization Strategy

Now that the cause has been found, the problem is easy to solve; there are roughly a few directions:

  • Control the size of the source List. This List is read from a data table, so it can be fetched page by page; the subsequent insert statements will then shrink accordingly.
  • Control the batch size when writing the data; essentially this also comes down to keeping the spliced SQL statement short (see the sketch after this list).
  • Re-evaluate the efficiency of the whole write path.
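
As a minimal sketch of the batching direction (the ReportMapper interface and the batch size below are illustrative stand-ins, not the author's actual code), the list can be written in fixed-size chunks so that no single spliced insert statement grows with the full list:

    import java.util.List;

    public class BatchWriter {

        // Hypothetical MyBatis mapper interface backing the <insert id="insert"> above.
        interface ReportMapper {
            void insert(List<String> files);
        }

        // Illustrative chunk size; tune it against real row sizes and database limits.
        private static final int BATCH_SIZE = 1000;

        // Write the list in fixed-size chunks so each generated statement stays small.
        static void writeInBatches(ReportMapper mapper, List<String> files) {
            for (int from = 0; from < files.size(); from += BATCH_SIZE) {
                int to = Math.min(from + BATCH_SIZE, files.size());
                mapper.insert(files.subList(from, to));
            }
        }
    }

Combined with paging the query that produces the list, this keeps both the in-memory List and each generated SQL statement at a bounded size.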

To Sum Up

From analysis to resolution, this problem actually did not take long to solve, and it is fairly typical; to summarize the process:

  • First, locate the process that is consuming the CPU.
  • Then, locate the specific thread that is consuming the CPU.
  • For memory problems, dump a heap snapshot and analyze it.
  • Draw a conclusion, adjust the code, and verify the result.

Finally, here's hoping none of us receives another production alert.

Your likes and shares are the biggest support for me.

Originally published at juejin.im/post/5d07caf2f265da1ba647ed61