Performance Troubleshooting Process

Project scenario:

A record of one performance troubleshooting session


Problem Description

Some time ago, the testers reported a performance problem, which I then analyzed. The interface in question is a menu interface, so its performance requirements are relatively high. In the test environment, with 500 concurrent requests, a request took about 2.5 s, while the requirement is under 1 s, so the problem had to be analyzed.


Cause Analysis:

1. The main logic of the code

First, we analyzed the code of this method. Its logic has four parts: (1) get the user information from Redis; (2) record statistics on menu visits, using the user information from step 1; (3) get the menu information from Redis; (4) if the menu information is found, return it directly; otherwise, query it from the database.
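The four steps above can be sketched as follows. This is a minimal illustration, not the project's code: `Cache`, `MenuDao`, `InMemoryCache`, and `MenuService` are hypothetical names, an in-memory map stands in for Redis, and writing the database result back into the cache on a miss is an assumption (the post only says the menu is then queried from the database).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache abstraction standing in for the Redis client.
interface Cache {
    Optional<String> get(String key);
    void set(String key, String value);
}

// Hypothetical DAO standing in for the menu SQL query.
interface MenuDao {
    String loadMenuJson(String userId);
}

// In-memory Cache used only so the sketch runs without a Redis server.
class InMemoryCache implements Cache {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public Optional<String> get(String key) { return Optional.ofNullable(store.get(key)); }

    public void set(String key, String value) { store.put(key, value); }
}

class MenuService {
    private final Cache cache;
    private final MenuDao dao;
    final AtomicInteger dbHits = new AtomicInteger(); // counts database fallbacks, for illustration

    MenuService(Cache cache, MenuDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    String getMenu(String token) {
        // Step 1: user info from the cache; it is required, so fail if absent
        String userId = cache.get("user:" + token)
                .orElseThrow(() -> new IllegalStateException("user not found for token"));
        // Step 2: visit statistics, using the user info from step 1
        recordVisit(userId);
        // Step 3: try the cached menu first
        Optional<String> cached = cache.get("menu:" + userId);
        if (cached.isPresent()) {
            return cached.get(); // Step 4a: cache hit, return directly
        }
        // Step 4b: cache miss, query the database (and, by assumption, refill the cache)
        dbHits.incrementAndGet();
        String menu = dao.loadMenuJson(userId);
        cache.set("menu:" + userId, menu);
        return menu;
    }

    void recordVisit(String userId) {
        // Placeholder for the visit-statistics logic
    }
}
```

With a warm cache, only the first call for a user reaches the database; later calls return from step 4a.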

2. Preliminary Analysis

Generally speaking, there was nothing wrong with the code logic. First, I added some preliminary logging to print how long each step took: getting the user information, the statistics function, and getting the menu information from Redis. Testing in the test environment showed that fetching data from Redis took over 500 ms almost every time. Querying from the database took even longer, around 7 s.
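The per-step timing described above can be done with a small wrapper like the one below. `StepTimer` is a hypothetical helper, not something from the original project. It only measures elapsed wall-clock time with `System.nanoTime()` and prints it.

```java
import java.util.function.Supplier;

// Hypothetical helper (not from the original project): runs one step and logs its duration.
final class StepTimer {
    private StepTimer() {}

    static <T> T time(String step, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Logged even if the step throws, so slow failures are visible too
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(step + " took " + elapsedMs + " ms");
        }
    }
}
```

For example, `StepTimer.time("get menu from redis", () -> cache.get(key))` logs the duration of that call and returns its result unchanged.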

3. Preliminary optimization

First, we optimized the problems we could see: (1) Getting the user information from Redis is required, so it cannot be omitted. (2) The visit statistics are not directly related to fetching the menu, so they can be moved to an asynchronous thread pool instead of running on the main thread. (3) When fetching the menu from the database, the SQL statement joins an application table to get the application name, using a LEFT JOIN followed by an IN condition, and IN can cause performance problems. Since all of the WHERE conditions apply to the menu table, we filtered the menu table first and then joined the application table to that result as a derived (temporary) table to get the application information. Making the smallest result set the driving table of the join improves performance. We also checked whether the indexes existed and were actually being used.
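Moving the visit statistics off the request thread might look like the sketch below. `VisitStats` is hypothetical and the counter stands in for the real statistics write. The pool sizes, the bounded queue, and the choice to drop statistics when the queue is full are all assumptions, not details from the post.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: visit statistics recorded on a separate pool, off the request thread.
class VisitStats {
    // Pool sizes and queue capacity are assumptions; tune them for real traffic.
    private final ExecutorService statsPool = new ThreadPoolExecutor(
            2, 4, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10_000),
            // When the queue is full, silently drop the statistic instead of slowing the request.
            new ThreadPoolExecutor.DiscardPolicy());
    private final AtomicLong visits = new AtomicLong();

    // Called on the request thread; only enqueues the work and returns immediately.
    void recordAsync(String userId) {
        statsPool.execute(() -> {
            // The real code would write the statistic for userId to Redis or the database here.
            visits.incrementAndGet();
        });
    }

    long visits() {
        return visits.get();
    }

    // Stops accepting work and waits briefly for queued statistics to finish.
    void shutdown() {
        statsPool.shutdown();
        try {
            statsPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A bounded queue keeps a traffic spike from piling up unbounded work in memory. Whether dropping a few statistics under overload is acceptable is a business decision.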

4. Test again

After these changes, we tested again, but the results were not satisfactory: performance improved, but not significantly. Analyzing it afterwards, we concluded that the changes do help a little, but the test environment holds only a small amount of data, so the effect of the SQL changes could not be measured. Running the statistics asynchronously did save some time.

5. Local testing

Since the problem could not be pinned down in the test environment, we tested locally, which made debugging easier. Locally, with 500 concurrent requests, the response time was 4.5 s. That is understandable, since the local computer is less powerful than the server, and it does not prevent debugging. I then suspected Redis. The logs showed that the database was queried many times during the test: the Redis expiration time was set to 3 minutes, but the keys seemed to expire after about one second. To rule out the influence of this problem, I set the keys to never expire and tested again.

6. Test again

With the keys no longer expiring, the database was no longer queried frequently. The response time dropped by about 1 s, to 3.5 s, but it was still very slow.

7. Analysis

I saw that the download rate was about 950 KB/s. Each response is about 2.5 KB, so 500 concurrent requests amount to roughly 1.2 MB of data, in the same range as the observed 950 KB/s, so I concluded that bandwidth was not the cause. I then continued to examine Redis. We use Jedis, so I tuned the Jedis connection pool: I replaced the try (resource) {} blocks with explicit calls to jedisPool.returnResource to put connections back into the pool, and added the following pool parameters:

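    // Run the idle-connection evictor every 5 s
    poolConfig.setTimeBetweenEvictionRunsMillis(5000);
    // Evict connections that have been idle for at least 60 s
    poolConfig.setMinEvictableIdleTimeMillis(60000);
I also raised the pool size from the original 100 connections to 500:
    // Up to 500 connections in total (previously 100)
    poolConfig.setMaxTotal(500);
    // Keep at most 500 and at least 10 idle connections
    poolConfig.setMaxIdle(500);
    poolConfig.setMinIdle(10);
    // Wait at most 1 s for a free connection before failing
    poolConfig.setMaxWaitMillis(1000);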
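To see why the pool size and wait time matter, the sketch below models a bounded pool with a `java.util.concurrent.Semaphore`. At most `maxTotal` callers hold a connection at once, and the rest wait up to `maxWaitMillis`. `BoundedPool` is a hypothetical stand-in, not the Jedis or commons-pool implementation. Releasing in `finally` plays the same role as `try (Jedis jedis = pool.getResource()) { ... }`, where closing a pooled `Jedis` instance returns it to the pool.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical model of a bounded connection pool (not Jedis itself):
// maxTotal permits, and a borrower waits at most maxWaitMillis for one.
class BoundedPool {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedPool(int maxTotal, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal, true); // fair: waiting callers served in order
        this.maxWaitMillis = maxWaitMillis;
    }

    <T> T withConnection(Supplier<T> work) throws InterruptedException {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("no connection available within " + maxWaitMillis + " ms");
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always give the "connection" back, even if work throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

With 500 concurrent callers and only 100 permits, up to 400 callers are waiting at any moment, so the time each caller holds a connection directly adds to everyone else's latency.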

8. Test

Testing again showed a reduction of about 1 s, down to 2.5 s, so these optimizations helped somewhat. Later I commented out the code for getting the user information and for the visit statistics entirely; the test then took 1.4 s, another reduction of about 1 s.

9. Analysis

Then I wondered whether the Redis server itself was the problem, so I installed Redis locally and pointed the application at my local instance. The result was the same, and I was nearly at my wit's end... Then I noticed that the Redis version was 3.1.200 and wondered whether it was too old, so I tried to install a Windows build of version 6.0 or later and spent a whole morning debugging it. The Windows builds I found were basically unusable: they crashed and exited easily. I later learned that Redis does not support Windows well, so I set up a local virtual machine, installed Redis 6.2.5 on it, and enabled multi-threading support. Redis has supported multi-threaded I/O since version 6.0.
To enable it, uncomment the following lines in redis.conf, change no to yes, and set the number of threads, generally in line with the number of CPU cores.

# io-threads-do-reads no
# io-threads 4

(It later turned out that the Redis version and multi-threading had almost no effect; they were not factors.)

10. Windows parameter settings

The ninth step above took more than half a day. Next, I adjusted two Windows parameters, the handle limit and the TCP connection limit, raising both so that they would not cap the concurrency. Links to the articles I used for these settings are below; you can look up the details yourself.

  1. Handle limit: https://www.lmlphp.com/user/16721/article/item/460591/
  2. TCP connection limit: https://blog.imdst.com/windows-xia-dan-ji-zui-da-tcplian-jie-shu/

(Neither of these was the cause of the problem, but they are worth checking.)

11. Test

Next, I started the project with java -jar to rule out any influence from IDEA, and things improved further. The final result was about 1.3 s. According to the logs, Redis requests still took about 500 ms. That is a big improvement over where we started, but still not right: with only 500 concurrent requests, fetching from Redis should not take that long. Redis is not that slow, and the value is only 2.5 KB, so this is not a big-key problem.

12. Analysis

I kept reading the logs: I sent a single request and read its complete log, and found that the user information was fetched many times. Searching the code for that log message led me to some earlier multi-tenant code that applied AOP interception to the Redis connection pool: Redis calls were intercepted so that the tenant information could be added to each Redis key. The interception happened twice; both advice methods were @Around, and each one fetched the tenant information from Redis inside the @Around method (itself a Redis operation).
In other words, every Redis operation triggered 4 × 2 = 8 operations, because Redis was called inside the aspect methods and @Around runs both before and after the intercepted call. This was the most influential problem we found. I commented out this class and tested again.
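The amplification described above can be reproduced in miniature. The sketch below uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) as a stand-in for a Spring `@Around` advice; `KeyValueStore`, `CountingStore`, and `TenantAspectDemo` are hypothetical names. Each proxy layer looks up the tenant in the store before and after the call, so with two layers a single logical `get` costs five round trips (one real, four from the advice). The exact count in the original project depends on its code; this only shows the mechanism.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Redis-like interface; not the project's real client.
interface KeyValueStore {
    String get(String key);
}

// Counts every simulated round trip to Redis.
class CountingStore implements KeyValueStore {
    final AtomicInteger roundTrips = new AtomicInteger();

    public String get(String key) {
        roundTrips.incrementAndGet();
        return "value-of-" + key;
    }
}

final class TenantAspectDemo {
    private TenantAspectDemo() {}

    // Mimics an @Around advice that reads the tenant from Redis before and after
    // each call and prefixes the key with it (JDK proxy as a stand-in for Spring AOP).
    static KeyValueStore wrap(KeyValueStore target, CountingStore redis) {
        InvocationHandler around = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // toString/equals/hashCode: pass through
            }
            String tenant = redis.get("tenant");   // extra round trip before the call
            Object[] tenantArgs = args.clone();
            tenantArgs[0] = tenant + ":" + args[0]; // prefix the key with the tenant
            Object result = method.invoke(target, tenantArgs);
            redis.get("tenant");                    // extra round trip after the call
            return result;
        };
        return (KeyValueStore) Proxy.newProxyInstance(
                KeyValueStore.class.getClassLoader(),
                new Class<?>[] {KeyValueStore.class},
                around);
    }
}
```

Wrapping once (`wrap(redis, redis)`) turns one `get` into three round trips; wrapping the result again, as with two stacked aspects, turns it into five.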

13. Test

This time it was very fast. With 500 concurrent requests, Redis requests took very little time, sometimes 0 ms (probably just too fast to measure), and the total time was 100 ms. Going from 1.3 s to 100 ms, this issue alone accounted for more than 1 s.


Solution:

The final solution was to remove the Redis aspect and implement the tenant handling another way. The root cause turned out to be the aspect on the Redis pool, so be cautious when using aspects, especially around frequently used utility classes.
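The post does not say what replaced the aspect. One common alternative, shown here only as a hypothetical sketch, is to resolve the tenant once per request, keep it in a `ThreadLocal`, and build tenant-prefixed keys locally so that constructing a key never costs a Redis round trip. `TenantContext` and its methods are illustrative names.

```java
// Hypothetical alternative (not from the original post): resolve the tenant once per
// request, keep it in a ThreadLocal, and prefix keys locally without touching Redis.
final class TenantContext {
    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    private TenantContext() {}

    // Called once at the start of a request, after a single tenant lookup.
    static void begin(String tenantId) {
        TENANT.set(tenantId);
    }

    // Must be called at the end of the request; pooled threads would otherwise leak the value.
    static void end() {
        TENANT.remove();
    }

    // Builds a tenant-prefixed key using only the thread-local value.
    static String key(String rawKey) {
        String tenant = TENANT.get();
        if (tenant == null) {
            throw new IllegalStateException("no tenant bound to this thread");
        }
        return tenant + ":" + rawKey;
    }
}
```

`end()` should run in a `finally` block (for example in a servlet filter), because server threads are pooled and a leftover value would leak into the next request.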

Summary

This performance optimization took nearly a week before the problem was finally found. In summary, performance optimization requires understanding many things. During testing, check the machine's CPU, memory, bandwidth, and disk performance to see whether the machine or the network is limiting concurrency, or whether the problem is in the code. Code problems in turn include whether every operation really needs to be synchronous, SQL optimization, index optimization, Redis connection pool tuning, and so on. Checking these one by one and observing the results is easy; finding the real cause is not.


Origin blog.csdn.net/qq_34526237/article/details/127168831