1. xx system tuning: a real-world case
The architecture of the load-test environment is shown below:
Load test results
| Threads | TPS | ART | APP_CPU | APP_MEM |
| --- | --- | --- | --- | --- |
| 150 | 1551.340 | 0.095s | 57.377% | 16.522% |
| 200 | 1562.721 | 0.126s | 59.862% | 16.624% |
| 300 | 1572.278 | 0.188s | 57.108% | 16.643% |
With 150 concurrent users, TPS was 1551.34 and the average response time (ART) was 0.095 seconds. As concurrency was gradually increased to 300, TPS reached only 1572.278, almost no growth, while the average response time kept rising. CPU consumption on each server was basically the same at every concurrency level, and no server resource had reached its limit.
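One way to read these numbers is through Little's law (requests in the system = throughput × response time). The sketch below only reuses the three measurements reported above; the function and variable names are illustrative and not part of the original test setup.

```python
# Load-test results as reported above: (threads, TPS, ART in seconds).
RESULTS = [
    (150, 1551.340, 0.095),
    (200, 1562.721, 0.126),
    (300, 1572.278, 0.188),
]


def in_flight_requests(tps: float, art_seconds: float) -> float:
    """Little's law: average requests in the system = throughput x response time."""
    return tps * art_seconds


def summarize(results):
    """Return (threads, tps, art, tps * art) for every measurement."""
    return [(threads, tps, art, in_flight_requests(tps, art))
            for threads, tps, art in results]


if __name__ == "__main__":
    _, base_tps, base_art = RESULTS[0]
    for threads, tps, art, in_flight in summarize(RESULTS):
        print(f"threads={threads:3d}  TPS x ART={in_flight:6.1f}  "
              f"TPS growth={tps / base_tps - 1:+.1%}  "
              f"ART growth={art / base_art - 1:+.1%}")
```

In every row TPS × ART comes out close to the thread count (about 147, 197, and 296), so each added thread is simply spending longer waiting: throughput is capped near 1570 TPS by something other than CPU, and the extra concurrency turns into response time.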
Cause analysis
Take a thread snapshot (thread dump):
As the figure shows, many threads are in the BLOCKED state. A closer look at the thread dump file shows that a large number of them are blocked in log4j.
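This kind of thread dump analysis can be sketched as a small script that counts thread states and groups the BLOCKED threads by their top stack frame. It assumes jstack-style text in which each thread stanza is separated by a blank line and contains a `java.lang.Thread.State:` line. The sample excerpt is made up for illustration; the real dump from this test is not reproduced in the article.

```python
import re
from collections import Counter

# Hypothetical jstack-style excerpt, invented for illustration only.
SAMPLE_DUMP = """
"exec-1" #31 daemon prio=5 tid=0x01 nid=0x1a waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:204)
        at org.apache.log4j.Category.forcedLog(Category.java:391)

"exec-2" #32 daemon prio=5 tid=0x02 nid=0x1b waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:204)

"exec-3" #33 daemon prio=5 tid=0x03 nid=0x1c runnable
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
"""

STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+(\w+)")
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>]+)\(", re.MULTILINE)


def parse_threads(dump_text):
    """Yield (state, top_frame) for each thread stanza in the dump."""
    for stanza in re.split(r"\n\s*\n", dump_text.strip()):
        state = STATE_RE.search(stanza)
        if state is None:
            continue  # not a thread stanza (e.g. a header or summary line)
        frame = FRAME_RE.search(stanza)
        top_frame = frame.group(1) if frame else "<no frames>"
        yield state.group(1), top_frame


def summarize(dump_text):
    """Return (thread count per state, BLOCKED thread count per top frame)."""
    states = Counter()
    blocked_at = Counter()
    for state, top_frame in parse_threads(dump_text):
        states[state] += 1
        if state == "BLOCKED":
            blocked_at[top_frame] += 1
    return states, blocked_at


if __name__ == "__main__":
    states, blocked_at = summarize(SAMPLE_DUMP)
    print("states:", dict(states))
    for frame, count in blocked_at.most_common(5):
        print(f"{count:4d} BLOCKED at {frame}")
```

When most BLOCKED threads share the same top frame, as in the made-up sample, that frame is the first place to look for lock contention.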
After discussing this with the framework team, the framework was upgraded to initialize logging with log4j2. The retest results are as follows:
The thread snapshot above shows that the log4j blocking has disappeared.
Continued load testing
- Application servers: CPU still cannot be driven up; peak consumption is still about 60%. This time there is almost no thread blocking, and neither Redis nor the database is under significant pressure.
- Database snapshot analysis: no locks, and the captured SQL statements execute quickly.
- Redis resource monitoring: memory, I/O, and CPU consumption are all low.
Ideas for follow-up analysis
- Load-tested a single node with JMeter: each application server, when hit on its own, can reach 90% CPU.
- Bypassed F5 and round-robined requests directly across the four application servers: the behavior is the same as when going through F5.
- JMeter results against one, two, and four application servers are shown below:
Four application servers perform about the same as two, which raised the suspicion that the physical host on which the applications are deployed was the bottleneck. After consulting the system environment team, host resource limits were ruled out.
Walking through the logic behind this transaction
Decompiling the project team's JAR with jd-gui.exe located the following SQL:
A DB2 snapshot shows how this SQL executes:
Next, the project's method for fetching the sequence was bypassed (here by hard-coding a fixed value):
Before the change:
After the change:
Retest: with 200 threads against the four application servers, TPS is about 3000 transactions per second and application server CPU consumption reaches 90%. This further confirms that fetching the sequence was limiting throughput.
Querying the sequence configuration in the database shows that the sequence has a cache setting. This was changed to an empirical value of 200 and the load test was rerun:
Take another database snapshot:
Comparing the database snapshots before and after the optimization: the execution time for fetching the sequence dropped by three orders of magnitude.
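Why a larger sequence cache helps can be illustrated with a toy model. The sketch below is not DB2's implementation: it only assumes that each time the cached block of values runs out, the database has to do one expensive, serialized update to reserve the next block. The class name, the `catalog_updates` counter, and the cache sizes other than 200 are illustrative assumptions; the article does not give the original cache value.

```python
class SimulatedSequence:
    """Toy model of a database sequence with a configurable cache size.

    Every time the cached block of values is exhausted, one "catalog update"
    is counted. That counter stands in for the costly serialized work.
    """

    def __init__(self, cache_size: int, start: int = 1):
        if cache_size < 1:
            raise ValueError("cache_size must be at least 1")
        self.cache_size = cache_size
        self.next_value = start
        self.block_end = start  # exclusive end of the currently cached block
        self.catalog_updates = 0

    def nextval(self) -> int:
        if self.next_value >= self.block_end:
            # Cache exhausted: reserve a new block of cache_size values.
            self.block_end = self.next_value + self.cache_size
            self.catalog_updates += 1
        value = self.next_value
        self.next_value += 1
        return value


def updates_needed(requests: int, cache_size: int) -> int:
    """Count the block reservations needed to serve `requests` values."""
    seq = SimulatedSequence(cache_size)
    for _ in range(requests):
        seq.nextval()
    return seq.catalog_updates


if __name__ == "__main__":
    # 3000 requests roughly matches one second of traffic in the retest above.
    for cache_size in (1, 20, 200):
        print(f"cache={cache_size:3d}  "
              f"block reservations={updates_needed(3000, cache_size)}")
```

In this model, serving 3000 values takes 3000 reservations with a cache of 1 but only 15 with a cache of 200. In DB2 the setting itself is changed with an `ALTER SEQUENCE ... CACHE` statement. The trade-off is that cached values not yet handed out can be lost on a restart, which leaves gaps in the sequence.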
Finally, a question for readers: after a system restart, the ART measured under load follows the trend shown below. How would you explain this?