Practical notes: a summary of performance testing issues and solutions (important)

Before performance testing starts, you need to know:

1. The specific requirements of the project.

2. Metrics: the response time limit, the number of concurrent users, the TPS per transaction, the total TPS, the total number of transactions during the stability run, the transaction success rate, the allowed fluctuation range, how long the system must run stably, resource utilization, and which transactions, interfaces and scenarios are to be tested.

3. Environment: the number of servers in production, the number of servers in the test environment, and how the test-result targets are scaled according to the resource ratio between the two.

4. Protocol: which protocols the system uses to communicate.

5. Number of load generators: if there are too many concurrent users, the load needs to be spread across several load generators; otherwise a generator may itself become the bottleneck, causing TPS and response-time jitter.

6. Transaction mix: analyze the production logs to work out each transaction's share of the traffic, and from that derive each transaction's TPS.

7. System architecture: which components a request flows through; monitor all of them while the load is applied.

Test types:

1. Baseline test: a single user iterating 100 times; watch the response time and require a 100% transaction success rate.

2. Load test: 10 users running for 10 minutes; watch the response time and require a 100% transaction success rate.

3. Capacity test: estimate the total TPS, use the formula to calculate the pacing and the number of virtual users (VUs) for each transaction (a worked sketch of the calculation follows this list), and obtain the system's maximum processing capacity (the optimal capacity). Then run three more gradients for comparison (two below the optimal capacity, one above it). For each of the four capacity runs work out the VUs and the TPS, and compare the actual transaction mix of each run against the estimated mix (the closer they are, the closer the run is to the real-world scenario). Watch the response time, total TPS, per-transaction TPS, transaction success rate, application-server (AP) CPU utilization, database (DB) CPU utilization, thread deadlocks and database deadlocks.

The response time should be lower than in the load test; the total TPS should be roughly equal to the estimated total TPS (a difference of no more than 10 is normal); each transaction's TPS should be close to its estimated share multiplied by the total TPS; the transaction success rate should be 100%; AP CPU should stay below 60% and DB CPU below 80%. Dump the thread stacks to check for thread deadlocks, and check the database log to see whether there are database deadlocks.

4. Stability test: run for 24 hours continuously at 80% of the optimal capacity and observe how the system performs over a long run. Watch the response time, TPS, total TPS, transaction success rate and total number of transactions. Check whether there is a memory overflow (heap overflow, stack overflow, permanent-generation overflow), whether CPU utilization meets the target, whether memory keeps growing, whether full GC is triggered normally, and the GC time, GC frequency, full GC time and full GC frequency (the key point: JVM tuning is mainly about reducing the full GC frequency).
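The article does not spell out the pacing formula it uses, so the following is only a minimal sketch assuming the commonly used relation TPS = VU / (response time + pacing), which gives pacing = VU / target TPS - response time; all numbers are made up for illustration:

    total_tps=100        # estimated total TPS for the whole system
    share=0.3            # this transaction's share of the mix, from log analysis
    rt=0.5               # measured average response time in seconds
    vu=60                # virtual users assigned to this transaction

    target_tps=$(echo "$total_tps * $share" | bc -l)    # 30 TPS for this transaction
    pacing=$(echo "$vu / $target_tps - $rt" | bc -l)    # 1.5 s between iterations
    echo "target TPS: $target_tps, pacing per iteration: ${pacing}s"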

Monitoring:

Start nmon monitoring when the capacity test and the stability test begin.
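A minimal sketch of starting nmon when a run begins; the 10-second interval and sample counts are assumptions for illustration, not values from the article:

    # stability run: one sample every 10 s, 8640 samples is roughly 24 hours
    nmon -f -s 10 -c 8640
    # a capacity run of about one hour
    nmon -f -s 10 -c 360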

Performance problems encountered during load testing, and how to handle them:

First, CPU utilization too high during the capacity test

1. Monitor CPU usage in real time with vmstat. The AP CPU exceeds 80% under fairly light load, while the target is no more than 60%.

2. Work out whether it is user CPU (us) or system CPU (sy) that is too high; most often it is user CPU.

3. If user CPU is too high, first find the process that consumes the most CPU (with the top command), then find which threads inside that process consume the most CPU, convert the thread ID to hexadecimal, dump the thread stacks with jstack, and look at what that thread's stack is calling to see what drives the user CPU up (a command sketch follows).
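A minimal command sketch of the steps above; the PID and thread ID are hypothetical placeholders:

    PID=12345        # process that tops CPU, found with top
    TID=12360        # hottest thread inside it, found with top -H
    vmstat 2                                # watch user (us) vs system (sy) CPU
    top -H -p "$PID"                        # list that process's threads by CPU usage
    NID=$(printf '%x' "$TID")               # jstack prints thread IDs in hex (nid=0x...)
    jstack "$PID" > threads.dump            # dump all thread stacks
    grep -A 20 "nid=0x$NID" threads.dump    # read what the hot thread is calling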

Second, memory overflow (heap overflow, stack overflow, permanent-generation overflow)

1. Heap memory overflow

1) After the load has run stably for a while, the LoadRunner (LR) error log reports java.lang.OutOfMemoryError: Java heap space.

2) Dump the heap usage with jmap -histo pid and look at the top 20 objects by heap usage to see whether any of the application's own methods appear (a command sketch follows this subsection). Check from the top down; if an application method is there, work out what in that method is causing the heap overflow.

3) If none of the top 20 are the application's own, dump the heap with jmap -dump and analyse the dump file with MAT to find the source of the memory overflow.

4) If the application code is not the problem, modify the JVM parameters: adjust -Xms and -Xmx to tune the heap size, generally by increasing the heap (see the flag sketch at the end of this section).
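A minimal sketch of the jmap steps described above; the PID is a placeholder:

    PID=12345                                   # Java process under test
    jmap -histo "$PID" | head -n 23             # top ~20 heap consumers (plus header lines)
    jmap -dump:format=b,file=heap.hprof "$PID"  # full heap dump, to be opened in MAT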

2. Stack memory overflow

1) After the load has run stably for a while, the LR error log reports java.lang.StackOverflowError.

2) Modify the JVM parameter -Xss to increase the stack size (see the flag sketch at the end of this section).

3) If the stack overflow is caused by a batch operation, reduce the amount of data per batch.

3. Permanent-generation overflow

1) After the load has run stably for the planned time, the log reports java.lang.OutOfMemoryError: PermGen space.

2) The cause is that class metadata, method descriptions, field descriptions, the constant pool, access modifiers, static variables and so on fill up the permanent generation until it overflows.

3) Modify the JVM configuration: increase the -XX:MaxPermSize parameter (for example to 256m), and minimize static variables.
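A minimal sketch of the JVM flags mentioned in this section; the sizes are illustrative placeholders rather than recommendations, and -XX:MaxPermSize only applies up to JDK 7 (JDK 8 replaced the permanent generation with Metaspace):

    # -Xms/-Xmx  heap size        (heap OutOfMemoryError)
    # -Xss       per-thread stack (StackOverflowError)
    # the GC log flags support the full-GC checks in later sections
    java -Xms2g -Xmx2g -Xss512k -XX:MaxPermSize=256m \
         -Xloggc:gc.log -XX:+PrintGCDetails \
         -jar app.jar    # app.jar is a placeholder name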

Third, thread deadlock

1. During the capacity test, LR reports connection timeouts.

2. There are many possible causes for this, such as insufficient bandwidth, a middleware thread pool that is too small, or a database connection pool that is too small; any of them can fill up the connections so that new requests cannot connect and report timeout errors.

3. Dump the thread stacks with jstack and search them for blocked threads; if there are any, there is a thread deadlock. Find the deadlocked threads and analyse the corresponding code (a command sketch follows).
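A minimal sketch of checking a thread dump for deadlocks; the PID is a placeholder:

    PID=12345
    jstack "$PID" > threads.dump
    grep -i "deadlock" threads.dump                         # jstack reports found deadlocks explicitly
    grep -c "java.lang.Thread.State: BLOCKED" threads.dump  # count blocked threads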

Fourth, database deadlock

1. During the capacity test, LR reports connection timeouts.

2. There are many possible causes for this, such as insufficient bandwidth, a middleware thread pool that is too small, or a database connection pool that is too small; any of them can fill up the connections so that new requests cannot connect and report timeout errors.

3. Search the database log for blocking; if blocking is found, there is a database deadlock. Find the corresponding SQL in the log and optimize the SQL that causes the deadlock (a command sketch follows).
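A minimal sketch, assuming MySQL (the later use of show full processlist suggests it); the log path and credentials are placeholders:

    grep -i "deadlock" /var/log/mysql/error.log
    # InnoDB records the latest deadlock, including the SQL statements involved
    mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | grep -A 30 "LATEST DETECTED DEADLOCK"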

Fifth, database connections not released

1. During the capacity test, LR reports connection timeouts.

2. There are many possible causes for this, such as insufficient bandwidth, a middleware thread pool that is too small, or a database connection pool that is too small; any of them can fill up the connections so that new requests cannot connect and report timeout errors.

3. Check on the database side how many connections the application holds (show full processlist). If the application's connection pool is configured to 30 and the database also shows 30 connections from the application, the pool is full. Raise the configured pool size to 90 and try again; if the database then shows 90 connections, you can conclude that connections are not being released. Look at the code: the usual cause is a code path that creates a connection but never closes it, and the fix is to correct the code (a command sketch follows).
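A minimal sketch of counting the application's connections on the database side, assuming MySQL; appuser is a placeholder for the application's database account:

    # compare this count with the pool size configured in the application (30, then 90)
    mysql -u root -p -e "SHOW FULL PROCESSLIST" | grep -c "appuser"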

Sixth, TPS will not go up

1. During the run, per-transaction TPS jitters frequently, so the total TPS stops growing. Check whether full GC is happening (tail -f gc_mSrv1.log | grep full).

2. A pacing value that is set too small can also keep TPS from going up; if the jitter is large, simply adding more users can help.

3. When TPS jitters, try applying the load for a single transaction on its own; if it turns out to be stable, suspect that the combined load is too heavy. So, when generating the capacity load, assign the transaction with the largest share to a different load generator; after that the TPS no longer jitters. Note: using multiple load generators only affects TPS jitter, it does not affect the servers' CPU.

4. Check whether response times are timing out, and whether there are enough users.
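A minimal sketch of watching for full GC while TPS stalls; the PID is a placeholder and the log name follows the one quoted above:

    tail -f gc_mSrv1.log | grep -i "full"   # follow full GC entries in the GC log
    PID=12345
    jstat -gcutil "$PID" 5000               # sample every 5 s; FGC/FGCT are full-GC count and total time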

Seventh, uneven load across servers (a difference of 1%-2% is normal)

1. When running at the optimal capacity, only one of the four AP servers exceeds 60% CPU; the other three stay below 60%.

2. Check whether any server runs scheduled tasks.

3. Check whether a load generator is the bottleneck.

4. Check whether bandwidth is the bottleneck (not an issue on a LAN).

5. Check whether the deployed version and configuration are the same on every server.

6. Other people may also be using the AP servers: many virtual machines can sit on the same physical machine, and whoever applies load first grabs the resources ahead of everyone else.

Eighth, full GC takes too long

1. While running the capacity and stability tests, LR reports request-timeout errors. Check the back-end logs for full GC and see whether the times of the LR errors match the full GC times. A full GC pauses the whole application, so the front-end LR gets no response and reports errors. You can then reduce the old-generation size to shorten each full GC; with shorter full GC pauses LR stops reporting errors and users barely notice that the application was paused.

2. Write a scheduled task to restart the services so that the four AP servers take their full GCs in turn (while some servers full GC, the others do not). You can then set a policy so that different servers never full GC at the same time, for example doing it at night or when there are few transactions (a sketch of such a schedule follows).
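A minimal sketch of staggered restarts via cron, as suggested above; restart_app.sh, the path and the times are placeholders, with one entry in each AP server's crontab:

    # AP1
    0 2 * * * /opt/scripts/restart_app.sh
    # AP2
    30 2 * * * /opt/scripts/restart_app.sh
    # AP3
    0 3 * * * /opt/scripts/restart_app.sh
    # AP4
    30 3 * * * /opt/scripts/restart_app.sh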

Notes:

Check the server error logs before the next test.

If you apply load within the first few minutes after the service starts, the pressure on it will be very high; it is best to wait two or three minutes after the service starts before beginning the run.

 
