Performance Tuning 01

A friend once told me that the systems at his company never go through performance tuning; they go live as soon as functional testing passes, and they have never had any performance problems. So why do so many other systems bother with performance tuning at all?

My answer at the time was: if your company were running something like 12306, try putting the system online without any performance optimization and see what happens.

How would you answer him? Let's talk about this topic today. I hope that by the end you can answer these questions: Why do we need performance tuning? When should we start doing it? Is there a standard we can refer to when tuning?

Why do performance tuning?

A product that goes online without any performance testing is like a time bomb: you don't know when a problem will blow up, and you don't know the limit of the load it can bear.

Some performance problems accumulate slowly over time and then naturally erupt at some point; far more are triggered by traffic fluctuations, for example a user-facing promotion or a surge in the company's product. And of course, a product may simply limp along after launch without ever attracting much traffic, so the time bomb never goes off.

Now suppose your system is about to run a promotional event. The product manager or your boss tells you to expect hundreds of thousands of visiting users and asks whether the system can withstand the pressure. If you don't know your system's performance profile, all you can do is answer timidly that it will probably be fine.

So the question of whether to do performance tuning is actually easy to answer. Every system, once development is complete, has performance problems to a greater or lesser extent. The first thing we need to do is find ways to expose them, for example through stress testing and by simulating likely production scenarios, and then resolve them through performance tuning.

For example, you query a piece of information in an app and have to wait more than ten seconds, or during a flash sale you can't even get into the event page. As you can see, system response time is the most direct factor reflecting performance.

But if no such response problems show up in the online system, does that mean we don't need performance tuning? Let me tell you a story.

A "great god" once joined the system development department at my former company. Why did we call him that? Because in his first year there he did essentially one thing: he cut the number of servers to half of the original, and system performance actually improved.

Good performance tuning not only improves system performance, it also saves the company resources. That is the most direct purpose of performance tuning.

When should tuning start?

Having answered why we do performance optimization, a new question arises: if a system needs comprehensive performance monitoring and optimization, when should we get involved? Is it always better to start earlier?

In fact, in the early stages of project development we don't need to pay too much attention to performance optimization. Straining to optimize prematurely not only fails to improve performance, it also slows down development and can even backfire, introducing new problems into the system.

At the code level we only need to make sure the coding is effective, for example by reducing disk I/O operations, reducing lock contention, and using efficient algorithms. When the business logic is more complex, we can make full use of design patterns to optimize the code. For example, product pricing often involves many discounts and red-envelope promotions; we can use the decorator pattern to model this business, as in the sketch below.
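Here is a minimal sketch of that decorator idea; the class names (PriceCalculator, CouponDecorator, RedEnvelopeDecorator) and the discount amounts are invented for illustration, not taken from any real project. Each decorator wraps another calculator and applies one more promotion to the price it returns.

```java
import java.math.BigDecimal;

// Base abstraction: something that turns a list price into a final price.
interface PriceCalculator {
    BigDecimal calculate(BigDecimal basePrice);
}

class DefaultPriceCalculator implements PriceCalculator {
    @Override
    public BigDecimal calculate(BigDecimal basePrice) {
        return basePrice; // no promotion applied
    }
}

// Decorator: wraps another calculator and subtracts a coupon amount.
class CouponDecorator implements PriceCalculator {
    private final PriceCalculator delegate;
    private final BigDecimal coupon;

    CouponDecorator(PriceCalculator delegate, BigDecimal coupon) {
        this.delegate = delegate;
        this.coupon = coupon;
    }

    @Override
    public BigDecimal calculate(BigDecimal basePrice) {
        return delegate.calculate(basePrice).subtract(coupon).max(BigDecimal.ZERO);
    }
}

// Another decorator: subtracts a red-envelope amount on top of whatever it wraps.
class RedEnvelopeDecorator implements PriceCalculator {
    private final PriceCalculator delegate;
    private final BigDecimal redEnvelope;

    RedEnvelopeDecorator(PriceCalculator delegate, BigDecimal redEnvelope) {
        this.delegate = delegate;
        this.redEnvelope = redEnvelope;
    }

    @Override
    public BigDecimal calculate(BigDecimal basePrice) {
        return delegate.calculate(basePrice).subtract(redEnvelope).max(BigDecimal.ZERO);
    }
}

public class PriceDemo {
    public static void main(String[] args) {
        // Stack a 10-yuan coupon and a 5-yuan red envelope on a 99-yuan item.
        PriceCalculator calculator = new RedEnvelopeDecorator(
                new CouponDecorator(new DefaultPriceCalculator(), new BigDecimal("10")),
                new BigDecimal("5"));
        System.out.println(calculator.calculate(new BigDecimal("99"))); // prints 84
    }
}
```

The point of the pattern here is that promotions can be stacked or removed purely by composition, without touching the base pricing logic.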

Once coding is complete, we can test the system's performance. At this stage the product manager generally provides the expected traffic figures; we run stress tests on a test platform against those figures, then use performance analysis and statistics tools to collect metrics and check whether they fall within the expected range.

After the project goes live, we still need to observe system performance under real production conditions, through log monitoring and performance statistics gathered from the logs. Once a problem is detected, we must analyze the logs and fix it promptly.

What factors reflect the performance of a system?

Above we talked about how performance tuning fits into each stage of a project, and we repeatedly mentioned performance metrics. So what exactly are those metrics?

Before looking at the metrics themselves, let's first understand which computer resources in a system can become performance bottlenecks.

CPU: Some applications do a lot of computation and occupy CPU resources for long, uninterrupted stretches, so other threads cannot get CPU time and respond slowly, which causes performance problems. For example, an infinite loop, backtracking caused by a badly written regular expression, frequent JVM Full GC, and the large number of context switches caused by multi-threaded programming can all keep the CPU busy. The regular-expression case is illustrated in the sketch below.
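As a hedged illustration of that regular-expression case, the sketch below uses a deliberately pathological pattern; the nested quantifier forces the matcher into exponential backtracking, so a single call can keep one CPU core busy for seconds.

```java
import java.util.regex.Pattern;

public class RegexBacktrackingDemo {
    public static void main(String[] args) {
        // Nested quantifiers like (a+)+ force exponential backtracking when the
        // input almost matches but fails at the very end.
        Pattern pathological = Pattern.compile("(a+)+b");
        String input = "a".repeat(28) + "c"; // Java 11+ for String.repeat

        long start = System.nanoTime();
        boolean matched = pathological.matcher(input).matches();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // On a typical JDK this single call can take seconds of pure CPU time.
        System.out.println("matched=" + matched + ", took " + elapsedMs + " ms");
    }
}
```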

Memory: Java programs generally leave memory allocation and management to the JVM, which mainly uses heap memory to store the objects a Java program creates. Reads and writes to heap memory are very fast, so memory itself is rarely a read/write bottleneck. However, memory is much more expensive than disk, and its capacity is far more limited. When memory is fully occupied and objects cannot be reclaimed, problems such as memory overflow and memory leaks appear, as in the sketch below.
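The following deliberately broken sketch shows the typical shape of a memory leak: a static collection that keeps growing keeps every object reachable, so the garbage collector can never reclaim them and the heap eventually overflows (run it with a small heap such as -Xmx64m to see the error quickly).

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryLeakDemo {
    // A static collection that is never cleared keeps every added object
    // reachable, so the garbage collector cannot reclaim any of them.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            // Each "cached" chunk is 1 MB; with a bounded heap the JVM will
            // eventually throw java.lang.OutOfMemoryError: Java heap space.
            CACHE.add(new byte[1024 * 1024]);
        }
    }
}
```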

Disk I/O: Compared with memory, disk offers far more storage space, but disk I/O is much slower. The introduction of SSDs has improved things, yet disk read/write speeds still cannot match memory.

Network: The network also plays a vital role in system performance. If you have ever purchased a cloud service, you had to choose a network bandwidth tier. When bandwidth is too low and the data being transferred is large, or the system's concurrency is high, the network easily becomes a performance bottleneck.

Exceptions: In a Java application, throwing an exception requires building the exception stack trace, and the exception then has to be caught and handled; this whole process is expensive. If exceptions are thrown under high concurrency and exception handling runs continuously, system performance is noticeably affected. The sketch below gives a rough sense of the cost.
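This rough sketch contrasts signalling a failure by throwing and catching an exception with signalling it by a plain counter; it is only meant to show the order-of-magnitude difference, and a real measurement should use a proper harness such as JMH.

```java
public class ExceptionCostDemo {
    private static final int ITERATIONS = 1_000_000;

    public static void main(String[] args) {
        // Rough comparison only; timings vary by JDK and hardware.
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            try {
                throw new IllegalStateException("boom"); // builds a stack trace every time
            } catch (IllegalStateException ignored) {
            }
        }
        long withException = System.nanoTime() - start;

        start = System.nanoTime();
        int errors = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            errors++; // signalling the failure with a counter / return code instead
        }
        long withReturnCode = System.nanoTime() - start;

        System.out.printf("throw/catch: %d ms, return code: %d ms (errors=%d)%n",
                withException / 1_000_000, withReturnCode / 1_000_000, errors);
    }
}
```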

Database: Most systems use a database, and database operations usually involve disk I/O. A large volume of database reads and writes can create a disk I/O bottleneck and delay database operations. For systems with heavy database read/write loads, database performance is the core of system optimization.

Lock contention: In concurrent programming we often have multiple threads reading and writing the same shared resource. To preserve the atomicity of the data (that is, to ensure that while one thread is writing a shared resource, other threads cannot modify it), we use locks. Locking can cause context switching, which adds performance overhead to the system. After JDK 1.6, the JVM optimized its internal locks several times to reduce the context switching caused by lock contention, adding biased locking, spin locks, lightweight locks, lock coarsening, lock elimination, and so on. Using lock resources rationally and optimizing them requires operating system knowledge and a solid grounding in Java multi-threaded programming, combined with project experience and the actual scenario at hand. The sketch after this paragraph shows one common way to reduce contention.
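As one hedged example of reducing lock contention, the sketch below compares a counter protected by a single synchronized block with java.util.concurrent.atomic.LongAdder, which spreads updates across internal cells so that threads contend far less.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class LockContentionDemo {
    private static long synchronizedCounter = 0;
    private static final Object LOCK = new Object();
    private static final LongAdder ADDER = new LongAdder();

    // Runs the given increment operation on 8 threads, 1M times each.
    private static long run(Runnable increment) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long start = System.nanoTime();
        for (int t = 0; t < 8; t++) {
            pool.execute(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    increment.run();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Every thread competes for the same monitor: heavy contention.
        long lockedMs = run(() -> { synchronized (LOCK) { synchronizedCounter++; } });
        // LongAdder spreads the updates out, so contention is much lower.
        long adderMs = run(ADDER::increment);
        System.out.printf("synchronized: %d ms, LongAdder: %d ms (sum=%d)%n",
                lockedMs, adderMs, ADDER.sum());
    }
}
```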

With these basics understood, we can use the metrics below to measure a system's overall performance.

Response time

Response time is an important metric for measuring system performance: the shorter the response time, the better the performance. The response time of a typical interface is measured in milliseconds. Within a system, we can break response time down from the bottom up into the following categories (a small timing sketch follows the list):

Database response time: the time consumed by database operations, often the most time-consuming part of the whole request chain;

Server response time: the time the server spends distributing requests (for example in Nginx) plus the time spent executing the server-side program;

Network response time: the time consumed by network transmission and by the network hardware parsing the transmitted requests;

Client response time: for ordinary web and app clients, the time consumed is negligible, but if a lot of processing logic is embedded in the client, it can take a long time and become the system bottleneck.
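A minimal sketch of measuring response time per stage: the helper below wraps any call and logs how long it took. The stage names and the sleeps are placeholders for real database and service calls.

```java
import java.util.function.Supplier;

public class ResponseTimer {
    // Wraps any call and logs how long it took, in milliseconds.
    public static <T> T timed(String label, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Hypothetical stages of one request; real code would call the database,
        // the downstream service, and so on instead of sleeping.
        timed("db query", () -> sleep(20));
        timed("server handler", () -> sleep(5));
    }

    private static Object sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }
}
```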

Throughput

In testing we tend to pay the most attention to an interface's TPS (transactions per second), because TPS reflects the interface's performance: the higher the TPS, the better. Within a system, we can also divide throughput from the bottom up into two categories: disk throughput and network throughput.

Let's look at disk throughput first. Disk performance has two key metrics.

One is IOPS (Input/Output operations Per Second), the number of I/O requests (reads or writes) the system can handle per unit of time; it mainly reflects random read/write performance. It matters for applications that do frequent small random reads and writes, such as small-file storage (for example images), OLTP databases, and mail servers.

The other is data throughput, meaning the amount of data successfully transferred per unit of time. For applications that do a lot of sequential reads and writes and continuously transfer large amounts of data, such as broadcast video editing and video on demand (VOD), data throughput is the key metric.

Next, network throughput. This refers to the maximum data rate a device can sustain over the network without dropping frames. Network throughput is related not only to bandwidth but also closely to CPU processing power, the network card, the firewall, external interfaces, I/O, and so on; it is determined mainly by the network card's processing power, the program's internal algorithms, and the available bandwidth. The snippet below shows the basic TPS calculation we rely on when measuring throughput.
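For reference, TPS is simply the number of completed transactions divided by the elapsed time. A tiny sketch of that calculation (the class name TpsMeter is invented for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

public class TpsMeter {
    private final AtomicLong completed = new AtomicLong();
    private final long windowStart = System.nanoTime();

    // Call this once per finished transaction.
    public void recordTransaction() {
        completed.incrementAndGet();
    }

    // TPS = transactions completed / elapsed seconds since the window started.
    public double currentTps() {
        double elapsedSeconds = (System.nanoTime() - windowStart) / 1_000_000_000.0;
        return elapsedSeconds == 0 ? 0 : completed.get() / elapsedSeconds;
    }

    public static void main(String[] args) throws InterruptedException {
        TpsMeter meter = new TpsMeter();
        for (int i = 0; i < 500; i++) {
            Thread.sleep(1);          // stand-in for handling one transaction
            meter.recordTransaction();
        }
        System.out.printf("TPS over the window: %.0f%n", meter.currentTps());
    }
}
```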

Computer resource utilization

Resource utilization is usually expressed as CPU usage, memory usage, disk I/O, and network I/O. These figures are like the staves of a barrel: if any one of them is the short plank, that is, if any one resource is allocated unreasonably, the impact on overall system performance is devastating. The sketch below shows one way to take a quick snapshot of these figures from inside a Java process.
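A minimal sketch of sampling these figures from inside a Java process, using the JDK's management beans; the process CPU load relies on the com.sun.management extension, which is available on HotSpot-based JDKs but is not guaranteed everywhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class ResourceUsageSnapshot {
    public static void main(String[] args) {
        // Heap usage from the standard MemoryMXBean.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap used: %d MB of %d MB%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);

        // Process CPU load via the com.sun.management extension, if present.
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            double cpu = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
            System.out.printf("Process CPU load: %.1f%%%n", cpu * 100);
        }
    }
}
```

Disk and network I/O usually have to come from outside the JVM, for example from operating-system tools or monitoring agents.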

Load-bearing capacity

As the pressure on the system rises, observe whether the curve of the system's response time climbs gently. This metric tells you directly the limit of the load the system can withstand. For example, when you stress test a system, its response time lengthens as concurrency grows, until the system can no longer handle so many requests and throws a large number of errors, which means the limit has been reached. A simple stepped load test is sketched below.
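A simple, hypothetical stepped load test: it raises the concurrency level step by step and reports average latency and TPS at each step. The handleRequest method here just sleeps; in a real stress test it would call the system under test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadStepDemo {
    // Simulated request; a real stress test would call the system under test.
    private static void handleRequest() throws InterruptedException {
        Thread.sleep(10);
    }

    private static void runStep(int concurrency, int requestsPerThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicLong totalLatencyNanos = new AtomicLong();
        long start = System.nanoTime();
        for (int t = 0; t < concurrency; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    long begin = System.nanoTime();
                    try {
                        handleRequest();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    totalLatencyNanos.addAndGet(System.nanoTime() - begin);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);

        long requests = (long) concurrency * requestsPerThread;
        double elapsedSec = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("concurrency=%d avgLatency=%.1f ms tps=%.0f%n",
                concurrency, totalLatencyNanos.get() / 1_000_000.0 / requests, requests / elapsedSec);
    }

    public static void main(String[] args) throws InterruptedException {
        // Step the concurrency up and watch how average latency and TPS change.
        for (int concurrency : new int[]{1, 2, 4, 8, 16}) {
            runStep(concurrency, 200);
        }
    }
}
```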

Summary

From today's study we know that performance tuning makes a system more stable and gives users a better experience, and in larger systems it can even help the company save resources.

But at the beginning of a project we should not get involved in performance optimization too early; we just need to make sure the code is sound, efficient, and well designed.

Once the project is complete, we can test the system, using the following performance metrics as the standard for tuning: response time, throughput, computer resource utilization, and load-bearing capacity.

Looking back at my own projects, they include e-commerce systems, payment systems, and game top-up billing systems, all with user bases at the million level and all bearing various large-scale flash-sale events, so I have very demanding performance requirements for these systems. Besides judging whether performance is good or bad by observing the metrics above, we also need to keep iterating to fully safeguard the system's stability.

Here is one more method I can offer: use the performance metrics of the previous version as the baseline, and run automated performance tests to check whether the iterated version shows any performance anomalies. Compare not only direct metrics such as throughput, response time, and load capacity, but also changes in indirect metrics of resource usage such as CPU usage, memory usage, disk I/O, and network I/O. A simple regression check along these lines is sketched below.
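A sketch of such a regression check, assuming the baseline metrics are exported from the previous version's test run; the metric names and the 10% tolerance are arbitrary choices for illustration.

```java
import java.util.Map;

public class PerformanceRegressionCheck {
    // Baseline numbers would normally be loaded from the previous release's
    // test report; they are hard-coded here only to keep the sketch self-contained.
    private static final Map<String, Double> BASELINE = Map.of(
            "tps", 1200.0,
            "p99ResponseMs", 85.0,
            "cpuUsagePercent", 60.0);

    // Flags any metric that regresses by more than the given tolerance (e.g. 0.10 = 10%).
    public static boolean withinTolerance(Map<String, Double> current, double tolerance) {
        boolean ok = true;
        for (Map.Entry<String, Double> entry : BASELINE.entrySet()) {
            double base = entry.getValue();
            double now = current.getOrDefault(entry.getKey(), Double.NaN);
            // For TPS higher is better; for response time and CPU usage lower is better.
            boolean higherIsBetter = entry.getKey().equals("tps");
            double change = higherIsBetter ? (base - now) / base : (now - base) / base;
            if (Double.isNaN(now) || change > tolerance) {
                System.out.println("Regression in " + entry.getKey() + ": " + base + " -> " + now);
                ok = false;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        Map<String, Double> current = Map.of(
                "tps", 1100.0, "p99ResponseMs", 95.0, "cpuUsagePercent", 58.0);
        System.out.println("Within tolerance: " + withinTolerance(current, 0.10));
    }
}
```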
