Performance Measurement and Analysis of Backend Services (Part 1)

On a dark and stormy night, a ride-hailing platform launched a big wave of special offers, and a crowd of single folks came out to use them. Before long, the intelligent prompt (auto-suggest) service the platform relied on could no longer carry the load and fell flat on its face (as shown below). Afterwards, the development and operations teams responsible for the intelligent prompt service met and decided: a comprehensive, in-depth performance test of the service must be carried out. Immediately! Right now! At once!

So a big pile of questions comes at us: for a backend service like intelligent prompt, which metrics should a performance test care about? What do these metrics mean? By what standards should they be judged? The answers are below.

Outline

Different groups care about different aspects of performance. Callers of a backend service's interface are generally interested only in external metrics such as throughput and response time. The owner of the backend service cares not only about the external metrics but also about internal metrics such as CPU, memory, and load.

Take the ride-hailing platform: what it cares about are the external metrics of the intelligent prompt service, namely whether it can withstand the traffic surge brought by the big promotion. The developers, operators, and testers of the intelligent prompt service care not only about those external metrics but also about internal metrics such as CPU, memory, and IO, as well as operations concerns like deployment and the servers' hardware and software configuration.

 

External metrics

Viewed from the outside, a performance test focuses on the following three metrics:

  • Throughput: the number of requests or tasks the system can process per second.

  • Response time: the time taken to process one request or one task.

  • Error rate: the proportion of failed requests in a batch of requests.

The standard for response time depends on the specific service. For a service like intelligent prompt, the returned data is valid only briefly (the user issues a new request as soon as they type the next letter), so the real-time requirement is high and the upper bound on response time is generally below 100 ms. For a service like navigation, the returned result is used for a relatively long time (the whole navigation session), so the upper bound on response time is generally 2-5 s.

When reporting response time, it should be summarized from several angles: give the p90 and p99 percentiles and the overall distribution, not just the mean. Below is an example of computing response-time statistics.
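As a minimal sketch of such a summary, the following shell snippet computes the mean, p90, and p99 from a file of per-request latencies (response_times.log is a hypothetical file name with one value in milliseconds per line):

# response_times.log: one response time in ms per request (hypothetical file name)
sort -n response_times.log | awk '
  { v[NR] = $1; sum += $1 }
  END {
    i90 = int(NR * 0.90); if (i90 < 1) i90 = 1   # index of the 90th percentile sample
    i99 = int(NR * 0.99); if (i99 < 1) i99 = 1   # index of the 99th percentile sample
    printf "count=%d  mean=%.1f ms  p90=%s ms  p99=%s ms\n", NR, sum / NR, v[i90], v[i99]
  }'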

Throughput is affected by response time, the server's hardware and software configuration, network conditions, and other factors:

  • The greater the throughput, the longer the response time.

  • The better the server's hardware configuration, the greater the throughput.

  • The worse the network, the smaller the throughput.

At low throughput, the mean and distribution of response time are stable and do not fluctuate much.

At high throughput, response time grows as throughput grows; the growth may be roughly linear or close to exponential. When throughput approaches the system's peak, response time spikes.

The error rate depends on how the service is implemented. Typically, the proportion of errors caused by external factors such as network timeouts should not exceed 5%, and the error rate caused by the service itself should not exceed 1%.

 

Internal metrics

From the server's perspective, a performance test focuses on CPU, memory, server load, network, disk IO, and so on.

CPU

All the instructions and data processing of a backend service are handled by the CPU, so CPU utilization plays a decisive role in the service's performance.

On Linux, CPU statistics are mainly broken down along the following dimensions:

  • us: percentage of CPU time spent in user mode

  • sy: percentage of CPU time spent in system (kernel) mode

  • ni: percentage of CPU time spent in user mode by processes whose nice value has been adjusted

  • id: percentage of CPU time spent idle

  • wa: percentage of CPU time spent waiting for IO to complete

  • hi: percentage of CPU time spent servicing hardware interrupts

  • si: percentage of CPU time spent servicing soft interrupts

The figure below is the output of the top command on a server running an open-platform forwarding service; the CPU metrics are explained using this service as the example.

 

us & sy: For most backend services, us and sy take the largest share of CPU time. The two influence each other: when the us share is high, the sy share is low, and vice versa. A sy share that is too high generally means the service under test is switching between user mode and kernel mode too frequently, which drags down overall system performance. In addition, on servers with multi-core CPUs, CPU 0 is responsible for scheduling across the cores; if usage on CPU 0 is too high, scheduling efficiency on the other cores drops. CPU 0 therefore needs special attention during testing.
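As a small sketch (assuming the sysstat package is installed), the following commands sample the overall CPU breakdown once and then watch CPU 0 specifically:

# Overall us/sy/ni/id/wa/hi/si, sampled once in batch mode
top -b -n 1 | grep '%Cpu'

# Per-core view of CPU 0 only, refreshed every second (mpstat ships with sysstat)
mpstat -P 0 1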

ni: Every Linux process has a priority; processes with higher priority run first, and this value is called pri. On top of the priority there is also a correction value, called the process's nice value. Normally, neither the service under test nor the server as a whole should show a high ni value. If ni is relatively high during a test, check the Linux system configuration and the run parameters of the service under test to find the cause.

id: A service running in production needs to keep a certain amount of idle headroom (id) in reserve to absorb unexpected traffic spikes. During a performance test, if id stays low while throughput refuses to rise, check the thread/process configuration of the service under test and the system configuration of the server.

wa: IO operations on disk, network, and so on drive the CPU's wa metric up. Normally network IO does not consume much wa, whereas frequent disk reads and writes can make wa spike. If the service under test is not an IO-intensive service, check the volume of logs it writes and how often it loads data.

hi & si: A hardware interrupt is an interrupt sent from a peripheral to the CPU, that is, an asynchronous signal sent by peripheral hardware to the CPU or memory; a soft interrupt is an interrupt signal generated by the operating system kernel itself, typically used by the kernel's interrupt handlers or the process scheduler, i.e. what we usually call system calls. During a performance test, hi takes some share of the CPU, but it should not be too high. For IO-intensive services, si takes a higher share.
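If hi or si looks abnormal, the raw counters behind these percentages can be inspected directly; a quick sketch:

# Hardware interrupt counters, per IRQ line and per CPU
watch -n 1 cat /proc/interrupts

# Soft interrupt counters per CPU (NET_RX/NET_TX grow quickly for IO-heavy services)
watch -n 1 cat /proc/softirqs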

RAM

The main purpose of monitoring memory during a performance test is to check how much memory the service under test occupies and how that figure fluctuates.

Linux offers several commands for getting the memory usage of a specified process; the most commonly used is top, as shown in the figure below.

 

In the output:

  • VIRT: the total virtual memory used by the process. It includes all code, data, and shared libraries, plus pages that have been swapped out; in other words, the total of all memory the process has requested.

  • RES: the non-swapped physical memory the process is using (stack and heap); memory that has actually been allocated after being requested.

  • SHR: the total shared memory used by the process. This value only reflects memory that could be shared with other processes; it does not mean the memory is currently being used by another process.

  • SWAP: the amount of the process's virtual memory that has been swapped out; space that has been requested but is not in use (including stack, heap, and shared memory).

  • DATA: the process's total physical memory excluding executable code, i.e. the total stack and heap space the process has requested.

As these explanations show, during a test we mainly monitor the process's RES and VIRT; for services with a multi-process architecture that use shared memory, SHR also needs to be monitored.
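A minimal sampling sketch using ps, where PID, the interval, and the log file name are placeholders; VSZ corresponds to VIRT and RSS to RES:

PID=12345        # pid of the service under test (placeholder)
while true; do
  # VSZ and RSS are reported in KB
  echo "$(date '+%T') $(ps -o vsz=,rss= -p "$PID")" >> mem_samples.log
  sleep 5
done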

LOAD (Server load)

On Linux, load refers to the average length of the run queue, that is, the average number of processes waiting for a CPU.

From this definition, the ideal running state of a server is one where the run queue of every CPU core is exactly 1: all active processes are running and none is waiting. In this state the server is running right at its load threshold, which equals the number of CPU cores.

As a rule of thumb, server load should sit at 70% to 80% of the threshold. This uses most of the server's capacity while leaving some headroom to handle traffic growth.

Linux provides many commands for viewing system load; the most commonly used are top and uptime.

uptime and top output the same load information: the system's average load over the last 1 minute, 5 minutes, and 15 minutes.

 

The command for viewing the system's load threshold is as follows.
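Since the threshold equals the number of logical CPU cores, counting the cores is enough; a minimal sketch:

# Number of logical CPU cores = the load threshold
grep -c 'model name' /proc/cpuinfo
# or simply
nproc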

 

During a performance test, system load is one of the most important indicators for judging the health of the whole system. Typically, the load under pressure should approach but not exceed the threshold; during a maximum-concurrency test the system load should not exceed 80% of the threshold; and during a stability (soak) test it should stay around 50% of the threshold.

Network

Network monitoring during a performance test covers network traffic and network connection status.

Network traffic monitoring

The nethogs command can be used for this. Like top, it is an interactive, real-time command; its running interface is shown below.
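A typical invocation, assuming nethogs is installed and eth0 is the interface carrying the test traffic:

# Per-process network traffic on eth0, refreshed every second
sudo nethogs -d 1 eth0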

 

In backend-service performance tests, services that return plain text results usually do not need much attention paid to traffic volume.

Network connection status monitoring

Monitoring network connections during a performance test mainly means watching for changes and anomalies in connection state. For services that speak raw TCP, monitor how the number of established connections (connections in the ESTABLISHED state) changes. For HTTP services, monitor the network buffer state of the process under test and the number of connections in the TIME_WAIT state. Linux ships with commands such as netstat and ss that cover all of this. The figure below shows the result of monitoring a process with netstat for a specified pid.
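A small sketch that tallies the service's connections by state with ss, plus the older netstat equivalent; PORT is a placeholder for the service's listening port:

PORT=8080   # listening port of the service under test (placeholder)

# Count TCP connections by state (ESTABLISHED, TIME_WAIT, ...) using ss
ss -tan "sport = :$PORT" | awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }'

# The netstat equivalent
netstat -ant | awk -v p=":$PORT" '$4 ~ p { count[$6]++ } END { for (s in count) print s, count[s] }'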

 

Disk IO

During a performance test, if the service under test reads and writes the disk too frequently, large numbers of requests end up stuck waiting on IO, system load rises, response time lengthens, and throughput drops.

On Linux, the iostat command can be used to monitor disk status, as shown below.

 

  • tps: the number of transfers per second issued to the device. A "transfer" is one I/O request; multiple logical requests may be merged into a single I/O request, and the size of a transfer is unspecified.

  • kB_read/s: the amount of data read from the device per second, in kilobytes.

  • kB_wrtn/s: the amount of data written to the device per second, in kilobytes.

  • kB_read: the total amount of data read, in kilobytes.

  • kB_wrtn: the total amount of data written, in kilobytes.

The default iostat output gives the most basic statistics about the running system, but for performance testing it does not provide enough detail; the -x parameter needs to be added.

 

  • rrqm/s: the number of read requests merged per second for this device (when reads are issued through VFS to the filesystem, the filesystem merges requests that read different data within the same block).

  • wrqm/s: the number of write requests merged per second for this device.

  • await: the average processing time of each IO request, in milliseconds.

  • %util: the share of the sampling interval spent processing IO, i.e. the device's busy time divided by the total time. For example, with a 1-second interval, if the device spends 0.8 s handling IO and is idle for 0.2 s, then %util = 0.8 / 1 = 80%. This parameter reflects how busy the device is.
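A typical way to collect these extended statistics during a test run (iostat belongs to the sysstat package):

# Extended per-device statistics (rrqm/s, wrqm/s, await, %util), one sample per second, 10 samples
iostat -x -d 1 10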

 

Common performance bottlenecks

  • Throughput hits a ceiling while the system load has not yet reached the threshold: this is usually caused by the service under test being allocated too few system resources. If this turns up during a test, the problem can be located along dimensions such as the number of threads the service opens, ulimit settings, and the memory allocated to it (see the sketch after this list).

  • CPU us and sy are not high, but wa is high: if the service under test is a disk-IO-intensive service, a high wa is normal. If it is not, there are two likely causes. First, the service's disk read/write logic is flawed: reads and writes that are too frequent or write volumes that are too large, such as an unreasonable data-loading strategy or excessive logging, will push wa up. Second, the server is short of memory and the service keeps getting swapped in and out of the swap partition.

  • The response time of identical requests swings between very large and very small: when this happens at normal throughput there are two possible causes. First, the service's resource-locking logic is flawed, so some requests spend a long time waiting for a resource to be unlocked. Second, Linux itself has allocated the service limited resources, so some requests have to wait for other requests to release resources before they can proceed.

  • Memory keeps rising: at a fixed throughput, steadily rising memory very likely means the service under test has a memory leak; a memory-checking tool such as valgrind is needed to pin it down.
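A couple of checks that go with the first and last items above; a sketch only, with ./my_service standing in for the binary under test:

# Resource limits that commonly cap throughput
ulimit -n   # max open file descriptors
ulimit -u   # max user processes/threads

# Re-run the service under valgrind's memcheck to locate a suspected leak
# (expect a large slowdown; not for use under full load)
valgrind --leak-check=full --show-leak-kinds=all ./my_service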

 

An example

After the intelligent prompt service fell over, a thorough performance test had to be done on it immediately. Under the circumstances, the test results needed to cover both external metrics and internal metrics.

The module structure of the intelligent prompt service and the function of each module are shown below.

 

As the figure shows, the performance ceiling of the underlying data service that the intelligent prompt service depends on had already been determined before this test. Our task was therefore: given that the underlying data service tops out at 3500 qps, find the performance ceiling of each upstream module of the intelligent prompt service.

The complete performance-testing process for a backend service is shown in the figure below.

 

Test preparation:

  • Test data: since the intelligent prompt service was already running in production, this test used the intelligent prompt logs from the day it fell over as test data.

  • Estimated QPS: finding this number is the purpose of this test.

  • Server configuration: the same hardware and software configuration as the production servers.

 

Load-testing process:

We used JMeter to send the test data and simulate user requests; the original JMeter test configuration is shown in the figure below. As the figure shows, the test plan for a performance test consists of the data file configuration (how the file is shared among threads, behavior at end of file, and so on), throughput control, the HTTP sampler (domain, port, HTTP method, request body, and so on), and the response assertion (validation of the returned content).
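A typical way to run such a plan from the command line in non-GUI mode, sketched with placeholder file names (smart_tips.jmx, result.jtl, report_dir):

# Run the test plan in non-GUI mode and write raw results to a .jtl file
jmeter -n -t smart_tips.jmx -l result.jtl

# Generate an HTML report from the results afterwards
jmeter -g result.jtl -o report_dir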

 

Data file configuration

 

Throughput control

 

HTTP request sampler

 

Response assertion

 

 

 

CPU

On Linux, commands such as sar, top, and ps can monitor CPU usage. The most commonly used is top, whose output looks like this:

 

top is an interactive command; once launched it stays in the terminal and refreshes periodically. During a performance test, the following parameters make top run just once:

$ top -n 1 -b -p ${pid}
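Wrapped in a loop, this gives a simple sampler of the process's CPU and memory over the whole run; a sketch with placeholder values:

PID=12345       # pid of the service under test (placeholder)
INTERVAL=5      # seconds between samples
while true; do
  top -b -n 1 -p "$PID" | tail -1 >> top_samples.log   # keep only the process line
  sleep "$INTERVAL"
done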

 

Server Load

On Linux, server load is obtained with the uptime command; its output is shown below.

 

The meaning of each column is as follows:

"The number of users currently logged on a long time the system is running last 1 minute, 5 minutes, 15 minutes load average"

 RAM

On Linux, top and ps can show the memory usage of a specified process, but the most accurate figures live in /proc/${PID}/status, as shown below.

 

In this output we focus on VmRSS, VmData, and VmSize.
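A quick way to pull just those fields (PID is a placeholder):

PID=12345   # pid of the service under test (placeholder)
grep -E 'VmSize|VmRSS|VmData' /proc/"$PID"/status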

 Disk IO

Disk monitoring data is obtained with the iostat command.

 

Test Report Output

Once the monitoring metrics collected during the performance test have been aggregated, the performance report can be produced.

In general, a performance report should include the following:

  • Test conclusions: the maximum QPS of the service under test, response time and other metrics, whether expectations were met, and deployment recommendations.

  • Test environment description: performance requirements, the server configuration used for the test, the source of the test data, and the test method.

  • Metric statistics: response-time statistics, QPS, server-level metrics, and process-level metrics. Presenting the statistics graphically is best.

 

Epilogue

The test concluded that a single intelligent prompt instance can sustain 300 qps and the whole online intelligent prompt service 1800 qps, while traffic on that dark and stormy night was roughly 5000+ qps. No wonder the service fell over: the traffic was simply far beyond what the online deployment could absorb.

In the end, the intelligent prompt service applied for more servers, and the ride-hailing platform put a cap on its promotional traffic. With both sides adjusted, the ride-hailing experience of everyone heading out for drinks, dinner, or dates on future dark and stormy nights is guaranteed, and the success rate of all those get-togethers goes up; a thoroughly good deed.

 

 

Original article: https://cloud.tencent.com/developer/article/1038026

 
