Let’s talk about the stress testing plan | JD Logistics Technical Team

Preface

Stress testing ahead of major promotions is nothing new at this point. There are no technical bottlenecks or resource constraints; every team has several people who can run performance tests, and some teams already do it as a daily routine. However, stress testing is not as simple as setting parameters on the stress testing platform, running scripts, and checking whether a metric in the report meets the target. Having run performance tests alongside some colleagues, I found recurring issues in the details of the process: some people can carry out the steps without fully understanding them. The stress testing plan is a particularly important part of a performance test, so today I will share some thoughts on it.

The essence of performance testing is to simulate production users, construct requests that reflect their real behavior, put load on the system under test as realistically as possible, and verify whether system performance meets business needs and whether there are performance bottlenecks.

From this we can see several core points: stress testing goals, stress testing scenarios, and the stress testing environment. Today we focus on these three parts.

1. Stress test goals

When formulating a stress testing plan and discussing its goals, many colleagues jump straight to TPS, QPS, and TP99.

One very important thing gets overlooked: the stress test background, that is, why we are running this stress test. The background sets the direction of the test. If the direction is wrong, the time and effort are wasted and the conclusions of the test are meaningless.

So what is the relationship between stress testing background and stress testing goals?

Let’s talk about several common stress testing backgrounds we have now:

1. Businesses whose call volume increases significantly during a major promotion compared with normal times

Goal: Whether the system can support the estimated promotion peak traffic (business estimate + business growth + promotion growth), the carrying capacity of a single data center, and the maximum call volume the interface can support

Conclusion: With configuration xxx, the optimal/maximum throughput is xxx, which can support the estimated peak call volume of xxxx for this promotion. Are there any risks?

2. Content that was stress tested for the last major promotion but has been added to or modified since

Goal: Whether interface performance has changed, whether it can support the estimated promotion peak traffic, and the maximum call volume the interface can support

3. A newly added interface that has not been stress tested and has an estimated business call volume

Goal: Whether the interface can meet the estimated business call volume, both at the limit and under normal load

4. A newly added interface that has not been stress tested and has no estimated business call volume

Goal: Evaluate interface performance. Are there bottlenecks in system performance? Is there room for optimization?

5. A new interface replaces an old interface

Goal: Evaluate interface performance and whether it can carry the old interface's business call volume plus the new business call volume

6. Performance optimization of old interfaces

Goal: Compare the performance indicators before and after optimization to see if there is any optimization effect.

7. Businesses whose peak call volume during a major promotion does not change significantly from the daily level, but whose peak lasts longer

Goal: Stability stress test at peak traffic

8. Call volume is low, but users have relatively high expectations for the experience

Goal: Is there any perceptible delay in interface response time? Focus on TP99 and decide whether optimization is needed.

9. It is unclear whether the system needs to be scaled out

Goal: Extreme stress test; check whether resource usage on the application servers and database is reasonable

10. A link or interface is known to be slow, and we need to find where the bottleneck is

Goal: Find the weak points in the link and interface (what can be optimized)
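To make the first background concrete, the arithmetic behind its goal and conclusion can be sketched as below. This is only an illustration, not our actual tooling: the function names, the choice to treat both growth rates as additive fractions of the baseline, the safety headroom, and all numbers are assumptions made for the example.

```python
import math


def estimated_peak_tps(baseline_peak_tps, business_growth, promo_growth):
    """Estimated promotion peak.

    Assumption for this sketch: both growth rates are fractions of the
    baseline and are simply added (business estimate + business growth
    + promotion growth).
    """
    return baseline_peak_tps * (1 + business_growth + promo_growth)


def instances_needed(target_tps, tps_per_instance):
    """Instances required if throughput scaled linearly with instance count."""
    return math.ceil(target_tps / tps_per_instance)


def can_support(target_tps, measured_max_tps, headroom=0.2):
    """True if the measured maximum covers the target plus a safety margin."""
    return measured_max_tps >= target_tps * (1 + headroom)


# Illustrative numbers only.
peak = estimated_peak_tps(baseline_peak_tps=1000, business_growth=0.2, promo_growth=0.5)
print("estimated peak TPS:", peak)
print("instances needed at 400 TPS each:", instances_needed(peak, 400))
print("2100 TPS measured is enough:", can_support(peak, measured_max_tps=2100))
```

The point of writing it down this way is that the conclusion of the test ("configuration xxx supports peak xxxx") becomes a comparison between a measured number and a target that was derived from the background, rather than a metric read in isolation.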

2. Stress test scenario

Researching and building the business model is the core of the preparation work before a stress test. The business model must be based on how the system is actually used in production. Only when the model matches real production usage can the performance test results truly reflect how the system will perform once it is live. The quality of business modeling directly determines whether the performance test succeeds; this is what we call the stress test model.

Three things need to be analyzed clearly during the business modeling process:

1. Which scenarios generate traffic, and how do we choose the ones that need stress testing?

2. How should traffic be distributed across the scenarios and transactions?

3. To achieve the test goal, how much floor data (base data) needs to be constructed, and how should it be distributed and deployed? The business model also needs its own data, so how should that data be distributed and structured?

In short, you need to understand the business clearly and, on that basis, build three things: the business model, the traffic model, and the data model.

1. Business model

The business model is generally collected, organized, and refined from four dimensions: business operations, technical operations, online problem analysis, and testing experience:

Business operations: Collect actual user usage and business growth trends from the perspective of real business use. For example, where do our business calls come from? Which scenarios do users use most? When are the peak hours of user operations?

Technical operations: From the technical side, map out the call chains behind the interface logic. For example, when a line is expanded for query in the dispatch workbench, the exception interface is called if there are fewer than 20 tasks under the line and is not called if there are more than 20. Another example is how the query interface is implemented, through the cache or through the database.

Online problems: Draw on collected user feedback and online problems, together with how those problems were fixed. For example, if front-line staff report that the xxx operation feels slow, which user operation scenarios will R&D cover when optimizing it?

Testing experience: Improve the business model based on testing experience

2. Traffic model

Once the business scenarios are determined, we need to think about how to distribute traffic across the scenarios and transactions.

User operations in production are relatively complex, and request payload sizes and request paths vary. Stress testing with a single request payload is not realistic. There are two approaches to traffic model analysis: the user behavior model and the system business model.

User behavior model: Describe the characteristics of user behavior during peak periods and, through research and analysis, summarize them into a user behavior model. The most common approach today is traffic recording: record business traffic during peak hours and then replay it. The advantage is that it is relatively simple to implement; the disadvantage is that user behavior in recorded traffic is hard to analyze statistically.

System business model: Based on the system's business characteristics during peak periods, obtain peak traffic from system logs, instrumentation data, and similar sources, identify users' main traffic patterns, and then reproduce them by writing scripts and configuring traffic proportions. The advantage is that the traffic proportions are clear; the disadvantage is that the data must be classified and analyzed manually.
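For the system business model, the "classify and analyze" step often comes down to counting peak-hour requests per interface and turning the counts into proportions. The sketch below shows that idea under stated assumptions: the log line format (`<time> <interface> <latency_ms>`), the interface names, and the sample data are all hypothetical.

```python
from collections import Counter


def traffic_proportions(log_lines):
    """Count requests per interface and return each interface's share.

    Each log line is assumed to look like "<time> <interface> <latency_ms>".
    """
    counts = Counter(line.split()[1] for line in log_lines if line.strip())
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical peak-hour log sample.
sample_log = [
    "10:00:01 queryLine 35",
    "10:00:01 queryLine 41",
    "10:00:02 queryTask 52",
    "10:00:02 queryLine 38",
    "10:00:03 queryException 120",
    "10:00:03 queryLine 33",
    "10:00:04 queryTask 49",
    "10:00:04 queryLine 36",
    "10:00:05 queryLine 40",
    "10:00:05 queryTask 55",
]

proportions = traffic_proportions(sample_log)
for name, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%}")
```

The resulting proportions are what would then be configured as the traffic ratio between scripts on the stress testing platform.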

3. Data model

After designing the business model and traffic model, you also need to know how much base data (also called floor data) is needed. The purpose of floor data is to match production data as closely as possible (at least in how the quantities are distributed). Whatever the database type, the query plan chosen differs with data volume: for a table with a few hundred rows a full table scan may well beat an index scan, but what about millions of rows? Matching the data makes a detailed assessment possible. Generally speaking, the data volume should be based on the actual production volume, with an equivalent amount loaded into the performance test environment.
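When the test environment cannot hold an equivalent volume, the fallback mentioned above is to keep at least the quantity distribution consistent. A minimal sketch of that scaling, assuming the production row counts per category are already known; the category names and numbers here are made up for illustration.

```python
def scale_floor_data(production_counts, target_total):
    """Scale per-category row counts to target_total rows while keeping
    each category's share of the total (largest-remainder rounding)."""
    prod_total = sum(production_counts.values())
    exact = {k: v * target_total / prod_total for k, v in production_counts.items()}
    scaled = {k: int(x) for k, x in exact.items()}
    # Hand out the rows lost to rounding, largest fractional part first.
    leftover = target_total - sum(scaled.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - scaled[k], reverse=True)
    for k in by_remainder[:leftover]:
        scaled[k] += 1
    return scaled


# Hypothetical production distribution: rows per region.
production = {"north": 6_000_000, "east": 3_000_000, "south": 1_000_000}
print(scale_floor_data(production, target_total=1_000_000))
```

Keep in mind that a scaled-down data set can still change query plans, so results from it should be read with that caveat.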

Summary: How many users (Who), at what time or for how long (When), on how much data (How much), completing what business (What), and which metrics need attention (How).
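One way to keep these five questions from being skipped is to write each scenario down as a small record before scripting it. The class and field names below are hypothetical and are not a format required by any stress testing platform; the values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class StressScenario:
    concurrent_users: int   # Who: how many users
    duration_minutes: int   # When: for how long
    floor_data_rows: int    # How much: base data behind the test
    business_action: str    # What: the business operation exercised
    metrics: list = field(default_factory=lambda: ["TPS", "TP99"])  # How

    def summary(self):
        return (f"{self.concurrent_users} users for {self.duration_minutes} min "
                f"on {self.floor_data_rows} rows: {self.business_action} "
                f"(watch {', '.join(self.metrics)})")


scenario = StressScenario(150, 10, 1_000_000, "paged line query, 20 rows per page")
print(scenario.summary())
```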

Example:

Single interface stress test (different calling methods):

1. Opening the page triggers a default query that returns 20 records per page; this is also the most common form of user access.

2. Called directly through the interface, which returns at most 1,000 items per page.

Single interface cache stress test:

  • All requests hit the cache for 10 minutes
  • All requests bypass the cache for 10 minutes
  • Partial cache hits and partial cache misses, with the cache invalidated for 5 minutes out of every 10
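In a script, the difference between these phases usually comes down to which keys the requests use. The sketch below shows one way to generate keys per phase and check the hit rate they would produce. The phase names, key format, and the set standing in for the cache are assumptions for illustration, and the sketch does not model the timed invalidation in the third bullet, only the hit/miss mix.

```python
import itertools


def key_stream(phase, warm_keys, hit_ratio=0.5):
    """Yield request keys for one cache phase (phase names are hypothetical).

    "hit":   only keys already loaded into the cache
    "miss":  a brand-new key on every request, so nothing can hit
    "mixed": interleave both so that hit_ratio of the requests can hit
    """
    if phase not in ("hit", "miss", "mixed"):
        raise ValueError(f"unknown phase: {phase}")
    warm = itertools.cycle(warm_keys)
    for i in itertools.count():
        if phase == "hit":
            use_warm = True
        elif phase == "miss":
            use_warm = False
        else:
            # Deterministic interleaving: use a warm key whenever the
            # running quota of hits increases.
            use_warm = int((i + 1) * hit_ratio) > int(i * hit_ratio)
        yield next(warm) if use_warm else f"cold-{i}"


def expected_hit_rate(phase, requests=1000):
    """Replay the key stream against a preloaded set standing in for the cache."""
    warm_keys = [f"warm-{i}" for i in range(10)]
    cache = set(warm_keys)
    keys = itertools.islice(key_stream(phase, warm_keys), requests)
    return sum(key in cache for key in keys) / requests


for phase in ("hit", "miss", "mixed"):
    print(phase, expected_hit_rate(phase))
```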

Single interface mixed scenario stress test:

Dispatch workbench: paged query of lines within the user's permissions, with different user behaviors

1. Query 1 area over 1 day

2. Query 7 areas over 1 day

3. Three scenarios mixed (1 area over 7 days: 10%; 7 areas over 1 day: 20%; 1 area over 1 day: 70%)
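The 10/20/70 mix in the third case can be driven by weighted random selection of the scenario for each request. This is a sketch using Python's standard `random.choices`, not the stress testing platform's own ratio configuration; the scenario labels are shorthand for the cases above.

```python
import random
from collections import Counter

# Weights taken from the mixed scenario above.
SCENARIO_WEIGHTS = {
    "1 area, 7 days": 10,
    "7 areas, 1 day": 20,
    "1 area, 1 day": 70,
}


def pick_scenarios(n, seed=None):
    """Choose a scenario for each of n requests according to the weights."""
    rng = random.Random(seed)
    names = list(SCENARIO_WEIGHTS)
    weights = list(SCENARIO_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)


picked = Counter(pick_scenarios(10_000, seed=42))
for name in SCENARIO_WEIGHTS:
    print(f"{name}: {picked[name] / 10_000:.1%}")
```

With enough requests the observed shares settle close to the configured weights, which is worth confirming in the test report before trusting the mixed-scenario numbers.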

Single application multi-interface mixed stress test:

Interface call proportions are based on call volume

Single service interface mixed stress test:

Applicable scenario:

The operations metrics dashboard (linked to the large-screen page, with about 150 regular users) calls five interface methods when a user accesses the large screen:

When the user first enters the page, the 5 interfaces are called by default without query conditions.

While the user stays on the page, the content refreshes automatically every 30 seconds, calling the 5 interfaces.

When the user manually enters query conditions and clicks query, the query runs and calls the 5 interfaces.
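From these figures we can estimate the load this page generates. The sketch assumes the worst case where all 150 users are on the page at once and that their refreshes are spread evenly over the 30-second interval; both are assumptions, not measurements.

```python
def steady_state_qps(users, interfaces_per_refresh, refresh_interval_s):
    """Average requests per second produced by the automatic refresh."""
    return users * interfaces_per_refresh / refresh_interval_s


def burst_calls(users, interfaces_per_action):
    """Calls produced if every user opens the page (or queries) at once."""
    return users * interfaces_per_action


print(steady_state_qps(150, 5, 30))  # 25.0 requests per second
print(burst_calls(150, 5))           # 750 calls in one burst
```

The steady refresh is light, so the case worth testing is the burst, for example everyone opening the large screen at the start of a shift.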

Extreme stress test:

[618 Preparation Stress Test-Execute Core Process Extreme Stress Test] Test Plan

Double instance linear growth verification
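Dual-instance linear growth verification can be read as a simple ratio: the throughput measured with two instances divided by twice the single-instance throughput. The function names, the 0.9 threshold, and the numbers below are assumptions for illustration only.

```python
def scaling_efficiency(single_tps, multi_tps, instances=2):
    """Measured multi-instance throughput relative to ideal linear scaling."""
    return multi_tps / (single_tps * instances)


def is_near_linear(single_tps, multi_tps, instances=2, threshold=0.9):
    """True if the efficiency reaches the chosen threshold."""
    return scaling_efficiency(single_tps, multi_tps, instances) >= threshold


# Illustrative numbers: 500 TPS on one instance, 940 TPS on two.
print(scaling_efficiency(500, 940))
print(is_near_linear(500, 940))
print(is_near_linear(500, 700))
```

An efficiency well below 1 suggests a shared bottleneck (database, cache, downstream service) rather than the application instances themselves.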

Full link stress test:

[Preparing for 618 Performance Test-Transportation ORC Order Receiving] Performance Test Report

System stability stress test:

After meeting the business requirements, continue applying load for a period of time (4-6) to verify system stability.
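One way to judge stability from such a run is to split the response times into time windows and check that TP99 does not drift upward. The sketch below uses the nearest-rank percentile; the 20% drift tolerance and the sample data are assumptions, not our team's standard.

```python
def tp99(latencies_ms):
    """99th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def is_stable(windows, max_drift=0.2):
    """True if no later window's TP99 exceeds the first window's TP99
    by more than max_drift (a fraction, e.g. 0.2 for 20%)."""
    baseline = tp99(windows[0])
    return all(tp99(w) <= baseline * (1 + max_drift) for w in windows[1:])


# Hypothetical latency samples (ms) for three consecutive windows.
steady = [list(range(1, 101)) for _ in range(3)]
degrading = [list(range(1, 101)), list(range(1, 101)), [2 * x for x in range(1, 101)]]

print(tp99(steady[0]))
print(is_stable(steady))
print(is_stable(degrading))
```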

3. Stress testing environment

1. Prioritize the online environment

2. Differences in throughput between the stress testing environment and the production environment. How to choose between single-instance and dual-instance stress testing?

3. Differences between the stress testing environment and the production environment

4. Differences in data volume between the stress testing environment and the production environment

Author: JD Logistics Zhu Fei

Source: JD Cloud Developer Community, Ziyuanqishuo Tech. Please indicate the source when reprinting.

 


Origin my.oschina.net/u/4090830/blog/10141960