JD Logistics Normalized Stress Testing Practice | JD Cloud Technical Team

Author: Wang Jiangbo, JD Logistics

1. Purpose of normalized stress testing

Why perform normalized stress testing?

The main problem we face today is that performance issues are discovered too late, which brings uncontrollable risk to big promotions. Daily requirements iterate frequently, system configurations change, upstream and downstream dependencies change, and server resources get replaced; all of these factors can affect system performance. It is impractical to stress test every new project or requirement before and after it goes live, so many performance problems are not exposed until the stress tests run shortly before a big promotion.

Preparation time for big-promotion stress testing is short, the task list is long, and the pressure on the team is high. In the 11.11 review, the work-hour statistics of some departments showed that stress testing accounted for a large share of the workload. Compared with other kinds of problems, performance problems are harder to optimize and take longer to fix. When multiple big-promotion work streams are already competing for scarce resources, frequent last-minute system tuning adds uncontrollable risk to the whole promotion.

For these reasons we introduced normalized stress testing: regular weekly or monthly stress tests keep system performance continuously under control and ensure service stability; performance problems introduced by online requirements are exposed early, so they can be located and optimized in time; and the burden of big-promotion preparation drops while stress-testing efficiency improves.

2. Implementation process of normalized stress testing

2.1 Normalized stress testing

Normalized stress testing is automated stress testing that runs on a fixed schedule or is triggered by specific conditions. By periodically stress testing a single container or a cluster, we can monitor changes in performance indicators and detect the risk of service performance degradation in time.

2.2 Implementation strategy

We rolled normalized stress testing out across the platform technology tribe in three steps, going from shallow to deep:

Step 1, single-machine pilot: for the first use of normalized stress testing, we isolated a single-machine environment to work out the testing approach, the execution process, the capabilities the stress-testing platform needs to provide, and the risk points to watch for;

Step 2, cluster pilot: select star-rated core services of the contract-fulfillment and basic platforms and run normalized stress-testing tasks against small clusters (2-3 instances) in the online environment, evaluating the feasibility of normalized stress testing on online clusters from several angles: impact on online business, impact on upstream and downstream dependencies, stress-testing platform capability support, and control of online stress-testing risks;

Step 3, full rollout: based on the practice of running normalized stress tests on the online clusters of the contract-fulfillment and basic platforms, promote the approach to the whole platform technology department; combined with the Kit stress-testing tool, build a core-service performance data dashboard, collect and summarize the performance reports of stress-test results, and make service performance trends visible; open a stress-testing "green channel" so that services meeting the normalized stress-testing standards can pass through it.

2.3 Implementation process

Interfaces covered by normalized stress testing: prioritize the core interfaces that cover the main business processes and the star-certified interfaces of the core services in ops-review.

Criteria for selecting the stress-test template task:

1) Set up the template task according to the production peak traffic model combined with server resource usage;

2) Sort out the interface calls along the dependency chain, and set up the template from the call-volume perspective according to the upper limit the weakest downstream dependency can bear, combined with the performance of the interface itself;

3) For services without downstream dependencies, set up the template from the throughput or CPU perspective according to the system's own best processing capability.

Stress-testing frequency: daily stress testing is recommended for long, complex interfaces; for self-contained (closed-loop) systems, running the test at the release frequency is recommended.

Stress-testing window: run the normalized stress-testing tasks at an execution time inside a low-traffic period confirmed with the product and R&D teams.

Stress-testing environment: normalized stress tests run against a single machine or a small cluster in the production environment.

Stress-test data: it is recommended to use real online traffic recorded by R2 as the input of the normalized stress test, to ensure the validity of the results.

Stress-test results: an on-duty engineer follows up on every round of results, continuously tracks non-compliant interfaces and the corresponding Xingyun bugs, and works with R&D on performance analysis, troubleshooting, and maintenance of the stress-testing tasks.

Stress-testing tools: Forcebot (normalized stress testing) and R2.

3. Normalization plan

• In 2023 Q1, pilot normalized stress testing on the contract-fulfillment and basic platforms, form best practices, and share them to enable other teams.

• In 2023 Q2, extend it to the delivery and transaction lines of the platform technology department. Before the 618 promotion, the department will complete 125 normalized stress tests based on JDOS 3.0 for core level-0 read services.

4. High-fidelity stress testing based on traffic recording

The Double 11 promotion has just ended. A very important part of preparing for it is stress testing the major core service systems to ensure they stay stable during the promotion, and to provide data support for capacity expansion based on the results. So how do we run a high-fidelity stress test whose results come close to real online behavior? Within the whole process, preparing the test data is a crucial step that largely determines whether the results are trustworthy.

As the business keeps growing, user traffic and business scenarios become more complex, service call relationships and modules multiply, and data becomes harder to construct. A simple hand-built data set can no longer simulate real online business traffic, and an unrealistic traffic mix easily distorts the stress-test results.

Today most large companies use traffic recording for module-level or full-link stress testing: the recorded traffic is stored first, then edited and filtered, and finally replayed against the service under test through the stress-testing engine. Using the Forcebot stress-testing platform, this chapter describes in detail how to record online traffic with the R2 platform for high-fidelity stress testing.

4.1 Traffic-recording stress testing

The basic architecture for stress testing with online traffic recorded by the R2 platform is as follows:

1. Users access the online services, generating real user traffic;

2. The tester creates a recording task on the R2 management console of the Taishan platform. When the task starts, the operation command is sent to DUCC, which forwards the recording command to the online servers (the online services have PFinder enabled and are connected to the R2 platform), and online traffic recording begins;

3. The recorded traffic is reported to the R2 tool and persisted;

4. After recording completes, a stress-test script can be created on the Forcebot platform. Forcebot is integrated with R2: it requests the playback traffic address from the R2 server and loads the recorded traffic;

5. Once Forcebot has the traffic, it drives the load generators to send requests to the service under test and runs the stress-test task.

4.2 Recording the stress-test traffic

Based on the system architecture and the stress-test scenario analysis, select the interfaces and scenarios whose traffic needs to be recorded.

• If the stress test only covers a single interface, recording the traffic of that interface is enough;

• Some applications have several core interfaces and need mixed-scenario stress testing; in that case the traffic of multiple interfaces must be recorded at the same time;

• You can also configure the recording task to record only traffic whose request or response matches a specific business scenario.

① Create a traffic-recording task: select the entry application and set the task name and the recording file size. Note: when recording stress-test traffic it is generally recommended to record traffic from all scenarios so that real production traffic is reproduced as faithfully as possible; for a single task, it is recommended to keep the recorded file under 2 GB.

Traffic-recording policies include manual, scheduled, and periodic recording. For normalized stress testing, to keep the recorded traffic from becoming stale and drifting away from current production traffic, create a periodic recording task on the R2 platform that captures production traffic daily or weekly, so the stress-test data stays fresh.

② Select the entry service to record. Several interfaces can be recorded at the same time, and the platform displays the interface call chain, so recording can also be started for the services or middleware along that chain. Then select the instances to record from; once the task is configured, recording can start.

After recording completes, you can create a stress-test script on the Forcebot platform.

4.3 Create a stress-test script

4.3.1 Single-interface stress-test script

In script management, create a JSF playback script, edit the recording configuration, and select the application to be tested and the corresponding R2 recording task. Forcebot can search the JD private artifact repository for the JSF interface jar (or you can upload it manually); the platform then automatically parses the classes and methods in the jar and calls jsfOpenApi to obtain the interface alias and the direct-connection ipPort. With this information the environment for the JSF interface can be set up quickly. After you select the interface to test, the JSF alias, and the method, the stress-test script is generated automatically; by default the generated script is bound to the recorded requests of the selected R2 recording task, so the stress test can be run directly.

As shown in the figure below, you can then verify the script against the intranet environment, i.e., check that it can fetch the traffic and issue real requests to the target interface; this is a required step before stress testing. Once verification passes, save the script; the script and its lib files are generated automatically. For a single-interface scenario, you can create the stress-test task directly from this script.

Note that a script generated this way cannot be edited; to edit it you need to create a custom script. You may also have noticed that this page only lets you choose one method of one interface. What if you want to run a mixed stress test across different methods of the same interface, or across different interfaces? Don't worry, the answer is coming next.

4.3.2 Multi-interface mixed stress-test script

In production, our applications usually expose several interfaces, or several methods on the same interface. If we only stress test a single interface, the results only reflect how the system performs under single-scenario transactions. In reality, especially during big promotions, the system has to handle requests from multiple interfaces at the same time, and system resources are shared among them, so a mixed-scenario stress test reflects the system's real processing capability much better.

Before running a mixed stress test, first work out the ratio of the call volumes of the interface scenarios over the same time period; when creating the stress-test script, the share of requests sent to each scenario must be set according to this ratio.

Step 1: Generate a standard JSF playback script

Before customizing the script, generate a standard JSF playback script as described in 4.3.1; the dependent lib files are generated automatically.

Step 2: Generate custom scripts

The default script generated in Step 1 cannot be edited. While viewing its code you can generate a custom script, and the custom script can then be edited.

① First define the interface paths and their methods, matched to the aliases of the different interfaces, and then load traffic separately for each interface;

Here, ipList specifies the IP and port of the server to be stressed. If the interface alias resolves to a cluster and you only want to stress one of its machines, the IP and port must be specified;

② Create a playback transaction for each interface. The interface path, the traffic loaded for the interface, and the interface alias must correspond one-to-one. rate is the call-volume ratio of the interfaces covered by the script; for example, if interface 1 : interface 2 : interface 3 = 7 : 8 : 5 (based on the call-volume ratio of each interface during a big promotion or at the daily call peak), the corresponding rates must be set in testCase (see the sketch after step ⑤);

③ Because the interfaces have different paths, traffic sources, and aliases, the default no-argument doReplay method must be changed into one that takes parameters;

④ After modifying the script, click Save;

⑤ A mixed stress-test script for different methods of the same interface is created in the same way; the only difference is that the alias is the same for every method, so no additional aliases need to be specified.
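
For readers who have not yet seen the generated script, below is a minimal conceptual sketch of the structure that steps ① to ⑤ describe. It is not the script Forcebot generates: the interface paths, aliases and IP:port are hypothetical placeholders, the rate values correspond to the 7:8:5 mix mentioned above, and the platform-internal calls (traffic loading, the TestUtils transaction calls, the actual JSF invocation) are only indicated as comments because their exact signatures are internal to the platform.

```groovy
// Conceptual sketch of a multi-interface mixed playback script (hypothetical names only).
class MixedReplaySketch {

    // ① interface path, method, alias and mix rate (7:8:5 call-volume ratio)
    static final List<Map> testCase = [
        [path: 'com.example.UserService',  method: 'findUserInfo',          alias: 'user-alias',  rate: 7],
        [path: 'com.example.UserService',  method: 'findUserInfoByOrgCode', alias: 'user-alias',  rate: 8],
        [path: 'com.example.OrderService', method: 'queryOrder',            alias: 'order-alias', rate: 5],
    ]

    // ipList: only needed when the alias is a cluster but a single instance should take the load
    static final List<String> ipList = ['10.0.0.1:22000']

    // ③ the default no-argument doReplay() is replaced by a parameterized version, so each
    //    playback transaction passes in its own path, traffic source and alias
    static void doReplay(Map iface) {
        // TestUtils.transactionBegin(iface.method)   // ② custom transaction name for the baseline
        // ... load the recorded traffic for this interface and issue the JSF call here ...
        println "replaying ${iface.path}#${iface.method} via ${iface.alias}"
        // TestUtils.transactionEnd(iface.method)     // assumed counterpart of transactionBegin
    }

    // ② the rate values drive the traffic mix, so replayed requests follow the 7:8:5 ratio
    static void main(String[] args) {
        List<Map> weighted = testCase.collectMany { iface -> [iface] * (iface.rate as int) }
        Random rnd = new Random()
        20.times { doReplay(weighted[rnd.nextInt(weighted.size())]) }
    }
}
```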

Step 3: Import the attached jtm.properties

After editing the custom script in Step 2, verification still fails because the script lacks the attachment needed for traffic recording and playback. After saving the script, go back to the parent directory, download the jtm.properties attachment of the standard Groovy script generated in Step 1, upload it as the attachment of our custom script, and then modify it.
Add jtm.replay.recent.record.num=1 at the end of the attachment; this tells the script to fetch the latest traffic recorded by the bound periodic recording task every time the stress test runs.
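
For reference, the tail of the attachment after this edit would look roughly like the snippet below; only the last line is added, and the keys already generated by Forcebot are kept as they are.

```properties
# ...recording/playback keys generated with the standard script are kept unchanged...

# Added: always replay the latest recording bound to the periodic recording task
jtm.replay.recent.record.num=1
```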

4.4 High-fidelity stress testing in the Double 11 promotion

With the convenience the R2 traffic-recording platform provides, getting hold of online traffic is no longer difficult; it helps us prepare stress-test data quickly while reproducing the real business scenarios in the test traffic with high fidelity.

In this Double 11 promotion, the logistics-promise business line fully adopted R2 traffic recording for its big-promotion stress tests. The results were much closer to the online behavior of the interfaces, with fidelity above 90%, providing more accurate data support for capacity-expansion assessment. The high-fidelity tests also uncovered several system performance problems, including reduced availability in extreme business scenarios.

The figure below compares the performance observed in the R2 traffic-recording stress test, the military-exercise (drill) stress test, and the actual Double 11 promotion.

5. USF normalized stress-testing practice

Building on Forcebot's normalized stress-testing capability, USF selected 3-star core services for this practice: the TOP 4 core interfaces were chosen, R2 was used to record online traffic, and mixed-scenario normalized stress tests were run according to the big-promotion traffic model to continuously monitor the performance of USF's core interfaces.

Forcebot's normalized stress testing can reuse existing stress-test tasks (including tasks driven by recorded traffic), lets you configure performance baselines for indicators such as response time (TP99), TPS, and server CPU and other resource metrics, and judges against those baselines whether a run is up to standard. For runs that fail, it can automatically create Xingyun defects so the performance problem is tracked and handled. It also provides monitoring comparisons and a history of stress-test results to help analyze performance and locate problems, and it automatically sends result emails so performance results are shared in time.

Currently, Forcebot's normalized stress testing supports the following functions:

• 1. Reuse of stress-test tasks: historical tasks can be used directly, with no need to create separate tasks and scripts; JSF, HTTP, custom, JIMDB, JMQ, and playback scripts are supported.

• 2. Configurable scheduled execution, with flexible execution times.

• 3. Traffic recording support.

• 4. Automatic creation of Xingyun defects.

• 5. Configurable pass/fail baselines ("Effective": whether the indicator is counted in the compliance-rate statistics; if checked it is counted, otherwise it is ignored when the compliance rate is calculated. "Standard": the run is up to standard only when all indicators marked as effective meet their values at the same time; if any indicator misses its value, the run fails).

The following describes the normalized stress testing of USF.

5.1 Stress-test material preparation

Stress-test data:

• During the 14:00-16:00 business peak, record 10% of the online traffic of the 6 machines of the [public cluster], producing about 1 GB of recordings. (Multiple clusters will be considered later.)

• The recorded interface services are the TOP 4 interfaces of the USF 3.0 line. They have completed star-rating governance, reached the 3-star level, and have availability-rate and TP99 governance in place, with degradation and rate-limiting plans.

Stress-test scenario: mixed-scenario design (traffic model)

Application deployment topology:

Stress-testing environment:

The stress-testing environment is currently a single-instance UAT environment with the same configuration as production.

• Its database and cache are consistent with the existing online ones, and online data has been synchronized to it.

• The database and cache-service configurations of the stress-testing environment are consistent with production.

Online configuration (60 instances):

• Application server: 4C8G

• Database: 16C64G

Stress-test environment configuration:

• Application server: 4C8G

• Database: 16C64G

5.2 Stress-test risk assessment

• Stress-testing environment selection:

• 1) First run the normalized stress test in the identically configured UAT environment, and keep adjusting the performance baseline according to the performance results;

• 2) Once it is stable, the applications and middleware in the production environment can be reused for normalized stress testing.

• Task execution window:

• Choose low-traffic business hours for the stress test, based on UMP monitoring. USF traffic generally peaks during 6:00-9:00 and 9:00-11:00 in the morning and 14:00-17:00 in the afternoon, so the current execution window is 17:40 on weekdays. Someone is on duty to handle alarm messages promptly and to watch the monitoring of the application and the database during the run.

• Stress-test link synchronization:

• Sort out the upstream and downstream links involved, determine the scope, load level, and time of the stress test, and sync with the relevant parties to reach a consensus.

5.3 Create a normalized stress-test task

5.3.1 Selection criteria for the stress-test template task

• Reuse a historical stress-test task (the template task) to create the normalized stress-test task directly. When choosing the historical scenario, pick one that fits the system's actual situation: generally a performance-inflection-point scenario, or a scenario where the load reaches the expected target (for example CPU at 60%, or TPS hitting the goal). It is generally not recommended to use a scenario that drives system resources to saturation.

• Example: for USF we chose the historical stress-test task in which the interfaces reach the Double 11 target throughput (TPS); in that scenario the application-server CPU is at 27% and the database CPU at 36%.

Template task selection:

When viewing the task you can see the script executed in this scenario and the load-related settings: the number of concurrent threads, the execution mode (concurrency mode or RPS mode), and the execution duration, all of which can be adjusted as needed.

5.3.2 Scheduling settings for the stress-test task

• The execution schedule can be specified as a recurrence cycle or as a Cron expression. USF uses the Quartz-style Cron expression 0 40 17 * * ?, i.e., run at 17:40 every day (the six fields are seconds, minutes, hours, day of month, month, and day of week; ? means no specific day of week). The target number of threads and the execution duration are also set here, and they override those in the underlying stress-test task.

• The execution frequency and execution time of the normalized stress test are customized based on the code release cycle and the low-traffic period of business calls.

1) Execution mode - RPS mode

The bound stress-test task runs in RPS mode, so the normalized stress-test task we create also uses RPS mode. Note that the target QPS is not the sum of the QPS of all interfaces in the script, but the target value of the interface with the largest share in the script; if this is configured wrongly, excessive load will be generated.
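
As a small illustration of that note (the exact scaling behavior is internal to the platform, so treat the proportional part as an assumption rather than documented Forcebot semantics): with a 7:8:5 mix, the configured target applies to the largest-share interface, and the other interfaces end up proportionally below it, so the combined load is well above the configured number.

```groovy
// Assumed interpretation, for illustration only: the configured RPS target applies to the
// interface with the largest rate (8), and the other interfaces follow the 7:8:5 mix.
def rates = [iface1: 7, iface2: 8, iface3: 5]
def targetRps = 800                      // target configured for the largest-share interface
def maxRate = rates.values().max()
rates.each { name, r ->
    println "${name} ~ ${targetRps * r / maxRate} RPS"   // iface1 ~700, iface2 800, iface3 ~500
}
// Combined load is roughly 2000 RPS, not 800; entering the sum (2000) as the target
// would therefore generate far more pressure than intended.
```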

2) Execution mode - concurrency mode

The bound stress-test task runs in concurrency mode, so the normalized stress-test task we create uses concurrency mode as well; the target number of threads is set here.

5.3.3 Stress-test baseline settings

• Set a reasonable baseline per transaction name (interface method) according to the scenario of the associated stress-test task. If the associated task uses a mixed script, you can set a performance baseline for each interface transaction one by one (default transaction name: forcebot.<test method name>). The indicators of general interest are average TPS, TP99, error count, and CPU. Give the allowable fluctuation range some room according to the actual behavior of the interface. If a result exceeds the configured fluctuation range and the option to submit a Xingyun bug is selected, a Xingyun bug is filed automatically, which makes it easy to track the problem to closure.

• Notes on baseline values: if a baseline value is very small, the allowable fluctuation percentage needs to be set relatively large, otherwise a tiny fluctuation will be judged as a failed run. The fluctuation range should be analyzed per interface and agreed between R&D and testing.

1) Custom performance baseline settings

• Example settings for USF's findUserInfo service:

• TPS baseline = 2700, allowable fluctuation 10% (2430-2970, in both directions).

• TP99 baseline = 12 ms, allowable fluctuation 50% (12-18 ms; time-related indicators only fluctuate upward).

• Error-count baseline = 0, allowable fluctuation 0.

• CPU baseline = 25%, allowable fluctuation 20% (20%-30%, in both directions).

Transaction name: it cannot currently be recognized automatically. The transaction name written in the script defaults to forcebot.<test method name>; you can also use a custom transaction name, e.g. TestUtils.transactionBegin("findUserInfo") makes the transaction name findUserInfo.

Example of setting a performance baseline: suppose the interface's TP99 is around 12 ms and the baseline is set to 12 ms. If the allowable fluctuation is set to 10%, the upper bound is 12 ms x 110% = 13.2 ms, so a small jitter above 13.2 ms would already be judged a failed run, which is clearly unreasonable. In such cases the allowable fluctuation percentage should be set according to the maximum TP99 we can accept for the interface.
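
A minimal sketch of the pass/fail arithmetic described above follows; this is just the calculation, not Forcebot's actual implementation, and it treats time-related indicators as bounded only on the upper side, the way the TP99 example above does.

```groovy
// Pass when the measured value stays within baseline +/- baseline * fluctuation;
// for time-related indicators such as TP99 only the upper bound matters.
boolean withinBaseline(double actual, double baseline, double fluctuation, boolean upperOnly = false) {
    double upper = baseline * (1 + fluctuation)
    double lower = upperOnly ? 0 : baseline * (1 - fluctuation)
    actual >= lower && actual <= upper
}

assert withinBaseline(2500, 2700, 0.10)        // TPS 2500 inside 2430-2970: pass
assert withinBaseline(13.0, 12, 0.10, true)    // TP99 13.0 ms inside 12-13.2 ms: pass
assert !withinBaseline(14.0, 12, 0.10, true)   // TP99 14 ms above 13.2 ms: fail, so widen the percentage
```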

2) Multi-interface performance baseline setting

In the custom baseline settings, multiple interface transactions can be added; the transaction here is the transaction name used in the script.

Default transaction name: forcebot.<test method name>

Custom transaction name: e.g. TestUtils.transactionBegin("findUserInfoByOrgCode");

5.3.4 Xingyun Defect Tracking

If any indicator fails to meet its performance baseline, the normalized stress-test run is judged as not up to standard. If the task is configured to create Xingyun defects automatically, a Xingyun defect is filed for every non-compliant run, which ensures every stage of the bug's life cycle can be traced and problems are handled in time.

5.3.5 Monitoring and problem location

You can view the performance trend of the service over a period of time; if the interface performance fluctuates sharply, the cause of the degradation needs further investigation.

1) Monitoring data-TPS

2) Monitoring data-TP99

3) Comparison (PK) of execution records

The execution records show the script version, whether the run met the baseline, and the bug details. You can select a compliant run and a non-compliant run and compare (PK) them; the comparison covers TPS, TP99, errors per second, and other indicators.

Comparing a compliant run and a non-compliant run of the USF interfaces gives the result below: an erroneous call was found on 12-04, and the cause of the error was traced further.

5.3.6 Sending stress test results by email

Set the recipients' email addresses and CC the R&D and testing staff. The stress-test result email provides a summary of the test data. If any indicator in the results misses its baseline (exceeds the configured value plus fluctuation range), the run is considered not up to standard; in that case, work with R&D to locate the problem using the monitoring data and logs of the execution window, or adjust the baseline indicators.

The email looks like this:


Source: blog.csdn.net/jdcdev_/article/details/130488275