Some personal views on the performance testing platform

Three and a half years ago, when I had just joined a new company, I was responsible for testing its performance testing platform.

Now, at my current company, I am also responsible for project management, product design, quality assurance, and delivery of a full-link stress testing platform.

I am writing this article to share some of my current thoughts and hands-on experience.

Organization

Every company's organizational structure is different: it may be split horizontally by business-line BU, or horizontally by the project teams that own different systems, with the test team serving as a functional department in a vertical matrix structure. With that in mind, let me walk through my ideas for a performance test management platform.

1. Task management

1. Task application

Generally speaking, performance testing requirements come from two sources:

①. Requirements raised by the project team

The project team actively submits performance testing requirements, which calls for a unified task management module that records the project line the system under test belongs to, the system name and configuration of the relevant environments, the development, operations, and DB contacts, the testing start time, the deadline, and other information (a small sketch of such a task record appears at the end of this subsection). These requests generally fall into three types:

New system release: when a new system is released and goes live, it needs a complete round of functional, performance, security, and other testing to evaluate whether it meets the go-live requirements of the business and product;

Iteration of an existing system: optimizations to an existing system, new features, or the introduction of new business channels may bring a higher traffic impact. The project manager or development manager raises performance requirements to verify whether the existing system still meets the go-live requirements;

Production incident fix verification: the system ran into performance problems in production and caused losses. After tuning or fixing, a full round of performance testing is needed to evaluate whether the actual business requirements are now met.

②. Requirements raised by the testing team

The testing team proactively flags potential performance bottlenecks introduced by project iterations or new requirements, and then, after evaluation, decides whether to run tests to assess the stability and availability of the system.
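To make the task application concrete, here is a minimal sketch of what such a task record might look like in code. The `PerfTestTask` class, its field names, and the example values are my own illustration rather than a prescribed schema; the fields simply mirror the information listed above (project line, system name, environment, contacts, testing window, and source of the requirement).

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class TaskSource(Enum):
    """Where the performance testing requirement came from."""
    PROJECT_TEAM = "project_team"   # new release, iteration, incident fix
    TESTING_TEAM = "testing_team"   # proactively raised by the test team


@dataclass
class PerfTestTask:
    """One performance test task application (illustrative fields only)."""
    project_line: str            # business/project line of the system under test
    system_name: str
    environment: str             # e.g. "UAT" or "PAT"
    dev_contact: str
    ops_contact: str
    db_name: str
    test_start: date
    deadline: date
    source: TaskSource
    status: str = "pending_approval"


# Example: a project team raises a requirement for a new system release
task = PerfTestTask(
    project_line="payments",
    system_name="order-service",
    environment="UAT",
    dev_contact="dev_lead",
    ops_contact="ops_oncall",
    db_name="orders_db",
    test_start=date(2023, 4, 10),
    deadline=date(2023, 4, 20),
    source=TaskSource.PROJECT_TEAM,
)
print(task.status)  # -> pending_approval
```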

2. Task approval

After a performance test task application is submitted, the project team, performance team, and any other relevant stakeholders need to evaluate it against the current situation, workload, and project timeline to decide whether to run the test, when to start, and how to allocate resources. This requires cooperation and participation across multiple teams and people, along with an estimate of the risk of not delivering on schedule.

3. Task scheduling

Based on the approval outcome, further work such as detailed scheduling, resource allocation, and staffing is arranged.

2. Use case management

A use case here refers to the abstract management of the business model built from the task type, resources, and other aspects of a performance test. Specifically, it can be divided into the following three business models:

1. Routine tasks

Routine tasks refer to the performance requirements raised by system iterations or new system releases, including specific information such as the project line, system name, relevant personnel, and business model. Based on this information, the scenarios of the system under test are modeled and analyzed, and specific test cases are then written.

2. Daily polling

This type can borrow the automated execution and conditional triggering of continuous integration: scheduled tasks are set up to run performance tests against the systems in scope, covering concurrency, multi-node, and other test types.
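As a rough illustration of the scheduled-task idea, the sketch below triggers a placeholder test run for a list of in-scope systems at a fixed interval, using only the standard library. In a real platform this would more likely be a CI job or a dedicated scheduler; `run_stress_test`, the system names, and the interval are stand-ins.

```python
import time
from datetime import datetime

SYSTEMS_IN_SCOPE = ["order-service", "inventory-service"]  # illustrative names
POLL_INTERVAL_SECONDS = 24 * 60 * 60  # once a day


def run_stress_test(system: str) -> None:
    """Placeholder for kicking off a concurrency/multi-node test run."""
    print(f"{datetime.now().isoformat()} starting scheduled test for {system}")
    # a real implementation would call the platform's execution API here


def daily_polling_loop() -> None:
    """Very small scheduler: trigger each in-scope system once per interval."""
    while True:
        for system in SYSTEMS_IN_SCOPE:
            run_stress_test(system)
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    daily_polling_loop()
```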

3. Full-link stress testing

Full-link stress testing is mainly carried out in the production environment, together with routine online performance inspection.

3. Environment management

The prerequisite for performance testing is a stable, available environment. Generally speaking, testing is carried out in one of the following two environments:

1. UAT

The UAT environment is what we commonly call the user acceptance testing environment. It is relatively stable, and its configuration either matches production or is scaled proportionally to it, which is enough for conventional performance testing.

2. PAT

The PAT environment can be understood as a dedicated performance test environment. It is kept consistent with production except that the number of application instances is scaled down to the smallest workable ratio. It is mainly used for stress testing daily iterations, maintaining performance baselines, and verifying problem fixes.

PS: Full-link stress testing is carried out in the real production environment, but performance testing in production carries high risk and requires many changes to existing systems, so whether to run it must be decided after a detailed evaluation.

4. Load generator management

1. Load generator scheduling

When the traffic to be simulated is high, a single load generator (stress test machine) may not be able to produce it, and multiple generators are needed for distributed load generation. In a real production environment, traffic tends to fluctuate unpredictably, so how to scale the number of load generators up and down and make reasonable use of resources is a problem that needs to be considered and solved.
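One simple way to reason about scaling is to derive the number of load generators from the target throughput and the measured capacity of a single machine. The sketch below is just that back-of-the-envelope calculation; the per-generator capacity and the headroom factor are assumptions you would replace with your own measurements.

```python
import math


def generators_needed(target_rps: float, rps_per_generator: float,
                      headroom: float = 0.2) -> int:
    """Estimate how many load generators a distributed run needs.

    target_rps:        the peak request rate the test must produce
    rps_per_generator: measured capacity of one machine (an assumption here)
    headroom:          spare capacity kept so generators are not saturated
    """
    usable = rps_per_generator * (1.0 - headroom)
    return max(1, math.ceil(target_rps / usable))


# Example: 50,000 req/s with generators measured at 8,000 req/s each
print(generators_needed(50_000, 8_000))  # -> 8
```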

2. State management

This mainly covers the state transitions of the load generators: idle, in use (ideally with a prediction of when the machine will be released for other stress test tasks), unavailable (damaged or otherwise out of service), and so on.
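A minimal sketch of that state model is shown below; the state names mirror the text (idle, in use with an expected release time, unavailable), and the `LoadGenerator` class itself is purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class GeneratorState(Enum):
    IDLE = "idle"
    IN_USE = "in_use"
    UNAVAILABLE = "unavailable"   # damaged or otherwise out of service


@dataclass
class LoadGenerator:
    host: str
    state: GeneratorState = GeneratorState.IDLE
    expected_release: Optional[datetime] = None  # when an in-use machine frees up

    def assign(self, release_at: datetime) -> None:
        """Mark the machine as used by a stress test until roughly release_at."""
        self.state = GeneratorState.IN_USE
        self.expected_release = release_at

    def release(self) -> None:
        """Return the machine to the idle pool."""
        self.state = GeneratorState.IDLE
        self.expected_release = None


gen = LoadGenerator(host="10.0.0.21")
gen.assign(datetime(2023, 4, 12, 18, 0))
print(gen.state, gen.expected_release)
```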

3. Exception management

While a performance test is running, if the service becomes unavailable for some reason, the load must be stopped in time. Generally speaking, the following methods are used:

Manual stop: stop the test execution from a button on the management interface;

Circuit-breaker measures: through monitoring and alerting, automatically stop the test when the system becomes unavailable or a metric exceeds a given threshold (a minimal sketch of this follows the list);

Last-resort means: when both the manual stop and the automatic circuit breaker are unavailable, use external means to stop the test execution; this also serves as a disaster recovery measure.
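Here is the promised sketch of the circuit-breaker idea, assuming the platform can poll an error-rate metric and call a stop API. `fetch_error_rate` and `stop_stress_test` are placeholders for whatever your monitoring and execution layers actually expose, and the 5% threshold is an arbitrary example.

```python
import time

ERROR_RATE_THRESHOLD = 0.05   # stop the test if more than 5% of requests fail
CHECK_INTERVAL_SECONDS = 10


def fetch_error_rate() -> float:
    """Placeholder: read the current error rate from monitoring/alerting."""
    raise NotImplementedError("wire this to your monitoring system")


def stop_stress_test(reason: str) -> None:
    """Placeholder: call the platform's API to halt all load generators."""
    print(f"stopping stress test: {reason}")


def circuit_breaker_loop() -> None:
    """Automatically stop the run when the system under test is in trouble."""
    while True:
        try:
            error_rate = fetch_error_rate()
        except Exception as exc:
            # if monitoring itself is unreachable, fail safe and stop the load
            stop_stress_test(f"monitoring unavailable: {exc}")
            break
        if error_rate > ERROR_RATE_THRESHOLD:
            stop_stress_test(f"error rate {error_rate:.1%} exceeded threshold")
            break
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Failing safe when monitoring itself is unreachable is a deliberate choice here: without data, continuing to apply load is the riskier option.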

5. Data management

Tests are driven by data, so what data do you need to prepare for performance testing?

1. Basic data

Basic data is the data required for the system's business to run normally, for example SKU and inventory data for an e-commerce platform, or bank user information and other business-specific data. It can usually be obtained by backing it up from production.

2. Pre-populated data

Pre-populated data mainly refers to data at the DB level: performance testing needs to simulate the actual environment, including the volume of data. At the database level, the same table under the same business model performs differently at different data volumes. This data can be prepared by backing it up from production, or by generating usable data with scripts and SQL statements.
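As an illustration of preparing data with scripts and SQL, the sketch below bulk-inserts synthetic rows into a table using SQLite. The table name, columns, and row count are invented for the example; a real setup would target your actual database and aim for production-like volumes.

```python
import random
import sqlite3
import string


def random_sku() -> str:
    """Generate a fake SKU code for illustration."""
    return "SKU-" + "".join(random.choices(string.ascii_uppercase + string.digits, k=8))


def seed_inventory(db_path: str, row_count: int) -> None:
    """Pre-populate an inventory table so tests run against realistic volumes."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory (sku TEXT PRIMARY KEY, stock INTEGER)"
    )
    rows = [(random_sku(), random.randint(0, 10_000)) for _ in range(row_count)]
    conn.executemany("INSERT OR IGNORE INTO inventory (sku, stock) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


seed_inventory("perf_seed.db", row_count=100_000)
```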

3. Test data

This is the data the test scripts need to run, which is usually handled through parameterization. From the platform's perspective, it can also be managed through a unified data pool, with the interface offering different options that call APIs to generate data ready for testing.
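A minimal sketch of the unified data pool idea: test scripts ask the pool for parameter values instead of hard-coding them. The pool here is just an in-memory iterator over prepared records; a real platform would back it with an API and a shared store, and the account records shown are made up.

```python
import itertools
from typing import Dict, Iterator, List


class DataPool:
    """Hands out parameterized test data, cycling when the pool is exhausted."""

    def __init__(self, records: List[Dict[str, str]]):
        self._cycle: Iterator[Dict[str, str]] = itertools.cycle(records)

    def next_record(self) -> Dict[str, str]:
        return next(self._cycle)


# Example: accounts prepared in advance (values are made up)
pool = DataPool([
    {"username": "perf_user_001", "token": "t-001"},
    {"username": "perf_user_002", "token": "t-002"},
])

for _ in range(3):
    record = pool.next_record()
    print(f"request would run as {record['username']}")
```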

6. Monitoring management

In performance testing, monitoring is an essential part. Generally speaking, the following aspects need to be monitored:

Network: network monitoring generally focuses on stability, the gateway, the CDN, the firewall, and so on.

Front end: the time spent on front-end display and resource rendering, and which resources consume the most time and resources, all need to be monitored to obtain relevant information.

Redis: some system architectures use Redis as a persistence layer or cache, so cache availability, cache expiry, cache penetration, and similar information must also be monitored.

MQ: message queues provide asynchronous communication (Kafka is a similar framework). Monitoring the production and consumption rates, resource usage, and possible backlog of the queues is also essential.

Server: server monitoring mainly covers memory, CPU, and IO, including utilization, thresholds, and alerts.

DB: the database is mainly monitored for memory and CPU, as well as SQL execution time and the number of connections.

PS: Taken together, the monitoring above works layer by layer from the user-facing layer down to the final service-processing layer, so that every conclusion can be backed by data.
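As a small illustration of the server-level part of this, the sketch below samples CPU and memory usage on a host and flags threshold breaches. It assumes the third-party psutil package is available, and the thresholds are arbitrary example values.

```python
import psutil  # third-party: pip install psutil

CPU_THRESHOLD = 80.0     # percent, example value
MEMORY_THRESHOLD = 85.0  # percent, example value


def sample_server_metrics() -> dict:
    """Collect one sample of CPU and memory utilization on this host."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
    }


def check_thresholds(sample: dict) -> list:
    """Return human-readable alerts for any metric over its threshold."""
    alerts = []
    if sample["cpu_percent"] > CPU_THRESHOLD:
        alerts.append(f"CPU at {sample['cpu_percent']}% exceeds {CPU_THRESHOLD}%")
    if sample["memory_percent"] > MEMORY_THRESHOLD:
        alerts.append(f"Memory at {sample['memory_percent']}% exceeds {MEMORY_THRESHOLD}%")
    return alerts


sample = sample_server_metrics()
print(sample, check_thresholds(sample))
```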

7. Log management

During testing, some hidden problems cannot be spotted directly in the test data, so the corresponding logs need to be checked; the problem nodes are located from the detailed logs and then analyzed in a targeted way.

From the perspective of the performance testing platform, logs should ideally be displayed and filtered in a more intuitive way. Otherwise we have to use commands or other tools to find and analyze logs at specific levels, which inevitably wastes time.
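As a trivial example of filtering logs at a specific level, the sketch below scans a log file for lines at or above a chosen level. The log path and the level keywords are assumptions about a fairly conventional text format; a real platform would index logs centrally instead.

```python
from typing import Iterator

# crude ordering of common log levels; adjust to match your logging format
LEVEL_ORDER = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def filter_log_lines(path: str, min_level: str = "ERROR") -> Iterator[str]:
    """Yield log lines whose level is at or above min_level."""
    threshold = LEVEL_ORDER[min_level]
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for level, rank in LEVEL_ORDER.items():
                if level in line and rank >= threshold:
                    yield line.rstrip()
                    break


# Example usage against a hypothetical application log
for line in filter_log_lines("/var/log/app/order-service.log", "WARN"):
    print(line)
```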

8. Report management

Report management mainly includes the following aspects:

1. Real-time results

The real-time monitoring results of a test run are stored in a database and then displayed on the interface through tools such as Grafana, making it easier to manage the results at a glance.
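A minimal sketch of that store-then-visualize flow: each sampled metric is written to a table that a dashboard tool such as Grafana could read. SQLite stands in here for whatever time-series or relational store the platform actually uses, and the metric columns are illustrative.

```python
import sqlite3
import time


def init_results_store(db_path: str) -> sqlite3.Connection:
    """Create a simple table for per-interval test metrics."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS test_metrics (
               run_id TEXT, ts INTEGER, rps REAL, avg_latency_ms REAL, error_rate REAL
           )"""
    )
    return conn


def record_sample(conn: sqlite3.Connection, run_id: str,
                  rps: float, avg_latency_ms: float, error_rate: float) -> None:
    """Append one real-time sample; the dashboard reads from this table."""
    conn.execute(
        "INSERT INTO test_metrics VALUES (?, ?, ?, ?, ?)",
        (run_id, int(time.time()), rps, avg_latency_ms, error_rate),
    )
    conn.commit()


conn = init_results_store("results.db")
record_sample(conn, run_id="run-20230412", rps=3200.0, avg_latency_ms=41.5, error_rate=0.004)
```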

2. Test report

At the end of each round of testing, the results need to be statistically analyzed to provide a reference and evaluation basis for the next round. The test report page can be displayed in a custom style.

3. Performance baseline

A performance baseline means treating the final result of each performance test as a reference point. In every subsequent iteration, the previous test result is used as the point of comparison, and the baseline is then continuously updated as the basis for the next evaluation.

It can be displayed in various ways, such as line charts or tree diagrams. The purpose is to monitor the system's long-term stability and availability, and to provide multiple dimensions of reference for tuning or refactoring.
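A small sketch of the baseline comparison described above: compare the latest run against the stored baseline, flag regressions beyond a tolerance, and, if everything is within tolerance, promote the current run as the new baseline. The metric names and the 10% tolerance are illustrative choices.

```python
from typing import Dict, List

Baseline = Dict[str, float]


def compare_to_baseline(current: Baseline, baseline: Baseline,
                        tolerance: float = 0.10) -> List[str]:
    """Report metrics that regressed by more than the tolerance.

    Convention used here: higher is worse for latency and error rate,
    lower is worse for throughput.
    """
    findings = []
    for metric in ("avg_latency_ms", "error_rate"):
        if current[metric] > baseline[metric] * (1 + tolerance):
            findings.append(f"{metric} regressed: {baseline[metric]} -> {current[metric]}")
    if current["rps"] < baseline["rps"] * (1 - tolerance):
        findings.append(f"rps regressed: {baseline['rps']} -> {current['rps']}")
    return findings


baseline = {"rps": 3000.0, "avg_latency_ms": 45.0, "error_rate": 0.005}
current = {"rps": 3100.0, "avg_latency_ms": 52.0, "error_rate": 0.004}

regressions = compare_to_baseline(current, baseline)
print(regressions or "within tolerance; promote current run as the new baseline")
```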

9. Mock management

Generally speaking, performance testing simulates concurrency by calling various service interfaces, but sometimes certain services are temporarily unavailable or their call volumes are limited. In such cases, mock services are urgently needed.

Mock services generally need to meet the following characteristics:

High performance;

Multi-protocol support (HTTP, RPC, WebSocket);

Custom behavior and personalized configuration, such as mandatory waiting (fixed delays), random responses, and so on.

From the platform's perspective, it is more convenient to manage mock rules through a visual interface for adding, deleting, modifying, and querying configurations.
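As an illustration only, here is a minimal configurable HTTP mock built on the standard library. It returns canned responses per path and supports the mandatory-waiting idea through a fixed delay; real mock services need far higher performance and multi-protocol support (RPC, WebSocket), so treat this purely as a sketch of the configuration surface, with made-up paths and payloads.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# illustrative mock configuration: path -> (delay seconds, response body)
MOCK_RULES = {
    "/api/credit-score": {"delay": 0.2, "body": {"score": 720, "source": "mock"}},
    "/api/sms/send": {"delay": 0.0, "body": {"status": "ok", "source": "mock"}},
}


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        rule = MOCK_RULES.get(self.path)
        if rule is None:
            self.send_error(404, "no mock rule configured for this path")
            return
        time.sleep(rule["delay"])                   # simulate downstream latency
        payload = json.dumps(rule["body"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the mock quiet while under load


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8081), MockHandler).serve_forever()
```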

10. System management

1. User management

This includes registration, addition, deletion, status management, and other functions for users of the platform.

2. Permission management

Different user roles are assigned different system access rights, and this module is designed around the principle of single sign-on.

3. Group management

Group management here can be understood as functional groups divided by identity and role, along with management functions such as creating groups, assigning rights and features, and changing status.
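As a rough sketch of how roles, permissions, and groups could tie together, the mappings below are purely illustrative; a real system would integrate with the company's SSO and directory services rather than hard-code any of this.

```python
from typing import Dict, Set

# illustrative role -> permission mapping
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"manage_users", "approve_tasks", "run_tests", "view_reports"},
    "performance_tester": {"run_tests", "view_reports"},
    "project_member": {"view_reports"},
}

# illustrative group -> roles mapping
GROUP_ROLES: Dict[str, Set[str]] = {
    "perf-team": {"performance_tester"},
    "platform-admins": {"admin"},
}


def permissions_for(groups: Set[str]) -> Set[str]:
    """Resolve a user's effective permissions from group membership."""
    perms: Set[str] = set()
    for group in groups:
        for role in GROUP_ROLES.get(group, set()):
            perms |= ROLE_PERMISSIONS.get(role, set())
    return perms


print(permissions_for({"perf-team"}))  # -> {'run_tests', 'view_reports'}
```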

The above brings together some of the ideas and practices behind the full-link stress testing platform I am currently working on. I will cover the specific implementation details, such as the mock platform, stress test cluster scheduling, data creation tools, monitoring integration, stress test log sampling, and data reports, in a series of articles on platform design.

Finally, I would like to thank everyone who has read this article carefully.

