Comprehensive analysis and performance testing - Design and build a full-link stress testing platform to get through...


Preface

1. What exactly is stress testing?

Stress testing refers to testing under high concurrency and large traffic. Testers can observe the performance of the system under peak load to find performance risks in the system.

Like monitoring, stress testing is a common way to discover problems in the system, and it is also an important means to ensure system availability and stability.

During stress testing, we cannot test only a single core module. The scope must include the access layer, all back-end services, databases, caches, message queues, middleware, and any third-party services the system depends on, along with their resources.

Once user traffic increases, the entire link made up of these components is hit by large and unpredictable traffic, so every one of them relies on stress testing to expose possible performance bottlenecks. A stress test performed on the entire call link is called "full-link stress testing."

2. How to build a full-link stress testing platform

There are two key points in building a full-link stress testing platform.

The first is traffic isolation. Because the stress test runs in the production environment, stress test traffic must be distinguishable from production traffic so that it can be handled separately.

The second is risk control: the stress test should affect real users as little as possible.

Therefore, generally speaking, a full-link stress testing platform needs to include the following modules:

1) Traffic construction and generation module;
2) Stress test data isolation module;
3) System health check and stress test traffic intervention module.

3. Generation of stress test data

Generally speaking, the ingress traffic of our system consists of HTTP requests from clients. We therefore copy this ingress traffic during the system's peak period, clean it (for example, by filtering out invalid requests), and store it in a NoSQL store such as HBase or MongoDB, or in a cloud storage service such as Amazon S3. We call this store the traffic data factory.

In this way, when we want to perform stress testing, we can obtain data from this factory, divide the data into multiple parts, and then send it to multiple stress testing nodes.
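The splitting step can be sketched as follows. This is a minimal illustration, assuming the recorded requests have already been loaded into a list; the function name `split_for_nodes` and the round-robin strategy are assumptions made for the example, not something the article prescribes.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_for_nodes(records: Sequence[T], node_count: int) -> List[List[T]]:
    """Distribute records round-robin so each node gets a near-equal share."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    shards: List[List[T]] = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        shards[index % node_count].append(record)
    return shards


# Hypothetical recorded requests standing in for data read from the factory.
recorded = [f"GET /item/{i}" for i in range(10)]
shards = split_for_nodes(recorded, 3)
for node_id, shard in enumerate(shards):
    print(f"node {node_id}: {len(shard)} requests")
```

Round-robin keeps the shards balanced and spreads the original ordering across nodes; a real platform might instead shard by user or session so that related requests stay on one node.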

Points that need special attention:

First, traffic can be copied in several ways. The simplest is to copy the access log of the load balancer directly and write it to the traffic data factory as text. However, to start a stress test from data produced this way, you have to write your own script to parse the access log, which raises the cost of the stress test, so this approach is not recommended.

The other way is to copy traffic with an open source tool. Here I recommend GoReplay, a lightweight traffic copy tool that can capture the traffic on a given port of the local machine, record it to a file, and send it to the traffic data factory.

During stress testing, the same tool can replay the traffic at an accelerated rate, which is how the stress test is applied to the production environment.

Second, the nodes that deliver stress test traffic should be as close to real users as possible, or at least not in the same data center as the service nodes, so that the stress test results are realistic.

In addition, the stress test traffic needs to be "dyed", that is, marked as stress test traffic. In actual projects, I add a tag to the HTTP request header, for example one named "is stress test". After copying the traffic, I add this tag to the requests in batches and then write them to the traffic data factory.
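A minimal sketch of this tagging step is shown below. It assumes each recorded request is represented as a dictionary with a `headers` field, and it uses `X-Stress-Test` as a purely hypothetical header name; the article does not specify the exact name or the storage format.

```python
from typing import Any, Dict, List

# Hypothetical header name chosen for this example only.
STRESS_HEADER = "X-Stress-Test"


def mark_as_stress_test(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with the stress test tag added."""
    headers = dict(request.get("headers", {}))
    headers[STRESS_HEADER] = "1"
    return {**request, "headers": headers}


def mark_batch(requests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Tag every copied request before it is written to the data factory."""
    return [mark_as_stress_test(request) for request in requests]


copied = [
    {"method": "GET", "path": "/item/1", "headers": {"Accept": "text/html"}},
    {"method": "GET", "path": "/item/2"},
]
tagged = mark_batch(copied)
print(tagged[0]["headers"])
```

The functions return copies rather than mutating the input, so the originally captured traffic stays untouched if it needs to be reprocessed.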

4. How to isolate data

While copying the stress test traffic, we also need to modify the system so that stress test traffic is isolated from production traffic and the stress test does not affect the online system. Generally speaking, we need to do two things.

On the one hand, for requests that read data (generally called downstream traffic), we mock or specially handle the services and components that cannot be stress tested. Two examples follow.

In business development, we usually record user behavior based on requests. For example, when a user requests a product page, we record one more view of that product. This behavioral data is written to a separate big data log and then passed to the data analysis department, which turns it into business reports that product managers and executives use for analysis and decisions.

A stress test inevitably inflates this behavioral data. For example, the product page may be viewed 100 million times on a normal day but one billion times after a stress test, which distorts the business reports and the product direction decisions based on them. We therefore handle the user behavior generated by stress tests specially and do not record it in the big data log.

As another example, our system relies on recommendation services to suggest products that a user may be interested in. One characteristic of these services is that products that have already been shown are not recommended again.

If stress test traffic passes through these recommendation services, a large number of products are marked as shown by the stress test requests, and online users no longer see those products, which degrades the recommendations.

Therefore, we need to mock these recommendation services: requests without the stress test mark go to the real recommendation services, and requests with the mark go to the mock services.

One thing to note when building mock services: they are best deployed in the same data center as the real services. This mirrors the real deployment structure as closely as possible and makes the stress test results more realistic.
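The two read-path examples above can be sketched as follows. This is an illustration under stated assumptions: `X-Stress-Test` is a hypothetical header name, and `real_recommend` and `mock_recommend` are placeholders standing in for calls to the real and mock services.

```python
from typing import Dict, List

# Hypothetical header name; the article does not fix one.
STRESS_HEADER = "X-Stress-Test"


def is_stress_request(headers: Dict[str, str]) -> bool:
    """True when the request carries the stress test mark."""
    return headers.get(STRESS_HEADER) == "1"


def real_recommend(user_id: int) -> List[str]:
    # Placeholder for a call to the real recommendation service.
    return [f"real-item-for-{user_id}"]


def mock_recommend(user_id: int) -> List[str]:
    # Placeholder for the mock service: fixed data, no side effects.
    return ["mock-item-1", "mock-item-2"]


def recommend(user_id: int, headers: Dict[str, str]) -> List[str]:
    """Route marked requests to the mock, everything else to the real service."""
    if is_stress_request(headers):
        return mock_recommend(user_id)
    return real_recommend(user_id)


def record_page_view(product_id: str, headers: Dict[str, str],
                     behavior_log: List[str]) -> None:
    """Skip behavior logging for stress test requests."""
    if is_stress_request(headers):
        return
    behavior_log.append(product_id)


behavior_log: List[str] = []
record_page_view("p1", {}, behavior_log)
record_page_view("p1", {STRESS_HEADER: "1"}, behavior_log)
print(behavior_log, recommend(7, {STRESS_HEADER: "1"}))
```

Note that the plain dictionary lookup here is case-sensitive, while HTTP header names are not; a real web framework would normally normalize header names before this check.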

On the other hand, for requests that write data (generally called upstream traffic), we write the data generated by stress test traffic into a shadow database, a storage system that is completely isolated from the online data. Different storage types call for different ways of building the shadow database.

1) If the data is stored in MySQL, we can create a separate schema in the same MySQL instance with the same table structure as the online one, and import the online data into it.

2) If the data is stored in Redis, we add a unified prefix to the data generated by the stress test traffic and store it in the same storage.

3) Some data is stored in Elasticsearch. This data can be placed in a separate index.

By specially handling the downstream traffic and adding a shadow database for the upstream traffic, we isolate the stress test traffic.
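The three storage rules in this list can be sketched as simple naming functions. The `_shadow` suffix and `shadow:` prefix are assumptions chosen for the example; the article only states that a different schema, a unified key prefix, and a separate index are used.

```python
# Hypothetical naming conventions for illustration only.
SHADOW_SUFFIX = "_shadow"
SHADOW_KEY_PREFIX = "shadow:"


def mysql_table(schema: str, table: str, stress: bool) -> str:
    """Same instance and table name, but a different schema for stress data."""
    target_schema = schema + SHADOW_SUFFIX if stress else schema
    return f"{target_schema}.{table}"


def redis_key(key: str, stress: bool) -> str:
    """Same Redis storage, with a unified prefix on stress test keys."""
    return SHADOW_KEY_PREFIX + key if stress else key


def es_index(index: str, stress: bool) -> str:
    """Stress test documents go to a separate Elasticsearch index."""
    return index + SHADOW_SUFFIX if stress else index


print(mysql_table("shop", "orders", stress=True))
print(redis_key("cart:42", stress=True))
print(es_index("products", stress=True))
```

Keeping the rules in one place like this means the write path only needs to know whether the current request carries the stress test mark.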

5. How to implement stress testing

After copying the online traffic and completing the changes to the online system, we can run the stress test. Before doing so, we usually set a goal, for example, that the overall system must sustain 200,000 QPS.

During the stress test, however, the request volume is not raised to 200,000 QPS all at once. Instead, traffic is increased gradually in fixed steps (for example, by 10,000 QPS each time). After each increase, the system is left to run stably for a period of time while its performance is observed.

If a bottleneck appears in a dependent service or component, first reduce the stress test traffic, for example by rolling back to the QPS of the previous step, to keep the service stable. Then expand the capacity of that service or component and continue increasing the stress test traffic.
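The step-wise ramp-up with rollback can be sketched as a small loop. This is a simplified illustration: `is_healthy` is a hypothetical callback that stands for "apply this QPS, hold it for a while, and report whether the system stayed stable", and the loop stops at the first unhealthy step instead of waiting for a capacity expansion.

```python
from typing import Callable


def ramp_up(target_qps: int, step_qps: int,
            is_healthy: Callable[[int], bool]) -> int:
    """Raise load step by step; return the last QPS level that ran stably."""
    if step_qps <= 0:
        raise ValueError("step_qps must be positive")
    current = 0
    while current < target_qps:
        next_level = min(current + step_qps, target_qps)
        if not is_healthy(next_level):
            # Bottleneck found: fall back to the previous stable level.
            return current
        current = next_level
    return current


# Hypothetical system that becomes unstable above 120,000 QPS.
stable_qps = ramp_up(200_000, 10_000, lambda qps: qps <= 120_000)
print(stable_qps)
```

In practice the loop would resume from the returned level once the bottleneck service has been scaled out.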

To reduce the manual effort required during the stress test, a traffic monitoring component can be developed with some performance thresholds preset in it.

For example, the threshold for container CPU usage can be set to 60% to 70%;
the upper limit of the system's average response time can be set to 1 second;
the proportion of slow requests can be capped at 1%, and so on.

When system performance reaches one of these thresholds, the traffic monitoring component detects it promptly, notifies the traffic delivery component to reduce the stress test traffic, and sends an alert to the development and operations engineers so that they can quickly locate the performance bottleneck. The stress test continues after the problem is solved or capacity is expanded.
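A minimal sketch of such a threshold check follows. The class and field names are assumptions made for the example, and the CPU limit uses 70%, the upper end of the range mentioned above; how metrics are collected and how the delivery component is notified is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Thresholds:
    max_cpu_usage: float = 0.70           # upper end of the 60% to 70% range
    max_avg_response_s: float = 1.0       # average response time limit
    max_slow_request_ratio: float = 0.01  # share of slow requests


@dataclass
class Metrics:
    cpu_usage: float
    avg_response_s: float
    slow_request_ratio: float


def breached(metrics: Metrics, limits: Thresholds) -> List[str]:
    """Return the names of all metrics that have reached their threshold."""
    hits: List[str] = []
    if metrics.cpu_usage >= limits.max_cpu_usage:
        hits.append("cpu_usage")
    if metrics.avg_response_s >= limits.max_avg_response_s:
        hits.append("avg_response_s")
    if metrics.slow_request_ratio >= limits.max_slow_request_ratio:
        hits.append("slow_request_ratio")
    return hits


sample = Metrics(cpu_usage=0.75, avg_response_s=1.2, slow_request_ratio=0.001)
alerts = breached(sample, Thresholds())
if alerts:
    # A real component would reduce stress traffic and page the on-call here.
    print("reduce traffic, thresholds reached:", alerts)
```

Returning the list of breached metrics, rather than a single boolean, gives the alert enough detail for engineers to start troubleshooting.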



Origin blog.csdn.net/shuang_waiwai/article/details/134908225