The Road to Test Environment Governance for Membership Testing

01

   Background

The membership business is one of the company's core businesses and provides the most basic service guarantees for its members. As the number of members has passed 100 million, business complexity has grown rapidly. Supporting membership testing efficiently has become a challenge the member testing team must face, and the most fundamental part of that challenge is governance of the test environment. The characteristics of the test environment are summarized as follows:

  • Feature 1: There are hundreds of basic application services, distributed across dozens of domain names, so maintenance costs are high.

  • Feature 2: Call relationships are complex: applications call and depend on one another, so joint debugging is costly.

  • Feature 3: Calls between services rely on route forwarding and service discovery, so locating problems is costly.

Based on the above characteristics, the environment will have the following problems during the test process:

  • Problem 1: There are many applications, and each application's basic configuration is set up arbitrarily and includes custom settings, which makes them difficult to manage.

  • Problem 2: The test environment is heavily shared and relies on various private servers owned by R&D or testers, so stability is relatively poor.

  • Problem 3: Routing cannot be managed in a unified way, and call relationships are chaotic. When the environment has a problem, troubleshooting takes a long time because each hop in the call chain has to be checked one by one.

  • Problem 4: The infrastructure for test improvement is unstable, so some basic testing work is unreliable and delivers little value.

02

   History of member testing environment


  Phase 1: Manual Phase  

After accumulating experience over a period of project testing and analyzing it systematically, we found that the test environment was the most time-consuming and least controllable link in project testing. Following the principle of controlling quality at the source, the test team decided to build the test environment itself so that test quality and efficiency could be controlled fundamentally. The team applied for several fixed virtual machines as test machines and, following the business's packaging method and deployment process, built the environment on them by hand. We call this the "manual stage". The specific process is shown in the figure:

4e0f677a1f0568dcf389b065c94d58d1.png

Because the number of machines is limited, multiple applications have to be deployed on a single machine. To prevent port conflicts, a wiki page records the machines, applications, and port numbers.

f8dcb089820f97c5dd74e796e9c57ff1.png

Problems solved:

The test team takes over the packaging, deployment, and maintenance of the test environment, so quality and efficiency are under its control.

Remaining problems:

  1. Every time an application is deployed on a new machine, a dependency needs to be installed. This dependency has no fixed version, making the configuration and startup commands of the same application inconsistent when deployed on different machines.

  2. Deploying multiple applications on one machine does not avoid port conflicts or the follow-up nginx configuration work. By convention, the same application may not be deployed twice on the same machine, and port numbers are tracked in the wiki, which is costly to maintain, slow to update, and often inaccurate.

  3. Different personal habits lead to inconsistent application configurations and startup scripts, making centralized management difficult.

  4. There is no code compilation process, and code packages depend entirely on R&D, so turnaround is slow.

  5. There is a lot of configuration work, the operations are cumbersome, and efficiency is low.

  6. Deploying an environment demands a high level of skill from the person doing it.

  Phase 2: Script Phase  

As the membership business kept growing, service logic became more and more complex. The original manual deployment method was overwhelmed and had accumulated a lot of special-case handling, so it could no longer meet the needs of business testing. We decided to use Jenkins for standardized deployment and to maintain application configuration information centrally in a MySQL database. We call this stage the "script stage". The overall flow chart is as follows:

0ed2733333b524b20f0cd72dd9b94e0e.png

Maintain the basic information of each application in the database, including code paths, basic dependencies, compilation commands, and startup scripts, as shown in the following figure:

2740e0786ec8d6be3ea01e858d744dfc.png
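As a rough illustration of what such a per-application record might look like, here is a minimal Python sketch. The field names, the `==` pinning convention, and the `validate` helper are assumptions made for this example; the article does not show the actual table schema.

```python
from dataclasses import dataclass


@dataclass
class AppConfig:
    """One per-application record (hypothetical field names, not the real schema)."""
    name: str
    code_path: str            # where the application's code lives
    dependencies: list[str]   # basic dependencies, ideally pinned as "name==version"
    compile_command: str
    startup_script: str


def validate(config: AppConfig) -> list[str]:
    """Return the problems that would make deployments differ between machines."""
    problems = []
    for dep in config.dependencies:
        # Assumed convention: a pinned dependency is written "name==version".
        if "==" not in dep:
            problems.append(f"dependency '{dep}' has no pinned version")
    if not config.compile_command.strip():
        problems.append("compile command is empty")
    if not config.startup_script.strip():
        problems.append("startup script is empty")
    return problems


# Illustrative record; every value below is made up.
example = AppConfig(
    name="app-a",
    code_path="git@example.invalid:member/app-a.git",
    dependencies=["jdk==1.8", "tomcat"],
    compile_command="mvn -q package",
    startup_script="bin/start.sh",
)
print(validate(example))
```

Keeping the record in one place and checking it before a build is one way to avoid the unpinned-dependency and inconsistent-startup problems seen in the manual stage.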

The front-end page supports single-application deployment. In the Jenkins build, select the application, fill in the branch, and select the machine, and the deployment can start:

4fc115bed959411517f7962847011ca5.png

The Jenkins build process is shown in the following figure:

27c7e6d1179b60ebb3ffc2f5748bbc23.png

Details of deploying code to the server:

ce24f3f25577ca876e76e64909b05b56.png

Problems solved:

  1. Each application has a basic configuration in the database, which can be managed and maintained in a unified manner.

  2. Dependency installation, server initialization, application startup, and similar steps are done by scripts; with standardized, unified versions they are easy to maintain.

  3. Code is managed in a unified way, and the test team takes over compilation and packaging, so service quality can be controlled.

  4. Environment deployment is automated, which improves deployment efficiency.

  5. Less skill is required of the people who deploy environments.

Remaining problems:

  1. There is no page for editing configuration; it can only be changed directly in the database, which is inconvenient.

  2. The machines are fixed and cannot be scaled dynamically. When a machine has a problem, nobody is responsible for maintaining it, and customized deployment is not supported.

  3. Routing configuration files cannot be maintained uniformly.

  4. Environments are not differentiated by usage or scenario.

  5. Scalability is poor, and server migration is very costly.

  Phase 3: Platformization  

As the business code architecture moved to microservices, containerization, and cloud native, individual scripted configurations could no longer keep up with business iterations, and test environment deployment seemed to return to the initial manual stage overnight. At the same time, new business demands emerged:

  • Apply for machines dynamically to deploy a test environment, and discard them after use.

  • Extend single-application deployment to template deployment, which deploys and configures multiple applications to meet the needs of scenario-level testing.

  • Support multiple roles and purposes: development joint debugging and test verification, with joint debugging environments, test environments, automation environments, and so on.

  • Support complex routing configuration and parameter replacement, integration with different testing capabilities, and so on.

Against this background, the company's testing department built an environment platform on top of the company's resources. It provides unified allocation of test environment resources, unified deployment of baseline environments, unified management and deployment of applications, and one-click use of environments. This secures the first link in the testing chain and lays the foundation for improving overall test efficiency. The environment platform capabilities are introduced as follows:

0ee345ef73615957d2bddeed3ffb9f1b.png

The membership business adapted the test environment platform provided by the testing department to its own needs. To date, the membership test environment is fully connected to the platform, which supports hundreds of applications, handles hundreds of deployments per day, and serves more than 100 people. More than 90% of test projects build their test environments on the platform, the deployment success rate of the stable environment is above 99%, and the success rate of manually triggered project deployments is above 90%. We call this stage the "platform stage", and the overall flow chart is as follows:

fc8d819bc9fb6966b2817cdaf9b5fe42.png

Problems solved:

  1. The basic configuration of each application is maintained on the platform, with editable pages, centralized management, and flexible configuration (custom settings, nginx, dependent hosts, registration center).

  2. The member testing team only needs to maintain custom scripts; server handling is essentially delegated to the hosting platform.

  3. Code compilation and similar operations are delegated to the company's internal CI/CD, so the process is standardized, efficiency improves, and versions are controlled.

  4. Environments are separated by how they are used, which reduces problems caused by mixed usage and avoids conflicts between branches and environments.

  5. Permission control determines which systems and operations each person can see.

  6. Joint debugging environments are easy to manage and can be measured centrally.

03

   Solve business pain points

While iterating on the test environment, we focused on solving the following business pain points in light of the characteristics of the membership business:

  1. Test environment integration  

Test environments are consolidated by purpose and managed by domain name. Under each domain name, different test environments are built according to how they are used:

  • Stable environment: Every application's master branch is deployed on a regular daily schedule, so it stays consistent with the online service code. It provides a stable joint debugging environment and an automated execution environment for internal and external users.

  • CI environment: A CI template is triggered when git changes. After the environment is deployed, custom scripts start and run automated and security tests so that problems are exposed early.

  • Business testing: The configuration and branch information are modified on top of the stable branch, and project testing starts once the environment is deployed. After testing is complete, the environment is released and the server can be reclaimed.

4a59aba89ff1e38c7a6c6a0634da49b1.png

The ideal environment construction is to deploy only modified or newly added applications, and use stable ones for other applications, which can reduce the cost of test environment construction and reduce other environmental problems caused by configuration factors.

For example, take test project 1 in the figure. Its end-to-end flow needs application A, application B, application C, and application D. Only applications C and D are modified this time, so only those two applications are deployed, while applications A and B are served by the stable environment. The environment is built as shown in the figure below:

1b5d194e3904bc43258d6d1cd51c432e.png
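The selection rule in this example can be sketched in a few lines of Python. The function name, the endpoint strings, and the host names are hypothetical and only illustrate the idea of deploying modified applications while reusing the stable environment for the rest.

```python
def plan_environment(required, modified, stable_endpoints, project_host):
    """Decide, per required application, whether to deploy it or reuse stable.

    Returns (plan, to_deploy): plan maps each application to the endpoint the
    project test should call; to_deploy lists the applications to build.
    """
    plan = {}
    to_deploy = []
    for app in required:
        if app in modified:
            # Modified or new applications get their own project deployment.
            plan[app] = f"{project_host}/{app}"
            to_deploy.append(app)
        else:
            # Unchanged applications are served by the stable environment.
            plan[app] = stable_endpoints[app]
    return plan, to_deploy


# Test project 1 from the text: A, B, C, D are needed; only C and D changed.
stable = {app: f"stable.example.invalid/{app}" for app in ["A", "B", "C", "D"]}
plan, to_deploy = plan_environment(
    required=["A", "B", "C", "D"],
    modified={"C", "D"},
    stable_endpoints=stable,
    project_host="project1.example.invalid",
)
print(to_deploy)
print(plan)
```

Only two of the four applications are built, which is where the savings in construction cost and configuration-related failures come from.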


  2. Routing management  


2.1 nginx management

Routing between member applications depends heavily on nginx. Whether the member test environment is usable and whether requests are routed correctly both depend on the nginx configuration being correct. This configuration is important and highly customized. After discussion, the member testing team and the environment platform agreed to manage the configuration files jointly: the business side is responsible for managing the configuration files and developing custom scripts, while the environment platform provides the configuration and replacement capabilities. The two sides cooperate under this division of labor to keep nginx running stably. The flow chart is as follows:

5ef97468dd2e53dd404bbd302d1e1ac2.png

For the project test in the figure, the question is how to make sure that an application in the stable environment calls back to the corresponding application in the project environment rather than to its counterpart in the stable environment. Once the calling address of application B in the figure is modified, the two environments can be used in parallel. The problem therefore comes down to routing, as shown in the following figure:

340a74a61e7d0d80917d95d30acf4b95.png

Calls between services depend heavily on nginx. If nginx can dynamically obtain the IP and port of each changed application, letting the stable environment serve the unchanged applications becomes simple, and the problem of mutual calls between temporary and stable environments is solved as well. The effect is as follows:

478d9e24d6d348fc03cf9f3a86e6c5d0.png

To achieve the effect shown in the figure above, it needs to be processed through routing. The specific process is as follows:

075f1ed534cfe539ba1c99b559b3ba80.png

Routing only forwards requests; the specific logic still lives in the application. When starting an application, the member test team passes, in addition to the application's required parameters, customized content for testing and quality improvement. The specific process is as follows:

a107de6f99f5c8c30f49690faaa82e66.png

2.2 nginx integration with guardro

QAE is the company's internal Docker-based application engine, which enables efficient and reliable automated O&M. guardro is Guardro-template. By subscribing to the event messages of the QAE Controller (Guardro), it updates nginx's upstream files in real time and triggers nginx to reload its configuration. Because container-based application services are created dynamically, their IPs are not fixed and static replacement in the nginx configuration fails; guardro was introduced to solve this. The implementation process is as follows:

  1. Subscribe to the event messages of QAE applications

  2. Start a small web server locally to receive messages

  3. When a message arrives, according to the content of the message, replace the specific part in the template file to generate the corresponding nginx upstream file

  4. Trigger nginx to reload configuration
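Steps 3 and 4 can be sketched in Python as follows. This is not guardro's code: the message layout (`app`, `instances`, `ip`, `port`), the template, and the function names are assumptions for illustration, and the small web server from step 2 is left out. The only real external command used is `nginx -s reload`, and the demo replaces it with a no-op.

```python
import subprocess
import tempfile
from pathlib import Path
from string import Template

# Hypothetical upstream template; real templates would carry more directives.
UPSTREAM_TEMPLATE = Template("upstream $app {\n$servers}\n")


def render_upstream(message: dict) -> str:
    """Fill the template with the instances listed in an event message."""
    servers = "".join(
        f"    server {inst['ip']}:{inst['port']};\n"
        for inst in message["instances"]
    )
    return UPSTREAM_TEMPLATE.substitute(app=message["app"], servers=servers)


def reload_nginx() -> None:
    # Asks the running nginx master process to re-read its configuration.
    subprocess.run(["nginx", "-s", "reload"], check=True)


def on_message(message: dict, upstream_dir: str, reload=reload_nginx) -> Path:
    """Write the upstream file for one application, then trigger a reload."""
    target = Path(upstream_dir) / f"{message['app']}.conf"
    target.write_text(render_upstream(message))
    reload()
    return target


# Demo with a made-up message, a temporary directory, and a no-op reload.
message = {
    "app": "app_c",
    "instances": [
        {"ip": "10.0.0.12", "port": 8080},
        {"ip": "10.0.0.13", "port": 8080},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    written = on_message(message, tmp, reload=lambda: None)
    print(written.read_text())
```

Passing the reload action in as a parameter keeps the rendering logic testable on a machine where nginx is not installed.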

After introducing guardro, the specific process is shown as follows:

7a9fe68f4e65b75d69295479b3130abf.png

2.3. Registration Center

Calls between services are gradually moving from nginx to microservice calls. To support offline testing, we optimized the application configuration, deployment method, and custom scripts, and upgraded Eureka itself so that a project registration center can synchronize other applications from the stable registration center and replace entries where needed. Applications deployed from a template register automatically, so that each environment has its own registration center and the stable and project registration centers are kept separate. The process is as follows:

c2e9b0641bdc1e3121d3c7f4c53813c3.png

The core capability is built through secondary development on eureka-plus. At deployment time, a synchronization tool is deployed alongside the registration center. The tool implements synchronization plus filtering, with the following capabilities:

0d6f8d71e853bbac13578e9e2a9409c3.png
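One plausible reading of "synchronization + filtering" is sketched below in Python. The filtering rule used here (an application registered in the project environment is never overwritten by its stable counterpart) is an assumption, as are the function name and the dictionary-based registry model; the actual eureka-plus tool is not shown in the article.

```python
def sync_registry(stable_registry: dict, project_registry: dict) -> dict:
    """Merge stable instances into a project registry view.

    Synchronization: applications known only to the stable registry are copied.
    Filtering (assumed rule): applications the project deployed itself keep
    their project instances, and the stable instances are skipped.
    """
    merged = {app: list(instances) for app, instances in project_registry.items()}
    for app, instances in stable_registry.items():
        if app in project_registry:
            continue  # filtered out: the project's own deployment wins
        merged[app] = list(instances)
    return merged


# Made-up addresses: the project deployed C and D; A and B come from stable.
stable = {
    "A": ["10.1.0.1:8080"],
    "B": ["10.1.0.2:8080"],
    "C": ["10.1.0.3:8080"],
    "D": ["10.1.0.4:8080"],
}
project = {"C": ["10.2.0.3:8080"], "D": ["10.2.0.4:8080"]}
print(sync_registry(stable, project))
```

With this view, a modified application in the project environment resolves its unchanged dependencies to stable instances while the stable registry itself is left untouched.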

  3. Intelligent positioning of environmental problems  

The most obvious problem caused by complex business call links is that troubleshooting is difficult. Many factors can be responsible, such as failed deployment of a dependent service, routing errors, and service configuration errors. The link platform rover and the map platform atlas are connected to the test environment to provide one-click problem location. The solution is as follows:

3.1 Overall interaction diagram

56b2dd230e83da6403efb411a6f7c635.png

3.1.1 QAE integration with SkyWalking

SkyWalking is available as a jar package, which the QAE container can load at startup.

3028f74fa71585f113f62e7d2decab75.png

3.1.2 nginx integration with full-link tracing

When nginx is upgraded to OpenResty, Lua needs to be pre-installed. In addition, the business line has to connect to the log collection system itself, dynamically change the listening IP address, and pass the trace information along so that the call link can be displayed.

38c6a7a039d87cba9c078005c43fbc2a.png

3.2 Final Link Presentation

After an application is connected to rover, its call link information is displayed clearly. During project testing, testers can quickly identify the environment problem that is blocking them, resolve it, and improve test efficiency.

cafa09493b9258a8cf8337e8908a147b.png

04

   Outlook

After several years of accumulation and governance, the member test environment has achieved phased results and plays a vital role in supporting testing and improving quality. With the overall push to reduce costs and increase efficiency, test environment construction will continue to evolve from virtualization to containerization to cloud native. The next step is to keep working with the test environment platform on cloud-native migration, to further empower the business, stay technically current, and reduce costs and increase efficiency.

5dfbc2bf0687a7b4b768d8a125f86420.jpeg


Origin blog.csdn.net/weixin_38753262/article/details/129360077