How Does Tonglian Data Use Docker and Rancher to Build an Automated Release Pipeline?

Hello everyone. Today I will share Tonglian Data's experience using Docker and Rancher to build an automated release pipeline. I will cover the process and design of our automated release, the lessons we learned from the pitfalls we hit, and the current operational state of the pipeline.

1. Tonglian Data's needs and the reasons for choosing Rancher

Tonglian Data is a financial technology company that has emerged in recent years. Its main goal is to combine big data, cloud computing, artificial intelligence, and other technologies with professional investment concepts to create a world-class, revolutionary financial service platform. Several keywords appear here: big data, cloud computing, and artificial intelligence. Many companies have talked about these buzzwords, so I will not repeat them.

As a start-up, Tonglian Data runs into the problems many start-ups face. For example, we have too many products, and many of them are torn down and rebuilt every year, which causes all kinds of problems. When an application is being developed, questions come up such as how many CPUs it needs and which programming language to use, but developers do not discuss them with the back-end staff in advance. They build the product and then ask the operations staff to put it online directly.

Against this background, we urgently needed to connect the whole pipeline from development to release, because for the foreseeable future Tonglian Data will produce a large number of new applications every year.

The first thing we did was continuous integration. Our company produces hundreds of new projects every year, and each project uses different languages, frameworks, and deployment methods. The release process was long and inefficient, and deploying applications took up a lot of the operations staff's time.

We decided to solve this problem with containers, and selected Rancher after evaluating several vendors, mainly because:

First, Rancher's user interface is very simple. I believe anyone who has used Rancher will agree: it does not require much specialist knowledge and is easy to get started with. Second, Rancher is very simple to deploy and can be deployed with one click. Rancher also provides good API support, which makes integration easy.

2. The automated release process

Then we started building our own CI/CD, and we ran into many difficulties along the way. The following picture has been simplified; the real process has many more branches and side paths:

[Image: simplified diagram of the original release process]

In our original process, the first stage is development. Next comes deployment to the QA environment. Someone has to do this deployment, usually the operations staff, who are generally unwilling to do it. After the deployment is complete, the QA testing phase begins and the developers and testers are notified to test. There may be delays here because the testers may be busy with other things and cannot test immediately. After the QA tests pass, the application is deployed to the STG environment and tested there, and these steps may be repeated many times. Then comes the security testing phase. After the security tests pass, a formal package is prepared, and finally the application can be deployed to production. Even this simplified process is very complicated, and it involves a lot of offline communication, so it cannot be efficient.

After we adopted Rancher, the original process became much simpler. The improved process is as follows:

[Image: simplified diagram of the improved release process]

The first step of the process is continuous integration: once developers have written the code, it goes straight into CI. CI triggers an automatic build, a script then deploys the result automatically, and the test environment is ready. Put simply, every time a developer pushes code, the test environment is kept in a ready state, and testing can begin right away. The whole process needs no offline communication or approval steps, and everything is automated.

After the testers finish testing in QA, testing moves to the STG environment. Because the back end is integrated with Rancher and automated, the QA stage gains an automation capability we never had before: a build that passes QA testing can move on to STG testing automatically. After those tests pass, the security testing stage begins. This stage is required by the company and cannot be skipped. After the security tests pass, the application goes to production deployment. The offline communication and manual deployment steps that were unavoidable before can now be dropped, so the whole process is leaner and more efficient.

CI/CD may sound very simple: push code, and the QA environment is automatically ready. In practice it is not, and many problems still need to be solved. One is the development branch model: when code is pushed, which branches should trigger a deployment? Or should a push to any branch be deployed?

3. Development branch model

At that time, we thought the best approach was to deploy every branch that was pushed, which is very convenient. But it turned out that this does not work: it is confusing and hard to manage. The branch model we had been using is similar to "A successful Git branching model". It specifies a develop branch, feature branches, release branches, hotfix branches, and a master branch.

In normal development, developers usually cut a feature branch from the develop branch. For example, to develop a story that contains many functions, each developer cuts a branch for the story, named with the story's own ID, and develops on it. When the story is finished, the branch is merged back. In the end, we chose only one main line for CI: when code is pushed or merged into the develop branch, we build and deploy it automatically. Feature branches are handled differently.

For a feature branch, when a developer pushes it, we deploy a separate environment for that branch. Each feature branch gets its own deployment, so each story can be tested separately and independently before it is finally merged into the develop branch. The testers can then decide which branch to test according to their own needs.

For example, if a tester only cares about story A, they test in the environment deployed from the story A branch. After all the stories are merged, a combined test is run again. After that test passes, a release branch is cut at release time, the official package is built from the release branch, and QA continues testing in the STG environment, as in the flow chart shown earlier. Branch usage tends to be chaotic. To do CI, we agreed with the developers on a definition for each branch and on the behavior each branch triggers, which is very useful when the branch model is chaotic.
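The per-branch behavior described above can be written down as a small rule table. The following is a minimal sketch, not our actual implementation: the branch types come from the text, while the `feature/<story>` and `release/<version>` naming patterns, the environment names, and the action labels are assumptions made for illustration.

```python
import re

# Hypothetical rule table: branch pattern -> (environment, action).
# Branch names, environments, and actions are illustrative assumptions.
BRANCH_RULES = [
    (re.compile(r"^develop$"), "qa", "deploy-shared"),
    (re.compile(r"^feature/(?P<story>[\w.-]+)$"), "qa", "deploy-per-story"),
    (re.compile(r"^release/(?P<version>[\w.-]+)$"), "stg", "build-official-package"),
]


def plan_for_branch(branch):
    """Return a deployment plan for a pushed branch, or None to skip CI."""
    for pattern, environment, action in BRANCH_RULES:
        match = pattern.match(branch)
        if match:
            plan = {"environment": environment, "action": action}
            # Carry along the story ID or release version captured from the name.
            plan.update(match.groupdict())
            return plan
    # Branches without a rule are not deployed automatically in this sketch.
    return None
```

For example, `plan_for_branch("feature/STORY-42")` yields a per-story QA deployment, while a push to a branch with no rule returns `None` and triggers nothing.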

4. Version number rules

To do CI, version number rules must be consistent. If every team names its versions differently, matching them becomes troublesome and confusing. We chose Semantic Versioning, which has the form X.Y.Z and is a common way to name versions. It comes with a specification document that defines each part: the first number is MAJOR, the second is MINOR, the third is PATCH, and you can append additional labels of your own.
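A consistent format is what makes version numbers machine-checkable. The sketch below shows one way to parse and compare MAJOR.MINOR.PATCH strings. It is a simplified illustration that handles only the three numeric parts and an optional suffix after a hyphen, not the full Semantic Versioning grammar, and the function names are hypothetical.

```python
import re

# MAJOR.MINOR.PATCH with an optional "-suffix" (simplified, not the full spec).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_version(text):
    """Split a version string into (major, minor, patch, suffix)."""
    match = SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, suffix = match.groups()
    return int(major), int(minor), int(patch), suffix


def is_newer(candidate, current):
    """Compare only the numeric parts; the suffix is ignored in this sketch."""
    return parse_version(candidate)[:3] > parse_version(current)[:3]
```

Comparing the parts as integers rather than as strings matters: `is_newer("1.10.0", "1.9.9")` is true, whereas a plain string comparison would get it wrong.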

5. CI trigger path

Next, I will introduce the trigger path of our CI. A developer runs git push to the develop branch, the push goes to the GitLab server, GitLab calls Jenkins through a webhook, and Jenkins builds the package. Originally, we wanted Jenkins to call Rancher's API directly. Later we found that there was a gap between the two and the process was not very smooth. To bridge the gap between Jenkins and Rancher, we put an API layer called Ponyes between them.

Why do you need Ponyes as a middle layer? What value does it provide?

(1) Dynamically modify the version number

First, it solves the problem of dynamically modifying the version number. If you use Rancher, you will find that Rancher's app store is very useful. We can define things in the store, and then QA only needs to fill in a few parameters to deploy an application. Before the Ponyes middle layer existed, deployment had to be done by operations staff, and the process was more complicated.

To use the Rancher app store, we also define an app store template, so that every time we push code we can generate a truly deployable application from this template. However, the version number is still a problem. Every time we push code, Jenkins increments the version number according to the build count. For example, with 1.0.1.0-1, the first build is -1, the second build is -2, and so on. The app store entry has to change accordingly.

So we made a template like the one below. Ponyes receives the version number from each Jenkins build and injects it into the template through a variable:

sample:
  image:    {{ REGISTRY }}/automation/auto-sample:{{ m['auto-sample']}}

This process also involves the registry, because the QA and STG environments are completely separate. When rendering the template, we need to know whether the result is going to the QA environment or the STG environment, so that we can change the registry address accordingly. This solves the version number problem described above.
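The rendering step described above can be sketched as follows. This is a minimal stand-in written with a regular expression rather than a real template engine, and it is not the Ponyes code. The registry host names are hypothetical, and only the two placeholder forms that appear in the sample template are supported.

```python
import re

# Hypothetical registry addresses per environment (not from the source).
REGISTRIES = {
    "qa": "registry.qa.example.com",
    "stg": "registry.stg.example.com",
}

TEMPLATE = (
    "sample:\n"
    "  image: {{ REGISTRY }}/automation/auto-sample:{{ m['auto-sample'] }}\n"
)

# Matches either {{ m['service-name'] }} or {{ NAME }}.
PLACEHOLDER = re.compile(r"\{\{\s*(?:m\['([^']+)'\]|(\w+))\s*\}\}")


def render(template, environment, versions):
    """Fill in the registry for the target environment and each service version."""
    def substitute(match):
        service, variable = match.groups()
        if service is not None:
            return versions[service]        # version reported by the build
        if variable == "REGISTRY":
            return REGISTRIES[environment]  # QA and STG use separate registries
        raise KeyError(variable)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(TEMPLATE, "qa", {"auto-sample": "1.0.1.0-2"}))
```

Passing the target environment as an argument keeps one template usable for both QA and STG, which is the point of rendering the registry address instead of hard-coding it.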

(2) Multi-service management

A thornier problem is what to do when a stack contains multiple services. For example, a small team may have only a few people, each responsible for several projects, somewhat like microservices. A stack may then contain several services, typically a front end, a back end, and some middleware, each of which is a service.

When deployed, these services are managed in Rancher as one stack, so we have to decide how to manage multiple services together. For example, suppose a stack has three services: A, B, and C. When service C is updated, the version number of the whole stack should be updated too. In Rancher, after a stack is deployed, a yellow "update available" button appears when there is a new version. Clicking it upgrades the stack, and it upgrades the entire stack, not just one service. Ponyes therefore records the relationship between services and stacks, and any service update also triggers a stack update: each time a service is updated, the stack version number is incremented by 1.
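The bookkeeping described above, where any service update bumps the version of the stack that contains it, can be illustrated with a small in-memory registry. This is only a sketch of the idea: the class and method names are hypothetical, and a real middle layer would persist this state.

```python
class StackRegistry:
    """Tracks which stack each service belongs to and a version counter per stack."""

    def __init__(self):
        self._stack_of = {}          # service name -> stack name
        self._service_versions = {}  # stack name -> {service name: version}
        self._stack_versions = {}    # stack name -> integer counter

    def register(self, stack, services):
        """Record that the given services belong to the stack."""
        self._stack_versions.setdefault(stack, 0)
        self._service_versions.setdefault(stack, {})
        for service in services:
            self._stack_of[service] = stack

    def service_updated(self, service, version):
        """Record a new service version and bump the owning stack's counter by 1."""
        stack = self._stack_of[service]
        self._service_versions[stack][service] = version
        self._stack_versions[stack] += 1
        return stack, self._stack_versions[stack]

    def snapshot(self, stack):
        """Return the latest known version of each updated service in the stack."""
        return dict(self._service_versions[stack])
```

With a stack of services A, B, and C registered, an update to C returns stack version 1 and a later update to A returns 2, so every service change produces a new stack version to upgrade to.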

(3) Parallel release of multiple versions

Then there is a more serious problem: what should we do if multiple versions are released in parallel? For example, we have three modules, A, B, and C. A has released 1.0 and 2.0, B has released 3.0 and 4.0, and C has released 5.0 and 6.0. If version 5.0.2 of C is released, which versions of A and B should go with it? This question troubled us for a long time.

There are several ways to solve it. One is to record the relationship between C 5.0 and the corresponding versions of A and B, which means letting users define it. But then users have to manage the version relationships themselves. Over time they may mix up version numbers or the relationships between them and then deploy to production, where the consequences are more serious.

Another way is to statically record which versions of A and B were current when C 5.0 was last released. But then the system becomes quite complex and error-prone.

Finally, we chose to remove the ability to release multiple versions in parallel and support only a single version. If you want to release, it must be the latest version, and historical versions are handled manually by other means. This makes the system simpler and less error-prone, and users do not need to maintain relationships between version numbers. All of the above is handled by Ponyes, which we wrote ourselves.
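The "latest version only" rule amounts to a single check before a release is accepted. The sketch below is an assumption about how such a guard could look, using plain dotted numeric versions; the function names are hypothetical.

```python
def parse(version):
    """Turn a dotted numeric version such as '5.0.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def can_release(requested, published):
    """Allow a release only if it is newer than every version already published."""
    return all(parse(requested) > parse(existing) for existing in published)
```

With C already published at 5.0.0 and 6.0.0, a request to release 5.0.2 is refused and has to go through the manual path, while 6.0.1 is accepted.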

Rancher 2.0 also mentions CI/CD functionality. In practice there is a very real problem: there are many stages between writing code and deploying to production, the details are complicated, and CI may only reach as far as QA, with new problems arising in the later stages. A system is needed to manage the entire life cycle, and this is the role of Rancher.

6. Three lessons learned from pitfalls

(1) How many Rancher environments are deployed?

This seems like a small issue, but it sparked a lot of internal discussion at first. Initially we deployed only one Rancher installation and used Rancher environments to separate QA from production. Deploying a single Rancher installation is the easiest approach, but it has a serious problem: what happens when Rancher itself is upgraded? That one installation manages both QA and production, so upgrading it also upgrades the production environment. If the new version has a bug, the problem will be very serious.

Later we split it into four installations, because the Rancher platform itself needs an environment where upgrades can be tried first. We recommend deploying multiple Rancher installations (at least two) from the start. We actually have one each for dev, QA, staging, and production, four in total.

(2) Configuration item explosion

The second issue is about configuration items. We first used Rancher Compose, which is simple and powerful. We click "Deploy" in the Rancher Catalog, all the configuration options pop up, and with a few clicks we can deploy a very complex application. Later we found that applications had more and more configuration items, sometimes hundreds. They no longer fit on one page, nobody could fill in hundreds of settings, and operations could not deploy successfully. We were facing a configuration item explosion.

Our lesson is that on a container platform, configuration items must be managed centrally. We manage all configuration items in Consul, and every time a container starts, it pulls its configuration from Consul. This way containers can move freely to any platform. In addition, the original configuration items are kept on the server, so we can copy them, modify the copy, and deploy another instance.
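Pulling configuration at container start can be sketched against Consul's key/value HTTP API, which returns each entry with its value base64-encoded. The following is an illustrative sketch rather than our startup script: the agent address and the `apps/auto-sample` key prefix are assumptions.

```python
import base64
import json
from urllib.request import urlopen

# Assumption: a Consul agent reachable locally on the default HTTP port.
CONSUL_URL = "http://127.0.0.1:8500"


def decode_kv_entries(entries, prefix):
    """Convert a Consul KV listing into a plain {name: value} dictionary."""
    config = {}
    for entry in entries:
        name = entry["Key"][len(prefix):].lstrip("/")
        raw = entry.get("Value")
        if not name or raw is None:
            continue  # skip the folder key itself and keys without a value
        config[name] = base64.b64decode(raw).decode("utf-8")
    return config


def fetch_config(prefix):
    """Read every key under the prefix from Consul in one recursive request."""
    with urlopen(f"{CONSUL_URL}/v1/kv/{prefix}?recurse=true") as response:
        entries = json.load(response)
    return decode_kv_entries(entries, prefix)
```

A container entrypoint could call `fetch_config("apps/auto-sample")` and export the result as environment variables or write it to a file before starting the application, so that no configuration has to be typed into the deployment form.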

(3) Wildcard domain name + Rancher LB

Each of our business applications has a web service and needs its own domain name. We bring hundreds of projects online every year, which means hundreds of domain names to apply for. These domain names are also needed in the development, testing, and production environments, so we had to apply for nearly 500 domain names every year. This is a horrible amount of work.

Later we switched to a wildcard domain name. For example, we point *.sub.example.com with a CNAME record at one of the hosts in the Rancher environment. Then we set up an LB in Rancher. The LB can be scheduled on all hosts, so every host runs the same LB, and the domain name works no matter which host it points to.

For example, in the QA environment, all of these domain names are served by this LB. If a host fails, the containers restart by themselves, and the entry point can be fixed simply by changing where the * record points. In a production environment, you can add a layer of nginx on top and configure three upstreams. If any one is killed, traffic can still enter through the other two.

With a wildcard domain name configured, adding a new domain name to the Rancher LB does not require applying for a new domain. You can simply write 123.sub.example.com and configure it directly on the LB. The name is usable as soon as it is configured, with no application process.
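The reason no new DNS record is needed is that every name under the wildcard (pan-domain) already resolves to the LB, which then picks a backend from the Host header. The sketch below models that lookup in a few lines. It is not how Rancher's LB is implemented, and the backend name `auto-sample-web:8080` is a hypothetical example.

```python
WILDCARD_SUFFIX = ".sub.example.com"

# Hypothetical host-name rules as they might be configured on the LB.
LB_RULES = {
    "123": "auto-sample-web:8080",
}


def route(host_header):
    """Pick a backend for a request based on its Host header, or None if no rule."""
    host = host_header.split(":")[0].lower()  # drop an optional port
    if not host.endswith(WILDCARD_SUFFIX):
        return None
    label = host[: -len(WILDCARD_SUFFIX)]
    return LB_RULES.get(label)
```

Publishing a new service then only means adding one entry to the rule table, for example mapping `456` to another backend, because DNS already sends `456.sub.example.com` to the same LB.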
