Kubernetes in practice: a seamless, zero-downtime migration from Spring Cloud to k8s

1. Project migration background

1.1 Why touch a system that was running fine?

At present, the company's test, UAT, and production environments are all maintained and managed with k8s, and most projects have been containerized and running stably in production for a long time. After containerizing projects large and small, we gradually unified the release tooling and CI/CD process across the test, UAT, and production environments, built an internal release review platform on top of k8s, and integrated it with project management tools such as Jira.

When a release goes out through this in-house platform, it automatically links the project's development progress to the Release version. More importantly, it controls release permissions, unifies the release tooling and release mode, and supports one-click release of multiple modules across multiple projects, as well as unified rollback of failed applications and rollback of individual applications.

The project in question, however, has used GitLab Runner for releases since its inception and is deployed on virtual machines, so it was never integrated into the release review platform. Because the project is important and involves many services and machines, it had to be containerized and its release tooling unified, both to fit the company's environment and to be better prepared for the next generation of cloud infrastructure.

1.2 Why abandon GitLab Runner?

First, let's look at the GitLab Runner release page. Although it looks clean and simple, we inevitably ran into some problems with it.

[Figure: the GitLab Runner release page]

1.2.1 Multi-branch parallel development issues

When multiple branches are being developed in parallel, or many branches can be released to production, it is easy to make a mistake in the manual deployment step, such as picking the wrong row. Admittedly, the probability of this is small.

There is a second problem: every commit or merge triggers a build. With a Git Flow branching model, many branches may be in development, testing, and building at the same time, and if the GitLab Runner runs on virtual machines, builds are very likely to queue up. Of course, the queuing problem can also be solved.

1.2.2 Multi-microservice configuration maintenance issues

Second, once a project grows a little larger, it is not very convenient to maintain. Take the project to be migrated: one front end and more than twenty business applications, plus Zuul, ConfigServer, and Eureka, add up to nearly thirty services. Each service has its own Git repository, and each service has several branches under development at the same time. Upgrading the GitLab CI scripts, or adding nodes to the microservice machines, becomes a tedious job.

1.2.3 Security issues

Finally, there is security. GitLab CI scripts are usually kept in the code repository itself, which means anyone with Push or Merge permission can modify the CI script at will. This can lead to unexpected results and also threatens the security of the servers and the business. As for releases, any developer can click the release button. All of this remains a standing security risk.

This does not mean GitLab Runner is a tool to avoid. Newer versions of GitLab, with built-in Auto DevOps and Kubernetes integration, are still very popular. But in our case only a few projects release with GitLab Runner, and we want to unify the release tooling and manage CI scripts centrally, so another CI tool is a better fit for us.

1.3 Why containerization?

1.3.1 Port conflict

Before containerization, this project was deployed on virtual machines, with two or three microservices started on each VM. This runs into port conflicts: when adding a new application to the project you have to watch for port conflicts on the server, and you also have to ensure that no two microservices use the same port, because with VM deployment a node failure may require manually migrating applications to another machine, and if some microservices share a port, that migration can be blocked.

When a project has only a few applications, keeping track of ports is manageable. With more than thirty microservices, as in this project, it becomes very painful. With containers, every instance is isolated from the others, so all applications can use the same port and ports no longer need to be worried about.

1.3.2 Program health issues

Anyone who has operated Java programs has probably seen a process go into a "zombie" state: the port is clearly open, but requests are no longer being processed. When we deployed on virtual machines we often failed to do proper health checks; interface-level checks were usually missing, so a hung process could not be handled automatically. And implementing interface-level health checks and remediation on virtual machines is neither simple nor pleasant, especially when a project has many microservices and their health check interfaces are inconsistent.

On k8s, however, the built-in readiness and liveness probes make this trivial. As shown in the figure, three kinds of health checks are supported:

[Figure: the three probe types supported by k8s]

  • tcpSocket : port-level health check
  • exec : health decided by the exit status of a specified command
  • httpGet : interface-level (HTTP) health check

These checks are also very flexible: the check interval, failure threshold, success threshold, and other parameters can all be customized, and the interface-level httpGet check additionally supports custom host names, request headers, check paths, and HTTP or HTTPS. The health checks built into k8s save a lot of work and spare us from maintaining piles of annoying scripts.
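
As a minimal sketch of what this looks like in practice (the service name, image, and the /actuator/health path are assumptions, not taken from the project), a Deployment's container spec might declare probes like this:

```yaml
# Illustrative probe configuration; names, image, and paths are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-c
  template:
    metadata:
      labels:
        app: service-c
    spec:
      containers:
        - name: service-c
          image: registry.example.com/service-c:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:              # interface-level check: receive traffic only when healthy
            httpGet:
              path: /actuator/health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:               # restart the container if the process hangs
            tcpSocket:
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 15
```

An exec probe works the same way, running a command inside the container and treating a zero exit code as healthy.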

1.3.3 Fault recovery issues

When deploying on virtual machines you will occasionally hit a host failure: a single-node application becomes unavailable, or a multi-node application loses replicas and the remaining ones suffer high load and latency, and the host cannot be recovered quickly. You then have to add nodes or provision new servers by hand, a process that can be long and painful, because the dependent environment must be prepared before the application can be deployed, and sometimes the CI scripts need changing as well.

With k8s orchestration we no longer need to care about such problems. Failure recovery and disaster tolerance are handled by k8s itself; you can go get a cup of coffee, and by the time you have opened your laptop to deal with the problem, everything has already recovered as if nothing happened.

1.3.4 Other minor issues

Of course, k8s brings far more conveniences than the ones listed above. Container images solve the problem of dependent environments; service orchestration solves fault tolerance; a k8s package manager can spin up a new environment with one click; k8s service discovery means developers no longer need to care about the networking layer; k8s access control means operators no longer need to manage permissions on every server; and k8s's powerful release strategies mean we no longer have to think hard about how to achieve zero-downtime releases and rollbacks. All of these conveniences are quietly changing the way we work.

2. Migration plan

2.1 Blue-green migration

First, let's look at the architecture before the migration:

[Figure: architecture before the migration]

Like most Spring Cloud architectures, it uses Node.js for the front end, Eureka for service discovery, Zuul for routing, and ConfigServer as the configuration center. This is the most common Spring Cloud architecture in the enterprise and uses no additional components, so for the first migration we did not think too hard and followed the plan used for other projects: create a new, containerized copy of the environment on k8s (middleware is not part of this migration), configure the same domain name, add a hosts entry for testing, and if everything looks fine, switch the domain name over to complete the migration. This is the simplest and most commonly used approach, similar to a blue-green deployment of an application release. A sketch of the domain setup follows; the new environment on k8s, corresponding to the architecture diagram, is shown in the figure after it.
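
As a rough sketch of the domain setup (the host name and gateway Service name are assumptions), the container environment exposes the same domain through an Ingress, testers point that domain at the k8s ingress entry with a hosts entry, and the real DNS record is switched only after testing passes:

```yaml
# Illustrative Ingress for the parallel container environment; host and backend names are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: project-gateway
spec:
  rules:
    - host: app.example.com            # the same domain name the virtual-machine environment serves
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zuul             # traffic enters the container environment through the Zuul gateway
                port:
                  number: 8080
```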

[Figure: the parallel container environment in k8s]

During testing, the project ran two environments in parallel: the virtual machine environment and the container environment, with the container environment receiving traffic only from testers. Both environments connected to the same middleware, because most other projects had been migrated the same way and this project had gone through the same process in the test environment without any problems, so we believed it would not cause problems here either. Reality, however, differed from expectations. While the two environments coexisted during testing, some production data problems appeared. Because the container environment had not been fully tested and the domain name had not yet been switched over, the container environment was shut down in an emergency and the problems were repaired. Due to time constraints we did not investigate the root cause carefully and only fixed part of the data. We later suspected that, for some microservices, the master branch was not consistent with the production code at the time of the migration, though it might not be that simple. To avoid a recurrence, the migration plan had to be changed.

2.2 Grayscale migration

Because of the problems with the plan above, a new plan was drawn up. It is slightly more troublesome than the previous one: the microservices are migrated to k8s one by one, similar to a grayscale (canary) release.

When a single application is migrated, the code in the container environment must be identical to the code in the virtual machine environment. During the migration, each microservice registers itself by domain name: it is given an internal domain name and registers with Eureka using that domain rather than the container's IP and port (because the Pod network in k8s and the virtual machines cannot reach each other directly). The environment at this point is shown in the figure below:

[Figure: grayscale migration of a single microservice]

For example, a domain name service-c.interservice.k8s points to ServiceC, and when ServiceC registers with Eureka it overrides its address with this domain name (the default is the host IP plus port), so other applications call ServiceC through this address. A sketch of the registration settings follows; once ServiceC passes testing, the ServiceC instance on the virtual machine is taken offline, and the resulting architecture is shown in the figure after the sketch.
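
A minimal sketch of ServiceC's registration settings during this phase (the property values are assumptions; the defaultZone address stands for the existing Eureka cluster on the virtual machines):

```yaml
# Illustrative application.yml for ServiceC running in k8s during the grayscale phase.
eureka:
  instance:
    hostname: service-c.interservice.k8s   # register with the internal domain instead of the Pod IP
    prefer-ip-address: false               # keep hostname-based registration
    non-secure-port: 80                    # assumed: the port the domain is reachable on from the VMs
  client:
    service-url:
      defaultZone: http://eureka.example.com:8761/eureka/   # existing VM-side Eureka cluster (assumed address)
```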

[Figure: architecture after ServiceC has been migrated]

Apart from Zuul, the front-end UI, and Eureka, all other services were migrated to k8s in this grayscale fashion, which is more involved than the blue-green form: a separate Service and domain name must be created for each microservice (as sketched below) and deleted again after the migration completes. After this step, all services except Eureka have been deployed on k8s; the migration of Eureka itself involves more details.
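
For each microservice moved in this phase, the exposure behind a domain such as service-c.interservice.k8s could look roughly like this (using an Ingress here is an assumption; a NodePort plus an internal DNS record would serve the same purpose):

```yaml
# Illustrative per-service exposure for the grayscale phase; deleted once the migration completes.
apiVersion: v1
kind: Service
metadata:
  name: service-c
spec:
  selector:
    app: service-c
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-c-interservice
spec:
  rules:
    - host: service-c.interservice.k8s   # the internal domain that the VM-side services resolve
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-c
                port:
                  number: 8080
```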

2.3 Eureka migration

At this point, service access itself no longer has problems, and everything except Eureka is running in k8s, but designing a transitional migration for Eureka raises more issues. We cannot simply deploy a highly available Eureka cluster on k8s and then change the registration address in ConfigServer to the k8s Eureka address, because the two Eureka clusters would then be independent zones that do not share registration information, so registrations would be lost while the configuration change rolls out. At that point the architecture might look like this:

[Figure: two independent Eureka clusters during the cutover]

That is, while the configuration is being switched, ServiceA may still be registered to the old Eureka while ServiceB is already registered to the Eureka in k8s, so ServiceA cannot find ServiceB, and vice versa.

So after building the Eureka cluster in k8s, a temporary domain name is configured for each Eureka instance, and the zone configuration of both the old Eureka cluster and the k8s Eureka cluster is changed so that the Eureka instances in k8s and those on the virtual machines form one new cluster. Registration information is then synchronized between them, and no service becomes unreachable regardless of which side it registers with. A configuration sketch follows; the architecture at this point (all services still registered to the original Eureka cluster) is shown in the figure after it.
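
A minimal sketch of the transitional Eureka server configuration (instance and domain names are assumptions; eureka-vm-* stands for the existing instances and eureka-*.k8s.example.com for the temporary domains pointing at the k8s Pods):

```yaml
# Illustrative Eureka server application.yml during the transition: every instance, on the VMs and
# in k8s, lists all peers so registrations replicate across both sides of the merged cluster.
eureka:
  instance:
    hostname: eureka-0.k8s.example.com   # this instance's own (temporary) domain
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      defaultZone: http://eureka-vm-1.example.com:8761/eureka/,http://eureka-vm-2.example.com:8761/eureka/,http://eureka-0.k8s.example.com:8761/eureka/,http://eureka-1.k8s.example.com:8761/eureka/,http://eureka-2.k8s.example.com:8761/eureka/
```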

[Figure: the merged Eureka cluster spanning the virtual machines and k8s]

The next step is to change the configuration of the microservices. Three things change at this point:

  1. The microservices register with Eureka using the container (Pod) IP and port again; domain-name registration is no longer used, because the microservices are now inside k8s and can reach each other directly via Pod IPs;
  2. The Eureka address the services register against is changed to the k8s Eureka Service address. Eureka is deployed as a StatefulSet, so it can be reached directly at eureka-0/1/2.eureka-headless-svc;
  3. After all microservices have been migrated, the zone of the k8s Eureka cluster is also changed to eureka-0/1/2.eureka-headless-svc, and the per-microservice Services and domain names are deleted (see the sketch below).
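
A sketch of the final state (only the eureka-headless-svc naming comes from the steps above; everything else is illustrative). The headless Service gives each Pod of the Eureka StatefulSet a stable DNS name:

```yaml
# Headless Service for the Eureka StatefulSet; clusterIP: None gives each Pod a stable DNS name
# such as eureka-0.eureka-headless-svc.
apiVersion: v1
kind: Service
metadata:
  name: eureka-headless-svc
spec:
  clusterIP: None
  selector:
    app: eureka
  ports:
    - port: 8761
```

Each microservice then registers with its Pod IP and port against those names:

```yaml
# Illustrative microservice application.yml after the migration.
eureka:
  instance:
    prefer-ip-address: true   # back to plain IP-and-port registration
  client:
    service-url:
      defaultZone: http://eureka-0.eureka-headless-svc:8761/eureka/,http://eureka-1.eureka-headless-svc:8761/eureka/,http://eureka-2.eureka-headless-svc:8761/eureka/
```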

The final architecture diagram is as follows:

[Figure: final architecture after the migration]

3. Summary

To guarantee service availability we reluctantly adopted the grayscale approach, which is much more troublesome than the blue-green approach and has many more things to consider. If you are confident the application itself has no problems, the blue-green approach is recommended: it has fewer pitfalls and is faster and more convenient. For large projects, or projects that cannot tolerate interruption, the grayscale approach may still be safer, because switching everything at once can miss areas that needed testing. Either way, containerizing applications and migrating them to Kubernetes is what matters. After all, cloud computing is the future, and Kubernetes is the future of cloud computing.

Original link: https://www.cnblogs.com/dukuan/p/13285941.html
