Implementing full-link grayscale release on Higress with OpenKruise

OpenKruise is a Kubernetes-based extension suite focused on automating cloud-native applications, covering deployment, release, operations, and availability protection. This article describes how to build automated operations with OpenKruise to implement full-link grayscale release.

Grayscale release improves the stability and efficiency of application delivery

When releasing an application, we usually want to send a small amount of specific traffic to the new version first to verify that it works correctly, which protects overall stability. This process is called grayscale release. By gradually expanding the scope of the release, we verify the stability of the new version step by step. If the new version has a problem, we can detect it early, limit its impact, and keep the overall system stable.

Progressive release generally has the following characteristics:

  • It expands the scope of the release gradually instead of releasing everything at once;
  • Each phase can be carefully verified using canary release to confirm the stability of the new version;
  • The release can be paused, rolled back, or continued, and state transitions can be automated, giving flexible control over the release process and ensuring stability.
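To make the pause / continue / rollback controls concrete, here is a minimal state-machine sketch of a phased release. The states, class name, and step percentages are hypothetical illustrations, not Kruise Rollout's API.

```python
from enum import Enum


class State(Enum):
    PAUSED = "paused"            # waiting at a step for verification/approval
    COMPLETED = "completed"      # new version receives all traffic
    ROLLED_BACK = "rolled_back"  # new version receives no traffic


class PhasedRelease:
    """Hypothetical phased release: each step sends a larger share of traffic
    to the new version, and the release pauses at every step until it is
    continued or rolled back."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("at least one step is required")
        self.steps = list(steps)  # traffic percentage per step, e.g. [5, 20, 50]
        self.index = 0
        self.state = State.PAUSED

    @property
    def traffic(self):
        """Percentage of traffic currently routed to the new version."""
        if self.state is State.ROLLED_BACK:
            return 0
        if self.state is State.COMPLETED:
            return 100
        return self.steps[self.index]

    def resume(self):
        """Continue: move to the next step, or complete after the last one."""
        if self.state is not State.PAUSED:
            raise RuntimeError(f"cannot continue from state {self.state.value}")
        if self.index + 1 < len(self.steps):
            self.index += 1  # stays PAUSED so the new step can be verified
        else:
            self.state = State.COMPLETED

    def rollback(self):
        """Roll back: route all traffic to the old version again."""
        if self.state is State.COMPLETED:
            raise RuntimeError("release already completed")
        self.state = State.ROLLED_BACK
```

An automated controller would call resume() itself once health checks pass for a step, which is what "automating state transitions" amounts to in this model.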

According to survey data, 70% of online incidents are caused by changes. The three pillars of safe production are often summarized as grayscale release, observability, and rollback, all of which aim to control the risk and impact of changes. With grayscale release, we can ship new versions more steadily and avoid losses caused by problems during the release.

Microservice architecture places higher demands on grayscale release

In a microservice architecture, the traditional grayscale release model often cannot meet the complex and diverse requirements of microservice delivery, because:

  • Microservice call chains are long and complex. A change in one service may affect the entire call chain and therefore the stability of the whole application.
  • A single grayscale release may involve multiple modules, and the entire chain must call the new versions. Because services depend on one another, changing one service may require corresponding changes in others, so a grayscale release must route calls to new versions of several services at once. This increases the complexity and uncertainty of the release.
  • Multiple projects run in parallel and need multiple environments. In a microservice architecture, several projects are often developed in parallel, each needing its own environment. Building these environments is inflexible and costly, which makes releases inefficient.

To address these problems, we need a more flexible and controllable release method suited to microservices, and full-link grayscale release emerged to meet this need. Typically, each microservice has a grayscale environment or group that receives grayscale traffic. We want traffic that enters an upstream grayscale environment to also enter the downstream grayscale environments, so that a request always stays within the grayscale environment, forming a traffic "swim lane". Within a swim lane, even if some microservices on the call chain have no grayscale environment, their requests to downstream applications are still routed back to the downstream grayscale environments.
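The swim-lane rule can be modeled in a few lines: the lane tag stays attached to the request at every hop, and each hop prefers an instance in that lane but falls back to the baseline when none exists. This is a simplified model under assumed data structures (a registry of (address, lane) pairs), not MSE's implementation; later in this article the tag is carried as x-mse-tag: gray and grayscale pods are labeled alicloud.service.tag: gray.

```python
import random


def pick_instance(service, tag, registry):
    """Pick an instance of `service` for a request carrying lane `tag`.

    registry maps each service to a list of (address, lane) pairs, where lane
    is e.g. "gray" for grayscale instances and None for baseline instances.
    """
    candidates = registry[service]
    if tag is not None:
        in_lane = [addr for addr, lane in candidates if lane == tag]
        if in_lane:
            return random.choice(in_lane)
    # No tag, or this service has no instance in the lane: use the baseline.
    baseline = [addr for addr, lane in candidates if lane is None]
    return random.choice(baseline)


def route_chain(chain, tag, registry):
    """Follow a call chain; the same tag is propagated to every hop, even
    through services that only have baseline instances."""
    return [pick_instance(service, tag, registry) for service in chain]
```

With A and C having gray instances and B only a baseline, a gray request goes A(gray) -> B(baseline) -> C(gray): B has no grayscale copy, yet the call to C still lands in C's grayscale environment because the tag survived the hop through B.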

Full-link grayscale safeguards microservice releases

Depending on the actual situation of the services, this approach can release and control the traffic of a single service independently, or coordinate releases and changes across multiple services at the same time, ensuring the stability of the entire system. Combined with automated deployment, it also makes the release process fast and reliable, improving both release efficiency and stability.

Challenges of implementing full-link grayscale

Implementing full-link grayscale release of microservices on Kubernetes is complicated and requires changing and coordinating multiple components and configurations. The specific steps and issues include:

  • Adjust the gateway configuration. The gateway is the entry point of the services, so its configuration must be adjusted to the grayscale release requirements to implement route matching and traffic tagging (for example, header modification).
  • Deploy and label the grayscale environment. A new set of grayscale applications must be deployed and marked with grayscale labels so that traffic can flow to the grayscale environment.
  • Verify traffic, upgrade the baseline, destroy the grayscale environment, and restore the gateway configuration. During the release, traffic must be verified to confirm that it flows correctly and that services work normally. If verification passes, the baseline environment is upgraded to the grayscale version and the grayscale environment is destroyed. Finally, the gateway configuration is restored so that traffic flows normally.
  • Roll back quickly if an exception occurs. Given the complexity of microservice architectures, many kinds of exceptions can occur, such as service crashes or abnormal traffic. A rollback plan must be designed in advance and executed quickly to avoid greater losses.
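The manual steps listed above can be sketched as a single orchestration function, which shows how many coordinated operations a hand-run full-link release involves. The function name and the operation strings are hypothetical placeholders, not real API calls; only the ordering follows the list above.

```python
def manual_full_link_release(services, verify):
    """Run the manual steps above and return (succeeded, operations).

    services: names of the services changed in this release.
    verify:   callable returning True if grayscale traffic looks healthy.
    The returned operations are descriptive strings, not real API calls.
    """
    ops = ["gateway: add grayscale route and header rules"]
    for svc in services:
        ops.append(f"{svc}: deploy grayscale copy with grayscale label")
    try:
        healthy = bool(verify())
    except Exception:
        healthy = False  # treat a failed check as an unhealthy release
    if healthy:
        for svc in services:
            ops.append(f"{svc}: upgrade baseline to the new version")
    # Success or rollback, the temporary environment and gateway rule go away;
    # on rollback the baseline was never touched.
    for svc in services:
        ops.append(f"{svc}: destroy grayscale copy")
    ops.append("gateway: restore original configuration")
    return healthy, ops
```

In this model, rollback is cheap because the baseline is only upgraded after verification succeeds; the cost lies in keeping every step consistent across services, which is what the automation tools discussed below take over.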

In addition, production traffic is end-to-end, so we need to keep traffic in a closed loop across the front end, the gateway, and the back-end microservices. Not only RPC/HTTP traffic but also asynchronous calls such as MQ messages must follow the full-link swim lane rules. The traffic control involved in this whole process is therefore highly complex.
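For MQ traffic, one common way to keep messages in their swim lane is to copy the lane tag from the incoming request into the message properties and let consumers filter on it. The sketch below illustrates that idea under stated assumptions: reusing x-mse-tag as the property key, and letting baseline consumers take tagged messages when no consumer of that lane is online, are both hypothetical choices, not MSE's documented behavior.

```python
from dataclasses import dataclass, field

TAG_KEY = "x-mse-tag"  # assumption: reuse the HTTP header name as the property key


@dataclass
class Message:
    body: str
    properties: dict = field(default_factory=dict)


def produce(body, request_headers):
    """Copy the lane tag from the incoming request onto the outgoing message."""
    msg = Message(body)
    tag = request_headers.get(TAG_KEY)
    if tag is not None:
        msg.properties[TAG_KEY] = tag
    return msg


def should_consume(msg, consumer_tag, lanes_online):
    """Decide whether a consumer in lane `consumer_tag` (None = baseline)
    handles `msg`; `lanes_online` is the set of lanes with live consumers."""
    msg_tag = msg.properties.get(TAG_KEY)
    if consumer_tag is not None:
        return msg_tag == consumer_tag
    # Baseline takes untagged messages, plus tagged ones whose lane has no consumer.
    return msg_tag is None or msg_tag not in lanes_online
```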

To simplify full-link grayscale release for microservices, we can use automation tools and products such as MSE and Kruise Rollout. They make full-link grayscale release of microservices easier to implement and improve release efficiency and stability.

Kruise Rollout + MSE: end-to-end full-link grayscale release in practice

Why Kruise Rollout?

Kruise Rollout [1] is a progressive delivery framework from the OpenKruise community. It is designed to combine traffic shifting with instance-level grayscale release, support release strategies such as canary, blue-green, and A/B testing, and automate the release process based on custom metrics such as Prometheus metrics. It integrates transparently with standard Kubernetes release components and extends them in a bypass manner. Its main features are:

  • Non-intrusive: it does not modify the user's application delivery configuration; progressive delivery is added in a bypass manner, so it is plug-and-play.
  • Extensible: it supports a variety of workloads (Deployment, StatefulSet, CloneSet, and custom CRD workloads); for traffic scheduling, its Lua-script-based approach supports Nginx, ALB, MSE, Gateway API, and other traffic scheduling schemes.

Kruise Rollout natively supports several grayscale release strategies (canary, A/B testing, blue-green). On closer examination, its release model turns out to fit MSE full-link grayscale very well, so combining the two makes it very convenient for users to implement MSE full-link grayscale release.

Best practices for MSE full-link grayscale release

01. Deploy the applications and configure the full-link grayscale release CRDs

Refer to the MSE cloud-native gateway full-link grayscale document [2] to deploy the demo applications.

➜  ~ kubectl get deployments         
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
demo-mysql       1/1     1            1           30h
nacos-server     1/1     1            1           46h
spring-cloud-a   2/2     2            2           30h
spring-cloud-b   2/2     2            2           30h
spring-cloud-c   2/2     2            2           30h

After deploying the applications, we first need to distinguish online traffic from grayscale traffic.

We define the full-link grayscale release process by creating Rollout resources.

1. Create a Rollout configuration for each application on the link

# a rollout configuration
---
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-a
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-a
  ...
# b rollout configuration
---
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-b
  annotations:
    rollouts.kruise.io/dependency: rollout-a
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-b
  ...
# c rollout configuration
---
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-c
  annotations:
    rollouts.kruise.io/dependency: rollout-a
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-c
  ...

2. Next, to distinguish online traffic from grayscale traffic, configure grayscale traffic rules for the ingress service

canary:
  steps:
    - matches:
        - headers:
            - type: Exact
              name: x-user-id
              value: '100'
      requestHeaderModifier:
        set:
          - name: x-mse-tag
            value: gray
  trafficRoutings:
    - service: spring-cloud-a
      ingress:
        name: spring-cloud-a
        classType: mse
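The canary step above acts as a matching function: a request whose x-user-id header is exactly 100 is grayscale traffic and gets x-mse-tag: gray added before it is forwarded. The sketch below models the step as a Python dict mirroring the YAML, assuming Gateway-API-style semantics (entries under matches are ORed, headers within one entry are ANDed) and modeling only the Exact match type; it illustrates the rule and is not how the MSE gateway evaluates it.

```python
def header_matches(headers, rule):
    """Exact header match; HTTP header names are case-insensitive."""
    if rule.get("type", "Exact") != "Exact":
        raise NotImplementedError("only Exact matching is modeled here")
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(rule["name"].lower()) == rule["value"]


def apply_canary_step(headers, step):
    """Return (is_grayscale, headers_to_forward) for one request."""
    matched = any(
        all(header_matches(headers, rule) for rule in match["headers"])
        for match in step["matches"]
    )
    forwarded = dict(headers)
    if matched:
        for header in step.get("requestHeaderModifier", {}).get("set", []):
            forwarded[header["name"]] = header["value"]
    return matched, forwarded


# The canary step from the YAML above, as a Python dict.
STEP = {
    "matches": [{"headers": [{"type": "Exact", "name": "x-user-id", "value": "100"}]}],
    "requestHeaderModifier": {"set": [{"name": "x-mse-tag", "value": "gray"}]},
}
```

Because matching is exact, x-user-id: 1000 or a missing header leaves the request untagged, so it stays on the baseline path.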

3. For applications being released in grayscale, Kruise automatically labels the grayscale version

# only supported for the canary release type
patchCanaryMetadata:
  labels:
    alicloud.service.tag: gray
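Conceptually, the patch adds labels only to the grayscale copy of the workload's pod template, so baseline pods keep their original labels. The helper below (make_canary_template, a hypothetical name) sketches that merge on plain dicts; it assumes the label is applied to the pod template of the separate canary Deployment that Kruise creates, as observed later in this walkthrough.

```python
import copy


def make_canary_template(stable_template, patch_labels):
    """Return a copy of a pod template with extra labels merged in.

    The stable template is deep-copied, so the baseline workload is unchanged.
    """
    canary = copy.deepcopy(stable_template)
    labels = canary.setdefault("metadata", {}).setdefault("labels", {})
    labels.update(patch_labels)
    return canary
```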

After the Rollout resources are created, we can check their status:

➜  ~ kubectl get Rollout             
NAME        STATUS    CANARY_STEP   CANARY_STATE   MESSAGE                            AGE
rollout-a   Healthy   1             Completed      workload deployment is completed   4s
rollout-b   Healthy   1             Completed      workload deployment is completed   4s
rollout-c   Healthy   1             Completed      workload deployment is completed   4s

We have now defined a set of full-link grayscale release rules. The release link involves the MSE cloud-native gateway and applications A, B, and C, and traffic with x-user-id=100 is grayscale traffic.

Next, let's quickly perform a grayscale release and verification.

02. Grayscale release and traffic verification

1. This release changes applications A and C, so we edit the Deployment YAML of A and C directly

# a application
---        
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a
spec:
  ...
  template:
    ...
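    spec:
      containers:
        - # ... other container fields unchanged
          # Change mse-1.0.0 -> mse-2.0.0 to trigger the release of application A and MSE full-link grayscale
          image: registry.cn-hangzhou.aliyuncs.com/mse-demo-hz/spring-cloud-a:mse-2.0.0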
# c application
---        
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
spec:
  ...
  template:
    ...
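    spec:
      containers:
        - # ... other container fields unchanged
          # Change mse-1.0.0 -> mse-2.0.0 to trigger the release of application C and MSE full-link grayscale
          image: registry.cn-hangzhou.aliyuncs.com/mse-demo-hz/spring-cloud-c:mse-2.0.0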

2. After the change, check the current state of the applications

➜  ~ kubectl get deployments
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
demo-mysql             1/1     1            1           30h
nacos-server           1/1     1            1           46h
spring-cloud-a         2/2     0            2           30h
spring-cloud-a-84gcd   1/1     1            1           86s
spring-cloud-b         2/2     2            2           30h
spring-cloud-c         2/2     0            2           30h
spring-cloud-c-qzh9p   1/1     1            1           113s

We can see that Kruise Rollout did not modify the original Deployments directly; instead, it first created two grayscale Deployments, spring-cloud-a-84gcd and spring-cloud-c-qzh9p.

3. Once the applications have started, verify normal online traffic and grayscale traffic separately.

a. Requests to the gateway that do not match the grayscale rules go to the baseline environment:

➜  ~ curl -H "Host: example.com" http://39.98.205.236/a
A[192.168.42.115][config=base] -> B[192.168.42.118] -> C[192.168.42.101]%

b. Requests that match the grayscale rules go to the grayscale environment:

➜  ~ curl -H "Host: example.com" http://39.98.205.236/a -H "x-user-id: 100"
Agray[192.168.42.119][config=base] -> B[192.168.42.118] -> Cgray[192.168.42.116]% 
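When checking many requests, it helps to parse these responses in a script. The helpers below (parse_chain and gray_apps, hypothetical names) assume the response format of the demo output above, where a hop served by a grayscale instance carries a "gray" suffix after the application name.

```python
import re

# One hop looks like "Agray[192.168.42.119]" or "B[192.168.42.118]".
HOP = re.compile(r"(?P<app>[A-Z])(?P<gray>gray)?\[(?P<ip>[0-9.]+)\]")


def parse_chain(output):
    """Return [(app, is_gray, ip), ...] for a demo response string."""
    hops = []
    # Drop the trailing "%" seen in the shell output (zsh marks a missing final newline).
    for part in output.strip().rstrip("%").split("->"):
        match = HOP.search(part)
        if match is None:
            raise ValueError(f"unrecognized hop: {part!r}")
        hops.append((match["app"], match["gray"] is not None, match["ip"]))
    return hops


def gray_apps(output):
    """Names of the applications that served the request from a gray instance."""
    return {app for app, is_gray, _ in parse_chain(output) if is_gray}
```

For this release, gray requests should yield {"A", "C"} and baseline requests an empty set; B has no grayscale copy, so it always appears as a baseline hop.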

4. How can we roll back quickly if we encounter unexpected behavior?

To roll back application C, simply change its Deployment back to the original configuration.

# c application
---        
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
spec:
  ...
  template:
    ...
    spec:
      containers:
        - # ... other container fields unchanged
          # Change mse-2.0.0 -> mse-1.0.0 to roll back application C
          image: registry.cn-hangzhou.aliyuncs.com/mse-demo-hz/spring-cloud-c:mse-1.0.0

After the change, the grayscale Deployment of application C is gone:

➜  ~ kubectl get deployments
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
demo-mysql             1/1     1            1           30h
nacos-server           1/1     1            1           46h
spring-cloud-a         2/2     0            2           30h
spring-cloud-a-84gcd   1/1     1            1           186s
spring-cloud-b         2/2     2            2           30h
spring-cloud-c         2/2     0            2           30h

5. How do we complete the release? Simply run kubectl-kruise rollout approve rollouts/rollout-a to approve the grayscale release of the application.

03. Integrate ArgoCD for GitOps-based full-link grayscale

GitOps is a continuous delivery approach whose core idea is to store the declarative infrastructure and application definitions of a system in a Git repository. With Git at the core of the delivery pipeline, every developer can submit pull requests and use Git to speed up and simplify Kubernetes application deployment and operations tasks. Using a familiar tool like Git, developers can focus on building new features rather than on operations work (for example, installing, configuring, and migrating application systems).
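At its core, a GitOps controller such as ArgoCD repeatedly compares the desired state stored in Git with the live state in the cluster and reconciles the difference. The function below is a toy model of one such comparison; plan_sync and the (kind, name) keying are assumptions for illustration, and real ArgoCD diffing handles much more (defaulted fields, ignored differences, hooks, sync waves).

```python
def plan_sync(desired, live):
    """Compare manifests keyed by (kind, name) and return the actions a GitOps
    controller would take to make the cluster match Git."""
    actions = []
    for key, manifest in sorted(desired.items()):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != manifest:
            actions.append(("update", key))
    # Resources that exist in the cluster but were removed from Git.
    # Pruning these is optional in ArgoCD.
    for key in sorted(live.keys() - desired.keys()):
        actions.append(("prune", key))
    return actions
```

In the flow below, committing a new image tag produces an "update" for the Deployment; Kruise Rollout then intercepts that update and creates the grayscale copy instead of rolling the baseline immediately.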

Imagine that, as developers, we want a submitted application definition (a Deployment written in YAML) to be released automatically to a grayscale environment first, and to proceed with the full release only after the traffic has been fully verified and the new version confirmed correct. How can this be done? Next, we demonstrate full-link grayscale implemented by integrating ArgoCD.

Prerequisites

Install ArgoCD by following the ArgoCD documentation [3]. ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes.

Configure and deploy application resources through ArgoCD

1. Create the spring-cloud-c application in ArgoCD

2. In the ArgoCD UI, click NEW APP and configure the application as follows.

a. In the GENERAL section, set Application Name to spring-cloud-c and Project to default.

b. In the SOURCE section, set Repository URL to https://github.com/aliyun/alibabacloud-microservice-demo.git, Revision to argocd-samples, and Path to argocd-samples/spring-cloud-c.

c. In the DESTINATION section, set Cluster URL to https://kubernetes.default.svc and Namespace to default.

d. After the configuration is complete, click CREATE at the top of the page.
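The same application can also be defined declaratively instead of through the UI. The sketch below builds an ArgoCD Application manifest with the values entered above and prints it as JSON, which kubectl apply -f also accepts. The helper name is hypothetical, and the metadata namespace argocd is an assumption (it is where ArgoCD's getting-started guide installs it).

```python
import json


def argocd_application(name, repo_url, revision, path,
                       server="https://kubernetes.default.svc",
                       namespace="default", project="default"):
    """Build an ArgoCD Application manifest matching the UI settings above."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: ArgoCD is installed in the "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": project,
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {"server": server, "namespace": namespace},
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        "spring-cloud-c",
        "https://github.com/aliyun/alibabacloud-microservice-demo.git",
        "argocd-samples",
        "argocd-samples/spring-cloud-c",
    )
    print(json.dumps(app, indent=2))
```

The sync policy is not shown in the steps above; for commits to be deployed without clicking SYNC, automated sync would be enabled under spec.syncPolicy.automated.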

 

3. After creation, the status of the spring-cloud-c application is shown in the ArgoCD UI

 

4. Click the application to view the deployment status of its resources

 

5. Modify spring-cloud-c.yaml in argocd-samples/spring-cloud-c and push the change with git

 

6. The grayscale version of the spring-cloud-c application is deployed automatically

✗ kubectl get pods -o wide  | grep spring-cloud
NAME                                   READY   STATUS    RESTARTS   AGE    IP              NODE                      NOMINATED NODE   READINESS GATES
spring-cloud-a-69d577cc9-g7sbc         1/1     Running   0          16h    192.168.0.191   us-west-1.192.168.0.187   <none>           <none>
spring-cloud-b-7bc9db878f-n7pzp        1/1     Running   0          16h    192.168.0.193   us-west-1.192.168.0.189   <none>           <none>
spring-cloud-c-554458c696-2vp74        1/1     Running   0          137m   192.168.0.200   us-west-1.192.168.0.145   <none>           <none>
spring-cloud-c-554458c696-g8vbg        1/1     Running   0          136m   192.168.0.192   us-west-1.192.168.0.188   <none>           <none>
spring-cloud-c-md42b-74858b7c4-qzdxz   1/1     Running   0          53m    192.168.0.165   us-west-1.192.168.0.147   <none>           <none>

[Figure: architecture diagram]

7. Once the application has started, verify normal online traffic and grayscale traffic separately.

a. Requests to the gateway that do not match the grayscale rules go to the baseline environment:

➜  ~ curl -H "Host: example.com" http://39.98.205.236/a
A[192.168.0.191][config=base] -> B[192.168.0.193] -> C[192.168.0.200]%

b. Requests that match the grayscale rules go to the grayscale environment:

➜  ~ curl -H "Host: example.com" http://39.98.205.236/a -H "x-user-id: 100"
A[192.168.0.191][config=base] -> B[192.168.0.193] -> Cgray[192.168.0.165]%

8. To roll back, simply revert the last commit to spring-cloud-c.yaml in argocd-samples/spring-cloud-c with git.

9. To finish the release, run kubectl-kruise rollout approve rollouts/rollout-c to complete the grayscale release of the application.

Outlook and Summary

Kruise Rollout is the OpenKruise community's exploration of progressive delivery. In this collaboration with MSE, it implements grayscale release for microservice scenarios in cloud-native environments. Going forward, Kruise Rollout will continue to improve extensibility, for example through a Lua-script-based extensible traffic scheduling solution compatible with more community gateways and architectures (Istio, APISIX, etc.).

In a microservice governance architecture, full-link grayscale provides traffic swim lanes, which greatly simplifies fast verification during testing and release. Precise traffic routing rules minimize the blast radius, helping DevOps teams improve online stability.

MSE's full-link grayscale capabilities keep expanding and iterating as customer scenarios deepen. Beyond using MSE full-link grayscale to keep releases stable, MSE can also address runtime stability risks arising from traffic, dependencies, and infrastructure. The MSE microservice engine is committed to helping enterprises build always-on applications, and we believe that only products continuously refined by customer scenarios will stand the test of time.

Related Links:

[1] Kruise Rollout

https://openkruise.io/rollouts/introduction

[2] MSE cloud native gateway full link grayscale

https://help.aliyun.com/zh/mse/configure-an-end-to-end-canary-release-based-on-mse-ingress-gateways#p-omu-6xu-8wm

[3] ArgoCD

https://argo-cd.readthedocs.io/en/stable/getting_started/

[4] GitHub: Higress

https://github.com/alibaba/higress

Author: Shimian, Liheng


This article is the original content of Alibaba Cloud, and shall not be reproduced without permission


Origin my.oschina.net/yunqi/blog/10097874