Using Istio to achieve grayscale release (canary release)

Introduction to grayscale release (aka canary release)

Once an application is live, a major challenge for operations is upgrading it without affecting the online business. Anyone who has shipped a product knows that no matter how thorough the automated and manual testing before a release, some failures will still appear afterwards. According to Murphy's Law, a release that can go wrong will go wrong.

“ANYTHING THAT CAN GO WRONG WILL GO WRONG” –MURPHY’S LAW

Therefore, we cannot expect to find every potential fault during offline testing. Since upgrade failures cannot be ruled out completely, we need a controllable release process that keeps the impact of a failure within an acceptable range and allows a quick rollback.

Grayscale release (also known as canary release) can be used to achieve a smooth transition from the old version to the new version, and to avoid the impact of problems during the upgrade process on users.

The term "canary release" comes from the practice of miners using canaries to test the air in mines. In the past, coal miners would send a canary down before entering the mine, or keep one with them while digging. Canaries are sensitive to methane and carbon monoxide, so they are the first to raise the alarm. The "canary" therefore came to mean whoever tests something first.

In the picture below, a small number of users at the lower left act as "canaries" to test the newly launched version 1.1. If the new version has a problem, the "canaries" raise the alarm, but the business of other users is not affected.

 

[Figure: a small group of users is routed to the new version 1.1 as "canaries" while all other users stay on the old version]

 

The process of grayscale release (canary release) is as follows:

  • Prepare a "canary" server isolated from the production environment.

  • Deploy the new version of the service to the "canary" server.

  • Run automated and manual tests against the service on the "canary" server.

  • After the tests pass, connect the "canary" server to the production environment and route a small amount of production traffic to it.

  • If the online test reveals a problem, roll back by rerouting production traffic from the "canary" server to the old version of the service, then fix the problem and release again.

  • If the online test goes well, gradually shift production traffic to the new version according to a chosen strategy.

  • After the new version of the service runs stably, delete the old version of the service.

How Istio implements grayscale release (canary release)

As the process above shows, a grayscale release process needs support from both the application and the operations workflow, and building it takes considerable effort. Although the problems are similar everywhere, each enterprise or organization usually builds its own in-house solution, so R&D and operations teams pay a high cost to solve the same problem again and again.

Istio solves this problem in a consistent way through a high level of abstraction and good design. It uses sidecar proxies to forward application traffic and distributes routing rules through Pilot, so grayscale release can be implemented without modifying the application.

Note: The rolling update feature of Kubernetes can also upgrade an application without interrupting business, but a rolling update works by gradually replacing old-version instances with new-version ones. The traffic distribution of the application cannot be controlled during this process, so production traffic cannot be shifted to the new version gradually and in a controlled manner, and the impact of the upgrade on users cannot be controlled.
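To make this limitation concrete, here is a rough Python sketch. It assumes that requests are spread roughly evenly across all running pods, which is an approximation rather than a guarantee, and the function name is hypothetical.

```python
from fractions import Fraction


def rolling_update_share(new_pods: int, old_pods: int) -> Fraction:
    """Approximate share of traffic that reaches the new version.

    Assumption: requests are spread evenly across all running pods.
    """
    total = new_pods + old_pods
    if total == 0:
        raise ValueError("no pods are running")
    return Fraction(new_pods, total)


# With 4 replicas replaced one at a time, the share can only move in 25% steps.
for new in range(5):
    print(new, 4 - new, float(rolling_update_share(new, 4 - new)))
```

Under this assumption the traffic share is tied to the replica ratio, so sending only 5% of traffic to the new version would require about 20 pods in total.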

With Istio, customized routing rules can route specific traffic (such as users with specified characteristics) to the new version of the service so that it can be tested in the production environment, which limits the impact of any failure on users. In addition, while old and new versions run side by side, each version can be scaled in or out independently according to load, which is very flexible. The process of grayscale release with Istio is shown in the following figure:

 

[Figure: the grayscale release process with Istio]

Steps

The following uses the Bookinfo sample program that comes with Istio to walk through the grayscale release process.

Test environment installation

First, follow the article "Step-by-step guide: building Istio and the Bookinfo sample program from scratch" to install Kubernetes and the Istio control plane.

This test does not start with all three versions of the reviews service installed, so if the application is already installed, uninstall it first with the following command.

istio-0.2.10/samples/bookinfo/kube/cleanup.sh

Deploy the V1 version of the service

First, deploy only the V1 version of the Bookinfo application. Because the yaml file in the example contains three versions of the reviews service, we first delete the V2 and V3 Deployments from the yaml file istio-0.2.10/samples/bookinfo/kube/bookinfo.yaml.

Remove this section from bookinfo.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reviews-v2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: reviews
        version: v2
    spec:
      containers:
      - name: reviews
        image: istio/examples-bookinfo-reviews-v2:0.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reviews-v3
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: reviews
        version: v3
    spec:
      containers:
      - name: reviews
        image: istio/examples-bookinfo-reviews-v3:0.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
---

Deploy the V1 version of the Bookinfo program.

kubectl apply -f <(istioctl kube-inject -f istio-0.2.10/samples/bookinfo/kube/bookinfo.yaml)

Check the pods with kubectl. Only the V1 versions of the services are running.

kubectl get pods

NAME                              READY     STATUS    RESTARTS   AGE
details-v1-3688945616-nhkqk       2/2       Running   0          2m
productpage-v1-2055622944-m3fql   2/2       Running   0          2m
ratings-v1-233971408-0f3s9        2/2       Running   0          2m
reviews-v1-1360980140-0zs9z       2/2       Running   0          2m

Open the application page in a browser at the external IP of istio-ingress. Because the V1 version of the reviews service does not call the ratings service, the product page shows the reviews without star ratings.

http://10.12.25.116/productpage

[Figure: Bookinfo product page showing reviews without star ratings]

At this time, the deployment of microservices in the system is shown in the following figure (the following schematic diagrams ignore the details and ratings services that are not related to this example):

 

[Figure: microservice deployment with only the V1 version of the reviews service]

Deploy the V2 version of the reviews service

Before deploying the V2 version of the reviews service, you need to create a default routing rule route-rule-default-reviews.yaml to direct all production traffic to the V1 version to avoid affecting online users.

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews
  precedence: 1
  route:
  - labels:
      version: v1

Enable this routing rule.

istioctl create -f route-rule-default-reviews.yaml -n default

Create the deployment file bookinfo-reviews-v2.yaml for the V2 version with the following content:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reviews-v2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: reviews
        version: v2
    spec:
      containers:
      - name: reviews
        image: istio/examples-bookinfo-reviews-v2:0.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080

Deploy the V2 version of the reviews service.

kubectl apply -f <(istioctl kube-inject -f bookinfo-reviews-v2.yaml)

At this point, both the V1 and V2 versions of the reviews service are deployed in the system, but all business traffic is directed to V1 by the reviews-default rule, as shown in the following figure:

 

[Figure: reviews V1 and V2 deployed, with all traffic routed to V1]

Route test traffic to the V2 version of the reviews service

A test environment differs from production in its network, servers, operating system and other respects, so it is difficult to simulate production completely. To reduce the influence of these environmental factors on test results, we would like to run pre-launch tests in the production environment itself. Without good isolation, however, such tests may affect the online business and cause losses to the enterprise.

By using Istio's routing rules, testing can be performed in a production-like environment, and the production traffic and test traffic of online users are completely isolated, minimizing the impact of simulated testing on online services. As shown below:

 

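[Figure: test traffic routed to reviews V2, production traffic routed to V1]

Create a rule file route-rule-test-reviews-v2.yaml that routes traffic from the user test-user to V2: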

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-test-user
spec:
  destination:
    name: reviews
  precedence: 2
  match:
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=test-user)(;.*)?$"
  route:
  - labels:
      version: v2

Note: The precedence attribute is used to set the priority of the rule. When there are multiple rules at the same time, the rule with the higher priority will be executed first. The precedence of this rule is set to 2 to ensure that it runs before the default rule, and diverts the test-user user's request to the V2 version of the reviews service.
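The interaction between precedence and the cookie match can be modelled with a small Python sketch. This is only an illustration of the selection logic under simplifying assumptions: the RULES list and pick_version function are hypothetical names, and Python's re module stands in for the regex engine that the proxy actually uses.

```python
import re

# Simplified in-memory model of the two RouteRules shown above.
RULES = [
    {"name": "reviews-default", "precedence": 1,
     "cookie_regex": None, "version": "v1"},
    {"name": "reviews-test-user", "precedence": 2,
     "cookie_regex": r"^(.*?;)?(user=test-user)(;.*)?$", "version": "v2"},
]


def pick_version(cookie: str) -> str:
    """Return the version chosen by the highest-precedence matching rule."""
    for rule in sorted(RULES, key=lambda r: r["precedence"], reverse=True):
        pattern = rule["cookie_regex"]
        # A rule without a match condition applies to every request.
        if pattern is None or re.match(pattern, cookie):
            return rule["version"]
    raise LookupError("no rule matched")


print(pick_version("user=test-user"))  # v2
print(pick_version("user=jason"))      # v1
```

In this model, a cookie such as "user=test-user2" falls through to the default rule, because the regular expression requires the user value to be followed by ";" or the end of the string.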

Enable this rule.

istioctl create -f route-rule-test-reviews-v2.yaml -n default

Log in as test-user, and you can see the V2 review page with star ratings.

 

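[Figure: product page for test-user showing reviews with star ratings]

Log out of the test-user account, and you see only the V1 review page without star ratings, as shown below:

[Figure: product page after logging out, showing reviews without star ratings]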

Route some production traffic to the V2 version of the reviews service

After the online simulation test is complete and the system looks healthy, part of the user traffic can be routed to the V2 version through rules for a small-scale "canary" test.

Modify the rule route-rule-default-reviews.yaml to route 50% of the traffic to the V2 version.

Note: This example only illustrates the principle, so for simplicity 50% of the traffic is routed to V2. In practice, you would more likely start with a small share of traffic and increase it gradually, for example 5%, 10%, 20%, 50%, ..., based on monitoring of the new version.

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews
  precedence: 1
  route:
  - labels:
      version: v1
    weight: 50
  - labels:
      version: v2
    weight: 50

Apply the updated rule.

istioctl replace -f route-rule-default-reviews.yaml -n default
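The effect of the 50/50 weights can be approximated with a short Python simulation. This is a sketch of weighted selection in general, not of how the sidecar implements it. The choose_version function and the routes list are hypothetical, and the random seed is fixed only to make the run repeatable.

```python
import random
from collections import Counter


def choose_version(routes, rng):
    """Pick one destination version in proportion to its weight."""
    versions = [r["version"] for r in routes]
    weights = [r["weight"] for r in routes]
    return rng.choices(versions, weights=weights, k=1)[0]


# Mirrors the weights in the rule above.
routes = [{"version": "v1", "weight": 50}, {"version": "v2", "weight": 50}]
rng = random.Random(42)
counts = Counter(choose_version(routes, rng) for _ in range(10_000))
print(counts)
```

Over many requests the two counts come out close to equal, but any single request can land on either version, so refreshing the product page shows reviews with and without stars in no fixed order.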

At this point, the system deployment is shown in the following figure:

 

[Figure: production traffic split 50/50 between reviews V1 and V2]

Route all production traffic to the V2 version of the reviews service

If the new version of the service is running well, you can route all traffic to the V2 version.

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination: 
    name: reviews
  precedence: 1
  route:
  - labels:
      version: v2
    weight: 100

Apply the updated rule.

istioctl replace -f route-rule-default-reviews.yaml -n default

The system deployment is shown in the following figure:

[Figure: all production traffic routed to reviews V2]

At this point, whichever user you log in as, you see only the V2 review page with star ratings, as shown in the following figure:

[Figure: product page showing reviews with star ratings]

Note: If the new version of the service shows a problem during the grayscale release, you can route the traffic back to the V1 version by modifying the routing rules, then fix the V2 version and test it again.
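The gradual rollout and the rollback described in the note can be sketched together in Python. The code only builds the rule contents as dictionaries and does not call istioctl. The reviews_rule and rollout functions and the is_healthy callback are hypothetical stand-ins for whatever tooling and monitoring you actually use.

```python
def reviews_rule(v2_weight: int) -> dict:
    """Build the reviews-default RouteRule for a given V2 weight (0 to 100)."""
    if not 0 <= v2_weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    routes = []
    if v2_weight < 100:
        routes.append({"labels": {"version": "v1"}, "weight": 100 - v2_weight})
    if v2_weight > 0:
        routes.append({"labels": {"version": "v2"}, "weight": v2_weight})
    return {
        "apiVersion": "config.istio.io/v1alpha2",
        "kind": "RouteRule",
        "metadata": {"name": "reviews-default"},
        "spec": {
            "destination": {"name": "reviews"},
            "precedence": 1,
            "route": routes,
        },
    }


def rollout(steps, is_healthy):
    """Return the rules that would be applied, in order.

    After each step the hypothetical is_healthy(weight) callback is consulted;
    on the first failure a rule sending all traffic back to V1 is appended.
    """
    applied = []
    for weight in steps:
        applied.append(reviews_rule(weight))
        if not is_healthy(weight):
            applied.append(reviews_rule(0))  # roll back to V1 only
            break
    return applied


# Example: the health check starts failing once V2 receives 50% of traffic.
plan = rollout([5, 10, 20, 50, 100], lambda weight: weight < 50)
for rule in plan:
    print(rule["spec"]["route"])
```

In practice each dictionary would be written to route-rule-default-reviews.yaml and applied with the istioctl replace command shown earlier.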

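Delete the V1 version of the reviews service

After the V2 version has gone online and runs stably, delete the V1 version of the reviews service and the test rule. Delete the Deployment rather than an individual pod, because the Deployment would recreate a deleted pod.

kubectl delete deployment reviews-v1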

istioctl delete -f route-rule-test-reviews-v2.yaml -n default

