WASM-based non-intrusive full-link A/B test practice

1 Background

As is well known, a service mesh provides non-intrusive traffic management for the microservices running on it. By configuring VirtualService and DestinationRule, capabilities such as traffic management, timeouts and retries, traffic mirroring, rate limiting, and circuit breaking can be realized without modifying the microservice code.

Traffic management presupposes that a service has multiple versions. We can classify the scenarios by the purpose of deploying multiple versions. A brief description follows, to make the rest of the article easier to understand.

  • Traffic routing: Based on request information (Header/Cookie/Query Params), request traffic is routed to the endpoints (Pod[]) of a specified version (Deployment) of a specified service (Service). This is what we call A/B testing.
  • Traffic shifting: For gray/canary releases, request traffic is routed proportionally, regardless of request content, to the endpoints (Pod[]) of each version (Deployment[]) of the specified service (Service).
  • Traffic switching/mirroring: For blue/green releases, traffic is switched proportionally according to the request information, and traffic is mirrored.

The practice described in this article is to implement full-link A/B testing based on the request header.

1.1 Feature overview

In the Istio community documentation, it is easy to find documents and examples showing how to route traffic to a specific version of a service based on a request header. However, such an example only takes effect on the first service of the entire link.

For example, a request needs to pass through three services, A, B, and C, and each of the three has an en version and a fr version. We want:

  • A request with the header user:en to follow the full-link route A1-B1-C1
  • A request with the header user:fr to follow the full-link route A2-B2-C2

The corresponding VirtualService configuration is as follows (A|B|C is shorthand for one such configuration per service):

http:
- name: A|B|C-route
  match:
  - headers:
      user:
        exact: en
  route:
  - destination:
      host: A|B|C-svc
      subset: v1
- route:
  - destination:
      host: A|B|C-svc
      subset: v2

Actual testing shows that only the routing of service A meets our expectations. B and C are not routed to the specified version based on the header value.

Why is this? For the microservices on the service mesh, this header appears out of thin air; the microservice code is unaware of it. Therefore, when service A calls service B, the header is not propagated; by the time A requests B, the header has already been lost. At that point, a VirtualService configuration that routes by matching the header is meaningless.

To solve this problem from the business side, the microservice owners can only modify the code (enumerate all headers the business cares about and propagate them). But this is an intrusive modification, and it cannot flexibly support new headers.

From the perspective of the service mesh infrastructure, any header is just a key-value pair with no business meaning that needs to be propagated. Only when the service mesh propagates user-defined headers indiscriminately in this way can it support non-intrusive full-link A/B testing.
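
The header loss described above can be reproduced in a small simulation. The following is a minimal sketch, not the article's implementation: the function names and the sidecar_propagates flag are hypothetical, and the routing rule simply mirrors the VirtualService shown earlier (user: en goes to v1, everything else to v2).

```rust
use std::collections::HashMap;

type Headers = HashMap<String, String>;

/// Models the VirtualService above: `user: en` goes to v1, everything else to v2.
fn select_subset(headers: &Headers) -> &'static str {
    match headers.get("user").map(String::as_str) {
        Some("en") => "v1",
        _ => "v2",
    }
}

/// Walks a request through A -> B -> C and records the subset chosen at each hop.
/// `sidecar_propagates` (hypothetical) toggles an outbound filter that copies the
/// `user` header onto the request the application sends to the next service.
fn full_link_route(inbound: &Headers, sidecar_propagates: bool) -> Vec<String> {
    let mut route = Vec::new();
    let mut current = inbound.clone();
    for service in ["A", "B", "C"].iter() {
        route.push(format!("{}-{}", service, select_subset(&current)));
        // The application is unaware of the header, so its outbound request starts empty.
        let mut outbound = Headers::new();
        if sidecar_propagates {
            if let Some(value) = current.get("user") {
                outbound.insert("user".to_string(), value.clone());
            }
        }
        current = outbound;
    }
    route
}

fn main() {
    let mut inbound = Headers::new();
    inbound.insert("user".to_string(), "en".to_string());
    println!("without propagation: {:?}", full_link_route(&inbound, false));
    println!("with propagation:    {:?}", full_link_route(&inbound, true));
}
```

Without propagation, only the first hop sees the header, which matches the behavior observed for B and C; with the outbound copy enabled, every hop selects the same version.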

So how can it be achieved?

1.2 Status of the community

As explained above, when the header cannot be propagated, this function cannot be realized simply by configuring header matching in the VirtualService.

However, is there any other configuration in VirtualService that can propagate the header? If there is, staying with VirtualService would be the cheapest option.

After various attempts (including careful configuration of the header-related set/add operations), I found this impossible. The reason is that VirtualService acts on headers in the inbound phase, whereas propagation needs to act on headers in the outbound phase. The microservice workload cannot propagate a header value that appears out of thin air, so the header is lost when the request is routed to the next service.

Therefore, we can draw a conclusion: VirtualService alone cannot achieve non-intrusive full-link A/B testing. Furthermore, the existing configurations provided by the community cannot be used directly to support this function.

That leaves only the more advanced EnvoyFilter configuration. This is a conclusion we did not want to reach at the outset, for two reasons:

  1. EnvoyFilter configuration is complicated, and it is difficult for ordinary users to learn it quickly and use it in the service mesh. Even if we provide examples, once the requirements change slightly, the examples offer little guidance on how to modify the EnvoyFilter.
  2. Even with EnvoyFilter, Envoy's current built-in filters do not directly support this function; it has to be developed with Lua or WebAssembly (WASM).

1.3 Implementation scheme

Next comes technology selection. In brief:

  • Lua's advantage is that it is compact; its disadvantage is unsatisfactory performance.
  • WASM's advantage is good performance; its disadvantage is that development and distribution are harder than with Lua.
  • The mainstream WASM implementations are in C++ and Rust; implementations in other languages are not yet mature or have performance problems. This article uses Rust.

We develop a WASM filter in Rust. In the outbound phase, it reads the header that the user defined in the EnvoyFilter and writes it onto the request.

The WASM package is distributed through Kubernetes ConfigMap storage, and the Pod obtains and loads the WASM module through definitions in its annotations. (Why this form of distribution is used is discussed later.)

2 Implementation

Related code described in this section:
https://github.com/AliyunContainerService/rust-wasm-4-envoy/tree/master/propagate-headers-filter

2.1 Implementing the WASM filter in Rust

1 Define dependencies

The WASM project has only one core dependency, the proxy-wasm crate, which is the base package for developing WASM filters in Rust. In addition, it uses serde_json for deserialization and log for printing logs. Cargo.toml defines them as follows:

[dependencies]
proxy-wasm = "0.1.3"
serde_json = "1.0.62"
log = "0.4.14"

2 Define the build

The final build artifact of the WASM project is a C-compatible dynamic library, which Cargo.toml defines as follows:

[lib]
name = "propaganda_filter"
path = "src/propagate_headers.rs"
crate-type = ["cdylib"]

3 Header propagation function

First define the structures as follows, where head_tag_name is the user-defined header key and head_tag_value is the corresponding value.

struct PropagandaHeaderFilter {
    // Referenced as self.context_id in the logging loop of on_http_request_headers.
    context_id: u32,
    config: FilterConfig,
}

struct FilterConfig {
    head_tag_name: String,
    head_tag_value: String,
}

The on_http_request_headers method is defined on the HttpContext trait in {proxy-wasm}/src/traits.rs. We implement this method to perform the header propagation.

impl HttpContext for PropagandaHeaderFilter {
    fn on_http_request_headers(&mut self, _: usize) -> Action {
        let head_tag_key = self.config.head_tag_name.as_str();
        info!("::::head_tag_key={}", head_tag_key);
        if !head_tag_key.is_empty() {
            self.set_http_request_header(head_tag_key, Some(self.config.head_tag_value.as_str()));
            self.clear_http_route_cache();
        }
        for (name, value) in &self.get_http_request_headers() {
            info!("::::H[{}] -> {}: {}", self.context_id, name, value);
        }
        Action::Continue
    }
}

Lines 3-6 of the method above read the user-defined header key-value pair from the configuration. If the key is not empty, the set_http_request_header method is called to write the key-value pair into the current request headers.
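
To reason about this logic without running Envoy, it can help to model it on a plain list of headers. The sketch below uses only the standard library; set_header and apply_filter are hypothetical helper names, and the replace-or-insert behavior is this sketch's model of set_http_request_header, not a copy of the proxy-wasm source.

```rust
/// Stand-in for the filter configuration deserialized from the EnvoyFilter JSON.
struct FilterConfig {
    head_tag_name: String,
    head_tag_value: String,
}

/// Replace-or-insert: any existing values for the key are dropped and exactly
/// one value remains. Header names are compared case-insensitively.
fn set_header(headers: &mut Vec<(String, String)>, name: &str, value: &str) {
    headers.retain(|(k, _)| !k.eq_ignore_ascii_case(name));
    headers.push((name.to_string(), value.to_string()));
}

/// Mirrors the guard in on_http_request_headers: do nothing when no key is configured.
fn apply_filter(config: &FilterConfig, headers: &mut Vec<(String, String)>) {
    if !config.head_tag_name.is_empty() {
        set_header(headers, &config.head_tag_name, &config.head_tag_value);
    }
}

fn main() {
    let config = FilterConfig {
        head_tag_name: "custom-version".to_string(),
        head_tag_value: "hello1-v1".to_string(),
    };
    let mut headers = vec![(":path".to_string(), "/hello/eric".to_string())];
    apply_filter(&config, &mut headers);
    for (name, value) in &headers {
        println!("{}: {}", name, value);
    }
}
```

Modeling the write as a replacement means a stale value sent by the caller cannot coexist with the configured one, which is the property the routing match relies on.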

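Line 7 of on_http_request_headers, the call to clear_http_route_cache, is a workaround for the current proxy-wasm implementation: it clears the cached route so that the route is re-evaluated against the modified headers. If you are interested in this, you can read the following reference: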

2.2 Local verification (based on Envoy)

1 Building the WASM module

Use the following commands to build the WASM project. Note that in the author's environment the wasm32-unknown-unknown target was only available on the nightly toolchain, so the build environment is temporarily switched to nightly before building.

rustup override set nightly
cargo build --target=wasm32-unknown-unknown --release

After the build is complete, we use docker compose to start Envoy locally to verify the WASM function.

2 Envoy configuration

In this example, Envoy needs two files at startup: the built propaganda_filter.wasm and the Envoy configuration file envoy-local-wasm.yaml. They are mounted as follows:

volumes:
  - ./config/envoy/envoy-local-wasm.yaml:/etc/envoy-local-wasm.yaml
  - ./target/wasm32-unknown-unknown/release/propaganda_filter.wasm:/etc/propaganda_filter.wasm

Envoy supports dynamic configuration, but local testing uses static configuration:

static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 80
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
...
                http_filters:
                  - name: envoy.filters.http.wasm
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: "header_filter"
                          root_id: "propaganda_filter"
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                "head_tag_name": "custom-version",
                                "head_tag_value": "hello1-v1"
                              }
                          vm_config:
                            runtime: "envoy.wasm.runtime.v8"
                            vm_id: "header_filter_vm"
                            code:
                              local:
                                filename: "/etc/propaganda_filter.wasm"
                            allow_precompiled: true
...

The Envoy configuration has three points of interest (line numbers refer to the full configuration file, which is abridged above):

  • Line 15: under http_filters we define a filter named header_filter whose type is type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
  • Line 32: the local file path is /etc/propaganda_filter.wasm
  • Lines 20-26: the configuration is of type type.googleapis.com/google.protobuf.StringValue, and its value is {"head_tag_name": "custom-version","head_tag_value": "hello1-v1"}. Here the custom header key is custom-version and the value is hello1-v1.

3 Local verification

Execute the following command to start docker compose:

docker-compose up --build

Request local service:

curl -H "version-tag":"v1" "localhost:18000"

At this time, Envoy's log should have the following output:

proxy_1        | [2021-02-25 06:30:09.217][33][info][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1152] wasm log: ::::create_http_context head_tag_name=custom-version,head_tag_value=hello1-v1
proxy_1        | [2021-02-25 06:30:09.217][33][info][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1152] wasm log: ::::head_tag_key=custom-version
...
proxy_1        | [2021-02-25 06:30:09.217][33][info][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1152] wasm log: ::::H[2] -> custom-version: hello1-v1

2.3 How to distribute WASM

WASM distribution refers to the process of storing the WASM package in a distribution repository so that the specified Pods can pull it.

1 ConfigMap + Envoy's local method

Although this method is not the end state of WASM distribution, it is easy to understand and suitable for simple scenarios, so this example uses it for the explanation. Although ConfigMap was not designed for WASM, both ConfigMap and Envoy's local mode are mature, and the combination of the two meets current needs.

Alibaba Cloud Service Mesh (ASM) already provides a similar approach. For details, refer to Writing WASM Filter for Envoy and deploying it to ASM.

To put the WASM package into a ConfigMap, the first consideration is the size of the package. We use wasm-gc to trim the package, as shown below:

ls -hl target/wasm32-unknown-unknown/release/propaganda_filter.wasm
wasm-gc ./target/wasm32-unknown-unknown/release/propaganda_filter.wasm ./target/wasm32-unknown-unknown/release/propaganda-header-filter.wasm
ls -hl target/wasm32-unknown-unknown/release/propaganda-header-filter.wasm

The result is as follows; you can compare the package size before and after trimming:

-rwxr-xr-x  2 han  staff   1.7M Feb 25 15:38 target/wasm32-unknown-unknown/release/propaganda_filter.wasm
-rw-r--r--  1 han  staff   136K Feb 25 15:38 target/wasm32-unknown-unknown/release/propaganda-header-filter.wasm
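
Package size matters here because Kubernetes documents a 1 MiB cap on the data stored in a ConfigMap. The following rough check is a sketch under assumptions: the helper names are hypothetical, the byte counts are approximations of the 1.7M and 136K figures in the listing above, and the base64 overhead models binary content being stored in encoded form.

```rust
/// Documented upper bound for the data stored in a ConfigMap (1 MiB).
const CONFIGMAP_LIMIT_BYTES: u64 = 1024 * 1024;

/// Length of the padded base64 encoding of `raw` bytes (4 output bytes per 3 input bytes).
fn base64_len(raw: u64) -> u64 {
    (raw + 2) / 3 * 4
}

/// Rough check: does a binary of `raw` bytes fit once it is base64-encoded?
fn fits_in_configmap(raw: u64) -> bool {
    base64_len(raw) <= CONFIGMAP_LIMIT_BYTES
}

fn main() {
    // Approximate byte counts for the sizes reported by `ls -hl` above.
    let before: u64 = 1_782_579; // about 1.7M
    let after: u64 = 139_264; // 136K
    println!("untrimmed fits: {}", fits_in_configmap(before));
    println!("trimmed fits:   {}", fits_in_configmap(after));
    println!(
        "size reduction: {:.1}%",
        (1.0 - after as f64 / before as f64) * 100.0
    );
}
```

The check is deliberately conservative: it ignores the other fields of the object, so a package close to the limit should still be treated as too large.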

Create configmap:

wasm_image=target/wasm32-unknown-unknown/release/propaganda-header-filter.wasm
kubectl -n $NS create configmap propaganda-header --from-file=$wasm_image

Patch for the specified Deployment:

patch_annotations=$(cat config/annotations/patch-annotations.yaml)
kubectl -n $NS patch deployment "hello$i-deploy-v$j" -p "$patch_annotations"

patch-annotations.yaml is as follows:

spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/userVolume: '[{"name":"wasmfilters-dir","configMap": {"name":"propaganda-header"}}]'
        sidecar.istio.io/userVolumeMount: '[{"mountPath":"/var/local/lib/wasm-filters","name":"wasmfilters-dir"}]'

2 Envoy's remote method

Envoy supports both local and remote resource definitions. The comparison is as follows:

# local: load the module from a file inside the container
vm_config:
  runtime: "envoy.wasm.runtime.v8"
  vm_id: "header_filter_vm"
  code:
    local:
      filename: "/etc/propaganda_filter.wasm"

# remote: fetch the module over HTTP and check it against a SHA-256 digest
vm_config:
  runtime: "envoy.wasm.runtime.v8"
  code:
    remote:
      http_uri:
        uri: "http://*.*.*.216:8000/propaganda_filter.wasm"
        cluster: web_service
        timeout:
          seconds: 60
      sha256: "da2e22*"

The remote method is closest to native Envoy, so it was originally the first choice for this example. However, actual testing revealed a problem with the hash verification of the package. For details, please refer to the reference below. In addition, Lizan Zhou, an expert in the Envoy community, told me that remote is not the future direction for WASM distribution in Envoy. Therefore, this example finally abandoned this approach.

3 ORAS + Local method

ORAS is the reference implementation of the OCI Artifacts project, which can significantly simplify the storage of any content in the OCI registry.

Use the ORAS client or API/SDK to push the Wasm module, with an allowed media type, to the registry (an OCI-compatible registry), and then deploy the Wasm filter through a controller to the Pods of the specified workload, mounting it in local mode.

Alibaba Cloud Service Mesh (ASM) supports WebAssembly (WASM). Service mesh users can use ASM to deploy extended WASM filters to the corresponding Envoy proxies in the data plane cluster. The ASMFilterDeployment controller component supports dynamic plug-in loading, is easy to use, and supports hot updates. Specifically, ASM provides a new CRD, ASMFilterDeployment, and a related controller component. The controller watches ASMFilterDeployment resource objects and does two things:

  • Creates an Istio EnvoyFilter custom resource for the control plane and pushes it to the corresponding ASM control plane istiod
  • Pulls the corresponding Wasm filter image from the OCI registry and mounts it into the corresponding workload Pod

For details, please refer to: Simplify and expand service grid functions based on Wasm and ORAS.

Follow-up practice sharing will use this method to distribute WASM, so stay tuned.

Others in the industry are advancing this approach as well. In particular, Solo.io provides wasme, a complete WASM development framework with which you can develop, build, and distribute WASM packages (OCI images) and deploy them to the WebAssembly Hub. The advantage of this solution is obvious: it covers the full life cycle of WASM from development to production. Its shortcoming is just as obvious: the self-contained wasme is difficult to split up and extend beyond the Solo ecosystem.

The Alibaba Cloud Service Mesh (ASM) team is in communication with relevant industry teams, including Solo, on how to jointly promote an OCI specification for Wasm filters and the corresponding life cycle management, to help customers easily extend Envoy's functionality and take its application in the service mesh to new heights.

2.4 Cluster verification (based on Istio)

1 Experimental example

Once the WASM package has been distributed to a Kubernetes ConfigMap, we can verify it in a cluster. The example (source code) contains three Services, hello1, hello2, and hello3, and each service has two versions: v1/en and v2/fr.

Each Service is configured with a VirtualService and a DestinationRule that match the header and route to the specified version.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello2-vs
spec:
  hosts:
    - hello2-svc
  http:
  - name: hello2-v2-route
    match:
    - headers:
        route-v:
          exact: hello2v2
    route:
    - destination:
        host: hello2-svc
        subset: hello2v2
  - route:
    - destination:
        host: hello2-svc
        subset: hello2v1
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hello2-dr
spec:
  host: hello2-svc
  subsets:
    - name: hello2v1
      labels:
        version: v1
    - name: hello2v2
      labels:
        version: v2

The EnvoyFilter is as follows:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: hello1v2-propaganda-filter
spec:
  workloadSelector:
    labels:
      app: hello1-deploy-v2
      version: v2
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_OUTBOUND
        proxy:
          proxyVersion: "^1\\.8\\.*"
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.wasm
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                name: propaganda_filter
                root_id: propaganda_filter_root
                configuration:
                  '@type': type.googleapis.com/google.protobuf.StringValue
                  value: |
                    {
                      "head_tag_name": "route-v",
                      "head_tag_value": "hello2v2"
                    }
                vm_config:
                  runtime: envoy.wasm.runtime.v8
                  vm_id: propaganda_filter_vm
                  code:
                    local:
                      filename: /var/local/lib/wasm-filters/propaganda-header-filter.wasm
                  allow_precompiled: true
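
The interplay between the per-workload EnvoyFilter and the per-service VirtualService can be sketched as a small lookup. This is an illustrative model only: the function names are hypothetical, the hello1v2 entry comes from the EnvoyFilter above, and the hello2v2 entry and the hello3 routing rule are assumptions made by analogy (the Lua example in section 3.1 writes route-v: hello3v2, and hello3 is assumed to have a VirtualService shaped like hello2-vs).

```rust
/// Value of `route-v` that the outbound filter of a workload writes,
/// or None when no filter is deployed for that workload.
fn injected_route_v(workload: &str) -> Option<&'static str> {
    match workload {
        "hello1v2" => Some("hello2v2"), // from the EnvoyFilter above
        "hello2v2" => Some("hello3v2"), // assumed by analogy for the next hop
        _ => None,
    }
}

/// Models a VirtualService shaped like hello2-vs:
/// an exact match on `<service>v2` selects v2, anything else falls back to v1.
fn pick_subset(service: &str, route_v: Option<&str>) -> String {
    let v2 = format!("{}v2", service);
    let matched = route_v == Some(v2.as_str());
    if matched {
        v2
    } else {
        format!("{}v1", service)
    }
}

/// Follows the call chain hello1 -> hello2 -> hello3 starting from a given hello1 workload.
fn chain_from(first_workload: &str) -> Vec<String> {
    let mut path = vec![first_workload.to_string()];
    for service in ["hello2", "hello3"].iter() {
        let header = injected_route_v(path.last().unwrap());
        path.push(pick_subset(service, header));
    }
    path
}

fn main() {
    println!("{:?}", chain_from("hello1v2"));
    println!("{:?}", chain_from("hello1v1"));
}
```

In this model each v2 workload stamps the value expected by the next service, so a request that lands on hello1 v2 stays on v2 for the rest of the chain, while workloads without a filter fall through to the default route.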

2 Verification method

A request carrying the header, of the form curl -H "version:v1" "http://$ingressGatewayIp:8001/hello/xxx", enters through the istio-ingressgateway, and the entire link is routed to the specified version of each service according to the header value. In the verification script below, the header is route-v with the value v2, so the expected full link is hello1 v2 - hello2 v2 - hello3 v2.

The verification process and results are as follows.

# Start from an empty result file so repeated runs do not accumulate matches.
: > result
for i in {1..5}; do
    curl -s -H "route-v:v2" "http://$ingressGatewayIp:$PORT/hello/eric" >>result
    echo >>result
done
check=$(grep -o "Bonjour eric" result | wc -l)
if [[ "$check" -eq "15" ]]; then
    echo "pass"
else
    echo "fail"
    exit 1
fi

result:

Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182
Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182
Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182
Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182
Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182

We can see that the output Bonjour eric comes from the fr version of each service, which shows that the functional verification passed.
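
The pass condition in the script (5 requests times 3 hops gives 15 greetings) can also be expressed as a small function. The sketch below is an illustrative restatement of the grep and wc check using only the standard library; the function names are hypothetical.

```rust
/// Counts non-overlapping occurrences of `needle`, like `grep -o ... | wc -l`.
fn count_occurrences(haystack: &str, needle: &str) -> usize {
    haystack.matches(needle).count()
}

/// Passes when every hop of every request answered with the fr greeting.
fn verify(result: &str, requests: usize, hops: usize) -> bool {
    count_occurrences(result, "Bonjour eric") == requests * hops
}

fn main() {
    // One response line as shown in the result above, repeated for 5 requests.
    let line = "Bonjour eric@hello1:172.17.68.205<Bonjour eric@hello2:172.17.68.206<Bonjour eric@hello3:172.17.68.182";
    let result = vec![line; 5].join("\n");
    println!("{}", if verify(&result, 5, 3) { "pass" } else { "fail" });
}
```

Counting matches rather than lines is what makes the check sensitive to a single hop answering from the wrong version.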

3 Performance analysis

With EnvoyFilter+WASM added, the functional verification passes, but how much latency does it add? This is a question that both providers and users of service meshes care about a great deal. This section examines the following two concerns.

  • The incremental latency overhead after adding EnvoyFilter+WASM
  • The overhead of the WASM version compared with the Lua version

3.1 Lua implementation

The implementation of Lua can be written directly to EnvoyFilter without a separate project. Examples are as follows:

patch:
  operation: INSERT_BEFORE
  value:
    name: envoy.lua
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
      inlineCode: |
        function envoy_on_request(handle)
          handle:logInfo("[propagate header] route-v:hello3v2")
          handle:headers():add("route-v", "hello3v2")
        end

3.2 Load test method

1 Deployment

  • Deploy the same Deployment/Service/VirtualService/DestinationRule in each of the 3 namespaces
  • In hello-abtest-lua, deploy the Lua-based EnvoyFilter
  • In hello-abtest-wasm, deploy the WASM-based EnvoyFilter

hello-abtest        baseline environment
hello-abtest-lua    environment with EnvoyFilter+Lua added
hello-abtest-wasm   environment with EnvoyFilter+WASM added

2 Tools

This example uses hey as the load testing tool. hey was formerly named boom and is intended as a replacement for ab (Apache Bench). The three environments are tested separately with the same load test parameters, as follows:

# number of concurrent workers
export NUM=2000
# requests per second (hey applies the -q rate limit per worker)
export QPS=2000
# load test duration
export Duration=10s

hey -c $NUM -q $QPS -z $Duration -H "route-v:v2" http://$ingressGatewayIp:$PORT/hello/eric > $SIDECAR_WASM_RESULT

Check the hey result file carefully: it must not end with socket: too many open files errors, otherwise the results are skewed. If it does, raise the limit with the ulimit -n $MAX_OPENFILE_NUM command and then adjust the load test parameters to keep the results accurate.

3.3 Report

We select 4 key indicators from the three result reports, as shown in the figure below:

3.4 Conclusion

  1. Compared with the baseline version, both EnvoyFilter versions add tens to hundreds of milliseconds of average latency. The relative increase in latency is:
  • WASM: 1.2% = (0.6395-0.6317)/0.6317 and 10% = (1.3290-1.2078)/1.2078
  • Lua: 11% = (0.7012-0.6317)/0.6317 and 20% = (1.4593-1.2078)/1.2078
  2. The performance of the WASM version is significantly better than that of the Lua version.

Note: Unlike the Lua version, the WASM implementation is a single code base driven by multiple configurations. Therefore, the WASM execution path includes one more step than Lua: obtaining the configuration variables.
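
The ratios quoted in the conclusion are plain relative differences and can be recomputed from the figures in the text. The sketch below does only that; the function name is hypothetical, and because the report figure is not reproduced here, the two measurements per version are labeled simply as the first and second metric.

```rust
/// Relative overhead of `measured` over `baseline`, in percent.
fn overhead_percent(baseline: f64, measured: f64) -> f64 {
    (measured - baseline) / baseline * 100.0
}

fn main() {
    // (label, baseline, measured) taken from the ratios quoted in the conclusion.
    let cases = [
        ("wasm, first metric", 0.6317, 0.6395),
        ("wasm, second metric", 1.2078, 1.3290),
        ("lua, first metric", 0.6317, 0.7012),
        ("lua, second metric", 1.2078, 1.4593),
    ];
    for (label, baseline, measured) in cases.iter() {
        println!("{}: {:.2}%", label, overhead_percent(*baseline, *measured));
    }
}
```

Evaluated this way, the four ratios come out at roughly 1.2%, 10.0%, 11.0%, and 20.8%.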

4 Outlook

4.1 How to use

From the perspective of technical implementation, this article describes how to implement and verify a WASM filter that propagates a user-defined header, so as to support non-intrusive full-link A/B testing.

However, for a service mesh user, implementing it step by step as described in this article is cumbersome and error-prone.

The Alibaba Cloud Service Mesh (ASM) team is launching an ASM plug-in catalog mechanism. Users only need to select a plug-in in the catalog and provide a very small number of key-value settings, such as the custom header, and the plug-in automatically generates and deploys the related EnvoyFilter+WASM+VirtualService+DestinationRule configuration.

4.2 How to expand

This example only shows the matching routing function based on Header. If we want to match and route based on Query Params, how can we expand it?

This is a topic that the ASM plug-in catalog is paying close attention to, and the final plug-in catalog will provide best practices.

That is all.

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/weixin_43970890/article/details/114918764