Introduction to Istio Ambient Mesh

1 Introduction to Ambient Mesh

Istio's traditional model deploys the Envoy proxy as a sidecar in each workload pod. While sidecars are a significant improvement over refactoring applications, they still have some limitations:

  • Intrusive: The sidecar must be "injected" into the application by modifying the Kubernetes pod's configuration and redirecting traffic. Installing or upgrading a sidecar therefore requires restarting the pod, which impacts the workload.
  • Low resource utilization: Since a sidecar proxy is injected into every workload pod, each pod must reserve enough CPU and memory for its sidecar, leading to underutilization of resources across the cluster.
  • Traffic disruption: Traffic capture and HTTP processing are typically done by Istio's sidecar; these operations are computationally expensive and may break applications whose traffic does not conform to HTTP.

The Istio ambient mesh is a sidecar-free data plane for Istio designed to reduce infrastructure costs and improve performance. It works by separating the L4 and L7 functions of the sidecar proxy (Envoy), so that users who only need security features can adopt the Istio service mesh with minimal friction (low resource consumption and low operational cost).

The ambient mesh splits Istio's functionality into two distinct layers:

  • L4 Security Overlay: Users can use features like TCP routing, mTLS, and limited observability.
  • L7 processing layer: Users can enable L7 features on demand to get the full functionality of Istio, such as rate limiting, fault injection, load balancing, circuit breaking, and more.

ztunnel is ambient mesh's shared per-node proxy, deployed as a DaemonSet; it sits at the bottom layer of the mesh, similar to a CNI. ztunnel builds zero-trust tunnels (hence the name) between nodes and is responsible for securely connecting and authenticating elements within the mesh. All traffic for workloads in the ambient mesh is redirected to the local ztunnel for processing, which identifies the traffic's workload and selects the correct certificate for it to establish an mTLS connection.

ztunnel implements a core service-mesh capability: zero trust. It creates a secure overlay for workloads in ambient-enabled namespaces, providing mTLS, telemetry, authentication, and L4 authorization without terminating or parsing HTTP. After the ambient mesh is enabled and the secure overlay is created, a namespace can optionally enable L7 processing, which unlocks the full suite of Istio functionality, including Virtual Services, L7 telemetry, and L7 authorization policies. The waypoint proxy scales up and down automatically based on the real-time traffic of the namespace it serves, which saves users considerable resources.

Istio creates a corresponding waypoint proxy per service account, which helps reduce resource consumption and keeps the fault domain as small as possible.

2 Ambient Mesh Supported Environments and Limitations

Currently, ambient mesh is only known to support the following environments; other environments have not been tested yet.

  • GKE (without Calico or Dataplane V2)
  • EKS
  • kind

Ambient mesh also has a number of limitations, such as:

  • AuthorizationPolicy is not enforced as strictly as expected in some cases, or not enforced at all.
  • In some cases, requests made directly to a Pod IP instead of the Service do not work.
  • Services in the ambient mesh cannot be accessed through LoadBalancer or NodePort; however, you can deploy an ingress gateway (with ambient mesh disabled) to access services from outside.
  • STRICT mTLS cannot completely block cleartext traffic.
  • EnvoyFilter is not supported.

See Ambient Mesh[1] for details.

3 Using Eksctl to create a Kubernetes cluster on AWS

In this example, an EKS cluster will be created on AWS using eksctl to test the Istio ambient mesh. eksctl[2] is a CLI tool for managing EKS (Amazon's managed Kubernetes service). See eksctl Getting started[3] for installing and using eksctl.

Create the cluster configuration file cluster.yaml. We will create an EKS cluster with two worker nodes, each with 2 vCPUs and 8 GiB of memory (m5.large), running cluster version 1.23.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: aws-demo-cluster01
  region: us-east-1
  version: '1.23'

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 100
    ssh:
      allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key

Execute the following command to create an EKS cluster.

eksctl create cluster -f cluster.yaml

After the creation is complete, view the EKS cluster.

> eksctl get cluster
NAME			REGION		EKSCTL CREATED
aws-demo-cluster01	us-east-1	True

Execute the following command to merge the kubeconfig of the aws-demo-cluster01 cluster into the ~/.kube/config file, so that our local kubectl tool can access the cluster.

aws eks update-kubeconfig --region us-east-1 --name aws-demo-cluster01

See Installing or updating the latest version of the AWS CLI[4] for installing the aws CLI tools, and Configuration basics[5] for configuring AWS CLI credentials.

4 Download Istio

Download the istioctl binary and the sample resource files that support ambient mesh for your operating system; see Istio download[6]. The istioctl binary is in the bin directory, and the sample resource files are in the samples directory.

5 Deploy the sample application

Deploy the Bookinfo application from the Istio samples, along with two clients, sleep and notsleep, which can run curl commands to issue HTTP requests.

kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml


6 Deploying Istio

Execute the following command to install Istio, specifying the profile=ambient parameter to deploy the ambient-mesh-related components.

istioctl install --set profile=ambient

If the installation is successful, the following results will be output.

✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ CNI installed
✔ Installation complete

After the installation is complete, we can see the following components in the istio-system namespace:

  • istiod : The core component of Istio.
  • istio-ingressgateway : Manages north-south traffic in and out of the cluster, we will not use istio-ingressgateway in this example.
  • istio-cni : Configure traffic redirection for Pods that join the ambient mesh, and redirect the incoming and outgoing traffic of Pods to the ztunnel of the same node.
  • ztunnel : ztunnel builds zero-trust tunnels between nodes, providing mTLS, telemetry, authentication, and L4 authorization.

> kubectl get pod -n istio-system
NAME                                   READY   STATUS    RESTARTS   AGE
istio-cni-node-gfmqp                   1/1     Running   0          100s
istio-cni-node-t2flv                   1/1     Running   0          100s
istio-ingressgateway-f6d95c86b-mfk4t   1/1     Running   0          101s
istiod-6c99d96db7-4ckbm                1/1     Running   0          2m23s
ztunnel-fnjg2                          1/1     Running   0          2m24s
ztunnel-k4jhb                          1/1     Running   0          2m24s                       

7 Packet capture settings

To observe traffic more directly, we can capture packets on the Pods, but the application Pods do not have packet-capture tools installed. Instead, we can use the kubectl debug tool to create an ephemeral container that shares the target container's namespaces and debug from there. See Debugging a Running Pod[7] for details on kubectl debug.

Execute the following commands in four terminals to capture packets on sleep, productpage, and the ztunnel Pods on the two nodes. The --image parameter specifies the image for the ephemeral container; the nicolaka/netshoot image comes pre-installed with common packet-capture tools such as tcpdump, tshark, and termshark.

kubectl debug -it sleep-55697f8897-n2ldz  --image=nicolaka/netshoot 
kubectl debug -it productpage-v1-5586c4d4ff-z8jbb --image=nicolaka/netshoot
kubectl debug -it -n istio-system ztunnel-fnjg2 --image=nicolaka/netshoot 
kubectl debug -it -n istio-system ztunnel-k4jhb --image=nicolaka/netshoot

Run termshark -i eth0 to capture packets on the Pod's eth0 interface. Since istio-cni continuously probes ztunnel's HTTP health endpoint at the /healthz/ready path, set the following filter conditions in the termshark Filter box on the two ztunnel Pods to keep this traffic from cluttering our observation.

# ztunnel-fnjg2, the ztunnel on the node where sleep runs
ip.addr==192.168.58.148 || ip.addr==192.168.13.108

# ztunnel-k4jhb, the ztunnel on the node where productpage runs
ip.addr==192.168.13.108

8 Not using Ambient Mesh to manage traffic

Since the default namespace has not yet been added to the ambient mesh, application traffic does not pass through ztunnel at this point. Communication between Pods goes through the Kubernetes Service mechanism, and traffic between Pods is transmitted in plaintext rather than encrypted with mTLS.

Make a request to productpage using sleep.

kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1

# Returned result: the first line of the response
<!DOCTYPE html>

Looking at the packet captures on sleep and productpage, you can see that sleep (192.168.58.148) resolves the productpage Service name via DNS to the Service IP (10.100.171.143); after being forwarded by the Kubernetes Service, the request reaches productpage's actual Pod IP (192.168.13.108).

At this point the ambient mesh has not taken over the default namespace's traffic, so no relevant packets are captured on the ztunnels.

9 Add Default Namespace to Ambient Mesh (L4 function)

Adding the istio.io/dataplane-mode=ambient label to a namespace adds it to the ambient mesh.

kubectl label namespace default istio.io/dataplane-mode=ambient

Once a namespace is added to the ambient mesh, the istio-cni DaemonSet sets up iptables redirection rules for the Pods in that namespace, redirecting all inbound and outbound Pod traffic to the ztunnel running on the same node.
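Conceptually, this redirection resembles a TPROXY-style rule that intercepts pod TCP traffic and hands it to a node-local listener. The following is illustrative pseudocode only, not the actual ruleset istio-cni programs; the port and mark values are hypothetical:

```
# Pseudocode sketch -- NOT the real istio-cni rules (hypothetical port/mark).
# Intercept TCP to/from mesh pods and deliver it to the local ztunnel listener:
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --tproxy-mark 0x1/0x1 --on-port 15001
```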

9.1 mTLS traffic encryption

ztunnel creates a security overlay for workloads in an ambient mesh-enabled namespace, providing features such as mTLS, telemetry, authentication, and L4 authorization.

For the convenience of viewing, you can clear the previously captured packets on sleep and productpage.

Then use sleep to initiate a request to productpage.

kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1

Plaintext packets can still be captured on sleep and productpage, but this time the source IP of the packets captured on productpage is the IP address of the ztunnel on the node where sleep runs.

On the ztunnel of sleep's node, we can capture the plaintext packets sent by sleep; that ztunnel encrypts them and sends them to the ztunnel on productpage's node, which receives the encrypted packets, decrypts them, and forwards them to productpage.

We can also see access records in the logs of the ztunnels on the nodes where sleep and productpage run. Check the outbound traffic log (sleep -> ztunnel on sleep's node).

kubectl logs -n istio-system ztunnel-fnjg2 -f

We can see the words (no waypoint proxy) in the outbound traffic log. By default, ambient mesh performs only L4 processing, not L7, so at this point traffic passes only through ztunnel and not through a waypoint proxy.

Check the inbound traffic log (ztunnel -> productpage on productpage's node).

kubectl logs -n istio-system ztunnel-k4jhb -f

9.2 L4 authorization policy

The secure overlay can enforce simple L4 authorization policies. Create the AuthorizationPolicy shown below to allow only workloads whose service account is sleep to access applications labeled app=productpage.

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: productpage-viewer
 namespace: default
spec:
 selector:
   matchLabels:
     app: productpage
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
EOF
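The principals entry is the workload's SPIFFE identity, assembled from the trust domain, namespace, and service account. A minimal sketch of how the string is built (assuming Istio's default cluster.local trust domain):

```shell
# Build the SPIFFE-style principal Istio uses for the sleep workload.
# Format: <trust-domain>/ns/<namespace>/sa/<service-account>
trust_domain="cluster.local"   # Istio's default trust domain
namespace="default"
service_account="sleep"
principal="${trust_domain}/ns/${namespace}/sa/${service_account}"
echo "$principal"   # cluster.local/ns/default/sa/sleep
```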

Execute the following requests on sleep and notsleep respectively. Since L7 processing is not yet enabled, conditions such as HTTP request method and path cannot be restricted yet.

# Succeeds
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
# Succeeds
kubectl exec deploy/sleep -- curl -XDELETE -s http://productpage:9080/ | head -n1
# Fails: only the sleep service account is allowed
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1

10 Enable L7 function

To enable L7 mesh capability for a service, you need to explicitly create a Gateway. Note that the gatewayClassName in the Gateway resource must be set to istio-mesh so that Istio creates the corresponding waypoint proxy for productpage. Any traffic destined for the productpage service will then go through the L7 proxy, the waypoint proxy.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
 name: productpage
 annotations:
   istio.io/service-account: bookinfo-productpage
spec:
 gatewayClassName: istio-mesh
EOF

Check out the waypoint proxy created by Istio for productpage.

Visit the productpage from sleep.

kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1

Check outbound traffic logs (sleep -> ztunnel on sleep node -> waypoint proxy).

kubectl logs -n istio-system ztunnel-fnjg2 -f

From the log, you can see the words (to server waypoint proxy), indicating that the request is sent to the waypoint proxy for processing.

Check the traffic log in the inbound direction (ztunnel -> productpage on the productpage).

kubectl logs -n istio-system ztunnel-k4jhb -f

10.1 L7 authorization policy

Next, update the AuthorizationPolicy to allow only users whose Service Account is sleep to access the application with the label app=productpage through GET.

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: productpage-viewer
 namespace: default
spec:
 selector:
   matchLabels:
     app: productpage
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/default/sa/sleep"]
   to:
   - operation:
       methods: ["GET"]
EOF
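The intent of this policy can be sketched as a tiny decision function. This is illustrative logic only, not Istio's implementation: ALLOW only when the client principal is sleep's identity and the HTTP method is GET.

```shell
# Illustrative sketch of the L7 decision this policy encodes (not Istio's code).
decide() {
  # $1 = client principal, $2 = HTTP method
  if [ "$1" = "cluster.local/ns/default/sa/sleep" ] && [ "$2" = "GET" ]; then
    echo ALLOW
  else
    echo DENY
  fi
}
decide "cluster.local/ns/default/sa/sleep" GET      # ALLOW
decide "cluster.local/ns/default/sa/sleep" DELETE   # DENY
decide "cluster.local/ns/default/sa/notsleep" GET   # DENY
```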

Execute the following requests on sleep and notsleep respectively; this time the HTTP DELETE request from sleep is also rejected.

# Succeeds
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
# Fails with an RBAC error because the request is not a GET
kubectl exec deploy/sleep -- curl -X DELETE -s http://productpage:9080/ | head -n1
# Fails with an RBAC error: only the sleep service account is allowed
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/  | head -n1

10.2 Observability

L7 metrics for all productpage service requests can be viewed on the productpage waypoint proxy.

kubectl exec deploy/bookinfo-productpage-waypoint-proxy -- curl -s http://localhost:15020/stats/prometheus | grep istio_requests_total
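In Prometheus text format, each line ends with the sample value, so the counter can be pulled out with awk. A self-contained sketch over a hypothetical sample line (real istio_requests_total output carries many more labels):

```shell
# Extract the value from a Prometheus text-format sample line.
# sample_line is hypothetical; actual lines have many more labels in the braces.
sample_line='istio_requests_total{reporter="destination",response_code="200"} 42'
value=$(printf '%s\n' "$sample_line" | awk '{print $NF}')
echo "$value"   # 42
```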

10.3 Flow Control

First create a gateway for the reviews service and enable L7 capabilities.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
 name: reviews
 annotations:
   istio.io/service-account: bookinfo-reviews
spec:
 gatewayClassName: istio-mesh
EOF

Then create a VirtualService and a DestinationRule to route traffic to the v1 and v2 subsets of the reviews service in a 90/10 ratio.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
EOF

Execute the following command to send 10 requests from sleep to productpage; you can see that about 10% of the traffic goes to reviews-v2.

# Note that the access path is http://productpage:9080/productpage, which calls the reviews service
kubectl exec -it deploy/sleep -- sh -c 'for i in $(seq 1 10); do curl -s http://productpage:9080/productpage | grep reviews-v.-; done'

# Returned results
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v2-6bdd859457-7lxhc</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v1-7598cc9867-dh7hp</u>
 <u>reviews-v2-6bdd859457-7lxhc</u>
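Rather than eyeballing the list, the split can be tallied with sort | uniq -c. A self-contained sketch using sample pod names standing in for the loop output above:

```shell
# Tally which reviews version served each request (sample data, hypothetical pod names).
sample_output='reviews-v1-7598cc9867-dh7hp
reviews-v2-6bdd859457-7lxhc
reviews-v1-7598cc9867-dh7hp'
counts=$(printf '%s\n' "$sample_output" | grep -o 'reviews-v[0-9]' | sort | uniq -c)
echo "$counts"   # 2 reviews-v1 and 1 reviews-v2 (uniq -c left-pads the counts)
```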

10.4 Fault Injection

Create a VirtualService for the productpage service and inject a 5s delay into the request.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: productpage
spec:
  hosts:
    - productpage
  http:
  - route:
    - destination:
        host: productpage
    fault:
      delay:
        percentage:
          value: 100.0
        fixedDelay: 5s
EOF

Visit the productpage from sleep, and you can see that the request takes about 5s.

> kubectl exec deploy/sleep -- time curl -s http://productpage:9080 | head -n 1

# Returned result
<!DOCTYPE html>
real	0m 5.04s
user	0m 0.00s
sys	    0m 0.00s

11 Clean up the environment

# Uninstall Istio
istioctl uninstall -y --purge && kubectl delete ns istio-system
# Delete the sample application
kubectl delete -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl delete -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
kubectl delete -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml
# Delete the cluster
eksctl delete cluster --name aws-demo-cluster01 --region us-east-1

12 Experience Demo

To quickly try out ambient mesh, you can also follow the tutorial on the solo.io official website[8].

13 References



Origin my.oschina.net/u/4923278/blog/5577986