1 Introduction to Ambient Mesh
Istio's traditional model is to deploy the Envoy proxy as a sidecar in the workload's pods, and while sidecars have significant advantages over refactoring applications, there are still some limitations:
- Intrusive: The sidecar must be injected into the application by modifying the Kubernetes Pod configuration and redirecting traffic. Installing or upgrading sidecars therefore requires restarting Pods, which affects the workload.
- Low resource utilization: Because a sidecar proxy is injected into every workload Pod, each Pod must reserve enough CPU and memory for its sidecar, which leaves resources across the cluster underused.
- Traffic disruption: Traffic capture and HTTP processing are usually done by Istio's sidecars. This processing is resource intensive and may break applications whose HTTP implementations are not fully compliant.
The Istio ambient mesh is a sidecar-free data plane for Istio designed to reduce infrastructure costs and improve performance. Its essence is to separate the L4 and L7 functions of the sidecar proxy (Envoy), so that users who only need security functions can adopt the Istio service mesh with minimal resistance (low resource consumption and low operation and maintenance costs).
The ambient mesh splits the functionality of Istio into 2 distinct layers:
- L4 Security Overlay: Users can use features like TCP routing, mTLS, and limited observability.
- L7 processing layer: Users can enable L7 features on demand to get the full functionality of Istio, such as rate limiting, fault injection, load balancing, circuit breaking, and more.
ztunnel is the shared proxy of the ambient mesh. It runs on every node, is deployed as a DaemonSet, and sits at the bottom layer of the mesh, similar to a CNI. ztunnel builds a zero-trust tunnel (hence the name) between nodes and is responsible for securely connecting and authenticating elements within the mesh. All traffic of workloads in the ambient mesh is redirected to the local ztunnel, which identifies the workload the traffic belongs to and chooses the correct certificate to establish an mTLS connection.
ztunnel implements a core function of a service mesh, zero trust: it creates a secure overlay for workloads in an ambient mesh-enabled namespace, providing features like mTLS, telemetry, authentication, and L4 authorization without terminating or parsing HTTP. After enabling the ambient mesh and creating the security overlay, namespaces can optionally be enabled for L7 functionality, which gives them the full suite of Istio functionality, including Virtual Services, L7 telemetry, and L7 authorization policies. L7 processing is handled by a waypoint proxy, which can automatically scale up and down according to the real-time traffic of the namespace it serves, saving users a lot of resources.
Istio creates the waypoint proxy for the service account that a service uses, which helps keep resource consumption low and the fault domain as small as possible. See Model III below.
2 Ambient Mesh Supported Environments and Limitations
Currently, ambient mesh is known to support only the following environments; other environments have not been tested yet.
- GKE (without Calico or Dataplane V2)
- EKS
- kind
Ambient mesh also has many limitations, such as:
- AuthorizationPolicy is not as strict as expected in some cases, or not effective at all.
- In some cases, a request sent directly to a Pod IP instead of the Service will not work.
- Services under ambient mesh cannot be accessed through LoadBalancer and NodePort, but you can deploy an ingress gateway (without ambient mesh enabled) to access services from outside.
- STRICT mTLS cannot completely block cleartext traffic.
- EnvoyFilter is not supported.
See Ambient Mesh[1] for details.
3 Using Eksctl to create a Kubernetes cluster on AWS
In this example, an EKS cluster will be created on AWS using eksctl to test the Istio ambient mesh. eksctl[2] is a CLI tool for managing EKS (Amazon Elastic Kubernetes Service). See eksctl Getting started[3] for the installation and use of eksctl.
Create the cluster configuration file cluster.yaml. It defines an EKS cluster with 2 worker nodes, each with 2 vCPUs and 8 GiB of memory (m5.large), running Kubernetes 1.23.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: aws-demo-cluster01
  region: us-east-1
  version: '1.23'
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 100
    ssh:
      allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key
Execute the following command to create an EKS cluster.
eksctl create cluster -f cluster.yaml
After the creation is complete, view the EKS cluster.
> eksctl get cluster
NAME REGION EKSCTL CREATED
aws-demo-cluster01 us-east-1 True
Execute the following command to merge the kubeconfig of the aws-demo-cluster01 cluster into the ~/.kube/config file, so that the local kubectl tool can access the cluster.
aws eks update-kubeconfig --region us-east-1 --name aws-demo-cluster01
See Installing or updating the latest version of the AWS CLI[4] for the installation of the aws CLI tools, and Configuration basics[5] for configuring credentials for the aws CLI.
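Before moving on, it can help to confirm that kubectl reaches the new cluster and that both worker nodes are ready. The following is a minimal sketch, not part of the original walkthrough: the helper name count_ready_nodes is hypothetical, and it assumes the usual kubectl get nodes column layout in which the second column is STATUS.

```shell
# Hypothetical helper: counts nodes whose STATUS column is exactly "Ready".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
# Nodes reported as e.g. "Ready,SchedulingDisabled" are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (needs access to the cluster):
#   kubectl get nodes --no-headers | count_ready_nodes
```

For the cluster defined in cluster.yaml, the count should match the desiredCapacity of the node group once the nodes have joined.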
4 Download Istio
Download the istioctl binary and the sample resource files that support ambient mesh for your operating system, see Istio download[6]. The istioctl binary is in the bin directory, and the sample resource files are in the samples directory.
5 Deploy the sample application
Deploy the Bookinfo application of the Istio sample, and two clients, sleep and notsleep. sleep and notsleep can execute curl commands to initiate HTTP requests.
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
kubectl apply -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml
The Pod and Service of our currently deployed istio and application are shown below.
6 Deploying Istio
Execute the following command to install Istio, specifying the profile=ambient parameter to deploy the ambient mesh related components.
istioctl install --set profile=ambient
If the installation is successful, the following results will be output.
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ CNI installed
✔ Installation complete
After the installation is complete, we can see the following components in the istio-system namespace:
- istiod: The core component of Istio.
- istio-ingressgateway: Manages north-south traffic in and out of the cluster. We will not use istio-ingressgateway in this example.
- istio-cni: Configures traffic redirection for Pods that join the ambient mesh, redirecting the incoming and outgoing traffic of Pods to the ztunnel on the same node.
- ztunnel: Builds zero-trust tunnels between nodes, providing mTLS, telemetry, authentication, and L4 authorization.
> kubectl get pod -n istio-system
NAME READY STATUS RESTARTS AGE
istio-cni-node-gfmqp 1/1 Running 0 100s
istio-cni-node-t2flv 1/1 Running 0 100s
istio-ingressgateway-f6d95c86b-mfk4t 1/1 Running 0 101s
istiod-6c99d96db7-4ckbm 1/1 Running 0 2m23s
ztunnel-fnjg2 1/1 Running 0 2m24s
ztunnel-k4jhb 1/1 Running 0 2m24s
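Instead of polling kubectl get pod by hand, the rollout of the node-level components can be awaited explicitly. This is a sketch under assumptions: the helper name wait_for_ambient is hypothetical, and the DaemonSet names ztunnel and istio-cni-node are inferred from the Pod names in the listing above rather than confirmed elsewhere.

```shell
# Hypothetical helper: blocks until the ambient DaemonSets have rolled out.
# Optional argument: a timeout understood by kubectl (default 120s).
# DaemonSet names are an assumption derived from the Pod names shown above.
wait_for_ambient() (
  for ds in ztunnel istio-cni-node; do
    kubectl rollout status "daemonset/$ds" -n istio-system --timeout="${1:-120s}" || exit 1
  done
)

# Usage (needs access to the cluster):
#   wait_for_ambient 180s
```

The function body runs in a subshell so that the loop variable does not leak into the calling shell.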
7 Packet capture settings
To observe the traffic more directly, we can capture packets in the Pods. The application Pods do not have capture tools installed, so we use kubectl debug to create an ephemeral container that shares the namespaces of the target container for debugging. See Debugging a Running Pod[7] for details on kubectl debug.
Execute the following commands in four terminals to capture packets for sleep, productpage, and the ztunnel Pods on the two nodes. The --image parameter specifies the image of the ephemeral container. The nicolaka/netshoot image comes with common network capture tools such as tcpdump, tshark, and termshark preinstalled.
kubectl debug -it sleep-55697f8897-n2ldz --image=nicolaka/netshoot
kubectl debug -it productpage-v1-5586c4d4ff-z8jbb --image=nicolaka/netshoot
kubectl debug -it -n istio-system ztunnel-fnjg2 --image=nicolaka/netshoot
kubectl debug -it -n istio-system ztunnel-k4jhb --image=nicolaka/netshoot
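The Pod names in the commands above are specific to the cluster used in this walkthrough. The following sketch shows one way to find the ztunnel Pod that shares a node with a given application Pod. The helper name ztunnel_for_pod is hypothetical, and it assumes ztunnel Pods live in istio-system and have names starting with ztunnel-, as in the listing earlier.

```shell
# Hypothetical helper: prints the name of the ztunnel Pod running on the same
# node as the given Pod. Arguments: POD [NAMESPACE] (namespace defaults to "default").
ztunnel_for_pod() (
  # Look up the node the application Pod is scheduled on.
  node=$(kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}') || exit 1
  # List Pods on that node in istio-system and keep only the ztunnel one,
  # stripping the "pod/" prefix that "-o name" adds.
  kubectl get pods -n istio-system --field-selector "spec.nodeName=$node" -o name |
    sed -n 's|^pod/\(ztunnel-.*\)$|\1|p'
)

# Usage (needs access to the cluster):
#   kubectl debug -it -n istio-system "$(ztunnel_for_pod sleep-55697f8897-n2ldz)" --image=nicolaka/netshoot
```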
Execute the termshark -i eth0 command to capture packets on the eth0 network interface of the Pod. Because istio-cni continuously sends HTTP health checks to ztunnel on the /healthz/ready path, set the following filter conditions in the termshark Filter box of the two ztunnel Pods to keep this traffic out of the observation.
# ztunnel-fnjg2, the ztunnel on the node where sleep runs
ip.addr==192.168.58.148 || ip.addr==192.168.13.108
# ztunnel-k4jhb, the ztunnel on the node where productpage runs
ip.addr==192.168.13.108
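The IP addresses in these filters belong to the sleep and productpage Pods of this particular cluster. As a sketch, a filter of the same shape can be assembled from whatever Pod IPs apply in your environment. The helper name build_ip_filter is hypothetical and only performs string joining.

```shell
# Hypothetical helper: joins IP addresses into a display filter of the form
# used above, e.g. "ip.addr==A || ip.addr==B".
build_ip_filter() (
  filter=""
  for ip in "$@"; do
    # Add the "||" separator before every address except the first.
    if [ -n "$filter" ]; then
      filter="$filter || "
    fi
    filter="${filter}ip.addr==${ip}"
  done
  printf '%s\n' "$filter"
)

# Usage: look up a Pod IP with kubectl, then build the filter.
#   sleep_ip=$(kubectl get pod sleep-55697f8897-n2ldz -o jsonpath='{.status.podIP}')
#   build_ip_filter "$sleep_ip" 192.168.13.108
```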
8 Not using Ambient Mesh to manage traffic
Because the default Namespace has not yet been added to the ambient mesh, application traffic does not pass through ztunnel at this point. Pods communicate through the Kubernetes Service mechanism, and the traffic between Pods is not encrypted with mTLS but travels in plaintext.
Make a request to productpage using sleep.
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
# Output: the first line of the response
<!DOCTYPE html>
Looking at the packet capture results of sleep and productpage, you can see that sleep (192.168.58.148) accesses the Service IP (10.100.171.143) that the productpage Service name resolves to through DNS. After being forwarded by the Kubernetes Service, the request finally reaches the actual Pod IP of productpage (192.168.13.108).
At this time, the ambient mesh has not taken over the traffic of the default Namespace, so the relevant data packets will not be captured on the ztunnel.
9 Add Default Namespace to Ambient Mesh (L4 function)
Adding the istio.io/dataplane-mode=ambient label to a Namespace adds that Namespace to the ambient mesh.
kubectl label namespace default istio.io/dataplane-mode=ambient
Once a Namespace is added to the ambient mesh, the istio-cni DaemonSet will set iptables redirection rules for the Pods in the Namespace to redirect all inbound and outbound traffic of the Pods to the ztunnel running on the same node.
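To confirm which Namespaces currently carry the label, a label selector can be used. This is a small sketch. The helper name ambient_namespaces is hypothetical, and it relies only on the label key and value shown above.

```shell
# Hypothetical helper: prints the names of all Namespaces labeled
# istio.io/dataplane-mode=ambient, one per line.
ambient_namespaces() {
  kubectl get namespace -l istio.io/dataplane-mode=ambient -o name |
    sed 's|^namespace/||'
}

# Usage (needs access to the cluster):
#   ambient_namespaces
# To take a Namespace out of the mesh again, the label can be removed with
# kubectl's trailing-dash syntax:
#   kubectl label namespace default istio.io/dataplane-mode-
```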
9.1 mTLS traffic encryption
ztunnel creates a security overlay for workloads in an ambient mesh-enabled namespace, providing features such as mTLS, telemetry, authentication, and L4 authorization.
For the convenience of viewing, you can clear the previously captured packets on sleep and productpage.
Then use sleep to initiate a request to productpage.
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
The plaintext data packets can still be captured on sleep and productpage, but this time the source IP of the data packets captured on productpage becomes the IP address of the ztunnel of the node where sleep is located.
On the ztunnel of the node where sleep is located, we can capture the plaintext data packets sent by sleep. ztunnel encrypts the packets and sends them to the ztunnel on the productpage node, which receives the encrypted packets, decrypts them, and sends them to productpage.
We can also see access records in the logs of the ztunnel of the node where sleep and productpage are located. Check outbound traffic logs (sleep -> ztunnel on sleep node).
kubectl logs -n istio-system ztunnel-fnjg2 -f
We can see the words (no waypoint proxy) in the log of outbound traffic. By default, ambient mesh only performs L4 processing and does not perform L7 processing. Therefore, the traffic will only pass through ztunnel at this time, and will not pass through the waypoint proxy.
Check the traffic log in the inbound direction (ztunnel on the productpage node -> productpage).
kubectl logs -n istio-system ztunnel-k4jhb -f
9.2 L4 authorization policy
The security overlay can enforce simple L4 authorization policies. Create the AuthorizationPolicy shown below, which only allows clients whose Service Account is sleep to access the application labeled app=productpage.
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep"]
EOF
Execute the following requests on sleep and notsleep respectively. Since L7 processing is not currently enabled, restrictions on HTTP request methods, paths and other conditions are not yet available.
# Succeeds
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
# Succeeds
kubectl exec deploy/sleep -- curl -XDELETE -s http://productpage:9080/ | head -n1
# Fails: only clients whose Service Account is sleep are allowed
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1
10 Enable L7 function
To enable L7 mesh capability for a service, you need to explicitly create a Gateway. Note that the gatewayClassName in the created Gateway resource must be set to istio-mesh, so that Istio will create the corresponding waypoint proxy for the productpage. Any traffic destined for the productpage service will go through the L7 proxy, the waypoint proxy.
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: productpage
  annotations:
    istio.io/service-account: bookinfo-productpage
spec:
  gatewayClassName: istio-mesh
EOF
Check out the waypoint proxy created by Istio for productpage.
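One way to look for the waypoint proxy is to filter the Pod list by name. This is a sketch under assumptions: the helper name waypoint_pods is hypothetical, and the name fragment waypoint-proxy is taken from the Deployment name bookinfo-productpage-waypoint-proxy used later in this article.

```shell
# Hypothetical helper: prints name and status of Pods whose name contains
# "waypoint-proxy". Optional argument: namespace (defaults to "default").
# Assumes the usual `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
waypoint_pods() {
  kubectl get pods -n "${1:-default}" --no-headers |
    awk '$1 ~ /waypoint-proxy/ { print $1, $3 }'
}

# Usage (needs access to the cluster):
#   waypoint_pods
# The Gateway resource itself can be inspected with its fully qualified name,
# which avoids ambiguity with Istio's own Gateway kind:
#   kubectl get gateways.gateway.networking.k8s.io productpage
```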
Visit the productpage from sleep.
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
Check outbound traffic logs (sleep -> ztunnel on sleep node -> waypoint proxy).
kubectl logs -n istio-system ztunnel-fnjg2 -f
From the log below, you can see the words (to server waypoint proxy), indicating that the request is sent to the waypoint proxy for processing.
Check the traffic log in the inbound direction (ztunnel on the productpage node -> productpage).
kubectl logs -n istio-system ztunnel-k4jhb -f
10.1 L7 authorization policy
Next, update the AuthorizationPolicy to allow only users whose Service Account is sleep to access the application with the label app=productpage through GET.
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: productpage-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: productpage
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep"]
    to:
    - operation:
        methods: ["GET"]
EOF
Execute the following requests on sleep and notsleep respectively. This time the HTTP DELETE request from sleep is also rejected.
# Succeeds
kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1
# Fails with an RBAC error because it is not a GET request
kubectl exec deploy/sleep -- curl -X DELETE -s http://productpage:9080/ | head -n1
# Fails with an RBAC error: only clients whose Service Account is sleep are allowed
kubectl exec deploy/notsleep -- curl -s http://productpage:9080/ | head -n1
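Reading the first line of the response body does not always make an allow or deny decision obvious. As a sketch, the HTTP status code can be printed instead, using curl's -w '%{http_code}' write-out variable. The helper name http_status is hypothetical. The exact codes returned for denied requests were not verified here, so treat any expectation (such as 403 for an L7 denial) as an assumption.

```shell
# Hypothetical helper: prints only the HTTP status code of a request sent from
# a client Deployment. Arguments: CLIENT METHOD URL.
# curl prints 000 when no HTTP response was received at all (for example when
# the connection is closed before a response arrives).
http_status() {
  kubectl exec "deploy/$1" -- curl -s -o /dev/null -w '%{http_code}\n' -X "$2" "$3"
}

# Usage (needs the cluster from this walkthrough):
#   http_status sleep GET http://productpage:9080/
#   http_status sleep DELETE http://productpage:9080/
#   http_status notsleep GET http://productpage:9080/
```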
10.2 Observability
L7 metrics for all productpage service requests can be viewed on the productpage waypoint proxy.
kubectl exec deploy/bookinfo-productpage-waypoint-proxy -- curl -s http://localhost:15020/stats/prometheus | grep istio_requests_total
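The raw Prometheus output contains one line per label combination, which is hard to scan. The sketch below sums istio_requests_total per response code. The helper name sum_requests_by_code is hypothetical, and it assumes each sample line carries a response_code label and ends with the counter value, as in the Prometheus text format.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and prints
# "<response_code> <total>" for istio_requests_total, sorted by code.
sum_requests_by_code() {
  awk '
    /^istio_requests_total[{]/ {
      code = "unknown"
      # response_code=" is 15 characters long; the match also includes the
      # closing quote, hence the offsets below.
      if (match($0, /response_code="[^"]*"/)) {
        code = substr($0, RSTART + 15, RLENGTH - 16)
      }
      total[code] += $NF   # the sample value is the last field on the line
    }
    END { for (c in total) printf "%s %d\n", c, total[c] }
  ' | sort
}

# Usage (needs the cluster from this walkthrough):
#   kubectl exec deploy/bookinfo-productpage-waypoint-proxy -- \
#     curl -s http://localhost:15020/stats/prometheus | sum_requests_by_code
```

Comment lines such as # TYPE are skipped because they do not start with the metric name followed by a brace.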
10.3 Traffic Control
First create a gateway for the reviews service and enable L7 capabilities.
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: reviews
  annotations:
    istio.io/service-account: bookinfo-reviews
spec:
  gatewayClassName: istio-mesh
EOF
Then create a VirtualService and a DestinationRule to split the traffic sent to the reviews service between v1 and v2 in a 90/10 ratio.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
EOF
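Execute the following command to send 10 requests from sleep to productpage. Most of the traffic goes to reviews-v1 and a small share goes to reviews-v2 (about 10% over many requests; 2 of 10 in the sample run below).
# Note that the path is http://productpage:9080/productpage, which calls the reviews service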
kubectl exec -it deploy/sleep -- sh -c 'for i in $(seq 1 10); do curl -s http://productpage:9080/productpage | grep reviews-v.-; done'
# Output
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v2-6bdd859457-7lxhc</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v1-7598cc9867-dh7hp</u>
<u>reviews-v2-6bdd859457-7lxhc</u>
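With only 10 requests the observed split can deviate noticeably from 90/10, as the sample above shows. The sketch below tallies the versions so that a larger run can be summarized. The helper name count_versions is hypothetical, and it assumes output lines of the form shown above.

```shell
# Hypothetical helper: reads lines such as "<u>reviews-v1-...</u>" on stdin and
# prints "<version> <count>" per reviews version.
count_versions() {
  grep -o 'reviews-v[0-9][0-9]*' | sort | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}

# Usage (needs the cluster from this walkthrough; 100 requests instead of 10):
#   kubectl exec deploy/sleep -- sh -c 'for i in $(seq 1 100); do curl -s http://productpage:9080/productpage | grep reviews-v.-; done' | count_versions
```

Applied to the 10 lines of sample output above, this reports 8 requests for reviews-v1 and 2 for reviews-v2.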
10.4 Fault Injection
Create a VirtualService for the productpage service and inject a 5s delay into the request.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: productpage
spec:
  hosts:
  - productpage
  http:
  - route:
    - destination:
        host: productpage
    fault:
      delay:
        percentage:
          value: 100.0
        fixedDelay: 5s
EOF
Visit the productpage from sleep, and you can see that the request takes about 5s.
> kubectl exec deploy/sleep -- time curl -s http://productpage:9080 | head -n 1
# Output
<!DOCTYPE html>
real 0m 5.04s
user 0m 0.00s
sys 0m 0.00s
11 Clean up the environment
# Uninstall Istio
istioctl uninstall -y --purge && kubectl delete ns istio-system
# Delete the sample applications
kubectl delete -f samples/bookinfo/platform/kube/bookinfo.yaml
kubectl delete -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/sleep.yaml
kubectl delete -f https://raw.githubusercontent.com/linsun/sample-apps/main/sleep/notsleep.yaml
# Delete the cluster
eksctl delete cluster aws-demo-cluster01
12 Experience Demo
If you want to try ambient mesh quickly, you can also follow the tutorial on the solo.io official website[8].
13 References
- [1] Ambient Mesh: https://github.com/istio/istio/blob/experimental-ambient/README.md
- [2] eksctl: https://eksctl.io/
- [3] eksctl Getting started: https://eksctl.io/introduction/#installation
- [4] Installing or updating the latest version of the AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- [5] Configuration basics: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-creds
- [6] Istio download: https://gcsweb.istio.io/gcs/istio-build/dev/0.0.0-ambient.191fe680b52c1754ee72a06b3e0d3f9d116f2e82
- [7] Debug running Pod: https://kubernetes.io/zh-cn/docs/tasks/debug/debug-application/debug-running-pod/
- [8] solo.io official website: https://academy.solo.io/get-started-with-istio-ambient-mesh-with-istio-ambient-mesh-foundation-certification
- [9] Ambient Mesh Security Deep Dive: https://istio.io/latest/blog/2022/ambient-security/
- [10] Get Started with Istio Ambient Mesh: https://istio.io/latest/blog/2022/get-started-ambient/
- [11] Istio Ambient Mesh Launch Demo: https://www.youtube.com/watch?v=nupRBh9Iypo
- [12] Introduction to Istio’s sidecar-less proxy data plane ambient mode: https://lib.jimmysong.io/blog/introducing-ambient-mesh/
- [13] Translation: Istio Ambient Mode Security Architecture Deep Analysis: https://www.zhaohuabing.com/post/2022-09-09-ambient-mesh-security-deep-dive/
- [14] How to evaluate Istio's newly launched Ambient mode? : https://mp.weixin.qq.com/s/98wXMdeutx0wHqcJUIFl8A
- [15] Preliminary exploration of Istio Ambient mode: https://mp.weixin.qq.com/s/YA2My5PSHXIwovWLRKmsgA
- [16] Detailed explanation of the principle of HBONE tunnel in Istio Ambient mode - Part 1: https://mp.weixin.qq.com/s/ksiYJEJKnit65KfIIGoN8Q
- [17] Create a kubeconfig for Amazon EKS: https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
- [18] Beyond Istio OSS - The current state and future of Istio service mesh: https://jimmysong.io/blog/beyond-istio-oss/#sidecar-management