Ambient Mesh: A new model for Istio's data plane

Abstract: Istio complements the Kubernetes ecosystem so well that, as Kubernetes has been adopted at scale, Istio has rapidly won user mindshare and market share. This article introduces Ambient Mesh, Istio's new data plane model.

This article is shared from the Huawei Cloud Community post "Istio Data Plane New Model - Ambient Mesh" by Chuangyuanhui.

If you ask which design pattern is the most classic in the cloud-native world built on Kubernetes, the Sidecar pattern is undoubtedly the strongest contender. Sometimes an application needs auxiliary functions that are unrelated to its own logic. Injecting a sidecar that provides those functions into the application Pod is clearly the most Kubernetes-native way to do this, and Istio is the flagship example of the pattern.

The vision of the Istio project is to solve connectivity, security, and observability between services in microservice scenarios as transparently as possible. The main approach is to deploy a proxy next to each application. On Kubernetes, a Sidecar is injected into the application Pod, and the application's traffic is intercepted and redirected to that Sidecar. The Sidecar processes the traffic according to user configuration obtained from the Istio control plane. This implements service governance with almost no intrusion into application code.

Istio is not limited to Kubernetes, but its design has a natural affinity with the Kubernetes sidecar model. Building on that model, Istio could be developed, deployed, and validated rapidly on Kubernetes. Functionally, Istio strips service governance out of application code and sinks it into the Sidecar as infrastructure. This effectively abstracts the network layer of cloud-native applications and greatly reduces the mental burden on application developers. That capability is exactly what the Kubernetes ecosystem had been missing. Because Istio complements the Kubernetes ecosystem so well, it has rapidly won user mindshare and market share as Kubernetes has been adopted at scale.

Deploying the Istio data plane in Sidecar mode may seem like a natural, almost unavoidable choice. It should be emphasized, however, that Istio's full functionality is not tightly bound to the Sidecar mode, and other options exist. Moreover, as Istio usage deepens and deployments grow, running the Istio data plane in Sidecar mode reveals several challenges:

  1. Intrusiveness: Istio achieves essentially zero intrusion into application code. However, injecting the Sidecar requires changing the Pod Spec and redirecting application traffic, so the Pod must be restarted when the application joins the mesh. Conflicts caused by the nondeterministic startup order of the application container and the Sidecar container can also interrupt the application;
  2. Lifecycle coupling: the Sidecar is essentially infrastructure, and its lifecycle often differs from the application's. Upgrading the Sidecar therefore requires restarting the application Pod, which can also interrupt the application. For Job workloads, the Sidecar keeps running after the main container finishes, so the Pod cannot be cleaned up promptly;
  3. Low resource utilization: each Sidecar is dedicated to a single application Pod, while application traffic has peaks and troughs. A Sidecar's memory usage is also generally strongly correlated with cluster size (the number of Services and Pods). Resources must therefore be reserved for worst-case conditions, which lowers overall cluster resource utilization. And because a Sidecar is injected into every Pod, the total resources consumed by Sidecars grow linearly as the cluster scales.
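The resource argument can be made concrete with a back-of-the-envelope Python model. The model assumes, hypothetically, that each Sidecar reserves a fixed base amount of memory plus a small amount per Service and Pod in the cluster. The coefficients (`base_mib`, `mib_per_endpoint`) are made-up placeholders, not measured Istio figures.

```python
# Back-of-the-envelope model of Sidecar memory reservation.
# All coefficients are hypothetical placeholders, not measured Istio numbers.


def sidecar_reserved_mib(services: int, pods: int,
                         base_mib: float = 40.0,
                         mib_per_endpoint: float = 0.05) -> float:
    """Memory one Sidecar reserves: a fixed base plus a term that grows with
    cluster size (assumed: each Sidecar holds config for every Service/Pod)."""
    return base_mib + mib_per_endpoint * (services + pods)


def total_sidecar_mib(services: int, pods: int, **kwargs: float) -> float:
    """One Sidecar per Pod, each reserved for the worst case."""
    return pods * sidecar_reserved_mib(services, pods, **kwargs)


if __name__ == "__main__":
    for services, pods in [(100, 1_000), (200, 2_000), (400, 4_000)]:
        total_gib = total_sidecar_mib(services, pods) / 1024
        print(f"{services:>4} services, {pods:>5} pods -> {total_gib:8.1f} GiB")
```

Under these assumptions, doubling the cluster more than doubles the total. Two things grow at once: the number of Sidecars and the reservation of each one.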

To address these shortcomings of the Sidecar deployment model, Google and Solo.io jointly launched a new Sidecar-less deployment model: Ambient Mesh.

Architecture Introduction

Compared with Istio in the original Sidecar mode, Ambient Mesh leaves the control plane largely unchanged. The data plane consists of the following components:

  1. istio-cni: required component, deployed as a DaemonSet. istio-cni is not actually new in Ambient Mesh; it already existed in Sidecar mode. There it mainly replaced the istio-init Init Container for configuring traffic interception rules, avoiding the security issues that istio-init introduced. Ambient Mesh extends istio-cni and makes it mandatory. It configures traffic forwarding rules that intercept the traffic of Pods on its node that have joined Ambient Mesh and forward it to that node's ztunnel;
  2. ztunnel: required component, deployed as a DaemonSet. ztunnel proxies the traffic of Pods on its node. It is mainly responsible for L4 traffic processing, L4 telemetry, and mTLS (mutual TLS) management between services. ztunnel was originally implemented on Envoy. Given the deliberately limited scope of its functionality and its security and resource requirements, the community rebuilt it from scratch in Rust;
  3. waypoint: deployed on demand as a Deployment. waypoint handles HTTP processing, fault injection, and other L7 functions. It is deployed per workload or per Namespace: in Kubernetes, each Service Account or Namespace maps to one waypoint Deployment, which processes the L7 traffic sent to the corresponding workloads. The number of waypoint instances can be scaled dynamically with traffic.
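One rough way to compare the two deployment shapes is to count the data plane Pods each one runs. The sketch below uses a hypothetical `Cluster` type and a fixed waypoint replica count. It assumes one istio-cni Pod and one ztunnel Pod per node, plus one waypoint Deployment per Service Account or Namespace that needs L7 processing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int
    pods: int
    l7_scopes: int              # Service Accounts/Namespaces that need a waypoint
    waypoint_replicas: int = 1  # hypothetical: fixed replicas per waypoint


def sidecar_count(c: Cluster) -> int:
    """Sidecar mode: one proxy container injected into every application Pod."""
    return c.pods


def ambient_data_plane_pods(c: Cluster) -> dict[str, int]:
    """Ambient mode: istio-cni and ztunnel are DaemonSets (one Pod per node);
    waypoints are Deployments created only where L7 processing is needed."""
    return {
        "istio-cni": c.nodes,
        "ztunnel": c.nodes,
        "waypoint": c.l7_scopes * c.waypoint_replicas,
    }
```

For example, `ambient_data_plane_pods(Cluster(nodes=50, pods=2000, l7_scopes=5, waypoint_replicas=2))` gives 50 istio-cni, 50 ztunnel, and 10 waypoint Pods, against 2000 Sidecars. Counts are not resource costs: a ztunnel serves a whole node and is sized accordingly. The point is that its count scales with nodes rather than Pods.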

The following walkthrough shows how the Ambient Mesh data plane actually processes traffic and the role each component plays:

1. As in Sidecar mode, services can be added to Ambient Mesh at the granularity of the whole mesh, a Namespace, or a Pod. The difference is that a newly added Pod neither needs a restart nor gets a Sidecar injected;

2. istio-cni watches Pods being added to and deleted from the node, as well as Pods joining and leaving the mesh, and adjusts the forwarding rules dynamically. Traffic sent by Pods in the mesh is transparently forwarded to the node's ztunnel, bypassing kube-proxy entirely;

3. ztunnel likewise watches Pods being added to and deleted from its node and joining and leaving the mesh. It obtains from the control plane, and manages, the certificates of the mesh-managed Pods on its node;

4. The source ztunnel processes the intercepted traffic. It looks up the certificate of the originating Pod by the traffic's source IP and establishes mTLS with the peer;

5. If the target service has no waypoint and no L7 policy configured, the source ztunnel connects directly to the destination ztunnel (marked by the yellow line in the figure above). The destination ztunnel terminates mTLS, enforces L4 security policies, and forwards the traffic to the target Pod;

6. If the target service has a waypoint (set up through a specially configured Gateway object) and L7 policies, the source ztunnel establishes mTLS with that waypoint. The waypoint terminates mTLS and applies L7 processing. It then establishes mTLS with the ztunnel on the target Pod's node. Finally, the destination ztunnel terminates mTLS and delivers the traffic to the target Pod.
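The request path in steps 2 to 6 can be sketched as a small Python function. Everything here is hypothetical:

- `NodeView` stands in for the state that istio-cni and ztunnel learn by watching Pods and the control plane.
- Waypoints are looked up by service name for simplicity.
- Identities follow Istio's `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` format.

The code only computes hops; no traffic or TLS is involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeView:
    """Hypothetical snapshot of what istio-cni and ztunnel track (steps 2-3)."""
    identity_by_ip: dict[str, str]  # mesh Pod IP -> workload identity (cert)
    node_by_ip: dict[str, str]      # Pod IP -> node hosting that Pod
    waypoint_by_service: dict[str, str] = field(default_factory=dict)


@dataclass
class RequestPath:
    hops: list[str]
    mtls_legs: int
    source_identity: str


def plan_path(view: NodeView, src_ip: str, dst_service: str, dst_ip: str) -> RequestPath:
    """Compute the hops one request takes, following steps 2-6 above."""
    # Step 2: only Pods that joined the mesh are redirected to their node's ztunnel.
    if src_ip not in view.identity_by_ip:
        raise ValueError(f"{src_ip} is not in the mesh; this sketch only models mesh Pods")
    # Step 4: the source ztunnel picks the certificate by the traffic's source IP.
    identity = view.identity_by_ip[src_ip]
    proxies = [f"ztunnel@{view.node_by_ip[src_ip]}"]
    # Step 6: with a waypoint configured, L7 processing sits between the ztunnels.
    waypoint = view.waypoint_by_service.get(dst_service)
    if waypoint is not None:
        proxies.append(waypoint)
    # Steps 5 and 6: the destination ztunnel terminates the last mTLS leg.
    proxies.append(f"ztunnel@{view.node_by_ip[dst_ip]}")
    hops = [f"pod:{src_ip}", *proxies, f"pod:{dst_ip}"]
    # Each pair of adjacent proxies is a separate mTLS connection.
    return RequestPath(hops=hops, mtls_legs=len(proxies) - 1, source_identity=identity)


if __name__ == "__main__":
    view = NodeView(
        identity_by_ip={"10.0.1.5": "spiffe://cluster.local/ns/default/sa/sleep"},
        node_by_ip={"10.0.1.5": "node-a", "10.0.2.7": "node-b"},
    )
    print(plan_path(view, "10.0.1.5", "httpbin", "10.0.2.7").hops)
    view.waypoint_by_service["httpbin"] = "waypoint/httpbin"
    print(plan_path(view, "10.0.1.5", "httpbin", "10.0.2.7").hops)
```

Without a waypoint, the request crosses one mTLS connection (ztunnel to ztunnel). Registering a waypoint for `httpbin` turns this into two mTLS legs through three proxies.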

Value Analysis

The underlying implementation of Ambient Mesh differs greatly from the original Sidecar mode. From the user's perspective, however, the core Istio APIs (VirtualService, DestinationRule, etc.) are used the same way and have the same effects, so the user experience stays largely the same. Ambient Mesh is the second data plane mode the Istio community supports besides the Sidecar mode. The value that mesh technology itself brings to users is therefore no different from the Sidecar mode. This section analyzes only the value of Ambient Mesh relative to the native Sidecar mode and does not repeat the value of the mesh itself.

Ambient Mesh mainly reworks Istio's data plane architecture to overcome the shortcomings of the existing Sidecar model, so its value stems from its architectural characteristics. As described earlier, its two main architectural features are "Sidecar-less" and "layered L4/L7 processing". The analysis below is organized around these two points:

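1. The advantages of going Sidecar-less are essentially the inverse of the Sidecar model's defects:

  1. Non-intrusive: no Sidecar is injected and the Pod Spec is unchanged. Applications join or leave the mesh without a restart, and container startup-order conflicts disappear;
  2. Decoupled lifecycle: ztunnel and waypoint run in their own DaemonSet and Deployment Pods. Upgrading them does not require restarting application Pods, and Job Pods are no longer kept alive by a Sidecar;
  3. Higher resource utilization: L4 proxying is shared per node through ztunnel, and L7 resources go only to workloads that need them. Proxy resources no longer have to be reserved for the worst case in every Pod.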

2. To see why L4 and L7 processing should be layered, first consider how they differ. L7 processing is more complex than L4 and needs more CPU and memory, and resource usage varies widely across operation types. The more complex the operation, the larger the attack surface it exposes. In addition, Envoy currently does not strongly isolate traffic from different tenants, so the "noisy neighbor" problem is unavoidable. The layered processing architecture of Ambient Mesh therefore has the following advantages:

  1. ztunnel, present on every node, handles only L4, so it stays lightweight and exposes a small attack surface;
  2. L7 processing runs only in waypoints deployed for the Service Accounts or Namespaces that need it. Only those workloads pay its cost, waypoints scale independently with traffic, and tenants are isolated from one another's L7 load.
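The noisy-neighbor argument can be illustrated with a toy capacity model. Purely for illustration, it assumes that a shared proxy without tenant isolation splits its capacity in proportion to demand; this is not how Envoy actually schedules work. The alternative gives each tenant (for example, each Namespace) its own waypoint, with an equal share of the same total capacity.

```python
from __future__ import annotations


def shared_proxy(capacity: float, demand: dict[str, float]) -> dict[str, float]:
    """Toy shared L7 proxy with no tenant isolation: when overloaded, capacity
    is split in proportion to demand (an illustrative assumption)."""
    total = sum(demand.values())
    if total <= capacity:
        return dict(demand)
    return {tenant: capacity * d / total for tenant, d in demand.items()}


def per_tenant_waypoints(capacity_each: float,
                         demand: dict[str, float]) -> dict[str, float]:
    """One waypoint per tenant: each tenant is capped only by its own capacity."""
    return {tenant: min(d, capacity_each) for tenant, d in demand.items()}


if __name__ == "__main__":
    demand = {"team-a": 900.0, "team-b": 100.0}
    print(shared_proxy(500.0, demand))          # team-b gets half its demand
    print(per_tenant_waypoints(250.0, demand))  # team-b is fully served
```

Take a total capacity of 500 with demands of 900 (`team-a`) and 100 (`team-b`). The shared proxy serves only 50 of `team-b`'s 100 requests. Two waypoints of 250 each serve all 100, because `team-a`'s overload stays inside its own waypoint.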

Of course, as a new Istio data plane architecture, Ambient Mesh is still an experimental feature in the community, and many problems remain to be solved, such as:

Future Outlook

In terms of releases, Ambient Mesh has lived on a separate branch as an experimental feature since its release in September 2022. The next steps are to merge it into the main branch (done in February 2023) and release it as an Alpha feature. The final step is to reach Stable by the end of 2023, making it ready for production.

On the API side, the ideal would be for both architectures to share a single set of APIs. That is unrealistic, because some existing Istio APIs were designed on the assumption of sidecar deployment. The most typical example is the Sidecar CRD, which customizes the configuration delivered to individual sidecars to reduce their unnecessary resource usage. Such Sidecar-only APIs are clearly meaningless under Ambient Mesh. At the same time, Ambient Mesh introduces two components of its own, ztunnel and waypoint. It therefore also needs new APIs to manage them and to implement Ambient-only features. In the end, Ambient Mesh will implement the existing core Istio APIs (VirtualService, DestinationRule, etc.) and add some APIs of its own. Unifying how the three categories of APIs (Sidecar-only, Ambient-only, and shared) are used and how they interact will be important.

So has Ambient Mesh fully covered the use cases of the Sidecar mode, retiring it for good? Naturally not. This mirrors the industry's many debates between dedicated and shared models: the Sidecar mode essentially gives each application Pod exclusive use of a proxy. A dedicated proxy can usually guarantee better resource availability, shield the application from other applications as much as possible, and keep high-priority applications running normally. The foreseeable end state is a mixed deployment of both modes, in which each application chooses its proxy mode on demand. Building this hybrid mode, with good compatibility and a unified experience across the two modes, will therefore also be a focus of future work.

Summary

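For Istio, the Sidecar mode was like a prototype validation. In the most Kubernetes-native way, it quickly demonstrated the value of mesh technology and captured user mindshare and the market. However, as Istio adoption moves into deeper waters and large-scale production deployments begin, the Sidecar mode starts to show its limits. Ambient Mesh arrives in a form better suited to large-scale adoption. It overcomes most of the Sidecar mode's inherent defects, frees users from having to be aware of mesh components, and truly sinks the mesh into the infrastructure.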

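Clearly, however, Ambient Mesh is not the end point of mesh data plane evolution. No current mesh data plane solution is perfect across every dimension, including intrusiveness, performance, and resource usage.

- Ambient Mesh achieves essentially zero intrusion into applications. However, the performance cost of three-hop L7 processing and the resource footprint of resident processes such as ztunnel cannot be ignored.
- RPC libraries such as gRPC implement xDS natively and connect directly to the Istio control plane, folding the mesh into the SDK. This does deliver good performance and resource usage, but at the inherent cost of tight coupling with the application and high complexity in supporting multiple languages.
- Pushing the entire mesh data plane into the kernel with eBPF, like the TCP/IP stack, looks like the ideal end state. Given kernel security and the complexity of interacting with the kernel, though, the eBPF execution environment is highly restricted. For example, an eBPF program must pass the verifier before it is loaded into the kernel, its execution paths must be fully known, and it cannot run arbitrary loops. As a result, developing and maintaining complex L7 processing such as HTTP/2 and gRPC on eBPF is difficult.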

Consider how infrastructure has met extreme performance and resource demands as related technologies evolved. In basic networking, for example, most applications share the kernel protocol stack, while a few special applications use dedicated acceleration technologies such as DPDK and RDMA. Likewise, for the mesh data plane, combining multiple technologies, each optimized for its own scenario, may be more practical. Such a solution will foreseeably center on a node-level proxy like Ambient Mesh. As mesh and eBPF technologies mature, the work would be split three ways:

- as many data plane functions as possible will be pushed down to eBPF (the fast path);
- a smaller set of advanced functions will be implemented by a user-space proxy (the slow path);
- applications with strict performance and isolation requirements will get dedicated Sidecars.

This approach can satisfy the intrusiveness, performance, and resource requirements of most scenarios.
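The tiered split described above can be summarized as a dispatch rule. This is a hypothetical sketch. The set of functions assumed to fit eBPF's constraints (`EBPF_CAPABLE`) and the `needs_isolation` flag are illustrative placeholders, not an existing Istio API.

```python
from enum import Enum


class DataPath(Enum):
    EBPF_FAST_PATH = "eBPF (fast path)"
    USERSPACE_SLOW_PATH = "user-space proxy (slow path)"
    DEDICATED_SIDECAR = "dedicated Sidecar"


# Hypothetical: functions assumed simple enough for eBPF's verifier
# (bounded, fully known execution paths). Complex L7 work such as HTTP/2
# or gRPC handling is deliberately left out.
EBPF_CAPABLE = frozenset({"l4_forwarding", "l4_authorization", "l4_metrics"})


def choose_data_path(function: str, needs_isolation: bool) -> DataPath:
    """Decide where one mesh function runs for one application."""
    # Applications with strict performance/isolation needs get their own Sidecar.
    if needs_isolation:
        return DataPath.DEDICATED_SIDECAR
    # Everything that fits eBPF's constraints goes to the in-kernel fast path.
    if function in EBPF_CAPABLE:
        return DataPath.EBPF_FAST_PATH
    # The remaining advanced functions fall back to the node's user-space proxy.
    return DataPath.USERSPACE_SLOW_PATH
```

The isolation check comes first, so an application that opts into a dedicated Sidecar never touches the shared paths. For everyone else, `choose_data_path("http2_routing", needs_isolation=False)` falls through to the user-space slow path.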

Opinions differ on whether a single data plane solution will eventually dominate or several solutions will be deployed side by side. Either way, the related technologies need further exploration, evolution, and testing in practice, and in the end time will tell.

Origin my.oschina.net/u/4526289/blog/9867635