Using Istio CNI for Service Mesh Traffic Capture in Strictly Secured TKE Stack Clusters

Author

Chen Jijie is an enterprise application cloud native architect responsible for the design and development of cloud native application governance products in Tencent Enterprise IT. His work focuses on using cloud native practices such as container clusters and service meshes to lower the barrier to microservice development and governance and to improve operational efficiency.

Summary

For cluster administrators who need a quick fix: there are two ways to install Istio CNI properly in TKE Stack. If your TKE Stack cluster uses a Galaxy version that supports cniVersion 0.3.1, install Istio CNI in its default (chained) mode; otherwise, install Istio CNI as an "interface (network card) plugin" and specify the cluster's default network name in an annotation when creating Pods. If you find that new Pods can no longer be created after installing Istio CNI in a TKE Stack cluster, uninstall Istio CNI immediately and manually restore the Galaxy configuration file that was rewritten on each node: extract the first element of the plugins array in /etc/cni/net.d/00-galaxy.conflist and save it as a separate conf file, /etc/cni/net.d/00-galaxy.conf. Then delete the Pods that are stuck in creation and wait for them to be rebuilt; Pod creation should recover automatically.

Istio is a popular service mesh. It implements traffic observation and governance by injecting Envoy, a proxy that can capture a Pod's ingress and egress traffic, into business Pods as a sidecar.

For the Envoy proxy to capture traffic to and from the service containers, Istio needs to apply iptables rules like the following in the network namespace where the Pod lives:

*nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT

In the standard installation mode, the iptables rules are applied by the istio-init init container, which is injected into the Pod together with the Envoy proxy container. Applying iptables rules to the Pod's network requires the Pod to use two high-privilege capabilities: NET_ADMIN and NET_RAW. Linux divides the privileges traditionally associated with the superuser root into distinct units called capabilities, each of which can be enabled or disabled independently. When the system performs a permission check, it checks for the specific capability and decides whether the current process may perform the corresponding privileged operation. For example, setting the system time requires the CAP_SYS_TIME capability.
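
For reference, here is a simplified sketch of the injected istio-init container; the image tag and argument list are illustrative only, since the exact values are generated by the sidecar injector and vary by Istio version:

  initContainers:
  - name: istio-init
    image: docker.io/istio/proxyv2:1.13.2   # illustrative image tag
    args:
    - istio-iptables
    - -p
    - "15001"          # redirect outbound traffic to Envoy's outbound port
    - -z
    - "15006"          # redirect inbound traffic to Envoy's inbound port
    - -u
    - "1337"           # traffic from Envoy's own UID is excluded from capture
    - -m
    - REDIRECT
    - -b
    - '*'
    securityContext:
      runAsUser: 0     # iptables must run as root
      capabilities:
        add:
        - NET_ADMIN    # required to modify iptables rules in the Pod's netns
        - NET_RAW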

Security Challenges with Istio Traffic Capture

A container is essentially a process running on the host. Although the container runtime grants only the necessary capabilities to a container by default, if the container runs in --privileged mode, or if additional capabilities are added to it, the container can gain the same privileged operating abilities as other processes on the host. As a result, a Pod that can use the NET_ADMIN and NET_RAW capabilities can in theory manipulate not only its own Pod network; if mishandled, it may also affect the network configuration of other Pods on the same worker node, or even of the host itself. This is generally not acceptable in environments where the permissions of containerized applications are strictly restricted.

Starting with Kubernetes 1.11, the PodSecurityPolicy (PSP) resource can be used to restrict the default permissions or capabilities available to Pods in the cluster. By applying the following PSP, Pods in the cluster can be prevented from using any privileged capabilities:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: pod-limited
spec:
  allowPrivilegeEscalation: false
  # Do not allow any capabilities
  # allowedCapabilities:
  #   - '*'
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - secret
  - nfs

As you can imagine, on a cluster with the above restriction in place, the istio-init container cannot obtain the privileges it needs and therefore cannot do its job: the Envoy proxy cannot capture traffic in the Pod, and the functionality of Istio as a whole is off the table. Note also that PSP is being deprecated starting with Kubernetes 1.21; newer clusters may use alternative mechanisms to restrict Pod permissions.

Solving Privilege Diffusion Problems with Istio CNI

The security risk described above stems from the fact that every business Pod that needs the sidecar must also have the high-privilege istio-init container injected, and business Pods can be created by anyone who uses the cluster. For the cluster, this amounts to an unbounded spread of the attack surface. The usual solution to this kind of problem is to centralize the attack surface: a small number of controllable, high-privilege Pods watch the Pod creation process and apply the iptables rules for each new Pod before it starts.

This is exactly the problem that Istio CNI solves.

Normally, to watch Pod creation and deletion events, you can use the Informer mechanism to easily receive the creation and deletion events of various resources in the cluster; this is the common pattern for developing controllers in Kubernetes. However, issuing iptables rules for Pods differs from ordinary controller work: when a Pod creation or deletion event occurs, the handler must enter the Linux network namespace of the corresponding container on the worker node where the Pod runs and apply the iptables rules there. In terms of where the work executes, this looks more like a DaemonSet. On the other hand, handling Pod creation and deletion events overlaps with some of the commands defined by CNI, the mechanism Kubernetes defines for providing networking to Pods and Services. Since applying iptables rules is also a network-related operation, the Istio team chose to provide this capability directly as a CNI plugin.

The workflow of Istio CNI is as follows:

The Istio CNI DaemonSet installs the Istio CNI plugin executable onto each worker node. The plugin is later invoked by Kubernetes whenever a business Pod is created or destroyed, and it applies the iptables rules at that point. Because the plugin runs on the worker node, it holds high privileges, but those privileges can be centrally managed and are therefore controlled. The business Pods no longer need elevated permissions to configure iptables; they only need to run a simple checker that verifies the rules are in place before the business container starts.

This logic can be seen in the Istio sidecar injection template: when the Istio CNI feature is enabled, the containers that Istio injects into the Pod no longer require high privileges.
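
As a rough sketch, the init container injected when Istio CNI is enabled only validates that the redirect rules are present and drops all capabilities; the container name, image tag, and arguments below are taken from a recent Istio release and may differ in yours:

  initContainers:
  - name: istio-validation
    image: docker.io/istio/proxyv2:1.13.2   # illustrative image tag
    args:
    - istio-iptables
    - --run-validation      # only verify the redirect rules applied by Istio CNI
    - --skip-rule-apply     # do not try to apply iptables rules itself
    securityContext:
      runAsUser: 1337
      runAsNonRoot: true
      capabilities:
        drop:
        - ALL               # no NET_ADMIN / NET_RAW required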

Problems installing Istio CNI in TKE Stack

Unlike ordinary CNI plugins, Istio CNI does not provide general IP address management (IPAM) or networking (SDN) functionality. It is only responsible for applying the rules above after the Pod's network has already been set up. Istio CNI therefore does not and cannot replace the cluster's existing CNI plugin. In other words, in addition to Istio CNI, the Kubernetes cluster still needs other software responsible for IPAM and SDN, such as the familiar Flannel or Calico.

To work with the various CNI plugins that may already exist, Istio CNI can run either as an "interface plugin" or as a "chained plugin" attached to an existing interface plugin. In CNI terminology, an interface (network card) plugin is a CNI plugin that creates a virtual network interface, while a plugin chain lets several plugins add functionality to an interface that has already been created. The chained mode fits Istio CNI's positioning well and is also its default mode of operation. In this mode, Istio CNI first inspects the cluster's current CNI configuration: if it is already a plugin chain, Istio CNI appends itself to the end of the chain; if the current plugin is an interface plugin, it converts the configuration into a plugin chain and then appends itself to the end.

TKE Stack is an open source Kubernetes distribution led by Tencent. Compared with community Kubernetes, TKE Stack provides richer features such as stronger network access capabilities, multi-cluster management, and container resource management integrated with business and user dimensions. TKE Stack is also the open-source counterpart of the container service provided by Tencent Cloud; inside Tencent it runs ultra-large-scale clusters totaling hundreds of thousands of cores and has been operating stably for years.

The default CNI plugin of TKE Stack is Galaxy, a "meta-CNI" framework that lets the cluster plug in various network plugins: on top of it, Pods in the cluster can get a common overlay network through plugins such as Flannel, or gain more powerful capabilities such as underlay networking through other plugins. A typical underlay network, for example, can give a Pod an IP address from another subnet (such as the subnet the worker node is in), a fixed IP address, and so on.

Testing showed that, on some clusters, the default chained mode of the Istio CNI plugin is not compatible with Galaxy. The reason is a flaw in Istio CNI's configuration conversion: on these clusters the original Galaxy CNI configuration is in interface plugin form (i.e. 00-galaxy.conf), and after Istio CNI processes it, the resulting configuration can no longer be recognized and handled by Galaxy CNI.

Specifically, while copying the original configuration into plugin chain form, Istio CNI removes the cniVersion field (if any) from the original configuration, and in the newly generated plugin chain configuration file 00-galaxy.conflist forces the version to 0.3.1, which Galaxy CNI does not yet support. Entering the Galaxy CNI DaemonSet container and simulating the CNI version check command shows that the highest cniVersion supported by Galaxy CNI on such a cluster is 0.2.0 (see the relevant Galaxy source code):

CNI_COMMAND=VERSION /opt/cni/bin/galaxy-sdn </etc/cni/net.d/00-galaxy.conf

After Istio CNI runs in chained mode in such a TKE Stack cluster, new Pods can no longer be created. The specific error is: plugin galaxy-sdn does not support config version "0.3.1". This message can be found both in the Pod creation events and in the kubelet logs.
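
On an affected node, the rewritten /etc/cni/net.d/00-galaxy.conflist looks roughly like the sketch below (heavily abbreviated and purely illustrative; the real files carry many more fields). The top-level cniVersion of 0.3.1 is exactly what Galaxy rejects:

{
  "name": "galaxy-sdn",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "galaxy-sdn"
    },
    {
      "type": "istio-cni"
    }
  ]
}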

To make matters worse, uninstalling Istio CNI at this point does not restore Galaxy CNI either. Although the Istio CNI uninstall process tries to roll back the changes it made, the rollback only removes the Istio CNI entries from the newly created conflist configuration; it does not restore the configuration file to the original conf format, so the unrecognized version number 0.3.1 is preserved.

At this point, the administrator must log in to each worker node, manually extract the first element of the plugins array in /etc/cni/net.d/00-galaxy.conflist, and save it as a separate conf file, /etc/cni/net.d/00-galaxy.conf. Then delete the Pods that are stuck in creation and wait for them to be rebuilt; Pod creation should recover automatically.
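
A rough sketch of those steps as shell commands, assuming jq is available on the node (run on every affected worker node, and keep a backup first):

# Keep a backup of the conflist written by Istio CNI
cp /etc/cni/net.d/00-galaxy.conflist /tmp/00-galaxy.conflist.bak

# Extract the first (Galaxy) entry of the plugins array into a standalone conf file
jq '.plugins[0]' /etc/cni/net.d/00-galaxy.conflist > /etc/cni/net.d/00-galaxy.conf

# Remove the conflist so that kubelet falls back to the restored conf file
rm /etc/cni/net.d/00-galaxy.conflist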

Solutions to Istio CNI Installation Problems

Once the cause of the problem is clear, the fix is straightforward. There are two approaches to installing Istio CNI in a TKE Stack cluster:

  1. Run Istio CNI as an interface (network card) plugin
  2. Upgrade the Galaxy CNI version in the TKE Stack cluster

Running Istio CNI as an Interface (Network Card) Plugin

Since Istio CNI already offers an interface plugin mode, enabling it is the easier fix. When installing Istio CNI, turn off the chained parameter so that Istio CNI runs as an interface (network card) plugin. If you are adding Istio CNI to an existing Istio cluster, you may also need to manually edit the values data in the Istio injection template configuration, i.e. the configmap/istio-sidecar-injector resource in the istio-system namespace, changing its istio_cni section to:

"istio_cni": {
  "enabled": true,
  "chained": false
}
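
If you install or upgrade Istio through the operator or istioctl, the same setting can be expressed in an IstioOperator resource. This is a sketch under the assumption that your Istio version exposes the cni component and the values.cni.chained flag:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-with-cni
  namespace: istio-system
spec:
  components:
    cni:
      enabled: true        # deploy the Istio CNI DaemonSet
  values:
    cni:
      chained: false       # run Istio CNI as an interface (network card) plugin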

Note that when Istio CNI runs in interface plugin mode, the k8s.v1.cni.cncf.io/networks annotation is added to the Pod at injection time, to tell any CNI plugin on the cluster that understands this annotation to invoke Istio CNI. Galaxy CNI, as a meta-CNI plugin, supports this annotation: when Galaxy encounters it, the default Galaxy network plugin is skipped and the CNI plugin(s) listed in the annotation are used instead. However, Istio CNI does not actually provide networking, so running only Istio CNI would leave the Pod without a proper IP address and it would still fail to be created. The following snippet from the Istio injection template shows the logic:

  {{- if and (.Values.istio_cni.enabled) (not .Values.istio_cni.chained) }}
  {{ if isset .ObjectMeta.Annotations `k8s.v1.cni.cncf.io/networks` }}
    k8s.v1.cni.cncf.io/networks: "{{ index .ObjectMeta.Annotations `k8s.v1.cni.cncf.io/networks`}}, istio-cni"
  {{- else }}
    k8s.v1.cni.cncf.io/networks: "istio-cni"
  {{- end }}
  {{- end }}

As the code above shows, the Istio template reads any annotation value already on the Pod and appends istio-cni to the end. So we only need to list the Galaxy default network name in that annotation when creating the Pod, and the Pod can then be created correctly. The default network name of the current Galaxy CNI can be found in the DefaultNetworks setting of configmap/galaxy-etc in the kube-system namespace.

kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: galaxy-flannel
  name: my-pod
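
To find the default network name used in the annotation above (galaxy-flannel in this example), one simple way is to dump the Galaxy configuration and look for the DefaultNetworks entry; the grep below assumes the key is spelled as described above:

# Print the Galaxy configuration and locate the DefaultNetworks entry
kubectl -n kube-system get configmap galaxy-etc -o yaml | grep -i defaultnetworks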

Actual testing shows that, with Istio CNI running in interface plugin mode and the original network named in the Pod's annotation, Pods run successfully on TKE Stack and all of Istio's features work normally.

Upgrading Galaxy CNI in the TKE Stack Cluster

Although running Istio CNI as a standalone interface plugin solves the problem of Pods failing to be created, it requires extra annotations on Pods, which adds complexity for application developers or deployment pipelines and may even interfere with other network features that Galaxy CNI provides to Pods. The ideal outcome is to support these capabilities transparently, the way Istio CNI does natively.

Fortunately, the latest 1.0.8 version of the Galaxy CNI code supports every cniVersion up to and including 0.4.0. By upgrading Galaxy CNI to this version, you can run Istio CNI in its default chained mode. If the components in your cluster have been customized by your own team, contact the team maintaining those customizations to confirm which upstream version they are based on and ask them to upgrade the Galaxy component.

After upgrading the Galaxy components to the latest version, re-running the version check from earlier shows that the new Galaxy supports multiple cniVersions, including 0.3.1.
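
The command is the same as before; the output shown here is indicative of the standard CNI VERSION response format rather than Galaxy's exact wording:

CNI_COMMAND=VERSION /opt/cni/bin/galaxy-sdn </etc/cni/net.d/00-galaxy.conf
# Expected to report something like:
# {"cniVersion":"0.4.0","supportedVersions":["0.1.0","0.2.0","0.3.0","0.3.1","0.4.0"]}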

Summary

As a popular service mesh, Istio provides microservices with powerful traffic governance and rich observability in a way that is close to non-intrusive. These capabilities come from Istio's ability to completely capture the network traffic entering and leaving business containers.

Although Istio offers ways to install itself into a specific namespace, using Istio as a cluster-level base platform is still the first choice for many teams. In complex environments, such as shared multi-tenant clusters or clusters with special security policy requirements, installing and operating Istio faces some unique challenges.

This article briefly introduced how the Istio CNI plugin works and why it is needed, in clusters with strict security restrictions, to apply the iptables rules that Istio's traffic management relies on, as well as the problems encountered when running Istio CNI in a TKE Stack cluster and their solutions. With these methods, the full functionality of Istio can be provided in a way that is compatible with the cluster's existing network features, while business applications run with lower privileges.
