Long-awaited: K8S finally welcomes swap memory Beta support!

Kubernetes 1.22 introduced Alpha support for swap memory on Linux nodes, and version 1.28 promotes the feature to Beta with many improvements. Earlier Kubernetes releases did not support swap memory on Linux at all. Since the Alpha release, the Kubernetes project team has put a lot of effort into the Beta, making swap support more stable, robust, and user-friendly.

To use the feature, enable the NodeSwap feature gate on the kubelet and set the memorySwap.swapBehavior option to define how the node uses swap memory.

Swap memory is not supported before 1.22

In previous releases, Kubernetes did not support the use of swap memory on Linux because it was difficult to provide guarantees and account for pod memory utilization when swapping was involved. As part of the early design of Kubernetes, swap support was considered out of scope, and the kubelet would fail to start by default if swap was detected on a node.

Here are some of the reasons memory swapping was disabled in Kubernetes versions prior to 1.22:

  1. Performance: Memory swapping can hurt container performance. When a process in a container accesses memory that has been swapped out, the system has to read it back from disk, which slows the process down.

  2. Uncertainty: Memory swapping introduces uncertainty. Containers should run within their configured memory limits. If swapping occurs, the memory actually available to a container depends on factors outside its control, which can make the application unstable.

  3. Resource guarantee: Kubernetes relies on cgroups and Linux kernel features to enforce resource isolation and limits when scheduling and managing containers. Memory swapping can break this isolation, making it harder for Kubernetes to manage container resources.

  4. Security considerations: Memory swapping can leak sensitive information to the swap partition, which poses a security threat.

However, many use cases[1] would benefit from swap-enabled Kubernetes nodes, including improved node stability, better support for applications with high memory overhead but small working sets, the use of memory-constrained devices, and memory flexibility.

1.22 starts to support swap memory

Kubernetes 1.22 introduced Alpha support for swap memory[2], configured node by node for Kubernetes workloads running on Linux nodes. Now, in version 1.28, support for swap memory on Linux nodes has been promoted to Beta, with many new improvements.

Prior to version 1.22, Kubernetes did not provide support for swap memory on Linux systems. This is due to the inherent difficulty in guaranteeing and calculating Pod memory utilization when swapping memory is involved. As a result, swap memory support was considered beyond the initial design scope of Kubernetes, and the default behavior of the kubelet is to fail to start if swap memory is detected on a node.

In version 1.22, swap support for Linux was first introduced as an Alpha feature. This was a major step forward, giving Linux users their first opportunity to try out swap memory. As an Alpha, however, it was not fully developed and had several issues, including inadequate cgroup v2 support, insufficient metrics and summary API statistics, and insufficient testing.

Swap memory has many use cases in Kubernetes [3] and is relevant to a large number of users. For that reason, the Node Special Interest Group (SIG Node) in the Kubernetes project put significant effort into bringing swap support on Linux nodes to Beta. Compared with the Alpha version, the kubelet's swap support is more stable and robust, more user-friendly, and fixes many known defects. This promotion to Beta is a critical step toward the goal of fully supporting swap memory in Kubernetes.

How to use this feature?

Enable the NodeSwap feature gate on the kubelet to use this feature on nodes that have swap memory configured. Additionally, you must set failSwapOn to false in the kubelet configuration, or disable the deprecated --fail-swap-on command line flag.

You can configure the memorySwap.swapBehavior option to define how the node uses swap memory. For example:

# Put this fragment in the kubelet configuration file
memorySwap:
  swapBehavior: UnlimitedSwap

The available configuration options for swapBehavior are:

  • UnlimitedSwap (default): Kubernetes workloads can use as much swap memory as they request, up to the system limit.

  • LimitedSwap: Kubernetes workloads have limited use of swap memory. Only Burstable QoS Pods [4] are allowed to use swap memory.

If memorySwap is not configured and the feature gate is enabled, the kubelet applies the same behavior as the UnlimitedSwap setting by default.

Please note that NodeSwap is supported only with cgroup v2. As of Kubernetes v1.28, using swap memory with cgroup v1 is no longer supported.
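Since the feature depends on cgroup v2, it can help to check a node before enabling the feature gate. The following is a minimal sketch, not an official tool: cgroup_version is a hypothetical helper name, and it assumes the hierarchy is mounted at the usual /sys/fs/cgroup path. It relies on the fact that a cgroup v2 mount exposes a cgroup.controllers file at its root.

```shell
# Print "v2" if the directory looks like a cgroup v2 (unified) mount,
# otherwise print "v1". Defaults to the usual mount point.
# (cgroup_version is a hypothetical helper name, not a Kubernetes tool.)
cgroup_version() {
  # cgroup v2 exposes a cgroup.controllers file at the hierarchy root.
  if [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "v1"
  fi
}
```

Running cgroup_version with no arguments on the node checks the default mount point; pass another directory if your distribution mounts cgroups elsewhere.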

Use kubeadm to install a cluster that supports swap memory

Before you begin

This demonstration requires the kubeadm tool to be installed. The installation process follows the steps described in the kubeadm installation guide [5]. If swap memory is already enabled on the node, you can proceed with cluster creation. If it is not, follow the instructions in the next section to enable it.
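To tell whether swap is already enabled on a node, one option is to read the SwapTotal field from /proc/meminfo. This is a rough sketch under that assumption; swap_total_kib and swap_enabled are hypothetical helper names used only for illustration.

```shell
# Print the node's total swap in KiB, read from a meminfo-style file.
# (Hypothetical helper; defaults to /proc/meminfo.)
swap_total_kib() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) only when some swap is configured.
swap_enabled() {
  total="$(swap_total_kib "$1")"
  # Treat a missing SwapTotal line as zero swap.
  [ "${total:-0}" -gt 0 ]
}
```

For example, `swap_enabled && echo "swap is on"` prints the message only on a node where swap is active.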

Create a swap memory file and enable the swap memory function

I'll demonstrate creating 4GiB of unencrypted swap memory.

dd if=/dev/zero of=/swapfile bs=128M count=32
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon -s # show the swap summary; the swap file stays enabled only until the node is rebooted

To enable the swap file at boot time, add an entry such as /swapfile swap swap defaults 0 0 to the /etc/fstab file.
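If you script this step, it is worth making it idempotent so that repeated runs do not duplicate the entry. Here is a small sketch of one way to do that; add_swap_to_fstab is a hypothetical helper name, and the second argument exists only so the function can be tried against a scratch file before touching /etc/fstab (which requires root).

```shell
# Append a swap entry for the given file to an fstab-style file,
# unless an identical line is already present.
#   $1: path of the swap file, for example /swapfile
#   $2: fstab file to edit (defaults to /etc/fstab)
add_swap_to_fstab() {
  entry="$1 swap swap defaults 0 0"
  # -x matches whole lines only, -F treats the entry as a fixed string.
  grep -qxF "$entry" "${2:-/etc/fstab}" 2>/dev/null ||
    echo "$entry" >> "${2:-/etc/fstab}"
}
```

Calling `add_swap_to_fstab /swapfile` twice leaves a single entry in the file.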

Set up a Kubernetes cluster that uses swap-enabled nodes

For clarity, here is an example kubeadm configuration file, kubeadm-config.yaml, for a cluster with swap memory enabled.

---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: LimitedSwap

Next, run kubeadm init --config kubeadm-config.yaml to create a single-node cluster. During initialization, a warning reports that swap memory is enabled on the node and that the kubelet will fail if failSwapOn is set to true. We plan to remove this warning in a future release.

How to determine swap memory limit via LimitedSwap?

Configuring swap memory, including its limits, is a challenge. Not only is it prone to configuration errors, but because swap is a system-level property, any misconfiguration can compromise the entire node rather than just a specific workload. To mitigate this risk and keep the node healthy, the beta version of swap support configures swap limits automatically.

With LimitedSwap, Pods that do not fall under the Burstable QoS class (that is, BestEffort and Guaranteed QoS Pods) are prohibited from using swap memory. BestEffort QoS Pods exhibit unpredictable memory consumption patterns and provide no information about their memory usage, which makes it difficult to determine a safe swap allocation. Guaranteed QoS Pods, in contrast, are typically used for applications that rely on exactly the resources the workload specifies, with memory immediately available. To maintain the security and node health guarantees described above, these Pods are not permitted to use swap memory when LimitedSwap is in effect.

Before calculating swap memory limits in detail, it is necessary to define the following terms:

  • nodeTotalMemory: The total amount of physical memory available on the node.

  • totalPodsSwapAvailable: The total amount of swap memory on the node that is available to Pods (some swap memory can be reserved for system use).

  • containerMemoryRequest: The container's memory request.

The swap memory limit is configured as: (containerMemoryRequest / nodeTotalMemory) × totalPodsSwapAvailable

In other words, the swap a container can use scales with its memory request as a fraction of the node's total physical memory, applied to the total amount of swap memory available to Pods on the node.

It's worth noting that for containers in Burstable QoS Pods, you can choose not to use swap memory by setting the memory limit to be the same as the memory request. Containers configured in this way will not have access to swap memory.
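To make the formula concrete, here is a small worked example in shell arithmetic. It is only an illustration of the proportion, not the kubelet's actual code: swap_limit is a hypothetical helper name, the node sizes are made up, and the integer division here may round differently from the kubelet.

```shell
# Swap limit for one container, following the formula above:
#   (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable
#   $1: containerMemoryRequest
#   $2: nodeTotalMemory
#   $3: totalPodsSwapAvailable
# All three arguments must use the same unit; working in MiB keeps the
# intermediate product small enough for shell integer arithmetic.
swap_limit() {
  # Multiply first so that integer division truncates only once.
  echo $(( $1 * $3 / $2 ))
}

# Hypothetical node: 16 GiB of RAM, 4 GiB of swap available to Pods,
# and a container that requests 1 GiB of memory (all values in MiB).
swap_limit 1024 16384 4096   # prints 256
```

The container requests one sixteenth of the node's memory, so it may use one sixteenth of the Pod-available swap, that is, 256 MiB.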

How does this feature work?

There are many possible ways to use swap memory on a node. When swap memory is provisioned and available on a node, SIG Node proposed[6] that the kubelet should:

  • Be able to start with swap memory enabled.

  • By default, instruct the Container Runtime Interface (CRI) implementation not to allocate swap memory for Kubernetes workloads.

The swap memory configuration on the node is exposed to the cluster administrator through memorySwap[7] in KubeletConfiguration. As a cluster administrator, you can specify the behavior of a node when swap memory is present by setting memorySwap.swapBehavior.

The kubelet uses the CRI (Container Runtime Interface)[8] API to instruct the container runtime to configure specific cgroup v2 parameters (such as memory.swap.max) so that the container gets the desired swap configuration. The runtime is then responsible for writing these settings to the container-level cgroup.
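You can inspect the result on a node by reading memory.swap.max from a container's cgroup directory. The sketch below assumes you already know that directory; its location depends on the container runtime and cgroup driver, so the path is left as an argument. container_swap_max is a hypothetical helper name.

```shell
# Print the swap limit recorded for a cgroup v2 directory.
# The file holds a byte count, or "max" for unlimited; 0 means no swap.
#   $1: cgroup directory of the container (location depends on the
#       runtime and cgroup driver, so it is not hard-coded here)
container_swap_max() {
  cat "$1/memory.swap.max"
}
```

A value of 0 for a container you expected to swap is a hint that its Pod is not in the Burstable QoS class, or that its memory limit equals its request.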

How to monitor swap memory?

A significant flaw in the Alpha version is the inability to monitor or view swap memory usage. This issue has been addressed in the beta version introduced in Kubernetes 1.28, which now provides the ability to monitor swap memory usage through a number of different methods.

The beta version of kubelet now supports collecting node-level metric statistics[9], which can be accessed through the /metrics/resource and /stats/summary kubelet HTTP endpoints. This information enables clients to directly access the kubelet to monitor swap memory usage and remaining swap memory when using LimitedSwap. Additionally, a machine_swap_bytes metric has been added to cadvisor to display the total physical swap memory capacity on the machine.
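One way to reach these kubelet endpoints without logging in to the node is through the API server's node proxy. The following is a minimal sketch, assuming kubectl is configured for the cluster and your account is allowed to use the nodes/proxy resource; kubelet_swap_lines is a hypothetical helper name, and the exact metric and field names in the output depend on your Kubernetes version.

```shell
# Fetch a kubelet endpoint through the API server's node proxy and
# keep only the lines that mention swap.
#   $1: node name
#   $2: kubelet endpoint path (defaults to "metrics/resource")
kubelet_swap_lines() {
  kubectl get --raw "/api/v1/nodes/$1/proxy/${2:-metrics/resource}" |
    grep -i swap
}
```

For example, `kubelet_swap_lines my-node` filters the /metrics/resource output, and `kubelet_swap_lines my-node stats/summary` does the same for the summary endpoint (my-node is a placeholder for one of your node names).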

Precautions

Having swap memory available on the system reduces predictability. Because swap is slower than regular memory, sometimes by several orders of magnitude, it can cause unexpected performance regressions. Swap also changes how the system behaves under memory pressure. Because enabling swap lets Kubernetes workloads use larger and less predictable amounts of memory, and because the scheduler cannot account for swap usage, it also increases the risk of noisy neighbors and unintended bin-packing configurations.

The performance of a node with swap memory enabled depends on the underlying physical storage. When swap memory is in use, performance is significantly worse in environments constrained by I/O operations per second (IOPS), such as cloud virtual machines with I/O throttling, than on faster storage media such as SSDs or NVMe.

Therefore, we do not advocate the use of swap memory for performance-constrained workloads or environments. Additionally, it is recommended to use LimitedSwap as this significantly mitigates the risk to the node.

Cluster administrators and developers should benchmark their nodes and applications before using swap memory in production scenarios, and we need your help[10]!

Security Risk

Enabling swap memory on a system without encryption poses a security risk because critical information (such as volumes representing Kubernetes Secrets) may be swapped to disk [11]. If unauthorized individuals gain access to the disk, they could potentially obtain this confidential data. To mitigate this risk, the Kubernetes project strongly recommends that you encrypt the swap memory space. However, handling encrypted swap memory is not the responsibility of the kubelet; rather, it is a general operating system configuration issue that should be addressed at that level. It is the administrator's responsibility to provide encrypted swap memory to mitigate this risk. 

Additionally, as mentioned previously, when LimitedSwap mode is enabled, the user can choose to completely disable the container from using swap memory by setting the memory limit to be the same as the memory request. This setting prevents the corresponding container from accessing swap memory.

Looking to the future

Kubernetes version 1.28 introduced beta support for swap memory on Linux nodes, and we will continue to work towards general availability of swap memory [12]. We expect this to include:

  • Adding the ability to set the amount of system-reserved swap memory based on what the kubelet detects on the host.

  • Adding support for controlling swap memory usage at the Pod level via cgroups.

    • This is still under discussion.

  • Collecting feedback from test use cases.

    • We will consider introducing new swap memory configuration modes, such as a node-wide swap memory limit for workloads.

How can I learn more?

You can check out the current swap memory documentation [13] to learn how to use swap memory in Kubernetes.

For more information, as well as assistance with testing and providing feedback, see Swap Memory KEP-2400[14] and its Swap Memory Design Proposal[15].

reference

https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/

https://kubernetes.io/blog/2023/08/24/swap-linux-beta/

Reference links

[1] Use cases that require swap memory: https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#user-stories
[2] Kubernetes version 1.22 introduces Alpha support for swap memory: https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/
[3] Swap memory use cases: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md#user-stories
[4] Burstable QoS Pods: https://kubernetes.io/zh-cn/docs/concepts/workloads/pods/pod-qos/#burstable
[5] kubeadm installation guide: https://kubernetes.io/zh-cn/docs/setup/product-environment/tools/kubeadm/create-cluster-kubeadm
[6] SIG Node proposal: https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#proposal
[7] memorySwap in KubeletConfiguration: https://kubernetes.io/zh-cn/docs/reference/config-api/kubelet-config.v1
[8] CRI (Container Runtime Interface): https://kubernetes.io/zh-cn/docs/concepts/architecture/cri
[9] Node-level metric statistics: https://kubernetes.io/zh-cn/docs/reference/instrumentation/node-metrics/
[10] We need your help: https://kubernetes.io/zh-cn/blog/2023/08/24/swap-linux-beta/#how-do-i-get-involved
[11] Critical information (for example, volumes representing Kubernetes Secrets) may be swapped to disk: https://kubernetes.io/zh-cn/docs/concepts/configuration/secret/#information-security-for-secrets
[12] General availability of swap memory: https://kubernetes.io/zh-cn/docs/reference/command-line-tools-reference/feature-gates/#feature-stages
[13] Swap memory documentation: https://kubernetes.io/zh-cn/docs/concepts/architecture/nodes/#swap-memory
[14] Swap memory KEP-2400: https://github.com/kubernetes/enhancements/issues/4128
[15] Swap memory design proposal: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md


Origin blog.csdn.net/fly910905/article/details/134922787