Talking about Cgroups V2

Wang Xigang, 360 Cloud Computing

Editor's note

Cgroups v1 was introduced in a previous article, "Talking about Cgroups". Cgroups provide the foundation for container virtualization. As container technology has developed, managing the controllers in Cgroups v1 has become increasingly complicated. Cgroups v2 simplifies the hierarchy and became an official feature in kernel 4.5.0. This article introduces the background of Cgroups v2 and the specific changes it makes compared with v1.

1

Background

I previously wrote an article introducing cgroup v1. Our current Kubernetes (k8s) deployment uses CephFS for data storage, and with multiple tenants, IO needs to be limited. In cgroup v1, however, memcg and blkio do not cooperate, so throttling of buffered IO has not been achieved. The implementation of cgroup v1 in the kernel has also become chaotic. The main reason is that, for the sake of flexibility, cgroup v1 lets a process belong to different groups in multiple hierarchies. In practice, multiple hierarchies are of little use, because each controller can only be attached to one hierarchy, so in actual deployments there is usually one controller per hierarchy.

Apart from adding code complexity and making the design harder to understand, multiple hierarchies bring few benefits. On the one hand, tracking a process's membership across all controllers becomes complicated; on the other hand, it is difficult for controllers to work together, because they may belong to different hierarchies. For these reasons, starting with kernel 3.16, the kernel began shifting to a single hierarchy (the unified hierarchy), which also makes it possible to limit buffered IO.

2

Changes in Cgroups v2

To address these problems, Cgroups v2 replaces the multiple hierarchies of v1 with a single unified hierarchy and mounts all controllers to it.

The kernel has not removed Cgroups v1, so v1 and v2 can coexist. However, the same controller cannot be used by both versions at the same time.

The following are five improvements in Cgroups v2:

  • All controllers in Cgroups v2 are mounted under a single unified hierarchy. Mounting different controllers to different hierarchies, as v1 allows, is no longer possible.

  • Processes can only be attached to the root ("/") cgroup and to leaf nodes of the cgroup tree.

  • The controllers a cgroup can use are specified through the cgroup.controllers and cgroup.subtree_control files.

  • The tasks file from v1 and the cgroup.clone_children file used by the cpuset controller have been removed.

  • Notification when a cgroup becomes empty has been improved and is now delivered through the cgroup.events file.

3

Unified hierarchy

Cgroups v1 allows different controllers to be mounted to different hierarchies. This is flexible, but in practice users do not need it. Therefore, in Cgroups v2, all controllers are mounted to a single hierarchy.

You can mount Cgroups v2 with the following command; all available controllers are attached automatically.

mount -t cgroup2 none $MOUNT_POINT

A controller cannot be used by Cgroups v1 and v2 at the same time. To use a controller in Cgroups v2 that is already in use by Cgroups v1, you must first unmount it from Cgroups v1.

Note that at boot, systemd uses Cgroups v1 by default and mounts the available controllers under /sys/fs/cgroup. To disable Cgroups v1 at boot, add cgroup_no_v1=all to the kernel command line, for example by setting GRUB_CMDLINE_LINUX_DEFAULT="cgroup_no_v1=all" in /etc/default/grub and then regenerating the GRUB configuration. (all disables v1 for every controller; to disable only specific controllers, replace all with a comma-separated list of their names.) Those controllers can then be used in Cgroups v2.
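After changing the boot parameters, one way to confirm where each controller ended up is to inspect /proc/self/mounts: a v2 hierarchy appears with the filesystem type cgroup2, and each v1 hierarchy appears with the type cgroup and its bound controllers listed among the mount options. The Python sketch below assumes that layout. The V1_CONTROLLERS set is our own illustrative list, not something read from the kernel; on a real system /proc/cgroups is the authoritative list.

```python
import os
from typing import Dict, List, Set, Tuple

# Illustrative (assumed) list of v1 controller names; /proc/cgroups on the
# target machine is the authoritative source.
V1_CONTROLLERS = {
    "blkio", "cpu", "cpuacct", "cpuset", "devices", "freezer", "hugetlb",
    "memory", "net_cls", "net_prio", "perf_event", "pids", "rdma",
}


def scan_cgroup_mounts(mounts_text: str) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Return (cgroup2 mount points, {v1 mount point: controllers bound there})."""
    v2_mounts: List[str] = []
    v1_mounts: Dict[str, Set[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "cgroup2":
            v2_mounts.append(mount_point)
        elif fs_type == "cgroup":
            # v1 mount options mix generic flags (rw, nosuid, ...) with controller names.
            v1_mounts[mount_point] = set(options.split(",")) & V1_CONTROLLERS
    return v2_mounts, v1_mounts


if __name__ == "__main__" and os.path.exists("/proc/self/mounts"):
    with open("/proc/self/mounts") as f:
        v2, v1 = scan_cgroup_mounts(f.read())
    print("cgroup2 mounted at:", v2)
    for path, controllers in sorted(v1.items()):
        print("cgroup v1:", path, sorted(controllers))
```

With cgroup_no_v1=all in effect, every v1 entry should map to an empty set; a named hierarchy such as systemd's may still appear, but it carries no controllers.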

4

Controllers

Currently, cgroup v2 supports the following controllers:

  • io (since Linux 4.5)

  • memory (since Linux 4.5)

  • pids (since Linux 4.5)

  • perf_event (since Linux 4.11)

  • rdma (since Linux 4.11)

  • cpu (since Linux 4.15)

5

Subtree control

Each cgroup in the hierarchy contains the following two files:

  • cgroup.controllers

    A read-only file that lists all controllers available to this cgroup.

  • cgroup.subtree_control

    Lists the controllers enabled for this cgroup's children. The controllers in cgroup.subtree_control are always a subset of those in cgroup.controllers.

To modify cgroup.subtree_control, write a space-separated list of controller names, each prefixed with "+" to enable it or "-" to disable it. For example:

echo '+pids -memory' > x/y/cgroup.subtree_control
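The write format (a "+" or "-" prefix per controller, with enabled controllers limited to what cgroup.controllers lists) can be sketched as a Python helper that builds the string to write. The helper names are hypothetical and our own; the kernel does its own validation when the string is written (in our understanding it rejects enabling an unavailable controller, while disabling one that is not enabled is simply ignored).

```python
from typing import Iterable, Set


def parse_controllers(text: str) -> Set[str]:
    """cgroup.controllers and cgroup.subtree_control list space-separated names."""
    return set(text.split())


def build_subtree_control(available_text: str,
                          enable: Iterable[str] = (),
                          disable: Iterable[str] = ()) -> str:
    """Build a write string such as '+pids -memory' for cgroup.subtree_control.

    available_text is the content of the same cgroup's cgroup.controllers file.
    Enabling a controller that is not listed there raises ValueError, because
    cgroup.subtree_control can only hold a subset of cgroup.controllers.
    """
    available = parse_controllers(available_text)
    enable, disable = list(enable), list(disable)
    overlap = set(enable) & set(disable)
    if overlap:
        raise ValueError(f"both enabled and disabled: {sorted(overlap)}")
    tokens = []
    for name in enable:
        if name not in available:
            raise ValueError(f"controller {name!r} is not available in this cgroup")
        tokens.append("+" + name)
    # Disabling is not checked here; only enabling is constrained by cgroup.controllers.
    tokens.extend("-" + name for name in disable)
    return " ".join(tokens)
```

For example, build_subtree_control("cpu io memory pids", enable=["pids"], disable=["memory"]) returns "+pids -memory", the same string the echo command writes.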

The specific organizational structure of Cgroups v2 is shown in the following figure:

[Figure: organizational structure of Cgroups v2]

6

"no internal processes" rule

Unlike Cgroups v1, Cgroups v2 only allows processes to be attached to leaf nodes (the root cgroup is the exception). In other words, a process cannot be placed in a non-root cgroup that enables controllers for its children through cgroup.subtree_control.
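The "no internal processes" rule can be expressed as two checks, one for each direction, sketched below in Python. Both functions are hypothetical helpers of our own: they take file contents as strings, treat the root cgroup as exempt, and ignore the threaded mode added in Linux 4.14, which relaxes the rule for threaded controllers.

```python
def may_attach_process(is_root: bool, subtree_control_text: str) -> bool:
    """Can a process be moved into this cgroup (written to its cgroup.procs)?

    The root cgroup is exempt. Any other cgroup may hold processes only while
    its cgroup.subtree_control is empty, i.e. it enables no controllers for
    its children.
    """
    return is_root or not subtree_control_text.split()


def may_enable_controllers(is_root: bool, procs_text: str) -> bool:
    """Can this cgroup write '+<controller>' to its cgroup.subtree_control?

    The reverse direction of the same rule: a non-root cgroup that already
    contains processes (non-empty cgroup.procs) cannot enable controllers.
    """
    return is_root or not procs_text.split()
```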

7

cgroup.events file

Cgroups v2 also improves the mechanism for getting notified when a cgroup becomes empty.

The release_agent and notify_on_release files used in Cgroups v1 have been removed in v2. Instead, v2 uses the cgroup.events file, a read-only file containing one key-value pair per line, with the key and value separated by a space.

Currently, this file contains only one key, populated, whose value is 0 or 1: 0 means that neither the cgroup nor any of its descendants contains a process, and 1 means that the cgroup or one of its descendants contains at least one process.
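Reading the file amounts to splitting each line into a key and a value. The Python sketch below (the helper names are our own) parses cgroup.events and checks the populated key. It only reads a snapshot; according to the cgroups(7) manual page, changes to this file can also be watched with inotify or poll, which the sketch does not cover.

```python
from typing import Dict


def parse_cgroup_events(text: str) -> Dict[str, str]:
    """Parse cgroup.events: one 'key value' pair per line, separated by a space."""
    events: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or unexpected lines
            key, value = parts
            events[key] = value
    return events


def is_populated(events_text: str) -> bool:
    """True when 'populated' is 1, i.e. the cgroup or a descendant has processes."""
    return parse_cgroup_events(events_text).get("populated") == "1"
```

For a hypothetical cgroup named mygroup, the input would be the content of /sys/fs/cgroup/mygroup/cgroup.events.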

8

cgroup.stat file

Each cgroup in the Cgroups v2 hierarchy contains a read-only file, cgroup.stat, whose content is also in key-value form. Currently this file contains the following two keys:

  • nr_descendants

    The number of live (visible) descendant cgroups under this cgroup.

  • nr_dying_descendants

    The number of dying descendant cgroups under this cgroup. A cgroup enters the dying state after it is deleted and stays there while the kernel frees its resources, before it is finally destroyed.
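Since cgroup.stat is also plain key-value text, it can be turned into integer counters with a few lines of Python. This is a sketch with a function name of our own; it keeps whatever keys it finds rather than hard-coding the two listed above.

```python
from typing import Dict


def parse_cgroup_stat(text: str) -> Dict[str, int]:
    """Parse cgroup.stat ('key value' per line) into integer counters."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or unexpected lines
            stats[parts[0]] = int(parts[1])
    return stats
```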

9

Limit on the number of descendant Cgroups

Each cgroup in the Cgroups v2 hierarchy also contains two files for viewing and setting limits on the number of descendant cgroups:

  • cgroup.max.depth (since Linux 4.14)

    Defines the maximum depth of descendant cgroups below this cgroup. A value of 0 means that no child cgroup can be created; an attempt to create one fails with EAGAIN. The value max means no limit and is the default.

  • cgroup.max.descendants (since Linux 4.14)

    The maximum number of live descendant cgroups this cgroup may have. The default value, max, means no limit. An attempt to create a descendant beyond the limit fails with EAGAIN.
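The two limits can be combined into a single pre-check for creating a new cgroup, sketched below in Python. The names Limits, parse_limit and check_mkdir are our own, and the sketch returns EAGAIN where, per the description above, mkdir would fail. It also applies the limits of every ancestor of the new cgroup rather than only its parent; that is our reading of the kernel's behavior and should be treated as an assumption.

```python
import errno
from typing import List, NamedTuple, Optional


class Limits(NamedTuple):
    max_depth: Optional[int]        # cgroup.max.depth; None stands for "max"
    max_descendants: Optional[int]  # cgroup.max.descendants; None stands for "max"
    nr_descendants: int             # live descendants, as reported in cgroup.stat


def parse_limit(text: str) -> Optional[int]:
    """Convert the content of a cgroup.max.* file to an int, or None for 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def check_mkdir(ancestors: List[Limits]) -> int:
    """Return 0 if a new cgroup may be created, else errno.EAGAIN.

    `ancestors` starts with the new cgroup's parent, then the parent's parent, etc.
    """
    level = 1  # depth of the new cgroup relative to the ancestor being checked
    for limits in ancestors:
        if (limits.max_descendants is not None
                and limits.nr_descendants >= limits.max_descendants):
            return errno.EAGAIN
        if limits.max_depth is not None and level > limits.max_depth:
            return errno.EAGAIN
        level += 1
    return 0
```

For example, a parent whose cgroup.max.depth is 0 yields EAGAIN for any new child, matching the behavior described above.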

Source: blog.51cto.com/15127564/2666789