Learning Linux Namespaces for Process Isolation

1. Introduction to Linux namespaces

1.1. Concept

A Linux namespace is a mechanism provided by the Linux kernel. It isolates the view of system resources seen by different processes, so that each process works in its own resource space. This gives isolation between processes and a basis for managing their resources.

Namespaces were designed to solve resource conflicts between multiple processes and to provide a lightweight form of virtualization. With namespaces, several independent virtual environments can be created on one physical host, each with its own view of processes, file systems, networks, and users. A namespace wraps a resource that was originally global and shared, and splits it into separate sets. The members of each set see their own instance of that resource as if they had it to themselves.

Suppose process A and process B belong to two different sets of namespaces. Process A can then use every namespaced resource the Linux kernel provides, such as its own host name, its own file system view, and its own process ID numbers. Process B can use the same kinds of resources, but its resources are isolated from those of process A, and neither process is aware of the other's.

From a user's perspective, each namespace looks like a separate Linux machine with its own init process (PID 1), and the PIDs of the other processes increase from there. Namespaces A and B each have an init process with PID 1. Every process in a child namespace is also mapped to a process (with a different PID) in the parent namespace. The parent namespace can see the running state of each child namespace, while sibling child namespaces are isolated from one another.

1.2. Concepts related to virtualization

1.2.1. Common process-level virtualization technologies

  • chroot (Change Root): chroot changes the root directory of a process to a specified directory. It restricts the part of the file system the process can reach, so that the process runs only inside the specified directory tree. This gives a limited degree of process isolation.

  • Linux Containers (LXC): LXC is an operating-system-level virtualization technology that uses Linux kernel features such as namespaces and control groups (cgroups) to isolate processes. By creating and managing multiple container instances, each with its own file system, network, and process space, it provides process-level isolation and resource control.

  • Docker: Docker originally ran containers through LXC and later replaced it with its own runtime (libcontainer, now runc). It provides higher-level container management and deployment tools, and its image and container concepts make applications easier to package, distribute, and deploy.

  • systemd-nspawn: systemd-nspawn is a tool from the systemd project that provides a simple container environment based on Linux namespaces and chroot. It starts a process in an isolated file system environment, giving process isolation and resource control.

1.2.2. Common Linux virtualization technologies

Full virtualization:
Full virtualization simulates a complete virtual hardware environment by running a virtual machine monitor (hypervisor) on the physical hardware. The guest operating system does not need to be modified. Representative products include:

  • KVM (Kernel-based Virtual Machine): KVM is an open source full virtualization solution built into the Linux kernel. KVM acts as the virtual machine monitor and uses hardware virtualization extensions (such as Intel VT-x and AMD-V) to accelerate QEMU (Quick Emulator), which emulates the devices.

Para-virtualization:
Para-virtualization modifies the operating system inside the virtual machine so that it can communicate and cooperate with the virtualization layer, which improves performance and efficiency. Representative products include:

  • Xen: Xen is an open source hypervisor best known for paravirtualization. A paravirtualized guest runs a modified kernel that communicates with the Xen hypervisor directly. With hardware virtualization extensions, Xen can also run unmodified guest operating systems.

Container virtualization (Containerization):
Container virtualization technology achieves isolation of applications and dependencies by creating isolated container instances at the operating system level. Containers share the operating system kernel and are therefore more lightweight and efficient than virtual machines. Representative products include:

  • Docker: Docker is a popular containerization platform that uses container images (Images) to package applications and their dependencies, and runs container instances on the host through the Docker engine. Docker provides convenient build, distribution and deployment tools, making the use of containers simple and efficient.
  • LXC (Linux Containers): LXC is a lightweight containerization solution that uses Linux kernel features such as namespaces and control groups (cgroups) to isolate processes. LXC provides a container runtime environment in which independent user-space instances can run.

Hardware-assisted virtualization:
Hardware-assisted virtualization uses virtualization extensions in physical processors, such as Intel VT-x and AMD-V, to give virtualization hardware support and improve the performance and efficiency of virtual machines. KVM and Xen, mentioned above, can both use hardware-assisted virtualization.

Lightweight virtualization:
Lightweight virtualization is a special form of virtualization that uses operating-system-level features such as namespaces and control groups (cgroups) to isolate processes and control their resources. Representative products include:

  • Docker: In addition to being a containerization platform, Docker is also a lightweight virtualization method. Docker containers run as ordinary processes on the host, with isolated file systems, networks, and process spaces.

1.3. Linux namespace development history

The history of Linux namespaces goes back to 2002, when mount namespaces were merged. Later namespace types were developed by a number of kernel contributors, including Eric W. Biederman and Serge E. Hallyn.

The main milestones are listed below. The kernel versions match the classification table in section 3:

  • 2002: Kernel 2.4.19 introduced the mount namespace, the first namespace type.
  • 2006: Kernel 2.6.19 introduced the UTS namespace and the IPC namespace.
  • 2008: Kernel 2.6.24 introduced the PID namespace, giving each namespace an independent process ID space.
  • 2009: Kernel 2.6.29 completed the network namespace, enabling independent network stacks.
  • 2013: Kernel 3.8 completed the user namespace. In the same year Docker launched its container platform, which built lightweight container virtualization on Linux namespaces and cgroups and set off a boom in container technology.
  • 2016: Kernel 4.6 introduced the cgroup namespace, which gives each namespace its own view of the cgroup hierarchy.
  • 2020: Kernel 5.6 introduced the time namespace.

Over time, Linux Namespace has gradually become an important feature in the Linux kernel, providing a foundation for the development of containerization technology. It provides a flexible and lightweight isolation mechanism that enables the creation of multiple independent virtual environments on a single host, realizing resource isolation and management.

1.4. The role of Linux namespaces

Linux namespaces provide the following functions:

  • Process isolation: Processes in different namespaces are isolated from each other. With the PID namespace, each namespace has its own independent process view.
  • File system isolation: With the mount namespace, each process can have its own set of file system mount points.
  • Network isolation: With the network namespace, each process can have independent network devices, IP addresses, routing tables, and firewall rules.
  • Inter-process communication isolation: With the IPC namespace, System V IPC objects and POSIX message queues are isolated, so processes in different namespaces cannot communicate through them.
  • User and group isolation: With the user namespace, each process can have an independent view of user and group IDs.
  • Resource restriction: Namespaces only isolate what a process can see. Limits on CPU, memory, disk I/O and so on are enforced by control groups (cgroups); the cgroup namespace isolates a process's view of the cgroup hierarchy.

By using Linux namespaces, together with cgroups for resource limits, you can create multiple independent virtual environments on one physical host. Each environment has its own processes, file systems, and network, which gives isolation between processes and improves system security and performance.

2. Main functions of namespace

The main functions and commands are listed below:

Name | Effect
clone() | Creates a new process and can place it in new namespaces. Flags such as CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWIPC, and CLONE_NEWUSER create a new PID, mount, network, IPC, or user namespace.
unshare() | Creates new namespaces on the fly, using the same CLONE_NEW* flags. Unlike clone(), unshare() does not create a new process; it moves the calling process into the new namespaces.
setns() | Adds the calling process to an existing namespace, so that processes can share it. The namespace is identified by a file descriptor and, optionally, a namespace type.
mount() | Mounts a file system in the caller's mount namespace. Parameters include the mount point, the file system type, and the mount options.
umount() | Unmounts a file system in the caller's mount namespace, identified by its mount point.
unshare(CLONE_NEWCGROUP) | Moves the calling process into a new cgroup namespace, which isolates its view of the cgroup hierarchy.
ip netns command | Command-line tool for creating, deleting, listing, and entering named network namespaces.

There are no system calls named netns(), ipc_namespace(), or user_namespace(). Network, IPC, and user namespaces are created with clone() or unshare() and the matching CLONE_NEW* flag, and joined with setns().

2.1. clone()

clone() is one of the system calls used to create new processes in Linux, and it can also create new namespaces. The clone() function declared in the C library is a wrapper: it sets up the stack for the new process and invokes the underlying clone system call, which is hidden from the programmer. clone() is in effect a more general form of fork(), and its flags argument controls how much of the parent's execution context the child shares. There are more than 20 CLONE_* flags, which control different aspects of the clone operation (for example, whether the child shares virtual memory with the parent). The function is described in detail below:

#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...);

The first parameter, fn, is a function pointer to the function the new process executes. It has the prototype int fn(void *arg). The new process starts at the entry point of this function and terminates when the function returns; the integer it returns is the exit code of the child process.

The child_stack parameter is the stack for the new process. The caller must allocate this memory. Because the stack grows downward on most architectures, child_stack normally points to the top of the allocated region. The glibc wrapper fails with EINVAL if fn or child_stack is NULL.

The flags argument is a bitmask that sets the behavior and properties of the new process. Its low byte holds the signal sent to the parent when the child exits, usually SIGCHLD. Commonly used namespace flags include:

  • CLONE_NEWPID: Creates a new PID namespace, so that the new process runs in a separate process view, independent of the parent process and other processes.
  • CLONE_NEWNS: Creates a new mount namespace, so that the new process runs in a separate view of the mount points.
  • CLONE_NEWNET: Creates a new network namespace, so that the new process runs in a separate network view.
  • CLONE_NEWIPC: Creates a new IPC namespace, so that the new process runs in a separate view of IPC resources.
  • CLONE_NEWUSER: Creates a new user namespace, so that the new process runs in a separate view of users and groups.

The arg parameter is passed to fn as its argument.

On success, clone() returns the process ID (PID) of the new process to the caller. On error it returns -1 and sets errno.

By specifying different flag bits, different types of namespaces can be created in the clone() function to achieve process isolation and resource management. This provides the basis for the implementation of containerization technology.

2.2. setns()

setns() is the Linux system call that adds the calling process to an existing namespace.

#define _GNU_SOURCE
#include <sched.h>

int setns(int fd, int nstype);

setns() accepts two parameters:

  • fd: An open file descriptor that refers to an existing namespace. It is obtained by opening /proc/[pid]/ns/[namespace_type], where namespace_type is the type of the namespace, such as pid, mnt, net, ipc, or uts.
  • nstype: The type of namespace the caller expects to join. The value must match the namespace that fd refers to: CLONE_NEWPID corresponds to pid, CLONE_NEWNS to mnt, CLONE_NEWNET to net, and so on. A value of 0 accepts any namespace type.

On success setns() returns 0. On failure it returns -1 and sets errno.

setns() lets the current process join an existing namespace and share the resources and context in it. This is especially useful in container technology, where several containers, or several processes of one container, may share a namespace. In Docker, for example, docker exec runs a new command inside an already running container, which requires setns().
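The two parameters can be illustrated with a minimal Python sketch that calls the C library's setns() through ctypes. The helper names ns_path and join_namespace and the NSTYPE table are illustrative assumptions, not part of any library; the flag values are the ones defined in the kernel headers. Actually joining another process's namespace normally requires root (CAP_SYS_ADMIN), so only the path construction is safe to try unprivileged.

```python
import ctypes
import os

# nstype values as defined in <sched.h>; 0 would mean "accept any type".
NSTYPE = {
    "mnt": 0x00020000,  # CLONE_NEWNS
    "uts": 0x04000000,  # CLONE_NEWUTS
    "ipc": 0x08000000,  # CLONE_NEWIPC
    "pid": 0x20000000,  # CLONE_NEWPID
    "net": 0x40000000,  # CLONE_NEWNET
}


def ns_path(pid, ns_type):
    """Return the /proc file whose descriptor setns() expects as fd."""
    return f"/proc/{pid}/ns/{ns_type}"


def join_namespace(pid, ns_type):
    """Move the calling process into one namespace of process `pid`.

    Hypothetical helper: needs Linux and sufficient privileges.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_path(pid, ns_type), os.O_RDONLY)
    try:
        # Unknown type names fall back to 0, which accepts any namespace.
        if libc.setns(fd, NSTYPE.get(ns_type, 0)) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

As root, join_namespace(1234, "net") would place the caller in the network namespace of the (hypothetical) process 1234. For the pid type, setns() only affects children created afterwards, not the caller itself.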

2.3. unshare()

unshare() detaches the calling process from the namespaces it currently shares. By calling unshare() with the appropriate flags, the process leaves the specified namespaces and becomes the first member of newly created ones.

#define _GNU_SOURCE
#include <sched.h>

int unshare(int flags);

flags: Specifies the types of namespace to create. Several flags can be combined with the bitwise OR operator. Common flags are:

  • CLONE_NEWPID: Create a new PID namespace.
  • CLONE_NEWNET: Create a new network namespace.
  • CLONE_NEWNS: Create a new mount namespace.
  • CLONE_NEWIPC: Create a new IPC namespace.
  • CLONE_NEWUTS: Create a new UTS namespace (for host name and domain name).

On success unshare() returns 0. On failure it returns -1 and sets errno.

unshare() creates new namespaces and makes the calling process their first member, which gives the process independent resources and context. This is very useful for containers and other isolation techniques. The PID namespace is an exception: after unshare(CLONE_NEWPID) the caller stays in its original PID namespace, and the first child it creates becomes PID 1 of the new one.

Note that unshare() can only create new namespaces; it cannot add a process to an existing namespace. For that, use setns().
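Combining flags with bitwise OR can be sketched in Python as follows. The flag values are the ones from the kernel headers; the helpers combine and unshare are hypothetical names for this illustration, and unshare simply forwards to the C library through ctypes. Calling it usually requires root, so treat it as a sketch rather than a ready-made tool.

```python
import ctypes
import os

# Namespace flags as defined in <sched.h>.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def combine(*flags):
    """OR several CLONE_NEW* flags into one bitmask."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def unshare(mask):
    """Call unshare(2) for the calling process (Linux, privileged)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(mask) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For example, unshare(combine(CLONE_NEWUTS, CLONE_NEWNET)) asks for a new UTS namespace and a new network namespace in a single call.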

The unshare command

The unshare command runs a program in newly created namespaces.

unshare [options] [command [arguments...]]

The unshare command accepts the following options:

Option | Description
-m, --mount | Create a new mount namespace.
-u, --uts | Create a new UTS namespace.
-i, --ipc | Create a new IPC namespace.
-n, --net | Create a new network namespace.
-p, --pid | Create a new PID namespace.
-U, --user | Create a new user namespace.
-C, --cgroup | Create a new control group namespace.
-f, --fork | Fork the specified command as a child process of unshare instead of running it directly. This is mainly useful together with --pid.
-r, --map-root-user | Map the current user to the root user inside the new user namespace.
--setgroups ARG | Allow or deny the setgroups() system call in the user namespace; ARG is allow or deny.
-h, --help | Display help information.

The options select which types of namespace to create. If a command is given, unshare runs it in the new namespaces; otherwise it starts a shell.

For example, executing the following command will create a new network namespace and execute bash commands in that namespace:

$ unshare -n bash

# Find the newly created namespace
$ ps -ef | grep bash
root          1      0  0 15:47 pts/0    00:00:00 /bin/bash
root         38      0  0 15:47 pts/1    00:00:00 /bin/bash
root         84     38  0 15:48 pts/1    00:00:00 bash
root        139     84  0 15:49 pts/1    00:00:00 grep --color=auto bash

$ ls -l /proc/84/ns
total 0
lrwxrwxrwx 1 root root 0 Sep  4 15:49 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 ipc -> ipc:[4026533180]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 mnt -> mnt:[4026533178]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 net -> net:[4026534452]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 pid -> pid:[4026533181]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 pid_for_children -> pid:[4026533181]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 time -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 time_for_children -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Sep  4 15:49 uts -> uts:[4026533179]

After this command runs, unshare creates a new network namespace, moves itself into it, and then executes bash, so the new bash shell and all of its child processes run in the new network namespace.

With the unshare command you can easily create new namespaces and run specific operations inside them. This is useful for tasks such as process isolation, resource isolation, and containerization.

3. Namespace classification

Namespace type | System call flag | Man page | Isolates | Kernel version
Mount | CLONE_NEWNS | mount_namespaces | File system mount points | 2.4.19
UTS | CLONE_NEWUTS | uts_namespaces | Host name and NIS domain name | 2.6.19
IPC | CLONE_NEWIPC | ipc_namespaces | System V IPC and POSIX message queues (inter-process communication) | 2.6.19
PID | CLONE_NEWPID | pid_namespaces | Process IDs | 2.6.24
Network | CLONE_NEWNET | network_namespaces | Network devices, stacks, ports, and other network resources | 2.6.29
User | CLONE_NEWUSER | user_namespaces | User and group IDs | 3.8
Cgroup | CLONE_NEWCGROUP | cgroup_namespaces | Cgroup root directory | 4.6
Time | CLONE_NEWTIME | time_namespaces | Boot and monotonic clocks | 5.6

3.1. Mount namespace

A mount namespace isolates file system mount points. Processes in different mount namespaces have different mount points and therefore different views of the file system hierarchy. The mount namespace was the first namespace type supported by Linux. Calls to mount() and umount() inside a mount namespace affect only the file systems of that namespace and have no effect on the global file system (unless mount propagation is configured, see 3.1.3). Mount namespaces grew out of successive improvements to chroot, and chroot can be regarded as the first namespace-like mechanism in Linux. The file system that is mounted at a container's root directory to give the container's processes an isolated execution environment is the so-called container image, also known as the rootfs (root file system).

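3.1.1. Introduction to chroot

chroot (Change Root) creates a new file system environment by changing the root directory of the current process. It works as follows:

The chroot system call: The chroot command is implemented with the chroot system call, whose prototype is:

int chroot(const char *path);

The system call changes the root directory of the process to the specified directory. Only sufficiently privileged users can use chroot; in practice this means root or a user with sudo rights.

Changing the root directory: When the chroot command runs, it passes the specified directory to the kernel through the chroot system call. The kernel changes the root directory of the current process to that directory, and the process uses it as its new root.

File and directory access: Once chroot succeeds, the process resolves all paths relative to the new root directory. It can reach only the files and directories under the new root; paths outside it can no longer be named.

Isolated environment: The file system environment created with chroot is isolated in the sense that a process running in it cannot reach the host's files and directories by path. This gives a degree of security and isolation, which is useful for system repair, software development, and testing.

Note that chroot only changes the root directory of a process and does not isolate it completely. The process can still reach and modify host resources in other ways.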

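3.1.2. Introduction to rootfs

The rootfs (root file system) is the top level of the Linux file system and contains the basic files and directory structure of the operating system. It is the first file system mounted during boot and is the foundation of system startup.

A rootfs usually exists as an image file or a device. It contains the core files and directories the operating system needs, such as /bin, /sbin, and /etc, including the system initialization scripts and core commands.

During boot, the root file system is typically mounted read-only at the root directory ("/") first. Once the system is up, it is remounted read-write, after which the state of the system can be changed by writing to it. Mounting it read-only at first protects the core files and directories from accidental modification and damage.

In container technology, each container has its own rootfs, which holds the container's application and dependencies. A container can extend its functionality by mounting other file systems or directories. With Docker, for example, the image layers that make up the rootfs remain read-only and changes go to a writable layer on top, which helps keep containers isolated and secure.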

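3.1.3. Mount propagation

Mount propagation determines whether a mount operation is visible to, and affects, other processes. It defines relationships between mount objects, and the system uses these relationships to decide how a mount event in one mount object propagates to others. A propagation event is a mount or unmount in one mount object that triggers a corresponding mount or unmount in other mount objects.

  • If two mount objects have a shared relationship, a mount event in one propagates to the other, and vice versa.
  • If two mount objects have a master/slave relationship, a mount event in the master propagates to the slave, but not the other way round. The slave is the receiver of events.

A mount can be in one of the following states:

  • shared
  • slave
  • shared and slave
  • private
  • unbindable

A mount object that propagates events is called a shared mount; one that receives propagated events is called a slave mount. A mount object that neither propagates nor receives events is a private mount. An unbindable mount is a special kind of private mount that additionally cannot be used as the source of a bind mount.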

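The use case for shared mounts is clear: they are needed whenever file data must be shared. Slave mounts are most useful in one-way, "read-only" scenarios. Private mounts are pure isolation, existing as independent units. Unbindable mounts help prevent unnecessary replication of mounts.

The kernel default for a new mount is private, although some init systems (systemd, for example) remount the root file system as shared. A mount cloned from a shared mount is also shared, and the two propagate mount events to each other.
A mount cloned from a slave mount is also a slave, and it is a slave of the same master as the original.

mount --make-shared /mntS      # mark the mount point as shared
mount --make-private /mntP     # mark the mount point as private
mount --make-slave /mntY       # mark the mount point as slave
mount --make-unbindable /mntU  # mark the mount point as unbindable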
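The state set by these commands can be read back from the optional fields of /proc/[pid]/mountinfo, where the kernel reports tags such as shared:N, master:N, propagate_from:N, and unbindable; a mount with none of these tags is private. The following Python sketch classifies a mount from those tags. The function name propagation_type is an assumption made for this illustration.

```python
def propagation_type(optional_fields):
    """Classify a mount from the optional fields of a mountinfo line.

    `optional_fields` is a list such as ["shared:1", "master:2"].
    """
    # Keep only the tag name in front of the optional ":value" part.
    tags = {field.split(":", 1)[0] for field in optional_fields}
    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags   # member of a peer group
    slave = "master" in tags    # receives events from a master group
    if shared and slave:
        return "shared and slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"
```

The five return values correspond to the five states listed above.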

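3.1.4. Viewing mount information

/proc/[pid]/mounts
Each line of the file has the following fields:

Field | Meaning
Source | The mounted device or file system: a device path (such as /dev/sda1), a network path (such as //server/share), or the name of a special file system (such as proc or sysfs).
Mount point | The directory on which the source is mounted.
Filesystem type | The type of the mounted file system, such as ext4, tmpfs, proc, or sysfs.
Mount options | The options used for the mount, separated by commas.
Dump | Dump frequency, kept for compatibility with the /etc/fstab format; always 0 here.
Pass | fsck pass number, kept for compatibility with the /etc/fstab format; always 0 here.

/proc/[pid]/mounts describes the mounts visible in the mount namespace of a specific process, and shows which file systems the process uses and with which options.

Note that /proc/[pid]/mounts is read-only: it can be used to view mount information but not to change it.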
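As a small illustration of the six-field layout, the following Python sketch splits one line of /proc/[pid]/mounts into named fields. MountEntry and parse_mounts_line are names invented for this example. Splitting on whitespace is sufficient because the kernel escapes spaces inside paths as \040.

```python
from typing import NamedTuple


class MountEntry(NamedTuple):
    source: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_pass: int


def parse_mounts_line(line):
    """Split one line of /proc/[pid]/mounts into its six fields."""
    source, mount_point, fs_type, options, dump, fsck_pass = line.split()
    return MountEntry(
        source,
        mount_point,
        fs_type,
        options.split(","),  # comma-separated option list
        int(dump),
        int(fsck_pass),
    )
```

A typical use is to read /proc/self/mounts line by line and pass each line to parse_mounts_line.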
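Common mount options:

Option | Meaning
rw | Mount the file system read-write. Both reads and writes are allowed.
ro | Mount the file system read-only. Only reads are allowed.
relatime | Update the access time only if it is earlier than or equal to the modification or change time (or, on current kernels, more than a day old). Compared with updating on every access, this reduces the number of writes to the file system and improves performance.
async | Asynchronous writes. Writes are placed in a buffer and the call returns immediately, without waiting for the write to complete.
sync | Synchronous writes. A write returns only after the data has been written.
noexec | Do not allow programs to be executed from this file system.
nodev | Do not interpret character or block device files on this file system.
nosuid | Ignore the setuid and setgid bits when executing programs from this file system.
noatime | Do not update access times of files or directories.
nodiratime | Do not update access times of directories; access times of files are still updated.
index | overlayfs option that turns the inode index feature on or off.
metacopy | overlayfs option that turns metadata-only copy-up on or off.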

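/proc/[pid]/mountinfo

Field | Meaning
Mount ID | ID of the mount, which identifies each mount uniquely.
Parent ID | ID of the parent mount of this mount.
Major:Minor | Major and minor device numbers of the mounted device.
Root | The directory inside the file system that forms the root of this mount.
Mount Point | Path of the mount point, relative to the process's root directory.
Mount Options | Per-mount options, separated by commas.
Optional Fields | Zero or more fields of the form tag[:value], which carry the propagation state: shared:N, master:N, propagate_from:N, or unbindable.
Separator | A single hyphen (-) that marks the end of the optional fields.
Filesystem Type | The type of the file system, such as ext4, tmpfs, or proc.
Mount Source | File-system-specific source, such as a device path.
Super Options | Per-superblock options, separated by commas.

/proc/[pid]/mountinfo provides detailed information about mount points, including the device, file system type, mount path, and mount options. Reading it shows the mounts on the system and how they relate to each other, which is very useful for understanding the structure and configuration of the file system.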
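Because the number of optional fields varies, a mountinfo line is easiest to parse by splitting at the " - " separator first. The Python sketch below shows one way to do this; parse_mountinfo_line and the dictionary keys are illustrative choices, not an established API.

```python
def parse_mountinfo_line(line):
    """Parse one line of /proc/[pid]/mountinfo into a dictionary."""
    # Everything before " - " is per-mount data, everything after it
    # describes the file system (type, source, superblock options).
    left, right = line.rstrip("\n").split(" - ", 1)
    head = left.split()
    fs_type, source, super_options = right.split()
    return {
        "mount_id": int(head[0]),
        "parent_id": int(head[1]),
        "major_minor": head[2],
        "root": head[3],
        "mount_point": head[4],
        "mount_options": head[5].split(","),
        "optional_fields": head[6:],  # may be empty
        "fs_type": fs_type,
        "source": source,
        "super_options": super_options.split(","),
    }
```

The parent_id values can then be used to rebuild the mount tree, since every mount refers to the mount_id of its parent.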

/proc/[pid]/mountstats

Each entry starts with a line of the form "device <device> mounted on <path> with fstype <type>". File systems that export statistics (in practice mainly NFS) append further lines to their entry.

Field | Meaning
device | Path or identifier of the mounted device.
mounted on | Path of the mount point.
with fstype | File system type.
opts | Mount options in effect (NFS).
age | Time since the file system was mounted, in seconds (NFS).

/proc/[pid]/mountstats provides statistics about mount points. For file systems that export them, it helps with monitoring and tuning; for all others it only lists the device, mount point, and type.

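3.5. User namespaces

A user namespace isolates the view of users and groups. It allows user and group IDs to be remapped inside the namespace, which makes it possible to isolate and manage users and groups. User namespaces mainly isolate security-related identifiers and attributes, including user IDs, group IDs, the root directory, keys, and capabilities. Put simply, a process created by an ordinary user through clone() can have a different user and group in the new user namespace. A process that belongs to an unprivileged ordinary user outside the container can thus create a container process that is the superuser, with full capabilities, inside the container. This gives containers a great deal of freedom. The main purpose of user namespaces is a safer environment in which different users on the same system are isolated from each other, avoiding permission conflicts and attacks. Some important concepts and properties follow:

User ID mapping: A user namespace can map user IDs (UIDs) and group IDs (GIDs) to different values outside the namespace. This makes it possible to create and manage independent users and groups inside the namespace. For example, a process may run as the privileged user with ID 0 inside a user namespace while having no such privilege in the global namespace.

Permission isolation: A user namespace isolates the privileges of users and groups, so that a user's privileges in one namespace do not affect users in other namespaces. This improves security and isolation, prevents permission conflicts between users, and reduces the attack surface.

File access: File permission checks use the mapped IDs, so the access rights of users in different namespaces do not interfere with each other. Isolating the file system view itself is done together with a mount namespace.

Containerization and virtualization: User namespaces are a key building block of containers. They allow different containers to run on the same host, each with its own independent users and groups.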
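The ID mapping described above is exposed in /proc/[pid]/uid_map and /proc/[pid]/gid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python sketch below translates an inside ID to the outside ID using such a map. The function names parse_id_map and to_outside_id, and the sample mapping in the usage note, are assumptions for illustration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, count) tuples."""
    return [
        tuple(int(value) for value in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def to_outside_id(id_map, inside_id):
    """Translate an ID inside the namespace to the ID seen outside.

    Returns None when the ID is not covered by any range.
    """
    for inside, outside, count in id_map:
        if inside <= inside_id < inside + count:
            return outside + (inside_id - inside)
    return None
```

With the hypothetical mapping "0 1000 1", UID 0 inside the namespace is UID 1000 outside, which is how an ordinary user can appear as root in a container. IDs without a mapping are shown as the overflow ID, usually 65534, which matches the "cannot find name for user ID 65534" message that appears after unshare -U without a mapping.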

3.5.1. Viewing user namespaces:

$ lsns -t user
        NS TYPE  NPROCS PID USER COMMAND
4026531837 user     263   1 root /usr/lib/systemd/systemd --system --deserialize 17

$ ls /proc/1/ns/user 
/proc/1/ns/user

$ ll /proc/1/ns/user 
lrwxrwxrwx 1 root root 0 Sep  1 10:21 /proc/1/ns/user -> user:[4026531837]

$ readlink /proc/1/ns/user 
user:[4026531837]

3.5.2. Creating and destroying user namespaces

# Use unshare -U /bin/bash to create a new shell session
$ lsns -t user
        NS TYPE  NPROCS PID USER COMMAND
4026531837 user       3   1 root /bin/bash

[root@ce31e508d31c /]
$ unshare -U sh
whoami: cannot find name for user ID 65534

[I have no name!@ce31e508d31c /]
$ sudo ip netns add test
sudo: /etc/sudo.conf is owned by uid 65534, should be 0
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
whoami: cannot find name for user ID 65534

[I have no name!@ce31e508d31c /]
$ lsns -t user
        NS TYPE  NPROCS   PID USER  COMMAND
4026534745 user       2   155 65534 sh
whoami: cannot find name for user ID 65534

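A user namespace is destroyed automatically when the last process in it exits. A user with sufficient privileges (such as root) can end it by killing those processes.

# Viewed as root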
$ lsns -t user
        NS TYPE  NPROCS   PID USER COMMAND
4026531837 user     261     1 root /usr/lib/systemd/systemd --system --deserialize 17
4026534745 user       1   613 root sh

$ kill -9 613

$ lsns -t user
        NS TYPE  NPROCS PID USER COMMAND
4026531837 user     263   1 root /usr/lib/systemd/systemd --system --deserialize 17

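4. Namespace life cycle

Every namespace has its own life cycle: it is created, used, and finally destroyed and released.

A typical life cycle of a Linux namespace looks like this:

1. Creating the namespace:

New namespaces are created with system calls such as clone() or unshare(). Different flags create different types of namespace, for example PID, network, or mount namespaces.

2. Running in the namespace:

After a namespace has been created, further processes can join it. A process switches from its current namespace to the target namespace by calling setns() or by being started through the nsenter command. Inside the namespace, a process can access and modify the resources that belong to it.

3. Destroying the namespace:

There is no system call that destroys a namespace directly. The kernel releases a namespace, and the resources it holds, once no process is a member of it and nothing else refers to it (such as an open file descriptor or a bind mount of its /proc/[pid]/ns file). A command such as ip netns delete removes the bind mount that keeps a named network namespace alive. Processes are never moved back into a parent namespace automatically.

Note that namespaces can be inherited and shared. For example, if a process creates a new PID namespace and starts a child process in it, the child becomes part of the new namespace. Processes in different namespaces can still communicate and share resources through other means, such as Unix domain sockets or bind mounts across mount namespaces.

In short, the life cycle of a Linux namespace covers creation, use, destruction, and release, and these operations are what make process resource isolation and management possible.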

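When a namespace may not be destroyed

1. Processes are still running in the namespace:

If one or more processes are still using the namespace and have neither exited nor switched to another namespace, the namespace continues to exist. It is destroyed only when the last process exits or switches away.

2. Child namespaces exist:

If a namespace is the parent of another namespace (this applies to the hierarchical types, PID and user namespaces) and the child namespace is still active, the parent namespace continues to exist. It is destroyed only when the processes in it and in all of its child namespaces have exited or switched to other namespaces.

3. The namespace is shared:

If a namespace is shared by several processes and those processes are still active, the namespace continues to exist. It is destroyed only when the last process sharing it exits or switches to another namespace.

Note that the lifetime of a namespace depends on the processes in it and on its relationships with other namespaces. As long as a process is still active, or an inheritance or sharing relationship still exists, the namespace is kept. It is therefore important to clean up once a namespace is no longer needed, to avoid wasting resources and causing problems later.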

5. Viewing the namespaces a process belongs to

5.1. lsns

The lsns command needs no arguments. It lists the namespaces that exist on the current system and are visible to the caller, one per line, with the following columns:

  • NS: The namespace identifier (inode number), used to tell namespaces apart.
  • TYPE: The type of the namespace, such as pid, mnt, net, ipc, uts, or cgroup.
  • NPROCS: The number of processes currently in the namespace.
  • PID: The lowest PID in the namespace.
  • USER: The user name of that process.
  • COMMAND: The command line of that process.

Some common options of the lsns command:

  • -t TYPE or --type TYPE: Display only namespaces of the specified type.
  • -p PID or --task PID: Display only the namespaces held by the specified process.
  • -n or --noheadings: Do not print the header line.

A namespace identifier can also be passed as an argument to list the processes in that namespace.

$ lsns
        NS TYPE  NPROCS    PID USER COMMAND
4026531836 pid      217      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026531837 user     262      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026531838 uts      221      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026531839 ipc      217      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026531840 mnt      206      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026531860 mnt        1     50 root kdevtmpfs
4026531992 net      227      1 root /usr/lib/systemd/systemd --system --deserialize 17
4026532380 mnt        9  68131 root nginx: master process /usr/sbin/nginx
4026532384 mnt        1  79144 root /pause
4026532385 uts        1  79144 root /pause
4026532386 ipc        2  79144 root /pause
4026532387 pid        1  79144 root /pause
4026532389 net        2  79144 root /pause
4026532427 mnt        1   9682 ntp  /usr/sbin/ntpd -u ntp:ntp -g
4026532464 mnt        1  79146 root /pause
4026532466 uts        1  79146 root /pause

5.2. ls -l /proc/[pid]/ns

$ ls -l /proc/99997/ns
total 0
lrwxrwxrwx 1 root root 0 Sep  4 14:31 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 ipc -> ipc:[4026532661]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 mnt -> mnt:[4026532659]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 net -> net:[4026532664]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 pid -> pid:[4026532662]
lrwxrwxrwx 1 root root 0 Sep  4 14:31 pid_for_children -> pid:[4026532662]
lrwxrwxrwx 1 root root 0 Sep  4 14:31 time -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Sep  4 14:31 time_for_children -> time:[4026531834]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Sep  1 10:21 uts -> uts:[4026532660]

In each line of the output, the first field shows the permissions of the symbolic link, the name before -> is the namespace file, and the text after -> is the target of the link. The target has the format type:[namespace_id], where type is the type of the namespace and namespace_id is its ID (an inode number). Two processes whose links have the same target are in the same namespace.

By running ls -l /proc/[pid]/ns you can see the symbolic links for all namespaces that a specific process belongs to, and with them the type and ID of each namespace the process is currently in.
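The type:[namespace_id] format is simple to process in a script. The following Python sketch parses a link target and compares two targets; parse_ns_link and same_namespace are hypothetical helper names chosen for this example.

```python
import re

# Matches link targets such as "net:[4026532664]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, namespace_id)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def same_namespace(target_a, target_b):
    """Return True if both link targets name the same namespace."""
    return parse_ns_link(target_a) == parse_ns_link(target_b)
```

On a Linux system the targets can be obtained with os.readlink, for example os.readlink("/proc/self/ns/net"), and then compared for two processes to check whether they share a namespace.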

5.3. pstree -p

The pstree -p command is used to display processes and their subprocesses in a tree structure and display their PIDs (process IDs).

$ pstree -p | grep nginx
           |-nginx(68131)-+-nginx(68132)
           |              |-nginx(68133)
           |              |-nginx(68134)
           |              |-nginx(68135)
           |              |-nginx(68136)
           |              |-nginx(68137)
           |              |-nginx(68139)
           |              `-nginx(68140)

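5.4. ip netns list

ip netns list shows the named network namespaces, that is, those registered under /var/run/netns (for example with ip netns add).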

Origin blog.csdn.net/yuelai_217/article/details/132664238