There is clearly plenty of free memory, so why the error "Cannot allocate memory"?


Author | Zhang Yanfei allen

Source | Development of Internal Strength Cultivation

Recently a reader told me about a strange problem on one of his production servers: every command he runs fails with "fork: Cannot allocate memory". The problem appeared only recently. The first few times, a reboot cleared it, but it keeps coming back every 2-3 days.

# service docker stop
-bash: fork: Cannot allocate memory
# vi 1.txt
-bash: fork: Cannot allocate memory

Seeing this message, everyone's first reaction is to suspect that memory really has run out, and our reader thought so too. But the memory usage showed that memory was not exhausted at all: plenty was still free! (Retrying a command several times occasionally lets it succeed.)


Fei Ge offered three ideas:

  • On a NUMA machine, was the process bound to one node at startup, so that it can only use that node's memory?

  • On a NUMA machine, if all the memory modules are installed in the slots of one node, the other nodes have no memory at all.

  • Check whether the current number of processes (threads) exceeds the maximum limit.

I will go straight to the conclusion. The NUMA guesses about insufficient memory were wrong; the real cause was the third one. Some Java processes on this server had created too many threads, which triggered the error. Memory was never actually short.


Low-level process analysis

In this case the Linux error message is misleading, so nobody thought of checking the number of processes at first. That is why the troubleshooting was so roundabout and the problem was only solved after a discussion in the group.

So I want to go into the kernel and see how it ends up producing such an inappropriate error message. Along the way, we can also learn how a process is created.

The reader's production server runs CentOS 7.8, whose kernel version is 3.10.0-1127.

1.1 Anatomy of do_fork

In the Linux kernel, creating either a process or a thread goes through the core function do_fork. Inside it, the kernel data objects needed by the new process (thread) are created by copying those of the current one.

//file:kernel/fork.c
long do_fork(unsigned long clone_flags, ...)
{
 // "Creating" a process really means copying the current one
 // Note: the second-to-last argument passed is NULL
 p = copy_process(clone_flags, stack_start, stack_size,
    child_tidptr, NULL, trace);
 ...
}

The core of process creation is in copy_process. Let's look at its source code.

//file:kernel/fork.c
static struct task_struct *copy_process(unsigned long clone_flags, 
    ...
    struct pid *pid,
    int trace)
{
 // The kernel data structure that represents a process (thread) is task_struct
 struct task_struct *p;

 ......

 // Create the new process's core data structure by copying
 p = dup_task_struct(current);

 // Create the new process's other core data by copying
 retval = copy_semundo(clone_flags, p);
 retval = copy_files(clone_flags, p);
 retval = copy_fs(clone_flags, p);
 retval = copy_sighand(clone_flags, p);
 retval = copy_mm(clone_flags, p);
 retval = copy_namespaces(clone_flags, p);
 retval = copy_io(clone_flags, p);
 retval = copy_thread(clone_flags, stack_start, stack_size, p);

 // Note this part!
 // Allocate the pid (the integer pid value comes from it)
 if (pid != &init_struct_pid) {
  retval = -ENOMEM;
  pid = alloc_pid(p->nsproxy->pid_ns);
  if (!pid)
   goto bad_fork_cleanup_io;
 }

 // Store the generated integer pid value in the new process's task_struct
 p->pid = pid_nr(pid);
 p->tgid = p->pid;
 if (clone_flags & CLONE_THREAD)
  p->tgid = current->tgid;

bad_fork_cleanup_io:
 if (p->io_context)
  exit_io_context(p);
......
fork_out:
 return ERR_PTR(retval); 
}

As the code above shows, the kernel builds the kernel objects of a new process by calling a series of copy_xxx functions, covering the mm structure, the namespaces, and so on.

Let's focus on the part around alloc_pid, whose job is to allocate a pid object; if the allocation fails, an error is returned. Note the detail in this code: whatever kind of failure alloc_pid hits, the error returned is hard-coded to -ENOMEM. To make this easier to see, here is that logic again on its own.

//file:kernel/fork.c
static struct task_struct *copy_process(...){
 ......

 // Allocate the pid (the integer pid value comes from it)
 if (pid != &init_struct_pid) {
  retval = -ENOMEM;
  pid = alloc_pid(p->nsproxy->pid_ns);
  if (!pid)
   goto bad_fork_cleanup_io;
 }
bad_fork_cleanup_io:
...
fork_out:
 return ERR_PTR(retval); 
}

Just before calling alloc_pid, the code sets the error type to -ENOMEM (retval = -ENOMEM). If alloc_pid fails, ENOMEM is returned to the upper layer, regardless of what actually caused alloc_pid to fail.

Now look at the definition of ENOMEM: it stands for "Out of memory". (The kernel only returns the error code; the application layer turns it into a concrete message, which is why the actual prompt is "Cannot allocate memory", shown in Chinese on this server.)

//file:include/uapi/asm-generic/errno-base.h
#define ENOMEM  12 /* Out of memory */

I have to say, this error from the kernel is really problematic. It causes users a great deal of confusion.

1.2 What causes alloc_pid to fail

So under what circumstances does allocating a pid fail? Let's look at the source code of alloc_pid.

//file:kernel/pid.c
struct pid *alloc_pid(struct pid_namespace *ns)
{
 // Case 1: allocating the pid kernel object fails
 pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
 if (!pid)
  goto out;

 // Case 2: allocating the integer pid number fails
 // alloc_pidmap is called to allocate a free pid
 tmp = ns;
 pid->level = ns->level;
 for (i = ns->level; i >= 0; i--) {
  nr = alloc_pidmap(tmp);
  if (nr < 0)
   goto out_free;

  pid->numbers[i].nr = nr;
  pid->numbers[i].ns = tmp;
  tmp = tmp->parent;
 }

 ...
out:
 return pid; 
out_free:
 ...
 pid = NULL;
 goto out; 
}

What we usually call a pid is not a plain integer in the kernel but a small structure (struct pid), shown below.

//file:include/linux/pid.h
struct pid
{
 atomic_t count;
 unsigned int level;
 struct hlist_head tasks[PIDTYPE_MAX];
 struct rcu_head rcu;
 struct upid numbers[1];
};

So the kernel first has to allocate a piece of memory to hold this small object. The first failure case is that this memory allocation fails, and alloc_pid returns failure. Here the problem really is memory, so it is understandable that the kernel returns ENOMEM.

Now the second case. alloc_pidmap allocates a process number for the new process, which is what we usually call the PID. If that allocation fails, an error is returned as well.

In this case only the allocation of a process number has failed, which has nothing to do with insufficient memory. Yet the error type returned to the upper layer is still ENOMEM (Out of memory). This is really unreasonable.

This also teaches us something else: a process does not get just one process number. The for loop allocates one for each PID namespace level.

//file:kernel/pid.c
struct pid *alloc_pid(struct pid_namespace *ns)
{
 // alloc_pidmap is called to allocate a free pid
 tmp = ns;
 pid->level = ns->level;
 for (i = ns->level; i >= 0; i--) {
  nr = alloc_pidmap(tmp);
  if (nr < 0)
   goto out_free;

  pid->numbers[i].nr = nr;
  pid->numbers[i].ns = tmp;
  tmp = tmp->parent;
 }
}

If the process being created lives inside a container, it needs at least two PID numbers: one in the container's PID namespace and one in the root namespace (the host).

This matches everyday experience. Every process in a container is also visible on the host, but the PID seen inside the container is generally different from the one seen on the host. For example, a process may have pid 5 inside the container and pid 1256 in the host namespace; its struct pid in the kernel then records both numbers, one per namespace level.


Has the new version changed?

My next thought was that the kernel we were looking at might simply be too old. (The kernel version I use is 3.10.1.)

So I turned to the very recent Linux 5.16.11 to see whether the new version fixes this inappropriate message.

A tool recommendation: https://elixir.bootlin.com/ . The source code of any Linux kernel version can be browsed on this site, which is very convenient for a quick look.

//file:kernel/fork.c
static __latent_entropy struct task_struct *copy_process(...)
{
 ...
 pid = alloc_pid(p->nsproxy->pid_ns_for_children, args->set_tid,
    args->set_tid_size);
 if (IS_ERR(pid)) {
  retval = PTR_ERR(pid);
  goto bad_fork_cleanup_thread;
 }
}

This looks promising: retval is no longer hard-coded to ENOMEM but is set from the actual error returned by alloc_pid. So does alloc_pid itself set the right error type?

When I opened the source of alloc_pid and saw this big comment, my heart sank.

 
  
//file:kernel/pid.c
struct pid *alloc_pid(struct pid_namespace *ns, ...)
{
 /*
  * ENOMEM is not the most obvious choice especially for the case
  * where the child subreaper has already exited and the pid
  * namespace denies the creation of any new processes. But ENOMEM
  * is what we have exposed to userspace for a long time and it is
  * documented behavior for pid namespaces. So we can't easily
  * change it even if there were an error code better suited.
  */
 retval = -ENOMEM;
 ...

 return ERR_PTR(retval);
}

Here is a rough paraphrase of this comment: "ENOMEM is not the most obvious choice, especially when the child subreaper has already exited and the PID namespace refuses to create any new processes. But ENOMEM is what has been exposed to userspace for a long time, and it is documented behavior for PID namespaces. So it cannot easily be changed, even if a better-suited error code exists."

Seeing this, I was reminded that many people call Linux a "shit mountain" of legacy code, and this may be one example. Even the latest version has not really fixed the problem.


Conclusion

When Linux creates a process and PID numbers have run out, the error it returns is "Cannot allocate memory". This inappropriate message has confused a lot of people.

After today's article, be a little more careful the next time you run into this kind of out-of-memory error. Don't let the kernel fool you: first check whether there are too many processes (threads).

As for the fix, you can raise the number of available PIDs by changing the kernel parameter /proc/sys/kernel/pid_max.

But I think the most fundamental fix is to find out why the system has so many processes (threads) and eliminate the cause. The default limit of 20,000 to 30,000 processes is already more than enough for most servers, so exceeding it almost certainly means something is wrong.

