Locating the sdb "failed to create thread" error

In a clustered environment, sdb can occasionally report a -10 error for no obvious reason under highly concurrent workloads.

A careful review of the node's diaglog shows that this -10 error is caused by the operating system failing to create a thread.

 

This article walks through how to locate the problem.

 

Error information in the sdb node diaglog

The error log will contain key information similar to the following

Failed to create new agent: boost::thread_resource_error: Resource temporarily unavailable
Failed to create new agent, probe = 30
Failed to create subagent thread, rc = -10
Failed to start session EDU, rc = -10

 


 

When the operating system creates a thread, it is constrained by several parameters. There are two main ones

  1. The file handle limit
  2. Memory resources

 

  • Let's start with file handles

Linux is often described with the phrase "everything is a file": whether it is a process, a thread, a socket, or something else, it is ultimately handled by the operating system as a file operation. Each time the operating system or a process requests a resource such as a thread or a socket, it opens a file, and that open file can be understood simply as a file handle.

So what is the handle limit? It is simply a limit on the number of files that the operating system, or a single process, can have open.

 

With this concept in place, let's look at how the operating system limits the number of file handles.

The operating system provides a command, ulimit, dedicated to setting these limit values, and the per-process file handle limit is one of them.

For example, in the ulimit output for the root user, open files (-n) = 1024 is the maximum number of handles that a process owned by root is allowed to open.
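
As a minimal sketch of the check described above, the following prints both the soft and the hard open-file limit of the current shell. The variable names are illustrative only, and the values depend entirely on the system where it runs.

```shell
# Soft limit: the value currently enforced for this shell and its children.
soft=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit may be raised.
hard=$(ulimit -Hn)

echo "soft open files limit: $soft"
echo "hard open files limit: $hard"
```

Run it as the same user that starts the sdb node, since limits are inherited from the shell that launches the process.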

There is one detail here that needs attention.

Because root is the administrator user in Linux, if root's ulimit open files value is set to 1024, other users such as test or mysql cannot set their open files value higher than 1024.

Keep this in mind: to give an ordinary user a very large ulimit value, first change the value for the root user.

 

In addition, the limit on handles applies not only to each process but also to the operating system as a whole, since an operating system cannot open handles indefinitely. So there is also a setting for the maximum number of handles the whole operating system can have open.

On CentOS 7, this value is stored in the file /proc/sys/fs/file-max

If the operating system's total handle limit has been reached, the system will run out of handles even if the process has started only a few threads.

To change the operating system's maximum number of handles temporarily, run echo 2000000 > /proc/sys/fs/file-max directly.

To change it permanently, edit /etc/sysctl.conf, add fs.file-max = 2000000, and then run sysctl -p as the root user.
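
To judge whether the system-wide limit is close to being reached, one option is to read /proc/sys/fs/file-nr, which sits next to file-max. The sketch below assumes a Linux /proc filesystem; the variable names are illustrative.

```shell
# /proc/sys/fs/file-nr holds three numbers: allocated handles,
# allocated-but-unused handles, and the system-wide maximum (file-max).
read -r allocated unused max < /proc/sys/fs/file-nr

echo "allocated=$allocated unused=$unused max=$max"
# Integer percentage of the system-wide limit currently in use.
echo "usage: $(( allocated * 100 / max ))%"
```

A usage figure near 100% would point to the system-wide limit rather than the per-process one.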

 

  • Next, memory resources

When a thread is created in Linux, memory has to be allocated for it in advance. This is the thread's stack, sized by the stack size setting, and it stores the thread's data.

A little background: a program's memory is divided into two major parts, the "heap" and the "stack". The heap holds memory the program allocates dynamically at run time, while the stack holds the call frames and local variables of each thread.

With the background covered, back to the main topic.

As mentioned at the start, a thread cannot be created if the system is short of memory. When a thread is created, the operating system has to allocate a block of memory to it, and the size of that block is the stack size shown by ulimit -s. If the operating system cannot provide even one stack's worth of memory, thread creation fails.

Some readers may wonder how memory can run out so easily.

If you look closely at the operating system, you will find many processes, each running many threads, and each thread requesting memory (note that this memory is physical memory), so a memory shortage is quite normal. This may remind you of a JVM OOM, but they are not the same thing, so do not confuse them.

The fix is simple, if blunt: lower the ulimit -s stack size a little so that each thread requests less memory, leaving the operating system with more memory to spare. After all, these threads finish eventually and do not occupy memory permanently.
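
A rough back-of-the-envelope calculation shows why the stack size matters. The figures below are assumptions chosen for illustration (8192 KB is a common default for ulimit -s, and 2000 threads is a made-up load), not measurements from an sdb node.

```shell
# Hypothetical figures for illustration only.
stack_kb=8192     # assumed stack size in KB, as reported by `ulimit -s`
threads=2000      # assumed number of threads on the node

# Stack space requested if every thread gets a full-size stack, in MB.
reserved_mb=$(( stack_kb * threads / 1024 ))
echo "stack_kb=$stack_kb threads=$threads reserved_mb=$reserved_mb"

# The same thread count with a 1024 KB stack.
small_mb=$(( 1024 * threads / 1024 ))
echo "with 1024 KB stacks: ${small_mb} MB"
```

Under these assumptions, cutting the stack size from 8192 KB to 1024 KB reduces the stack space requested by the same factor of eight.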

 

  • Other things to know

Some readers run ulimit -a while troubleshooting, find that the parameters are set correctly, and still hit the problem. Why?

This is worth explaining.

The output of ulimit -a may look fine, but how do you know that the process is actually using the values you set?

Seeing is believing: when troubleshooting, confirm the ulimit parameters that are actually in effect for the sdb process.

There are two ways

  1. In newer versions of sdb, the node prints its own ulimit parameters to the diaglog when it starts, so you can look through that log
  2. More directly, check the records kept by Linux itself. For example, if the PID of the process is known to be 123456, open the file /proc/123456/limits and read its contents to see the limits that are really in effect
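
The second approach can be sketched as follows. The current shell's PID is used only as a stand-in so that the snippet runs anywhere; replace it with the PID of the sdb node process.

```shell
# Stand-in PID (the current shell); replace with the sdb node PID.
pid=$$

# Keep only the limits relevant to thread creation from /proc/<pid>/limits.
limits=$(grep -E 'Max (open files|stack size|processes)' "/proc/$pid/limits")
echo "$limits"
```

Each line lists the soft limit first and the hard limit second, so a mismatch with the ulimit -a output of your login shell is easy to spot.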

 

  • Commands for handles and threads

 

To see how many threads a process has started, you can use any of the following

  1. cat /proc/$PID/status | grep Threads
  2. pstree -p $PID, then add 1 for the main process
  3. top -Hp $PID, then read the "Threads" value in the header
  4. ps hH p $PID | wc -l

View the total number of handles currently open on the Linux system

lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|awk '{print $1}' | awk '{sum += $1};END {print sum}'

 

View the total number of handles opened by one process

lsof -n | awk '{print $2}' | sort | uniq -c | sort -nr | grep $PID
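
As an alternative sketch that does not depend on lsof, the open descriptors of one process can be counted from /proc and compared with that process's own limit. This assumes a Linux /proc filesystem and permission to read the target process's fd directory; the current shell's PID is again only a stand-in.

```shell
# Stand-in PID (the current shell); replace with the sdb node PID.
pid=$$

# Each entry in /proc/<pid>/fd is one open file descriptor.
fd_count=$(ls "/proc/$pid/fd" | wc -l)
# The soft "Max open files" limit is the 4th field of that line.
fd_limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")

echo "pid=$pid open_fds=$fd_count soft_limit=$fd_limit"
```

If open_fds is close to soft_limit, the per-process handle limit is the likely bottleneck.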

 


Origin www.cnblogs.com/chenfool/p/11237960.html