Solving the ElasticSearch "Too many open files" error

The ElasticSearch service hung. The process itself was not gone, since it is guarded by Supervisor, but the service was unavailable. We had previously had an outage caused by an improper ES_HEAP_SIZE setting, so my knee-jerk assumption was that ES_HEAP_SIZE was the problem again. After logging in to the server, however, I found the log full of "Too many open files" errors. So what is the maximum number of open files allowed for the ElasticSearch process? This can be confirmed through proc:

shell> cat /proc/<PID>/limits
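The limits file lists one resource per line; to pull out just the relevant one:

shell> grep 'Max open files' /proc/<PID>/limits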

The result was 4096. We can also take a closer look at which files ElasticSearch actually has open:

shell> ls /proc/<PID>/fd
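Counting the entries gives a quick sense of how close the process is to its ceiling:

shell> ls /proc/<PID>/fd | wc -l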

The problem seemed simple: just raise the corresponding configuration item, which in ElasticSearch is called MAX_OPEN_FILES (see the sketch after the limits.conf check below for where it is usually set). Unfortunately, setting it had no effect. In my experience this kind of problem usually comes down to operating-system limits, but those checks all came back normal:

shell> cat /etc/security/limits.conf

* soft nofile 65535
* hard nofile 65535
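As a side note, on package-based installs the MAX_OPEN_FILES variable mentioned above usually lives in the service's defaults/environment file; a minimal sketch, with a path that depends on the distribution and install method:

# /etc/default/elasticsearch or /etc/sysconfig/elasticsearch (path is install-dependent)
MAX_OPEN_FILES=65535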

The investigation had hit a dead end, so I started looking for workarounds to alleviate the problem as quickly as possible. I found an article by @-shenxian-, "Dynamically modify the rlimit of a running process", which describes how to change the thresholds of an already running process. Although my tests all reported success, ElasticSearch still did not work properly:

shell> echo -n 'Max open files=65535:65535' > /proc/<PID>/limits
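For the record, on systems with a reasonably recent util-linux the same change can also be attempted with prlimit (not from the original article; shown as an alternative):

shell> prlimit --pid <PID> --nofile=65535:65535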

In addition, I checked the kernel parameters fs.file-nr and fs.file-max. In short, I checked every file-related parameter I could think of, and even hard-coded "ulimit -n 65535" into the startup script, but all of it seemed pointless. Just as I was about to give up, my colleague @Xuan Mairen solved the mystery with one sentence: turn off Supervisor's process management and try starting the ElasticSearch process manually. Sure enough, everything went back to normal.

Why? Because under Supervisor's process management, supervisord forks the ElasticSearch process as its child. Given this parent-child relationship, the child's maximum number of open files cannot exceed the parent's limit, and Supervisor's minfds option defaults to a small value (1024), which is what caused ElasticSearch to fail. The root cause was simple, but I was stuck in habitual thinking, which is worth reflecting on.
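Given that, the fix under Supervisor is to raise the limit on the Supervisor side so that the child inherits it. A minimal sketch, assuming a stock supervisord.conf layout (the path and the exact value are illustrative):

; /etc/supervisord.conf (location varies by install)
[supervisord]
; supervisord raises its own RLIMIT_NOFILE to at least this value,
; and child processes such as ElasticSearch inherit it
minfds=65535

Note that supervisord itself has to be restarted for the new limit to take effect; restarting only the managed program is not enough, because the child still inherits the old limit from the running supervisord.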


When deploying applications on Linux, you will sometimes run into the error "Socket/File: Can't open so many files"; this limit also caps the server's maximum concurrency. Linux restricts the number of file handles per process, and the default is not high, generally 1024, a number a production server can easily reach. The following shows how to check this limit and raise the system default properly.

How to check

We can use ulimit -a to view all limit values:
[root@centos5 ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4096
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Here, "open files (-n) 1024" is the Linux limit on the number of file handles a single process may open (this also includes open sockets, so it can affect, for example, the maximum number of concurrent MySQL connections).
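The per-process limit is separate from the system-wide ceiling, which can be checked through the kernel parameters mentioned in the first half of this article:

shell> cat /proc/sys/fs/file-max
shell> cat /proc/sys/fs/file-nr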

 

The correct approach is to modify /etc/security/limits.conf. The file contains detailed comments; for example:

hadoop soft nofile 32768
hadoop hard nofile 65536

hadoop soft nproc 32768
hadoop hard nproc 65536

This sets the file handle limits for the hadoop user to a soft limit of 32768 and a hard limit of 65536. The first column of each entry is the domain; setting it to an asterisk applies the limit globally, and you can also set different limits for different users.

Note: the hard limit is the actual enforced limit, while the soft limit only triggers warnings. The ulimit command itself distinguishes soft and hard settings: add -H for hard, -S for soft. By default the soft limit is shown, and if you change a value with ulimit without specifying either flag, both limits are changed together.
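For example (the numeric value is illustrative):

shell> ulimit -Sn        # show the current soft limit for open files
shell> ulimit -Hn        # show the current hard limit for open files
shell> ulimit -n 8192    # with neither -S nor -H, sets both soft and hard (raising the hard limit may require root)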

On RHEL 6 and later, nproc is configured in /etc/security/limits.d/90-nproc.conf.
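A minimal sketch of an adjusted 90-nproc.conf (the values are illustrative; on RHEL 6 the stock file caps non-root users at a much lower soft limit):

# /etc/security/limits.d/90-nproc.conf
*          soft    nproc     4096
root       soft    nproc     unlimited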


Additional notes:

soft nofile: maximum number of open file descriptors (soft limit)

hard nofile: maximum number of open file descriptors (hard limit)

soft nproc: maximum number of processes available to a single user (soft limit)

hard nproc: maximum number of processes available to a single user (hard limit)
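Putting it together, a hedged limits.conf fragment for a dedicated service account (the user name "elasticsearch" and the numbers are assumptions; adjust to your deployment):

# /etc/security/limits.conf, or a file under /etc/security/limits.d/
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft nproc  4096
elasticsearch hard nproc  4096

Limits set here are applied by pam_limits at login, so they only take effect for new sessions and for services started from those sessions.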


