Load spike caused by hung_task_timeout_secs

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/liaynling/article/details/83579486

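Problem description:

The load on the Druid server shot up past 400, so it could no longer serve requests normally, and commands such as ps and kill did not respond. The system log /var/log/messages showed the following: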

Oct 31 05:08:56 server002 kernel: INFO: task falcon-agent:4973 blocked for more than 120 seconds.

Oct 31 05:08:56 server002 kernel:      Not tainted 2.6.32-754.3.5.el6.x86_64 #1

Oct 31 05:08:56 server002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
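
When the log contains many of these reports, it helps to see which tasks were blocked most often. The following is a minimal sketch, assuming the messages keep the format shown above; the function name summarize_hung_tasks is made up for this example.

```shell
#!/bin/sh
# Summarize "blocked for more than N seconds" kernel messages per task name.
# Reads syslog-format lines on stdin and prints "count task-name", most frequent first.
summarize_hung_tasks() {
    # Example of a matching line:
    #   kernel: INFO: task falcon-agent:4973 blocked for more than 120 seconds.
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds.*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Possible usage, assuming the log lives at /var/log/messages as on this server: summarize_hung_tasks < /var/log/messages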

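The explanation I found:

A background task was blocked for more than 120 seconds, so the kernel reported it as hung.
Linux uses up to 40% of available memory as file system cache. When that data is flushed, the IO subsystem cannot keep up and the flush runs past the 120 s timeout, so the threshold is lowered from 40% to 10% to avoid the timeout.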

This is a known bug. By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached the file system flushes all outstanding data to disk, causing all following IOs to go synchronous.
For flushing out this data to disk there is a time limit of 120 seconds by default.
In the case here the IO subsystem is not fast enough to flush the data within 120 seconds.
This especially happens on systems with a lot of memory.

The problem is solved in later kernels and there is no "fix" from Oracle.
I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf.
This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.
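
To get a feel for why a large dirty-page backlog can take longer than 120 seconds to write out, here is a rough back-of-the-envelope sketch. The memory size (64 GiB) and disk throughput (150 MiB/s) are hypothetical values chosen for illustration, not measurements from this server, and the function name flush_seconds is made up.

```shell
#!/bin/sh
# flush_seconds MEM_GIB RATIO_PCT DISK_MIB_PER_S
# Rough time to write out a full dirty-page backlog, in whole seconds.
# Simplification: treats RATIO_PCT as a share of total RAM and assumes a
# constant write rate; real systems will differ.
flush_seconds() {
    mem_mib=$(( $1 * 1024 ))
    dirty_mib=$(( mem_mib * $2 / 100 ))
    echo $(( dirty_mib / $3 ))
}

# Hypothetical server: 64 GiB RAM, storage sustaining 150 MiB/s.
echo "dirty_ratio=40: $(flush_seconds 64 40 150) s"
echo "dirty_ratio=10: $(flush_seconds 64 10 150) s"
```

Under these assumptions the 40% backlog needs about 174 seconds, well past the 120-second mark, while the 10% backlog needs about 43 seconds.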

The current kernel settings on this system (note that dirty_ratio is already 20 here, not 40):

# sysctl -a|grep dirty

vm.dirty_background_ratio = 10

vm.dirty_ratio = 20
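
Besides the thresholds, it can be useful to look at how much dirty data is actually waiting. A small sketch that reads the Dirty and Writeback fields from /proc/meminfo-style input follows; the function name dirty_summary is made up for this example.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters from /proc/meminfo-format input,
# converted from kB to whole MiB.
dirty_summary() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %d MiB\n", $1, $2 / 1024 }'
}
```

Possible usage: dirty_summary < /proc/meminfo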

Solution:

Adjust the kernel parameters (sysctl -w takes effect immediately, so no reload is needed at this point):

  sysctl -w vm.dirty_ratio=10

  sysctl -w vm.dirty_background_ratio=5

To keep the settings after a reboot, add them to the sysctl configuration file, then run sysctl -p to load the file:

  vi /etc/sysctl.conf

  vm.dirty_background_ratio = 5

  vm.dirty_ratio = 10
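
If the change has to be rolled out to several machines, editing the file by hand gets tedious. Below is a minimal sketch of a helper that updates an existing key or appends it when missing. The function name set_sysctl_conf is made up, and it assumes simple numeric values (no "/" or "&" characters) and a file with plain "key = value" lines.

```shell
#!/bin/sh
# set_sysctl_conf FILE KEY VALUE
# Set KEY to VALUE in a sysctl.conf-style file without creating duplicates.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    # Escape dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        # Key already present: rewrite its line via a temp file,
        # then copy the result back so file permissions are kept.
        tmp=$(mktemp) || return 1
        sed "s/^[[:space:]]*${pattern}[[:space:]]*=.*/${key} = ${value}/" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Key missing: append a new line.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

Possible usage (as root): set_sysctl_conf /etc/sysctl.conf vm.dirty_ratio 10, then set_sysctl_conf /etc/sysctl.conf vm.dirty_background_ratio 5, followed by sysctl -p.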

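Kernel parameters explained:

vm.dirty_background_ratio: the percentage of available system memory (for example 5%) that dirty pages in the file system cache must reach before background writeback threads such as pdflush/flush/kdmflush start running and asynchronously write some of the cached dirty pages to disk.

vm.dirty_ratio: the percentage of available system memory (for example 10%) that dirty pages in the file system cache must reach before the system is forced to deal with them (by this point there are many dirty pages, and some must be written to disk to avoid data loss). While this happens, many application processes may block because the system has switched to handling file IO.

Normally the dirty_ratio threshold is not reached, because the vm.dirty_background_ratio threshold is hit first and triggers the flush threads to write back asynchronously. Application processes can still write during that time, however. If they write more than the flush threads can write out, the dirty pages reach the vm.dirty_ratio threshold, and the operating system switches to handling dirty pages synchronously, blocking the application processes.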
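
The relationship between the two thresholds described above can be sketched as a small decision function. This is a simplified model for illustration only, not how the kernel is implemented; the function name writeback_state and its three state labels are made up.

```shell
#!/bin/sh
# writeback_state DIRTY_PCT BACKGROUND_RATIO DIRTY_RATIO
# Classify the current dirty-page level (all arguments are whole percentages).
writeback_state() {
    if [ "$1" -ge "$3" ]; then
        # At or above vm.dirty_ratio: writing processes are blocked
        # while dirty pages are written out synchronously.
        echo "blocking"
    elif [ "$1" -ge "$2" ]; then
        # At or above vm.dirty_background_ratio: flush threads write back
        # asynchronously while applications keep writing.
        echo "background"
    else
        # Below both thresholds: no writeback is forced by these two limits.
        echo "idle"
    fi
}
```

For example, with the tuned values from this post (5 and 10), writeback_state 7 5 10 reports background, while writeback_state 12 5 10 reports blocking.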
