Symptom
Application services on several virtual machines hosted on the same physical machine timed out at the same time and became unreachable. Pinging both the virtual machines and the physical machine showed severe packet loss.
Cause
A virtual machine on the physical machine established a large number of connections. This filled the physical machine's connection tracking table with ESTABLISHED connection records, and once the table was full, packets were dropped.
Approach
Adjust the kernel parameters on the physical machine: raise the maximum number of connection tracking records, and reduce the timeouts for connection records in states such as ESTABLISHED, TIME_WAIT, CLOSE_WAIT and FIN_WAIT.
Investigation process
The investigation started at the business level, checking whether any virtual machine had saturated its bandwidth. Real-time traffic monitoring showed nothing abnormal, so this led nowhere. When the problem occurred a second time, pinging a virtual machine from the host failed with the error "ping: sendmsg: Operation not permitted":
# ping 172.16.3.5
PING 172.16.3.5 (172.16.3.5) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
The kernel log showed the following:
# dmesg | tail
[64802472.971773] nf_conntrack: table full, dropping packet
[64802472.972242] nf_conntrack: table full, dropping packet
[64802472.973668] nf_conntrack: table full, dropping packet
[64802472.978622] nf_conntrack: table full, dropping packet
[64802472.988458] nf_conntrack: table full, dropping packet
[64802472.991945] nf_conntrack: table full, dropping packet
[64802472.998772] nf_conntrack: table full, dropping packet
[64802472.999542] nf_conntrack: table full, dropping packet
[64802473.001464] nf_conntrack: table full, dropping packet
[64802473.001768] nf_conntrack: table full, dropping packet
So the packets were being dropped because the connection tracking table was full. We had actually hit this problem once before, when it was caused by a large number of TIME_WAIT connection records. This time it was caused by a large number of ESTABLISHED connection records:
# cat /proc/net/nf_conntrack | awk '/^.*tcp.*$/ {count[$6]++} END {for(state in count) print state, count[state]}'
LAST_ACK 36
SYN_RECV 52
CLOSE_WAIT 350
CLOSE 844
ESTABLISHED 246265
FIN_WAIT 4
SYN_SENT 993
TIME_WAIT 9996
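Besides the per-state breakdown, it can help to see how close the table is to its limit. The following is a minimal sketch, not part of the original investigation: the function name conntrack_usage and its two optional path arguments are hypothetical and exist only so the function can be pointed at test files. By default it reads the kernel's nf_conntrack_count and nf_conntrack_max entries under /proc/sys/net/netfilter.

```shell
#!/bin/sh
# Print "count max percent" for the connection tracking table.
# $1 and $2 are hypothetical overrides for the two procfs files (useful for testing);
# when omitted, the real kernel files are read.
conntrack_usage() {
    count_file="${1:-/proc/sys/net/netfilter/nf_conntrack_count}"
    max_file="${2:-/proc/sys/net/netfilter/nf_conntrack_max}"
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    # Refuse to divide by zero if the limit is reported as 0.
    [ "$max" -gt 0 ] || return 1
    # Integer percentage of the table currently in use.
    echo "$count $max $((count * 100 / max))"
}
```

Calling conntrack_usage with no arguments on the physical machine prints the current entry count, the limit, and the integer percentage in use. A percentage near 100 matches the "table full" condition seen in the kernel log.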
Once the cause is known, the problem is easy to fix by adjusting the relevant kernel parameters:
# sysctl -a | grep nf_conntrack
net.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
Add the parameters above to /etc/sysctl.conf and run sysctl -p to apply them.
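As a rough sketch of that step, the function below appends the settings to a file given as its argument. The function name write_conntrack_sysctl is hypothetical, and the values are simply the ones listed in this article, not general recommendations. The net.nf_conntrack_max line is left out here on the assumption that it mirrors net.netfilter.nf_conntrack_max; add it back if your kernel needs it.

```shell
#!/bin/sh
# Append the conntrack settings from this article to the file named in $1.
# The target is a required argument so nothing is written by accident.
write_conntrack_sysctl() {
    target="$1"
    [ -n "$target" ] || { echo "usage: write_conntrack_sysctl FILE" >&2; return 1; }
    # Quoted heredoc delimiter: the text is written literally, with no expansion.
    cat >> "$target" <<'EOF'
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
EOF
}
```

Run as root, write_conntrack_sysctl /etc/sysctl.conf followed by sysctl -p would persist and load the values. Because the function appends, running it twice leaves duplicate lines in the file.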
In addition, the net.netfilter.nf_conntrack_buckets parameter specifies the size of the hash table. On kernel 4.8 and later it can be changed through sysctl. On kernels before 4.8 the sysctl entry is read-only, and the size is changed by writing to /sys/module/nf_conntrack/parameters/hashsize instead.
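The version-dependent choice could be scripted roughly as follows. This is a sketch under the assumption that the 4.8 boundary stated above holds for your distribution's kernel; the function names buckets_method and set_conntrack_buckets are hypothetical, and the bucket count to use is left to the caller.

```shell
#!/bin/sh
# Print "sysctl" or "hashsize" for a kernel release string such as "4.4.0-210-generic".
# With no argument, the running kernel's release (uname -r) is used.
buckets_method() {
    release="${1:-$(uname -r)}"
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 8 ]; }; then
        echo sysctl
    else
        echo hashsize
    fi
}

# Set the hash table size to $1 using whichever interface applies (run as root).
set_conntrack_buckets() {
    size="$1"
    case "$(buckets_method)" in
        sysctl)   sysctl -w "net.netfilter.nf_conntrack_buckets=$size" ;;
        hashsize) echo "$size" > /sys/module/nf_conntrack/parameters/hashsize ;;
    esac
}
```

For example, buckets_method "3.10.0" prints hashsize, while buckets_method "4.8.0" prints sysctl. A change made through either interface applies only to the running kernel, so it has to be reapplied after a reboot.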