Troubleshooting packet loss on virtual machines

Symptom

Application services on several virtual machines hosted on the same physical machine time out at the same moment and become unreachable. Pinging either the virtual machines or the physical machine shows severe packet loss.

 

Cause

A virtual machine on the physical machine opened a very large number of connections, which filled the physical machine's connection tracking (conntrack) table with ESTABLISHED entries. Once the table was full, packets were dropped.

 

Approach

Adjust the kernel parameters on the physical machine: raise the maximum number of connection tracking entries, and shorten the timeouts for entries in the ESTABLISHED, TIME_WAIT, CLOSE_WAIT, and FIN_WAIT states.

 

Investigation process

The investigation started at the application level, checking whether a virtual machine had saturated its bandwidth. Real-time traffic monitoring showed nothing abnormal, so this led nowhere. When the problem occurred a second time, pinging a virtual machine from the host failed with the error "ping: sendmsg: Operation not permitted":

# ping 172.16.3.5
PING 172.16.3.5 (172.16.3.5) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

The kernel log showed the following errors:

# dmesg | tail 
[64802472.971773] nf_conntrack: table full, dropping packet
[64802472.972242] nf_conntrack: table full, dropping packet
[64802472.973668] nf_conntrack: table full, dropping packet
[64802472.978622] nf_conntrack: table full, dropping packet
[64802472.988458] nf_conntrack: table full, dropping packet
[64802472.991945] nf_conntrack: table full, dropping packet
[64802472.998772] nf_conntrack: table full, dropping packet
[64802472.999542] nf_conntrack: table full, dropping packet
[64802473.001464] nf_conntrack: table full, dropping packet
[64802473.001768] nf_conntrack: table full, dropping packet
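
To see how close the table is to its limit before the kernel starts logging drops, the current entry count can be compared with the maximum. The following is a minimal sketch, not part of the original investigation: the function name conntrack_usage_pct is hypothetical, and it assumes both values are read from /proc/sys/net/netfilter on a host with the nf_conntrack module loaded.

```shell
# Print conntrack table usage as a whole-number percentage.
# $1: current number of entries, $2: maximum number of entries.
conntrack_usage_pct() {
    count=$1
    max=$2
    # Integer arithmetic is enough for a rough indication.
    echo $(( count * 100 / max ))
}

# Example on a live host (assumes these files exist):
#   conntrack_usage_pct \
#       "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#       "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

A value approaching 100 means new connections are about to be refused an entry, which is when the "table full" messages appear.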

So the packets were being dropped because the connection tracking table was full. The same problem had occurred once before, caused then by a large number of TIME_WAIT entries. This time it was caused by a large number of ESTABLISHED entries:

# cat /proc/net/nf_conntrack | awk '/^.*tcp.*$/ {count[$6]++} END {for(state in count) print state, count[state]}'
LAST_ACK 36
SYN_RECV 52
CLOSE_WAIT 350
CLOSE 844
ESTABLISHED 246265
FIN_WAIT 4
SYN_SENT 993
TIME_WAIT 9996
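
To find out which virtual machine owns most of those ESTABLISHED entries, the same file can be grouped by source address. This is only a sketch: the function name top_established_sources is hypothetical, and it assumes the same field layout as the command above (state in field 6) with the original-direction src= address in field 7.

```shell
# Print the source addresses with the most ESTABLISHED TCP entries.
# $1: conntrack listing (defaults to /proc/net/nf_conntrack)
# $2: how many addresses to print (defaults to 10)
top_established_sources() {
    file=${1:-/proc/net/nf_conntrack}
    limit=${2:-10}
    awk '$3 == "tcp" && $6 == "ESTABLISHED" {
             # Field 7 is assumed to be the original-direction "src=ADDR".
             sub(/^src=/, "", $7)
             count[$7]++
         }
         END { for (ip in count) print count[ip], ip }' "$file" |
        sort -rn | head -n "$limit"
}

# Example on a live host (needs root to read the file):
#   top_established_sources /proc/net/nf_conntrack 5
```

Each output line is a count followed by an address, largest count first.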

Once the cause is known, the problem is straightforward to fix by adjusting the relevant kernel parameters:

# sysctl -a | grep nf_conntrack
net.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_max = 2097152
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30 
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 3600

Add the parameters above to /etc/sysctl.conf and run sysctl -p to apply them.
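
After applying the settings, it can be worth confirming that the running kernel actually holds the intended values. The sketch below is an illustration under stated assumptions: the function name check_sysctl_file is hypothetical, it only handles single-valued "key = value" lines, and it maps each key to a path under /proc/sys by replacing dots with slashes.

```shell
# Compare "key = value" lines in a sysctl-style file with live values.
# $1: settings file, $2: sysctl root directory (defaults to /proc/sys).
# Prints one line per mismatch and returns non-zero if any were found.
check_sysctl_file() {
    file=$1
    root=${2:-/proc/sys}
    status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Strip whitespace around the key and the value.
        key=$(echo "$key" | tr -d '[:space:]')
        value=$(echo "$value" | tr -d '[:space:]')
        # Skip blank lines and comments.
        case $key in ''|'#'*) continue ;; esac
        path=$root/$(echo "$key" | tr . /)
        actual=$(cat "$path" 2>/dev/null || true)
        if [ "$actual" != "$value" ]; then
            echo "MISMATCH $key: expected $value, got ${actual:-unreadable}"
            status=1
        fi
    done < "$file"
    return $status
}

# Example on a live host:
#   check_sysctl_file /etc/sysctl.conf
```

No output and a zero exit status mean every listed parameter matches the live value.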

Another parameter, net.netfilter.nf_conntrack_buckets, specifies the size of the hash table. On kernel 4.8 and later it can be changed with sysctl. On kernels before 4.8 it is read-only through sysctl, and the value is changed by writing to /sys/module/nf_conntrack/parameters/hashsize instead.
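
Raising nf_conntrack_max without enlarging the hash table leaves more entries in each bucket. A rough sketch for deriving a bucket count from the target maximum follows. The function name suggest_hashsize is hypothetical, and the default of 4 entries per bucket is an assumption (a commonly quoted ratio), not something stated above, so adjust it for your kernel and workload.

```shell
# Suggest a hash table size for a given nf_conntrack_max.
# $1: target nf_conntrack_max
# $2: entries per bucket (assumed to be 4 when omitted)
suggest_hashsize() {
    max=$1
    per_bucket=${2:-4}
    echo $(( max / per_bucket ))
}

# Example, as root, on a kernel before 4.8:
#   suggest_hashsize 2097152 > /sys/module/nf_conntrack/parameters/hashsize
# Example, as root, on kernel 4.8 or later:
#   sysctl -w net.netfilter.nf_conntrack_buckets="$(suggest_hashsize 2097152)"
```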

 


 


Origin www.cnblogs.com/ltxdzh/p/11288988.html