Using the ping command to test for network packet loss: causes and solutions

Network packet loss is the phenomenon in which data packets are lost in transit for various reasons; it is typically observed when we use ping to query a destination station. Ping uses ICMP echo request and echo reply messages. An ICMP echo request is a query sent by a host or router to a specific destination host. The machine that receives it must send an ICMP echo reply back to the source host. This query is used to test whether the destination station is reachable and to learn about its status.
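To make the request/reply exchange concrete, here is a minimal Python sketch of how an echo request could be assembled and how a reply could be matched to it. The function names are illustrative and not from any library; the type values (8 for echo request, 0 for echo reply) and the checksum follow RFC 792 and RFC 1071. The sketch only builds bytes and does not send anything, since raw ICMP sockets normally need administrator rights.

```python
import struct

ICMP_ECHO_REQUEST = 8  # type 8: echo request (RFC 792)
ICMP_ECHO_REPLY = 0    # type 0: echo reply (RFC 792)


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even number of bytes
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Return the bytes of an ICMP echo request (8-byte header plus payload)."""
    # Header layout: type, code, checksum, identifier, sequence number.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def is_matching_reply(reply: bytes, identifier: int, sequence: int) -> bool:
    """True if `reply` is an echo reply that answers the given request."""
    if len(reply) < 8:
        return False
    msg_type, code, _checksum, ident, seq = struct.unpack("!BBHHH", reply[:8])
    return (msg_type == ICMP_ECHO_REPLY and code == 0
            and ident == identifier and seq == sequence)
```

A request for which no matching reply arrives before the timeout is what ping counts as a lost packet.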

We often encounter intermittent network connection failures. Faced with such failures, many network administrators use the ping command to test network connectivity, and the results show that packet loss on the transmission line is severe. What causes such severe packet loss? Is it an unstable cable connection? A network virus? Or some other underlying factor?

Reason 1: Physical line failure

Suppose the network administrator finds that a WAN line connects and disconnects intermittently. This may be a line fault, or it may be a problem on the user side. To determine whether the line is at fault, you can run the following tests.

If the WAN line terminates on a router, log in to the router and use extended ping to send a large number of data packets to the WAN interface of the peer router. If the line terminates on a Layer 3 switch, connect a computer to each end of the line, set its IP address to the WAN interface address of the local Layer 3 routing switch, and run the "ping <peer computer address> -t" command.

If no packet loss occurs in the above test, the line supplied by the line provider is good and the cause of the fault lies on the user side, so further investigation is required.

If packet loss does occur in the above test, the fault is caused by the line itself, and you need to contact the line provider to resolve it as soon as possible.
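The line test and the decision rule above can be sketched in Python as follows. This is only an illustration: build_ping_command and locate_fault are hypothetical helper names, and the address used in the usage note is a placeholder from the documentation range. The flags are the standard ones for the Windows ping (-n count, -l size) and for ping on Linux and most Unix-like systems (-c count, -s size).

```python
import platform


def build_ping_command(host: str, count: int = 100, size: int = 1000,
                       system: str = "") -> list:
    """Build a ping command that sends `count` probes of `size` payload bytes."""
    system = system or platform.system()
    if system == "Windows":
        # -n: number of echo requests, -l: payload size in bytes
        return ["ping", "-n", str(count), "-l", str(size), host]
    # Linux and most Unix-like systems: -c count, -s payload size
    return ["ping", "-c", str(count), "-s", str(size), host]


def locate_fault(loss_percent: float) -> str:
    """Apply the rule from the text to a test run across the isolated line."""
    if loss_percent > 0:
        return "line provider"  # loss on the line itself
    return "user side"          # line is clean, look further on the user side
```

The command list can be handed to subprocess.run, for example subprocess.run(build_ping_command("192.0.2.1"), capture_output=True, text=True), and the measured loss rate passed to locate_fault.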

Physical lines can cause packet loss in many ways, such as fiber connection problems, jumpers not properly seated in device interfaces, or faulty twisted-pair cables and RJ-45 connectors. In addition, data errors caused by random or burst noise on the communication line, radio-frequency interference, and signal attenuation can all lead to packet loss. A network tester can be used to check the quality of the line.

Reason 2: Equipment failure

Equipment failure here mainly refers to hardware failure and does not include packet loss caused by improper software configuration. Examples include a broken network card, a physical fault on a switch port, an electrical port on an optical fiber transceiver that does not match the network device interface, or a duplex mismatch between the device interfaces at the two ends of a link.

I once saw packet loss caused by a failed optical fiber module on a switch port. After communicating for a while, the switch would hang and stop forwarding traffic, then return to normal after a restart. After a period of observation, we found that one particular fiber module was faulty. We replaced it with a new module and the problem disappeared.

The reason is that the switch performs CRC error detection and length verification on every received packet, discards packets found to have errors, and forwards the correct ones. In this case, however, some erroneous packets passed both CRC error detection and length verification undetected. Such packets were neither sent out nor discarded during forwarding. They accumulated in the dynamic cache and could never be sent. When the cache filled up, the switch hung. The end result was that packets could not reach the destination host.
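As a rough illustration of the check-then-forward logic described above, the following Python sketch appends a CRC-32 to a frame and later verifies both its length and its CRC. It is a simplification under stated assumptions: zlib.crc32 uses the same polynomial as the Ethernet frame check sequence, but this is not a bit-exact model of what a switch does in hardware, and add_fcs and should_forward are made-up names. The 64-byte and 1518-byte limits are the standard minimum and maximum sizes of an untagged Ethernet frame.

```python
import struct
import zlib

MIN_FRAME = 64     # minimum Ethernet frame length in bytes, including the FCS
MAX_FRAME = 1518   # maximum untagged Ethernet frame length, including the FCS


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 of the frame contents as a 4-byte trailer."""
    crc = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", crc)


def should_forward(frame: bytes) -> bool:
    """Return True only if the frame passes the length and CRC checks."""
    if not MIN_FRAME <= len(frame) <= MAX_FRAME:
        return False  # runt or oversized frame: discard
    body, trailer = frame[:-4], frame[-4:]
    (received_crc,) = struct.unpack("<I", trailer)
    # Recompute the CRC over the body and compare it with the trailer.
    return (zlib.crc32(body) & 0xFFFFFFFF) == received_crc
```

A frame whose bits were damaged in transit fails the comparison and is dropped, which is the normal path; the fault in the story arose from frames that were neither forwarded nor dropped.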

Reason 3: Network congestion

Network congestion can raise the packet loss rate for many reasons. The main one is that router resources are heavily occupied.

If you find that the network is slow and the packet loss rate is rising, run the show process cpu and show process mem commands. You will generally find that the IP input process is consuming too many resources. Next, check whether fast switching has been disabled on the high-traffic egress port. If so, re-enable it.

Then check whether fast switching on the same interface is disabled. For example, when an interface is configured with multiple network segments and traffic between those segments is heavy, the router works in process-switching mode. In this case, execute the "ip route-cache same-interface" command on the interface.

Next, use the show interfaces and show interfaces switching commands to identify the ports through which large numbers of packets enter and exit. Once the incoming port is confirmed, enable IP accounting on the outgoing interface to examine the traffic. In an attack, the source address keeps changing while the destination address stays the same. You can use the "access-list" command as a temporary fix (ideally configured on the device closest to the attack source), but the ultimate solution is to stop the attack at its source.
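The attack signature described above (many changing source addresses, one fixed destination) can be checked with a few lines of Python once the source/destination pairs have been copied out of the accounting data. This is a sketch only: the input format, the function name find_suspect_destinations, and the threshold of 100 sources are assumptions for illustration and should be tuned to the network in question.

```python
from collections import defaultdict


def find_suspect_destinations(flows, min_sources=100):
    """Return {destination: distinct source count} for heavily targeted hosts.

    `flows` is an iterable of (source_ip, destination_ip) pairs, for example
    transcribed from IP accounting output.
    """
    sources_by_dst = defaultdict(set)
    for src, dst in flows:
        sources_by_dst[dst].add(src)
    # Keep only destinations contacted by an unusually large number of sources.
    return {
        dst: len(srcs)
        for dst, srcs in sources_by_dst.items()
        if len(srcs) >= min_sources
    }
```

A destination that shows up here is a candidate for a temporary access list entry while the attack source is tracked down.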

Many situations cause network congestion in practice. Examples include a large amount of UDP traffic (which can be handled with the same steps used for the spoofing attack above), a large number of multicast streams and broadcast packets traversing the router, or a router configured with IP NAT that has many DNS packets passing through it. Once these situations cause congestion, the communicating parties apply flow control and discard the packets that cannot be transmitted.

How to determine whether packet loss is occurring

Usually we use the ping command to test whether the network is losing packets.

As the ping output in the picture shows, when the local machine runs a long ping against the non-existent address xxxxxx, every ICMP packet sent is lost and the loss rate reaches 100%. In other words, packets are lost on the path from the local machine to the unreachable address xxxxxx.
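When ping is run from a script, the loss rate can be read from the summary line instead of by eye. The sketch below is a minimal Python example that assumes the usual English summary formats, "N% packet loss" on Linux and "(N% loss)" on Windows. Other locales and ping implementations may word the summary differently, so the pattern is an assumption, and parse_loss_percent is an illustrative name.

```python
import re

# Matches "100% packet loss" (Linux style) and "100% loss" (Windows style).
_LOSS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)%\s*(?:packet\s+)?loss")


def parse_loss_percent(ping_output: str):
    """Extract the loss percentage from ping's summary, or None if absent."""
    match = _LOSS_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None
```

A result of 100.0 corresponds to the fully unreachable case described above, while a value between 0 and 100 indicates intermittent loss.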

1. Solution

For network equipment failures: use segmented capture. With the Kelai network analysis system, capture packets on both sides of each key device in the network to determine whether that device is dropping packets, and so pinpoint the faulty device.

For network congestion: Configure mirroring on the core switch and use the Kelai network analysis system to capture packets.

Analyze the traffic on key links (usually egress links) to see whether network utilization is too high, whether there are too many packets per second, whether the packet size distribution is reasonable, whether TCP sessions are behaving normally, and so on.
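The three numeric indicators mentioned above are simple to compute from capture counters. The following Python sketch shows one way to do so; the function names and the bucket edges are illustrative choices, not anything defined by the capture tool.

```python
from collections import Counter


def link_utilization(byte_count: int, interval_s: float, capacity_bps: float) -> float:
    """Fraction of link capacity used: bits transferred / bits the link could carry."""
    return (byte_count * 8) / (interval_s * capacity_bps)


def packets_per_second(packet_count: int, interval_s: float) -> float:
    """Average packet rate over the capture interval."""
    return packet_count / interval_s


def size_distribution(packet_sizes, edges=(64, 128, 256, 512, 1024, 1518)):
    """Count packets per size bucket; each bucket is labelled by its upper edge."""
    counts = Counter()
    for size in packet_sizes:
        label = next((f"<={e}" for e in edges if size <= e), f">{edges[-1]}")
        counts[label] += 1
    return dict(counts)
```

For instance, 1,250,000 bytes seen in one second on a 100 Mbit/s link works out to a utilization of 0.1, that is 10%. A distribution dominated by very small packets at a high packet rate is often a sign of scanning or flooding rather than normal use.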

Of course, the most fundamental method is to limit user traffic, that is, to control the traffic of each online user, for example by prohibiting access to video sites and other websites unrelated to work. Precise traffic limits can also be set for each user to prevent any one user from consuming too much of the limited network bandwidth.
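A common way to implement a per-user traffic limit is a token bucket. The text does not say which mechanism is used, so the Python sketch below is only one possible illustration; the class name, its parameters, and the numbers in the usage note are assumptions. Time is passed in explicitly so that the behaviour is easy to follow.

```python
class TokenBucket:
    """Per-user rate limiter: `rate` bytes per second, bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = now

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Return True if the packet may pass at time `now` (in seconds)."""
        # Refill according to elapsed time, never beyond the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # over the limit: drop (or queue) the packet
```

With TokenBucket(rate=1000, capacity=1500), a user can send one 1500-byte burst and is then held to about 1000 bytes per second; one such bucket would be kept per user.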

Quality of Service (QoS) can also be applied to some traffic. For example, traffic that is more closely related to work, such as web browsing and email, can be given a higher priority. This relieves network congestion to some extent and ensures that high-priority services are forwarded first. (This treats the symptoms rather than the root cause.)
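The idea of forwarding higher-priority traffic first can be illustrated with a strict-priority queue. The Python sketch below is a toy model rather than how any particular device implements QoS, and the traffic classes and their priorities are hypothetical values chosen to mirror the example in the text.

```python
import heapq
import itertools

# Hypothetical traffic classes; a lower number means higher priority.
PRIORITY = {"email": 0, "web": 0, "other": 1, "video": 2}


class PriorityScheduler:
    """Strict-priority queue: always dequeues the highest-priority packet first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, traffic_class: str, packet) -> None:
        priority = PRIORITY.get(traffic_class, PRIORITY["other"])
        heapq.heappush(self._heap, (priority, next(self._order), packet))

    def dequeue(self):
        """Return the next packet to forward, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Under congestion the low-priority classes wait longer or are dropped first, which is why this approach eases the symptoms without adding any bandwidth.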

2. Packet loss when pinging a server IP

The usual causes are as follows:

A site under IIS on the cloud server or VPS cloud host is misbehaving or does not have its own application pool. Find that site and give it a dedicated application pool.

A site with an empty host header is bound on the server. This easily causes the problem. It is best to delete the site with the empty host header, or give that site its own application pool.

The server's bandwidth is too low or its traffic is restricted. To take on more hosting customers, some IDC service providers impose very strict limits on the servers they host, so the permitted outbound traffic is small while the number of requests is large, which causes packet loss.

A problem with the switch port. In one case, testing with the ping command showed that packets were lost from time to time. The cause was first assumed to be at the physical layer, but the fault persisted after the RJ45 connector on the network cable was redone, and even replacing the cable did not help. Suspicion then fell on the network card interface or the switch port. On inspection, the network card driver was correct and the network card interface showed no abnormality. Checking the switch port showed that the indicator light of the port connected to the server was flashing between green and yellow, which means the port was not working properly. Logging in to the switch with HyperTerminal and checking the port parameters showed that the port was working in 100 Mbit/s full-duplex mode. Back on the server, the local connection status showed the network card working in 10 Mbit/s full-duplex mode. The transmission rate of the switch port and the network card did not match. After the network card was changed to 100 Mbit/s full-duplex mode and the test repeated, everything was normal and the fault was resolved.

Heavy packet loss caused by DDoS or other brute-force attacks. In this case there is little else to do but add a hardware firewall as soon as possible.

3. In short, the general troubleshooting checklist is:

Is the bandwidth full?

Try changing the switch port

Try changing the network cable

Are the network card and motherboard drivers installed? (Usually this is not the problem.)

Is the switch port set to 100M or 10M, and does it match the setting on the machine?
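The last check in the list, comparing the settings at the two ends of the link, can be written down as a small Python helper once both sets of values have been read off the devices. LinkSettings and find_mismatches are hypothetical names, and how the values are collected (switch console, operating system connection status) is left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSettings:
    speed_mbps: int   # for example 10, 100 or 1000
    duplex: str       # "full" or "half"


def find_mismatches(switch_port: LinkSettings, nic: LinkSettings) -> list:
    """List the settings that differ between the two ends of a link."""
    problems = []
    if switch_port.speed_mbps != nic.speed_mbps:
        problems.append(
            f"speed: switch {switch_port.speed_mbps} Mbit/s, NIC {nic.speed_mbps} Mbit/s"
        )
    if switch_port.duplex != nic.duplex:
        problems.append(f"duplex: switch {switch_port.duplex}, NIC {nic.duplex}")
    return problems
```

An empty list means the two ends agree; any entry points to the kind of mismatch found in the switch port case described earlier.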

Origin blog.csdn.net/u011055144/article/details/128748918