Linux: network: switch: a case of delayed external network packets after business volume increased; SFP

I once worked on a case where the customer's network deployment was complete, and all that remained was to ramp up the business volume to verify the capacity of the whole network.
However, once the customer switched the service over, every network element saw network delays during periods of high business volume, and the service could not continue. The switchover was attempted several times, but it always ran into problems under high business volume.

The customer's technical manager was a tough character, and he said bluntly: if the problem is not solved, we will switch to another manufacturer's product and replace all the network elements! That is outright intimidation :). Everyone was under enormous pressure, and each team brought in its own technical experts to analyze the problem. In the end, everyone agreed that the switch was at fault: because of its performance or some other defect, it could not cope once the number of network packets increased, which eventually led to the network delays. Each team also believed that the product modules it was responsible for had no problems. So the team responsible for the switch had to step in and analyze the problem.

I don't know how the switch team finally identified the root cause, but the eventual fix was to replace one SFP on the switch, after which the problem went away. As far as I can tell, the port served by that SFP was an external service port; otherwise it would not have caused problems on multiple network elements. I can't help but sigh: a seemingly simple SFP replacement took a lot of manpower and resources to arrive at, and the risk involved was really huge.

SFPs commonly suffer from problems such as poor quality, poor contact, synchronization problems, internal design defects...; these problems can usually be diagnosed from counters and alarms.
The links below discuss when it is (and is not) necessary to replace an SFP, and they are a useful reference.
https://erwinvanlonden.net/2018/04/when-to-replace-an-sfp/
https://linkompc.com/item/why-sfp-transceivers-stop-working-and-their-possible-troubleshooting-guide/16155/
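
As a rough illustration of judging an SFP by its counters, here is a minimal Python sketch. It assumes the port is visible as a Linux network interface whose statistics appear under /sys/class/net/<iface>/statistics; a dedicated switch would expose similar counters through its own CLI or SNMP instead. The function names and the error-ratio threshold are hypothetical, chosen only for this example, and this is not the method the switch team in this case used.

```python
from pathlib import Path

# Standard Linux netdev statistics names. Which of them a given driver
# actually updates depends on the hardware.
ERROR_COUNTERS = ("rx_errors", "rx_crc_errors", "rx_frame_errors", "tx_errors")


def read_counters(iface, root="/sys/class/net"):
    """Return {counter: value} for the counters that exist for iface."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for name in ERROR_COUNTERS + ("rx_packets",):
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def growing_errors(before, after, max_error_ratio=1e-6):
    """Compare two snapshots and return the error counters that grew.

    A counter is reported when its increase, divided by the number of
    packets received between the snapshots, exceeds max_error_ratio.
    The default threshold is an arbitrary assumption, not a standard.
    """
    rx_delta = max(after.get("rx_packets", 0) - before.get("rx_packets", 0), 1)
    suspicious = {}
    for name in ERROR_COUNTERS:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0 and delta / rx_delta > max_error_ratio:
            suspicious[name] = delta
    return suspicious
```

In practice one would call read_counters("eth0") (the interface name is only an example) once before and once during a high-traffic period, then pass both snapshots to growing_errors. Error counters that keep climbing under load point toward the link or the SFP, while a single historical jump that never grows again usually does not justify a replacement.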

Origin blog.csdn.net/qq_36428903/article/details/132844271