No idea how to locate OSPF faults? Just copy this

Hello everyone, my fellow network engineers.

It has been a long time since I last talked about OSPF. The official account has shared basic and classic content on it one after another, covering popular-science explanations, interview questions, and lab exercises.

Today I mainly want to share some practical material on OSPF faults.

OSPF faults have many causes, and different problems produce different symptoms, but can you really troubleshoot them?

Here is how to locate OSPF faults; take note of these 6 practical cases.

01 Locating the fault when an OSPF neighbor relationship cannot be established

01 Confirm that the configuration is correct and the underlying layers can forward packets

  • Confirm that the configuration is correct.
  • Check whether the interfaces are all up.
  • Can the two devices ping each other? Ping the directly connected interface and specify the source address.
  • Are the MTUs of devices at both ends the same?

02 Check whether packets are sent and received normally

Use display ospf cumulative to view the number of packets received and sent.

03 Hidden mode

After hidden mode is enabled (enableospf-lsa-dbg), use display ospf interface <interface name> to view the number of packets received and sent by the interface (V3R3 and later versions).

If the neighbor stays in the Init state for a long time, the cause is usually that Hello packets are not being sent or not being received.

If the neighbor stays in the ExStart or Exchange state for a long time, check whether a ping with large packets succeeds.

DD packets are generally filled up to the MTU; for example, with an MTU of 1500 they can be filled to 1492.
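The large-packet ping check above can be sized with simple arithmetic. The following is a minimal sketch, assuming IPv4 without options (20-byte header) and an 8-byte ICMP header; the function name is hypothetical, and whether your ping command's size option counts these headers depends on the platform.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU too small for an IPv4 ICMP packet")
    return payload


print(max_icmp_payload(1500))  # 1472
```

If a ping at this size fails while a small ping succeeds, an MTU mismatch or a fragmentation problem on the path is a likely suspect.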

04 Debug layer by layer

If the simple checks above turn up nothing, use debugging to check layer by layer.

If one end is in the Init state and the other end shows no state, debug Hello packets at both ends:

<Quidway>debugging ospf packet hello

If one end is in the ExStart state and the other end is in the Exchange state, debug DD packets at both ends:

<Quidway>debugging ospf packet dd

If one end is in the Loading state and the other end is in another state, debug Request and Update packets at both ends:

<Quidway>debugging ospf packet request

<Quidway>debugging ospf packet update

Packets other than Hello can be relatively long, so it is recommended to use brief to view only the packet header.

For debug ip packet acl: there are many IP packets, so it is recommended to filter with an ACL.
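The state-to-debug mapping described above can be kept as a small lookup table. This is only a sketch: the dictionary and function names are hypothetical, and the command strings are the ones quoted in this section.

```python
# Stuck neighbor state -> debugging commands quoted in the text above.
DEBUG_BY_STUCK_STATE = {
    "init": ["debugging ospf packet hello"],
    "exstart": ["debugging ospf packet dd"],
    "exchange": ["debugging ospf packet dd"],
    "loading": [
        "debugging ospf packet request",
        "debugging ospf packet update",
    ],
}


def debug_commands(stuck_state):
    """Return the debugging commands to run at both ends for a stuck state."""
    return list(DEBUG_BY_STUCK_STATE.get(stuck_state.strip().lower(), []))


print(debug_commands("ExStart"))
```

An unknown state returns an empty list, which is a reminder to go back to the basic checks first.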

02 Locating OSPF neighbor flapping

01 Neighbor flapping logs: focus on the logs where the neighbor state drops

Search the log file for these keywords: NBR_CHG_DOWN, NBR_CHG_E (V3R2), NBR_CHANGE_E (V3R3).

Example:

Aug 28 2010 10:27:32 RTA %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event: neighbor state changed to Down. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=KillNbr, NeighborPreviousState=Full, NeighborCurrentState=Down)

The neighbor is torn down actively because the interface is down.

Aug 28 2010 10:31:29 RTA %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event: neighbor state changed to Down. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=InactivityTimer, NeighborPreviousState=Full, NeighborCurrentState=Down)

The neighbor is torn down because of a timeout.

Aug 28 2010 10:34:51 RTA %%01OSPF/6/NBR_CHANGE_E(l): Neighbor changes event: neighbor status changed. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=1-Way, NeighborPreviousState=Full, NeighborCurrentState=Init)

After tearing down the neighbor, the peer starts re-establishing it and sends a 1-way Hello before it has received the local Hello, which triggers a 1-Way event on the local end.

Aug 28 2010 10:38:52 RTA %%01OSPF/6/NBR_CHANGE_E(l): Neighbor changes event: neighbor status changed. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=SeqNumberMismatch, NeighborPreviousState=Full, NeighborCurrentState=ExStart)

After tearing down the neighbor, the peer starts re-establishing it and sends a DD packet once it has received the local Hello, which triggers a SeqNumberMismatch event on the local end.
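When there are many such log lines, pulling out the NeighborEvent field saves time. Below is a minimal sketch, assuming the log lines follow the field layout of the samples above; the function name and the hint texts are illustrative, summarizing the explanations given in this section.

```python
import re

# Matches "ProcessId=1" and "NeighborXxx=value" pairs inside the parentheses.
FIELD_RE = re.compile(r"(Neighbor\w+|ProcessId)\s*=\s*([^,)\s]+)")

# Hints summarizing the explanations given for each sample log above.
EVENT_HINTS = {
    "KillNbr": "interface down, neighbor torn down actively",
    "InactivityTimer": "no Hello received before the dead timer expired",
    "1-Way": "peer is re-establishing and sent a 1-way Hello",
    "SeqNumberMismatch": "peer is re-establishing and sent a DD packet",
}


def parse_nbr_log(line):
    """Extract the neighbor fields from one NBR_CHG_DOWN / NBR_CHANGE_E line."""
    fields = dict(FIELD_RE.findall(line))
    event = fields.get("NeighborEvent")
    return {
        "neighbor": fields.get("NeighborAddress"),
        "event": event,
        "previous": fields.get("NeighborPreviousState"),
        "current": fields.get("NeighborCurrentState"),
        "hint": EVENT_HINTS.get(event, "unknown event"),
    }


sample = (
    "Aug 28 2010 10:31:29 RTA %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event: "
    "neighbor state changed to Down. (ProcessId=1, NeighborAddress=11.11.11.2, "
    "NeighborEvent=InactivityTimer, NeighborPreviousState=Full, "
    "NeighborCurrentState=Down)"
)
print(parse_nbr_log(sample))
```

Counting the parsed events per neighbor quickly shows whether timeouts or interface events dominate.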

02 The most common cause: timeout

On live networks, the most common cause of OSPF neighbor flapping is a timeout.

That is, OSPF does not receive a Hello packet within the dead timer interval. This can happen in the following situations:

1. Packet loss occurs, so OSPF Hello packets fail to get through;

2. CPU usage is high, so the routing task cannot be scheduled and packets cannot be sent or received.

Therefore, for a timeout, besides checking the logs and diagnostic logs, you also need to check the underlying packet loss counters.

In addition, on live networks, users often ask why there are only neighbor Down logs and no neighbor Up logs.

To be clear, both neighbor Down and neighbor Up events are logged, but routine inspections and user checks usually rely on display logbuffer.

Aug 28 2010 10:31:29 RTA %%01OSPF/3/NBR_CHG_DOWN(l): Neighbor event: neighbor state changed to Down. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=InactivityTimer, NeighborPreviousState=Full, NeighborCurrentState=Down)

The OSPF neighbor Down log is at the Error level (the 3 in OSPF/3/), so it is recorded in the logbuffer.

Aug 28 2010 10:33:41 RTA %%01OSPF/6/NBR_CHANGE_E(l): Neighbor changes event: neighbor status changed. (ProcessId=1, NeighborAddress=11.11.11.2, NeighborEvent=HelloReceived, NeighborPreviousState=Down, NeighborCurrentState=Init)

The OSPF neighbor state change log is at the Info level (6), so it is recorded in the log but not in the logbuffer.

The logbuffer does not contain all logs. It was designed to make it easy for users to view the information they care about.

By default, if nothing is configured, only logs of warning (4) level and above are recorded in the logbuffer.

You can use this command to check the logbuffer settings.

<Quidway>display channel

…
channel number:4, channel name:logbuffer
MODU_ID   NAME     ENABLE  LOG_LEVEL  ENABLE  TRAP_LEVEL  ENABLE  DEBUG_LEVEL
ffff0000  default  Y       warning    N       debugging   N       debugging
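The filtering rule can be expressed in a few lines. This is a sketch under two assumptions: the digit between the slashes in the log header (for example the 3 in OSPF/3/) is the severity, and a lower number means a more severe log. The function name is hypothetical.

```python
import re

# Captures the severity digit in headers such as "%%01OSPF/3/NBR_CHG_DOWN".
SEVERITY_RE = re.compile(r"%%\d+[A-Za-z_]+/(\d)/")

WARNING = 4  # default logbuffer threshold described above


def in_logbuffer(line, threshold=WARNING):
    """Return True if a log line at this severity would pass the threshold."""
    match = SEVERITY_RE.search(line)
    if match is None:
        raise ValueError("no severity field found in log line")
    return int(match.group(1)) <= threshold


print(in_logbuffer("RTA %%01OSPF/3/NBR_CHG_DOWN(l): ..."))   # True
print(in_logbuffer("RTA %%01OSPF/6/NBR_CHANGE_E(l): ..."))  # False
```

This matches what is observed: the level 3 Down log shows up in display logbuffer, while the level 6 state change log does not.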

03 Locating OSPF Router ID conflicts

OSPF Router ID configuration conflicts often occur on live networks.

Because the Router ID is the key identifier of an OSPF device, a conflict causes OSPF LSAs to be aged out and regenerated frequently, which makes the network unstable.

01 Method for judging Router ID conflicts within an area

The following topology:

RTA, RTB, RTC, and RTD establish OSPF neighbor relationships in area 0. The router ids of RTA and RTC are both 1.1.1.1, and a conflict occurs.

Judgment method:

1. Enter display ospf lsdb every second on any router, and check whether the Age field of the Router LSA changes frequently and whether the Sequence field increases rapidly.

In the preceding example, the Age of the Router LSA with Router ID 1.1.1.1 changes frequently, and its Sequence also increases rapidly.

2. Enter display ospf routing on RTB every second, and you can see routes flapping. If routes flap frequently in the area and no neighbor is flapping, you can conclude that there is a Router ID conflict.
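The "Sequence increases rapidly" check can be automated once the Age and Sequence values of one LSA have been collected from repeated display ospf lsdb output. The sketch below assumes the readings are already parsed into integers; the function names and the threshold of 3 are illustrative choices, not vendor values.

```python
def to_signed32(seq):
    """OSPF sequence numbers are signed 32-bit; 0x80000001 is the lowest."""
    return seq - (1 << 32) if seq >= (1 << 31) else seq


def seq_increments(samples):
    """Count how often the sequence number rose between consecutive readings.

    samples: list of (age_seconds, sequence_number) readings for ONE LSA,
    taken about one second apart.
    """
    seqs = [to_signed32(seq) for _, seq in samples]
    return sum(1 for a, b in zip(seqs, seqs[1:]) if b > a)


def looks_like_flapping(samples, min_increments=3):
    """A healthy LSA rarely changes sequence within a few seconds."""
    return seq_increments(samples) >= min_increments


stable = [(10, 0x80000005), (11, 0x80000005), (12, 0x80000005)]
flapping = [(1, 0x80000010), (0, 0x80000011), (1, 0x80000012), (0, 0x80000013)]
print(looks_like_flapping(stable), looks_like_flapping(flapping))  # False True
```

A flapping Router LSA combined with stable neighbors points toward a Router ID conflict, as described above.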

02 Method for judging Router ID conflicts between areas

There are the following topologies:

As shown in the preceding figure, the router IDs of RTA and RTC are conflicting, but RTA and RTC are not in the same area.

Judgment method:

Enter display ospf lsdb every second on any router.

If a large number of AS External LSAs are refreshed frequently and they all come from the same router, you can tentatively infer that Router IDs conflict between different areas.

Router ID configuration conflicts occur from time to time on live networks.

Once you know these common judgment methods, you can find the cause of the problem easily and then check the devices one by one to find the conflicting Router ID.

Solution:

After changing the conflicting Router ID, run reset ospf process to correct the configuration error. (Note that reset ospf process causes neighbors to be re-established, so services are interrupted.)

Example:

04 Locating OSPF interface IP address conflicts

Consider the following topology:

01 DR and non-DR conflict

The interface with IP address 112.1.1.2 on RTA is the DR, and the interface with IP address 112.1.1.2 on RTC is not the DR. The IP addresses of the two interfaces conflict.

Judgment method:

Enter display ospf lsdb every second on RTC. The Age of the Network LSA for the conflicting network segment is always 3600, or the LSA is occasionally missing, and the Sequence field increases rapidly.

Enter display ospf lsdb every second on the other routers. The Age of the Network LSA for the conflicting network segment keeps switching between 3600 and other, smaller values, and the Sequence field increases rapidly.

02 Two DR IP addresses conflict

The interface with IP address 112.1.1.2 on RTA is the DR, and the interface with IP address 112.1.1.2 on RTC is also the DR. The IP addresses of the two interfaces conflict.

Judgment method:

Enter display ospf lsdb every second on either router.

You will find two Network LSAs with LinkState Id 112.1.1.2. The Age field of both LSAs is always small, and the Sequence field increases relatively quickly.

IP address configuration conflicts occur from time to time on live networks.

Once you know these common judgment methods, you can find the cause of the problem easily, check the devices one by one, find the conflicting IP address, and change it to correct the configuration error.

03 Method for identifying the devices with conflicting IP addresses in an area

1. When a DR conflicts with a non-DR:

First, the conflicting IP address is given by the LinkState Id of the flapping Network LSA (see above for the judgment method).

Then find one of the devices from the AdvRouter field and locate the interface on it.

Note that the other conflicting device can only be found through the network's IP address plan; it is difficult to find it from the information carried by OSPF itself.

In the example above, you can first determine that the conflicting IP address is 112.1.1.2.

The Router ID of one of the conflicting devices is 1.1.1.1. The other conflicting device (3.3.3.3) cannot be found from the information carried by OSPF itself.

2. When a DR conflicts with a DR:

From the LinkState Id and AdvRouter of the two Network LSAs that share the same LinkState Id (see above for the judgment method), you can tell which interface IP address on which devices is in conflict.

In the example above, the conflict is easily narrowed down to the two devices with Router IDs 3.3.3.3 and 1.1.1.1.

The corresponding interface on each device is then found from the LinkState Id (112.1.1.2, the conflicting IP address).
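For the DR-versus-DR case, spotting two Network LSAs with the same LinkState Id is a simple grouping exercise. This is a minimal sketch, assuming the Network LSAs have already been extracted as (LinkState Id, AdvRouter) pairs; the function name and the third sample entry are made up for illustration.

```python
from collections import defaultdict


def duplicate_network_lsas(lsas):
    """Return LinkState Ids advertised by more than one router.

    lsas: iterable of (link_state_id, adv_router) pairs for Network LSAs.
    """
    advertisers = defaultdict(set)
    for link_state_id, adv_router in lsas:
        advertisers[link_state_id].add(adv_router)
    return {
        link_state_id: sorted(routers)
        for link_state_id, routers in advertisers.items()
        if len(routers) > 1
    }


lsdb = [
    ("112.1.1.2", "1.1.1.1"),
    ("112.1.1.2", "3.3.3.3"),
    ("10.0.0.1", "2.2.2.2"),  # hypothetical healthy segment
]
print(duplicate_network_lsas(lsdb))  # {'112.1.1.2': ['1.1.1.1', '3.3.3.3']}
```

Each key is a conflicting IP address, and the list holds the Router IDs of the devices to log in to.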

05 Locating frequent flapping of OSPF link interfaces

01 Interface flapping

In practice, high CPU usage caused by interface flapping is common. Interface flapping causes Router LSAs to be generated frequently.

According to RFC 2328, a change in a Router LSA triggers a full route calculation, and frequent route calculations keep CPU usage high.

Troubleshooting such problems still has to start from the LSAs.

Consider the following topology:

Router-A and Router-B establish an OSPF neighbor relationship.

An interface 2.2.2.2/24 on Router-B is enabled in OSPF and goes up and down frequently. Router-A and Router-B perform route calculations frequently, resulting in high CPU usage.

Judgment method:

1. Enter display ospf lsdb every second on any device.

Check whether there is a Router LSA whose Age is always small and whose Sequence number increases rapidly.

2. Enter display ospf brief on any device.

Check whether the number of full route calculations increases rapidly.

If both conditions are met, you can quickly find the frequently flapping interface by combining this with the logs.
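To judge whether the calculation count "increases rapidly", it helps to turn two readings of the counter into a rate. This is a small sketch; the function name is hypothetical, the counter values are read manually from display ospf brief, and what rate counts as abnormal is a judgment for your own network.

```python
def spf_runs_per_minute(count_before, count_after, seconds_between):
    """Rate of full route calculations between two counter readings."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (count_after - count_before) * 60 / seconds_between


print(spf_runs_per_minute(100, 130, 60))  # 30.0
```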

02 LSA flapping

LSA flapping leads to frequent OSPF route calculation, and the resulting high-CPU scenarios are difficult to troubleshoot. You need to find the router that originates the LSA and then control the flapping LSAs at the source. Troubleshoot as follows:

1. Enter the display ip routing-table verbose command.

Find the routes that flap frequently:

As shown in the above table, if a route flaps frequently, its Age value is very small, on the order of seconds.

2. Enter the display ospf lsdb command.

Check the source of the LSA.

As shown in the above table, the source of the LSA is the device whose Router ID is 2.2.2.2. You need to log in to this device to find the specific reason why the LSA is generated so frequently.

In general, when high CPU usage is caused by OSPF route calculation, the troubleshooting method is to watch how the LSAs change, find the cause of the flapping from those changes, and then solve the problem.

06 Locating the fault when OSPF cannot calculate routes

01 Fault overview

Between PEs and CEs, OSPF and BGP learn routes from and advertise routes to each other, which may cause routing loops.

The OSPF VPN feature provides a solution specifically for this situation.

1. In the VPN scenario, the PE imports the private network BGP route and advertises it to the CE through OSPF.

As shown in the figure, PE1 and PE2 advertise the remote route 10.1.1.1/32 learned from BGP to CE1 through OSPF.

CE1 generates routes to 10.1.1.1/32 with next hops pointing to PE1 and PE2 respectively.

2. PE1 receives the route advertised by PE2 and generates an OSPF route to 10.1.1.1/32 with the next hop pointing to CE1.

3. Similarly, PE2 receives the route advertised by PE1 and generates an OSPF route to 10.1.1.1/32 with the next hop pointing to CE1.

4. As a result, for destination 10.1.1.1/32, CE1 points to PE1 and PE2 while PE1 and PE2 point back to CE1, so a loop is formed.

5. Because OSPF routes have a higher priority than BGP routes, the BGP routes on PE1 and PE2 are replaced by OSPF routes.

Without the BGP routes, PE1 and PE2 no longer advertise the route to CE1. PE1 and PE2 then cannot learn the route from each other, so the OSPF routes just generated are withdrawn as well.

At this point there is no OSPF route, so the BGP route is selected as the best route again, and the cycle repeats. This causes route flapping.

To prevent this, OSPF uses the DN-Bit and the Route Tag:

02 DN-Bit suppression

As shown in the preceding figure, the OSPF processes on both PE1 and the MCE are bound to a VPN, that is, they are private network processes. BGP routes are imported on PE1 and LSAs are generated, but no routes are calculated on the MCE.

The judgment method is as follows:

1. Enter display ospf lsdb ase <Link id> on the MCE to check whether the LSA carries the DN-Bit:

2. Enter display current-configuration configuration ospf on the MCE to check whether the OSPF process is bound to a VPN:

If a VPN is bound, the DN-Bit suppression described above means the route cannot be calculated even though the LSA exists.

Solution:

If the device does not act as a PE, enter vpn-instance-capability simple on the MCE to disable the loop check.

03 Matching Route Tag

Again as shown in the figure above, the OSPF processes on both PE1 and the MCE are bound to a VPN, that is, they are private network processes.

BGP routes are imported on PE1 and LSAs are generated, but no routes are calculated on the MCE.

The judgment method is as follows:

1. Enter display ospf lsdb ase <Link id> on the MCE to view the Route Tag value of the LSA:

2. Enter the display ospf brief command on the MCE to check whether the local OSPF Tag value is the same as the Tag value of the received LSA:

If the Route Tag of the LSA is the same as the local Tag, no route is calculated from the LSA because of loop prevention.

Solution:

If the device does not act as a PE, enter vpn-instance-capability simple on the MCE to disable the loop check.

Alternatively, changing the local Route Tag value so that it differs from the LSA Tag value also solves this problem.
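The two loop-prevention checks above can be summarized as one decision function. This is only a model of the behavior as described in this section, not the device's actual implementation; the function and parameter names are hypothetical.

```python
def lsa_used_for_routes(vpn_bound, capability_simple, dn_bit, lsa_tag, local_tag):
    """Model of the DN-Bit and Route Tag checks described above.

    vpn_bound:         the OSPF process is bound to a VPN instance
    capability_simple: vpn-instance-capability simple is configured
    dn_bit:            the received LSA carries the DN-Bit
    lsa_tag/local_tag: Route Tag in the LSA and the local OSPF tag
    """
    if not vpn_bound or capability_simple:
        return True   # loop check not applied
    if dn_bit:
        return False  # suppressed by the DN-Bit
    if lsa_tag == local_tag:
        return False  # suppressed by the matching Route Tag
    return True


# An MCE bound to a VPN that receives an LSA with the DN-Bit set:
print(lsa_used_for_routes(True, False, True, 100, 200))  # False
# The same MCE after vpn-instance-capability simple is configured:
print(lsa_used_for_routes(True, True, True, 100, 200))   # True
```

The tag values 100 and 200 are arbitrary; only whether they match matters in this model.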

Compiled by: Lao Yang, a senior network engineer with 10 years of experience. For more practical content for network engineers, follow the official account: Network Engineer Club

Origin blog.csdn.net/SPOTO2021/article/details/131433242