Troubleshooting Hard Problems in MPLS VPNs (Part 1)

Topic:

Route updates rejected because of a misconfigured subnet mask on an eBGP interface

Key concepts: BGP third-party next hop, BGP route updates


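According to the usual networking textbooks and reference material, BGP peers establish a session and exchange route updates without ever checking the peer's subnet mask.

This article uses a real case to show a scenario in an MPLS VPN where a single misconfigured subnet mask causes an eBGP peer to reject route updates.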


Network topology:

[Image: network topology diagram]

While studying this topic, let's also review the basic deployment of an MPLS VPN.


Deploy the VRF: R1, R4, and R5 use the same configuration

R1(config)#ip vrf CTO
R1(config-vrf)#rd 1:1       
R1(config-vrf)#route-target 6:6


Configure basic IP addressing

The full configuration is omitted; only the VRF interface configuration is shown here.

R1(config-if)#ip vrf forwarding CTO
R1(config-if)#ip address 10.1.1.13 255.255.255.252
R1(config-if)#no shutdown


R4(config)#interface e0/0
R4(config-if)#ip vrf forwarding CTO
R4(config-if)#ip address 10.1.1.17 255.255.255.0    // Note: the mask is deliberately misconfigured here; the other PE-CE links use 255.255.255.252 //
R4(config-if)#no shutdown


R5(config)#interface e0/0
R5(config-if)#ip vrf forwarding CTO
R5(config-if)#ip address 10.1.1.21 255.255.255.252
R5(config-if)#no shutdown

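// Remember: always check and verify the configuration //

A basic ping and the trusty show ip interface brief are effective checks. On the PE devices, remember to use ping vrf CTO X.X.X.X, because the customer-facing interfaces are in the VRF.


Configure the IGP in the MPLS core (configuration omitted)

Verification is essential: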

R3#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.255.5        0   FULL/  -        00:00:31    10.1.1.9        Ethernet0/3
10.1.255.4        0   FULL/  -        00:00:31    10.1.1.5        Ethernet0/2
10.1.255.1        0   FULL/  -        00:00:30    10.1.1.1        Ethernet0/0


R3#show ip route ospf | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 10 subnets, 2 masks
O        10.1.255.1/32 [110/11] via 10.1.1.1, 00:01:03, Ethernet0/0
O        10.1.255.4/32 [110/11] via 10.1.1.5, 00:00:53, Ethernet0/2
O        10.1.255.5/32 [110/11] via 10.1.1.9, 00:00:42, Ethernet0/3


Configure iBGP in AS 65078

Since this is only a lab, R7 and R8 peer over their directly connected interfaces.

R7#show run | s r b
router bgp 65078
  network 10.7.1.0 mask 255.255.255.0
  neighbor 10.1.1.26 remote-as 65078
  neighbor 10.1.1.26 next-hop-self


R8#show run | s router bgp
router bgp 65078
  network 10.8.1.0 mask 255.255.255.0
  neighbor 10.1.1.25 remote-as 65078
  neighbor 10.1.1.25 next-hop-self


Verification:

R7#show ip bgp       
BGP table version is 3, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
               r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
               x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
  *>  10.7.1.0/24      0.0.0.0                  0         32768 i
  *>i 10.8.1.0/24      10.1.1.26                0    100      0 i


R8#show ip bgp
BGP table version is 3, local router ID is 10.8.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
               r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
               x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
  *>i 10.7.1.0/24      10.1.1.25                0    100      0 i
  *>  10.8.1.0/24      0.0.0.0                  0         32768 i


Continue deploying the MPLS core: complete the internal BGP configuration

R3#show run | s r b
router bgp 65001
  bgp log-neighbor-changes
  bgp listen range 10.1.255.0/24 peer-group iBGP
  no bgp default ipv4-unicast
  neighbor iBGP peer-group
  neighbor iBGP remote-as 65001
  neighbor iBGP update-source Loopback0
  !
  address-family ipv4
  exit-address-family
  !
  address-family vpnv4
   neighbor iBGP activate
   neighbor iBGP send-community extended
   neighbor iBGP route-reflector-client
  exit-address-family


R1, R4, R5
router bgp 65001
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
  neighbor 10.1.255.3 remote-as 65001
  neighbor 10.1.255.3 update-source Loopback0
  !
  address-family ipv4
  exit-address-family
  !
  address-family vpnv4
   neighbor 10.1.255.3 activate
   neighbor 10.1.255.3 send-community extended
  exit-address-family


Verification:

R3#show bgp vpnv4 unicast all summary
BGP router identifier 10.1.255.3, local AS number 65001
BGP table version is 1, main routing table version 1

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
*10.1.255.1     4        65001       5       5        1    0    0 00:01:19        0
*10.1.255.4     4        65001       2       2        1    0    0 00:00:31        0
*10.1.255.5     4        65001       2       2        1    0    0 00:00:28        0
* Dynamically created based on a listen range command
Dynamically created neighbors: 3, Subnet ranges: 1

BGP peergroup iBGP listen range group members:
   10.1.255.0/24

Total dynamically created neighbors: 3/(100 max), Subnet ranges: 1


Configure the MPLS label protocol: LDP

R3(config)#interface range e0/0,e0/2-3
R3(config-if-range)#mpls ip


R1(config)#interface e0/0
R1(config-if)#mpls ip


R4(config)#interface e0/2
R4(config-if)#mpls ip


R5(config)#interface e0/3
R5(config-if)#mpls ip


Watch the LDP neighbors come up:

R3#
*Sep  6 08:47:50.047: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.1:0 (1) is UP
R3#
*Sep  6 08:48:21.644: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.4:0 (2) is UP
R3#
*Sep  6 08:48:39.094: %LDP-5-NBRCHG: LDP Neighbor 10.1.255.5:0 (3) is UP


Configure eBGP between PE and CE

R6(config)#router bgp 65006
R6(config-router)#network 10.6.1.0 mask 255.255.255.0
R6(config-router)#neighbor 10.1.1.13 remote-as 65001


R7(config)#router bgp 65078
R7(config-router)#neighbor 10.1.1.17 remote-as 65001


R8(config)#router bgp 65078
R8(config-router)#neighbor 10.1.1.21 remote-as 65001


R1(config)#router bgp 65001
R1(config-router)#address-family ipv4 vrf CTO
R1(config-router-af)#neighbor 10.1.1.14 remote-as 65006

R1(config-router-af)#
*Sep  6 08:54:22.981: %BGP-5-ADJCHANGE: neighbor 10.1.1.14 vpn vrf CTO Up   // R1-R6 peering established //


R4(config)#router bgp 65001
R4(config-router)#address-family ipv4 vrf CTO
R4(config-router-af)#neighbor 10.1.1.18 remote-as 65078
R4(config-router-af)#
*Sep  6 08:55:31.655: %BGP-5-ADJCHANGE: neighbor 10.1.1.18 vpn vrf CTO Up   // R4-R7 peering established //


R5(config)#router bgp 65001
R5(config-router)#address-family ipv4 vrf CTO
R5(config-router-af)#neighbor 10.1.1.22 remote-as 65078
R5(config-router-af)#
*Sep  6 08:56:40.336: %BGP-5-ADJCHANGE: neighbor 10.1.1.22 vpn vrf CTO Up  // R5-R8 peering established //


At this point, a basic MPLS VPN deployment is complete.


Now verify:

R6#show ip route bgp | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 6 subnets, 3 masks
B        10.7.1.0/24 [20/0] via 10.1.1.13, 00:02:54
B        10.8.1.0/24 [20/0] via 10.1.1.13, 00:02:54


R8#show ip route bgp | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 8 subnets, 3 masks
B        10.6.1.0/24 [20/0] via 10.1.1.21, 00:03:54
B        10.7.1.0/24 [200/0] via 10.1.1.25, 00:22:09


R7#show ip route bgp | begin Gateway
Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 8 subnets, 3 masks
B        10.6.1.0/24 [200/0] via 10.1.1.26, 00:04:26
B        10.8.1.0/24 [200/0] via 10.1.1.26, 00:22:53

// On R7, the next hop for the AS 65006 route points to R8 rather than R4, so something is clearly wrong //


R7#show ip bgp                     
BGP table version is 4, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
               r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
               x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
  *>i 10.6.1.0/24      10.1.1.26                0    100      0 65001 65006 i
  *>  10.7.1.0/24      0.0.0.0                  0         32768 i
  *>i 10.8.1.0/24      10.1.1.26                0    100      0 i

// R7 has not learned any routes from R4 //


R4#show bgp vpnv4 unicast vrf CTO neighbors 10.1.1.18 advertised-routes
BGP table version is 5, local router ID is 10.1.255.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
               r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
               x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:1 (default for vrf CTO)
  *>i 10.6.1.0/24      10.1.255.1               0    100      0 65006 i

Total number of prefixes 1

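// R4 did advertise the BGP prefix 10.6.1.0/24 to R7, so the problem appears to be on R7 (this guess turned out to be wrong) //


In fact, however, no inbound route filtering policy is configured on R7: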

R7#show ip protocols | section bgp
Routing Protocol is "bgp 65078"
   Outgoing update filter list for all interfaces is not set
   Incoming update filter list for all interfaces is not set

   IGP synchronization is disabled
   Automatic route summarization is disabled
   Neighbor(s):
     Address          FiltIn FiltOut DistIn DistOut Weight RouteMap
     10.1.1.17                                           
     10.1.1.26                                           
   Maximum path: 1
   Routing Information Sources:
     Gateway         Distance      Last Update
     10.1.1.26            200      00:13:58
   Distance: external 20 internal 200 local 200


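After some thought, the fault was finally narrowed down to the update messages.

To show the root cause more clearly, I captured the packets.

At the same time, enable debugging on R7 and watch the incoming updates: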

R7#debug ip bgp updates in

R7#clear ip bgp * soft in  // On R7, force R4 to resend its route updates //


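[Image: packet capture of the BGP UPDATE sent from R4 to R7]

The capture clearly shows that in the prefix R4 advertises to R7, the next-hop attribute is set to 10.1.1.6 rather than 10.1.1.17, the address of R4's own e0/0 interface.


Now look at the debug log on R7:

[Image: R7 debug log]

The log states that the update from 10.1.1.17 (R4) carries the next hop 10.1.1.6, which is not on a local subnet and not within the range of any directly connected interface, so the update is rejected.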
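
The receiver-side check that the log describes can be sketched in a few lines of Python. This is only an illustration of the idea, not IOS code: the function name next_hop_acceptable is hypothetical, and the interface masks listed for R7 are assumptions, since the article does not show R7's interface configuration.

```python
import ipaddress

def next_hop_acceptable(next_hop, local_interfaces):
    """Return True if next_hop falls inside any directly connected subnet."""
    nh = ipaddress.ip_address(next_hop)
    return any(
        nh in ipaddress.ip_interface(addr).network
        for addr in local_interfaces
    )

# Assumed R7 addressing (masks are not shown in the article):
# link to R4, link to R8, and the advertised LAN.
R7_INTERFACES = ["10.1.1.18/30", "10.1.1.25/30", "10.7.1.1/24"]

# Next hop actually received from R4: outside every connected subnet.
print(next_hop_acceptable("10.1.1.6", R7_INTERFACES))
# Next hop R7 would expect from R4: inside 10.1.1.16/30.
print(next_hop_acceptable("10.1.1.17", R7_INTERFACES))
```

Under these assumed masks, the first call reports that 10.1.1.6 is unreachable over any connected link, which matches the rejection seen in the debug log.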


So why does R4 do something so absurd?

This brings us back to the third-party next hop mentioned at the beginning.


R4#show ip cef vrf CTO 10.6.1.0
10.6.1.0/24
   nexthop 10.1.1.6 Ethernet0/2 label 16 21

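// The forwarding table shows that R4's next hop toward 10.6.1.0/24 is 10.1.1.6, the E0/2 interface of R3 //

R4's e0/0 interface has a 24-bit mask. Under the automatic third-party next-hop optimization, R4 considers 10.1.1.6 to be in the same subnet as its e0/0 address 10.1.1.17, so the prefixes it advertises carry 10.1.1.6 as the next hop.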
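
A simplified model of this decision, written in Python, makes the effect of the mask visible. It is a sketch of the behavior described above, not the actual IOS algorithm; the function name advertised_next_hop is hypothetical.

```python
import ipaddress

def advertised_next_hop(resolved_next_hop, outgoing_interface):
    """Simplified third-party next-hop rule.

    If the route's next hop appears to be on the same subnet as the
    interface facing the eBGP peer, advertise it unchanged; otherwise
    advertise the interface's own address.
    """
    iface = ipaddress.ip_interface(outgoing_interface)
    nh = ipaddress.ip_address(resolved_next_hop)
    if nh in iface.network:
        return str(nh)
    return str(iface.ip)

# Misconfigured /24 on R4 e0/0: 10.1.1.6 looks "local", so it is kept.
wrong = advertised_next_hop("10.1.1.6", "10.1.1.17/24")
# Correct /30: 10.1.1.6 is outside 10.1.1.16/30, so R4 uses its own address.
right = advertised_next_hop("10.1.1.6", "10.1.1.17/30")
print(wrong, right)
```

With the /24 mask the model returns 10.1.1.6, the value seen in the packet capture; with the intended /30 mask it returns 10.1.1.17.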


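So how can we verify that this explanation is correct?

On R4, configure next-hop-self toward the eBGP neighbor R7 to force the next hop to 10.1.1.17, and observe the result:

R4

router bgp 65001
 address-family ipv4 vrf CTO
  neighbor 10.1.1.18 next-hop-self


Verify the hypothesis: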

R7#cle ip bgp * soft
R7#show ip bgp
BGP table version is 5, local router ID is 10.7.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
               r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
               x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
  *>  10.6.1.0/24      10.1.1.17                              0 65001 65006 i
  * i                  10.1.1.26                0    100      0 65001 65006 i
  *>  10.7.1.0/24      0.0.0.0                  0         32768 i
  *>i 10.8.1.0/24      10.1.1.26                0    100      0 i

// There, the route is now learned from R4 //


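Why is this kind of fault hard to track down? If R4's e0/0 interface were in the global routing table, the address could not even be configured, because the /24 subnet would overlap the subnet of the core-facing e0/2 interface and IOS rejects overlapping interface subnets within one routing table. Only because the interface sits in a VRF can this situation occur.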
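
The overlap check that would normally catch this mistake can be illustrated with Python's ipaddress module. This is a sketch under assumptions: the helper overlapping_interfaces is hypothetical, and the /30 mask on R4's Ethernet0/2 is inferred from the other point-to-point links rather than shown in the article.

```python
import ipaddress

def overlapping_interfaces(new_address, existing):
    """Return names of existing interfaces whose subnet overlaps new_address."""
    new_net = ipaddress.ip_interface(new_address).network
    return [
        name
        for name, addr in existing.items()
        if new_net.overlaps(ipaddress.ip_interface(addr).network)
    ]

# Assumed global-table interfaces on R4 (the e0/2 mask is an assumption).
R4_GLOBAL = {"Ethernet0/2": "10.1.1.5/30", "Loopback0": "10.1.255.4/32"}

# The misconfigured /24 collides with the core link.
print(overlapping_interfaces("10.1.1.17/24", R4_GLOBAL))
# The intended /30 collides with nothing.
print(overlapping_interfaces("10.1.1.17/30", R4_GLOBAL))
```

Because a VRF has its own routing table, the VRF interface is never compared against these global interfaces, so the wrong mask is accepted silently.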


Of course, the proper fix is simply to correct the interface mask.


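Thanks, everyone!
I'm Dashu (达叔), a CCIE mentor and CCIE training instructor at Qianyitang (乾颐堂).

This blog is written exclusively by Dashu of Qianyitang; the material comes from daily work at Qianyitang and from the stories of Dashu and the CCIEs around him.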

Reposted from blog.51cto.com/dashu666/2171307