Mellanox IB switch SM HA

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reproducing it, please include the original source link and this notice.
Original link: https://blog.csdn.net/Dream_ya/article/details/101196641

Official reference: https://community.mellanox.com/s/article/understanding-subnet-manager-sm-high-availability-ha-on-mellanox-infiniband-switches

First, the Mellanox SM HA solution (Mellanox InfiniBand switches)


  • When SM HA (configuration synchronization) is enabled on Mellanox IB switches, the SM database is synchronized across all switches that have the SM enabled.
  • Synchronization is done out-of-band over the Ethernet management network. All switches participating in SM HA must be on the same management subnet, with no router between them, because the switches exchange multicast control frames that normally do not cross routers.
  • All switches participating in Mellanox SM HA join the same InfiniBand subnet ID. Once they have joined, the synchronized SMs are launched; one node is elected SM master and the others become slaves.
  • SM HA lets the administrator view and modify the InfiniBand SM configuration of all the subnet managers from a single location through a virtual IP (VIP). All subnet managers can be controlled, started, or stopped from the VIP, and the VIP is the address to use for SM configuration. Configuring SM parameters through a master's or slave's own IP is disabled.

Second, the experimental environment

IB Switch    IP
SF6036-01    172.16.0.251
SF6036-02    172.16.0.252

Third, the configuration


1, configure the cluster VIP

SF6036-01 [standalone: master] > enable
SF6036-01 [standalone: master] # config terminal
SF6036-01 [standalone: master] (config) # ib ha cluster ip 172.16.0.253 255.255.240.0
SF6036-01 [cluster: master] (config) #
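
Since the VIP has to sit on the same management subnet as the switches, it can be worth checking the addresses before typing the command. The following is only a minimal sketch in Python, run on any workstation, not on the switch; the helper name same_subnet is made up for illustration, and the addresses are the ones from this article's environment.

```python
import ipaddress


def same_subnet(vip: str, netmask: str, hosts: list[str]) -> bool:
    """Return True if every host address lies in the network defined by vip/netmask."""
    # strict=False lets us pass a host address (the VIP) instead of the network address.
    network = ipaddress.ip_network(f"{vip}/{netmask}", strict=False)
    return all(ipaddress.ip_address(host) in network for host in hosts)


# Management IPs of the two switches and the VIP used in this article.
switches = ["172.16.0.251", "172.16.0.252"]
print(same_subnet("172.16.0.253", "255.255.240.0", switches))
```

If the function returns False for one of your switches, that switch would reach the VIP only through a router, which is the situation the requirements above rule out.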

2, add the second switch to the cluster

SF6036-02 [standalone: master] (config) # ib ha cluster
SF6036-02 [cluster: standby] (config) #

3, enable the SM on both cluster nodes

SF6036-01 [cluster: master] (config) # ib smnode SF6036-01 enable
SF6036-01 [cluster: master] (config) # ib smnode SF6036-02 enable

4, set the priority (0-15)

SF6036-01 [cluster: master] (config) # ib smnode SF6036-01 sm-priority 1
SF6036-01 [cluster: master] (config) # ib smnode SF6036-02 sm-priority 2

Fourth, check the cluster


To test failover, power off one of the IB switches: the master role moves to the other switch, and running workloads are not affected.

1, view the IB HA status

SF6036-01 [cluster: master] (config) # show ib ha

Global HA state
==================
IB Subnet HA name: cluster
HA IP address:     172.16.0.253/20
Active HA nodes:   2

HA node local information
  Name:         SF6036-01 (active)  <--- (local node)
  SM-HA state:  master
  IP:           172.16.0.251
  Virtual switch membership:    infiniband-default

HA node local information
  Name:         SF6036-02 (active)
  SM-HA state:  standby
  IP:           172.16.0.252
  Virtual switch membership:    infiniband-default

SF6036-01 [cluster: master] (config) # show ib ha brief

Global HA state
==================
IB Subnet HA name: cluster
HA IP address:     172.16.0.253/20
Active HA nodes:   2

 ID                   SM-HA state   IP              Virtual switch membership
--------------------------------------------------------------------------------
*SF6036-01            master        172.16.0.251    infiniband-default
 SF6036-02            standby       172.16.0.252    infiniband-default
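
If you collect this output from a script (for example over SSH), the node table is easy to turn into structured data. Below is a rough sketch in Python that parses only the rows shown above. The function name parse_ha_brief is hypothetical, and the regular expression assumes the SM-HA state is either master or standby, since those are the only values that appear in this article.

```python
import re

# Table portion of "show ib ha brief", copied from the output above.
BRIEF_OUTPUT = """\
 ID                   SM-HA state   IP              Virtual switch membership
--------------------------------------------------------------------------------
*SF6036-01            master        172.16.0.251    infiniband-default
 SF6036-02            standby       172.16.0.252    infiniband-default
"""

# Optional "*" (local node), node name, state, IPv4 address, virtual switch.
ROW = re.compile(
    r"^(\*?)\s*(\S+)\s+(master|standby)\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s*$"
)


def parse_ha_brief(text):
    """Return one dict per node row; header and separator lines are skipped."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            star, name, state, ip, vswitch = match.groups()
            nodes.append({
                "name": name,
                "state": state,
                "ip": ip,
                "vswitch": vswitch,
                "local": star == "*",
            })
    return nodes


nodes = parse_ha_brief(BRIEF_OUTPUT)
masters = [node["name"] for node in nodes if node["state"] == "master"]
print(masters)
```

Running the same parser before and after powering off a switch gives a simple way to see which node holds the master role.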

2, view the status of IB SM

SF6036-01 [cluster: master] (config) # show ib smnodes

HA state of switch infiniband-default
========================================
IB Subnet HA name: cluster
HA IP address:     172.16.0.253/20
Active HA nodes:   2

HA node local information
  Name:         SF6036-01 (active)  <--- (local node)
  SM-HA state:  master
  SM Licensed:  yes
  SM Running:   running
  SM Enabled:   enabled - master
  SM Priority:  1
  IP:           172.16.0.251

HA node local information
  Name:         SF6036-02 (active)
  SM-HA state:  standby
  SM Licensed:  yes
  SM Running:   running
  SM Enabled:   enabled
  SM Priority:  2
  IP:           172.16.0.252
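
The per-node blocks can be checked the same way. This is a small sketch in Python that splits the output above into one dictionary per node and lists any node whose SM is not reported as running. The function name parse_smnodes is made up, and the parser assumes every node block starts with the line "HA node local information", as it does in the output shown here.

```python
# Node blocks of "show ib smnodes", copied from the output above.
SMNODES_OUTPUT = """\
HA node local information
  Name:         SF6036-01 (active)  <--- (local node)
  SM-HA state:  master
  SM Licensed:  yes
  SM Running:   running
  SM Enabled:   enabled - master
  SM Priority:  1
  IP:           172.16.0.251

HA node local information
  Name:         SF6036-02 (active)
  SM-HA state:  standby
  SM Licensed:  yes
  SM Running:   running
  SM Enabled:   enabled
  SM Priority:  2
  IP:           172.16.0.252
"""


def parse_smnodes(text):
    """Return a list of dicts, one per node block, keyed by the field labels."""
    nodes = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "HA node local information":
            current = {}
            nodes.append(current)
        elif current is not None and ":" in stripped:
            # Split on the first colon only; values are kept as plain strings.
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return nodes


sm_nodes = parse_smnodes(SMNODES_OUTPUT)
not_running = [n["Name"] for n in sm_nodes if n.get("SM Running") != "running"]
print(not_running)
```

An empty list means every node in the pasted output reports a running SM.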

3, check the connection status

At this point we can connect through the VIP, 172.16.0.253.
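
A quick way to confirm that the VIP answers is to try a TCP connection to it from a host on the management network. The sketch below is in Python and assumes the switches are managed over SSH on port 22, which this article does not state explicitly; the function name tcp_reachable is also just an illustrative choice.

```python
import socket


def tcp_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling tcp_reachable("172.16.0.253") in a loop while one switch is powered off is a simple way to watch whether the VIP stays reachable during the failover test.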
