Analysis of common cluster software and technology

A cluster uses software to present a group of servers to clients as a single provider of resources. The individual servers are the nodes of the cluster. When the node that is currently serving a resource fails, the remaining nodes in the cluster take over that resource and continue to serve clients.

The core of cluster technology is resource access control. Because every node in the cluster can access the cluster's shared resources, problems arise when multiple nodes operate on the same resource at the same time. For example, if node A writes data to shared storage while node B writes to the same location, the data becomes inconsistent.

Following the development of resource access control technology, clusters can be divided into three types: no-reservation mode, SCSI-2 reservation mode, and SCSI-3 reservation mode. Each mode places different requirements on the storage system.

The earliest cluster mode is no-reservation mode. In this mode, the most common control method is to control the state of the volume group. A cluster node then accesses a LUN as follows.

  • 1. Create volume groups and logical volumes for LUNs on cluster nodes.

  • 2. The cluster node starts the cluster service.

  • 3. The cluster determines which node is the master node and which node is the standby node.

  • 4. The cluster activates the volume group and mounts the logical volume on the master node.

  • 5. The cluster deactivates the volume group on the standby node to ensure that the logical volume cannot be accessed.

This mode places no additional requirements on the storage system: the mapped LUN only needs to support creating volume groups and logical volumes in the operating system and to read and write normally. The cluster controls each node's access to the LUN by controlling the state of the volume group on that node.
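The five steps above can be sketched as a small Python model. This is only an illustration: the `Node` and `Cluster` classes and the "first node becomes master" election are assumptions made for the example, not the behavior of any particular cluster product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    vg_active: bool = False   # state of the volume group on this node
    mounted: bool = False     # whether the logical volume is mounted


@dataclass
class Cluster:
    nodes: List[Node]
    master: Optional[Node] = None

    def start(self):
        # Step 3: pick the master. Hypothetical rule: the first node wins;
        # real cluster software uses its own election logic.
        self.master = self.nodes[0]
        for node in self.nodes:
            is_master = node is self.master
            # Steps 4 and 5: activate and mount on the master,
            # deactivate on every standby node.
            node.vg_active = is_master
            node.mounted = is_master

    def can_access_lun(self, node):
        # Access is gated only by volume group state on the node;
        # the storage system itself enforces nothing.
        return node.vg_active


node_a, node_b = Node("A"), Node("B")
cluster = Cluster([node_a, node_b])
cluster.start()
print(cluster.can_access_lun(node_a), cluster.can_access_lun(node_b))  # True False
```

Note that the access check lives entirely in the cluster software, which is why this mode works with any storage system.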

This method is the easiest to implement, but it has a serious defect: it depends entirely on the heartbeat working. Once a split-brain occurs (heartbeat communication between cluster nodes is disconnected) and the nodes lose contact with each other, each node mistakenly concludes that the other has failed, and the nodes compete for the resources.
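To make the split-brain failure concrete, here is a minimal sketch in which both nodes stay healthy but the heartbeat link fails. The `on_heartbeat_timeout` handler and the `writers` helper are hypothetical names for this example; the point is only that nothing outside the nodes stops both from activating the volume group.

```python
class Node:
    def __init__(self, name, is_master=False):
        self.name = name
        self.vg_active = is_master
        self.peer_alive = True  # this node's view of its peer

    def on_heartbeat_timeout(self):
        # The node cannot tell a dead peer from a dead heartbeat link,
        # so it assumes the peer failed and takes over the resources.
        self.peer_alive = False
        self.vg_active = True


def writers(nodes):
    """Return the names of nodes that currently have the volume group active."""
    return [n.name for n in nodes if n.vg_active]


node_a = Node("A", is_master=True)
node_b = Node("B")

# The heartbeat link fails while both nodes are still healthy.
for node in (node_a, node_b):
    node.on_heartbeat_timeout()

print(writers([node_a, node_b]))  # ['A', 'B']: both nodes now write to the LUN
```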

In addition, in a shared storage environment, multiple front-end hosts may access the same storage device at the same time. If multiple hosts write to a LUN simultaneously, the LUN has no way of knowing which data was written first and which later. SCSI reservation was introduced to prevent the data corruption this can cause: reads and writes are gated by the SCSI Reservation mechanism. Most disks and arrays today support the SCSI reservation command. When a host sends a SCSI Reservation command to the disk array, the target disk is locked for all other hosts.

If another host sends read or write requests to the locked disk, it receives a Reservation Conflict error. If the host holding the SCSI reservation crashes, another host can send a Break Reservation or Reset Target command to the disk array to release the SCSI lock. That second host must then send its own SCSI Reservation command to the disk array before issuing I/O, so that subsequent I/O operations can proceed. There are two types of SCSI reservation, SCSI-2 Reservation and SCSI-3 Reservation, and only one type can exist on a LUN at a time.
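The lock, conflict, and reset sequence described above can be modeled in a few lines. This is a toy in-memory sketch, not a SCSI implementation: the `Lun` class, the `ReservationConflict` exception, and the host names are all invented for illustration.

```python
class ReservationConflict(Exception):
    """Raised when a host issues a command against another host's reservation."""


class Lun:
    def __init__(self):
        self.holder = None  # host currently holding the reservation
        self.blocks = {}    # logical block address -> data

    def reserve(self, host):
        if self.holder not in (None, host):
            raise ReservationConflict(f"{host}: LUN is reserved by {self.holder}")
        self.holder = host

    def reset(self):
        # Models Break Reservation / Reset Target: the reservation is
        # dropped without informing the previous holder.
        self.holder = None

    def write(self, host, lba, data):
        if self.holder is not None and self.holder != host:
            raise ReservationConflict(f"{host}: write rejected")
        self.blocks[lba] = data


lun = Lun()
lun.reserve("host1")
lun.write("host1", 0, b"from host1")

try:
    lun.write("host2", 0, b"from host2")
except ReservationConflict as exc:
    print("conflict:", exc)

lun.reset()            # host1 has crashed; host2 breaks the reservation
lun.reserve("host2")   # host2 takes its own reservation before doing I/O
lun.write("host2", 0, b"from host2")
```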

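To solve the problem of cluster nodes competing for resources after a split-brain, clusters using SCSI-2 reservation mode were introduced. In this mode, a cluster node accesses a LUN as follows.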

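  • 1. The cluster node issues a reservation request to the LUN it needs to access.

  • 2. If the reservation succeeds, the node gains permission to operate on the LUN. If it fails with a reservation conflict, the node keeps retrying until the reservation succeeds.

  • 3. When the node has finished operating on the LUN, it releases the reservation so that other nodes can reserve the LUN.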
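The reserve, retry, and release cycle in these three steps might look like the following sketch, in which two threads stand in for two cluster nodes. The `Lun` class, the `access_lun` function, and the retry delay are assumptions made for the example; a real node would issue SCSI commands rather than call Python methods.

```python
import threading
import time


class Lun:
    """Toy LUN with a single non-persistent reservation slot."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the model's own state
        self._holder = None
        self.log = []

    def reserve(self, node):
        with self._guard:
            if self._holder is None or self._holder == node:
                self._holder = node
                return True
            return False  # reservation conflict

    def release(self, node):
        with self._guard:
            # Only the node that reserved the LUN may release it.
            if self._holder == node:
                self._holder = None

    def write(self, node, data):
        with self._guard:
            if self._holder != node:
                raise RuntimeError(f"{node}: reservation conflict")
            self.log.append((node, data))


def access_lun(lun, node, data, retry_delay=0.01):
    # Steps 1 and 2: reserve, retrying on conflict until it succeeds.
    while not lun.reserve(node):
        time.sleep(retry_delay)
    try:
        lun.write(node, data)
    finally:
        # Step 3: release so that other nodes can reserve.
        lun.release(node)


lun = Lun()
threads = [
    threading.Thread(target=access_lun, args=(lun, name, f"data-from-{name}"))
    for name in ("node-a", "node-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(lun.log))
```

Whichever node reserves first, the other simply waits, so each write happens while exactly one node holds the reservation.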

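This reservation method guarantees that only one node can access the shared resource at any given moment, but it requires the storage system to support the SCSI-2 reservation command set, and it also has the following defects.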

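  • 1) Reservations are path-based. When a cluster node has multiple paths and the current path fails, the reservation cannot be cancelled and the LUN can no longer be accessed, so multipathing is effectively disabled.

  • 2) Only the node that reserved can release. If the LUN is already reserved, other nodes cannot reserve it unless the LUN is reset. However, a reset can easily cause data inconsistency, because the host that originally held the reservation is not notified of the reset.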

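SCSI-2 Reservation only allows the device to be accessed by the initiator that issued the SCSI lock, that is, the host's HBA. For example, if HBA1 on host 1 places a SCSI-2 lock on a LUN, then even HBA2 on host 1 cannot access that LUN. For this reason, SCSI-2 Reservation is also called Single Path Reservation.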
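A short sketch shows why this makes the reservation single-path. The reservation is keyed by the (host, HBA) pair rather than by the host; the `PathReservedLun` class is a hypothetical model written for this example.

```python
class PathReservedLun:
    def __init__(self):
        self.holder = None  # (host, hba) tuple of the reserving initiator

    def reserve(self, host, hba):
        if self.holder is None:
            self.holder = (host, hba)
            return True
        return self.holder == (host, hba)

    def can_access(self, host, hba):
        # SCSI-2 semantics as described above: only the exact initiator
        # that placed the reservation may access the LUN.
        return self.holder is None or self.holder == (host, hba)


lun = PathReservedLun()
lun.reserve("host1", "HBA1")
print(lun.can_access("host1", "HBA1"))  # True
print(lun.can_access("host1", "HBA2"))  # False: same host, different path
print(lun.can_access("host2", "HBA1"))  # False
```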

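To solve the problems of SCSI-2 reservations, cluster technology based on SCSI-3 persistent reservations was eventually introduced. In this mode, before accessing a LUN, a cluster node first registers (Registration) a reservation key (Persistent Reservation key) with the LUN. Once registration succeeds, the node can attempt a persistent reservation (Persistent Reserve), and once that succeeds it gains permission to operate on the LUN.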

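Different hosts have different PR keys, so SCSI-3 Reservation is generally used in shared environments with multipathing. SCSI-3 Reservation is also called Persistent Reservation. Unlike SCSI-2, a SCSI-3 release is based on the reservation key; different cluster nodes can reserve with the same key or with different keys, depending on the persistent reservation type. A cluster node can obtain access to a LUN that is already persistently reserved by preempting it. SCSI-3 preemption differs from a SCSI-2 reset: preemption does not cause data loss.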
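The register, reserve, and preempt flow can be sketched as follows. This is a simplified model under stated assumptions: it imitates a "registrants only" reservation type, in which every registered path may write, and the `PersistentReservationLun` class, the path names, and the key strings are all invented for the example.

```python
class PersistentReservationLun:
    """Toy model of a 'registrants only' persistent reservation."""

    def __init__(self):
        self.registrations = {}   # initiator path -> reservation key
        self.holder_key = None    # key that currently holds the reservation

    def register(self, path, key):
        self.registrations[path] = key

    def _key_of(self, path):
        key = self.registrations.get(path)
        if key is None:
            raise PermissionError(f"{path} is not registered")
        return key

    def reserve(self, path):
        key = self._key_of(path)
        if self.holder_key not in (None, key):
            return False  # reservation conflict
        self.holder_key = key
        return True

    def preempt(self, path, victim_key):
        key = self._key_of(path)
        # Remove every other registration that uses the victim's key ...
        self.registrations = {
            p: k for p, k in self.registrations.items()
            if k != victim_key or p == path
        }
        # ... and take over the reservation if the victim held it.
        if self.holder_key == victim_key:
            self.holder_key = key

    def can_write(self, path):
        # Registrants only: while a reservation exists, any registered
        # path may write and unregistered paths are rejected.
        return self.holder_key is None or path in self.registrations


lun = PersistentReservationLun()
# host1 registers the same key over both of its paths; host2 uses another key.
lun.register("host1/HBA1", "key-1")
lun.register("host1/HBA2", "key-1")
lun.register("host2/HBA1", "key-2")

lun.reserve("host1/HBA1")
print(lun.can_write("host1/HBA2"))  # True: the second path still works

lun.preempt("host2/HBA1", "key-1")  # host2 takes over, e.g. after host1 fails
print(lun.can_write("host1/HBA1"))  # False: host1 has been fenced
print(lun.holder_key)               # key-2
```

Because the reservation follows the key rather than a single initiator, a path failure on host1 does not lock the host out, and host2 can take over without resetting the LUN.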

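SCSI-3 reservations solve the problems of the earlier cluster reservation modes, but they place higher demands on the storage system, which must support the larger and more complex SCSI-3 reservation command set.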

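SCSI reservation is the basic mechanism that multiple hosts use to operate on a LUN. In a Windows storage environment, SCSI reservation commands are used whenever multiple Windows hosts need to access one LUN, for example in a Windows Cluster environment. The following describes the SCSI reservation commands used by Windows Cluster 2003 and 2008.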

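Windows 2003 clusters use the SCSI-2 Reserve/Release commands. Because this is a non-persistent reservation, one node in the cluster holds the SCSI-2 Reservation lock and refreshes it every 3 seconds. If a failover occurs, the node taking over places a SCSI-2 Reservation on the corresponding disk and then maintains the SCSI lock. If the cluster service is shut down on all nodes, the reservation is not retained.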

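Windows 2008 clusters use the SCSI-3 Persistent Reservation mechanism. If a disk is not removed from the host correctly, the disk used by the cluster (Cluster Disk) keeps these reservations. The SCSI reservation behind the lock remains on the disk even if the cluster service is shut down or the disk is unmapped (Unmasked) from the host. As a result, it is sometimes necessary to forcibly remove the reservation from the disk.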
