System Complexity: High Availability

Next, let's look at the second source of complexity: high availability.

Let's start with the definition of high availability from Wikipedia.

The ability of a system to perform its functions without interruption represents the degree of availability of the system and is one of the criteria for system design.

The key to this definition is "without interruption", and that is exactly where the difficulty lies. No single piece of hardware or software can run without interruption: hardware fails and software has bugs; hardware gradually ages and software grows ever more complex and bloated.

Beyond this inherent limitation of hardware and software, unavailability caused by the external environment is even harder to avoid or control. Accidents and disasters such as power outages, floods, and earthquakes also make systems unavailable, with more serious impact, and they are harder to predict and avoid.

In short, high availability means a system keeps performing its functions without interruption, and it is one of the important criteria of system design. It is very hard to achieve, because any single piece of hardware or software can fail, and external events such as power outages, floods, and earthquakes are both more damaging and harder to predict or avoid.

The way to achieve high availability is therefore redundancy. Simply put, add more machines or deploy across multiple computer rooms so that no single point of failure can take the system down. Note that both high performance and high availability add machines, but for different purposes: the former "expands" processing capacity, while the latter adds "redundant" processing units.
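As a rough illustration of why redundancy raises availability, the sketch below assumes each machine is available with probability `a` and that machines fail independently. That is an idealized assumption: shared causes such as a power outage in one computer room violate it. The function name `combined_availability` is hypothetical.

```python
def combined_availability(a: float, n: int) -> float:
    """Availability of n redundant machines when any single one is enough.

    Assumes independent failures: the system is down only if all n are down.
    """
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("a must be in [0, 1] and n must be >= 1")
    return 1.0 - (1.0 - a) ** n


for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

Under these assumptions, two machines that are each 99% available give roughly 99.99% combined availability. Real systems fall short of this figure because failures are correlated and because the failover mechanism itself can fail.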

Redundancy improves availability, but it also introduces complexity. In practice, each scenario has to be analyzed on its own and matched with a suitable high-availability scheme.

Computing High Availability

Here, "computing" refers to business logic processing. A defining property of computing is that, whichever machine performs it, the same algorithm and the same input data produce the same result. Moving a computation from one machine to another therefore has no impact on the business.

Taking the simplest case, going from a single machine to two machines, let's look at where the complexity shows up. First, a simple architecture diagram:

[Figure: single-machine to dual-machine architecture]

You may notice that this dual-machine architecture diagram is the same as the one in the previous article on "High Performance", so the complexity is similar too. Specifically:

  • A task allocator must be added. Choosing a suitable task allocator is itself a complex process that weighs factors such as performance, cost, maintainability, and availability.

  • The task allocator and the real business servers must connect and interact, so a suitable connection method has to be chosen and the connections managed: establishing connections, detecting them, handling connection interruptions, and so on.

  • The task allocator needs an allocation algorithm. Common dual-machine algorithms are active-standby and active-active; active-standby schemes can be further divided into cold standby, warm standby, and hot standby.
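The three points above can be sketched as a minimal active-standby allocator. Everything here is hypothetical and simplified: the `Server` class, its `healthy` flag (a stand-in for real connection detection), and the single-retry failover policy are assumptions for illustration, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True  # stand-in for a real connection/heartbeat check

    def handle(self, task: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        return f"{self.name} handled {task}"


class ActiveStandbyAllocator:
    """Sends every task to the active server and fails over to the standby."""

    def __init__(self, active: Server, standby: Server) -> None:
        self.active = active
        self.standby = standby

    def dispatch(self, task: str) -> str:
        try:
            return self.active.handle(task)
        except ConnectionError:
            # Connection interrupted: swap roles and retry once on the new active.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(task)


primary, backup = Server("A"), Server("B")
allocator = ActiveStandbyAllocator(primary, backup)
print(allocator.dispatch("task-1"))
primary.healthy = False  # simulate a failure of server A
print(allocator.dispatch("task-2"))
```

Failover is this simple only because computing is stateless in the sense described earlier: the same task produces the same result on either server. If both servers are down, the second `handle` call raises and the caller sees the failure.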

The diagram above is just a simple two-machine architecture. Let's look at a more complex high-availability cluster architecture.

[Figure: a more complex high-availability cluster architecture]

The complexity grows with the size and structure of the cluster. In the cluster shown above, choosing an allocation algorithm is more complicated: it could be 1 master and 3 backups, 2 masters and 2 backups, 3 masters and 1 backup, or 4 masters and 0 backups. Which one to use must be judged against actual business needs; no algorithm is always better than the others. For example, ZooKeeper uses 1 master and multiple backups, while Memcached uses all masters and 0 backups.
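To make the "m masters, n backups" choice concrete, here is a small sketch for a four-node cluster. The helper names `assign_roles` and `round_robin`, the node names, and the use of round-robin among masters are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def assign_roles(nodes: List[str], masters: int) -> Dict[str, str]:
    """Mark the first `masters` nodes as master and the rest as backup."""
    if not 0 < masters <= len(nodes):
        raise ValueError("masters must be between 1 and len(nodes)")
    return {n: ("master" if i < masters else "backup") for i, n in enumerate(nodes)}


def round_robin(roles: Dict[str, str], tasks: List[str]) -> List[Tuple[str, str]]:
    """Spread tasks evenly over the master nodes; backups receive nothing."""
    masters = [n for n, r in roles.items() if r == "master"]
    return [(task, masters[i % len(masters)]) for i, task in enumerate(tasks)]


nodes = ["n1", "n2", "n3", "n4"]
for m in (1, 2, 4):
    roles = assign_roles(nodes, m)
    print(m, "master(s):", round_robin(roles, ["t1", "t2", "t3", "t4"]))
```

With 1 master every task lands on the same node and three nodes wait in reserve; with 4 masters every node works but nothing is held back. The sketch shows only the allocation side and leaves out promotion of a backup when a master fails.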

Storage High Availability

For systems that need to store data, the key point and the main difficulty of the whole high-availability design lie in "storage high availability". Storage differs from computing in one essential way: moving data from one machine to another requires transmission over a line. Line transmission takes time on the order of milliseconds: a few milliseconds within the same computer room, but tens or even hundreds of milliseconds between computer rooms in different locations. For example, the ping delay from a computer room in Guangzhou to one in Beijing is about 50 ms under stable conditions and may reach 1 second or more under unstable conditions.

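Although humans barely notice a few milliseconds, for a high-availability system they make an essential difference: at some point in time, the data across the whole system is bound to be inconsistent. By the formula "data + logic = business", inconsistent data leads to different business behavior even when the logic is identical. Take bank deposits as an example. Suppose a user's data is stored in the Beijing computer room. The user deposits 10,000 yuan, and the subsequent balance query is routed to the Shanghai computer room before the Beijing data has been synchronized there. The user sees that the balance has not increased by 10,000 yuan. At that point the user will surely be alarmed, suspect the money has been stolen, rush to call customer service to complain, and perhaps even dial 110 to call the police. Even if it turns out that transmission delay was the only cause, the experience is a bad one from the user's point of view.

[Figure: a deposit stored in the Beijing computer room is not yet visible to a query routed to the Shanghai computer room]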
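The bank example can be reproduced with a toy model of asynchronous replication. This is a sketch under stated assumptions: the classes `Replica` and `AsyncReplicatedStore`, the `pending` queue that stands in for data still in transit, and the explicit `sync()` call that stands in for the network delay elapsing are all hypothetical.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.balances: Dict[str, int] = {}

    def balance(self, user: str) -> int:
        return self.balances.get(user, 0)


class AsyncReplicatedStore:
    """Writes go to the primary; replication to the secondary is deferred."""

    def __init__(self, primary: Replica, secondary: Replica) -> None:
        self.primary = primary
        self.secondary = secondary
        self.pending: List[Tuple[str, int]] = []  # changes still "on the wire"

    def deposit(self, user: str, amount: int) -> None:
        self.primary.balances[user] = self.primary.balance(user) + amount
        self.pending.append((user, amount))

    def sync(self) -> None:
        """Deliver all in-flight changes, as if the transmission delay had elapsed."""
        for user, amount in self.pending:
            self.secondary.balances[user] = self.secondary.balance(user) + amount
        self.pending.clear()


beijing, shanghai = Replica("Beijing"), Replica("Shanghai")
store = AsyncReplicatedStore(beijing, shanghai)
store.deposit("user", 10_000)
print("Shanghai before sync:", shanghai.balance("user"))  # stale read
store.sync()
print("Shanghai after sync:", shanghai.balance("user"))
```

Between `deposit` and `sync` the two replicas disagree, which is the window in which the user sees the old balance. Shortening the delay narrows the window but cannot close it.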

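Besides the physical limit on transmission speed, the transmission line itself can have availability problems. A line can be cut, congested, or faulty (corrupted packets, packet loss), and recovery from line failures usually takes a long time, from several minutes to several hours. For example, in 2015 Alipay's services were affected for more than 4 hours because a fiber-optic cable was dug up, and in 2016 a China-US submarine cable was interrupted for 3 hours. A line interruption means storage cannot synchronize, and during that period the data across the whole system is inconsistent.

Taken together, transmission delay under normal conditions and transmission interruption under abnormal conditions both leave the system's data inconsistent at some point in time or over some period, which affects the business. Yet with no redundancy at all, the high availability of the system cannot be guaranteed. The difficulty of storage high availability therefore lies not in how to back up data, but in how to reduce or avoid the impact of data inconsistency on the business.

In the field of distributed systems, the well-known CAP theorem demonstrates the complexity of storage high availability at the theoretical level: highly available storage cannot satisfy consistency, availability, and partition tolerance all at once; at most two of the three can be satisfied. Architecture design therefore has to make trade-offs based on actual business needs.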
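The CAP trade-off can be illustrated with a deliberately simplified two-replica model. The `Node` class, the `"CP"`/`"AP"` mode flag, the `partitioned` flag, and the helper `make_pair` are hypothetical names for this sketch; real systems make the choice in far more nuanced ways.

```python
class PartitionError(Exception):
    """Raised when a CP node refuses a write it cannot replicate."""


class Node:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None          # the other replica, set by make_pair
        self.partitioned = False  # True while the link to the peer is down

    def write(self, value) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to keep the replicas consistent.
                raise PartitionError("cannot reach peer; write rejected")
            # AP: stay available and accept that the replicas now diverge.
            self.value = value
            return
        self.value = value
        self.peer.value = value  # synchronous replication while connected


def make_pair(mode: str):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


a, b = make_pair("AP")
a.partitioned = b.partitioned = True
a.write("new")
print("AP during partition:", a.value, b.value)  # replicas differ
```

In `"AP"` mode the write succeeds and the replicas disagree until the link recovers; in `"CP"` mode the same write raises `PartitionError`, so the data stays consistent but the service is unavailable for writes.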

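High-Availability State Decision

Whether for computing or for storage, high availability rests on "state decision": the system must be able to judge whether its current state is normal or abnormal, and take action to preserve availability when it is abnormal. If the state decision itself is wrong or biased, every subsequent action loses its meaning and value. In practice, however, there is an inherent contradiction: in a system that achieves high availability through redundancy, state decisions can never be completely correct. The common decision methods are analyzed below: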

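1. Dictatorial

Dictatorial decision-making means there is one independent decision-making entity, which we call the "decider", responsible for collecting information and making decisions. All the redundant individuals, which we call "reporters", send their status information to the decider.

[Figure: dictatorial decision, with reporters sending status information to a single decider]

The dictatorial approach never suffers from conflicting decisions, because there is only one decider. But that is exactly the problem: when the decider itself fails, the whole system can no longer make accurate state decisions. And if the decider is given its own state-decision mechanism, we fall into endless recursion.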
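A minimal sketch of a dictatorial decider based on heartbeat age follows. The class name `Decider`, the timeout rule, and the optional `now` parameter (which lets the example run without real waiting) are assumptions for illustration.

```python
import time
from typing import Dict, Optional


class Decider:
    """Single decision maker that judges reporters by the age of their last report."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def report(self, reporter: str, now: Optional[float] = None) -> None:
        # A reporter sends its status; only the arrival time is recorded here.
        self.last_seen[reporter] = time.monotonic() if now is None else now

    def status(self, reporter: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(reporter)
        if seen is None or now - seen > self.timeout:
            return "abnormal"
        return "normal"


decider = Decider(timeout=3.0)
decider.report("server-1", now=0.0)
print(decider.status("server-1", now=2.0))
print(decider.status("server-1", now=5.0))
```

The decider cannot tell a crashed reporter from one whose reports are merely delayed or lost, so even this simple rule can be wrong. And nothing in the sketch watches the `Decider` itself, which is the single point of failure described above.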

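2. Negotiated

Negotiated decision-making means two independent individuals exchange information and then decide according to rules. The most common form of negotiated decision is the master-standby decision.

[Figure: negotiated decision between a master and a standby]

The basic negotiation rules for this architecture can be designed as follows:

  • Both servers start as standbys.
  • The 2 servers establish a connection.
  • The 2 servers exchange status information.
  • One of the servers decides to become the master; the other remains a standby.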
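The four rules above can be sketched as follows. The `Server` class is hypothetical, the "connection" is modelled as plain object references, and the tie-break (the server with the lower id becomes master) is an assumption, since the rules do not say how the choice is made.

```python
from typing import Optional


class Server:
    def __init__(self, server_id: int) -> None:
        self.server_id = server_id
        self.role = "standby"  # rule 1: every server starts as a standby
        self.peer: Optional["Server"] = None

    def connect(self, peer: "Server") -> None:
        # Rule 2: establish the connection between the two servers.
        self.peer, peer.peer = peer, self

    def negotiate(self) -> None:
        if self.peer is None:
            raise RuntimeError("not connected to a peer")
        # Rule 3: exchange status information (here, just the ids).
        peer_id = self.peer.server_id
        # Rule 4: exactly one side promotes itself (assumed tie-break: lower id).
        self.role = "master" if self.server_id < peer_id else "standby"


a, b = Server(1), Server(2)
a.connect(b)
a.negotiate()
b.negotiate()
print(a.role, b.role)
```

Because both sides apply the same deterministic rule to the same exchanged information, they agree on who is master. The sketch only covers the happy path in which the exchange succeeds.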

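Neither the architecture nor the rules of negotiated decision-making are complicated. The difficulty is what the state decision should be when the information exchange between the two servers breaks down (for example, when the master-standby connection is interrupted).

  • If the standby treats an interrupted connection as a master failure, it must promote itself to master. But if the master has not actually failed, the system now has two masters, which contradicts the original design (1 master, 1 standby).
[Figure: the connection is interrupted and the standby promotes itself, leaving two masters]
  • If the standby does not treat an interrupted connection as a master failure, then when the master really does fail the system has no master at all, which equally contradicts the original design (1 master, 1 standby).
[Figure: the master fails and the standby stays a standby, leaving no master]
  • To limit the effect of a connection interruption on state decisions, more connections can be added, for example dual or triple connections. This reduces the impact of an interrupted connection (note: it only reduces it and cannot eliminate it), but it introduces the problem of choosing among the connections: if different connections deliver different information, which one should be trusted? This question has no right answer, because whichever connection is treated as authoritative can be wrong in some scenario.
[Figure: master and standby linked by multiple connections]

Overall, negotiated state decision will always have problems in some scenarios.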
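The dilemma can be enumerated in a few lines. The function `roles_after_link_loss` and its two boolean parameters are hypothetical; the point is that the standby must choose its policy without being able to observe whether the master is really alive.

```python
from typing import List


def roles_after_link_loss(master_alive: bool, standby_assumes_failure: bool) -> List[str]:
    """Roles of (old master, old standby) once the link between them is down.

    The standby cannot observe master_alive; it can only follow its policy.
    """
    old_master = "master" if master_alive else "down"
    old_standby = "master" if standby_assumes_failure else "standby"
    return [old_master, old_standby]


for alive in (True, False):
    for assume in (True, False):
        roles = roles_after_link_loss(alive, assume)
        print(f"master_alive={alive} assume_failure={assume} "
              f"-> {roles}, masters={roles.count('master')}")
```

Each fixed policy is right in one case and wrong in the other: assuming failure yields two masters when the master is still alive, and assuming no failure yields zero masters when the master has really failed.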

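3. Democratic

Democratic decision-making means multiple independent individuals make the state decision by voting. ZooKeeper clusters, for example, elect their leader this way.

[Figure: democratic decision, with multiple nodes voting]

Democratic decision-making resembles negotiated decision-making: both rest on independent individuals exchanging information, each making its own decision, after which the final state is determined by the "majority wins" rule. The difference is that democratic decision-making is far more complex. ZooKeeper's election algorithm, ZAB, leaves most people baffled, let alone able to implement it in code.

Beyond algorithmic complexity, democratic decision-making has an inherent flaw: split-brain. The term comes from medicine. When the connection between the left and right hemispheres of the brain is severed, the two halves can no longer exchange information, so each makes its own decisions, and the body, controlled by two brains, does all kinds of strange things. For example, a split-brain patient getting dressed may pull the trousers up with one hand while pushing them down with the other. In a cluster, the root cause of split-brain is that a connection failure divides the originally unified cluster into two isolated subclusters. Each subcluster holds its own election, so two masters are elected, as if one body had two brains.

[Figure: split-brain in a five-node cluster, before and after the connection failure]

As the figure shows, in the normal state node 5 is the master node and the other nodes are backup nodes. When the connection fails, nodes 1, 2, and 3 form one subcluster and nodes 4 and 5 form another; the link between the two subclusters is down and they cannot exchange information. Following the rules and algorithm of democratic decision-making, the two subclusters elect node 2 and node 5 as master nodes respectively, so the whole system now has two master nodes. This violates the original design: the two master nodes each make their own decisions, and the state of the whole system becomes chaotic.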
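Split-brain can be reproduced with a toy election that has no quorum check. The election rule used here (the highest node id wins) is an assumption made only to keep the sketch short; under it the first subcluster elects node 3 rather than node 2 as in the figure, but the outcome of two simultaneous masters is the same.

```python
from typing import Iterable, List


def elect(subcluster: Iterable[int]) -> int:
    """Hypothetical election rule: the highest node id in the subcluster wins."""
    return max(subcluster)


def masters_after_partition(partitions: List[List[int]]) -> List[int]:
    # Each subcluster runs its own election, unaware of the others.
    return [elect(p) for p in partitions]


print("healthy cluster:", masters_after_partition([[1, 2, 3, 4, 5]]))
print("after partition:", masters_after_partition([[1, 2, 3], [4, 5]]))
```

The healthy cluster ends up with one master, while the partitioned cluster ends up with one master per subcluster.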

To solve the split-brain problem, democratic decision-making systems generally adopt the rule that "the number of nodes taking part in a vote must exceed half of the total number of nodes in the system". In the situation shown in the figure, the subcluster formed by nodes 4 and 5 has only 2 nodes, fewer than half of the 5 nodes in total, so this subcluster does not hold an election. This rule solves the split-brain problem, but it also lowers the overall availability of the system. If there are too few voting nodes not because of split-brain but because nodes have really failed (for example, nodes 1, 2, and 3 have really failed), the system still will not elect a master node, and the whole system is effectively down even though nodes 4 and 5 are still working normally.
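The majority rule and its availability cost can be sketched as follows. The function names are hypothetical, and the election rule inside a subcluster that has a quorum (highest id wins) is again an assumption.

```python
from typing import List, Optional


def has_quorum(voters: int, total: int) -> bool:
    """True if the voters are strictly more than half of all nodes."""
    return 2 * voters > total


def elect_with_quorum(reachable: List[int], total: int) -> Optional[int]:
    """Elect a master only if the reachable nodes form a majority."""
    if not has_quorum(len(reachable), total):
        return None  # too few voters: no election is held
    return max(reachable)  # hypothetical rule: highest id wins


TOTAL = 5
print("subcluster 1-3:", elect_with_quorum([1, 2, 3], TOTAL))
print("subcluster 4-5:", elect_with_quorum([4, 5], TOTAL))
```

The same call `elect_with_quorum([4, 5], TOTAL)` also describes the case where nodes 1, 2, and 3 have really failed: the function cannot tell the two situations apart, so it returns `None` and the two healthy nodes are left without a master.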

In summary, whatever scheme is adopted, state decision cannot be made problem-free in every scenario, yet having no high-availability scheme at all causes even bigger problems. Choosing a high-availability scheme that suits the system is therefore itself a complex process of analysis, judgment, and selection.

Summary

To sum up, this article analyzed high availability as a source of complexity, covered the two scenarios of computing high availability and storage high availability, and presented several approaches to high-availability state decision. I hope it is helpful to you.

Origin blog.csdn.net/qq_35030548/article/details/130051217