Cross-data center high-availability architecture design

Foreword

Over years of coding and design work, from basic development to cloud computing platforms to the architect role, the author has seen a great deal of application, system, and middleware design, and has watched the technology stack evolve from centralized to distributed models. Under CAP and BASE theory, availability is usually a mandatory design goal. So how do we achieve availability in a distributed system? The answer is replicas, that is, deploying more resources: in theory, the more you deploy, the higher the availability. But not everything is stateless, so trade-offs are inevitable.

Common designs

Replicas appear in many common technologies. For stateless replicas, consider a multi-core CPU: each core runs independently and executes threads in time slices, yet cache-consistency problems still exist (hence the MESI protocol). Similarly, in a distributed environment, a stateless service becomes more available the more instances are deployed, because there is no consistency to worry about; in CAP terms, an AP design suffices. In practice, however, consistency is often required: database transactions, sessions, and so on.

In reality, cloud platforms have the concepts of regions and availability zones, which are essentially multiple data centers, so a high-availability architecture must deal with state. Common designs include active/standby, replica sets, sharded clusters, and connected clusters.

Active/Standby + Arbitration

The active/standby design is used in many scenarios, typified by MySQL MHA and Redis Sentinel. Taking MySQL as an example, automatic failover requires the support of an arbitration node such as the MHA manager.

Redis's Sentinel mode follows the same idea. The master and the standby are placed in different data centers, and the smaller the replication delay the better, which brings in cross-data-center latency design: it can be mitigated by keeping the data centers close enough or by transferring less data.

Here the MHA manager acts as the arbitration node, responsible for failover and automatic master/standby switching based on heartbeat detection. A MongoDB replica set can likewise use a 1+2 layout, and the principle is the same.
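To make the arbitration idea concrete, here is a minimal sketch (not MHA's actual implementation) of an arbiter that heartbeats the primary and promotes the standby after several consecutive failures; the host names, port, thresholds, and the promote() hook are all hypothetical placeholders.

```python
import socket
import time

PRIMARY = ("mysql-dc1.example.internal", 3306)   # hypothetical primary address
STANDBY = ("mysql-dc2.example.internal", 3306)   # hypothetical standby address
FAIL_THRESHOLD = 3                               # consecutive failed heartbeats before failover
CHECK_INTERVAL = 2                               # seconds between heartbeats


def is_alive(addr, timeout=1.0):
    """Heartbeat here is a plain TCP connect; a real arbiter also runs SQL-level checks."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def promote(addr):
    """Placeholder for the real promotion step (stop replication, enable writes, switch VIP)."""
    print(f"promoting {addr} to primary")


def arbiter_loop():
    failures = 0
    while True:
        if is_alive(PRIMARY):
            failures = 0
        else:
            failures += 1
            if failures >= FAIL_THRESHOLD and is_alive(STANDBY):
                promote(STANDBY)
                break
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    arbiter_loop()
```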

By the same token, because of master/slave replication, the standby nodes cannot serve writes and must replicate from the master (fully synchronous, semi-synchronous, or asynchronous), so there should not be too many standby nodes. There are also cases where the master's write load grows too large and cannot be scaled out.

Replica Set (Extended Cluster)

Replica sets generally do not use a separate arbiter; the idea is the same, but an odd number of nodes is required so that the members can arbitrate among themselves. An even number is technically possible, but it invites split brain or deadlocked elections. The idea stems from the Paxos algorithm; because Paxos is too complex, the simplified ZAB and Raft algorithms appeared. ZooKeeper is based on the ZAB election algorithm, MongoDB replica sets are based on Raft, and Tencent followed the MongoDB approach to implement TDSQL on top of MySQL, among others.

Nodes are fully connected and vote among themselves. The voting scheme is designed so that a master will definitely emerge (odd node counts, node ids, and so on). Because of this voting design, the more nodes there are, the harder elections become, so more is not always better; there is a balance point, such as 5 or 7 nodes.
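To make the majority rule concrete, here is a minimal sketch in the spirit of Raft-style elections (it is not the Raft protocol itself): a candidate wins only with votes from more than half of the members, which is why 3, 5, or 7 nodes spread over three data centers tolerate the loss of one data center.

```python
def majority(cluster_size: int) -> int:
    """Votes needed to win: strictly more than half of the members."""
    return cluster_size // 2 + 1


def elect(candidate: str, votes: dict[str, str], cluster_size: int) -> bool:
    """votes maps node id -> the candidate that node voted for in this term."""
    won = sum(1 for v in votes.values() if v == candidate)
    return won >= majority(cluster_size)


# A 5-node replica set spread over 3 data centers: losing one data center (2 nodes)
# still leaves 3 voters, which is a majority, so a new leader can be elected.
votes = {"dc1-a": "dc1-a", "dc1-b": "dc1-a", "dc2-a": "dc1-a"}
print(majority(5))                 # 3
print(elect("dc1-a", votes, 5))    # True
```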

This layout should look familiar: Redis Cluster and RocketMQ clusters use the same approach for failover, though they differ from traditional replica sets by combining sharding with a Raft-style election.

Therefore, when designing cross-data-center HA, a two-location, three-center layout is needed: deploying one node per data center ensures that the service remains available even if an entire data center goes down.

 

Sharded cluster

There are two ways to design a sharded cluster: each shard is a replica set, or each shard is an active/standby pair.

Sharding + replica set

If each shard is a replica set, then the replica set provides the failover capability. A common example is the MongoDB sharded cluster: mongos routers + config servers (a replica set) + shards, where each shard is itself a Raft-based replica set (replicaSet). Incidentally, when K8s creates a Deployment it also creates an RS (ReplicaSet).

For example, in a sharded cluster with 2 shards, since each shard is a replica set, each shard's replica set members can span data centers to achieve two locations and three centers. In theory sharding can be scaled out indefinitely, but as it grows, the pressure on routing, querying, and writing increases. This architecture is why MongoDB or ES (also sharded) can store huge amounts of data, for example on the order of 6 billion records.
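As a rough illustration of how a router maps a shard key to one of the shard replica sets, here is a hash-based sketch; mongos actually routes by chunk metadata stored on the config servers, so the hosts, shard names, and hash-mod scheme below are simplified assumptions.

```python
import hashlib

# Each shard is itself a replica set; its members are spread across data centers
# so that losing one data center still leaves a majority in every shard.
SHARDS = {
    "shard-1": ["dc1-node1", "dc2-node1", "dc3-node1"],   # hypothetical hosts
    "shard-2": ["dc1-node2", "dc2-node2", "dc3-node2"],
}


def route(shard_key: str) -> str:
    """Hash-based routing sketch; a real router (e.g. mongos) consults chunk
    metadata from the config servers instead of a bare hash-mod."""
    names = sorted(SHARDS)
    digest = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return names[digest % len(names)]


print(route("order:10086"))   # deterministic shard choice for a given key
```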

 

Sharding + active/standby

Common examples are Redis Cluster, RocketMQ clusters, ES clusters, and so on. Take Redis as an example:

Redis Cluster also shards the data, but master/standby failover is decided by a vote among the Redis master nodes. With only 2 data centers this cannot be handled well: Redis clusters are typically 3 masters with 3 slaves, 5 masters with 5 slaves, and so on, so the odd number of masters spread over two data centers means one data center necessarily holds the majority. If that data center fails, the whole cluster becomes unavailable, which again requires a two-location, three-data-center deployment.
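The two-data-center problem can be shown with a toy model of the majority rule (this ignores Redis Cluster's actual gossip and epoch mechanics; the master counts per data center are hypothetical):

```python
# Masters per data center for a hypothetical 3-master / 3-slave Redis-style cluster
# squeezed into only two data centers.
MASTERS_BY_DC = {"dc1": 2, "dc2": 1}


def cluster_survives(down_dc: str) -> bool:
    """Failover must be authorized by a majority of the master nodes; if the
    surviving masters are not a majority, the whole cluster becomes unavailable."""
    total = sum(MASTERS_BY_DC.values())
    surviving = total - MASTERS_BY_DC.get(down_dc, 0)
    return surviving > total // 2


print(cluster_survives("dc2"))   # True: 2 of 3 masters remain
print(cluster_survives("dc1"))   # False: only 1 of 3 masters remains, cluster stalls
```

With a third data center holding one of the masters, no single data-center failure can take out a majority.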

Connected clusters

As the name suggests, connected clusters replicate data between two or more clusters in both directions to keep their data consistent.

Mutual replication

Take RocketMQ as an example: an intermediate replication platform replicates the RocketMQ clusters of the two data centers to each other.

(Only one direction is drawn in the original diagram; the right-hand cluster is replicated back to the left in the same way.)

With SDK-level routing and distribution, this achieves an active-active pair of clusters with fully synchronized data: if one data center goes down, the business is not affected at all. The drawbacks are also obvious: bandwidth usage is very high, and if inter-data-center latency is high or bandwidth is insufficient, this approach will not work.

The application also needs to be designed accordingly, for example idempotent deduplication of MQ consumption to avoid processing a message multiple times. Sending can be restricted to one side according to a data-center mark carried in the data, and consumption can likewise be handled according to the actual situation; this is a true dual-active design. If a data center fails, switching the sending and consumption marks migrates the service seamlessly, and the switch can even be automated based on monitoring data.
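A minimal sketch of the two application-side pieces mentioned above, the data-center mark for sending and idempotent deduplication for consuming, is shown below. The mark value, the TTL, and the in-memory dedup store are assumptions; in practice the dedup state would live in a shared store.

```python
import time

LOCAL_DC = "dc1"              # mark carried by producers in this data center
SEEN: dict[str, float] = {}   # message id -> expiry; use a shared store in practice
DEDUP_TTL = 24 * 3600         # how long a message id is remembered


def should_send(message_dc_mark: str) -> bool:
    """Dual-active routing: each side produces only messages marked for its own DC;
    flipping the mark migrates traffic to the surviving DC during a failure."""
    return message_dc_mark == LOCAL_DC


def consume_once(message_id: str, handler) -> bool:
    """Idempotent consumption: a message replicated back from the peer cluster
    is processed at most once within the dedup window."""
    now = time.time()
    expiry = SEEN.get(message_id)
    if expiry is not None and expiry > now:
        return False                # already consumed, skip the duplicate
    SEEN[message_id] = now + DEDUP_TTL
    handler(message_id)
    return True
```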

Because the clusters are connected, ID (uniqueness) conflicts in the written data must be considered when storing; it is recommended to tag writes with a data-center mark.
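One hedged way to implement such a mark is to embed the data-center identifier in every generated ID, so writes from either side of the connected clusters can never collide; the format below is a simplified stand-in for a snowflake-style generator with a reserved data-center field.

```python
import itertools
import time

_counter = itertools.count()   # process-local sequence number


def next_id(dc_mark: str) -> str:
    """Data-center mark + millisecond timestamp + local sequence; a production
    system would use a fixed-width snowflake-style layout instead of a string."""
    return f"{dc_mark}-{int(time.time() * 1000)}-{next(_counter)}"


print(next_id("dc1"))   # e.g. dc1-1717000000000-0
print(next_id("dc2"))   # ids generated in the other data center carry a different prefix
```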

  

Data store file replication

The design above can be optimized further by exploiting the middleware's own features, such as the MySQL binlog or Kafka's commit log and index files, so that replication of files or data streams is handled by the middleware itself.

The essence is the same; only the implementation differs.
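As a generic sketch of the file-replication idea (the path and the send callback are placeholders, and parsing a real binlog or commit log is format-specific and omitted):

```python
import time


def ship(log_path: str, send) -> None:
    """Tail an append-only log produced by the source middleware and hand each
    new record to a sender that replays it in the peer data center."""
    with open(log_path, "rb") as log:
        log.seek(0, 2)                 # start from the current end of the file
        while True:
            record = log.readline()
            if not record:
                time.sleep(0.5)        # wait for the middleware to append more
                continue
            send(record)               # apply/replay in the other data center


# ship("/var/lib/middleware/commit.log", send=print)   # hypothetical path
```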

 

Sharded large cluster (data loss allowed)

If the data requirements are not strict and loss is acceptable, for example cached data, the sharded-large-cluster mode can be used.

The data is fully sharded, and the data source can be synchronized and written into Redis periodically. If the Redis nodes in one data center hang because that data center is down, that portion of data is lost and simply waits to be written again, without affecting the business. How the SDK routing is designed depends on the specific circumstances and the routing rules. If, as with a registration center, each client holds a local cache and heartbeats are checked regularly, then re-registration happens automatically with no replication at all.

The data is not replicated, but the cluster is sharded. If one data center goes down, the SDK automatically registers with the other data center, and because every SDK keeps a local cache there is effectively no switching cost. If an entire data center goes down, the applications in it go down with it, so this causes no additional problems.
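A minimal sketch of that client-side behavior, assuming a hypothetical local cache of instances per data center and a stub health check:

```python
import random

# Hypothetical SDK-local cache of service instances per data center, refreshed
# from the registry and kept usable even if the registry itself is unreachable.
LOCAL_CACHE = {
    "dc1": ["10.0.1.11:8080", "10.0.1.12:8080"],
    "dc2": ["10.0.2.11:8080"],
}


def healthy(instance: str) -> bool:
    """Stub health check; in practice a periodic heartbeat maintains this state."""
    return not instance.startswith("10.0.1.")   # simulate dc1 being down


def pick_instance(local_dc: str) -> str:
    """Prefer the local data center; fall back to the peer DC if nothing is healthy."""
    candidates = [i for i in LOCAL_CACHE.get(local_dc, []) if healthy(i)]
    if not candidates:
        candidates = [i for dc, insts in LOCAL_CACHE.items() if dc != local_dc
                      for i in insts if healthy(i)]
    return random.choice(candidates)


print(pick_instance("dc1"))   # falls back to a dc2 instance in this simulation
```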

Of course, it is also possible not to implement automatic cross-data-center registration and simply accept losing part of the application's serving capacity; the capability recovers automatically once the data center is restored.

 

Summary

The author has only briefly introduced cross-data-center practices and schemes. Real designs are more complicated, and the practical problems are hard, especially when state, replication stability, bandwidth, and other factors must all be weighed together. Even a simple SDK involves several designs for routing, deduplication, and so on. Everything still has to be designed around the actual scenario.

Migration of development logic to the platform

With the rise of containers such as Docker in recent years, development logic has been shifting toward the platform. Development that used to ship as a fat JAR now becomes an agent, a sidecar, or serverless. The platform is a fixed frame, and working within that frame guarantees quality and efficiency, but it also makes it hard for developers to touch the underlying system logic.


Source: blog.csdn.net/fenglllle/article/details/131022669