How is a high-availability "remote multi-active" architecture designed?

0. Foreword

Backend services fall into two categories: stateful and stateless. High availability is relatively simple for stateless applications: a load balancer such as F5, or any proxy in front of the instances, is enough. The discussion below focuses on stateful services.

Server-side state is usually kept on disk or in memory, as in the MySQL database or in-memory stores such as Redis. Beyond these two, there is also state held in JVM memory, but its lifecycle is usually very short.

1. High-availability solutions

Historically, high availability has roughly evolved through the following stages:

  • Cold standby

  • Hot standby

  • Intra-city active-active

  • Remote active-active

  • Remote multi-active

Before discussing remote multi-active, let's first look at the other solutions; this helps explain the reasoning behind many of its design decisions.

1.1 Cold Standby

Cold standby means stopping the database's external service and quickly backing up and archiving its data by copying files. In short, cold backup is copy-and-paste; on Linux it can be done with the cp command, either manually or via scheduled scripts. Its benefits:

  • Simple

  • Fast backup (compared to other backup methods)

  • Quick recovery. You only need to copy the backup files back into the working directory (or change the database configuration to point at the backup directory). At the extreme, recovery can be completed almost instantly with two mv commands.

  • It supports point-in-time restore. For example, after the Pinduoduo coupon loophole some time ago, through which a great deal of money was siphoned off, data could be rolled back to an earlier point in time to "recover the loss".
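The copy-based backup and restore described above can be sketched in a few lines. This is a minimal illustration only; the paths, the stand-in data files, and the assumption that the service is already stopped are all illustrative.

```python
# Minimal sketch of a cold backup: stop the service, copy the data
# directory, and swap it back in to restore. Paths are illustrative.
import pathlib
import shutil
import tempfile

def cold_backup(data_dir: pathlib.Path, backup_dir: pathlib.Path) -> None:
    # The service must be stopped first so files are not mutating mid-copy.
    shutil.copytree(data_dir, backup_dir)

def cold_restore(data_dir: pathlib.Path, backup_dir: pathlib.Path) -> None:
    # The "two mv commands" trick: swap the broken dir out, the backup in.
    shutil.rmtree(data_dir)
    shutil.copytree(backup_dir, data_dir)

# Demo with temporary directories standing in for a database's files.
root = pathlib.Path(tempfile.mkdtemp())
data, backup = root / "data", root / "backup"
data.mkdir()
(data / "table.db").write_text("v1")

cold_backup(data, backup)               # snapshot taken at "v1"
(data / "table.db").write_text("corrupted")
cold_restore(data, backup)              # roll back to the snapshot
```

Note that everything written after the snapshot is gone after the restore, which is exactly the data-loss drawback discussed next.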

These benefits served earlier software well, but cold backup no longer fits many of today's scenarios:

  • The service must be taken down, so N nines of availability is impossible. Downtime cold backups used to run in the early morning when nobody was online, but many Internet applications are now global, so someone is using them at every hour.

  • Data loss. Without extra measures, any data written between the backup point and the restore point is lost after recovery. The traditional remedy is to replay database logs manually after restoring, for example redo logs; I have even replayed requests from business logs by hand to rebuild data. Such recovery is enormous manual labor, with a high error rate and a long recovery time.

  • Cold backup is a full backup. Full backups waste disk space and run into capacity limits, which can only be relieved by copying backups off to removable devices, so the whole backup process takes longer.

  • Imagine how many portable hard disks, and how much time, it takes to copy several terabytes of data to removable storage every day. Moreover, a full backup cannot be customized; for example, you cannot back up only certain tables.

How to weigh the pros and cons of cold backup is something that every business needs to consider.

1.2 Hot Standby

Compared with cold backup, the main difference of hot backup is that it does not require a shutdown: the service keeps running while the backup is taken. However, a shutdown is still needed when restoring. Since our discussion concerns storage, shared-disk approaches are not counted as dual-machine hot backup here.

1.2.1 Active/Standby mode

This is equivalent to one master and one slave: the master node serves traffic, and the slave node acts as a backup. Data is synchronized from master to slave by some means, and when a failure occurs the slave is promoted to the working node. Synchronization can happen at the software level or the hardware level. Software examples include MySQL master/slave replication via the binlog, and SQL Server subscription replication. At the hardware level, data is mirrored to another disk via sector- or disk-level interception. The hardware approach is also called data-level disaster recovery; the software approach, application-level disaster recovery. The rest of this article focuses on application-level disaster recovery.
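The Active/Standby flow can be modeled with a toy sketch: every write is applied on the master and shipped to the standby (standing in for binlog replication), and on master failure the standby is promoted. The class names and in-memory "databases" are illustrative assumptions, not any real replication API.

```python
# Toy model of application-level Active/Standby with failover.
class Node:
    def __init__(self):
        self.data = {}      # stands in for the node's database
        self.alive = True

    def apply(self, key, value):
        self.data[key] = value

class ActiveStandby:
    def __init__(self, master: Node, standby: Node):
        self.master, self.standby = master, standby

    def write(self, key, value):
        if not self.master.alive:
            # Failover: promote the standby to working node.
            self.master, self.standby = self.standby, self.master
        self.master.apply(key, value)
        if self.standby.alive:
            # Replication to the standby, greatly simplified.
            self.standby.apply(key, value)

    def read(self, key):
        node = self.master if self.master.alive else self.standby
        return node.data.get(key)

pair = ActiveStandby(Node(), Node())
pair.write("balance", 100)
pair.master.alive = False    # simulate a master crash
pair.write("balance", 90)    # served by the promoted standby
```

Real systems must also handle replication lag and split-brain, which this sketch deliberately ignores.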

1.2.2 Two-machine mutual backup

In essence this is still Active/Standby, except the roles are mutual. Mutual backup does not change anything for a single business, but from the servers' perspective it squeezes more out of the available resources. For example, two businesses have databases A and B, deployed on two machines P and Q. For business A, P is master and Q is slave; for business B, Q is master and P is slave. Overall, the two machines act as active and standby for each other. This architecture also suits read/write separation well: a single writer with multiple readers reduces conflicts and improves efficiency.
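The P/Q routing in that example can be written down directly. The routing table below is just the two-business topology from the text, and the convention of sending reads to the slave copy is one possible read/write-separation policy.

```python
# Mutual backup routing for the P/Q example: each business writes to its
# own master; reads are offloaded to the other machine's replica.
ROUTES = {
    "A": {"master": "P", "slave": "Q"},
    "B": {"master": "Q", "slave": "P"},
}

def route(business: str, op: str) -> str:
    """Return the machine that should serve this operation."""
    r = ROUTES[business]
    return r["master"] if op == "write" else r["slave"]
```

With this layout both machines carry live traffic, instead of one sitting idle as a pure standby.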

Other high-availability options can be found in the various deployment modes of different databases, such as MySQL master-slave, dual-master multi-slave, and MHA; or Redis master-slave, Sentinel, and Cluster.

1.3 Intra-city active-active

The schemes above basically operate within a single local network. As the business grows, intra-city active-active emerges. Compared with the previous schemes, the granularity of distrust shifts from the machine to the data center: this scheme survives an entire IDC going down (power outage, network cut, and so on).

In fact, intra-city active-active is not essentially different from the dual-machine hot backup above; the "distance" is just greater, and everything else is basically the same (intra-city leased lines are still very fast). Hot backup provides the disaster recovery capability, and mutual backup avoids wasting resources.

With support in the application code, some businesses can achieve true active-active: the same business, dual masters, both serving reads and writes, as long as conflicts are handled well. Note that not every business can do this.

The industry more commonly adopts "two cities, three centers". A remote backup data center provides stronger disaster recovery, better resisting earthquakes, terrorist attacks, and the like. The active-active machines must be deployed in the same city, while a more distant city hosts the disaster recovery center. The disaster recovery center does not serve external traffic; it acts only as a backup, taking traffic when a failure occurs, or serving purely as a data backup. The main reason is that the distance is too great and the network latency too high.

Figure 1 Two cities, three centers

As shown above, user traffic is load balanced: business A's traffic goes to IDC1 and server set a; business B's traffic goes to IDC2 and server set b. Server sets a and b synchronize data with each other over the intra-city leased line, and the data is also synchronized to IDC3 over a long-distance leased line. When either IDC goes down, all traffic is switched to the other IDC in the same city, completing the failover.

If a large-scale disaster strikes city 1, for example an earthquake that takes out IDC1 and IDC2 simultaneously, the data survives in IDC3. If load balancing is still working, all traffic can be forwarded to IDC3. But IDC3 is far away, so network latency becomes severe and the user experience usually suffers badly.

Figure 2 Master-slave mode of two cities, three centers

The figure above is a schematic of two cities, three centers on a Master-Slave model. The two data centers in city 1 act as one master and one slave, and the remote data center is another slave. You can also use intra-city dual masters + keepalived + VIP, or MHA, for failover. But city 2 cannot (or at least should not) be elected master.

1.4 Remote Active-Active

Intra-city active-active handles most disaster scenarios, but service is still interrupted by a large-scale power outage or natural disaster. Reworking the two-cities-three-centers design above to deploy entry nodes and applications in both cities lets traffic switch to city 2 when city 1 stops serving, keeping the service up in degraded form. But the user experience drops noticeably.

Therefore, most Internet companies have adopted the remote active-active solution.

Figure 3 Simple schematic of remote active-active

The figure shows a simple remote active-active setup. After the LB, traffic is distributed to server clusters in the two cities. Each server cluster connects only to its local database cluster; only when the entire local database cluster is unreachable does it fail over to the remote one.

In this design, two-way synchronization takes longer because of the long-haul network. Longer sync times mean either a sharper drop in throughput or more data conflicts. Throughput and conflicts are opposing concerns, and you must trade them off. For example: to resolve conflicts, introduce distributed locks or distributed transactions; to gain throughput, use intermediate states, error retries, and similar techniques to reach eventual consistency; where possible, complete an entire transaction within one node.
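One common way to accept conflicts in exchange for throughput is last-write-wins merging during two-way sync: each record carries a timestamp, and on conflict the newer write survives. This is a toy sketch of that idea under the eventual-consistency trade-off described above; the record layout is an illustrative assumption.

```python
# Toy last-write-wins merge for two-way sync between remote sites.
# Each value is a (timestamp, payload) pair; the newer timestamp wins.
def merge(local: dict, remote: dict) -> dict:
    out = dict(local)
    for key, (ts, payload) in remote.items():
        if key not in out or ts > out[key][0]:
            out[key] = (ts, payload)
    return out

# Two sites diverged while the leased line was slow.
site1 = {"order:1": (10, "paid")}
site2 = {"order:1": (12, "refunded"), "order:2": (11, "created")}
merged = merge(site1, site2)
```

Last-write-wins silently discards the losing write, which is exactly why some businesses cannot accept eventual consistency, as the next section discusses.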

For businesses that cannot accept eventual consistency, Ele.me takes the approach below:

For individual applications with high consistency requirements, we provide a strongly consistent solution (Global Zone). Global Zone is a cross-datacenter read/write separation mechanism: all write operations are directed to one master datacenter to guarantee consistency, while reads can go to the slave databases in each datacenter, or be pinned to the master datacenter. All of this is done in our database access layer (DAL), and the business is essentially unaware of it.

—— "Ele.me's remote multi-live technology implementation (1) general introduction"

In other words, this part of the data cannot be active-active. Using a single master instead of dual writes naturally eliminates conflicts.
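The Global Zone routing rule can be sketched as a small decision function. The room names and the set of strongly consistent tables below are illustrative assumptions, not Ele.me's actual configuration; the point is only the single-writer rule.

```python
# Sketch of Global Zone routing: writes to strongly consistent tables go
# to the one master room; everything else stays in the local room.
MASTER_ROOM = "room-1"
GLOBAL_TABLES = {"account_balance"}   # tables needing strong consistency

def pick_room(table: str, op: str, local_room: str) -> str:
    if table in GLOBAL_TABLES and op == "write":
        return MASTER_ROOM            # single writer, so no conflicts
    return local_room                 # reads and ordinary tables stay local
```

Because the rule lives in the database access layer, application code calls the database as usual and never sees the routing.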

In fact, remote active-active and remote multi-active are already quite similar, and active-active is structurally simpler: the application architecture needs little extra thought beyond conventional rate limiting, failover, and so on. But active-active is only an intermediate step; the ultimate goal is multi-active, because active-active suffers from data conflicts and cannot scale horizontally.

1.5 Remote multi-active

Figure 4 Schematic of remote multi-active

Following the idea of remote active-active, we can draw a schematic of remote multi-active in which every node has an out-degree and in-degree of 4. Any single node going offline then leaves the business unaffected. However, given the distances involved, a write operation carries a much larger time cost. Besides hurting the user experience, that extra time produces more data conflicts, and with severe conflicts the cost of distributed locks grows as well. System complexity rises and throughput falls, so the full-mesh design above cannot be used.

Recall how we optimize a mesh network topology: introduce an intermediate node and turn the mesh into a star:

Figure 5 Star-topology remote multi-active

With the star topology, any city going offline no longer affects the data; traffic originally destined for that city is simply load balanced again to a new node (preferably the nearest city). To keep the data safe, we only need to protect the central node, but this raises the bar for the central city above the others: recovery speed, backup integrity, and so on. We will not expand on that here; for now, assume the hub is completely safe.

If we deploy the remote multi-active business in the structure above, cross-site data synchronization is largely solved, but there will still be many conflicts, roughly comparable to the active-active case. So is there a better way?

Here we can borrow from Ele.me's Global Zone solution. The general idea is to "de-distribute": keep each piece of written data on the machines of one node (one city). Ali thinks about it this way:

Ali's ideal remote multi-active architecture

In fact, I suspect many businesses are built along the lines of the figure above, such as Didi's ride-hailing business, where everything is partitioned by city. The locations of riders, drivers, and destinations are usually within the same city, so a single data center rarely needs to talk to the others; cross-center interaction is needed only for reports, which are not very latency sensitive. In such cases, the national business shards very cleanly.
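City-based sharding of this kind comes down to a routing function from city to data center. The city-to-DC map and the fallback DC below are illustrative assumptions, not any real company's topology.

```python
# Sketch of sharding by city: each request is routed to the data center
# owning that city, so one DC serves the whole transaction locally.
CITY_TO_DC = {
    "beijing": "dc-north",
    "shanghai": "dc-east",
    "shenzhen": "dc-south",
}

def dc_for_order(city: str) -> str:
    # Unmapped cities fall back to a default DC rather than failing.
    return CITY_TO_DC.get(city, "dc-central")
```

Because rider, driver, and destination share a city, every lookup lands in the same DC and no cross-center coordination is needed on the hot path.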

However, for complex scenarios and businesses such as e-commerce, sharding as above no longer suffices. The business lines and data dependencies are so intricate that data centers inevitably synchronize data with one another. Taobao's solution somewhat resembles how we split microservices:

Taobao's remote multi-active architecture divided by unit

Note the data-synchronization arrows in the figure. Taking the trading unit as an example: business data belonging to the trading unit is synchronized bidirectionally with the central unit, while data not belonging to it is synchronized unidirectionally from the central unit. The central unit carries the most complex business scenarios; the business units carry relatively simple ones. Business units can scale elastically and serve for disaster recovery; the central unit scales poorly and has stricter stability requirements. Predictably, most failures occur in the central unit.
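The sync rules just described can be written as a small lookup. The unit names and ownership sets below are illustrative assumptions sketching the bidirectional-versus-unidirectional distinction, not Taobao's actual unit layout.

```python
# Sketch of unit-based sync rules: a unit syncs its own business data
# both ways with the central unit; data it does not own flows one way,
# from the central unit down to it.
OWNED = {
    "trade-unit": {"orders", "payments"},
}

def sync_direction(unit: str, table: str) -> str:
    if table in OWNED.get(unit, set()):
        return "bidirectional"        # this unit is the writer
    return "center-to-unit"           # read-only copy pushed from center
```

This is why a business unit stays simple and elastic: it only writes what it owns, and receives everything else as a read-only feed.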

Partitioning units by business requires a thorough overhaul of code and architecture (perhaps that is why Ali's move from active-active to multi-active took three years): business splitting, dependency splitting, mesh-to-star conversion, distributed transactions, cache invalidation, and more. Beyond the heavy coding demands, testing and operations also face great challenges.

In such a complex setting: how do you automate test coverage, run failure drills, and rework the delivery pipeline? Disaster recovery at this level is not something ordinary companies dare attempt, and the investment is out of proportion to the return. Still, we can treat this scenario as an "imaginary adversary" when thinking about our own business: how it will grow, and what level of disaster preparedness it needs. Comparatively, Ele.me's multi-active solution may suit most enterprises better.

This article gives only a sketch via diagrams. In practice, remote multi-active requires many very strong foundational capabilities: data transport, data verification, a data access layer (to hide write-routing and synchronization from clients), and so on.


Origin blog.csdn.net/u011487470/article/details/127619794