How Table Storage Achieves High Reliability and High Availability

Series of articles:

How Table Storage Achieves High Reliability and High Availability
How Table Storage Achieves Cross-Region Disaster Recovery

Foreword

This article introduces how a distributed NoSQL service achieves high data reliability and high service availability, using Table Storage (Table Store), a NoSQL service on Alibaba Cloud, as the example. When you think of distributed NoSQL, many names may come to mind, such as HBase, Cassandra, and AWS's DynamoDB. These systems were designed from the start as distributed systems to support large data volumes and high concurrency. You may also think of MongoDB and Redis, which provide clustering as well, but they generally require manual configuration of sharding, replica sets, or master-slave replication.

Table Store is a NoSQL service independently developed by Alibaba. Its architecture design draws on Google's Bigtable paper, so it shares some similarities with the HBase, Cassandra, and DynamoDB mentioned above. Some of the functional advantages of Table Store are shown in the following figure.

[Figure: functional advantages of Table Store]

This article focuses on the techniques Table Store uses to achieve high reliability and high availability. Table Store is designed for eleven 9s of data reliability and four 9s of service availability; as a cloud service, its SLA guarantees ten 9s of reliability and three 9s of availability.

Fault tolerance, high reliability, and high availability

Fault tolerance, high reliability, and high availability are interrelated concepts. Fault tolerance is the ability of a system to quickly detect software or hardware failures and quickly recover from them; it is generally implemented through redundancy, which ensures that a failed component does not affect the system as a whole. High reliability measures data safety, while high availability measures how much of the time the system is usable. Both are usually expressed as a number of 9s.

Redundancy achieves fault tolerance. At the data level, it generally means keeping multiple copies of the data, so that damage to any one copy does not affect data integrity. At the service level, it means running multiple service nodes, so that when any node fails its service can be migrated to another node. Service availability ultimately depends on data reliability: if data integrity cannot be guaranteed, there is no availability to speak of.

A fault-tolerant system is not necessarily highly available. High availability measures the system's usable time: if the system does not support hot upgrades, or cannot scale out and scale in without being taken offline for operations and maintenance, it still falls short of high availability.

To obtain even higher availability, two or more systems can be combined into an active-standby pair for disaster recovery, as in the common scenario of two data centers in the same city. With automatic switching, or fast manual switching, between the two data centers, the system can achieve higher availability. In addition, the business can adopt an elastic architecture at the application layer, automatically degrading when some services fail, to further improve availability.

How Table Storage Achieves High Reliability and High Availability

Consider the traditional MySQL active-standby replication scheme: when the primary database fails, an HA module migrates the service to the standby to achieve high availability. The problem with this approach is that to guarantee strong data consistency, a write must succeed on both the primary and the standby before returning (the "maximum protection" mode). If the standby becomes unavailable, writes to the primary also fail, so availability and consistency cannot in fact both be achieved.

In a distributed system, with the per-machine failure rate fixed, the probability that some machine in the cluster fails grows with cluster size, so high reliability and high availability are the most basic design goals of distributed systems. To achieve high data reliability, distributed systems typically use multiple replicas together with a distributed consensus algorithm such as Paxos; to achieve high availability, they generally implement a fast failover mechanism as well as hot upgrades and dynamic scaling.

So how does Table Store achieve high reliability and high availability? Let's first look at the following architecture diagram:

[Figure: Table Store architecture]

The Table Workers in the figure are the back-end service nodes of Table Store; below them sits a system called Pangu. Pangu is a distributed storage system developed by Alibaba. Distributed storage is a form of shared storage: any Table Worker node can access any file in Pangu. This is also an architecture that separates storage from compute.

 


High reliability of data

Pangu focuses on solving the problems of distributed storage, the most basic of which are guaranteeing high data reliability and providing strong consistency. Table Store configures Pangu with three replicas of the data, so the loss of any single replica affects neither reads and writes nor data integrity. On a write, Pangu returns success once at least two replicas have been written, and a background task repairs any replica that failed to be written. On a read, only one replica is needed in most cases.
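The 2-of-3 write acknowledgment described above can be sketched as follows. This is an illustrative model only, with invented names; the real Pangu protocol is internal to Alibaba.

```python
class ReplicatedChunk:
    """A chunk stored as three replicas; a write succeeds once two ack."""

    def __init__(self):
        self.replicas = [None, None, None]  # the three replica slots

    def write(self, data, replica_ok=(True, True, True)):
        """Write to all replicas; succeed if at least two acknowledge.
        replica_ok simulates which replica nodes are currently healthy."""
        acks = 0
        to_repair = []
        for i, healthy in enumerate(replica_ok):
            if healthy:
                self.replicas[i] = data
                acks += 1
            else:
                to_repair.append(i)  # fixed later by a background repair task
        return acks >= 2, to_repair

    def read(self):
        """A read needs only one healthy replica in the common case."""
        for r in self.replicas:
            if r is not None:
                return r
        return None
```

For example, a write that reaches replicas 0 and 2 while replica 1 is down still returns success, and replica 1 is queued for background repair; a write that reaches only one replica fails.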

Service high availability

1. Data Model of Table Store

First, let's take a look at the data model of Table Store, as shown in the following figure:

[Figure: data model of Table Store]
A table in Table Store consists of a series of rows. Each row is divided into a primary key and attribute columns. The primary key can contain multiple primary key columns. For example, the primary key expressed below contains three primary key columns, all of type string:
PrimaryKey(pk1: string, pk2: string, pk3: string)
The first primary key column serves as the partition key. In the example above, the partition key is pk1. All data in the table is sorted by the full primary key (all three primary key columns) and sharded by ranges of the partition key. When a table's data volume or access volume grows, its shards are split for dynamic elastic scaling; a table with a large amount of data may have hundreds to thousands of shards.
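A minimal sketch of this model, assuming primary keys are tuples compared lexicographically (this is an illustrative toy, not Table Store's implementation): the table behaves like one big sorted map, so a range query over any primary key interval, including a whole-partition-key prefix, is a simple slice.

```python
import bisect


class Table:
    """Toy model of a Table Store table: rows sorted by the full primary key."""

    def __init__(self):
        self.keys = []  # sorted list of primary-key tuples
        self.rows = {}  # primary-key tuple -> attribute columns

    def put_row(self, pk, attrs):
        if pk not in self.rows:
            bisect.insort(self.keys, pk)  # keep keys in primary-key order
        self.rows[pk] = attrs

    def get_range(self, start_pk, end_pk):
        """Range query over the primary-key interval [start_pk, end_pk)."""
        lo = bisect.bisect_left(self.keys, start_pk)
        hi = bisect.bisect_left(self.keys, end_pk)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


t = Table()
t.put_row(("user1", "2023", "a"), {"col": 1})
t.put_row(("user1", "2024", "b"), {"col": 2})
t.put_row(("user2", "2023", "c"), {"col": 3})

# All rows whose partition key is "user1", via a prefix range:
rows = t.get_range(("user1",), ("user2",))
```

Because tuple comparison treats a shorter tuple as a prefix lower bound, the `("user1",)` to `("user2",)` range returns exactly the two "user1" rows, in key order.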

Why this design? Briefly: on the one hand, since all data is sorted by primary key, the entire table can be viewed as one huge SortedMap that can be queried over any primary key interval. On the other hand, all rows with the same partition key value live in the same shard, which makes it possible to offer features scoped to a shard, such as auto-increment columns, in-shard transactions, and LocalIndex.

Why can the shard key be a sorted key rather than a hash key? Because we rely on the capabilities of shared storage: when a shard is split again and again, no data needs to be moved, since the data itself has already been spread out by Pangu. Without shared storage, you would have to decide how to distribute the data, for example with a consistent hashing algorithm; in that case the shard key is generally a hash key, which loses the ability to query the whole table over any range of primary keys.
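A toy illustration of the trade-off, under the simplifying assumption of fixed shard counts: hashing scatters adjacent keys across shards, so a primary-key range no longer maps to a contiguous set of shards, while range sharding keeps adjacent keys together.

```python
import hashlib


def hash_shard(pk, num_shards=4):
    """Hash sharding: the shard is derived from a digest of the key."""
    digest = hashlib.md5(pk.encode()).hexdigest()
    return int(digest, 16) % num_shards


def range_shard(pk):
    """Range sharding (toy split point): keys below "user3" -> shard 0."""
    return 0 if pk < "user3" else 1


keys = ["user1", "user2", "user3"]
hash_shards = [hash_shard(k) for k in keys]    # typically non-contiguous
range_shards = [range_shard(k) for k in keys]  # contiguous key ranges
```

With range sharding, a scan of `["user1", "user3")` touches only shard 0; with hash sharding, the same scan would have to fan out to every shard.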

2. High Availability Implementation

So, back to the topic: how is high service availability achieved? If a table has 100 shards, those shards will be loaded by different workers. When a worker dies, its shards are automatically migrated to other available workers. On the one hand, a single worker failure affects only some shards, not all of them. On the other hand, because we use the shared storage Pangu, any worker can load any shard: compute resources are separated from storage resources, which makes failover very fast and the architecture simpler.
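Because every worker can load any shard from shared storage, failover reduces to reassigning shard ownership, with no data copy. A minimal sketch of such a scheduler step (illustrative only; not Table Store's actual scheduling logic):

```python
def reassign_shards(assignment, dead_worker, live_workers):
    """Spread a dead worker's shards round-robin across the live workers.

    assignment: dict mapping shard_id -> worker name (mutated in place).
    """
    orphaned = [s for s, w in assignment.items() if w == dead_worker]
    for i, shard in enumerate(orphaned):
        # No data movement: the new worker simply opens the shard's
        # files on shared storage and starts serving it.
        assignment[shard] = live_workers[i % len(live_workers)]
    return assignment


plan = {1: "w1", 2: "w1", 3: "w2", 4: "w3"}
plan = reassign_shards(plan, "w1", ["w2", "w3"])
```

After the call, shards 1 and 2 (formerly on the failed worker w1) are spread across w2 and w3, while shards 3 and 4 are untouched; only the failed worker's shards see any interruption.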

The above covers failover; this architecture also allows very flexible load balancing. When the read and write pressure on a shard becomes too high, it can quickly be split into two or more shards that are loaded onto different workers. When a shard is split, the parent shard's data is not migrated to the child shards; instead, each child shard holds a link to the parent shard's files, and a background compaction later actually partitions the data and removes the link. A child shard can be split again in turn, and all shards are spread across the workers of the cluster, making full use of the cluster's CPU and network resources.
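The link-based split can be sketched as below. This is a deliberately simplified model (shard "files" are just lists of row keys, and names are invented): the split itself copies nothing, and the background compaction later materializes each child's own data and drops the link to the parent.

```python
class Shard:
    """Toy shard: a key range, its own files, and an optional parent link."""

    def __init__(self, key_range, files=None, parent_link=None):
        self.key_range = key_range      # (start_key, end_key), end exclusive
        self.files = files or []        # row keys owned by this shard
        self.parent_link = parent_link  # borrowed view of the parent's files


def split(shard, split_key):
    """Split a shard at split_key. No data is copied: each child only
    records a link to the parent's files on shared storage."""
    start, end = shard.key_range
    left = Shard((start, split_key), parent_link=shard.files)
    right = Shard((split_key, end), parent_link=shard.files)
    return left, right


def compact(child):
    """Background compaction: rewrite only the child's key range into its
    own files, then remove the link to the parent."""
    start, end = child.key_range
    child.files = [k for k in child.parent_link if start <= k < end]
    child.parent_link = None
```

Immediately after `split`, both children serve reads through the parent link; once `compact` runs, each child owns exactly its half of the data and the parent's files can be reclaimed.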

3. Operation and maintenance, upgrades, and hardware updates

As a cloud service, Table Store makes daily operations, upgrades, and hardware replacement completely transparent to users, and these operations must themselves be designed to high-availability standards. Operating a distributed system is very complex work; without a cloud service, it becomes a heavy and risky burden, with frequent downtime for upgrades and all kinds of failures to handle. Moreover, machine hardware keeps evolving: to give users better performance, the cloud service continually adopts new hardware and builds better networks. We believe that, as a database on the cloud, it can provide users with better performance, higher availability, and a better experience.

 
