Analysis of PolarDB, Alibaba Cloud's Next-Generation Relational Database

Abstract: Starting from the background of relational database development and the characteristics of the cloud computing era, this paper shares an evolutionary view of database computing power as a spiral ascent. Combined with the development path of Alibaba Cloud's RDS products, it explains the overall product design of PolarDB, Alibaba Cloud's self-developed new generation of cloud-hosted relational database, and interprets several of its key technical points.
1. Background

Relational Database

In this TMT era, where knowledge changes by the day, the relational database sounds a bit like an antique. Yet this technology, which originated half a century ago, has remained at the core of modern computing, supporting most of the world's commercial and technological civilization. CPU, operating system, and database are the three core fields that epitomize the IT era, and they are the cornerstones of all information processing, computing power, and intelligence. From the landmark paper "A Relational Model of Data for Large Shared Data Banks" published by E. F. Codd in 1970, to the commercial SQL-supporting relational database DB2 and the launch of Oracle in the early 1980s, and the birth of SQL Server in the early 1990s, all are representative of the success of relational databases.

Today, with the development of the global Internet and the wide application of big data technology, many new databases have emerged, but relational databases still dominate. One main reason is that relational databases adopt the SQL standard: this high-level, non-procedural programming interface elegantly unites computer science with a data management model that humans find easy to understand, and it remains difficult to surpass.

SQL language

SQL (Structured Query Language) is a structured query language, positioned between relational algebra and relational calculus, proposed by Boyce and Chamberlin in 1974. Its essence is to use keywords and a grammar close to natural language to define and manipulate data, enabling programmable data storage, query, and management. This abstract programming interface decouples concrete data problems from the details of data storage and query implementation, allowing business logic and information management computing models to be replicated and applied at scale. It liberated productivity and greatly promoted the development of commercial relational databases. Judging from its continuous development and enrichment, SQL has become the standard and king of relational database languages, and to this day no better replacement for it has appeared.
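The declarative nature described above is easy to see in a few lines of code. The sketch below uses Python's built-in sqlite3 module (the table and data are made up for illustration): the query states only *what* rows are wanted, leaving *how* to fetch them entirely to the engine.

```python
import sqlite3

# A declarative SQL query describes *what* data is wanted; the engine
# decides *how* to fetch it (index scan, full scan, join order, etc.).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts (owner, balance) VALUES (?, ?)",
                 [("alice", 300), ("bob", 150), ("carol", 75)])

# The same query text works whether the table lives in memory, on a local
# disk, or on a remote shared-storage cluster.
rows = conn.execute(
    "SELECT owner, balance FROM accounts WHERE balance >= 100 ORDER BY balance DESC"
).fetchall()
print(rows)  # [('alice', 300), ('bob', 150)]
```

This decoupling of problem statement from storage and execution detail is exactly what has let the same SQL workloads move from mainframes to PC servers to the cloud.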

OLTP

In 1976, Jim Gray published the paper "Granularity of Locks and Degrees of Consistency in a Shared Data Base", formally defining the concept of database transactions and the mechanisms of data consistency. OLTP (online transaction processing) is the typical transaction-processing application of relational databases, covering basic, daily transactions such as bank transfers. Transaction processing must follow the four ACID properties to guarantee data correctness: Atomicity, Consistency, Isolation, and Durability. The main performance indicators of OLTP processing capability are response time and throughput.
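A minimal sketch of the atomicity property, again using Python's sqlite3 (the account table and `transfer` helper are invented for illustration, not taken from any real banking schema): a failed transfer rolls back every statement in the transaction, so balances are never left half-updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` atomically: either both updates persist or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

assert transfer(conn, 1, 2, 30) is True     # succeeds: both rows updated
assert transfer(conn, 1, 2, 1000) is False  # fails: rollback leaves balances intact
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())  # [(70,), (80,)]
```

The durability half of ACID is what forces the fsync() calls discussed later in the physical replication section.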

Open source database ecology

Having briefly reviewed the history and development of relational databases, it is not hard to see that Oracle, SQL Server, and DB2 still occupy the dominant position among commercial databases, while the once-familiar Informix and Sybase have faded from public view. Since the 1990s, however, a software spirit that advocates knowledge sharing, freedom, and openness has become a major trend, especially in open source software represented by Linux, MySQL, and PostgreSQL. These freely shared technological dividends have nurtured and propelled the rapid growth of Internet technology companies around the world. This is progress for society as a whole, and we owe thanks to the champions of open source software: Richard Stallman, Linus Torvalds, Michael Widenius, and others. In recent years, more and more Chinese companies have actively joined mainstream open source communities, sharing technology and giving back to the open source world.

According to the latest statistics from the DB-Engines website, when the popularity of the open source databases MySQL and PostgreSQL is added together, open source databases have surpassed the commercial database Oracle to become the most popular relational databases in the world.

2. The current stage of cloud computing

If the relational database is a product of the IT era, then what stage is it at in the cloud computing of the Internet era? To some extent, the IT era was about creating computing power, while cloud computing in the Internet era focuses on connecting users to computing power and providing that power ubiquitously. The success of this business model can be called cloud computing 1.0. Cloud computing 2.0 must re-evolve and upgrade computing power within the cloud environment itself. This evolution reflects the consolidation of society's computing power and advances in the energy efficiency of computing resources. To follow the trends of green computing and the sharing economy, it is necessary not only to integrate and upgrade cloud servers, cloud databases, network interconnects, hardware chips, and other software and hardware systems, but also to stay demand-driven in technology and user-oriented in service, further improving computing efficiency and computing intelligence, grounded in the idea that technology should benefit everyone.

We are in a booming cloud computing 2.0 stage, in which relational databases have gradually exposed problems in the cloud-hosted environment. As a pioneer of the cloud computing era, Amazon released the Aurora cloud-hosted relational database at the AWS re:Invent conference on November 12, 2014 to solve these problems. The release of this new generation of database also marked the start of self-evolution for the traditional core products of IT technology in the cloud computing era. At SIGMOD 2017, Amazon published the paper "Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases", openly explaining how a cloud-native relational database designed for the cloud environment came into being.

3. Why does Alibaba Cloud develop a new generation of relational database PolarDB?

Having reviewed the background of relational databases and cloud computing, it is not hard to see that although cloud computing 1.0 solved the problem of connecting users to computing, the traditional relational database still needs to be further integrated with the shared computing environment of the public cloud.

With its low cost, rapid and flexible deployment, and elastic scaling, cloud computing 1.0 gained the momentum to move traditional IT computing to the cloud. Once the low-cost enjoyment of this inclusive technology became the norm, new pain points appeared as user businesses grew: for example, how to fundamentally obtain, at a sustained low cost, computing power equal to or even better than traditional IT. Such cloud services have become an urgent need. This looks like a false proposition at first glance, but on careful analysis it vividly embodies the philosophy of the spiral ascent. It is just like the era when PC servers emerged: PC servers first provided computing power close to that of minicomputer servers at a low price, then, while maintaining their cost advantage, achieved performance that surpassed minicomputers, ending the minicomputer era and beginning the PC server era.

The era of cloud computing is therefore far from its heyday. Only when, through its own evolution, it maintains its cost-effectiveness and its inherent attributes of speed, flexibility, and elasticity while gaining computing power that surpasses traditional IT, will cloud computing truly enter the era it dominates, and that is only a matter of time.

That is to say, it is not only Alibaba Cloud that wants to build such a relational database today; every cloud computing vendor will inevitably pass through this stage, which is the reconstruction and evolution of traditional IT computing power in the cloud computing era. Amazon is simply at the forefront, with Alibaba Cloud close behind, and both must go through this process of evolution and transformation. In this process, a new generation of relational databases is one of the key milestones. By the same logic, more advanced cloud services should follow, such as an intelligent cloud operating system that integrates hardware chips and network interconnects designed for the cloud era.

In the IT era, traditional computing power (such as using relational databases to process structured data) served multi-user scenarios in hardware-isolated environments. The cloud computing era is a multi-tenant, self-service rental environment with far more complex and varied computing workloads. Resolving the contradiction between the technical products of the IT era and the application environment of the cloud computing era, under these shifting workloads, is the internal driving force behind cloud computing's self-evolution.

For example, in the public cloud environment, as the number of users grows and as user businesses and data grow, problems with backup, performance, migration, upgrades, read-only instances, disk capacity, Binlog latency, and so on gradually emerge. Most of these stem from I/O bottlenecks (storage and network), and urgently need to be solved through technological innovation and a new product architecture. In terms of product form, the current Alibaba Cloud RDS offerings each have their own advantages, introduced in detail in the next section. From the perspective of product architecture, however, setting aside the choice of storage engine, it is best for a relational database to have one general product architecture that covers the needs of different user scenarios, considering engineering efficiency and operation and maintenance costs, rather than implementing a separate architecture for every scenario.

In the following sections, by describing the characteristics of the different Alibaba Cloud RDS product forms, we will see more clearly that the product form of PolarDB was born by absorbing the advantages of its predecessors.

4. The Design Thinking of PolarDB

User needs and the choice of public cloud development

As a cloud-hosted relational database, beyond the core characteristics of a relational database, PolarDB focuses on providing cloud services that meet users' business needs. Through technological innovation and continuous evolution, while delivering better database computing power, it also addresses the following user requirements:

  • Cloud cost

  • OLTP performance

  • Business continuity

  • Online business expansion

  • Data security

On the other hand, beyond the cost advantage of cloud computing, elasticity and scalability are its natural attributes. For business growth, better scale-up, and fault recovery, an architecture that separates computing and storage is the better choice in a cloud resource environment. This is explained further through the evolution of the RDS product architecture in the next section.

Evolution of Alibaba Cloud RDS Product Architecture

As mentioned above, Alibaba Cloud PolarDB and Amazon Aurora share the same evolutionary direction, but their evolutionary paths differ, determined by the different ways each implemented its database cloud service. Alibaba Cloud RDS MySQL has the following editions. These product forms serve different user business scenarios, have different characteristics, and complement one another.

1. MySQL Basic Edition

The MySQL Basic Edition separates database computing nodes from storage nodes, leveraging the inherent reliability and multiple replicas of cloud disks, and using ECS cloud server virtualization to improve the efficiency of standardized deployment, versioning, and operation and maintenance. It suits entry-level users whose business scenarios do not demand high-availability service. At the same time, this architecture has natural advantages for database migration, data capacity expansion, computing node scale-up, and computing node failure recovery, the fundamental reason being the separation of computing and storage. As discussed later, PolarDB also adopts this design concept.

2. MySQL High Availability Edition

The MySQL High Availability Edition targets enterprise users and provides a 99.95% SLA guarantee. It uses an active-standby high-availability architecture, replicating data from the primary node to the standby node through the MySQL Binlog; when the primary node fails, the standby node takes over the service. It also supports multiple read-only nodes and load-balanced read/write splitting. With a shared-nothing architecture, computing and data reside on the same node, maximizing performance, while multiple copies of the data provide reliability.

3. MySQL Financial Edition

The MySQL Financial Edition is a high-availability, high-reliability cloud service designed for high-end users in the financial industry. It adopts a distributed Raft protocol to ensure strong data consistency, offers better failure recovery times, and is well suited to business scenarios such as data disaster recovery and backup.

Evolution of PolarDB

PolarDB adopts a storage-compute separation architecture and can support more read-only nodes. Active-active failover is used between the primary node and the read-only nodes, making full use of computing node resources. Because shared storage lets all nodes share the same data, user costs are further reduced. The next section describes the key features of PolarDB in more detail.

There are several major innovations in PolarDB's design. One is to access WAL I/O data such as the redo log through a purpose-built file system; another is to place database files and redo log files on a shared storage device reached over a high-speed network with an efficient protocol, avoiding the repeated operations of long I/O paths, an approach more elegant than Binlog replication. In the design of the DB server, PolarDB stays MySQL-compatible and fully embraces the open source ecosystem, retaining the characteristics of a traditional relational database from SQL compilation through the query optimizer and execution plan. For the redo log's I/O path, a multi-replica shared storage block device was specially designed.

Distributed databases have long been a hot topic in the database field, and they are very difficult to implement. Whether one follows the CAP theorem or the BASE philosophy, it is hard for a general-purpose distributed relational database to strike a perfect balance between technology and commercial viability: compatibility with SQL standards and mainstream databases, 100% support for OLTP ACID transactions, 99.99% high availability, high-performance low-latency concurrent processing, elastic scale-up and scale-out, backup, disaster recovery, and low-cost migration. A commercial relational database that perfectly combines all of these characteristics has yet to appear.

A design philosophy shared by Alibaba Cloud PolarDB and Amazon Aurora is to give up the multi-point concurrent write support of a general-purpose distributed OLTP database and adopt a one-writer, multiple-reader architecture. This simplifies the theoretical model that distributed systems struggle to balance, while still satisfying the vast majority of OLTP application scenarios and performance requirements. In a word, 100% MySQL compatibility, together with the dedicated file system and shared storage block device design, plus the advanced technologies described below, ensure that the new generation relational database PolarDB will shine in the cloud era.

5. Analysis of key technical points of PolarDB products

Having described the different Alibaba Cloud RDS product forms, let's look at the PolarDB product architecture as a whole. The following figure outlines the main modules of the PolarDB product, including the database server, file system, and shared block storage.

PolarDB Product Architecture

Figure: Alibaba Cloud relational database PolarDB cluster

As shown in the figure, PolarDB is a distributed cluster architecture that integrates many advanced technologies to achieve a qualitative leap in OLTP processing performance. PolarDB adopts a storage-compute separation design to meet the rigid requirements of elastic business expansion in the public cloud. Database computing nodes and storage nodes are interconnected over a high-speed network and transmit data via the RDMA protocol, so that I/O performance is no longer the bottleneck.

Database nodes are designed to be fully compatible with MySQL, with active-active failover between the primary node and read-only nodes providing high-availability database service. DB data files, the redo log, and so on travel through a user-space file system, then through the block device data management route, relying on the high-speed network and the RDMA protocol to reach remote chunk servers. Meanwhile, only the metadata associated with the redo log needs to be synchronized between DB servers. Chunk servers keep multiple replicas of the data to ensure reliability, with consistency guaranteed by the Parallel-Raft protocol.

Having described PolarDB's product architecture, we now introduce its key technologies one by one, covering the distributed architecture, database high availability, network protocol, storage block device, file system, and virtualization.

Shared Disk Architecture

The essence of distributed systems lies in splitting and combining: sometimes data is split for concurrency and performance, and sometimes it must be combined for the consistency of data state, or made to wait on distributed locks. PolarDB adopts a shared-disk architecture, fundamentally because of the need to separate computing and storage described above. Logically, the DB data resides on chunk storage servers that all DB servers can share and access; physically, the data is cut into chunks in the storage service so that multiple servers can access it with concurrent I/O.
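The chunking idea can be sketched in a few lines. This is only an illustration, not PolarDB's actual layout: the chunk size, the round-robin placement, and the server names are all assumptions made up for the example.

```python
# Illustrative sketch (not PolarDB's real layout): a shared volume is cut
# into fixed-size chunks, and each chunk is placed on one of several chunk
# servers so I/O to different chunks can proceed in parallel.
CHUNK_SIZE = 4 * 1024 * 1024  # assume 4 MiB chunks for this sketch
CHUNK_SERVERS = ["chunkserver-a", "chunkserver-b", "chunkserver-c"]  # hypothetical names

def locate(offset):
    """Map a logical byte offset on the shared volume to (server, chunk_id, offset_in_chunk)."""
    chunk_id = offset // CHUNK_SIZE
    server = CHUNK_SERVERS[chunk_id % len(CHUNK_SERVERS)]  # round-robin placement
    return server, chunk_id, offset % CHUNK_SIZE

print(locate(0))                 # ('chunkserver-a', 0, 0)
print(locate(CHUNK_SIZE + 123))  # ('chunkserver-b', 1, 123)
```

Because neighboring chunks land on different servers, a large sequential read or a burst of random writes fans out across the storage fleet instead of queuing on one disk.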

Physical Replication

We know that the MySQL Binlog records data changes at the tuple (row) level, while the InnoDB engine, to support transactional ACID, also maintains a redo log that stores modifications to physical file pages. Thus, by default, a MySQL transaction must call fsync() at least twice to persist its logs, which directly affects the response time and throughput of transaction processing. Although MySQL's group commit mechanism improves throughput under high concurrency, it cannot completely eliminate the I/O bottleneck.
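A minimal write-ahead-log sketch makes the cost concrete. The `RedoLog` class and its record format below are invented for illustration; the point is only that a record is durable when fsync() returns, so every extra log that must be flushed per commit adds another synchronous disk round trip.

```python
import os
import tempfile

class RedoLog:
    """Toy WAL: a record counts as durable only after fsync()."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.fsync_count = 0

    def append(self, record: bytes):
        os.write(self.fd, record + b"\n")  # lands in the OS page cache
        os.fsync(self.fd)                  # forced to stable storage here
        self.fsync_count += 1

path = os.path.join(tempfile.mkdtemp(), "redo.log")
log = RedoLog(path)
log.append(b"page 42: set balance=70")
log.append(b"page 43: set balance=80")
print(log.fsync_count)  # 2 -- one flush per record in this naive sketch
```

Group commit amortizes this by batching many transactions' records into one fsync(); replicating from the redo log alone, as PolarDB does, goes further by letting the Binlog (and its separate flush) be switched off entirely.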

In addition, because a single database instance has limited computing power and network bandwidth, a typical approach is to build multiple read-only instances to share the read load and achieve scale-out. By placing database files and the redo log on shared storage, PolarDB solves the problem of data replication between the primary node and read-only nodes. Thanks to data sharing, adding a read-only node requires no full data copy; all nodes share one copy of the data and the redo log, and only metadata needs to be synchronized to support basic MVCC and guarantee read consistency. This cuts the failover time when switching from a failed primary to a read-only node to under 30 seconds, further strengthening system availability, and the replication lag between the primary and read-only nodes can drop to the millisecond level.

From the perspective of concurrency, Binlog-based replication can currently parallelize only at the table level, while physical replication works at the data page level, with finer granularity and higher parallel efficiency.

Finally, an advantage of using the redo log for replication is that the Binlog can be turned off to reduce its performance impact, unless the Binlog is needed for logical disaster recovery backup or data migration.

In short, the lower you go in the I/O path, the easier it is to decouple from the business logic and state of the upper layers and to reduce system complexity. Moreover, this WAL redo log pattern of reading and writing large files suits the concurrency mechanisms of a distributed file system well, which improves PolarDB's concurrent read performance.

RDMA protocol under high-speed network

RDMA has been used in the HPC field for many years, and its adoption in cloud computing confirms one of my judgments: in the cloud computing 2.0 era, our understanding of cloud computing will be rebuilt, and the cloud will create computing power that surpasses traditional IT, which will increasingly become an industrial reality.

RDMA typically requires network devices that support high-speed connections (such as switches and NICs). Applications communicate with the NIC driver through a specific programming interface and use zero-copy technology to transfer data efficiently, with low latency, between the NIC and remote application memory, instead of interrupting the CPU to copy data from kernel space to user space. This greatly reduces performance jitter and raises overall system throughput.
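RDMA itself needs specialized NICs and verbs APIs, so it cannot be demonstrated in a few portable lines; as a loose analogy only, Python's memoryview shows the zero-copy idea of granting direct access to an existing buffer rather than duplicating it, the way RDMA hands the NIC direct access to application memory.

```python
# Loose analogy for zero-copy (not actual RDMA): a memoryview exposes an
# existing buffer without copying it, and writes go straight through.
data = bytearray(b"x" * 1_000_000)

copy = bytes(data[:4096])        # slicing a bytearray copies those 4 KiB
view = memoryview(data)[:4096]   # a memoryview slice copies nothing

view[:5] = b"hello"              # writes through to the underlying buffer
print(bytes(data[:5]))  # b'hello'
```

The performance argument is the same in both cases: every avoided copy is CPU time, memory bandwidth, and cache pollution saved on the data path.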

Snapshot physical backup

Snapshot is a popular block-device-based backup scheme whose essence is the copy-on-write mechanism: it records metadata changes of the block device, and when a write hits a block, the old block content is first copied aside before the write is applied, preserving the ability to restore the data as of the snapshot's point in time. Snapshotting is a typical post-processing mechanism based on time and write-load modeling: no data is backed up when the snapshot is created; instead, the backup load is spread across the window of actual writes that occur after creation, making backup and restore respond quickly. PolarDB provides a point-in-time recovery mechanism based on snapshots plus the redo log, which is more efficient than the traditional method of restoring a full backup and replaying incremental Binlog data.
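The copy-on-write mechanism can be sketched over a toy block device (the class below is an invented illustration, not PolarDB's storage code): taking a snapshot records nothing up front, and the first write to a block after the snapshot saves the old content, so the snapshot view can always be reconstructed.

```python
class BlockDevice:
    """Toy block device with O(1) copy-on-write snapshots."""
    def __init__(self, nblocks):
        self.blocks = [b"\x00"] * nblocks
        self.snapshots = []  # each snapshot: {block_id: content at snapshot time}

    def snapshot(self):
        self.snapshots.append({})  # instant: only metadata, no data copied yet

    def write(self, block_id, data):
        for snap in self.snapshots:
            if block_id not in snap:            # first write since that snapshot
                snap[block_id] = self.blocks[block_id]  # the copy-on-write step
        self.blocks[block_id] = data

    def read_snapshot(self, snap_idx, block_id):
        snap = self.snapshots[snap_idx]
        # saved old content if written since the snapshot, else current content
        return snap.get(block_id, self.blocks[block_id])

dev = BlockDevice(4)
dev.write(0, b"v1")
dev.snapshot()                  # instant: no blocks copied
dev.write(0, b"v2")             # triggers copy-on-write of block 0
print(dev.read_snapshot(0, 0))  # b'v1' -- the snapshot still sees the old data
print(dev.blocks[0])            # b'v2'
```

This is exactly why snapshot creation is near-instant while its cost is paid gradually, spread across the writes that follow.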

Parallel-Raft algorithm

When it comes to the transaction consistency of distributed databases, 2PC (two-phase commit) and 3PC (three-phase commit) come to mind; for data state consistency, we must mention the Paxos protocol invented by Leslie Lamport. After Paxos was widely deployed in distributed systems by Internet companies such as Google, it became one of the most closely watched consistency algorithms. However, because the theory and implementation in the Paxos papers are so complex, it is difficult to apply quickly in engineering practice. One problem Paxos solves is whether, in a collection of machines with the same initial state, every machine can reach the same state by executing the same sequence of commands, forming a consistent, convergent state machine. Another is that each member of the cluster, communicating over time, needs a protocol that always holds: when one machine needs to change some data state, it must reach agreement with the rest of the cluster through communication, so that all members consent to that state change on that machine.

With these two points addressed, the problem of reaching a consistent state machine among machines with different roles in a distributed cluster is essentially solved, and the frameworks of most distributed systems can be designed on top of it. Paxos can be described as a peer-to-peer (P2P) design, more abstract and general but harder to understand. Raft, by contrast, elects a leader, which then drives state-consistent updates to the other roles; it is easier to understand, while the protocol's implementation process is similar to Paxos.

Parallel-Raft is a consensus algorithm improved from the Raft protocol for the I/O model of PolarDB's chunk servers. Raft depends on log continuity: if log entry n is not committed, later entries may not be committed either. PolarDB's Parallel-Raft allows parallel commits, breaking Raft's assumption of log continuity to improve concurrency, while ensuring consistency through additional constraints.
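A toy illustration of the out-of-order idea (this is not the real Parallel-Raft protocol, which also needs look-behind metadata and leader-election changes): acknowledgments may arrive in any order, but the state machine still applies only the contiguous committed prefix, so a hole blocks application without blocking later acks.

```python
class OutOfOrderLog:
    """Entries may be persisted/acked in any order; apply stays in order."""
    def __init__(self):
        self.persisted = set()   # entries acknowledged, possibly with holes
        self.applied_up_to = 0   # highest index applied in order

    def ack(self, index):
        self.persisted.add(index)
        # advance over the contiguous prefix; holes block application, not acks
        while self.applied_up_to + 1 in self.persisted:
            self.applied_up_to += 1

log = OutOfOrderLog()
for idx in [1, 3, 4]:     # entry 2 is still in flight
    log.ack(idx)
print(log.applied_up_to)  # 1 -- the hole at 2 blocks application
log.ack(2)
print(log.applied_up_to)  # 4 -- the prefix is contiguous again
```

Under a write-heavy block-storage workload, letting entries 3 and 4 be persisted while 2 is in flight keeps the I/O pipeline full instead of stalling it on one slow replica round trip.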

Docker

Container virtualization technology first appeared to solve process migration, between operating systems or during a process's lifetime within the Linux kernel: by decoupling the process from the operating system, a process's context and state could be saved for replication and recovery. The implementation of LXC then contributed to the birth of the once wildly popular Docker.

In principle, container virtualization is lighter-weight than KVM and other virtualization technologies. If users do not need to perceive the functions of an entire operating system, container virtualization should in theory achieve a better computing energy efficiency ratio. In fact, process virtualization and resource isolation with LXC plus cgroups have been used for many years, often for checkpointing and restarting MPI supercomputing tasks in the HPC field. PolarDB runs its DB computing nodes in a Docker environment, using this lighter form of virtualization to isolate resources and performance while saving system resources.

User-Space file system

When it comes to file systems, we must mention the POSIX semantics defined by the IEEE (POSIX.1 has been adopted by ISO), just as discussing databases requires discussing the SQL standard. The biggest challenge of implementing a general-purpose distributed file system is to deliver strong concurrent file read/write performance while remaining fully POSIX-compatible. POSIX compatibility inevitably sacrifices some performance in exchange for full standards support, and it greatly increases the complexity of the implementation. In the end this is the classic trade-off between general-purpose and special-purpose design, between ease of use and performance. The distributed file system is among the most enduring technologies in the IT industry; from the HPC era through the cloud computing, Internet, and big data eras, it has kept innovating. Strictly speaking, many customized implementations have emerged for different application I/O scenarios, and, to put it bluntly, they do so by not supporting the POSIX standard.

One might call this a concession, but when a file system serves only specialized I/O scenarios, forgoing POSIX is not a problem, exactly parallel to the development from SQL to NoSQL. A POSIX-compliant file system must implement system call interfaces compatible with standard file operations, so that applications written against the POSIX interface need no modification; this requires hooking the specific file system implementation into the kernel through the Linux VFS layer, which is one of the reasons file system engineering is so hard.

For a distributed file system, the kernel module must also exchange data with a user-space daemon so that the daemon can shard data and transmit it to other machines. A user-space file system instead provides a dedicated API to its users: it need not be fully POSIX-compatible, nor map 1:1 onto operating system kernel calls. It implements file system metadata management and data read/write access directly in user space, which greatly reduces implementation difficulty and better suits the inter-process communication of a distributed system.
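The shape of such a dedicated API can be sketched as follows. All the names here (`UserSpaceFS`, the `pfs_*` calls, the in-memory object store) are hypothetical, invented for illustration: the point is that a database engine needs only a narrow set of operations, so the file system can expose just those, entirely in user space, instead of the full POSIX surface behind the kernel's VFS.

```python
class UserSpaceFS:
    """Sketch of a narrow user-space file API over a stand-in chunk store."""
    def __init__(self):
        self.objects = {}  # name -> bytearray; stands in for remote chunk servers

    def pfs_create(self, name):
        self.objects[name] = bytearray()

    def pfs_append(self, name, data: bytes):
        """Append (the dominant redo-log write pattern); returns new length."""
        self.objects[name] += data
        return len(self.objects[name])

    def pfs_read(self, name, offset, length):
        return bytes(self.objects[name][offset:offset + length])

fs = UserSpaceFS()
fs.pfs_create("ibdata1")
fs.pfs_append("ibdata1", b"page0page1")
print(fs.pfs_read("ibdata1", 5, 5))  # b'page1'
```

Because the engine calls this library directly, every operation stays in user space, with no system-call or VFS crossing, and the library is free to route each call over RDMA to the right chunk server.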

Summary: From the above it is not hard to see that PolarDB employs a full range of techniques, from compute virtualization, high-speed network interconnects, and storage block devices to distributed file systems and database physical replication. It can fairly be called an integration of today's most advanced technologies, and the combination and innovation of these key technologies give PolarDB its qualitative leap in performance.

Final Words

Alibaba Cloud PolarDB is one of the key product-evolution milestones of the cloud computing 2.0 era, and an active driving force for the open source database ecosystem. The public beta of PolarDB, launching at the end of September 2017, will be fully compatible with MySQL; next, we will begin developing a PostgreSQL-compatible database engine.

Source: infoQ

 

Original link: https://yq.aliyun.com/articles/172724