Looking forward to the evolution of database computing power in the new era of cloud computing

From 1970, when relational database theory was put forward, to 2006, when the concept of cloud computing was born, technological innovation in the IT industry has led the times for half a century. As one of the core areas of IT, how will the database continue to evolve and innovate in the era of cloud computing?

On October 12th, at the Yunqi Conference session on Alibaba Cloud's self-developed database POLARDB, Alibaba Cloud researcher Yu Feng gave a talk titled "Prospecting the Evolution of Database Computing Power in the New Era of Cloud Computing". He discussed the design concept behind POLARDB, the database developed in-house by the Alibaba Cloud database team, and looked ahead to the future of a new generation of cloud databases.

Evolution of cloud databases

A database involves computing, storage, networking and memory, and it is very demanding basic software on all of these fronts. The middleware, converters and similar components that most people have worked with mainly handle traffic, whereas the database is truly knowledge-intensive work. The quality of a database depends on whether it can, as a comprehensive piece of software, make the most of computing power, storage capacity, network capacity and memory capacity. All of POLARDB's evolution and development therefore revolves around these aspects.


There are hundreds of popular databases around the world, but they are all a combination of three basic things: computing, storage, and network. Whenever computing hardware, storage, or network changes, it is an opportunity for database transformation. Every change also contains a variety of opportunities.

Today you can easily get a machine with several terabytes of memory from Alibaba Cloud or other cloud vendors, which shows how startlingly fast memory capacity has grown. The same is true of the network: RDMA networking has changed a great deal. Individuals generally do not buy such hardware because of cost, but IT companies will embrace it. These are changes in hardware. On the computing side, multi-core CPUs, including 128-core parts, have become very common. There are also many GPU-based databases, and Alibaba Cloud PolarDB has researched and explored this area as well. The same goes for storage, such as SSDs and Ethernet developed by Intel. Although InfiniBand networking is still very high-end, I believe that everyone will be able to obtain and use it easily in the future.

Changes in hardware must be accompanied by changes in software. A database is different from a database service: the database is like the engine of a car, while a database service has to provide the complete vehicle, plus the 4S dealership and the whole service ecosystem around it. The DB engine is actually a very small part of the cloud database environment, and it needs to be isolated, as with Docker. If isolation technology is not applied well enough, the user experience will be very poor. For example, on "Double 11", it might once have taken 180 microseconds for a packet to travel from the client machine to the DB backend, but now it takes only about 30 microseconds, which makes the whole cloud service more competitive and more convenient to use. Every link in the chain is pushed to its limit. Across computing, storage and networking, POLARDB is likewise committed to delivering the best possible experience.
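To make the latency figures concrete, here is a small back-of-the-envelope sketch. It assumes a single connection on which each packet waits for the previous one to arrive, and it counts only the client-to-backend hop quoted in the talk; the function name is illustrative and nothing here describes how POLARDB itself is measured.

```python
def serial_deliveries_per_second(one_way_latency_us: float) -> float:
    """Upper bound on back-to-back packet deliveries per second on one
    connection, if each packet waits for the previous one to arrive."""
    return 1_000_000 / one_way_latency_us


# Figures quoted in the talk (microseconds, client machine to DB backend).
before_us = 180
after_us = 30

print(f"latency reduction factor: {before_us / after_us:.1f}")
print(f"serial deliveries/s before: {serial_deliveries_per_second(before_us):.0f}")
print(f"serial deliveries/s after:  {serial_deliveries_per_second(after_us):.0f}")
```

Real workloads pipeline and parallelize requests, so this is only a ceiling for strictly serialized traffic, not a throughput prediction.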

The new generation cloud database POLARDB

Compared with MySQL, POLARDB delivers a 6x performance improvement. This is its core selling point, and it has been opened up for testing. So how is the 6x improvement achieved? Two things stand out: first, the latest hardware; second, storage and an engine built from the ground up. POLARDB is 100% compatible with MySQL, which means that whatever you did before works the same way now. The difference is that it is bigger, better and cheaper. The key is that it uses high-specification CPUs, network cards and RDMA, which addresses the most important pain points. In the past, a single MySQL machine held 2 to 3 TB, whereas POLARDB supports disks of up to 100 TB.



When disaster strikes, data usually cannot be moved quickly, which raises a question: why build POLARDB? In fact, research on POLARDB has been under way for two or three years. Many problems and difficulties that customers run into when using the products cannot be solved well with existing solutions and architectures, but POLARDB can meet customers' biggest demands.
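To put a rough number on "cannot be moved quickly": copy time is simply data size divided by sustained throughput. The sketch below uses the 3 TB and 100 TB sizes mentioned in this article; the throughput values are assumptions chosen for illustration, not measurements of any Alibaba Cloud system.

```python
def copy_time_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy size_tb terabytes at a sustained throughput,
    using decimal units (1 TB = 1,000,000 MB)."""
    size_mb = size_tb * 1_000_000
    return size_mb / throughput_mb_per_s / 3600


# Sizes come from the article; throughputs are hypothetical.
for size_tb in (3, 100):
    for rate in (50, 500):
        hours = copy_time_hours(size_tb, rate)
        print(f"{size_tb:>3} TB at {rate:>3} MB/s: {hours:7.1f} hours")
```

Even at the faster assumed rate, a full copy of a large instance takes hours, which is why an architecture that avoids moving the data at all is attractive for recovery.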

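Although it might seem that good hardware alone could meet this requirement, it is not that simple. The earliest standard MySQL deployments used a one-primary, one-replica structure, so the data flow during a SQL write is essentially one-way, and eight data flows are involved. More data flows mean higher IO consumption and higher latency. Apart from this native mode, the approach taken by RDS and AWS is to put the official version on the cloud and provide storage and better elasticity. POLARDB, by contrast, was born for the cloud: all of its elasticity, isolation and IO storage are designed around the characteristics of the cloud.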


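As the system evolved, the team gained a deep understanding of the business as a whole, and today's data flow is a greatly simplified version. In the figure, the left side is one-way and the right side is a two-way data flow: for example, when synchronization is about to happen, data is actively pushed to the intermediate node, which places high demands on the technology. In addition, storage used to be centralized, so the cost of copying during data synchronization was high and caused many hidden problems. With MySQL, few people used it in critical scenarios in the past because some data could become unavailable. POLARDB, however, is designed for financial-grade reliability and can simplify many things on top of the hardware, making it more reliable and simpler.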

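A good database is of little use if people do not know how to use it or if the supporting tooling is incomplete. The database service that POLARDB provides covers the complete life cycle, from moving to the cloud through scaling out and scaling in. Taobao's "Double 11" gave the team a clear picture of what high availability means for Internet workloads, so they understand disaster recovery, scale-out, elasticity and storage very well. POLARDB was designed for these scenarios from the start, and scenarios such as finance, government and enterprise were added later.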


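From the very beginning, POLARDB was designed with computing and storage separated. The figure shows three clearly distinct layers: first the user application layer, then the compute layer, and finally the storage layer. When compute and storage are not separated, recovering a node requires moving data within seconds, which very few systems can do: moving a few terabytes of data takes one to two days, and physical limits cannot be broken. With compute and storage fully separated, the system also adds a middleware layer. In a read/write splitting setup the write capability is very strong, and the middleware helps distribute reads and writes and automatically identifies which is which, all at low cost.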
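The read/write splitting that the middleware performs can be illustrated with a minimal sketch. This is not POLARDB's middleware: the class, the keyword heuristic and the node names are all hypothetical, and a real proxy parses SQL far more carefully (hints, session state, replication lag and so on).

```python
import itertools

# Statements starting with these keywords are treated as reads (an assumption).
READ_KEYWORDS = {"SELECT", "SHOW"}


class ReadWriteRouter:
    """Toy router: writes go to the primary, plain reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over read replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        normalized = sql.strip().rstrip(";").rstrip().upper()
        # Locking reads (SELECT ... FOR UPDATE) must run on the primary.
        is_plain_read = (
            first_word in READ_KEYWORDS
            and not normalized.endswith("FOR UPDATE")
        )
        if is_plain_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        # Writes, transactional statements and anything unknown stay on the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
for stmt in ("SELECT * FROM orders", "UPDATE orders SET paid = 1", "SELECT 1"):
    print(f"{stmt!r} -> {router.route(stmt)}")
```

Sending anything ambiguous to the primary is the conservative choice here: a misrouted read costs some performance, while a misrouted write would fail or break consistency.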

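System design demands highly reliable data; this is the core of a database and its bottom line. With compute and storage separated, all data lives in the backend, which forms a standard distributed database that can scale out, scale in, split reads and writes, and perform disaster recovery throughout its life cycle. Service migration still has to be considered, and it can be handled for both commercial databases and MySQL: commercial databases such as DTSBS can be pulled over with logical replication, while MySQL can use physical replication. POLARDB can synchronize nodes either way: logical replication copies one record at a time, while physical replication carries a whole batch across at once, so it is more efficient.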
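The contrast between copying one record at a time and carrying a batch across can be shown with a toy model that counts transfers. This is only an illustration of the description above under simplifying assumptions (in-memory lists, a made-up batch size); it does not reflect how MySQL or POLARDB replication is actually implemented.

```python
def replicate_row_by_row(rows, target):
    """Copy rows one at a time; returns the number of transfers made."""
    transfers = 0
    for row in rows:
        target.append(row)
        transfers += 1
    return transfers


def replicate_in_batches(rows, target, batch_size):
    """Copy rows in fixed-size batches; returns the number of transfers made."""
    transfers = 0
    for start in range(0, len(rows), batch_size):
        target.extend(rows[start:start + batch_size])
        transfers += 1
    return transfers


source_rows = list(range(10_000))
replica_a, replica_b = [], []

print("row-by-row transfers:", replicate_row_by_row(source_rows, replica_a))
print("batched transfers:   ", replicate_in_batches(source_rows, replica_b, 1_000))
print("replicas identical:  ", replica_a == replica_b)
```

Both approaches end with the same data on the replica; the batched path simply pays the per-transfer overhead far fewer times.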

If data is transferred through physical replication and OSS, the transfer is very fast. Banks require operational data to be retained for a long time, so the data can be imported into OSS and exported again when needed. RDS is a complete system that is 100% compatible with all ecosystems. We also hope that POLARDB can evolve together with the ecosystem.

Internationalization, intelligence, continuous innovation

The most important point about POLARDB is that the database can meet international standards. Domestic and overseas compliance requirements are not the same. For example, the earlier project done for Sing Post had very high compliance requirements, but those capabilities are core functions that already ship in RDS, so the system can easily inherit them and improve on them. In the future, I believe that POLARDB will continue to evolve in the direction of informatization, Internetization, internationalization, intelligence and innovation.

 
