What kind of database do we need in the cloud era?

The author of this article is Pan Anqun, head of Tencent Cloud TDSQL. He is mainly responsible for the research and development of Tencent Cloud's distributed database and has more than 13 years of experience in distributed database research and development. His work has been accepted at top international conferences such as VLDB and SIGMOD. TDSQL, the secure and controllable distributed database built by his team, is a domestic enterprise-level distributed database. It was the first in the industry to be used in the core transaction system of an Internet bank, the first to enter the traditional core systems of banks, and the first to help a large traditional bank move its core system off the mainframe and onto a distributed platform.

0. Introduction

Database technology has been developing for half a century. Michael Stonebraker, who won the Turing Award for his work on databases, once divided data model technology into 9 different eras and types in Readings in Database Systems. Now that the cloud era has begun, we can examine the past and future of basic technologies such as databases from a new perspective.

With cloud computing, basic IT technologies, including databases, have changed substantially, from their technical form to the integration of online and offline across the whole market. In the cloud era, database technology is migrating from traditional centralized systems to distributed ones that replace them, which brings both opportunities and challenges for domestic databases.

In November 2020, Gartner released its 2020 database vendor evaluation report. Domestic database vendors took three places, marking a new stage of development for domestic databases.

At the same time, Gartner predicts that by 2022, three-quarters of the world's databases will run on the cloud. We believe that cloud databases are now moving from the first stage, "databases to the cloud" (from database to cloud database), to the second stage, "from cloud database to cloud-native database".

Ultimately, what has the cloud database done to win the industry's recognition? What is the future trend of database development? How can we keep pace with technological innovation in the era of cloud integration and make the most of the new opportunities? With domestic databases now a hot topic, we would like to share our understanding and thinking, in the hope that it is useful to everyone.

1. The evolution of IT basic technology in the cloud era


With the development of cloud computing, basic IT technology as a whole has changed dramatically. The changes show up in several respects:

The deployment of IT facilities has moved from fragmented to centralized and large-scale. In the past, each enterprise built its own data center and other IT infrastructure, including servers, networks, operating systems, and databases, which left IT facilities in the enterprise market fragmented. Today, with cloud computing services, enterprise IT facilities are centralized and operate at scale, and the requirements for efficiency, performance, and cost have risen accordingly.

IT service delivery has moved from a software delivery model to a service delivery model. In the past, software reached users either as purchased commercial products or as open source distributions; now it is delivered as individual services. As a result, users no longer need to plan the purchase of servers: when they need a database, they can use one directly on the cloud.

Development has moved from building the business from the lowest level up and calling low-level APIs to a SaaS-based, serverless service model. On the cloud, developers can use a wide variety of SaaS services, which is a huge change in both efficiency and basic technical capability.

Data forms and application scenarios have diversified. In the past, both were relatively uniform. Traditional databases, for example, were used mainly in traditional industries such as finance, telecom operators, and government affairs. With the development of the Internet, the mobile Internet, and the industrial Internet, every industry is accelerating its move toward electronic and information-based operation, and application services are taking more and more forms. Data forms and application scenarios are therefore increasingly diverse, which places more requirements and challenges on the underlying database capabilities.

In the past, industry scenarios mostly involved structured data, and relational databases could cover a large share of the requirements. Now many other types of database, such as NoSQL and graph databases, have emerged. NoSQL can itself be subdivided into categories such as key-value (KV) and document stores, and the overall number of database types is still growing. This is a very reasonable phenomenon: the development of future databases will itself be diversified, integrated, and innovative. Traditional experience says that a technical product with a single form should aim to be as general-purpose as possible. Under today's diversified requirements, however, trade-offs have to be made when the technology is applied.

In short, these are the changes of the cloud era, and they bring new challenges and requirements to the database. As cloud databases become the general trend, we believe that domestic cloud databases must keep exploring and making breakthroughs in basic capabilities, cost efficiency, productization, and future technology integration.

2. Challenges of Cloud Database Technology Evolution


Given the characteristics of cloud computing, domestic cloud databases must keep pursuing breakthroughs in basic capabilities such as availability and consistency, high-concurrency performance, and elastic scalability, while also meeting the cloud era's demand for a diversified new generation of distributed database products.

First, availability and consistency.

For a database, high availability and data consistency are the most basic challenges. High availability means availability above 99.999%; strong data consistency means that the data contains no errors and the database is highly reliable. In the era of cloud computing, the upgrade of the underlying infrastructure has changed how these goals are achieved. In the past, for example in the financial industry, systems ran on traditional centralized mainframes or minicomputers, whose high stability ensured availability and consistency. However, the traditional centralized architecture has clear technical limits, including limits on performance and throughput. Today such systems face growing throughput and performance bottlenecks and cannot meet the industrial needs of the cloud era. The natural industry trend is therefore to upgrade to a distributed architecture on an open, x86-based platform. The traditional architecture relies on extensive redundancy built into mainframe or minicomputer hardware, so availability and consistency are guaranteed at the hardware level. By contrast, a new generation of distributed systems deployed on x86 machines faces a new challenge: how to guarantee data consistency and high availability while also delivering performance and unlimited horizontal scaling.
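To make the 99.999% figure concrete, the following minimal sketch converts an availability target into a yearly downtime budget, and shows how redundancy in software can lift availability on less reliable machines. The function names are hypothetical, and the replica formula assumes that replicas fail independently, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(single_node: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumption: replicas fail independently of each other.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - single_node) ** replicas


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        minutes = downtime_budget_minutes(target)
        print(f"availability {target}: {minutes:.2f} minutes of downtime per year")
    # Three replicas of a 99%-available node, under the independence assumption.
    print(f"3 replicas at 0.99 each: {replicated_availability(0.99, 3):.6f}")
```

At five nines the budget is a little over five minutes per year, which is why a distributed system on commodity x86 hardware has to handle failover automatically rather than rely on manual recovery.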

Second, performance and cost.

In the era of cloud computing, achieving scale without reducing cost is unacceptable. If cloud computing is to help raise resource utilization across society, the cost of performance must be kept to a minimum.

For Tencent Cloud's services, what we need to consider is how to let customers buy the most advanced services at the lowest price, for example the most disk space and the best TPS and other performance for the least money. The most important factor here is resource utilization. For example, if a cloud computing service provider raises resource utilization by 20%, costs fall substantially for both its customers and the provider itself.

Third, elastic scalability: cloud native means the database must be elastic.

Elastic scalability means that resources are allocated and used according to users' actual needs, rather than purchased or allocated in advance. In the past, most customers estimated first and purchased afterwards, so resource utilization was often criticized. Now users do not need to estimate how many resources they may use in the future; they can scale elastically according to real-time demand. This is how cloud databases gain a cost advantage through better resource utilization. However, extreme elastic scaling places higher demands on the database, in particular more complete SQL support and distributed transaction capabilities.
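The utilization argument can be illustrated with a small calculation. This is a sketch under stated assumptions: the hourly demand figures are invented for illustration, pre-purchased capacity is sized to the peak, and elastic capacity follows demand in fixed-size steps. None of these names or numbers come from a real service.

```python
import math
from typing import List, Sequence


def utilization(demand: Sequence[float], capacity: Sequence[float]) -> float:
    """Fraction of provisioned capacity that is actually used over the period."""
    return sum(demand) / sum(capacity)


def preprovisioned(demand: Sequence[float], headroom: float = 1.0) -> List[float]:
    """Capacity bought up front: sized to the peak (times optional headroom)."""
    peak = max(demand) * headroom
    return [peak] * len(demand)


def elastic(demand: Sequence[float], step: int = 4) -> List[float]:
    """Capacity that follows demand, rounded up to a multiple of `step` units."""
    return [max(step, math.ceil(d / step) * step) for d in demand]


# Hypothetical demand per time slot, in arbitrary resource units.
DEMAND = [2, 2, 3, 8, 16, 8, 3, 2]

if __name__ == "__main__":
    fixed = utilization(DEMAND, preprovisioned(DEMAND))
    scaled = utilization(DEMAND, elastic(DEMAND))
    print(f"pre-provisioned utilization: {fixed:.1%}")
    print(f"elastic utilization:         {scaled:.1%}")
```

With this invented workload, capacity sized for the peak sits mostly idle, while capacity that tracks demand is used far more fully. The gap between the two is the cost that elastic scaling aims to recover.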

Fourth, the degree of cloud database productization and service.

Domestic databases have also gone through multiple stages of development, but it was the emergence of cloud computing and the Internet that allowed many domestic cloud vendors, Tencent among them, to seize the opportunity and develop a new generation of basic software, such as databases, based on the characteristics and needs of their own business scenarios. For many years, Tencent has placed great emphasis on polishing the productization of the entire database and improving user experience, including turning technology into products and improving service. Internet companies develop their own technical systems from internal business scenarios, which is an advantage. When they open these systems up to enterprise customers, however, they also face challenges in product standardization, generality, and user experience. The requirements for delivering technical products to industry customers are much higher than those for supporting internal use. For traditional enterprise customers, Tencent Cloud wants to provide a complete product, not a semi-finished one. The degree of productization is therefore a capability that Tencent has always emphasized.

Fifth, verification in massive scenarios.

The final key point is that, for cloud databases, basic capabilities such as stability and features can only mature if there are enough application scenarios in which to polish them. Developing and improving a database system is a very complicated process, and the question is how the database gets exercised in practice. We believe that continuous polishing in massive numbers of scenarios is a key condition for product development. Thanks to Tencent's own applications, the applications that all kinds of industries run on the cloud, and more than one million developers, the Tencent Cloud database has ample room to polish its products. This is both our challenge and the soil in which we grow.

These challenges are the road that cloud database development must travel, and they are also our opportunity to create a new generation of distributed database products in the cloud computing era.

3. Key future trends of cloud database

Given these challenges and the opportunities of the cloud computing era, we believe that the future development of cloud databases will follow several major trends:

Elastic scaling: solving the core cost problem, resource utilization

As mentioned earlier, cost and performance are the core elements. What is different in the era of cloud computing is that we need to schedule infrastructure resources such as CPU, memory, and disk flexibly.

In the cloud database era, we will address performance, efficiency, and cost together by exploring an architecture built for extreme elastic scaling. Depending on the scenario, cloud-native distributed databases fall into two architectures: Shared Nothing and Shared Storage. Both achieve better elastic scalability by separating computing from storage. This overcomes the shortcomings of the traditional architecture, namely limited storage capacity, difficult expansion, and high master-slave latency, and it also helps us lower costs and deliver the full cost-effectiveness of leading technologies.
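As a rough illustration of the Shared Nothing idea, the sketch below spreads rows over shards that each own a disjoint part of the data, using a hash of the key to pick the shard. It is a toy model under stated assumptions: `ShardedStore` and `shard_for` are hypothetical names, each shard is just an in-memory dictionary standing in for a node, and the article does not describe how TDSQL itself partitions data.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy shared-nothing store: every key lives on exactly one shard."""

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for a node with its own private storage.
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: Any) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(shard_count=4)
    for user_id in ("u1", "u2", "u3", "u4", "u5"):
        store.put(user_id, {"id": user_id})
    print([len(shard) for shard in store.shards])  # rows held by each shard
```

Simple modulo placement moves most keys when the shard count changes, which is one reason elastic systems tend to use finer-grained placement schemes. In a Shared Storage design, by contrast, compute nodes would read and write a common storage layer instead of owning private partitions.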

A fully serverless architecture for both the computing and storage layers of database services is another direction worth focusing on. On top of automatic scaling that is imperceptible to users, it can charge according to actual use, which makes cloud databases more like a utility.

Hyper-convergence of the database's underlying layer and services under the multi-model, multi-engine trend

New infrastructure and the industrial Internet are developing rapidly, the digitalization of every industry is accelerating, and data is becoming ever more diverse and massive. To solve the problems of database performance, cost, and service most efficiently, hyper-convergence is an inevitable trend.

Every industry is now moving toward electronic records, information systems, and digital transformation, and a large number of new scenarios keep emerging. As the basic software supporting all kinds of IT system architectures, the database has taken on many new technical forms. These include a large number of NoSQL practices; storage structures such as the traditional B+ Tree and the more recent LSM Tree, and row-store and column-store products; and, classified by workload, OLTP databases, OLAP databases, and HTAP databases that mix the two.

In most cases, no single engine can serve an entire enterprise or system on its own: one size fits none. From a technical point of view, there is a natural tension between extreme performance and cost on the one hand and generality on the other. In diversified scenarios, therefore, multiple engines must coexist, so that the characteristics and advantages of each can be fully used to achieve both extreme performance and generality.

But should we, as a cloud database service provider, simply expose all these engines to customers and developers and let them choose? From the perspective of product and service experience, certainly not. A landscape of many engines inevitably makes it hard for developers to select products and build applications: they have to fit different scenarios while still getting high enough performance. This is a dilemma that database development currently faces. To solve it, we hope that in the future users will not need to make these complicated choices. Instead, the system will use AI-based intelligent scheduling, serverless, and similar techniques to provide a unified, standardized service on top of multiple engines. In other words, developers will not need to be aware of which specific product is chosen underneath. For example, when doing data analysis, the system can automatically schedule the option with the best performance while guaranteeing transaction consistency.
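The idea of one service entry point that picks an engine on the user's behalf can be sketched as a small router. This is only an illustration under stated assumptions: the engine names, the `Request` fields, and the fixed rules are all hypothetical, and the hand-written rules merely stand in for the AI-based scheduling envisioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    """Hypothetical engines hidden behind one service endpoint."""
    ROW_STORE = "row_store"        # transactional (OLTP) work
    COLUMN_STORE = "column_store"  # large analytical (OLAP) scans
    KV_STORE = "kv_store"          # simple point lookups


@dataclass(frozen=True)
class Request:
    """Hypothetical summary of a request, as a planner might estimate it."""
    needs_transaction: bool
    is_point_lookup: bool
    rows_scanned_estimate: int


def choose_engine(req: Request, scan_threshold: int = 100_000) -> Engine:
    """Pick an engine with fixed rules; a real scheduler could learn these."""
    if req.needs_transaction:
        return Engine.ROW_STORE  # keep transactional guarantees first
    if req.rows_scanned_estimate >= scan_threshold:
        return Engine.COLUMN_STORE  # large scans favour columnar storage
    if req.is_point_lookup:
        return Engine.KV_STORE
    return Engine.ROW_STORE  # general-purpose default


if __name__ == "__main__":
    report = Request(needs_transaction=False, is_point_lookup=False,
                     rows_scanned_estimate=5_000_000)
    payment = Request(needs_transaction=True, is_point_lookup=True,
                      rows_scanned_estimate=1)
    print(choose_engine(report).value, choose_engine(payment).value)
```

The caller only describes the work; which engine runs it stays an internal decision that the service can change without affecting applications.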

On this basis, another future trend of cloud database services is the integration of delivery methods, including integrated software and hardware and combined private and public cloud platforms, along with other product and service delivery options. These let customers balance sensitive workloads against operating costs and manage both at a finer grain.

Intelligent: AI+DB

Integrating intelligent technologies into the underlying stack, so that databases become autonomous and intelligently managed, is another future database trend. In the past, a few DBAs might be enough for an enterprise to manage dozens of instances. For a company like Tencent, with hundreds of thousands of database instances, operations cannot be maintained by adding people, so tools or platforms must be used to solve the problem of operational efficiency. In addition, with the current move to distributed microservices, enterprise IT operations will have increasingly strong requirements for autonomy. Integrating intelligent technology with the underlying layer of the database makes intelligent management possible across the database's full life cycle.

Accelerate the release of new hardware dividends

In the past, new hardware took a very long time to reach wide use, and many traditional enterprises were conservative about buying it. Cloud vendors are better placed to take the lead in exploring new hardware step by step, for example by first using it in non-critical applications, and they also have a large number of scenarios in which to verify it before rolling it out steadily and at scale. From this perspective, cloud-native databases built on cloud computing services find it easier to explore new hardware and release the dividends it brings.

We are also in an era of constant hardware innovation, including SSD, NVM, RDMA+SPDK, thousand-core servers, and heterogeneous processors. Through cloud database services, customers and ordinary developers can enjoy the benefits of new hardware more quickly.

Integration, autonomy, and utility are therefore the basic characteristics of future enterprise-level distributed databases. Tencent Cloud Database will put the trends above into practice to meet the diverse needs of customers in every industry.


Origin blog.csdn.net/Tencent_TEG/article/details/113749941