Alibaba's next-generation database technology: putting databases into containers is no longer a myth

Abstract: Zhang Rui is the head of the database technology team at Alibaba Group, an Alibaba researcher, and an Oracle ACE. He is the technical director of the Double Eleven database and has twice served as overall technical support director for Double Eleven. Since joining Alibaba in 2005, he has led the continuous innovation of Alibaba's database technology.

Review video: http://yq.aliyun.com/webinar/play/220

Recently, at the 2017 China Database Technology Conference in Beijing, Zhang Rui, a researcher at Alibaba Group, delivered a keynote speech entitled "Thinking about the Database Architecture of the Future". It introduced the ideas and experience of Alibaba's database technology team in building Alibaba's next-generation database technology system, sharing Alibaba's achievements, the pitfalls it has hit, and its future-oriented thinking with the audience, in the hope of contributing to the development of database technology in China.
[Figure 1]

Full text of the speech:

Let me first introduce myself. I joined Alibaba in 2005 and have been working on databases ever since. Today's topic is my recent thinking about Alibaba's next-generation database system, which I will share with you here; I hope it serves as a starting point for discussion. If, after today's talk, you take away some experience and come up with ideas for the actual scenarios you face, then the purpose of my sharing has been achieved.

Today I will cover the following aspects: first, our innovations in the database kernel; then how the database can achieve flexible scheduling; next, our thinking on intelligence; and finally, the pitfalls we have hit and the directions for the future.

Problems faced by the database in the Ali scenario

[Figure 2]

First of all, the earliest database technology Alibaba used was Oracle. Later, as everyone knows, we went through the de-IOE movement (removing IBM minicomputers, Oracle databases, and EMC storage). In that process we entered the era of open-source databases, a process that has now lasted roughly five or six years. Out of it, Alibaba produced the open-source MySQL branch everyone knows, AliSQL, to which we have made many improvements; I have listed some of them here. But today I don't actually want to dwell on that. I want to talk about which direction the next-generation database technology and architecture will take in the future.

My view is this: Alibaba today is, after all, a technology company, so we often look at Google and other big Internet companies and ask where their technological innovations come from. They come from problems. That is to say, everyone here today is in the same position as I am: what problems exist in the scenario you face, and how deeply you understand those problems, determines how big the innovation you can create.

So let's take a fresh look at the problems Alibaba faces. I believe everyone here would agree that Alibaba's problems are not necessarily your problems. But I want to talk about the problems Alibaba faces today and what we did after we saw them, in the hope of giving you a reference, so that you can examine the problems you face and how you would think about them.

[Figure 3]

As you can see, Alibaba's applications are actually quite different from Facebook's and Google's. We have talked with them, and their business scenarios really are different. First of all, our main applications are transactional. As for the requirements of these applications, you can see the main points in the picture; below I will mainly talk through our thinking.

High availability and strong data consistency are extremely important today. The problems caused by data inconsistency are enormous. Everyone here uses Taobao and other Alibaba services; even my parents care about these things.

Second, the cost of storage is very high today. All our data centers already use SSDs, but the cost of data storage remains a very big problem for a large enterprise. This is a real-money issue.

In addition, as I mentioned just now, data has a life cycle, so data, especially transaction data, has a very obvious hot/cold distinction. You rarely look at your Taobao purchase records from a year ago, but recent purchase records are read and updated frequently.

Another characteristic is that Alibaba's workloads are relatively simple: for example, we need to push OLTP performance to the extreme. A further point unique to Alibaba is Double Eleven. What is Double Eleven in essence? Technically, it is the creation of an extremely large hot-spot effect. What does that demand of us? It demands extreme elasticity. Databases are in fact very weak in this direction; achieving elastic scaling of a database is very difficult.

Finally, I would like to talk about DBAs; many people here today may be DBAs. I want to talk about Alibaba's thinking on the direction of intelligence. We have a huge amount of data and many experienced DBAs, but how do these DBAs complete their next transformation, and how do they avoid becoming a bottleneck for the business? How can a database diagnose and optimize itself? These are the problems we see, and at the end I will share my thinking in this regard.

Alibaba's thinking on the direction of the database kernel

Let me first talk about our thinking on the database kernel. First of all, I respect the domestic database vendors: anyone who has improved a kernel knows how hard it is to write every function line by line, and that deserves respect. This is my first time presenting this at a domestic conference. First, AliSQL X-Cluster. X-Cluster is a three-node cluster built on AliSQL. We introduced the Paxos consensus protocol to turn MySQL into a cluster with strong data consistency, designed for cross-region deployment, able to tolerate high network latency, and with a series of related features.

[Figure 4]

Today many databases are associated with Paxos, such as Google's Spanner, which everyone knows. But before that, people rarely connected databases with Paxos. There are two places where a database needs the Paxos protocol. First, leader election: especially in high-availability scenarios, we need to uniquely elect one node as the master, which requires Paxos. Second, we use the Paxos protocol to guarantee strong data consistency in a database without shared storage, that is, to guarantee both strong consistency and high availability of data across multiple nodes.

Therefore, Paxos is very widely applied in today's database architectures. Many of the vendors exhibiting outside, as well as Google Spanner, combine the Paxos protocol with the database in this way. AliSQL's three-node cluster is the same: it uses the Paxos protocol to form a cluster with strong data consistency. Below I will briefly explain what role the Paxos protocol plays in the database.

[Figure 5]

In essence, Paxos is common technology now, and everyone here works on databases. In short, the way the Paxos protocol is used in our database is this: after a group of transactions is committed and persisted on one node, it must also be persisted on other nodes. That is, a write that originally only had to land on one node now has to be written to other nodes across the network. Those nodes may be in another region or another city, with a very long network delay in between, and this is where some core technologies are needed.
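To make the idea concrete, here is a toy sketch (not X-Cluster's real implementation) of the core guarantee Paxos-style replication provides: the leader only treats a transaction as committed once a majority of replicas have persisted its log entry. The replica names and the ack model are illustrative assumptions.

```python
class ReplicatedLog:
    def __init__(self, replicas):
        self.replicas = replicas               # replica name -> persisted entries
        self.majority = len(replicas) // 2 + 1

    def commit(self, entry, reachable):
        """Persist `entry` on every reachable replica; commit iff a majority acked."""
        acks = 0
        for name, log in self.replicas.items():
            if name in reachable:              # simulate network reachability
                log.append(entry)
                acks += 1
        return acks >= self.majority           # True = transaction is durable

cluster = ReplicatedLog({"hangzhou": [], "shanghai": [], "shenzhen": []})
print(cluster.commit("txn-1", reachable={"hangzhou", "shanghai"}))   # True: 2 of 3
print(cluster.commit("txn-2", reachable={"hangzhou"}))               # False: 1 of 3
```

Note that a committed write survives even if the leader's own copy is later lost, because at least one other majority member holds it; this is exactly the "no shared storage, still strongly consistent" property described above.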

What is our goal? First of all, there is no way to fight physical latency. In the past, a database operation only had to commit locally, but now the database is deployed across regions and even across cities, and that latency cannot be overcome. So what can we do? Ensure that as latency increases, throughput drops as little as possible: the original QPS and TPS can be preserved if the engineering is done well, but latency will certainly increase.

This is also why you often see the phrase "my latency is very high" in the Google Spanner paper. Given such high latency, how do you write a good application that maintains availability and high throughput? That is another topic. Everyone has long been used to the assumption that a database must have low latency and that high latency breaks applications; now applications have to adapt to such a high-latency database system. Of course, we use batching and pipelining techniques, which are essentially general engineering optimizations to make synchronization of multiple replicas across the network more efficient, but the latency itself will certainly increase.
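The batching and pipelining point can be shown with back-of-the-envelope arithmetic (the numbers below are illustrative assumptions, not X-Cluster figures): with one transaction per round trip, throughput is capped at 1/RTT, while sending B transactions per round trip across D in-flight pipelines multiplies that cap by B×D.

```python
def max_tps(rtt_ms, batch_size=1, pipeline_depth=1):
    """Upper bound on commits/sec for one replication channel."""
    round_trips_per_sec = 1000.0 / rtt_ms
    return round_trips_per_sec * batch_size * pipeline_depth

# A 30 ms cross-city round trip, one transaction at a time: unusable.
print(max_tps(rtt_ms=30))                                      # ~33 tps
# Group 500 transactions per round trip: throughput recovers.
print(max_tps(rtt_ms=30, batch_size=500))                      # ~16,667 tps
# Keep 4 batches in flight (pipelining): higher still.
print(max_tps(rtt_ms=30, batch_size=500, pipeline_depth=4))    # ~66,667 tps
```

Per-transaction latency is still at least one round trip in every case; only the aggregate throughput is rescued, which matches the talk's point that "latency will definitely increase".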

In fact, everyone knows that a database needs three copies or three nodes, essentially to achieve strong data consistency, and everyone is working hard in this direction: for example, MySQL Group Replication, launched by Oracle some time ago, is also a three-node technology. The difference with X-Cluster is that our goal from the start was to span cities. From the very beginning of the design we assumed the nodes would be deployed across very long distances, and setting that goal at design time led to relatively large differences in our design, our engineering practice, and the final performance.

We have also done some comparisons between X-Cluster and Oracle's Group Replication. We outperform it in a same-city environment, and the gap is even larger in cross-region scenarios, because we designed for cross-region deployment from the start. As you may know, Alibaba has long talked about geo-distributed multi-active deployment, that is, running active-active across IDCs in different regions, so from the beginning we designed for that.

This is a typical architecture diagram of X-Cluster in a geo-distributed multi-active scenario: a typical layout of 3 cities, 4 copies of data, and 5 copies of logs. If you want to simplify and reduce storage cost, you can actually go down to 3 copies of data and 5 copies of logs. In this way we can survive any city-level, data-center-level, or single-machine failure with zero data loss and strong consistency. Data written at any point will be persisted in at least one data center in another city. This was the goal of X-Cluster from the start, and it is also a typical geo-distributed multi-active architecture.
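A quick check of why a "3 cities, 5 log copies" layout survives losing an entire city (the per-city placement below is an illustrative assumption, not the diagram's exact layout): a Paxos majority of 5 is 3, so any 2 log copies may be lost, and putting at most 2 log copies in any one city means a whole-city failure still leaves a quorum.

```python
placement = {"city_a": 2, "city_b": 2, "city_c": 1}   # log copies per city
total = sum(placement.values())                        # 5 log copies
majority = total // 2 + 1                              # quorum of 3

def survives_city_loss(lost_city):
    """Does a Paxos majority of log copies remain if one whole city fails?"""
    remaining = total - placement[lost_city]
    return remaining >= majority

for city in placement:
    print(city, survives_city_loss(city))   # True for every city
```

The same arithmetic explains why data copies can be cut to 3 while keeping 5 log copies: commit durability is decided by the log quorum, so the cheaper data replicas can be placed more sparsely.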

[Figure 6]

Let me talk about a small but very practical innovation that may interest everyone: X-KV. It is worth mentioning that all of our next-generation technology components start with an X. X-KV is an improvement on MySQL's original Memcached plugin that achieves very high performance. You may know MySQL's Memcached plugin: through the memcached interface you can directly access data in the InnoDB buffer pool, raising read performance to a very high level. What does this mean for everyone, for architects, in the design process?

It means that in many scenarios a separate cache is no longer needed. The database-plus-cache structure is common to almost every business, but the problem with a cache is that data in the cache and the database can become inconsistent, requiring a synchronization or invalidation mechanism. With X-KV, the read problem is basically solved: as long as a piece of data is accessed through this interface, you get essentially the same performance as accessing a cache, so in most cases no separate cache is actually needed.

[Figure 7]

The second benefit is reduced application response time. SQL access has relatively high response time, and we have made improvements here. The original Memcached plugin has restrictions on supported data types, and its support for some index types is not very good, so we improved that too. You can use this as well; with this approach, many cache systems are basically unnecessary.
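As a hedged sketch of what KV-style row access looks like with MySQL's stock InnoDB memcached plugin (which X-KV extends): a mapped table's value columns come back as a single string joined by a separator (commonly `|`), and a `@@mapping.` key prefix selects the table mapping. The mapping name, columns, and sample payload below are illustrative assumptions, not Alibaba's schema.

```python
VALUE_SEPARATOR = "|"
MAPPED_COLUMNS = ("buyer_id", "item_id", "status")    # hypothetical mapping

def decode_row(raw_value):
    """Turn a plugin-style 'a|b|c' payload into a column dict."""
    fields = raw_value.split(VALUE_SEPARATOR)
    if len(fields) != len(MAPPED_COLUMNS):
        raise ValueError("value does not match the configured mapping")
    return dict(zip(MAPPED_COLUMNS, fields))

# e.g. a memcached GET on a key like "@@orders.20170501" might return:
raw = "1001|8888|paid"
print(decode_row(raw))   # {'buyer_id': '1001', 'item_id': '8888', 'status': 'paid'}
```

Because the GET bypasses the SQL parser and optimizer and reads the InnoDB buffer pool directly, the round trip behaves like a cache hit while staying consistent with the table, which is the point made above about dropping the separate cache tier.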

The third thing I want to talk about is how we solve hot/cold data separation. We naturally build on the MySQL framework; here I use MySQL's architecture diagram directly. You can see that MySQL essentially has a client, a server layer in the middle, and a storage layer at the bottom. The storage layer can host various engines, so different features can be achieved through different engines. The most commonly used engine today is InnoDB. The characteristics of each storage engine come essentially from its structure: InnoDB, for example, uses a B+ tree, which gives it relatively balanced read and write performance, and after so many years of development it is quite mature.

[Figure 8]

We chose RocksDB, partly because we have some cooperation with Facebook on RocksDB, namely introducing it into MySQL. Its underlying structure is an LSM tree, whose benefits include write friendliness and good compression. Introducing it is not just introducing a data structure: today we use these two engines together to solve our hot/cold separation problem. We have also talked with Facebook; RocksDB today is not yet as stable as InnoDB, but as a complement to the InnoDB storage engine it is very effective.

In particular, how do we keep users from having to care about the hotness or coldness of their own data? You may know how this used to work: applications had to move data from one store to another themselves and then delete it, or the DBA would regularly go to the business developers and say your storage space is running out, can you delete some data or migrate it to a lower-cost storage engine. We did this all the time, and to be blunt, I'm sure everyone here has done it too.

With this dual-engine structure, however, RocksDB's high compression ratio, especially for row-oriented OLTP data, can bring relatively large benefits. So we can combine these two engines under MySQL and exploit a relatively cheap architecture: the LSM-tree structure in particular is friendly to cheap storage media, because its writes are sequential. These are some of our current thoughts on the database kernel.
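A minimal sketch of the age-based routing the dual-engine setup enables, assuming (as the talk describes) that recent rows stay in InnoDB and old rows migrate to a compressed RocksDB table. The 90-day threshold, table names, and SQL in the comments are illustrative assumptions, not Alibaba's actual schema.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=90)   # assumed hot-data window

def target_engine(order_date, today):
    """Route a row by age: hot rows -> InnoDB, cold rows -> RocksDB."""
    return "innodb" if today - order_date <= HOT_WINDOW else "rocksdb"

# A nightly archiver might then run SQL like (hypothetical tables):
#   INSERT INTO orders_cold SELECT * FROM orders_hot WHERE gmt_create < :cutoff;
#   DELETE FROM orders_hot WHERE gmt_create < :cutoff;
print(target_engine(date(2017, 4, 1), today=date(2017, 5, 1)))   # innodb
print(target_engine(date(2016, 5, 1), today=date(2017, 5, 1)))   # rocksdb
```

The point of keeping both tables inside one MySQL instance is that the application keeps a single SQL interface, while the cold table silently gets LSM-tree compression and sequential writes.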

Why do we need to implement flexible scheduling?

In the second part, I want to talk about flexible scheduling of the database. Everyone knows that Double Eleven is Alibaba's biggest challenge. Flexible scheduling may already be easy for applications, including moving to the cloud and elastic scale-out and scale-in, but for databases it is genuinely hard. We have explored this for a while, and today I will share our thinking with you.

[Figure 9]

I have heard many people say that database containerization is a false proposition: why containerize, why put the database in a container? Second, there are some new technologies; for example, as the previous speaker mentioned, storage can be placed remotely and accessed over the network. But let's think about it from a positive angle instead of assuming that elastic scheduling of databases is impossible. If a database is to achieve elastic scheduling, what are the prerequisites?

Let's first assume the database should be as easy to schedule elastically as an application. What does the database need for that? I think there are two prerequisites: first, it must be placed in a container; second, compute and storage must be separated. If compute and storage are not separated, there is basically no way for a database to be scheduled elastically: compute resources are easy to move, but storage resources are basically impossible to move in a short time, so elasticity becomes very, very difficult. These are the two basic conditions.

In our scenario, and if you encounter the same kind of problem, it is not a false proposition; whether a technology is "correct" matters less than whether your scenario needs it. So we did two things. The first is containerization: we currently support physical machines, VMs, and Docker, with a layer that hides the differences between them, and the database must run in a container. Applications are usually containerized for deployment, but we containerize databases for scheduling, because a database itself has few releases and does not need to be deployed as frequently as applications. Once containerized, the database can be co-located with other containers on the same physical machine.

As DBAs we all hold some traditional views: for example, applications must not run on database servers, and containers must not be used for databases. I don't know about all of you, but whenever someone, perhaps your boss, asks about this, do you always immediately reject it and say "databases can't do that"? Today you might instead be able to tell your boss to give it a try.

On the separation of storage and compute: when databases were first built, storage and compute actually were separated. You used an Oracle database with a SAN and an attached storage array, so storage and compute were separate, connected by the SAN. Then we evolved to local disks, SSDs, and PC servers, and in the future we will return to a storage/compute-separated architecture. Today's network technology, not even counting proprietary networks, just commodity 25G networks plus new technologies such as RDMA and SPDK, gives us the ability to separate storage and compute; the conditions for it are now in place.

The database already has a large number of optimizations that reduce I/O and turn random I/O into sequential I/O, which is very friendly to the underlying storage. In terms of cost, shared storage greatly reduces it, because storage fragmentation is largely eliminated: originally each machine had 30-50% free space that other machines could not use, and turning those shards into a single pool is a big gain.
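The pooling gain is simple arithmetic: per-machine free space that neighbors cannot use becomes usable once disks form one shared pool. The fleet size and utilization figures below are illustrative assumptions, not Alibaba's numbers.

```python
def stranded_capacity_tb(machines, disk_tb_each, used_fraction):
    """Free space trapped on individual machines before pooling."""
    return machines * disk_tb_each * (1.0 - used_fraction)

# 1,000 machines with 2 TB of local disk each, only 60% used on average:
print(stranded_capacity_tb(1000, 2.0, 0.60))   # 800.0 TB reclaimable once pooled
```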

In addition, if databases adopt storage/compute separation in the future, it will break today's mainstream one-primary-one-standby architecture, in which at least half of the compute resources are completely wasted: whether your standby is used for reporting or other purposes, it is basically waste. If shared storage can be achieved, that is a huge benefit. This is our thinking on scheduling; tomorrow a colleague from Alibaba will give you a detailed introduction to containers and storage resources on this topic, so today I only give a general overview.

What is the job content of the DBA in the future?

Finally, let's talk about DBAs. As I said just now, this is about moving from automation to intelligence; internally we talk about going from self-service to intelligence. I don't know whether this troubles you: the speed of business growth far exceeds the growth in the number of DBAs. If you don't have that problem, you can skip this part; but if you do, you can listen to our thinking, because we face the same problem. How should DBAs develop, and what should the next step after automation be? Many people ask whether DBAs will be eliminated. At least after thinking these questions through, Alibaba's DBAs are no longer anxious about it, so I will share this thinking with you today.

First of all, we gave up our original approach. What was it? In the earliest days, a DBA had to review every SQL statement before it went online. In the second stage, we built a system that estimated each statement's performance before launch, and only statements that passed went online. What is the biggest change and reflection we feel today? Optimization based on a single statement is not particularly meaningful, because only on top of large-scale data and computation can the system become intelligent; otherwise it is merely rule-based.

A rule-based system rarely has long-term viability, because there are always rules you can never finish writing. We made such an attempt: when SQL came in, the system made judgments on it, and we eventually found there were always rules we could not write. So we found another direction. I believe everyone here, no matter how big or small your company is, has a monitoring system. We start from the monitoring system and turn it into an intelligent optimization engine. We deliberately don't call it a brain, just an engine. What will this engine do?

[Figure 10]

First of all, we gave up optimization based on a single SQL statement, because it is meaningless: the DBA no longer reviews individual statements, and the system learns little from seeing one statement. Our first requirement today concerns large-scale data. What does that mean? Starting from our monitoring system, we set the first goal of collecting every executed SQL statement, not a sample, every single one. For larger systems this storage is a huge pressure, because the by-products are enormous.

Just as Facebook produced a time-series database as a by-product of building its monitoring products, the by-products we generate today also put pressure on our time-series database, which I won't expand on here. We collect the execution of every SQL statement, because we made kernel improvements that can capture each statement's source, path, and all related information inside the database. We also push monitoring down to second-level granularity: all monitored metrics must reach at least one-second resolution, which our current technology can do.
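A small sketch of the kind of aggregation full-SQL collection enables: normalize each statement into a fingerprint (literals replaced by `?`) and count executions per fingerprint, which is what lets the engine reason over workloads rather than single statements. The regexes here are a simplified assumption; production fingerprinting also handles quoting, comments, and IN-lists.

```python
import re
from collections import Counter

def fingerprint(sql):
    """Collapse literals and whitespace so identical query shapes match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

log = [
    "SELECT * FROM orders WHERE buyer_id = 1001",
    "SELECT * FROM orders WHERE buyer_id = 2002",
    "SELECT * FROM items  WHERE item_id  = 7",
]
counts = Counter(fingerprint(s) for s in log)
print(counts.most_common(1))
# [('select * from orders where buyer_id = ?', 2)]
```

On top of these per-fingerprint counts, one can attach latency histograms and per-second buckets, which is exactly the "large-scale data and computation" the talk says rule-based review lacked.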

Additionally, we combine application-side logging with the database. It used to be that the application side would complain, "is something wrong with the database?", and the DBA would say there was no problem. But from the application side you can actually see many database problems, including application errors and response times, so application-side errors, especially database errors reported in the application, must be combined with the database across the whole chain.

As for response time, only the application-side response time is a meaningful measure of whether a database is good, not the database's own view of itself, how low its load is, or its CPU utilization. When all this data is collected, we call this mass of time-series data a by-product, and it puts huge pressure on our entire pipeline. The colleagues who built our monitoring platform felt life had become impossible: the original storage system couldn't hold it, the analysis system couldn't process it, and the original platform couldn't compute over it. So, with this goal in mind, huge improvements were made along the pipeline, including cheap storage and real-time analysis, which are requirements on both storage and computation.

Our goal is stated clearly within Alibaba: we hope to replace most DBA work within two to three years. I don't know whether it can be achieved in that time, but I hope so. A DBA's work essentially falls into two categories. The first is operations and maintenance, which is relatively easy to solve: there are already automated operations systems for it.

The hardest part is the diagnosis and optimization I just mentioned. I have also asked many companies, such as Google and Facebook, why they don't have DBAs. They say they don't have the traditional kind of DBA, common in China, whose job is diagnosis and performance optimization; very few people hold such responsibilities. So we hope this can be done.

Finally, we have the data and the computation. We think the future direction may be the currently popular machine learning. There will be an Alibaba session tomorrow sharing this topic, so I will not talk about machine learning here; we are also only at the beginning, and there is nothing worth presenting yet. But we find this design quite interesting: as long as you accumulate enough data and computation, it becomes quite interesting.

Our other thoughts on the future of the database

On the last page of slides, I will use plain language to talk about my understanding of the entire database system.

[Figure 11]

Today, no single storage system or database in a company can solve all problems. More and more, we see that diversity of data storage is inevitable: row stores have their advantages, column stores have theirs; some systems excel at computation, some at analysis, some at OLTP, some at OLAP. Don't expect, or rather it is hard to expect, one system to do everything. Perhaps that is not a popular thing to say, but it really is hard. What is the point we see? Each technology or product does one thing best in production, and you should solve your problem with the thing it does best.

This goes back to the earlier question. We have also taken detours: there are more and more types of data storage, we use this one today and that one tomorrow, so what do we do? Our operations can't keep up, and supporting all of them is very painful.

So today we propose building two platforms. First, a support platform that hides the complexity of the underlying storage as much as possible and provides unified interfaces and services upward. Second, a service platform, clearly oriented toward R&D, through which developers can use database services directly. I see many companies mix the operations platform with the platform DBAs build, but Alibaba's idea is that the support platform and the service platform are two layered platforms: the support platform sits below, and the service platform above serves all developers. Developers can see which database they use, how it performs, and what they can do on the platform, which saves a lot of DBA manpower.

We have an internal joke: "a platform that doesn't save manpower, and a technology that doesn't save cost, are both scams." What does that mean? Our automated systems, especially in large companies, keep growing, and the end result is that people become less capable. I don't know whether you have this problem; this is my last point, the paradox of automated systems. Has this happened in your company while building automation? It happened at Alibaba: people's abilities were weakened.

We stumbled onto this paradox when reading about aircraft autopilot: because the autopilot does its job well enough, when an emergency occurs, the pilot may no longer have the ability to handle it. This is the paradox of automated systems.

By comparison, we have built many automated systems, and as a result people can only click buttons in them; once the system gets stuck, everything stops, and many secondary failures happen exactly when the system is stuck. This is a question worth thinking about today for everyone who leads a team or works within such a system. We are facing this problem head-on, trying to combine human ability with system ability. That is another topic, and I cannot give an answer today, but please pay special attention to these questions.

Don't believe myths that have expired: database storage and compute can be separated, and databases can be put in containers. You really have to look at what problems lie behind those myths; in fact, solutions may already exist. So, everyone here, when your boss, CTO, or someone else comes and asks "can you do this?", I hope you can tell him "I can!" One of our DBAs once read an article asking what a DBA is, and I was particularly struck by a developer's reply at the bottom: "DBAs are a group of people who always say no." It can't stay like this. I think in the future we can be people who always say "yes". Thank you!
