1. Background

I have long wanted to write this article. "Separation of computing and storage" has come up more and more often in recent years, yet many people are still unclear about what it actually means. After reading a lot of material and combining it with my own understanding, I'd like to talk about what "separation of computing and storage" is.

2. What is computing? What is storage?

To understand the separation of computing and storage, we first need to understand what computing is and what storage is. Computing means performing operations, and it is closely tied to mathematics. Think back to how you worked out the answers in your old math exams: that process is computing. What we mean here is computing done by a computer, that is, using a computer to work out the result of a problem. That is the "computing" this article is about.

Storage is harder to define. Many people simply think of hard disks, USB flash drives, and so on. In fact, storage is involved in every step of computing. A CPU consists of a control unit, an arithmetic logic unit, and registers, and when we run a program its instructions are held in memory, so every step depends on storage. Take the multiple-choice questions in our old exams: graders only care whether your answer is correct, not how you got there. Your working is like scratch paper: nobody has to show it to the graders, but it was still written down on paper.

As discussed above, computing and storage inside a computer are inseparable. If they were separated and connected only through a high-speed network, every instruction the CPU executes would have to travel over the network, and network transmission is nowhere near as fast as the CPU. Separating computing and storage in this sense is therefore a pseudo-requirement. Of course, if network transmission time ever becomes negligible, this kind of separation could really be achieved.
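A rough calculation shows how large the gap is. The sketch below assumes a 3 GHz CPU and a 500-microsecond round trip inside a data center (the ballpark from the widely circulated "latency numbers every programmer should know" list). Both constants are illustrative assumptions, not measurements, and modern data-center networks can be considerably faster.

```python
# Back-of-the-envelope: how many CPU cycles pass during one network round trip?
# Both constants are ASSUMED illustrative values, not measurements.
CPU_CLOCK_HZ = 3e9        # assumed 3 GHz CPU
NETWORK_RTT_S = 500e-6    # assumed 500 microsecond round trip inside a data center


def cycles_per_round_trip(clock_hz: float, rtt_s: float) -> float:
    """Number of clock cycles that elapse while waiting for one round trip."""
    return rtt_s * clock_hz


if __name__ == "__main__":
    cycle_ns = 1e9 / CPU_CLOCK_HZ
    print(f"one CPU cycle: {cycle_ns:.2f} ns")
    print(f"cycles per round trip: {cycles_per_round_trip(CPU_CLOCK_HZ, NETWORK_RTT_S):,.0f}")
```

Under these assumptions the CPU would wait about 1.5 million cycles for every remote access. Even a network ten times faster still leaves a gap of roughly 150,000 cycles, which is why separating storage from computing at the level of individual instructions is impractical.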

If separating computing and storage is a pseudo-requirement, why do so many people still talk about it? Because the terms need to be redefined. We fold the storage used during computation into "computing" and only care about the problem and its result. Like exam scratch paper, which does not need to be kept and can be torn up at will, this temporary storage does not count as "storage" under our new definition.

Here is the final definition. The "storage" we discuss from now on is persistent storage: USB drives, hard disks, network disks, and so on. The "computing" we discuss is the computation process itself, together with the CPU, memory, and other resources it requires.

3. Why separate computing and storage?

The separation of computing and storage is not a new idea. Twenty years ago there was already NAS (Network Attached Storage), which is essentially a file server on Ethernet using the TCP/IP protocol. Back then, if you needed large-scale storage, you had your servers save data to a NAS. But NAS was extremely expensive and hard to scale, so it was not suitable for fast-growing Internet applications.

At that point, Google abandoned the earlier approach of "moving storage to computing" and adopted "moving computing to storage", coupling computing and storage together, because network speeds at the time were hundreds of times slower than today and could not keep up with demand. In a typical MapReduce deployment, and later in Hadoop, computation and storage run in the same cluster. This essentially replaces network transfer with local disk I/O.
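"Moving computing to storage" can be sketched as a scheduling rule: run each task on a node that already holds a replica of its input block, and fall back to reading over the network only when no such node is free. This is a simplified, hypothetical scheduler (the function name and the block-location map are made up for illustration); real Hadoop schedulers also consider rack locality, delay scheduling, and queues.

```python
from typing import Dict, List, Optional, Set, Tuple


def schedule_task(block_id: str,
                  block_locations: Dict[str, List[str]],
                  free_nodes: Set[str]) -> Optional[Tuple[str, bool]]:
    """Pick a node for a task that reads `block_id`.

    Returns (node, is_local), or None if no node is free.
    is_local=True means the computation moved to where the data already is.
    """
    # Prefer a free node that stores a replica of the input block (data-local).
    for node in block_locations.get(block_id, []):
        if node in free_nodes:
            return node, True
    # Fallback: any free node, which must fetch the block over the network.
    if free_nodes:
        return sorted(free_nodes)[0], False
    return None


if __name__ == "__main__":
    # Hypothetical cluster: each block is replicated on 3 of 4 nodes.
    locations = {"blk-1": ["n1", "n2", "n3"], "blk-2": ["n2", "n3", "n4"]}
    print(schedule_task("blk-1", locations, {"n3", "n4"}))  # ('n3', True)
    print(schedule_task("blk-1", locations, {"n4"}))        # ('n4', False)
```

The design trades network bandwidth for local disk bandwidth, which made sense when the network was the bottleneck.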

As technology advanced, networks kept getting faster and network speed stopped being the bottleneck, while disk I/O speed did not improve significantly. At the same time, the architectural drawbacks of coupling computing and storage became more and more apparent:

Wasted machines: a business may hit its compute bottleneck or its storage bottleneck first, and the two rarely happen at the same time. In a coupled architecture, if compute runs short you add a machine, and if storage runs short you also add a machine, so every new machine brings resources you did not need. This wastes a lot.
Machine specifications need frequent updates: a company's machine configuration is usually fairly fixed (a certain number of cores, a certain amount of memory, a certain amount of disk). But as the business keeps evolving, the ratio of compute to storage it needs changes, so the machine configuration has to be updated constantly.
Scaling out is hard: when storage runs short we need to expand, and in a coupled architecture expansion requires migrating large amounts of data.
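The "wasted machines" point can be made concrete with a little arithmetic. The sketch below uses hypothetical workloads and machine sizes: in coupled mode each machine is a fixed bundle of cores and disk, so the scarcer resource decides how many machines you buy; in separated mode compute nodes and disk-heavy storage nodes are sized independently.

```python
import math
from typing import Tuple


def coupled_plan(cores_needed: int, tb_needed: int,
                 cores_per_machine: int, tb_per_machine: int) -> Tuple[int, int, int]:
    """Coupled mode: returns (machines, idle_cores, idle_tb).

    Every machine brings a fixed bundle of CPU and disk, so the resource that
    runs out first decides the machine count and the other one sits idle.
    """
    machines = max(math.ceil(cores_needed / cores_per_machine),
                   math.ceil(tb_needed / tb_per_machine))
    idle_cores = machines * cores_per_machine - cores_needed
    idle_tb = machines * tb_per_machine - tb_needed
    return machines, idle_cores, idle_tb


def separated_plan(cores_needed: int, tb_needed: int,
                   cores_per_compute_node: int, tb_per_storage_node: int) -> Tuple[int, int]:
    """Separated mode: returns (compute_nodes, storage_nodes), each sized on its own."""
    compute_nodes = math.ceil(cores_needed / cores_per_compute_node)
    storage_nodes = math.ceil(tb_needed / tb_per_storage_node)
    return compute_nodes, storage_nodes


if __name__ == "__main__":
    # Hypothetical storage-heavy workload: 64 cores, 200 TB.
    print(coupled_plan(64, 200, cores_per_machine=32, tb_per_machine=10))  # (20, 576, 0)
    print(separated_plan(64, 200, cores_per_compute_node=32, tb_per_storage_node=40))  # (2, 5)
```

With these made-up numbers the coupled cluster buys 640 cores to get enough disk, leaving 576 cores idle, while the separated design buys only the compute it needs.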

Because the drawbacks of coupling computing and storage keep growing and networks keep getting faster, architectures are now moving back toward separating computing and storage.

4. Who uses the separation of computing and storage?

We have covered a lot of theory above, and I believe you now have some understanding of "separation of computing and storage". So where is it used? Two major areas are databases and message queues. Next, I will discuss how each of them applies the separation of computing and storage.

4.1 Database

When it comes to databases, MySQL is probably the first that comes to mind, and it is likely the one most of us know best. The following is a master-slave architecture diagram of MySQL:

As the diagram shows, the master accepts data changes and records them in the binlog, and the slaves read the binlog and replay it to replicate the data. MySQL's master-slave architecture has several problems:

When the master is under heavy write load, replication lag grows, because replication works from the binlog and each slave must replay every transaction.
Adding slave nodes is slow, because the full data set must be copied to each new slave. If the master holds a lot of data, bringing up a new slave is slow and expensive.
For databases with a large amount of data, backups are very slow.
Costs go up. If the database is large, every slave needs as much capacity as the master, so cost grows linearly with the number of slaves.
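The linear cost growth in the last point can be sketched with a toy model. Here `storage_copies` is a hypothetical parameter for a shared storage layer that keeps a fixed number of copies no matter how many compute nodes read from it; the numbers are illustrative only.

```python
def total_storage_tb(data_tb: float, read_replicas: int,
                     shared_storage: bool = False, storage_copies: int = 3) -> float:
    """Total disk needed for one primary plus `read_replicas` read replicas.

    Master-slave: every node keeps a full copy, so disk grows linearly
    with the number of replicas.
    Shared storage (toy model): all nodes read the same storage layer, which
    keeps `storage_copies` copies (an assumed value) regardless of node count.
    """
    if shared_storage:
        return data_tb * storage_copies
    return data_tb * (1 + read_replicas)


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(n, total_storage_tb(10, n), total_storage_tb(10, n, shared_storage=True))
```

For a 10 TB database this prints 20, 40, and 80 TB for the master-slave setup versus a flat 30 TB for the shared layer. With a single replica the toy shared layer actually uses more disk; the point is that its cost stays flat as read replicas are added.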

All of these problems point toward separating computing and storage and letting all nodes share a single storage layer. At AWS re:Invent 2014, AWS announced Aurora, a MySQL-compatible database engine for Amazon Relational Database Service (RDS). Aurora is designed to meet enterprise requirements for high availability, performance, scalability, and managed cloud hosting. Today Aurora replicates data 6 ways across 3 Availability Zones, fails over within 30 seconds, and recovers quickly from crashes. According to AWS, Aurora is up to 5 times faster than RDS MySQL 5.6 and 5.7.

Aurora turns MySQL's storage layer into independent storage nodes. In Aurora, the log is treated as the data: the log is pulled out of the MySQL compute nodes entirely and stored by the storage nodes, and the undo log is also dropped, reducing both the interaction between compute and storage and the bandwidth needed to transfer data between them.
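The "log is the data" idea can be sketched in a few lines: the compute node ships only small redo records, and the storage node rebuilds a page by replaying the records that apply to it. This is a toy model under heavy simplifying assumptions (one storage node, no replication, no quorums, no undo, no page cache); the class and method names are hypothetical and do not reflect Aurora's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List


@dataclass(frozen=True)
class RedoRecord:
    lsn: int        # log sequence number, strictly increasing
    page_id: int    # page the change applies to
    key: str
    value: str


class StorageNode:
    """Keeps redo records per page and materializes pages by replaying them."""

    def __init__(self) -> None:
        self._log: DefaultDict[int, List[RedoRecord]] = defaultdict(list)

    def append(self, record: RedoRecord) -> None:
        self._log[record.page_id].append(record)

    def read_page(self, page_id: int, as_of_lsn: int) -> Dict[str, str]:
        page: Dict[str, str] = {}
        for rec in sorted(self._log.get(page_id, []), key=lambda r: r.lsn):
            if rec.lsn > as_of_lsn:
                break
            page[rec.key] = rec.value  # replay the change onto the page image
        return page


class ComputeNode:
    """Generates redo records and sends only those to storage, never whole pages."""

    def __init__(self, storage: StorageNode) -> None:
        self._storage = storage
        self._next_lsn = 1

    def write(self, page_id: int, key: str, value: str) -> int:
        lsn = self._next_lsn
        self._next_lsn += 1
        self._storage.append(RedoRecord(lsn, page_id, key, value))
        return lsn


if __name__ == "__main__":
    storage = StorageNode()
    db = ComputeNode(storage)
    db.write(7, "alice", "100")
    checkpoint = db.write(7, "bob", "50")
    db.write(7, "alice", "80")
    print(storage.read_page(7, as_of_lsn=checkpoint))  # {'alice': '100', 'bob': '50'}
    print(storage.read_page(7, as_of_lsn=3))           # {'alice': '80', 'bob': '50'}
```

Because only the records cross the network, the traffic between compute and storage is proportional to the size of the changes rather than the size of the pages they touch.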

Similarly, Alibaba's team borrowed Aurora's ideas and made many optimizations on top of them. Because Aurora modified MySQL's InnoDB storage engine heavily, keeping up with later MySQL releases inevitably costs a lot. Alibaba therefore launched PolarDB, which keeps MySQL's original I/O path. Its architecture diagram is as follows, and the following components deserve attention:

libpfs: a file system library that provides an API for compute nodes to access the underlying storage, reading and writing files and updating metadata. With it, compute nodes do not need to care where the data is actually stored.
ChunkServer: an independent storage sub-node. Each ChunkServer manages one SSD, and multiple ChunkServers together form PolarDB's storage layer. To a compute node, all of them look like one large storage node.
PolarSwitch: a daemon deployed on the compute node. It receives the file I/O requests sent by libpfs, splits each request into one or more requests against the corresponding chunks, and sends them to the ChunkServers that own those chunks.
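The splitting step PolarSwitch performs can be sketched as follows: a byte range on a volume is cut at chunk boundaries, and each piece is routed to the ChunkServer that owns its chunk. The function names, the chunk-to-server map, and the 1024-byte chunk size are hypothetical; real chunks are far larger, and the tiny size here is only for readability.

```python
from typing import Dict, List, Tuple

Piece = Tuple[int, int, int]  # (chunk_index, offset_within_chunk, length)


def split_request(offset: int, length: int, chunk_size: int) -> List[Piece]:
    """Split the byte range [offset, offset + length) into per-chunk pieces."""
    pieces: List[Piece] = []
    end = offset + length
    while offset < end:
        chunk_index = offset // chunk_size
        offset_in_chunk = offset % chunk_size
        # A piece ends at the chunk boundary or at the end of the request.
        piece_len = min(chunk_size - offset_in_chunk, end - offset)
        pieces.append((chunk_index, offset_in_chunk, piece_len))
        offset += piece_len
    return pieces


def route(pieces: List[Piece], chunk_to_server: Dict[int, str]) -> Dict[str, List[Piece]]:
    """Group pieces by the (hypothetical) ChunkServer that owns each chunk."""
    by_server: Dict[str, List[Piece]] = {}
    for piece in pieces:
        server = chunk_to_server[piece[0]]
        by_server.setdefault(server, []).append(piece)
    return by_server


if __name__ == "__main__":
    pieces = split_request(offset=1000, length=100, chunk_size=1024)
    print(pieces)  # [(0, 1000, 24), (1, 0, 76)]
    print(route(pieces, {0: "cs-a", 1: "cs-b"}))  # {'cs-a': [(0, 1000, 24)], 'cs-b': [(1, 0, 76)]}
```

A request that straddles a chunk boundary becomes two requests to two different ChunkServers, while the database above libpfs still sees a single contiguous file.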

PolarDB has many more details; if you are interested, see Alibaba Cloud's official documentation. With shared storage, we can provision resources to fit our own workload. For example, if concurrency requirements are low but the data volume is large, we can request a lot of storage and relatively little compute. If concurrency requirements are high, especially for reads, we can add read-only nodes until our requirements are met.

And it is not only these: many databases, including OceanBase and TiDB, are gradually moving toward the separation of computing and storage. So the separation of computing and storage is likely to be a main direction for databases in the future.

4.2 Message Queues

I have written many articles about message queues before, covering Kafka and RocketMQ. Both Kafka and RocketMQ are designed to store messages on the local disks of the machines they run on, which has certain drawbacks:

Limited retention: anyone who has used these two message queues will know this well. Servers generally keep only the last few days of messages to save disk space, which makes it hard to go back and query historical data.
High cost of scaling: the same scaling drawbacks described for databases apply here too.
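Why retention ends up at "a few days" follows from simple arithmetic: when messages live on the brokers' local disks, the retention window is bounded by cluster disk capacity divided by daily ingest times the replication factor. The sketch below uses hypothetical numbers and ignores indexes, free-space headroom, and compression, all of which real deployments must account for.

```python
def max_retention_days(disk_tb_per_broker: float, brokers: int,
                       daily_ingest_tb: float, replication_factor: int) -> float:
    """Days of messages that fit on the brokers' local disks (rough upper bound)."""
    cluster_capacity_tb = disk_tb_per_broker * brokers
    return cluster_capacity_tb / (daily_ingest_tb * replication_factor)


if __name__ == "__main__":
    # Hypothetical cluster: 6 brokers x 4 TB, 2 TB/day of new messages, 3 replicas.
    print(max_retention_days(4, 6, 2, 3))  # 4.0
```

Keeping more history means adding brokers, and with it CPU and memory the workload may not need, which is exactly the coupling problem from section 3.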

Apache Pulsar emerged in response to these problems. Pulsar was originally developed at Yahoo. In 2018, Pulsar won InfoWorld's Best Open Source Data Platform award, beating Kafka, which had won it two years in a row.

In Pulsar's architecture, data computing and data storage are two separate layers:

Data computing is handled by the Broker, which plays a role similar to Kafka's broker: load balancing, serving producers and consumers, and so on. If a business has many producers and consumers, this layer can be scaled out on its own.
Data storage is handled by Bookies: Pulsar uses Apache BookKeeper as its storage system and does not deal much with storage details itself. This is a lesson worth borrowing: when designing such a system, we can concentrate our own design effort on the compute service and use a mature open-source solution for storage.
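The split between a stateless broker and a shared storage layer can be sketched as below. `StorageLayer` is a hypothetical stand-in for the role BookKeeper plays; it is not the BookKeeper client API and only models "durable, shared, append-only logs". The point is that a broker holds no message data, so a different broker can take over a topic without copying anything.

```python
from collections import defaultdict
from typing import DefaultDict, List


class StorageLayer:
    """Hypothetical shared, append-only log store (stand-in for the storage tier)."""

    def __init__(self) -> None:
        self._logs: DefaultDict[str, List[bytes]] = defaultdict(list)

    def append(self, topic: str, payload: bytes) -> int:
        self._logs[topic].append(payload)
        return len(self._logs[topic]) - 1  # position of the new entry

    def read_from(self, topic: str, position: int) -> List[bytes]:
        return list(self._logs[topic][position:])


class Broker:
    """Stateless serving layer: keeps no messages locally, only forwards to storage."""

    def __init__(self, name: str, storage: StorageLayer) -> None:
        self.name = name
        self.storage = storage

    def publish(self, topic: str, payload: bytes) -> int:
        return self.storage.append(topic, payload)

    def consume(self, topic: str, position: int = 0) -> List[bytes]:
        return self.storage.read_from(topic, position)


if __name__ == "__main__":
    storage = StorageLayer()
    broker_a = Broker("broker-a", storage)
    broker_a.publish("orders", b"order-1")
    broker_a.publish("orders", b"order-2")

    # broker-a goes away; broker-b takes over the topic without copying any data.
    broker_b = Broker("broker-b", storage)
    print(broker_b.consume("orders"))  # [b'order-1', b'order-2']
```

In a coupled design, moving a topic to a new broker means moving its log off the old broker's disk first; here the handover only changes which broker serves requests.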

In theory, Pulsar's storage is unlimited, and messages can be kept forever. Some will ask: don't hard drives cost money? Of course they do. That is why Pulsar offers tiered storage: older messages are moved to cheaper storage such as AWS S3, while the latest messages stay on more expensive SSDs.

In this model, not only is storage unlimited, but compute can also be scaled out freely: the compute layer is essentially stateless, so adding brokers costs very little. Pulsar builds on this with multi-tenancy, so there is no need to set up a separate cluster for each team. That is what used to happen at Meituan: the more important business groups (BGs) basically each had their own Mafka cluster to avoid interfering with one another.
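Tiered storage can be sketched as a policy that moves old log segments from the SSD tier to a cheaper object-storage tier. The age-based threshold, the `Segment` fields, and the tier names are hypothetical simplifications for illustration; Pulsar's actual offload policies and configuration are described in its documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    segment_id: int
    age_days: float
    size_gb: float
    tier: str = "ssd"  # where the segment currently lives


def offload_old_segments(segments: List[Segment], max_ssd_age_days: float) -> List[int]:
    """Move segments older than the threshold from SSD to object storage.

    Returns the ids of the segments moved. Readers still see one continuous log;
    only the physical location of old data changes.
    """
    moved = []
    for seg in segments:
        if seg.tier == "ssd" and seg.age_days > max_ssd_age_days:
            seg.tier = "object_store"  # e.g. an S3 bucket in a real deployment
            moved.append(seg.segment_id)
    return moved


def ssd_usage_gb(segments: List[Segment]) -> float:
    """Space still occupied on the expensive SSD tier."""
    return sum(s.size_gb for s in segments if s.tier == "ssd")


if __name__ == "__main__":
    log = [Segment(1, 30.0, 50.0), Segment(2, 10.0, 50.0), Segment(3, 1.0, 50.0)]
    print(offload_old_segments(log, max_ssd_age_days=7))  # [1, 2]
    print(ssd_usage_gb(log))  # 50.0
```

The SSD tier then only has to hold the recent window, so its size no longer grows with total retention.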

Some of Kafka's recent proposals are moving in the same direction, for example the discussion about whether to support tiered storage. Whether Kafka will eventually adopt a compute-storage separated architecture is not certain, but I believe the separation of computing and storage is a main direction for the future development of message queues.

Summary

With the rise of cloud native, the "separation of computing and storage" shows up in more and more systems. I hope this article has given you a basic understanding of it, and that when you design systems in the future you will consider it as one of your options.
