Apache Doris's giant leap: a new architecture for storage-computing separation

Author: Ma Ruyue, founder of Apache Doris

Historically, two factors have driven the evolution of data warehouse architecture: steadily rising data analysis requirements (larger data volumes, faster processing, lower cost) and the continuous evolution of computing infrastructure (from dedicated high-end hardware, to low-cost commodity hardware, to cloud computing services). Together they have carried the data warehouse through three eras: the all-in-one era of integrated software and hardware, the distributed era of integrated storage and computing, and the cloud-native era of separated storage and computing.

Apache Doris was born in the distributed era of integrated storage and computing. It is a typical Shared Nothing architecture: storage and computing are tightly coupled on BE nodes, and multiple BE nodes form an MPP distributed computing architecture. This architecture brings a series of core features such as high availability, simple deployment, horizontal scalability, and powerful real-time analysis performance. With the advent of the cloud era, whether on public cloud, private cloud, or K8s container platforms, more and more enterprises hope that Apache Doris can adapt more deeply to the new cloud computing infrastructure and thereby offer more flexible and powerful elasticity.

Over the past year, while developing a fully managed enterprise-grade cloud data warehouse product based on the Apache Doris kernel, the SelectDB technical team designed and implemented a new cloud-native storage-computing separation architecture (SelectDB Cloud). Built on this cloud-native separation of storage and computing, SelectDB Cloud provides features such as workload isolation across multiple computing clusters and elastic scaling of computing resources.

Adhering to its primary goal of "promoting open source technology innovation and prospering the open source community ecosystem", and with Apache Doris 2.0 about to be released, the SelectDB technical team officially announced that it will contribute its implementation of the storage-computing separation architecture to the Apache Doris community. This work is expected to be completed around October 2023, at which point all of the storage-computing separation code will be submitted to the main branch of the Apache Doris community.

Once the storage-computing separation code has been merged into Apache Doris, Apache Doris will be able to run in one of two modes: the storage-computing integrated deployment mode and the storage-computing separation deployment mode. The two modes store primary data differently. In terms of user experience, most functions are consistent, but some differ because of the different implementation architectures and deployment modes. Below we introduce the core features and applicable scenarios of the two deployment modes.

Distributed Architecture with Integrated Storage and Computing

The storage-computing integrated architecture is the most mature MPP distributed architecture in terms of performance, ease of use, and stability. The overall architecture is shown in the following figure:

[Figure: Apache Doris storage-computing integrated architecture]

Easy to deploy

In the storage-computing integrated mode, Apache Doris does not depend on any external shared file system or object storage. A cluster is built simply by deploying two kinds of processes, FE and BE, on physical servers, and it can scale from one node to hundreds of nodes. Because this deployment mode does not rely on third-party components, it greatly lowers the barrier to using Apache Doris; even an office laptop is enough to deploy it.

Besides simple deployment, this mode also keeps operation and maintenance costs minimal:

  • Both FE and BE scale out horizontally and linearly. No downtime is needed when scaling out or in, and online services remain stable and reliable throughout
  • Data is stored in multiple replicas. Doris's built-in distributed management framework automatically handles replica placement, repair, and balancing, and replicas are rebalanced across nodes automatically during scale-out and scale-in without any manual operation

Because the storage-computing integrated architecture has fewer dependencies and does not rely on any other system, it is also more stable. The storage-computing separation mode, by contrast, depends on a shared storage system, and for most enterprises providing one is not easy. The more components a system depends on, the more likely it is that instability in one of them affects the stability of the whole. In the storage-computing separation architecture, the stability and availability of the storage system, as well as the latency and stability of the network connecting it to the computing nodes, all have a crucial impact on overall stability.

Excellent performance

In the storage-computing integrated mode, computing nodes can access data on local storage directly, make full use of the machine's IO, avoid unnecessary network overhead, and deliver the best possible query performance. In the storage-computing separation mode, by contrast, network bandwidth and transfer time often limit system performance. This is why even distributed frameworks such as Hadoop and Spark, which adopted the storage-computing separation mode from the beginning, push computation as close as possible to the node where the data resides in order to improve the execution performance of computing tasks.

The storage-computing integrated mode is also friendlier to predicate pushdown, which moves filter evaluation closer to the data source, reduces the amount of data scanned, transmitted, and computed during a query, and so makes better use of the system's query performance. In the storage-computing separation mode, by contrast, general-purpose storage systems cannot evaluate predicates, so predicate pushdown cannot be implemented and a large amount of data has to be transmitted over the network to the computing side.
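
The effect of predicate pushdown on network traffic can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Row` type, the fixed `ROW_BYTES` size, and the two scan functions are hypothetical and do not correspond to Doris internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    user_id: int
    amount: int


ROW_BYTES = 16  # hypothetical fixed size of one row on the wire


def scan_with_pushdown(rows, predicate):
    """Filter where the data lives; only matching rows cross the network."""
    shipped = [r for r in rows if predicate(r)]
    return shipped, len(shipped) * ROW_BYTES


def scan_without_pushdown(rows, predicate):
    """Ship every row to the computing side, then filter there."""
    shipped = list(rows)
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped) * ROW_BYTES


def is_big_order(row):
    return row.amount >= 9900


rows = [Row(user_id=i, amount=i * 10) for i in range(1000)]

pushed, pushed_bytes = scan_with_pushdown(rows, is_big_order)
pulled, pulled_bytes = scan_without_pushdown(rows, is_big_order)

print(len(pushed), pushed_bytes, pulled_bytes)
```

Both scans return the same 10 rows, but the second one moves all 1000 rows over the simulated network before filtering.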

Hot and cold data tiering

Apache Doris 2.0 also implements hot and cold data tiering in the storage-computing integrated mode. Tiering lets Apache Doris move cold data down to object storage, which has a lower storage cost. At the same time, cold data on object storage is kept as a single copy instead of multiple replicas, which further reduces its storage cost to one-third of the original. With tiering, an Apache Doris cluster no longer has to keep adding machines as historical data accumulates. In essence, hot and cold data tiering in Apache Doris 2.0 is also a form of storage-computing separation, one that separates storage only for cold data.
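
The one-third figure follows from the replica count alone, as the short calculation below shows. The data volumes (10 TB hot, 90 TB cold) and the function name are made up for illustration, three replicas is assumed for the untiered case, and the calculation counts capacity only, ignoring the lower price per TB of object storage.

```python
from fractions import Fraction


def stored_tb(hot_tb, cold_tb, hot_replicas, cold_replicas):
    """Total capacity consumed, counting every replica."""
    return hot_tb * hot_replicas + cold_tb * cold_replicas


HOT_TB, COLD_TB = 10, 90  # hypothetical data volumes

# Without tiering: hot and cold data both kept as 3 local replicas.
untiered = stored_tb(HOT_TB, COLD_TB, hot_replicas=3, cold_replicas=3)

# With tiering: cold data kept as a single copy on object storage.
tiered = stored_tb(HOT_TB, COLD_TB, hot_replicas=3, cold_replicas=1)

# Capacity used by cold data after tiering, relative to before.
cold_ratio = Fraction(COLD_TB * 1, COLD_TB * 3)

print(untiered, tiered, cold_ratio)
```

In this example the cold portion shrinks from 270 TB to 90 TB, and the local disks only need to hold the 30 TB of hot replicas.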

For a detailed introduction to hot and cold data tiering in Apache Doris 2.0, please refer to "How does the Apache Doris hot and cold tiering technology reduce storage costs by 70%?"

Applicable Scenarios of Integrated Storage and Computing Architecture

For the reasons above, if any of the following conditions applies to you, the Apache Doris storage-computing integrated mode is the better fit:

  • You use Doris in a simple way, want to try it out quickly, or use it for development and testing
  • You have no reliable shared storage available, such as HDFS, Ceph, or object storage
  • Your business line maintains Apache Doris on its own, with no full-time DBA for the Doris cluster
  • You do not need extreme elastic scaling or K8s containerization, and do not need to run on a public or private cloud

New Architecture for Storage and Computing Separation

Given that the storage-computing integrated mode has so many advantages, why provide a new architecture that separates storage and computing? The core driving force is the maturing of cloud computing infrastructure. Whether on public cloud, private cloud, or a K8s-based container platform, innovation in cloud computing infrastructure has created new demands.

The cloud itself separates storage from computing, and its extreme elasticity brings significant cost advantages:

  • Elasticity of computing resources: computing nodes can be purchased or added on demand as the computing load requires, so cost is minimized while computing needs are still met
  • Low cost and elasticity of storage resources: object storage provides extremely reliable, low-cost storage billed by the capacity actually used, so more data can be kept for longer

Even companies that do not use a cloud platform can use a low-cost shared storage system to reduce storage costs and improve computing elasticity, while also gaining valuable features such as multiple computing clusters.

The future storage-computing separation architecture is shown in the following figure:

[Figure: New architecture for storage and computing separation]

Primary data storage based on a shared storage system

Under the storage-computing integrated architecture, data is stored mainly on computing nodes. Even with hot and cold data tiering, hot data is still stored only on computing nodes, which must rely on their own multi-replica mechanism to ensure data reliability. Under the storage-computing separation architecture, computing nodes no longer store primary data; instead, the shared storage layer serves as the unified primary data store, which brings the following benefits:

  • Upper-layer computing nodes can be stateless and can be shut down completely
  • Data sharing is more convenient, both between clusters and between warehouses
  • Data backup and recovery, as well as time travel over data, become easier

Of course, mature and stable HDFS or object storage also gives the system very low storage costs and high data reliability, and greatly reduces the implementation complexity of the upper-layer computing nodes.

Performance optimization based on local cache

Storage-computing separation has to read data from the storage system over the network before computing on it, which reduces computing performance to some extent; this is its main disadvantage compared with the storage-computing integrated architecture. To address it, a local cache can be provided on SSDs.

Just as hot and cold data tiering greatly alleviates the problem of having to scale storage and computing together in the storage-computing integrated architecture, introducing a local cache on computing nodes in the storage-computing separation architecture in effect brings back some of the benefits of integrated storage and computing. This combination of a local cache and a shared storage system can be called a hybrid mode. Both Snowflake and Redshift adopt this approach to cope with the performance loss caused by the underlying object storage system and network transmission.

Once the local cache is introduced, the system automatically caches the most recently written and accessed data according to an LRU policy. The caching strategy of a table can also be set manually. Because it is only a cache, a single copy is stored locally, which greatly improves cache utilization and can reduce high-speed storage usage by 2/3 compared with the storage-computing integrated mode.
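
The LRU behavior described above can be sketched in a few lines. This is a minimal illustration, not Doris's cache implementation: the class `LRUBlockCache`, the block names, and the `fetch_remote` callback standing in for a read from shared storage are all hypothetical.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Single-copy local cache in front of a slower shared store."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # called on a cache miss
        self.blocks = OrderedDict()       # least recently used first
        self.hits = 0
        self.misses = 0

    def put(self, key, data):
        """Cache freshly written or fetched data, evicting the LRU entry."""
        self.blocks[key] = data
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def get(self, key):
        """Return a block, reading from shared storage only on a miss."""
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)
            return self.blocks[key]
        self.misses += 1
        data = self.fetch_remote(key)
        self.put(key, data)
        return data


remote = {f"block-{i}": f"data-{i}" for i in range(5)}
cache = LRUBlockCache(capacity=2, fetch_remote=remote.__getitem__)

cache.put("block-0", remote["block-0"])  # newly written data is cached
cache.get("block-1")                     # miss: fetched from shared storage
cache.get("block-0")                     # hit: block-0 becomes most recent
cache.get("block-2")                     # miss: evicts block-1, the LRU entry

print(list(cache.blocks), cache.hits, cache.misses)
```

After these four operations the cache holds `block-0` and `block-2`: the block touched least recently was the one evicted.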

In addition, in the storage-computing integrated mode, each tablet keeps 3 replicas on 3 nodes, and compaction has to be performed independently on each of them. Under storage-computing separation, only one node performs compaction, which reduces the amount of compaction work by 2/3.

Therefore, with a local cache, the separated architecture can not only basically match the performance of the integrated architecture but in some cases even exceed it.

Multiple Computing Clusters for Workload Isolation

Users often want to isolate different analysis workloads that run on the same data. For example, import workloads should be isolated from query workloads, and large ad hoc queries should be isolated from online point queries, to avoid resource contention between workloads.

Apache Doris 2.0 provides a resource isolation scheme based on Workload Groups. This is a soft-limit form of isolation that can assign query priorities to specific queries or specific users, but it cannot achieve the true physical isolation that multiple computing clusters provide in the storage-computing separation mode.

The storage-computing separation mode provides isolation through multiple physical computing clusters within the same warehouse. Because the primary data is stored on shared object storage, users can create as many computing clusters as they need, all sharing the same data. Computing clusters are physically isolated from one another and can be scaled out and in independently. The local caches of their computing nodes are isolated as well, which ensures the best possible isolation.
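
A toy model makes the relationship concrete: several computing clusters read the same primary data, while each keeps its own cache. The classes `SharedStorage` and `ComputeCluster`, the cluster names, and the object key are hypothetical and only illustrate the idea.

```python
class SharedStorage:
    """Stands in for the shared object storage holding primary data."""

    def __init__(self):
        self.objects = {}

    def write(self, key, rows):
        self.objects[key] = rows

    def read(self, key):
        return self.objects[key]


class ComputeCluster:
    """A computing cluster with its own private local cache."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.local_cache = {}
        self.remote_reads = 0

    def load(self, key, rows):
        """Import data: primary data goes to shared storage only."""
        self.storage.write(key, rows)

    def query(self, key):
        """Read through this cluster's own cache."""
        if key not in self.local_cache:
            self.remote_reads += 1
            self.local_cache[key] = self.storage.read(key)
        return self.local_cache[key]


storage = SharedStorage()
ingest = ComputeCluster("ingest", storage)  # dedicated to imports
adhoc = ComputeCluster("adhoc", storage)    # dedicated to ad hoc queries

ingest.load("orders/part-0", [10, 20, 30])
total = sum(adhoc.query("orders/part-0"))   # sees data written by "ingest"
adhoc.query("orders/part-0")                # served from adhoc's own cache

print(total, adhoc.remote_reads, ingest.local_cache)
```

The query cluster sees the data written by the import cluster without any copy between clusters, and warming one cluster's cache has no effect on the other's.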

Extreme elastic scaling

The biggest advantage of separating storage and computing is that each can be scaled independently. Data is stored on HDFS or object storage, which can be expanded and shrunk on demand. The computing nodes of each computing cluster can scale elastically and more efficiently, through manual scaling, time-based scaling, and automatic shutdown.
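
Time-based scaling and automatic shutdown boil down to two small decisions, sketched below. The schedule format, the node counts, the idle limit, and both function names are assumptions for illustration and are not SelectDB Cloud or Doris settings.

```python
def target_nodes(hour, schedule, default_nodes):
    """Pick the node count for an hour of the day (0-23).

    schedule is a list of (start_hour, end_hour, nodes) windows,
    with end_hour exclusive; the first matching window wins.
    """
    for start, end, nodes in schedule:
        if start <= hour < end:
            return nodes
    return default_nodes


def should_auto_suspend(idle_minutes, idle_limit_minutes):
    """Shut a cluster down once it has been idle long enough."""
    return idle_minutes >= idle_limit_minutes


# Hypothetical policy: 8 nodes in business hours, 2 overnight, 4 otherwise.
schedule = [(9, 18, 8), (0, 6, 2)]

plan = [target_nodes(h, schedule, default_nodes=4) for h in (3, 10, 20)]

print(plan, should_auto_suspend(idle_minutes=45, idle_limit_minutes=30))
```

Because the computing nodes hold no primary data, applying such a plan only changes how much compute is running; the data on shared storage is untouched.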

Demonstration of storage-computing separation feature

Here we use the existing SelectDB Cloud product as an example to demonstrate the features of the new storage-computing separation mode.

[Animation: Creating a new warehouse on SelectDB Cloud]

[Animation: Multi-cluster demo on SelectDB Cloud]

[Animation: Manual scaling on SelectDB Cloud]

[Animation: Time-based scaling on SelectDB Cloud]

[Animation: Automatic start and stop of a cluster on SelectDB Cloud]

Applicable scenarios of storage-computing separation architecture

The introduction above helps clarify where the storage-computing separation architecture fits. If any of the following conditions applies to you, the storage-computing separation architecture is the better fit:

  • You already use public cloud services, in which case the storage-computing separation architecture is definitely worth trying
  • You have a reliable shared storage system, such as HDFS, Ceph, or object storage
  • You need extreme elastic scaling, need K8s containerization, or need to run on a private cloud
  • You have a dedicated team maintaining the entire company's data warehouse platform

Data Lake Analysis

Note that the concepts of storage, computing, and storage-computing separation mean different things to different technical communities.

Whether in Apache Doris or in Snowflake, storage-computing separation refers to separating the storage and computing modules inside a single system. Users of data lakes and lakehouses want a more thorough separation, in which the computing system and the storage system are two different products: the storage system is exposed to computing systems through a unified open table format, and the computing system can in turn connect openly to different underlying storage systems.

Apache Doris, in both its storage-computing integrated and storage-computing separation architectures, supports the new Lakehouse form that unifies lake and warehouse: it can directly query lake storage and the currently popular open table formats, including Hive, Iceberg, and Hudi. Note that Apache Doris already has fairly complete support for reading from the data lake, including snapshot reads and Time Travel, and will in the future support writing data back to the lake to close the loop of data analysis and circulation.

In addition to integrating with and analyzing data lakes, Apache Doris currently supports direct query and analysis of common relational databases, object storage, and data in formats such as CSV and Parquet on HDFS.

Future plans

Focusing on the separation of storage and computing, the SelectDB technical team will work with the Apache Doris community to promote research and development in the following related directions:

Convergence of Workload Group and Multiple Computing Clusters

The Workload Group under the current storage-computing integrated architecture and the multiple computing clusters of the storage-computing separation architecture both address workload isolation: one is a soft limit and the other is a hard limit. Their current implementations differ in some respects, and we will later consider merging the two to give users a unified experience.

More convenient data import and export with external data lakes

Data in an external data lake can be written incrementally and continuously into internal tables, and data in internal tables can likewise be written incrementally and continuously out in external data lake formats.

With a more convenient way to import external tables into internal tables, Doris can continuously load the latest data lake data and deliver higher computing performance on it.

With a more convenient way to export internal tables to external tables, data in internal tables can be written incrementally into an open external table format. Converting data into an open format serves two purposes: it makes it easier to connect with the wider big data ecosystem, and it dispels enterprises' concerns about being locked into a closed data format.

A shared cache, further decoupled from compute nodes

In the current storage-computing separation mode, the cache lives on the local disks of computing nodes, so computing nodes cannot be truly stateless. Rapid scale-out has to take cache warm-up and balancing into account; rapid scale-in has to take cache invalidation and the transfer of cached data to other nodes into account. In the future, we will implement a shared cache that is decoupled from computing nodes, fully separating computing, caching, and object storage, in order to provide second-level scaling.
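
A deliberately naive sketch shows why node-local caches make scale-out harder. It assumes tablets are assigned to nodes by a simple modulo rule; this rule and the function names are hypothetical, chosen only to keep the example short, and are not how Doris places tablets.

```python
def owner(tablet_id, node_count):
    """Hypothetical placement rule, for illustration only."""
    return tablet_id % node_count


def cold_after_resize(tablet_count, old_nodes, new_nodes):
    """Count tablets that land on a node whose local cache lacks them."""
    return sum(
        1
        for t in range(tablet_count)
        if owner(t, old_nodes) != owner(t, new_nodes)
    )


# Growing a cluster from 3 to 4 nodes with 1200 cached tablets.
cold = cold_after_resize(1200, old_nodes=3, new_nodes=4)

print(cold)
```

Under this naive rule, 900 of the 1200 tablets change owner and must be re-read from shared storage before queries on them are fast again. A cache shared across nodes would avoid that warm-up, which is the motivation for decoupling the cache from the computing nodes.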

Convergence of the storage-computing integrated and storage-computing separation modes

Whether to use the storage-computing integrated or the storage-computing separation architecture currently has to be decided at deployment time, yet for most users there may be a need to convert between the two. We will therefore keep improving the implementation so that the two modes can be converted into each other more easily, and may even gradually merge into a single architecture.

Communication and Trial

The code of the storage-computing separation architecture will be submitted to the Apache Doris community around October 2023 and is currently in an intensive cleanup stage. Users interested in the storage-computing separation architecture can scan the following QR code to join the storage-computing separation technical discussion group and follow the latest progress on open-sourcing the code. We have also opened early trials to some enterprises; if you are interested, please contact [email protected].

Origin my.oschina.net/u/5735652/blog/10091810