Doris (1): Introduction to Doris

1 Introduction to Doris

Apache Doris is a modern analytical database built on MPP (Massively Parallel Processing) technology. Put simply, MPP distributes a task across multiple servers and nodes in parallel; after each node finishes its part of the computation, the partial results are merged into the final result (similar to Hadoop's approach). Queries can return in sub-second time, which makes Doris well suited to real-time data analysis.

Apache Doris can serve a wide variety of analytical needs, such as fixed historical reports, real-time data analysis, interactive data analysis, and exploratory data analysis, making data analysis work easier and more efficient.

MPP (Massively Parallel Processing) means large-scale parallel processing. In a shared-nothing database cluster, each node has its own independent disk storage and memory; business data is partitioned across the nodes according to the database model and application characteristics. The nodes are connected through a dedicated or commodity network, cooperate on computation, and provide database services as a whole. Shared-nothing clusters offer full scalability, high availability, high performance, good cost-effectiveness, and resource sharing.

2 Core Features

  • Analytical database based on an MPP (massively parallel processing) architecture
  • Excellent performance: millisecond/second-level responses on PB-scale data
  • Support for standard SQL, compatible with the MySQL protocol
  • Vectorized execution engine
  • Efficient aggregate table technology
  • New pre-aggregation technology, Rollup (a brief SQL sketch follows this list)
  • High performance, high availability, high reliability
  • Simple operation and maintenance, elastic scaling
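
As a rough illustration of the aggregate table and Rollup features, here is a minimal SQL sketch (table, column, and rollup names are hypothetical; the DDL follows the general shape of Doris's aggregate model):

    CREATE TABLE example_site_visit
    (
        event_day DATE,
        site_id   INT,
        city      VARCHAR(32),
        pv        BIGINT SUM DEFAULT "0"     -- value column, pre-aggregated with SUM
    )
    AGGREGATE KEY(event_day, site_id, city)  -- rows with identical keys are merged on load
    DISTRIBUTED BY HASH(site_id) BUCKETS 10
    PROPERTIES ("replication_num" = "3");

    -- Rollup: an extra pre-aggregated materialization on a coarser set of dimensions
    ALTER TABLE example_site_visit ADD ROLLUP rollup_city(city, pv);

Queries that group only by city can then be answered from the smaller rollup instead of the base table.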

3 Doris Features

Excellent performance

Leading performance on TPC-H and TPC-DS with high cost-effectiveness; high-concurrency queries, with a 100-node cluster reaching up to 100,000 QPS; streaming import at 50 MB/s per node; millisecond-level latency for small-batch imports.

Easy to use

Highly compatible with the MySQL protocol; supports online table structure changes; highly integrated, with no dependency on external storage systems. A brief sketch of an online schema change follows.
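
An online table structure change is an ordinary ALTER statement issued through any MySQL client; a minimal sketch with hypothetical table and column names:

    -- add a column without blocking imports or queries
    ALTER TABLE example_tbl ADD COLUMN city VARCHAR(32) DEFAULT "" AFTER site_id;
    -- watch the progress of the background schema-change job
    SHOW ALTER TABLE COLUMN;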

Strong scalability

Elegant architecture; a single cluster can scale horizontally to more than 200 nodes.

High availability

Multiple data replicas and highly available metadata.

4 Doris Development History

5 Comparison of Open Source OLAP Engines

5.1 OLTP and OLAP

OLTP

  • Operational processing, known as OLTP (On-Line Transaction Processing), mainly targets data processing: the day-to-day operations a specific business performs on the database, usually querying or modifying a small number of records.
  • Users care most about operation response time, data security, data integrity, and the number of concurrent users that can be supported.
  • Traditional relational database systems (RDBMS), as the main means of data management, are used chiefly for operational processing.

OLTP queries generally access only a small number of records and usually go through an index, for example the common primary-key-based CRUD operations; a generic sketch follows.
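
A hedged sketch of typical OLTP access against a transactional database (hypothetical orders table; this is generic RDBMS SQL, not Doris-specific):

    -- index-backed point lookup and update, each touching one row
    SELECT * FROM orders WHERE order_id = 10001;
    UPDATE orders SET status = 'PAID' WHERE order_id = 10001;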

OLAP

  • Analytical processing, known as OLAP (On-Line Analytical Processing), mainly targets data analysis.
  • It typically runs complex multi-dimensional analysis over historical data on certain subjects to support management decision-making.
  • A data warehouse is the typical OLAP system, used mainly for data analysis.

OLAP queries generally scan a large amount of data while accessing only some of the columns, and aggregations (SUM, COUNT, MAX, MIN, etc.) are requested far more often than the raw detail rows.
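
By contrast, a typical OLAP query scans many rows but touches only a few columns and aggregates them (same hypothetical orders table):

    -- scan a quarter's worth of rows, read two columns, aggregate
    SELECT city, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
    FROM orders
    WHERE dt BETWEEN '2023-01-01' AND '2023-03-31'
    GROUP BY city;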

HTAP

HTAP stands for Hybrid Transactional/Analytical Processing.

Built on an innovative compute and storage framework, an HTAP database can support both business-system operations and OLAP scenarios on a single copy of the data, avoiding the large amount of data movement between online and offline databases in traditional architectures. HTAP databases are also built on distributed architectures, support elastic scaling, can expand throughput or storage on demand, and easily handle high-concurrency, massive-data scenarios.

At present few databases implement HTAP; the main ones are PingCAP's TiDB, Aliyun's HybridDB for MySQL, and Baidu's BaikalDB. Among them, TiDB was the first open-source distributed HTAP database in China.

5.2 OLAP Engine Classification

OLAP engines are divided into MOLAP (Multi-dimensional OLAP), ROLAP (Relational OLAP), and HOLAP (Hybrid OLAP) according to the format in which the data is stored.

MOLAP: a storage model based on multidimensional arrays, and the original form of OLAP. Its hallmark is precomputing data, trading space for efficiency, and storing both detail and aggregated data in cubes; generating a cube, however, takes a lot of time and space. Open-source MOLAP choices include Kylin and Druid.

Through precomputation, MOLAP provides stable, pre-sliced data that is computed once and queried many times, which reduces computation pressure at query time and keeps query performance stable; it is the classic path of trading space for time. Such engines also implement Bitmap-based deduplication, supporting efficient real-time distinct-count metrics across different dimensions.

ROLAP: stores data entirely in the relational model, requires no precomputation, and answers queries on demand in real time. Both detail and summary data are stored in relational fact tables.

ROLAP relies on real-time, massively parallel computing, so it places higher demands on the cluster. The core of an MPP engine is to spread data across nodes so that CPU, I/O, and memory resources are all distributed, improving parallel computing capability. With disks still the dominant storage medium, the heavy disk I/O of data scans and the high CPU usage brought by parallelism remain the resource bottlenecks, so the concurrency of frequent, large-scale aggregations is a real challenge that depends on the parallel computing capability of the cluster hardware. Traditional deduplication algorithms also require substantial computing resources, and real-time, large-scale distinct-count metrics put enormous pressure on CPU and memory. The latest versions of Doris support a Bitmap algorithm that, combined with precomputation, handles deduplication scenarios well; a hedged sketch follows.
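
A hedged sketch of the Bitmap approach in Doris (hypothetical table and column names; TO_BITMAP, BITMAP_UNION, and BITMAP_UNION_COUNT are the relevant Doris functions and aggregate type):

    CREATE TABLE uv_per_day
    (
        dt       DATE,
        user_ids BITMAP BITMAP_UNION         -- bitmaps of user ids, merged on load
    )
    AGGREGATE KEY(dt)
    DISTRIBUTED BY HASH(dt) BUCKETS 8;

    -- ingest: convert raw ids into bitmaps (detail_events is hypothetical)
    INSERT INTO uv_per_day SELECT dt, TO_BITMAP(user_id) FROM detail_events;

    -- query: exact distinct counts from the pre-aggregated bitmaps
    SELECT dt, BITMAP_UNION_COUNT(user_ids) AS uv FROM uv_per_day GROUP BY dt;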

HOLAP: a hybrid model in which detail data is stored the ROLAP way and aggregated data the MOLAP way. This approach is relatively flexible and more efficient.

Among them, Doris is a ROLAP engine and can meet the following requirements:

  • Flexible multidimensional analysis
  • Detail + aggregation
  • Primary-key updates (a Unique-key sketch follows this list)
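
For the primary-key-update requirement, Doris's Unique key model replaces old rows on import; a minimal sketch with hypothetical names:

    CREATE TABLE user_profile
    (
        user_id BIGINT,
        name    VARCHAR(64),
        city    VARCHAR(32)
    )
    UNIQUE KEY(user_id)          -- a re-imported row with an existing key overwrites the old one
    DISTRIBUTED BY HASH(user_id) BUCKETS 8;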

Compared to other OLAP systems

  • Disadvantages of the MOLAP mode (taking Kylin as an example)
    1. The application-layer model is complex: a great deal of model preprocessing is needed to satisfy both business requirements and Kylin's production requirements, so model reuse across different business scenarios is relatively low.
    2. Since MOLAP does not support querying detail data, in "summary + detail" scenarios the detail data must be synchronized to a DBMS engine to serve interactive queries, which adds production operation and maintenance cost.
    3. More preprocessing comes with higher production cost.
  • Advantages of the ROLAP mode
    1. Application-layer model design is simplified, and data can be fixed at a stable granularity; for example, a star schema at merchant granularity has a relatively high reuse rate.
    2. The business logic of the application layer can be encapsulated in views, which reduces data redundancy, improves application flexibility, and lowers operation and maintenance cost.
    3. "Summary + detail" is supported at the same time.
    4. The model is lightweight and standardized, which greatly reduces production cost.

To sum up, in scenarios with changing dimensions, non-preset dimensions, and fine-grained statistics, the ROLAP mode driven by an MPP engine simplifies model design, removes the cost of precomputation, and, with strong real-time computing power, supports a good real-time interactive experience.

Summary

  • ClickHouse has a good data compression ratio
  • ClickHouse has a huge performance advantage on single-table queries
  • Join queries are a mixed picture: ClickHouse does better on smaller data volumes, Doris on larger ones
  • Doris has better SQL support

6 Usage Scenarios

The usage-scenario diagram (not reproduced here) shows the data sources Doris receives, its overall modules, and the final visual presentation; a more detailed diagram later introduces the complete flow from sources to output.

In general, users' raw data, such as logs or data from transactional databases, is processed by streaming systems or offline jobs and then imported into Doris, where it serves upper-layer reporting tools and data analysts.

7 Architecture System

7.1 Terminology

7.2 Overall Architecture

Doris mainly integrates the technology of Google Mesa (data model), Apache Impala (MPP query engine), and Apache ORCFile (storage format, encoding, and compression).

Why integrate these three technologies?

  1. Mesa satisfies many of Doris's storage requirements, but Mesa itself does not provide a SQL query engine.
  2. Impala is a very good MPP SQL query engine, but it lacks a complete distributed storage engine.
  3. Self-developed columnar storage: the storage layer manages data under one or more storage_root_path directories; below each storage directory, data is organized by bucket, and each tablet is stored inside its bucket directory in a subdirectory named after its tablet_id.

A combination of these three technologies was therefore chosen.

The system architecture of Doris is as follows; it is mainly divided into two components, FE and BE:

The structure of Doris is very simple. Because it uses the MySQL protocol, users can access Doris directly through any MySQL ODBC/JDBC driver or MySQL client. Doris has only two roles and two processes, FE (Frontend) and BE (Backend), and does not depend on external components, which makes deployment, operation, and maintenance easy.

  • FE: Frontend, the front-end node of Doris, mainly responsible for receiving and returning client requests, metadata and cluster management, query plan generation, and so on.
  • BE: Backend, the back-end node of Doris, mainly responsible for data storage and management, query plan execution, and so on.
  • Both FE and BE can be scaled out linearly (a connection sketch follows this list)
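
Because Doris speaks the MySQL protocol, a stock MySQL client is enough to inspect both kinds of nodes; a minimal sketch, assuming the default FE query port 9030:

    -- connect first with any MySQL client, e.g.:  mysql -h <fe_host> -P 9030 -uroot
    SHOW FRONTENDS;   -- lists FE nodes with their role (Follower/Observer) and health
    SHOW BACKENDS;    -- lists BE nodes with capacity and tablet counts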

FE has two main roles: follower and observer. Multiple followers form an election group and elect a master (the master is just a special follower). The master and followers chiefly provide high availability of metadata, ensuring that when a single node goes down, the metadata can be recovered online in real time without affecting the service as a whole.

Observer nodes only synchronize metadata from the leader and do not take part in elections; they can be scaled out to provide extra capacity for metadata reads.

Data reliability is guaranteed by BE, which stores multiple replicas of all data, three by default; the replica count can be adjusted dynamically on demand.

7.3 Metadata Architecture

Doris adopts the Paxos protocol together with a Memory + Checkpoint + Journal mechanism to ensure high performance and high reliability of metadata. Every metadata update follows these steps:

  • First, the update is written to the journal (log file) on disk
  • Then it is applied to memory
  • Finally, it is checkpointed to local disk periodically

This amounts to a pure in-memory structure, meaning all metadata is cached in memory, so that FE can restore its metadata quickly after a crash without losing any of it.

The leader, followers, and observers together form a reliable metadata service. In practice, one leader and two followers are usually deployed; that is, three nodes provide a highly available service that tolerates a single-node failure. Three is generally enough: FE stores only a single copy of the metadata and is not under heavy pressure, so extra FE nodes mainly consume machine resources. In most cases, three FE nodes are sufficient for a highly available metadata service.

7.4 Data Distribution

  • Viewed from the table level, a user's Table is split into multiple Tablets, and each Tablet is stored in multiple replicas on different BEs, which guarantees high availability and reliability of the data.
  • Data is stored mainly on BE nodes, whose physical reliability comes from multi-replica storage: three replicas by default, configurable and dynamically adjustable at any time to match the availability level the business requires. FE schedules replica placement and repair on the BEs (a sketch follows this list).
  • Users who care less about availability and are more sensitive to resource consumption can create tables with two replicas or one. For example, some users on Baidu Cloud are sensitive to total resource consumption because they pay for it, so they may create two replicas. A single replica is generally not recommended, because once a machine fails the data is lost outright and is hard to recover. The default of three replicas basically ensures that a single-node failure does not affect normal service.
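
A minimal sketch of setting and later adjusting the replica count (hypothetical table and column names; "replication_num" is the relevant table property):

    CREATE TABLE visit_stats
    (
        event_day DATE,
        site_id   INT,
        pv        BIGINT SUM DEFAULT "0"
    )
    AGGREGATE KEY(event_day, site_id)
    DISTRIBUTED BY HASH(site_id) BUCKETS 10
    PROPERTIES ("replication_num" = "2");  -- two replicas: lower cost, weaker fault tolerance

    -- raise the replica count later, across all partitions
    ALTER TABLE visit_stats MODIFY PARTITION (*) SET ("replication_num" = "3");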

7.5 MPP Architecture

SELECT k1, SUM(v1) FROM A, B WHERE A.k2 = B.k2 GROUP BY k1 ORDER BY SUM(v1);

This statement involves several operations: a join, aggregation, and sorting. To execute the plan, MPP splits it into multiple fragments, distributes them to the machines for execution, and finally merges the results. With 10 machines, this execution model can improve query performance roughly tenfold on large data volumes; a hedged EXPLAIN sketch follows.
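
To see how the planner splits this statement, the query can be prefixed with EXPLAIN (the exact fragment layout varies with version and table schema):

    EXPLAIN
    SELECT k1, SUM(v1) FROM A, B WHERE A.k2 = B.k2
    GROUP BY k1 ORDER BY SUM(v1);
    -- the output shows the plan fragments (scan, join, aggregate, sort)
    -- and the exchange nodes that ship partial results between BEs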

Origin blog.csdn.net/u013938578/article/details/130069522