ClickHouse Technology Research and Grammar Introduction | JD Cloud Technical Team

This article presents research on ClickHouse architecture principles, syntax, and performance characteristics, compares it horizontally with MySQL, Elasticsearch, and TiDB, and focuses on the syntactic differences from MySQL.

1 Basic concepts

ClickHouse is a columnar database management system (DBMS) for online analytical processing (OLAP).

1.1 Cluster Architecture

ClickHouse adopts a typical grouped (sharded) distributed architecture; the cluster architecture is shown in the following figure:

  • Shard: The cluster is divided into multiple shards, or groups (Shard 0 ... Shard N). Linear expansion of shards supports distributed storage and computation over massive data.
  • Node: Each shard contains a number of nodes (each node is a process). Nodes within the same shard are replicas of each other, ensuring data reliability. The number of replicas can be chosen as needed, and logically different shards may have different replica counts.
  • ZooKeeper Service: All nodes in the cluster are peers; distributed coordination between nodes is performed through the ZooKeeper service.

1.2 Data partition

ClickHouse is a distributed system, and creating a data table differs from MySQL; it is comparable to sharding databases and tables (sub-database/sub-table) on top of MySQL.

ClickHouse first creates a local table (i.e., a shard replica) on every node of every shard; the local table is visible only on its own node. It then creates a distributed table (Distributed) that maps onto the local tables created earlier.

When a user queries the distributed table, ClickHouse automatically forwards the request to the corresponding local tables based on the cluster topology.

1.3 Column storage

A relational database (RDBMS), by contrast, stores data by row. Taking MySQL's InnoDB primary key index as an example: in the B+ tree built for the primary key index, each leaf node stores an entire row.

A columnar database stores a table column by column, so "a single disk I/O retrieves the data of one column".

Advantages of columnar storage:
A query reads only the columns involved, which greatly reduces the amount of I/O. ClickHouse also stores data in a specified order, so it only needs to scan the columns referenced by the WHERE condition and merge the per-column scan results to find the rows that satisfy the condition.
The downside is that inserts arrive row by row, so the storage (write) path is more cumbersome.

The difference when querying:

  • Column storage: only the necessary columns (those in the SELECT list and WHERE clause) are read from the storage system; unused columns are never read, so it is very fast.
  • Row storage: all row data that satisfies the condition is read from the storage system, and the required fields are then filtered out in memory, which is comparatively slow.

1.4 Data sorting

Inside each data partition, the data of all columns is sorted by the sort key (the ORDER BY columns).
It can be understood as follows: the original rows that make up the partition are first sorted by the sort key and then split up and stored column by column.

1.5 Data Blocking

Within each column's data file, the data is actually stored in blocks, which facilitates data compression and query pruning. The number of records in each block does not exceed index_granularity (default 8192); when index_granularity rows are reached, a new block is started.
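
As a rough illustration of how the sort key and index_granularity appear in a table definition (a hedged sketch; the table and column names below are made up for illustration, not taken from the original examples):

CREATE TABLE visits_local
(
    EventDate Date,
    UserID    UInt64,
    URL       String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, UserID)        -- rows inside each data part are sorted by this key
SETTINGS index_granularity = 8192;  -- one block / index mark per 8192 rows (the default)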

1.6 Vectorized execution

On top of columnar storage, ClickHouse implements a vectorized computing engine, and a large share of processing operations are executed in vectorized form.

Vectorized computing engine:
On top of the columnar storage model, it adds a batch-processing mode and uses the SIMD instruction set, reducing the number of function calls and hardware overhead (such as misses in the CPU caches at all levels) and improving multi-core CPU utilization.
Combined with the distributed architecture (multiple machines, multiple nodes, multiple threads, batched data operations), it can maximize the use of hardware resources and improve efficiency.

Note: SIMD (single instruction, multiple data) means that multiple data elements can be processed in the same instruction cycle (for example, comparing multiple data elements can be completed in one instruction cycle).

1.7 Encoding compression

Because ClickHouse uses columnar storage, data of the same column is stored contiguously, and the underlying data is sorted when it is stored, so the data has strong local regularity, which yields a high compression ratio.
A high compression ratio in turn reduces storage read overhead and improves the effectiveness of the system caches, thereby improving query performance.

1.8 Index

Columnar storage, described above, prunes reads of unnecessary columns;
indexes prune reads of unnecessary records (reducing the I/O spent on rows that will not match).

Simple explanation:
Take the primary key index as an example. When ClickHouse stores data, it sorts rows by the columns specified in the sort key (ORDER BY) and splits them into blocks according to the index_granularity parameter. It then extracts the first row of each block and organizes these rows into a sparse, sorted index.
Similar to a B+ tree lookup, if the WHERE condition contains primary key columns, rows can be filtered quickly through the sparse index. Sparse indexes are particularly efficient for range lookups.

Secondary (data-skipping) indexes include minmax, set, and the bloom-filter based ngrambf/tokenbf types.
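
As a hedged sketch of how a skip (secondary) index is declared, reusing the hypothetical visits_local table sketched above (index names and parameters are only illustrative):

ALTER TABLE visits_local ADD INDEX idx_url_len length(URL) TYPE minmax GRANULARITY 4;
ALTER TABLE visits_local ADD INDEX idx_url_token URL TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- a WHERE clause on the indexed expression lets ClickHouse skip whole ranges of granules
SELECT count() FROM visits_local WHERE length(URL) > 100;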

1.9 Applicable scenarios

There are two typical directions in the field of OLAP analysis:

  • ROLAP improves query performance through techniques such as columnar storage and indexing.
    It suits wide-table and large-table scenarios where the WHERE conditions are numerous and dynamic and MySQL cannot build an index on every column.
  • MOLAP pre-computes aggregated result data to reduce the amount of data read at query time; it trades computation for query performance.
    It suits complex report queries where aggregation and filtering are very complex.

Since it is OLAP analysis, there are some basic requirements for the use of data:

  • The vast majority of accesses are reads.
  • Data is not updated, or is updated only in large batches (more than 1000 rows); ClickHouse has no fast, low-latency way to update or delete rows.
  • Queries read as few columns as possible, but many rows.
  • Transactions are not required and can be avoided (ClickHouse does not support transactions).
  • Data consistency requirements are low.
  • When joining multiple tables, only one of them is large, and the large table is joined with small tables.
  • Single-table queries and aggregations are the most efficient; it is recommended to flatten the data into a wide table.

2 Horizontal comparison

The storage system must support queries and aggregated analysis over more than a billion rows. Among the widely used middleware that can read and write data at this scale, the relatively lightweight products that support similar scenarios are ClickHouse, Elasticsearch, and TiDB.

2.1 Comparison between clickhouse and ElasticSearch

The Elastic ecosystem is very rich. ES, as its storage product, has a development history of more than ten years since its first version and mainly solves search problems. Its underlying storage uses Lucene, which provides row storage, column storage, and inverted indexes. With sharding and replica mechanisms, it solves search performance and high availability within a cluster.

Advantages of es:

  • Supports real-time updates, with more complete support for update and delete operations.
  • Data sharding is more uniform and cluster scaling is more convenient.

Limitations of es:

  • Once the data volume reaches tens of millions or hundreds of millions of rows, aggregating over too many columns hits a performance bottleneck;
  • Deep secondary aggregation is not supported, so some complex aggregation requirements have to be implemented manually in external code, which adds a lot of development work.

Like Elasticsearch (for sorting and aggregation queries), ClickHouse adopts a columnar storage structure and supports shards and replicas. The difference is that ClickHouse has some unique implementations at the lower layers, as follows:

  • The merge tree table engine family (MergeTree) provides data partitions, primary indexes, and secondary indexes.
  • The vector engine: data is not only stored by column but also processed in vectors (chunks of columns), which uses the CPU more efficiently.

Online information: performance comparison of aggregation query

ES may run into OOM problems when processing large queries. Although the cluster can automatically recover from failed nodes, the query data volume it can handle does not meet the requirements of the warehouse-moving system.

2.2 Comparison between clickhouse and TiDB

TiDB is a distributed NewSQL database. It supports horizontal elastic expansion, ACID transactions, standard SQL, MySQL syntax, and MySQL protocol, and has high-availability features with strong data consistency. It is a hybrid database that is not only suitable for OLTP scenarios but also for OLAP scenarios.

Advantages of TiDB:

  • Compatible with the MySQL protocol and most MySQL syntax; in most cases users can migrate from MySQL to TiDB without modifying a single line of code.
  • High availability and strong consistency (Raft).
  • Supports ACID transactions and secondary indexes, and is suitable for fast point inserts, point updates, and point deletes.

Limitations of TiDB:

  • Better at OLTP than OLAP.
  • Performance depends on hardware and cluster scale; single-machine read and write performance is not good enough.

TiDB is better suited as a replacement for MySQL. Its compatibility with MySQL keeps application switching costs low, and the automatic data sharding provided by TiDB requires no manual maintenance.

3 Why clickhouse

Our project scenario is synchronizing more than a billion rows of single-table data every day, with millions of basic business queries as well as complex aggregate analysis. ClickHouse is very good at querying and analyzing massive single-table data, so ClickHouse was chosen.

3.1 Clickhouse read and write performance verification

The official public benchmark shows a write throughput of 50 MB/s to 200 MB/s. Estimated at 100 bytes per row, that is roughly 500,000 to 2,000,000 rows written per second.

The following is a simple test of ClickHouse read and write performance; the larger the data volume, the more obvious the gap.
1) JDBC single table, single write performance test (better performance):

2) Mybatis single table, single write performance test:

Aggregation query performance example: the figure below shows the performance of an aggregation query from the warehouse-moving system at different data volumes in ClickHouse. Executed in MySQL, the same query takes minutes once the data volume is around one million rows.

1) Aggregation in count+distinct mode:

2) Aggregation in group by mode:
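
The benchmark figures themselves are not reproduced here. As a hedged sketch of what the two query shapes look like (the table and column names below are hypothetical, not the real warehouse-moving schema):

-- 1) count + distinct style aggregation
SELECT count(DISTINCT sku_id) AS sku_cnt
FROM stock_move_distributed
WHERE warehouse_no = '001' AND create_time >= '2023-01-01 00:00:00';

-- 2) group by style aggregation
SELECT warehouse_no, count() AS move_cnt, sum(qty) AS total_qty
FROM stock_move_distributed
WHERE create_time >= '2023-01-01 00:00:00'
GROUP BY warehouse_no;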

3.2 Weaknesses

A distributed system usually includes three important components: 1. the storage engine; 2. the compute engine; 3. the distributed control layer.
In the distributed control layer, ClickHouse is relatively weak, which results in high operation and usage costs.

  • Distributed tables, local tables, and replicas must all be defined and maintained by users themselves, and a lot of related material must be learned before use.
  • Elastic scaling: although ClickHouse can add nodes horizontally, it does not support automatic data rebalancing. When the cluster expands, data must be manually rewritten into the new shards, or the balance of storage pressure must rely on data expiration.
  • Fault recovery: when a node fails, ClickHouse cannot use other machines to rebuild the missing replica data. Only after the user replaces the failed node will the data be synchronized automatically within the replica set.

In this regard, since we directly use JD Cloud instances, we can save a lot of work.

As a compute engine, ClickHouse needs manual optimization to achieve significant performance improvements for multi-table joins and complex nested subqueries. For real-time writes, ClickHouse is not suited to scattered small inserts: it has no memory table (memtable) structure, so each write batch goes directly to disk, and writing single records in real time produces a large number of small files at the storage layer, affecting query performance.

Writing in single large batches is recommended, e.g. in report-library scenarios, to reduce the probability of generating small files.

Writing to local tables in cluster mode requires custom sharding rules; otherwise random writes will make the data uneven.
Writing through the distributed table consumes a lot of network bandwidth and resources.

Usage recommendations, from the perspective of data volume growth:

  • If you estimate that your business data volume is small (fewer than a million new rows per day), you can write either the distributed table or the local tables. Note that if you choose to write local tables, make sure each write creates a new connection and that each connection writes roughly the same amount of data, so the data stays balanced.
  • If you estimate that your business data volume is large (more than a million new rows per day, with insert concurrency greater than 10), write to the local tables.
  • It is recommended to insert about 500,000 rows at a time, and no more than 1,000,000 rows. In short, ClickHouse does not want small transactions the way MySQL does: for 1,000,000 rows, MySQL would recommend inserting about 1,000 rows per small transaction and executing 1,000 times, whereas ClickHouse prefers a few large batches. This is determined by how the MergeTree engine works: frequent small inserts produce too many data parts, which cannot be merged in time.
4 Table engines

ClickHouse provides four main families of table engines:

  • MergeTree series: designed for inserting extremely large amounts of data into a table. Data is written quickly as data parts, and the parts are merged in the background according to certain rules. This strategy is much more efficient than continually rewriting the stored data on insert.
  • Log series: relatively simple in functionality; mainly used for quickly writing small tables (around 1 million rows) and then reading them back in full.
  • Integration series: mainly used to import external data into ClickHouse, or to operate directly on external data sources from within ClickHouse.
  • Special series: mostly customized for specific scenarios. The Distributed engine mentioned above belongs to this series.

4.1 MergeTree table engine

It is mainly used for massive data analysis and supports data partitioning, ordered storage, primary key indexes, sparse indexes, data TTL, and more. MergeTree supports all ClickHouse SQL syntax, but some behavior differs from MySQL; for example, the primary key in MergeTree is not used for deduplication.

First look at a simple syntax for creating a table:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    ...
) ENGINE = MergeTree()
[PARTITION BY expr]  -- data partitioning rule
[ORDER BY expr] -- sort key
[SAMPLE BY expr] -- sampling key
[SETTINGS index_granularity = 8192, ...] -- extra settings

Ignoring the column definitions for now, note the differences compared with MySQL table creation: the cluster, the partitioning rule, the sort key, and the sampling key are all specified.

Data partitioning: inside each shard replica, data is partitioned according to the PARTITION BY columns, and each partition is managed as a directory. The example table in this article is partitioned by time.

Based on the MergeTree table engine, ClickHouse derives many further table engines for special scenarios. Several commonly used ones are introduced below.

4.1.1 ReplacingMergeTree engine

This engine differs from MergeTree in that it removes duplicate rows that have the same sort key value (ORDER BY).
Official table creation statement:

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = ReplacingMergeTree([ver]) 
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]

Note: compared with MergeTree, there is one extra engine parameter, the version column ver: ENGINE = ReplacingMergeTree([ver]).
When merging data, ReplacingMergeTree selects one row from all rows with the same sort key:

  • If the ver column is not specified, the last entry is kept.
  • If the ver column is specified, keep the version with the highest ver value.

The ReplacingMergeTree engine does not necessarily deduplicate immediately after data is written, and it may not deduplicate completely (the official description is that merges happen in the background, within roughly 10 to 15 minutes).
Deduplication relies on the sort key, but the engine splits data into partitions according to the partition key, so rows with the same sort key may end up in different partitions, and rows on different shards may never be deduplicated.

In the figure, the file blocks within partition 1 will be merged and deduplicated, but the data in partition 1 and partition 2 will not be deduplicated against each other. Therefore, if you want to guarantee that the data is eventually deduplicated, you must ensure that rows with the same sort key are written to the same partition.
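
As a hedged sketch of this point (the table below is hypothetical): deriving the partition key from the sort key guarantees that duplicates land in the same partition, and OPTIMIZE / FINAL can be used to observe the deduplicated result before background merges run:

CREATE TABLE order_local
(
    order_no    String,
    status      UInt8,
    update_time DateTime
) ENGINE = ReplacingMergeTree(update_time)  -- ver column: the row with the largest update_time is kept
PARTITION BY intHash32(order_no) % 16       -- partition derived from the sort key, so duplicates share a partition
ORDER BY order_no;

OPTIMIZE TABLE order_local FINAL;                           -- force a merge (expensive; use sparingly)
SELECT * FROM order_local FINAL WHERE order_no = 'SO-001';  -- or deduplicate at query time with FINAL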

Data verification
The following figure shows the ReplacingMergeTree engine, which uses the date as the partition key, and performs a deduplication test for duplicate primary key data:

4.1.2 CollapsingMergeTree engine

This engine requires a sign column, Sign, to be specified in the CREATE TABLE statement. Rows are divided into two types according to the value of Sign: rows with Sign = 1 are state rows, and rows with Sign = -1 are cancel rows. Each time a state needs to be added, a state row is written; when a state needs to be deleted, a cancel row is written.
Usage notes:

  1. Because of the ClickHouse architecture, the merge/collapse operations run independently in the background at times that cannot be controlled, so it is impossible to predict when collapsing will be completed.
  2. If state rows and cancel rows are inserted out of order, collapsing will not work correctly.
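
A hedged sketch of the Sign mechanism (hypothetical table; because collapsing happens at an unpredictable time, the query aggregates with Sign instead of relying on the background merge):

CREATE TABLE uv_state
(
    user_id    UInt64,
    page_views UInt32,
    Sign       Int8
) ENGINE = CollapsingMergeTree(Sign)
ORDER BY user_id;

INSERT INTO uv_state VALUES (1, 5, 1);             -- state row
INSERT INTO uv_state VALUES (1, 5, -1), (1, 8, 1); -- cancel the old state, write the new one

SELECT user_id, sum(page_views * Sign) AS page_views
FROM uv_state
GROUP BY user_id
HAVING sum(Sign) > 0;                              -- correct result even before rows are collapsed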

4.1.3 VersionedCollapsingMergeTree table engine

To solve the problem that CollapsingMergeTree cannot collapse correctly when rows are written out of order, the VersionedCollapsingMergeTree table engine adds a Version column in the CREATE TABLE statement, which records the correspondence between state rows and cancel rows when they arrive out of order.
Rows with the same primary key, the same Version, and opposite Sign cancel each other out and are deleted during compaction.
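
A hedged sketch (hypothetical table): the engine takes both the sign column and the version column as parameters, and the Version value pairs a cancel row with its state row even when they arrive out of order:

CREATE TABLE uv_state_v
(
    user_id    UInt64,
    page_views UInt32,
    Sign       Int8,
    Version    UInt8
) ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY user_id;

INSERT INTO uv_state_v VALUES (1, 5, -1, 1);              -- cancel row for version 1 arrives first
INSERT INTO uv_state_v VALUES (1, 5, 1, 1), (1, 8, 1, 2); -- state row for version 1, plus the new version 2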

4.2 Data copy

Data replicas are discussed here, under table engines, because only tables in the MergeTree family support replicas:

  • ReplicatedMergeTree
  • ReplicatedSummingMergeTree
  • ReplicatedReplacingMergeTree
  • ReplicatedAggregatingMergeTree
  • ReplicatedCollapsingMergeTree
  • ReplicatedVersionedCollapsingMergetree
  • ReplicatedGraphiteMergeTree

Replication works at the level of individual tables, not the entire server, so a server can contain both replicated and non-replicated tables at the same time.
Replication does not depend on sharding; each shard has its own independent replication.
To use replicas, the address of a ZooKeeper cluster must be set in the configuration file. (The ClickHouse provided by JD Cloud is already configured, and we can use it directly.)
<zookeeper>
    <node index="1">
        <host>example1</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>example2</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>example3</host>
        <port>2181</port>
    </node>
</zookeeper>

Creating a data replica is controlled through the table engine's parameters. Syntax example:

CREATE TABLE table_name
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)ENGINE=ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}')  -- here: ZooKeeper path and replica name
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)

To define a data replica, you only need to prefix the table engine name above with Replicated.
In the example above, the underlying table engine is MergeTree, replication is enabled by the Replicated keyword, and there are two required parameters:

  • zoo_path — The path to this table in ZooKeeper.
  • replica_name — the name of the replica of this table in ZooKeeper

The values in the example use the macros {layer}, {shard}, and {replica}. Their values are taken from the configuration file and determine the granularity of the generated replicas.

<macros>
    <layer>05</layer>
    <shard>02</shard>
    <replica>example05-02-1.yandex.ru</replica>
</macros>

4.3 Special Series

Table engines of the Special series are mostly customized for specific scenarios.

  • Memory: stores data in memory; data is lost after a restart. Query performance is excellent, making it suitable for small tables (below roughly 100 million rows) that do not need persistence. In ClickHouse it is usually used as a temporary table;
  • Buffer: sets up a memory buffer in front of a target table; when the buffer reaches certain conditions it is flushed to disk;
  • File: uses a local file directly as the data;
  • Null: written data is discarded, and reads return nothing;
  • Distributed: the distributed engine, which can run distributed queries across multiple servers.

4.3.1 Distributed engine

The Distributed table engine itself does not store data and occupies no storage space. Columns must be specified when it is defined, and its structure must match the local table it maps to. It can be used to query all shards of a *MergeTree table as one, analogous to the logical table in application-level sharding.
For example, the warehouse-moving system combines ReplicatedReplacingMergeTree with Distributed: write operations go to the local tables, and read operations go through the distributed table.

CREATE TABLE IF NOT EXISTS {distributed_table} as {local_table}
ENGINE = Distributed({cluster}, '{local_database}', '{local_table}', rand())

Description:

  • distributed_table: the table name of the distributed table
  • local_table: local table name
  • as local_table: keeps the structure of the distributed table consistent with the local table. Alternatively, the structure can be defined here explicitly with (column dataType).
  • cluster: cluster name

Precautions:

  • The distributed table itself does not store data; it only provides a framework for distributed access to the data. When a distributed table is queried, ClickHouse automatically queries the corresponding local table on each shard, aggregates the results, and returns them.
  • Note the AS {local_table} clause, which indicates the local table backing the distributed table (the local tables store the data).
  • The last parameter of the Distributed engine, rand() here, can be configured to control how rows are distributed across shards.
  • You can write data directly to the distributed table; ClickHouse will distribute the rows according to the sharding expression mentioned above and keep them balanced, and the data is actually written to the local tables.
  • You can also implement the sharding algorithm yourself and write data to the local tables directly (in the online scenario with hundreds of billions of rows written per day, writes should go directly to the local tables for performance reasons). A sketch of both write paths follows this list.
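
A hedged sketch of the two write paths (reusing the placeholder names above; the columns are hypothetical):

-- 1) write through the distributed table: ClickHouse routes each row to a shard using the sharding key (rand() above)
INSERT INTO distributed_table (id, name) VALUES (1, 'a'), (2, 'b');

-- 2) write directly to the local table of a chosen shard, with the sharding decided by the application
INSERT INTO local_table (id, name) VALUES (1, 'a');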

4.4 Log series

The Log series table engines are relatively simple in functionality and are mainly used for scenarios where a small table (around 1 million rows) is written quickly and then read back in full.
The log table engines have the following characteristics in common:

  • Data is appended sequentially to disk;
  • delete and update are not supported;
  • Indexes are not supported;
  • Atomic writes are not supported;
  • insert will block the select operation.

The difference between them is:

  • TinyLog: does not support concurrent reads of data files, and query performance is poor; the format is simple, making it suitable for temporarily storing intermediate data;
  • StripeLog: supports concurrent reads of data files, with better query performance than TinyLog; all columns are stored in the same large file, reducing the number of files;
  • Log: supports concurrent reads of data files, with better query performance than TinyLog; each column is stored in its own file.

4.5 Integration series

The Integration series table engines are mainly used to import external data into ClickHouse, or to operate directly on external data sources from within ClickHouse.

  • Kafka: imports data from a Kafka topic directly into ClickHouse;
  • MySQL: uses MySQL as the storage engine, so SELECT and similar operations can be run on a MySQL table directly from ClickHouse (useful, for example, when you need to join against MySQL data but do not want to import it into ClickHouse);
  • JDBC/ODBC: reads from a data source specified by a JDBC or ODBC connection string;
  • HDFS: reads data files of a specific format directly from HDFS.
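
As a hedged sketch of the MySQL table engine (the connection parameters and columns below are placeholders):

CREATE TABLE mysql_orders
(
    id     UInt32,
    amount Decimal(18, 2)
) ENGINE = MySQL('mysql-host:3306', 'shop_db', 'orders', 'user', 'password');

-- the query is pushed down to MySQL, so the data can be joined in ClickHouse without importing it
SELECT id, amount FROM mysql_orders WHERE id < 100;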

5 Data types

The data types supported by clickhouse are as shown in the figure below, which are divided into basic types, composite types, and special types.

5.1 Comparison between CK and Mysql data types

6 SQL syntax - common introduction

6.1 DDL

6.1.1 Create a database:

CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster];

If the IF NOT EXISTS clause is present, the statement does not create the database and returns no error when the database already exists.
The ON CLUSTER clause specifies the cluster name; it must be specified in a cluster environment, otherwise the database is created only on the node you are connected to.

6.1.2 Create a local table:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ON CLUSTER cluster
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...];

Option description:

  • db: the database name; if the statement does not specify db, the currently selected database is used by default.
  • cluster: the cluster name, currently fixed as default. ON CLUSTER creates the local table on every node.
  • type: the column data type, such as UInt32.
  • DEFAULT: the default value of the column. If the column is omitted from an INSERT, its value is computed from the expression and filled in (consistent with MySQL).
  • MATERIALIZED: a materialized column expression, meaning the column cannot be written by INSERT but is always computed. It does not need to be listed in INSERT statements, it is not included in the result set of SELECT *, and it must be listed explicitly to be queried (similar to a virtual column).
  • ALIAS: an alias column. Such columns are not stored in the table, their values cannot be written by INSERT, and SELECT * does not expand to include them. They can be used in SELECT queries, in which case the alias is substituted during query analysis.
  • The difference between materialized columns and alias columns: a materialized column stores data and needs no computation at query time, while an alias column stores no data, is computed at query time, and the query returns the result of the expression (see the sketch after this list).
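
A hedged sketch of the three column modifiers (the table and columns are made up for illustration):

CREATE TABLE user_local
(
    name       String,
    created_at DateTime DEFAULT now(),             -- filled in when the INSERT omits it
    name_len   UInt32   MATERIALIZED length(name), -- computed and stored; excluded from SELECT *
    name_upper String   ALIAS upper(name)          -- not stored; computed at query time
) ENGINE = MergeTree()
ORDER BY name;

INSERT INTO user_local (name) VALUES ('alice');
SELECT *, name_len, name_upper FROM user_local;    -- materialized and alias columns must be listed explicitly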

The following options are related to the table engine and are only supported by the MergeTree family of table engines:

  • PARTITION BY: Specifies the partition key. Usually partitioned by date, but other fields or field expressions can also be used. (The definition of the partition key must be considered clearly, it affects data distribution and query performance)
  • ORDER BY: Specifies the sort key. Can be a tuple of columns or an arbitrary expression.
  • PRIMARY KEY: Specify the primary key. By default, the primary key is the same as the sort key. Therefore, there is no need to specify a PRIMARY KEY clause in most cases.
  • SAMPLE BY: the sampling expression; if it is used, the expression must be included in the primary key.
  • SETTINGS: Additional parameters that affect performance.
  • GRANULARITY : Index granularity parameter.

Example, create a local table:

CREATE TABLE ontime_local ON CLUSTER default -- table name: ontime_local
(
    Year UInt16,
    Quarter UInt8,
    Month UInt8,
    DayofMonth UInt8,
    DayOfWeek UInt8,
    FlightDate Date,
    FlightNum String,
    Div5WheelsOff String,
    Div5TailNum String
)ENGINE = ReplicatedMergeTree( -- ReplicatedMergeTree: the MergeTree engine with data replication enabled
    '/clickhouse/tables/ontime_local/{shard}', -- ZooKeeper storage path
    '{replica}')
 PARTITION BY toYYYYMM(FlightDate)  -- partition key: FlightDate converted to year+month, one partition per month
 PRIMARY KEY (intHash32(FlightDate)) -- primary key: hash of FlightDate
 ORDER BY (intHash32(FlightDate),FlightNum) -- sort key: hash of FlightDate plus the FlightNum string
 SAMPLE BY intHash32(FlightDate)  -- sampling expression: hash of FlightDate
SETTINGS index_granularity= 8192 ;  -- index_granularity: how finely each partition is further divided into blocks

6.1.3 Create a distributed table

Create a distributed table based on a local table. Basic syntax:

CREATE TABLE  [db.]table_name  ON CLUSTER default
 AS db.local_table_name
ENGINE = Distributed(<cluster>, <database>, <shard table> [, sharding_key])

Parameter Description:

  • db: database name.
  • local_table_name: The name of the corresponding created local table.
  • shard table: Same as above, corresponding to the name of the created local table.
  • sharding_key: the sharding expression. It can be a column, such as user_id (an integer type), in which case rows are distributed by the remainder of the value divided by the total shard weight; it can also be an expression, such as rand(), which distributes rows by the remainder of the rand() value divided by the total shard weight. For a more uniform distribution, a hash function can be added, such as intHash64(user_id).

Example, create a distributed table:

CREATE TABLE ontime_distributed ON CLUSTER default   -- distributed table name and the cluster it lives on
 AS db_name.ontime_local                             -- the corresponding local table
ENGINE = Distributed(default, db_name, ontime_local, rand());  -- the table engine is Distributed (fixed)

6.1.4 Other table creation

Clickhouse also supports creating other types of tables:

6.1.5 Modify table

The syntax is basically the same as mysql:
ALTER TABLE [db].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN …

The following actions are supported:

  • ADD COLUMN — add a column
  • DROP COLUMN — drop a column
  • CLEAR COLUMN — reset the value of a column
  • COMMENT COLUMN — add a comment to a column
  • MODIFY COLUMN — change a column's value type, default expression, and TTL

Example: ALTER TABLE bd01.table_1 ADD COLUMN browser String AFTER name;  -- add a column after the name column

6.2 DML

Notice:

  1. Index columns do not support update and delete
  2. Distributed tables do not support update and delete
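
For reference, updates and deletes on local MergeTree tables are expressed as asynchronous mutations via ALTER (a hedged sketch; order_local is the hypothetical table from the ReplacingMergeTree example above):

ALTER TABLE order_local UPDATE status = 2 WHERE order_no = 'SO-001';       -- key/index columns cannot be updated
ALTER TABLE order_local DELETE WHERE update_time < '2022-01-01 00:00:00';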

7 Complex query JOIN

All standard SQL JOIN types are supported (the INNER and OUTER keywords can be omitted):

  • INNER JOIN, returns only matching rows.
  • LEFT OUTER JOIN, which returns non-matching rows from the left table in addition to matching rows.
  • RIGHT OUTER JOIN, returns non-matching rows from the right table in addition to matching rows.
  • FULL OUTER JOIN returns non-matching rows from both tables in addition to matching rows.
  • CROSS JOIN generates the Cartesian product of the entire table, and "join keys" are not specified.

Query optimization:

  1. A JOIN B performs much better than listing multiple tables as FROM A, B, C.
  2. GLOBAL JOIN sends the right-hand table to all nodes to participate in the computation; it performs better when the dimension table is small.
  3. A plain (local) JOIN is executed on each node and is suitable for joining two tables that share the same sharding column (the sharding expressions of table A and table B both contain column M).
  4. IN performs better than JOIN; prefer IN where possible.
  5. Filtering first and then joining is more efficient (it reduces the amount of data each shard has to join).
  6. In a multi-table join, performance is better if table A's query filter conditions can also cover the filter conditions on table B's columns in the ON expression.
  7. Join order matters: put the large table on the left and the small table on the right; ClickHouse executes the join from right to left.

Comparing the query complexity of JOIN and IN:
The table engines commonly used with ClickHouse store data distributed across shards, so a query must run against every shard. The more complex the SQL, the more shards the query touches and the longer it takes.

Suppose tables A and B are each stored across 10 shards. A plain join queries table A on its 10 shards and joins each against table B on its 10 shards, 10 * 10 = 100 times in total. With GLOBAL JOIN, the right-hand table is first queried 10 times to build a temporary table, which is then distributed and joined with table A's shards, 10 + 10 = 20 times in total.

This is a query characteristic of the distributed architecture. If you can control the data sharding rules and the query condition contains the sharding column, the query can be routed directly to the shards that contain the data, reducing the number of shard queries.

Although ClickHouse supports JOIN syntax, its JOIN performance is not high, and when the left side of a join is a subquery result, ClickHouse cannot perform a distributed join.
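
A hedged sketch that combines several of the points above, filtering the small table first, keeping it on the right, and using GLOBAL so the sub-query result is built once and sent to every shard (all names are hypothetical):

SELECT a.order_no, b.warehouse_name
FROM order_distributed AS a
GLOBAL INNER JOIN
(
    SELECT warehouse_no, warehouse_name
    FROM warehouse_distributed
    WHERE region = 'north'                 -- filter before joining
) AS b ON a.warehouse_no = b.warehouse_no
WHERE a.create_time >= '2023-01-01 00:00:00';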

8 MySQL migration to CK

  • Data synchronization cost: ClickHouse can use the same table structure as MySQL, so the cost of data synchronization is low; there is no need to adjust the data structure or do extra wide-table processing (although converting to a wide table is more efficient).
  • SQL migration cost: JDBC and MyBatis access are supported, standard SQL syntax is supported, and JOIN, IN, and functions are supported, so the SQL migration cost is low.

Of course, if you spend time optimizing the table structure, SQL, indexes, etc., you can get better query efficiency.

Official Support
In the second half of 2020, Yandex released the MaterializeMySQL engine in the ClickHouse community, which supports full and incremental real-time data synchronization from MySQL. The MaterializeMySQL engine currently supports MySQL 5.6/5.7/8.0 versions, compatible with Delete/Update statements, and most commonly used DDL operations.
In other words, ClickHouse can act as a replica (slave) node of MySQL, implemented by subscribing to the binlog.
https://bbs.huaweicloud.com/blogs/238417
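
A hedged sketch of the MaterializeMySQL database engine (hosts, database, and credentials are placeholders; older versions may require enabling the experimental setting first):

SET allow_experimental_database_materialize_mysql = 1;  -- may be needed while the engine was still experimental

CREATE DATABASE mysql_replica
ENGINE = MaterializeMySQL('mysql-host:3306', 'shop_db', 'repl_user', 'password');

-- tables in shop_db are replicated into mysql_replica and kept up to date through the binlog
SELECT count() FROM mysql_replica.orders;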

9 Summary

ClickHouse is better suited to OLAP scenarios and has great performance advantages as a report library. If you want to use it as an application database, you can make flexible use of its table engine features and avoid data modification as much as possible. In the end there is no best database, only the most suitable one.

Author: JD Logistics Geng Hongyu

Source: JD Cloud Developer Community
