Comparison summary between ClickHouse and Elasticsearch

Table of contents

Background

Distributed architecture

Storage architecture

Write path design

Elasticsearch

Let’s talk about Schemaless again

Query architecture

Compute engine

Data scanning

Let’s talk about high concurrency again

Performance Testing

Log analysis scenario

access_log (197,921,836 rows)

trace_log (569,816,761 rows)

Official OnTime test set

User portrait scenario (262,933,269 rows)

Secondary index point query scenario (1,000,000,000 rows)

Data import performance comparison

Conclusion

Advantages

Shortcomings

Feasible solution for replacing ES with ClickHouse

Reference links


Background

ClickHouse is a fully columnar analytical database developed by the Russian search giant Yandex. It has become very popular in the OLAP field over the past two years and is used at scale by major Internet companies in China.

Elasticsearch is a near-real-time distributed search and analysis engine whose underlying storage is built entirely on Lucene. Simply put, it extends Lucene's single-machine search capability into distributed search and analysis. Elasticsearch is usually deployed together with two other open source components, Logstash (log collection) and Kibana (dashboards), to provide end-to-end log search and analysis; the trio is commonly referred to as the ELK stack.

Distributed architecture

Elasticsearch and ClickHouse are both data products that support distributed multi-node deployment, so the first thing to compare is the difference in their distributed architectures. Distributed design has a major impact on a product's ease of use and scalability. The core problems a distributed architecture has to solve include node discovery, Meta synchronization, and replica data synchronization. As a veteran open source product, Elasticsearch is relatively mature in this area: its native node discovery and Meta synchronization protocols give users a very good out-of-the-box experience. The problems Elasticsearch's Meta synchronization protocol has to solve are in fact very similar to those addressed by the open source Raft protocol; Raft simply did not exist yet when Elasticsearch was created, so Elasticsearch had to build its own. After years of polishing, Elasticsearch's Meta synchronization protocol has become quite mature, and on top of it Elasticsearch offers very convenient features such as multi-role node separation and auto schema inference. It is worth noting that Elasticsearch's multi-replica data synchronization does not reuse the Meta synchronization protocol; instead it uses a traditional primary-backup mechanism in which the primary shard is responsible for synchronizing data to the replica shards. This approach is simpler and more efficient.

ClickHouse's distributed capabilities are comparatively rudimentary, partly because ClickHouse is still a relatively young open source product that is iterating toward better distributed ease of use. ClickHouse relies on an external ZooKeeper cluster to distribute tasks such as distributed DDL (node Meta changes) and primary-backup synchronization. Data-shipping tasks between replicas are also coordinated through ZooKeeper, but the actual data transfer between replicas is point-to-point copying over HTTP. All replicas are writable, and data synchronization between them is fully multi-directional. As for node discovery, ClickHouse currently has no such capability; cluster node addresses must be configured manually. This bare-bones distributed architecture gives ClickHouse extremely strong flexibility in deployment and operational intervention, at the cost of usability: the learning threshold for users is relatively high. In terms of the ceiling of its capabilities, however, ClickHouse's distributed deployment has no shortcoming in scalability, and the upper limit on cluster size is no different from Elasticsearch's. ClickHouse has a flat architecture with no distinction between front-end and back-end nodes and can be deployed at any scale. It also offers finer-grained control over replication: the number of replicas can be configured at the table level, the same physical cluster can be divided into multiple logical clusters, and the number of shards and replicas can be configured arbitrarily for each logical cluster.
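As a rough sketch of this table-level control (the cluster name, ZooKeeper path layout and table below are illustrative, not from the original article, and assume the cluster and the {shard}/{replica} macros are already declared in the server configuration): a table that needs replicas is created with a ReplicatedMergeTree engine on a logical cluster, and a Distributed table on top of it routes reads and writes across shards.

-- Hypothetical logical cluster "logical_cluster_a"; replica coordination goes through ZooKeeper.
CREATE TABLE events_local on cluster logical_cluster_a
(
  `event_time` DateTime,
  `user_id` UInt64,
  `payload` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (user_id, event_time);

-- The Distributed table fans queries and writes out to every shard of the logical cluster.
CREATE TABLE events on cluster logical_cluster_a as events_local
engine = Distributed(logical_cluster_a, default, events_local, rand());

Tables that do not need replication can stay on a plain MergeTree engine on the same physical cluster, which is how different replica counts can coexist.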

Storage architecture

Write path design

Elasticsearch

Write throughput is a core metric in big data scenarios: users expect a big data product not just to store data, but to ingest it quickly. Let us first look at Elasticsearch's real-time write path. Within each Elasticsearch Shard, a write is processed in two steps: first into Lucene, then into the TransLog. After a write request reaches the Shard, the document is first written into the Lucene in-memory index (the data is still only in memory at this point), then appended to the TransLog; the TransLog is flushed to disk, and only after that flush succeeds is the request acknowledged to the user. There are several key points here. First, writing to Lucene before the TransLog mainly guards against "illegal" data in user write requests. Second, data written to the Lucene index is not yet searchable: a refresh must convert the in-memory objects into a complete Segment and reopen it before it becomes visible to search, and this refresh interval is user-configurable. In other words, the Lucene index does not provide real-time visibility of writes, which is why Elasticsearch is a near-real-time (Near Real Time) system. Finally, at a relatively long interval, for example every 30 minutes, Lucene flushes the new Segments accumulated in memory to disk; once the index files are persisted, the corresponding historical TransLog is no longer needed and old TransLog files are cleared.

Elasticsearch single-Shard write path

ClickHouse single-Shard write path

Compared with Elasticsearch's write path, ClickHouse's approach is more "simple and direct", even extreme. As mentioned above, Elasticsearch is a near-real-time system: newly written data sits in the in-memory storage engine until a periodic refresh makes it visible. ClickHouse simply drops the in-memory storage engine altogether: all data is written directly to disk, and the traditional redo-log stage is omitted as well. In scenarios with extremely high write-throughput requirements, both Elasticsearch and ClickHouse have to trade away some real-time visibility of writes to gain throughput; ClickHouse's main tactic is to push delayed batching of writes onto the client. In terms of replica synchronization, Elasticsearch synchronizes in real time, meaning a write request returns only after it has been applied on multiple replicas, whereas ClickHouse relies on ZooKeeper-coordinated asynchronous file synchronization between replicas (data shipping). In practice, ClickHouse's write throughput can far exceed that of Elasticsearch on identical hardware.
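To make the client-side batching concrete, here is a minimal sketch (the table and columns are made up for illustration): instead of issuing one INSERT per record, the client accumulates rows and sends them in a single statement, and each such batch becomes one new DataPart written straight to disk.

-- One INSERT = one DataPart on disk (no memtable, no redo log).
-- In practice a client buffers e.g. tens of thousands of rows before flushing one batch.
INSERT INTO app_log (ts, level, message) VALUES
  ('2020-12-27 00:38:31', 'INFO',  'request accepted'),
  ('2020-12-27 00:38:31', 'WARN',  'slow query detected'),
  ('2020-12-27 00:38:32', 'ERROR', 'connection reset');
-- ... in practice many thousands of rows per statement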

Segment vs DataPart

The storage designs of Elasticsearch and ClickHouse look very similar from the outside, but their capabilities are completely different. Elasticsearch's disk files are made up of Segments; a Segment is in fact the smallest unit of a Lucene index (its internal format is not discussed here). Segments are merged asynchronously in the background, and the merge mainly solves two problems: 1) making the secondary indexes more ordered, and 2) applying primary-key data changes. A secondary index is a "globally" ordered structure, and building all the data into a single index speeds up queries noticeably more than splitting it across many small indexes. Elasticsearch supports primary-key delete and update, implemented on top of the Lucene index's delete capability: an update is turned into a delete plus a new write. When a Lucene segment accumulates multiple deleted records, the system reclaims them through segment merging. When multiple segments are merged, the stored data is simply appended into the new segment, so the merge of the secondary index does not require "re-sorting" the data.

The counterpart of Elasticsearch's Segment in ClickHouse is the DataPart, the smallest unit of ClickHouse storage. Each batch write produces a DataPart, and the data inside a DataPart is stored fully sorted (according to the table's ORDER BY definition). This ordered storage acts as a default clustered index and is used to accelerate data scans. ClickHouse also merges DataParts asynchronously, and the merge again serves two purposes: 1) making the stored data more ordered, and 2) applying primary-key data changes. DataParts are merged in a merge-sort fashion, so the DataPart produced by a merge remains fully sorted. Relying on this fully sorted storage, ClickHouse implements primary-key updates in a way completely different from Elasticsearch. When Elasticsearch changes a primary key, it follows the pattern "look up the original record - generate the new record - delete the original record - write the new record", which severely limits update efficiency: primary-key update writes are far more expensive than append-only writes. In ClickHouse, primary-key updates are entirely asynchronous: multiple records with the same primary key coexist until an asynchronous merge produces the latest version of the record. This asynchronous, batched approach to primary-key updates is far more efficient than Elasticsearch's.
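A hedged sketch of this asynchronous update model, using ReplacingMergeTree as one concrete engine that expresses it (table and columns are illustrative): several versions of the same key coexist in separate DataParts until a background merge keeps only the newest one, and FINAL can force the collapse at query time.

-- Duplicate keys are tolerated at write time; merges collapse them later.
CREATE TABLE user_profile
(
  `user_id` UInt64,
  `nick_name` String,
  `update_time` DateTime
)
ENGINE = ReplacingMergeTree(update_time)
ORDER BY user_id;

INSERT INTO user_profile VALUES (1001, 'alice', '2021-01-01 10:00:00');
INSERT INTO user_profile VALUES (1001, 'alice_v2', '2021-01-02 10:00:00');

-- Before a merge both rows exist; FINAL (or OPTIMIZE ... FINAL) forces deduplication.
SELECT * FROM user_profile FINAL WHERE user_id = 1001;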

Finally, let's summarize the differences in the file storage inside a Segment versus a DataPart. A Segment is simply the Lucene index storage format: Lucene's inverted-file storage is undoubtedly best in class, and the Lucene index also stores the raw data in other formats, both row store and column store. By default Elasticsearch keeps two copies of the original data, one in the row store and one in the column store, and picks the appropriate file to scan depending on the query pattern. A native ClickHouse DataPart contains no secondary index files; the data is stored purely in columns, and ClickHouse pushes column compression ratio and scan throughput to the extreme. By comparison, Elasticsearch's storage is mediocre and costs at least twice as much.

Let’s talk about Schemaless again

When people talk about Elasticsearch's characteristics, the word Schemaless always comes up: Elasticsearch can automatically infer the JSON schema of incoming data and adjust the Meta structure of the storage table accordingly, saving users a lot of trouble in creating tables and adding columns. In the author's view, however, this capability is more accurately called auto schema inference, and it is made possible by Elasticsearch's distributed Meta synchronization. Elasticsearch's storage actually requires a schema, indeed a tightly bound one, because it is a storage built around secondary indexes; how could an index be built on fields without types? True Schemaless should mean the ability to change a field's type flexibly and efficiently without a significant drop in query performance. Today, if a user wants to change the type of a field in an Elasticsearch index, there is only one way: reindex the entire dataset. In contrast, ClickHouse's storage is not strongly bound to its schema, because ClickHouse's analysis capability is built on storage scanning: it can perform dynamic type conversion during data scans, and it can also adjust a field's type slowly and asynchronously as DataParts are merged. The cost of a changed field type at query time is merely an extra cast operator at runtime, and users do not perceive a sharp performance drop. In the author's opinion, Schemaless is by no means Elasticsearch's moat; it is closer to a weakness. As for auto schema inference, it is very friendly for small-scale users, but it will never produce the best-performing schema for you. In scenarios with large data volumes, you still need to design the schema around the specific queries. All convenience comes at a cost in the end.
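The two options described above can be sketched roughly as follows (table and column names are hypothetical): either keep the stored type and cast during the scan, or change the declared type and let the conversion be applied gradually in the background.

-- Option 1: keep the stored String type and convert at query time; the only cost is a cast operator.
SELECT toUInt64(total_time_str) AS total_time
FROM access_log_raw
WHERE toUInt64(total_time_str) > 1000;

-- Option 2: change the declared type; existing DataParts are converted gradually in the background.
ALTER TABLE access_log_raw MODIFY COLUMN total_time_str UInt64;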

Query architecture

Compute engine

Putting ClickHouse and Elasticsearch side by side to talk about compute engines is, frankly, a little unfair, because Elasticsearch implements only a general-purpose search engine. The query complexity a search engine can handle is bounded: every search query finishes after a fixed number of stages and then returns results, which is not true of a real compute engine. Elasticsearch does have an SQL plug-in, but its implementation merely translates simple SQL queries into certain search patterns; for data analysis operations the search engine does not natively support, Elasticsearch-SQL cannot help. Moreover, the current translation capability of Elasticsearch-SQL does not appear to be particularly complete or intelligent: to get the best search performance, users still need to reach for Elasticsearch's native query API. For users accustomed to SQL, Elasticsearch's query API is an entirely unfamiliar system, and complex queries are very hard to write.

Elasticsearch's search engine supports three search modes: query_and_fetch, query_then_fetch, and dfs_query_then_fetch. The first is the simplest: each distributed node searches independently and returns results to the client. In the second, each distributed storage node first searches for its own TopN record IDs and scores and returns them to the coordinating node, which re-ranks them to obtain the global TopN and then requests the detailed data from the storage nodes. The point of the two-round design is to minimize the amount of detail fetching, i.e. the number of disk scans. The third mode balances the scoring across storage nodes by first collecting the global TF (Term Frequency) and DF (Document Frequency) and then running query_then_fetch. Elasticsearch's search engine has none of the streaming processing capability of a database compute engine: it is an entirely turn-based, request-response style of data processing, and when the user needs a large amount of data returned it is easy for the query to fail or trigger GC. Generally speaking, the ceiling of Elasticsearch's search engine is a two-phase query; queries such as multi-table joins are entirely beyond it.

ClickHouse's compute engine is characterized by extreme vectorization: vectorized functions and aggregate operators hand-written in C++ templates push its aggregate-query performance to the limit, and combined with the storage layer's extreme parallel scan capability, machine resources are fully utilized with ease. In terms of the analytical queries it supports, ClickHouse's compute engine completely covers Elasticsearch's search engine, and a compute engine with full SQL capability gives users far more flexibility and freedom when analyzing data.

Data scanning

ClickHouse is a fully columnar compute engine built on ordered storage. When scanning data for a query, it first uses the storage ordering, column-block statistics, partition keys and other information to infer which column blocks need to be scanned, then scans the data in parallel; expression evaluation and aggregate operators are handled in the regular compute engine. From the compute engine down to the data scan, data flows in column blocks and the processing is highly vectorized. As described in the previous section, Elasticsearch's data scanning happens mainly in the query and fetch phases. The query phase mainly scans the Lucene index files to obtain the DocIds hit by the query, and also scans the column-store files for aggregate computation; the fetch phase mainly reads the row-store files in the Lucene index to retrieve detailed results. Expression evaluation and aggregation can occur in both phases, and the computation proceeds row by row. Overall, Elasticsearch's data scanning and computation have no vectorization capability and are built on top of secondary index results; when the secondary index returns a very large number of hit rows (analytical queries involving large amounts of data), the search engine exposes its weakness in raw data processing power.
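As an illustration of this scan pruning, here is a sketch against the access_log table defined later in this article (the exact pruning behavior depends on the ClickHouse version and table settings):

-- Only the partitions for 2020-12-27 and 2020-12-28 are opened (PARTITION BY toYYYYMMDD(_date)),
-- and inside them only granules whose logic_ins_id range matches are read,
-- thanks to the (logic_ins_id, accept_time) ORDER BY; the surviving column blocks
-- then flow into the vectorized operators.
SELECT count()
FROM access_log
WHERE _date >= '2020-12-27 00:00:00'
  AND _date < '2020-12-29 00:00:00'
  AND logic_ins_id = 502680264;

-- Newer ClickHouse versions can show the pruning decision explicitly (availability may vary):
-- EXPLAIN indexes = 1 SELECT ... ;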

Let’s talk about high concurrency again

Many users hold a misconception about ClickHouse: that its queries are fast but its concurrency is poor. The reason behind this impression is actually that ClickHouse's per-query parallelism is exceptionally good, which is one of its major strengths: a single query can saturate disk throughput, and query parallelism does not depend on the number of shards at all and can be adjusted freely. It is undeniable that the throughput of concurrent requests is the ultimate measure of a data system's efficiency, and there is no inherent concurrency flaw in ClickHouse's architecture. ClickHouse is simply honest: the amount of data a query needs to scan and its computational complexity are what they are, ClickHouse dutifully computes them every time, and the machine's hardware determines its concurrency ceiling. ClickHouse's concurrency capability is in fact quite good; thinking it cannot handle concurrency is a misunderstanding. By default, though, ClickHouse aims to keep the latency of a single query as low as possible; in some scenarios users can raise concurrency by setting appropriate system parameters such as max_threads.

Conversely, why is Elasticsearch's concurrency very good in some scenarios? First, consider cache design: Elasticsearch has a Query Cache, Request Cache, Data Cache, and Index Cache, caching everything from query results to index scan results, because Elasticsearch assumes its workloads contain hot data that may be queried repeatedly. ClickHouse, by contrast, has only an IO-oriented uncompressed block cache and the operating system page cache. Why? Because ClickHouse targets analytical scenarios, where data and queries are highly variable and caches of query results are unlikely to hit; its approach is therefore to stay focused on the disk data and provide good IO caching. Second, back to scan granularity: Elasticsearch has full-column secondary indexing, the indexes are generally pre-warmed into memory, and even under changing query conditions the cost of probing them is very low; with the index results in hand, it can read data row by row and compute. Native ClickHouse has no secondary indexes and, under changing query conditions, can only filter results by scanning data in large batches (Alibaba Cloud ClickHouse already provides secondary indexing, which solves this problem and brings performance to a level comparable with Elasticsearch; details follow in the performance testing section).

But if Elasticsearch has secondary indexes, is its concurrency necessarily better? Not necessarily. When the result set returned by a secondary index search is large, the query is still accompanied by a large amount of IO scanning, and high concurrency is out of the question, unless Elasticsearch's Data Cache is large enough to hold all the original data in memory.
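For example, a sketch of that trade-off (illustrative only, not a tuning recommendation): capping max_threads reduces how much of the machine a single query occupies, leaving headroom for more queries to run side by side, at the cost of higher single-query latency.

-- Default behaviour: one query may use all available cores for its scan.
SELECT count() FROM access_log WHERE logic_ins_id = 502680264;

-- Capping per-query parallelism to favour concurrency over single-query latency.
SELECT count() FROM access_log WHERE logic_ins_id = 502680264
SETTINGS max_threads = 2;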

In summary, Elasticsearch only shows its concurrency advantage in genuine search scenarios (where the number of records remaining after filtering is small) and in environments with ample memory. In analytical scenarios (where the number of records after filtering is large), ClickHouse achieves better concurrency thanks to its extreme columnar storage and vectorized computation; the two simply have different focuses. At the same time, ClickHouse's concurrent processing is built on disk throughput while Elasticsearch's is built on memory caching, which makes ClickHouse better suited to low-cost, high-volume analytical scenarios where it can make full use of disk bandwidth.

Performance Testing

Test environment:

ClickHouse node: CPU 8 core / Memory 32 GB / Storage ESSD PL1 1500 GB
Elasticsearch node: CPU 8 core / Memory 32 GB / Storage ESSD PL1 1500 GB
Number of nodes: 4

Log analysis scenario

In the log analysis scenario, two representative query workloads were selected for comparative testing; the results are below. From the results it can be seen that the performance gap between ClickHouse and Elasticsearch widens in both scenarios as the number of records matched by the where condition grows, and in the trace_log scenario, which has the larger data volume, the analytical-query performance gap between the two is obvious at a glance. Download the full set of Elasticsearch and ClickHouse table creation statements and queries: Log analysis scenario

access_log (197,921,836 rows)

The table creation statement in ClickHouse is as follows:

CREATE TABLE access_log_local on cluster default
(
  `sql` String, 
  `schema` String, 
  `type` String, 
  `access_ip` String, 
  `conn_id` UInt32, 
  `process_id` String, 
  `logic_ins_id` UInt32, 
  `accept_time` UInt64, 
  `_date` DateTime, 
  `total_time` UInt32, 
  `succeed` String, 
  `inst_name` String
) 
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(_date)
ORDER BY (logic_ins_id, accept_time);

CREATE TABLE access_log on cluster default as access_log_local
engine = Distributed(default, default, access_log_local, rand());

The query statement in ClickHouse is as follows:

--Q1
select _date, accept_time, access_ip, type, total_time, concat(toString(total_time),'ms') as total_time_ms, sql,schema,succeed,process_id,inst_name from access_log where _date >= '2020-12-27 00:38:31' and _date <= '2020-12-28 00:38:31' and logic_ins_id = 502680264 and accept_time <= 1609087111000 and accept_time >= 1609000711000 and positionCaseInsensitive(sql, 'select') > 0 order by accept_time desc limit 50,50;
--Q2
select 
case 
when total_time <=100 then 1 
when total_time > 100 and total_time <= 500 then 2 
when total_time > 500 and total_time <= 1000 then 3 
when total_time > 1000 and total_time <= 3000 then 4 
when total_time > 3000 and total_time <= 10000 then 5 
when total_time > 10000 and total_time <= 30000 then 6 
else 7 
end as reorder, 
case 
when total_time <=100 then '0~100ms' 
when total_time > 100 and total_time <= 500 then '100ms~500ms' 
when total_time > 500 and total_time <= 1000 then '500ms~1s' 
when total_time > 1000 and total_time <= 3000 then '1s~3s' 
when total_time > 3000 and total_time <= 10000 then '3s~10s' 
when total_time > 10000 and total_time <= 30000 then '10s~30s' 
else 'more than 30s' 
end as label, 
case 
when total_time <= 100 then '0~100' 
when total_time > 100 and total_time <= 500 then '100~500' 
when total_time > 500 and total_time <= 1000 then '500~1000' 
when total_time > 1000 and total_time <= 3000 then '1000~3000' 
when total_time > 3000 and total_time <= 10000 then '3000~10000' 
when total_time > 10000 and total_time <= 30000 then '10000~30000' 
else '30000~10000000000' 
end as vlabel, 
count() as value
from access_log
where logic_ins_id = 502867976 and _date >= '2020-12-27 00:38:31' and _date <= '2020-12-28 00:38:31' and accept_time <= 1609087111000 and accept_time >= 1609000711000 
group by label,vlabel,reorder 
order by reorder;
--Q3
select toStartOfMinute(_date) as time, count() as value 
from access_log 
where logic_ins_id = 500152868 and accept_time <= 1609087111000 and accept_time >= 1609000711000  
group by time 
order by time;
--Q4
select count(*) as c from (
  select _date, accept_time, access_ip, type, total_time, concat(toString(total_time),'ms') as total_time_ms, sql, schema, succeed, process_id, inst_name 
  from access_log 
  where logic_ins_id = 501422856 and _date >= '2020-12-27 00:38:31' and _date <= '2020-12-28 00:38:31' and accept_time <= 1609087111000 and accept_time >= 1609000711000
);

The performance comparison is as follows:

trace_log (569,816,761 rows)

The table creation statement in ClickHouse is as follows:

CREATE TABLE trace_local on cluster default
(
  `serviceName` LowCardinality(String), 
  `host` LowCardinality(String), 
  `ip` String, 
  `spanName` String, 
  `spanId` String, 
  `pid` LowCardinality(String), 
  `parentSpanId` String, 
  `ppid` String, 
  `duration` Int64, 
  `rpcType` Int32, 
  `startTime` Int64, 
  `traceId` String, 
  `tags.k` Array(String), 
  `tags.v` Array(String), 
  `events` String,
  KEY trace_idx traceId TYPE range
) ENGINE = MergeTree() 
PARTITION BY intDiv(startTime, toInt64(7200000000)) 
PRIMARY KEY (serviceName, host, ip, pid, spanName) 
ORDER BY (serviceName, host, ip, pid, spanName, tags.k);

CREATE TABLE trace on cluster default as trace_local
engine = Distributed(default, default, trace_local, rand());

The query statement in ClickHouse is as follows:

--Q1
select *
from trace
prewhere
traceId ='ccc6084420b76183'
where startTime > 1597968000300000  and startTime <  1598054399099000 settings max_threads = 1;
--Q2
select count(*) count, spanName as name from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000
group by spanName
order by count desc limit 1000;
--Q3
select host as name, count(*) count
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000
group by host;
--Q4
select count(*) count, tags.k as name  from trace
array join tags.k
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000
group by tags.k;
--Q5
select count(*) spancount, 
sum(duration) as sumDuration, intDiv(startTime, 1440000000) as timeSel
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000
group by timeSel;
--Q6
select count(*) spanCount, 
countIf(duration  <=1000000), countIf(duration > 1000000),  countIf(duration > 3000000)
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000;
--Q7
select  host, startTime,traceId,spanName,duration,tags.k,tags.v
from trace
where serviceName ='conan-dean-user-period'
and startTime > 1597968000300000  and startTime <  1598054399099000 limit 1000000;

The performance comparison is as follows:

Official OnTime test set

The OnTime dataset is an analytical-query benchmark recommended on the ClickHouse official website. To compare the analytical-query performance of ClickHouse and Elasticsearch more fairly and openly, the author also ran this dataset; the results are as follows. ClickHouse has a huge performance advantage in purely analytical query scenarios. Download the full set of Elasticsearch and ClickHouse table creation statements and queries: Aggregation analysis scenario

User portrait scenario (262,933,269 rows)

The user portrait scenario is another typical case where users struggle to choose between Elasticsearch and ClickHouse. Its characteristics are ultra-wide tables, large batch update writes, large result sets returned by queries, and complex, constantly changing filter conditions. The two main difficulties users run into with Elasticsearch here are that data cannot be written in fast enough (imports are slow) and data cannot be pulled out fast enough (returning large volumes of detailed data is very slow). For this scenario the author mocked up a wide table of nearly 150 columns based on real user workloads and ran the queries below; each query returns between 100,000 and 1,000,000 rows. Download the full set of Elasticsearch and ClickHouse table creation statements and queries: User portrait scenario

The query statement in ClickHouse is as follows:

--Q1
select user_id
from person_tag
where mock3d_like > 8 and mock3d_consume_content_cnt > 8 and mock_10_day_product_avg_amt < 1 settings append_squashing_after_filter = 1;
--Q2
select user_id
from person_tag
where mock_7_day_receive_cnt > 8 and like_fitness = 1 and mock14d_share_cnt > 8 settings append_squashing_after_filter = 1;
--Q3
select user_id
from person_tag
where home_perfer_mock_score > 8 and mock7d_access_homepage_cnt > 8 settings append_squashing_after_filter = 1;
--Q4
select user_id
from person_tag
where is_send_register_coupon > 8 and mock1d_like > 8 settings append_squashing_after_filter = 1;
--Q5
select user_id
from person_tag
where like_sports = 1 and like_3c = 1 and sex = 1 and like_dance = 1 and mock1d_share_cnt > 6 settings append_squashing_after_filter = 1;
--Q6
select user_id
from person_tag
where mock14d_access_homepage_cnt > 8 and like_anime = 1 settings append_squashing_after_filter = 1;
--Q7
select user_id,offline_ver,is_visitor,mock1d_comment_like,reg_days,mock14d_share_cnt,mock_30_order_avg_delivery_time_cnt,mock7d_comment_cnt,performance_rate,mock3d_valid_user_follow_cnt,mock30d_consume_content_cnt,like_cnt,like_photo,ls90_day_access_days,mock3d_release_trend_cnt,mock14d_access_homepage_range,qutdoor_perfer_mock_score,mock3d_access_homepage_cnt,mock_15_order_avg_delivery_time_cnt,mock7d_release_trend_cnt,like_food,mock30d_follow_topic_cnt,mock7d_is_access_topic,like_music,mock3d_interactive_cnt,mock14d_valid_user_follow_cnt,reg_platform,mock_7_day_lottery_participate_cnt,pre_churn_users,etl_time,like_anime,mock14d_access_homepage_cnt,mock14d_consume_content_cnt,like_travel,like_watches,mock14d_comment_like,ls30_day_access_days,mock14d_release_trend_cnt,ftooeawr_perfer_mock_score,mock7d_valid_user_follow_cnt,beauty_perfer_mock_score
from person_tag
where mock3d_like > 8 and mock3d_consume_content_cnt > 8 and mock_10_day_product_avg_amt < 1 settings append_squashing_after_filter = 1;

The query performance comparison is as follows. It shows that Elasticsearch performs poorly in scenarios where a large amount of result data has to be scanned and exported: the larger the returned result set, the slower it gets. Q5 is included as a contrasting case in which the query hits only a very small result set.

Secondary index point query scenario (1,000,000,000 rows)

In analytical business scenarios, users inevitably also need a few detail point lookups, such as querying detailed records by a log traceId. Because open source ClickHouse has no secondary indexing, its query performance in such cases lags far behind Elasticsearch. Alibaba Cloud ClickHouse has developed its own secondary indexing capability to close this gap, so the author added a dedicated secondary index point query scenario to the performance comparison. Download the full set of Elasticsearch and ClickHouse table creation statements and queries: Secondary index point query scenario

The table creation statement in ClickHouse is as follows:

CREATE TABLE point_search_test_local on cluster default (
 `PRI_KEY` String, 
 `SED_KEY` String,  
 `INT_0` UInt32, 
 `INT_1` UInt32, 
 `INT_2` UInt32, 
 `INT_3` UInt32, 
 `INT_4` UInt32, 
 `LONG_0` UInt64, 
 `LONG_1` UInt64, 
 `LONG_2` UInt64, 
 `LONG_3` UInt64, 
 `LONG_4` UInt64, 
 `STR_0` String, 
 `STR_1` String, 
 `STR_2` String, 
 `STR_3` String, 
 `STR_4` String, 
 `FIXSTR_0` FixedString(16), 
 `FIXSTR_1` FixedString(16), 
 `FIXSTR_2` FixedString(16), 
 `FIXSTR_3` FixedString(16), 
 `FIXSTR_4` FixedString(16), 
 KEY SED_KEY_IDX SED_KEY Type range
) ENGINE = MergeTree ORDER BY PRI_KEY 
SETTINGS index_granularity_bytes = 4096, secondary_key_segment_min_rows = 1000000000, min_rows_for_wide_part = 2000000000;

CREATE TABLE point_search_test on cluster default as point_search_test_local
engine = Distributed(default, default, point_search_test_local, rand());

The query template statement in ClickHouse is as follows:

select * from point_search_test where SED_KEY = 'XXX' settings max_threads = 1;

The final query performance comparison is as follows. With secondary indexing, Alibaba Cloud ClickHouse's point-query capability is in no way weaker than Elasticsearch's: secondary indexes supported natively by the storage deliver extreme performance. (See the Alibaba Cloud ClickHouse secondary index documentation.)

Data import performance comparison

For all the datasets listed above, the author compared the import performance of Elasticsearch and ClickHouse by loading local files from ESSD. ClickHouse can import local files in many formats directly through clickhouse-client, while Elasticsearch was loaded by configuring Logstash tasks. The time taken is as follows:
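A rough sketch of the ClickHouse side of such an import (file name and format are illustrative):

-- The query text names the target table and input format; the rows themselves are piped in
-- on stdin by clickhouse-client, which parses the file locally and streams blocks to the server, e.g.:
--   clickhouse-client --query "INSERT INTO access_log FORMAT CSVWithNames" < access_log.csv
INSERT INTO access_log FORMAT CSVWithNames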

Conclusion

What Elasticsearch does best is the genuine search scenario (where the number of records remaining after filtering is small), where it shows excellent concurrent query capability in memory-rich environments. In large-scale data analysis scenarios (where the number of records after filtering is large), however, ClickHouse delivers better concurrency thanks to its extreme columnar storage and vectorized computation, and its query support is also more complete. ClickHouse's concurrent processing is built on disk throughput while Elasticsearch's is built on memory caching, which puts the two in very different cost brackets: ClickHouse is better suited to low-cost, high-volume analytical scenarios and can make full use of disk bandwidth. In terms of data import and storage cost, ClickHouse has an absolute advantage.

Advantages

  1. ClickHouse has high write throughput: a single server can ingest 50 MB/s to 200 MB/s of logs, or more than 600,000 records per second, which is over 5 times the rate of ES.
  2. Queries are fast. Officially, a single server scans at roughly 2-30 GB/s when the data is in the page cache; when it is not, query speed depends on the disk read rate and the data compression ratio.
  3. ClickHouse costs less than an ES server. On the one hand, ClickHouse compresses data better than ES, so the same data occupies only 1/3 to 1/30 of the disk space required by ES, which saves disk space and also reduces disk IO; on the other hand, ClickHouse uses less memory and consumes fewer CPU resources than ES.
  4. Compared with ES, ClickHouse is more stable and cheaper to operate. In ES the load across different groups of nodes can be unbalanced; heavily loaded groups can reject writes and force manual index migration. In ClickHouse, with a suitable cluster and shard strategy plus round-robin writes, data can be distributed fairly evenly across all nodes. A large ES query can cause OOM problems, whereas ClickHouse fails such a query according to preset query limits without affecting overall stability. ES needs explicit hot/cold data separation, while ClickHouse partitions by day and generally does not need it; in the special scenarios where users really do need hot/cold separation, the data volume involved is much smaller and ClickHouse's built-in hot/cold tiering mechanism handles it easily.
  5. ClickHouse uses SQL syntax, which is simpler than ES's DSL and cheaper to learn.

Shortcomings

  1. Because it is a columnar database, it cannot provide full-text search the way ES does.
  2. Fields cannot be added dynamically; the table schema must be defined in advance.
  3. Logs cannot be kept indefinitely; historical data has to be cleaned up and taken offline periodically. If historical data must be preserved, it needs to be migrated, for example with clickhouse-copier or by copying the data.
  4. ClickHouse queries are fast and can make full use of cluster resources, but it does not support high-concurrency queries; the default limit on concurrent queries is 100.
  5. ClickHouse is not suited to high-frequency inserts of small batches; writing logs in batches introduces some latency.

Disk space occupied by the same type of Ctrip logs in ES and ClickHouse

Query time for the same type of Ctrip logs in ES and ClickHouse

Feasible solution for replacing ES with ClickHouse

1. Disaster recovery deployment and cluster planning

Use multiple shards with 2 replicas; replicas back each other up through ZooKeeper, so a shard can lose one server without losing data. To accommodate logs of different sizes, multiple clusters can be set up by log type and log volume.

2. Consume data into ClickHouse using the gohangout tool

a) Write to all servers in the ClickHouse cluster in round-robin fashion so that data is distributed roughly evenly.

b) Write in large, low-frequency batches to keep the number of parts down, reduce server-side merges, and avoid Too many parts exceptions. The amount and frequency of writes are controlled by two thresholds: a batch is flushed once it exceeds 100,000 records or every 30 seconds.

3. Design of table structure

Create a separate local table for each log type. Non-standard fields can be put into a map-type column: if a field has a value, write the value; if it has no value, write N directly.

When creating tables, pay attention to the partition settings and partition by day.
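A hedged sketch of such a per-log-type local table (all names are illustrative; on older ClickHouse versions the Map type may need to be enabled as an experimental feature):

CREATE TABLE app_log_local
(
  `log_time` DateTime,
  `host` LowCardinality(String),
  `level` LowCardinality(String),
  `message` String,
  `extra` Map(String, String)  -- non-standard fields as key/value pairs; missing fields written as 'N' upstream
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(log_time)  -- daily partitions, so old days can be dropped or tiered cheaply
ORDER BY (host, log_time)
TTL log_time + INTERVAL 30 DAY;    -- optional: let ClickHouse expire history instead of manual cleanup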

4. Data display

Tabix, the web interface that comes with ClickHouse.

Third-party visualization tools such as Grafana and Kibana can also be connected.

Reference links

https://zhuanlan.zhihu.com/p/368193306

https://www.cnblogs.com/xionggeclub/p/15100707.html
