ClickHouse research


  ClickHouse is an open-source OLAP database developed by the Russian company Yandex. It is often abbreviated as CH or CK, and it is currently regarded as one of the fastest OLAP databases on the market.

1. Applicable scenarios (OLAP)

  • The vast majority of requests are reads
  • Data is written in batches rather than row by row
  • Data is not modified after it has been added
  • Queries read many rows but only a few columns
  • Tables are wide, with many columns
  • The query rate is fairly low (around 100 queries per second per server)
  • For simple queries, a latency of about 50 milliseconds is acceptable
  • Column values are small (for example, 60 bytes per URL)
  • A single query processes a large number of rows
  • No transaction requirements and low data consistency requirements
  • The query result, after filtering or aggregation, fits in the memory of a single server

Summary: ClickHouse suits workloads with massive data volumes where the storage consumed on each node should stay low. Wide tables are common because, for business convenience, many related columns are often merged into one table. The SQL-based query interface makes programs easier to adapt and port.
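To see why "many rows, few columns" queries on a wide table favor a column store, here is a minimal sketch in Python. It does not reflect ClickHouse internals; the table, the column names and the `values_read` counter are all made up for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and data are illustrative assumptions, not ClickHouse internals.

ROWS = 1000
COLUMNS = ["user_id", "url", "duration_ms", "country", "browser"]

# Row layout: one tuple per row, all columns stored together.
row_table = [(i, f"/page/{i % 7}", i % 500, "RU", "firefox") for i in range(ROWS)]

# Column layout: one list per column.
column_table = {
    name: [row[idx] for row in row_table] for idx, name in enumerate(COLUMNS)
}


def sum_duration_row_layout(table):
    """Sum duration_ms; every row is loaded in full to reach one field."""
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)  # the whole row is touched
        total += row[2]          # duration_ms is the third column
    return total, values_read


def sum_duration_column_layout(table):
    """Sum duration_ms; only the requested column is touched."""
    column = table["duration_ms"]
    return sum(column), len(column)


row_total, row_reads = sum_duration_row_layout(row_table)
col_total, col_reads = sum_duration_column_layout(column_table)
print(row_total == col_total, row_reads, col_reads)
```

Both layouts return the same sum, but the column layout touches one fifth of the values here. The wider the table and the fewer columns a query needs, the larger that gap becomes.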

2. Features

  • Vectorized execution with multi-core parallelism; each SQL query tries to use as much CPU as possible
  • Columnar storage with a high data compression ratio
  • Shared-nothing architecture with support for distributed deployments
  • Compatible with most SQL syntax; the dialect is especially close to MySQL
  • Supports primary keys
  • Supports indexes
  • Supports online queries
  • Supports approximate calculation
  • Supports replication between servers
  • Supports real-time data updates
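The "approximate calculation" feature can be illustrated with a sampling sketch: aggregate over a deterministic fraction of the rows and scale the result up. This is only a toy in Python, assuming a hash-based sample; `in_sample`, `approximate_count` and the 1-in-10 rate are hypothetical names and values, not ClickHouse functions.

```python
import hashlib

SAMPLE_ONE_IN = 10  # keep roughly 1 row in 10 (illustrative value)


def in_sample(key: int, one_in: int = SAMPLE_ONE_IN) -> bool:
    """Deterministically decide whether a key belongs to the sample."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def approximate_count(keys, predicate, one_in: int = SAMPLE_ONE_IN) -> int:
    """Count matching keys within the sample only, then scale up."""
    hits = sum(1 for key in keys if in_sample(key, one_in) and predicate(key))
    return hits * one_in


def is_even(key: int) -> bool:
    return key % 2 == 0


keys = range(100_000)
exact = sum(1 for key in keys if is_even(key))
estimate = approximate_count(keys, is_even)
print(exact, estimate)
```

The estimate lands close to the exact count without being identical. In this toy every key is still hashed, so nothing is saved; a real system gains speed because the rows outside the sample are never read from disk.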

3. Restrictions

  • No transaction support
  • Not suited to high-frequency, low-latency updates and deletes; only batch deletion and modification are supported
  • The index is sparse, so it is not suited to point queries (fetching a single row by key)
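Why a sparse index is a poor fit for point queries can be shown with a small sketch. The index keeps one entry per block of rows (a granule) instead of one entry per row, so a lookup can only narrow the search to a granule and must then scan it. The code below is a simplified Python model under that assumption; the function names are hypothetical, and 8192 rows is used as the granule size because that is the default `index_granularity` of ClickHouse MergeTree tables.

```python
from bisect import bisect_right

GRANULE_SIZE = 8192  # rows covered by one index entry (illustrative model)


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def point_lookup(sorted_keys, sparse_index, target, granule_size=GRANULE_SIZE):
    """Return (found, rows_scanned) for a single-key lookup."""
    # Locate the last index entry whose key is <= target.
    mark = bisect_right(sparse_index, target) - 1
    if mark < 0:
        return False, 0  # target is smaller than every stored key
    start = mark * granule_size
    granule = sorted_keys[start:start + granule_size]
    # The index cannot say where in the granule the key is: scan it.
    return target in granule, len(granule)


keys = list(range(0, 200_000, 2))  # 100,000 sorted keys (even numbers)
index = build_sparse_index(keys)
found, scanned = point_lookup(keys, index, 123_456)
print(len(index), found, scanned)
```

The index stays tiny (13 entries for 100,000 rows), which is what makes it cheap to hold in memory, but fetching one row still means reading a whole granule. For range scans over many rows that overhead is negligible; for single-row lookups it dominates.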

4. Performance

  • Throughput of a single large query

    Data in the page cache: complex queries process 2-10 GB/s (uncompressed); simple queries can reach 30 GB/s.

    Data not in the page cache: the processing speed is roughly the disk read speed multiplied by the compression ratio.

    In distributed deployments, performance scales almost linearly.
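The cold-cache rule for a single large query (disk read speed times compression ratio) is easy to work through numerically. The helper below is a back-of-the-envelope sketch in Python; the function name and the sample figures (a 200 MB/s disk, 5x compression) are assumptions chosen for illustration, not measurements.

```python
def cold_scan_throughput_mb_s(disk_read_mb_s: float, compression_ratio: float) -> float:
    """Estimate uncompressed data processed per second when reading from disk.

    The disk delivers compressed bytes; each compressed megabyte expands to
    `compression_ratio` megabytes of data that the query actually processes.
    """
    return disk_read_mb_s * compression_ratio


# Assumed example: a disk reading 200 MB/s and data compressed 5x.
estimate = cold_scan_throughput_mb_s(200, 5)
print(f"{estimate} MB/s of uncompressed data")
```

Under these assumed figures the query processes about 1 GB/s of uncompressed data, which shows why a high compression ratio translates directly into scan speed when the disk is the bottleneck.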

  • Latency for short queries

    Data in the page cache: a primary-key query that reads up to a few hundred thousand rows takes less than 50 ms.

    Data not in the page cache (HDD): latency is roughly 10 ms (seek time) × number of columns queried × number of data parts.
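The HDD rule of thumb above multiplies a seek time of about 10 ms by the number of columns (fields) read and by the number of data blocks (parts) touched. A small worked example in Python makes the scale concrete; the function name and the sample inputs are hypothetical.

```python
SEEK_TIME_MS = 10  # HDD seek time taken from the rule of thumb above


def cold_short_query_latency_ms(columns_queried: int, data_parts: int,
                                seek_time_ms: int = SEEK_TIME_MS) -> int:
    """Estimate latency when each column in each data part costs one seek."""
    return seek_time_ms * columns_queried * data_parts


# Assumed example: a query reading 5 columns from a table split into 10 parts.
latency = cold_short_query_latency_ms(columns_queried=5, data_parts=10)
print(f"{latency} ms")
```

With these assumed inputs the estimate is 500 ms, ten times the warm-cache figure. Selecting fewer columns reduces it proportionally, which is one more reason to avoid `SELECT *` on wide tables.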

  • Throughput of short queries

    About 100 queries per second.

  • Write performance

    It is recommended to insert at most once per second, or to write at least 1,000 rows per insert. The write speed is then 50-200 MB/s.
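The write recommendation above implies buffering rows on the client and sending them in batches. Below is a minimal sketch of such a buffer in Python. `BatchWriter` and its `sink` callback are hypothetical: `sink` stands in for whatever actually performs the insert, and no real ClickHouse client API is used or implied.

```python
import time


class BatchWriter:
    """Buffer rows and hand them to `sink` in batches.

    A batch is flushed when it reaches `max_rows`, or when at least
    `max_wait_s` seconds have passed since the last flush.
    """

    def __init__(self, sink, max_rows=1000, max_wait_s=1.0, clock=time.monotonic):
        self._sink = sink            # callable that receives a list of rows
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, row):
        self._buffer.append(row)
        waited = self._clock() - self._last_flush
        if len(self._buffer) >= self._max_rows or waited >= self._max_wait_s:
            self.flush()

    def flush(self):
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []        # start a fresh list; the sink keeps the old one
        self._last_flush = self._clock()


# Usage with a frozen clock, so that only the row limit triggers a flush.
batches = []
writer = BatchWriter(sink=batches.append, clock=lambda: 0.0)
for i in range(2500):
    writer.add({"id": i})
writer.flush()  # send the remaining rows
print([len(batch) for batch in batches])
```

Note that the time limit is only checked when a row is added, so a production version would also need a timer or a final `flush()` on shutdown to avoid leaving rows in the buffer.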

Origin blog.csdn.net/qq_42979842/article/details/108921129