Linux Systems: Setting Up the ClickHouse Columnar Database on CentOS 7

I. ClickHouse Overview

1. Introduction

ClickHouse is an open-source analytical database from Yandex, suited to streaming or batch storage of time-series data. It should not be used as a general-purpose database. It is a distributed, real-time processing platform built for very fast queries over massive data sets, and it is especially quick on aggregate queries (such as GROUP BY).

Download repository: https://repo.yandex.ru/clickhouse
Chinese documentation: https://clickhouse.yandex/docs/zh/

2. Database features

(1) Column-oriented database

A column-oriented database stores data by column rather than by row. It is used mainly for bulk data processing and real-time queries.
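
To make the difference concrete, here is a minimal Python sketch of the same records held by row and by column. The field names and values are made up for illustration, and this is only a model of the layout, not how ClickHouse stores data internally.

```python
# Hypothetical sample data; the field names are made up for illustration.
rows = [
    {"id": 1, "city": "Moscow", "amount": 10},
    {"id": 2, "city": "Berlin", "amount": 25},
    {"id": 3, "city": "Moscow", "amount": 5},
]

def to_columns(rows):
    """Turn a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total_amount = sum(columns["amount"])
print(total_amount)
```

With the column layout, a query that needs only `amount` never has to read `id` or `city`.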

(2) Data compression

Some column-oriented database management systems do not compress data. Yet data compression plays a key role in building an excellent storage system.
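
As a rough illustration of why columns compress well, the sketch below compresses a hypothetical column that holds the same value in every row. It uses `zlib` from the Python standard library as a stand-in. ClickHouse uses its own codecs, so the numbers here only show the general effect.

```python
import zlib

# Hypothetical column: one status value repeated for every row.
status_column = ["ok"] * 10_000

# Serialize the column as newline-separated text, then compress it.
raw = "\n".join(status_column).encode("utf-8")
packed = zlib.compress(raw)

# Compare the size before and after compression.
print(len(raw), len(packed))

# Decompression restores the original bytes exactly.
restored = zlib.decompress(packed)
```

Values of one column tend to resemble each other, which is what a compressor exploits.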

(3) Data storage on disk

Many column-oriented databases can only work in memory, which leads to a larger hardware budget than is actually needed. ClickHouse is designed to work on conventional disks, which gives a lower storage cost per GB.

(4) Multi-core parallel processing

ClickHouse parallelizes large queries in a natural way, using all the resources available on the current server.

(5) Multi-server distributed processing

In ClickHouse, data can be stored on different shards. Each shard consists of a group of replicas for fault tolerance, and a query is processed on all shards in parallel.
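
The scatter-and-merge idea behind sharded queries can be sketched in a few lines of Python. The shard contents and the function names `partial_sum` and `distributed_sum` are hypothetical, threads stand in for separate servers, and replicas are left out entirely.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds part of a (city, amount) table.
shards = [
    [("Moscow", 10), ("Berlin", 25)],
    [("Moscow", 5), ("Paris", 7)],
    [("Berlin", 1)],
]

def partial_sum(shard):
    """Compute sum(amount) grouped by city on a single shard."""
    totals = Counter()
    for city, amount in shard:
        totals[city] += amount
    return totals

def distributed_sum(shards):
    """Query all shards in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds the counts together
    return dict(merged)

result = distributed_sum(shards)
print(result)
```

Each shard does its share of the aggregation, so only small partial results have to travel to the node that merges them.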

(6) SQL support and indexes

ClickHouse supports an SQL-based query language that is compatible with the SQL standard in most cases. Supported queries include GROUP BY, ORDER BY, IN, JOIN, and non-correlated subqueries. At the time of writing, window functions and correlated subqueries are not supported. Data is sorted by primary key, which lets ClickHouse find a specific value or a range with low latency, in tens of milliseconds.
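
Why sorted data makes range lookups cheap can be shown with a small sparse-index sketch. It is a simplified model under stated assumptions: the key values, the granule size of 4, and the function name `scan_range` are all hypothetical, and the real index format in ClickHouse is more involved.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # hypothetical granule size, kept tiny so the example is readable

# Hypothetical primary key column, stored in sorted order.
keys = [3, 8, 8, 15, 21, 34, 55, 89, 90, 95]

# Sparse index: remember only the first key of every granule.
marks = keys[::GRANULE]

def scan_range(keys, marks, low, high):
    """Return all keys in [low, high], reading only candidate granules."""
    # Step back one granule: the granule before the first mark >= low
    # may still end with keys >= low.
    first = max(bisect_left(marks, low) - 1, 0)
    # Granules whose first key is greater than high cannot contain matches.
    last = bisect_right(marks, high)
    candidates = keys[first * GRANULE:last * GRANULE]
    return [k for k in candidates if low <= k <= high]

matches = scan_range(keys, marks, 8, 34)
print(matches)
```

The index holds one entry per granule rather than one per row, so it stays small, and a binary search over it narrows the scan to a few granules.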

(7) Vector engine

To use the CPU efficiently, data is not only stored by column but also processed by vector (a chunk of a column).
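
The control flow of block-at-a-time processing looks roughly like the sketch below. Pure Python does not get the CPU benefits of real vectorized execution, so this only shows the shape of the loop. The block size and the function names `chunks` and `sum_where_positive` are hypothetical.

```python
def chunks(column, size):
    """Yield consecutive blocks of at most `size` values from a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]

def sum_where_positive(column, size=4):
    """Sum the positive values of a column, one block at a time."""
    total = 0
    for block in chunks(column, size):
        # Each step handles a whole block instead of a single row.
        total += sum(v for v in block if v > 0)
    return total

# Hypothetical column of signed integers.
column = [1, -2, 3, 4, -5, 6, 7, -8, 9]
print(sum_where_positive(column))
```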

(8) Real-time data updates

ClickHouse supports defining a primary key on a table. So that queries can quickly find ranges of the primary key, data in a MergeTree table is always kept sorted, incrementally. Data can therefore be written to the table continuously and efficiently, and writes do not take any locks.

II. Installation on Linux

1. Add the package repository

curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh \
  | sudo os=centos dist=7 bash

2. List the available packages

sudo yum list 'clickhouse*'

3. Install the services

sudo yum install -y clickhouse-server clickhouse-client

4. List the installed packages

sudo yum list installed 'clickhouse*'

Console output:

Installed Packages
clickhouse-client.noarch
clickhouse-common-static.x86_64
clickhouse-server.noarch

5. View the configuration

cd /etc/clickhouse-server/
vim config.xml

Settings to note in config.xml:

Data directory: /var/lib/clickhouse/
Temporary directory: /var/lib/clickhouse/tmp/
Log directory: /var/log/clickhouse-server
HTTP port: 8123
TCP port: 9000

6. Configure access

In config.xml, uncomment the line below so that the server accepts connections from other hosts.

<listen_host>::</listen_host> 
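
One way to confirm that the setting is active rather than still commented out is to parse the file. The sketch below works on two hypothetical, heavily shortened config strings. The `<yandex>` root element and the function name `listen_hosts` are assumptions, so adapt them to the file you actually have.

```python
import xml.etree.ElementTree as ET

def listen_hosts(config_xml):
    """Return the active <listen_host> values found in a config document."""
    root = ET.fromstring(config_xml)
    # XML comments are skipped by the parser, so a commented-out
    # <listen_host> element does not show up here.
    return [el.text.strip() for el in root.iter("listen_host") if el.text]

# Hypothetical, heavily shortened config.xml contents.
commented = "<yandex><!-- <listen_host>::</listen_host> --></yandex>"
enabled = "<yandex><listen_host>::</listen_host></yandex>"

print(listen_hosts(commented))
print(listen_hosts(enabled))
```

For the real file, read /etc/clickhouse-server/config.xml into a string and pass it to `listen_hosts`.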

7. Start the service

/etc/rc.d/init.d/clickhouse-server start

8. Check the service

ps aux | grep clickhouse
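
Besides looking at the process list, you can check whether the two ports from the configuration step accept connections. This is a minimal sketch that assumes the server runs on the local machine with the default ports 8123 and 9000. The function name `port_open` is hypothetical.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False

if __name__ == "__main__":
    # Ports taken from the configuration step; change them if yours differ.
    for name, port in (("HTTP", 8123), ("TCP", 9000)):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(name, port, state)
```

An open port only shows that something is listening there. It does not prove that the listener is ClickHouse.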

III. Basic Operations

1. Create a table

CREATE TABLE cs_user_info (
  `id` UInt64,
  `user_name` String,
  `pass_word` String,
  `phone` String,
  `email` String,
  `create_day` Date DEFAULT CAST(now(),'Date')
) ENGINE = MergeTree(create_day, intHash32(id), 8192)

Note: MergeTree is the officially recommended engine. The statement above uses the older parenthesized form, MergeTree(date column, primary key expression, index granularity).

The MergeTree engine and the other engines in its family (*MergeTree) are the most powerful table engines in ClickHouse. The basic idea of the family is as follows. When you have a huge amount of data to insert into a table, you write it efficiently in batches as data parts, and those parts are then merged in the background according to certain rules. This is much more efficient than continually rewriting the stored data on every insert.
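
The background merge can be pictured as a merge of lists that are already sorted. The sketch below uses `heapq.merge` from the Python standard library on three hypothetical parts. It models the idea only and says nothing about the merge rules ClickHouse actually applies.

```python
import heapq

# Hypothetical parts: each batch insert produced one part sorted by key.
part_a = [(1, "cicada"), (4, "winter")]
part_b = [(2, "smile"), (3, "spring")]
part_c = [(5, "autumn")]

def merge_parts(*parts):
    """Merge already-sorted parts into one sorted part."""
    # heapq.merge walks the inputs in step and never re-sorts them.
    return list(heapq.merge(*parts))

merged = merge_parts(part_a, part_b, part_c)
print(merged)
```

Because every part is sorted when it is written, a merge is a single sequential pass over the inputs, which suits disk storage well.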

2. Batch insert

INSERT INTO cs_user_info 
  (id,user_name,pass_word,phone,email) 
VALUES 
  (1,'cicada','123','13923456789','cicada@com'),
  (2,'smile','234','13922226789','smile@com'),
  (3,'spring','345','13966666789','spring@com');

3. Queries

SELECT * FROM cs_user_info ;
SELECT * FROM cs_user_info WHERE user_name='smile' AND pass_word='234';
SELECT * FROM cs_user_info WHERE id IN (1,2);
SELECT * FROM cs_user_info WHERE id=1 OR id=2 OR id=3;

Queries and other operations are very similar to those in MySQL.
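
The same queries can also be sent over the HTTP port (8123) from the configuration step, with the statement passed in the `query` URL parameter. The sketch below only builds such a URL. The function name `build_query_url` and the default host are assumptions.

```python
from urllib.parse import quote, urlencode

def build_query_url(sql, host="localhost", port=8123):
    """Build a URL that passes a SQL statement in the query parameter."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    params = urlencode({"query": sql}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"

url = build_query_url("SELECT * FROM cs_user_info WHERE id IN (1,2)")
print(url)
```

With the server running, `urllib.request.urlopen(url).read()` should return the result rows as text.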

IV. Source Code

GitHub:
https://github.com/cicadasmile/linux-system-base
GitEE:
https://gitee.com/cicadasmile/linux-system-base

Origin blog.51cto.com/14439672/2446105