Experiencing the ClickHouse Quick Start from a User's Perspective

ClickHouse is an open-source, distributed, column-oriented database management system for OLAP (online analytical processing) that originated at Yandex, and it has remained one of the star databases of recent years.

In a blog post dated April 26, 2022, "ClickHouse Docs have a new look and feel!", Rich Raposa stated that, given how important documentation is in helping users (developers, DBAs, architects, and others) understand and use the product, a lot of effort had gone into improving existing content and adding new content. In addition to the traditional feature-oriented operation manuals (for DBAs) and development guides (for database developers), ClickHouse has added a series of tutorials to help developers and DBAs get started quickly. These documents include:

  • Quick start and tutorial documentation to help users get started quickly
  • A series of tutorials to help users connect to well-known UI/BI tools, covering Grafana, Metabase, Superset, and Tableau
  • A series of tutorials to help users connect to external tools such as Kafka, AWS S3, PostgreSQL, MySQL, and Airbyte

Overall, ClickHouse's new documentation focuses on improving the onboarding experience for new users and on making the documentation as a whole easier to follow hands-on, adding many practical tutorials. In addition, the new documents add a large amount of content on combining the database with its external ecosystem, so the technical content is no longer limited to the database itself. This is undoubtedly very friendly to individual developers.

Hands-on experience

Of course, as documentation engineers, we can't just take the announcement at face value; we also need to see how it works in practice. Let's follow ClickHouse's tutorial and try it out.

According to the official documentation, ClickHouse is compatible with multiple operating systems and CPU architectures.

Operating system:

  • Linux
  • FreeBSD
  • macOS

CPU architecture:

  • x86_64
  • AArch64
  • PowerPC64LE
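Before installing, it can help to compare the current machine with the lists above. The following is a minimal sketch, not part of the official tutorial: is_supported is a hypothetical helper, and the mapping from uname output to the names in the lists (for example Darwin for macOS, arm64 or aarch64 for AArch64) is an assumption of this example. It checks the operating system and the architecture independently and does not confirm that a build exists for every combination.

```shell
#!/bin/sh
# Sketch: compare uname output with the platforms listed above.
# is_supported is a hypothetical helper; the uname mappings are assumptions.

is_supported() {
    os="$1"
    arch="$2"
    case "$os" in
        Linux|FreeBSD|Darwin) ;;   # Darwin is what `uname -s` prints on macOS
        *) return 1 ;;
    esac
    case "$arch" in
        x86_64|amd64) ;;           # x86_64 (FreeBSD reports amd64)
        aarch64|arm64) ;;          # AArch64 (macOS reports arm64)
        ppc64le) ;;                # PowerPC64LE
        *) return 1 ;;
    esac
    return 0
}

if is_supported "$(uname -s)" "$(uname -m)"; then
    echo "$(uname -s)/$(uname -m): appears in the lists above"
else
    echo "$(uname -s)/$(uname -m): not in the lists above"
fi
```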

We chose Ubuntu 20.04 to try out ClickHouse's latest tutorial.

Install and start ClickHouse

ClickHouse officially provides a shell script that automates the download and installation for you.

# Download ClickHouse
curl https://clickhouse.com/ | sh

The script starts the download automatically.

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1292    0  1292    0     0    548      0 --:--:--  0:00:02 --:--:--   548

Will download https://builds.clickhouse.com/master/amd64/clickhouse

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 2140M    0 2848k    0     0   525k      0  1:09:28  0:00:05  1:09:23  569k

Once the download finishes, run the following command to install:

sudo ./clickhouse install
Copying ClickHouse binary to /usr/bin/clickhouse.new
Renaming /usr/bin/clickhouse.new to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-disks to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
 groupadd -r clickhouse
Creating clickhouse user if it does not exist.
 useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Data path configuration override is saved to file /etc/clickhouse-server/config.d/data-paths.xml.
Log path configuration override is saved to file /etc/clickhouse-server/config.d/logger.xml.
User directory path configuration override is saved to file /etc/clickhouse-server/config.d/user-directories.xml.
OpenSSL path configuration override is saved to file /etc/clickhouse-server/config.d/openssl.xml.
Creating log directory /var/log/clickhouse-server.
Creating data directory /var/lib/clickhouse.
Creating pid directory /var/run/clickhouse-server.
 chown -R clickhouse:clickhouse '/var/log/clickhouse-server'
 chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
 chown  clickhouse:clickhouse '/var/lib/clickhouse'
Enter password for default user:

Enter a password here, or just press Enter.

ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client

Two commands and the installation is done. One more command brings us to the fabled A-ha moment.

sudo clickhouse start
$ sudo clickhouse start
 chown -R clickhouse: '/var/run/clickhouse-server/'
Will run clickhouse su 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Waiting for server to start
Server started

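The database server is now running.

CRUD in ClickHouse

The tutorial's Connect to ClickHouse section uses the built-in HTML interface, which can run SQL queries. However, the Ubuntu system I'm using has no GUI, so I had to skip ahead to The ClickHouse Client section and use the CLI client.

A note here telling users without a GUI to go straight to the CLI client would make the experience smoother.

Run the client

Run the CLI client: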

clickhouse-client

It runs successfully:

$ clickhouse-client
ClickHouse client version 22.7.1.2100 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.7.1 revision 54456.

Warnings:
 * Linux is not using a fast TSC clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource
 * Linux threads max count is too low. Check /proc/sys/kernel/threads-max
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

localhost.localdomain :) 

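Create a database

My test machine is underpowered, which is why the warnings above appeared. Let's start with creating a database, as in the earlier part of the tutorial: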

CREATE DATABASE IF NOT EXISTS helloworld
localhost.localdomain :) CREATE DATABASE IF NOT EXISTS helloworld

CREATE DATABASE IF NOT EXISTS helloworld

Query id: 2807edaa-c084-4501-bb0f-a32701f359ff

Ok.

0 rows in set. Elapsed: 0.006 sec. 

Create a table

Next, create a table. As you can see, every table in ClickHouse needs an ENGINE. The tutorial recommends MergeTree.

CREATE TABLE helloworld.my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)

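Insert data

Before this step, the documentation uses A Brief Intro to Primary Keys to explain something distinctive about ClickHouse: primary keys are not unique! This really does differ from most databases. The explanation may look abrupt, but it is necessary; otherwise, users would be puzzled by the data inserted below.

Dispelling confusion before it arises is a very effective approach.

Insert data: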

INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
    (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
    (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
    (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 )

The insert succeeds:

localhost.localdomain :) INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
                             (101, 'Hello, ClickHouse!',                                 now(),       -1.0    ),
                             (102, 'Insert a lot of rows per batch',                     yesterday(), 1.41421 ),
                             (102, 'Sort your data based on your commonly-used queries', today(),     2.718   ),
                             (101, 'Granules are the smallest chunks of data read',      now() + 5,   3.14159 )

INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) FORMAT Values

Query id: 4d376221-2786-4d44-9ef9-129a8f03ddcf

Ok.

4 rows in set. Elapsed: 0.009 sec. 

Check the inserted data:

SELECT * FROM helloworld.my_first_table

The result is as follows:

localhost.localdomain :) SELECT * FROM helloworld.my_first_table

SELECT *
FROM helloworld.my_first_table

Query id: 867f2212-6d8b-43ec-b1ff-efce3c879f73

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     101 │ Hello, ClickHouse!                                 │ 2022-07-16 00:19:30 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-07-16 00:19:35 │ 3.14159 │
│     102 │ Insert a lot of rows per batch                     │ 2022-07-15 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │   2.718 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

4 rows in set. Elapsed: 0.002 sec. 
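The rows above come back ordered by user_id and then by timestamp, the two columns of the table's PRIMARY KEY, and user_id values repeat freely. The following is a minimal sketch in plain shell, not ClickHouse itself, that reproduces this ordering with sort. The pipe-separated layout and the sort_by_key helper are assumptions made for this illustration. Without an ORDER BY clause, ClickHouse does not guarantee the order of a result, so treat the match as an observation about this small table.

```shell
#!/bin/sh
# Sketch in plain shell (not ClickHouse): order rows by the key (user_id, timestamp).
# Fields are user_id|timestamp|message, copied from the result above.
# sort_by_key is a hypothetical helper for this illustration.

sort_by_key() {
    # -t'|' sets the field separator; field 1 is compared numerically,
    # field 2 as text (the timestamp format sorts correctly as text)
    LC_ALL=C sort -t'|' -k1,1n -k2,2
}

rows='101|2022-07-16 00:19:35|Granules are the smallest chunks of data read
102|2022-07-16 00:00:00|Sort your data based on your commonly-used queries
101|2022-07-16 00:19:30|Hello, ClickHouse!
102|2022-07-15 00:00:00|Insert a lot of rows per batch'

# Sorting only orders the rows; repeated user_id values are all kept.
printf '%s\n' "$rows" | sort_by_key
```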

Insert data from a CSV file

Create a CSV file named data.csv with the following content:

102,This is data in a file,2022-02-22 10:43:28,123.45
101,It is comma-separated,2022-02-23 00:00:00,456.78
103,Use FORMAT to specify the format,2022-02-21 10:43:30,678.90

Insert the CSV data into the my_first_table table in the helloworld database:

clickhouse-client --query='INSERT INTO helloworld.my_first_table FORMAT CSV' < data.csv
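Because the table has four columns, each line of data.csv needs four comma-separated fields. The following is a minimal sketch of a pre-flight check, not part of the official tutorial: check_csv_fields is a hypothetical helper, and it assumes simple CSV with no quoted commas inside a field, which holds for the three rows above.

```shell
#!/bin/sh
# Sketch: write data.csv, then check that every row has 4 fields
# (user_id, message, timestamp, metric) before loading it.
# check_csv_fields is a hypothetical helper; it assumes no quoted commas.

check_csv_fields() {
    # $1 = file, $2 = expected field count; exit status is 0 only if all rows match
    awk -F',' -v n="$2" 'BEGIN { bad = 0 } NF != n { bad = 1 } END { exit bad }' "$1"
}

cat > data.csv <<'EOF'
102,This is data in a file,2022-02-22 10:43:28,123.45
101,It is comma-separated,2022-02-23 00:00:00,456.78
103,Use FORMAT to specify the format,2022-02-21 10:43:30,678.90
EOF

if check_csv_fields data.csv 4; then
    echo "data.csv: 4 fields in every row"
    # Then load it with the command shown above:
    # clickhouse-client --query='INSERT INTO helloworld.my_first_table FORMAT CSV' < data.csv
else
    echo "data.csv: at least one row does not have 4 fields" >&2
fi
```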

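The official documentation views the data through the Web UI here; let's use the CLI client instead:

SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC

The result below shows that the data was inserted successfully: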

SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC

Query id: d7216864-2b85-4ad2-9073-6c0bef7ed0c6

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│     103 │ Use FORMAT to specify the format                   │ 2022-02-21 10:43:30 │   678.9 │
│     102 │ This is data in a file                             │ 2022-02-22 10:43:28 │  123.45 │
│     101 │ It is comma-separated                              │ 2022-02-23 00:00:00 │  456.78 │
│     102 │ Insert a lot of rows per batch                     │ 2022-07-15 00:00:00 │ 1.41421 │
│     102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │   2.718 │
│     101 │ Hello, ClickHouse!                                 │ 2022-07-16 00:19:30 │      -1 │
│     101 │ Granules are the smallest chunks of data read      │ 2022-07-16 00:19:35 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

Overall impressions

Overall, the experience was very smooth. ClickHouse has clearly put a lot of effort into the database's ease of use and into making the documentation easy to follow hands-on. For concepts such as ENGINE and Primary Key, the documentation avoids overly complicated explanations and focuses on guiding users through the tasks successfully.

As for the client, I'd recommend that the tutorial use a single client, such as the CLI, for the whole process, rather than splitting it between the Web UI and the CLI.

Origin juejin.im/post/7120649677904543781