ClickHouse cluster installation and deployment

1. Installation environment

This installation is performed as the clickhouse user on CentOS 7. Other Linux distributions also work, with minor changes.
The official website describes ClickHouse's environment requirements: ClickHouse only supports Linux, and the CPU must support the SSE 4.2 instruction set. To run ClickHouse in other environments, you can use Docker or an online cloud service.

System requirements for pre-built packages: Linux, x86_64 with SSE 4.2.

Check whether the system supports SSE4.2

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

Turn off the firewall

# Stop the firewall
systemctl stop firewalld.service

# Prevent the firewall from starting at boot
systemctl disable firewalld.service

2. Version selection and download

Create a directory for downloading the installation package

mkdir -p /bigdata/software/clickhouse
cd /bigdata/software/clickhouse

Get the latest stable version and download the installation package

export LATEST_VERSION=$(curl -s https://repo.clickhouse.tech/tgz/stable/ | \
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1)
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-common-static-dbg-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-server-$LATEST_VERSION.tgz
curl -O https://repo.clickhouse.tech/tgz/stable/clickhouse-client-$LATEST_VERSION.tgz
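The version lookup above depends on `sort -V` (GNU coreutils) ordering four-part version numbers correctly. The following is a minimal sketch of that pipeline run against a made-up listing; the file names in `sample_listing` are hypothetical and only illustrate the shape of the repository index.

```shell
# Pick the highest four-part version number from a directory listing on stdin.
latest_version() {
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -V -r | head -n 1
}

# Hypothetical listing fragment, for illustration only.
sample_listing='clickhouse-server-21.1.2.15.tgz
clickhouse-server-20.12.5.14.tgz
clickhouse-server-21.1.10.3.tgz'

# Version sort compares each numeric field, so 21.1.10.3 ranks above 21.1.2.15.
printf '%s\n' "$sample_listing" | latest_version   # prints 21.1.10.3
```

If the repository cannot be reached, LATEST_VERSION ends up empty and the curl commands request files that do not exist, so it is worth checking the variable (for example with `[ -n "$LATEST_VERSION" ]`) before downloading.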

3. Installation

Extract the installation packages and install them

tar -xzvf clickhouse-common-static-$LATEST_VERSION.tgz
sudo clickhouse-common-static-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-common-static-dbg-$LATEST_VERSION.tgz
sudo clickhouse-common-static-dbg-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-server-$LATEST_VERSION.tgz
sudo clickhouse-server-$LATEST_VERSION/install/doinst.sh
tar -xzvf clickhouse-client-$LATEST_VERSION.tgz
sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh

Introduction to the core directory structure

  1. /etc/clickhouse-server: The directory of the server configuration file
  2. /etc/clickhouse-client: Directory of client configuration files
  3. /var/lib/clickhouse: The default data storage directory
  4. /var/log/clickhouse-server: The default log storage location

Configuration files and executables

  1. /etc/security/limits.d/clickhouse.conf: file-handle limits for ClickHouse
  2. /etc/cron.d/clickhouse-server: cron job used to recover a ClickHouse service that was interrupted by an exception. By default it runs every 10 minutes, checks whether the clickhouse-server service is running, and starts it if it is not:

*/10 * * * * root (which service > /dev/null 2>&1 && (service clickhouse-server condstart ||:)) || /etc/init.d/clickhouse-server condstart > /dev/null 2>&1

  3. Executable files in /usr/bin:
  • clickhouse: the main ClickHouse executable
  • clickhouse-client: symbolic link to the clickhouse executable, used for client connections
  • clickhouse-server: symbolic link to the clickhouse executable, used to start the server
  • clickhouse-compressor: built-in tool for compressing and decompressing data

4. Start the service

1) Disk storage configuration

Modify the default data storage paths in config.xml. Because data volumes tend to be large in real use, point these paths at a high-capacity disk.

<path>/ch/data/</path>
<tmp_path>/ch/data/tmp/</tmp_path>
<user_files_path>/ch/data/user_files/</user_files_path>

Modify file directory permissions

chown -R clickhouse:clickhouse /ch/data/
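The three paths configured above have to exist and be writable by the service account before the server starts. A minimal sketch that creates them and changes ownership in one step follows; the function name `prepare_data_dirs` is hypothetical, and the layout assumes the /ch/data paths shown in the config.

```shell
# Create the data, tmp and user_files directories and hand them to the
# service account. Arguments: base directory, owner (user:group).
prepare_data_dirs() {
    base="$1"
    owner="$2"
    mkdir -p "$base" "$base/tmp" "$base/user_files" || return 1
    chown -R "$owner" "$base"
}

# On the server, as root (not executed here):
#   prepare_data_dirs /ch/data clickhouse:clickhouse
```

The same function can be reused for each extra disk path if several disks are configured.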

If you have multiple disks, declare the path of each disk in config.xml:

<storage_configuration>
    ...
    <disks>
        <sdc> <!-- disk name -->
            <path>/datac/clickhouse/</path>
        </sdc>
        <sdd> <!-- disk name -->
            <path>/datad/clickhouse/</path>
        </sdd>
        <sde> <!-- disk name -->
            <path>/datae/clickhouse/</path>
        </sde>
        ...
    </disks>
    ...
</storage_configuration>

By default, ClickHouse writes data only to the default path you configured, so to use the additional disks you need to define a new storage policy. Add it to config.xml:

<storage_configuration>
    ...
    <policies>
        ...
        <!-- This policy spreads data evenly across all disks -->
        <hdd_in_order> <!-- policy name -->
            <volumes>
                <single>
                    <disk>default</disk> <!-- disk name -->
                    <disk>sdc</disk> <!-- disk name -->
                    ...
                </single>
            </volumes>
        </hdd_in_order>
        ...
    </policies>
    ...
</storage_configuration>

After restarting ClickHouse, query the system.storage_policies table to see the newly configured storage policy.
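A sketch of that check from the shell, plus a table that actually uses the policy, is shown here. Tables keep using the default policy unless they name another one in the `storage_policy` setting. The function names and the `policy_demo` table are hypothetical; `CH_CLIENT` is just a variable holding the client command, and any connection options (user, password, port) are passed through as arguments.

```shell
# Client command; override CH_CLIENT if the binary lives elsewhere.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# List every configured policy with its volumes and disks.
show_storage_policies() {
    "$CH_CLIENT" "$@" --query "SELECT policy_name, volume_name, disks FROM system.storage_policies"
}

# Create a hypothetical table that stores its data under hdd_in_order.
create_policy_demo_table() {
    "$CH_CLIENT" "$@" --query "CREATE TABLE IF NOT EXISTS default.policy_demo (id UInt64) ENGINE = MergeTree ORDER BY id SETTINGS storage_policy = 'hdd_in_order'"
}

# On a running server (not executed here):
#   show_storage_policies -u default --password xxxx
#   create_policy_demo_table -u default --password xxxx
```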

2) ZooKeeper and cluster configuration

To run ClickHouse in cluster mode, you need to integrate ZooKeeper. The installation of ZooKeeper is omitted here. On the ClickHouse side, add the following to the metrika.xml file:

<zookeeper-servers>
    <node index="1">
        <host>xxxx1</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>xxxx2</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>xxxx3</host>
        <port>2181</port>
    </node>
</zookeeper-servers>

The nodes in a ClickHouse cluster do not know about each other, so the cluster information for all nodes has to be added to metrika.xml on every node:

<clickhouse_remote_servers>
    <report_shards_replicas>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>host1</host>
                <port>9005</port>
                <user>default</user>
                <password>xxxx</password>
            </replica>
            <!-- If the shard has more replicas, keep adding replica entries below -->
            <replica>
                <host>host4</host>
                <port>9005</port>
                <user>default</user>
                <password>xxxx</password>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>host2</host>
                <port>9005</port>
                <user>default</user>
                <password>xxxx</password>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>host3</host>
                <port>9005</port>
                <user>default</user>
                <password>xxxx</password>
            </replica>
        </shard>
	</report_shards_replicas>
</clickhouse_remote_servers>
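Once the server has been restarted with these settings, the system tables give a quick way to confirm that the cluster definition and the ZooKeeper connection were picked up. This is a minimal sketch; the function names are hypothetical, and connection options are passed through as extra arguments.

```shell
# Client command; override CH_CLIENT if the binary lives elsewhere.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Show the shards and replicas the server knows for one cluster.
# First argument: cluster name; remaining arguments go to the client.
show_cluster() {
    cluster="$1"
    shift
    "$CH_CLIENT" "$@" --query "SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters WHERE cluster = '$cluster'"
}

# List the root znodes; an error here usually means ZooKeeper is unreachable
# or the zookeeper section was not loaded.
check_zookeeper() {
    "$CH_CLIENT" "$@" --query "SELECT name FROM system.zookeeper WHERE path = '/'"
}

# On a running node (not executed here):
#   show_cluster report_shards_replicas -u default --password xxxx
#   check_zookeeper -u default --password xxxx
```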

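<!-- Macros make it possible to specify a table's ZooKeeper path with dynamic parameters when creating distributed tables later -->
<macros>
    <!-- layer can be omitted if there is only one cluster -->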
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-1</replica>
</macros>
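To illustrate how the macros are used, here is a sketch of the DDL for a replicated local table and a Distributed table on top of it. The table names (`events_local`, `events_all`), the columns, and the ZooKeeper path prefix are hypothetical; only the cluster name and the macro names come from the configuration above. The placeholders stay literal in the statement, and each server substitutes its own values.

```shell
# Print DDL for a hypothetical replicated local table. {layer}, {shard} and
# {replica} are left as-is: the server fills them in from its <macros> section.
local_table_ddl() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS default.events_local
(
    event_date Date,
    event_id   UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY event_id
SQL
}

# Print DDL for a Distributed table that reads from and writes to
# events_local on every shard of the cluster.
distributed_table_ddl() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS default.events_all AS default.events_local
ENGINE = Distributed(report_shards_replicas, default, events_local, rand())
SQL
}

# On each node (not executed here):
#   local_table_ddl | clickhouse-client -u default --password xxxx
#   distributed_table_ddl | clickhouse-client -u default --password xxxx
```

Because the same statement runs unchanged on every node, replicas of one shard share a ZooKeeper path while different shards get different paths.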

3) Configure user permissions

Profiles, users, and quotas are defined in users.xml:

<?xml version="1.0"?>
<yandex>
    <profiles>
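        <!-- Profile for read-write users -->
        <default>
            <!-- Maximum memory usage for a single query -->
            <max_memory_usage>10000000000</max_memory_usage>
            <!-- Whether to use the cache of uncompressed data (generally not recommended) -->
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <!-- How replicas are selected during distributed query processing -->
            <load_balancing>random</load_balancing>
        </default>
        <!-- Profile for read-only users -->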
        <readonly>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <readonly>1</readonly>
        </readonly>
    </profiles>
    <!-- Users and access control -->
    <users>
        <!-- "default" is the user name; you can choose your own -->
        <default>
            <!-- The password can be stored as a SHA256 hash -->
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <!-- Network access settings.

                 Allow access from anywhere:
                    <ip>::/0</ip>

                 Allow access only from the local host:
                    <ip>::1</ip>
                    <ip>127.0.0.1</ip>

                 Regular expressions can also be used.
             -->
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <!-- Profile assigned to this user -->
            <profile>default</profile>
            <!-- Quota assigned to this user -->
            <quota>default</quota>
        </default>
        <!-- Read-only user (created by the author) -->
        <ck>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks incl="networks" replace="replace">
                <ip>::/0</ip>
            </networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </ck>
    </users>
    <!-- Resource quotas -->
    <quotas>
        <!-- Name of the quota -->
        <default>
            <!-- Limits resource usage within a time interval -->
            <interval>
                <!-- Length of the interval -->
                <duration>3600</duration>
                <!-- The settings below are unlimited -->
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
</yandex>
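The value for password_sha256_hex is the hex-encoded SHA-256 digest of the plain-text password. A minimal sketch for generating it follows, assuming `sha256sum` from GNU coreutils is available; the function name is hypothetical and the password shown is only a placeholder.

```shell
# Print the SHA-256 hex digest of the password given as the first argument.
# printf '%s' avoids a trailing newline, which would change the hash.
password_sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Placeholder password, for illustration only.
password_sha256_hex 'password'
```

Paste the resulting 64-character string into the password_sha256_hex element for the user.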

Further config.xml and metrika.xml settings are covered in another article of mine:
https://blog.csdn.net/sileiH/article/details/113404907

4) Start clickhouse-server

Execute on each node:

sudo /etc/init.d/clickhouse-server start
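The server can take a moment to come up after the start command returns. One way to wait for it is to poll the HTTP interface, whose /ping endpoint answers "Ok." once the server is ready. This sketch assumes curl is installed and that the HTTP port is the default 8123; if http_port was changed in config.xml, pass the configured value. The function name is hypothetical.

```shell
# Return success once the server answers on its HTTP interface.
# Arguments: host, HTTP port, number of attempts (one second apart).
wait_for_clickhouse() {
    host="${1:-localhost}"
    port="${2:-8123}"
    attempts="${3:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(curl -s --max-time 2 "http://$host:$port/ping")" = "Ok." ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# On each node after starting the service (not executed here):
#   wait_for_clickhouse localhost 8123 30 && echo "server is up"
```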

Log in with clickhouse-client:

Log in locally:

clickhouse-client -u username --password pwd

Remote login:

clickhouse-client -h host --port port -u username --password pwd


Origin blog.csdn.net/sileiH/article/details/113736233