chproxy + ClickHouse high-availability cluster: deployment and monitoring

1. Environment description

1.1 Service deployment planning

Server nickname   IP address      Services
mdw               172.16.104.11   grafana, prometheus, ck
sdw1              172.16.104.12   chproxy, ck, zk
sdw2              172.16.104.13   ck, zk
sdw3              172.16.104.14   ck, zk

Here ck stands for ClickHouse and zk for ZooKeeper. The local environment has limited resources, so several services share each server. In production, it is recommended to deploy different services on different servers.

1.2 CK cluster settings

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl1 │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ shard2_repl1 │         1 │            1 │           2 │ sdw1      │ 172.16.104.12 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ shard2_repl1 │         2 │            1 │           1 │ sdw2      │ 172.16.104.13 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ shard2_repl1 │         2 │            1 │           2 │ sdw3      │ 172.16.104.14 │ 9000 │        0 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2. Using chproxy

2.1 Advantages of chproxy

1. Drawbacks of native ClickHouse distributed tables

  • ck performs distributed queries and writes through local tables plus distributed tables. The node that first receives the statement (the initiator) has to forward data to the other nodes, coordinating with zk, which consumes a large amount of network bandwidth;
  • When writing to a distributed table, the data is first stored on the machine that hosts the distributed table and then sent asynchronously to the machines that host the local tables. No consistency check happens in between, so the data may become inconsistent;
  • If the machine that hosts the distributed table goes down during such a write, data may be lost;
  • In some scenarios, writing through native distributed tables distributes data unevenly across shard nodes;
  • Use ON CLUSTER SQL with caution. Its support is still incomplete, and in some cases the statement hangs;
  • If distributed tables are used, create them only on a few dedicated nodes that serve distributed queries;

2. Optimization of chproxy

  • chproxy routes requests to nodes in round-robin fashion, so data can be written directly to the local tables, avoiding the consistency and data-loss risks of native distributed tables;
  • Round-robin routing through chproxy spreads writes evenly across the shard nodes of a multi-shard table;
  • Round-robin routing through chproxy avoids the doubled network bandwidth consumed when the initiator node redistributes data;
  • chproxy can divide the ck cluster into multiple logical clusters, each configured on demand. A logical cluster maps to specified nodes of the CK cluster, so writes and queries can be isolated logically to avoid resource interference;
  • Placing a load balancer in front of several chproxy instances provides high availability at the chproxy layer and allows capacity to scale horizontally;
  • chproxy can restrict queries, access, and security settings per logical user;
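
The even-distribution point above can be illustrated with a small simulation. This is only a sketch under assumptions: the node names are taken from the deployment plan, it assumes one write target per shard with replication copying the data to the other replica, and `route_batches` is a hypothetical helper rather than anything provided by chproxy.

```python
from collections import Counter
from itertools import cycle

# One write target per shard (assumption: the other replica of each shard
# receives the data through ClickHouse replication).
SHARD_NODES = ["mdw:8123", "sdw2:8123"]

def route_batches(nodes, batch_count):
    """Assign batch_count write batches to nodes in round-robin order."""
    targets = cycle(nodes)
    return [next(targets) for _ in range(batch_count)]

assignments = route_batches(SHARD_NODES, 10)
per_node = Counter(assignments)
print(per_node)
```

With an even number of batches, each node receives the same count, which is the property the round-robin routing relies on.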

2.2 Clickhouse cluster installation and deployment

Reference documents: CK cluster construction and deployment

2.3 chproxy installation and deployment

1. Software installation

# wget -c https://github.com/Vertamedia/chproxy/releases/download/v1.14.0/chproxy-linux-amd64-v1.14.0.tar.gz
# tar xf chproxy-linux-amd64-v1.14.0.tar.gz

2. Write the chproxy configuration file

# mkdir -pv /data/chproxy
# touch /data/chproxy/config.yml


# vim /data/chproxy/config.yml
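log_debug: false                                                    # debug logging
hack_me_please: true                                                # true skips chproxy's config security checks

# Cache settings: long-term and short-term caches can be defined, one named entry each
caches:                                                             # cache definitions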
  - name: "longterm"
    dir: "/data/chproxy/longterm/cachedir"
    max_size: 100Gb
    expire: 1h
    grace_time: 20s

  - name: "shortterm"
    dir: "/data/chproxy/shortterm/cachedir"
    max_size: 100Mb
    expire: 10s

# Network whitelist groups, one named entry each
network_groups:                                                     # multiple whitelist groups can be defined
  - name: "cluster_internal"
    networks: ["172.16.104.0/24"]

  - name: "office_addrs"
    networks: ["192.168.102.0/24"]

# Parameter settings, one named entry per group
param_groups:                                                       # multiple parameter groups can be defined
  - name: "cron-job"
    params:
      - key: "max_memory_usage"
        value: "40000000000"

      - key: "max_bytes_before_external_group_by"
        value: "20000000000"

  - name: "web"                                                 
    params:
      - key: "max_memory_usage"
        value: "5000000000"

      - key: "max_columns_to_read"
        value: "30"

      - key: "max_execution_time"
        value: "30"

# chproxy server settings, usually split into http, https, and metrics
server:
  http:
    listen_addr: ":9090"                                            # port the chproxy service listens on
    allowed_networks: ["office_addrs", "cluster_internal"]          # networks allowed to access chproxy
    read_timeout: 5m
    write_timeout: 10m
    idle_timeout: 20m

  metrics:
    allowed_networks: ["office_addrs", "cluster_internal"]          # networks allowed to read metrics (for prometheus)

# User settings, one named entry per user
users:
  - name: "web"                                                     # chproxy user name
    password: "123456"                                              # chproxy password
    to_cluster: "chproxy_ck_cluster_1"                              # cluster this user may access
    to_user: "web"                                                  # ck user this chproxy user maps to
    deny_http: false                                                # whether to deny HTTP requests (false = HTTP allowed)
    allow_cors: true
    requests_per_minute: 20                                         # per-minute request limit for this user
    # cache: "longterm"                                             # enable a cache; queries are then answered from the cache first instead of being routed round-robin
    params: "web"                                                   # apply the "web" parameter group
    max_queue_size: 100                                             # maximum number of queued requests
    max_queue_time: 35s                                             # maximum time a request may wait in the queue

  - name: "default"                                                 # chproxy user
    to_cluster: "chproxy_ck_cluster_2"                              # different chproxy users can map to different clusters
    to_user: "default"
    allowed_networks: ["office_addrs", "cluster_internal", "172.16.104.12"]
    max_concurrent_queries: 4
    max_execution_time: 1m
    deny_https: false
# Logical cluster settings, one named entry per cluster
clusters:
  - name: "chproxy_ck_cluster_1"                                    # logical cluster name
    scheme: "http"                                                  # request scheme, http or https
    nodes: ["mdw:8123","sdw2:8123"]                                 # reachable nodes; the default port is 8123 for http and 8443 for https (check config.xml of the ck service)
    heartbeat:                                                      # heartbeat check for the cluster nodes
      interval: 1m
      timeout: 10s
      request: "/?query=SELECT%201%2B1"
      response: "2\n"

    kill_query_user:                                                # ck user used to kill queries that exceed their limit
      name: "default"
      password: ""

    users:
      - name: "web"                                                 # ck user on this cluster (target of to_user)
        password: "123456"
        max_concurrent_queries: 4
        max_execution_time: 1m

  - name: "chproxy_ck_cluster_2"                                    # second logical cluster; several can be defined
    scheme: "http"
    replicas:                                                       # reachable nodes, grouped by replica
      - name: "replica1"
        nodes: ["mdw:8123", "sdw1:8123"]
      - name: "replica2"
        nodes: ["sdw2:8123", "sdw3:8123"]

    users:
      - name: "default"
        max_concurrent_queries: 4
        max_execution_time: 1m
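
The heartbeat `request` value in the configuration is a URL-encoded query, and `response` is the body expected back. The sketch below decodes the request with the Python standard library and compares a response body against the expected value. `decode_heartbeat_query` and `is_healthy` are hypothetical helpers for illustration only, and the exact-match comparison is an assumption, not a description of chproxy's internal check.

```python
from urllib.parse import parse_qs, urlsplit

HEARTBEAT_REQUEST = "/?query=SELECT%201%2B1"   # value from the config above
EXPECTED_RESPONSE = "2\n"                      # value from the config above

def decode_heartbeat_query(request_path):
    """Return the SQL text carried in the heartbeat request path."""
    query_string = urlsplit(request_path).query
    return parse_qs(query_string)["query"][0]

def is_healthy(response_body, expected=EXPECTED_RESPONSE):
    """Hypothetical check: treat the node as healthy only on an exact match."""
    return response_body == expected

decoded = decode_heartbeat_query(HEARTBEAT_REQUEST)
print(decoded)
```

Decoding shows that `%20` is a space and `%2B` is a plus sign, so the heartbeat simply asks each node to compute 1+1.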

Notes on the configuration:

1) User configuration

  • chproxy distinguishes two kinds of users: in-users and out-users
  • in-users are the users defined logically in chproxy; out-users are the users that actually exist in ck
  • One out-user can serve one or more in-users. Query limits such as maximum concurrency and maximum execution time can be set at the out-user level

2) Cluster settings

  • chproxy can define one or more logical clusters. Each logical cluster must have a name and either a list of nodes or a list of replicas
  • Requests are balanced across replicas and nodes with a round-robin + least-loaded method
  • If a request to a node has failed recently, the node's priority is lowered automatically for a short interval. As a result, each request goes to the least-loaded healthy node of the least-loaded replica
  • Heartbeat checks verify the availability of each node in the cluster, and unavailable nodes are excluded from request routing automatically
  • With kill_query_user configured, the cluster can automatically kill queries that exceed the max_execution_time limit
  • If no user is specified for a cluster, the "default" user is used
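
The selection rule described in this list can be sketched as follows. This is a simplified model under stated assumptions, not chproxy's actual implementation: the `Node` and `Balancer` classes, the penalty size, and the penalty duration are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass

PENALTY_LOAD = 5        # hypothetical extra load for a recently failed node
PENALTY_SECONDS = 10.0  # hypothetical duration of the penalty

@dataclass
class Node:
    name: str
    active_requests: int = 0
    healthy: bool = True          # would be updated by heartbeat checks
    penalized_until: float = 0.0

    def effective_load(self, now):
        penalty = PENALTY_LOAD if now < self.penalized_until else 0
        return self.active_requests + penalty

class Balancer:
    def __init__(self, nodes):
        self.nodes = nodes
        self.next_index = 0

    def pick(self, now):
        """Rotate the starting position, then take the least-loaded healthy node."""
        count = len(self.nodes)
        start = self.next_index
        self.next_index = (self.next_index + 1) % count
        rotated = [self.nodes[(start + i) % count] for i in range(count)]
        candidates = [node for node in rotated if node.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes")
        # min() keeps the first candidate on ties, which preserves round-robin order.
        return min(candidates, key=lambda node: node.effective_load(now))

    def report_failure(self, node, now):
        node.penalized_until = now + PENALTY_SECONDS

nodes = [Node("mdw:8123"), Node("sdw1:8123"), Node("sdw2:8123")]
balancer = Balancer(nodes)
first = balancer.pick(now=0.0).name
second = balancer.pick(now=0.0).name
balancer.report_failure(nodes[2], now=0.0)
third = balancer.pick(now=1.0).name          # sdw2 would be next, but is penalized
later = [balancer.pick(now=20.0).name for _ in range(3)]
print(first, second, third, later)
```

While the penalty is active, the failed node is skipped in favor of an equally idle healthy one; once it expires, the node rejoins the normal rotation.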

3) Cache settings

  • chproxy supports response caching, which helps speed up queries
  • Caches fall roughly into long-term and short-term caches
  • Once a cache is enabled, a request is answered from the cache first when a cached result exists. Routing may then look uneven across nodes, but overall cluster performance is not affected
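
The cache-first behavior can be pictured with a minimal in-memory model. This is a sketch only: the cache configured above is directory-based, whereas `QueryCache` and `cached_query` here are hypothetical names that just illustrate expiry, using the 10s `expire` value of the "shortterm" cache.

```python
import hashlib

class QueryCache:
    """Tiny in-memory cache keyed by query text, with a fixed expiry."""

    def __init__(self, expire_seconds):
        self.expire_seconds = expire_seconds
        self.entries = {}   # key -> (stored_at, result)

    @staticmethod
    def key_for(query):
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query, now):
        entry = self.entries.get(self.key_for(query))
        if entry is None:
            return None
        stored_at, result = entry
        if now - stored_at >= self.expire_seconds:
            return None     # entry is too old, treat as a miss
        return result

    def put(self, query, result, now):
        self.entries[self.key_for(query)] = (now, result)

def cached_query(cache, query, now, run_on_cluster):
    """Return (result, served_from_cache)."""
    cached = cache.get(query, now)
    if cached is not None:
        return cached, True
    result = run_on_cluster(query)
    cache.put(query, result, now)
    return result, False

calls = []
def fake_cluster(query):
    """Stand-in for a real ck node; records how often it is reached."""
    calls.append(query)
    return "2\n"

cache = QueryCache(expire_seconds=10)
r1 = cached_query(cache, "SELECT 1+1", 0.0, fake_cluster)
r2 = cached_query(cache, "SELECT 1+1", 5.0, fake_cluster)
r3 = cached_query(cache, "SELECT 1+1", 12.0, fake_cluster)
print(r1, r2, r3, len(calls))
```

The second request never reaches a node, which is why per-node request counts can look uneven once a cache is enabled.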

4) Security settings

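  • chproxy runs security checks on the configuration by default; setting hack_me_please: true, as in the example above, turns them off
  • Access whitelists (allowed networks) can be configured
  • https is supported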

3. Start the chproxy service

# ./chproxy -config=/data/chproxy/config.yml >> /data/chproxy/error.log 2>&1 &
[1] 32538

2.4 chproxy function test

1. Round-robin query test (the is_local column marks the node that served each request)

[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	0	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	0	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	1	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	0	default		0	0
[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	1	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	0	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	0	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	0	default		0	0
[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	0	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	0	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	0	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	1	default		0	0
[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	0	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	1	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	0	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	0	default		0	0

[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=web&password=123456' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	1	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	0	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	0	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	0	default		0	0
[root@sdw3 ~]# echo 'select * from system.clusters' | curl 'http://172.16.104.12:9090/?user=web&password=123456' --data-binary @-
shard2_repl1	1	1	1	mdw	172.16.104.11	9000	0	default		0	0
shard2_repl1	1	1	2	sdw1	172.16.104.12	9000	0	default		0	0
shard2_repl1	2	1	1	sdw2	172.16.104.13	9000	1	default		0	0
shard2_repl1	2	1	2	sdw3	172.16.104.14	9000	0	default		0	0
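
To read these results without scanning the columns by eye, the serving node can be extracted from the tab-separated output. The sketch below assumes the column order of system.clusters shown in section 1.2, and `serving_host` is a hypothetical helper; the sample text is the first response from the test above.

```python
# First response from the round-robin test, as tab-separated text.
SAMPLE_OUTPUT = (
    "shard2_repl1\t1\t1\t1\tmdw\t172.16.104.11\t9000\t0\tdefault\t\t0\t0\n"
    "shard2_repl1\t1\t1\t2\tsdw1\t172.16.104.12\t9000\t0\tdefault\t\t0\t0\n"
    "shard2_repl1\t2\t1\t1\tsdw2\t172.16.104.13\t9000\t1\tdefault\t\t0\t0\n"
    "shard2_repl1\t2\t1\t2\tsdw3\t172.16.104.14\t9000\t0\tdefault\t\t0\t0\n"
)

HOST_NAME_COLUMN = 4   # zero-based index of host_name
IS_LOCAL_COLUMN = 7    # zero-based index of is_local

def serving_host(tsv_text):
    """Return the host_name of the row whose is_local flag is 1, or None."""
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) > IS_LOCAL_COLUMN and fields[IS_LOCAL_COLUMN] == "1":
            return fields[HOST_NAME_COLUMN]
    return None

host = serving_host(SAMPLE_OUTPUT)
print(host)
```

Applied to each response in turn, this gives the sequence of nodes that chproxy routed to: all four nodes for the "default" user, and only mdw and sdw2 for the "web" user, matching the two logical clusters.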

3. chproxy monitoring

3.1 Install grafana

1. Software package installation

# tar xf grafana-7.1.5.linux-amd64.tar.gz -C /usr/local

2. Service start

# ./grafana-server &

3. Service verification

The default listening port number of grafana is 3000. Visit http://172.16.104.11:3000/ for verification. The initial account password is: admin/admin

3.2 Install prometheus

1. Software package installation

# tar xf prometheus-2.21.0.linux-amd64.tar.gz -C /usr/local

2. Modify the configuration file

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'clickhouse-chproxy'

    scrape_interval: 10s
    static_configs:
    - targets: ['172.16.104.12:9090']               # IP:port of the chproxy service to scrape

3. Start the service

# ./prometheus &

4. Service verification

The default listening port of prometheus is 9090. Visit http://172.16.104.11:9090/targets and check whether the job status is normal.
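
Besides the targets page, the scraped data itself is plain text that is easy to inspect. The sketch below parses a few lines in the Prometheus text format into a dictionary. The metric name `example_requests_total` is made up for illustration and is not a real chproxy metric, `parse_samples` is a hypothetical helper, and the parser assumes lines carry no trailing timestamp.

```python
# Illustrative sample only; real chproxy metric names will differ.
SAMPLE_METRICS = """\
# HELP example_requests_total Illustrative counter for this sketch.
# TYPE example_requests_total counter
example_requests_total{user="default",cluster="chproxy_ck_cluster_2"} 7
example_requests_total{user="web",cluster="chproxy_ck_cluster_1"} 2
"""

def parse_samples(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue   # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

samples = parse_samples(SAMPLE_METRICS)
total = sum(samples.values())
print(total)
```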

3.3 Configure chproxy monitoring

1. Import the chproxy monitoring template

2. The JSON template is available at:

https://github.com/Vertamedia/chproxy/blob/master/chproxy_overview.json

3. Effect test

1) Manually perform some data operations through chproxy

[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
[root@sdw3 ~]# echo 'insert into db1.t1 select * from numbers(1000000);' | curl 'http://172.16.104.12:9090/?user=default&password=' --data-binary @-
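
The same repeated request can be scripted instead of typed seven times. The sketch below builds the request with the Python standard library, using the URL and statement from the commands above. `build_request` and `run_repeated` are hypothetical helper names, and the loop is only defined, not executed, so nothing is sent until `run_repeated` is called against a reachable chproxy.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CHPROXY_URL = "http://172.16.104.12:9090/"
INSERT_SQL = "insert into db1.t1 select * from numbers(1000000);"

def build_request(sql, user, password=""):
    """Build a POST request equivalent to the curl command above."""
    params = urlencode({"user": user, "password": password})
    return Request(CHPROXY_URL + "?" + params, data=sql.encode("utf-8"), method="POST")

def run_repeated(sql, user, times, password=""):
    """Send the same statement several times and return the HTTP status codes."""
    statuses = []
    for _ in range(times):
        with urlopen(build_request(sql, user, password), timeout=60) as response:
            statuses.append(response.status)
    return statuses

request = build_request(INSERT_SQL, "default")
print(request.full_url)
# To send the seven inserts: run_repeated(INSERT_SQL, "default", 7)
```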

2) View the resulting metrics in the imported grafana dashboard

Reference documents:
https://github.com/Vertamedia/chproxy
https://www.jianshu.com/p/9498fedcfee7

Origin blog.csdn.net/weixin_37692493/article/details/114452689