Summary of Databend deployment based on MinIO

Databend is an open-source, elastic, low-cost, new-generation data warehouse built on object storage that can also perform real-time analytics. Follow the project and explore cloud-native data warehouse solutions with us as we build a new generation of open-source Data Cloud.

MinIO setup

MinIO host: 192.168.10.159

cd /data
mkdir minio
cd minio
wget https://dl.min.io/server/minio/release/linux-amd64/minio
# The downloaded binary is not executable by default
chmod +x minio
export MINIO_ROOT_USER=minioadmin
export MINIO_ROOT_PASSWORD=minioadmin
#./minio server ./data
./minio server --address :29000 ./data
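
Before moving on, it can help to confirm that the server is answering. The sketch below assumes MinIO's unauthenticated liveness endpoint /minio/health/live and plain HTTP on the port chosen above. The function names are made up for illustration, and nothing runs until you call them.

```shell
#!/bin/sh
# Hypothetical helpers; host and port match the values used in this post.

# Build the liveness URL (plain HTTP, assuming no TLS certificates were configured).
minio_live_url() {
  printf 'http://%s:%s/minio/health/live\n' "$1" "$2"
}

# Return 0 if MinIO answers the liveness probe within 5 seconds.
minio_is_live() {
  curl -fsS -o /dev/null --max-time 5 "$(minio_live_url "$1" "$2")"
}

# Usage: minio_is_live 192.168.10.159 29000 && echo "MinIO is up"
```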

Open the MinIO web console and create a bucket named databend.

Databend download

Official website download: only the latest version is available there, and running the latest version may require upgrading the dependent system libraries. If that is a risk, run it in Docker.

Databend - Activate your Object Storage for real-time analytics | Databend

Historical version download:

Tags · datafuselabs/databend · GitHub

Tip: you can download a specific release by changing the version number in the link below.

https://repo.databend.rs/databend/v0.9.50-nightly/databend-v0.9.50-nightly-x86_64-unknown-linux-musl.tar.gz

mkdir databend_cluster

tar -zxvf databend-v0.9.50-nightly-x86_64-unknown-linux-musl.tar.gz -C databend_cluster
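
The version-number trick above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and the URL pattern is inferred from the single link shown above, so it may not hold for every release.

```shell
#!/bin/sh
# Hypothetical helper: derive the tarball URL from a release tag.
databend_tarball_url() {
  version=$1
  # Assumed default target, copied from the link above.
  target=${2:-x86_64-unknown-linux-musl}
  printf 'https://repo.databend.rs/databend/%s/databend-%s-%s.tar.gz\n' \
    "$version" "$version" "$target"
}

# Prints the same URL as the link above.
databend_tarball_url v0.9.50-nightly

# Usage: wget "$(databend_tarball_url v0.9.50-nightly)"
```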

Single-node deployment

Configuration of databend-query

# Storage config.
[storage]
# fs | s3 | azblob | obs
type = "s3"

# Set a local folder to store your data.
# Comment out this block if you're NOT using local file system as storage.
[storage.fs]
data_path = "./.databend/stateless_test_data"

# To use S3-compatible object storage, uncomment this block and set your values.
[storage.s3]
bucket = "databend"
# The MinIO server above was started without TLS, so the endpoint uses http://
endpoint_url = "http://192.168.10.159:29000"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"

Start Databend

./script/start.sh
ps axu |grep databend

Stop Databend

./script/stop.sh

Connect to Databend

Databend exposes several external service ports by default, including:
MySQL: 3307, for the MySQL CLI and application connections.
ClickHouse: 8124, for the ClickHouse HTTP handler protocol.

Here we take MySQL client connection as an example:

mysql -h 127.0.0.1 -P3307 -uroot

Note that root can log in from localhost without a password. Databend's privilege management follows the design of MySQL 8.0, so Databend users can be managed the same way as MySQL 8.0 users.

Creating a user (MySQL 8.0 syntax)

create user 'xx'@'%' identified by 'xxx';
grant all privileges on *.* to 'xx'@'%' with grant option;
flush privileges;

Test

After logging in with the MySQL client:

create database test;
use test;

# create the table

CREATE TABLE `p_msg` (
  `id` int,
  `tt` varchar(3000),
  `author` varchar(255),
  `tags` varchar(255),
  `insert_time` timestamp,
  `pubtime` datetime
);

# insert test data

INSERT INTO `test`.`p_msg`(`id`, `tt`, `author`, `tags`, `insert_time`, `pubtime`) VALUES (1, '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'Albert Einstein', 'change', '2022-10-07 10:32:00', '2022-10-07 10:32:00');

After the insert, the data files appear in the databend bucket in MinIO.

The test table can then be queried as needed.
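
One way to run such a query without an interactive session is a thin wrapper around the MySQL client. This is a sketch only: the function name and the DATABEND_HOST and DATABEND_PORT variables are made up here, and they default to the connection values used earlier.

```shell
#!/bin/sh
# Hypothetical wrapper around the MySQL client. DATABEND_HOST and DATABEND_PORT
# are illustrative variable names that default to the values used above.
databend_sql() {
  mysql -h "${DATABEND_HOST:-127.0.0.1}" -P "${DATABEND_PORT:-3307}" -uroot -e "$1"
}

# Usage:
#   databend_sql 'SELECT id, author, tags, pubtime FROM test.p_msg WHERE id = 1'
```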

Source:

Deploying single-instance Databend based on MinIO | Beginners (1)

Databend cluster deployment

Concepts in a Databend cluster

A Databend cluster has two roles: the databend-meta cluster and the databend-query cluster. The databend-meta cluster must be started first; in production it should be kept always on and run with at least 3 nodes. There can be multiple databend-query clusters.

In production, we recommend deploying one databend-meta cluster globally. databend-meta does not consume many resources and can share a machine with other programs.

By default, databend-query runs with maximum concurrency. On a single databend-query node, a SQL statement tries to use all CPU cores of that node; in cluster mode, databend-query schedules the computation concurrently across the whole cluster.

Cluster resource isolation
A Databend cluster has three important resource-isolation concepts: tenant_id, cluster_id, and max_threads. Understanding them first makes the rest of the cluster setup easier to follow.

1. tenant_id: the tenant ID, which identifies the tenant a databend-query node belongs to. Nodes with the same tenant_id see the same tenant data: the user list and permissions, the data definitions, and so on.

2. cluster_id: the cluster ID, which is scoped under tenant_id. A node first resolves its tenant_id to obtain the corresponding metadata, and then looks for other members with the same cluster_id. Members with the same cluster_id automatically form a cluster, which is visible in the system.clusters table. When a SQL request reaches a node, the computation is coordinated among the nodes that share both tenant_id and cluster_id. Nodes with the same tenant_id but different cluster_id values have isolated compute but share the same data and user list.

 Metadata is shared within a tenant_id, and compute is shared within a cluster_id. Databend therefore supports a native multi-tenant model, with multiple clusters inside one tenant.

3. max_threads: controls how many CPU cores a single SQL statement can use on a databend-query node. The default is the number of CPU cores on the node. For example, when memory is insufficient, a complex SQL statement can be limited through max_threads to reduce its memory usage.
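
To make the grouping rule concrete, here is a small sketch that takes a made-up inventory of query nodes and prints which nodes would end up in the same cluster. It only mimics the rule described above (same tenant_id and same cluster_id means same cluster); it does not talk to Databend, and the node, tenant, and cluster names are invented.

```shell
#!/bin/sh
# Hypothetical inventory: node name, tenant_id, cluster_id (one node per line).
nodes='q1 tenantA c1
q2 tenantA c1
q3 tenantA c2
q4 tenantB c1'

# Print one line per (tenant_id, cluster_id) pair, followed by its member nodes.
group_clusters() {
  printf '%s\n' "$1" | sort -k2,2 -k3,3 -k1,1 | awk '
    { key = $2 "/" $3
      if (key != prev) { if (NR > 1) printf "\n"; printf "%s:", key; prev = key }
      printf " %s", $1 }
    END { if (NR > 0) printf "\n" }'
}

group_clusters "$nodes"
# Prints:
#   tenantA/c1: q1 q2
#   tenantA/c2: q3
#   tenantB/c1: q4
```

Here q1, q2, and q3 share users and data definitions because they share a tenant, but q3 computes separately from q1 and q2. q4 shares nothing with the others even though its cluster_id is also c1.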

Cluster planning

192.168.10.153 databend-meta (single) databend-query

192.168.10.159 databend-meta databend-query

192.168.10.160 databend-meta databend-query

databend-meta cluster

/etc/hosts

192.168.10.153 meta01
192.168.10.159 meta02
192.168.10.160 meta03

192.168.10.153 configuration

vi configs/databend-meta.toml

log_dir                 = "/var/log/databend"
admin_api_address       = "0.0.0.0:28101"
grpc_api_address        = "0.0.0.0:9191"
# databend-query fetch this address to update its databend-meta endpoints list,
# in case databend-meta cluster changes.
grpc_api_advertise_host = "192.168.10.153"

[raft_config]
id            = 1
raft_dir      = "/var/lib/databend/raft"
raft_api_port = 28103

# Assign raft_{listen|advertise}_host in test config.
# This allows you to catch a bug in unit tests when something goes wrong in raft meta nodes communication.
raft_listen_host = "192.168.10.153"
raft_advertise_host = "192.168.10.153"

# Start up mode: single node cluster
single        = true

192.168.10.159 configuration

vi configs/databend-meta.toml

log_dir                 = "/var/log/databend"
admin_api_address       = "0.0.0.0:28101"
grpc_api_address        = "0.0.0.0:9191"
# databend-query fetch this address to update its databend-meta endpoints list,
# in case databend-meta cluster changes.
grpc_api_advertise_host = "meta02"

[raft_config]
id            = 2
raft_dir      = "/var/lib/databend/raft"
raft_api_port = 28103

# Assign raft_{listen|advertise}_host in test config.
# This allows you to catch a bug in unit tests when something goes wrong in raft meta nodes communication.
raft_listen_host = "meta02"
raft_advertise_host = "meta02"

# Start up mode: join the existing cluster
# single        = true
join            = ["meta01:28103","meta03:28103"]

192.168.10.160 configuration

vi configs/databend-meta.toml

log_dir                 = "/var/log/databend"
admin_api_address       = "0.0.0.0:28101"
grpc_api_address        = "0.0.0.0:9191"
# databend-query fetch this address to update its databend-meta endpoints list,
# in case databend-meta cluster changes.
grpc_api_advertise_host = "meta03"

[raft_config]
id            = 3
raft_dir      = "/var/lib/databend/raft"
raft_api_port = 28103

# Assign raft_{listen|advertise}_host in test config.
# This allows you to catch a bug in unit tests when something goes wrong in raft meta nodes communication.
raft_listen_host = "meta03"
raft_advertise_host = "meta03"

# Start up mode: join the existing cluster
# single        = true
join            = ["meta01:28103","meta02:28103"]

Note that a node's own address must not appear in its join list.
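
The rule can be applied mechanically. The helper below is a hypothetical sketch that builds the join line for one node from the full list of meta hosts, leaving the node itself out. The host names and the raft port are the ones used in the configurations above.

```shell
#!/bin/sh
# Hypothetical helper: print the join line for one meta node.
# $1 = this node, $2 = raft_api_port, remaining arguments = all meta nodes.
join_list() {
  self=$1; port=$2; shift 2
  out=""
  for peer in "$@"; do
    # Skip the node's own address.
    [ "$peer" = "$self" ] && continue
    out="${out:+$out,}\"$peer:$port\""
  done
  printf 'join = [%s]\n' "$out"
}

join_list meta02 28103 meta01 meta02 meta03
# Prints: join = ["meta01:28103","meta03:28103"]
```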

Start the meta node

Pay attention to the order on the first start: start the node configured with single = true first, and then start the other nodes. Also make sure at least two meta nodes are running before starting a query node, otherwise the query node may fail to join the cluster.

On later restarts, start one non-single node first, start the single node 2-3 seconds later, and then start the remaining non-single nodes.
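
The two start orders can be written down as a tiny helper so they are not mixed up. This is only a sketch of the ordering described above: the function name and the "first" and "restart" mode labels are invented, and it prints the order rather than starting anything.

```shell
#!/bin/sh
# Hypothetical helper: print the order in which to start the meta nodes.
# $1 is "first" for the initial bootstrap, anything else for a later restart.
# $2 is the node configured with single = true; the rest are the joining nodes.
meta_start_order() {
  mode=$1; single_node=$2; shift 2
  if [ "$mode" = "first" ]; then
    # Bootstrap: the single node goes first, then every joining node.
    echo "$single_node $*"
  else
    # Restart: one joining node, then the single node, then the remaining ones.
    first_joiner=$1; shift
    echo "$first_joiner $single_node $*"
  fi
}

meta_start_order first meta01 meta02 meta03     # prints: meta01 meta02 meta03
meta_start_order restart meta01 meta02 meta03   # prints: meta02 meta01 meta03
```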

Start the databend-meta script

meta.sh

#!/bin/bash
ulimit  -n 65535
nohup bin/databend-meta --config-file=configs/databend-meta.toml >meta.log 2>&1 &

Meta cluster member view

Query the admin_api_address port of the single node.

curl 192.168.10.153:28101/v1/cluster/nodes

If the cluster member information is returned, the cluster was built successfully; otherwise, check the logs.
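
Right after startup the admin port may not answer yet, so a retry loop can save a manual re-run. The retry function below is a generic, hypothetical helper and not part of Databend. The curl address in the usage comment is the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds between tries.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    # Give up once the allowed number of attempts has been used.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Usage:
#   retry 10 3 curl -fsS http://192.168.10.153:28101/v1/cluster/nodes
```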

databend-query cluster

configs/databend-query.toml

All query nodes can use the same configuration. The core settings are as follows (note that not every setting can be copied as-is):

flight_api_address = "0.0.0.0:9091"


# Storage config.
[storage]
# fs | s3 | azblob | obs
type = "s3"

# Set a local folder to store your data.
# Comment out this block if you're NOT using local file system as storage.
[storage.fs]
data_path = "./.databend/stateless_test_data"

# To use S3-compatible object storage, uncomment this block and set your values.
[storage.s3]
bucket = "databend"
# The MinIO server above was started without TLS, so the endpoint uses http://
endpoint_url = "http://192.168.10.159:29000"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"

[meta]
# To enable embedded meta-store, set address to "".
embedded_dir = "./.databend/meta_embedded_1"
address = ["192.168.10.153:9191","192.168.10.159:9191","192.168.10.160:9191"]
username = "root"
password = "root"
client_timeout_in_second = 60
auto_sync_interval = 60

Start the query node

nohup bin/databend-query --config-file=configs/databend-query.toml >query.log 2>&1 &

View query cluster member information

select * from system.clusters;

Verify

On the meta01 node, log in to Databend and create a database, create a table, and insert data. Then log in to Databend on the meta02 and meta03 nodes and check the results; if the data is visible there, the cluster was built successfully.

Shutdown script

# Stop the meta process

kill -9 $(pgrep -f "bin/databend-meta")

# Stop the query process

kill -9 $(pgrep -f "bin/databend-query")

Source:

Databend Cluster Deployment | Beginners (2)_Databend's Blog-CSDN Blog

Databend Cluster Deployment | Beginners (2)

Welcome | Databend

Origin blog.csdn.net/csdncjh/article/details/132000922