Deploying Databend, a new-generation cloud-native data warehouse, on KubeSphere

Author: Shang Zhuoran (https://github.com/PsiACE), Databend R&D Engineer, Apache OpenDAL (Incubating) PPMC.

Foreword

Databend is a new-generation cloud-native data warehouse built entirely around cloud object storage. It is designed for elasticity and efficiency and supports large-scale analytical workloads. Databend is also open source software, licensed under Apache-2.0. In addition to using the cloud service (https://app.databend.com/), users can deploy their own Databend production clusters to meet their workload needs.

Typical usage scenarios for Databend include:

  • Real-time analytics platform: fast querying and visualization of logs.
  • Cloud data warehouse: multi-dimensional analysis and report generation over historical order data.
  • Hybrid cloud architecture: unified management and processing of data from different sources and in different formats.
  • Cost- and performance-sensitive OLAP scenarios: dynamic adjustment of storage and compute resources.

KubeSphere is an application-centric, multi-tenant container platform built on top of Kubernetes. It provides full-stack IT automation and operations capabilities, manages containerized applications across multiple nodes, and offers high availability, elastic scaling, service discovery, load balancing, and other functions.

Using KubeSphere to deploy and manage Databend has the following advantages:

  • Deploy Databend clusters with Helm Charts, which simplifies application management, the deployment process, and parameter settings.
  • Use Kubernetes features to give the Databend cluster automatic recovery, horizontal scaling, load balancing, and so on.
  • Integrate and interact easily with other services or applications on Kubernetes, such as MinIO, Prometheus, and Grafana.

This article describes how to use KubeSphere to create and deploy a highly available Databend cluster, with QingStor as the underlying storage service.

Configure object storage

Object storage is a storage model that manages and accesses data as objects, rather than files or blocks. The advantages of object storage include: scalability, low cost, high availability, etc.

Databend is designed from the ground up for object storage, increasing flexibility and efficiency while reducing complexity and cost. Databend supports multiple object storage services, such as AWS S3, Azure Blob, Google Cloud Storage, HDFS, Alibaba Cloud OSS, Tencent Cloud COS, etc. You can choose an appropriate service to store your data according to your business needs and preferences.

Here we take QingStor as an example to walk through the preparation needed to configure S3-compatible object storage.

Create buckets

The object storage service (QingStor) provides an online file storage and access platform with unlimited capacity. Each user can create multiple storage spaces (buckets); you can upload any type of file to a bucket through the console or the QingStor API; buckets support access control, so you can open a bucket to specified users or to all users.

Log in to the QingCloud console, select the object storage service, and create a new bucket for verification.

Take note of the bucket name <bucket> and the availability zone where it is located, <region>.

Since the S3-compatible service is used here, the endpoint_url for the final connection takes the form https://s3.<region>.qingstor.com, as in the Query configuration later in this article; the bucket name is configured separately.
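As a small illustration, the endpoint can be derived from the region name. This is only a sketch: the helper name qingstor_s3_endpoint and the region "pek3b" are hypothetical placeholders, and the URL form follows the endpoint_url used in the Query configuration later in this article.

```python
def qingstor_s3_endpoint(region: str, scheme: str = "https") -> str:
    """Build the S3-compatible endpoint URL for a QingStor region.

    Hypothetical helper: it only formats the URL and does not contact QingStor.
    """
    region = region.strip().lower()
    # Reject empty names and anything that looks like a host or a path.
    if not region or "." in region or "/" in region:
        raise ValueError(f"unexpected region name: {region!r}")
    return f"{scheme}://s3.{region}.qingstor.com"


# "pek3b" is only an illustrative region name; use the zone of your own bucket.
print(qingstor_s3_endpoint("pek3b"))
```

The bucket name is not part of this URL; it goes into the separate bucket setting of the storage configuration.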

Create an API key

The API key (Access Key) allows you to access QingCloud services by sending API commands. The API key ID must be sent as a parameter in each request; the private key of the API key is used to sign the API request string, so it must be kept safe and must not be shared. By default, all IP addresses can use this key to call the API. After an IP whitelist is set, only IP addresses within the whitelist can use this key.

Click the menu in the upper-right corner, select API key, and create a new key for API access.

In the downloaded file, qy_access_key_id corresponds to access_key_id, and qy_secret_access_key corresponds to secret_access_key.
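The field mapping can be sketched as follows. The helper to_s3_credentials is hypothetical, and the input is assumed to be a dictionary you have already filled in from the downloaded key file; the file's on-disk format is not parsed here.

```python
def to_s3_credentials(qy_key: dict) -> dict:
    """Rename QingCloud API key fields to the names used by S3-style configs.

    `qy_key` is assumed to hold the two values copied from the downloaded file.
    """
    return {
        "access_key_id": qy_key["qy_access_key_id"],
        "secret_access_key": qy_key["qy_secret_access_key"],
    }


# Placeholder values only; never commit real keys to source control.
example = {"qy_access_key_id": "<key>", "qy_secret_access_key": "<secret>"}
print(to_s3_credentials(example))
```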

Prepare the KubeSphere environment

KubeSphere (https://kubesphere.io) is an open source container platform built on top of Kubernetes. It provides full-stack IT automation and operations capabilities and simplifies enterprise DevOps workflows. KubeSphere has been adopted by tens of thousands of enterprises at home and abroad. It also has a very open ecosystem: based on OpenPitrix, KubeSphere provides users with a Helm-based App Store for application lifecycle management. The KubeSphere App Store lets ISVs, developers, and users upload, test, install, and publish applications in a few clicks. Databend is currently available in the KubeSphere App Store.

Setting up the KubeSphere environment

Deploying a test environment in All-in-One mode

Refer to the official documentation.

Provision a machine on Azure:

Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1089-azure x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue Sep  6 02:09:16 UTC 2022

  System load:  0.15              Processes:           376
  Usage of /:   4.8% of 28.89GB   Users logged in:     0
  Memory usage: 0%                IP address for eth0: 10.0.0.4
  Swap usage:   0%

Deploy in All-In-One mode:

Note that the following commands must be run as root.

apt install socat conntrack containerd
systemctl daemon-reload
systemctl enable --now containerd
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.2 sh -
chmod +x kk
./kk create cluster --with-kubernetes v1.22.12 --with-kubesphere v3.3.1
+------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------------+------------+-------------+------------------+--------------+
| name | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker | containerd       | nfs client | ceph client | glusterfs client | time         |
+------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------------+------------+-------------+------------------+--------------+
| ks   | y    | y    | y       | y        | y     |       |         | y         | y      |        | 1.5.9-0ubuntu3.1 |            |             |                  | UTC 02:53:56 |
+------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------------+------------+-------------+------------------+--------------+

If you are prompted that dependencies are missing, install them as needed with sudo apt install <name>. Only the first two (socat and conntrack) are installed here.

Dependency   Kubernetes Version ≥ 1.18
socat        Required
conntrack    Required
ebtables     Optional but recommended
ipset        Optional but recommended
ipvsadm      Optional but recommended
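Before running kk, the presence of these tools can be checked with a short script. This is a minimal sketch: the function name missing_tools is hypothetical, and the check only looks for executables on PATH (run it as root, since some of these tools live in directories that are not on a regular user's PATH).

```python
import shutil

# Tool names taken from the dependency table above.
REQUIRED = ("socat", "conntrack")
RECOMMENDED = ("ebtables", "ipset", "ipvsadm")


def missing_tools(names, which=shutil.which):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


print("missing required:   ", missing_tools(REQUIRED))
print("missing recommended:", missing_tools(RECOMMENDED))
```

Anything listed as missing can then be installed with sudo apt install <name>.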

Access the KubeSphere control panel.

When the installation completes, the login information is shown in the installer output:

Collecting installation results ...
#####################################################
###              Welcome to KubeSphere!           ###
#####################################################

Console: http://10.0.0.4:30880
Account: admin
Password: P@88w0rd

NOTES:
  1. After you log into the console, please check the
     monitoring status of service components in
     "Cluster Management". If any service is not
     ready, please wait patiently until all components
     are up and running.
  2. Please change the default password after login.

#####################################################
https://kubesphere.io             2022-09-06 15:41:44
#####################################################

Access port 30880 and log in with the username and password to use KubeSphere. To ensure access to KubeSphere and other services, add inbound and outbound rules for the corresponding ports in your cloud platform's control panel as needed.

Creating a demo environment with KubeSphere Cloud

Create a lightweight cluster service:

After registering and logging in to https://kubesphere.cloud , you can easily create lightweight cluster services.

Use the default configuration to create a free-tier cluster and try it out; individual users get 10 hours of free quota per month.

Access the KubeSphere control panel.

Click "Enter KubeSphere" to log in with the temporary account and password.

Enabling plugins

The interface after login is as shown in the figure below:

To use the App Store, refer to the KubeSphere documentation on enabling the App Store after installation.

Once it is enabled, you can search for Databend in the App Store; the result is similar to the picture below.

Workspace and Project Management

Click "Platform" and go to the "Access Control" page, select "Workspaces", click "Create", and enter the name you want to use in the "Name" field, for example databend.

Select "Projects" in the sidebar and click "Create" to create one project each for databend-meta and databend-query. The result is shown in the figure:

Deploy Databend

Application Template Loading

Although Databend is already available in the App Store, that version is old (v0.8.122-nightly), and the newer version (v1.0.3-nightly) will not be available until its PR is merged. It is therefore recommended to add the Helm Charts officially maintained by Databend as an application template.

Databend officially provides Helm Charts, and KubeSphere supports Helm Charts as application templates.

App templates are a way for users to upload, deliver, and manage apps. In general, an application consists of one or more Kubernetes workloads (such as Deployments, StatefulSets, and DaemonSets) and Services, depending on its functionality and how it communicates with the external environment. Applications uploaded as app templates are built from Helm packages. A Helm Chart can be delivered to KubeSphere's public repository or imported into a private app repository to provide app templates. See https://kubesphere.io/zh/docs/v3.3/workspace-administration/upload-helm-based-application/

Select "App Management", click "App Repositories", and add the Helm Charts repository officially maintained by Databend.

After the status becomes successful, you can install and deploy a new Databend application based on the template.

Databend deployment model

Refer to the documentation.

A typical Databend cluster architecture is shown in the figure below, and multiple Meta and Query nodes need to be deployed separately:

When deploying Databend in cluster mode, you first need to start a Meta node, and then set and start other Meta nodes to join the first Meta node to form a cluster. After successfully starting all Meta nodes, start Query nodes one by one. Each Query node is automatically registered to the Meta node to form a cluster after startup.

Meta high availability cluster deployment

Open the databend-meta project. In the sidebar, click "Application Workloads" and select "Apps". Click "Create" and select "From App Template". In the drop-down list, select the Databend repository added earlier; the result is shown in the figure:

Select databend-meta, click "Install", and set the application name and version. We recommend always using the latest version for a better experience.

Using the example settings, create a cluster of 3 databend-meta replica nodes. In production, a high-availability cluster of at least 3 replicas is recommended; refer to the official Databend documentation for configuration details.

bootstrap: true
replicaCount: 3
persistence:
  size: 5Gi # host resources are limited; for demonstration only
serviceMonitor:
  enabled: true
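The recommendation of at least 3 replicas can be illustrated with simple majority arithmetic. This sketch assumes that databend-meta replicates metadata with a Raft-style majority quorum, which this article does not state; the function names are hypothetical.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many nodes can fail while a majority is still available."""
    return replicas - quorum_size(replicas)


for n in (1, 2, 3, 5):
    print(f"replicas={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Under that assumption, 1 or 2 replicas tolerate no node failure, while 3 replicas tolerate one, which is why 3 is the smallest useful high-availability size.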

Query cluster deployment

After all replicas of the Meta nodes are ready, the Query cluster can be deployed.

The preliminary steps for deploying Query nodes are similar to those for Meta nodes. Open the databend-query project and follow the previous steps, this time selecting the databend-query application template.

The parts of the configuration that need attention are:

  • databend-meta connection: The address here depends on the relevant information of the previously deployed Meta cluster.
  • Storage backend: this example connects to QingStor over the S3-compatible protocol, so pay special attention to endpoint_url.
  • Built-in user creation: create a built-in user named databend with the password databend, to allow access from hosts other than localhost.
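For the double_sha1_password auth type, the configuration that follows documents a shell pipeline (sha1sum, xxd, sha1sum) for deriving authString. The same computation can be sketched in Python; the function name double_sha1_hex is hypothetical.

```python
import hashlib


def double_sha1_hex(password: str) -> str:
    """Return SHA1(SHA1(password)) as a hex string.

    The inner digest is used as raw bytes, matching the `xxd -r -p` step
    of the shell pipeline in the configuration comment.
    """
    first = hashlib.sha1(password.encode("utf-8")).digest()
    return hashlib.sha1(first).hexdigest()


# Compare the printed value with the authString in your values file.
print(double_sha1_hex("databend"))
```

Choose a different password for anything beyond a demo, and regenerate authString accordingly.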

A single-replica Query cluster is started here; in practice, the replica count can be adjusted flexibly to match the workload.

replicaCount: 1
config:
  query:
    clusterId: default
    # add builtin user
    users:
      - name: databend
        # available type: sha256_password, double_sha1_password, no_password, jwt
        authType: double_sha1_password
        # echo -n "databend" | sha1sum | cut -d' ' -f1 | xxd -r -p | sha1sum
        authString: 3081f32caef285c232d066033c89a78d88a6d8a5
  meta:
    # Set endpoints to use remote meta service
    # depends on the previously deployed meta service, namespace and nodes
    endpoints:
      - "databend-meta-0.databend-meta.databend-meta.svc:9191"
      - "databend-meta-1.databend-meta.databend-meta.svc:9191"
      - "databend-meta-2.databend-meta.databend-meta.svc:9191"
  storage:
    # s3, oss
    type: s3
    s3:
      bucket: "<bucket>"
      endpoint_url: "https://s3.<region>.qingstor.com" # for qingstor
      access_key_id: "<key>"
      secret_access_key: "<secret>"
# [recommended] enable monitoring service
serviceMonitor:
  enabled: true
# [recommended] enable access from outside cluster
service:
  type: LoadBalancer

KubeSphere monitoring

Observing workloads in KubeSphere

Wait for the status to change to "Running". KubeSphere then makes it very convenient to observe the workloads.

Resource status

  • databend-meta

  • databend-query

Monitoring

  • databend-meta

  • databend-query

Accessibility testing

Node Status Detection

If you deployed in All-in-One mode, you can simply use the pod IP address to test the node status.

psiace@ks:~$ curl 10.233.107.113:8080/v1/health
{"status":"pass"}

When deploying with KubeSphere Cloud, you can create access rules under "Network" in the KubeSphere Cloud control panel.

Here we take ports 8080 (Admin API) and 8000 (Query HTTP Handler) as examples:

The result after creation is shown in the figure below:

Similarly we can use curl to check node status.

psiace@ks:~$ curl https://admin-gfkyzxaz.c.kubesphere.cloud:30443/v1/health
{"status":"pass"}

Execute queries

bendsql is a convenient command-line tool that helps you use Databend smoothly and efficiently. bendsql also supports connecting to Databend Cloud, managing computing clusters, and running SQL queries.

Install bendsql

$ go install github.com/databendcloud/bendsql/cmd/bendsql@latest

Connect to the Databend cluster (using KubeSphere Cloud as an example)

$ bendsql connect -H query-gfkyzxaz.c.kubesphere.cloud -P 30443 -u databend -p databend --ssl
Connected to Databend on Host: query-gfkyzxaz.c.kubesphere.cloud
Version: DatabendQuery v0.9.57-nightly-df858a1(rust-1.68.0-nightly-2023-03-01T01:23:11.56066902Z)

Run a query

$ bendsql query
Connected with driver databend (DatabendQuery v0.9.57-nightly-df858a1(rust-1.68.0-nightly-2023-03-01T01:23:11.56066902Z))
Type "help" for help.

dd:databend@query-gfkyzxaz/default=> SELECT avg(number) FROM numbers(1000);
+-------------+
| avg(number) |
+-------------+
| 499.5       |
+-------------+
(1 row)
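Queries can also be sent to the Query HTTP Handler directly. The following is a hedged sketch: it assumes the handler accepts a POST to /v1/query with a JSON body of the form {"sql": ...} and HTTP basic authentication, and the helper name and the host "query.example.invalid" are hypothetical. It only builds the request; sending it is left to urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def build_query_request(base_url: str, user: str, password: str, sql: str):
    """Build (but do not send) a POST request for the HTTP query endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/query",  # assumed endpoint path
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_query_request(
    "https://query.example.invalid:30443",  # placeholder host
    "databend", "databend",
    "SELECT avg(number) FROM numbers(1000);",
)
print(request.get_method(), request.full_url)

# Sanity check of the expected result: numbers(1000) yields 0..999.
print(sum(range(1000)) / 1000)
```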

Summary

This article described how to use KubeSphere to create and deploy a highly available Databend cluster with QingStor as the back-end storage service, and then used bendsql to connect to the cluster and run a query.



Origin blog.csdn.net/zpf17671624050/article/details/129427502