Deploy a three-node rook-ceph cluster on a kubesphere-based k8s environment


Preface

This experiment records building a rook-ceph cluster with three compute/storage nodes on virtual machines, to simulate a real usage scenario. Supplementary note: the previous post covered only the single-node deployment method.

Link: Deploy a single-node rook-ceph in the k8s environment based on kubesphere

1. What is rook-ceph?

Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for Ceph storage to natively integrate with cloud-native environments.

2. Start deployment

2.1 Environment preparation

Number of virtual machines: four
Virtual machine image: CentOS-7-x86_64-Minimal-2009.iso
k8s environment: one cluster
k8s version: 1.23.6

The machine list is as follows

hostname        IP               system disk   data disks
cubeadmin       192.168.150.61   sda (50G)     none
kubeworker01    192.168.150.62   sda (20G)     vda (20G), vdb (20G)
kubeworker02    192.168.150.63   sda (20G)     vda (20G), vdb (20G)
kubeworker03    192.168.150.64   sda (20G)     vda (20G), vdb (20G)

2.2 Package preparation (run on all compute/storage nodes)

Install the required packages and load the rbd kernel module

# Install packages
yum install -y git lvm2 gdisk
# Load the rbd kernel module
modprobe rbd
lsmod | grep rbd
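
The modprobe command only loads the module for the current boot. To have rbd loaded automatically after a reboot as well, one option (a small sketch, assuming CentOS 7's systemd-modules-load mechanism) is:

# Load the rbd module automatically on boot
echo rbd > /etc/modules-load.d/rbd.conf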

Note on removing residual data: if a deployment fails, be sure to clean up the leftover data before retrying, otherwise the next deployment will be affected.

# Remove the rook configuration directory
rm -rf /var/lib/rook/
# Wipe the data disks
sgdisk --zap-all /dev/vda
sgdisk --zap-all /dev/vdb
dd if=/dev/zero of=/dev/vda bs=1M count=100 oflag=direct,dsync
dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct,dsync
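
If a previous Rook/Ceph deployment already created OSDs on these disks, ceph-volume may also leave LVM volumes and device-mapper entries behind. The Rook cleanup documentation suggests removing those as well; a sketch (device paths follow the ceph-volume naming convention) is:

# Remove leftover ceph-volume LVM/device-mapper entries, if any
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*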

2.3 Download the rook-ceph files

Clone the repository and copy the core manifests into your own deployment folder

cd /data/
yum install -y git
git clone --single-branch --branch v1.11.6 https://github.com/rook/rook.git
# Extract the deployment files
mkdir -p /data/rook-ceph/
cp /data/rook/deploy/examples/crds.yaml /data/rook-ceph/crds.yaml
cp /data/rook/deploy/examples/common.yaml /data/rook-ceph/common.yaml
cp /data/rook/deploy/examples/operator.yaml /data/rook-ceph/operator.yaml
cp /data/rook/deploy/examples/cluster.yaml /data/rook-ceph/cluster.yaml
cp /data/rook/deploy/examples/filesystem.yaml /data/rook-ceph/filesystem.yaml
cp /data/rook/deploy/examples/toolbox.yaml /data/rook-ceph/toolbox.yaml
cp /data/rook/deploy/examples/csi/rbd/storageclass.yaml /data/rook-ceph/storageclass-rbd.yaml
cp /data/rook/deploy/examples/csi/cephfs/storageclass.yaml /data/rook-ceph/storageclass-cephfs.yaml
cp /data/rook/deploy/examples/csi/nfs/storageclass.yaml /data/rook-ceph/storageclass-nfs.yaml

cd /data/rook-ceph

2.4 Deploy operator

Modify the image registry settings in operator.yaml, pointing the CSI images at Alibaba Cloud's mirror registry.

ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.8.0"
ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.7.0"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-resizer:v1.7.0"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-provisioner:v3.4.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-snapshotter:v6.2.1"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-attacher:v4.1.0"

Execute the deployment

# Start the deployment
cd /data/rook-ceph
kubectl create -f crds.yaml
kubectl create -f common.yaml
kubectl create -f operator.yaml
# Check that the operator pod is created and running
kubectl -n rook-ceph get pod
# Output
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-585f6875d-qjhdn   1/1     Running   0          4m36s
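
To confirm the operator actually picked up the Alibaba Cloud registry overrides, you can grep the operator ConfigMap (a quick check; rook-ceph-operator-config is the ConfigMap name used by operator.yaml):

# Confirm the overridden CSI image settings
kubectl -n rook-ceph get configmap rook-ceph-operator-config -o yaml | grep IMAGE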

2.5 Create ceph cluster

Modify cluster.yaml and configure the data disks used for the OSDs. Only the modified section is shown here, mainly the node configuration: I listed the data disks of the three nodes explicitly, matching the cluster's existing disks, rather than relying on a device filter or automatic discovery of all devices.

  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
    nodes:
      - name: "kubeworker01"
        devices:
          - name: "vda"
          - name: "vdb"
      - name: "kubeworker02"
        devices:
          - name: "vda"
          - name: "vdb"
      - name: "kubeworker03"
        devices:
          - name: "vda"
          - name: "vdb"

Execute the deployment of cluster.yaml

kubectl create -f cluster.yaml
# The deployment takes a while
kubectl -n rook-ceph get pod
# Check the result; once everything is Running, deploy the toolbox container to verify the cluster
NAME                                                     READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-7qk26                                   2/2     Running     0          34m
csi-cephfsplugin-dp8zx                                   2/2     Running     0          34m
csi-cephfsplugin-fb6rh                                   2/2     Running     0          34m
csi-cephfsplugin-provisioner-5549b4bcff-56ntx            5/5     Running     0          34m
csi-cephfsplugin-provisioner-5549b4bcff-m5j76            5/5     Running     0          34m
csi-rbdplugin-d829n                                      2/2     Running     0          34m
csi-rbdplugin-provisioner-bcff85bf9-7thl7                5/5     Running     0          34m
csi-rbdplugin-provisioner-bcff85bf9-cctkc                5/5     Running     0          34m
csi-rbdplugin-rj9wp                                      2/2     Running     0          34m
csi-rbdplugin-zs6s2                                      2/2     Running     0          34m
rook-ceph-crashcollector-kubeworker01-794647548b-bdrcx   1/1     Running     0          91s
rook-ceph-crashcollector-kubeworker02-d97cfb685-ss2sl    1/1     Running     0          86s
rook-ceph-crashcollector-kubeworker03-9d65c8dd8-zrv5x    1/1     Running     0          22m
rook-ceph-mgr-a-6fccb8744f-5zdvf                         3/3     Running     0          23m
rook-ceph-mgr-b-7c4bbbfcf4-fhxm9                         3/3     Running     0          23m
rook-ceph-mon-a-56dc4dfb8d-4j2bz                         2/2     Running     0          34m
rook-ceph-mon-b-7d6d96649b-spz4p                         2/2     Running     0          33m
rook-ceph-mon-c-759c774dc7-8hftq                         2/2     Running     0          28m
rook-ceph-operator-f45db9b9f-knbx4                       1/1     Running     0          2m9s
rook-ceph-osd-0-86cd7776c8-bm764                         2/2     Running     0          91s
rook-ceph-osd-1-7686cf9757-ss9z2                         2/2     Running     0          86s
rook-ceph-osd-2-5bc55847d-g2z6l                          2/2     Running     0          91s
rook-ceph-osd-3-998bccb64-rq9cf                          2/2     Running     0          83s
rook-ceph-osd-4-5c7c7f555b-djdvl                         2/2     Running     0          86s
rook-ceph-osd-5-69976f85fc-9xz94                         2/2     Running     0          83s
rook-ceph-osd-prepare-kubeworker01-qlvcp                 0/1     Completed   0          104s
rook-ceph-osd-prepare-kubeworker02-mnhcj                 0/1     Completed   0          100s
rook-ceph-osd-prepare-kubeworker03-sbk76                 0/1     Completed   0          97s
rook-ceph-tools-598b59df89-77sm7                         1/1     Running     0          7m43s

2.6 Create the toolbox container and check the cluster status

# Create the toolbox container
kubectl apply -f toolbox.yaml
# Run the query command
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
# Output:
  cluster:
    id:     3a04d434-a2ac-4f2a-a231-a08ca46c6df3
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 20m)
    mgr: a(active, since 4m), standbys: b
    osd: 6 osds: 6 up (since 3m), 6 in (since 4m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   521 MiB used, 119 GiB / 120 GiB avail
    pgs:     1 active+clean

# Or enter the toolbox container and run the commands there
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Check the cluster status
bash-4.4$ ceph -s 
  cluster:
    id:     3a04d434-a2ac-4f2a-a231-a08ca46c6df3
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 21m)
    mgr: a(active, since 5m), standbys: b
    osd: 6 osds: 6 up (since 5m), 6 in (since 5m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   521 MiB used, 119 GiB / 120 GiB avail
    pgs:     1 active+clean
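
Besides ceph -s, a few other commands inside the toolbox are handy for checking that all six OSDs landed on the expected nodes (output omitted here, since it varies per environment):

# OSD distribution across nodes and per-OSD usage
ceph osd tree
ceph osd df
# Detailed health information if the cluster is not HEALTH_OK
ceph health detail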

2.7 Prepare a NodePort service for the dashboard

cat > /data/rook-ceph/dashboard-external-https.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
    nodePort: 30808
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort

EOF

# Change the nodePort here to a port that fits your environment's planning
kubectl apply -f dashboard-external-https.yaml
# Output
service/rook-ceph-mgr-dashboard-external-https created
# Get the admin user's password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

Use a browser to access https://192.168.150.61:30808 and log in as the admin user. After logging in, you can change the password or create new users.

2.8 Prepare a NodePort service for the Prometheus metrics endpoint

cat > /data/rook-ceph/metric-external-https.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-metric-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: metric
    port: 9283
    protocol: TCP
    targetPort: 9283
    nodePort: 30809
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort
EOF

# Change the nodePort here to a port that fits your environment's planning
kubectl apply -f metric-external-https.yaml
# Output
service/rook-ceph-mgr-metric-external-https created

Use a browser to access 192.168.150.61:30809 to view the exported metrics.
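
If an existing Prometheus instance should scrape these metrics through the NodePort, a minimal scrape_configs fragment could look like the following (the job name is arbitrary, and the target assumes the node IP and port 30809 used above):

scrape_configs:
  - job_name: 'rook-ceph-mgr'
    static_configs:
      - targets: ['192.168.150.61:30809']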

3. Create storage class

3.1 Create the Ceph RBD storage class

kubectl apply -f  storageclass-rbd.yaml

Create a PVC for testing

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
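
To verify the claim is actually usable, a throwaway pod can mount it and write a file (a sketch; the pod name and busybox image are arbitrary choices, while the claim name matches the PVC above):

apiVersion: v1
kind: Pod
metadata:
  name: test-rbd-pod
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-rbd-pv-claim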

3.2 Create the CephFS storage class

kubectl apply -f filesystem.yaml
kubectl apply  -f storageclass-cephfs.yaml
# Applying filesystem.yaml creates the rook-ceph-mds-myfs workloads
kubectl -n rook-ceph get pod |grep mds
# Output
rook-ceph-mds-myfs-a-5d5754b77-nlcb9                     2/2     Running     0               97s
rook-ceph-mds-myfs-b-9f9dd7f6-sc6qm                      2/2     Running     0               96s

Create a PVC for testing

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-cephfs-pv-claim
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10G
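
Unlike the RBD claim, this one is ReadWriteMany, so several pods can mount it simultaneously. A small sketch that demonstrates this with two replicas sharing the volume (the deployment name and nginx image are arbitrary choices):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-cephfs-shared
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-cephfs-shared
  template:
    metadata:
      labels:
        app: test-cephfs-shared
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: shared-data
              mountPath: /usr/share/nginx/html
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: test-cephfs-pv-claim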

3.3 View creation results

Query the storage classes

kubectl get storageclass
# Output
NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local (default)   openebs.io/local                Delete          WaitForFirstConsumer   false                  21d
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate              true                   4s
rook-cephfs       rook-ceph.cephfs.csi.ceph.com   Delete          Immediate              true                   4m44s

Query the PVCs

kubectl  get pvc -o wide
# Output
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE   VOLUMEMODE
test-cephfs-pv-claim   Bound    pvc-3cdd9e88-2ae2-4e23-9f23-13e095707964   10Gi       RWX            rook-cephfs       7s    Filesystem
test-rbd-pv-claim      Bound    pvc-55a57b74-b595-4726-8b82-5257fd2d279a   10Gi       RWO            rook-ceph-block   6s    Filesystem
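
On the Ceph side you can also check the pools that were created to back these storage classes via the toolbox (the exact pool names depend on the CephBlockPool and CephFilesystem definitions in the example manifests):

# List pools and their usage from the toolbox
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool ls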


Supplement

Debugging mode

If the cluster does not deploy as expected, you can run the following commands to trigger a redeploy or investigate the cause of the problem.

# Restart the ceph operator to re-trigger reconciliation and redeploy
kubectl rollout restart deploy rook-ceph-operator -n rook-ceph
# Note: if a new osd pod fails to come up, check the osd prepare logs to find the problem
kubectl -n rook-ceph logs rook-ceph-osd-prepare-nodeX-XXXXX provision
# Check the status
kubectl -n rook-ceph get CephCluster -o yaml
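
Two more checks that often help narrow things down are the CephCluster phase and the operator log itself (the CephCluster in the example cluster.yaml is named rook-ceph):

# Overall reconcile phase of the cluster resource
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.phase}'
# Operator log, where most provisioning errors show up first
kubectl -n rook-ceph logs deploy/rook-ceph-operator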

Experiment with raw partitions

I also tested, on a virtual machine, using raw partitions as the storage media for the OSDs, and that works as well.

  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
    nodes:
      - name: "kubeworker01"
        devices:
          - name: "vda6"
      - name: "kubeworker02"
        devices:
          - name: "vda6"
      - name: "kubeworker03"
        devices:
          - name: "vda6"

Summary

The official clustered deployment is quite convenient to use. Later I will test adding and removing disks to simulate problems that may come up in day-to-day operation. Ceph is well worth trying, even for beginners.
