Article directory
- Preface
- 1. What is rook-ceph?
- 2. Start deployment
- 2.1 Environment preparation
- 2.2 Software package preparation (run on compute/storage nodes)
- 2.3 Download the rook-ceph files
- 2.4 Deploy the operator
- 2.5 Create the Ceph cluster
- 2.6 Create the toolbox container and check cluster status
- 2.7 Create a NodePort service for the dashboard
- 2.8 Create a NodePort service for the Prometheus metrics port
- 3. Create storage classes
- Supplement
- Summary
Preface
This experiment records building a rook-ceph cluster on three compute/storage nodes, using virtual machines to simulate a real usage scenario. Note: the previous post covered only a single-node deployment.
Link: Deploy a single-point version of rook-ceph in the k8s environment based on kubesphere
1. What is rook-ceph?
Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for Ceph storage to natively integrate with cloud-native environments.
2. Start deployment
2.1 Environment preparation
Number of virtual machines: four
Virtual machine image: CentOS-7-x86_64-Minimal-2009.iso
k8s version: 1.23.6
The machine list is as follows:

| hostname | IP | system disk | data disks |
|---|---|---|---|
| cubeadmin | 192.168.150.61 | now (50G) | none |
| kubeworker01 | 192.168.150.62 | sda(20G) | vda(20G), vdb(20G) |
| kubeworker02 | 192.168.150.63 | sda(20G) | vda(20G), vdb(20G) |
| kubeworker03 | 192.168.150.64 | sda(20G) | vda(20G), vdb(20G) |
2.2 Software package preparation (run on compute/storage nodes)
Install the required packages and load the rbd kernel module:
# Prepare the software packages
yum install -y git lvm2 gdisk
# Load the rbd module into the kernel
modprobe rbd
lsmod | grep rbd
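The modprobe above only lasts until the next reboot. A hedged addition: on a systemd-based host such as CentOS 7, a file in /etc/modules-load.d makes the module load at boot (the file name rook-rbd.conf is my own choice):

```shell
# Persist the rbd module across reboots.
# Assumption: systemd-based host (CentOS 7), which reads /etc/modules-load.d at boot.
mkdir -p /etc/modules-load.d
echo rbd > /etc/modules-load.d/rook-rbd.conf
# Show the file content to confirm
cat /etc/modules-load.d/rook-rbd.conf
```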
Note: remove residual data. If a previous deployment failed, be sure to clean up its data; failing to do so will break the next deployment.
# Remove the configuration directory
rm -rf /var/lib/rook/
# Wipe the disks (sgdisk comes with the gdisk package; gdisk itself is interactive and has no --zap-all option)
sgdisk --zap-all /dev/vda
sgdisk --zap-all /dev/vdb
dd if=/dev/zero of=/dev/vda bs=1M count=100 oflag=direct,dsync
dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct,dsync
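Besides wiping partition tables, a failed deployment can also leave behind LVM volumes that ceph-volume created. A sketch of a fuller cleanup, wrapped in a function so the disk must be passed explicitly; the function name and the exact leftover paths are assumptions, so double-check them on your own nodes before running:

```shell
# Hedged cleanup sketch for one data disk previously used by Rook.
# Usage: rook_disk_cleanup /dev/vda
rook_disk_cleanup() {
  disk="$1"
  if [ -z "$disk" ]; then
    echo "usage: rook_disk_cleanup /dev/vdX" >&2
    return 1
  fi
  # Remove device-mapper entries left by ceph-volume (their names start with ceph-)
  ls /dev/mapper/ceph-* 2>/dev/null | xargs -r -n1 dmsetup remove
  rm -rf /dev/ceph-*
  # Wipe filesystem/LVM signatures, then the GPT structures
  wipefs --all "$disk"
  sgdisk --zap-all "$disk"
}
```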
2.3 Download the rook-ceph files
Clone the repository and copy the core manifests into your own deployment folder:
cd /data/
yum install -y git
git clone --single-branch --branch v1.11.6 https://github.com/rook/rook.git
# Extract the deployment files
mkdir -p /data/rook-ceph/
cp /data/rook/deploy/examples/crds.yaml /data/rook-ceph/crds.yaml
cp /data/rook/deploy/examples/common.yaml /data/rook-ceph/common.yaml
cp /data/rook/deploy/examples/operator.yaml /data/rook-ceph/operator.yaml
cp /data/rook/deploy/examples/cluster.yaml /data/rook-ceph/cluster.yaml
cp /data/rook/deploy/examples/filesystem.yaml /data/rook-ceph/filesystem.yaml
cp /data/rook/deploy/examples/toolbox.yaml /data/rook-ceph/toolbox.yaml
cp /data/rook/deploy/examples/csi/rbd/storageclass.yaml /data/rook-ceph/storageclass-rbd.yaml
cp /data/rook/deploy/examples/csi/cephfs/storageclass.yaml /data/rook-ceph/storageclass-cephfs.yaml
cp /data/rook/deploy/examples/csi/nfs/storageclass.yaml /data/rook-ceph/storageclass-nfs.yaml
cd /data/rook-ceph
2.4 Deploy the operator
Modify the image registry settings: point the image registries in operator.yaml at Alibaba Cloud's mirrors.
ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.8.0"
ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.7.0"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-resizer:v1.7.0"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-provisioner:v3.4.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-snapshotter:v6.2.1"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-attacher:v4.1.0"
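Rather than editing each line by hand, the registry swap can be scripted. A sketch, assuming the stock operator.yaml of this Rook version pulls the CSI sidecars from registry.k8s.io/sig-storage (verify that prefix in your own copy first); the function name is my own:

```shell
# Rewrite the CSI sidecar image registry to the Aliyun mirror in one pass.
use_aliyun_mirror() {
  sed -i 's#registry\.k8s\.io/sig-storage#registry.cn-hangzhou.aliyuncs.com/google_containers#g' "$1"
}
# Usage on the file extracted earlier:
# use_aliyun_mirror /data/rook-ceph/operator.yaml
```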
Run the deployment:
# Start the deployment
cd /data/rook-ceph
kubectl create -f crds.yaml
kubectl create -f common.yaml
kubectl create -f operator.yaml
# Check the operator pod's status
kubectl -n rook-ceph get pod
# Output
NAME READY STATUS RESTARTS AGE
rook-ceph-operator-585f6875d-qjhdn 1/1 Running 0 4m36s
2.5 Create the Ceph cluster
Modify cluster.yaml to map the data disks to OSDs. Only the changed part is shown here, mainly the nodes configuration: I listed each node's existing data disks explicitly instead of using a device filter or automatic discovery of all devices.
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
    nodes:
      - name: "kubeworker01"
        devices:
          - name: "vda"
          - name: "vdb"
      - name: "kubeworker02"
        devices:
          - name: "vda"
          - name: "vdb"
      - name: "kubeworker03"
        devices:
          - name: "vda"
          - name: "vdb"
Run the deployment of cluster.yaml:
kubectl create -f cluster.yaml
# Deployment takes a while
kubectl -n rook-ceph get pod
# Check the result; once everything is Running, deploy the toolbox container to verify the cluster
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-7qk26 2/2 Running 0 34m
csi-cephfsplugin-dp8zx 2/2 Running 0 34m
csi-cephfsplugin-fb6rh 2/2 Running 0 34m
csi-cephfsplugin-provisioner-5549b4bcff-56ntx 5/5 Running 0 34m
csi-cephfsplugin-provisioner-5549b4bcff-m5j76 5/5 Running 0 34m
csi-rbdplugin-d829n 2/2 Running 0 34m
csi-rbdplugin-provisioner-bcff85bf9-7thl7 5/5 Running 0 34m
csi-rbdplugin-provisioner-bcff85bf9-cctkc 5/5 Running 0 34m
csi-rbdplugin-rj9wp 2/2 Running 0 34m
csi-rbdplugin-zs6s2 2/2 Running 0 34m
rook-ceph-crashcollector-kubeworker01-794647548b-bdrcx 1/1 Running 0 91s
rook-ceph-crashcollector-kubeworker02-d97cfb685-ss2sl 1/1 Running 0 86s
rook-ceph-crashcollector-kubeworker03-9d65c8dd8-zrv5x 1/1 Running 0 22m
rook-ceph-mgr-a-6fccb8744f-5zdvf 3/3 Running 0 23m
rook-ceph-mgr-b-7c4bbbfcf4-fhxm9 3/3 Running 0 23m
rook-ceph-mon-a-56dc4dfb8d-4j2bz 2/2 Running 0 34m
rook-ceph-mon-b-7d6d96649b-spz4p 2/2 Running 0 33m
rook-ceph-mon-c-759c774dc7-8hftq 2/2 Running 0 28m
rook-ceph-operator-f45db9b9f-knbx4 1/1 Running 0 2m9s
rook-ceph-osd-0-86cd7776c8-bm764 2/2 Running 0 91s
rook-ceph-osd-1-7686cf9757-ss9z2 2/2 Running 0 86s
rook-ceph-osd-2-5bc55847d-g2z6l 2/2 Running 0 91s
rook-ceph-osd-3-998bccb64-rq9cf 2/2 Running 0 83s
rook-ceph-osd-4-5c7c7f555b-djdvl 2/2 Running 0 86s
rook-ceph-osd-5-69976f85fc-9xz94 2/2 Running 0 83s
rook-ceph-osd-prepare-kubeworker01-qlvcp 0/1 Completed 0 104s
rook-ceph-osd-prepare-kubeworker02-mnhcj 0/1 Completed 0 100s
rook-ceph-osd-prepare-kubeworker03-sbk76 0/1 Completed 0 97s
rook-ceph-tools-598b59df89-77sm7 1/1 Running 0 7m43s
2.6 Create the toolbox container and check cluster status
# Create the toolbox container
kubectl apply -f toolbox.yaml
# Run the status query
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
# Output:
  cluster:
    id:     3a04d434-a2ac-4f2a-a231-a08ca46c6df3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 20m)
    mgr: a(active, since 4m), standbys: b
    osd: 6 osds: 6 up (since 3m), 6 in (since 4m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   521 MiB used, 119 GiB / 120 GiB avail
    pgs:     1 active+clean
# Or enter the toolbox container and run commands there
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Check the cluster status
bash-4.4$ ceph -s
  cluster:
    id:     3a04d434-a2ac-4f2a-a231-a08ca46c6df3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 21m)
    mgr: a(active, since 5m), standbys: b
    osd: 6 osds: 6 up (since 5m), 6 in (since 5m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   521 MiB used, 119 GiB / 120 GiB avail
    pgs:     1 active+clean
2.7 Create a NodePort service for the dashboard
cat > /data/rook-ceph/dashboard-external-https.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
    - name: dashboard
      port: 8443
      protocol: TCP
      targetPort: 8443
      nodePort: 30808
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort
EOF
# Change the nodePort to one that fits your own environment's port plan
kubectl apply -f dashboard-external-https.yaml
# Output
service/rook-ceph-mgr-dashboard-external-https created
# Get the admin user's password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Use a browser to open https://192.168.150.61:30808 and log in as the admin user. After logging in, you can change the password or create new users.
Successfully logged in
2.8 Create a NodePort service for the Prometheus metrics port
cat > /data/rook-ceph/metric-external-https.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-metric-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
    - name: metric
      port: 9283
      protocol: TCP
      targetPort: 9283
      nodePort: 30809
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort
EOF
# Change the nodePort to one that fits your own environment's port plan
kubectl apply -f metric-external-https.yaml
# Output
service/rook-ceph-mgr-metric-external-https created
Use a browser to open 192.168.150.61:30809 to view the exported metrics.
3. Create storage classes
3.1 Create the Ceph RBD storage class
kubectl apply -f storageclass-rbd.yaml
Create a PVC to test it:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-rbd-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
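To confirm the claim actually binds and mounts, a minimal pod can consume it. The pod name, image, and mount path below are my own choices for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-rbd-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-rbd-pv-claim
```

Once the pod is Running, the RBD volume is formatted and mounted at /data inside the container.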
3.2 Create the CephFS storage class
kubectl apply -f filesystem.yaml
kubectl apply -f storageclass-cephfs.yaml
# Applying filesystem.yaml creates the rook-ceph-mds-myfs workloads
kubectl -n rook-ceph get pod | grep mds
# Output
rook-ceph-mds-myfs-a-5d5754b77-nlcb9 2/2 Running 0 97s
rook-ceph-mds-myfs-b-9f9dd7f6-sc6qm 2/2 Running 0 96s
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-cephfs-pv-claim
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10G
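Since this claim is ReadWriteMany, several pods can mount it simultaneously. A minimal consumer for a quick check; the names and image are again my own:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-cephfs-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: test-cephfs-pv-claim
```

Creating a second pod with the same volumes section would demonstrate the shared RWX mount.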
3.3 View the creation results
Query the storage classes:
kubectl get storageclass
# Output
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local (default) openebs.io/local Delete WaitForFirstConsumer false 21d
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 4s
rook-cephfs rook-ceph.cephfs.csi.ceph.com Delete Immediate true 4m44s
Query the PVCs:
kubectl get pvc -o wide
# Output
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
test-cephfs-pv-claim Bound pvc-3cdd9e88-2ae2-4e23-9f23-13e095707964 10Gi RWX rook-cephfs 7s Filesystem
test-rbd-pv-claim Bound pvc-55a57b74-b595-4726-8b82-5257fd2d279a 10Gi RWO rook-ceph-block 6s Filesystem
Supplement
Debugging
When the cluster does not deploy as expected, run the following commands to redeploy or investigate the cause of the failure:
# Restart the ceph operator scheduling to trigger a redeploy
kubectl rollout restart deploy rook-ceph-operator -n rook-ceph
# Note: if a new osd pod fails to start, check the osd prepare logs to find the problem
kubectl -n rook-ceph logs rook-ceph-osd-prepare-nodeX-XXXXX provision
# Check the status
kubectl -n rook-ceph get CephCluster -o yaml
Experiment with raw partitions
I also tested raw partitions as the storage medium for OSDs on a virtual machine, and that works as well:
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter:
    config:
    nodes:
      - name: "kubeworker01"
        devices:
          - name: "vda6"
      - name: "kubeworker02"
        devices:
          - name: "vda6"
      - name: "kubeworker03"
        devices:
          - name: "vda6"
Summary
The official cluster deployment is very convenient to use. Later I will test adding and removing disks to simulate problems that may come up in day-to-day operation. Ceph turns out to be quite approachable, even for beginners.