Cloud Native Deep Dive: Deployment and Hands-On Operation of the Ceph Distributed Storage System

1. Introduction to Ceph

① What is Ceph?

  • Ceph is currently a very popular open-source distributed storage system. It offers high scalability, high performance, and high reliability, and provides block storage (RBD), object storage (RGW), and file system storage (CephFS) services. Ceph makes full use of the computing power of its storage nodes: when storing each piece of data, it computes where that data should be placed and distributes it as evenly as possible across the cluster.
  • Currently, Ceph is also the mainstream backend storage for OpenStack.


② Features of Ceph

  • High performance:
    • It abandons the traditional centralized metadata addressing scheme in favor of the CRUSH algorithm, so data placement is balanced and highly parallel;
    • Replica placement rules can take failure-domain isolation into account for various workloads, e.g. placement across data centers or rack awareness;
    • It can scale to thousands of storage nodes and supports data volumes from terabytes to petabytes.
  • High availability:
    • The number of replicas can be flexibly controlled;
    • Supports failure-domain separation and strong data consistency;
    • Automatic repair and self-healing in various failure scenarios;
    • No single point of failure; management is automatic.
  • High scalability:
    • Decentralized design;
    • Flexible expansion;
    • Capacity and performance grow linearly with the number of nodes.
  • Feature-rich:
    • Supports three storage interfaces: block storage, file storage, and object storage;
    • Supports custom interfaces and drivers for multiple languages.

③ Ceph architecture

  • Ceph supports three interfaces:
    • Object: has a native API, and is also compatible with Swift and S3 APIs;
    • Block: supports thin provisioning, snapshots, and clones;
    • File: Posix interface, supports snapshots.


  • Explanation:
    • RADOS: the full name is Reliable Autonomic Distributed Object Store, i.e. a reliable, automated, distributed object store. RADOS is the core of a Ceph cluster; it handles operations such as data distribution and failover;
    • Librados: the library layer on top of RADOS. Because the RADOS protocol is hard to access directly, the upper-layer RBD, RGW, and CephFS all access the cluster through librados, which currently has bindings for PHP, Ruby, Java, Python, C, and C++;
    • MDS: Stores metadata for the Ceph file system.

④ Ceph core components


  • OSD (Object Storage Daemon) is the process responsible for physical storage. It is generally configured in one-to-one correspondence with disks: one disk runs one OSD process. Its main functions are to store, replicate, rebalance, and recover data, to exchange heartbeats with other OSDs, and to respond to client requests by returning the requested data. The OSD is the only component in a Ceph cluster that stores actual user data, and an OSD daemon is usually bound to one physical disk, so in general the total number of OSD daemons in a Ceph cluster equals the total number of physical disks used to store user data.
  • Ceph introduces the concept of PG (placement group). A PG is a virtual concept that does not correspond to any physical entity: Ceph first maps objects to PGs, and then maps PGs to OSDs.


  • A Pool is a logical partition for storing objects. It specifies the type of data redundancy and the corresponding replica distribution strategy; both replicated and erasure-coded pools are supported.
  • The relationship between Pool, PG and OSD is as follows:
    • A Pool contains many PGs;
    • A PG contains a set of objects, and an object belongs to exactly one PG;
    • PGs have a primary and replicas, and each PG is distributed across different OSDs (in the three-replica case); the full mapping chain can be inspected as shown below.
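  • The object → PG → OSD mapping can be inspected directly with the ceph CLI; a minimal sketch (the pool name ceph-demo and object name rbd-demo.img are only examples and must already exist in your cluster):
# Show which PG an object hashes to and which OSDs (up/acting set) that PG maps to
# ceph osd map <pool-name> <object-name>
ceph osd map ceph-demo rbd-demo.img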
  • Monitor: a Ceph cluster needs a small cluster of several Monitors, which synchronize data via Paxos. They save OSD metadata, are responsible for the cluster maps (such as the OSD Map, Monitor Map, PG Map, and CRUSH Map), maintain the health state of the cluster and the various charts showing cluster status, and manage client authentication and authorization.
  • MDS (Ceph Metadata Server) is the metadata service that CephFS depends on. It stores the file system's metadata and manages the directory structure. Object storage and block storage do not need a metadata service, so MDS does not have to be installed if CephFS is not used.
  • Mgr: the Ceph manager daemon (ceph-mgr) was developed by the Ceph project to manage the cluster and provide a unified entry point for external systems such as cephmetrics, zabbix, calamari, and prometheus. It was introduced in the Kraken release and runs alongside the monitor daemons to provide additional monitoring as well as interfaces to external monitoring and management systems.
  • RGW (RADOS Gateway) is the object storage service provided by Ceph; its interface is compatible with S3 and Swift.
  • CephFS: the Ceph file system provides a POSIX-compliant file system that uses the Ceph storage cluster to store user data. Like RBD (block storage) and RGW (object storage), the CephFS service is implemented on top of the native librados interface.


2. Ceph storage types

① Block storage service (RBD)

  • A block is a sequence of bytes (usually 512 bytes). Block-based storage interfaces are a mature and common way of storing data on media such as hard disks, solid-state drives, CDs, floppy disks, and even tape. The ubiquity of the block device interface makes it ideal for interacting with mass data storage, including Ceph: Ceph block devices are thin-provisioned, resizable, and stripe their data across multiple OSDs.


  • Advantages:
    • Data protection is provided through means such as RAID and LVM;
    • Multiple cheap hard drives can be combined to increase capacity;
    • Combining multiple disks into one logical disk improves read/write efficiency.
  • Disadvantages:
    • With a SAN-style network, the cost of Fibre Channel switches is high;
    • Data cannot be shared between hosts.
  • Use cases:
    • Docker container and virtual machine disk storage;
    • Log storage;
    • File storage.
  • RBD is exposed as a Linux kernel-level block device, allowing users to access Ceph like any other Linux block device.

② File system storage service (CephFS)

  • The Ceph File System (CephFS) is built on top of Ceph's distributed object store. CephFS provides state-of-the-art, multi-purpose, highly available, and high-performance file storage for a variety of scenarios, including shared home directories, FTP, and NFS shared storage.


  • Ceph already has block storage, so why is a file system interface needed? Mainly because of different application scenarios: Ceph's block devices have excellent read/write performance but cannot be mounted in multiple places at the same time (currently they are mainly used as virtual machine disks in OpenStack), whereas Ceph's file system interface has lower read/write performance than the block device interface but offers excellent sharing.
  • Advantages:
    • Low cost; any ordinary machine will do;
    • Easy file sharing.
  • Disadvantages:
    • Lower read and write speed;
    • Slower transfer rate.
  • Use cases:
    • Log storage;
    • File storage with a directory structure.

③ Object storage service (RGW)

  • The Ceph Object Gateway is built on librados and provides a RESTful gateway between applications and the Ceph storage cluster. Ceph object storage supports two interfaces:
    • S3-compatible: provides object storage functionality through an interface compatible with a large subset of the Amazon S3 RESTful API;
    • Swift-compatible: provides object storage functionality through an interface compatible with a large subset of the OpenStack Swift API.


  • Advantages:
    • Offers the high read/write speed of block storage;
    • Offers sharing features similar to file storage.
  • Use cases:
    • Image storage;
    • Video storage.

3. Ceph cluster deployment

① Ceph deployment tools

  • ceph-deploy: the former official deployment tool; it is no longer actively maintained and does not support RHEL 8, CentOS 8, or newer operating systems;
  • ceph-ansible: Red Hat's deployment tool;
  • ceph-chef: A tool for automatically deploying Ceph using chef;
  • puppet-ceph: the ceph module of puppet;
  • cephadm: Only Octopus and newer are supported (recommended).

② Cluster deployment planning

IP                Hostname            Roles                      Disks   OS
192.168.182.130   local-168-182-130   monitor,mgr,rgw,mds,osd    2*20G   CentOS 7
192.168.182.131   local-168-182-131   monitor,mgr,rgw,mds,osd    2*20G   CentOS 7
192.168.182.132   local-168-182-132   monitor,mgr,rgw,mds,osd    2*20G   CentOS 7
  • monitor: the Ceph monitor node, which takes on the important management tasks of the Ceph cluster; generally 3 or 5 nodes are required.
  • mgr: the Ceph cluster manager node, which provides a unified entry point for external systems.
  • rgw: the Ceph Object Gateway, a service that lets clients access the Ceph cluster through standard object storage APIs.
  • mds: the Ceph Metadata Server, which stores the metadata of the file system service; this component is only required when using file storage.
  • osd: the Ceph Object Storage Daemon node, which is actually responsible for storing data.

③ Preliminary preparation

  • Stop and disable the firewalld service:
systemctl stop firewalld.service
systemctl disable firewalld.service
  • Turn off and disable SELinux:
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
setenforce 0
  • Configure hosts:
192.168.182.130 local-168-182-130
192.168.182.131 local-168-182-131
192.168.182.132 local-168-182-132
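  • A hedged example of pushing these entries to /etc/hosts on each node (the hostnames and IPs are the ones planned above; adjust to your environment):
cat >> /etc/hosts <<EOF
192.168.182.130 local-168-182-130
192.168.182.131 local-168-182-131
192.168.182.132 local-168-182-132
EOF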
  • ssh password-free configuration:
ssh-keygen
# ... press Enter through all the prompts
ssh-copy-id root@local-168-182-130  # the password will be requested once
ssh-copy-id root@local-168-182-131
ssh-copy-id root@local-168-182-132
  • Configure time synchronization:
yum install -y chrony
systemctl enable --now chronyd
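  • Optionally verify that time synchronization is actually working on each node (standard chrony and systemd checks):
chronyc sources
timedatectl status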

④ Add disk

  • If you cannot see the disk after adding it, you can execute the following command:
# Rescan the SCSI bus so the newly added disks are detected
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host2/scan


⑤ Install Docker (run on all nodes, including nodes added later)

# centos7
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

# Install the yum-config-manager configuration tool
yum -y install yum-utils
# Add the Docker yum repository
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install docker-ce
yum install -y docker-ce

# Start the docker service and enable it at boot
systemctl enable --now docker

# Check the version number
docker --version
# Show detailed version information
docker version

# Configure a Docker registry mirror
# Edit /etc/docker/daemon.json (create the file if it does not exist),
# add the following content, and then restart the docker service:
cat >/etc/docker/daemon.json<<EOF
{
   "registry-mirrors": ["http://hub-mirror.c.163.com"]
}
EOF

systemctl restart docker

⑥ Install cephadm

  • Download the cephadm script (on the master node only):
mkdir -p /opt/ceph/my-cluster ; cd /opt/ceph/my-cluster
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm  -o cephadm
chmod +x cephadm
# Or:
#curl https://raw.githubusercontent.com/ceph/ceph/v15.2.1/src/cephadm/cephadm -o cephadm
#chmod +x cephadm
# If the download fails, add this entry to /etc/hosts: 199.232.28.133 raw.githubusercontent.com

# Install python3 (run on all nodes):
yum install python3 -y

# Configure the ceph repository (or specify a version):
./cephadm add-repo --release octopus
# Or
#./cephadm add-repo --version 15.2.1

# Install ceph-common and the ceph tools
./cephadm install ceph-common ceph
# Install the cephadm tool
./cephadm install
which cephadm
which ceph
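# Optionally confirm the installed versions as a quick sanity check:
cephadm version
ceph --version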

⑦ Initialize the ceph cluster

  • Bootstrapping installs the mon and mgr roles on the current node and also deploys services such as prometheus, grafana, alertmanager, and node-exporter:
# Install a single node first; the other nodes are added to the cluster with the commands shown later.
# You need to know the IP address of the cluster's first monitor daemon.
# If there are multiple networks and interfaces, be sure to choose one that is reachable by every host that needs to access the Ceph cluster.

cephadm bootstrap --mon-ip 192.168.182.130

##### What this command does:

# Creates a monitor and a manager daemon for the new cluster on the local host.
# Generates a new SSH key for the Ceph cluster and adds it to the root user's /root/.ssh/authorized_keys file.
# Writes a minimal configuration file needed to communicate with the new cluster to /etc/ceph/ceph.conf.
# Writes a copy of the client.admin administrative key to /etc/ceph/ceph.client.admin.keyring.
# Writes a copy of the public key to /etc/ceph/ceph.pub.

# Check the deployed services
docker ps

#======= Output =======
Ceph Dashboard is now available at:

             URL: https://local-168-182-130:8443/
            User: admin
        Password: 0ard2l57ji

You can access the Ceph CLI with:

        sudo /usr/sbin/cephadm shell --fsid d1e9b986-89b8-11ed-bec2-000c29ca76a9 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

        ceph telemetry on

For more information see:

        https://docs.ceph.com/docs/master/mgr/telemetry/


  • As shown in the bootstrap output above, the Ceph Dashboard is now available at https://ip:8443/.


  • View the cluster status through the ceph command:
ceph -s


⑧ Add new node

  • Install the cluster's public SSH key in the new host's root user authorized_keys file:
ssh-copy-id -f -i /etc/ceph/ceph.pub root@local-168-182-131
ssh-copy-id -f -i /etc/ceph/ceph.pub root@local-168-182-132


  • Configure the new node:
ceph orch host add local-168-182-131
ceph orch host add local-168-182-132

# For the first deployment of a new node, the command above is sufficient.
# Later node additions may fail with the command above; in that case append the corresponding IP:
ceph orch host add local-168-182-131 192.168.182.133  # followed by the corresponding IP

# List the hosts
ceph orch host ls


⑨ Deploy monitors (mon)

# ceph orch apply mon *<number-of-monitors>*
# Make sure the first (bootstrap) host is included in this list.
ceph orch apply mon local-168-182-130,local-168-182-131,local-168-182-132
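# Optionally verify that the monitors are running:
ceph orch ps --daemon-type mon
ceph mon stat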

⑩ Deploy OSDs

  • The available storage devices can be listed with:
ceph orch device ls


  • Conditions for a storage device to be considered available:
    • The device must have no partitions;
    • The device must not have any LVM state;
    • The device must not be mounted;
    • The device must not contain a file system;
    • The device must not contain a Ceph BlueStore OSD;
    • The device must be larger than 5 GB (a disk that fails these checks can be wiped first; see the sketch below).
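  • A hedged sketch for wiping a disk that fails these checks so that cephadm considers it available again (this destroys all data on the device; the host name and /dev/sdb path are only examples):
# ceph orch device zap <hostname> <device-path> --force
ceph orch device zap local-168-182-130 /dev/sdb --force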
  • Ways to create OSDs:
# [Option 1] Tell Ceph to use any available and unused storage device:
ceph orch apply osd --all-available-devices

# [Option 2] Or specify the disks to use with the command below (recommended)
#1. ceph orch daemon add osd *<host>*:*<device-path>*
# For example:
# Create an OSD from a specific device on a specific host:
ceph orch daemon add osd local-168-182-130:/dev/sdb
ceph orch daemon add osd local-168-182-130:/dev/sdc

ceph orch daemon add osd local-168-182-131:/dev/sdb
ceph orch daemon add osd local-168-182-131:/dev/sdc

ceph orch daemon add osd local-168-182-132:/dev/sdb
ceph orch daemon add osd local-168-182-132:/dev/sdc
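# Optionally confirm that the new OSDs are up and weighted into the CRUSH map:
ceph osd tree
ceph -s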
  • Remove an OSD:
#1. Stop the osd process
ceph osd stop x  # (x can be found with: ceph osd ls)
# This stops the osd process, telling the cluster that this osd is gone and no longer serving; since it then carries no weight, the overall data distribution is unaffected and no migration happens.
#2. Mark the osd as out
ceph osd out osd.x
# This tells the cluster that this osd no longer maps any data and no longer provides service; again, with no weight, the overall distribution is unaffected and no migration happens.
#3. Remove the osd from the CRUSH map
ceph osd crush remove osd.x
# This removes it from CRUSH.
#4. Delete the osd
ceph osd rm osd.x
# This removes the osd's record from the cluster.
#5. Delete the osd's authentication entry (otherwise the id stays occupied)
ceph auth del osd.x
# This removes the osd's entry from the auth database.

⑪ Deploy mds (cephFS metadata daemon)

# ceph orch apply mds *<fs-name>* --placement="*<num-daemons>* [*<host1>* ...]"

ceph orch apply mds myfs --placement="3 local-168-182-130 local-168-182-131 local-168-182-132"

⑫ Deploy RGW

# Deploy a set of radosgw daemons for a particular realm and zone:
# ceph orch apply rgw *<realm-name>* *<zone-name>* --placement="*<num-daemons>* [*<host1>* ...]"

ceph orch apply rgw myorg us-east-1 --placement="3 local-168-182-130 local-168-182-131 local-168-182-132"

### Notes:
# myorg : realm name (realm-name)
# us-east-1: zone name (zone-name)

# Cephadm will wait for the cluster to be healthy and automatically create the supplied realm and zone before deploying the rgw daemons (if the realm-name and zone-name do not already exist).
  • View cluster status:
ceph -s


⑬ Add a dedicated ceph-mgr node

# By default, ceph-mgr runs one active and one standby daemon
# Add a new node
ceph orch host add local-168-182-131 192.168.182.133

# Deploy ceph-mgr
ceph orch apply mgr local-168-182-130,local-168-182-131,local-168-182-132

# ceph orch apply mgr local-168-182-130,local-168-182-131,local-168-182-132,local-168-182-133


4. Using the cephadm tool

① Introduction to cephadm tool

  • Cephadm is a utility for managing a Ceph cluster. Its goal is to provide a fully functional, robust, and well-maintained installation and management layer for any environment that does not run Ceph inside Kubernetes.
  • The specific features of Cephadm are as follows:
    • Deploy all components in containers: using containers simplifies the dependency and packaging differences between distributions. RPM and Deb packages are of course still built, but as more users move to cephadm (or Rook) and containers, fewer OS-specific bugs surface;
    • Tight integration with the Orchestrator API: Ceph's Orchestrator interface evolved extensively during the development of cephadm to match the implementation and to cleanly abstract the (slightly different) functionality present in Rook; the end result looks and feels like a part of Ceph;
    • No dependency on external management tools: tools like Salt and Ansible are great for large-scale deployments in large environments, but making Ceph depend on them means users also have to learn that software. More importantly, deployments that rely on such tools can end up more complex, harder to debug, and (most notably) slower than a deployment tool designed specifically to manage Ceph;
    • Minimal operating system dependencies: Cephadm requires only Python 3, LVM, and a container runtime (Podman or Docker); any current Linux distribution will do;
    • Isolate clusters from each other: supporting multiple Ceph clusters on the same host has always been a fairly niche scenario, but it does exist, and isolating clusters from each other in a robust, general way makes testing and redeploying clusters a safe and natural process for both developers and users;
    • Automatic upgrades: once Ceph "owns" its own deployment, it can upgrade itself in a safe and automated manner;
    • Easy migration from "legacy" deployment tools: it should be easy to transition to cephadm from existing Ceph deployments managed by tools such as ceph-ansible, ceph-deploy, and DeepSea.
  • Here is a list of some things cephadm can do:
    • cephadm can add Ceph containers to the cluster;
    • cephadm can remove Ceph containers from the cluster;
    • cephadm can update Ceph containers.

② cephadm installation

mkdir -p /opt/ceph/my-cluster ; cd /opt/ceph/my-cluster
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm  -o cephadm
chmod +x cephadm

# Install ceph-common and the ceph tools
./cephadm install ceph-common ceph
# Install the cephadm tool
./cephadm install
which cephadm
which ceph

# Show help
cephadm --help

③ Use of cephadm common commands

  • Generally, cephadm is used for environment initialization, and other operations are completed by ceph tools. Common commands are as follows:
### 1. Configure the ceph installation source (or specify a version)
./cephadm add-repo --release octopus
# Or
#./cephadm add-repo --version 15.2.1

### 2. Bootstrap the cluster
cephadm bootstrap --help # show help
# cephadm bootstrap --mon-ip *<mon-ip>*
cephadm bootstrap --mon-ip 192.168.182.130
  • The cephadm model uses a simple bootstrap step, started from the command line, which brings up a minimal Ceph cluster (one monitor and one manager daemon) on the local host. The rest of the cluster is then deployed with orchestrator commands that add hosts, claim storage devices, and deploy daemons for cluster services.

④ Enable ceph shell

  • cephadm itself is generally only used to bootstrap the deployment; afterwards it is recommended to enable the ceph command, which is more concise and powerful:
# Start the ceph shell
cephadm shell
# This command starts a bash shell in a container that has all the Ceph packages installed.

# Check the ceph cluster status non-interactively
cephadm shell ceph status
# Or
cephadm shell ceph -s
  • The ceph-common package can be installed, which contains all Ceph commands, including ceph, rbd, mount.ceph (for mounting the CephFS filesystem), etc.:
cephadm add-repo --release quincy
cephadm install ceph-common
# Alternatively, install only the ceph command
cephadm install ceph

5. Using the ceph command

① Add a new node

ceph orch host add local-168-182-131
ceph orch host add local-168-182-132

# For the first deployment of a new node, the command above is sufficient.
# Later node additions may fail with the command above; in that case append the corresponding IP:
ceph orch host add local-168-182-131 192.168.182.133  # followed by the corresponding IP

# List the hosts
ceph orch host ls

② Use ceph to install software

### 1. Deploy the monitors
# ceph orch apply mon *<number-of-monitors>*
# Make sure the first (bootstrap) host is included in this list.
ceph orch apply mon local-168-182-130,local-168-182-131,local-168-182-132

### 2. Deploy the OSDs
# List devices
ceph orch device ls
# Start deploying
# [Option 1] Tell Ceph to use any available and unused storage device:
ceph orch apply osd --all-available-devices

# [Option 2] Or specify the disks to use with the command below (recommended)
# ceph orch daemon add osd *<host>*:*<device-path>*
# For example:
# Create OSDs from specific devices on specific hosts:
ceph orch daemon add osd local-168-182-130:/dev/sdb
ceph orch daemon add osd local-168-182-130:/dev/sdc

ceph orch daemon add osd local-168-182-131:/dev/sdb
ceph orch daemon add osd local-168-182-131:/dev/sdc

ceph orch daemon add osd local-168-182-132:/dev/sdb
ceph orch daemon add osd local-168-182-132:/dev/sdc

### 3. Deploy the mds daemons
# ceph orch apply mds *<fs-name>* --placement="*<num-daemons>* [*<host1>* ...]"
ceph orch apply mds myfs --placement="3 local-168-182-130 local-168-182-131 local-168-182-132"

### 4. Deploy RGW
# Deploy a set of radosgw daemons for a particular realm and zone:
# ceph orch apply rgw *<realm-name>* *<zone-name>* --placement="*<num-daemons>* [*<host1>* ...]"
ceph orch apply rgw myorg us-east-1 --placement="3 local-168-182-130 local-168-182-131 local-168-182-132"

### Notes:
# myorg : realm name (realm-name)
# us-east-1: zone name (zone-name)

### 5. Deploy ceph-mgr
ceph orch apply mgr local-168-182-130,local-168-182-131,local-168-182-132
  • Delete an OSD node:
### 1. Stop the osd process
ceph osd stop x  # (x can be found with: ceph osd ls)
# This stops the osd process, telling the cluster that this osd is gone and no longer serving; since it then carries no weight, the overall data distribution is unaffected and no migration happens.

### 2. Mark the osd as out
ceph osd out osd.x
# This tells the cluster that this osd no longer maps any data and no longer provides service; again, with no weight, the overall distribution is unaffected and no migration happens.

### 3. Remove the osd from the CRUSH map
ceph osd crush remove osd.x
# This removes it from CRUSH.

### 4. Delete the osd
ceph osd rm osd.x
# This removes the osd's record from the cluster.

### 5. Delete the osd's authentication entry (otherwise the id stays occupied)
ceph auth del osd.x
# This removes the osd's entry from the auth database.

# [Note]
# For example, after removing an osd from node3 (osd.x, i.e. node3:/dev/sdb), run the following on node3 so that node3:/dev/sdb can be reused:

#1. lvremove /dev/ceph-3f728c86-8002-47ab-b74a-d00f4cf0fdd2/osd-block-08c6dc02-85d1-4da2-8f71-5499c115cd3c  # the path after /dev can be found with lsblk
#2. vgremove ceph-3f728c86-8002-47ab-b74a-d00f4cf0fdd2
  • View services:
# You could also use docker ps, but it is not very informative; since the ceph command is available, it gives a much more detailed and direct view.
ceph orch ps
ceph orch ps --daemon-type alertmanager
ceph orch ps --daemon-type osd
# ceph orch ps --daemon-type [alertmanager|crash|grafana|mds|mgr|mon|node-exporter|osd|prometheus|rgw]


③ Host operation

  • List hosts:
# ceph orch host ls [--format yaml] [--host-pattern <name>] [--label <label>] [--host-status <status>]
ceph orch host ls
  • Adding hosts: to add a new host to the cluster, follow these steps:
    • Install the cluster's public SSH key in the root user's authorized_keys file on the new host:
# ssh-copy-id -f -i /etc/ceph/ceph.pub root@*<new-host>*
ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.182.133
    • Tell Ceph that the new node is part of the cluster:
# ceph orch host add *<newhost>* [*<ip>*] [*<label1> ...*]
ceph orch host add local-168-182-130 192.168.182.130
# It is best to provide the host IP address explicitly; if no IP is provided, the hostname will be resolved via DNS and that IP will be used.
    • One or more labels can also be included to tag the new host immediately:
ceph orch host add local-168-182-130 192.168.182.130 --labels _admin
  • Removing a host: after all of its daemons have been removed, a host can safely be removed from the cluster:
    • To drain all daemons from a host, run a command of the following form:
# ceph orch host drain *<host>*
ceph orch host drain local-168-182-130

# All osds on the host will be scheduled for removal. You can check the osd removal progress with:
ceph orch osd rm status

# You can check whether any daemons are left on the host with:
# ceph orch ps <host>
ceph orch ps local-168-182-130
    • Once all daemons are removed, the host can be removed with:
# ceph orch host rm <host>
ceph orch host rm local-168-182-130
    • If a host is offline and cannot be recovered, it can still be removed from the cluster with:
# ceph orch host rm <host> --offline --force
ceph orch host rm local-168-182-130 --offline --force
  • Host labels: the orchestrator supports assigning labels to hosts. Labels are free-form and have no meaning by themselves; each host can carry multiple labels, which can then be used to control where daemons are placed (see the placement example after this list).
    • Add a label:
# ceph orch host add my_hostname --labels=my_label1
ceph orch host add local-168-182-130 --labels=my_label1,my_label2

# Alternatively: ceph orch host label add my_hostname my_label
ceph orch host label add local-168-182-130 my_label
    • Remove a label:
# ceph orch host label rm my_hostname my_label
ceph orch host label rm local-168-182-130 my_label
    • Special host labels: the following host labels have special meaning to cephadm:
      • _no_schedule: do not schedule or deploy daemons on this host. This label prevents cephadm from deploying daemons on the host; if it is added to an existing host that already runs Ceph daemons, cephadm will move those daemons elsewhere (except OSDs, which are not removed automatically).
      • _no_autotune_memory: do not automatically tune memory on this host. This label prevents daemon memory from being tuned even when osd_memory_target_autotune or similar options are enabled for one or more daemons on the host;
      • _admin: distribute client.admin and ceph.conf to this host. By default the _admin label is applied to the first host in the cluster (where bootstrap was originally run), and the client.admin key is distributed to that host via the ceph orch client-keyring ... functionality. Adding this label to additional hosts normally causes cephadm to deploy the configuration and keyring files to /etc/ceph; since versions 16.2.10 and 17.2.1, cephadm also stores the configuration and keyring files in the /var/lib/ceph/<fsid>/config directory.
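    • Labels can then be referenced in placement specifications; a small sketch (my_label is the example label added above, and node-exporter is only an example service):
# Deploy a service only on hosts that carry a given label
# ceph orch apply <service> --placement="label:<label>"
ceph orch apply node-exporter --placement="label:my_label"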

④ Maintenance mode

  • Put the host into and out of maintenance mode (stop all Ceph daemons on the host):
# Enter maintenance mode
# ceph orch host maintenance enter <hostname> [--force]
ceph orch host maintenance enter local-168-182-130

# Exit maintenance mode
# ceph orch host maintenance exit <hostname>
ceph orch host maintenance exit local-168-182-130

⑤ View service status

  • To view the status of a service running in a Ceph cluster, do the following:
# ceph orch ls [--service_type type] [--service_name name] [--export] [--format f] [--refresh]

# List all services
ceph orch ls
# Inspect specific services
ceph orch ls alertmanager
ceph orch ls --service_name crash

⑥ View daemon status

  • Print a list of all daemons known to the orchestrator:
# ceph orch ps [--hostname host] [--daemon_type type] [--service_name name] [--daemon_id id] [--format f] [--refresh]
ceph orch ps

# Then query the status of a specific service instance (mon, osd, mds, rgw). For OSDs the id is the numeric OSD ID; for MDS services the id is the file system name:
ceph orch ps --daemon_type osd --daemon_id 0

⑦ OSD Service

  • List devices:
    • cephadm periodically scans each host in the cluster with ceph-volume to determine which devices are present and whether they are eligible to be used as OSDs.
    • To view the list, run the following command:
# ceph orch device ls [--hostname=...] [--wide] [--refresh]
ceph orch device ls

# The --wide option shows all details about each device, including any reasons why the device may not qualify for use as an OSD.
ceph orch device ls --wide


    • In the output above, the "Health", "Ident", and "Fault" fields are populated through the integration with libstoragemgmt. By default this integration is disabled, because libstoragemgmt may not be 100% compatible with your hardware.
    • To have cephadm include these fields, enable cephadm's enhanced device scan option as follows:
ceph config set mgr mgr/cephadm/device_enhanced_scan true
  • Create a new OSD:
    • Tell Ceph to use any available and unused storage device:
# If new disks are added to the cluster, they will automatically be used to create new OSDs.
ceph orch apply osd --all-available-devices
    • Create an OSD from a specific device on a specific host:
# ceph orch daemon add osd *<host>*:*<device-path>*
ceph orch daemon add osd local-168-182-133:/dev/sdb
ceph orch daemon add osd local-168-182-133:/dev/sdc

# Or
# ceph orch daemon add osd host1:data_devices=/dev/sda,/dev/sdb,db_devices=/dev/sdc,osds_per_device=2
ceph orch daemon add osd local-168-182-133:data_devices=/dev/sdb,/dev/sdc

# Using an LVM logical volume
# ceph orch daemon add osd *<host>*:*<lvm-path>*
ceph orch daemon add osd host1:/dev/vg_osd/lvm_osd1701
    • Dry run (preview only, nothing is actually executed):
# The --dry-run flag makes the orchestrator display a preview of what would happen without actually creating the OSDs.

ceph orch apply osd --all-available-devices --dry-run
  • Remove OSDs:
    • Removing an OSD from a cluster involves two steps: evacuating all placement groups (PGs) from the OSD, and then removing the PG-free OSD from the cluster.
    • The following command performs these two steps:
# ceph orch osd rm <osd_id(s)> [--replace] [--force]
ceph orch osd rm 0
  • To monitor the status of OSD removal:
ceph orch osd rm status
  • Stop removing OSDs:
# ceph orch osd rm stop <osd_id(s)>
ceph orch osd rm stop 4
  • Activate existing OSDs: if a host's operating system has been reinstalled, its existing OSDs need to be activated again. For this use case, cephadm provides a wrapper that activates all existing OSDs on the host:
# ceph cephadm osd activate <host>...
ceph cephadm osd activate local-168-182-133
  • View data latency:
ceph osd perf
  • List the usage of each disk in the cluster in detail:
ceph osd df

⑧ pool related operations

  • List the pools in the ceph cluster:
ceph osd lspools
# Or
ceph osd pool ls
  • Create a pool in the ceph cluster:
# Here 100 is the number of PGs:
ceph osd pool create rbdtest 100
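# Newer Ceph releases emit a health warning for pools without an associated application; if this pool is meant for RBD, it can be tagged (a hedged extra step, using the rbdtest pool created above):
ceph osd pool application enable rbdtest rbd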

⑨ PG related

  • View the mapping information of the pg group:
ceph pg dump
# Or
# ceph pg ls
  • View a PG map:
ceph pg map 7.1a
  • View PG status:
ceph pg stat
  • Display all pg statistics in a cluster:
ceph pg dump --format plain

6. Practical operation demonstration

① Block storage usage (RBD)


  • Create a pool with the create subcommand:
ceph osd lspools

# Create the pool
ceph osd pool create ceph-demo 64 64

# When creating a pool you need to specify the PG and PGP counts; you can also choose a replicated or erasure-coded pool, set the replica count, and so on.
# osd pool create <pool> [<pg_num:int>] [<pgp_num:int>] [replicated|erasure] [<erasure_code_profile>] [<rule>] [<expected_num_objects:int>] [<size:int>] [<pg_num_min:int>] [on|off|warn] [<target_size_bytes:int>] [<target_size_ratio:float>]
  • Pool properties can be queried and changed; there are many parameters, for example:
# 1. Get the number of PGs
ceph osd pool get ceph-demo pg_num

# 2. Get the number of PGPs
ceph osd pool get ceph-demo pgp_num

# 3. Get the replica count
ceph osd pool get ceph-demo size

# 4. Get the CRUSH rule in use
ceph osd pool get ceph-demo crush_rule

# 5. Set the replica count
ceph osd pool set ceph-demo size 2

# 6. Set the number of PGs
ceph osd pool set ceph-demo pg_num 128

# 7. Set the number of PGPs
ceph osd pool set ceph-demo pgp_num 128
  • The pool needs to be initialized:
rbd pool init ceph-demo
  • Create rbd block device:
# List block devices in the pool
rbd -p ceph-demo ls
# [Option 1] Create a block device
rbd create -p ceph-demo --image rbd-demo.img --size 10G
# [Option 2] Create a block device
rbd create ceph-demo/rbd-demo2.img --size 10G
# List block devices in the pool
rbd -p ceph-demo ls
  • View block device information:
rbd info ceph-demo/rbd-demo2.img
  • Delete block device:
rbd rm -p ceph-demo --image rbd-demo2.img
  • Map the device. Since there is no virtual machine to attach it to, the kernel RBD client is used to map it on the host:
rbd map ceph-demo/rbd-demo.img
  • An error occurs during mapping because some image features are not supported by the CentOS 7 kernel; disable them (or specify the features when creating the image):
rbd feature disable ceph-demo/rbd-demo.img deep-flatten
rbd feature disable ceph-demo/rbd-demo.img fast-diff
rbd feature disable ceph-demo/rbd-demo.img object-map

rbd feature disable ceph-demo/rbd-demo.img exclusive-lock
  • Map again; this time the mapping succeeds:
rbd map ceph-demo/rbd-demo.img
  • View device list:
rbd device list
  • View the device list through fdisk:
fdisk -l
  • Using the rbd device:
# Format the device
mkfs.ext4 /dev/rbd0
# Create a mount point
mkdir /mnt/rbd-demo
# Mount it
mount /dev/rbd0 /mnt/rbd-demo/
  • Resizing a block device: an image can be grown or shrunk, but shrinking is not recommended because it may cause data loss:
# Grow the image
rbd resize ceph-demo/rbd-demo.img --size 10G
# Check
rbd -p ceph-demo info --image rbd-demo.img
# It can also be checked with lsblk
lsblk
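# Note: growing the RBD image does not grow the filesystem on it. For the ext4 filesystem created above,
# the filesystem would also need to be resized (ext4 supports online resize while mounted); a hedged extra step:
resize2fs /dev/rbd0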
  • Unmount:
umount /mnt/rbd-demo

② File system use (CephFS)


  • View the ceph filesystem:
ceph fs ls
  • Create storage pool:
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128

# When creating a pool you need to specify the PG and PGP counts; you can also choose a replicated or erasure-coded pool, set the replica count, and so on.
# osd pool create <pool> [<pg_num:int>] [<pgp_num:int>] [replicated|erasure] [<erasure_code_profile>] [<rule>] [<expected_num_objects:int>] [<size:int>] [<pg_num_min:int>] [on|off|warn] [<target_size_bytes:int>] [<target_size_ratio:float>]
  • PG (Placement Group) is a virtual concept used to group the objects stored in a pool; PGP (Placement Group for Placement purposes) determines how those PGs are permuted and placed onto OSDs.
  • Create the file system:
ceph fs new 128 cephfs_metadata cephfs_data

# Now check the file system and the mds status again
ceph fs ls
ceph mds stat
  • View storage pool quotas:
ceph osd pool get-quota cephfs_metadata
  • The kernel driver mounts the ceph file system:
    • Create a mount point:
mkdir /mnt/mycephfs
    • Obtain the admin key (if it is not present locally, copy it from a management node again):
cat /etc/ceph/ceph.client.admin.keyring

# Save the secret key to the file /etc/ceph/admin.secret:
vim /etc/ceph/admin.secret
# AQBFVrFjqst6CRAA9WaF1ml7btkn6IuoUDb9zA==

# To mount at boot, the mount command can be added to /etc/rc.d/rc.local
    • Mount:
# The Ceph storage cluster requires authentication by default, so the mount needs the user name (name) and the key file created above (secretfile), for example:
# mount -t ceph {ip-address-of-monitor}:6789:/ /mnt/mycephfs
mount -t ceph 192.168.182.130:6789:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.secret
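    • As an alternative to the kernel driver, the file system can also be mounted in user space with ceph-fuse; a minimal sketch, assuming /etc/ceph/ceph.conf and the admin keyring are already present on the client:
yum install -y ceph-fuse
ceph-fuse -m 192.168.182.130:6789 /mnt/mycephfs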


    • Unmount:
umount /mnt/mycephfs
  • Common commands:
# Check a pool's replica count
ceph osd pool get [pool-name] size
# Change a pool's replica count
ceph osd pool set [pool-name] size 3
# List the pools
ceph osd lspools
# Create a pool
ceph osd pool create [pool-name] [pg_num]
# Rename a pool
ceph osd pool rename [old-pool-name] [new-pool-name]
# Check a pool's pg_num
ceph osd pool get [pool-name] pg_num
# Check a pool's pgp_num
ceph osd pool get [pool-name] pgp_num
# Change a pool's pg_num
ceph osd pool set [pool-name] pg_num [pg_num]
# Change a pool's pgp_num
ceph osd pool set [pool-name] pgp_num [pgp_num]

③ Object storage usage (RGW)

  • rados is a utility for interacting with Ceph's object store (RADOS) and is part of the Ceph distributed storage system:


  • Check how many pools there are in the ceph cluster:
rados lspools
# Same output as: ceph osd pool ls
  • Show overall system usage:
rados df
  • Create a pool:
ceph osd pool create test
  • Create an object:
rados create test-object -p test
  • View object file:
rados -p test ls
  • Delete an object:
rados rm test-object -p test
  • Using Ceph storage through the API: in order to use the Ceph RGW REST interface, we need to initialize a Ceph object gateway user for the S3 interface, then create a sub-user for the Swift interface; finally, the object gateway can be accessed with the created users' credentials.
    • Create an S3 Gateway user:
radosgw-admin user create --uid="rgwuser" --display-name="This is first rgw test user"
  • info:
{
    "user_id": "rgwuser",
    "display_name": "This is first rgw test user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "rgwuser",
            "access_key": "48AIAPCYK7S4X9P72VOW",
            "secret_key": "oC5qKL0BMMzUJHAS76rQAwIoJh4s6NwTnLklnQYX"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}


    • Test access to the S3 interface:
# Following the official documentation, we write a Python test script that connects to radosgw, creates a new bucket, and then lists all buckets. The script variables aws_access_key_id and aws_secret_access_key take the access_key and secret_key values returned above.

# First, install the python-boto package, which is used to test the connection to S3:
yum install python-boto -y

# Then write the Python test script.
# cat s3.py
#!/usr/bin/python

import boto
import boto.s3.connection
access_key = '48AIAPCYK7S4X9P72VOW'
secret_key = 'oC5qKL0BMMzUJHAS76rQAwIoJh4s6NwTnLklnQYX'
conn = boto.connect_s3(
    aws_access_key_id = access_key,
    aws_secret_access_key = secret_key,
    host = 'local-168-182-130', port=80,
    is_secure=False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.create_bucket('my-first-s3-bucket')
for bucket in conn.get_all_buckets():
        print "{name}\t{created}".format(
                name = bucket.name,
                created = bucket.creation_date,
)
    • The python-boto package is used here to connect to S3 with the credentials above, create a bucket named my-first-s3-bucket, and finally list all created buckets, printing each bucket's name and creation time:
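    • The script can then be run with the system Python 2 interpreter (it uses the print statement), for example:
python s3.py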


  • Create a Swift user:
# To access the object gateway through Swift, a Swift user is needed, so we create a subuser.
radosgw-admin subuser create --uid=rgwuser --subuser=rgwuser:swift --access=full
    • Info:
{
    "user_id": "rgwuser",
    "display_name": "This is first rgw test user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [
        {
            "id": "rgwuser:swift",
            "permissions": "full-control"
        }
    ],
    "keys": [
        {
            "user": "rgwuser",
            "access_key": "48AIAPCYK7S4X9P72VOW",
            "secret_key": "oC5qKL0BMMzUJHAS76rQAwIoJh4s6NwTnLklnQYX"
        }
    ],
    "swift_keys": [
        {
            "user": "rgwuser:swift",
            "secret_key": "6bgDOAsosiD28M0eE8U1N5sZeGyrhqB1ca3uDtI2"
        }
    ],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
  • Create a key:
radosgw-admin key create --subuser=rgwuser:swift --key-type=swift --gen-secret

# Note: in the returned JSON, remember the secret_key under swift_keys, because it is needed below when testing the Swift interface. The secret_key returned by this command is the authoritative one.
    • Info:
{
    "user_id": "rgwuser",
    "display_name": "This is first rgw test user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [
        {
            "id": "rgwuser:swift",
            "permissions": "full-control"
        }
    ],
    "keys": [
        {
            "user": "rgwuser",
            "access_key": "48AIAPCYK7S4X9P72VOW",
            "secret_key": "oC5qKL0BMMzUJHAS76rQAwIoJh4s6NwTnLklnQYX"
        }
    ],
    "swift_keys": [
        {
            "user": "rgwuser:swift",
            "secret_key": "AVThl3FGiVQW3VepkQl4Wsoyq9lbPlLlpKhXLhtR"
        }
    ],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
    
    
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
    • Test access to the Swift interface:
# Note: the following commands require a Python environment and a working pip.
yum install python-pip -y
pip install --upgrade python-swiftclient

# Test
swift -A http://192.168.182.130/auth/1.0 -U rgwuser:swift -K 'AVThl3FGiVQW3VepkQl4Wsoyq9lbPlLlpKhXLhtR' list
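    • Beyond listing, the same credentials can be used to create a container and upload an object through the Swift API; a hedged sketch (the container name and uploaded file are only examples):
swift -A http://192.168.182.130/auth/1.0 -U rgwuser:swift -K 'AVThl3FGiVQW3VepkQl4Wsoyq9lbPlLlpKhXLhtR' post my-swift-container
swift -A http://192.168.182.130/auth/1.0 -U rgwuser:swift -K 'AVThl3FGiVQW3VepkQl4Wsoyq9lbPlLlpKhXLhtR' upload my-swift-container /etc/hosts
swift -A http://192.168.182.130/auth/1.0 -U rgwuser:swift -K 'AVThl3FGiVQW3VepkQl4Wsoyq9lbPlLlpKhXLhtR' list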


  • S3 related operations:
# 1. Delete the S3 user
radosgw-admin user rm --uid=rgwuser

# 2. Adjust permissions: allow rgwuser to read and write users information:
radosgw-admin caps add --uid=rgwuser --caps="users=*"

# 3. Allow the user to read and write all usage information
radosgw-admin caps add --uid=rgwuser --caps="usage=read,write"

# 4. Delete the swift subuser
radosgw-admin subuser rm --subuser=rgwuser:swift

# 5. List all buckets in the current system
radosgw-admin bucket list

# 6. Show the properties of a specific bucket
radosgw-admin bucket stats --bucket=my-first-s3-bucket


Origin blog.csdn.net/Forever_wj/article/details/131696936