[Distributed] Ceph Storage

1. Storage Basics

Stand-alone storage devices

  • DAS (Direct Attached Storage, storage attached directly to the computer's motherboard bus)
    Disks with IDE, SATA, SCSI, SAS, and USB interfaces
    These interface types are disk devices driven by the local storage controller and provide block-level storage

  • NAS (Network Attached Storage, storage attached to the current host's file system over the network)
    NFS, CIFS, and FTP
    This is file-system-level storage: the storage itself is an already-built file system. After it is exported from user space through an interface such as NFS, the client communicates with the remote host via a kernel module and uses it as if it were a local file system. A client of this kind of storage service cannot format it again to create its own file system blocks.

  • SAN (Storage Area Network)
    Variants include the SCSI protocol (used only to transmit data-access operations; the physical layer uses SCSI cabling), FC-SAN (the physical layer uses optical fiber), and iSCSI (the physical layer uses Ethernet).
    SAN is also a type of network storage, but the difference is that the interface a SAN presents to client hosts is block-level storage (see the sketch after this list).
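
For a concrete sense of the file-level vs. block-level distinction, here is a minimal sketch (the server address 192.168.137.200, export path, and device names are hypothetical): an NFS share is consumed as a ready-made file system, while an iSCSI LUN shows up as a raw block device that the client formats itself.

# NAS / file-level: mount a remote NFS export and use it like a local file system
mount -t nfs 192.168.137.200:/export/data /mnt/nfs

# SAN / block-level: discover and log in to an iSCSI target; the LUN appears as a raw disk (e.g. /dev/sdb)
iscsiadm -m discovery -t sendtargets -p 192.168.137.200
iscsiadm -m node -l
mkfs.xfs /dev/sdb          # the client can create its own file system on the LUN
mount /dev/sdb /mnt/san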

Problems with stand-alone storage

  • Insufficient storage and processing capacity
    A traditional IDE disk delivers roughly 100 IOPS, a SATA disk about 500 IOPS, and a solid-state drive about 2000-4000 IOPS. Even if a disk's IO capacity were dozens of times larger, it still could not withstand hundreds of thousands, millions, or even hundreds of millions of users accessing a website simultaneously at peak times; access is also limited by the IO capacity of the host's network.

  • Insufficient storage capacity
    No matter how large a single disk's capacity is, it cannot meet the data-capacity requirements of users' normal access.

  • Single point of failure
    Data stored on a single machine has a single-point-of-failure problem.

Commercial Storage Solutions

EMC, NetAPP, IBM, DELL, Huawei, Inspur

2. Distributed storage (software-defined storage SDS)

Common implementations include Ceph, TFS, FastDFS, MooseFS (MFS), HDFS, and GlusterFS (GFS).
These storage mechanisms scatter data across multiple nodes and offer high scalability, high performance, and high availability.

Types of Distributed Storage

  • Block storage (like a raw hard disk; generally one block store is mounted and used by a single server; suitable for storage-volume allocation for containers or virtual machines, and for log and file storage)
    It is a raw device used to provide unorganized storage space; the underlying layer stores data in chunks.

  • File storage (such as NFS; it solves block storage's inability to be shared, and one store can be mounted by multiple servers at the same time; suitable for directory-structured storage and log storage)
    It is an interface for organizing and storing data, generally built on top of block-level storage. Data is stored as files, and a file's metadata and actual data are stored separately.

  • Object storage (such as OSS; one store can be accessed by multiple services at the same time; it has the high-speed read/write capability of block storage as well as the sharing characteristics of file storage; suitable for image and video storage)
    It is file storage provided through an API interface (a brief access example follows this list). Each file is an object, objects can be of different sizes, and a file's metadata is stored together with its actual data.
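
As a rough illustration of the object-storage access model (assuming an S3-compatible endpoint and credentials are already configured in ~/.s3cfg; the bucket and file names are made up), objects are read and written through an API or CLI rather than mounted:

s3cmd mb s3://images                              # create a bucket
s3cmd put photo01.jpg s3://images/                # upload a file as an object
s3cmd ls s3://images                              # list objects in the bucket
s3cmd get s3://images/photo01.jpg photo01-copy.jpg    # download the object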

Advantages of Ceph

  • High scalability: decentralized; supports ordinary x86 servers, clusters of thousands of storage nodes, and expansion from the TB to the EB level.
  • High reliability: no single point of failure; multiple data replicas; automatic management and automatic repair.
  • High performance: abandons the traditional centralized metadata addressing scheme in favor of the CRUSH algorithm, so data distribution is balanced and parallelism is high.
  • Rich functionality: Ceph is a unified storage system that integrates a block storage interface (RBD), a file storage interface (CephFS), and an object storage interface (RadosGW), so it suits different application scenarios.

Ceph Architecture

From bottom to top, the Ceph system can be divided into four levels:

  • RADOS basic storage system (Reliable, Autonomic, Distributed Object Store: reliable, autonomic, distributed object storage)
    RADOS is the lowest-level functional module of Ceph. It is an infinitely scalable object storage service that splits files into countless objects (shards) and stores them on disks, which greatly improves data stability. It consists mainly of OSDs and Monitors, both of which can be deployed across multiple servers. This is the origin of Ceph's "distributed" nature and of its high scalability.

  • LIBRADOS basic library
    Librados provides a way to interact with RADOS and exposes the Ceph service's API to upper-layer applications, so the upper-layer RBD, RGW, and CephFS all access the cluster through Librados. PHP, Ruby, Java, Python, Go, C, and C++ are currently supported, so client applications can be developed directly against RADOS (rather than the whole Ceph stack).

  • High-level application interfaces: three parts
    1) Object storage interface RGW (RADOS Gateway)
    A gateway interface and object storage system built on Librados; it provides a RESTful API compatible with S3 and Swift.

2) Block storage interface RBD (RADOS Block Device)
provides a block device interface on top of Librados, mainly used for host/VM disks.

3) File storage interface CephFS (Ceph File System)
The Ceph file system provides a POSIX-compliant file system, built on the distributed interface provided by Librados, that uses the Ceph storage cluster to store user data.

  • Application layer: the various applications developed on top of the high-level interfaces or the Librados library, as well as the many clients such as hosts and VMs.
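
To make the layering concrete, here is a hedged sketch of touching each layer from a client once the cluster built later in this article is running (the pool, image, and mount-point names are hypothetical, `<admin-key>` stands for a real client key, and the CephFS mount additionally requires an MDS, which this article treats as optional):

# RADOS / Librados layer: read and write raw objects with the rados CLI
rados -p mypool put obj-demo /etc/hosts
rados -p mypool ls

# RBD block interface: create an image (layering feature only, so the kernel client can map it) and map it as a local block device
rbd create --size 1024 mypool/vol01 --image-feature layering
rbd map mypool/vol01

# CephFS file interface: mount the file system via a mon address
mount -t ceph 192.168.137.102:6789:/ /mnt/cephfs -o name=admin,secret=<admin-key>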

Ceph Core Components

Ceph is an object-based storage system. It splits each data stream to be managed (such as a file) into one or more fixed-size objects (Objects, 4 MB by default) and uses them as atomic units (an atom being the smallest unit of an element) to complete reads and writes.

  • OSD (Object Storage Daemon, daemon process ceph-osd)
    The process responsible for physical storage. OSDs are generally configured in one-to-one correspondence with disks: one disk runs one OSD process. Its main functions are storing data, replicating data, rebalancing data, recovering data, and exchanging heartbeats with other OSDs; it is also responsible for returning the actual data in response to client requests. At least 3 OSDs are normally required for redundancy and high availability.

  • PG (Placement Group)
    A PG is only a virtual concept and does not exist physically. For data addressing it is similar to an index in a database: Ceph first maps each object to a PG with a HASH algorithm, then maps the PG onto OSDs with the CRUSH algorithm.

  • Pool
    A Pool is a logical partition for storing objects and acts as a namespace. Each Pool contains a configurable number of PGs. A Pool can also serve as a fault-isolation domain, isolating data for different usage scenarios.

A Pool supports two ways of storing data:

  • Replicated: similar to RAID 1; by default an object is saved as 3 copies, placed on different OSDs (see the example after this list)
  • Erasure coded: similar to RAID 5; it uses somewhat more CPU but saves disk space, and only one full copy of the object data is kept. Since some Ceph features do not support erasure-coded pools, this type of pool is not used as much.
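
As a hedged sketch (the pool names and PG counts are arbitrary, and the commands assume the cluster deployed later in this article is up), the data-storage type is chosen when the pool is created:

ceph osd pool create rep_pool 32 32 replicated       # replicated pool (the default type)
ceph osd pool create ec_pool 32 32 erasure           # erasure-coded pool, using the default EC profile
ceph osd pool get rep_pool size                      # number of replicas kept for the replicated pool
ceph osd erasure-code-profile get default            # k/m layout used by the default erasure-code profile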

Relationship between Pool, PG and OSD

A Pool contains many PGs; a PG contains many objects, and an object belongs to exactly one PG; a PG has a primary and replicas, and (for replicated pools) one PG is distributed across different OSDs.
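
These relationships can be inspected directly on a running cluster with a few read-only commands (the pool name mypool is the one created later in this article, and 1.0 is just an example PGID):

ceph osd tree                          # how OSDs are organized under the CRUSH hierarchy
ceph pg ls-by-pool mypool | head       # the PGs belonging to one pool and the OSD set of each
ceph pg map 1.0                        # primary and replica OSDs for one PG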

  • Monitor (daemon process ceph-mon)
    Stores OSD metadata and is responsible for maintaining the cluster state maps (the Cluster Map: OSD Map, Monitor Map, PG Map, and CRUSH Map), maintaining the various charts showing cluster state, and managing client authentication and authorization. A Ceph cluster usually requires at least 3 or 5 (an odd number of) Monitor nodes for redundancy and high availability; the Monitors synchronize data among themselves via the Paxos protocol.

  • Manager (daemon process ceph-mgr)
    Responsible for tracking runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. It also provides additional monitoring and interfaces to external monitoring and management systems such as Zabbix, Prometheus, and cephmetrics. A Ceph cluster usually requires at least 2 mgr nodes for high availability; information is synchronized between the nodes based on the raft protocol.

  • MDS (Metadata Server, daemon ceph-mds)
    The metadata service that CephFS depends on. It is responsible for storing the file system's metadata and managing the directory structure. Object storage and block storage do not require a metadata service, so if you do not use CephFS, MDS does not need to be installed.

OSD storage backend

OSDs have two ways of managing the data they store. In Luminous 12.2.z and later releases, the default (and recommended) backend is BlueStore. Before Luminous was released, FileStore was the default and only option.

  • FileStore
    FileStore is the legacy way of storing objects in Ceph. It relies on a standard file system (XFS only) combined with a key/value database (traditionally LevelDB; BlueStore now uses RocksDB) to store and manage metadata.
    FileStore is well tested and widely used in production, but because of its overall design and its dependence on a traditional file system, it has many performance shortcomings.

  • BlueStore
    BlueStore is a special-purpose storage backend designed specifically for managing OSD workload data on disk. Its design is based on a decade of experience supporting and managing FileStore. Compared with FileStore, BlueStore has better read/write performance and safety.

Key features of BlueStore include:

1) BlueStore manages the storage device directly, that is, it uses the raw block device or partition to manage the data on disk. This avoids intervening abstraction layers (such as a local file system like XFS), which can limit performance or add complexity.
2) BlueStore uses RocksDB for metadata management. The RocksDB key/value database is embedded to manage internal metadata, including the mapping from object names to block locations on disk.
3) All data and metadata written to BlueStore is protected by one or more checksums. No data or metadata is read from disk or returned to the user without verification.
4) Support for inline compression. Data can optionally be compressed before being written to disk.
5) Support multi-device metadata layering. BlueStore allows its internal log (WAL write-ahead log) to be written to a separate high-speed device (such as SSD, NVMe or NVDIMM) for improved performance. Internal metadata can be stored on faster devices if there is plenty of faster storage available.
6) Support efficient copy-on-write. RBD and CephFS snapshots rely on the copy-on-write cloning mechanism efficiently implemented in BlueStore. This will result in efficient I/O for regular snapshots and erasure-coded pools (which rely on clones for efficient two-phase commit).
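
Two hedged examples related to the points above: checking which backend an OSD actually uses, and optionally placing BlueStore's RocksDB/WAL on a faster device when creating an OSD with ceph-deploy later in this guide (the NVMe partition names are hypothetical, and the --block-db/--block-wal options assume a ceph-deploy 2.x that supports them):

ceph osd metadata 0 | grep osd_objectstore        # shows "bluestore" or "filestore" for osd.0

# Optional: put the BlueStore DB/WAL on a faster device when creating the OSD
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdb --block-db /dev/nvme0n1p1 --block-wal /dev/nvme0n1p2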

The Ceph data storage process

1) The client obtains the latest Cluster Map from mon

2) In Ceph, everything is an object. The data Ceph stores is split into one or more fixed-size objects (Objects); the object size can be adjusted by the administrator and is usually 2 MB or 4 MB.
Each object has a globally unique OID, composed of ino and ono:
ino: the FileID of the file, used to uniquely identify each file globally
ono: the number of the slice
For example: a file whose FileID is A is cut into two objects, one numbered 0 and the other numbered 1; the OIDs of these two objects are then A0 and A1.
The advantage of the OID is that it uniquely identifies each object and records the relationship between the object and its file. Since all data in Ceph is virtualized into uniform objects, read and write efficiency is relatively high.

3) Apply a HASH algorithm to the OID to obtain a hexadecimal feature code, take the remainder of that feature code divided by the total number of PGs in the Pool, and the resulting number is the PGID.
That is: PGID = Pool_ID + HASH(OID) % PG_NUM

4) The PG replicates according to the configured number of copies: the CRUSH algorithm is applied to the PGID to compute the IDs of the target primary and secondary OSDs for that PG, and the data is stored on those different OSD nodes (in fact, it is all the objects in a PG that are stored on the OSDs).
That is: CRUSH(PGID) places the data in the PG onto the OSDs of each set.
CRUSH is the data distribution algorithm used by Ceph; it is similar to consistent hashing and places data where it is expected to go.
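
You can watch this mapping happen on a running cluster: write a small test object and ask Ceph where it landed (the pool and object names are examples, and the IDs in the output will differ per cluster):

rados -p mypool put demo-obj /etc/hosts
ceph osd map mypool demo-obj
# Typical output: osdmap eNN pool 'mypool' (1) object 'demo-obj' -> pg 1.xxxxxxxx (1.16) -> up ([2,0,1], p2) acting ([2,0,1], p2)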

Ceph Version Release Lifecycle

Starting with Nautilus (14.2.0), Ceph has one new stable release per year, expected in March. Each new release gets a name (for example, "Mimic") and a major version number (for example, 13 for Mimic, since "M" is the 13th letter of the alphabet).

The version number has the format x.y.z, where x indicates the release cycle (for example, 13 for Mimic, 17 for Quincy) and y indicates the release type:
x.0.z: y equals 0, a development release
x.1.z: y equals 1, a release candidate (for test clusters)
x.2.z: y equals 2, a stable/bugfix release (for general users)
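
The release being run can be checked at any time; for example, on the Nautilus cluster deployed below (the sample output is illustrative):

ceph --version        # e.g. ceph version 14.2.x (...) nautilus (stable)
ceph versions         # per-daemon version summary for the whole cluster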

Ceph cluster deployment

At present, Ceph officially provides many ways to deploy Ceph clusters. The commonly used methods are ceph-deploy, cephadm and binary:

  • ceph-deploy: A cluster automation deployment tool that has been used for a long time, is mature and stable, is integrated by many automation tools, and can be used for production deployment.

  • cephadm: used from Octopus onward to deploy Ceph clusters; it installs and manages Ceph with containers and systemd. Not recommended for production environments at the time of writing.

  • Binary: manual deployment, setting up the Ceph cluster step by step; it supports more customization and gives a deeper understanding of the deployment details, but is harder to install.

Deploying a Ceph cluster with ceph-deploy

Environment setup

Ceph production environment recommendations:

1. Use 10-Gigabit networking throughout the storage cluster.
2. Separate the cluster network (cluster-network, used for internal cluster communication) from the public network (public-network, used by clients to access the Ceph cluster).
3. Deploy mon, mds, and osd on different hosts (in a test environment, one host may run multiple components).
4. OSDs may also use SATA disks.
5. Plan the cluster according to capacity.
6. Use a Xeon E5 2620 V3 or better CPU and 64 GB or more memory.
7. Spread the cluster hosts across racks to avoid cabinet-level power or network failures.

// Ceph environment plan
Hostname			Public network			Cluster network			Role
admin				192.168.137.101									admin (management node, responsible for overall cluster deployment), client
node01				192.168.137.102			192.168.100.11			mon, mgr, osd (/dev/sdb, /dev/sdc, /dev/sdd)
node02				192.168.137.103			192.168.100.12			mon, mgr, osd (/dev/sdb, /dev/sdc, /dev/sdd)
node03				192.168.137.104			192.168.100.13			mon, osd (/dev/sdb, /dev/sdc, /dev/sdd)
client				192.168.80.14									client
// Environment preparation
Optional step: create a Ceph administrative user
useradd cephadm
passwd cephadm

visudo
cephadm ALL=(root) NOPASSWD:ALL

1. Disable SELinux and the firewall
systemctl disable --now firewalld
setenforce 0
sed -i 's/enforcing/disabled/' /etc/selinux/config


2. Set hostnames according to the plan
hostnamectl set-hostname admin
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
hostnamectl set-hostname client


3. Configure name resolution in /etc/hosts
cat >> /etc/hosts << EOF
192.168.137.101 admin
192.168.137.102 node01
192.168.137.103 node02
192.168.137.104 node03
192.168.80.14 client
EOF


4. Install common software and dependency packages
yum -y install epel-release
yum -y install yum-plugin-priorities yum-utils ntpdate python-setuptools python-pip gcc gcc-c++ autoconf libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel zip unzip ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel krb5-devel libidn libidn-devel openssl openssh openssl-devel nss_ldap openldap openldap-devel openldap-clients openldap-servers libxslt-devel libevent-devel ntp libtool-ltdl bison libtool vim-enhanced python wget lsof iptraf strace lrzsz kernel-devel kernel-headers pam-devel tcl tk cmake ncurses-devel bison setuptool popt-devel net-snmp screen perl-devel pcre-devel net-snmp screen tcpdump rsync sysstat man iptables sudo libconfig git bind-utils tmux elinks numactl iftop bwm-ng net-tools expect snappy leveldb gdisk python-argparse gperftools-libs conntrack ipset jq libseccomp socat chrony sshpass

5. On the admin management node, configure passwordless SSH login to all nodes
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@admin
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node01
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node02
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node03
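
An optional sanity check (a simple loop, assuming the host names above resolve): every node should return its hostname without prompting for a password.

for host in admin node01 node02 node03; do
    ssh -o BatchMode=yes root@$host hostname
done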


6. Configure time synchronization
systemctl enable --now chronyd
timedatectl set-ntp true					# enable NTP
timedatectl set-timezone Asia/Shanghai		# set the time zone
chronyc -a makestep							# force an immediate sync of the system clock
timedatectl status							# check the time synchronization status
chronyc sources -v							# list the NTP source servers
timedatectl set-local-rtc 0					# keep the hardware clock in UTC

# Restart services that depend on the system time
systemctl restart rsyslog 
systemctl restart crond

# Disable unrelated services
systemctl disable --now postfix



7. Configure the Ceph yum repository
wget https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm --no-check-certificate

rpm -ivh ceph-release-1-1.el7.noarch.rpm --force

8. After all of the above operations are done, reboot all hosts (optional)
sync
reboot


// Deploy the Ceph cluster
1. Create a Ceph working directory on every node; all subsequent work is done in this directory
mkdir -p /etc/ceph


2. Install the ceph-deploy deployment tool
cd /etc/ceph
yum install -y ceph-deploy

ceph-deploy --version


3. From the admin management node, install the Ceph packages on the other nodes
#ceph-deploy 2.0.1 deploys the Mimic release of Ceph by default; to install a different release, specify it manually with --release
cd /etc/ceph
ceph-deploy install --release nautilus node0{1..3} admin

#ceph-deploy install essentially just runs the following commands:
yum clean all
yum -y install epel-release
yum -y install yum-plugin-priorities
yum -y install ceph-release ceph ceph-radosgw

#Alternatively, install the Ceph packages manually: run the following commands on the other nodes to deploy the Ceph packages:
sed -i 's#download.ceph.com#mirrors.tuna.tsinghua.edu.cn/ceph#' /etc/yum.repos.d/ceph.repo
yum install -y ceph-mon ceph-radosgw ceph-mds ceph-mgr ceph-osd ceph-common ceph


4. Generate the initial configuration
#On the admin node, run the following to tell ceph-deploy which nodes are the mon monitor nodes
cd /etc/ceph
ceph-deploy new --public-network 192.168.137.0/24 --cluster-network 192.168.100.0/24 node01 node02 node03

#After the command succeeds, configuration files are generated under /etc/ceph
ls /etc/ceph
ceph.conf					#the Ceph configuration file
ceph-deploy-ceph.log		#the monitor log
ceph.mon.keyring			#the monitor keyring file
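
For reference, the generated ceph.conf usually looks roughly like the following (a hedged example: the fsid is cluster-specific, and the networks and mon addresses come from the plan above):

cat ceph.conf
[global]
fsid = 7e9848bb-909c-43fa-b36c-5805ffbbeb39
public_network = 192.168.137.0/24
cluster_network = 192.168.100.0/24
mon_initial_members = node01, node02, node03
mon_host = 192.168.137.102,192.168.137.103,192.168.137.104
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx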


5. Initialize the mon nodes from the admin node
cd /etc/ceph
ceph-deploy mon create node01 node02 node03			#create the mon nodes; because the monitors use the Paxos algorithm, a highly available mon cluster needs an odd number of nodes (at least 3)

ceph-deploy --overwrite-conf mon create-initial		#initialize the mon nodes and push the configuration to all nodes
													# the --overwrite-conf option forces the configuration file to be overwritten

ceph-deploy gatherkeys node01						#optional: gather all keys onto node01

#After the command succeeds, the following files are generated under /etc/ceph
ls /etc/ceph
ceph.bootstrap-mds.keyring			#bootstrap keyring for starting mds
ceph.bootstrap-mgr.keyring			#bootstrap keyring for starting mgr
ceph.bootstrap-osd.keyring			#bootstrap keyring for starting osd
ceph.bootstrap-rgw.keyring			#bootstrap keyring for starting rgw
ceph.client.admin.keyring			#authentication key for communication between the Ceph client and the cluster; it has full privileges on the cluster
ceph.conf
ceph-deploy-ceph.log
ceph.mon.keyring

#On a mon node, check the automatically started mon process
ps aux | grep ceph
root        1823  0.0  0.2 189264  9216 ?        Ss   19:46   0:00 /usr/bin/python2.7 /usr/bin/ceph-crash
ceph        3228  0.0  0.8 501244 33420 ?        Ssl  21:08   0:00 /usr/bin/ceph-mon -f --cluster ceph --id node03 --setuser ceph --setgroup ceph
root        3578  0.0  0.0 112824   988 pts/1    R+   21:24   0:00 grep --color=auto ceph

#Check the Ceph cluster status from the admin node
cd /etc/ceph
ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum node01,node02,node03
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

#Check the mon cluster election status
ceph quorum_status --format json-pretty | grep leader
"quorum_leader_name": "node01",

#To add more mon nodes:
ceph-deploy mon add <node name>



6. Deploy nodes that can manage the Ceph cluster (optional)
#This allows the ceph command to be run on each node to manage the cluster
cd /etc/ceph
ceph-deploy --overwrite-conf config push node01 node02 node03		#push the configuration to all mon nodes, ensuring that ceph.conf is identical on all of them

ceph-deploy admin node01 node02 node03			#essentially copies the ceph.client.admin.keyring cluster authentication file to each node

#Check on a mon node
ls /etc/ceph
ceph.client.admin.keyring  ceph.conf  rbdmap  tmpr8tzyc

cd /etc/ceph
ceph -s



7. Deploy the OSD storage nodes
#After adding disks to the hosts, do not partition them; use them directly
lsblk 
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   60G  0 disk 
├─sda1   8:1    0  500M  0 part /boot
├─sda2   8:2    0    4G  0 part [SWAP]
└─sda3   8:3    0 55.5G  0 part /
sdb      8:16   0   20G  0 disk 
sdc      8:32   0   20G  0 disk 
sdd      8:48   0   20G  0 disk 

#If reusing old disks, wipe them first (delete the partition table); this is optional and unnecessary for new disks with no data
cd /etc/ceph
ceph-deploy disk zap node01 /dev/sdb
ceph-deploy disk zap node02 /dev/sdb
ceph-deploy disk zap node03 /dev/sdb

#Add the OSD nodes
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdb
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdb
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdb

#Check the Ceph cluster status
ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 3 daemons, quorum node01,node02,node03 (age 119m)
    mgr: no daemons active
    osd: 3 osds: 3 up (since 35s), 3 in (since 35s)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 57 GiB / 60 GiB avail
    pgs: 


ceph osd stat
ceph osd tree
rados df
ssh root@node01 systemctl status ceph-osd@0
ssh root@node02 systemctl status ceph-osd@1
ssh root@node03 systemctl status ceph-osd@2

ceph osd status    #check OSD status; requires the mgr to be deployed first
+----+--------+-------+-------+--------+---------+--------+---------+-----------+
| id |  host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+--------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | node01 | 1025M | 18.9G |    0   |     0   |    0   |     0   | exists,up |
| 1  | node02 | 1025M | 18.9G |    0   |     0   |    0   |     0   | exists,up |
| 2  | node03 | 1025M | 18.9G |    0   |     0   |    0   |     0   | exists,up |
+----+--------+-------+-------+--------+---------+--------+---------+-----------+

ceph osd df    #check OSD capacity; requires the mgr to be deployed first
ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL  %USE VAR  PGS STATUS 
 0   hdd 0.01949  1.00000 20 GiB 1.0 GiB 1.8 MiB  0 B 1 GiB 19 GiB 5.01 1.00   0     up 
 1   hdd 0.01949  1.00000 20 GiB 1.0 GiB 1.8 MiB  0 B 1 GiB 19 GiB 5.01 1.00   0     up 
 2   hdd 0.01949  1.00000 20 GiB 1.0 GiB 1.8 MiB  0 B 1 GiB 19 GiB 5.01 1.00   0     up 
                    TOTAL 60 GiB 3.0 GiB 5.2 MiB  0 B 3 GiB 57 GiB 5.01                 
MIN/MAX VAR: 1.00/1.00  STDDEV: 0


#Add more OSDs
cd /etc/ceph
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdd
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdd
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdd

Adding OSDs involves PG migration. Because the cluster currently holds no data, the health status returns to OK very quickly; adding nodes in a production environment, however, involves migrating a large amount of data (a hedged way to throttle that migration is sketched below).
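
One common way to keep that migration under control in production (a hedged pattern, not a requirement) is to pause rebalancing while the new OSDs are added and release it afterwards:

ceph osd set norebalance
ceph osd set nobackfill
# ... add the new OSDs with ceph-deploy as shown above ...
ceph osd unset nobackfill
ceph osd unset norebalance
ceph -s            # wait until all PGs return to active+clean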


8. Deploy the mgr nodes
#The ceph-mgr daemon runs in Active/Standby mode. This ensures that if the Active node or its ceph-mgr daemon fails, one of the Standby instances can take over without interrupting service. According to the official architecture principles, at least two mgr nodes are needed.
cd /etc/ceph
ceph-deploy mgr create node01 node02

ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum node01,node02,node03
    mgr: node01(active, since 10s), standbys: node02
    osd: 0 osds: 0 up, 0 in
 

#Fix the HEALTH_WARN issue "mons are allowing insecure global_id reclaim":
Disable the insecure mode: ceph config set mon auth_allow_insecure_global_id_reclaim false

#To add more mgr nodes:
ceph-deploy mgr create <node name>


9. Enable the monitoring module
#Run the following on the Active ceph-mgr node
ceph -s | grep mgr

yum install -y ceph-mgr-dashboard

cd /etc/ceph

ceph mgr module ls | grep dashboard

#Enable the dashboard module
ceph mgr module enable dashboard --force

#Disable the dashboard's SSL feature
ceph config set mgr mgr/dashboard/ssl false

#Configure the address and port the dashboard listens on
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8000

#Restart the dashboard
ceph mgr module disable dashboard
ceph mgr module enable dashboard --force

#Confirm the dashboard URL
ceph mgr services

#Set the dashboard account and password
echo "12345678" > dashboard_passwd.txt
ceph dashboard set-login-credentials admin -i dashboard_passwd.txt
  or
ceph dashboard ac-user-create admin administrator -i dashboard_passwd.txt

Browse to http://192.168.80.11:8000 and log in with account/password admin/12345678



Resource Pool Management

Above we have completed the deployment of the Ceph cluster, but how do we store data in Ceph? First we need to define a Pool resource pool in Ceph. Pool is an abstract concept for storing Object objects in Ceph. We can understand it as a logical partition on Ceph storage. Pool is composed of multiple PGs; PGs are mapped to different OSDs through the CRUSH algorithm; at the same time, Pool can set the replica size, and the default number of replicas is 3.

The Ceph client requests the status of the cluster from the monitor, and writes data to the Pool. According to the number of PGs, the data is mapped to different OSD nodes through the CRUSH algorithm to realize data storage. Here we can understand Pool as a logical unit for storing Object data; of course, the current cluster does not have a resource pool, so it needs to be defined.

Create a Pool named mypool with 64 PGs; when setting the PG count you also need to set the PGP count (the PG and PGP values are usually the same):
PG (Placement Group): a PG is a virtual concept used to hold objects; PGP (Placement Group for Placement purpose) is, roughly, the set of OSD placement combinations available to the PGs.
cd /etc/ceph
ceph osd pool create mypool 64 64

#View the cluster's Pool information
ceph osd pool ls    or    rados lspools
ceph osd lspools

#View the number of replicas in the pool
ceph osd pool get mypool size

#View the PG and PGP counts
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num


#Change pg_num and pgp_num to 128
ceph osd pool set mypool pg_num 128
ceph osd pool set mypool pgp_num 128

ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num

#Change the Pool replica count to 2
ceph osd pool set mypool size 2

ceph osd pool get mypool size

#Change the default replica count to 2
vim ceph.conf
......
osd_pool_default_size = 2

ceph-deploy --overwrite-conf config push node01 node02 node03


#Delete a Pool
1) Deleting a storage pool risks data loss, so Ceph forbids the operation by default. An administrator must first enable pool deletion in the ceph.conf configuration file
vim ceph.conf
......
[mon]
mon allow pool delete = true

2) Push ceph.conf to all mon nodes
ceph-deploy --overwrite-conf config push node01 node02 node03

3) Restart the ceph-mon service on all mon nodes
systemctl restart ceph-mon.target

4) Run the pool deletion command
ceph osd pool rm pool01 pool01 --yes-i-really-really-mean-it
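
As an alternative to editing ceph.conf and restarting the mons (an assumption: this runtime injection works on the Nautilus release used here), the option can be toggled on the fly and switched back off after the deletion:

ceph tell mon.* injectargs '--mon-allow-pool-delete=true'
ceph osd pool rm pool01 pool01 --yes-i-really-really-mean-it
ceph tell mon.* injectargs '--mon-allow-pool-delete=false'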

