Deployment and application of distributed storage with Ceph (creating the MDS, RBD, and RGW interfaces)


1. Storage basis

1.1 Stand-alone storage device

●DAS (Direct Attached Storage): storage connected directly to the computer's motherboard bus, i.e. disks with
IDE, SATA, SCSI, SAS, or USB interfaces.
The "interface" here is the disk device presented by the storage device's driver, which provides block-level storage.

●NAS (Network Attached Storage): storage attached to the current host's file system over the network, e.g.
NFS, CIFS, FTP.
This is file-system-level storage: the storage itself is an already-built file system. It is exported from user space through an interface such as NFS; the client talks to the remote host through a kernel module and uses it as if it were a local file system. With this kind of storage service, the client cannot format it again to create its own file system blocks.

●SAN (Storage Area Network)
Uses the SCSI protocol (which only carries data-access operations); at the physical layer the transport can be SCSI cables, optical fiber (FC-SAN), or Ethernet (iSCSI).
SAN is also a kind of network storage, but the difference is that the interface it presents to the client host is block-level storage.

1.2 The problem of stand-alone storage

●Insufficient storage and processing capacity
A traditional IDE disk delivers roughly 100 IOPS, a SATA disk about 500 IOPS, and a solid-state drive about 2000-4000 IOPS. Even if disk I/O capacity were dozens of times larger, it still could not withstand hundreds of thousands, millions, or even hundreds of millions of users accessing the site simultaneously at peak time; it is also limited by the I/O capacity of the host's network.

●Insufficient storage capacity
No matter how large a single disk is, it cannot meet the data capacity required for users' normal access.

●Single point of failure
Data stored on a single machine is a single point of failure.

Commercial storage solutions
EMC, NetAPP, IBM, DELL, Huawei, Inspur

2. Distributed storage (software-defined storage SDS)

Common distributed storage systems include Ceph, TFS, FastDFS, MooseFS (MFS), and GlusterFS (GFS).
These storage mechanisms scatter data across multiple nodes and offer high scalability, high performance, and high availability.

2.1 Types of distributed storage

●Block storage (such as a hard disk; generally one volume is mounted by one server; suitable for storage volumes of containers or virtual machines, and for log and file storage). Block storage provides a storage volume that works like a hard drive, organized into blocks of the same size. Typically, either the operating system formats the block-based volume with a file system, or an application (such as a database) accesses it directly to store data.

●File storage (such as NFS; it solves the problem that block storage cannot be shared, since one storage can be mounted by multiple servers at the same time; suitable for directory-structured data and log storage). File storage allows
data to be organized as a traditional file system. Data is kept in files that have a name and associated metadata such as modification timestamp, owner, and access permissions, and a hierarchy of directories and subdirectories organizes how files are stored.

●Object storage (such as OSS; one storage can be accessed by multiple services at the same time; it has the high-speed read/write capability of block storage and the sharing characteristic of file storage; suitable for image and video storage). Object storage is file storage provided through an API:
each file is an object, objects can be of any size, and a file's metadata is stored together with its actual data.
Object storage allows arbitrary data and metadata to be stored as a unit, tagged with a unique identifier within a flat storage pool. Data is stored and retrieved through an API instead of being accessed as blocks or through a file-system hierarchy.
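As a toy illustration of that idea (not any real object storage API), the following Python sketch models an object store as a flat pool keyed by object ID, where each object carries its data and metadata together and is accessed only through put/get calls:

# Toy model of object storage: a flat namespace, no directory tree.
object_pool = {}

def put_object(oid, data, **metadata):
    # the object ID is the only addressing handle; data and metadata live together
    object_pool[oid] = {"data": data, "metadata": metadata}

def get_object(oid):
    return object_pool[oid]

put_object("videos/cat.mp4", b"...", owner="zhangsan", content_type="video/mp4")
print(get_object("videos/cat.mp4")["metadata"])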

3. Introduction to Ceph

Ceph is developed in C++ and is an open-source distributed storage system that is self-healing and self-managing, with the advantages of high scalability, high performance, and high reliability.

Ceph is currently supported by many cloud computing vendors and is widely used. RedHat, OpenStack, and Kubernetes can all be integrated with Ceph as back-end storage for virtual machine images.
It is roughly estimated that 70%-80% of cloud platforms in China use Ceph as the underlying storage platform, which shows that Ceph has become a de facto standard for open-source cloud platforms. Companies in China that have successfully built distributed storage systems on Ceph include Huawei, Alibaba, ZTE, H3C, Inspur, China Mobile, NetEase, LeTV, 360, Tristar Storage, Shanyan Data, etc.

3.1 Advantages of Ceph

●High scalability: decentralized, supports the use of ordinary X86 servers, supports the scale of thousands of storage nodes, and supports expansion from TB to EB level.
●High reliability: no single point of failure, multiple data copies, automatic management, automatic repair.
●High performance: Abandoning the traditional centralized storage metadata addressing scheme, using the CRUSH algorithm, the data distribution is balanced, and the degree of parallelism is high.
● Powerful functions: Ceph is a unified storage system that integrates block storage interface (RBD), file storage interface (CephFS), and object storage interface (RadosGW), so it is suitable for different application scenarios.

3.2 Ceph Architecture

From bottom to top, the Ceph system can be divided into four levels:
●RADOS basic storage system (Reliable, Autonomic, Distributed Object Store: reliable, automated, distributed object storage)
RADOS is the lowest-level functional module of Ceph. It is an infinitely scalable object storage service that breaks files into countless objects (shards) and stores them on disk, greatly improving data stability. It consists mainly of OSDs and Monitors, both of which can be deployed across multiple servers; this is where Ceph's distributed nature, and hence its high scalability, comes from.

●LIBRADOS basic library
Librados provides a way to interact with RADOS and exposes the Ceph service API to upper-layer applications; the upper-layer RBD, RGW, and CephFS are all accessed through Librados. It currently supports PHP, Ruby, Java, Python, Go, C, and C++, so client applications can be developed directly on RADOS (rather than on the whole Ceph stack).

●High-level application interfaces: include three parts
1) Object storage interface RGW (RADOS Gateway)
A gateway interface: an object storage system built on Librados that provides a RESTful API compatible with S3 and Swift.

2) Block storage interface RBD (Reliable Block Device)
Provides a block device interface based on Librados, mainly used for hosts/VMs.

3) File storage interface CephFS (Ceph File System)
The Ceph file system provides a POSIX-compliant file system that uses the Ceph storage cluster to store user data. It is a distributed file system interface built on Librados.

●Application layer: various APPs developed based on high-level interfaces or the basic library Librados, or many clients such as Host and VM

3.3 Ceph Core Components

Ceph is an object-based storage system. It divides each data stream to be managed (such as a file) into one or more fixed-size objects (Object, 4 MB by default) and uses the object as the atomic (smallest indivisible) unit for reading and writing data.

●OSD (Object Storage Daemon, daemon process ceph-osd)
The process responsible for physical storage. OSDs are generally configured one per disk: one disk runs one OSD process. Its main functions are storing, replicating, rebalancing, and recovering data, exchanging heartbeats with other OSDs, and returning the actual data in response to client requests. Typically at least 3 OSDs are required for redundancy and high availability.

●PG (Placement Group)
A PG is a virtual concept with no physical existence. For data addressing it is similar to an index in a database: Ceph first maps each object to a PG with a hash algorithm, and then maps the PG to OSDs with the CRUSH algorithm.

●Pool
A Pool is a logical partition for storing objects and acts as a namespace. Each Pool contains a configurable number of PGs. A Pool can also serve as a fault isolation domain; how pools are isolated is not uniform and depends on the usage scenario.

There are two types of data storage in a Pool:
Replicated: similar to RAID 1; by default 3 copies of each object are kept, placed on different OSDs.
Erasure Code: similar to RAID 5; it uses somewhat more CPU but saves disk space, and only one full copy of the object data is kept. Because some Ceph features do not support erasure-coded pools, this type of pool is not used much.
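To make the space trade-off concrete, here is a back-of-the-envelope comparison with assumed numbers (60 TB of raw capacity, a 3-replica pool versus an erasure-coded pool with k=4 data chunks and m=2 coding chunks):

# Usable capacity: replicated = raw / size, erasure code = raw * k / (k + m)
raw_tb = 60                              # total raw capacity of the cluster, TB (assumed)
replicated_usable = raw_tb / 3           # 3 copies -> 20 TB usable
ec_usable = raw_tb * 4 / (4 + 2)         # 4+2 EC   -> 40 TB usable
print(f"replicated (size=3): {replicated_usable:.0f} TB usable")
print(f"erasure code (4+2):  {ec_usable:.0f} TB usable")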

3.4 Relationship between Pool, PG and OSD

A Pool contains many PGs; a PG contains a bunch of objects, and an object belongs to exactly one PG; a PG has a primary and replicas, which are distributed across different OSDs (for the three-replica type).

●Monitor (the daemon process ceph-mon)
is used to save the metadata of the OSD. Responsible for maintaining the mapping views of the cluster state (Cluster Map: OSD Map, Monitor Map, PG Map and CRUSH Map), maintaining various charts showing the cluster state, and managing cluster client authentication and authorization. A Ceph cluster usually requires at least 3 or 5 (odd number) Monitor nodes to achieve redundancy and high availability, and they synchronize data between nodes through the Paxos protocol.

● The Manager (daemon ceph-mgr)
is responsible for tracking runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. Provides additional monitoring and interfaces to external monitoring and management systems, such as zabbix, prometheus, cephmetrics, etc. A Ceph cluster usually requires at least 2 mgr nodes to achieve high availability, and information synchronization between nodes is realized based on the raft protocol.

●MDS (Metadata Server, daemon process ceph-mds)
The metadata service that CephFS depends on. It stores the file system's metadata and manages the directory structure. Object storage and block device storage do not need a metadata service; if you do not use CephFS, you do not need to install it.

3.5 OSD storage backend

OSDs have two ways of managing the data they store. In Luminous 12.2.z and later releases, the default (and recommended) backend is BlueStore. Before Luminous was released, FileStore was the default and only option.

●Filestore
FileStore is the legacy way of storing objects in Ceph. It relies on a standard file system (XFS only) combined with a key/value database (traditionally LevelDB, now RocksDB) to store and manage metadata.
FileStore is well tested and widely used in production, but because of its overall design and its dependence on a traditional file system, it has many performance shortcomings.

● Bluestore
BlueStore is a special-purpose storage backend designed specifically for OSD workload management of data on disk. BlueStore's design is based on a decade of experience supporting and managing Filestores. Compared with Filestore, BlueStore has better read and write performance and security.

The main functions of BlueStore include:
1) BlueStore directly manages storage devices, that is, directly uses raw block devices or partitions to manage data on disks. This avoids the intervention of abstraction layers (such as local file systems such as XFS), which can limit performance or increase complexity.
2) BlueStore uses RocksDB for metadata management. RocksDB's key/value database is embedded in order to manage internal metadata, including mapping object names to block locations on disk.
3) All data and metadata written to BlueStore is protected by one or more checksums. No data or metadata is read from disk or returned to the user without verification.
4) Support for inline compression. Data can optionally be compressed before being written to disk.
5) Support multi-device metadata layering. BlueStore allows its internal log (WAL write-ahead log) to be written to a separate high-speed device (such as SSD, NVMe or NVDIMM) for improved performance. Internal metadata can be stored on faster devices if there is plenty of faster storage available.
6) Support efficient copy-on-write. RBD and CephFS snapshots rely on the copy-on-write cloning mechanism efficiently implemented in BlueStore. This will result in efficient I/O for regular snapshots and erasure-coded pools (which rely on clones for efficient two-phase commit).

3.6 Ceph Data Stored Procedures


1) The client obtains the latest Cluster Map from mon

2) In Ceph, everything is an object. Data stored in Ceph is divided into one or more fixed-size objects (Object); the object size can be adjusted by the administrator and is usually 2 MB or 4 MB.
Each object has a unique OID, composed of ino and ono:
ino: the FileID of the file, which uniquely identifies the file globally
ono: the number of the slice (stripe)
For example: a file with FileID A is cut into two objects, one numbered 0 and the other numbered 1; the OIDs of these two objects are then A0 and A1.
The advantage of the OID is that it uniquely identifies each object and records which file the object belongs to. Since all data in Ceph is virtualized into uniform objects, read and write efficiency is relatively high.

3) Apply a hash to the OID to obtain a hexadecimal feature code, take that feature code modulo the total number of PGs in the Pool, and the resulting number is the PGID.
That is: PGID = Pool_ID + HASH(OID) % PG_NUM

4) The PG is replicated according to the configured number of copies: the CRUSH algorithm is applied to the PGID to compute the IDs of the target primary and secondary OSDs, and the data is stored on those OSD nodes (in fact all objects in a PG are stored on its OSDs).
That is: CRUSH(PGID) determines the group of OSDs on which the PG's data is stored.
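The following Python sketch walks through this addressing chain end to end. It is only an illustration of the steps above: the hash and the CRUSH placement are stand-ins (real Ceph uses the rjenkins hash and the CRUSH algorithm over the cluster map), and the pool ID, PG count, and OSD list are assumed values.

import hashlib

POOL_ID = 2
PG_NUM = 128
OSDS = [0, 1, 2, 3, 4, 5]
REPLICAS = 3

def make_oid(ino, ono):
    # OID = ino (file ID) + ono (stripe number), e.g. "A" + 0 -> "A0"
    return f"{ino}{ono}"

def oid_to_pgid(oid):
    # feature code = HASH(OID); PGID = pool ID + (feature code % PG_NUM)
    code = int(hashlib.md5(oid.encode()).hexdigest(), 16)
    return f"{POOL_ID}.{code % PG_NUM:x}"

def crush(pgid):
    # stand-in for CRUSH: deterministically pick REPLICAS distinct OSDs for the PG
    code = int(hashlib.md5(pgid.encode()).hexdigest(), 16)
    start = code % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

oid = make_oid("A", 0)
pgid = oid_to_pgid(oid)
print(oid, "->", pgid, "->", crush(pgid))    # e.g. A0 -> 2.4b -> [primary, replica, replica]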

3.7 Ceph Version Release Lifecycle

Starting from the Nautilus version (14.2.0), Ceph will have a new stable version released every year, which is expected to be released in March every year. Every year, the new version will have a new name (for example, "Mimic") and a main version number (for example, 13 for Mimic, since "M" is the 13th letter of the alphabet).

The version number has the format x.y.z, where x is the release cycle (for example, 13 for Mimic, 17 for Quincy) and y indicates the type of release:
x.0.z: y equals 0, a development release
x.1.z: y equals 1, a release candidate (for test clusters)
x.2.z: y equals 2, a stable/bugfix release (for users)
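As a small illustration of this scheme, the helper below (not part of Ceph) maps the y component of a version string to its release type:

def release_type(version):
    # x.y.z: y = 0 development, 1 release candidate, 2 stable/bugfix
    x, y, z = (int(part) for part in version.split("."))
    return {0: "development release",
            1: "release candidate",
            2: "stable/bugfix release"}.get(y, "unknown")

print(release_type("17.2.5"))   # Quincy stable -> stable/bugfix release
print(release_type("14.1.0"))   # Nautilus RC   -> release candidate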

3.8 Ceph cluster deployment


At present, Ceph officially provides several ways to deploy a Ceph cluster. The commonly used methods are ceph-deploy, cephadm, and manual (binary) installation:

●ceph-deploy: a cluster automation deployment tool that has been in use for a long time, is mature and stable, and is integrated into many automation tools; it can be used for production deployment.

●cephadm: deploys Ceph clusters from Octopus and newer releases, using containers and systemd to install and manage the cluster. Not yet recommended for production environments.

●Binary: manual deployment, deploying the Ceph cluster step by step. It supports more customization and gives insight into the deployment details, but installation is more difficult.

4. Deploy Ceph cluster based on ceph-deploy

Ceph production environment recommendations:
1. Use a 10G network for all storage clusters.
2. Separate the cluster network (cluster-network, used for internal cluster communication) from the public network (public-network, used for external access to the Ceph cluster).
3. Deploy mon and mds on different hosts from the osd hosts (in a test environment one host node can run multiple components).
4. OSDs can also use SATA disks.
5. Plan the cluster according to capacity.
6. Use a Xeon E5 2620 V3 or better CPU and 64 GB or more of memory.
7. Distribute the cluster hosts across cabinets to avoid losing the cluster to a single cabinet's power or network failure.

Ceph Environment Planning

4.1 Environment preparation

Optional Step: Create Ceph Admin Users

useradd cephadm
passwd cephadm

visudo
cephadm ALL=(root) NOPASSWD:ALL

1. Disable SELinux and the firewall

systemctl disable --now firewalld
setenforce 0
sed -i 's/enforcing/disabled/' /etc/selinux/config

2. Set the hostname according to the plan

hostnamectl set-hostname admin
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
hostnamectl set-hostname client

3. Configure hosts resolution

cat > /etc/hosts << EOF
192.168.154.10 admin
192.168.154.11 node01
192.168.154.12 node02
192.168.154.13 node03
192.168.154.14 client
EOF

4. Install common software and dependent packages

yum -y install epel-release
yum -y install yum-plugin-priorities yum-utils ntpdate python-setuptools python-pip gcc gcc-c++ autoconf libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel zip unzip ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel krb5-devel libidn libidn-devel openssl openssh openssl-devel nss_ldap openldap openldap-devel openldap-clients openldap-servers libxslt-devel libevent-devel ntp libtool-ltdl bison libtool vim-enhanced python wget lsof iptraf strace lrzsz kernel-devel kernel-headers pam-devel tcl tk cmake ncurses-devel bison setuptool popt-devel net-snmp screen perl-devel pcre-devel net-snmp screen tcpdump rsync sysstat man iptables sudo libconfig git bind-utils tmux elinks numactl iftop bwm-ng net-tools expect snappy leveldb gdisk python-argparse gperftools-libs conntrack ipset jq libseccomp socat chrony sshpass

5. Configure passwordless ssh login from the admin management node to all nodes

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@admin
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node01
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node02
sshpass -p 'abc1234' ssh-copy-id -o StrictHostKeyChecking=no root@node03


6. Configure time synchronization

systemctl enable --now chronyd
timedatectl set-ntp true					# enable NTP
timedatectl set-timezone Asia/Shanghai		# set the time zone
chronyc -a makestep							# force an immediate system clock sync
timedatectl status							# check the time synchronization status
chronyc sources -v							# show the ntp source server information
timedatectl set-local-rtc 0					# keep the hardware clock in UTC

# Restart services that depend on the system time
systemctl restart rsyslog 
systemctl restart crond

# Disable unneeded services
systemctl disable --now postfix


7. Configure Ceph yum source

wget https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm --no-check-certificate

rpm -ivh ceph-release-1-1.el7.noarch.rpm --force

8. Restart all hosts after performing all the above operations

reboot

4.2 Deploy Ceph cluster

1. Create a Ceph working directory on all nodes; the follow-up work is carried out in this directory

mkdir -p /etc/ceph

2. Install the ceph-deploy deployment tool

cd /etc/ceph
yum install -y ceph-deploy

ceph-deploy --version


3. From the management node, install the Ceph packages on the other nodes

# ceph-deploy 2.0.1 deploys the Mimic release of Ceph by default; to install another release, specify it manually with --release
cd /etc/ceph
ceph-deploy install --release nautilus node0{1..3} admin

# ceph-deploy install essentially just runs the following commands:
yum clean all
yum -y install epel-release
yum -y install yum-plugin-priorities
yum -y install ceph-release ceph ceph-radosgw

# You can also install the Ceph packages manually; run the following on the other nodes to install all of them:
yum install -y ceph-mon ceph-radosgw ceph-mds ceph-mgr ceph-osd ceph-common ceph

On node01, node02, and node03, add an extra network card and three hard drives each, and configure the added network card.

4. Generate initial configuration

# Run the following on the management node to tell ceph-deploy which nodes are the mon monitor nodes
cd /etc/ceph
ceph-deploy new --public-network 192.168.154.0/24 --cluster-network 192.168.100.0/24 node01 node02 node03

# After the command succeeds, configuration files are generated under /etc/ceph
ls /etc/ceph
ceph.conf					# the ceph configuration file
ceph-deploy-ceph.log		# the monitor log
ceph.mon.keyring			# the monitor keyring

5. Initialize the mon node on the management node

cd /etc/ceph
ceph-deploy mon create node01 node02 node03			# create the mon nodes; because monitors use the Paxos algorithm, a highly available mon cluster needs an odd number of nodes, at least 3

ceph-deploy --overwrite-conf mon create-initial		# initialize the mon nodes and push the configuration to all nodes
													# --overwrite-conf forces the configuration file to be overwritten

ceph-deploy gatherkeys node01						# optional: gather all keys onto node01
# After the command succeeds, configuration files are generated under /etc/ceph
ls /etc/ceph
ceph.bootstrap-mds.keyring			# bootstrap key for starting mds
ceph.bootstrap-mgr.keyring			# bootstrap key for starting mgr
ceph.bootstrap-osd.keyring			# bootstrap key for starting osd
ceph.bootstrap-rgw.keyring			# bootstrap key for starting rgw
ceph.client.admin.keyring			# authentication key for communication between the ceph client and the cluster; it carries full permissions on the ceph cluster
ceph.conf
ceph-deploy-ceph.log
ceph.mon.keyring


# On the mon nodes (node01, node02, node03), check the mon process that was started automatically
ps aux | grep ceph-mon
root        1823  0.0  0.2 189264  9216 ?        Ss   19:46   0:00 /usr/bin/python2.7 /usr/bin/ceph-crash
ceph        3228  0.0  0.8 501244 33420 ?        Ssl  21:08   0:00 /usr/bin/ceph-mon -f --cluster ceph --id node03 --setuser ceph --setgroup ceph
root        3578  0.0  0.0 112824   988 pts/1    R+   21:24   0:00 grep --color=auto ceph


# Check the Ceph cluster status on the management node
cd /etc/ceph
ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum node01,node02,node03
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:


# Check the result of the mon cluster election
ceph quorum_status --format json-pretty | grep leader
"quorum_leader_name": "node01",

# Scale out the mon nodes
ceph-deploy mon add <node name>


6. Deploy nodes capable of managing Ceph clusters (optional)

# This makes it possible to run ceph commands on each node to manage the cluster
cd /etc/ceph
ceph-deploy --overwrite-conf config push node01 node02 node03		# push the configuration to all mon nodes; the ceph.conf on every mon node must be identical

ceph-deploy admin node01 node02 node03			# essentially copies the ceph.client.admin.keyring cluster authentication file to each node

# Check on the mon nodes
ls /etc/ceph
ceph.client.admin.keyring  ceph.conf  rbdmap  tmpr8tzyc

cd /etc/ceph
ceph -s


7. Deploy osd storage nodes

# Do not partition the newly added disks; use them directly
lsblk 
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   60G  0 disk 
├─sda1   8:1    0  500M  0 part /boot
├─sda2   8:2    0    4G  0 part [SWAP]
└─sda3   8:3    0 55.5G  0 part /
sdb      8:16   0   20G  0 disk 
sdc      8:32   0   20G  0 disk 
sdd      8:48   0   20G  0 disk 
# If reusing old disks, wipe them first (delete the partition table); optional for new, empty disks
cd /etc/ceph
ceph-deploy disk zap node01 /dev/sdb
ceph-deploy disk zap node02 /dev/sdb
ceph-deploy disk zap node03 /dev/sdb
# Add the osd nodes from the admin node
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdb
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdb
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdb
# Check the ceph cluster status
ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 3 daemons, quorum node01,node02,node03 (age 119m)
    mgr: no daemons active
    osd: 3 osds: 3 up (since 35s), 3 in (since 35s)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 57 GiB / 60 GiB avail
    pgs: 


# Check osd status
ceph osd tree


# Scale out the osd nodes
cd /etc/ceph
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdc
ceph-deploy --overwrite-conf osd create node01 --data /dev/sdd
ceph-deploy --overwrite-conf osd create node02 --data /dev/sdd
ceph-deploy --overwrite-conf osd create node03 --data /dev/sdd

Adding an OSD involves PG migration. Since the cluster has no data yet, the health status quickly returns to OK; adding nodes in a production environment would involve migrating a large amount of data.

8. Deploy the mgr node

# The ceph-mgr daemon runs in Active/Standby mode, so if the Active node or its ceph-mgr daemon fails, one of the Standby instances can take over without interrupting service. According to the official architecture principles, mgr needs at least two nodes.
cd /etc/ceph
ceph-deploy mgr create node01 node02

ceph -s
  cluster:
    id:     7e9848bb-909c-43fa-b36c-5805ffbbeb39
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim
 
  services:
    mon: 3 daemons, quorum node01,node02,node03
    mgr: node01(active, since 10s), standbys: node02
    osd: 0 osds: 0 up, 0 in
 


# Fix the HEALTH_WARN problem "mons are allowing insecure global_id reclaim":
Disable the insecure mode:

ceph config set mon auth_allow_insecure_global_id_reclaim false

# Scale out the mgr nodes
ceph-deploy mgr create <node name>


9. Turn on the monitoring module

# Run the following on the Active ceph-mgr node (node01)
ceph -s | grep mgr

yum install -y ceph-mgr-dashboard

cd /etc/ceph

ceph mgr module ls | grep dashboard


# Enable the dashboard module
ceph mgr module enable dashboard --force

# Disable SSL for the dashboard
ceph config set mgr mgr/dashboard/ssl false
# Configure the address and port the dashboard listens on
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8000
# Restart the dashboard
ceph mgr module disable dashboard
ceph mgr module enable dashboard --force
# Confirm the dashboard access URL
ceph mgr services

# Set the dashboard account and password
cd /etc/ceph/
echo "12345678" > dashboard_passwd.txt
ceph dashboard set-login-credentials admin -i dashboard_passwd.txt 


Access http://192.168.154.11:8000 in a browser; the account and password are admin/12345678


10. Resource pool (Pool) management
We have now finished deploying the Ceph cluster, but how do we store data in it? First we need to define a Pool resource pool. A Pool is Ceph's abstraction for storing Object data; we can think of it as a logical partition of Ceph storage. A Pool consists of multiple PGs; the PGs are mapped to different OSDs by the CRUSH algorithm; a Pool also has a configurable replica size, which defaults to 3.

The Ceph client asks a monitor for the cluster state and writes data to a Pool; based on the number of PGs, the data is mapped through the CRUSH algorithm to different OSD nodes, which is how the data gets stored. So a Pool can be understood as the logical unit that holds Object data; the current cluster has no resource pool yet, so we need to define one.

Create a Pool named mypool with the number of PGs set to 64. When setting PGs you normally also set PGP to the same value:
PG (Placement Group): a virtual concept used to hold objects.
PGP (Placement Group for Placement purpose): determines the arrangement of OSDs that the PGs are stored on.
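For choosing pg_num, a commonly cited rule of thumb (an approximation from the old Ceph placement-group guidance, not the output of any ceph command) targets on the order of 100 PGs per OSD: total PGs ≈ (number of OSDs × 100) / replica size, rounded up to a power of two and then shared among all pools. A small sketch:

def suggested_total_pgs(num_osds, replica_size=3, per_osd_target=100):
    # round the raw target up to the next power of two
    raw = num_osds * per_osd_target / replica_size
    power = 1
    while power < raw:
        power *= 2
    return power

print(suggested_total_pgs(9))    # e.g. 9 OSDs, 3 replicas -> a budget of 512 PGs across all pools

The 64/64 used below is simply a reasonable per-pool value within such a budget.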

cd /etc/ceph
ceph osd pool create mypool 64 64

# List the Pools in the cluster
ceph osd pool ls    or    rados lspools
ceph osd lspools


# Check the pool's replica count
ceph osd pool get mypool size

# Check the numbers of PGs and PGPs
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num


# Change pg_num and pgp_num to 128
ceph osd pool set mypool pg_num 128
ceph osd pool set mypool pgp_num 128

ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num


# Change the Pool replica count to 2
ceph osd pool set mypool size 2

ceph osd pool get mypool size


# Change the default replica count to 2
vim ceph.conf
......
osd_pool_default_size = 2

ceph-deploy --overwrite-conf config push node01 node02 node03


11. Delete a Pool resource pool
1) The command that deletes a storage pool risks destroying data, so Ceph forbids it by default. The administrator must first enable support for deleting storage pools in the ceph.conf configuration file.

vim ceph.conf
......
[mon]
mon allow pool delete = true


2) Push the ceph.conf configuration file to all mon nodes

ceph-deploy --overwrite-conf config push node01 node02 node03


3) Restart the ceph-mon service on all mon nodes (i.e. all the node hosts)

systemctl restart ceph-mon.target

4) Execute the delete Pool command

ceph osd pool rm pool01 pool01 --yes-i-really-really-mean-it


# Check osd status
ceph osd status

# Check osd capacity usage
ceph osd df


5. Ceph application

5.1 Create CephFS file system MDS interface

Server operation
1) Create mds service on the management node

cd /etc/ceph
ceph-deploy mds create node01 node02 node03

2) Check the mds service on each node

systemctl status ceph-mds@node01
systemctl status ceph-mds@node02
systemctl status ceph-mds@node03


3) Create the storage pools and enable the ceph file system
A ceph file system requires at least two rados pools, one for data and one for metadata. The data pool here is analogous to the shared directory of the file system.

ceph osd pool create cephfs_data 128					# create the data Pool

ceph osd pool create cephfs_metadata 128				# create the metadata Pool

Create the cephfs. Command format: ceph fs new <FS_NAME> <CEPHFS_METADATA_NAME> <CEPHFS_DATA_NAME>

ceph fs new mycephfs cephfs_metadata cephfs_data		# enable the ceph file system; the metadata Pool comes first, the data Pool second

ceph fs ls					# list the cephfs

4) Check the mds status: one is up (active) and the other two are standby. The active mds service is currently the one on node01

ceph -s
mds: mycephfs:1 {0=node01=up:active} 2 up:standby

ceph mds stat
mycephfs:1 {0=node01=up:active} 2 up:standby

5) Create users
Syntax: ceph fs authorize <fs_name> client.<client_id> <path-in-cephfs> rw

# The account is client.zhangsan and the user name is zhangsan; zhangsan gets read/write permission on the / root directory of the ceph file system (note: not the root directory of the operating system)

ceph fs authorize mycephfs client.zhangsan / rw | tee /etc/ceph/zhangsan.keyring
# The account is client.lisi and the user name is lisi; lisi gets read-only permission on the / root of the file system and read/write permission on its /test subdirectory

ceph fs authorize mycephfs client.lisi / r /test rw | tee /etc/ceph/lisi.keyring


Client operation
1) The client must be in the public network

2) Create a working directory on the client

mkdir /etc/ceph

3) On the ceph management node, copy the ceph configuration file ceph.conf and the account keyring files (zhangsan.keyring and lisi.keyring) to the client

scp ceph.conf zhangsan.keyring lisi.keyring root@client:/etc/ceph


4) Install the ceph package on the client

cd /opt
wget https://download.ceph.com/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm --no-check-certificate
rpm -ivh ceph-release-1-1.el7.noarch.rpm
yum install -y ceph 

5) Create a key file on the client side

cd /etc/ceph
ceph-authtool -n client.zhangsan -p zhangsan.keyring > zhangsan.key			# export the zhangsan user's key to zhangsan.key
ceph-authtool -n client.lisi -p lisi.keyring > lisi.key						# export the lisi user's key to lisi.key


6) Client mount
●Method 1: kernel-based mount
Syntax:

mount -t ceph node01:6789,node02:6789,node03:6789:/  <local mount point>  -o name=<username>,secret=<key>
mount -t ceph node01:6789,node02:6789,node03:6789:/  <local mount point>  -o name=<username>,secretfile=<key file>

Example one:

mkdir -p /data/zhangsan
mount -t ceph node01:6789,node02:6789,node03:6789:/ /data/zhangsan -o name=zhangsan,secretfile=/etc/ceph/zhangsan.key


Example two:

mkdir -p /data/lisi
mount -t ceph node01:6789,node02:6789,node03:6789:/ /data/lisi -o name=lisi,secretfile=/etc/ceph/lisi.key

Verify user permissions

cd /data/lisi
echo 123 > 2.txt
-bash: 2.txt: Permission denied

echo 123 > test/2.txt
cat test/2.txt
123


Example three:

# Stop the mds service on node02
ssh root@node02 "systemctl stop ceph-mds@node02"

ceph -s

# Test that the client's mount point still works; if all the mds are stopped, the client can no longer use it

●Method 2: based on the fuse tool
1) On the ceph management node, copy the ceph configuration file ceph.conf and the account keyring file to the client

scp ceph.client.admin.keyring root@client:/etc/ceph

2) Install ceph-fuse on the client

yum install -y ceph-fuse

3) Mount on the client
If the mount point was mounted before, unmount the previous mount first

cd /data/aa
ceph-fuse -m node01:6789,node02:6789,node03:6789 /data/aa [-o nonempty]			# the mount fails if the mount point is not empty; add -o nonempty to ignore that


5.2 Create Ceph block storage system RBD interface

1. Create a storage pool named rbd-demo dedicated to RBD

ceph osd pool create rbd-demo 64 64

2. Convert the storage pool to RBD mode

ceph osd pool application enable rbd-demo rbd

3. Initialize the storage pool

rbd pool init -p rbd-demo			# -p is equivalent to --pool

4. Create an image

rbd create -p rbd-demo --image rbd-demo1.img --size 10G

This can be abbreviated as:
rbd create rbd-demo/rbd-demo2.img --size 10G

5. Image management

// List the images in the storage pool
rbd ls -l -p rbd-demo

// Show detailed information about an image
rbd info -p rbd-demo --image rbd-demo1.img
rbd image 'rbd-demo.img':
	size 10 GiB in 2560 objects							# image size and the number of objects it is striped into
	order 22 (4 MiB objects)							# object size order; the valid range is 12 to 25 (4 KB to 32 MB), and 22 means 2^22 bytes, i.e. exactly 4 MB
	snapshot_count: 0
	id: 5fc98fe1f304									# image ID
	block_name_prefix: rbd_data.5fc98fe1f304			# object name prefix
	format: 2											# image format in use; the default is 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten			# features enabled on this image
	op_features: 																	# optional features
	flags: 


// Resize an image
rbd resize -p rbd-demo --image rbd-demo1.img --size 20G

rbd info -p rbd-demo --image rbd-demo1.img


# When resizing, it is generally recommended only to grow the image, not shrink it; to shrink, you must add the --allow-shrink option
rbd resize -p rbd-demo --image rbd-demo1.img --size 5G --allow-shrink


// Delete an image
# Delete the image directly
rbd rm -p rbd-demo --image rbd-demo2.img
rbd remove rbd-demo/rbd-demo2.img

# The trash command is recommended: it moves the image to the recycle bin, from which it can be restored if needed
rbd trash move rbd-demo/rbd-demo1.img

rbd ls -l -p rbd-demo

rbd trash list -p rbd-demo
5fc98fe1f304 rbd-demo1.img
# Restore the image
rbd trash restore rbd-demo/5fc98fe1f304

rbd ls -l -p rbd-demo


6. Using RBD from a Linux client
There are two ways for a client to use RBD:
●Map the image to a local block device through the kernel module KRBD; the device file is usually /dev/rbd*.
●Use the librbd interface; KVM virtual machines usually use this interface (a minimal Python sketch of this path follows the list).
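For reference, here is a minimal sketch of the librbd path using the python-rados / python-rbd bindings; it assumes the bindings, /etc/ceph/ceph.conf, and a usable admin keyring are present on the client, and the image name rbd-demo3.img is made up for the example.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # connects as client.admin by default
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd-demo')              # the pool created earlier in this section
    try:
        rbd.RBD().create(ioctx, 'rbd-demo3.img', 4 * 1024 ** 3)   # create a 4 GiB image
        with rbd.Image(ioctx, 'rbd-demo3.img') as image:
            image.write(b'hello rbd', 0)                 # write at offset 0
            print(image.read(0, 9))                      # b'hello rbd'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()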

This example mainly uses a Linux client to mount an RBD image as a local disk. Before starting, install the ceph-common package on the client node, because the client needs the rbd command to map the RBD image to a local device like an ordinary hard disk; the ceph.conf configuration file and an authorization keyring file must also be copied to that node.

On the management node, create a user authorized to access the specified RBD storage pool

# Example: the user ID is client.osd-mount; it has full permissions on OSDs and read-only permission on the Mon
ceph auth get-or-create client.osd-mount osd "allow * pool=rbd-demo" mon "allow r" > /etc/ceph/ceph.client.osd-mount.keyring

Modify the RBD image features: by default CentOS 7 only supports the layering and striping features, so the other features need to be disabled

rbd feature disable rbd-demo/rbd-demo1.img object-map,fast-diff,deep-flatten

Send the user's keyring file and ceph.conf file to the client's /etc/ceph directory

cd /etc/ceph
scp ceph.client.osd-mount.keyring ceph.conf root@client:/etc/ceph


Linux client operation

# Install the ceph-common package
yum install -y ceph-common
# Map the image on the client
cd /etc/ceph
rbd map rbd-demo/rbd-demo1.img --keyring /etc/ceph/ceph.client.osd-mount.keyring --user osd-mount
# Check the mapping
rbd showmapped
rbd device list

# Unmap
rbd unmap rbd-demo/rbd-demo1.img
# Format and mount
mkfs.xfs /dev/rbd0

mkdir -p /data/bb
mount /dev/rbd0 /data/bb


# Online expansion
Resize the image on the management node
rbd resize rbd-demo/rbd-demo1.img --size 30G


Refresh the device file on the client
xfs_growfs /dev/rbd0		# refresh the capacity of an xfs file system
resize2fs /dev/rbd0			# refresh the capacity of an ext4 file system


7. Snapshot management
Taking a snapshot of an RBD image preserves the image's state at that point in time; using snapshot layering, a snapshot can also be cloned into a new image.

// Write files on the client
echo 1111 > /data/bb/11.txt
echo 2222 > /data/bb/22.txt
echo 3333 > /data/bb/33.txt
// Create a snapshot of the image on the management node
rbd snap create --pool rbd-demo --image rbd-demo1.img --snap demo1_snap1

This can be abbreviated as:
rbd snap create rbd-demo/rbd-demo1.img@demo1_snap1
// List all snapshots of the specified image
rbd snap list rbd-demo/rbd-demo1.img


# Output in json format:
rbd snap list rbd-demo/rbd-demo1.img --format json --pretty-format
// Roll the image back to a specified snapshot
Before rolling back, unmap the image, then perform the rollback.
# On the client
rm -rf /data/bb/*
umount /data/bb
rbd unmap rbd-demo/rbd-demo1.img


# On the management node
rbd snap rollback rbd-demo/rbd-demo1.img@demo1_snap1


# Re-map and mount on the client
rbd map rbd-demo/rbd-demo1.img --keyring /etc/ceph/ceph.client.osd-mount.keyring --user osd-mount
mount /dev/rbd0 /data/bb
ls /data/bb				# the data has been restored


// Limit the number of snapshots that can be created for an image
rbd snap limit set rbd-demo/rbd-demo1.img --limit 3
# Remove the limit:
rbd snap limit clear rbd-demo/rbd-demo1.img

// Delete snapshots
# Delete a specific snapshot:
rbd snap rm rbd-demo/rbd-demo1.img@demo1_snap1

# Delete all snapshots:
rbd snap purge rbd-demo/rbd-demo1.img


Snapshot layering
Snapshot layering allows clones of a snapshot to be used as new images. A cloned image is almost identical to a directly created image and supports all image operations; the only difference is that it keeps a reference to a read-only upstream snapshot, and that snapshot must be protected.

Snap clone
1) Set the upstream snapshot to protected mode:

rbd snap create rbd-demo/rbd-demo1.img@demo1_snap666

rbd snap protect rbd-demo/rbd-demo1.img@demo1_snap666

2) Clone the snapshot as a new image

rbd clone rbd-demo/rbd-demo1.img@demo1_snap666 --dest rbd-demo/rbd-demo666.img

rbd ls -p rbd-demo


3) View the child images of the snapshot after the clone completes
rbd children rbd-demo/rbd-demo1.img@demo1_snap666


Snapshot flattening
Normally an image obtained by cloning a snapshot keeps a reference to its parent snapshot, so the parent snapshot cannot be deleted, otherwise the clone would be affected.

rbd snap rm rbd-demo/rbd-demo1.img@demo1_snap666
# Error: snapshot 'demo1_snap666' is protected from removal.

If you want to delete the snapshot but keep its child image, you must first flatten the child image; how long flattening takes depends on the image size.
1) Flatten the child image

rbd flatten rbd-demo/rbd-demo666.img

2) Cancel snapshot protection

rbd snap unprotect rbd-demo/rbd-demo1.img@demo1_snap666

3) Delete the snapshot

rbd snap rm rbd-demo/rbd-demo1.img@demo1_snap666

rbd ls -l -p rbd-demo			# after deleting the snapshot, the child image still exists


8. Image export and import

// Export an image
rbd export rbd-demo/rbd-demo1.img  /opt/rbd-demo1.img
// Import an image
# Unmount the client mount point and unmap the image
umount /data/bb
rbd unmap rbd-demo/rbd-demo1.img
# Delete all snapshots of the image, then delete the image
rbd snap purge rbd-demo/rbd-demo1.img
rbd rm rbd-demo/rbd-demo1.img

rbd ls -l -p rbd-demo


# Import the image
rbd import /opt/rbd-demo1.img  rbd-demo/rbd-demo1.img

rbd ls -l -p rbd-demo


5.3 Create Ceph object storage system RGW interface

1. The concept of object storage
Object storage is a storage method for unstructured data. Each piece of data is stored as a separate object with a unique address that identifies it, and it is typically used in cloud computing environments.
Unlike other storage methods, object-based storage does not use a directory tree.

Although designs and implementations differ, most object storage systems expose similar core resource types. From the client's perspective, they break down into the following logical units:

Amazon S3 provides:
1. User
2. Bucket
3. Object

The relationship between the three:
1. A User stores Objects in Buckets on the system.
2. A Bucket belongs to a user and holds objects; one bucket can store multiple objects.
3. The same user can own multiple buckets, and different users are allowed to use buckets with the same name, so the user name can serve as the bucket namespace.

OpenStack Swift:
Provides user, container, and object, corresponding respectively to users, buckets, and objects; it also provides a parent component, account, above user, which represents a project or tenant. An account can contain one or more users, who can share the same set of containers, and the account provides the namespace for its containers.

RadosGW:
Provides user, subuser, bucket, and object, where user corresponds to an S3 user and subuser corresponds to a Swift user; neither user nor subuser supports a namespace for buckets, so buckets of different users may not share the same name. However, since the Jewel release RadosGW has introduced tenants to provide namespaces for users and buckets, though the tenant is an optional component.

As can be seen above, the core resource types of most object storage systems are similar, for example Amazon S3, OpenStack Swift, and RadosGW. S3 and Swift are not compatible with each other; to be compatible with both, Ceph provides the RGW (RadosGateway) data abstraction and management layer on top of the RADOS cluster, which is natively compatible with the S3 and Swift APIs.
S3 and Swift exchange data over http or https, and RadosGW's built-in Civetweb provides the service; proxy servers such as nginx or haproxy can also be used to receive user requests and forward them to the RadosGW process.
RGW's functionality is implemented by the object gateway daemon, which provides the REST API interface to clients. For redundancy and load balancing there is usually more than one RadosGW daemon in a Ceph cluster.

2. Create an RGW interface.
If you need to use an interface like S3 or Swift, you need to deploy/create a RadosGW interface. RadosGW is usually used as Object Storage, similar to Alibaba Cloud OSS.

Create an RGW daemon process on the management node (this process generally requires high availability in a production environment, which will be introduced later)

cd /etc/ceph
ceph-deploy rgw create node01

ceph -s
  services:
    mon: 3 daemons, quorum node01,node02,node03 (age 3h)
    mgr: node01(active, since 12h), standbys: node02
    mds: mycephfs:1 {0=node02=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 12h), 6 in (since 25h)
    rgw: 1 daemon active (node01)


After successful creation, a series of storage pools for RGW will be automatically created by default

ceph osd pool ls
rgw.root 
default.rgw.control			# controller information
default.rgw.meta			# metadata records
default.rgw.log				# log information
default.rgw.buckets.index	# the rgw bucket index; created after data is written
default.rgw.buckets.data	# the actual object data; created after data is written


By default RGW listens on port 7480

ssh root@node01 netstat -lntp | grep 7480

curl node01:7480
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>anonymous</ID>
    <DisplayName/>
  </Owner>
  <Buckets/>
</ListAllMyBucketsResult>


Enable http+https and change the listening port
The RadosGW daemon is implemented internally with Civetweb, and basic management of RadosGW can be done through Civetweb's configuration.

To enable SSL on Civetweb you first need a certificate; generate the certificate on the rgw node (the node01 node)

1) Generate CA certificate private key:

openssl genrsa -out civetweb.key 2048

2) Generate CA certificate public key:

openssl req -new -x509 -key civetweb.key -out civetweb.crt -days 3650 -subj "/CN=192.168.154.11"

3) Merge the generated certificate into a pem file

cat civetweb.key civetweb.crt > /etc/ceph/civetweb.pem


Change the listening port
Civetweb listens on port 7480 by default and provides the http protocol. If you need to modify the configuration, you need to edit the ceph.conf configuration file on the management node

cd /etc/ceph

vim ceph.conf
......
[client.rgw.node01]
rgw_host = node01
rgw_frontends = "civetweb port=80+443s ssl_certificate=/etc/ceph/civetweb.pem num_threads=500 request_timeout_ms=60000"


  • rgw_host: the corresponding RadosGW name or IP address
  • rgw_frontends: Here configure the listening port, whether to use https, and some common configurations:
  • port: If it is an https port, you need to add an s after the port.
  • ssl_certificate: Specifies the path to the certificate.
  • num_threads: the maximum number of concurrent connections, the default is 50, adjust according to demand, usually this value should be larger in the production cluster environment
  • request_timeout_ms: send and receive timeout, in ms, the default is 30000
  • access_log_file: access log path, default is empty
  • error_log_file: error log path, default is empty

After modifying the ceph.conf configuration file, push the configuration file to the other nodes and then restart the corresponding RadosGW service

ceph-deploy --overwrite-conf config push node0{1..3}

ssh root@node01 systemctl restart ceph-radosgw.target


View port on rgw node

netstat -lntp | grep -w 80
netstat -lntp | grep 443


Verify access from the client

curl http://192.168.154.11:80
curl -k https://192.168.154.11:443


Create a RadosGW account
Use the radosgw-admin command on the management node to create a RadosGW account

radosgw-admin user create --uid="rgwuser" --display-name="rgw test user"
......
    "keys": [
        {
            "user": "rgwuser",
            "access_key": "ER0SCVRJWNRIKFGQD31H",
            "secret_key": "YKYjk7L4FfAu8GHeQarIlXodjtj1BXVaxpKv2Nna"
        }
    ],


After it is created successfully, the user's basic information is printed; the two most important items are access_key and secret_key. If you forget the user information later, you can view it with the following command

radosgw-admin user info --uid="rgwuser"


S3 interface access test
1) Install python3 and python3-pip on the client

yum install -y python3 python3-pip

python3 -V
Python 3.6.8

pip3 -V
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)


2) Install the boto module for testing connection to S3

pip3 install boto


3) Test access to S3 interface

echo 123123 > /opt/123.txt
vim test.py
#coding:utf-8
# boto S3 manual: http://boto.readthedocs.org/en/latest/ref/s3.html
# boto S3 quick start: http://boto.readthedocs.org/en/latest/s3_tut.html
# If the script blocks for a long time, check the cluster status, open ports, etc.
import ssl
import boto.s3.connection
from boto.s3.key import Key
# allow unverified HTTPS contexts (self-signed certificate)
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
# keys of the test user
access_key = "ER0SCVRJWNRIKFGQD31H"
secret_key = "YKYjk7L4FfAu8GHeQarIlXodjtj1BXVaxpKv2Nna"
# ip and port of the rgw
host = "192.168.154.11"
# when using port 443, set is_secure=True in the connection below
port = 443
# when using port 80, set is_secure=False in the connection below
#port = 80
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host=host,
    port=port,
    is_secure=True,
    validate_certs=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat()
)

# 1: create buckets
conn.create_bucket(bucket_name='bucket01')
conn.create_bucket(bucket_name='bucket02')

# 2: check whether a bucket exists; returns None if it does not
exists = conn.lookup('bucket01')
print(exists)
exists = conn.lookup('bucket02')
print(exists)

# 3: get a bucket
bucket1 = conn.get_bucket('bucket01')
bucket2 = conn.get_bucket('bucket02')

# 4: list the contents of a bucket
print(list(bucket1.list()))
print(list(bucket2.list()))

# 5: store data in S3; the data source can be a file, a stream, or a string
# 5.1 upload a file
bucket1 = conn.get_bucket('bucket01')
# the value of name is the key of the data
key = Key(bucket=bucket1, name='myfile')
key.set_contents_from_filename('/opt/123.txt')
# read the file content from S3; returns the content of 123.txt as a string
print(key.get_contents_as_string())

# 5.2 upload a string
# if the bucket object was fetched before, there is no need to fetch it again
bucket2 = conn.get_bucket('bucket02')
key = Key(bucket=bucket2, name='mystr')
key.set_contents_from_string('hello world')
print(key.get_contents_as_string())

# 6: delete a bucket; all keys in the bucket must be deleted before the bucket itself can be deleted
bucket1 = conn.get_bucket('bucket01')
for key in bucket1:
    key.delete()
bucket1.delete()
#bucket1.get_all_keys()[0].delete() # delete a single key

# iterate over buckets and keys
for bucket in conn:
    for key in bucket:
        print(key.name,key.get_contents_as_string())
# a way to check whether a bucket contains any files
bucket1 = conn.get_bucket('bucket01')
res = bucket1.get_all_keys()
if len(res) > 0:
    print('not empty')
else:
    print('empty')



4) Run the Python test script

python3 test.py

create bucket
upload file
upload string
delete bucket01 bucket

5.4 OSD Fault Simulation and Recovery

1. Simulate an OSD failure
If a ceph cluster has thousands of osds, it is normal for 2 or 3 of them to fail every day; we can simulate taking one osd down.

If the osd daemon is running normally, a downed osd will quickly return to normal, so stop the daemon first

ssh root@node01 systemctl stop ceph-osd@0

# mark the osd down
ceph osd down 0

ceph osd tree


2. Kick the broken osd out of the cluster

Method one:

# Move osd.0 out of the cluster; the cluster will start resynchronizing data automatically
ceph osd out osd.0

# Remove osd.0 from the crushmap
ceph osd crush remove osd.0

# Delete the account information corresponding to the daemon
ceph auth rm osd.0

ceph auth list

# Delete osd.0
ceph osd rm osd.0

ceph osd stat
ceph -s


Method two:

ceph osd out osd.0

# Combined step: delete the configuration for the failed osd from the configuration file
ceph osd purge osd.0 --yes-i-really-mean-it

3. Rejoin the cluster after repairing the original broken osd

# Create the osd on the osd node; there is no need to specify a name, the ID is generated automatically in sequence
cd /etc/ceph

ceph osd create

# Create the account
ceph-authtool --create-keyring /etc/ceph/ceph.osd.0.keyring --gen-key -n osd.0 --cap mon 'allow profile osd' --cap mgr 'allow profile osd' --cap osd 'allow *'

# Import the new account key
ceph auth import -i /etc/ceph/ceph.osd.0.keyring

ceph auth list

# Update the keyring file in the corresponding osd directory
ceph auth get-or-create osd.0 -o /var/lib/ceph/osd/ceph-0/keyring

# Add it to the crushmap
ceph osd crush add osd.0 1.000 host=node01		# 1.000 is the weight

# Add it to the cluster
ceph osd in osd.0

ceph osd tree

# Restart the osd daemon
systemctl restart ceph-osd@0

ceph osd tree		# after a short wait the osd status becomes up

// If the restart fails
Error:
Job for ceph-osd@0.service failed because start of the service was attempted too often. See "systemctl status ceph-osd@0.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed ceph-osd@0.service" followed by "systemctl start ceph-osd@0.service" again.

# Run
systemctl reset-failed ceph-osd@0.service && systemctl restart ceph-osd@0.service

