
Ceph System Architecture and Basic Concepts
Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability.

"Unified" means that a Ceph storage system can provide three functions of object storage, block storage and file system storage at the same time, in order to simplify deployment and operation and maintenance under the premise of meeting different application needs.

"Distributed" means that the Ceph system is truly decentralized and has no theoretical upper limit of system scale scalability. In practice, Ceph can be deployed on thousands of servers.




System Architecture
Ceph's system architecture consists of three layers: the lowest and core layer is the RADOS object storage system; the middle layer is the librados library; and the top layer consists of Ceph's different storage interface implementations.

The bottom layer is RADOS (Reliable Autonomic Distributed Object Store), which includes the ceph-osd background daemon and the ceph-mon monitor daemon.

The middle layer, the librados library, is used to access the RADOS object store either locally or remotely over the network.

The top layer provides three different storage interfaces for applications: the block storage interface, the object storage interface, and the file system interface. The file system's metadata server (MDS) provides metadata access; the data itself is accessed directly through the librados library.

There are two object concepts in the Ceph system: one is the object stored in RGW (the RGW object), and the other is the object stored in Ceph's back end (hereinafter the rados object). The two must be distinguished: the former is the user-facing object that can be accessed through the user interface; the latter is the object that the Ceph server operates on.

RADOS
RADOS is mainly composed of two types of nodes: a large number of OSDs (Object Storage Devices), which are responsible for storing and maintaining data, and several Monitors, which are responsible for detecting and maintaining the system state.

Monitor

A Monitor is an independently deployed daemon. Monitors form a cluster to ensure their own high availability. The Monitor cluster keeps its own data consistent through the Paxos algorithm, and provides global configuration information, such as node information, for the entire storage system.
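As a concrete illustration, here is a minimal sketch (not from the original post) that queries the Monitor cluster's status through the python-rados bindings; it assumes a reachable cluster and a config file at the conventional /etc/ceph/ceph.conf path:

```python
# Minimal sketch: ask the Monitor cluster for its status ("ceph mon stat").
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed config path
cluster.connect()

cmd = json.dumps({'prefix': 'mon stat', 'format': 'json'})
ret, outbuf, outs = cluster.mon_command(cmd, b'')
print(json.loads(outbuf))  # quorum members, leader, epoch, ...

cluster.shutdown()
```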
OSD


An OSD is a daemon responsible for physical storage. Its functions are to store data; to handle data replication, recovery, backfilling, and rebalancing; and to provide monitoring information to the Ceph Monitors by checking the heartbeats of other OSD daemons. When a Ceph storage cluster is configured with two replicas, at least two OSD daemons are required for the cluster to reach the active+clean state.
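Along the same lines, a hedged sketch (same assumptions as above) that checks whether the cluster's PGs have reached active+clean, using the "pg stat" monitor command:

```python
# Minimal sketch: fetch the PG state summary ("ceph pg stat").
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ret, outbuf, outs = cluster.mon_command(
    json.dumps({'prefix': 'pg stat', 'format': 'json'}), b'')
print(json.loads(outbuf))  # per-state PG counts; all should be active+clean
cluster.shutdown()
```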

Basic concepts
There are several basic concepts in a Ceph cluster: rados objects, OSDs, PGs, PGPs, pools, files, RBD, RGW objects, and so on. Together these basic concepts form the logical architecture of the entire Ceph cluster.

Rados object

An object is the basic unit of data storage, with a default size of 4 MB.
An object consists of three parts (a short librados sketch follows this list):
(1) Object ID: uniquely identifies the object.
(2) Object data: corresponds to a file in the local file system; the object data is saved in that file.
(3) Object metadata: key-value pairs, which can be saved in the extended attributes (xattrs) of the corresponding file.
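The three parts map directly onto the librados API. A minimal sketch, assuming python-rados is installed, the cluster is reachable, and a pool named 'rbd' exists (the object and xattr names are made up for illustration):

```python
# Minimal sketch: write a rados object and attach key-value metadata.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')  # hypothetical existing pool

ioctx.write_full('hello-object', b'hello ceph')       # object data
ioctx.set_xattr('hello-object', 'owner', b'younger')  # object metadata
print(ioctx.read('hello-object'))                     # b'hello ceph'
print(ioctx.get_xattr('hello-object', 'owner'))       # b'younger'

ioctx.close()
cluster.shutdown()
```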
OSD (Object Storage Device)


An OSD is a process responsible for physical storage. OSDs are generally configured one-to-one with disks: each disk runs one OSD process. (For detailed functions, see the introduction above.)
Relationship description:
(1) Multiple PGs can be distributed on one OSD
(2) An OSD device is the carrier that stores rados objects

PG (Placement Group)

A PG is a layer of logic above the OSDs and can be regarded as a purely logical concept. As the name suggests, a PG is a placement-strategy group: a collection of objects that all share the same placement strategy, meaning the replicas of all objects in the collection are distributed across the same OSD list.
Relationship description:
(1) A PG has a primary and replicas. With multiple copies, the primary and replica copies of a PG are distributed on different OSDs;
(2) An object can only belong to one PG, and a PG contains many objects
(3) A PG corresponds to an OSD list, and all objects stored in the PG reside on the OSDs in that list. The objects here are rados objects, not user objects (a simplified mapping sketch follows this list)
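The following simplified sketch illustrates the object-to-PG mapping. Real Ceph hashes the object name with rjenkins and applies a "stable mod" before CRUSH maps the PG to an OSD list; the plain CRC32-mod below is only a stand-in for the idea:

```python
# Simplified sketch of object -> PG mapping (not Ceph's actual hash).
import zlib

def object_to_pg(object_name: str, pg_num: int) -> int:
    h = zlib.crc32(object_name.encode())  # stand-in for Ceph's rjenkins hash
    return h % pg_num                     # stand-in for ceph_stable_mod

pg_num = 64
for name in ('hello-object', 'img.00000001'):
    print(name, '-> pg', object_to_pg(name, pg_num))
# CRUSH would then map each PG id to an ordered OSD list, e.g. [3, 12, 7].
```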


Pool

A pool is an abstract storage pool; it is a layer of logic on top of PGs.
A pool specifies the data redundancy type and the corresponding replica distribution strategy. Two pool types are currently implemented: replicated and erasure-coded.
Relationship description:
(1) A pool is composed of multiple PGs, while a PG can only belong to one pool
(2) PGs in the same pool are all of the same type; for example, if the pool is a replicated pool, all of its PGs have multiple replicas (a pool-creation sketch follows this list)
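A minimal pool-creation sketch via python-rados; the pool name 'mypool' is hypothetical and sufficient privileges are assumed:

```python
# Minimal sketch: create and inspect a replicated pool.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

if not cluster.pool_exists('mypool'):
    cluster.create_pool('mypool')      # replicated pool with default settings
print(cluster.pool_exists('mypool'))   # True
print(cluster.list_pools())            # e.g. ['rbd', 'mypool', ...]

cluster.shutdown()
```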
PGP (Placement Group for Placement)

There is not much written about PGP; quoting from Learning Ceph:


PGP is Placement Group for Placement purpose, which should be kept equal to the total number of placement groups (pg_num). For a Ceph pool, if you increase the number of placement groups, that is, pg_num, you should also increase pgp_num to the same integer value as pg_num so that the cluster can start rebalancing. The undercover rebalancing mechanism can be understood in the following way. The pg_num value defines the number of placement groups, which are mapped to OSDs. When pg_num is increased for any pool, every PG of this pool splits into half, but they all remain mapped to their parent OSD. Until this time, Ceph does not start rebalancing. Now, when you increase the pgp_num value for the same pool, PGs start to migrate from the parent to some other OSD, and cluster rebalancing starts. In this way, PGP plays an important role in cluster rebalancing.
The basic meaning is:


1. PGP determines the placement of PGs;
2. The value of pgp_num should be kept equal to pg_num: when pg_num is increased, pgp_num should be increased to the same value;
3. When a pool's pg_num is increased, Ceph does not start rebalancing; only after pgp_num is also increased do the PGs start migrating to other OSDs and rebalancing begins (a sketch of the two-step adjustment follows below).
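Here is a hedged sketch of that two-step adjustment; the pool name and values are hypothetical, and "osd pool set" is sent as a monitor command through python-rados:

```python
# Minimal sketch: increase pg_num, then pgp_num, on a pool.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def pool_set(pool, var, val):
    cmd = json.dumps({'prefix': 'osd pool set',
                      'pool': pool, 'var': var, 'val': str(val)})
    ret, outbuf, outs = cluster.mon_command(cmd, b'')
    print(var, '->', val, outs)

pool_set('mypool', 'pg_num', 128)   # PGs split; still on their parent OSDs
pool_set('mypool', 'pgp_num', 128)  # now PGs migrate and rebalancing starts

cluster.shutdown()
```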
File

A file is a concept from the file system.
The Ceph file system (CephFS) is a logical system built on top of a metadata storage pool and a data storage pool. Files in the file system are mapped to rados objects through libcephfs and RADOS, and CRUSH computation then locates them on the storage devices.
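As an illustration of that mapping, CephFS names a file's backing rados objects "<inode in hex>.<stripe index as eight hex digits>". The helper below predicts those names under the default 4 MB object size and default striping; it is an illustrative sketch, not an official API:

```python
# Illustrative sketch: predict the rados object names backing a CephFS file.
OBJECT_SIZE = 4 * 1024 * 1024  # default CephFS object size (4 MB)

def cephfs_object_names(inode: int, file_size: int, object_size=OBJECT_SIZE):
    count = max(1, -(-file_size // object_size))  # ceiling division
    return ['%x.%08x' % (inode, i) for i in range(count)]

# A 10 MB file with inode 0x10000000000 maps to three rados objects:
print(cephfs_object_names(0x10000000000, 10 * 1024 * 1024))
# ['10000000000.00000000', '10000000000.00000001', '10000000000.00000002']
```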
RBD (RADOS Block Device)

RBD is the Ceph block device.
An RBD image is a logical storage system built on top of a storage pool. The RBD image is mapped to rados objects through librbd and RADOS, and CRUSH computation then locates them on the storage devices.
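A minimal sketch of the block path through the python-rbd (librbd) bindings; the pool and image names are hypothetical and the usual cluster/config assumptions apply:

```python
# Minimal sketch: create a 1 GiB RBD image and write to it.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

rbd.RBD().create(ioctx, 'myimage', 1024 ** 3)  # 1 GiB image
with rbd.Image(ioctx, 'myimage') as image:
    image.write(b'hello block device', 0)      # stored as rados objects
    print(image.size())                        # 1073741824

ioctx.close()
cluster.shutdown()
```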
RGW Object

An RGW object generally refers to a document, a picture, a video file, and so on. Although users can upload a whole directory, Ceph does not store RGW objects according to the directory hierarchy: all RGW objects are stored flat.
An RGW object is mapped to rados objects through librados and RADOS, and CRUSH computation then locates them on the storage devices. Users typically access RGW objects through RGW's S3-compatible interface, as in the sketch below.
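A hedged sketch of uploading an RGW object through the S3-compatible API with boto3; the endpoint, bucket, key, and credentials are all placeholders:

```python
# Minimal sketch: upload an RGW object via the S3-compatible API.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',  # hypothetical RGW endpoint
    aws_access_key_id='ACCESS_KEY',              # placeholder credentials
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='photos')
# "2017/08/cat.jpg" is just a flat key; RGW keeps no real directory hierarchy.
s3.put_object(Bucket='photos', Key='2017/08/cat.jpg', Body=b'...jpeg bytes...')
print([o['Key'] for o in s3.list_objects_v2(Bucket='photos')['Contents']])
```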
That wraps up the introduction to the basic concepts ~~~

Author: Younger Liu,

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The link for this article is: http://blog.csdn.net/younger_china/article/details/76794987

