Introduction to Ceph Concepts and Components

One: Basic introduction to Ceph

  • Ceph is a reliable, self-rebalancing, self-healing distributed storage system. Depending on the usage scenario, Ceph can be divided into three kinds of service: object storage, block storage, and file system storage.

  • Compared with other storage systems, Ceph's advantage is that it does not just store data; it also uses the compute power of the storage nodes. Whenever a piece of data is stored, its location is obtained by calculation, so the data distribution is as balanced as possible. Thanks to Ceph's design, which uses the CRUSH algorithm, hash-ring techniques, and similar methods, it avoids the traditional single point of failure, and performance does not suffer as the cluster scales out.

Two: Introduction to the core components

  • Ceph OSD (must be installed)

    OSD stands for Object Storage Device. Its main functions include storing data, handling data replication, recovery, and backfilling, rebalancing data distribution, and providing monitoring information to the Ceph Monitors.


  • Ceph Monitor (must be installed)
    The Ceph Monitor's main function is to maintain the health of the entire cluster and provide consistent decision-making. It maintains the various cluster maps, including the monitor map; the monitors themselves do not store any user data.

  • Managers (must be installed)
    The Ceph Manager daemon (ceph-mgr) is responsible for tracking runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Manager daemons also host Python-based modules that manage and expose Ceph cluster information, including the web-based Ceph Manager Dashboard and a REST API. At least two managers are normally required for high availability.

  • Ceph MDS (optional)
    MDS stands for Ceph Metadata Server. It stores the metadata of the Ceph file system (CephFS). It only needs to be installed when CephFS is used (basic status-check commands for all of these daemons are shown after this list).
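
As a quick sketch (assuming a deployed cluster and a node holding the admin keyring, such as the ceph-1 host used later in this article), the state of each of these daemon types can be checked with the standard ceph CLI:

[root@ceph-1 ~]# ceph -s          # overall cluster health plus a mon/mgr/osd summary
[root@ceph-1 ~]# ceph osd stat    # how many OSDs exist and how many are up/in
[root@ceph-1 ~]# ceph mon stat    # monitor quorum membership
[root@ceph-1 ~]# ceph mgr stat    # active and standby managers
[root@ceph-1 ~]# ceph mds stat    # MDS state (only meaningful when CephFS is deployed)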

Three: Introduction to the underlying components

  • RADOS
    RADOS (Reliable Autonomic Distributed Object Store) is itself a complete distributed object storage system with reliability, intelligence, self-management, and other characteristics. Ceph's high reliability, high scalability, high performance, and high automation are all provided by this layer, and user data is ultimately stored through it. RADOS can be said to be the core of Ceph; it mainly consists of two parts, the OSDs and the Monitors.

  • librados
    librados is a library that allows applications to access and interact with RADOS. It supports multiple programming languages, such as C, C++, and Python.

  • RADOSGW
    RADOSGW is a gateway based on the popular RESTful protocol and is compatible with the S3 and Swift APIs. It is only needed when object storage is used.

  • RBD
    RBD provides a distributed block device to clients through the Linux kernel client and through QEMU/KVM drivers. It can be understood like an LVM logical volume: a disk carved out of the Ceph cluster, on which the user can create a file system and mount it directly onto a directory.

  • CephFS
    CephFS provides a POSIX-compliant file system through the Linux kernel client and FUSE. ceph-fuse is used when the system's mount command does not support CephFS or when more advanced operations are required (basic usage sketches for these interfaces follow after this list).
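
A minimal usage sketch for these interfaces, assuming a pool named testpool already exists and the client node holds the admin keyring; the image name rbd1, the RGW user, and the mount points are only illustrative:

# RADOS: store and list objects in a pool directly
[root@ceph-1 ~]# rados -p testpool put object1 /etc/hosts
[root@ceph-1 ~]# rados -p testpool ls

# RBD: create a 1 GiB image, map it through the kernel client, then format and mount it
[root@ceph-1 ~]# rbd create testpool/rbd1 --size 1024
[root@ceph-1 ~]# rbd map testpool/rbd1            # prints a device name such as /dev/rbd0
[root@ceph-1 ~]# mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt/rbd1

# RGW (object storage): create an S3-style user once a radosgw instance is running
[root@ceph-1 ~]# radosgw-admin user create --uid=user1 --display-name="user1"

# CephFS: mount the file system through the FUSE client
[root@ceph-1 ~]# ceph-fuse /mnt/cephfs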

Four: Glossary of terms

crush
CRUSH is the algorithm Ceph uses to distribute data. It is similar in spirit to consistent hashing, and it is what places each piece of data on its intended location.
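
To see what CRUSH works with, the compiled CRUSH map can be exported from the cluster and decompiled into readable text; a small sketch (the /tmp paths are arbitrary):

[root@ceph-1 ~]# ceph osd getcrushmap -o /tmp/crushmap.bin             # export the binary CRUSH map
[root@ceph-1 ~]# crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt   # decompile it to plain text
[root@ceph-1 ~]# ceph osd tree                                         # view the CRUSH hierarchy (hosts, OSDs, weights)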


map
As mentioned above, the monitor component is responsible for monitoring the health of the entire cluster, such as the state of the nodes and the cluster configuration information, which is maintained by the cluster's monitor daemons. How is this information stored? The answer is maps. The maps a Ceph monitor maintains include the following (the corresponding viewing commands are collected after this list):

  • Monitor map: contains information about the monitor nodes, including the Ceph cluster ID and the monitor host names, IPs, and ports. It also stores the current epoch and information about the latest changes. It can be viewed with "ceph mon dump".
  • OSD map: contains some common information such as the cluster ID, the epochs at which the OSD map was created and last modified, and pool-related information including pool names, pool IDs, types, replica counts, and PG/PGP numbers. It also contains the number of OSDs, their status, weights, last scrub intervals, and OSD host information. It can be viewed with "ceph osd dump".
  • PG map: contains the current PG map epoch and timestamp, the latest OSD map epoch, the full and near-full ratios, and, for each PG, its ID, object count, state, up/acting OSD sets, and scrub details. It can be viewed with "ceph pg dump".
  • CRUSH map: contains the cluster's storage device information, the failure-domain hierarchy, and the rules that define how data is stored across failure domains. It can be viewed with "ceph osd crush dump".
  • MDS map: contains the current MDS map epoch, the time the map was created and last modified, the data and metadata pool IDs, the number of MDS daemons in the cluster, and the MDS states. It can be viewed with "ceph mds dump" (replaced by "ceph fs dump" on newer releases).
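
The viewing commands mentioned above, collected in one place as a quick sketch (output omitted):

[root@ceph-1 ~]# ceph mon dump         # monitor map
[root@ceph-1 ~]# ceph osd dump         # OSD map, including per-pool settings
[root@ceph-1 ~]# ceph pg dump          # PG map (large output)
[root@ceph-1 ~]# ceph osd crush dump   # CRUSH map in JSON form
[root@ceph-1 ~]# ceph mds dump         # MDS map ("ceph fs dump" on newer versions)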

replica
A replica is a copy of data stored by Ceph; it can be understood as the number of backup copies kept of a piece of data. Ceph's default replica count is 3: one primary, one secondary, and one tertiary. Only the primary OSD handles client requests; it then writes the data to the other OSDs. (The replica count is set per pool; the relevant commands are shown after the field explanations below.)
In the example below you can see that a pool called testpool contains an object called object1. After fetching its map information you can see that
the object's primary copy is on osd.1, with the other copies on osd.0 and osd.2.

[root@ceph-1 ~]# ceph osd map testpool object1
osdmap e220 pool 'testpool' (38) object 'object1' -> pg 38.bac5debc (38.0) -> up ([1,0,2], p1) acting ([1,0,2], p1)

Explanation of the other fields

  • osdmap e220: the epoch (version) of the OSD map
  • pool 'testpool' (38): the pool name and pool ID
  • object 'object1': the object's name
  • pg 38.bac5debc (38.0): the placement group the object hashes into, i.e. PG 38.0
  • up ([1,0,2], p1): the up set; the order indicates which OSDs hold the copies: osd.1 (primary), osd.0 (secondary), osd.2 (tertiary)
  • acting ([1,0,2], p1): the acting set, which is normally the same as the up set. When they differ, a temporary PG mapping (pg_temp) is usually in effect. For example, suppose the acting set of a PG is [0,1,2] and osd.0 fails; CRUSH recalculates and maps the PG to [3,1,2]. osd.3 is now the primary OSD for the PG, but it cannot yet serve reads because it holds no data for this PG. A temporary PG mapping is therefore requested from the monitors with osd.1 as the temporary primary: the up set is [3,1,2] while the acting set becomes [1,3,2], so the two sets differ. Once osd.3 finishes backfilling, the PG recovers and the acting set reverts to the up set, i.e. both are [3,1,2].
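
The replica count is a per-pool setting; as a sketch using the testpool pool from the example above, it can be inspected and changed with:

[root@ceph-1 ~]# ceph osd pool get testpool size       # current replica count (default 3)
[root@ceph-1 ~]# ceph osd pool get testpool min_size   # minimum copies required to keep serving I/O
[root@ceph-1 ~]# ceph osd pool set testpool size 3     # change the replica count if needed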

object

An object is the lowest-level storage unit in Ceph. Each object contains metadata and the original data. When a user stores data in a Ceph cluster, the data is split into multiple objects; the size of each object is configurable and defaults to 4 MB. An object can be regarded as Ceph's smallest unit of storage.
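
For instance, the default 4 MB object size can be observed on an RBD image (reusing the hypothetical testpool/rbd1 image from the earlier sketch), since the image's data is striped across RADOS objects of that size:

[root@ceph-1 ~]# rbd info testpool/rbd1        # the object size is reported in the image info (4 MiB by default)
[root@ceph-1 ~]# rados -p testpool ls | head   # the underlying rbd_data.* objects that hold the image's data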


pg and pgp

A PG (placement group) is used to store objects.
PGP is, roughly, the number of permutations of OSDs onto which the PGs can be placed; it does not affect the number of replicas, only the placement order of the replicas.


pool
A pool is a logical storage concept; when we create a pool we need to specify its pg and pgp numbers.
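
A minimal sketch of creating a pool with explicit pg_num and pgp_num (64 is just an example value; recent releases can also autoscale these):

[root@ceph-1 ~]# ceph osd pool create testpool 64 64   # pool name, pg_num, pgp_num
[root@ceph-1 ~]# ceph osd pool get testpool pg_num
[root@ceph-1 ~]# ceph osd pool get testpool pgp_num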

Five: Analysis of easily confused points

Relationship between pg and object
Because the number of objects is very large, Ceph introduces the concept of PGs to manage objects. Each object is ultimately mapped, via a CRUSH calculation, into one PG, and a PG can contain multiple objects.
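
The mapping can be checked from both directions; a sketch using the testpool/object1 names from the earlier example:

[root@ceph-1 ~]# ceph osd map testpool object1   # which PG (and which OSDs) a given object maps to
[root@ceph-1 ~]# ceph pg ls-by-pool testpool     # all PGs of the pool, with their object counts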


Relationship between pg and osd
A PG also needs a CRUSH calculation to be mapped onto OSDs for storage. With three replicas, each PG is mapped to three OSDs, for example [osd.0, osd.1, osd.2]: osd.0 stores the primary copy of the PG, while osd.1 and osd.2 store the replica copies, ensuring data redundancy.
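
The OSDs behind a single PG can be listed directly; a sketch using the PG ID 38.0 seen in the earlier example:

[root@ceph-1 ~]# ceph pg map 38.0        # the up/acting OSD set of this PG
[root@ceph-1 ~]# ceph pg ls-by-osd 0     # all PGs that have a copy on osd.0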


Relationship between pg and pool
A pool is a logical storage concept; when we create a storage pool we need to specify its pg and pgp numbers. Logically, a PG belongs to a storage pool, somewhat like an object belongs to a PG.


Relationship between pg and pgp

  • A PG is used to store objects; pgp corresponds to the number of permutations of OSDs on which PGs can be placed. For example, suppose there are three OSDs (osd0, osd1, osd2) and the replica count is 3 (Ceph's default). If pgp is 1, an object can only be placed in the single order osd0, osd1, osd2. If pgp is 2, it may be placed in one of two orders, for example osd0, osd1, osd2 or osd1, osd0, osd2. If pgp is 3, there are three possible orders. So pgp does not actually affect the number of replicas of a PG; it only affects the number of possible OSD placement permutations for the PGs. We can also understand the role of pgp as balancing the data across the OSDs in the cluster (see the command sketch after this list).
  • pg_num is the number of "directories" a storage pool uses to hold objects; pgp_num is the number of OSD placement combinations for the pool's PGs.
  • Increasing pg_num causes the data in existing PGs to be split into the newly generated PGs (which initially remain on the same OSDs).
  • Increasing pgp_num causes the distribution of some PGs to change, but it does not cause objects to move between PGs.
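
As a sketch of the difference in practice (128 is an arbitrary target; note that recent releases may manage pg_num automatically via the autoscaler):

[root@ceph-1 ~]# ceph osd pool set testpool pg_num 128    # existing PGs are split into new PGs
[root@ceph-1 ~]# ceph osd pool set testpool pgp_num 128   # the new PGs are then actually rebalanced onto OSDs
[root@ceph-1 ~]# ceph -s                                  # watch the resulting peering/backfill activity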

Relationship between data storage, objects, pg, pgp, pool, osd, and storage disks

  • Suppose a 12 MB document is split into three objects, objectA, objectB, and objectC, which are stored in three PGs, pgA, pgB, and pgC; these PGs in turn are managed by three pools, poolA, poolB, and poolC. Which OSD each PG ends up on is a matter of choice, and how many choices there are is decided by pgp. If pgp is set to 1, there is only one possible arrangement of PGs across OSDs (the one shown in the original diagram). If pgp is 2, then besides the arrangement shown in the diagram there is exactly one other possible arrangement, for example pgA on osd1, pgB on osd3, and pgC on osd2; other arrangements are conceivable, but with pgp set to 2 only two choices are available. An end-to-end command sketch follows below.
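
Putting the pieces together as an end-to-end sketch (file path, pool, and object names are arbitrary). Note that "rados put" stores the whole file as a single object; the automatic splitting into 4 MB objects described earlier is done by the higher-level interfaces (RBD, CephFS, RGW), but the object-to-PG-to-OSD mapping shown here is the same either way:

[root@ceph-1 ~]# dd if=/dev/zero of=/tmp/12M.file bs=1M count=12   # a 12 MB test file
[root@ceph-1 ~]# rados -p testpool put objectA /tmp/12M.file       # store it as an object in the pool
[root@ceph-1 ~]# ceph osd map testpool objectA                     # object -> PG -> OSD set chosen by CRUSH
[root@ceph-1 ~]# ceph osd tree                                     # which hosts and disks those OSDs live on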

Origin blog.51cto.com/11093860/2454814