Ceph's RADOS design principles and implementation, Chapter 4: OSD, the cornerstone of storage

An OSD is essentially a process that runs on top of the operating system. It consumes resources such as CPU, memory, and network bandwidth, implements object storage, and is compatible with various types of file systems.

The OSDs monitor each other over the cluster network and report failures to the Monitor promptly. After the Monitor modifies the OSDMap, the OSDs propagate the latest OSDMap to each other point-to-point.

1. Cluster management

The OSD needs to communicate with the Monitor regularly: it reports its own status, fetches the latest OSDMap, and reports its capacity usage, keys, and so on. The OSD process therefore embeds a Monitor client component that it uses to communicate with the Monitor.

2. Network communication

The network communication component, Messenger, serves both the public network and the cluster network.

3. OSD power on

The boot data of the ObjectStore is stored on disk. It is read first and verified, and then the ObjectStore superblock is loaded into memory; that is, the ObjectStore is mounted. Once all checks pass (the OSD has sufficient permissions, matches the cluster UUID, has a correct version number, and so on), the OSD needs to synchronize its OSDMap with the Monitor. Because the OSDMap keeps being updated while the OSD is powered off, the OSD's OSDMap version after power on may lag the Monitor's latest version by some number of versions. The OSD therefore asks the Monitor to change its status in the OSDMap to Up (because the OSD has been powered on), and the Monitor in turn sends the OSD the OSDMap increments (up to 40) that it missed while powered off, so that the OSD can update its local OSDMap.
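The catch-up step described above can be sketched roughly as follows. This is a toy model, not Ceph code: `OSDMap`, `ToyMonitor`, and `boot_osd` are hypothetical names, an increment is reduced to a dictionary of Up/Down changes, and the limit of 40 is assumed to apply per batch, so an OSD that is further behind simply asks again.

```python
from dataclasses import dataclass, field

MAX_INCREMENTALS_PER_BATCH = 40  # the "up to 40" limit from the text (assumed per batch)


@dataclass
class OSDMap:
    epoch: int = 0
    osd_up: dict = field(default_factory=dict)  # osd id -> True (Up) / False (Down)

    def apply(self, inc):
        # An increment only applies on top of the immediately preceding version.
        if inc["epoch"] != self.epoch + 1:
            raise ValueError("incremental out of order")
        self.osd_up.update(inc["up_changes"])
        self.epoch = inc["epoch"]


class ToyMonitor:
    """Hypothetical stand-in for the Monitor: keeps every increment in a list."""

    def __init__(self):
        self.incrementals = []  # list index i holds the increment for epoch i + 1

    def commit(self, up_changes):
        epoch = len(self.incrementals) + 1
        self.incrementals.append({"epoch": epoch, "up_changes": dict(up_changes)})
        return epoch

    def incrementals_since(self, epoch):
        # Return at most one batch of increments newer than `epoch`.
        return self.incrementals[epoch:epoch + MAX_INCREMENTALS_PER_BATCH]


def boot_osd(osd_id, local_map, monitor):
    """Mark the OSD Up, then pull missing increments until the local map is current."""
    monitor.commit({osd_id: True})  # marking the OSD Up creates a new map version
    rounds = 0
    while True:
        batch = monitor.incrementals_since(local_map.epoch)
        if not batch:
            break
        for inc in batch:
            local_map.apply(inc)
        rounds += 1
    return rounds
```

For example, an OSD that went down at version 10 and comes back when the Monitor is at version 100 needs 91 increments in this model (including the one that marks it Up), which arrive in three batches.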

4. OSD fault detection

Four states: Up, Down, In, Out.
Three detection methods: autonomous reporting, heartbeat detection, and a watchdog (the OSD regularly sends keep-alive messages to the Monitor).
After an OSD is detected as Down, it is marked Out after 600 s, and the affected PGs begin to migrate.
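The Down-to-Out delay can be illustrated with a small state tracker. This is only a sketch under simple assumptions: `OSDState` and its methods are hypothetical names, time is passed in as plain seconds, and the three detection methods are collapsed into a single `mark_down` call.

```python
DOWN_OUT_INTERVAL = 600  # seconds an OSD may stay Down before it is marked Out (from the text)


class OSDState:
    """Tracks the Up/Down and In/Out states of one OSD (toy model)."""

    def __init__(self):
        self.up = True
        self.inside = True      # "In": the OSD still counts for data placement
        self.down_since = None  # time at which the OSD was detected as Down

    def mark_down(self, now):
        # Called when any detection method reports the OSD as failed.
        if self.up:
            self.up = False
            self.down_since = now

    def tick(self, now):
        """Periodic check; returns True when the OSD is newly marked Out."""
        if (not self.up and self.inside
                and now - self.down_since >= DOWN_OUT_INTERVAL):
            self.inside = False  # from here on, affected PGs would start migrating
            return True
        return False

    def label(self):
        return ("Up" if self.up else "Down") + "/" + ("In" if self.inside else "Out")
```

An OSD marked Down at t = 100 stays Down/In until t = 700 and only then becomes Down/Out, which gives a briefly failed OSD a chance to return before any data is moved.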

5. OSD space statistics

Four levels: NearFull, BackfillFull (prevents PG migration from writing to the OSD), Full (prevents writes to the OSD), and FailsafeFull (a final barrier that blocks writes so that the OSD does not fill up completely when the Full mark is delayed).
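A minimal sketch of how the four levels might be derived from an OSD's usage ratio. The threshold values below are assumptions chosen for illustration (the text does not give them), and `fullness_level`, `accepts_backfill`, and `accepts_client_writes` are hypothetical helper names.

```python
# Assumed thresholds, as fractions of OSD capacity (illustrative only).
NEARFULL_RATIO = 0.85
BACKFILL_FULL_RATIO = 0.90
FULL_RATIO = 0.95
FAILSAFE_FULL_RATIO = 0.97

LEVELS = ["ok", "nearfull", "backfill_full", "full", "failsafe_full"]


def fullness_level(used_bytes, total_bytes):
    """Return the highest level whose threshold the usage ratio has reached."""
    ratio = used_bytes / total_bytes
    if ratio >= FAILSAFE_FULL_RATIO:
        return "failsafe_full"
    if ratio >= FULL_RATIO:
        return "full"
    if ratio >= BACKFILL_FULL_RATIO:
        return "backfill_full"
    if ratio >= NEARFULL_RATIO:
        return "nearfull"
    return "ok"


def accepts_backfill(level):
    # PG migration may write to the OSD only below the backfill-full level.
    return LEVELS.index(level) < LEVELS.index("backfill_full")


def accepts_client_writes(level):
    # Ordinary writes are refused once the OSD is full or failsafe-full.
    return LEVELS.index(level) < LEVELS.index("full")
```

With these assumed thresholds, an OSD at 91% usage still takes ordinary writes but no longer accepts migrated PG data, while one at 96% refuses both.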

Total storage pool space = storage pool used space + storage pool maximum available space

The maximum available space of the storage pool is calculated as the minimum, over all OSDs in the storage pool, of: [OSD capacity - reserved space (five percent)] / the proportion of the OSD in the total capacity of the storage pool / the number of storage pool copies,
where OSD capacity / the proportion of the OSD = the sum of the capacities of all OSDs in the storage pool.

The formula for calculating the used space of the storage pool is: the sum of the used space of all OSDs in the storage pool / the number of copies

In fact, the formula above for the maximum available space of the storage pool assumes that data is evenly distributed. Because it takes a min, if two disks of the same capacity are used as two OSDs and the written data is not evenly distributed between them, the min is always determined by the OSD with the most occupied space, so the calculated maximum available space comes out too small.
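The effect of uneven distribution can be checked with a small worked example. This is a sketch under one explicit assumption: for used space to influence the result, as the remark on imbalance implies, the per-OSD term is read as the OSD's remaining space (capacity minus used minus the five percent reserve). The function names and the sample numbers are hypothetical.

```python
RESERVED_FRACTION = 0.05  # the five percent reserve from the text


def pool_max_available(osds, replicas):
    """osds: list of (capacity, used) pairs in any consistent unit."""
    total_capacity = sum(capacity for capacity, _ in osds)
    estimates = []
    for capacity, used in osds:
        share = capacity / total_capacity  # proportion of this OSD in the pool
        # Assumption: remaining space, not raw capacity, enters the formula.
        usable = capacity - used - RESERVED_FRACTION * capacity
        estimates.append(max(usable, 0) / share / replicas)
    return min(estimates)  # the fullest OSD decides the result


def pool_used(osds, replicas):
    return sum(used for _, used in osds) / replicas


# Two OSDs of equal capacity, two copies, 600 units written in both cases.
balanced = [(1000, 300), (1000, 300)]
skewed = [(1000, 100), (1000, 500)]

print(pool_used(balanced, 2), pool_max_available(balanced, 2))
print(pool_used(skewed, 2), pool_max_available(skewed, 2))
```

Both layouts report the same used space, but the skewed layout yields a smaller maximum available space because the OSD holding 500 units alone determines the min.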

Origin blog.csdn.net/mxy990811/article/details/135368901