Pool and object applications in RGW

Abstractly, RGW is a RADOS client built on top of a RADOS cluster.

A brief description of objects and pools

Many articles online already introduce the RADOS cluster, so I will not repeat them here. The two concepts that matter for this discussion are the object and the pool. Every object in a RADOS cluster is addressed by a pair (pool, object), where pool is the storage pool and object is the object name. If you are only developing an application on top of RADOS (as RGW itself is), all you need to design is which pool and which object each piece of data goes into.
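To make the (pool, object) addressing concrete, here is a minimal sketch. It is an in-memory stand-in, not librados: the class ToyCluster, its methods, and the pool and object names in the usage note are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a RADOS cluster (not the librados API).
// Every piece of data is addressed by the pair (pool name, object name).
class ToyCluster {
public:
    // Store data under (pool, object), overwriting any previous content.
    void write(const std::string& pool, const std::string& object,
               const std::string& data) {
        objects_[std::make_pair(pool, object)] = data;
    }

    // Copy the stored data into `out`; return false if the pair is unknown.
    bool read(const std::string& pool, const std::string& object,
              std::string& out) const {
        auto it = objects_.find(std::make_pair(pool, object));
        if (it == objects_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> objects_;
};
```

With this model, writing "hello" to ("pool_a", "obj1") makes it readable from exactly that pair, while a read of ("pool_b", "obj1") fails: the same object name in a different pool is a different object.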

Pool and object applications in RGW

RGW uses many pools, and each pool manages a different kind of object data. Two data structures deserve attention here:

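struct RGWZoneParams {
  rgw_bucket domain_root;       // bucket metadata objects
  rgw_bucket control_pool;      // watch-notify objects
  rgw_bucket gc_pool;           // objects waiting for background deletion
  rgw_bucket log_pool;          // oplog, meta_log, data_log
  rgw_bucket intent_log_pool;   // currently unused
  rgw_bucket usage_log_pool;    // metering statistics

  rgw_bucket user_keys_pool;    // access key -> uid
  rgw_bucket user_email_pool;   // email -> uid
  rgw_bucket user_swift_pool;   // swift key -> uid
  rgw_bucket user_uid_pool;     // user information, keyed by uid
};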

struct RGWZonePlacementInfo {
  string index_pool;        // bucket index objects
  string data_pool;         // file data
  string data_extra_pool;   // intermediate multipart upload data
};

RGW stores two kinds of objects in RADOS. The first is the ordinary object, which holds data much like a regular file. The second is the omap object, which stores key-value data. The main RGW pools serve the following purposes:

domain_root pool: each bucket has one ordinary object in this pool that stores the bucket's metadata.

control pool: several ordinary objects are created in this pool for watch-notify (a watch and notify mechanism provided by librados). RGW currently uses this mechanism to implement its distributed cache (to be covered in detail later).
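One way to picture how a watch-notify object can keep per-gateway caches coherent is the sketch below. It is only a toy model under stated assumptions: ToyControlObject and ToyGateway are hypothetical names, the callbacks run synchronously in one process, and nothing here is the librados watch-notify API or RGW's actual cache code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one control object that gateways watch.
class ToyControlObject {
public:
    using Callback = std::function<void(const std::string&)>;

    // Register a watcher; it is called on every notify.
    void watch(Callback cb) { watchers_.push_back(std::move(cb)); }

    // Deliver `key` to every registered watcher.
    void notify(const std::string& key) const {
        for (const auto& cb : watchers_) cb(key);
    }

private:
    std::vector<Callback> watchers_;
};

// Hypothetical gateway with a local cache that is invalidated by notifications.
class ToyGateway {
public:
    explicit ToyGateway(ToyControlObject& control) : control_(control) {
        // On notification, drop the local copy of the changed key.
        control_.watch([this](const std::string& key) { cache_.erase(key); });
    }
    ToyGateway(const ToyGateway&) = delete;
    ToyGateway& operator=(const ToyGateway&) = delete;

    void cache_put(const std::string& key, const std::string& value) {
        cache_[key] = value;
    }
    bool is_cached(const std::string& key) const { return cache_.count(key) != 0; }

    // Change a value: tell every watcher to invalidate, then cache the new value.
    void update(const std::string& key, const std::string& value) {
        control_.notify(key);
        cache_[key] = value;
    }

private:
    ToyControlObject& control_;
    std::map<std::string, std::string> cache_;
};
```

If two gateways both cache the same key and one of them calls update, the other gateway's stale entry is erased and it would have to re-read the value on its next access.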

gc pool: large file data in RGW is generally deleted in the background, and this pool records the file objects that are waiting to be deleted.
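The deferred-deletion idea can be sketched as a pending list that a background pass drains. This is a simplified illustration, not RGW's garbage collector: ToyGc, its methods, and the use of a std::set as the data store are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>

// Hypothetical model of deferred deletion: deletes are recorded first and
// carried out later by a background pass.
class ToyGc {
public:
    // Record an object for later deletion instead of deleting it now.
    void defer_delete(const std::string& object) { pending_.push_back(object); }

    // One background pass: delete at most max_entries pending objects from
    // `store` and return how many entries were processed.
    std::size_t process(std::set<std::string>& store, std::size_t max_entries) {
        std::size_t done = 0;
        while (!pending_.empty() && done < max_entries) {
            store.erase(pending_.front());
            pending_.pop_front();
            ++done;
        }
        return done;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;
};
```

The request that removes a large file only pays for recording the entry; the actual data removal happens later, in bounded batches.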

log pool: stores three types of log: oplog, meta_log, and data_log. The oplog mainly records user operations. The meta_log and data_log are both needed by the remote replication feature, which will be described in detail later.

intent log pool: this pool is not currently used.

usage log pool: stores metering statistics, such as how many times a file was uploaded or downloaded and how many times a bucket was listed.

user keys pool: stores the mapping from a user's access key (AK) to the uid, so that the user ID can be looked up from the AK carried in the user's RESTful request.

user email pool: stores the mapping from a user's email to the uid.

user swift pool: stores the mapping from a swift key to the uid.

user uid pool: stores user information, with each user's unique uid as the object name. Each user also has an object that indexes the buckets owned by that user, and it is stored in this pool as well.
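The four user pools above amount to one primary record keyed by uid plus secondary indexes that map other identifiers back to the uid. The sketch below models that layout with in-memory maps; ToyUserInfo, ToyUserStore, and the example values are hypothetical and do not reflect RGW's on-disk encoding.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical user record; real RGW user info carries many more fields.
struct ToyUserInfo {
    std::string uid;
    std::string email;
    std::string access_key;
};

// Hypothetical model: one primary map keyed by uid (the "uid pool") and
// secondary maps from access key and email back to the uid.
class ToyUserStore {
public:
    void put(const ToyUserInfo& user) {
        info_by_uid_[user.uid] = user;
        uid_by_access_key_[user.access_key] = user.uid;
        uid_by_email_[user.email] = user.uid;
    }

    const ToyUserInfo* find_by_uid(const std::string& uid) const {
        auto it = info_by_uid_.find(uid);
        return it == info_by_uid_.end() ? nullptr : &it->second;
    }

    // Two-step lookup: secondary index gives the uid, uid gives the record.
    const ToyUserInfo* find_by_access_key(const std::string& access_key) const {
        return resolve(uid_by_access_key_, access_key);
    }
    const ToyUserInfo* find_by_email(const std::string& email) const {
        return resolve(uid_by_email_, email);
    }

private:
    using Index = std::map<std::string, std::string>;

    const ToyUserInfo* resolve(const Index& index, const std::string& key) const {
        auto it = index.find(key);
        return it == index.end() ? nullptr : find_by_uid(it->second);
    }

    std::map<std::string, ToyUserInfo> info_by_uid_;
    Index uid_by_access_key_;
    Index uid_by_email_;
};
```

Authenticating a request then starts from the access key alone: find_by_access_key resolves it to a uid and returns the full user record, or a null pointer when the key is unknown.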

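index pool: stores the file index objects of buckets. Each bucket has one index object in this pool, which indexes all the files in that bucket. The bilog used in remote replication is also stored on the bucket index object in this pool.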
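A bucket index is a natural fit for the key-value (omap) kind of object described earlier, because sorted keys make prefix and paged listing cheap. The sketch below models one index object with a sorted map; ToyBucketIndex, ToyIndexEntry, and the list parameters are hypothetical names chosen for illustration, not RGW's real index format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-file index entry; a real entry holds far more metadata.
struct ToyIndexEntry {
    std::size_t size;
};

// Hypothetical model of one bucket index object: file name -> entry,
// kept sorted by name like keys in a key-value object.
class ToyBucketIndex {
public:
    void add(const std::string& name, std::size_t size) {
        entries_[name] = ToyIndexEntry{size};
    }
    void remove(const std::string& name) { entries_.erase(name); }

    // Return up to max_keys names that start with `prefix` and sort strictly
    // after `marker` (pass an empty marker to start from the beginning).
    std::vector<std::string> list(const std::string& prefix,
                                  const std::string& marker,
                                  std::size_t max_keys) const {
        std::vector<std::string> out;
        auto it = (marker < prefix) ? entries_.lower_bound(prefix)
                                    : entries_.upper_bound(marker);
        for (; it != entries_.end() && out.size() < max_keys; ++it) {
            // Keys are sorted, so the first non-matching key ends the range.
            if (it->first.compare(0, prefix.size(), prefix) != 0) break;
            out.push_back(it->first);
        }
        return out;
    }

private:
    std::map<std::string, ToyIndexEntry> entries_;
};
```

Listing a bucket page by page then means calling list repeatedly, passing the last name of the previous page as the marker.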

data pool: as the name implies, all file data is stored in this pool.

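data extra pool: some intermediate-state data produced during a multipart upload is stored in this pool. This data helps users resume interrupted uploads and helps reclaim garbage data.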

Two other pools are also important:

zone root pool: stores the zone's metadata, which is in fact the RGWZoneParams data structure.

region root pool: stores the region's metadata.

 
