Ceph study notes: source code analysis of the librados and Osdc implementation

Librados
RadosClient class
IoCtxImpl
AioCompletionImpl
OSDC
ObjectOperation: encapsulating operations
op_target: encapsulating PG information
Op: encapsulating operation information
Striper: striping
This article describes how some of the Ceph client-side modules are implemented. The client mainly implements interfaces and exposes access functions, so that upper layers can access Ceph storage through them. Librados and Osdc sit at the bottom of the Ceph client. Librados provides basic interfaces such as creating and deleting pools and creating and deleting objects. Osdc encapsulates operations, calculates object addresses, sends requests, and handles timeouts.

[Figure: LIBRADOS architecture diagram]

Following the LIBRADOS architecture diagram, here is the general event flow. The book "Ceph Distributed Storage in Practice" (Ceph分布式存储实战) has the following passage:
First, LIBRADOS is called to create a RADOS handle according to the configuration file, and then a RadosClient is created for that RADOS handle. The RadosClient contains three main modules (Finisher, Messenger, Objecter). An ioctx is then created for the given pool, and the RadosClient can be reached from the ioctx. Calls into OSDC generate the corresponding OSD requests, which are sent to the OSDs, and the replies are then handled. This roughly describes the roles of librados and osdc within Ceph as a whole.

Librados
The Librados module contains two parts: the RadosClient module and IoCtxImpl. RadosClient is at the top level and is the core management class of librados; it handles management at the level of the whole RADOS system and at the pool level. IoCtxImpl manages a single pool, for example controlling reads and writes of objects.

RadosClient (Librados module)
IoCtxImpl (Librados module)
Objecter (Osdc module)
RadosClient class
First, look at the header file RadosClient.h:

class librados::RadosClient : public Dispatcher // Inherits from Dispatcher (the message dispatch class)
{
  std::unique_ptr<CephContext,
          std::function<void(CephContext*)>> cct_deleter; // unique_ptr smart pointer

public:
  using Dispatcher::cct;
  md_config_t *conf; // Configuration
private:
  enum {
    DISCONNECTED,
    CONNECTING,
    CONNECTED,
  } state; // Network connection status

  MonClient monclient;
  MgrClient mgrclient;
  Messenger *messenger; // Key member: network message interface

  uint64_t instance_id;

  // Message dispatch functions overridden from the Dispatcher class
  bool _dispatch(Message *m);
  ...
  ...
  bool ms_handle_refused(Connection *con) override;

  Objecter *objecter; // Key member: the Osdc module, used to send the encapsulated Op messages

  Mutex lock; // Mutex
  Cond cond;
  SafeTimer timer; // Timer
  int refcnt;
  ...
  ...

public:
  Finisher finisher; // Key member: the class that executes callback functions
  ...
  ...
  // Create the context information for a pool
  int create_ioctx(const char *name, IoCtxImpl **io);
  int create_ioctx(int64_t, IoCtxImpl **io);

  int get_fsid(std::string *s);
  ... // Pool-related operations
  ...
  bool get_pool_is_selfmanaged_snaps_mode(const std::string& pool);
  // Synchronous and asynchronous pool creation
  int pool_create(string& name, unsigned long long auid = 0, int16_t crush_rule = -1);
  int pool_create_async(string& name, PoolAsyncCompletionImpl *c, unsigned long long auid = 0,
            int16_t crush_rule = -1);
  int pool_get_base_tier(int64_t pool_id, int64_t *base_tier);
  // Synchronous delete and asynchronous delete
  int pool_delete (const char * name);

  int pool_delete_async(const char *name, PoolAsyncCompletionImpl *c);

  int blacklist_add(const string& client_address, uint32_t expire_seconds);
  // Monitor-related command handling: calls monclient.start_mon_command to send the command to the Monitor for processing
  int mon_command(const vector<string>& cmd, const bufferlist &inbl,
              bufferlist *outbl, string *outs);
  void mon_command_async(const vector<string>& cmd, const bufferlist &inbl,
                         bufferlist *outbl, string *outs, Context *on_finish);
  int mon_command(int rank,
          const vector<string>& cmd, const bufferlist &inbl,
              bufferlist *outbl, string *outs);
  int mon_command(string name,
          const vector<string>& cmd, const bufferlist &inbl,
              bufferlist *outbl, string *outs);
  int mgr_command(const vector<string>& cmd, const bufferlist &inbl,
              bufferlist *outbl, string *outs);
  // OSD-related command handling: calls objecter->osd_command to send the command to the OSD for processing
  int osd_command(int osd, vector<string>& cmd, const bufferlist& inbl,
                  bufferlist *poutbl, string *prs);
  // PG-related command handling: calls objecter->pg_command to send the command to the OSD for processing
  int pg_command(pg_t pgid, vector<string>& cmd, const bufferlist& inbl,
             bufferlist *poutbl, string *prs);

             ...
             ...
             ...
};

Let's take a look at some of these functions.

connect() is the initialization function of RadosClient.

int librados::RadosClient::connect()
{
  common_init_finish(cct);

  int err;
  ...
  ...
  // get monmap
  err = monclient.build_initial_monmap(); // Read the initial Monitor information from the configuration file
  if (err < 0)
    goto out;

  err = -ENOMEM;
  messenger = Messenger::create_client_messenger(cct, "radosclient"); // Create the communication module
  if (!messenger)
    goto out;
  // Set the Policy-related information
  messenger->set_default_policy(Messenger::Policy::lossy_client(CEPH_FEATURE_OSDREPLYMUX));

  ldout(cct, 1) << "starting msgr at " << messenger->get_myaddr() << dendl;

  ldout(cct, 1) << "starting objecter" << dendl;
  // Create and initialize the objecter
  objecter = new (std::nothrow) Objecter(cct, messenger, &monclient,
              &finisher,
              cct->_conf->rados_mon_op_timeout,
              cct->_conf->rados_osd_op_timeout);
  if (!objecter)
    goto out;
  objecter->set_balanced_budget();

  monclient.set_messenger(messenger);
  mgrclient.set_messenger(messenger);

  objecter->init();
  messenger->add_dispatcher_head(&mgrclient);
  messenger->add_dispatcher_tail(objecter);
  messenger->add_dispatcher_tail(this);

  messenger->start();

  ldout(cct, 1) << "setting wanted keys" << dendl;
  monclient.set_want_keys(
      CEPH_ENTITY_TYPE_MON | CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MGR);
  ldout(cct, 1) << "calling monclient init" << dendl;
  // Initialize monclient
  err = monclient.init();
  ...
  err = monclient.authenticate(conf->client_mount_timeout);
  ...

  objecter->set_client_incarnation(0);
  objecter->start();
  lock.Lock();
  // Initialize the timer
  timer.init();
  // Initialize the finisher object
  finisher.start();

  state = CONNECTED;
  instance_id = monclient.get_global_id();

  lock.Unlock();

  ldout(cct, 1) << "init done" << dendl;
  err = 0;

 out:
 ...
 ...
  return err;
}
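
The order in which connect() registers dispatchers matters: add_dispatcher_head puts a dispatcher at the front of the messenger's list and add_dispatcher_tail appends it, and an incoming message is offered to each dispatcher in turn until one of them reports that it handled it. The following is only a minimal sketch of that idea. ToyMessenger, ToyDispatcher, make_toy_messenger and the integer "message types" are hypothetical names made up for illustration; they are not the Ceph classes.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for a Dispatcher: a name plus a handler that
// returns true when it consumed the message.
struct ToyDispatcher {
  std::string name;
  std::function<bool(int)> handle;  // the int is a toy "message type"
};

// Hypothetical stand-in for the Messenger's dispatcher list.
class ToyMessenger {
  std::list<ToyDispatcher> dispatchers_;

public:
  void add_dispatcher_head(ToyDispatcher d) { dispatchers_.push_front(std::move(d)); }
  void add_dispatcher_tail(ToyDispatcher d) { dispatchers_.push_back(std::move(d)); }

  // Offer the message to each dispatcher in order; return the name of the
  // first one that handles it, or an empty string if none does.
  std::string deliver(int msg_type) const {
    for (const auto& d : dispatchers_)
      if (d.handle(msg_type)) return d.name;
    return "";
  }
};

// Register three toy dispatchers in the same order as connect() does:
// mgrclient at the head, then objecter and the client itself at the tail.
inline ToyMessenger make_toy_messenger() {
  ToyMessenger m;
  m.add_dispatcher_head({"mgrclient", [](int t) { return t == 1; }});
  m.add_dispatcher_tail({"objecter", [](int t) { return t == 2; }});
  m.add_dispatcher_tail({"radosclient", [](int t) { return t == 2 || t == 3; }});
  return m;
}
```

In this toy setup, message type 2 is accepted by both "objecter" and "radosclient", but it is delivered to "objecter" because that one comes first in the list.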

create_ioctx() creates an IoCtxImpl object, which holds the context information for a pool.

int librados::RadosClient::create_ioctx(const char *name, IoCtxImpl **io)
{
  int64_t poolid = lookup_pool(name);
  if (poolid < 0) {
    return (int)poolid;
  }

  *io = new librados::IoCtxImpl(this, objecter, poolid, CEPH_NOSNAP);
  return 0;
}
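
The function above follows a common convention: the lookup returns either a non-negative pool id or a negative error code, and the caller passes the error straight through. Here is a small sketch of that convention. ToyClient and ToyIoCtx are hypothetical names, and returning -ENOENT for an unknown pool is an assumption made for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for IoCtxImpl: only remembers its pool id.
struct ToyIoCtx {
  int64_t pool_id;
};

// Hypothetical stand-in for RadosClient with a fixed name -> id table.
class ToyClient {
  std::map<std::string, int64_t> pools_;

public:
  explicit ToyClient(std::map<std::string, int64_t> pools)
      : pools_(std::move(pools)) {}

  // Return the pool id, or a negative errno value if the name is unknown
  // (-ENOENT is an assumption of this sketch).
  int64_t lookup_pool(const std::string& name) const {
    auto it = pools_.find(name);
    if (it == pools_.end()) return -ENOENT;
    return it->second;
  }

  // Same shape as create_ioctx(): propagate the error, otherwise hand back
  // a context bound to the pool.
  int create_ioctx(const std::string& name, std::unique_ptr<ToyIoCtx>* io) const {
    int64_t poolid = lookup_pool(name);
    if (poolid < 0) return static_cast<int>(poolid);
    io->reset(new ToyIoCtx{poolid});
    return 0;
  }
};
```

A caller only has to check the return value: 0 means the context was created, and a negative value means the output pointer was left untouched.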

mon_command() handles Monitor-related commands. The asynchronous variant, mon_command_async(), looks like this:

void librados::RadosClient::mon_command_async(const vector<string>& cmd,
                                              const bufferlist &inbl,
                                              bufferlist *outbl, string *outs,
                                              Context *on_finish)
{
  lock.Lock();
  monclient.start_mon_command(cmd, inbl, outbl, outs, on_finish); // Send the command to the Monitor for processing
  lock.Unlock();
}

osd_command() handles OSD-related commands.

int librados::RadosClient::osd_command(int osd, vector<string>& cmd,
                       const bufferlist& inbl,
                       bufferlist *poutbl, string *prs)
{
  Mutex mylock("RadosClient::osd_command::mylock");
  Cond cond;
  bool done;
  int ret;
  ceph_tid_t tid;

  if (osd < 0)
    return -EINVAL;

  lock.Lock();
  // Call objecter->osd_command to send the command to the OSD for processing
  objecter->osd_command(osd, cmd, inbl, &tid, poutbl, prs,
            new C_SafeCond(&mylock, &cond, &done, &ret));
  lock.Unlock();
  mylock.Lock();
  while (!done)
    cond.Wait(mylock);
  mylock.Unlock();
  return ret;
}
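
osd_command() turns an asynchronous call into a synchronous one: it submits the request with a completion callback, then waits on a condition variable until the callback sets a done flag. The sketch below shows the same pattern with the standard library instead of Ceph's Mutex, Cond and C_SafeCond. toy_async_command and toy_sync_command are hypothetical names, and the worker thread merely stands in for the OSD round trip.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical asynchronous call: runs `work` on another thread and then
// reports its result through `on_finish`.
inline std::thread toy_async_command(std::function<int()> work,
                                     std::function<void(int)> on_finish) {
  return std::thread([work, on_finish] { on_finish(work()); });
}

// Block the caller until the asynchronous command has completed, then
// return its result code.
inline int toy_sync_command(std::function<int()> work) {
  std::mutex mylock;
  std::condition_variable cond;
  bool done = false;
  int ret = 0;

  // The callback plays the role of C_SafeCond: store the result, set the
  // flag under the lock, and wake the waiter.
  std::thread t = toy_async_command(work, [&](int r) {
    std::lock_guard<std::mutex> l(mylock);
    ret = r;
    done = true;
    cond.notify_all();
  });

  {
    std::unique_lock<std::mutex> l(mylock);
    while (!done)  // loop guards against spurious wakeups
      cond.wait(l);
  }
  t.join();  // the callback has finished before the locals go out of scope
  return ret;
}
```

The done flag is what makes the wait safe: if the callback runs before the caller reaches the wait loop, the loop is simply skipped.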

IoCtxImpl
This class holds the context information of a pool; one pool corresponds to one IoCtxImpl object. All I/O-related APIs in librados are declared in librados::IoCtx, and their real implementation is in IoCtxImpl. A request is processed as follows:
1) Encapsulate the request in an ObjectOperation object (in osdc).
2) Add the relevant pool information and wrap it in an Objecter::Op object.
3) Call objecter->op_submit to send it to the corresponding OSD.
4) After the operation completes, call the corresponding callback function.
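
The four steps above can be sketched with a toy model. Everything here (ToyObjectOperation, ToyOp, ToyObjecter, ToyIoCtx, and the string opcodes) is a hypothetical simplification for illustration; in particular, the toy objecter completes every request immediately with result 0 instead of talking to an OSD.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Step 1: hypothetical stand-in for ObjectOperation, a list of sub-operations.
struct ToyObjectOperation {
  std::vector<std::string> ops;
  void write_full() { ops.push_back("writefull"); }
  void setxattr() { ops.push_back("setxattr"); }
};

// Step 2: hypothetical stand-in for Objecter::Op: the operations plus the
// pool and object information and the completion callback.
struct ToyOp {
  int64_t pool_id;
  std::string oid;
  std::vector<std::string> ops;
  std::function<void(int)> on_complete;
};

// Step 3: hypothetical stand-in for the Objecter. A real client would send
// the request to an OSD; here it is only logged and completed with rval 0.
struct ToyObjecter {
  std::vector<std::string> sent;  // log of "pool/oid:opcount" entries
  void op_submit(ToyOp op) {
    sent.push_back(std::to_string(op.pool_id) + "/" + op.oid + ":" +
                   std::to_string(op.ops.size()));
    if (op.on_complete) op.on_complete(0);  // step 4: run the callback
  }
};

// Hypothetical stand-in for IoCtxImpl: bound to exactly one pool.
struct ToyIoCtx {
  ToyObjecter* objecter;
  int64_t pool_id;
  void operate(const std::string& oid, const ToyObjectOperation& o,
               std::function<void(int)> cb) {
    objecter->op_submit(ToyOp{pool_id, oid, o.ops, std::move(cb)});
  }
};
```

The point of the layering is that the caller never names a pool or an OSD per request: the pool comes from the context object, and locating the OSD is left to the objecter.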

AioCompletionImpl
Aio stands for Async IO, and AioCompletion stands for Async IO Completion, that is, the callback handling performed when an asynchronous I/O completes. librados provides AioCompletion as a mechanism for handling the result code when an Aio completes; the handler function itself is implemented by the user. AioCompletion is part of the public library API of librados, while the real logic lives in AioCompletionImpl.

Every use of an AioCompletion instance goes through its pc member, which points to an AioCompletionImpl, so the question is how AioCompletionImpl gets wrapped. As mentioned above, all I/O-related APIs in librados are declared in librados::IoCtx and really implemented in IoCtxImpl. Since AioCompletionImpl is the callback of an I/O operation, the wrapping of AioCompletionImpl is done in the IoCtxImpl module.

For a detailed analysis of the callback mechanism, see: librados of ceph source code analysis: 1. AioCompletion callback mechanism analysis
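
To make the split between the public wrapper and its implementation concrete, here is a minimal sketch of the idea. The Toy* types, toy_finish and toy_count_cb are hypothetical simplifications and not the librados classes; they only mirror the pattern of a thin wrapper that forwards everything to a pc pointer and a user-supplied callback that fires on completion.

```cpp
#include <cassert>

// Signature of a user-supplied completion callback (toy version).
typedef void (*toy_callback_t)(void* completion, void* arg);

// Hypothetical stand-in for AioCompletionImpl: holds the state and the
// user's callback.
struct ToyAioCompletionImpl {
  bool complete = false;
  int rval = 0;
  toy_callback_t callback = nullptr;
  void* callback_arg = nullptr;
};

// Hypothetical stand-in for AioCompletion: a thin public wrapper whose
// methods all forward to the implementation pointer `pc`.
struct ToyAioCompletion {
  ToyAioCompletionImpl* pc;
  explicit ToyAioCompletion(ToyAioCompletionImpl* pc_) : pc(pc_) {}

  void set_complete_callback(void* arg, toy_callback_t cb) {
    pc->callback_arg = arg;
    pc->callback = cb;
  }
  bool is_complete() const { return pc->complete; }
  int get_return_value() const { return pc->rval; }
};

// What the I/O layer would do when the operation finishes: record the
// result code, mark completion, then invoke the user's callback.
inline void toy_finish(ToyAioCompletion& c, int rval) {
  c.pc->rval = rval;
  c.pc->complete = true;
  if (c.pc->callback) c.pc->callback(&c, c.pc->callback_arg);
}

// Example user callback: count how many completions fired.
inline void toy_count_cb(void* /*completion*/, void* arg) {
  ++*static_cast<int*>(arg);
}
```

A caller would register toy_count_cb with a pointer to a counter, start the I/O, and later read the result code through get_return_value() once is_complete() returns true.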

OSDC
The OSDC module is the bottom layer of the client. It encapsulates operation data, calculates object addresses, sends requests, and handles timeouts.

ObjectOperation: encapsulating operations
This class encapsulates operation-related parameters and can encapsulate several operations at once. The code is too long, so just read this summary.

struct ObjectOperation {
  vector<OSDOp> ops; // Set of operations
  int flags;
  int priority;

  vector<bufferlist*> out_bl; // Output buffer queue
  vector<Context*> out_handler; // Callback function queue
  vector<int*> out_rval; // Operation result queue

  ObjectOperation() : flags(0), priority(0) {}
  ~ObjectOperation() {
    while (!out_handler.empty()) {
      delete out_handler.back();
      out_handler.pop_back();
    }
  }
  ...
  ...
  ...
};

The OSDOp class encapsulates a single operation on an object. The ceph_osd_op structure encapsulates an opcode and the related input and output parameters:

struct OSDOp {
    ceph_osd_op op; // Operation code and operation parameters
    sobject_t soid;
    bufferlist indata, outdata;
    int32_t rval; // Operation result
};
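
ObjectOperation keeps its per-operation outputs in vectors that run parallel to ops, so index i in each vector refers to the same sub-operation. The sketch below illustrates that bookkeeping in a heavily simplified form. ToyOSDOp, ToyObjectOperation, and the deliver() helper are hypothetical names for this example, and only the out_rval vector is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for OSDOp: only an opcode and a result.
struct ToyOSDOp {
  int opcode = 0;
  int rval = 0;
};

// Hypothetical, simplified stand-in for ObjectOperation. The output slots
// live in a vector parallel to `ops`, so index i in both vectors refers to
// the same sub-operation.
struct ToyObjectOperation {
  std::vector<ToyOSDOp> ops;
  std::vector<int*> out_rval;  // where to store each result, or nullptr

  // Append one operation and grow the parallel vector in lockstep.
  ToyOSDOp& add_op(int opcode, int* prval = nullptr) {
    ops.emplace_back();
    ops.back().opcode = opcode;
    out_rval.push_back(prval);
    assert(ops.size() == out_rval.size());
    return ops.back();
  }

  // Deliver per-operation results to the caller-provided slots.
  void deliver(const std::vector<int>& results) {
    assert(results.size() == ops.size());
    for (std::size_t i = 0; i < ops.size(); ++i) {
      ops[i].rval = results[i];
      if (out_rval[i]) *out_rval[i] = results[i];
    }
  }
};
```

A caller that does not care about the result of one sub-operation simply passes no pointer for it; the slot stays null and is skipped when results are delivered.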

op_target: encapsulating PG information
This structure encapsulates the PG in which the object is located, the OSD list corresponding to that PG, and other related information.

struct op_target_t {
    int flags = 0;

    epoch_t epoch = 0;  ///< latest epoch we calculated the mapping

    object_t base_oid; // The object to read
    object_locator_t base_oloc; // The pool information of the object
    object_t target_oid; // The final target object to read
    object_locator_t target_oloc; // The pool information of the final target object

    ///< true if we are directed at base_pgid, not base_oid
    bool precalc_pgid = false;

    ///< true if we have ever mapped to a valid pool
    bool pool_ever_existed = false;

    ///< explicit pg target, if any
    pg_t base_pgid;

    pg_t pgid; ///< last (raw) pg we mapped to
    spg_t actual_pgid; ///< last (actual) spg_t we mapped to
    unsigned pg_num = 0; ///< last pg_num we mapped to
    unsigned pg_num_mask = 0; ///< last pg_num_mask we mapped to
    vector<int> up; ///< set of up osds for last pg we mapped to
    vector<int> acting; ///< set of acting osds for last pg we mapped to
    int up_primary = -1; ///< last up_primary we mapped to
    int acting_primary = -1;  ///< last acting_primary we mapped to
    int size = -1; ///< the size of the pool when we were last mapped
    int min_size = -1; ///< the min size of the pool when we were last mapped
    bool sort_bitwise = false; ///< whether the hobject_t sort order is bitwise
    bool recovery_deletes = false; ///< whether the deletes are performed during recovery instead of peering
    ...
    ...
    ...
  };
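
The pg_num and pg_num_mask fields are the inputs for folding an object's hash into a PG number. The sketch below shows the "stable mod" step, which follows the logic of Ceph's ceph_stable_mod(). The helper names pg_num_mask_for and hash_to_pg are hypothetical, and the hash is taken as a plain integer here rather than computed from an object name.

```cpp
#include <cassert>

// Same logic as Ceph's ceph_stable_mod(): map x into [0, b) using the mask
// bmask = (smallest power of two >= b) - 1. Values whose masked bits fall
// outside [0, b) are folded back by dropping the top mask bit.
inline int stable_mod(int x, int b, int bmask) {
  if ((x & bmask) < b)
    return x & bmask;
  return x & (bmask >> 1);
}

// Compute the mask for a given pg_num: the next power of two, minus one.
inline unsigned pg_num_mask_for(unsigned pg_num) {
  assert(pg_num > 0);
  unsigned mask = 1;
  while (mask < pg_num) mask <<= 1;
  return mask - 1;
}

// Map a (toy) object hash to a PG sequence number.
inline int hash_to_pg(unsigned hash, unsigned pg_num) {
  return stable_mod(static_cast<int>(hash), static_cast<int>(pg_num),
                    static_cast<int>(pg_num_mask_for(pg_num)));
}
```

With pg_num = 12 the mask is 15. A hash of 27 masks to 11, which is a valid PG, while a hash of 13 masks to 13, which is out of range, so it is folded with the smaller mask 7 and lands in PG 5.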

Op: encapsulating operation information
This structure encapsulates the context needed to complete an operation, including the target address information and the connection information.

 struct Op : public RefCountedObject {
    OSDSession *session; // OSD session information; a session holds the connection-related information
    int incarnation;

    op_target_t target; // Address information

    ConnectionRef con;  // for rx buffer only
    uint64_t features;  // explicitly specified op features

    vector<OSDOp> ops; // Multiple operations

    snapid_t snapid; // Snapshot ID
    SnapContext snapc;
    ceph::real_time mtime;

    bufferlist *outbl;
    vector<bufferlist*> out_bl;
    vector<Context*> out_handler;
    vector<int*> out_rval;
    ...
  };

Striper: striping
When a file is mapped to objects and the objects are striped, this class performs the striping and stores the resulting stripe information.

 class Striper {
  public:
    /*
     * map (ino, layout, offset, len) to a (list of) ObjectExtents (byte
     * ranges in objects on (primary) osds). This function maps a file to
     * objects after striping.
     */
    static void file_to_extents(CephContext *cct, const char *object_format,
                const file_layout_t *layout,
                uint64_t offset, uint64_t len,
                uint64_t trunc_size,
                map<object_t, vector<ObjectExtent> >& extents,
                uint64_t buffer_offset=0);

  };

Here, ObjectExtent stores the striping information within an object.

class ObjectExtent {
 public:
  object_t oid; // object id
  uint64_t objectno; // object (stripe) number
  uint64_t offset; // offset within the object
  uint64_t length; // length
  uint64_t truncate_size; // in object

  object_locator_t oloc; // object locator: location information such as which pool

};
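
To see what mapping a file range to object extents involves, here is a simplified sketch of the striping arithmetic. ToyLayout, ToyExtent and toy_file_to_extents are hypothetical names for this example. Unlike the real function, the sketch ignores the object name format, truncation, and buffer offsets, and it does not merge adjacent ranges that fall in the same object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified layout; all sizes are in bytes.
struct ToyLayout {
  uint64_t stripe_unit;   // bytes written to one object before moving on
  uint64_t stripe_count;  // number of objects written round-robin
  uint64_t object_size;   // maximum bytes per object (multiple of stripe_unit)
};

// Simplified counterpart of ObjectExtent: one byte range inside one object.
struct ToyExtent {
  uint64_t objectno;
  uint64_t offset;
  uint64_t length;
};

// Split the file range [offset, offset + len) into per-object byte ranges.
inline std::vector<ToyExtent> toy_file_to_extents(const ToyLayout& l,
                                                  uint64_t offset, uint64_t len) {
  assert(l.stripe_unit > 0 && l.stripe_count > 0);
  assert(l.object_size >= l.stripe_unit && l.object_size % l.stripe_unit == 0);
  const uint64_t stripes_per_object = l.object_size / l.stripe_unit;

  std::vector<ToyExtent> out;
  uint64_t cur = offset, left = len;
  while (left > 0) {
    uint64_t blockno = cur / l.stripe_unit;         // which stripe unit
    uint64_t stripeno = blockno / l.stripe_count;   // which horizontal stripe
    uint64_t stripepos = blockno % l.stripe_count;  // which object in the set
    uint64_t objectsetno = stripeno / stripes_per_object;
    uint64_t objectno = objectsetno * l.stripe_count + stripepos;

    uint64_t block_start = (stripeno % stripes_per_object) * l.stripe_unit;
    uint64_t block_off = cur % l.stripe_unit;
    uint64_t x_len = std::min(left, l.stripe_unit - block_off);

    out.push_back(ToyExtent{objectno, block_start + block_off, x_len});
    cur += x_len;
    left -= x_len;
  }
  return out;
}
```

For example, with a stripe unit of 4, a stripe count of 2 and an object size of 8, the file range starting at offset 2 with length 10 is split into three pieces: bytes 2 to 3 of object 0, bytes 0 to 3 of object 1, and bytes 4 to 7 of object 0.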

————————————————
Copyright statement: This article is an original article by CSDN blogger "SEU_PAN" and follows the CC 4.0 BY-SA license. When reprinting, please attach the original source link and this statement.
Original link: https://blog.csdn.net/CSND_PAN/article/details/78707756

Origin blog.csdn.net/majianting/article/details/102984590