Ceph TrackedOp analysis

data structure

class OpHistory {
  set<pair<utime_t, TrackedOpRef> > arrived;   // Sorted by arrival time, earliest first
  set<pair<double, TrackedOpRef> > duration;   // Sorted by op duration, shortest first
  Mutex ops_history_lock;                      // Protects the two sets above
  bool shutdown;                               // Set when the OSD shuts down
  uint32_t history_size;                       // Maximum number of historical ops retained
  uint32_t history_duration;                   // Maximum retention time for a historical op
};

OpHistory stores information about ops that completed within a recent time window.

class OpTracker {
  class RemoveOnDelete {
    OpTracker *tracker;
  };
  atomic64_t seq;                                       // Each request gets an incrementing id, starting from 0
  struct ShardedTrackingData {
    Mutex ops_in_flight_lock_sharded;
    xlist<TrackedOp *> ops_in_flight_sharded;
  };
  vector<ShardedTrackingData*> sharded_in_flight_list;  // Shards holding the in-flight TrackedOps
  uint32_t num_optracker_shards;                        // Number of shards; cannot be changed at runtime
  OpHistory history;                                    // History of completed TrackedOps
  float complaint_time;                                 // Age threshold above which an in-flight TrackedOp triggers a warning
  int log_threshold;                                    // Maximum number of warning log lines emitted per check
public:
  bool tracking_enabled;                                // Whether op tracking is enabled
  CephContext *cct;
};

OpTracker is the management class for all op tracking.
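To make the sharding concrete, here is a minimal sketch of how an op might be placed in a shard based on its sequence number. It assumes the shard index is simply seq modulo num_optracker_shards. MiniTracker, MiniOp, register_op and shard_size are hypothetical names used only for illustration, and std::mutex and std::list stand in for Ceph's Mutex and xlist.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for TrackedOp: it only carries the sequence number.
struct MiniOp {
  uint64_t seq = 0;
};

// Hypothetical, simplified tracker. Each shard has its own lock and list,
// so registrations that land on different shards do not contend.
class MiniTracker {
  struct Shard {
    std::mutex lock;
    std::list<MiniOp*> ops_in_flight;
  };
  std::atomic<uint64_t> seq{0};                 // starts at 0, as in the text
  std::vector<std::unique_ptr<Shard>> shards;   // fixed after construction

public:
  explicit MiniTracker(uint32_t num_shards) {
    assert(num_shards > 0);
    for (uint32_t i = 0; i < num_shards; ++i)
      shards.push_back(std::make_unique<Shard>());
  }

  // Assign the next sequence number and put the op in shard (seq % N).
  // The modulo mapping is an assumption of this sketch.
  uint32_t register_op(MiniOp* op) {
    op->seq = ++seq;
    uint32_t idx = static_cast<uint32_t>(op->seq % shards.size());
    std::lock_guard<std::mutex> l(shards[idx]->lock);
    shards[idx]->ops_in_flight.push_back(op);
    return idx;
  }

  // Number of ops currently registered in one shard.
  std::size_t shard_size(uint32_t idx) {
    std::lock_guard<std::mutex> l(shards[idx]->lock);
    return shards[idx]->ops_in_flight.size();
  }
};
```

Because only the lock of the selected shard is taken, a larger shard count lowers lock contention at the cost of having to visit every shard when all in-flight ops need to be scanned.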

class TrackedOp {
  xlist<TrackedOp*>::item xitem;          // This op's item in the OpTracker xlist
protected:
  OpTracker *tracker;
  utime_t initiated_at;                   // Time at which the request arrived
  list<pair<utime_t, string> > events;    // Events the op has passed through, with their timestamps
  mutable Mutex lock;                     // Protects events
  string current;                         // Current event
  uint64_t seq;                           // Sequence number allocated by OpTracker
  uint32_t warn_interval_multiplier;      // Throttles repeated warnings for this op
};

TrackedOp is the base class for the tracking instance of a single op.
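The way the events list and current work together can be sketched as follows. This is a simplified illustration, not the Ceph implementation: MiniTrackedOp, mark_event, state and event_count are hypothetical names, std::chrono stands in for utime_t, and it is an assumption of this sketch that the lock also covers current.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical, simplified TrackedOp: records (timestamp, event name) pairs.
class MiniTrackedOp {
  Clock::time_point initiated_at;
  std::list<std::pair<Clock::time_point, std::string>> events;
  mutable std::mutex lock;   // protects events (and, in this sketch, current)
  std::string current;

public:
  MiniTrackedOp() : initiated_at(Clock::now()) {
    // The constructor records the first event itself.
    events.emplace_back(initiated_at, "initiated");
    current = "initiated";
  }

  // Append a new event with the current time and make it the current state.
  void mark_event(const std::string& name) {
    std::lock_guard<std::mutex> l(lock);
    events.emplace_back(Clock::now(), name);
    current = name;
  }

  // Name of the most recent event.
  std::string state() const {
    std::lock_guard<std::mutex> l(lock);
    return current;
  }

  // Number of events recorded so far, including "initiated".
  std::size_t event_count() const {
    std::lock_guard<std::mutex> l(lock);
    return events.size();
  }
};
```

Keeping every event with its timestamp is what later makes it possible to see, for a slow request, which stage of the I/O path consumed the time.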

struct OpRequest : public TrackedOp {
  int rmw_flags;                // Op flags, see CEPH_OSD_RMW_FLAG_READ etc.
private:
  Message *request;             // The logical request being tracked
  osd_reqid_t reqid;            // The client's request id
  uint8_t hit_flag_points;      // Flag points reached so far (flag_reached_pg etc.); currently unused
  uint8_t latest_flag_point;    // Most recent flag point; currently unused
  utime_t dequeued_time;        // Time at which the op left the op_shardedwq queue
};

OpRequest is the concrete tracking instance for a single op.

key function implementation

void OpTracker::RemoveOnDelete::operator()(TrackedOp *op)

Called when TrackedOp's smart pointer is released.

1. Mark the current TrackedOp as done.

2. Call unregister_inflight_op to remove the TrackedOp from its shard in OpTracker's sharded_in_flight_list.

3. Add TrackedOp to the history instance.
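The three steps above can be sketched with a std::shared_ptr custom deleter. This is only an illustration under simplifying assumptions: MiniTracker, MiniOp and create_op are hypothetical names, the history is reduced to a list of op ids, and the op is freed immediately, whereas the steps above hand the op itself over to the history instance.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for TrackedOp.
struct MiniOp {
  int id;
  std::string current = "initiated";
  explicit MiniOp(int i) : id(i) {}
};

// Hypothetical tracker holding the in-flight ops and a completed-op history.
struct MiniTracker {
  std::set<MiniOp*> in_flight;
  std::vector<int> history;   // ids of completed ops (stand-in for OpHistory)

  // Deleter invoked when the last shared_ptr reference to an op goes away.
  struct RemoveOnDelete {
    MiniTracker* tracker;
    void operator()(MiniOp* op) const {
      op->current = "done";                 // 1. mark the op as done
      tracker->in_flight.erase(op);         // 2. unregister it from the in-flight set
      tracker->history.push_back(op->id);   // 3. record it in the history
      delete op;                            // sketch only: free the op right away
    }
  };

  // Create an op whose lifetime end triggers RemoveOnDelete.
  std::shared_ptr<MiniOp> create_op(int id) {
    std::shared_ptr<MiniOp> ref(new MiniOp(id), RemoveOnDelete{this});
    in_flight.insert(ref.get());
    return ref;
  }
};
```

Tying the bookkeeping to the deleter means no call site has to remember to unregister an op: whichever code path drops the last reference moves the op from in-flight to history.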

void OpHistory::cleanup(utime_t now)

Called from OpHistory::insert and OpHistory::dump_ops, that is, whenever a new TrackedOp is inserted or all TrackedOps are dumped.

This function walks the arrived and duration sets of OpHistory. It first deletes the TrackedOps that are older than the retention time; by default the history keeps requests from the last 600 s. It then deletes ops that exceed the count limit; by default the history keeps at most 20 requests.
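A minimal sketch of this two-pass cleanup is shown below. It is not the Ceph code: MiniHistory and erase_by_id are hypothetical, ops are identified by a plain int id, and times are seconds stored as double instead of utime_t. The sketch also assumes that, when the count limit is exceeded, the ops with the shortest duration are evicted first, so that the slowest ops are the ones that remain.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical, simplified history keyed by int op ids.
struct MiniHistory {
  std::set<std::pair<double, int>> arrived;    // (arrival time, op id)
  std::set<std::pair<double, int>> duration;   // (op duration, op id)
  std::size_t history_size = 20;     // default count limit mentioned in the text
  double history_duration = 600;     // default retention in seconds from the text

  // Record a completed op, then trim the history.
  void insert(double now, double arrival, double dur, int id) {
    arrived.insert({arrival, id});
    duration.insert({dur, id});
    cleanup(now);
  }

  void cleanup(double now) {
    // Pass 1: drop ops that arrived more than history_duration seconds ago.
    while (!arrived.empty() &&
           now - arrived.begin()->first > history_duration) {
      erase_by_id(duration, arrived.begin()->second);
      arrived.erase(arrived.begin());
    }
    // Pass 2: enforce the count limit. Evicting the shortest-duration op
    // first is an assumption of this sketch.
    while (duration.size() > history_size) {
      erase_by_id(arrived, duration.begin()->second);
      duration.erase(duration.begin());
    }
  }

private:
  // Remove the entry with the given op id from one of the two sets.
  static void erase_by_id(std::set<std::pair<double, int>>& s, int id) {
    for (auto it = s.begin(); it != s.end(); ++it) {
      if (it->second == id) {
        s.erase(it);
        return;
      }
    }
  }
};
```

Keeping two sets over the same ops is what makes both passes cheap: each pass only ever removes from the front of the set that is sorted by the criterion it enforces.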

bool OpTracker::check_ops_in_flight(std::vector<string> &warning_vector)

This function is called periodically by the OSD's tick thread to check whether the TrackedOps that have not yet completed are in a normal state.

The function traverses all shards to find the oldest op, which it saves in oldest_op, and counts the total number of current ops, which it saves in total_ops_in_flight. If there is no op at all, or if the age of the oldest op is less than complaint_time, everything is normal and the function returns false immediately. Otherwise it traverses all shards again, finds the slow requests, that is, the TrackedOps that arrived more than complaint_time ago, saves them in warning_vector, and counts them. Once the count reaches log_threshold, the traversal stops.

One small trick here is that warning_vector reserves its first slot up front; once all counting is finished, the summary information is written into that first slot.
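The two-pass check and the reserved first slot can be sketched as follows. This is a simplified illustration rather than the Ceph function: MiniOp and Shard are hypothetical types, times are plain seconds, locking and warn_interval_multiplier are left out, and the wording of the warning strings is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical op: only the arrival time (seconds) and a description.
struct MiniOp {
  double initiated_at;
  std::string desc;
};

using Shard = std::vector<MiniOp>;

// Returns true if at least one op is older than complaint_time.
// On true, warnings[0] holds the summary and the rest are per-op lines.
bool check_ops_in_flight(const std::vector<Shard>& shards, double now,
                         double complaint_time, int log_threshold,
                         std::vector<std::string>& warnings) {
  // Pass 1: find the oldest op and count all ops in flight.
  std::size_t total_ops_in_flight = 0;
  double oldest = now;
  for (const Shard& s : shards) {
    for (const MiniOp& op : s) {
      ++total_ops_in_flight;
      if (op.initiated_at < oldest) oldest = op.initiated_at;
    }
  }
  if (total_ops_in_flight == 0 || now - oldest < complaint_time)
    return false;   // nothing in flight, or nothing old enough to complain about

  warnings.clear();
  warnings.emplace_back();   // reserve slot 0 for the summary

  // Pass 2: collect slow ops until log_threshold lines have been produced.
  int slow = 0;
  for (const Shard& s : shards) {
    for (const MiniOp& op : s) {
      if (slow >= log_threshold) break;
      double age = now - op.initiated_at;
      if (age >= complaint_time) {
        ++slow;
        warnings.push_back("slow request " + std::to_string(age) +
                           " seconds old: " + op.desc);
      }
    }
    if (slow >= log_threshold) break;
  }

  // Fill in the reserved slot now that the count is known.
  warnings[0] = std::to_string(slow) + " slow requests, oldest blocked for " +
                std::to_string(now - oldest) + " seconds";
  return true;
}
```

Reserving the first slot avoids inserting at the front of the vector afterwards, which would shift every per-op line that had already been appended.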

Event summary

Common events

Event | Meaning
initiated | The initialization event, set in the TrackedOp constructor.
reached_pg | Set when the op has just been dequeued from the OSD's op_shardedwq queue.
started | Set from many call sites. In the normal I/O path on the primary OSD it is set in do_op, after all exception checks have passed and just before execute_ctx is called.
waiting for subops from | Set when the primary OSD sends the request to the replica OSDs.
commit_queued_for_journal_write | Set when the request is about to enter the journal write queue.
write_thread_in_journal_buffer | The journal data has been prepared in the buffer but has not been written yet.
journaled_completion_queued | The journal has been written and the completion callback has been queued.
on_commit | The write commit has returned (multi-replica scenario).
op_applied | The apply has returned on the primary OSD (multi-replica scenario).
sub_op_commit_rec | In a multi-replica scenario, the primary OSD has processed the commit reply from a replica OSD.
commit_sent | Set once the commits from all three replicas have returned; it is triggered by the last replica to return.
sub_op_applied_rec | In a multi-replica scenario, the primary OSD has processed the apply reply from a replica OSD. In the normal read/write path the replicas do not send an apply reply.
waiting for rw locks | The read/write path acquires the relevant lock in do_op. If the lock cannot be obtained, the request is saved in the ObjectContext and processed after the lock is released.

Event sequence of the I/O path

[Figure: event sequence under normal circumstances]

[Figure: event sequence for read requests when writing to disk is slow]
