Notes on the RocksDB Persistent Read Cache Code

[For a detailed description, search the RocksDB GitHub wiki for "Persistent Read Cache".]

A second advantage: the cache can simply be taken away without affecting normal operation, since it only holds copies of data that already exist in the LSM tree.

 

The design is device-agnostic: it does not target any particular kind of storage.

Three main components:

Block Lookup Index: maps a given LSM block address to a cache record locator. The cache record locator helps locate the block data in the cache. The cache record can be described as { file-id, offset, size }. In other words, an LSM block address is mapped to a cache record locator, which is what lets us find the block data inside the cache (a hash table resolves the containing file, offset, and size; this index is consulted on reads).

 

File Lookup Index / LRU: this index maps a given file identifier to its reference object abstraction. The object abstraction can be used for reading data from the cache [the data itself is also stored in the cache]. When we run out of space on the persistent cache, we evict the least recently used file from this index.

In other words, a file ID is mapped to the abstraction of its referenced object, and that abstraction is used to read data from the cache. When the persistent cache runs out of space, the least recently used file in this index is evicted (this index is used during space reclamation: eviction starts from a randomly chosen LRU list, and the file popped from it is deleted; see the Evict walkthrough below).


 

File Layout: the cache is stored in the file system as a sequence of files. Each file contains a sequence of records which contain data corresponding to a block on the RocksDB LSM.

The whole cache corresponds to a sequence of files in the file system [the path is given in the configuration]. Each file contains a sequence of records, and each record holds a block from the RocksDB LSM [given an LSM block address, the Block Lookup Index above resolves it to {file-id, offset, size}]. A conceptual sketch of these three structures follows.
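To make the three components concrete, here is a purely conceptual C++ sketch. None of these type names exist in RocksDB; the real implementation in utilities/persistent_cache/ uses its own hash table and LRU types.

#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

// Conceptual sketch only -- not the actual RocksDB types.
struct CacheRecordLocator {   // what the Block Lookup Index resolves to
  uint32_t file_id;           // which cache file holds the block
  uint64_t offset;            // offset of the record inside that file
  uint64_t size;              // size of the record
};

struct PersistentCacheSketch {
  // Block Lookup Index: LSM block address -> cache record locator
  std::unordered_map<std::string, CacheRecordLocator> block_index;
  // File Lookup Index: file id -> file abstraction (a stand-in string here)
  std::unordered_map<uint32_t, std::string> file_index;
  // LRU over files: the least recently used file is evicted first
  std::list<uint32_t> file_lru;
};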

 API: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/persistent_cache.h
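A hedged sketch of how this public API is typically wired into a DB follows. The NewPersistentCache factory and the BlockBasedTableOptions::persistent_cache field come from the headers, but check include/rocksdb/persistent_cache.h and include/rocksdb/table.h for the exact signatures in your RocksDB version; the path and size values below are arbitrary examples.

#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/persistent_cache.h"
#include "rocksdb/table.h"

rocksdb::Status OpenDbWithPersistentCache(const std::string& db_path,
                                          const std::string& cache_path,
                                          rocksdb::DB** db) {
  std::shared_ptr<rocksdb::PersistentCache> pcache;
  rocksdb::Status s = rocksdb::NewPersistentCache(
      rocksdb::Env::Default(), cache_path,
      /*size=*/64ull * 1024 * 1024 * 1024,  // example: 64 GB on SSD
      /*log=*/nullptr,                      // a production setup would pass an Info logger
      /*optimized_for_nvm=*/false, &pcache);
  if (!s.ok()) return s;

  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.persistent_cache = pcache;  // attach the tier below the block cache

  rocksdb::Options options;
  options.create_if_missing = true;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return rocksdb::DB::Open(options, db_path, db);
}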

 

// Persistent Cache
//
// Persistent cache is a tiered key-value cache that can use persistent medium. It
// is a generic design and can leverage any storage medium -- disk/SSD/NVM/RAM.
// The code has been kept generic but significant benchmark/design/development
// time has been spent to make sure the cache performs appropriately for
// respective storage medium.
// The file defines
// PersistentCacheTier    : Implementation that handles individual cache tier
// PersistentTieredCache  : Implementation that handles all tiers as a logical
//                          unit
//
// PersistentTieredCache architecture:
// +--------------------------+ PersistentCacheTier that handles multiple tiers
// | +----------------+       |
// | | RAM            | PersistentCacheTier that handles RAM (VolatileCacheImpl)
// | +----------------+       |
// |   | next                 |
// |   v                      |
// | +----------------+       |
// | | NVM            | PersistentCacheTier implementation that handles NVM
// | +----------------+ (BlockCacheImpl)
// |   | next                 |
// |   V                      |
// | +----------------+       |
// | | LE-SSD         | PersistentCacheTier implementation that handles LE-SSD
// | +----------------+ (BlockCacheImpl)
// |   |                      |
// |   V                      |
// |  null                    |
// +--------------------------+
//               |
//               V
//              null
namespace rocksdb {

// Persistent Cache Config
//
// This struct captures all the options that are used to configure persistent
// cache. Some of the terminologies used in naming the options are
//
// dispatch size :
// This is the size in which IO is dispatched to the device
//
// write buffer size :
// This is the size of an individual write buffer size. Write buffers are
// grouped to form buffered file.
//
// cache size :
// This is the logical maximum for the cache size
//
// qdepth :
// This is the max number of IOs that can be issued to the device in parallel
//
// pipelining :
// The writer code path follows pipelined architecture, which means the
// operations are handed off from one stage to another
//
// pipelining backlog size :
// With the pipelined architecture, there can always be backlogging of ops in
// pipeline queues. This is the maximum backlog size after which ops are dropped
// from queue
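To make these knobs concrete, here is a purely illustrative set of values. The struct below is a stand-in invented for this example; the real options live in PersistentCacheConfig (utilities/persistent_cache/persistent_cache_tier.h), and its member names may differ.

#include <cstdint>

// Illustrative stand-in only -- field names mirror the option descriptions
// above, not necessarily the real PersistentCacheConfig members.
struct ExamplePersistentCacheOptions {
  uint64_t cache_size        = 100ull * 1024 * 1024 * 1024;  // logical maximum for the cache: 100 GB
  uint32_t write_buffer_size = 1 * 1024 * 1024;              // size of one write buffer: 1 MB
  uint32_t dispatch_size     = 1 * 1024 * 1024;              // IO is dispatched to the device in 1 MB chunks
  uint32_t qdepth            = 2;                            // at most 2 IOs issued to the device in parallel
  bool     pipeline_writes   = true;                         // writer path is pipelined across stages
  uint64_t pipeline_backlog  = 10 * 1024 * 1024;             // drop ops once the queued backlog exceeds 10 MB
};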

Wiki, Benchmarking-tools section: persistent_cache_bench

The cache path is taken from the -path flag passed on the command line; it is a file-system path:

-path(Path for cachefile) type: string default: "/tmp/microbench/blkcache"

 

path_("") {
 verbose_ = IsFlagPresent(flags, ARG_VERBOSE);
 json_ = IsFlagPresent(flags, ARG_JSON);

 std::map<std::string, std::string>::const_iterator itr = options.find(ARG_PATH);
 if (itr != options.end()) {
   path_ = itr->second;
   if (path_.empty()) {
     exec_state_ = LDBCommandExecuteResult::Failed("--path: missing pathname");
   }
 }

// Depending on the configured cache type, the benchmark constructs either a
// plain block cache or a tiered (RAM + block) cache:
cache_ = NewBlockCache(Env::Default(), path_,
                       /*size=*/std::numeric_limits<uint64_t>::max(),
                       /*direct_writes=*/true);
cache_ = NewTieredCache(Env::Default(), path_,
                        /*memory_size=*/static_cast<size_t>(1 * 1024 * 1024));

Insertion path:

In persistent_cache_test.h, under PersistentCacheTierTest:

void Insert(const size_t nthreads, const size_t max_keys) {
   key_ = 0;
   max_keys_ = max_keys;
   // spawn threads
   auto fn = std::bind(&PersistentCacheTierTest::InsertImpl, this);  // run InsertImpl on several threads
   auto threads = SpawnThreads(nthreads, fn);
   // join with threads
   Join(std::move(threads));
   // Flush cache
   Flush();
 }

In InsertImpl: Status status = cache_->Insert(key, data, sizeof(data));

where cache_ is declared as std::shared_ptr<PersistentCacheTier> cache_;

PersistentCacheTier: BlockCacheTier and VolatileCacheTier both inherit from it. PersistentCacheTier acts as an interface, and tiers can be stacked in multiple layers (a conceptual sketch follows the volatile_tier_impl.h comments below).

// This is a logical abstraction that defines a tier of the persistent cache. Tiers
// can be stacked over one another. PersistentCache provides the basic definition
// for accessing/storing in the cache. PersistentCacheTier extends the interface
// to enable management and stacking of tiers.

volatile_tier_impl.h

// VolatileCacheTier
//
// This file provides persistent cache tier implementation for caching
// key/values in RAM.
//
//        key/values
//           |
//           V
// +-------------------+
// | VolatileCacheTier | Store in an evictable hash table
// +-------------------+
//           |
//           V
//       on eviction
//   pushed to next tier
//
// The implementation is designed to be concurrent. The evictable hash table
// implementation is not concurrent at this point though.
//
// The eviction algorithm is LRU
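The stacking behavior described in these comments can be pictured with a minimal, purely conceptual sketch. DemoTier below is invented for illustration and is not one of RocksDB's tier classes; it only shows the idea of a lookup miss falling through to the next tier.

#include <map>
#include <memory>
#include <string>

// Toy illustration of "tiers can be stacked over one another".
class DemoTier {
 public:
  explicit DemoTier(std::shared_ptr<DemoTier> next = nullptr)
      : next_(std::move(next)) {}
  virtual ~DemoTier() = default;

  virtual void Insert(const std::string& key, const std::string& val) {
    store_[key] = val;  // a real tier would write to RAM or to cache files
  }

  virtual bool Lookup(const std::string& key, std::string* val) {
    auto it = store_.find(key);
    if (it != store_.end()) {
      *val = it->second;
      return true;
    }
    // miss in this tier: fall through to the next tier, if any
    return next_ && next_->Lookup(key, val);
  }

 protected:
  std::shared_ptr<DemoTier> next_;            // next (slower) tier in the stack
  std::map<std::string, std::string> store_;  // stand-in for the tier's storage
};

// Usage: a RAM-like tier stacked on top of an SSD-like tier.
// auto ssd_tier = std::make_shared<DemoTier>();
// DemoTier ram_tier(ssd_tier);

In the real code, the RAM tier additionally pushes evicted entries down to the next tier, which is exactly what the quoted diagram above describes.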

class BlockCacheTier : public PersistentCacheTier

Execution reaches the Insert function in BlockCacheTier:

 

When the previous cache file has been filled up, Status BlockCacheTier::NewCacheFile() is called to create a new cache file.

The newly created cache_file_ is then inserted into the metadata: status = metadata_.Insert(cache_file_); [what gets inserted into the index here is the file].

 

bool BlockCacheTierMetadata::Insert(BlockCacheFile* file) {
  return cache_file_index_.Insert(file);  // insert into the index
}

hash_table_evictable.h:

Inserting a file into the index: GetBucket uses the file's hash to find the target bucket, then the corresponding LRU list and the mutex guarding it are fetched; with the write lock held, the file is inserted into the bucket and also pushed onto the LRU list.

  bool Insert(T* t) {
    const uint64_t h = Hash()(t);  // hash value of the file
    typename hash_table::Bucket& bucket = GetBucket(h);
    LRUListType& lru = GetLRUList(h);
    port::RWMutex& lock = GetMutex(h);

    WriteLock _(&lock);
    if (hash_table::Insert(&bucket, t)) {
      lru.Push(t);
      return true;
    }
    return false;
  }

Where:

  typename hash_table::Bucket& GetBucket(const uint64_t h) {
    const uint32_t bucket_idx = h % hash_table::nbuckets_;
    return hash_table::buckets_[bucket_idx];
  }

  LRUListType& GetLRUList(const uint64_t h) {
    const uint32_t bucket_idx = h % hash_table::nbuckets_;
    const uint32_t lock_idx = bucket_idx % hash_table::nlocks_;
    return lru_lists_[lock_idx];
  }

  port::RWMutex& GetMutex(const uint64_t h) {
    const uint32_t bucket_idx = h % hash_table::nbuckets_;
    const uint32_t lock_idx = bucket_idx % hash_table::nlocks_;
    return hash_table::locks_[lock_idx];
  }

The table is divided into nbuckets_ buckets; hash(t) modulo nbuckets_ selects the bucket. The relationship between nbuckets_ and nlocks_ determines how many buckets share a single lock and LRU list (if nlocks_ >= nbuckets_, each bucket gets its own LRU list and mutex). A small worked example of this mapping follows.
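A tiny standalone example of the mapping, with made-up sizes (nbuckets_ = 8 and nlocks_ = 4 here, purely for illustration):

#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t nbuckets = 8;  // hypothetical hash table size
  const uint32_t nlocks = 4;    // hypothetical number of locks / LRU lists
  for (uint64_t h = 0; h < 8; ++h) {
    const uint32_t bucket_idx = h % nbuckets;
    const uint32_t lock_idx = bucket_idx % nlocks;  // buckets 0 and 4 share lock 0, 1 and 5 share lock 1, ...
    std::printf("hash %llu -> bucket %u -> lock/lru %u\n",
                static_cast<unsigned long long>(h), bucket_idx, lock_idx);
  }
  return 0;
}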

 

Evicting the least recently used object [a random LRU list is picked as the starting point; if it contains an evictable object, that object is evicted, otherwise the next LRU list is checked, and so on until all nlocks_ lists have been tried]:

hash_table_evictable.h

  T* Evict(const std::function<void(T*)>& fn = nullptr) {
    uint32_t random = Random::GetTLSInstance()->Next();
    const size_t start_idx = random % hash_table::nlocks_;  // pick a random shard to start looking for something to evict
    T* t = nullptr;

    // iterate from start_idx .. 0 .. start_idx
    for (size_t i = 0; !t && i < hash_table::nlocks_; ++i) {  // scan through all nlocks_ shards, wrapping around
      const size_t idx = (start_idx + i) % hash_table::nlocks_;

      WriteLock _(&hash_table::locks_[idx]);
      LRUListType& lru = lru_lists_[idx];
      if (!lru.IsEmpty() && (t = lru.Pop()) != nullptr) {  // is there anything evictable in this LRU list?
        assert(!t->refs_);
        // We got an item to evict, erase from the bucket
        const uint64_t h = Hash()(t);
        typename hash_table::Bucket& bucket = GetBucket(h);
        T* tmp = nullptr;
        bool status = hash_table::Erase(&bucket, t, &tmp);
        assert(t == tmp);
        (void)status;
        assert(status);
        if (fn) {
          fn(t);
        }
        break;
      }
      assert(!t);
    }
    return t;
  }

Reserving space, in block_cache_tier.cc:

bool BlockCacheTier::Reserve(const size_t size) {
  WriteLock _(&lock_);
  assert(size_ <= opt_.cache_size);

  if (size + size_ <= opt_.cache_size) {
    // there is enough space to write; no eviction is needed
    size_ += size;
    return true;
  }

  assert(size + size_ >= opt_.cache_size);
  // there is not enough space to fit the requested data
  // we can clear some space by evicting cold data

  const double retain_fac = (100 - kEvictPct) / static_cast<double>(100);  // kEvictPct: percentage of the cache to evict once it is full
  while (size + size_ > opt_.cache_size * retain_fac) {
    unique_ptr<BlockCacheFile> f(metadata_.Evict());  // evict files in a loop until enough space is reclaimed
    if (!f) {
      // nothing is evictable
      return false;
    }
    assert(!f->refs_);
    uint64_t file_size;
    if (!f->Delete(&file_size).ok()) {
      // unable to delete file
      return false;
    }

    assert(file_size <= size_);
    size_ -= file_size;
  }

  size_ += size;
  assert(size_ <= opt_.cache_size * 0.9);
  return true;
}
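A worked example of the arithmetic in Reserve(), using made-up numbers (the actual value of kEvictPct is defined in the RocksDB source; 10% is only assumed here):

#include <cstdio>

int main() {
  const double kEvictPct = 10;                    // assumed eviction percentage
  const double cache_size = 100.0 * 1024 * 1024;  // hypothetical 100 MB cache
  const double used = 98.0 * 1024 * 1024;         // hypothetical current usage (size_)
  const double request = 4.0 * 1024 * 1024;       // incoming reservation (size)

  // retain_fac = 0.9, so files keep being evicted while request + used
  // exceeds 90 MB; eviction stops once usage drops to 86 MB or below.
  const double retain_fac = (100 - kEvictPct) / 100.0;
  std::printf("used %.0f MB; evict until size_ <= %.0f MB\n",
              used / (1024 * 1024),
              (cache_size * retain_fac - request) / (1024 * 1024));
  return 0;
}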

Reprinted from blog.csdn.net/yi_1973/article/details/79415430