RocksDB thread-local cache

Overview

During development we frequently run into concurrency problems, and the usual answer is to protect shared state with locks: the familiar spinlock, mutex or rwlock. Lock-free programming is also an option, but its implementation bar is fairly high. Any shared variable with concurrent readers and writers needs protection, and concurrent reads and writes face a fundamental tension: either writers block readers, or writers run at lower priority and risk being starved. All of these approaches count as pessimistic locking. This article introduces an optimistic mechanism for controlling concurrency: each thread keeps a copy of the shared variable in a thread-local variable and reads it without taking a lock; if a read detects that the shared variable has changed, it refills the local cache with the shared variable's latest value. A write still takes a lock and notifies all the thread-local copies of the change. In short: reads take no lock, reads and writes do not conflict, and only writes conflict with writes. The implementation discussed here comes from RocksDB's thread-local cache; the rest of the article walks through the principles behind RocksDB's ThreadLocalPtr.

Thread Local Storage (TLS)

First, a brief introduction to thread-local variables: a thread-local variable is one for which every thread holds its own independent copy, and each thread's modifications are invisible to the others; the variable name is shared, but the storage behind it is not. On Linux, we can create, set and read thread-local storage with the following three functions:

int pthread_key_create(pthread_key_t *key, void (*destr_function)(void*));
int pthread_setspecific(pthread_key_t key, const void *pointer);
void *pthread_getspecific(pthread_key_t key);
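
A minimal, self-contained example of these three calls (the key, the Worker function and the int payload are invented for this demo):

#include <pthread.h>
#include <cstdio>

static pthread_key_t key;

// Runs at thread exit for every thread whose slot is still non-null.
static void Destroy(void* p) { delete static_cast<int*>(p); }

static void* Worker(void* arg) {
  // Each thread stores its own private value under the same key.
  long n = reinterpret_cast<long>(arg);
  pthread_setspecific(key, new int(static_cast<int>(n)));
  // Reads back this thread's copy, unaffected by the other thread.
  int* mine = static_cast<int*>(pthread_getspecific(key));
  std::printf("thread %ld sees %d\n", n, *mine);
  return nullptr;
}

int main() {
  pthread_key_create(&key, Destroy);  // one key, one slot per thread
  pthread_t t1, t2;
  pthread_create(&t1, nullptr, Worker, reinterpret_cast<void*>(1));
  pthread_create(&t2, nullptr, Worker, reinterpret_cast<void*>(2));
  pthread_join(t1, nullptr);
  pthread_join(t2, nullptr);
  pthread_key_delete(&key);
  return 0;
}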

The ThreadLocalPtr class

Sometimes we do not want fully independent per-thread variables; we still need one global variable, with the thread-local copy acting only as a cache to relieve contention. In RocksDB, the ThreadLocalPtr class does exactly this. ThreadLocalPtr contains three inner classes: ThreadLocalPtr::StaticMeta, ThreadLocalPtr::ThreadData and ThreadLocalPtr::Entry. StaticMeta is a singleton that manages all ThreadLocalPtr objects, and we can loosely regard one ThreadLocalPtr object as one piece of thread-local storage. In reality, though, only a single real thread-local variable is ever defined, as the StaticMeta constructor below makes clear. So how does one global slot serve multiple thread-local caches? The trick is in how the storage is laid out: the thread-local variable actually holds a pointer to a ThreadData object, and ThreadData contains an array of entries. Each ThreadLocalPtr object is assigned its own id and occupies its own slot in that array. To fetch a variable's local cache you pass in the id, and the corresponding Entry's ptr field is the pointer to that variable.

ThreadLocalPtr::StaticMeta::StaticMeta() : next_instance_id_(0), head_(this) {
  if (pthread_key_create(&pthread_key_, &OnThreadExit) != 0) {
    abort();
  }
  // ...
}

void* ThreadLocalPtr::StaticMeta::Get(uint32_t id) const {
   auto* tls = GetThreadLocal();
   return tls->entries[id].ptr.load(std::memory_order_acquire);
}

struct Entry {
  Entry() : ptr(nullptr) {}
  Entry(const Entry& e) : ptr(e.ptr.load(std::memory_order_relaxed)) {}
  std::atomic<void*> ptr;  // cached pointer for one ThreadLocalPtr instance in this thread
};

The overall structure is shown below: each thread owns one thread-local ThreadData, which holds an array of pointers, one per ThreadLocalPtr instance (that is, per cached variable). The ThreadData objects of all threads are chained together by pointers, which is essential: when a write happens, the writing thread walks the chain and rewrites every thread's locally cached value to signal that the shared variable has changed.

 ---------------------------------------------------
 |          | instance 1 | instance 2 | instance 3 |
 ---------------------------------------------------
 | thread 1 |    void*   |    void*   |    void*   | <- ThreadData
 ---------------------------------------------------
 | thread 2 |    void*   |    void*   |    void*   | <- ThreadData
 ---------------------------------------------------
 | thread 3 |    void*   |    void*   |    void*   | <- ThreadData

struct ThreadData {
  explicit ThreadData(ThreadLocalPtr::StaticMeta* _inst)
      : entries(), inst(_inst) {}
  std::vector<Entry> entries;
  ThreadData* next;  // links into the global circular list of all threads'
  ThreadData* prev;  // ThreadData, maintained by StaticMeta
  ThreadLocalPtr::StaticMeta* inst;
};
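
The list is circular, with head_ acting as a sentinel, which is exactly the shape the traversal in Scrape below assumes (t != &head_). Here is a small self-contained sketch of that linking, with Node and AddNode standing in for ThreadData and StaticMeta::AddThreadData; it is an illustration, not RocksDB's exact code:

#include <cstdio>

// Stand-in for ThreadLocalPtr::ThreadData: just the list links.
struct Node {
  Node* next;
  Node* prev;
  int tid;
  explicit Node(int id = -1) : next(this), prev(this), tid(id) {}
};

// Link d in just before the sentinel `head` (i.e. at the tail).
// RocksDB does this under StaticMeta's mutex in AddThreadData.
void AddNode(Node* head, Node* d) {
  d->next = head;
  d->prev = head->prev;
  head->prev->next = d;
  head->prev = d;
}

int main() {
  Node head;          // sentinel, like StaticMeta::head_
  Node t1(1), t2(2);  // one ThreadData per thread
  AddNode(&head, &t1);
  AddNode(&head, &t2);
  // Same traversal shape a writer uses in Scrape to visit every thread:
  for (Node* t = head.next; t != &head; t = t->next) {
    std::printf("visiting ThreadData of thread %d\n", t->tid);
  }
  return 0;
}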

No conflict between concurrent reads and writes

Now we come to the core problem: how do we use TLS as a local cache so that reads take no lock and concurrent reads and writes do not conflict? The read/write concurrency control logic is implemented mainly through three key ThreadLocalPtr interfaces: Swap, CompareAndSwap and Scrape. For a ThreadLocalPtr<Type*> variable, the value stored in a thread's local slot is in one of three states:

  1) a normal Type* pointer;

  2) a dummy variable of type Type*, denoted InUse;

  3) the nullptr value, denoted obsolete.

Reader threads fetch the variable's contents through the Swap interface. A writer thread uses the Scrape interface to walk all ThreadData objects and reset every thread's slot to nullptr (obsolete), which tells every other thread that its local cache is invalid. The next time a reader thread reads its slot and finds nullptr, it knows it must rebuild the local cache.

// Get the local cache slot for `id` and replace its contents with `ptr`.
// Each ThreadLocalPtr object has a unique id, managed by the StaticMeta singleton.
void* ThreadLocalPtr::StaticMeta::Swap(uint32_t id, void* ptr) {
  // fetch this thread's local ThreadData
  auto* tls = GetThreadLocal();
  return tls->entries[id].ptr.exchange(ptr, std::memory_order_acquire);
}

bool ThreadLocalPtr::StaticMeta::CompareAndSwap(uint32_t id, void* ptr,
                                                void*& expected) {
  // fetch this thread's local ThreadData
  auto* tls = GetThreadLocal();
  return tls->entries[id].ptr.compare_exchange_strong(
      expected, ptr, std::memory_order_release, std::memory_order_relaxed);
}

// Reset the slot for `id` in every thread's ThreadData to `replacement`
// (nullptr marks it obsolete). The old pointers are handed back to the caller
// for release. The next time a thread reads its slot and finds nullptr, it
// re-acquires the object.
void ThreadLocalPtr::StaticMeta::Scrape(uint32_t id, std::vector<void*>* ptrs,
                                        void* const replacement) {
  MutexLock l(Mutex());
  for (ThreadData* t = head_.next; t != &head_; t = t->next) {
    if (id < t->entries.size()) {
      void* ptr =
          t->entries[id].ptr.exchange(replacement, std::memory_order_acquire);
      if (ptr != nullptr) {
        // collect each thread's cached pointer so the caller can
        // dereference it and free the memory when necessary
        ptrs->push_back(ptr);
      }
    }
  }
}

// Called on a thread's first access: allocate its ThreadData and register it
// in the doubly linked list managed by the StaticMeta singleton, which tracks
// every thread's group of thread-local objects.
ThreadData* ThreadLocalPtr::StaticMeta::GetThreadLocal() {
  if (UNLIKELY(tls_ == nullptr)) {
    auto* inst = Instance();
    tls_ = new ThreadData(inst);
    {
      // Register it in the global chain; this needs to be done before the
      // thread-exit handler registration.
      MutexLock l(Mutex());
      inst->AddThreadData(tls_);
    }
  }
  return tls_;
}
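
Putting Scrape to work, a writer's update path can be sketched as follows. This is a simplified illustration rather than RocksDB code: MyObject, global_obj and InstallNewGlobal are invented for the sketch, and it assumes the StaticMeta interfaces shown above are in scope (in RocksDB proper they sit behind ThreadLocalPtr's public wrappers).

#include <atomic>
#include <cstdint>
#include <vector>

// A refcounted shared object, invented for this sketch.
struct MyObject {
  std::atomic<int> refs{1};
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  void Unref() {
    if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }
};

std::atomic<MyObject*> global_obj{nullptr};  // authoritative latest value

void InstallNewGlobal(ThreadLocalPtr::StaticMeta* meta, uint32_t id,
                      MyObject* new_obj) {
  // 1. Publish the new version globally (writers serialize among themselves).
  MyObject* old = global_obj.exchange(new_obj, std::memory_order_acq_rel);

  // 2. Invalidate every thread's local cache: each slot for `id` becomes
  //    nullptr (obsolete); the previously cached pointers come back in `stale`.
  std::vector<void*> stale;
  meta->Scrape(id, &stale, nullptr);

  // 3. Drop the references those thread-local slots were holding; the memory
  //    is freed once the last reader also releases its reference.
  for (void* p : stale) static_cast<MyObject*>(p)->Unref();
  if (old != nullptr) old->Unref();
}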

A read operation consists of two parts, Get and Release. Besides fetching the object from the TLS cache, a read may also have to release an old object's memory. On Get, the object in the TLS slot is swapped out and the InUse marker is left in its place; on Release, the object is swapped back into the TLS slot. The scenario without concurrent writes is straightforward: the TLS object is the thread's local cache, while the global shared variable GlobalObject is visible to all threads.
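
In code, that read protocol can be sketched as follows, modeled on RocksDB's thread-local SuperVersion handling in ColumnFamilyData; kInUse, FetchObject/ReturnObject and LoadLatestGlobalAndRef are placeholders invented for this sketch, and MyObject is the refcounted stand-in from the writer sketch above:

// A dummy address used as the InUse marker; nullptr means obsolete.
static char in_use_dummy;
void* const kInUse = &in_use_dummy;

// Placeholder: slow path that takes the writer's mutex, then refs and
// returns the current global object (details omitted in this sketch).
MyObject* LoadLatestGlobalAndRef();

// Get: take the cached pointer out of this thread's slot and leave the
// InUse marker behind, so a concurrent Scrape cannot pull the object out
// from under us while we are using it.
MyObject* FetchObject(ThreadLocalPtr* slot) {
  void* p = slot->Swap(kInUse);
  if (p == nullptr) {
    // Slot was obsolete: a writer invalidated the local cache, so
    // rebuild it from the current global object.
    p = LoadLatestGlobalAndRef();
  }
  return static_cast<MyObject*>(p);
}

// Release: try to put the object back, expecting the slot to still hold our
// InUse marker. If a writer scraped the slot in the meantime, the CAS fails:
// the object we used is stale, so drop our reference instead of caching it.
void ReturnObject(ThreadLocalPtr* slot, MyObject* obj) {
  void* expected = kInUse;
  if (!slot->CompareAndSwap(static_cast<void*>(obj), expected)) {
    obj->Unref();
  }
}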

Now consider the scenario where reads and writes are concurrent. After a reader thread has obtained the TLS object, a writer thread modifies the global object and walks all the TLS objects, setting each of them to nullptr. When the reader thread later performs its Release, the CompareAndSwap fails, so it learns that the object it was using has expired; it dereferences the object, freeing the memory if necessary. On the next Get, the reader finds its TLS object is nullptr, so it picks up the current latest object and, once finished, fills that object back into the TLS slot during Release.

Application scenarios

From the analysis above, TLS is only a cache: we still need a global variable that always holds the latest value, while the TLS copies may lag behind it. This means the use case must not demand strictly real-time read/write consistency, and must tolerate multiple versions. The global variable and the local caches interact as follows: after the global variable changes, local threads must be able to notice the change promptly, though not in real time. Concurrent reads and writes are allowed, meaning a read may proceed on the old value and pick up the new one on the next read. SuperVersion management in RocksDB fits this pattern exactly: switch/flush/compaction produces a new SuperVersion, and every data read or write first needs to read the SuperVersion. Foreground operations such as reads and writes are usually far more frequent than switch/flush/compaction, so SuperVersion reads vastly outnumber SuperVersion writes, and the system tolerates several SuperVersions existing at once.

Each thread may take a SuperVersion for its reads and writes. If a concurrent flush/compaction happens in the meantime, the SuperVersion changes, and it is enough that some later read of the SuperVersion obtains the latest one. In more detail, in the read path we generally acquire a snapshot first, then use the SuperVersion to decide which physical components this read must consult (mem, imm, L0, L1 ... LN). Two interleavings are possible, analyzed below; a sketch of the resulting staleness check follows the two cases.

1) After acquiring the snapshot but before taking the SuperVersion, another thread's flush/compaction changes the SuperVersion.

In this case the reader simply obtains the latest SuperVersion.

2) After acquiring the snapshot and after taking the SuperVersion, another thread's flush/compaction changes the SuperVersion.

In this case the SuperVersion is somewhat stale, but it still contains all the data the snapshot needs. Why, then, bother to pick up the latest SuperVersion promptly at all? Mainly to reclaim obsolete SST files and memtables, improving memory and storage-space utilization.
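
That trade-off shows up as a version check in the read path. Here is a sketch of the check, mirroring the idea in ColumnFamilyData::GetThreadLocalSuperVersion; MyVersion, global_version_number and RefLatestGlobalVersion are invented for the sketch, and kInUse comes from the earlier read-protocol sketch:

#include <atomic>
#include <cstdint>

// Invented stand-in for SuperVersion: versioned and refcounted.
struct MyVersion {
  uint64_t version_number;
  void Unref();  // drops one reference; frees the object on the last one
};

// Bumped whenever switch/flush/compaction installs a new version.
std::atomic<uint64_t> global_version_number{0};

// Placeholder: slow path that takes the writer's mutex and refs the
// latest global version.
MyVersion* RefLatestGlobalVersion();

MyVersion* GetUsableVersion(ThreadLocalPtr* slot) {
  auto* v = static_cast<MyVersion*>(slot->Swap(kInUse));
  if (v == nullptr || v->version_number != global_version_number.load()) {
    // The local copy is missing or lags behind. An old version would still
    // serve this snapshot's reads correctly (case 2 above), but moving to
    // the latest lets obsolete memtables and SST files be freed sooner.
    if (v != nullptr) v->Unref();
    v = RefLatestGlobalVersion();
  }
  return v;  // use it for the read, then hand it back via Release
}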

Summary

RocksDB's thread-local cache is a very nice piece of engineering. Using a local cache can dramatically reduce read/write contention, especially in workloads where reads far outnumber writes, and the cost of maintaining the whole cache is low: only write operations need lock protection. As long as the system allows multiple versions of the shared variable to coexist and does not require strict real-time consistency, a thread-local cache is a good option for improving concurrent performance.

Origin: www.cnblogs.com/cchust/p/11562949.html