leveldb source code: an analysis of the overall architecture

I. Purpose of this article

This article analyzes the overall design of the leveldb code base. It does not cover the basic principles of leveldb; readers can find other articles on those. It describes how the underlying storage format, the in-memory data model, compaction, version management, snapshots and other mechanisms are implemented, and explains what each source file in the leveldb tree is responsible for, so that the reader can quickly get an overall grasp of leveldb.

 

II. Implementation of each mechanism

  1. leveldb's underlying storage format

    The on-disk data format of leveldb is already described in many online articles, so it is not repeated here. This section focuses on how upper-layer data gets written to disk.

    leveldb writes kv data to disk through compaction (CompactMemTable [compacts the in-memory table into an sst] & BackgroundCompaction [compaction between sst levels]). In both cases the upper layer persists the data through a TableBuilder object.

    Its two main interfaces:

      # Add: appends one record to the buffer

      # Finish: lays out the kv records in the table format (generating the filter, metaindex, index and footer blocks) and then writes each block to the sst file

    The general compaction flow that uses it:

      1. Compaction works in iterator fashion: it traverses the key-val pairs of the input sst files in sorted order.

      2. Expired records are discarded (for example, when a key has been modified several times only the latest record is kept, unless a snapshot still needs an older one; snapshots are covered later). Records that have not expired are added to the TableBuilder buffer through TableBuilder.Add.

      3. Once the buffered data reaches the size threshold (user specified), TableBuilder.Finish is called to generate the sst table data and persist it.
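
The flow above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not leveldb's TableBuilder: the names SketchTableBuilder, Record and CompactInto are hypothetical, the output "files" are plain strings, and whether a record has expired is reduced to a precomputed flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: `expired` stands in for the real drop decision.
struct Record {
  std::string key;
  std::string value;
  bool expired;
};

// Hypothetical stand-in for TableBuilder: Add() buffers, Finish() emits.
class SketchTableBuilder {
 public:
  void Add(const std::string& key, const std::string& value) {
    buffer_ += key + "=" + value + ";";
  }
  std::size_t BufferedBytes() const { return buffer_.size(); }
  // Returns the finished "file" contents and resets the builder.
  std::string Finish() {
    std::string out = buffer_ + "[footer]";
    buffer_.clear();
    return out;
  }

 private:
  std::string buffer_;
};

// Walks the sorted input, skips expired records, and cuts a new output
// "file" whenever the buffer reaches max_bytes.
std::vector<std::string> CompactInto(const std::vector<Record>& sorted_input,
                                     std::size_t max_bytes) {
  std::vector<std::string> files;
  SketchTableBuilder builder;
  for (const Record& r : sorted_input) {
    if (r.expired) continue;
    builder.Add(r.key, r.value);
    if (builder.BufferedBytes() >= max_bytes) files.push_back(builder.Finish());
  }
  // Flush whatever is left once the input is exhausted.
  if (builder.BufferedBytes() > 0) files.push_back(builder.Finish());
  return files;
}
```

With an 8-byte threshold, four one-character records of which one is expired end up in two output "files"; the real threshold is a configured file size and the real output is a block-structured sst.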

  2. Memory data model

    The in-memory data is managed and maintained with a skip list.

    # leveldb's skip list is defined as a generic data structure; the comparator used to compare node keys must be passed in from outside.

    # MemTable uses the skip list by composition and supplies a custom comparator object.

    # One point worth noting: the node MemTable stores in the skip list contains the key and the value together (key + val), while the skip list compares nodes only through the externally supplied comparator. MemTable's comparison function therefore has to extract the actual key from each node before comparing.
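
A minimal sketch of that idea, under loud assumptions: the node encoding here (one length byte, then key, then value) is made up for illustration, std::set stands in for the skip list, and EncodeEntry, ExtractKey, EntryKeyComparator and SketchMemTable are hypothetical names. leveldb itself uses varint length prefixes and a richer internal key.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical node encoding: one length byte, then the key, then the value.
std::string EncodeEntry(const std::string& key, const std::string& value) {
  assert(key.size() < 256);
  std::string entry;
  entry.push_back(static_cast<char>(key.size()));
  entry += key;
  entry += value;
  return entry;
}

// Pulls the key back out of an encoded node.
std::string ExtractKey(const std::string& entry) {
  std::size_t len = static_cast<unsigned char>(entry[0]);
  return entry.substr(1, len);
}

// The comparator handed to the ordered container: it never compares whole
// nodes, only the keys extracted from them.
struct EntryKeyComparator {
  bool operator()(const std::string& a, const std::string& b) const {
    return ExtractKey(a) < ExtractKey(b);
  }
};

// std::set stands in for the skip list in this sketch.
using SketchMemTable = std::set<std::string, EntryKeyComparator>;
```

Comparing the raw encoded nodes would sort by the length byte first, so "ab" would land after "b"; extracting the key first gives the intended order a, ab, b.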

    

 

  3. Compact process

    Compaction is divided into the following steps:

      1. Choose what to compact: the VersionSet.PickCompaction function decides which level needs compaction and which files in that level take part.

        # Why VersionSet makes this decision: VersionSet manages the information about leveldb's entire current file organization. It is described in detail in the version management section later.

      2. Build an iterator over the files to be compacted; the iterator visits all records in sorted key order.

      3. Traverse all records in order and decide for each one whether it is to be discarded. Records that must be kept are added to the sst buffer through TableBuilder.Add.

        A record is discarded when:

          # The key is not being seen for the first time in this pass, and the newer record of the same key that was already seen has a sequence number no greater than that of the oldest snapshot (so no snapshot needs the older record)

          # The record is a delete marker that no snapshot needs (leveldb additionally checks that no deeper level still holds data for the key)

      4. When the sst buffer is full, a complete sst file is generated in the sst format and persisted to disk.

      5. The file changes produced by the compaction (for example, old sst files to delete and newly generated sst files) are added to a version_edit for version management.

        # Note: no old sst files are cleaned up at this point. The edit only records which files this compaction has made obsolete; the actual deletion is performed by a separate step.

      6. Remove obsolete files

        # The new sst files being produced by the current compaction, plus the sst files of every version held by version management, are added to the livefiles set.

        # Every file that is not in the livefiles set is considered obsolete and is deleted.
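
The discard decision in step 3 can be sketched as below. This is a simplified illustration, not the code in db_impl: Entry and DropObsolete are hypothetical names, and the sketch assumes the output is the deepest level for every key, so a delete marker that no snapshot needs can be dropped outright.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified record; leveldb packs these fields into an
// internal key.
struct Entry {
  std::string user_key;
  std::uint64_t sequence;
  bool is_deletion;
};

// Input must be sorted by user_key ascending, then sequence descending,
// so the newest record of each key comes first.
// oldest_snapshot is the smallest sequence number held by any live snapshot
// (or the latest sequence number when there are no snapshots).
std::vector<Entry> DropObsolete(const std::vector<Entry>& sorted,
                                std::uint64_t oldest_snapshot) {
  std::vector<Entry> kept;
  bool have_key = false;
  std::string current_key;
  std::uint64_t newer_sequence = 0;  // sequence of the previous record of this key
  for (const Entry& e : sorted) {
    bool first_for_key = !have_key || e.user_key != current_key;
    if (first_for_key) {
      have_key = true;
      current_key = e.user_key;
    }
    bool drop = false;
    if (!first_for_key && newer_sequence <= oldest_snapshot) {
      drop = true;  // hidden by a newer record that every snapshot can see
    } else if (e.is_deletion && e.sequence <= oldest_snapshot) {
      drop = true;  // delete marker that no snapshot needs
    }
    newer_sequence = e.sequence;
    if (!drop) kept.push_back(e);
  }
  return kept;
}
```

For example, with the oldest snapshot at sequence 6 and records k1@9, k1@5, k1@3, the record k1@5 survives because the snapshot cannot see k1@9, while k1@3 is dropped because k1@5 already hides it from every reader.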

        

  4. Version management

    Version management is an extremely important module of leveldb. To understand leveldb as a whole you must understand why version management is needed and how it is implemented.

    4.1 Why version management is needed

      1. Consider a scenario: a user starts a read on an sst file and is halfway through reading the data when a compaction finishes. Because compaction runs on a separate thread, the sst file would be removed at that moment and the user's read would fail.

      In one sentence: version management manages the files on disk to guarantee the correctness of leveldb's data.

    4.2 How version management is implemented

      Basic concepts:

      version: a version records the set of data files at one point in time. For example, a compaction changes the file set (old sst files become obsolete, new sst files are created), so at the end of the compaction the files that make up the current database state must be recorded.

        # Main data structure: std::vector<FileMetaData*> files_[config::kNumLevels]; records the sst files of each level

      versionSet: every compaction produces a version, so these versions have to be managed. A doubly linked list holds the generated versions in order.

      versionEdit: an incremental change. It records the delta of one compaction: which files were added and which files must be removed. Previous Version + versionEdit yields the current Version.

        # Main data structures:

          # DeletedFileSet deleted_files_;

          # std::vector<std::pair<int, FileMetaData>> new_files_;

      The general idea of the implementation:
      Each compaction produces a versionEdit. Calling versionSet.LogAndApply applies that versionEdit to the current version to generate the newest version, which is then added to versionSet's list and becomes the current version.
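
The relation "previous Version + versionEdit = current Version" can be illustrated with a small sketch. The types below are hypothetical simplifications (a file is reduced to its number, and SketchVersion, SketchVersionEdit and Apply are made-up names); the real LogAndApply also writes the edit to the manifest, which is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

constexpr int kNumLevels = 7;  // assumed level count for this sketch

// Hypothetical simplified version: per level, the set of file numbers.
struct SketchVersion {
  std::set<std::uint64_t> files[kNumLevels];
};

// Hypothetical simplified edit: (level, file number) pairs.
struct SketchVersionEdit {
  std::set<std::pair<int, std::uint64_t>> deleted_files;
  std::vector<std::pair<int, std::uint64_t>> new_files;
};

// previous version + edit -> new version; the previous version is left
// untouched, so readers that still hold it see a stable file set.
SketchVersion Apply(const SketchVersion& base, const SketchVersionEdit& edit) {
  SketchVersion next = base;
  for (const auto& d : edit.deleted_files) next.files[d.first].erase(d.second);
  for (const auto& n : edit.new_files) next.files[n.first].insert(n.second);
  return next;
}
```

Keeping the old version immutable is what lets an in-flight read continue against the files it started with.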
    
      # Some questions:
      1. How is a version used?
        Every time files are used (for a read, say), the current version is obtained and its reference count is incremented; when the use (the read) completes, the reference count is decremented.
      2. When is a version deleted?
        When nothing references a version any more, it is automatically removed from versionSet's doubly linked list.
      3. Every compaction generates a version. Does that mean there are as many versions as there have been compactions?
        No. As question 2 shows, a version lives only as long as something references it; once there are no references it is deleted immediately.
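
The lifetime rule in these three answers can be sketched as a reference-counted node in a circular doubly linked list. RefCountedVersion and SketchVersionSet are hypothetical names, there is no locking, and the sketch assumes no reader still holds a reference when the set is destroyed.

```cpp
#include <cassert>

// Hypothetical reference-counted version node in a circular doubly linked list.
struct RefCountedVersion {
  int refs = 0;
  RefCountedVersion* prev = this;
  RefCountedVersion* next = this;

  void Ref() { ++refs; }
  void Unref() {
    assert(refs >= 1);
    if (--refs == 0) {
      prev->next = next;  // unlink from the list
      next->prev = prev;
      delete this;
    }
  }
};

class SketchVersionSet {
 public:
  SketchVersionSet() { AppendVersion(new RefCountedVersion()); }
  // Assumes no reader still holds a reference at this point.
  ~SketchVersionSet() { current_->Unref(); }

  RefCountedVersion* current() const { return current_; }

  // Makes v the current version. The set itself holds one reference on the
  // current version and releases its reference on the previous one.
  void AppendVersion(RefCountedVersion* v) {
    v->Ref();
    if (current_ != nullptr) current_->Unref();
    current_ = v;
    v->prev = head_.prev;  // link v at the tail, just before the dummy head
    v->next = &head_;
    v->prev->next = v;
    v->next->prev = v;
  }

  // Number of versions currently linked into the list.
  int LiveVersions() const {
    int n = 0;
    for (const RefCountedVersion* v = head_.next; v != &head_; v = v->next) ++n;
    return n;
  }

 private:
  RefCountedVersion head_;  // dummy head of the circular list
  RefCountedVersion* current_ = nullptr;
};
```

If a reader pins the current version and a compaction then installs a new one, both stay in the list until the reader calls Unref; a version that nobody pinned disappears as soon as it stops being current.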
    
  5. Snapshot mechanism

     # Managing data structure: a doubly linked list of snapshots; all snapshots are maintained in this list
     # Main fields of the snapshot data structure:
           SnapshotImpl* prev_;  // list linkage
           SnapshotImpl* next_;  // list linkage
           const SequenceNumber sequence_number_;  // sequence number of the snapshot
     # Main idea of the implementation:
       For a read through a snapshot, take the snapshot's sequence_number_. A key may have several values from multiple insertions; the read returns the val with the largest seq that does not exceed sequence_number_.
     # How does a snapshot guarantee that its records are not deleted?
       All deletion happens only during compaction. The compaction logic traverses every key-val in the files being compacted and decides for each one whether to discard it.
       One of the rules is that a record still referenced by a snapshot cannot be compacted away, which guarantees that data captured by a snapshot is never discarded.
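
The read rule described above (the largest seq that does not exceed sequence_number_) can be sketched as follows. VersionedValue and ReadAtSnapshot are hypothetical names, and the linear scan is only for clarity; leveldb gets the same result by seeking in data ordered by key and then by descending sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one write to a key: the sequence number it was
// written at and the value written.
struct VersionedValue {
  std::uint64_t sequence;
  std::string value;
};

// Fills *out with the value whose sequence number is the largest one not
// exceeding snapshot_sequence and returns true. Returns false if the key
// had no value yet when the snapshot was taken.
bool ReadAtSnapshot(const std::vector<VersionedValue>& history,
                    std::uint64_t snapshot_sequence, std::string* out) {
  const VersionedValue* best = nullptr;
  for (const VersionedValue& v : history) {
    if (v.sequence > snapshot_sequence) continue;  // written after the snapshot
    if (best == nullptr || v.sequence > best->sequence) best = &v;
  }
  if (best == nullptr) return false;
  *out = best->value;
  return true;
}
```

With writes at sequence 3, 7 and 12, a snapshot taken at sequence 10 reads the value written at 7, and a snapshot taken at sequence 2 finds nothing.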

III. Source code structure

  This section describes the directory layout of the source tree and the role of the key source files and directories, to show the structure of the code as a whole.

    cmake: cmake-related files;

    db: the main mechanisms, including version management, compaction, and the read and write paths;

    doc: documentation;
    helpers/memenv: a simple in-memory file system that provides directory and file operation interfaces;
    include/leveldb: the public headers that external projects include in order to use leveldb;
    port: platform-specific implementations, mainly posix/android support;
    table: defines the data structures for leveldb's persistent storage;
    util: common utility functions.

The focus here is on db & table; these two parts are the essence of leveldb.

  table: defines and implements the data format of leveldb's entire persistence layer

    # block + block_builder: define the block format and how blocks are built, including technical details such as restart points; access to a block is abstracted behind an iterator

    # filter_block: defines the implementation of the filter

    # two_level_iterator & iterator_wrapper & iterator & merger: define the various iterators, hiding the details of the underlying data access

      # two_level_iterator wraps an index iterator and a data iterator; in essence it is a double loop that traverses the keys in sorted order

      for (each entry in the index) {

        for (each record in the block that the current index entry points to)

      }

    # table & table_builder: define the sst file data format and the process of generating an sst
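
The double loop behind two_level_iterator can be flattened into a single cursor, roughly as in this sketch. Everything here is a hypothetical simplification: SketchTwoLevelIterator, KV and Block are made-up names, the "index" is just the position in a vector of blocks, and blocks are already in memory instead of being loaded on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;
using Block = std::vector<KV>;  // hypothetical data block: sorted records

// Hypothetical two-level cursor: the outer position walks the index (one
// entry per block), the inner position walks the records of that block.
class SketchTwoLevelIterator {
 public:
  explicit SketchTwoLevelIterator(const std::vector<Block>* blocks)
      : blocks_(blocks) {
    SkipExhaustedBlocks();
  }

  bool Valid() const { return block_ < blocks_->size(); }
  const KV& entry() const { return (*blocks_)[block_][pos_]; }

  void Next() {
    assert(Valid());
    ++pos_;
    SkipExhaustedBlocks();
  }

 private:
  // Advances the outer position whenever the current block has no more
  // records, which also skips empty blocks.
  void SkipExhaustedBlocks() {
    while (block_ < blocks_->size() && pos_ >= (*blocks_)[block_].size()) {
      ++block_;
      pos_ = 0;
    }
  }

  const std::vector<Block>* blocks_;
  std::size_t block_ = 0;
  std::size_t pos_ = 0;
};
```

A caller loops with Valid() and Next() and never sees the block boundaries, which is the point of hiding the two levels behind one iterator.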

 

  db: implements the mechanisms described above, including version management, compaction and crash recovery

    # db_impl: implements the database interface and database features such as compaction

    # log_format & log_reader & log_writer: define the log file format and how it is read and written; the log file is used for crash recovery

    # memtable & skiplist: define how leveldb keeps data ordered in memory using a skip list

    # snapshot: manages snapshots, using a linked list

    # version_edit & version_set: implement version management

    

 

             
 

    

   

      

 


Origin www.cnblogs.com/chenhao-zsh/p/11616838.html