Illustrated Data Structures and Algorithms Series (NoSQL Storage: LSM Tree)

The "Illustrated Data Structures and Algorithms" series mainly uses pictures to describe common data structures and algorithms, so that they are easy to read, understand, and master. The series covers dozens of topics, including various stacks, queues, linked lists, trees, graphs, and sorting algorithms.

About LSM tree

LSM tree stands for Log-Structured Merge-Tree. Strictly speaking, it is not one specific data structure but rather a design approach for storage structures. Many NoSQL databases are built on the core idea of the LSM tree and differ only in the details of their implementation. I did not originally plan to include it in this series, but a reader messaged me several times asking me to cover it, so here it is.

Background of the LSM tree

Traditional relational databases use a B-tree or one of its variants as the storage structure, which supports efficient lookups. When stored on disk, however, it has an obvious drawback: nodes that are logically adjacent may be physically far apart, which can cause a large number of random disk reads and writes. Random reads and writes are much slower than sequential ones, so to improve IO performance we need a mechanism that turns random operations into sequential ones. That mechanism is the LSM tree. It lets us write to disk sequentially, which greatly improves write performance, at the cost of some read performance.

About Disk IO

Reading or writing data on a disk starts with locating an address, which generally consists of three parts: cylinder number, disk surface number, and block number. The arm first moves to the specified cylinder according to the cylinder number, the track on that cylinder is selected according to the surface number, and finally the track segment specified by the block number rotates under the head, at which point reading or writing can begin.

The whole process has three main time costs: seek time + latency time + transmission time. These are, respectively, the time to position the arm on the cylinder, the time for the specified block to rotate under the head, and the time to transfer the data to memory. Seek time is the most expensive part of disk IO, so reducing it improves performance significantly.
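To make this cost model concrete, here is a rough calculation. The three timing constants are illustrative assumptions, not measurements of any real drive, and the function names are made up for this sketch. It compares reading blocks scattered across the disk with reading the same number of blocks laid out contiguously.

```python
# Assumed, illustrative figures only (not measured on real hardware).
SEEK_MS = 9.0                  # time to move the arm to the cylinder
LATENCY_MS = 4.0               # time for the block to rotate under the head
TRANSFER_MS_PER_BLOCK = 0.05   # time to transfer one block to memory


def random_read_ms(blocks):
    """Every block pays seek time + latency time + transmission time."""
    return blocks * (SEEK_MS + LATENCY_MS + TRANSFER_MS_PER_BLOCK)


def sequential_read_ms(blocks):
    """One seek and one rotational wait, then the blocks stream back to back."""
    return SEEK_MS + LATENCY_MS + blocks * TRANSFER_MS_PER_BLOCK


blocks = 1000
print(f"random:     {random_read_ms(blocks):.0f} ms")
print(f"sequential: {sequential_read_ms(blocks):.0f} ms")
```

Under these assumptions the sequential read is dominated by transfer time, while the random read is dominated by seeking and waiting, which is the gap the LSM tree tries to exploit.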

LSM tree principle

An LSM tree consists of two or more storage structures. For ease of explanation, this article uses the simplest case of two. One structure resides in memory and is called the C0 tree; it can be any data structure that supports convenient key-value lookup, such as a red-black tree, a map, or even a skip list. The other resides on disk and is called the C1 tree; its structure is similar to a B-tree. Every node of C1 is 100% full, and the node size is the disk block size.

Insertion steps

The general idea is:

1. When a new record is inserted, first append the operation to the log file so that it can be used for recovery later. The log is append-only, so this is very fast.
2. Insert the index entry for the new record into C0. This is done in memory and involves no disk IO.
3. When the size of C0 reaches a threshold, or at periodic intervals, roll the records in C0 into C1 on disk.
4. With more than two storage structures, when C1 grows large enough it is merged into C2, and so on, all the way up to Ck.
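The steps above can be sketched as follows. This is a minimal toy, not how any particular database does it: the class name `TinyLSM`, the threshold of 4, the Python list standing in for the log file, and the dict and sorted list standing in for C0 and C1 are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy two-component LSM: C0 in memory, C1 standing in for the disk tree."""

    def __init__(self, threshold=4):
        self.threshold = threshold  # hypothetical C0 size limit
        self.log = []               # stands in for the append-only log file
        self.c0 = {}                # in-memory component (a dict, for brevity)
        self.c1 = []                # sorted (key, value) pairs, stands in for disk

    def put(self, key, value):
        self.log.append((key, value))  # 1. append to the log first
        self.c0[key] = value           # 2. update C0 in memory, no disk IO
        if len(self.c0) >= self.threshold:
            self._merge_c0_into_c1()   # 3. roll C0 into C1

    def _merge_c0_into_c1(self):
        merged = dict(self.c1)
        merged.update(self.c0)         # newer values from C0 win
        self.c1 = sorted(merged.items())
        self.c0.clear()
        self.log.clear()               # these records no longer need recovery


db = TinyLSM(threshold=4)
for key in "AELRU":
    db.put(key, key.lower())

print(db.c1)  # the first four keys, merged into C1 in sorted order
print(db.c0)  # the fifth key is still only in memory
```

A real implementation would write the log and C1 to files and merge block by block rather than rebuilding C1 in one go; the toy only shows the order of operations.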

Merge steps

The merge process uses two blocks: the emptying block and the filling block.

1. Read an unmerged leaf node of C1 from disk into the in-memory emptying block.
2. Take nodes from C0 in ascending key order, merge-sort them with the emptying block, save the merged result into the filling block, and delete the corresponding nodes from C0.
3. Repeat step 2, continuing to add merge-sorted results to the filling block. When the filling block is full, append it to a new location on disk; note that it is appended, not written over the original node. When the emptying block has been used up during the merge, read the next unmerged leaf node from C1.
4. Once all leaf nodes of C0 and C1 have been merged in this way, the merge is complete.
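The merge procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `rolling_merge` is hypothetical, keys stand in for whole records, C1 leaf nodes are plain Python lists, and "appending to a new disk location" is modelled by appending to a new list. The sample keys are the ones used in this article's walkthrough.

```python
from heapq import merge


def rolling_merge(c1_leaves, c0_keys, block_size=4):
    """Merge sorted C1 leaf blocks with C0 keys into newly written blocks."""

    def emptying_blocks():
        # Each C1 leaf is read in turn and plays the role of the emptying block.
        for leaf in c1_leaves:
            yield from leaf

    new_leaves = []   # blocks appended at new locations, originals untouched
    filling = []      # the filling block
    last = None
    for key in merge(emptying_blocks(), sorted(c0_keys)):
        if key == last:
            continue  # key present in both components: keep a single copy
        last = key
        filling.append(key)
        if len(filling) == block_size:
            new_leaves.append(filling)  # filling block is full: append it
            filling = []
    if filling:
        new_leaves.append(filling)      # final, possibly partial block
    return new_leaves


print(rolling_merge([["A", "E", "L", "R"]], ["B", "F", "N", "T"]))
# [['A', 'B', 'E', 'F'], ['L', 'N', 'R', 'T']]
```

Because both inputs are consumed in key order, the output blocks are produced in key order too, which is what allows them to be written sequentially.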

Optimizations

This article explains the basic principle of the LSM tree. In practice, real implementations apply many optimization strategies, and many papers have been written on optimizing LSM trees. For example, a Bloom filter can be used to quickly determine whether a key exists, and additional indexes can help locate records quickly.
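As an illustration of the Bloom filter idea, here is a minimal sketch. The class name, the bit-array size, the number of hash functions, and the use of SHA-256 to derive positions are all assumptions chosen for readability rather than what any particular database uses. The key property is that a negative answer is definite, so the disk component can be skipped, while a positive answer only means "possibly present".

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was definitely never added.
        # True means it may have been added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ["A", "E", "L", "R"]:
    bf.add(key)

print(bf.might_contain("E"))  # True: added keys are never reported as absent
```

A lookup would consult such a filter before reading a disk component and skip the read when `might_contain` returns False.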

Insertion

Insert the following keys into the LSM tree:

A E L R U

Each key is first inserted into the in-memory C0 tree, which here is an AVL tree. To insert "A", first append a record to the log file on disk, then insert it into C0.

Insert "E" in the same way: append to the log first, then write to memory.

Continue by inserting "L", which triggers a rotation.

Insert "R" and "U", which triggers one more rotation.
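The rotations in this walkthrough can be reproduced with a small AVL insert. The sketch below is a textbook AVL insertion written for illustration; all names in it are hypothetical, and it leaves out the log write that precedes each insert.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                      # key already present
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                      # left-heavy
        if key > node.left.key:
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                     # right-heavy
        if key < node.right.key:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def shape(n):
    """Return the tree as nested (key, left, right) tuples."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))


root = None
for key in "AELRU":
    root = insert(root, key)
print(shape(root))
# ('E', ('A', None, None), ('R', ('L', None, None), ('U', None, None)))
```

Inserting "L" rotates "E" up to the root, and inserting "U" rotates "R" above "L", leaving "E" at the root with "A" on the left and "R" (with children "L" and "U") on the right.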

Suppose a merge is now triggered. Since the C1 tree does not exist yet, the emptying block is empty, and nodes are taken directly from the C0 tree in ascending order, starting from the smallest. The filling block has length 4, assuming a disk block size of 4.

Find the smallest node and place it in the filling block.

Continue with the second node.

Carry on in this way until the filling block is full.

Then write it to disk as the C1 tree.

Next, insert

B F N T

For each key, first write the log, then insert it into the C0 tree in memory.

If a merge is performed at this point, first load the leftmost leaf node of C1 into the emptying block.

Next, merge-sort the nodes of the C0 tree with the emptying block. "A" goes into the filling block first.

Then "B".

The merge sort continues until the filling block is full.

Append the filling block to a new location on disk and delete the original nodes.

Continue the merge sort and fill the filling block again.

Append this filling block to a new location on disk as well. Upper-level nodes are likewise written in units of one disk block (or several disk blocks), avoiding random writes as far as possible. In addition, since the merge may cause upper-level nodes to be updated, they can be held in memory temporarily and written out later at a suitable time.

Find operation

The overall idea of a lookup is to search the in-memory C0 tree first; if the key is not found there, search the C1 tree on disk, then the C2 tree, and so on.

Suppose we are looking for "B". First search the C0 tree; it is not found.

Then search the C1 tree, starting from the root.

"B" is found.
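The lookup order can be sketched as a loop over the components from newest to oldest. In this illustration the function name `find` is hypothetical, plain dicts stand in for the C0 and C1 trees, and the contents of the two components are assumed sample data.

```python
def find(key, components):
    """Search C0, then C1, then C2, ...; return (level, value) for the first hit."""
    for level, component in enumerate(components):
        if key in component:
            return level, component[key]
    return None  # not present in any component


# Assumed sample contents: C0 in memory, C1 standing in for the disk tree.
c0 = {"U": "u"}
c1 = {k: k.lower() for k in "ABEFLNRT"}

print(find("B", [c0, c1]))  # (1, 'b'): missed in C0, found in C1
print(find("Z", [c0, c1]))  # None
```

Searching the newest component first also means that, when a key exists at several levels, the most recent version is the one returned.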

Deletion

To make deletion fast, it is implemented mainly by marking: the record to be deleted is marked in memory, and the corresponding record is physically removed later, asynchronously, when a merge is executed.

For example, to delete "U", and assuming # is the deletion mark, the "U" node in the C0 tree is marked with #.

If the C0 tree has no such record, a node is created in the C0 tree and marked with #. A lookup then finds in memory that the record has been deleted and does not need to go to disk. For example, to delete "B", there is no need to operate on the disk: simply insert a "B" node into the C0 tree and mark it with #.
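Deletion by marking can be sketched as follows. The `TOMBSTONE` sentinel plays the role of the # mark; it and the function names are hypothetical, and dicts again stand in for the C0 and C1 trees with assumed sample contents.

```python
TOMBSTONE = object()  # plays the role of the "#" deletion mark


def delete(c0, key):
    # Mark the key in memory, whether or not C0 already holds it.
    c0[key] = TOMBSTONE


def find(key, c0, c1):
    if key in c0:
        value = c0[key]
        return None if value is TOMBSTONE else value  # deleted: skip the disk
    return c1.get(key)


def merge_into_c1(c0, c1):
    """Apply C0 to C1; marked records are physically dropped at this point."""
    merged = dict(c1)
    for key, value in c0.items():
        if value is TOMBSTONE:
            merged.pop(key, None)
        else:
            merged[key] = value
    return dict(sorted(merged.items()))


c0 = {"U": "u"}
c1 = {k: k.lower() for k in "ABEFLNRT"}

delete(c0, "U")  # "U" is in C0: its node is marked
delete(c0, "B")  # "B" is only on disk: a marked node is inserted into C0

print(find("B", c0, c1))            # None, answered from memory
print(list(merge_into_c1(c0, c1)))  # "B" is gone after the merge
```

Until the merge runs, the record for "B" still occupies space in C1; the mark in C0 only hides it from lookups.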

 
Reprinted from: http://www.qhmoney.cn/media/104227_all.html

Origin www.cnblogs.com/xibuhaohao/p/11880344.html