HBase Summary (7): Understanding the LSM Tree

Before discussing the LSM tree, it helps to review three basic storage engines, because they show where the LSM tree comes from:

  • The hash storage engine is a persistent implementation of the hash table. It supports insert, delete, update, and random read operations, but it does not support sequential scans. The corresponding storage system is a key-value store. For key-value inserts and lookups, a hash table takes O(1) time on average, which is faster than the O(log n) of a tree operation. If you do not need to traverse the data in order, the hash table is your Mr. Right.
  • The B-tree storage engine is a persistent implementation of the B-tree (for the origin, data structure and application scenarios of the B-tree, see the previous blog post). It supports insert, delete, read, and update operations on a single record, and it also supports sequential scans (through the pointers that link the leaf nodes of a B+ tree). The corresponding storage system is the relational database (MySQL, etc.).
  • The LSM tree (Log-Structured Merge Tree) storage engine, like the B-tree storage engine, supports insert, delete, read, update, and sequential scan operations. In addition, it avoids random disk writes by writing data to disk in batches. Of course, everything has its pros and cons: compared with the B+ tree, the LSM tree sacrifices part of the read performance to greatly improve write performance.
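To make the difference between unordered and ordered engines concrete, here is a minimal in-memory sketch. It is only an illustration: `hash_store`, `sorted_keys`, and `range_scan` are names made up for this example, and a sorted key list stands in for the ordered leaf level of a B+ tree or an LSM tree.

```python
import bisect

# Hash-style store: O(1) average point lookups, but the keys carry no order.
hash_store = {"row3": "c", "row1": "a", "row2": "b"}

# Ordered store (stand-in for a tree's sorted leaf level): keys are kept sorted,
# so a range scan is one binary search followed by a sequential walk.
sorted_keys = sorted(hash_store)


def range_scan(start, stop):
    """Return (key, value) pairs with start <= key < stop, in key order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [(key, hash_store[key]) for key in sorted_keys[lo:hi]]


print(hash_store["row2"])          # point lookup, works for both kinds of engine
print(range_scan("row1", "row3"))  # ordered scan, needs the sorted keys
```

A pure hash engine would have to examine every key to answer the second query, which is why it is described as not supporting sequential scans.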

The comparison above shows where the LSM tree comes from. Its design idea is very simple: keep incremental modifications in memory, and once they reach a specified size limit, write them to disk in one batch. Because these writes are batched and sequential, write performance improves greatly. Reads are more troublesome: a read has to merge the historical data on disk with the recent modifications in memory, so it first checks whether the key is in memory and, if not, may have to access several disk files. As a rough, extreme-case comparison, the write performance of HBase, which is based on the LSM tree, is about an order of magnitude higher than that of MySQL, and its read performance is about an order of magnitude lower.
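The write and read paths just described can be sketched in a few lines. This is a toy model under simplifying assumptions, not HBase code: `TinyLSM`, `memtable_limit`, and `runs` are hypothetical names, the "disk files" are plain Python lists, and deletes are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM store: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent modifications, in memory
        self.runs = []                        # flushed sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # One batched, sequential write instead of many random ones.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # 1. Check memory first.
        if key in self.memtable:
            return self.memtable[key]
        # 2. Otherwise search the runs from newest to oldest.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # reaches the limit, so the memtable is flushed
db.put("a", 3)      # newer value for "a" stays in memory
print(db.get("a"), db.get("b"), db.get("z"))
```

Note how `put` never touches an existing run, while `get` may have to look in every run; that asymmetry is the write/read trade-off described above.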

The principle of the LSM tree is to split one large tree into N small trees. Data is first written to a small tree in memory. As that tree grows, it is flushed to disk, and the trees on disk are periodically merged into one large tree to optimize read performance.
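The periodic merge of small trees into a large one is essentially a k-way merge of sorted runs. A minimal sketch, assuming each small tree is represented as a sorted list of (key, value) pairs and that later runs are newer (the function name `compact` is made up for this example):

```python
import heapq


def compact(runs):
    """Merge sorted runs (oldest first) into one sorted run.

    When the same key appears in several runs, the value from the newest run wins.
    """
    # Tag each entry with the negated run index so that, among equal keys,
    # the entry from the newest run is produced first by the merge.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))  # first occurrence is the newest one
    return merged


old_run = [("a", 1), ("c", 3)]
new_run = [("a", 10), ("b", 2)]
print(compact([old_run, new_run]))
```

After the merge, a read only has to search one run instead of two, which is how compaction buys back read performance.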



 
These are essentially the main ideas behind HBase's storage design. They map onto HBase as follows:

  • Because the small tree is written to memory first, each write must also be persisted to a log on disk at the same time so that the in-memory data is not lost on failure. In HBase, the in-memory tree is the MemStore and the log is the HLog.
  • Once the tree in the MemStore reaches a certain size, it is flushed to disk (usually onto Hadoop DataNodes), where it becomes a StoreFile. The HRegionServer periodically merges these StoreFiles, which physically removes invalid data and reclaims its space. Multiple small trees are merged into one large tree at this point to improve read performance.
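The "log first, then memory" rule from the first bullet can be illustrated with a small sketch. It is not HBase's MemStore or HLog implementation: `LoggedMemStore` is a hypothetical class, the log is a local JSON-lines file, and log truncation after a flush is omitted.

```python
import json
import os
import tempfile


class LoggedMemStore:
    """Toy in-memory store that appends every write to a log file first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.mem = {}
        # Recovery: replay any edits that are already in the log.
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.mem[key] = value

    def put(self, key, value):
        # 1. Persist the edit to the log (a sequential append).
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it in memory.
        self.mem[key] = value


path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = LoggedMemStore(path)
store.put("row1", "v1")
store.put("row1", "v2")

# Simulate a crash: the in-memory state is gone, but the log is replayed.
recovered = LoggedMemStore(path)
print(recovered.mem)
```

Appending to a log is still a sequential write, so the durability guarantee does not bring back the random writes that the LSM design is trying to avoid.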

 

In the simplest two-level LSM tree, the data in memory is merged with the data on disk, as shown below.



Figure from the LSM paper

In theory, an LSM tree can merge part of the in-memory tree into the first-level tree on disk. Updating the on-disk tree directly, however, may break the contiguity of its physical blocks. In practice, an LSM tree generally has multiple levels, and when the small trees on disk are merged into a large tree, the data can be rewritten in order so that the blocks are contiguous again, which optimizes read performance.

In HBase's implementation, once the data in memory reaches a certain threshold, it is flushed to disk as a whole to form a new file, and this file is itself organized like a small B+ tree. HBase is generally deployed on HDFS, and HDFS does not support updating a file in place, so HBase flushes the whole in-memory tree as a new file instead of merging it into a small tree already on disk. This is a reasonable design. The small trees flushed to disk are then periodically merged into a large tree. Overall, HBase follows the idea of the LSM tree.
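One consequence of never updating a file in place is that a delete cannot erase the old value directly. It has to be written as a marker in a newer file and is only applied physically when the files are merged. A minimal sketch of that idea, assuming each flushed file is modeled as a Python dict and using a made-up `TOMBSTONE` marker (this illustrates the principle, not HBase's actual file format):

```python
TOMBSTONE = object()  # hypothetical delete marker written instead of a value


def merge_files(files):
    """Merge flushed files (oldest first) into one, applying delete markers."""
    merged = {}
    for flushed in files:       # newer files overwrite entries from older ones
        merged.update(flushed)
    # Drop deleted keys for good; this is where the space is reclaimed.
    return {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}


older = {"a": 1, "b": 2}
newer = {"b": TOMBSTONE, "c": 3}   # "b" was deleted after the first flush
print(merge_files([older, newer]))
```

Until the merge runs, both the old value of "b" and its marker occupy space on disk, which is why the periodic merge is what actually removes invalid data.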
