Common HBase big data interview questions

1. Why use HBase for storage

  • HBase (Hadoop Database) is a reliable, high-performance, scalable, column-oriented distributed database
  • HBase is closely tied to Hadoop. Hadoop's HDFS provides highly reliable underlying storage, Hadoop MapReduce provides HBase with high-performance computing, and ZooKeeper provides HBase with stable service and a failover mechanism. Peripheral products also integrate with HBase: Hive can be combined with HBase to make statistical processing of data in HBase easier, Sqoop provides HBase with convenient RDBMS data import so that traditional database data can be easily migrated to HBase, and high-performance in-memory compute engines such as Spark can help us process and analyze the data in HBase more quickly.

2. Rowkey design principles

1. Length principle

  • A Rowkey is a binary stream stored as byte[]; it can be any string, with a maximum length of 64KB. In practice it is usually 10-100 bytes and is generally designed with a fixed length. The shorter the better, preferably no more than 16 bytes, for the following reasons:
  • Data in an HFile is persisted as key-value pairs. If the Rowkey is too long, say more than 100 bytes, then for 10 million rows the Rowkeys alone occupy nearly 1GB, which greatly hurts HFile storage efficiency
  • The MemStore caches part of the data in memory. If the Rowkey field is too long, the effective memory utilization drops, less data can be cached, and retrieval efficiency falls
  • Current operating systems are 64-bit and align memory to 8 bytes; keeping the Rowkey at 16 bytes, an integer multiple of 8 bytes, makes the best use of the operating system's alignment

2. Uniqueness principle

  • The uniqueness of the Rowkey must be guaranteed in the design. Since HBase stores data as key-value pairs, inserting data with an existing Rowkey into the same table overwrites the original data.

3. Sorting principle

  • HBase sorts Rowkeys in ASCII (lexicographic byte) order, so we should make full use of this when designing Rowkeys, for example by zero-padding numeric fields to a fixed width (see the sketch below)
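For example, because the ordering is lexicographic on bytes, numeric parts of a Rowkey should be zero-padded to a fixed width so that byte order matches numeric order. A minimal sketch using a hypothetical "order-" key format:

```java
public class RowkeySortExample {
    public static void main(String[] args) {
        // Without padding, byte-wise ordering breaks numeric ordering:
        // "order-10" sorts before "order-9".
        System.out.println("order-10".compareTo("order-9") < 0);           // true

        // Zero-padded to a fixed width, byte order matches numeric order.
        System.out.println("order-000009".compareTo("order-000010") < 0);  // true
    }
}
```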

4. Hashing principle

  • The designed Rowkeys should be evenly distributed across the HBase nodes

3. HBase optimization

1. Table design

  • 1) At table creation time, pre-split the Regions, set a fixed-length Rowkey (e.g. 64 bytes), and keep column families to 2-3
  • 2) Set Max Versions, Time To Live, and Compact & Split parameters appropriately

2. Writing to tables

  • 1) Write concurrently with multiple HTables to improve throughput
  • 2) Tune HTable parameters: turn off auto-flush and flush manually to reduce I/O
  • 3) Use the write buffer (WriteBuffer)
  • 4) Batch writes to reduce network I/O overhead
  • 5) Multi-threaded concurrent writing, combined with a timed flush and the write buffer (writeBufferSize), ensures that when the data volume is small the data is still flushed within a short time (e.g. within 1 second), and that when the data volume is large the write buffer is flushed promptly once it fills (see the sketch below)
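As a concrete sketch of points 2) through 5), the snippet below uses the HBase client's BufferedMutator, which batches mutations in a client-side write buffer and sends them in bulk. The table name t1, column family f1, and the 4 MB buffer size are placeholders for this example, not values from the article:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchWriteExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Client-side write buffer (writeBufferSize); mutations are sent
            // in bulk when the buffer fills, reducing network I/O.
            BufferedMutatorParams params =
                new BufferedMutatorParams(TableName.valueOf("t1"))
                    .writeBufferSize(4 * 1024 * 1024);
            try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
                List<Put> puts = new ArrayList<>();
                for (int i = 0; i < 1000; i++) {
                    Put put = new Put(Bytes.toBytes(String.format("row%06d", i)));
                    put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("q"),
                            Bytes.toBytes("value" + i));
                    puts.add(put);
                }
                mutator.mutate(puts); // buffered batch write
                mutator.flush();      // manual flush; calling this from a
                                      // scheduled thread gives a timed flush
            }
        }
    }
}
```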

3. Reading from tables

  • 1) Read concurrently with multiple HTables to improve throughput
  • 2) Tune HTable parameters (e.g. scanner caching)
  • 3) Batch reads (see the sketch below)
  • 4) Release resources promptly
  • 5) Cache query results
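A minimal sketch of batch reads (point 3) with prompt resource release (point 4) via try-with-resources; the table t1 and family f1 are placeholders for this example:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchReadExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // try-with-resources releases the table and connection (point 4).
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t1"))) {
            List<Get> gets = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                gets.add(new Get(Bytes.toBytes(String.format("row%06d", i))));
            }
            // One batched call instead of 100 single Gets (point 3).
            Result[] results = table.get(gets);
            for (Result r : results) {
                byte[] v = r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("q"));
                if (v != null) {
                    System.out.println(Bytes.toString(r.getRow())
                            + " -> " + Bytes.toString(v));
                }
            }
        }
    }
}
```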

4. HBase read and write process

1. Metadata storage

  • There is a system table in HBase, hbase:meta, that stores HBase metadata; it can be viewed in the HBase Web UI.

  • This table records the Region addresses of each table, along with other information such as the Region name, the name of the table it belongs to, the start row key, the end row key, and server information. Each row in the hbase:meta table corresponds to a single Region.

  • ZooKeeper stores the location of the hbase:meta table, and the client finds hbase:meta through ZooKeeper. Since hbase:meta is itself an HBase table, it must be managed by some HRegionServer; the address of the HRegionServer serving "hbase:meta" is obtained mainly through ZooKeeper's "/hbase/meta-region-server" znode.

  • In this example, the hbase:meta table is located on hadoop103.

2. Reading process

  • The HBase read process is as follows:

  • 1) The client first accesses ZooKeeper to obtain the location of the hbase:meta table's Region, then reads the data in the meta table, which stores the Region information of user tables;

  • 2) Find the Region corresponding to the RowKey in the meta table;

  • 3) Find the RegionServer serving that Region;

  • 4) Locate the corresponding Region;

  • 5) Look for the data in the MemStore first; if it is not there, read the BlockCache;

  • 6) If it is not in the BlockCache either, read it from the StoreFile (HFile);

  • 7) Data read from a StoreFile is not returned to the client directly; it is first written to the BlockCache (for read efficiency on subsequent accesses) and then returned to the client.

3. Writing process

The HBase write process is as follows:

  • 1) The client accesses ZooKeeper to obtain the location (IP) of the hbase:meta table.
  • 2) Access the meta table and read its data.
  • 3) Based on the namespace (similar to a database in a relational database), table name, and RowKey, find in the meta table the Region the RowKey should be written to.
  • 4) Find the RegionServer serving that Region and send a write request.
  • 5) The HRegionServer first writes the data to the HLog (Write-Ahead Log), for data persistence and recovery.
  • 6) The HRegionServer writes the data to memory (the MemStore).
  • 7) Report success to the client.

From the write path it can also be seen that HBase returns success to the client as soon as the data is written to memory, so the response is very fast. This is why HBase writes data fast.

4. Data Flush Process

  • From the write path above, it can be seen that HBase writes data to the MemStore in memory and then returns to the client, rather than writing to disk directly. This is also why HBase inserts data quickly with very little disk I/O. So when does the data reach disk? MemStore space is limited: when the data in a MemStore reaches a threshold (128M by default; 64M in older versions), the RegionServer flushes the data to HDFS as an HFile, then frees the data in memory and deletes the corresponding historical data in the HLog. This operation is done by the RegionServer itself.

5. How to design Rowkeys to avoid hotspotting

1. Reversing

  • Store a fixed-length Rowkey reversed, so that the frequently changing part of the Rowkey comes first; this effectively randomizes the Rowkey distribution.
  • The usual example of Rowkey reversal is the mobile phone number: the reversed phone-number string can be used as the Rowkey, which avoids the hotspotting caused by the fixed beginnings of phone numbers (137x, 15x, etc.). The drawback is that it sacrifices the ordering of the Rowkeys (see the sketch below)
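A minimal sketch of the reversal technique, using a phone number purely as an example as above:

```java
public class ReverseRowkeyExample {
    // Reverse a fixed-length key so the frequently changing tail comes first.
    static String reversedRowkey(String phone) {
        return new StringBuilder(phone).reverse().toString();
    }

    public static void main(String[] args) {
        // "13712345678" -> "87654321731": numbers sharing the fixed "137"
        // prefix no longer cluster in one Region, at the cost of ordering.
        System.out.println(reversedRowkey("13712345678"));
    }
}
```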

2. Salting

  • Salting adds a prefix of random characters to each Rowkey, so that the data is scattered across multiple different Regions to achieve Region load balancing
  • For example, consider an HBase table with 4 Regions (note: with boundaries [,a), [a,b), [b,c), [c,)); the Rowkeys before salting are: abc001, abc002, abc003
  • We add the prefixes a-, b-, and c- respectively, and the salted Rowkeys become: a-abc001, b-abc002, c-abc003
  • Before salting, all these Rowkeys would land in the second Region by default, while the salted data is distributed across 3 Regions; in theory, throughput after salting is 3 times what it was. But because the prefixes are random, reading these rows back takes more time, so salting increases write throughput at the cost of extra read overhead (see the sketch below)
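A minimal salting sketch matching the example above; the three salt characters a, b, c are an assumption tied to the example's Region boundaries:

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltingExample {
    // Salt alphabet chosen to match the example's Region boundaries.
    private static final char[] SALTS = {'a', 'b', 'c'};

    // A random salt spreads writes across Regions, but a reader must
    // query every salt bucket to find a row (the read overhead above).
    static String saltedRowkey(String rowkey) {
        char salt = SALTS[ThreadLocalRandom.current().nextInt(SALTS.length)];
        return salt + "-" + rowkey;
    }

    public static void main(String[] args) {
        System.out.println(saltedRowkey("abc001")); // e.g. "b-abc001"
    }
}
```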

3. Hash or Mod

  • The advantage of using a hash instead of a random salt prefix is that a given row always gets the same prefix, which spreads the Region load while keeping reads predictable. A deterministic hash (for example, taking the first 4 characters of the MD5 as the prefix) lets the client reconstruct the complete Rowkey and fetch the desired row directly with a get operation
  • For example, hashing the original Rowkeys above with the MD5 algorithm and taking the first 4 characters as the prefix gives the following results:
  • 9bf0-abc001 (the MD5 of abc001 is 9bf049097142c168c38a94c626eddf3d, whose first 4 characters are 9bf0)
  • 7006-abc002
  • 95e6-abc003
  • If the first 4 characters are used as the boundaries of different partitions, the Rowkeys above will be distributed across 3 Regions. In practice, as the data volume grows, this design keeps the partitions balanced
  • If the Rowkey is numeric, the Mod approach can also be considered (see the sketch below)
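A minimal sketch of the MD5-prefix technique described above:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashPrefixExample {
    // Deterministic prefix: the first 4 hex chars of the MD5 of the key.
    static String hashedRowkey(String rowkey) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(rowkey.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.substring(0, 4) + "-" + rowkey;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Unlike a random salt, the same input always yields the same prefix,
        // so a reader can rebuild the full Rowkey and issue a direct Get.
        System.out.println(hashedRowkey("abc001")); // "9bf0-abc001" as above
    }
}
```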

6. The smallest storage unit in HBase

  • HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be distributed on different HRegionServers.
  • An HRegion is composed of one or more Stores, and each Store holds one column family. Each Store consists of one MemStore and 0 or more StoreFiles. Each StoreFile is stored on HDFS in HFile format; HFile is a binary Hadoop file format. A StoreFile is in fact a lightweight wrapper around an HFile, that is, the bottom layer of a StoreFile is an HFile

7. How HBase pre-partitions tables and what it is for

  • By default, a newly created HBase table has a single Region whose Rowkey range has no boundaries, that is, no startkey and no endkey. When data is written, all of it goes into this default Region. As data keeps growing, the Region can no longer bear the increasing volume and is split into 2 Regions. Two problems arise during this process:
  • 1. While data is written to a single Region, there is a write hotspotting problem.
  • 2. Region splits consume precious cluster I/O resources.
  • Based on this, we can create multiple empty Regions at table-creation time and define the start and end Rowkey of each Region. As long as our Rowkey design spreads writes evenly across the Regions, there is no write hotspotting, and the probability of natural splits is also greatly reduced (of course, as the data keeps growing, splits will still happen eventually). Creating HBase table partitions in advance like this is called pre-partitioning (pre-splitting)
  • Pre-split Regions can be created through the shell or Java code, as shown below
#The following is the shell approach
#Specify the split points explicitly
create 't1','f1',SPLITS=>['10','20','30','40']
#HexStringSplit specifies the split algorithm; -c 10 specifies the number of regions to create, -f specifies the column families in the table, separated by ":"
hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1
#Create partitions from a file and enable compression
create 'split_table_test',{NAME => 'cf',COMPRESSION => 'SNAPPY'},{SPLITS_FILE => 'region_split_info.txt'}
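Equivalently, pre-splitting can be done from Java code. A minimal sketch using the HBase 2.x Admin API with the same split points as the first shell example; the table t1 and family f1 are placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // 4 split points -> 5 Regions, as in the shell example above.
            byte[][] splitKeys = {
                Bytes.toBytes("10"), Bytes.toBytes("20"),
                Bytes.toBytes("30"), Bytes.toBytes("40")
            };
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f1"))
                    .build(),
                splitKeys);
        }
    }
}
```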

8. When HBase merges HFiles into larger files, and when it splits Regions

1. Merge

1. HFile merge

  • 1) Why HFile compaction is needed
  • We all know that HBase is a database supporting random reads and writes, yet the persistence layer it is based on, HDFS, only supports appending and whole-file deletion and cannot modify data in place. So how does HBase implement insert, update, delete, and query? The real situation is this: HBase uses a Log-Structured Merge Tree (LSM-Tree) architecture and is almost always appending. When you add a cell, HBase appends a new record to HDFS. When you modify a cell, HBase also appends a new record, just with a newer version number (or one you define yourself). When you delete a cell, HBase still appends a new record! This record carries no value, has type DELETE, and is called a tombstone. So when does the real deletion happen? Because the database accumulates many inserts, deletes, and updates during use, the continuity and sequential order of the data are inevitably degraded. To restore performance, HBase performs a compaction at intervals, and the objects being merged are the HFiles. In addition, as writes keep growing, flushes keep increasing and so do the HFile data files; too many files drive up the number of I/O operations per query, so HBase keeps trying to merge these files
  • There are two kinds of compaction: minor compaction and major compaction
  • Minor Compaction: merges several HFiles in a Store into one HFile. During this process, data past its TTL is removed, but manually deleted data is not. This kind of compaction is triggered frequently
  • Major Compaction: merges all the HFiles in a Store into one HFile. During this process, manually deleted data is truly removed, along with cell versions exceeding MaxVersions. This kind of compaction is triggered infrequently, by default once every seven days. Because Major Compaction consumes a lot of performance, you don't want it to happen during peak business hours; it is recommended to control the timing of Major Compaction manually
  • Note: Major Compaction merges the HFiles of one Store into one HFile; it does not merge all the HFiles in a Region into a single file

2. When compaction is triggered

  • Compaction is triggered at the following times:
  • The CompactionChecker thread periodically checks whether a compaction is needed (it is initialized in initializeThreads() when the RegionServer starts), checking every 10000 seconds by default (configurable)
  • Every MemStore flush on the RegionServer also checks whether a compaction is needed
  • Manual triggering, with the commands major_compact and compact

3. Compaction control parameters

  • The following configuration applies to version 2.0 and above
  • 1) Minor Compaction
| Property | Default | Meaning |
| --- | --- | --- |
| hbase.hstore.compaction.max | 10 | At most 10 store files can be selected for a single minor compaction |
| hbase.hstore.compaction.min | 3 | A minor compaction starts only when there are at least 3 eligible store files |
| hbase.hstore.compaction.min.size | | Store files smaller than this value are always added to the minor compaction candidates |
| hbase.hstore.compaction.max.size | | Store files larger than this value are excluded |
| hbase.hstore.compaction.ratio | 1.2 | Store files are sorted by file age (oldest to youngest); minor compaction always starts from the older store files |
  • StoreFiles are sorted by file age, and minor compaction always starts from the older store files. The selection formula is: a file is selected if its size < (sum of all candidate file sizes - this file's size) * ratio. In other words, if the file is smaller than the total size of the other (at most hbase.hstore.compaction.max) store files multiplied by the ratio, the store file is added to the minor compaction. If the number of files meeting the condition is greater than hbase.hstore.compaction.min, the compaction starts.
  • If a file is smaller than the minimum compaction size, there is no need to apply the formula above; it goes straight into the candidate list. The minimum compaction size is configured by hbase.hstore.compaction.min.size; if this item is not set, hbase.hregion.memstore.flush.size is used. The selected files must pass the filter above, and the number of files in a combination must be greater than hbase.hstore.compaction.min and less than hbase.hstore.compaction.max: with too few files the merge is not worthwhile and wastes resources; with too many it consumes so many resources the machine may not cope
  • The selection above can produce several eligible combinations of store files; the combination containing more files is the one compacted, and on a tie, the combination with the smaller total file size is chosen (a simplified sketch follows)
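A simplified sketch of the selection rule described above; this is illustrative only, not the actual HBase source, and the defaults used (ratio 1.2, min size 128MB, 3 to 10 files) follow the table above:

```java
import java.util.ArrayList;
import java.util.List;

public class MinorCompactionSelectionSketch {
    // sizes: candidate store-file sizes in bytes, oldest first.
    static List<Long> selectFiles(List<Long> sizes, double ratio, long minSize,
                                  int minFiles, int maxFiles) {
        long total = sizes.stream().mapToLong(Long::longValue).sum();
        List<Long> selected = new ArrayList<>();
        for (long size : sizes) {
            // Files below the minimum size skip the formula entirely;
            // otherwise apply: size < (total - size) * ratio.
            if (size < minSize || size < (total - size) * ratio) {
                selected.add(size);
            }
            if (selected.size() == maxFiles) {
                break;
            }
        }
        return selected.size() >= minFiles ? selected : List.of();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // The 300 MB file fails the ratio test; the three smaller files
        // are selected, which satisfies the minimum of 3 files.
        System.out.println(selectFiles(
                List.of(300 * mb, 90 * mb, 60 * mb, 30 * mb),
                1.2, 128 * mb, 3, 10));
    }
}
```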
  • 2) Major Compaction
  • hbase.hregion.majorcompaction: the Major Compaction period, in milliseconds; the default value is seven days
  • Note: although the mechanisms above control when Major Compaction happens, because it puts very heavy pressure on the system, it is recommended to turn off automatic Major Compaction (hbase.hregion.majorcompaction=0) and trigger it manually on a regular schedule instead.
  • The manual Major Compaction command is: major_compact, as follows:
#Compact all regions in a table:
hbase> major_compact 't1'
hbase> major_compact 'ns1:t1'
#Compact an entire region:
hbase> major_compact 'r1'
#Compact a single column family within a region:
hbase> major_compact 'r1', 'c1'
#Compact a single column family within a table:
hbase> major_compact 't1', 'c1'

2. Split

1) ConstantSizeRegionSplitPolicy (background)

  • Up to version 0.94, HBase had only one split policy. This policy is very simple: as the name suggests, it splits a Region at a fixed size. The parameter controlling it is:
hbase.hregion.max.filesize: the maximum size of a Region; the default is 10GB
  • When a single Region exceeds 10GB, HBase splits it into 2 Regions. Region sizes in a cluster using this policy are very uniform. Since the policy is so simple, it needs no further explanation.

2) IncreasingToUpperBoundRegionSplitPolicy (default after version 0.94)

  • Since version 0.94 the default policy has been IncreasingToUpperBoundRegionSplitPolicy. As the name suggests, it places a growing upper bound on the Region file size, calculated with the following formula:

  • Math.min(tableRegionCount^3 * initialSize, defaultRegionMaxFileSize)

  • tableRegionCount: the total number of Regions the table has across all RegionServers. initialSize: if hbase.increasing.policy.initial.size is defined, that value is used.

  • If it is not defined, 2 times the MemStore flush size is used: hbase.hregion.memstore.flush.size * 2

  • defaultRegionMaxFileSize: hbase.hregion.max.filesize used by ConstantSizeRegionSplitPolicy, which is the maximum size of the Region.

  • If hbase.hregion.memstore.flush.size is defined as 128MB, the upper limit on file size grows like this:

  • (1). With only one Region at the start, the upper limit is 256MB, because 1^3*128*2=256MB.

  • (2). With 2 Regions, the upper limit is 2GB, because 2^3*128*2=2048MB.

  • (3). With 3 Regions, the upper limit is 6.75GB, because 3^3*128*2=6912MB.

  • (4). And so on, until the computed upper limit reaches the 10GB defined by hbase.hregion.max.filesize.

  • When the number of Regions reaches 4, the computed upper limit (16GB) already exceeds 10GB, so the file-size limit no longer grows as the Region count increases. IncreasingToUpperBoundRegionSplitPolicy is the default configuration in the latest version (a sketch of this calculation follows).
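A minimal sketch of this upper-bound calculation, assuming a 128MB flush size and a 10GB maximum file size as in the example:

```java
public class SplitLimitExample {
    // Math.min(tableRegionCount^3 * initialSize, defaultRegionMaxFileSize),
    // with initialSize = 2 * flush size (256 MB) and a 10 GB cap assumed.
    static long splitLimitBytes(int tableRegionCount) {
        long initialSize = 2L * 128 * 1024 * 1024;
        long defaultRegionMaxFileSize = 10L * 1024 * 1024 * 1024;
        long grown = (long) Math.pow(tableRegionCount, 3) * initialSize;
        return Math.min(grown, defaultRegionMaxFileSize);
    }

    public static void main(String[] args) {
        // Prints 0.25 GB, 2.0 GB, 6.75 GB, then 10.0 GB (capped).
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " region(s): "
                    + splitLimitBytes(n) / (1024.0 * 1024 * 1024) + " GB");
        }
    }
}
```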

3) KeyPrefixRegionSplitPolicy (extended content)

  • Besides splitting simply and crudely by size, we can also define the split point ourselves. KeyPrefixRegionSplitPolicy is a subclass of IncreasingToUpperBoundRegionSplitPolicy that adds, on top of the former, control of the split point (splitPoint, the rowkey at which a Region is split). It guarantees that rowkeys with the same prefix are never split into two different Regions. The parameter used by this policy is:
KeyPrefixRegionSplitPolicy.prefix_length: the prefix length of the rowkey
  • This policy truncates the rowkey at the length defined by KeyPrefixRegionSplitPolicy.prefix_length and uses that prefix for grouping; data in the same group is never split into different Regions. For example, if the rowkeys are all 16 bytes and the first 5 bytes are designated as the prefix, rowkeys sharing the same first 5 bytes are kept in the same Region when it splits. The difference between splitting with the default policy and with KeyPrefixRegionSplitPolicy is shown below.

  • (figure: split result under the default policy)

  • (figure: split result under KeyPrefixRegionSplitPolicy with a 2-character prefix)

  • If all your data has only one or two prefixes, KeyPrefixRegionSplitPolicy is pointless and the default policy is better. If the data has many finely divided prefixes and queries mostly target a single prefix, such queries are prone to crossing Regions under the default policy, so KeyPrefixRegionSplitPolicy works better.

  • So the applicable scenario for this policy is: the data has many prefixes, queries are mostly scoped to a prefix, and queries spanning multiple prefixes are relatively rare.

4) DelimitedKeyPrefixRegionSplitPolicy (extended content)

  • This policy also inherits from IncreasingToUpperBoundRegionSplitPolicy and likewise splits by Rowkey prefix. The only difference: KeyPrefixRegionSplitPolicy uses a fixed number of leading characters of the rowkey, while DelimitedKeyPrefixRegionSplitPolicy uses a delimiter. Sometimes rowkey prefixes are not all the same length: for example, if you use the server name as the prefix, some servers are called host12 and some host1. In such scenarios it can be hard to require all prefixes to be a fixed length, and a fixed length is not easy to change later. DelimitedKeyPrefixRegionSplitPolicy gives you the freedom to define variable-length prefixes. To use this policy, add the following attribute to the table definition:
    DelimitedKeyPrefixRegionSplitPolicy.delimiter: the prefix delimiter. For example, if you define the delimiter as _, the prefixes of host1_001 and host12_999 are host1 and host12 respectively.

5) BusyRegionSplitPolicy (extended content)

  • None of the preceding split policies consider hotspotting. A hotspot means the Regions in the database are accessed unevenly: some Regions are accessed very frequently over a short period and carry heavy load; these are hot Regions. BusyRegionSplitPolicy was created for this scenario. How does it decide which Region is hot? First, the parameters it uses:
  • hbase.busy.policy.blockedRequests: the request-blocked rate, i.e. how severely requests are being blocked. The value ranges from 0.0 to 1.0; the default is 0.2, meaning 20% of requests are blocked.
  • hbase.busy.policy.minAge: the minimum age for splitting; a Region younger than this will not be split. This prevents a short-lived access peak from causing an unnecessary split when deciding whether to split, since a brief peak quickly falls back to normal levels. The unit is milliseconds; the default is 600000, i.e. 10 minutes.
  • hbase.busy.policy.aggWindow: the time window for computing busyness, in milliseconds; the default is 300000, i.e. 5 minutes. It is used to control the frequency of the computation. Whether a Region is busy is computed as follows:
  • If "current time - last check time >= hbase.busy.policy.aggWindow", then compute: requests blocked during this period / total requests during this period = the blocked-request rate (aggBlockedRate)
  • If "aggBlockedRate > hbase.busy.policy.blockedRequests", the Region is judged busy.
  • If your system often has hot Regions and you care a lot about performance, this policy may suit you: it relieves the pressure on hot Regions by splitting them. But splitting Regions by hotspot also brings a lot of uncertainty, because you don't know which Region will be split next (a simplified sketch of the busyness test follows).
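A simplified sketch of the busyness test described above; illustrative only, not the actual HBase source:

```java
public class BusyRegionCheckSketch {
    // Returns true if the Region should be considered busy.
    static boolean isBusy(long nowMillis, long lastCheckMillis,
                          long blockedInWindow, long totalInWindow,
                          long aggWindowMillis, double blockedThreshold) {
        // Only re-evaluate once a full aggregation window has elapsed.
        if (nowMillis - lastCheckMillis < aggWindowMillis || totalInWindow == 0) {
            return false;
        }
        double aggBlockedRate = (double) blockedInWindow / totalInWindow;
        return aggBlockedRate > blockedThreshold;
    }

    public static void main(String[] args) {
        // 300 of 1000 requests blocked over a full 5-minute window,
        // against the default 0.2 threshold -> busy.
        System.out.println(isBusy(600_000L, 0L, 300, 1000, 300_000L, 0.2));
    }
}
```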

6)DisableRegionSplitPolicy

  • This policy is not really a policy at all: if you look at its source, you will find a shouldSplit method that always returns false. Setting this policy means Regions never split automatically; you can still split them manually. What use is that? No matter which split policy is configured, when data first enters HBase only one Region is being filled with data; a Region must grow to a certain threshold before it splits according to the policy. But when a large amount of data pours in, the system may be splitting while heavy writes continue, and since splitting consumes a lot of I/O this can put real pressure on the database. If you know in advance how the table's Regions should be split, you can also define the split points (SplitPoint, the rowkey at the split boundary) in advance. For example, 26 letters can define 25 split points, so that data arriving in HBase is routed straight to its own Region. At that point you can turn off automatic splitting and use only manual splits. Manual splitting comes in two forms: pre-splitting and forced splits.

9. Why HBase queries are fast

  • The main reason lies in its architecture and underlying data structures: the LSM-Tree (Log-Structured Merge Tree), the partitioning of tables into Regions, and caching
  • The client can directly locate the HRegionServer holding the data to be queried, then match the data directly within one Region on that server, and part of this data is served from cache
  • HBase keeps newly written data in memory, where it is sorted; when memory fills up it is flushed to an HFile, whose contents are also sorted. Once data is written to an HFile, the in-memory copy is discarded. HFiles are optimized for sequential disk reads
  • HBase writes are fast because data is not actually written to a file immediately; it is first written to memory and then flushed to HFiles asynchronously, so from the client's perspective writes are very fast. Moreover, random writes are converted to sequential writes on flush, so the write speed is also very stable. Reads are fast because HBase uses the LSM-tree structure rather than a B or B+ tree: sequential disk reads are fast, whereas seeking is much slower by comparison. HBase's storage structure keeps the number of disk seeks within a predictable range, and reading any number of records contiguous with the queried rowkey incurs no extra seek overhead; for example, with five store files, at most 5 disk seeks are enough. For a relational database, even with an index, the number of disk seeks cannot be bounded in advance. Moreover, HBase reads check the BlockCache first (which uses the LRU, Least Recently Used, eviction algorithm); if the data is not found in the cache, the MemStore in memory is searched; only when neither contains the data is the HFile content loaded

Origin blog.csdn.net/sun_0128/article/details/108192423