How to Avoid HBase Hotspotting (Solving the HBase Hotspot Problem)

HBase hotspotting occurs when a large amount of traffic from various clients is directed to a single node, or only a few nodes, in the cluster. It is usually caused by poor row key design. In this article, we will see how to avoid HBase hotspotting, also called region server hotspotting.

How Does HBase Hotspotting Occur?

HBase hotspotting occurs because of a poorly designed row key. With a bad row key, HBase stores a large amount of data on a single node, and all client traffic for that data is directed to this node while the other nodes sit idle.

This traffic may represent reads, writes, or other operations. All of it goes to the single machine hosting the region that contains the required data, which degrades performance and can sometimes make the region unavailable.

Your schema should be designed so that data is distributed evenly across all regions on all nodes available in the cluster.

How to avoid HBase Hotspotting?

So how do you avoid HBase hotspotting?

The answer lies in your schema and row key design. Design your row key so that data being written goes to multiple regions across the cluster.

There are several techniques for avoiding hotspotting, each with its own pros and cons.

Below are some of the techniques used to avoid hotspotting:

Salting

Salting means adding a randomly assigned value to the start of the row key. The number of different random values depends on the number of regions you want to spread the data across.

Salting is helpful when you have a small, fixed number of row keys that come up over and over again.

For example, suppose you have the following four row key values:

machine0001
machine0002
machine0003
machine0004

Suppose you would like to write these across four different regions. You can use the four letters a, b, c, and d as salts. The updated values would be:

a-machine0001
b-machine0002
c-machine0003
d-machine0004

The problem with salting is that when you add one more machine's details, the new row is randomly assigned one of the four values and ends up stored in one of the four regions, so the key alone no longer tells a reader which prefix to look under.
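The salting step can be sketched in a few lines of Python. This is a minimal illustration rather than HBase client code: the `SALTS` list and the `salt_key` helper are hypothetical names, and the sketch assumes one salt per target region, as in the example above.

```python
import random

# Assumption: one salt per region you want to spread writes across.
SALTS = ["a", "b", "c", "d"]

def salt_key(row_key, rng=random):
    """Prepend a randomly chosen salt so the key sorts into one of len(SALTS) ranges."""
    return f"{rng.choice(SALTS)}-{row_key}"

keys = ["machine0001", "machine0002", "machine0003", "machine0004"]
salted = [salt_key(k) for k in keys]

# Each salted key keeps the original key after the two-character prefix,
# but the prefix itself is unpredictable from the key.
for original, s in zip(keys, salted):
    print(original, "->", s)
```

Because the prefix is random, calling `salt_key` twice on the same key may give two different results, which is the read-side cost described above.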

Hashing

Hashing uses a hash function to assign the prefix instead of a random mechanism.

You can use a one-way hash function so that a given row is always "salted" with the same prefix, which spreads the load across RegionServers.

Reversing the Key

A third common technique for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often is first.

Reposted from: How to avoid HBase Hotspotting?

===========================================

Rowkey Design

36.1. Hotspotting

Rows in HBase are sorted lexicographically by row key. This design optimizes for scans, allowing you to store related rows, or rows that will be read together, near each other. However, poorly designed row keys are a common source of hotspotting. Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server as that host is unable to service the requested load. It is important to design data access patterns such that the cluster is fully and evenly utilized.
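To make the effect of lexicographic ordering concrete, the following sketch maps row keys to regions by their start keys. It is a simplified model, not HBase's region lookup: the split points in `REGION_STARTS`, the `region_for` helper, and the `event` key pattern are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical region start keys; region i covers [REGION_STARTS[i], REGION_STARTS[i+1]).
REGION_STARTS = ["", "g", "n", "t"]

def region_for(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(REGION_STARTS, row_key) - 1

# Sequential keys that share a prefix all sort into the same key range.
keys = [f"event{n:04d}" for n in range(1000)]
load = Counter(region_for(k) for k in keys)

print(dict(load))
```

In this model every one of the 1000 writes lands in the first region while the other three receive none, which is the hotspot pattern described above.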

To prevent hotspotting on writes, design your row keys such that rows that truly do need to be in the same region are, but in the bigger picture, data is being written to multiple regions across the cluster, rather than one at a time. Some common techniques for avoiding hotspotting are described below, along with some of their advantages and drawbacks.

Salting

Salting in this sense has nothing to do with cryptography, but refers to adding random data to the start of a row key. In this case, salting refers to adding a randomly-assigned prefix to the row key to cause it to sort differently than it otherwise would. The number of possible prefixes corresponds to the number of regions you want to spread the data across. Salting can be helpful if you have a few "hot" row key patterns which come up over and over amongst other more evenly-distributed rows. Consider the following example, which shows that salting can spread write load across multiple RegionServers, and illustrates some of the negative implications for reads.

Example 11. Salting Example

Suppose you have the following list of row keys, and your table is split such that there is one region for each letter of the alphabet. Prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region. This example focuses on rows with keys like the following:

foo0001
foo0002
foo0003
foo0004

Now, imagine that you would like to spread these across four different regions. You decide to use four different salts: a, b, c, and d. In this scenario, each of these letter prefixes will be on a different region. After applying the salts, you have the following rowkeys instead. Since you can now write to four separate regions, you theoretically have four times the throughput when writing that you would have if all the writes were going to the same region.

a-foo0003
b-foo0001
c-foo0004
d-foo0002

Then, if you add another row, it will randomly be assigned one of the four possible salt values and end up near one of the existing rows.

a-foo0003
b-foo0001
c-foo0003
c-foo0004
d-foo0002

Since this assignment will be random, you will need to do more work if you want to retrieve the rows in lexicographic order. In this way, salting attempts to increase throughput on writes, but has a cost during reads.
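The extra read work can be sketched as one scan per salt followed by a merge. This is an in-memory model only: the sorted list `table` stands in for an HBase table, and `scan_prefix` and `scan_in_order` are hypothetical helpers, not HBase API calls.

```python
import heapq

SALTS = ["a", "b", "c", "d"]

# Stand-in for a table: HBase keeps rows sorted by row key.
table = sorted(["a-foo0003", "b-foo0001", "c-foo0004", "d-foo0002"])

def scan_prefix(rows, prefix):
    """Yield the original keys (salt stripped) stored under one salt prefix."""
    for row in rows:
        if row.startswith(prefix):
            yield row[len(prefix):]

def scan_in_order(rows, salts):
    """Run one scan per salt and merge the sorted results into logical key order."""
    scans = (scan_prefix(rows, salt + "-") for salt in salts)
    return list(heapq.merge(*scans))

ordered = scan_in_order(table, SALTS)
print(ordered)
```

A single logical range scan becomes as many scans as there are salts, so the read cost grows with the number of prefixes chosen for write throughput.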

Hashing

Instead of a random assignment, you could use a one-way hash that would cause a given row to always be "salted" with the same prefix, in a way that would spread the load across the RegionServers, but allow for predictability during reads. Using a deterministic hash allows the client to reconstruct the complete rowkey and use a Get operation to retrieve that row as normal.

Example 12. Hashing Example

Given the same situation in the salting example above, you could instead apply a one-way hash that would cause the row with key foo0003 to always, and predictably, receive the a prefix. Then, to retrieve that row, you would already know the key. You could also optimize things so that certain pairs of keys were always in the same region, for instance.
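A deterministic prefix can be derived as in the sketch below. The choice of MD5, the bucket count, and the `hashed_key` helper are assumptions for illustration; any stable hash would serve, and the prefix a particular key receives depends on the hash used.

```python
import hashlib

# Assumption: four buckets, mapped to the prefixes a, b, c, and d.
NUM_BUCKETS = 4

def hashed_key(row_key, buckets=NUM_BUCKETS):
    """Prefix row_key with a letter derived from a stable hash of the key itself."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets          # same key always yields the same bucket
    prefix = chr(ord("a") + bucket)
    return f"{prefix}-{row_key}"

# The writer and the reader compute the same full key independently.
written = hashed_key("foo0003")
looked_up = hashed_key("foo0003")
print(written, written == looked_up)
```

Because the prefix is a function of the key, a client that knows the original key can rebuild the stored key and issue a single Get instead of scanning every prefix.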

Reversing the Key

A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first. This effectively randomizes row keys, but sacrifices row ordering properties.
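A short sketch of key reversal follows. The `reverse_key` helper and the eight-digit sample ids are hypothetical; the sketch assumes fixed-width keys, as the technique requires.

```python
def reverse_key(row_key):
    """Reverse a fixed-width key so its fastest-changing digit comes first."""
    return row_key[::-1]

# Sequential fixed-width ids share a long common prefix and sort next to each other.
sequential = ["00001230", "00001231", "00001232"]
reversed_keys = [reverse_key(k) for k in sequential]

# After reversal the keys differ in their first character, so they sort far apart.
for original, rev in zip(sequential, reversed_keys):
    print(original, "->", rev)
```

Reversing twice restores the original key, so point lookups still work, but consecutive ids are no longer adjacent and a range scan over the original order is lost.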

See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, the article on Salted Tables from the Phoenix project, and the discussion in the comments of HBASE-11682 for more information about avoiding hotspotting.

Reposted from: the official HBase documentation