HBase Best Practices - Cluster Planning

This article is published by NetEase Cloud.

 

Author: Fan Xinxin

This article is shared on this site only. For reprinting, please contact NetEase for authorization.

 

 

HBase itself has excellent scalability, and building large clusters is one of its natural strengths. In real online deployments, a cluster rarely serves a single business: many businesses run on one cluster and share its hardware and software resources. This raises three questions. First, which businesses should run together on a cluster to make the best use of the system's hardware and software resources? Second, for a given business, how should the cluster's hardware capacity be planned so that resources are not wasted? Finally, how many Regions should a given RegionServer host?

These questions have presumably puzzled many HBase users. Drawing on the shared experience of predecessors and the author's own practice, this article briefly analyzes these three questions, and I hope everyone can exchange ideas on these topics in depth!

 

Cluster business planning

 

Generally speaking, an HBase cluster rarely runs only one business. In most cases multiple businesses share the cluster, which in practice means sharing the system's hardware and software resources. Two major problems arise here. The first is resource isolation between businesses: each business should be logically isolated so that they do not affect one another. In a shared scenario, once one business sees heavy traffic for a period of time, its excessive consumption of system resources will inevitably affect the others. The second is how to maximize the utilization of system resources under sharing; ideally, all hardware and software resources in the cluster are used to the fullest extent. The former will not be discussed this time (a dedicated article will follow); this section focuses on the latter.

Maximizing cluster resource utilization starts from understanding what each business demands of the system. After sorting through typical online businesses, they can usually be divided into the following categories:

 

1. Disk-capacity-sensitive businesses: these place no great demands on read/write latency or throughput; the only real requirement is disk capacity. Most offline read/write analysis services fall into this category: the upper-layer application periodically writes a large batch of data, and later periodically reads large amounts of it back. Characteristics: offline write, offline read, needs disk capacity.

2. Bandwidth-sensitive businesses: most of these have high write throughput but little demand for read throughput. For example, in real-time log storage, the upper-layer application streams massive logs in real time through Kafka and requires real-time writes, while reads are generally offline analysis, or log retrieval when an upstream business hits an exception. Characteristics: online write, offline read, needs bandwidth.

3. IO-sensitive businesses: compared with the previous two types, these are generally more core businesses. They have higher requirements on read/write latency, especially read latency, which is usually expected to stay within 100ms, and some businesses demand even more. Examples include online message storage, historical order systems, and real-time recommendation systems. Characteristics: (offline) write, online read, needs memory and high-IOPS storage media.

(As for CPU resources: HBase itself is a CPU-sensitive system, with CPU mainly spent on data block compression/decompression, so all businesses share this requirement.)

 

If a cluster is to maximize resource utilization, one idea is to let the various businesses complement each other's strengths and weaknesses through reasonable arrangement, so that each gets what it needs. In practice the business types above can be mixed, and it is recommended not to place too many businesses of the same type in the same cluster. A theoretically more resource-efficient cluster mix is therefore: disk-capacity-sensitive businesses + bandwidth-sensitive businesses + IO-sensitive businesses.

Beyond maximizing resource utilization, cluster business planning also needs to consider actual operational requirements. It is reasonable to mix core and non-core businesses in the same cluster, but it is strongly recommended not to place too many core businesses in the same cluster at the same time, for two main reasons:

1. On the one hand, as the saying goes, 'one mountain cannot hold two tigers': core businesses sharing resources will inevitably compete, and no matter which business 'loses' that competition, it is not an outcome anyone wants to see;

2. On the other hand, it makes it easier for ops engineers to degrade service in special scenarios. For example, during big promotions such as Taobao's Double Eleven, a certain core business is expected to receive a large influx of traffic. To keep the core business stable under resource sharing, the other, non-core businesses have to give way: after full communication with the non-core business owners, their resource usage can be throttled, and in the extreme those non-core businesses can even be stopped outright. Imagine instead that many core businesses shared one cluster: which of them would be willing to yield?

At this point some readers may object: such a design will produce many small clusters. Indeed it will, and I believe that without resource isolation, small clusters cannot be avoided. Clusters that use 'rsgroup' for business-level resource isolation can grow very large, but the isolation distributes each business onto its own set of dedicated RegionServers, which in effect creates many small logical clusters, and the planning ideas proposed above apply to those small logical clusters as well.

 

Cluster capacity planning

Consider a RegionServer whose hard disk specification is 3.6T * 12 and whose total memory is 128G. In theory, does this configuration waste resources? If so, is the disk or the memory wasted? What does a reasonable disk/memory ratio look like, and what factors influence it?

A concept of 'Disk / Java Heap Ratio' needs to be introduced here: for each byte of Java heap on a RegionServer, how many bytes of disk should be provisioned to match it. Before explaining, here is the result first:

 

Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2

 

Under the default configuration:

1. RegionSize = 10G, parameter hbase.hregion.max.filesize;
2. MemstoreSize = 128M, parameter hbase.hregion.memstore.flush.size;
3. ReplicationFactor = 3, parameter dfs.replication;
4. HeapFractionForMemstore = 0.4, parameter hbase.regionserver.global.memstore.lowerLimit.

The calculation is: 10G / 128M * 3 * 0.4 * 2 = 192, meaning that each byte of Java heap on a RegionServer should be matched with 192 bytes of disk. Returning to the earlier question: of the 128G of total memory, take 96G as the RegionServer's Java heap, which corresponds to 96G * 192 = 18T of disk capacity. But the machine actually purchased is configured with 36T, which means that under the default configuration almost half of the disk would be wasted.
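As a sanity check, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula in the text; the parameter names mirror the HBase configuration keys quoted above:

```python
# Disk Size / Java Heap ratio under HBase's default configuration,
# as derived in the text above. All sizes are normalized to MB.

def disk_to_heap_ratio(region_size_gb=10,     # hbase.hregion.max.filesize (10G)
                       memstore_size_mb=128,  # hbase.hregion.memstore.flush.size (128M)
                       replication=3,         # dfs.replication
                       heap_fraction=0.4):    # hbase.regionserver.global.memstore.lowerLimit
    region_size_mb = region_size_gb * 1024
    return region_size_mb / memstore_size_mb * replication * heap_fraction * 2

ratio = disk_to_heap_ratio()
print(ratio)              # 192.0 -> 1 byte of heap matches 192 bytes of disk
print(96 * ratio / 1024)  # 18.0  -> 96G of heap matches 18T of disk
```

With the defaults this reproduces the 192:1 ratio and the 18T figure from the text.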

 

How is the formula derived?

 

Let's go back and see where the formula comes from. It is actually quite simple: compute the number of Regions from the disk-capacity dimension and from the Java-heap dimension, set the two equal, and the formula falls out, as follows:

Number of Regions from the disk-capacity dimension: Disk Size / (RegionSize * ReplicationFactor)

 

Number of Regions from the Java-heap dimension: Java Heap * HeapFractionForMemstore / (MemstoreSize / 2) (the division by 2 is because a memstore is, on average, only about half full before it flushes)

 

Disk Size / (RegionSize * ReplicationFactor) = Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)

 

=> Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2
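To see that the two dimensions really do meet, here is a small numeric check (a sketch only; the 96G heap / 18T disk pair is the balanced configuration computed earlier in the text):

```python
# Region count implied by each dimension, for the balanced configuration
# (96G Java heap, 18T disk) under HBase defaults. All sizes in MB.

GB = 1024
region_size = 10 * GB      # hbase.hregion.max.filesize
memstore_size = 128        # hbase.hregion.memstore.flush.size
replication = 3            # dfs.replication
heap_fraction = 0.4        # hbase.regionserver.global.memstore.lowerLimit

disk_size = 18 * 1024 * GB  # 18T of raw disk
java_heap = 96 * GB         # 96G of Java heap

regions_by_disk = disk_size / (region_size * replication)
# A memstore is on average about half full, hence MemstoreSize / 2.
regions_by_heap = java_heap * heap_fraction / (memstore_size / 2)

print(regions_by_disk, regions_by_heap)  # the two counts coincide
```

Both sides come out to the same Region count, which is exactly the condition the derivation imposes.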

 

What does this formula actually mean in practice?

 

1. Most directly, it tells you whether a given configuration wastes resources, i.e. whether memory resources and disk resources match.

 

2. Conversely, suppose the hardware is already fixed: for example, procurement bought machines with 128G of memory, of which 96G is allocated to the Java heap, and 40T of disk. The two obviously do not match. Can they be made to match by changing the HBase configuration? Yes: increase RegionSize or decrease MemstoreSize. For example, raise the default RegionSize from 10G to 20G; then Disk Size / Java Heap = 384, and 96G * 384 = 36T, which roughly matches the disk to the memory.

3. Furthermore, if memory and disk do not match under a given configuration, is it better in practice to 'waste' memory or to 'waste' disk? The answer is memory, because surplus memory is not really wasted. For example, suppose the machine's Java heap could be set as high as 126G but the total disk capacity is only 18T. Under the default configuration the Java heap would be over-provisioned, but the HBase configuration can be changed to hand the surplus memory to the read cache, BlockCache, so the Java heap is not actually wasted.
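The tuning described in point 2 above can be sketched as follows. Only hbase.hregion.max.filesize varies; everything else stays at the defaults quoted earlier:

```python
# Effect of raising hbase.hregion.max.filesize on the Disk/Heap ratio,
# with all other parameters at their defaults.

def disk_to_heap_ratio(region_size_gb, memstore_mb=128, replication=3, heap_fraction=0.4):
    return region_size_gb * 1024 / memstore_mb * replication * heap_fraction * 2

for region_size_gb in (10, 20):
    ratio = disk_to_heap_ratio(region_size_gb)
    disk_tb = 96 * ratio / 1024  # disk capacity matched by a 96G Java heap
    print(f"RegionSize={region_size_gb}G -> ratio={ratio:.0f}, 96G heap matches {disk_tb:.0f}T")
```

Doubling RegionSize from 10G to 20G doubles the ratio from 192 to 384, bringing 96G of heap in line with a ~36T machine, as in the text.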

 

Beyond disk and memory, there are other resources to keep in mind…

 

Bandwidth resources: because HBase consumes heavy network bandwidth during large scans and high-throughput writes, it is strongly recommended to deploy the HBase cluster on 10-Gigabit switches, preferably with a 10-Gigabit NIC plus bonding on each machine. If in special cases only Gigabit switches are available, make sure all RegionServer machines sit under the same switch; crossing switches introduces large write latency and seriously hurts write performance.

CPU resources: HBase is CPU-sensitive; both writes and reads consume computing resources through heavy compression and decompression. For HBase, the more CPU, the better.

Region planning

Region planning mainly involves two aspects: planning the number of Regions and planning the size of a single Region. The two are not independent but tied together: larger Regions mean fewer of them, smaller Regions mean more. Region planning is a concern of many HBase operators: for a RegionServer of a given specification, how many Regions should it run? This question long puzzled the author when first working with HBase. In practice, having too many or too few Regions each has advantages and disadvantages:

 

 

Lots of small Regions

Advantages:

1. More conducive to spreading load across the cluster.

2. Conducive to efficient and stable Compaction, because the HFiles in a small Region are small and Compaction is cheap. For details, see: Stripe Compaction.

Disadvantages:

1. The most direct impact: when a RegionServer crashes or restarts, reassigning and migrating a large number of small Regions is very time-consuming. A single Region migration takes roughly 1.5s to 2.5s; the more Regions there are, the longer the migration, which directly prolongs failover.

2. A large number of small Regions may flush more frequently, producing many small files and in turn unnecessary Compactions. In extreme scenarios, once the number of Regions crosses a threshold, a RegionServer-wide flush is triggered, severely blocking user reads and writes.

3. RegionServer management and maintenance costs are high.

A few large Regions

Advantages:

1. Conducive to fast RegionServer restart and crash recovery.

2. Can reduce the total number of RPCs.

3. Leads to fewer, larger flushes.

Disadvantages:

1. Compaction performs very poorly, causing large write jitter and poor stability.

2. Not conducive to load balancing across the cluster.

Clearly, under HBase's current working model, neither too many nor too few Regions is a good thing, and a compromise must be chosen for a real production environment. A recommended range from the official documentation is 20 to 200 Regions per RegionServer, with a single Region kept between 10G and 30G, which matches real-world practice well.

 

However, HBase does not let you directly configure the number of Regions on a RegionServer. The Region count is most directly determined by RegionSize, configured via hbase.hregion.max.filesize: once a Region grows beyond this value, HBase splits it.

 

The default value of hbase.hregion.max.filesize is 10G. If a RegionServer is expected to run 100 Regions, the estimated raw data volume on it is: 10G * 100 * 3 = 3T. Conversely, if a RegionServer is meant to hold 12T of data, then at 10G per Region it would end up with 400 Regions, which is clearly unreasonable; in that case hbase.hregion.max.filesize should be raised to 20G or 30G. In practice the disks that fit in a single physical machine keep growing, and 36T is now common. If all of that capacity is used for data and we still assume 100 Regions per RegionServer, each Region would reach a terrifying 120G, and any Compaction would be a disaster.
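The back-of-envelope numbers in this paragraph can be reproduced with a few small helpers (a sketch only; sizes use decimal T = 1000G, as the text does):

```python
# Region count vs. raw disk usage, including HDFS 3-way replication.

def raw_disk_gb(region_count, region_size_gb=10, replication=3):
    """Raw disk consumed by fully grown Regions, replication included."""
    return region_count * region_size_gb * replication

def region_count_for(disk_gb, region_size_gb=10, replication=3):
    return disk_gb / (region_size_gb * replication)

def region_size_for(disk_gb, region_count, replication=3):
    return disk_gb / (region_count * replication)

print(raw_disk_gb(100))              # 3000 -> 100 Regions at 10G occupy ~3T
print(region_count_for(12_000))      # 400.0 -> 12T at the 10G default means 400 Regions
print(region_count_for(12_000, 30))  # ~133 Regions after raising RegionSize to 30G
print(region_size_for(36_000, 100))  # 120.0 -> 36T over 100 Regions means 120G each
```

The last line shows why a fully used 36T machine with 100 Regions puts every Region at 120G.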

 

It follows that for today's HBase, if you want it to run smoothly (Region count between 20 and 200, single Region between 10G and 30G), the maximum raw data volume a RegionServer can hold is roughly 200 * 30G * 3 = 18T. Storing more than 18T will inevitably cause performance problems of one kind or another. So from the Region-planning perspective, the disk capacity a single RegionServer can reasonably use tops out at about 18T.
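The 18T ceiling quoted above is simply the product of the comfortable upper bounds:

```python
# Maximum raw data per RegionServer inside the comfortable operating band:
# at most 200 Regions, at most 30G per Region, 3-way HDFS replication.
max_regions = 200
max_region_gb = 30
replication = 3

ceiling_tb = max_regions * max_region_gb * replication / 1000  # decimal T, as in the text
print(ceiling_tb)  # 18.0
```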

 

However, as hardware costs keep falling, a single RegionServer can easily be configured with 40T+ of disk, and by the reasoning above much of that extra disk is just 'flowers in a mirror, the moon on the water'. The community is aware of this problem and has proposed the concept of a Sub-Region beneath the current Region concept, which can be understood simply as logically dividing a Region into many small Sub-Regions. The Region itself stays the same as before, but all Compactions that used to run at Region granularity run instead at the smaller Sub-Region granularity. This way a single Region can be made very large, say 50G or 100G, and a single RegionServer can store far more data. Personally, I think the Sub-Region feature will be a focus of HBase development.

 

Summary

Based on HBase theory and the author's practical experience, this article gave a brief analysis of the three most common problems in HBase cluster planning: business planning, capacity planning, and Region planning, hoping to offer some inspiration and food for thought. Planning online clusters is a matter of accumulated experience; every HBase operator will step into some pits sooner or later and form their own views. I hope everyone will share theirs in the comments or by email. Thanks!

 

 


 


 
