Key points of HBase tuning

 

One, Server-side tuning

1, Parameter configuration

1), hbase.regionserver.handler.count: the number of RPC handler threads on the RegionServer. The default is 10 and can usually be raised, for example to 150. However, when individual requests are large (multi-MB puts, or scans with a large caching value), setting it too high uses too much memory and can cause frequent GC or even OutOfMemoryError, so bigger is not always better.

2), hbase.hregion.max.filesize: the maximum region size. The default in 0.94.12 is 10G. The right value depends on the total amount of data the cluster must hold: if the total volume is small, overly large regions hurt parallel processing; if the cluster must hold a large volume, overly small regions mean too many regions and high region-management overhead. For example, with 12 x 3T disks per RegionServer (36T raw) and a replication factor of 3, each RS can store roughly 10T of data; at 10G per region that is about 1000 regions, so the 0.94.12 default is quite reasonable. If you want to manage splits yourself, raise this value and, when creating the table, plan the number of regions together with the rowkey design so the table can be pre-split; then for a given period each region's data stays below a bounded size. When a region grows too large, or the whole table needs to grow, run the split yourself. HBase clusters that serve online traffic generally disable automatic splitting and manage splits manually, as in the sketch below.
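A minimal sketch of pre-splitting at table-creation time with the HBase Java client (2.x-style API; on 0.94 the same is done with HTableDescriptor/HColumnDescriptor or with the SPLITS option in the shell). Table name, column family, and split points are made up for illustration:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("my_table"))  // hypothetical table
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f"));      // short family name
            // Pre-create 16 regions by handing HBase 15 explicit split points,
            // so the table starts with a planned region layout instead of one big region.
            byte[][] splitKeys = new byte[15][];
            for (int i = 1; i <= 15; i++) {
                splitKeys[i - 1] = Bytes.toBytes(String.format("%02d", i));       // "01" .. "15"
            }
            admin.createTable(table.build(), splitKeys);
        }
    }
}
```

This only helps if the rowkeys really are distributed across the chosen split points, which is why rowkey design and pre-splitting have to be planned together.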

3), hbase.hregion.majorcompaction: the interval between automatic major compactions. The default is one day; it can be set to 0 to disable automatic major compaction, and compactions can then be triggered manually or by a periodic script. There are two kinds of compaction, minor and major: a minor compaction merges several small adjacent StoreFiles into one larger StoreFile and does not remove data marked as deleted or expired data; a major compaction removes the data that needs deleting and rewrites all the data of a store into a single StoreFile, which carries a large performance cost.
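If automatic major compaction is disabled (hbase.hregion.majorcompaction = 0), it can be triggered from a script during off-peak hours, e.g. with the shell command major_compact 'my_table' or, as a sketch, with the Java Admin API (table name is illustrative):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ManualMajorCompaction {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Asks every region of the table to major-compact; the call returns quickly,
            // the actual StoreFile rewrite happens on the region servers.
            admin.majorCompact(TableName.valueOf("my_table"));
        }
    }
}
```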

4), hbase.hstore.compactionThreshold: when the number of StoreFiles in an HStore reaches this value, a compaction may be triggered. The default is 3; it can be raised, for example to 6, leaving the remaining files to be merged by the periodic major compactions.

5), hbase.hstore.blockingStoreFiles: when the number of StoreFiles in an HStore exceeds this value, memstore flushes are blocked until a split or compaction completes, unless the wait exceeds hbase.hstore.blockingWaitTime. The default is 7; raise it, for example to 100, so that under heavy writes a memstore that cannot be flushed in time does not end up blocking write operations.

6), hbase.regionserver.global.memstore.upperLimit: default 0.4, the upper bound on the fraction of the RegionServer's heap that all memstores together may occupy. When this value is reached, the RS flushes the regions with the largest memstores until total memstore usage drops back below the limit, and writes are blocked until it does. For a write-heavy cluster this can be raised, but not by much, because the combined fraction of block cache plus memstore must not exceed 0.8, and it is best not to be at or near 0.8, to avoid OOM. For write-heavy workloads 0.45 is reasonable, keeping memstore.lowerLimit at 0.35; for read-heavy workloads it can be lowered to 0.35 with memstore.lowerLimit at 0.3, or another 0.05 lower, but not too low unless there are almost no writes. For mixed read/write workloads use the defaults. As a rough example, with a 10G RegionServer heap the default 0.4 allows about 4G of memstores and the default 0.25 block cache another 2.5G, i.e. 0.65 of the heap in total, safely below 0.8.

7), hbase.regionserver.global.memstore.lowerLimit: default 0.35, the lower bound on the fraction of the RS heap used by all memstores. When memstore usage reaches this value, the RS starts flushing the regions with the largest memstores. Configure it together with memstore.upperLimit and the block cache settings.

8), hfile.block.cache.size: the fraction of the RS heap used for the block cache, default 0.25. For read-heavy workloads it can be raised; the exact value should be determined by testing against the cluster's actual workload and must be weighed against the memstore fraction.

9), hbase.hregion.memstore.flush.size: default 128M, in bytes; once a memstore exceeds this size it is flushed to HDFS. The default is moderate and usually does not need adjusting.

10), hbase.hregion.memstore.block.multiplier: default 2. If a region's memstore grows beyond multiplier x hbase.hregion.memstore.flush.size (256M with the defaults), writes to that region are blocked until the memstore is flushed back below that size. To avoid such blocking it is best to raise the value, for example to 4, but not too far: too large a value increases the chance that total memstore memory on the RS exceeds memstore.upperLimit, which would block writes to the entire RS. When one region blocks, many handler threads pile up blocked on that region, the threads available to other regions shrink, and the overall service capacity of the RS drops. For example:

Blocking begins (RegionServer log excerpt, not reproduced here).

Blocking is released (RegionServer log excerpt, not reproduced here).

In this example, blocking started at 10:11 and was released at 10:20 (minutes:seconds), about 9 seconds in total during which nothing could be written to that region. During that window a large number of RS handler threads are tied up, the threads left for other regions gradually shrink, and overall performance suffers. It is therefore also worth writing asynchronously and limiting the write rate, to avoid this kind of blocking.
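One way to implement asynchronous, rate-limited writes is a bounded in-process queue drained by a background thread through a BufferedMutator. This is only a sketch under assumed names (table "my_table"); the throttling here is deliberately crude, and as described later a production setup may put the queue outside the process (e.g. in Redis):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;

public class ThrottledAsyncWriter {
    private final BlockingQueue<Put> queue = new ArrayBlockingQueue<>(10_000); // bounded queue gives back-pressure

    // Called from request threads: enqueueing never talks to HBase directly.
    public boolean submit(Put put) {
        return queue.offer(put);
    }

    // Background thread: drains the queue into HBase and caps the write rate.
    public void drainLoop() throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("my_table"))) {
            while (!Thread.currentThread().isInterrupted()) {
                mutator.mutate(queue.take());
                if (queue.isEmpty()) {
                    mutator.flush();  // push buffered edits once the queue has drained
                }
                Thread.sleep(1);      // crude throttle: roughly 1000 puts/second per drain thread
            }
        }
    }
}
```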

11), hfile.block.index.cacheonwrite: when writing, also place the non-root levels of the multi-level block index into the block cache. The default is false; setting it to true may improve read performance, but whether there are side effects still needs to be verified.

12), io.storefile.bloom.cacheonwrite: caches bloom filter blocks as they are written. The default is false; its practical effect still needs to be investigated.

13), hbase.regionserver.regionSplitLimit: the maximum number of regions beyond which no further splits are performed; the default is Integer.MAX_VALUE. It can be set to 1 to disable automatic splitting, with splits performed manually or by script while the cluster is idle. If automatic splitting is not disabled, a split is triggered once a region grows beyond hbase.hregion.max.filesize (the actual decision follows the configured split policy and is not controlled by this parameter alone; for pre-split regions the data volume and memstore size are also taken into account). After every flush or compaction the RegionServer checks whether a split is needed. During a split the old region is first taken offline and the new daughter regions are then brought online; the process is fast, but it brings two problems: ① while the old region is offline and the new regions are not yet online, client requests fail and succeed only after retries, which for an online real-time service stretches the response time; ② the compactions that follow the split are resource-intensive.
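When automatic splitting is disabled, a split can still be requested explicitly while the cluster is idle; a minimal sketch with the Java Admin API (reuses the imports from the earlier snippets; table name and split point are made up):

```java
try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
     Admin admin = conn.getAdmin()) {
    admin.split(TableName.valueOf("my_table"));                                   // let HBase choose the split points
    // admin.split(TableName.valueOf("my_table"), Bytes.toBytes("row-500000"));   // or split at an explicit row
}
```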

14), JVM tuning: a, heap size: the master defaults to 1G and can be raised to 2G; the RegionServer defaults to 1G and should be raised to 10G or more; ZooKeeper uses few resources and need not be changed. b, garbage collection: still to be studied.

2, Other tuning

1), keep column family names and rowkeys as short as possible: every cell physically stores its rowkey and column family name, and even the column (qualifier) name should be kept short. The original post includes a screenshot of table test2 and of the corresponding file stored in HDFS (not reproduced here); it shows that short column family names, rowkeys, and column names have a large influence on the final file size.

2), number of regions per RS: in general, each RegionServer should not hold more than 1000 regions; too many regions mean more small files and therefore more compactions. When there are many regions larger than 5G and the total region count per RS reaches 1000, consider expanding the cluster.

3), when creating a table (a combined sketch follows this list):

 a, if multiple versions are not needed, set VERSIONS = 1;

 b, enable lzo or snappy compression; compression costs some CPU, but disk I/O and network I/O improve greatly, with roughly 4-5x compression;

 c, design the rowkey sensibly: this requires a thorough understanding of the existing business and reasonable foresight about future business; a poorly designed rowkey leads to very poor HBase performance;

 d, plan the data volume in advance and pre-split the table, to avoid constant splitting while the table is in use and to spread reads and writes across different RegionServers, making full use of the cluster;

 e, keep column family names as short as possible, for example "f", and preferably use only one column family;

 f, enable bloom filters where the access pattern warrants it, to improve read performance.
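A sketch that combines points a to f at table-creation time (the table name, split points, and the choice of SNAPPY are illustrative; compression codecs must be installed on the servers):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTunedTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            ColumnFamilyDescriptorBuilder cf =
                ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f"))  // e, one short column family
                    .setMaxVersions(1)                                        // a, single version
                    .setCompressionType(Compression.Algorithm.SNAPPY)         // b, snappy (or LZO) compression
                    .setBloomFilterType(BloomType.ROW);                       // f, row-level bloom filter
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("my_table"))
                    .setColumnFamily(cf.build());
            byte[][] splitKeys = { Bytes.toBytes("3"), Bytes.toBytes("6") };  // d, pre-split into 3 regions
            admin.createTable(table.build(), splitKeys);
        }
    }
}
```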

Two, Client side tuning

1, hbase.client.write.buffer: the client write buffer size, default 2M, in bytes; 6M is a recommended setting. It is not the case that bigger is better: too large a buffer uses too much memory;

2, hbase.client.scanner.caching: the number of rows a scanner fetches per RPC; the default of 1 is too small. Configure it according to the workload, but avoid values so large that they use excessive memory on the client and on the RS; a few hundred is generally the upper bound. If individual rows are very large, use a smaller value. A common choice is the number of rows a single business query needs: for example, if the business never reads more than 100 rows per query, set it to 100.
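A client-side sketch showing both settings, using the 6M buffer and caching of 100 from the text (table name is made up; hbase.client.write.buffer can be set either in the configuration or per BufferedMutator):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Scan;

public class ClientBufferAndCaching {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.client.write.buffer", String.valueOf(6 * 1024 * 1024)); // 6M, in bytes
        conf.set("hbase.client.scanner.caching", "100");                        // default caching for all scans

        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Per-mutator write buffer (overrides the configuration value):
            BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("my_table"))
                    .writeBufferSize(6 * 1024 * 1024);
            try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
                // mutator.mutate(put) calls are buffered until ~6M accumulate, then sent in one batch
            }
            // Per-scan caching (overrides the configuration value):
            Scan scan = new Scan();
            scan.setCaching(100);  // 100 rows per scanner RPC instead of 1
        }
    }
}
```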

3, set reasonable timeouts and retry counts; the details will be explained in a later post.

4, separate reads and writes in the client application, running them in different Tomcat instances. Writes go first into a Redis queue and are then written to HBase asynchronously; if writing to the Redis queue fails, the data falls back to memory. Reads first check the Redis cache (note: this Redis cache is not the Redis write queue); on a miss they read from HBase. When the HBase cluster, or a single RS, is unavailable, HBase's retry counts and timeouts are comparatively large (they cannot be set very small if normal access is to be guaranteed; when one RS dies, a single read or write can, after all retries and timeouts, last tens of seconds or even minutes), so one operation can occupy a Tomcat request thread for a long time; Tomcat threads are limited and are quickly exhausted, leaving none for other work. With reads and writes separated, writes only touch the Redis queue and are flushed to HBase asynchronously, so the write-side Tomcat threads never fill up and the application can keep accepting writes; for a business such as account top-ups this means no revenue is lost. On the read side the threads can still fill up and response times grow, but with timely intervention by operations the impact on the read service remains relatively limited.
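A highly simplified sketch of the read path and write path described above. The Redis cache and the Redis queue are represented by small stand-in interfaces because the original does not name a particular Redis client; table, family, and qualifier names are made up:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadWriteSeparation {
    /** Stand-ins for the Redis read cache and the Redis write queue; wire in a real client here. */
    interface KvCache { byte[] get(String key); }
    interface WriteQueue { boolean enqueue(byte[] payload); }

    private final KvCache redisCache;
    private final WriteQueue redisQueue;
    private final Connection hbase;

    ReadWriteSeparation(KvCache cache, WriteQueue queue, Connection hbase) {
        this.redisCache = cache;
        this.redisQueue = queue;
        this.hbase = hbase;
    }

    /** Write path: only enqueue; a separate consumer drains the queue into HBase asynchronously. */
    public boolean write(byte[] payload) {
        return redisQueue.enqueue(payload);  // never blocks on HBase, so request threads stay free
    }

    /** Read path: Redis cache first, HBase only on a cache miss. */
    public byte[] read(String rowKey) throws IOException {
        byte[] cached = redisCache.get(rowKey);
        if (cached != null) {
            return cached;
        }
        try (Table table = hbase.getTable(TableName.valueOf("my_table"))) {
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            return r.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));  // hypothetical family/qualifier
        }
    }
}
```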

5, if org.apache.hadoop.hbase.client.HBaseAdmin is configured as a Spring bean, configure it for lazy initialization, so that a failure to connect to the HBase Master does not prevent the application from starting, which would make even degraded operation impossible.
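A sketch of the lazy-initialization idea with Spring's Java configuration (the newer Admin API is used here instead of the deprecated HBaseAdmin; with XML bean definitions the equivalent is lazy-init="true"):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;

@Configuration
public class HBaseAdminConfig {
    // @Lazy: the connection to HBase is only attempted on first use, so an unreachable
    // Master cannot stop the application from booting, and degraded operation stays possible.
    @Bean(destroyMethod = "close")
    @Lazy
    public Connection hbaseConnection() throws Exception {
        return ConnectionFactory.createConnection(HBaseConfiguration.create());
    }

    @Bean(destroyMethod = "close")
    @Lazy
    public Admin hbaseAdmin(Connection hbaseConnection) throws Exception {
        return hbaseConnection.getAdmin();
    }
}
```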

6, Scan query optimization (a combined sketch follows the list):

① adjust caching;

② for queries that resemble a full table scan, or for periodic batch tasks, call setCacheBlocks(false) on the scan to avoid filling the block cache with useless data;

③ close the ResultScanner when finished, to avoid wasting memory on the client and the server;

④ limit the scan range: specify the column families or columns to be queried;

⑤ if only the rowkey is needed, use a KeyOnlyFilter, which greatly reduces network traffic.
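A sketch combining ① to ⑤ in one scan (table, family, and row range are made up):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTuningExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("2019-01-01"))   // ④ restrict the row range ...
                .withStopRow(Bytes.toBytes("2019-01-02"))
                .addFamily(Bytes.toBytes("f"));               // ④ ... and the column family / columns
            scan.setCaching(500);                             // ① fetch 500 rows per RPC
            scan.setCacheBlocks(false);                       // ② a periodic full scan should not pollute the block cache
            scan.setFilter(new KeyOnlyFilter());              // ⑤ only row keys travel over the network
            try (ResultScanner scanner = table.getScanner(scan)) {  // ③ scanner is closed automatically
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```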

Since HBase stores its data on HDFS and depends on ZooKeeper for coordination, both also need to be tuned.

ZooKeeper tuning:

① zookeeper.session.timeout: the default is 3 minutes. Do not make it too short, to avoid session timeouts causing HBase to stop serving; in one online production environment it was configured to 1 minute and on two occasions this caused HBase to stop serving. Do not make it too long either: if it is too long, when an RS dies ZooKeeper cannot detect it quickly and the master cannot migrate its regions in time.

② number of ZooKeeper nodes: at least 5. Give each ZooKeeper about 1G of memory, preferably with its own disk (a dedicated disk keeps ZooKeeper from being disturbed by other I/O). If the cluster is heavily loaded, do not run ZooKeeper and a RegionServer on the same machine, just as with DataNodes and TaskTrackers. ZooKeeper keeps serving as long as more than half of its nodes are running: with 5 nodes in total, it can still run with 2 down; a 4-node ensemble can only tolerate 1 failure.

③ hbase.zookeeper.property.maxClientCnxns: the maximum number of ZooKeeper client connections; the default of 300 can be raised to a few thousand.

Three, HDFS Tuning:

① dfs.name.dir: where the NameNode stores its data; several locations can be configured, on different disks plus a remote NFS file system, so that the NameNode metadata has multiple backups.

② dfs.data.dir: where the DataNode stores its data; configuring one path per disk greatly improves parallel read and write capacity.

③ dfs.namenode.handler.count: the number of RPC handler threads on the NameNode; the default is 10 and should be raised, for example to 60.

④ dfs.datanode.handler.count: the number of RPC handler threads on the DataNode; the default is 3 and should be raised, for example to 20.

⑤ dfs.datanode.max.xcievers: the maximum number of files a DataNode handles concurrently; the default is 256 and should be raised, for example to 8192.

⑥ dfs.block.size: the HDFS block size; the default is 64M. If the stored files are relatively large, a larger block size can be considered; for example, with HBase it can be set to 128M. Note that the value is in bytes.

⑦ dfs.balance.bandwidthPerSec: the bandwidth used when start-balancer.sh rebalances data; the default is 1M/s and it can be set to several tens of M/s, for example 20M/s.

⑧ dfs.datanode.du.reserved: the free space reserved on each disk for non-HDFS files; the default is 0, and some space should be reserved.

⑨ dfs.datanode.failed.volumes.tolerated: the number of failed disks a DataNode tolerates at startup before shutting down. The default is 0, i.e. a single bad disk brings the DataNode down; this may be left unchanged.

Origin: www.cnblogs.com/tesla-turing/p/11959933.html