HBase notes: a large number of client disconnections from ZooKeeper, and parameter tuning analysis (reposted)

1. HBase client configuration parameters

The timeout, retry count, and retry interval settings matter because the default values are all large. If the HBase cluster, a RegionServer, or ZooKeeper goes down, the defaults are catastrophic for the application: timeouts and retries quickly exhaust the web container's connections and the container stops serving. There are two kinds of socket timeouts: (1) the timeout for establishing a connection and (2) the timeout for reading data.

The following parameters can be configured (a client-side configuration sketch follows the list):

1. hbase.rpc.timeout: the RPC timeout, default 60s. It is not recommended to lower it much, to avoid affecting normal operations. Our online environment was initially configured with 3 seconds; after running for half a day, a large number of timeout errors appeared because one region was blocking writes with "Blocking updates ... memstore size 434.3m is >= than blocking 256.0m size". So this value must not be set too low.

2. ipc.socket.timeout: the timeout for establishing a socket connection; it should be less than or equal to the RPC timeout. Default 20s.

3. hbase.client.retries.number: the number of retries; the default is 14, and it can be lowered to 3.

4. hbase.client.pause: the sleep time between retries, default 1s; it can be reduced, for example to 100ms.

5. zookeeper.recovery.retry: the number of ZooKeeper retries; it can be lowered to 3, since ZooKeeper rarely goes down. If there is a problem with the HBase cluster, each client retry also retries the ZooKeeper operations, so the total number of ZooKeeper retries is hbase.client.retries.number * zookeeper.recovery.retry, and the sleep time between retries grows exponentially with base 2. If a single HBase operation involves several ZooKeeper accesses and ZooKeeper is unavailable, this produces a large number of ZooKeeper retries and wastes a lot of time.

6. zookeeper.recovery.retry.intervalmill: the sleep time between ZooKeeper retries, default 1s; it can be reduced, for example to 200ms.

7. hbase.regionserver.lease.period: the timeout for each interaction with the server during a scan. The default is 60s and can be left unchanged.
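
A minimal sketch of applying these client-side settings programmatically with the 0.9x-era Java client (they can equally be placed in hbase-site.xml); the values are the ones suggested in the list above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseClientSettings {
    public static Configuration clientConf() {
        // Loads hbase-default.xml / hbase-site.xml from the classpath, then overrides the client knobs.
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.rpc.timeout", 60000);                    // RPC timeout: keep the 60s default
        conf.setInt("ipc.socket.timeout", 20000);                   // connect timeout, <= the RPC timeout
        conf.setInt("hbase.client.retries.number", 3);              // down from the default 14
        conf.setInt("hbase.client.pause", 100);                     // retry sleep in ms, down from 1s
        conf.setInt("zookeeper.recovery.retry", 3);                 // ZooKeeper operation retries
        conf.setInt("zookeeper.recovery.retry.intervalmill", 200);  // ZooKeeper retry sleep in ms
        // Worst-case ZooKeeper retries per HBase operation:
        //   hbase.client.retries.number * zookeeper.recovery.retry = 3 * 3 = 9
        return conf;
        // Pass this Configuration to the client, e.g. new HTable(conf, "my_table") in the old API.
    }
}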

2. Troubleshooting a large number of ZooKeeper connections disconnecting and reconnecting (combining server parameters and code inspection)

# Check the log contents: many connections are established and then disconnected immediately.

# List TCP connections to the ZooKeeper port:
netstat -antp | grep 2181

# Dump the thread stacks of the client process:
jstack -l <pid>

# Show extended socket information:
netstat -ae

# Capture the ZooKeeper traffic for offline analysis:
tcpdump -vv host 192.168.66.27 and port 2181 -w 2181.cap

3. HBase and ZooKeeper parameter tuning

zookeeper.session.timeout 
Default value: 3 minutes (180000ms) 
Description: the session timeout between a RegionServer and ZooKeeper. When the session expires, ZooKeeper removes the RegionServer from the RS cluster list; once the HMaster receives the removal notification, it reassigns the regions this server was responsible for, letting the other surviving RegionServers take over. 
Tuning: 
This timeout determines whether a failed RegionServer can be failed over in time. Setting it to 1 minute or less shortens the failover delay caused by waiting out the timeout. 
Note, however, that for some online applications the time from a RegionServer going down to coming back is very short (transient issues such as network blips or crashes that operations staff can fix quickly). Once the RegionServer is officially removed from the RS cluster, the HMaster starts rebalancing (having other RSs recover the regions by replaying the WAL recorded on the failed machine). If the failed RS is then restored by manual intervention, that rebalancing was pointless; it leaves the load uneven and puts extra burden on the other RSs, especially in deployments where regions are assigned to fixed servers. 


hbase.regionserver.handler.count 
Default value: 10 
Description: the number of IO (RPC handler) threads the RegionServer uses to process requests. 
Tuning: 
The tuning of this parameter is closely related to memory. 
Fewer IO threads suit workloads where a single request consumes a lot of memory (a large single PUT, or a scan configured with a big cache; both count as "big PUT" scenarios), or where the RegionServer's memory is tight. 
More IO threads suit workloads with low per-request memory consumption and very high TPS requirements. When setting this value, use memory monitoring as the main reference.
Note that if the server hosts only a few regions and a large share of requests falls on a single region, the memstore fills quickly and the read-write lock taken by the resulting flush hurts global TPS; a higher IO thread count is not automatically better. 
During stress testing, enable RPC-level logging so you can monitor per-request memory consumption and GC behaviour at the same time, then settle on a reasonable IO thread count from the results of several rounds of testing. 
As one data point, in "Hadoop and HBase Optimization for Read Intensive Search Applications" the author sets the IO thread count to 100 on SSD machines; treat this only as a reference. 

hbase.hregion.max.filesize 
Default value: 256M 
Description: the maximum storage size of a single region on the RegionServer; when a region grows beyond this value it is automatically split into two smaller regions. 
Tuning: 
Small regions are friendly to splitting and compaction, because splitting a region or compacting the storefiles of a small region is fast and uses little memory. The downside is that splits and compactions become frequent. 
In particular, a large number of small regions constantly splitting and compacting causes large fluctuations in cluster response time. Too many regions are also a management burden and can even trigger certain HBase bugs. 
Generally, anything below 512MB counts as a small region. 

Large regions are not suited to frequent splits and compactions, because compacting or splitting them causes long pauses that severely affect the application's read and write performance. A large region also means larger storefiles, which puts more pressure on memory during compaction. 
Large regions still have their place, though: if your workload has periods of low traffic, doing compactions and splits during those windows lets them complete smoothly while keeping read and write performance stable the rest of the time. 

Since splits and compactions affect performance so much, is there a way to avoid them? 
Compaction cannot be avoided, but splitting can be changed from automatic to manual. 
By raising this parameter to a value that is hard to reach, such as 100GB, automatic splitting is effectively disabled (the RegionServer will not split regions smaller than 100GB). 
Then use the RegionSplitter tool to split manually when a split is actually needed. 
Manual splitting is far more flexible and predictable than automatic splitting, and the management cost barely increases; it is recommended for online real-time systems. 

In terms of memory, small regions give you flexibility in sizing the memstore, whereas for a large region the memstore can be neither too large nor too small: too large and the application's IO wait rises during flushes; too small and read performance suffers because of too many storefiles. 

hbase.regionserver.global.memstore.upperLimit/lowerLimit 

Default value: 0.4/0.35 
upperLimit description: the related parameter hbase.hregion.memstore.flush.size works at the region level: when the combined size of all memstores in a single region exceeds that value, all memstores of the region are flushed. RegionServer flushes are handled asynchronously by putting requests on a queue, in a producer-consumer pattern. The problem is that when the queue cannot keep up and requests back up, memory usage can rise sharply, in the worst case triggering an OOM. 
upperLimit exists to cap that memory usage: when the total memory occupied by the memstores of all regions on a RegionServer reaches 40% of the heap, HBase forcibly blocks all updates and flushes these regions to release the memory held by the memstores. 
lowerLimit description: similar to upperLimit, except that when the memstores of all regions reach 35% of the heap, lowerLimit does not flush every memstore; it finds the region whose memstore uses the most memory and flushes it individually, while write updates are still blocked. lowerLimit is a remedial step taken before all regions would have to be force-flushed, with the performance degradation that brings. In the logs it appears as "** Flush thread woke up with memory above low water." 
Tuning: This is a Heap memory protection parameter, and the default value is suitable for most scenarios. 
Adjusting it affects both reads and writes. If write pressure frequently exceeds this threshold, lower the read cache hfile.block.cache.size so the threshold can be raised, or, if there is plenty of heap headroom, raise the threshold without touching the read cache size. 
If the threshold is not reached even under heavy write pressure, consider lowering it somewhat and stress-testing to confirm it is not triggered too often; then, with the extra heap headroom, increase hfile.block.cache.size to improve read performance. 
Another possibility is that hbase.hregion.memstore.flush.size is left unchanged but the RS hosts too many regions; keep in mind that the number of regions directly affects how much memory the memstores occupy. 

hfile.block.cache.size 

Default value: 0.2 
Description: the fraction of the heap used as the storefile read cache; 0.2 means 20%. This value directly affects read performance. 
Tuning: for reads, bigger is better. If writes are much rarer than reads, setting it to 0.4-0.5 is fine; if reads and writes are roughly balanced, use about 0.3; if you write more than you read, keep the default. When setting this value, also consider hbase.regionserver.global.memstore.upperLimit, the maximum fraction of the heap the memstores may occupy: one of the two parameters affects reads, the other writes. If the two values add up to more than 80-90% of the heap, there is a risk of OOM, so set them carefully. 
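
A minimal sanity-check sketch of this rule, assuming the two fractions are read from the same Configuration loaded from hbase-site.xml (property names as discussed above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeapBudgetCheck {
    public static void main(String[] args) {
        // Loads hbase-default.xml / hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // Fraction of the heap reserved for the storefile read cache.
        float blockCache = conf.getFloat("hfile.block.cache.size", 0.2f);
        // Maximum fraction of the heap all memstores may occupy before updates are blocked.
        float memstoreUpper = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
        float total = blockCache + memstoreUpper;
        System.out.printf("block cache %.2f + memstore upper limit %.2f = %.2f of heap%n",
                blockCache, memstoreUpper, total);
        if (total > 0.8f) {
            System.out.println("WARNING: combined budget above 80% of the heap, OOM risk");
        }
    }
}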

hbase.hstore.blockingStoreFiles 

Default value: 7 
Description: if, at flush time, a Store (column family) in a region already has more than 7 storefiles, all write requests to the region are blocked and a compaction is run to reduce the number of storefiles. 
Tuning: blocking write requests severely affects the response time of the RegionServer, but too many storefiles also hurt read performance. In practice, if you want smoother response times you can set the value very high (effectively unlimited); if you can tolerate large peaks and troughs in response time, keep the default or tune it to your own workload. 

hbase.hregion.memstore.block.multiplier 

Default value: 2 
Description: when the memstore of a region grows beyond twice hbase.hregion.memstore.flush.size, all requests to the region are blocked and the memstore is flushed to free memory. 
Even though we set a flush threshold for the region's memstores, say 64MB, imagine the memstore is at 63.9MB when a 200MB put arrives: the memstore instantly balloons to several times the expected hbase.hregion.memstore.flush.size. This parameter blocks all requests once the memstore grows past 2x hbase.hregion.memstore.flush.size, preventing the risk from growing any further. 
Tuning: the default is quite safe. If your normal workload (exceptional cases aside) has no write bursts, or the write volume is well controlled, keep the default. If your write volume regularly spikes to several times the normal level, increase this multiplier and adjust the related parameters (such as hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit/lowerLimit) to reserve more memory and prevent an OOM on the HBase server. 

hbase.hregion.memstore.mslab.enabled 

Default value: true 
Description: enables MSLAB (MemStore-Local Allocation Buffers), which reduces the full GCs caused by memory fragmentation and improves overall performance. 
Tuning: see http://kenwublog.com/avoid-full-gc-in-hbase-using-arena-allocation 
Others 

Enable LZO compression 
Compared with the GZip codec that ships with HBase, LZO is faster while GZip achieves a higher compression ratio; see "Using LZO Compression" for details. Developers who want better HBase read/write performance should choose LZO; those primarily concerned with storage space should keep the default. 
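
A minimal sketch of enabling LZO on a column family with the 0.9x-era Java admin API; the table and family names are made up for illustration, and the native LZO libraries must already be installed on every node:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class LzoTableDescriptor {
    public static HTableDescriptor build() {
        HTableDescriptor table = new HTableDescriptor("my_table");  // hypothetical table name
        HColumnDescriptor cf = new HColumnDescriptor("cf");         // hypothetical column family
        cf.setCompressionType(Compression.Algorithm.LZO);           // storefiles of this family are LZO-compressed
        table.addFamily(cf);
        return table;  // create it with new HBaseAdmin(conf).createTable(table)
    }
}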

Don't define too many Column Families in one table 

HBase currently does not handle tables with more than two or three column families well: when one CF is flushed, its neighbouring CFs in the same region are triggered to flush as well, so more CFs ultimately mean more IO for the system. 

Batch Import 

Before bulk-importing data into HBase, you can balance the data load by creating regions in advance (pre-splitting). See "Table Creation: Pre-Creating Regions". 
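
A minimal sketch of pre-creating regions with the 0.9x-era HBaseAdmin API; the table name, column family, and split keys below are illustrative only and should be derived from the actual row-key distribution of the data being imported:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor table = new HTableDescriptor("import_table");  // hypothetical table name
        table.addFamily(new HColumnDescriptor("cf"));                   // hypothetical column family
        // Region boundaries chosen to match the expected row-key distribution (illustrative values).
        byte[][] splitKeys = new byte[][] {
                Bytes.toBytes("2000000000"),
                Bytes.toBytes("4000000000"),
                Bytes.toBytes("6000000000"),
                Bytes.toBytes("8000000000"),
        };
        // Creates the table pre-split into 5 regions so the bulk import spreads across RegionServers.
        admin.createTable(table, splitKeys);
    }
}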

Avoid CMS concurrent mode failure 

HBase uses the CMS garbage collector. By default, a CMS collection of the old generation is triggered when it reaches 90% occupancy; this percentage is set with the -XX:CMSInitiatingOccupancyFraction=N parameter. A concurrent mode failure happens in the following scenario: 
When the old generation reaches 90%, CMS starts collecting it concurrently, but at the same time the young generation keeps rapidly promoting objects into the old generation. If the old generation fills up before CMS has finished its concurrent marking, disaster strikes: with no memory left, CMS has to abandon the concurrent cycle, trigger a stop-the-world pause (all JVM threads suspended), and fall back to a single-threaded collection of the whole old generation, which takes a very long time. To avoid concurrent mode failures, it is recommended to trigger CMS at an occupancy below 90%. 

Set the percentage with -XX:CMSInitiatingOccupancyFraction=N. 
It can be calculated simply: if your hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit add up to 60% of the heap (the defaults), you can set N to 70-80, generally about 10 percentage points above that sum. 
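For example, this is commonly set through HBASE_OPTS in hbase-env.sh (a sketch; -XX:+UseCMSInitiatingOccupancyOnly is a common companion flag that makes the JVM honour the chosen fraction, and is an addition not mentioned above): 

export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"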

maxClientCnxns=300 
By default ZooKeeper allows only 10 connections per client IP, which is often not enough. The limit appears to be configurable only in the zoo.cfg configuration file, so ZooKeeper must be restarted for the change to take effect. 
zoo.cfg: 
maxClientCnxns=300 
Otherwise the following error is reported: 2011-10-28 09:39:44,856 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5858:NIOServerCnxn$Factory@253] - Too many connections from /172.*.*.* - max is 10 

HBASE_HEAPSIZE=3000 
HBase is memory-hungry; give it as much memory as the hardware allows. 
Set it in hbase-env.sh: 
export HBASE_HEAPSIZE=3000   # the default is 1000 (MB) 

Typical Hadoop and HBase configuration 
•Region Server 
•HBase Region Server JVM Heap Size: -Xmx15GB 
•Number of HBase Region Server Handlers: hbase.regionserver.handler.count=50 (matching the number of active regions) 
•Region Size: hbase.hregion.max.filesize=53687091200 (50GB to avoid automatic split) 
•Turn off auto major compaction: hbase.hregion.majorcompaction=0 
•Map Reduce 
•Number of Data Node Threads: dfs.datanode.handler.count=100 
•Number of Name Node Threads: dfs.namenode.handler.count=1024 
•Name Node Heap Size: -Xmx30GB 
•Turn Off Map Speculative Execution: mapred.map.tasks.speculative.execution=false 
•Turn off Reduce Speculative Execution: mapred.reduce.tasks.speculative.execution=false 
•Client settings 
•HBase RPC Timeout: hbase.rpc.timeout=600000 (10 minutes for the client-side timeout) 
•HBase Client Pause: hbase.client.pause=3000 
•HDFS 
•Block Size: dfs.block.size=134217728 (128MB) 
•Data node xcievercount: dfs.datanode.max.xcievers=131072 
•Number of mappers per node: mapred.tasktracker.map.tasks.maximum=8 
•Number of reducers per node: mapred.tasktracker.reduce.tasks.maximum=6 
•Swap turned off

 
