Troubleshooting and resolving slow query requests in HBase

From:  http://www.cnblogs.com/panfeng412/archive/2013/06/08/hbase-slow-query-troubleshooting.html

Recently, our HBase cluster ran into a problem with slow query requests. This post describes the problem and how it was diagnosed and resolved.

1. Find the problem

The project has an HBase table into which a batch of data is imported shortly after midnight every day. One day I noticed that when scanning only a few rows from each region of the table (256 regions in total), query requests against some regions were very slow, taking from 10 seconds to several tens of seconds.

2. Troubleshoot the problem

First, the region server monitoring page that ships with HBase showed only 1 to 3 StoreFiles per region for this table, which rules out slow query responses caused by too many StoreFiles.

Then I checked the table and found that its TTL is 5 days, so it holds a lot of expired data. The table receives a batch import every morning (more than 700 million records were imported on March 22 last week), while the cluster's major compaction period is configured as 7 days. So although the March 22 data had expired as of today, no major compaction had yet run to delete it. A large amount of expired data therefore remained uncleared, and even a scan for a few rows per region had to filter out a large amount of expired data before it reached the useful rows (monitoring showed the block cache access volume at about twice its usual level at that time, as the monitoring figure in the original post shows). That is why the query response time was so slow.
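The timing gap described above can be worked through with a little date arithmetic. The following is a minimal sketch, not HBase code: it assumes, as a simplification, that only major compactions remove expired cells and that they run exactly every period from a known starting point (real clusters add jitter). The class name, method names and the concrete timestamps are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

class ExpiredDataWindow {
    /** Instant at which cells written at writeTime pass their TTL. */
    static Instant expiry(Instant writeTime, Duration ttl) {
        return writeTime.plus(ttl);
    }

    /**
     * How long expired cells stay in the StoreFiles under the simplified model:
     * major compactions run at lastMajorCompaction, lastMajorCompaction + period, ...
     * and the first one strictly after the expiry instant removes the cells.
     */
    static Duration lingerAfterExpiry(Instant writeTime, Duration ttl,
                                      Instant lastMajorCompaction, Duration period) {
        if (period.isZero() || period.isNegative()) {
            throw new IllegalArgumentException("period must be positive");
        }
        Instant expiresAt = expiry(writeTime, ttl);
        Instant next = lastMajorCompaction;
        // Step forward one period at a time until we pass the expiry instant.
        while (!next.isAfter(expiresAt)) {
            next = next.plus(period);
        }
        return Duration.between(expiresAt, next);
    }

    public static void main(String[] args) {
        // Hypothetical timestamps; only the 5-day TTL and 7-day period come from the post.
        Instant imported = Instant.parse("2013-03-22T02:00:00Z");
        Instant lastMajor = Instant.parse("2013-03-26T00:00:00Z");
        Duration linger = lingerAfterExpiry(imported, Duration.ofDays(5),
                                            lastMajor, Duration.ofDays(7));
        System.out.println("Expired data lingers for " + linger.toHours() + " hours");
    }
}
```

With a 7-day period, data that expires shortly after a major compaction has just run can sit in the StoreFiles for almost a full week, and every scan during that time has to skip over it.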

3. Problem solving

There are two solutions to this problem:

1) Force a major compaction after each morning's import (see the majorCompact method of HBaseAdmin, which runs asynchronously), so that the expired data in every region of the table is cleared promptly.
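As a rough illustration of option 1, the sketch below requests a major compaction at the end of an import job. To keep the example free of the HBase client dependency it does not call the HBaseAdmin method mentioned above; instead it pipes the HBase shell's major_compact command into an `hbase shell` process. It assumes the `hbase` launcher is on the PATH; the class name, method names, table name and the name check are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class PostImportCompaction {
    // Conservative name check chosen for this sketch so the name can be quoted safely.
    private static final Pattern SAFE_TABLE_NAME =
            Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_.:-]*");

    /** Builds the HBase shell command that requests a major compaction of a table. */
    static String majorCompactCommand(String table) {
        if (table == null || !SAFE_TABLE_NAME.matcher(table).matches()) {
            throw new IllegalArgumentException("unsafe table name: " + table);
        }
        return "major_compact '" + table + "'";
    }

    /**
     * Pipes the command into `hbase shell` and returns the process exit code.
     * The request is asynchronous on the HBase side, so returning from this
     * method does not mean the compaction has finished.
     */
    static int requestMajorCompaction(String table)
            throws IOException, InterruptedException {
        String script = majorCompactCommand(table) + "\nexit\n";
        Process shell = new ProcessBuilder("hbase", "shell")
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (OutputStream stdin = shell.getOutputStream()) {
            stdin.write(script.getBytes(StandardCharsets.UTF_8));
        }
        return shell.waitFor();
    }
}
```

An import job would call `PostImportCompaction.requestMajorCompaction("my_table")` as its last step, where `my_table` stands in for the real table name. A job that already uses the Java client would more naturally call the admin API directly.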

2) Since the cluster's major compaction period is 7 days and the table's TTL is 5 days, shorten the major compaction period at the cluster level (the configuration parameter is hbase.hregion.majorcompaction, in milliseconds; in addition, hbase.offpeak.start.hour can set the hour at which major compaction starts, for example setting it to 1 ensures it is triggered after 1:00), so that major compaction is triggered and run as early as possible.
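Because hbase.hregion.majorcompaction is given in milliseconds, it is easy to get the value wrong by a factor of 1000. The small sketch below converts a chosen period into milliseconds and prints the matching property entry for hbase-site.xml. The 1-day period is only an illustrative choice that stays below the 5-day TTL, not a value taken from the post, and the class and method names are hypothetical.

```java
import java.time.Duration;

class MajorCompactionConfig {
    /** Converts a compaction period into the millisecond value the parameter expects. */
    static long periodMillis(Duration period) {
        return period.toMillis();
    }

    /** Renders one property entry in the hbase-site.xml format. */
    static String propertyXml(String name, long value) {
        return String.format(
                "<property>%n  <name>%s</name>%n  <value>%d</value>%n</property>",
                name, value);
    }

    public static void main(String[] args) {
        // Illustrative period: 1 day, shorter than the table's 5-day TTL.
        long millis = periodMillis(Duration.ofDays(1));
        System.out.println(propertyXml("hbase.hregion.majorcompaction", millis));
    }
}
```

For comparison, the 7-day period described in the post corresponds to 604800000 milliseconds.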
