HBase RegionServer CPU Soars Due to Hardware Failure

 

Background

      During the National Day holiday, a02, one of the RegionServer machines in the HBase cluster, went offline because of a memory failure. The cluster kept running normally with one machine fewer. After the holiday, once the faulty memory had been repaired, a02 rejoined the cluster and resumed serving requests. At almost the same time, CPU usage on another RegionServer, a04, was found to be persistently high. With a02 back, one would expect the cluster to return to its original state and run well. Instead, CPU usage on a04 soared.

 

Initial troubleshooting

  • Region distribution: the regions are evenly balanced across the cluster, and nothing abnormal was found
  • GC logs: ParNew collections had become more frequent, 20-30 per day, which is higher than on the other machines
  • RegionServer logs: no abnormal output was found
  • Network card, disk I/O, and so on: no clues were found, although the RPC processing queue had grown
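The GC log check above comes down to counting ParNew collections per day and comparing the numbers across machines. Below is a minimal sketch of that count. It assumes the JVM was started with -XX:+PrintGCDateStamps, so that every log line begins with an ISO date stamp; the function name and the sample lines are made up for illustration and do not come from the original cluster.

```python
import re
from collections import Counter

# Assumption: each GC log line starts with a date stamp such as
# 2017-10-08T10:15:30.123+0800 (the format written by -XX:+PrintGCDateStamps).
PARNEW_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})T.*\[ParNew")


def parnew_per_day(lines):
    """Count ParNew collections per calendar day in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PARNEW_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "2017-10-08T10:15:30.123+0800: 12.345: [GC (Allocation Failure) "
    "2017-10-08T10:15:30.123+0800: 12.345: [ParNew: 545344K->23012K(613440K), "
    "0.0312000 secs] 845344K->323012K(2029056K), 0.0313000 secs]",
    "2017-10-08T11:02:10.456+0800: 2812.678: [GC (Allocation Failure) "
    "2017-10-08T11:02:10.456+0800: 2812.678: [ParNew: 568356K->24100K(613440K), "
    "0.0298000 secs] 868356K->324100K(2029056K), 0.0299000 secs]",
    "2017-10-08T11:30:00.000+0800: 4482.222: [GC (CMS Initial Mark) "
    "[1 CMS-initial-mark: 300000K(1415616K)] 324100K(2029056K), 0.0050000 secs]",
    "2017-10-09T09:00:00.789+0800: 81882.011: [GC (Allocation Failure) "
    "2017-10-09T09:00:00.789+0800: 81882.011: [ParNew: 569444K->25000K(613440K), "
    "0.0301000 secs] 869444K->325000K(2029056K), 0.0302000 secs]",
]

for day, count in sorted(parnew_per_day(sample).items()):
    print(day, count)
```

To run it against a real log, pass an open file object instead of the sample list. Running it on each RegionServer's log gives per-day counts that can be compared side by side.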

 

Rethinking the problem

   With the software apparently in order, could the hardware be at fault? All the machines in this batch are the same hardware model, and a02 had just failed because of its memory, so it seemed quite likely that a04 also had a memory or other hardware problem. The operations team investigated and found that the memory of this machine was indeed faulty. After the memory was replaced, CPU usage dropped sharply to the same level as the other machines. The cause was finally confirmed as a hardware failure.

 

 

 

Takeaways from the GC analysis

Analyzing several months of GC logs showed that the GC frequency had suddenly increased in early August. Comparing against the GC logs of the other RegionServers showed that their GC frequency rose at the same point in time. The next step is to find out what major event occurred at that point. There are two common ways to handle this kind of GC pressure: tuning the GC, and expanding the cluster to share the load.
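One way to pin down when the GC frequency changed is to look, server by server, for the first day whose collection count clearly exceeds the level of the preceding days. The sketch below does this with a simple rule: flag the first day whose count is more than `factor` times the median of the previous `window` days. The function name, the threshold rule, the server name a03, and all the numbers are assumptions made up for illustration; they are not the original cluster's data.

```python
from statistics import median


def first_jump(daily_counts, window=7, factor=2.0):
    """Return the first date whose count exceeds factor x the median of the
    previous `window` days, or None if no such day exists.

    daily_counts maps ISO date strings (YYYY-MM-DD) to GC counts.
    """
    days = sorted(daily_counts)  # ISO date strings sort chronologically
    for i in range(window, len(days)):
        baseline = median(daily_counts[d] for d in days[i - window:i])
        if baseline > 0 and daily_counts[days[i]] > factor * baseline:
            return days[i]
    return None


# Made-up per-day GC counts for two servers, for illustration only.
counts_by_server = {
    "a04": {
        "2017-07-28": 10, "2017-07-29": 11, "2017-07-30": 9,
        "2017-07-31": 10, "2017-08-01": 12, "2017-08-02": 10,
        "2017-08-03": 11, "2017-08-04": 25, "2017-08-05": 28,
    },
    "a03": {
        "2017-07-28": 8, "2017-07-29": 9, "2017-07-30": 8,
        "2017-07-31": 9, "2017-08-01": 8, "2017-08-02": 9,
        "2017-08-03": 8, "2017-08-04": 19, "2017-08-05": 21,
    },
}

onsets = {name: first_jump(c) for name, c in counts_by_server.items()}
print(onsets)
```

If every server reports the same onset date, the change is more likely tied to a cluster-wide event (for example a change in load) than to one machine, which is the kind of comparison described above.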

 

GC tuning options

1. JVM parameter tuning

2. Enable MemStoreChunkPool to optimize GC

Apache JIRA issue for MemStoreChunkPool: https://issues.apache.org/jira/browse/HBASE-8163

MSLAB improves HBase GC performance: http://blog.csdn.net/map_lixiupeng/article/details/40914567
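As a rough illustration of option 2, the fragment below shows how MSLAB and the chunk pool might be switched on in hbase-site.xml. The property names are the ones associated with MSLAB and HBASE-8163. The values are placeholders chosen for illustration, not settings from the original cluster, and the defaults differ between HBase versions, so check them against the documentation for the release in use.

```xml
<!-- hbase-site.xml fragment: illustrative values only -->
<property>
  <!-- Allocate MemStore cells from fixed-size chunks (MSLAB) -->
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Fraction of the global MemStore size the chunk pool may hold;
       0.5 is an assumed example value -->
  <name>hbase.hregion.memstore.chunkpool.maxsize</name>
  <value>0.5</value>
</property>
```

The idea behind the pool is to reuse MSLAB chunks after a flush instead of leaving them for the garbage collector, which should reduce young-generation promotion and therefore GC pressure.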

 

 

 

 

 
