HBase Operations Fundamentals: Principles of Metadata Reverse Repair

Background

    Following the previous article, "The Cloud HBase team successfully rescued a company's self-built HBase cluster and saved 30+ TB of data", many readers said they were especially interested in the "reverse engineering" of HBase metadata and asked how to use the corresponding tools for operations and maintenance. In short, they want a deeper understanding of the principles behind HBase operations, so that they can better run HBase in production and handle the various common failure scenarios. Readers differ in how well they already know HBase, so this article does not focus on how to use any single tool; instead, it explains the fundamentals of HBase operations and maintenance. To help most readers improve their HBase operational skills, I will write a series of articles under the theme "HBase Operations Series". You are welcome to scan the QR code at the bottom to follow on DingTalk.


Introduction

    Many companies that run self-built HBase clusters frequently hit operational problems. For example, after HBase has been taking writes for a while, RegionServer nodes may start to die; after restarting a RegionServer, startup is very slow and many regions get stuck in RIT (Region-In-Transition), so reads and writes to those regions hang. Others have tried repeatedly to repair their cluster, only to find that HBase no longer starts at all, or that the meta table reports errors when coming online, so in the end the business cannot run normally. This article starts from the basic principles of HBase operations and maintenance, focusing on data integrity and on the principles and methods of metadata "reverse engineering" used to restore it, and kicks off the follow-up series of articles on HBase operational knowledge.

HBase directory structure

    This article is based on the 1.x versions; other versions are roughly the same. HBase uses a dedicated directory on HDFS as the root of its file layout, usually "/hbase". Under this root, the directory organization looks like the following:


/hbase/archive (1)
/hbase/corrupt (2) 
/hbase/data/default/TestTable/.tabledesc/.tableinfo.0000000001 (3)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/info/2e58b3e274ba4d889408b05e526d4b7b (4)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/recovered.edits/340.seqid (5)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/.regioninfo (6)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/.tmp (7)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/.splits (8)
/hbase/data/default/TestTable/fc06f27a6c5bc2ff57ea38018b4dd399/.merges (9)
/hbase/data/hbase/acl (10)
/hbase/data/hbase/meta (11)
/hbase/hbase.id (12)
/hbase/hbase.version (13)
/hbase/MasterProcWALs (14)
/hbase/oldWALs (15)
/hbase/.tmp (16)
/hbase/.trashtables/data (17)
/hbase/WALs/tins-donot-rm-test-hb1-004.hbase.9b78df04-b.rds.aliyuncs.com,16020,1523502350378/tins-donot-rm-test-hb1-004.hbase.9b78df04-b.rds.aliyuncs.com%2C16020%2C1523502350378.default.1524538284034 (18)


(1) The archive directory, used for snapshots and upgrades. When a compaction removes an hfile, the old hfile is also archived here.

(2) The corrupt directory: holds WALs found to be corrupt during splitlog, as well as corrupt hfiles.

(3) The tableinfo metadata file, which holds the table's basic attributes.

(4) An hfile data file under the corresponding table.

(5) During splitlog, the WALs of a RegionServer are split at the region level and written to the recovered.edits directory under each corresponding region directory, so that when a region is opened again, these recovered.edits logs are replayed.
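The region-level splitting described in (5) can be modeled as grouping WAL entries by region and ordering them by sequence id. The following is a toy sketch of that grouping idea, not the real SplitLogWorker logic; the entry tuples are an assumed simplification for illustration:

```python
from collections import defaultdict

def split_wal_by_region(entries):
    # Each simplified WAL entry is (region_encoded_name, seq_id, payload).
    # Splitlog groups entries by region so that each region can replay
    # only its own edits when it is opened again.
    per_region = defaultdict(list)
    for region, seq, payload in entries:
        per_region[region].append((seq, payload))
    for edits in per_region.values():
        edits.sort()  # replay must happen in sequence-id order
    return dict(per_region)
```

Each region then replays its own sorted edit list on open, which is why recovery time grows with the amount of unflushed WAL data per region.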

(6) The .regioninfo file, which holds the region's serialized metadata.
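The 32-hex-character directory name in the paths above (fc06f27a6c5bc2ff57ea38018b4dd399) is the region's encoded name. In the current naming scheme it is the MD5 hex digest of the region name, which is built from the table name, start key, and region id. A simplified sketch of the idea (the real logic lives in HRegionInfo and also handles legacy name formats and replica ids):

```python
import hashlib

def encoded_region_name(table, start_key, region_id):
    # Simplified: the region name is "table,startKey,regionId"; its MD5
    # hex digest is used as the region's directory name on HDFS.
    region_name = b",".join([table, start_key, str(region_id).encode()])
    return hashlib.md5(region_name).hexdigest()
```

This is why the same region always maps to the same directory name, which is what makes metadata "reverse engineering" from the directory layout possible.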

(7) A temporary .tmp directory used by compaction and similar operations.

(8) A temporary directory used during a split. If the last split was interrupted before completing, this directory is automatically cleaned up when the region is opened again; manual intervention is generally not needed.

(9) A temporary directory used during a merge. As with split, if a merge is interrupted before completing normally, the directory is automatically cleaned up the next time the region is opened; human intervention is generally not required.

(10) The acl system table, which records permissions when HBase access control is enabled.

(11) The meta metadata table, which records region-related information.

(12) hbase.id, the unique cluster id generated when the cluster is initialized. It can be regenerated for repair if lost.

(13) hbase.version, the HBase filesystem version file. This is a static version defined in the code, currently 8.
(14) MasterProcWALs, which persist the state of master procedures so that execution can resume after an interruption.

(15) oldWALs: historical WALs. Once the data recorded in a WAL has been confirmed persisted, the WAL is moved here. WALs whose splitlog has completed are also placed here.

(16) A temporary auxiliary directory. For example, when writing the hbase.id file, the file is first written here and, on success, renamed to /hbase/hbase.id.
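The write-then-rename pattern described in (16) is a common way to publish a file atomically, since a rename within one filesystem is atomic and readers never observe a half-written file. A minimal local-filesystem sketch of the same idea (the function name and paths are illustrative, not HBase code):

```python
import os

def publish_atomically(tmp_dir, final_path, data):
    # Write the full contents to a temp location first, then rename
    # into place, so readers see either the old file or the new one,
    # never a partial write.
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, final_path)  # atomic on the same filesystem
```

HDFS rename gives the same all-or-nothing visibility, which is why HBase stages files under /hbase/.tmp before moving them to their final location.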

(17) /hbase/.trashtables/data: when a table is truncated or deleted, its data is temporarily placed here and is cleared after one hour by default.

(18) A WAL log file on a RegionServer. Note that the RegionServer's directory name contains a timestamp (its start code): the next time the RS starts, it writes WALs into a new directory, and the old RS WAL directory is split and replayed by the splitlog process.
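The WAL directory name in (18) encodes the server name as "hostname,port,startcode", where the start code is the RegionServer's start timestamp in milliseconds. A small illustrative parser (not an HBase API) shows how the pieces can be recovered from such a directory name:

```python
def parse_server_name(dirname):
    # An RS WAL directory is named "hostname,port,startcode".
    # rsplit from the right so commas inside the hostname (there are
    # none in practice, but this is the safe direction) don't matter.
    host, port, startcode = dirname.rsplit(",", 2)
    return host, int(port), int(startcode)
```

Being able to decode these names is useful during manual recovery: it tells you which host wrote a given WAL directory and whether that server instance is the current one or a dead one awaiting splitlog.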
