HBase source code analysis: the WAL

WAL (Write-Ahead Logging) is a technique that guarantees atomicity and durability in database systems. With a WAL, random writes are turned into sequential writes, which improves write performance. When HBase writes data, the data is written into memory and the WAL is written at the same time; to guard against log loss, the WAL is stored on HDFS.
By default each RegionServer has one WAL. Since HBase 1.0 (HBASE-5699), multiple WALs per RegionServer are supported, which improves write throughput. The configuration parameter is hbase.wal.provider=multiwal; the values defaultProvider and filesystem are also accepted (these two map to the same implementation).
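For reference, switching to multiple WALs is a single property change in hbase-site.xml:

```xml
<!-- hbase-site.xml: switch the WAL provider to multiwal (HBASE-5699) -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
```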
The persistence levels of WAL are as follows:

  1. SKIP_WAL: do not write the WAL. This greatly improves write performance, but risks data loss. Use it only for large bulk writes that can be re-run on failure; it is not recommended otherwise.
  2. ASYNC_WAL: write the WAL asynchronously.
  3. SYNC_WAL: write the WAL synchronously, ensuring the data has been handed to the DataNode.
  4. FSYNC_WAL: not currently supported; its behavior is the same as SYNC_WAL.
  5. USE_DEFAULT: the level used when none is specified; it resolves to HBase's global default (SYNC_WAL).
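As a simplified model of how these levels are applied (this is not HBase's actual enum, which lives in org.apache.hadoop.hbase.client.Durability), the resolution of USE_DEFAULT and the "must sync before returning" decision can be sketched as:

```java
// Simplified stand-in for HBase's Durability handling. The real enum is
// org.apache.hadoop.hbase.client.Durability; USE_DEFAULT resolves to the
// table/global default, which out of the box is SYNC_WAL.
public class DurabilityModel {
    public enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    /** Resolve the level actually applied to a mutation. */
    public static Durability effective(Durability requested, Durability tableDefault) {
        return requested == Durability.USE_DEFAULT ? tableDefault : requested;
    }

    /** Does this level require waiting for an HDFS sync before the RPC returns? */
    public static boolean requiresSync(Durability d) {
        return d == Durability.SYNC_WAL || d == Durability.FSYNC_WAL;
    }
}
```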

WAL write

Let's first look at the main classes involved in writing the WAL:
1. WALKey: the key of a WAL entry. It contains regionName (the region the entry belongs to), tablename (the table the entry belongs to), writeTime (when the entry was written), and clusterIds (the ids of the clusters, used when replicating data).
2. WALEdit: one transaction's worth of edits; it records the series of modifications made in a single HBase transaction. WALEdit also implements the Writable interface, so it can be serialized.
3. FSHLog: the implementation class of WAL, responsible for writing the data to the file system.

Each WAL write uses a multi-producer, single-consumer model built on the disruptor framework: the WALKey and WALEdit are wrapped into an FSWALEntry and published into the RingBuffer via a RingBufferTruck. The hlog write path then breaks down into the following three steps:

  1. Write to the buffer: rpcHandler threads write log records into the ringBuffer.
  2. Write buffered data to the file system: each FSHLog has one thread responsible for writing the data out to the file system (HDFS).
  3. Synchronize the data: if the operation's persistence level is SYNC_WAL or USE_DEFAULT, the data must additionally be synced.
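The producer/consumer shape of the first two steps can be sketched with a plain queue standing in for the disruptor RingBuffer (class and field names here are illustrative, not HBase's):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Multi-producer / single-consumer sketch of the WAL append path.
// ArrayBlockingQueue stands in for the disruptor RingBuffer; WalTruck mirrors
// RingBufferTruck, which carries either a log entry or a sync request.
public class WalPipeline {
    static class WalTruck {
        final String entry;   // stands in for FSWALEntry (WALKey + WALEdit)
        final boolean isSync; // stands in for a SyncFuture
        WalTruck(String entry, boolean isSync) { this.entry = entry; this.isSync = isSync; }
    }

    private final BlockingQueue<WalTruck> ring = new ArrayBlockingQueue<>(1024);
    public final StringBuilder fileSystem = new StringBuilder(); // stands in for HDFS

    // Step 1: rpcHandler threads publish entries into the buffer.
    public void append(String entry) {
        if (!ring.offer(new WalTruck(entry, false))) throw new IllegalStateException("buffer full");
    }

    // Step 2: the single WAL consumer drains the buffer and writes entries out.
    public void consumeOne() {
        WalTruck t = ring.poll();
        if (t == null) return;
        if (t.isSync) {
            // the real code hands the SyncFuture to a SyncRunner thread (step 3)
        } else {
            fileSystem.append(t.entry).append('\n');
        }
    }
}
```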

The following is a detailed description of how various threads cooperate to achieve these steps.

  1. The rpcHandler thread writes the log record (FSWALEntry) into the RingBuffer. After the record is published, the rpcHandler calls the WAL's sync method; what this actually does is publish a SyncFuture into the RingBuffer and then block until that SyncFuture is completed.
  2. The WAL thread takes items off the RingBuffer. A log record (FSWALEntry) is written to the file system through the Writer; a SyncFuture is handed off to a dedicated synchronization thread.
    The original article shows a flow chart of the overall processing here.
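The blocking handshake in step 1 can be modeled with a CountDownLatch (a simplification: HBase's real SyncFuture is reusable and carries a ring-buffer sequence number):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified SyncFuture: the rpcHandler publishes it into the ring buffer and
// then blocks in get() until the sync thread marks the write durable via done().
public class SimpleSyncFuture {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile Throwable error;

    /** Called by the sync thread once the HDFS sync finishes (or fails). */
    public void done(Throwable t) {
        error = t;
        latch.countDown();
    }

    /** Blocks the calling rpcHandler until the sync completes; false on timeout. */
    public boolean get(long timeoutMs) {
        try {
            if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (error != null) throw new RuntimeException("sync failed", error);
        return true;
    }
}
```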

Writing to HLog

The WAL is written to the file system through a Writer, whose concrete class is ProtobufLogWriter; it persists entries in Protobuf format. Using the Protobuf format has the following advantages:

  1. High performance.
  2. A more compact structure that saves space.
  3. Easy to extend, with good support in other languages, so logs can be parsed from other languages.

Log entries are written out sequentially as WALKey followed by WALEdit (for details, see the descriptions of the WALKey and WALEdit classes above). In addition, the WALKey and WALEdit are compressed separately.
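As an illustration of why the Protobuf wire format is compact, here is the varint encoding it uses for lengths and integer fields, written from scratch (this is not HBase or protobuf library code):

```java
import java.io.ByteArrayOutputStream;

// Varint encoding as used by the Protobuf wire format: 7 bits of payload per
// byte, with the high bit set on every byte except the last. Small numbers
// (the common case for lengths and sequence ids) take one byte instead of 4 or 8.
public class Varint {
    public static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // more bytes follow
            v >>>= 7;
        }
        out.write((int) v); // final byte, high bit clear
        return out.toByteArray();
    }

    public static long decode(byte[] b) {
        long v = 0;
        int shift = 0;
        for (byte x : b) {
            v |= (long) (x & 0x7F) << shift;
            shift += 7;
        }
        return v;
    }
}
```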

WAL synchronization

Each WAL has a RingBufferEventHandler object, which uses an array to manage multiple SyncRunner threads (sized by the parameter hbase.regionserver.hlog.syncer.count, default 5) for the synchronization work. Each SyncRunner object has a LinkedBlockingQueue of SyncFutures whose size is {hbase.regionserver.handler.count, default 200} * 3.
In addition, there is one SyncFuture per rpcHandler thread, held in a private final Map inside the WAL.

```java
class RingBufferEventHandler implements EventHandler<RingBufferTruck>, LifecycleAware {
  private final SyncRunner[] syncRunners;
  private final SyncFuture[] syncFutures;
  ...
}

private class SyncRunner extends HasThread {
  private volatile long sequence;
  // Keep around last exception thrown. Clear on successful sync.
  private final BlockingQueue<SyncFuture> syncFutures;
  ...
}
```

When SyncFutures are taken from the ringBuffer, they are not submitted to a SyncRunner one at a time but in batches. There are two batching cases:

  1. For a batch of events fetched from the ringBuffer (the disruptor framework fetches events in batches for efficiency; see the disruptor documentation for details), if the number of SyncFutures in the batch is less than {hbase.regionserver.handler.count, default 200}, the whole batch is processed together.
  2. If the number of SyncFutures in the batch is >= {hbase.regionserver.handler.count, default 200}, the batch is split into chunks of {hbase.regionserver.handler.count, default 200}.

Once a batch is formed, the next SyncRunner is chosen round-robin from the syncRunner array and the batch is placed on that SyncRunner's BlockingQueue; the SyncRunner thread then performs the actual HDFS file sync. To guarantee no data loss, an RPC request must not return until its WAL entry has been written successfully, and HBase applies a series of optimizations here.
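The batch-splitting and round-robin dispatch described above can be expressed as a small helper (a sketch using the default of 200 for hbase.regionserver.handler.count):

```java
// Sketch of how a run of SyncFutures drained from the ring buffer is split
// into batches handed to SyncRunner threads: runs below the handler count go
// out as one batch, larger runs are chopped into handler-count sized pieces
// (plus a remainder).
public class SyncBatching {
    static final int HANDLER_COUNT = 200; // hbase.regionserver.handler.count default

    /** Number of batches a run of n SyncFutures is split into. */
    public static int batches(int n) {
        if (n == 0) return 0;
        return (n + HANDLER_COUNT - 1) / HANDLER_COUNT; // ceiling division
    }

    /** Round-robin choice of the next SyncRunner for a batch. */
    public static int nextRunner(int lastIndex, int runnerCount) {
        return (lastIndex + 1) % runnerCount;
    }
}
```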

WAL rolling

Rolling the WAL avoids a single oversized WAL file, which makes subsequent log cleanup easier (expired log files can simply be deleted) and, if the log is needed for recovery, allows multiple smaller files to be replayed in parallel, reducing recovery time.
A WAL roll is triggered in the following scenarios:

  1. If the SyncRunner thread hits an exception while processing a log sync, it calls requestLogRoll to request a log roll.
  2. After the SyncRunner thread finishes a sync, it checks whether the WAL file currently being written has grown past {hbase.regionserver.hlog.blocksize, default: the HDFS block size} * {hbase.regionserver.logroll.multiplier, default 0.95}; if so, it also calls requestLogRoll to request a roll.
  3. Each RegionServer has a LogRoller thread that rolls the log periodically; the period is controlled by the parameter {hbase.regionserver.logroll.period, default 1 hour}.

The first two scenarios request the roll via requestLogRoll; in all cases the actual roll is performed by the LogRoller.
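The size check in scenario 2 boils down to a single comparison (a sketch with the defaults spelled out):

```java
// Sketch of the size-based log-roll check: roll once the WAL being written
// exceeds blocksize * multiplier. Defaults: blocksize falls back to the HDFS
// block size, multiplier is 0.95 (hbase.regionserver.logroll.multiplier).
public class LogRollCheck {
    public static boolean shouldRoll(long walLenBytes, long blockSizeBytes, double multiplier) {
        return walLenBytes > (long) (blockSizeBytes * multiplier);
    }
}
```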

WAL expiration

Once the data in a memstore has been flushed to HDFS, the corresponding WAL entries are no longer needed. FSHLog tracks, for each region, the oldest sequenceId still sitting in the memstore. If the newest sequenceId that a log file contains for every region is smaller than that region's oldest unflushed sequenceId, the log file is no longer needed, and it is moved from the ./WALs directory to the ./oldWALs directory. This is handled by calling cleanOldLogs after a log roll completes.
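The archiving condition can be sketched per log file: a WAL is obsolete once, for every region it touches, its highest sequenceId is below that region's oldest unflushed sequenceId (illustrative code, not FSHLog's actual implementation):

```java
import java.util.Map;

// Sketch of the cleanOldLogs decision. maxSeqInLog: highest sequenceId each
// region wrote into this WAL file. oldestUnflushed: oldest sequenceId still in
// each region's memstore. If every edit in the file is already flushed, the
// file can be moved from ./WALs to ./oldWALs.
public class WalArchiveCheck {
    public static boolean canArchive(Map<String, Long> maxSeqInLog,
                                     Map<String, Long> oldestUnflushed) {
        for (Map.Entry<String, Long> e : maxSeqInLog.entrySet()) {
            Long oldest = oldestUnflushed.get(e.getKey());
            // An unflushed seqId at or below the log's max means the file
            // still holds un-persisted edits for that region.
            if (oldest != null && oldest <= e.getValue()) return false;
        }
        return true; // no region still needs this file
    }
}
```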

WAL deletion

Because WAL files are also used for cross-cluster replication, an expired WAL is not deleted immediately but is first moved to the oldWALs directory. A LogCleaner chore thread in the HMaster is responsible for deleting WAL files. Inside LogCleaner, the {hbase.master.logcleaner.plugins} parameter configures plugins that filter out the log files that may be deleted. The plugins configured by default are ReplicationLogCleaner, SnapshotLogCleaner and TimeToLiveLogCleaner:

  1. TimeToLiveLogCleaner: a log file may be deleted once its last-modified time is older than the configuration parameter {hbase.master.logcleaner.ttl, default 600 seconds}.
  2. ReplicationLogCleaner: when cross-cluster replication is in use, ensures that logs still being replicated are not deleted.
  3. SnapshotLogCleaner: WAL files referenced by table snapshots are not deleted.
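TimeToLiveLogCleaner's check is the simplest of the three and can be sketched as:

```java
// Sketch of TimeToLiveLogCleaner: an archived WAL in oldWALs becomes deletable
// once its last-modified time is older than hbase.master.logcleaner.ttl
// (600 seconds by default).
public class TtlLogCleaner {
    public static boolean isDeletable(long lastModifiedMs, long nowMs, long ttlMs) {
        return nowMs - lastModifiedMs > ttlMs;
    }
}
```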

Summary

This article walked through the entire life cycle of the WAL in HBase to give an overall picture of WAL processing. WAL-based recovery will be covered separately in a later article.

References: 
1.  http://hbasefly.com/2016/03/23/hbase_writer/ 
2.  http://hbasefly.com/2016/10/29/hbase-regionserver-recovering/

Translated from: https://blog.csdn.net/xiangel/article/details/54424900
