Big Data Training: Advanced HBase Usage and Row Key Design



  1. Hotspotting

  Spread writes out so that no single region becomes a hotspot.

  For example: historical trade orders. Order numbers are usually generated as a timestamp followed by a random four-digit number, and an order number can be reversed. Using the reversed order number as the row key keeps the rows from being stored together in one region.

  For example: when a user's mobile data usage is stored in HBase, the phone number is usually used as the row key. Because the last four digits of a phone number are effectively random, the reversed phone number can be used as the row key.
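  The reversal in the two examples above can be sketched in a few lines of Python. This is only an illustration: the order numbers are made up, and `reversed_row_key` is a hypothetical helper, not part of any HBase client API.

```python
def reversed_row_key(key: str) -> str:
    """Return the key with its characters in reverse order."""
    return key[::-1]

# Hypothetical order numbers: a millisecond timestamp followed by 4 random digits.
order_numbers = [
    f"{1700000000001 + i}{suffix}"
    for i, suffix in enumerate(["4821", "0937", "5560"])
]

row_keys = [reversed_row_key(n) for n in order_numbers]

for original, key in zip(order_numbers, row_keys):
    # The originals share a long common prefix; the reversed keys start
    # with the random digits, so they sort far apart from each other.
    print(original, "->", key)
```

  Reversing the row key again restores the original order number, so nothing is lost. The trade-off is that a range scan over consecutive order numbers no longer maps to a contiguous range of row keys.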

  Question:

  If the timestamp is used directly as the row key, does writing cause a hotspot on a single region, and why?

  Answer: Yes. Using the timestamp directly as the row key leads to a hotspot on a single region.

  Underneath, HBase stores row keys in HFiles, and the data is kept as key-value pairs in a structure like a SortedMap<K, V>. Each region holds its rows in row key order, so when the timestamps are close together, the rows all land in the same region. That region keeps growing while the other regions hold little data, and loading data becomes very slow. The problem is only relieved when the region splits.
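  The effect can be simulated without a cluster. The sketch below assumes four regions with made-up split points and routes each row key to the region whose range contains it; `SPLIT_POINTS` and `region_for` are illustrative names, and real HBase compares row keys as bytes rather than Python strings.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points: four regions holding row keys that start with
# "0"-"2", "3"-"4", "5"-"7", and "8"-"9".
SPLIT_POINTS = ["3", "5", "8"]

def region_for(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect_right(SPLIT_POINTS, row_key)

# 1000 consecutive millisecond timestamps.
timestamps = [str(1700000000000 + i) for i in range(1000)]

# Row key = timestamp: every key starts with "1", so one region takes all writes.
direct = Counter(region_for(ts) for ts in timestamps)

# Row key = reversed timestamp: the fast-changing digit comes first,
# so the writes are spread over all four regions.
spread = Counter(region_for(ts[::-1]) for ts in timestamps)

print("timestamp as row key:  ", dict(direct))
print("reversed timestamp key:", dict(spread))
```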

  2. Row Key Design

  HBase has two basic kinds of key structure: the row key and the column key.

  Both can carry meaningful information of their own, rather than serving only as the key for a value:

  Column key: consists of the column family name and the column qualifier, and locates a column.

  Row key: equivalent to the primary key in a relational database. A row key identifies all the columns of one logical row.

  The columns of a logical row are not stored together on disk. Each column family is stored in its own files, and cells from different column families never appear in the same StoreFile. HBase also does not store empty cells; the files on disk contain only the cells that have values.
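  As a rough model of that layout, the sketch below turns a small logical table into one list of cells per column family and skips the empty cells. The table contents and the `to_store_files` helper are invented for illustration; a real StoreFile is a sorted binary file, not a Python list.

```python
from collections import defaultdict

# A hypothetical logical table: row key -> {"family:qualifier": value or None}.
logical_rows = {
    "row1": {"info:name": "Ann", "info:city": None, "stats:visits": "3"},
    "row2": {"info:name": "Bob", "info:city": "Oslo", "stats:visits": None},
}

def to_store_files(rows):
    """Group cells by column family and drop the empty cells."""
    stores = defaultdict(list)
    for row_key, columns in sorted(rows.items()):
        for column, value in sorted(columns.items()):
            if value is None:
                continue  # empty cells are simply never written
            family, qualifier = column.split(":", 1)
            stores[family].append((row_key, qualifier, value))
    return dict(stores)

stores = to_store_files(logical_rows)
for family, cells in stores.items():
    print(family, cells)
```

  A read that names only the `info` family needs to look at one of the two lists here, which is the reason for specifying the column family in queries.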

  Each stored cell contains its row key and its column key, so every cell individually carries the information about its position in the table.

  Different versions of a cell are stored as separate, consecutive cells, sorted by timestamp in descending order, so a read returns the latest version by default.

  Within a column family, cells are sorted by row key. When a row has several cells, they are then sorted by column key. When a cell has several versions, they are sorted by timestamp.
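  The ordering just described can be written as a single sort key. The `Cell` type and sample cells below are assumptions made for the example, and string comparison stands in for the byte comparison HBase actually uses.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str

def sort_key(cell: Cell):
    # Row key and column key ascend; the timestamp is negated so that
    # newer versions of the same cell come first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)

cells = [
    Cell("row2", "cf", "a", 100, "x"),
    Cell("row1", "cf", "b", 100, "y"),
    Cell("row1", "cf", "a", 100, "old"),
    Cell("row1", "cf", "a", 200, "new"),
]

stored = sorted(cells, key=sort_key)
for cell in stored:
    print(cell)
```

  With this order, the first cell found for a given row and column is always its newest version.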

  Given these storage characteristics, it is recommended to specify the column family in a query. This reduces the number of store files the query has to read and improves efficiency.

  3. Key points of row key design

  (1) Storage: on disk, all the cells of a column family are stored in that family's store files, and cells from different column families never appear in the same store file.

  (2) HBase does not store NULL values in its tables.

  (3) Each stored cell contains its row key and its column key, i.e., each cell individually stores its position in the table.

  (4) Multiple versions of the same cell are stored as separate, consecutive cells, sorted by timestamp in descending order. Therefore, when an HFile is read, the latest value is read first.

  (5) The KeyValue of a cell contains: row key, column family, column qualifier, timestamp, value. KeyValues are sorted first by row key, then by column key.

  (6) For a KeyValue, filtering efficiency decreases from left to right: row key, column family, column qualifier, timestamp, value. So put the most important filtering information as far to the left as possible.
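  Point (6) follows from the sort order in point (5). The sketch below is an analogy over an in-memory sorted list, not HBase's actual read path: a filter on a row key prefix can seek to the first match and stop at the first mismatch, while a filter on the value has to examine every cell. All names and data here are hypothetical.

```python
from bisect import bisect_left

# Hypothetical cells as (row key, column key, value), kept in sorted order.
cells = sorted(
    (f"user{u:03d}", f"cf:q{q}", f"v{(u * 7 + q) % 10}")
    for u in range(100)
    for q in range(5)
)

def scan_row_prefix(prefix: str):
    """Seek to the first cell with the prefix, stop at the first mismatch."""
    start = bisect_left(cells, (prefix,))
    hits, examined = [], 0
    for cell in cells[start:]:
        examined += 1
        if not cell[0].startswith(prefix):
            break
        hits.append(cell)
    return hits, examined

def filter_by_value(value: str):
    """The value is the rightmost field, so every cell must be checked."""
    hits = [cell for cell in cells if cell[2] == value]
    return hits, len(cells)

prefix_hits, prefix_examined = scan_row_prefix("user04")
value_hits, value_examined = filter_by_value("v3")
print(len(prefix_hits), "hits after examining", prefix_examined, "cells")
print(len(value_hits), "hits after examining", value_examined, "cells")
```

  Both filters return the same number of cells in this data set, but the prefix scan touches only a small slice of the sorted list.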

Reproduced from: https://www.jianshu.com/p/799827187218


Origin blog.csdn.net/weixin_34223655/article/details/91206696