Big Data Technology and Applications Made Easy: HBase Core Concepts

HBase is part of the Hadoop ecosystem. It is modeled on Google's BigTable, implemented in Java, and built on top of HDFS. It is a highly reliable, high-performance, column-oriented, scalable database system that supports real-time reads and writes. Data can be retrieved only by row key (the primary key) or by a row key range, and HBase is mainly used to store loosely structured, semi-structured, and unstructured data. Like Hadoop, HBase relies mainly on horizontal scaling: computing and storage capacity grows by adding low-cost commodity servers. HBase tables generally have the following characteristics:

  • Large: a table can have hundreds of millions of rows and millions of columns
  • Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently
  • Sparse: columns that are empty (null) take up no storage space, so tables can be designed to be very sparse

Architecture:

Client main features:

  • The Client uses HBase's RPC mechanism to communicate with HMaster and HRegionServer
  • For management operations, the Client makes RPC calls to HMaster
  • For data read and write operations, the Client makes RPC calls to HRegionServer

ZooKeeper features:

  • Ensures that the cluster has only one active Master at any time; the Master and the RegionServers register with ZooKeeper when they start
  • Monitors RegionServers coming online and going offline in real time, and notifies the Master
  • Stores the addressing entry point for all HBase Regions (the location of the metadata table)

HMaster features:

  • Manages the HRegionServers and balances load across them
  • Manages and assigns HRegions, for example assigning the new HRegions when an HRegion splits, and migrating the HRegions of an exiting HRegionServer to other HRegionServers
  • Monitors the status of all HRegionServers in the cluster (through heartbeats and by watching their state in ZooKeeper)
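
The assignment duties above can be pictured with a small sketch. This is not the HBase API: `ToyMaster` and its methods are hypothetical names, and the "fewest regions first" rule is an assumption chosen for illustration, not a description of HBase's actual balancer.

```python
class ToyMaster:
    """Hypothetical stand-in for HMaster's region bookkeeping (not the HBase API)."""

    def __init__(self, servers):
        # server name -> list of region names it currently serves
        self.assignments = {name: [] for name in servers}

    def assign(self, region):
        # Assumed policy: pick the server with the fewest regions, ties broken by name.
        target = min(self.assignments, key=lambda s: (len(self.assignments[s]), s))
        self.assignments[target].append(region)
        return target

    def on_region_split(self, parent, daughters):
        # Drop the parent region, then assign each daughter like a new region.
        for regions in self.assignments.values():
            if parent in regions:
                regions.remove(parent)
        return [self.assign(d) for d in daughters]

    def on_server_exit(self, server):
        # Move every region of the departed server onto the remaining servers.
        orphaned = self.assignments.pop(server)
        return {region: self.assign(region) for region in orphaned}


master = ToyMaster(["rs1", "rs2", "rs3"])
for name in ["r1", "r2", "r3", "r4"]:
    master.assign(name)
moved = master.on_server_exit("rs1")  # rs1's regions are spread over rs2 and rs3
```

In this toy model the regions of the departed server end up on the least-loaded survivors, which is the behavior the second bullet describes.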

HRegionServer features:

  • A RegionServer maintains the Regions that the Master assigns to it and handles the IO requests for those Regions
  • A RegionServer is responsible for splitting Regions that grow too large during operation
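
To make the split idea concrete, here is a rough sketch. `Region` and `MAX_ROWS` are hypothetical names, and counting rows is a simplification made for this example; it does not reproduce HBase's own split policy.

```python
from dataclasses import dataclass, field

MAX_ROWS = 4  # hypothetical threshold, used only to keep the example small


@dataclass
class Region:
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "to the last row"
    rows: dict = field(default_factory=dict)

    def put(self, row_key: bytes, value: bytes) -> None:
        self.rows[row_key] = value

    def needs_split(self) -> bool:
        return len(self.rows) > MAX_ROWS

    def split(self):
        # Split at the middle row key; the two daughters share that key as a boundary.
        keys = sorted(self.rows)
        mid = keys[len(keys) // 2]
        left = Region(self.start_key, mid,
                      {k: v for k, v in self.rows.items() if k < mid})
        right = Region(mid, self.end_key,
                       {k: v for k, v in self.rows.items() if k >= mid})
        return left, right


region = Region(b"", b"")
for i in range(6):
    region.put(f"row{i}".encode(), b"v")
if region.needs_split():
    left, right = region.split()  # left covers [b"", b"row3"), right covers [b"row3", b"")
```

The daughters cover adjacent, non-overlapping key ranges, so every row key still belongs to exactly one Region after the split.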

Summary:
  • Client access to data in HBase does not involve the Master (addressing goes through ZooKeeper, and data reads and writes go to the RegionServers); the Master only maintains table and Region metadata, so its load is low
  • HRegions are placed, as far as possible, on the same nodes as the DataNodes that hold their data, to achieve data locality

Data Model:

  • Table: as in a traditional relational database, HBase organizes data in tables, and application data is stored in HBase tables

  • Row: a row in an HBase table is uniquely identified by its RowKey. Whether it is a number or a string, the RowKey is ultimately converted to a byte array for storage; the rows of an HBase table are sorted in lexicographic order of RowKey

  • Column Family: an HBase table is organized by rows and columns, with the added concept of a column family, which groups one or more columns together. Every HBase column must belong to a column family. Creating a table requires only the table name and at least one column family

  • Cell: the intersection of a row and a column is called a cell. The cell's content is the column's value, stored in binary form, and it is versioned

  • Version: each cell can store multiple versions of its value (how many versions are kept can be specified when the table is created), sorted in reverse chronological order. The timestamp is a 64-bit integer that can be specified when the data is written or assigned automatically by the RegionServer
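
The version behavior can be sketched in a few lines. `VersionedCell` is a hypothetical class, not part of any HBase client; the rule that a write with an existing timestamp replaces that version is an assumption of this sketch.

```python
import time


class VersionedCell:
    """Toy cell that keeps at most max_versions values, newest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, value, timestamp=None):
        if timestamp is None:
            # Default timestamp: current time in milliseconds.
            timestamp = int(time.time() * 1000)
        # Assumption of this sketch: same timestamp means the old value is replaced.
        self.versions = [(ts, v) for ts, v in self.versions if ts != timestamp]
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, max_versions=1):
        # By default only the newest version is returned.
        return self.versions[:max_versions]


cell = VersionedCell(max_versions=2)
cell.put(b"a", timestamp=100)
cell.put(b"b", timestamp=300)
cell.put(b"c", timestamp=200)
latest = cell.get()        # newest version only
both = cell.get(2)         # the two retained versions, newest first
```

Because only two versions are retained here, the value written with timestamp 100 is discarded once a third version arrives.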

Note:

  • HBase has no data types: every column value is converted to a byte string for storage. Unlike a relational database, where columns must be declared explicitly when the table is created, each row of an HBase table can have different columns
  • Insert operations with the same RowKey are treated as operations on the same row. That is, if two writes use the same RowKey, the second write may simply update some columns of that row
  • The full column name is the column family name joined to the column name with a colon as separator, such as d:Name (d is the column family name, Name is the column name)
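
A short sketch may help tie these three notes together. `split_column` and `ToyTable` are hypothetical helpers written for this post, and plain Python strings stand in for the byte arrays HBase actually stores.

```python
def split_column(name):
    """Split 'family:column' into its two parts."""
    family, sep, column = name.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:column', got {name!r}")
    return family, column


class ToyTable:
    """Toy table: only column families are declared up front, columns are not."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {"family:column": value}

    def put(self, row_key, column, value):
        family, _ = split_column(column)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Same row key means same row: the write adds or updates a column.
        self.rows.setdefault(row_key, {})[column] = value


table = ToyTable(["d"])
table.put("u1", "d:Name", "Ann")
table.put("u1", "d:Age", "30")
table.put("u1", "d:Name", "Anna")            # second write updates the same row
table.put("u2", "d:Email", "bob@example.com")  # a row with a different column
```

After these writes, row u1 holds two columns and row u2 holds one entirely different column, and nothing is stored for the columns a row does not have.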

Summary:

  • HBase does not support conditional queries, ORDER BY, or similar queries; records can be read only by row key (or row key range) or through a full table scan
  • Creating a table requires declaring only the table name and at least one column family name; each Column Family is a storage unit
  • In practice, it is strongly recommended to design HBase tables with a single column family
  • Columns do not need to be defined when the table is created and can be added dynamically. Columns in the same Column Family are clustered in one storage unit and sorted by column key, so columns with the same I/O characteristics should be placed in the same Column Family to improve performance. Note: columns can be added and removed freely, which is a big difference from traditional databases and makes HBase a good fit for unstructured data
  • A piece of data in HBase is identified by its row and column. Its value may have multiple versions, sorted in reverse chronological order so that the newest comes first; a query returns the latest version by default
  • The timestamp defaults to the current system time (with millisecond precision); it can also be specified when writing data
    - Each cell value is uniquely indexed by four keys: tableName + RowKey + ColumnKey + Timestamp => value
  • Storage types:
    - TableName is a string
    - RowKey and ColumnName are binary values (Java type byte[])
    - Timestamp is a 64-bit integer (Java type long)
    - value is a byte array (Java type byte[])
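
Because row keys are compared as bytes, their order is not always what one expects from numbers. The sketch below illustrates this together with a row key range read; `SortedRows` is a hypothetical in-memory structure, not an HBase class.

```python
import bisect
from typing import Optional


class SortedRows:
    """Toy store that keeps row keys in lexicographic byte order."""

    def __init__(self):
        self._keys = []  # sorted row keys (bytes)
        self._data = {}  # row key -> value

    def put(self, row_key: bytes, value: bytes) -> None:
        if row_key not in self._data:
            bisect.insort(self._keys, row_key)
        self._data[row_key] = value

    def get(self, row_key: bytes):
        return self._data.get(row_key)

    def scan(self, start: bytes = b"", stop: Optional[bytes] = None):
        # Range read over [start, stop); with no arguments this is a full scan.
        lo = bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


rows = SortedRows()
for n in (1, 2, 10, 9):
    rows.put(str(n).encode(), b"x")
ordered_keys = [k for k, _ in rows.scan()]           # b"10" sorts before b"2" and b"9"
in_range = [k for k, _ in rows.scan(b"1", b"2")]     # keys from b"1" up to, not including, b"2"
```

Since reads are limited to row keys, ranges, and full scans, the way row keys sort largely decides which queries are cheap.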

HBase addressing: when a Client accesses user data, how does it find the Region that holds a given row key?
In version 0.94, before accessing user data the Client must first access ZooKeeper, then the -ROOT- table, then the .META. table, and only then does it find the location of the user data. This requires multiple network operations. (The original figure is not available.)
From version 0.96 on, the -ROOT- table was removed and replaced by a file stored in ZooKeeper. The original figures showed this change (figure A) and the addressing flow for a read (figure B).
[Figures not available]
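
The two addressing paths can be compared with a small sketch. The contents of `META`, the server names, and both functions are made up for this example; the only facts taken from the text are the order of the lookups and the removal of the -ROOT- step.

```python
import bisect

# Hypothetical metadata: (region start key, region server), sorted by start key.
META = [(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")]


def locate_region(meta, row_key):
    """Return the entry of the region whose key range contains row_key."""
    starts = [start for start, _ in meta]
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(starts, row_key) - 1
    return meta[idx]


def lookup_hops(has_root_table, row_key):
    """List the places a client visits before it can read row_key."""
    hops = ["ZooKeeper"]
    if has_root_table:
        hops.append("-ROOT-")  # the extra step described for 0.94
    hops.append("meta table")
    _, server = locate_region(META, row_key)
    hops.append(server)        # the region server that holds the row
    return hops


old_path = lookup_hops(True, b"hello")   # ZooKeeper, -ROOT-, meta table, region server
new_path = lookup_hops(False, b"hello")  # one lookup fewer
```

Neither path passes through the Master, which matches the earlier point that data access does not involve it.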

Origin blog.csdn.net/weixin_44598691/article/details/105010593