Analysis of HBase Database and Table Design

Recently I have been building up my knowledge of big data. I have organized and written up my study notes, and this post briefly covers HBase data and table design.

First, HBase is a distributed, column-oriented, open source database. The technology is derived from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase began as a sub-project of Apache's Hadoop project and is now a top-level Apache project. Unlike general relational databases, HBase is a database suited to storing unstructured data. Another difference is that HBase's schema is column-based rather than row-based.

HBase is a NoSQL database for processing massive data sets and can support large tables with billions of rows and millions of columns. Let's understand HBase table design by comparing it with relational databases.

Table structure in relational databases

To better understand the idea behind HBase tables, let's first review how relational databases handle tables.

For example, take a user table user_info with the fields id, name, and tel. The table name and the fields must be specified when the table is created:

      -- the column types here are illustrative
      create table user_info (
          id   int,
          name varchar(64),
          tel  varchar(32)
      );

Then insert two rows:

      insert into user_info values ('...', '...', '...')

The table now looks like this:

id    name           tel
1     Akari          123
2     little king    456

Later the existing fields are no longer enough: new users also need an address recorded, so a new field has to be added:

id    name           tel    addr
1     Akari          123    (empty)
2     little king    456    (empty)

When requirements grow again in the future, you either keep adding new fields or add an extension table.

The main points above are:

  • Creating a table requires specifying the table name and the fields in advance
  • Inserting a record means specifying the table name and a value for each field
  • A data table is a two-dimensional structure of rows and columns
  • Adding fields is inflexible
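The relational workflow above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database; the column types are assumptions, since the original statement leaves them unspecified.

```python
import sqlite3

# In-memory SQLite database; the column types are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (id INTEGER, name TEXT, tel TEXT)")
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?)",
    [(1, "Akari", "123"), (2, "little king", "456")],
)

# Recording an address means changing the schema, which affects every row.
conn.execute("ALTER TABLE user_info ADD COLUMN addr TEXT")

# PRAGMA table_info yields one tuple per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(user_info)")]
rows = conn.execute("SELECT * FROM user_info ORDER BY id").fetchall()
print(columns)
print(rows)
```

After the schema change, both existing rows carry an addr column whose value is NULL (None in Python), which matches the empty addr cells in the table above.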

Now let's look at how HBase handles the same case.

   HBase table structure

When creating a table, you need to specify the table name and the column families.

Create-table statement:

create 'user_info', 'base_info', 'ext_info'

This creates a new table named user_info that contains two column families, base_info and ext_info.

A column family is a collection of columns; one column family can contain multiple columns.

The table structure at this point:

row key    base_info    ext_info
...        ...          ...

The row key is the ID of each row. Every table has this field implicitly, so you do not declare it when creating the table; its value is supplied when data is written.

Insert one user record whose name is 'a' and whose tel is '123'.

Insert statements:

put 'user_info', 'row1', 'base_info:name', 'a'

put 'user_info', 'row1', 'base_info:tel', '123'

These statements write to the row with row key row1 in the user_info table: the first adds the value name:a to the base_info column family, and the second adds the value tel:123.

name and tel are the specific fields; both belong to the base_info column family.

The table structure at this point:

row key    base_info          ext_info
row1       name:a, tel:123    (empty)

Insert another record whose name is 'b' and whose addr is 'bj' (Beijing):

put 'user_info', 'row2', 'base_info:name', 'b'

put 'user_info', 'row2', 'ext_info:addr', 'bj'

The table structure at this point:

row key    base_info          ext_info
row1       name:a, tel:123    (empty)
row2       name:b             addr:bj
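One way to picture the structure built up so far is as nested maps: row key, then column family, then field name. The sketch below is a toy in-memory model written for illustration only; ToyTable and its put method are hypothetical names, not part of the HBase client API. It assumes, as in HBase, that column families are fixed when the table is created while field names are free-form.

```python
class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, not the HBase API)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # fixed when the table is created
        self.rows = {}                 # row key -> family -> field name -> value

    def put(self, row_key, column, value):
        # A column is written as "family:field", mirroring the shell syntax.
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


table = ToyTable("user_info", "base_info", "ext_info")
table.put("row1", "base_info:name", "a")
table.put("row1", "base_info:tel", "123")
table.put("row2", "base_info:name", "b")
table.put("row2", "ext_info:addr", "bj")

for row_key, families in table.rows.items():
    print(row_key, families)
```

Note that row1 has no ext_info entry at all and row2 has no tel field: absent fields are simply not stored, rather than stored as empty values.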

There is one more important concept in HBase tables: the version. The value of each field carries version information, identified by a timestamp.

For example, when base_info:name in row2 is changed from b to c, the previous value can be retained, so the old value can still be retrieved. How many versions are kept is configured per column family.

row key    base_info                      ext_info
row1       name:a, tel:123                (empty)
row2       name:c (v2) [name:b (v1)]      addr:bj
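The versioning behaviour can be sketched with a toy model as well. This is not HBase code: put, get, and MAX_VERSIONS are hypothetical names, a simple counter stands in for real timestamps, and the retention limit of two versions is an assumption chosen for the example.

```python
import itertools

_clock = itertools.count(1)  # a counter used in place of wall-clock timestamps

cells = {}        # (row key, "family:field") -> list of (version, value), newest first
MAX_VERSIONS = 2  # assumed retention setting for this sketch


def put(row_key, column, value):
    versions = cells.setdefault((row_key, column), [])
    versions.insert(0, (next(_clock), value))
    del versions[MAX_VERSIONS:]  # drop anything beyond the retention limit


def get(row_key, column, versions=1):
    # Return up to `versions` entries, newest first.
    return cells.get((row_key, column), [])[:versions]


put("row2", "base_info:name", "b")
put("row2", "base_info:name", "c")

latest = get("row2", "base_info:name")
history = get("row2", "base_info:name", versions=2)
print(latest)
print(history)
```

A plain read returns only the newest value, while asking for two versions also returns the older name b, as in the row2 cell shown in the table.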

Summary

The process of creating tables and inserting data above shows the characteristics of how HBase stores data:

  • Like relational databases, it uses a structure of rows and columns
  • Creating a table defines the table name and the column families (collections of fields), not the specific fields
  • A column family can contain any number of fields, the field names do not need to be predefined, and different rows can have different fields within the same column family
  • The structure is multidimensional: a relational table is two-dimensional, so a value is located by its row and column, whereas HBase locates a specific value by row key, column family name, field name, and version number
  • In the HBase shell, each put writes the value of one field, instead of inserting multiple fields at once as in a relational database
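The difference in dimensionality can be made concrete by writing each model as a flat map from coordinates to a value. This is only an illustration: the dictionaries and the latest helper are hypothetical, and the integers 1 and 2 stand in for the version labels v1 and v2 used earlier.

```python
# Relational model: a value is located by (row, column).
relational = {(2, "name"): "little king"}

# HBase-style model: a value is located by
# (row key, column family, field name, version).
hbase_like = {
    ("row2", "base_info", "name", 1): "b",
    ("row2", "base_info", "name", 2): "c",
    ("row2", "ext_info", "addr", 1): "bj",
}


def latest(cells, row_key, family, qualifier):
    """Return the value with the highest version for the given coordinates."""
    versions = {
        key[3]: value
        for key, value in cells.items()
        if key[:3] == (row_key, family, qualifier)
    }
    return versions[max(versions)] if versions else None


print(relational[(2, "name")])
print(hbase_like[("row2", "base_info", "name", 1)])
print(latest(hbase_like, "row2", "base_info", "name"))
```

Leaving out the version, as latest does, corresponds to reading the newest value of a field.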
