Big Data in Plain Language: HBase Explained in Detail, Lao Liu Really Put His Heart Into It (1)

Lao Liu reviewed HBase today and found that many materials never make the concepts clear, and they are full of jargon that goes unexplained. For example, they say the framework is high-performance and highly available. So what do high performance and high availability actually mean? And how are they achieved? The materials don't say a word about it!

How would an interviewer react to an answer like that? My feeling is that you would just be repeating someone else's words, with no understanding of your own. That's why Lao Liu writes this big data series in plain language: to make things really clear! If you think Lao Liu writes well, give Lao Liu a thumbs up!

01 HBase knowledge points

Point 1: Definition of HBase

[Figure: screenshot of the HBase official website, with the definition boxed in red]
The red box on the official website says directly that HBase is the Hadoop database, a distributed, scalable big data store. Imagine you are an interviewer and a candidate answers you like this. Would you be satisfied? Lao Liu believes you have only truly mastered something when you can explain it in your own words.

In Lao Liu's view, the name says it all: HBase is short for Hadoop Database, the database of the Hadoop ecosystem.

Its data is usually stored on HDFS, which gives HBase highly reliable underlying storage. Hadoop MapReduce can be used to process the massive data in HBase, which gives HBase high-performance computing capability. ZooKeeper provides HBase with stable coordination services.

Based on the above, it can be concluded that HBase is a distributed database built on HDFS, with high reliability, high performance, scalability, and support for massive data storage.

Generally, HBase is used when the volume of stored data is large and the requirements on read and write performance are high.

What does high read and write performance mean? Simply that reads return quickly and writes complete quickly: the faster both are, the higher the read and write performance!

Point 2: Features of HBase

1) Extremely easy to scale

The bottom layer of HBase relies on HDFS, so when disk space runs short we only need to add DataNode nodes dynamically. Likewise, the capacity of the cluster can be expanded simply by adding servers.

2) Massive storage

HBase can store huge volumes of data and, even at that scale, return data within tens to hundreds of milliseconds. This is closely tied to how easily HBase scales: it is precisely this scalability that makes storing massive data convenient.

3) Columnar storage (be sure to understand the difference between columnar storage and row storage)

Columnar storage here really means column family storage: HBase stores data by column family. A column family can contain many columns, and column families must be specified when the table is created.

4) Sparse

Sparseness comes from the flexibility of HBase columns. Within a column family you can define any number of columns, and a column whose value is empty takes up no storage space.

5) Single data type

All its data is stored in byte arrays.
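
To make points 4) and 5) concrete, here is a minimal Python sketch. It is not HBase code: the dict layout, the helper names, the second row `rk0002`, and the `info:age` column are all assumptions made only for illustration. A column that was never written simply has no entry, and every value is turned into bytes before it is stored.

```python
# Toy model only: each row maps "family:qualifier" -> value bytes.
# The dict layout and helper names are illustrative assumptions, not HBase APIs.

def encode_str(value: str) -> bytes:
    """Encode text as UTF-8 bytes, the way a client would before writing."""
    return value.encode("utf-8")

def encode_int(value: int) -> bytes:
    """Encode an integer as 4 big-endian bytes (one possible convention)."""
    return value.to_bytes(4, byteorder="big", signed=True)

def decode_int(raw: bytes) -> int:
    """Reverse of encode_int: the reader must know the type, the store does not."""
    return int.from_bytes(raw, byteorder="big", signed=True)

table = {
    b"rk0001": {"info:name": encode_str("zhangsan"), "info:age": encode_int(25)},
    # rk0002 never wrote info:age, so nothing at all is stored for it.
    b"rk0002": {"info:name": encode_str("lisi")},
}

# Only cells that were actually written take up space.
stored_cells = sum(len(columns) for columns in table.values())
```

Because the store only sees bytes, it is the application that has to remember that `info:age` is an integer and `info:name` is text.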

Point 3: The data model of the HBase table

[Figure: an example HBase table]
Let's start with an example HBase table and use it to introduce the table structure.

rowkey

1) It is the primary key of the table. Records in the table are sorted by the lexicographical order of their rowkeys. So what is lexicographical order?

Many training institutions' materials just mention it in passing, which is irresponsible and really annoying. Lao Liu had to look up lexicographic sorting himself.

In plain terms, lexicographic sorting compares two strings character by character from the first character, using the ASCII code table. The one with the smaller character comes first. If the first characters are equal, the next ones are compared, and so on. If every compared character is equal and one string runs out first, the shorter one comes first.

Common ASCII ordering: 0~9 < A~Z < a~z

2) The rowkey can be any string. Its maximum length is 64 KB, but in practice it is usually 10 to 100 bytes.
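
The lexicographic ordering described in 1) can be tried out with a few lines of Python, since Python compares `bytes` values byte by byte in the same way. This is only a sketch of the ordering rule; the sample rowkeys are made up for illustration.

```python
# Rowkeys are compared byte by byte, so model them as bytes and sort them.
rowkeys = [b"rk10", b"rk2", b"rk1", b"Rk1", b"rk"]
ordered = sorted(rowkeys)

# "R" (uppercase) sorts before "r" (lowercase); "rk" is a prefix of the others,
# so the shorter key comes first; "rk10" sorts before "rk2" because "1" < "2".

# Zero-padding the numeric part keeps numeric order and lexicographic order aligned.
padded = sorted(f"rk{n:04d}".encode("ascii") for n in (10, 2, 1))
```

This is why rowkeys such as rk0001 are usually padded to a fixed width: without padding, rk10 would land between rk1 and rk2.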

Column Family

1) Each column in the HBase table belongs to a certain column family.

2) Column families are part of the table's schema (columns are not), so at least one column family must be specified when the table is created.

3) For example, to create a user table with two column families, info and data, the command is create 'user', 'info', 'data'.

Column

A column always belongs to some column family of the table and is written as column family name:column name. For example, the name column under the info column family is written as info:name.

Cell

[Figure: locating a Cell by rowkey, column family, and column]
As shown in the figure above, a Cell is located by specifying the rowkey, the column family, and the column. The data in a Cell has no type; it is all stored as byte arrays.

Timestamp

A Cell in a table can be assigned a value many times, and the timestamp of each assignment serves as the version number of that value.

In other words, a Cell can hold multiple versions of its value.
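
Putting rowkey, column, Cell, and timestamp together, the data model can be pictured with a small Python sketch. The class `ToyTable` and its `max_versions` parameter are hypothetical names invented for this illustration. It is not how HBase is implemented, and it trims old versions immediately, which is a simplification.

```python
from collections import defaultdict

class ToyTable:
    """Toy model: (rowkey, "family:qualifier") -> {timestamp: value bytes}."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self.cells = defaultdict(dict)

    def put(self, rowkey: bytes, column: str, value: bytes, ts: int) -> None:
        versions = self.cells[(rowkey, column)]
        versions[ts] = value
        # Keep only the newest max_versions timestamps (simplified: trimmed eagerly).
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, rowkey: bytes, column: str, versions: int = 1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        stored = self.cells.get((rowkey, column), {})
        return sorted(stored.items(), reverse=True)[:versions]

user = ToyTable(max_versions=2)
user.put(b"rk0001", "info:name", b"zhangsan", ts=1)
user.put(b"rk0001", "info:name", b"lisi", ts=2)
user.put(b"rk0001", "info:name", b"wangwu", ts=3)

latest = user.get(b"rk0001", "info:name")
history = user.get(b"rk0001", "info:name", versions=5)
```

A plain read returns the newest version, and asking for more versions walks back through older timestamps until the retained history runs out.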

Point 4: HBase architecture

[Figure: HBase architecture]
As the figure shows, this is a very typical master-slave architecture.

Let's go through each component in detail:

Client

The Client is the entry point for operating the HBase cluster. It communicates with HMaster over RPC to create, delete, and alter tables, and with RegionServers over RPC to read and write table data. In practice we usually use the HBase shell or the Java API to perform these operations.

ZooKeeper cluster

What is ZooKeeper and what is it for? Lao Liu has already explained that in detail in the article on the ZooKeeper framework, so go take a look. Its role in the HBase cluster is clear: ① it provides high availability for HMaster by electing the active master among multiple HMasters; ② it stores the location of the meta table, which holds HBase's metadata; ③ it monitors the HMaster and HRegionServer nodes.

HMaster

The HBase cluster is also a master-slave architecture. HMaster plays the master role as the boss of the cluster and is mainly responsible for managing tables and Regions. What exactly does that involve?

1) Table management: it handles Client requests to create, delete, and alter tables;

2) Region management, which involves a bit more: when a Region splits, HMaster assigns the new Regions to designated HRegionServers; when an HRegionServer goes down, it migrates the Regions that were on it to other servers; and it manages load balancing across the HRegionServers.
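
The migration step in 2) can be sketched in Python. This is only an illustration under a simple assumption, namely that each orphaned Region goes to whichever surviving server currently holds the fewest Regions. The function name `reassign_regions` and the server and Region names are hypothetical, and the real HMaster logic is more involved.

```python
def reassign_regions(assignments: dict, dead_server: str) -> dict:
    """Return a new server -> regions mapping with dead_server's regions moved.

    Assumed policy (for illustration only): each orphaned region goes to the
    surviving server with the fewest regions, ties broken by server name.
    """
    survivors = {
        server: list(regions)
        for server, regions in assignments.items()
        if server != dead_server
    }
    if not survivors:
        raise RuntimeError("no live servers left to take over the regions")
    for region in assignments.get(dead_server, []):
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors

before = {"rs1": ["r1", "r2"], "rs2": ["r3"], "rs3": ["r4", "r5", "r6"]}
after = reassign_regions(before, dead_server="rs3")
```

In this run the three Regions from rs3 end up spread over rs1 and rs2, so both survivors carry three Regions each.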

Do you understand load balancing? What is it?

For example, when a website is first launched its traffic is small. Once traffic grows very large, concurrency grows with it and visitors start to experience delays. That is when load balancing is needed. The website used to run on a single server; now you can set up a cluster of servers so that incoming traffic is spread across them, which greatly reduces the pressure on any single server. That is what load balancing sets out to do.
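
The website example can be sketched in Python with the simplest strategy, round-robin, where requests are handed to the servers in turn. The server names and the `route` function are made up for illustration. Note that this shows the general idea only: in HBase, balancing is about spreading Regions across HRegionServers rather than routing individual web requests.

```python
from collections import Counter
from itertools import cycle

servers = ["web1", "web2", "web3"]  # hypothetical server names
next_server = cycle(servers)        # endless web1, web2, web3, web1, ...

def route(request_id: int) -> str:
    """Pick the server for a request by taking the next one in turn."""
    return next(next_server)

# Nine requests arrive; count how many each server handled.
handled = Counter(route(request_id) for request_id in range(9))
```

With nine requests and three servers, each server ends up handling three, instead of one server handling all nine.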

HRegionServer

It plays the slave role in the HBase cluster, the little brother that does the actual work. It is mainly responsible for serving client requests to read and write data, and for managing a set of Regions.

Region

It is the smallest unit of distributed storage in an HBase cluster. A Region holds part of the data of a Table. Put simply, a table stored in HBase is stored as Regions.
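
Since rows are sorted by rowkey, each Region holds one contiguous rowkey range of its table, from a start key (inclusive) up to the next Region's start key. Finding the Region for a rowkey is then a lookup in a sorted list, as in this Python sketch. The start keys, the Region names, and the function `locate_region` are assumptions for illustration, not HBase APIs.

```python
from bisect import bisect_right

# Sorted start keys of three hypothetical Regions; the first starts at the empty key.
region_start_keys = [b"", b"rk0100", b"rk0200"]
region_names = ["region-A", "region-B", "region-C"]

def locate_region(rowkey: bytes) -> str:
    """Return the Region whose range [start key, next start key) contains rowkey."""
    index = bisect_right(region_start_keys, rowkey) - 1
    return region_names[index]

first = locate_region(b"rk0001")     # falls before rk0100
boundary = locate_region(b"rk0100")  # a start key belongs to its own Region
last = locate_region(b"rk9999")      # beyond the last start key
```

Binary search works here only because the start keys are kept in the same lexicographic order as the rowkeys.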

Point 5: HBase shell commands

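Here are some common basic commands:

Create the user table with two column families, info and data
create 'user', 'info', 'data'

Use put to insert data into the user table: rowkey rk0001, a column called name in the info column family, with the value zhangsan
put 'user', 'rk0001', 'info:name', 'zhangsan'

Get all the data in the user table for rowkey rk0001 (that is, the data of every cell)
get 'user', 'rk0001'
Get all the data in the info column family of the user table for rowkey rk0001
get 'user', 'rk0001', 'info'

Update operation: change the number of versions kept by the info column family of the user table to 5
alter 'user', NAME => 'info', VERSIONS => 5

Deleting data and deleting tables
Delete the data in the user table with rowkey rk0001 and column identifier info:name
delete 'user', 'rk0001', 'info:name'

Empty the table
truncate 'user'

Delete the table
First put the table into the disabled state with:
disable 'user'
Then delete the table with drop
drop 'user'
Note: if you drop the table directly, an error is reported: Drop the named table. Table must first be disabled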

Now for some more advanced HBase commands:

Show the cluster status (a format such as 'summary', 'simple', or 'detailed' can be passed as an argument)
status

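Show the current HBase user
whoami

List all current tables
list

Count the records in a given table
count 'user'

Check whether a table exists; handy when there are a great many tables
exists 'user'

Check whether a table is enabled or disabled
is_enabled 'user'
is_disabled 'user'

Disable a table / enable a table
disable 'user'
enable 'user'

Delete a table; remember that it must be disabled first
drop 'user'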

That covers the HBase shell commands. There is also the topic of operating on tables with the Java API, but Lao Liu won't go through it here. If you need it, contact Lao Liu and I will share it with you.

02 HBase summary

This is the first part of the HBase knowledge points, which Lao Liu has tried to explain in plain language. If you have any questions, you can reach me through the public account: Lao Liu who works hard.

Finally, I hope what I covered today helps students who are interested in big data, and I welcome your criticism and guidance.

Origin blog.csdn.net/qq_36780184/article/details/110141035