Hadoop for beginners: using the HBase command-line client

1. Introduction to HBase

HBase is a distributed, column-oriented, open-source database modeled on the Google paper "Bigtable: A Distributed Storage System for Structured Data". HBase stores data in tables made up of rows and columns, and the columns are grouped into column families.

HBase has three modes of operation: stand-alone, pseudo-distributed, and distributed:
stand-alone mode: install and use HBase on a single computer, with no distributed storage of data;
pseudo-distributed mode: simulate a small cluster on a single computer;
distributed mode: use multiple computers to achieve distributed storage in the physical sense.
For learning purposes, this article covers only the stand-alone and pseudo-distributed modes.

2. Experimental environment

Operating system: Ubuntu, 64-bit
Hadoop version: Hadoop 2.7.1
JDK version: jdk-8u241-linux-x64
HBase version: HBase 1.1.2

3. Install HBase

(1) Download the HBase installation archive
Download URL: HBase download URL

(2) Extract the HBase archive

sudo tar -zxf ~/下载/hbase-1.1.2-bin.tar.gz -C /usr/local
# Extract hbase-1.1.2-bin.tar.gz into /usr/local (~/下载 is the Downloads folder on a Chinese-locale system)

Then rename the extracted directory from hbase-1.1.2 to hbase:

sudo mv /usr/local/hbase-1.1.2 /usr/local/hbase

(3) Configure environment variables

Add the bin directory under hbase to PATH so that HBase can be started from any directory, without first changing to /usr/local/hbase.

vi ~/.bashrc

Add the following at the end of the ~/.bashrc file:

export PATH=$PATH:/usr/local/hbase/bin

After editing, run the source command so that the change takes effect immediately in the current terminal:

source ~/.bashrc
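
As a quick sanity check on the PATH change, here is a minimal Python sketch. The helper name dir_on_path is made up for this illustration; shutil.which is the standard-library lookup for an executable on PATH. It only inspects the environment of the process it runs in, so run it from the terminal where you sourced ~/.bashrc.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    # /usr/local/hbase/bin is the directory added to ~/.bashrc above.
    print("bin directory on PATH:", dir_on_path("/usr/local/hbase/bin"))
    # shutil.which returns the full path of the launcher, or None if it is not found.
    print("hbase launcher:", shutil.which("hbase"))
```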

(4) Set ownership of the HBase directory

cd /usr/local
sudo chown -R hadoop ./hbase
# Change the owner of every file under hbase to hadoop; hadoop is the current user's username.

(5) Check the HBase version to confirm that HBase was installed successfully:

hbase version

4. HBase configuration

HBase has three operating modes: stand-alone, pseudo-distributed, and distributed.
Before configuring any of them, make sure that Java, Hadoop, and SSH are already set up.

(1) Stand-alone mode configuration

① Configure /usr/local/hbase/conf/hbase-env.sh

Open hbase-env.sh for editing with vi:

vi /usr/local/hbase/conf/hbase-env.sh

② Configure /usr/local/hbase/conf/hbase-site.xml

Open and edit hbase-site.xml:

vi /usr/local/hbase/conf/hbase-site.xml

Before starting HBase, set the hbase.rootdir property, which specifies where HBase stores its data. If it is not set, hbase.rootdir defaults to /tmp/hbase-${user.name}, so the data is lost every time the system restarts. Here it is set to the hbase-tmp folder under the HBase installation directory (/usr/local/hbase/hbase-tmp). Add the following configuration:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/hbase-tmp</value>
  </property>
</configuration>
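
A typo in this file (an unclosed tag, for example) only shows up when HBase starts, so it can help to parse it first. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name read_properties and the SAMPLE string are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Same content as the hbase-site.xml configuration shown above.
SAMPLE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/hbase-tmp</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    print(read_properties(SAMPLE))
```

To check the file on disk instead of the sample string, ET.parse("/usr/local/hbase/conf/hbase-site.xml").getroot() returns the same kind of root element.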

③ Test run

Start HBase:

start-hbase.sh

Open the HBase shell:

hbase shell

Exit the HBase shell:

exit

Stop HBase:

stop-hbase.sh

(2) Pseudo-distributed mode configuration

① Configure /usr/local/hbase/conf/hbase-env.sh

vi /usr/local/hbase/conf/hbase-env.sh

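Configure JAVA_HOME, HBASE_CLASSPATH, and HBASE_MANAGES_ZK.

HBASE_CLASSPATH is set to the configuration directory of the local Hadoop installation (here /usr/local/hadoop/conf; a stock Hadoop 2.x install keeps its configuration files in etc/hadoop under the installation directory, so use that path if conf does not exist).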

② Configure /usr/local/hbase/conf/hbase-site.xml

vi /usr/local/hbase/conf/hbase-site.xml

hbase.rootdir specifies the directory where HBase stores its data; hbase.cluster.distributed switches the cluster to distributed mode.
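
To make these two properties concrete, here is a minimal Python sketch that renders them as a configuration file. The values are assumptions for illustration, not taken from this article: hdfs://localhost:9000/hbase presumes that fs.defaultFS in your Hadoop core-site.xml is hdfs://localhost:9000, and the names PSEUDO_DISTRIBUTED and build_site_xml are made up here.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative values only. The host and port must match
# fs.defaultFS in your Hadoop core-site.xml; adjust them as needed.
PSEUDO_DISTRIBUTED = {
    "hbase.rootdir": "hdfs://localhost:9000/hbase",
    "hbase.cluster.distributed": "true",
}


def build_site_xml(props):
    """Render a {name: value} dict as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_xml(PSEUDO_DISTRIBUTED))
```

The output is a single unindented line of XML. Compared with stand-alone mode, the point to notice is that hbase.rootdir now uses an hdfs:// URL instead of a file:// path.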

③ Test and run HBase

First log in over SSH. Passwordless login was set up earlier, so no password is needed here:

ssh localhost

Start Hadoop:

start-dfs.sh

Run the jps command to list the running Java processes and confirm that Hadoop has started:

jps

Start HBase:

start-hbase.sh

Open the HBase shell:

hbase shell

Exit the HBase shell:

exit

Stop HBase:

stop-hbase.sh

5. Programming practice (using shell commands)

Note:
To use shell commands in HBase, first start Hadoop, then start HBase, and then open the HBase shell.
When shutting down, exit the HBase shell first, then stop HBase, and then stop Hadoop.

(1) Create a table in HBase

create 'student','Sname','Ssex','Sage','Sdept','course'

This creates a table named student with five column families: Sname, Ssex, Sage, Sdept, and course.
Every HBase table has a built-in row key, so you do not need to create one yourself; in a put command, the row key is the first value after the table name.
After creating the student table, you can use the describe command to view its basic information:

describe 'student'

(2) Basic HBase database operations

These are the add, delete, update, and query operations in HBase.

Note:
When you add data, HBase automatically attaches a timestamp to it. To modify data, you therefore just add it again: HBase stores the new value as a new version, which completes the "modification". The old versions are still retained, and the system periodically cleans them up, keeping only the most recent few. The number of versions to keep can be specified when the table is created.

① Add data

Use the put command to add data in HBase.
Note:
A put writes only one cell at a time, that is, one column of one row of a table. Inserting data directly through the shell is therefore very inefficient; in practice, data is usually manipulated programmatically.

put 'student','95001','Sname','LiYing'

This adds a row to the student table with row key 95001 (the student ID) and the name LiYing.

put 'student','95001','course:math','80'

This stores the value 80 in the math column of the course column family for row 95001.
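
The create and put commands above can be pictured as a nested mapping: row key, then column, then value. The following Python sketch is a toy model for illustration only, not how HBase is implemented; the class name Table is made up here, and versions are ignored. It shows that column families are fixed when the table is created, while qualifiers such as math appear the first time they are written.

```python
class Table:
    """Toy model: row key -> "family:qualifier" -> value (versions ignored)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # fixed when the table is created
        self.rows = {}                 # row key -> {column: value}

    def put(self, row_key, column, value):
        # A column is "family" or "family:qualifier"; the family must exist.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value


student = Table("student", "Sname", "Ssex", "Sage", "Sdept", "course")
student.put("95001", "Sname", "LiYing")    # the row key is the first argument
student.put("95001", "course:math", "80")  # qualifier "math" is created on the fly
print(student.rows)
```

A put to a family that was not declared at creation time is rejected here with KeyError; HBase likewise refuses writes to a column family that the table does not have.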

② Delete data

Use the delete and deleteall commands to delete data in HBase. The difference between them is:

  1. delete removes a single cell; it is the reverse of put;
  2. deleteall removes an entire row.

delete 'student','95001','Ssex'

This deletes all data in the Ssex column of row 95001 in the student table.

Read row 95001 of the student table back to confirm the deletion:

get 'student','95001'

deleteall 'student','95001'

This deletes the entire row 95001 from the student table.
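
The difference between the two commands can be sketched on a plain Python dictionary. This is a toy model under stated assumptions: the table contents (including the Ssex value) are invented for the illustration, and the two functions only imitate the effect of the shell commands.

```python
# Toy table: row key -> {column: value}. The values are illustrative.
table = {
    "95001": {"Sname": "LiYing", "Ssex": "male", "course:math": "80"},
}


def delete(table, row_key, column):
    """Remove one column from one row, like delete 'student','95001','Ssex'."""
    row = table.get(row_key)
    if row is None:
        return
    row.pop(column, None)
    if not row:  # a row with no cells left no longer exists
        del table[row_key]


def deleteall(table, row_key):
    """Remove the whole row, like deleteall 'student','95001'."""
    table.pop(row_key, None)


delete(table, "95001", "Ssex")
print(table)  # the row remains, without the Ssex column
deleteall(table, "95001")
print(table)  # the row is gone
```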

③ View data

There are two commands for viewing data in HBase:

  1. get shows one row of a table;
  2. scan shows all the data in a table.

get 'student','95001'

scan 'student'
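
As a rough picture of the two commands, the Python sketch below treats a table as a dictionary keyed by row key. It is a toy model: row 95002 and its value are hypothetical, added only so that the scan has more than one row to show. One detail it illustrates is that a scan returns rows in sorted row-key order, regardless of the order in which they were inserted.

```python
# Toy table: row key -> {column: value}. Row 95002 is hypothetical.
table = {
    "95002": {"Sname": "ZhangSan"},
    "95001": {"Sname": "LiYing", "course:math": "80"},
}


def get(table, row_key):
    """Return the cells of a single row (empty if the row does not exist)."""
    return dict(table.get(row_key, {}))


def scan(table):
    """Yield (row key, cells) for every row, in sorted row-key order."""
    for row_key in sorted(table):
        yield row_key, dict(table[row_key])


print(get(table, "95001"))
for row_key, cells in scan(table):
    print(row_key, cells)
```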

④ Delete table

Deleting a table takes two steps: first disable the table, then drop it.

disable 'student'
drop 'student'

(3) Query the historical data of the table

Querying the historical versions of a table takes three steps.

① When creating the table, specify the number of versions to keep (5 in this example)

create 'teacher',{NAME=>'username',VERSIONS=>5}

② Insert data and then update it to generate historical versions. Note: inserts and updates both use the put command

put 'teacher','91001','username','Mary'
put 'teacher','91001','username','Mary1'
put 'teacher','91001','username','Mary2'
put 'teacher','91001','username','Mary3'
put 'teacher','91001','username','Mary4'
put 'teacher','91001','username','Mary5'

③ When querying, specify the number of historical versions to return (valid values here are 1 to 5). By default, only the latest version is returned.

get 'teacher','91001',{COLUMN=>'username',VERSIONS=>5}

If you do not specify the number of versions in the query, only the latest value (Mary5) is returned.
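
The version behavior of the teacher example can be modeled in a few lines of Python. This is a simplified sketch, not HBase's actual storage logic: the class name VersionedCell is made up, a counter stands in for real timestamps, and surplus versions are trimmed immediately, whereas HBase cleans them up later in the background.

```python
import itertools


class VersionedCell:
    """Toy model of one HBase cell that keeps up to max_versions values."""

    _clock = itertools.count(1)  # stand-in for HBase's timestamps

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []       # (timestamp, value), newest first

    def put(self, value):
        self.versions.insert(0, (next(self._clock), value))
        # Simplification: trim right away instead of in a background cleanup.
        del self.versions[self.max_versions:]

    def get(self, versions=1):
        return [value for _, value in self.versions[:versions]]


username = VersionedCell(max_versions=5)  # VERSIONS=>5
for name in ["Mary", "Mary1", "Mary2", "Mary3", "Mary4", "Mary5"]:
    username.put(name)

print(username.get(versions=5))  # ['Mary5', 'Mary4', 'Mary3', 'Mary2', 'Mary1']
print(username.get())            # ['Mary5']
```

Six values were written but only five versions are kept, so the oldest value, Mary, is no longer returned.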

Exit the HBase shell:

exit

Note:
Exiting here only leaves the HBase shell, where the table operations were run; it does not stop the HBase services running in the background.

This article is mainly based on teacher Lin Ziyu's lab tutorial, which I followed while learning Hadoop; I compiled it and worked through the steps myself.

Origin blog.csdn.net/qq_45154565/article/details/109187782