As you can see from the homepage of the official Hadoop website, the Hadoop project currently ships with five modules:
Hadoop Common
HDFS
YARN
MapReduce
Hadoop Ozone
As the name suggests, the first is the module of common base functionality; HDFS is the file storage system, YARN handles scheduling and cluster management, and MapReduce handles data computation and processing. These are the ones you will inevitably touch when starting to learn and use Hadoop.
The last one, Hadoop Ozone, is a distributed object storage system that complements HDFS. It is relatively new and comes up far less often in conversation; many people, myself included, may not have heard of it at first.
This also shows that what people generally hear about in connection with big data, such as HBase, Hive, and Spark, does not actually belong to the Hadoop project itself; these are related projects that build on top of Hadoop.
As for Hadoop's own modules, as long as the installed version of Hadoop supports them, they are available the moment Hadoop is installed. Related projects, by contrast, must be installed and deployed independently and then connected to Hadoop through configuration.
Hadoop can do many things, but mentioning it immediately brings big data to mind, and big data in turn means data processing, computation, and storage.
Setting Hadoop Ozone aside for now, the data storage layer of the big data systems I currently know of is inseparable from HDFS and HBase. My personal understanding is that HBase can, in a sense, be seen as a complement to HDFS: as covered earlier, HDFS does not support modifying file contents, and its advantages inevitably come with drawbacks.
We know that databases ultimately store their data as files, whether MySQL, MongoDB, or HBase. The file system currently underlying HBase is HDFS, which I have verified works without problems; online sources say other file systems can be substituted, but that still needs further verification on my part.
Following this line of thought, after an initial pass through HDFS, the natural next step is basic HBase literacy: installation, configuration, and basic operations.
HBase download
HBase installation packages can be downloaded from the official site at http://hbase.apache.org/downloads.html. The page lists many versions; checking the release notes, I saw that version 2.2.5 already supports Hadoop 3.2.x, and since my Hadoop is 3.1.3, I chose that version.
There are many ways to obtain the installation package, all of which were covered in the Redis and Hadoop installation articles. If you want the details, see:
hadoop installation environment preparation and related knowledge analysis
Redis installation in Linux and software installation related Linux knowledge points
Here I will simply use the fastest way:
wget https://downloads.apache.org/hbase/2.2.5/hbase-2.2.5-bin.tar.gz
Unzip
tar -zxvf hbase-2.2.5-bin.tar.gz
hbase-env.sh configuration
Configure it right after extracting. The first file to configure is hbase-env.sh, found in the conf directory of the installation directory. For example, my HBase install directory is /root/soft/bigdata/hbase/hbase-2.2.5, so the file path is /root/soft/bigdata/hbase/hbase-2.2.5/conf/hbase-env.sh. The file needs the following settings:
export JAVA_HOME=/root/soft/jdk1.8.0_261
export HBASE_CLASSPATH=/root/soft/bigdata/hbase/hbase-2.2.5/conf
export HBASE_MANAGES_ZK=true
Note that in practice you should substitute your own JDK and HBase installation paths.
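If you prefer to script this step, the three lines can be appended idempotently. A minimal sketch, assuming it is run from the HBase install directory; the JDK and HBase paths are this article's example paths and must be replaced with your own:

```shell
#!/usr/bin/env sh
# Append the three hbase-env.sh settings, skipping any line that already exists.
# Paths are this article's example paths; substitute your own JDK/HBase locations.
HBASE_CONF=conf/hbase-env.sh   # run from the HBase install directory
mkdir -p "$(dirname "$HBASE_CONF")"
touch "$HBASE_CONF"

for line in \
  'export JAVA_HOME=/root/soft/jdk1.8.0_261' \
  'export HBASE_CLASSPATH=/root/soft/bigdata/hbase/hbase-2.2.5/conf' \
  'export HBASE_MANAGES_ZK=true'
do
  # -x matches whole lines, -F treats the pattern as a literal string
  grep -qxF "$line" "$HBASE_CONF" || printf '%s\n' "$line" >> "$HBASE_CONF"
done
```

Running it a second time changes nothing, since each line is only appended when absent.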
hbase-site.xml configuration
The next file to configure is hbase-site.xml. Here you need to specify the HDFS directory where HBase stores its data, and because my Hadoop runs in distributed mode, distributed mode must also be enabled here. These settings go between <configuration> and </configuration>:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://192.168.139.9:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<!--
<property>
  <name>hbase.tmp.dir</name>
  <value>./tmp</value>
</property>
-->
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
Of the settings above, the first points at the HDFS file system I set up earlier; the /hbase directory does not exist yet and will be created automatically during later use.
The second matches my Hadoop's distributed mode; HBase defaults it to false, which means standalone mode.
The third appears to be a temporary directory. It ships in the default HBase configuration file; since I am not yet sure what it is for, I have commented it out for now.
The fourth also ships in the HBase configuration file. According to explanations online, it avoids a startup error, so I left it unchanged for now.
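The whole file can also be generated in one step with a heredoc. A sketch using this article's values; the HDFS address 192.168.139.9:9000 is specific to my setup and must be replaced with your own NameNode's host:port, and the commented-out hbase.tmp.dir property is omitted here:

```shell
#!/usr/bin/env sh
# Write conf/hbase-site.xml with the three active properties from this article.
# Replace the HDFS address with your own NameNode's host:port.
SITE=conf/hbase-site.xml
mkdir -p "$(dirname "$SITE")"

cat > "$SITE" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.139.9:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>
EOF
```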
HBase environment variable configuration
This is a habitual step that makes things more convenient: with it, you can run HBase commands from any directory. It is entirely optional:
export HBASE_HOME=/root/soft/bigdata/hbase/hbase-2.2.5
export PATH=$PATH:$HBASE_HOME/bin
ServerNotRunningYetException
According to tutorials online, the configuration above should be enough to start HBase, provided that Hadoop is already running.
After starting Hadoop, I ran the HBase startup command start-hbase.sh, only to be met with an error:
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
Searching online suggested the cause was that HDFS had safe mode enabled, but after checking the safe mode status with hdfs dfsadmin -safemode get, I found it was already off on my cluster, so this error evidently has more than one possible cause. (Note: safe mode can be entered manually with hdfs dfsadmin -safemode enter and left manually with hdfs dfsadmin -safemode leave.)
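To make this check scriptable, the decision can be separated from the cluster call. A minimal sketch; the safemode_needs_leave function name is my own, and on a real cluster you would feed it the output of hdfs dfsadmin -safemode get:

```shell
#!/usr/bin/env sh
# Decide whether HDFS safe mode needs to be left, given the status text
# printed by `hdfs dfsadmin -safemode get`. Returning zero means "leave it".
safemode_needs_leave() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *)                   return 1 ;;
  esac
}

# On a live cluster the wiring would look like (sketch):
#   status="$(hdfs dfsadmin -safemode get)"
#   safemode_needs_leave "$status" && hdfs dfsadmin -safemode leave
safemode_needs_leave "Safe mode is OFF" && echo "would leave" || echo "already off"
# prints "already off"
```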
So I checked the HBase startup log and finally found that I had simply been careless: when configuring the HDFS IP, I had typed 12.168.139.9 instead of 192.168.139.9. After correcting the IP, HBase started successfully.
Multiple SLF4J bindings
Although HBase started successfully, a warning appeared during startup:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/soft/bigdata/hadoop/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
The message is clear enough: the logging dependencies bundled with HBase conflict with Hadoop's. I renamed the conflicting log jar in HBase, and the warning went away.
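The rename can be wrapped in a small helper. A sketch: the jar file name and version below are assumptions, so check which slf4j-log4j12 jar your HBase actually ships in its lib directory; the demo runs against a scratch file standing in for the real jar:

```shell
#!/usr/bin/env sh
# Take an SLF4J binding jar off the classpath by renaming it with a .bak suffix.
disable_binding() {
  jar="$1"
  if [ -f "$jar" ]; then
    mv "$jar" "$jar.bak"
    echo "renamed $jar"
  else
    echo "not found: $jar"
  fi
}

# Demo against a scratch file; on a real install the argument would be the
# slf4j-log4j12 jar inside your HBase lib directory (path/version may differ).
mkdir -p demo-lib
: > demo-lib/slf4j-log4j12-1.7.25.jar
disable_binding demo-lib/slf4j-log4j12-1.7.25.jar
```

The .bak rename is reversible, which is safer than deleting the jar outright.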
Verification
Shell connection
Once HBase has started successfully, it, like MySQL, MongoDB, and Redis, comes with its own shell client for connecting and issuing commands:
hbase shell
After running the command above, you enter HBase's command-line interface. Perhaps it is just my machine, but it takes quite a while to get in.
Create a table
Creating a table in HBase is as simple as:
create 'user','name','age','addr','phone','email'
The command above creates a table named user with five column families: name, age, addr, phone, and email. (Strictly speaking, these are column families rather than ordinary columns.)
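To sanity-check the new table, a few basic HBase shell commands can be run inside hbase shell; these need the running cluster from above, and the row key and value below are arbitrary examples of mine:

```
put 'user', 'row1', 'name', 'zhangsan'   # write: table, row key, column family, value
get 'user', 'row1'                       # read back one row
scan 'user'                              # read the whole table
count 'user'                             # count the rows
```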
Master is initializing
A small episode occurred while creating the table above: the first attempt threw the following exception:
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2811)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:2018)
at
Taken literally, the master was still initializing, so I waited a while, ran the command again, and the table was created successfully.
View the table structure
After creating an HBase table, you can view its structure with the describe command, for example:
describe 'user'
On my machine, the command above printed the following:
Table user is ENABLED
user
COLUMN FAMILIES DESCRIPTION
{NAME => 'addr', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'age', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'email', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'name', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'phone', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
5 row(s)
QUOTAS
0 row(s)
Took 4.6373 seconds
So far, this proves that the HBase installation and configuration are indeed working.
Stop HBase
Stopping HBase should be simple: since it is started with start-hbase.sh, it should normally stop with stop-hbase.sh, and indeed it does.
But I hit a small snag the first time I stopped it: it hung in the stopping state indefinitely.
Advice online said to first run hbase-daemons.sh stop regionserver; I tried it and it really worked, after which stop-hbase.sh finished immediately. This only happened that once, though; every time since, stop-hbase.sh has stopped quickly on its own.
Parts of the above draw on the following article:
http://dblab.xmu.edu.cn/blog/2442-2/