Hadoop Big Data Platform in Practice: Installing HBase on Linux and Saving Data

Apache HBase is a free, open source database for Hadoop. Developed in Java, it is a distributed, scalable NoSQL database. This article covers HBase's principles, architecture, and features, then walks through preparing the Linux environment, the installation modes, table creation, and simple CRUD operations.

 

1, HBase database description

HBase is an open source NoSQL database aimed mainly at big data platforms. It was inspired by the Bigtable paper that Google published in 2006. Use Apache HBase when you need random, real-time read/write access to big data. HBase can host very large tables, with billions of rows by millions of columns, and is suited to large-scale storage of sparse, irregular data sets.

Apache HBase is an open source, distributed, versioned, non-relational database modeled after Google's Bigtable, described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

2, HBase features

1) Linear and modular scalability.

2) Strictly consistent reads and writes.

3) Automatic and configurable sharding of tables.

4) Automatic failover support between RegionServers.

5) Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.

6) Easy-to-use Java API for client access.

7) Block cache and Bloom filters for real-time queries.

8) Query predicate push-down via server-side filters.

9) Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.

10) Extensible JRuby-based (JIRB) shell.

11) Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.

3, HBase development history

Google published its paper on Bigtable in 2006.

HBase was originally a project of the company Powerset, which needed to process large volumes of data for natural language search. Development of HBase began at the end of 2006.

An HBase prototype was created as a Hadoop contrib module in 2007, and the first usable HBase release followed later that year.

In 2008, Hadoop became a top-level Apache project, with HBase as one of its subprojects.

HBase 0.18 and 0.19 followed around October 2008.

In 2010, HBase became a top-level Apache project.

HBase 0.92 was released in January 2012, and the 0.96 line followed in 2013.

In November 2010, Facebook chose HBase to implement its new messaging platform, but it migrated away from HBase in 2018.

As of February 2017, the 1.2.x series was the stable version.

As of 2019, the latest version is 2.1.4.

4, HBase architecture

HBase has a distributed architecture. It stores its data on HDFS and relies on RegionServers to partition tables automatically, so a cluster can scale out as the data grows. Data is stored by column family (ColumnFamily). In HBase, a table is split into regions, and each region is served by a RegionServer. Within a region, data is divided vertically by column family into stores, and each store is saved as files in HDFS.

[Figure: HBase distributed storage architecture]

 

5, HBase download and install

The following sections describe how to set up a standalone, single-node HBase instance. A standalone instance runs all HBase daemons (the Master, RegionServers, and ZooKeeper) in a single JVM and persists data to the local file system.

HBase can be installed in three modes: standalone, pseudo-distributed, and fully distributed:

Standalone mode

Pseudo Distributed mode

Fully Distributed mode

HBase requires a JDK to be installed; we use JDK 8. We recommend OpenJDK.

6, install JDK 8

Install the open source JDK 8. It is free, so licensing fees are not a concern.

 

sudo apt install default-jdk

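Check the installed version with java -version. If it reports a version other than 8 (newer Ubuntu releases default to a later JDK), install the openjdk-8-jdk package instead.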

7, install SSH

 

Test that you can log in without a password:

ssh localhost
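If ssh localhost still asks for a password, key-based login is probably not set up yet. The following is a minimal sketch of one common way to do that. It assumes the OpenSSH client and server are installed, that an RSA key with an empty passphrase is acceptable on this test machine, and that keys live in the default ~/.ssh directory. The SSH_DIR variable is only a convenience introduced here.

```shell
# Assumption: keys are kept in the default ~/.ssh directory.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"

# Generate a key pair with an empty passphrase, but only if none exists yet.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -N '' -f "$SSH_DIR/id_rsa"

# Authorize the public key for logins to this machine (skip if already listed).
touch "$SSH_DIR/authorized_keys"
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" \
  || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

After this, ssh localhost should log in without prompting for a password.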

8, install HBase database

Download HBase from http://hbase.apache.org/downloads.html . We select version 1.2.11, the current stable release, and use the Tsinghua mirror in China.

When the download completes, extract the archive and move it to the installation directory with the following commands:

tar zxvf hbase-1.2.11-bin.tar.gz
sudo mv hbase-1.2.11 /usr/local/hbase

9, configure HBase environment variables

After installation is complete, you can configure the HBase environment variables.

Open the configuration file with vim ~/.bashrc and add the HBase environment variables to it.

To make the changes take effect, run source ~/.bashrc
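The lines added to ~/.bashrc might look like the following minimal sketch. It assumes HBase was moved to /usr/local/hbase as in the previous step. HBASE_HOME is a conventional variable name chosen here, not one the original text specifies.

```shell
# Assumed install location from the previous step.
export HBASE_HOME=/usr/local/hbase
# Put the HBase scripts (hbase, start-hbase.sh, stop-hbase.sh) on the PATH.
export PATH="$PATH:$HBASE_HOME/bin"
```

With the bin directory on the PATH, the hbase command and the start and stop scripts can be run from any directory.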

10, edit the HBase configuration file

For a single node, edit the configuration file conf/hbase-site.xml. In it you can specify where HBase and ZooKeeper store their data, or you can keep the default settings.
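As a rough illustration, the sketch below generates a minimal standalone hbase-site.xml in the current directory. The two property names, hbase.rootdir and hbase.zookeeper.property.dataDir, are standard HBase settings. The data location under $HOME/hbase-data is an assumption made for this example, so adjust it to your own layout.

```shell
# Assumption: HBase and ZooKeeper data live under $HOME/hbase-data.
DATA_DIR="$HOME/hbase-data"

# Write a minimal standalone configuration to the current directory.
cat > hbase-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Where HBase stores its data; file:// means the local file system -->
  <property>
    <name>hbase.rootdir</name>
    <value>file://$DATA_DIR/hbase</value>
  </property>
  <!-- Where the embedded ZooKeeper keeps its data -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$DATA_DIR/zookeeper</value>
  </property>
</configuration>
EOF

# Then copy it into place, for example:
#   cp hbase-site.xml /usr/local/hbase/conf/hbase-site.xml
```

Generating the file first and copying it afterwards leaves the shipped conf/hbase-site.xml untouched until you have reviewed the result.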

11, start HBase database

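Run ./start-hbase.sh from HBase's bin directory to start the HBase database. Then use the jps command to check that it is running properly; in standalone mode, jps should list an HMaster process.

HBase has now started normally.

Inside the HBase shell (started with hbase shell), you can also use the status, version, and whoami commands to view the cluster status, the HBase version, and the current user.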

 

12, test HBase database, create tables, and save the data

Create a table named test with a column family cf, and save three rows of data.
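One way to do this is sketched below. It writes the HBase shell commands to a script file and passes that file to hbase shell. The table name test and column family cf come from the text above. The column qualifiers (cf:a, cf:b, cf:c), the values, and the file name create_test.txt are illustrative choices, not taken from the original.

```shell
# Write the HBase shell commands to a script file (hypothetical file name).
cat > create_test.txt <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
exit
EOF

# Run the script only if the hbase launcher is on the PATH.
# This also assumes HBase has already been started.
if command -v hbase >/dev/null 2>&1; then
  hbase shell ./create_test.txt
fi
```

The same create and put commands can also be typed one at a time at the interactive hbase shell prompt.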

 

Read all the data:

scan 'test'

Get a single row:

get 'test', 'row1'

 

In follow-up articles we will explain Hadoop cluster architecture, the underlying principles and algorithms of HBase, its storage model, and how to build a cluster.



 


Origin blog.csdn.net/huasdsadsa/article/details/94210612