Data warehouse Hive (1) - Hive introduction, origin, installation

1. Hive overview

  • A data warehouse
  • Includes an interpreter, compiler, optimizer, etc.
  • At runtime, metadata is stored in a relational database

1.1 The difference between a database and a data warehouse

  • A database needs to return results immediately; a data warehouse does not
  • A data warehouse can hold data from a variety of sources, whereas a database typically holds only the data of one product line
  • Data in a database can be modified; data in a data warehouse generally is not

1.2 Why Hive was created

  • To let non-Java programmers run MapReduce operations on data in HDFS

2. Hive architecture

        Figure 2.1 Architecture diagram

(1) There are three main user interfaces: CLI, Client and WUI. The CLI is the most commonly used; starting the CLI also starts a Hive instance. Client is the Hive client, through which the user connects to Hive Server. To start in Client mode, you must specify the node where Hive Server runs and start Hive Server on that node. WUI accesses Hive through a browser.

(2) Hive stores its metadata in a database such as MySQL or Derby. The metadata includes table names, each table's columns and partitions and their properties, table properties (whether it is an external table, etc.), and the directory where the table's data is located.

(3) The interpreter, compiler and optimizer take an HQL query through lexical analysis, syntax analysis, compilation, optimization and query plan generation. The generated query plan is stored in HDFS and subsequently executed by MapReduce.

(4) Hive data is stored in HDFS, and most queries and computations are carried out by MapReduce (a plain * query, such as select * from tbl, does not generate a MapReduce task).

 

              Figure 2.2 Call flow diagram

 

 

3. Hive installation steps

3.1 Download and unpack

wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
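
The command above only downloads the tarball. A minimal sketch of the unpack step follows, assuming the archive extracts to a top-level directory named after the tarball (apache-hive-2.3.4-bin) and that the target is the /opt/bigdata/hive-2.3.4 directory used as HIVE_HOME in 3.2. The unpack_hive helper name is hypothetical:

```shell
# unpack_hive TARBALL TARGET_DIR  (hypothetical helper, not part of Hive)
# Extracts TARBALL next to TARGET_DIR, then renames the extracted directory.
unpack_hive() {
  tarball="$1"
  target="$2"
  parent="$(dirname "$target")"
  mkdir -p "$parent"
  tar -xzf "$tarball" -C "$parent"
  # Assumption: the archive's top-level directory is the tarball name minus .tar.gz.
  mv "$parent/$(basename "$tarball" .tar.gz)" "$target"
}
```

For the paths in this post, the call would be: unpack_hive apache-hive-2.3.4-bin.tar.gz /opt/bigdata/hive-2.3.4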


3.2 Set environment variables

vi /etc/profile
export HIVE_HOME=/opt/bigdata/hive-2.3.4


Add the bin directory to PATH.
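
As a sketch, the two lines appended to /etc/profile could look like this (the HIVE_HOME value is the one shown above; running source /etc/profile afterwards makes the current shell pick them up):

```shell
# Hive install location used in this post
export HIVE_HOME=/opt/bigdata/hive-2.3.4
# Append Hive's bin directory so the hive and schematool commands resolve
export PATH="$PATH:$HIVE_HOME/bin"
```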


3.3 Edit the configuration file: go to /opt/bigdata/hive-2.3.4/conf

mv hive-default.xml.template hive-site.xml


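Add the configuration:
After opening the file, delete the existing settings, keeping only the opening <configuration> tag and the closing </configuration> tag on the last line. In vi, with the cursor on the first line after <configuration>, the following command deletes from the current line through the second-to-last line: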

:.,$-1d


Then add the following configuration between the two tags:

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
</property>
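
Put together, the finished file holds these properties inside a single <configuration> element. The sketch below writes that file non-interactively. It assumes HIVE_CONF_DIR points at the conf directory and falls back to the current directory otherwise, and it reuses the host, user and password values shown above:

```shell
# Target directory: $HIVE_CONF_DIR if set, otherwise the current directory (assumption).
CONF_DIR="${HIVE_CONF_DIR:-.}"

cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HDFS directory where table data is stored -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- JDBC connection to the MySQL database that holds the metadata -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
EOF
```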


3.4 Copy the MySQL driver JAR into the lib directory
3.5 Initialize the metadata database schema

schematool -dbType mysql -initSchema


3.6 Run hive to start the corresponding service
3.7 Run some basic Hive SQL operations
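
A few basic operations might look like the sketch below. The table demo_tbl and its columns are hypothetical, as is the script file name, and the script calls hive -f only when the hive command is on PATH:

```shell
# Write a few basic HiveQL statements to a script file.
cat > basic_ops.hql <<'EOF'
-- Create a simple comma-delimited text table (hypothetical name and columns)
CREATE TABLE IF NOT EXISTS demo_tbl (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
SHOW TABLES;
-- A plain SELECT * does not generate a MapReduce task (see point (4) in section 2)
SELECT * FROM demo_tbl;
-- An aggregate such as COUNT(*) usually does need a job
SELECT COUNT(*) FROM demo_tbl;
EOF

# Run the script only if the hive CLI is available.
if command -v hive >/dev/null 2>&1; then
  hive -f basic_ops.hql
fi
```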

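4. Deployment modes

        4-1 Hive data architecture diagram

 4-2 Setup mode 1: standalone Hive with the built-in metastore_db (In-memory DB)

    4-3 Setup mode 2: one Hive instance with one database (the mode used in section 3)

      4-4 Setup mode 3: remote access mode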

 

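Remote access mode: a remote-server mode that lets non-Java clients access the metadata database. A MetaStoreServer is started on the server side, and clients access the metadata database through the MetaStoreServer using the Thrift protocol.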
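
As a rough sketch of this mode: the server node runs hive --service metastore, and each client's hive-site.xml points at it through hive.metastore.uris. The host node01 is reused from the JDBC URL in section 3 as an assumption, 9083 is the metastore's default Thrift port, and the output file name is hypothetical:

```shell
# Client-side configuration: talk to the MetaStoreServer over Thrift
# instead of connecting to the metadata database directly.
cat > hive-site-client.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://node01:9083</value>
  </property>
</configuration>
EOF
```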

 

Origin www.cnblogs.com/littlepage/p/11246548.html