Installing Hive with MySQL as the Metastore Database

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/liuguangrong/article/details/52345399

Downloading Hive
Hive official website: http://hive.apache.org/
Hive official downloads: http://hive.apache.org/downloads.html
Apache archive: Apache Software Foundation Distribution Directory
Version used in this post: apache-hive-0.13.1-bin.tar.gz

Extracting Hive

$ tar zxvf apache-hive-0.13.1-bin.tar.gz -C /opt/modules/
$ cd /opt/modules/
$ mv apache-hive-0.13.1-bin/ hive-0.13.1

Configuring Hive

$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-env.sh.template hive-env.sh

Edit hive-env.sh and change the following two settings:

$ vim hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/modules/hadoop-2.5.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/modules/hive-0.13.1/conf
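
A typo in either path only shows up later, when Hive fails to start, so it can help to check the file right after editing it. The following is a minimal sketch, not part of the original setup: `check_hive_env` is a hypothetical helper name. It sources the file in a subshell, so the variables do not leak into the current session.

```shell
#!/bin/sh
# Sketch: source a hive-env.sh in a subshell and confirm both directories exist.
# check_hive_env is a hypothetical helper, not something shipped with Hive.
check_hive_env() (
    . "$1" || exit 1
    for dir in "$HADOOP_HOME" "$HIVE_CONF_DIR"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            exit 1
        fi
    done
    echo "hive-env.sh paths look fine"
)
```

For the layout used here, the call would be `check_hive_env /opt/modules/hive-0.13.1/conf/hive-env.sh`.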

Verifying Hive
Before running Hive, start Hadoop, then create the /tmp and /user/hive/warehouse directories on HDFS and grant group write permission on both, as shown below:

$ cd /opt/modules/hadoop-2.5.0/
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ bin/hdfs dfs -chmod g+w /tmp
$ bin/hdfs dfs -chmod g+w /user/hive/warehouse
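
The same directory setup can be wrapped in a small helper that is safe to rerun. This is a minimal sketch under stated assumptions: the function name `prepare_hive_dirs` and the `HDFS_BIN` variable are made up for illustration, and `-mkdir -p` is used for both directories so that a second run does not fail when they already exist.

```shell
#!/bin/sh
# Sketch: create the HDFS directories Hive needs and make them group-writable.
# HDFS_BIN is a hypothetical override; it defaults to the install path used in this post.
HDFS_BIN="${HDFS_BIN:-/opt/modules/hadoop-2.5.0/bin/hdfs}"

prepare_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        # -p creates parent directories and does not fail if the directory exists
        "$HDFS_BIN" dfs -mkdir -p "$dir" || return 1
        "$HDFS_BIN" dfs -chmod g+w "$dir" || return 1
    done
}
```

With HDFS running, calling `prepare_hive_dirs` once before the first Hive start is enough.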

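Hive is now installed in embedded mode (metadata is kept in the built-in Derby database). Run the following commands to verify the installation: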

$ cd /opt/modules/hive-0.13.1/
$ bin/hive

Output like the following indicates that the embedded-mode installation works.

Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.576 seconds, Fetched: 1 row(s)

Storing Metadata in MySQL
Download the MySQL repository package

$ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

Install the mysql-community-release-el7-5.noarch.rpm package

$ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm

Install MySQL

$ sudo yum install -y mysql-server

Start MySQL

$ sudo service mysqld start

Enable MySQL at boot

$ sudo chkconfig mysqld on

Set the MySQL root password

$ mysqladmin -u root password 'hive'

Log in to MySQL

$ mysql -uroot -p

Allow remote login

mysql> grant all privileges on *.* to 'root'@'%' identified by 'hive' with grant option;

Delete the original user entries

mysql> use mysql
mysql> delete from user where host='localhost' and user='root';

Once the original entries are gone, only the following root record should remain:

mysql> select host, user, password from user;
+------+------+-------------------------------------------+
| host | user | password                                  |
+------+------+-------------------------------------------+
| %    | root | *4DF1D66463C18D44E3B001A8FB1BBFBEA13E27FC |
+------+------+-------------------------------------------+

Restart the MySQL service

mysql> quit;
$ sudo service mysqld restart

Configure Hive to store its metadata in MySQL

$ cd /opt/modules/hive-0.13.1/
$ cp conf/hive-default.xml.template conf/hive-site.xml

Edit hive-site.xml and set the following properties:

$ vim conf/hive-site.xml

<configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop01.malone.com:3306/metastore?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property> 
</configuration>
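
To double-check what Hive will read from this file, the property values can be pulled back out on the command line. This is a minimal sketch that assumes each `<name>` and `<value>` element sits on a single line, as in the listing above; `get_prop` is a hypothetical helper name, not a Hive tool.

```shell
#!/bin/sh
# Sketch: print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumes <name> and <value> are each on one line (true for the listing above).
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # strip the 7-character <value> and 8-character </value> tags
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

For example, `get_prop conf/hive-site.xml javax.jdo.option.ConnectionURL` should print the JDBC URL configured above. An empty result suggests the property name is misspelled or missing.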

Add the MySQL JDBC driver jar

$ mv mysql-connector-java-5.1.27-bin.jar /opt/modules/hive-0.13.1/lib/
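
Hive can only load com.mysql.jdbc.Driver if the connector jar really is in its lib directory, so a quick check before starting the CLI can save some debugging. A minimal sketch; `has_mysql_driver` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a MySQL Connector/J jar is present in the given directory.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java-*.jar; do
        # an unmatched glob stays literal, so test that the file really exists
        [ -f "$jar" ] && return 0
    done
    return 1
}
```

A typical call would be `has_mysql_driver /opt/modules/hive-0.13.1/lib && echo "driver found"`.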

Testing with HQL statements

$ cd /opt/modules/hive-0.13.1/
$ bin/hive
hive> show databases;
OK
default
Time taken: 1.418 seconds, Fetched: 1 row(s)
hive> create database if not exists hive_testdb;
OK
Time taken: 1.084 seconds
hive> use hive_testdb;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
Time taken: 0.029 seconds
hive> create table employee(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.542 seconds
hive> load data local inpath '/opt/datas/hive/employee.txt' into table employee;
Copying data from file:/opt/datas/hive/employee.txt
Copying file: file:/opt/datas/hive/employee.txt
Loading data to table hive_testdb.employee
Table hive_testdb.employee stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]
OK
Time taken: 1.939 seconds
hive> desc employee;
OK
id                      int                                         
name                    string                                      
Time taken: 0.185 seconds, Fetched: 2 row(s)
hive> desc extended employee;
OK
id                      int                                         
name                    string                                      

Detailed Table Information  Table(tableName:employee, dbName:hive_testdb, owner:hadoop, createTime:1472398263, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=  , field.delim=
Time taken: 0.161 seconds, Fetched: 4 row(s)
hive> desc formatted employee;
OK
# col_name              data_type               comment             

id                      int                                         
name                    string                                      

# Detailed Table Information         
Database:               hive_testdb              
Owner:                  hadoop                   
CreateTime:             Sun Aug 28 23:31:03 CST 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee  
Table Type:             MANAGED_TABLE            
Table Parameters:        
    COLUMN_STATS_ACCURATE   true                
    numFiles                1                   
    numRows                 0                   
    rawDataSize             0                   
    totalSize               52                  
    transient_lastDdlTime   1472398294          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    field.delim             \t                  
    serialization.format    \t                  
Time taken: 0.264 seconds, Fetched: 33 row(s)
hive> select * from employee;
OK
1   burce.lee
2   jacky.chen
3   elbert.malone
4   andy.lau
Time taken: 0.817 seconds, Fetched: 4 row(s)
hive> select id from employee;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472391663133_0001, Tracking URL = http://hadoop01.malone.com:8088/proxy/application_1472391663133_0001/
Kill Command = /opt/modules/hadoop-2.5.0/bin/hadoop job  -kill job_1472391663133_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-28 23:35:16,716 Stage-1 map = 0%,  reduce = 0%
2016-08-28 23:35:50,749 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1472391663133_0001
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.84 sec   HDFS Read: 294 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
1
2
3
4
Time taken: 86.453 seconds, Fetched: 4 row(s)
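
The post does not show /opt/datas/hive/employee.txt, the file loaded in the session above. Judging from the query results and the reported totalSize=52, it holds four tab-separated rows. The following sketch recreates such a file. The function name `write_employee_file` is hypothetical, and the row contents are inferred from the output rather than copied from the original file.

```shell
#!/bin/sh
# Sketch: write four tab-separated (id, name) rows matching the query output above.
# The rows are inferred from the session output, not taken from the original file.
write_employee_file() {
    mkdir -p "$(dirname "$1")" || return 1
    # printf reuses the format string for each (id, name) pair
    printf '%s\t%s\n' \
        1 burce.lee \
        2 jacky.chen \
        3 elbert.malone \
        4 andy.lau > "$1"
}
```

Running `write_employee_file /opt/datas/hive/employee.txt` before the `load data local inpath` statement produces a 52-byte file, the same size Hive reports in the table stats.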

Common Hive Configuration Properties
Show the current database name and column headers in the CLI

$ cd /opt/modules/hive-0.13.1/
$ vim conf/hive-site.xml

Add the following properties:

<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>

Result after the change

$ bin/hive

Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive (default)> show databases;
OK
database_name
default
hive_testdb
Time taken: 0.768 seconds, Fetched: 2 row(s)
hive (default)> use hive_testdb;
OK
Time taken: 0.028 seconds
hive (hive_testdb)> show tables;
OK
tab_name
employee
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive (hive_testdb)> select * from employee;
OK
employee.id employee.name
1   burce.lee
2   jacky.chen
3   elbert.malone
4   andy.lau
Time taken: 0.917 seconds, Fetched: 4 row(s)
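
The same two settings can also be applied per user instead of in hive-site.xml, because the Hive CLI runs the statements in $HOME/.hiverc at startup. This is an alternative to the approach above, not what the post does. `write_hiverc` is a hypothetical helper, and it overwrites the target file.

```shell
#!/bin/sh
# Sketch: write a CLI init file that enables column headers and the database prompt.
# Overwrites the target file; pass "$HOME/.hiverc" to affect the current user.
write_hiverc() {
    cat > "$1" <<'EOF'
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
EOF
}
```

After `write_hiverc "$HOME/.hiverc"`, a new `bin/hive` session should show the `hive (default)>` prompt even without the two properties in hive-site.xml.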

Configuring Hive logging

$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-log4j.properties.template hive-log4j.properties
$ vim hive-log4j.properties

Change the following settings:

# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/modules/hive-0.13.1/logs
hive.log.file=hive.log
