Chapter 2 Hive installation
2.1 Hive download addresses
- Hive official website: http://hive.apache.org/
- Documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
- Downloads: http://archive.apache.org/dist/hive/
2.2 Hive installation and deployment
- Hive installation and configuration
(1) Upload apache-hive-1.2.1-bin.tar.gz to the /opt/software directory on the Linux machine
(2) Extract apache-hive-1.2.1-bin.tar.gz into the /opt/module/ directory
[atguigu@hadoop102 software]$ tar -zxvf
apache-hive-1.2.1-bin.tar.gz -C /opt/module/
(3) Rename the extracted apache-hive-1.2.1-bin directory to hive
[atguigu@hadoop102 module]$ mv apache-hive-1.2.1-bin/ hive
(4) Rename hive-env.sh.template in the /opt/module/hive/conf directory to hive-env.sh
[atguigu@hadoop102 conf]$ mv hive-env.sh.template hive-env.sh
(5) Configure hive-env.sh
(a) Set the HADOOP_HOME path
export HADOOP_HOME=/opt/module/hadoop-2.7.2
(b) Set the HIVE_CONF_DIR path
export HIVE_CONF_DIR=/opt/module/hive/conf
- Hadoop cluster configuration
(1) HDFS and YARN must both be started
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh
(2) Create the /tmp and /user/hive/warehouse directories on HDFS and grant group write permission to both
(if you skip this step, Hive creates them automatically)
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -mkdir /tmp
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -mkdir -p
/user/hive/warehouse
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -chmod g+w /tmp
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -chmod g+w
/user/hive/warehouse
- Hive basic operation
(1) Start hive
[atguigu@hadoop102 hive]$ bin/hive
(2) View database
hive> show databases;
(3) Switch to the default database
hive> use default;
(4) List the tables in the default database
hive> show tables;
(5) create a table
hive> create table student(id int, name string);
(6) Confirm the table now appears in the database
hive> show tables;
(7) View the table structure
hive> desc student;
(8) Insert data into the table
hive> insert into student values(1000,"ss");
(9) Query the table data
hive> select * from student;
(10) Exit Hive
hive> quit;
2.3 Case: Importing a Local File into Hive
Requirement:
Import the data in the local file /opt/module/data/student.txt into the Hive table
student(id int, name string).
1. Data preparation
Prepare the data in the /opt/module/data directory.
(1) Create the data directory under /opt/module/
[atguigu@hadoop102 module]$ mkdir data
(2) Create a student.txt file in the /opt/module/data/ directory and add the data
[atguigu@hadoop102 data]$ touch student.txt
[atguigu@hadoop102 data]$ vi student.txt
1001 zhangshan
1002 lishi
1003 zhaoliu
Note: the fields are separated by a tab character.
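Because the table created later declares '\t' as its field separator, the rows must be separated by real tab characters, not spaces. A minimal sketch (writing student.txt into the current directory rather than /opt/module/data, so it runs anywhere) that guarantees real tabs with printf:

```shell
# printf expands \t to a real tab character, which typing in vi does
# not always guarantee; the format string is reused for each id/name pair.
printf '%s\t%s\n' \
    1001 zhangshan \
    1002 lishi \
    1003 zhaoliu > student.txt

# Verify: every line must split into exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' student.txt && echo "tabs ok"
```

If you want to inspect the separators by eye, `cat -A student.txt` renders each tab as `^I`.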
- Hive actual operation
(1) Start hive
[atguigu@hadoop102 hive]$ bin/hive
(2) Display Database
hive> show databases;
(3) using the default database
hive> use default;
(4) Display default database table
hive> show tables;
(5) Drop the previously created student table
hive> drop table student;
(6) Create the student table, declaring '\t' as the field separator
hive> create table student(id int, name string) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
(7) Load the /opt/module/data/student.txt file into the student table
hive> load data local inpath '/opt/module/data/student.txt' into
table student;
(8) Query the result in Hive
hive> select * from student;
OK
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.266 seconds, Fetched: 3 row(s)
- A problem you may encounter
If you open a second client window and start Hive there, a java.sql.SQLException is thrown:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    ... 8 more
The reason is that the metastore is stored by default in the embedded Derby database, which allows only one client connection at a time; it is recommended to store the metastore in MySQL instead.
2.4 MySQL Installation
2.4.1 Prepare the installation packages
- Check whether MySQL is already installed; if so, uninstall it
(1) Check
[root@hadoop102 Desktop]# rpm -qa|grep mysql
mysql-libs-5.1.73-7.el6.x86_64
(2) Uninstall
[root@hadoop102 Desktop]# rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
2. Unzip the mysql-libs.zip file into the current directory
[root@hadoop102 software]# unzip mysql-libs.zip
[root@hadoop102 software]# ls
mysql-libs.zip
mysql-libs
3. Enter the mysql-libs folder and list its contents
[root@hadoop102 mysql-libs]# ll
total 76048
-rw-r--r--. 1 root root 18509960 Mar 26 2015 MySQL-client-5.6.24-1.el6.x86_64.rpm
-rw-r--r--. 1 root root 3575135 Dec 1 2013 mysql-connector-java-5.1.27.tar.gz
-rw-r--r--. 1 root root 55782196 Mar 26 2015 MySQL-server-5.6.24-1.el6.x86_64.rpm
2.4.2 Install the MySQL Server
- Install mysql server
[root@hadoop102 mysql-libs]# rpm -ivh
MySQL-server-5.6.24-1.el6.x86_64.rpm
- View the randomly generated root password
[root@hadoop102 mysql-libs]# cat /root/.mysql_secret
OEXaQuS8IWkG19Xs
- View mysql status
[root@hadoop102 mysql-libs]# service mysql status
- Start mysql
[root@hadoop102 mysql-libs]# service mysql start
2.4.3 Install the MySQL Client
- Install mysql client
[root@hadoop102 mysql-libs]# rpm -ivh
MySQL-client-5.6.24-1.el6.x86_64.rpm
- Connect to MySQL
[root@hadoop102 mysql-libs]# mysql -uroot -pOEXaQuS8IWkG19Xs
- Change the root password
mysql>SET PASSWORD=PASSWORD('000000');
- Exit mysql
mysql>exit
2.4.4 Configure the Host Column of the MySQL user Table
Configure the user table so that root can log in to MySQL from any host with just the username and password.
- Enter mysql
[root@hadoop102 mysql-libs]# mysql -uroot -p000000
- Display Database
mysql>show databases;
- Use mysql database
mysql>use mysql;
- Show all the tables in the mysql database
mysql>show tables;
- Show user table structure
mysql>desc user;
- Query user table
mysql>select User, Host, Password from user;
- Modify the user table, changing the Host column to %
mysql>update user set host='%' where host='localhost';
- Delete the root user entries for the other hosts
mysql>delete from user where Host='hadoop102';
mysql>delete from user where Host='127.0.0.1';
mysql>delete from user where Host='::1';
- Refresh
mysql>flush privileges;
- Exit
mysql>quit;
2.5 Configuring the Hive Metastore in MySQL
2.5.1 Copy the JDBC Driver
- Extract the mysql-connector-java-5.1.27.tar.gz driver package in the /opt/software/mysql-libs directory
[root@hadoop102 mysql-libs]# tar -zxvf
mysql-connector-java-5.1.27.tar.gz
- Copy mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/
[root@hadoop102 mysql-connector-java-5.1.27]# cp
/opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
2.5.2 Configure the Metastore to Use MySQL
- Create hive-site.xml in the /opt/module/hive/conf directory
[atguigu@hadoop102 conf]$ touch hive-site.xml
[atguigu@hadoop102 conf]$ vi hive-site.xml
- Copy the configuration parameters below into hive-site.xml, following the official documentation:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore
database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>000000</value>
<description>password to use against metastore
database</description>
</property>
</configuration>
- After configuring, if Hive throws an exception on startup, try restarting the virtual machine. (After restarting, do not forget to start the Hadoop cluster.)
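Copy-paste errors in hive-site.xml (an unclosed tag, a stray character) are a common cause of startup exceptions, so a quick well-formedness check is worth running after editing. The sketch below writes a reduced copy of the configuration to /tmp (a stand-in for /opt/module/hive/conf/hive-site.xml, so it runs on any machine) and parses it with Python's standard-library XML parser; python3 being on the PATH is an assumption here.

```shell
# A reduced stand-in for hive-site.xml, written to /tmp.
cat > /tmp/hive-site-check.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
</configuration>
EOF

# Parse the file; any well-formedness error raises an exception
# and nothing is printed.
python3 -c "
import xml.etree.ElementTree as ET
root = ET.parse('/tmp/hive-site-check.xml').getroot()
print('well-formed, root element:', root.tag)
"
```

On a real node, point the parser at /opt/module/hive/conf/hive-site.xml instead of the /tmp copy.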
2.5.3 Test Hive Startup from Multiple Windows
- First start MySQL
[atguigu@hadoop102 mysql-libs]$ mysql -uroot -p000000
Check which databases exist
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
2. Open several more windows and start Hive in each
[atguigu@hadoop102 hive]$ bin/hive
- After Hive starts, go back to the MySQL window and list the databases again; a metastore database has been added
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| metastore |
| mysql |
| performance_schema |
| test |
+--------------------+
2.6 Hive JDBC Access
2.6.1 Start hiveserver2 Service
[atguigu@hadoop102 hive]$ bin/hiveserver2
2.6.2 Start beeline
[atguigu@hadoop102 hive]$ bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline>
2.6.3 Connect to hiveserver2
beeline> !connect jdbc:hive2://hadoop102:10000 (press Enter)
Connecting to jdbc:hive2://hadoop102:10000
Enter username for jdbc:hive2://hadoop102:10000: atguigu (press Enter)
Enter password for jdbc:hive2://hadoop102:10000: (just press Enter)
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop102:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| hive_db2 |
+----------------+--+
2.7 Common Hive Command-Line Options
[atguigu@hadoop102 hive]$ bin/hive -help
usage: hive
-d,--define <key=value> Variable subsitution to apply
to hive
commands. e.g. -d A=B or --define
A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply
to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive
shell
-v,--verbose Verbose mode (echo executed SQL
to the console)
- "-e": execute a SQL statement without entering the Hive interactive shell
[atguigu@hadoop102 hive]$ bin/hive -e "select id from student;"
- "-f": execute the SQL statements in a script file
(1) Create a hivef.sql file in the /opt/module/datas directory
[atguigu@hadoop102 datas]$ touch hivef.sql
Write the SQL statement into the file:
select * from student;
(2) Execute the SQL statements in the file
[atguigu@hadoop102 hive]$ bin/hive -f
/opt/module/datas/hivef.sql
(3) Execute the SQL statements in the file and write the result to an output file
[atguigu@hadoop102 hive]$ bin/hive -f
/opt/module/datas/hivef.sql >
/opt/module/datas/hive_result.txt
2.8 Other Hive Command Operations
- View the HDFS file system from within the Hive CLI
hive> dfs -ls /;
- View the local file system from within the Hive CLI
hive> ! ls /opt/module/datas;
- View the history of all commands entered in Hive
(1) Go to the current user's home directory, /root or /home/atguigu
(2) View the .hivehistory file
[atguigu@hadoop102 ~]$ cat .hivehistory
2.9 Common Hive Property Configuration
2.9.1 Configuring the Data Warehouse Location
1) The default data warehouse location is the HDFS path /user/hive/warehouse.
2) In the warehouse directory, no folder is created for the default database. Tables that belong to the default database get their folders directly under the warehouse directory.
3) To change the default warehouse location, copy the property below from hive-default.xml.template into hive-site.xml.
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Grant write permission to users in the same group:
bin/hdfs dfs -chmod g+w /user/hive/warehouse
2.9.2 Configuring Query Output Display
1) Add the following to hive-site.xml to display the current database name in the prompt and the column headers in query results.
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
2) Restart Hive and compare the display before and after the configuration.
(1) Before the configuration, see Figure 6-2
(2) After the configuration, see Figure 6-3
2.9.3 Hive Runtime Log Configuration
- By default, Hive stores its log in /tmp/atguigu/hive.log (the directory is named after the current user)
- Change the log location to /opt/module/hive/logs
(1) Rename /opt/module/hive/conf/hive-log4j.properties.template to hive-log4j.properties
[atguigu@hadoop102 conf]$ pwd
/opt/module/hive/conf
[atguigu@hadoop102 conf]$ mv hive-log4j.properties.template
hive-log4j.properties
(2) In hive-log4j.properties, set the log location: hive.log.dir=/opt/module/hive/logs
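Steps (1) and (2) can also be scripted: copy the template and rewrite the hive.log.dir property with sed. The sketch below works against a stand-in template in /tmp so it runs without a Hive installation; on a real node the paths would be under /opt/module/hive/conf/.

```shell
# Stand-in for hive-log4j.properties.template, written to /tmp; the
# default value mirrors Hive's ${java.io.tmpdir}/${user.name} log dir.
printf 'hive.log.dir=${java.io.tmpdir}/${user.name}\n' > /tmp/hive-log4j.properties.template

# (1) Drop the .template suffix (cp keeps the original, mv also works),
# (2) point hive.log.dir at the new location.
cp /tmp/hive-log4j.properties.template /tmp/hive-log4j.properties
sed -i 's|^hive.log.dir=.*|hive.log.dir=/opt/module/hive/logs|' /tmp/hive-log4j.properties

grep '^hive.log.dir=' /tmp/hive-log4j.properties
```

The `|` delimiter in the sed expression avoids escaping the slashes in the replacement path.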
2.9.4 Parameter Configuration
- View all current configuration settings
hive>set;
- Three ways to configure parameters
(1) Configuration files
Default configuration file: hive-default.xml
Custom configuration file: hive-site.xml
Note: the custom configuration overrides the default configuration. Hive also reads in the Hadoop configuration, because Hive is started as a Hadoop client; the Hive configuration then overrides the Hadoop configuration. Settings in configuration files take effect for every Hive process started on that machine.
(2) Command-line parameters
When starting Hive, add -hiveconf param=value to set a parameter.
For example:
[atguigu@hadoop103 hive]$ bin/hive -hiveconf mapred.reduce.tasks=10
Note: this takes effect only for the current Hive session. To view the parameter:
hive (default)> set mapred.reduce.tasks;
(3) Parameter declaration
Parameters can be set with the SET keyword in HQL.
For example:
hive (default)> set mapred.reduce.tasks=100;
Note: this takes effect only for the current Hive session.
To check the parameter setting:
hive (default)> set mapred.reduce.tasks;
The three methods have increasing priority: configuration file < command-line parameter < parameter declaration. Note that some system-level parameters, such as log4j settings, must be set with the first two methods, because they are read before the session is set up.
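The precedence chain can be illustrated with a small shell sketch: each later, non-empty layer replaces the value from the layer below, just as a SET declaration overrides -hiveconf, which in turn overrides hive-site.xml. The parameter name and values here are illustrative only, not real Hive defaults.

```shell
# Layered values for one parameter, lowest priority first.
config_file_value="-1"   # from hive-site.xml / hive-default.xml
cmdline_value="10"       # from: bin/hive -hiveconf mapred.reduce.tasks=10
set_value="100"          # from: hive> set mapred.reduce.tasks=100;

# Resolve: start from the file value, let each later layer win if set.
effective="$config_file_value"
[ -n "$cmdline_value" ] && effective="$cmdline_value"
[ -n "$set_value" ] && effective="$set_value"

echo "effective mapred.reduce.tasks = $effective"
```

Emptying `set_value` (no SET issued) would leave the -hiveconf value of 10 in effect, matching the priority order stated above.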