Environment: VMware 11, Ubuntu 15.10, Hadoop 2.7.1
Hive is built on top of Hadoop's static batch processing. Hadoop usually has high latency and incurs significant overhead when submitting and scheduling jobs, so Hive is not suitable for low-latency, fast queries over large data sets, such as online transaction processing (OLTP). Hive's query execution strictly follows the Hadoop MapReduce job execution model: an interpreter translates the user's HiveQL statements into MapReduce jobs, submits them to the Hadoop cluster, monitors their execution, and returns the results to the user.
Hive is best suited to batch jobs over large, mostly static data sets, for example web log analysis.
Hive uses the embedded Derby database by default, but Derby only allows a single connection at a time, so another relational database is usually substituted for it. This post demonstrates installing both MySQL and PostgreSQL.
1.1 Install MySQL
- Download and install:
sudo apt-get install mysql-server mysql-client libmysqlclient-dev
During the installation you will be prompted to enter a root password; remember it.
- Check that the installation succeeded:
sudo netstat -tap | grep mysql
If the command prints a mysql entry, the installation succeeded. Alternatively, view the status with:
sudo service mysql status
- Log in to the MySQL database with:
mysql -u root -p
Then grant remote access privileges:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
or
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'rootpasswd' WITH GRANT OPTION;
Here % means any host. Next, create the ==hive== database and user:
use mysql;
INSERT INTO user (Host, User, Password) VALUES ('%', 'hive', PASSWORD('hive'));
FLUSH PRIVILEGES;
Setting Host to '%' means the ==hive== user can connect from any remote host.
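Instead of inserting into the user table directly, the same setup can be done with plain SQL statements in the mysql client; this is a sketch in which the metastore database name ==hive== and the password hive are assumptions matching the setup above:

```sql
-- Hypothetical equivalent setup: create the metastore database and
-- a hive user reachable from any host ('%'); names are assumptions.
CREATE DATABASE hive;
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
```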
Modify the mysqld.cnf configuration file:
sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
Comment out the line bind-address = 127.0.0.1, or change it to bind-address = 0.0.0.0. Restart MySQL for the change to take effect:
sudo service mysql restart
MySQL can now be accessed remotely; in the MySQL process list you can see that the Host column has changed.
Create a system user so you can log in to the database from a shell:
sudo adduser hive
- Completely remove MySQL (if needed):
sudo apt-get autoremove --purge mysql-server
sudo apt-get remove mysql-common  # this is very important
dpkg -l | grep ^rc | awk '{print $2}' | sudo xargs dpkg -P  # clean up residual data
1.2 Install PostgreSQL
- Download:
sudo apt-get install postgresql-9.4
- Log in as the default user:
sudo -u postgres psql postgres
and create the adminpack extension:
CREATE EXTENSION adminpack;
- Use
\password postgres
to set a password for the default ==postgres== user; use \q to exit the psql shell.
- Create a database user ==hive== (password: hive) and a ==hive== database:
sudo -u postgres createuser -A -d -P hive
sudo -u postgres createdb -O hive hive  # the first hive is the owning user, the second is the database name
Create a system user to log database with shell
sudo adduser hive
Configure pg_hba.conf and postgresql.conf:
sudo vim /etc/postgresql/9.4/main/pg_hba.conf
sudo vim /etc/postgresql/9.4/main/postgresql.conf
In pg_hba.conf, 192.168.8.0 is the local subnet; the other entries should also be changed to md5 authentication. PostgreSQL only accepts local connections by default, so in postgresql.conf set listen_addresses to *.
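The edits amount to entries like the following; the 192.168.8.0/24 subnet comes from the post, so adjust it to your own network:

```
# pg_hba.conf: require md5 password authentication from the local subnet
host    all    all    192.168.8.0/24    md5

# postgresql.conf: listen on all interfaces instead of localhost only
listen_addresses = '*'
```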
Restart the service for the configuration to take effect:
sudo systemctl restart postgresql
or
sudo service postgresql restart
Reference:
http://www.unixmen.com/install-postgresql-9-4-and-phppgadmin-on-ubuntu-15-10/
https://help.ubuntu.com/lts/serverguide/mysql.html
2 Install Hive
Set the environment variables:
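For example, the following lines could be appended to ~/.bashrc; the install path /usr/local/hive is an assumption, so adjust it to wherever you unpacked Hive:

```shell
# Hypothetical environment setup for Hive; /usr/local/hive is an assumed path.
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
```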
2.1 Download the MySQL / PostgreSQL JDBC driver
http://dev.mysql.com/downloads/connector/j/
https://jdbc.postgresql.org/download.html
Put the driver JAR into Hive's ==lib== directory; MySQL is demonstrated here.
2.2 Configure Hive
Enter Hive's ==conf== directory and copy the template files:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
Configure hive-env.sh:
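A minimal sketch of what hive-env.sh usually needs; both paths are assumptions for this setup:

```shell
# Assumed paths: point Hive at the Hadoop 2.7.1 install and Hive's own conf dir.
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
```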
Configure hive-site.xml:
Hive loads two files, hive-default.xml and hive-site.xml. If the same parameter is configured in both, the value in the user's hive-site.xml takes precedence, so hive-site.xml can be reduced to just the following parameters.
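A minimal sketch of the retained hive-site.xml parameters for a MySQL metastore; the host, database name, user, and password are assumptions matching the setup above:

```xml
<configuration>
  <!-- JDBC connection to the MySQL metastore; the values are assumptions -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```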
2.3 Create the necessary directories
hdfs dfs -mkdir /user/
hdfs dfs -mkdir /user/hive/
hdfs dfs -mkdir /user/hive/warehouse
hdfs dfs -mkdir /tmp/
hdfs dfs -mkdir /tmp/hive
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /tmp/hive
Remember to change the directory permissions, otherwise JDBC access may fail with an error like the following:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse/loginfo":phoenix:supergroup:drwxr-xr-x
==loginfo== is a Hive table created via JDBC.
2.4 Start Hive
Running hive now throws the following error:
Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)
Hive defaults to the Derby database, so the metastore must first be initialized for MySQL (schematool is under ==hive/bin==; the command comes straight from the error message above):
schematool -initSchema -dbType mysql
This generates the metadata tables in the ==hive== database in MySQL.
JDBC interface
Configure Hadoop's core-site.xml so that anonymous access to Hadoop is allowed.
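A sketch of the standard Hadoop proxy-user entries for core-site.xml; the user name phoenix is taken from the error message below and should match the user running HiveServer2:

```xml
<configuration>
  <!-- Allow the phoenix user to impersonate any user from any host -->
  <property>
    <name>hadoop.proxyuser.phoenix.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.phoenix.groups</name>
    <value>*</value>
  </property>
</configuration>
```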
Otherwise you will hit the following error:
org.apache.hadoop.ipc.RemoteException: User: phoenix is not allowed to impersonate anonymous
After the configuration is done, restart Hadoop, then run hiveserver2 under ==hive/bin==.
The JARs required by the Eclipse project are shown in the figure below; they can all be found in the ==hive/lib== directory, and the two ==slf4j== JARs can be found in Hadoop:
Test class:
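A minimal sketch of such a test class, assuming HiveServer2 runs on localhost:10000 with the standard org.apache.hive.jdbc.HiveDriver on the classpath; the table name loginfo matches the one mentioned above, and the host, port, and database are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcTest {
    // HiveServer2 listens on port 10000 by default; host and db are assumptions
    static String buildUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                buildUrl("localhost", 10000, "default"), "", "");
             Statement stmt = conn.createStatement()) {
            // create the table used in the permission example above
            stmt.execute("CREATE TABLE IF NOT EXISTS loginfo (line STRING)");
            try (ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // print each table name
                }
            }
        }
    }
}
```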
Reference:
https://www.shiyanlou.com/courses/document/766
http://blog.csdn.net/nengyu/article/details/51620760
http://www.cnblogs.com/linjiqin/archive/2013/03/04/2943025.html
Original: Big Box Hive 2.0.1 installation deployment