Hive tutorial (1) - Installation and configuration

This article covers installation only; nothing unrelated.

 

Hive dependencies

Before installing Hive, the following prerequisites must be in place:

1. A reachable relational database, such as MySQL or PostgreSQL, for storing metadata

2. Hadoop, with HDFS started

3. HBase (optional; without it there will be a warning, but it does not affect use)

4. Java 1.8 or later
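The checklist above can be sketched as a quick pre-flight script. This is only a sketch: it checks that the standard CLI names are on the PATH (hbase is optional, as noted above, so it is merely reported), not that the services are actually running.

```shell
# Pre-flight sketch: report whether the expected commands exist on PATH.
# hbase is optional per the prerequisites, so a miss there is not fatal.
for cmd in java hadoop hbase; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: missing"
  fi
done
```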

 

Preparation

1. Download the installation package 

https://mirrors.tuna.tsinghua.edu.cn/apache/hive/ (Tsinghua mirror, fast download)

http://apache.org/dist/hive/ (official site, slow download)

Choose the tar package whose name contains bin; this article installs hive-2.3.6.

2. Upload it to the server

It is best uploaded to the Hadoop master node, which is what I did; there is no need to upload it to every node.

3. Extract it; you may want to rename the directory to make it easier to work with.

 

Environment Variables

export HIVE_HOME=/opt/SoftWare/Hive/hive-2.3.6
export PATH=$PATH:$HIVE_HOME/bin

Adjust the paths to your own installation.

 

At this point Hive is installed; run hive --version to check the version.

 

Configuration

1. First modify hive-env.sh; the file does not exist by default, so create it from the template:

cp hive-env.sh.template hive-env.sh

 

Add the following

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64
export HADOOP_HOME=/usr/lib/hadoop-2.6.5
export HIVE_HOME=/usr/lib/hive2.3.6
export SPARK_HOME=/usr/lib/spark

 

2. Then modify hive-site.xml; the file does not exist by default either:

cp hive-default.xml.template hive-site.xml

Note: copy hive-default.xml.template twice, once as hive-default.xml and once as hive-site.xml. hive-site.xml holds the user-defined configuration, while hive-default.xml holds the global defaults.

When Hive starts, custom items in hive-site.xml override the same items in hive-default.xml.

It is strongly discouraged to copy the template straight to hive-site.xml and edit it in place, because you then lose track of which items were modified. Since hive-site.xml overrides the defaults, it is enough to put only the items you actually need to change into hive-site.xml. [That process is omitted here.]
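To make the "only overrides go into hive-site.xml" idea concrete, here is a minimal sketch that generates such a file. The /tmp path and the example PostgreSQL URL are placeholders for illustration; in practice the file lives in $HIVE_HOME/conf and the values come from the configuration section below.

```shell
# Sketch: a minimal hive-site.xml containing ONLY overridden items;
# anything absent falls back to hive-default.xml. Demo path, adjust to
# $HIVE_HOME/conf in a real installation.
cat > /tmp/hive-site-demo.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://172.16.89.80:5432/metastore_db</value>
  </property>
</configuration>
EOF
# Count the overridden properties (just the one here)
grep -c '<property>' /tmp/hive-site-demo.xml
```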

 

[The configuration below is the minimum needed to get Hive running for the first time; special scenarios may require additional configuration.]

The main thing to configure is the database connection. Hive uses an embedded Derby database as the default metastore; here it is changed to PostgreSQL.

<!-- database configuration -->
<property>
    <name>javax.jdo.option.ConnectionURL</name><!-- database connection address -->
    <!-- using MySQL to store the metadata:
         <value>jdbc:mysql://192.168.100.103:3306/ccx_hive?createDatabaseIfNotExist=true</value> -->
    <!-- using PostgreSQL with SSL:
         <value>jdbc:postgresql://172.16.89.80:5432/metastore_db?ssl=true</value> -->
    <value>jdbc:postgresql://172.16.89.80:5432/ball</value><!-- the ball database is best created in advance; other options such as create=true are also possible, look them up yourself -->
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>

<property><!-- database driver -->
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!-- for MySQL: <value>com.mysql.jdbc.Driver</value> -->
    <value>org.postgresql.Driver</value><!-- postgres -->
    <description>Driver class name for a JDBC metastore</description>
</property>
        
<property><!-- database user name -->
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>u_ccx_hive</value>
</property>

<property><!-- database password -->
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
</property>


<!-- hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>mr</value><!-- mapreduce as the engine -->
    <!-- spark as the engine: <value>spark</value> -->
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
</property>


<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
</property>

Note: the contents of each <value> element must not contain spaces.
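A quick sanity check can catch the no-spaces rule mechanically. This is only a sketch: the demo file below (a made-up /tmp path) deliberately reproduces the mistake of a space inside a JDBC URL, and the grep pattern flags any value containing a space.

```shell
# Sketch: flag <value> entries containing spaces, which silently break
# items such as the JDBC URL. The demo file reproduces the mistake.
cat > /tmp/hive-site-space-demo.xml <<'EOF'
<value>jdbc:postgresql: //172.16.89.80:5432/metastore_db</value>
EOF
grep -n '<value>[^<]* [^<]*</value>' /tmp/hive-site-space-demo.xml
```

Running the same grep against your real hive-site.xml before starting Hive can save a confusing startup failure.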

 

Upload the database driver

Upload the driver postgresql-9.2-1003.jdbc4.jar to Hive's lib directory.

 

Initialize the metastore database

The database should be created in advance; this step initializes the schema:

bin/schematool -dbType mysql -initSchema
bin/schematool -dbType postgres -initSchema

Then look at the database: a whole set of tables has been created.

 

Start the Hive service

Started this way, a client can connect to Hive directly; try a tool such as DbVisualizer.

bin/hive --service hiveserver2

HiveServer2 fixes the security and concurrency problems of HiveServer; HiveServer is no longer provided.

The service startup script is ${HIVE_HOME}/bin/hiveserver2, so it can also be started like this:

hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001  # specify the port

The port number can also be set in the configuration file:

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>

 

Start the Hive shell client

The shell client is easy to work with.

[root@master bin]# hive
# type "show tables;" - if the following is displayed, Hive has started
hive> show tables;
OK
Time taken: 1.594 seconds

 

Checking Hive

Hive is installed, but what is the relationship between Hive and Hadoop? Let's find out through hands-on operations.

Create a database in Hive

hive> create database hive1;    # create a database
OK
Time taken: 0.478 seconds
hive> show databases;    # list databases
OK
default
hive1    # created successfully
Time taken: 0.132 seconds, Fetched: 2 row(s)

The question is: the database was created successfully, but where does it live? What is its relationship to Hadoop? And to the metadata?

1. First, look at the metadata. The metastore database has a table called DBS which, clearly, stores the database names.

There we can see the new database hive1.

 

2. Then look at HDFS.

The new database hive1 appears there as well.

 

This path can be found in the corresponding hive-site configuration.
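For reference, the setting that governs where databases and tables land on HDFS is the warehouse directory. The fragment below shows the property with its stock default as shipped in hive-default.xml (shown here as a sketch; override it in hive-site.xml if you want a different location):

```xml
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
```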

 

Create a table in the Hive database

hive> use hive1;    # switch to the hive1 database
OK
Time taken: 0.042 seconds
hive> create table hive_01 (id int, name string);    # create a table
OK
Time taken: 0.984 seconds
hive> show tables;    # list tables
OK
hive_01    # created successfully
Time taken: 0.067 seconds, Fetched: 1 row(s)

1. Likewise, look at the metadata: there is a TBLS table which, clearly, stores the table names.

 

2. Then look at HDFS.

The corresponding table directory is there as well.

 

Conclusion: databases and tables created in Hive are stored on HDFS, while their metadata is stored in the metastore database.

The hive.user.install.directory parameter in hive-site.xml defines the HDFS path; the default is /user.

 

Exception record

1. The schematool command failed with an error saying no dbType was supplied

The database itself had been set up correctly, so double-check the schematool initialization command; the correct form is the operation shown above.

 

2. The schematool command failed as follows:

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: org.postgresql.util.PSQLException : The server does not support SSL.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

This is an SSL problem. Modify the javax.jdo.option.ConnectionURL parameter in hive-site: in jdbc:postgresql://myhost/db?ssl=true, change true to false or remove the ssl option.

 

3. Starting the Hive shell failed as follows:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/a71af7ab-060a-465e-91ba-124ba4b07e36. Nam
e node is in safe mode.The reported blocks 200 has reached the threshold 0.9990 of total blocks 200. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be tur
ned off automatically in 15 seconds.

Reason: the NameNode is in safe mode. Safe mode can be turned off:

# turn off safe mode
hadoop dfsadmin -safemode leave

 

4. Starting the Hive shell failed as follows:

Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

The solution is as follows:

1. Check the hive-site.xml configuration for configuration items whose values contain "${system:java.io.tmpdir}".
2. Create a folder such as /home/grid/hive-0.14.0-bin/iotmp, paying attention to permissions.
3. Change the values containing "${system:java.io.tmpdir}" to the path created above.
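As a sketch of step 3, the override in hive-site.xml might look like the fragment below. The property name hive.exec.local.scratchdir is one of the items whose default value contains ${system:java.io.tmpdir}; the path is the example folder from step 2, and your hive-site.xml may have other such items to change as well:

```xml
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/grid/hive-0.14.0-bin/iotmp</value>
</property>
```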

 

5. Starting the Hive shell produced the following warning:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Note that this is just a warning: Hive-on-MR is deprecated in Hive 2.x and may be removed in future versions. The solution is to switch the execution engine, e.g. to spark.

 

 

 



Origin www.cnblogs.com/yanshw/p/11766343.html