Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL-like query capability, translating SQL statements into MapReduce jobs for execution. Its advantage is a low learning curve: simple MapReduce statistics can be produced quickly with SQL-like statements, with no need to develop dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
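For example, a simple aggregation written in SQL-like syntax is compiled by Hive into one or more MapReduce jobs; the table and column names here are purely illustrative:

```shell
# a SQL-like query; Hive compiles it into MapReduce jobs behind the scenes
# (the "employees" table and its columns are illustrative, not part of this guide)
hive -e "SELECT dept, COUNT(*) AS cnt FROM employees GROUP BY dept;"
```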
Prerequisite: Hadoop is already installed.
Download address: http://apache.fayea.com/hive/hive-1.2.0/
Download: apache-hive-1.2.0-bin.tar.gz
Extract: tar -zxvf apache-hive-1.2.0-bin.tar.gz
Configure environment variables:
export HIVE_HOME=/opt/apache-hive-1.2.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_LIB=$HIVE_HOME/lib
export CLASSPATH=$CLASSPATH:$HIVE_LIB
export PATH=$HIVE_HOME/bin:$PATH
cd /opt/apache-hive-1.2.0-bin/conf, then copy hive-env.sh.template to hive-env.sh:
cp hive-env.sh.template hive-env.sh
Change the content of hive-env.sh:
vi hive-env.sh
HADOOP_HOME=/opt/soft-228238/hadoop-2.6.0
export HIVE_CONF_DIR=/opt/apache-hive-1.2.0-bin/conf
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
Modify the following properties (the Oracle host, SID, username, and password below are placeholders for your environment):
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:oracle:thin:@192.168.XX.XXX:1521:dpap</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>oracle.jdbc.driver.OracleDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>user</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore database</description>
</property>
Copy the Oracle JDBC driver package (oracle-jdbc-10.1.0.2.0.jar) into /opt/apache-hive-1.2.0-bin/lib.
Start Hive:
cd /opt/apache-hive-1.2.0-bin/bin
Initialize the Oracle metastore schema (first run only): ./schematool -dbType oracle -initSchema
Start the metastore service (when using a remote metastore database): ./hive --service metastore
Start the Hive CLI: ./hive
Or start the CLI with debug logging on the console: ./hive --hiveconf hive.root.logger=DEBUG,console
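To confirm that initialization succeeded, schematool can also report the schema version recorded in the metastore database (a sketch; run from the same bin directory as above):

```shell
# report the metastore schema version stored in Oracle
./schematool -dbType oracle -info
```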
Common problems:
- Error 1: startup fails with the following exception:
Logging initialized using configuration in jar:file:/opt/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:519)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:560)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:505)
... 8 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 11 more
Solution: first grant write permissions on the HDFS scratch directory:
cd /opt/soft-228238/hadoop-2.6.0/bin
./hadoop fs -chmod -R 777 /tmp/hive
cd /opt/apache-hive-1.2.0-bin/conf
vi hive-site.xml (replace the unresolved ${system:java.io.tmpdir}/${system:user.name} values in the following properties with concrete local paths):
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hive</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/hive/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
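Since these properties now point at fixed local paths, it may help to pre-create the directories so permissions are not an issue on first start; a sketch, using the same 777 mode as the HDFS workaround above:

```shell
# pre-create the local scratch and operation-log directories from hive-site.xml
mkdir -p /tmp/hive/operation_logs
chmod -R 777 /tmp/hive
# confirm they exist with the expected permissions
ls -ld /tmp/hive /tmp/hive/operation_logs
```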
- Error 2:
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Solution:
The cause is an old version of jline in the Hadoop directory:
/hadoop-2.6.0/share/hadoop/yarn/lib:
-rw-r--r-- 1 root root 87325 Mar 10 18:10 jline-0.9.94.jar
The workaround is to copy the newer jline jar shipped with Hive into that directory and disable the old one:
cd /opt/soft-228238/hadoop-2.6.0/share/hadoop/yarn/lib
cp /opt/apache-hive-1.2.0-bin/lib/jline-2.12.jar ./
mv jline-0.9.94.jar jline-0.9.94.jar.bak
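An alternative that avoids touching the Hadoop jars is to make Hadoop prefer user-supplied classpath entries; the Hive documentation mentions this environment variable for exactly this jline conflict:

```shell
# let user-supplied jars (including Hive's newer jline) win over Hadoop's bundled ones
export HADOOP_USER_CLASSPATH_FIRST=true
```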
- Error 3: table creation fails when Hive uses Oracle as the metastore
The create table command fails; the detailed error can be found in /tmp/root/hive.log.
Solution:
Open hive-metastore-1.2.0.jar in ${HIVE_HOME}/lib with an archive tool (the same file also ships in hive-jdbc-1.2.0-standalone.jar), locate the file named package.jdo, open it, and find the following content.
<field name="viewOriginalText" default-fetch-group="false">
<column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
<column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
Note that the columns VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT are both declared as LONGVARCHAR, which maps to LONG in Oracle. This conflicts with Oracle's restriction that a table may contain at most one column of type LONG, hence the error.
Change the jdbc-type of both columns to CLOB, as suggested on the Hive official website. The modified content is as follows.
<field name="viewOriginalText" default-fetch-group="false">
<column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
<column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>
</field>
After modification, restart hive.
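The edit can also be scripted rather than done in an archive tool. Below is a sketch of the two sed substitutions, demonstrated on a local copy of the affected <field> entries; against the real jar you would first extract package.jdo with `jar xf hive-metastore-1.2.0.jar package.jdo`, run the same sed commands, then write it back with `jar uf hive-metastore-1.2.0.jar package.jdo`:

```shell
# local sample of the two affected entries from package.jdo
cat > package.jdo <<'EOF'
<field name="viewOriginalText" default-fetch-group="false">
  <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
<field name="viewExpandedText" default-fetch-group="false">
  <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
</field>
EOF
# change only the two VIEW_* columns from LONGVARCHAR to CLOB,
# leaving any other LONGVARCHAR declarations in the file untouched
sed -i 's/\(VIEW_ORIGINAL_TEXT" jdbc-type="\)LONGVARCHAR/\1CLOB/; s/\(VIEW_EXPANDED_TEXT" jdbc-type="\)LONGVARCHAR/\1CLOB/' package.jdo
grep 'jdbc-type' package.jdo
```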