Hive is a data warehouse tool based on Hadoop

Hive is a data warehouse tool built on Hadoop. It maps structured data files onto database tables, provides a complete SQL-like query capability, and translates those SQL statements into MapReduce jobs for execution. Its main advantage is a low learning cost: simple MapReduce statistics can be expressed quickly through SQL-like statements, without developing dedicated MapReduce applications, which makes it very well suited to statistical analysis in data warehouses.
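To make that concrete, here is a minimal sketch of the kind of SQL-like statement Hive accepts; the table and column names (access_logs, page_url) are made-up examples, and Hive compiles the GROUP BY into a MapReduce job behind the scenes.

  # Hypothetical aggregation: count page views per URL.
  # Hive translates this single statement into a MapReduce job.
  hive -e "SELECT page_url, COUNT(*) AS pv FROM access_logs GROUP BY page_url;"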

Hive was first open-sourced by Facebook, where it was originally used to compute statistics over massive amounts of structured log data. It is essentially an ETL tool.
Some typical application scenarios of Hive are as follows:

  Log analysis: statistics over a website's PV (page views) and UV (unique visitors), and multi-dimensional data analysis over a period of time. Companies that use Hive for log analysis include Baidu, Taobao, etc.
  Offline analysis of massive structured data in other scenarios
  Low-cost data analysis (without writing MapReduce directly)


In this article, Sanxian will walk through the installation and deployment of Hive. Hive is not itself a distributed system, so its installation is relatively easy. Before installing Hive, make sure your Hadoop environment has been set up successfully and starts normally. Sanxian's setup uses Hadoop 1.2.0 and Hive 0.10.
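Before going any further, it is worth checking that Hadoop really is running. A quick sanity check, assuming the standard Hadoop 1.x scripts are on the path:

  # Start Hadoop 1.x if it is not already running.
  start-all.sh

  # jps should list NameNode, DataNode, SecondaryNameNode,
  # JobTracker and TaskTracker.
  jps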

Let's first summarize the installation steps of Hive:

1. Install and configure the MySQL database (the default metastore is Derby)
2. Put a MySQL JDBC driver jar into hive/lib
3. Add the Hadoop directory path to hive-env.sh (copy the bundled template and rename it)
4. Create Hive's warehouse directory and temporary file directory on HDFS
5. Configure hive-site.xml with Hive's related settings (copy the bundled template and rename it)
6. Start Hive
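For steps 1 and 2, a minimal sketch of the MySQL side is shown below; the database name hive and the root credentials are taken from the hive-site.xml later in this article, while the driver jar filename is an assumption, so adjust everything to your environment.

  # Step 1: create the metastore database (optional here, since
  # createDatabaseIfNotExist=true in the JDBC URL also creates it on first use).
  mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS hive;"

  # Step 2: copy the MySQL JDBC driver into Hive's lib directory
  # (the jar filename is illustrative; use the version you downloaded).
  cp mysql-connector-java-5.1.25-bin.jar /root/hive-0.10.0/lib/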

What Sanxian wants to say here concerns the configuration in steps 3, 4, and 5; the installation of MySQL itself is actually very simple and will not be covered. First, copy hive-env.sh from its template and add the Hadoop directory path to it. Second, create the directories on HDFS where the corresponding Hive tables will be stored. Both steps are sketched below.
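A sketch of steps 3 and 4; the Hadoop install path is an assumption, while the HDFS directories /root/hive and /root/tmp match the hive-site.xml below:

  # Step 3: create hive-env.sh from its template and point it at Hadoop.
  cd /root/hive-0.10.0/conf
  cp hive-env.sh.template hive-env.sh
  echo "HADOOP_HOME=/root/hadoop-1.2.0" >> hive-env.sh   # install path is an assumption

  # Step 4: create the warehouse and scratch directories on HDFS
  # and make them group-writable.
  hadoop fs -mkdir /root/hive
  hadoop fs -mkdir /root/tmp
  hadoop fs -chmod g+w /root/hive
  hadoop fs -chmod g+w /root/tmp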
Next comes the most important part: rename hive-default.xml.template to hive-site.xml and modify a few of its properties:
<configuration>
  <!-- JDBC URL of the MySQL metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <!-- Database username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- The JDBC driver must be specified here; use the driver matching your database -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- Database password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <!-- HDFS path where Hive tables are stored -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/root/hive</value>
  </property>
  <!-- HDFS path used to store the execution plans of the map/reduce stages
       and the intermediate output of those stages -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/root/tmp</value>
  </property>
  <!-- JVM heap size for child map/reduce tasks -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
  <!-- Query log location (on the local filesystem) -->
  <property>
    <name>hive.querylog.location</name>
    <value>/root/hive-0.10.0/logs</value>
  </property>
</configuration>


At this point, the configuration is complete. Now we can start Hive and try creating a table.
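A sketch of what that first session might look like; the table name and columns are illustrative:

  # Launch the Hive CLI from the install directory.
  cd /root/hive-0.10.0
  bin/hive

  hive> CREATE TABLE test_log (ip STRING, url STRING)
      >   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  hive> SHOW TABLES;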




Finally, use the exit; command to leave the Hive CLI.
