A first look at Hive



1. Download the Hive installation package.
2. Upload it to the Linux server.
3. Copy the package into the hadoop user's home directory.
4. Change the package's owner and group to hadoop. Which commands do that? (chown hadoop apache-hive-0.13.1-bin and chgrp hadoop apache-hive-0.13.1-bin)
5. Add a hive-site.xml file; copying hive-default.xml.template is usually enough: cp hive-default.xml.template hive-site.xml
6. Add the Hive environment variable: configure HIVE_HOME in .bashrc, then run source .bashrc (the full sequence is sketched below).
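
Putting steps 4 through 6 together, here is a minimal command sketch. It assumes the package was unpacked as apache-hive-0.13.1-bin under /home/hadoop (the exact path is an assumption); run the ownership change as root:

# as root: give the hadoop user ownership of the package (combines chown + chgrp)
chown -R hadoop:hadoop /home/hadoop/apache-hive-0.13.1-bin

# as hadoop: create hive-site.xml from the shipped template
cd /home/hadoop/apache-hive-0.13.1-bin/conf
cp hive-default.xml.template hive-site.xml

# as hadoop: configure HIVE_HOME in .bashrc and reload it
echo 'export HIVE_HOME=/home/hadoop/apache-hive-0.13.1-bin' >> ~/.bashrc
echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc
source ~/.bashrc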


Prepare a test data file at /home/hadoop/hive_testdata.
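
The file can be created like this (a sketch; the rows are tab-separated and match the contents shown in the HDFS output further down):

printf '1\ta\ta1\n2\tb\tb1\n3\tc\tc1\n' > /home/hadoop/hive_testdata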

CREATE TABLE test_1(id INT, name STRING, city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/hadoop/hive_testdata' overwrite into table test_1;
Running MapReduce
Hive supports SQL much like MySQL, but in general Hive only has queries and inserts (load); there are no updates. When you run select *, Hive simply dumps the data straight out of HDFS and does not run MapReduce; for anything else, it first runs a MapReduce job on the Hadoop cluster and then displays the result.

Hadoop has two parts: storage (HDFS) and computation (MapReduce).
First of all, the data behind a Hive table is always stored on HDFS.
For example:

[hadoop@localhost ~]$ hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-01-08 21:04 /user/hive/warehouse/test_1
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/
Found 1 items
-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata
[hadoop@localhost ~]$ hadoop fs -ls /user/hive/warehouse/test_1/hive_testdata/
Found 1 items
-rw-r--r--   1 hadoop supergroup         21 2015-01-08 21:04 /user/hive/warehouse/test_1/hive_testdata
[hadoop@localhost ~]$ hadoop fs -cat /user/hive/warehouse/test_1/hive_testdata/
1       a       a1
2       b       b1
3       c       c1

So at query time, if the statement is a select *, what Hive does is fairly simple: it merges the files under the table's HDFS directory, does some light processing, and prints them out; no MapReduce is involved.
For other, relatively complex queries, Hive breaks the statement down into MapReduce jobs and submits them to the cluster, where the MapReduce JobTracker hands the tasks out to the TaskTrackers.
MapReduce is itself a clustered system made up of two kinds of nodes, the JobTracker and the TaskTrackers: the JobTracker distributes tasks, and each TaskTracker executes the tasks assigned to it.
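
To see how Hive compiles a statement into MapReduce stages, you can prefix it with EXPLAIN (not part of the original session, but standard HiveQL):

hive> explain select id from test_1 group by id;

The plan for the group by query contains a map/reduce stage, while explain select * from test_1 typically shows only a fetch stage, matching the behavior described above.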

hive> select * from test_1;
OK
1       a       a1
2       b       b1
3       c       c1
Time taken: 0.781 seconds, Fetched: 3 row(s)
hive> select id from test_1 group by id;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201412251631_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201412251631_0001
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201412251631_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-01-08 21:09:01,496 Stage-1 map = 0%,  reduce = 0%
2015-01-08 21:09:03,531 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.48 sec
2015-01-08 21:09:11,625 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 1.48 sec
2015-01-08 21:09:12,639 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.17 sec
MapReduce Total cumulative CPU time: 4 seconds 170 msec
Ended Job = job_201412251631_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.17 sec   HDFS Read: 237 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 170 msec
OK
1
2
3
Time taken: 21.549 seconds, Fetched: 3 row(s)
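
As an aside, whether simple queries bypass MapReduce like this is itself configurable. In Hive releases of this era the hive.fetch.task.conversion property controls it (exact defaults vary by version, so treat this as something to verify on your release):

hive> set hive.fetch.task.conversion=more;  -- let simple projections/filters run as plain fetches, without MapReduce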


Reposted from xfyzhy.iteye.com/blog/2278212