Please credit the source when reposting: https://blog.csdn.net/l1028386804/article/details/80184742
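Indexes have been available since Hive 0.7. Before creating one, evaluate whether it is justified: an index takes up disk space, and keeping it up to date has a cost as well.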
Creating an index
hive> create index index_studentid on table student(studentid)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > IN TABLE index_table_student;
OK
Time taken: 15.219 seconds
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler: the handler class that implements the index
index_studentid: the index name
student: the table being indexed
index_table_student: the table that stores the index data
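Inspecting the index table
The index table (index_table_student) contains no data yet, because the index was created with deferred rebuild.
hive> select * from index_table_student;
OK
Time taken: 0.295 seconds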
Populating the index
hive> alter index index_studentid on student rebuild;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20161226235345_5b3fcc2b-7f90-4b10-861f-31cbaed8eb73
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1482824475750_0001, Tracking URL = http://liuyazhuang121:8088/proxy/application_1482824475750_0001/
Kill Command = /usr/local/development/hadoop-2.6.4/bin/hadoop job -kill job_1482824475750_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-02 23:55:40,317 Stage-1 map = 0%, reduce = 0%
2018-05-02 23:56:40,757 Stage-1 map = 0%, reduce = 0%
2018-05-02 23:56:48,768 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
2018-05-02 23:57:34,981 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 3.66 sec
2018-05-02 23:57:40,716 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.68 sec
MapReduce Total cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1482824475750_0001
Loading data to table default.index_table_student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.68 sec HDFS Read: 10282 HDFS Write: 537 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 680 msec
OK
Time taken: 280.693 seconds
Querying the index table
hive> select * from index_table_student;
OK
1 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [0]
2 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [28]
3 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [56]
4 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [85]
5 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [113]
6 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [143]
Time taken: 2.055 seconds, Fetched: 6 row(s)
Each row holds the indexed studentid value, the path of the data file that contains it, and the byte offsets of the matching rows within that file.
For comparison, here is the content of hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt:
[root@liuyazhuang121 ~]# hdfs dfs -text /opt/hive/warehouse/student/sutdent.txt;
001 0 BeiJing [email protected]
002 1 [email protected]
003 0 ShegZhen [email protected]
004 1 NanJing [email protected]
005 0 GuangDong [email protected]
006 1 HaiNan [email protected]
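The bracketed numbers in the index table line up with where each row starts in the data file. The following Python sketch illustrates that idea only; it is not Hive's implementation. The sample rows, the tab delimiter, and the helper names build_compact_index and fetch_row are all assumptions made for illustration, and the real sutdent.txt content differs.

```python
import io

# Hypothetical sample rows (assumption): tab-delimited, one student per line.
SAMPLE = (
    b"001\t0\tBeiJing\ta@example.com\n"
    b"002\t1\tShangHai\tb@example.com\n"
    b"003\t0\tNanJing\tc@example.com\n"
)


def build_compact_index(data: bytes) -> dict[int, list[int]]:
    """Map each studentid to the byte offsets of the lines that contain it."""
    index: dict[int, list[int]] = {}
    offset = 0
    for line in io.BytesIO(data):
        # The first field is the key; "001" becomes 1, as in the index table.
        key = int(line.split(b"\t", 1)[0])
        index.setdefault(key, []).append(offset)
        offset += len(line)  # the next line starts right after this one
    return index


def fetch_row(data: bytes, offset: int) -> str:
    """Jump to a recorded offset and read exactly one line from there."""
    stream = io.BytesIO(data)
    stream.seek(offset)
    return stream.readline().decode("utf-8").rstrip("\n")


index = build_compact_index(SAMPLE)
row = fetch_row(SAMPLE, index[2][0])
```

With the offsets recorded up front, a lookup for studentid 2 can jump to the matching row instead of scanning the file from the beginning, which is the saving an index is meant to buy in exchange for the extra storage and rebuild cost.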
Dropping the index
DROP INDEX index_studentid on student;
Listing indexes
hive> SHOW INDEX on student;
OK
index_studentid student studentid index_table_student compact
Time taken: 0.487 seconds, Fetched: 1 row(s)