Hive: Index Operations

When reposting, please credit the source: https://blog.csdn.net/l1028386804/article/details/80184742

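Indexes have been available since Hive 0.7. Before creating one, evaluate whether it is really worthwhile: an index takes up additional disk space, and keeping it up to date also has a cost. (Indexes were later removed entirely in Hive 3.0.)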

Creating an index

hive> create index index_studentid on table student(studentid)
> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with deferred rebuild
> IN TABLE index_table_student;
OK
Time taken: 15.219 seconds
hive>

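org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler: the handler class that implements the index (a compact index)
index_studentid: the index name
student: the table being indexed (studentid is the indexed column)
with deferred rebuild: the index is created empty and is only populated when it is rebuilt
index_table_student: the table in which the index data is stored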

Viewing the index table

The index table (index_table_student) contains no data yet:

hive> select * from index_table_student;
OK
Time taken: 0.295 seconds
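The index table is empty because the index was created WITH DEFERRED REBUILD: Hive records the index definition but fills it only when ALTER INDEX ... REBUILD runs, and it does not refresh it automatically when the base table changes. The toy Python model below sketches that life cycle; the DeferredIndex class and its rebuild method are hypothetical illustrations, not a Hive API.

```python
class DeferredIndex:
    """Toy model (hypothetical, not a Hive API) of an index created WITH DEFERRED REBUILD."""

    def __init__(self, base_rows, column):
        self.base_rows = base_rows  # the base table, modeled as a list of dicts
        self.column = column        # the indexed column, e.g. "studentid"
        self.keys = []              # the index table starts out empty

    def rebuild(self):
        """Mimic ALTER INDEX ... REBUILD: recompute the index from the current base rows."""
        self.keys = sorted({row[self.column] for row in self.base_rows})


student = [{"studentid": 1}, {"studentid": 2}]
idx = DeferredIndex(student, "studentid")
print(idx.keys)                   # [] : nothing indexed yet
idx.rebuild()
print(idx.keys)                   # [1, 2]
student.append({"studentid": 3})
print(idx.keys)                   # still [1, 2] : stale until the next rebuild
```

This is why an index has to be rebuilt again whenever the underlying table's data changes.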

Populating the index (rebuild)

hive> alter index index_studentid on student rebuild;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20161226235345_5b3fcc2b-7f90-4b10-861f-31cbaed8eb73
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1482824475750_0001, Tracking URL = http://liuyazhuang121:8088/proxy/application_1482824475750_0001/
Kill Command = /usr/local/development/hadoop-2.6.4/bin/hadoop job -kill job_1482824475750_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-02 23:55:40,317 Stage-1 map = 0%, reduce = 0%
2018-05-02 23:56:40,757 Stage-1 map = 0%, reduce = 0%
2018-05-02 23:56:48,768 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
2018-05-02 23:57:34,981 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 3.66 sec
2018-05-02 23:57:40,716 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.68 sec
MapReduce Total cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1482824475750_0001
Loading data to table default.index_table_student
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.68 sec HDFS Read: 10282 HDFS Write: 537 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 680 msec
OK
Time taken: 280.693 seconds

Querying the index table again

hive> select * from index_table_student;
OK
1 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [0]
2 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [28]
3 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [56]
4 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [85]
5 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [113]
6 hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt [143]
Time taken: 2.055 seconds, Fetched: 6 row(s)
hive>
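Each row of index_table_student holds three values: the indexed studentid (shown as 1 rather than 001, presumably because studentid is a numeric column), the file the row came from (the compact index calls this column `_bucketname`), and the byte offsets within that file at which rows with that studentid start (the `_offsets` column). Conceptually, the rebuild groups the base rows by studentid and file and collects each row's starting offset. The Python sketch below reproduces that grouping for a single text file under stated assumptions: one record per line, whitespace-separated fields, and an integer key in the first field. The function name build_compact_index is hypothetical.

```python
from collections import defaultdict


def build_compact_index(file_name, data):
    """Return (key, file_name, offsets) rows shaped like a compact index table.

    Assumes (for illustration only) one record per line, whitespace-separated
    fields, and an integer indexed column in the first field.
    """
    offsets_by_key = defaultdict(list)
    offset = 0
    for line in data.splitlines(keepends=True):
        fields = line.split()
        if fields:
            key = int(fields[0])              # b"001" -> 1
            offsets_by_key[key].append(offset)  # byte position where this line starts
        offset += len(line)                   # advance by the line's byte length
    return [(key, file_name, offs) for key, offs in sorted(offsets_by_key.items())]


# Made-up sample data (not the real student file); key 001 appears twice
# only to show that one key can map to several offsets.
sample = b"001 0 BeiJing a@example.com\n002 1 b@example.com\n001 1 X c@example.com\n"
print(build_compact_index("sutdent.txt", sample))
# [(1, 'sutdent.txt', [0, 48]), (2, 'sutdent.txt', [28])]
```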

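Inspecting the underlying data file hdfs://liuyazhuang121:8020/opt/hive/warehouse/student/sutdent.txt. Each offset in the index is the byte position at which that student's line starts in this file: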

[root@liuyazhuang121 ~]# hdfs dfs -text /opt/hive/warehouse/student/sutdent.txt;
001 0 BeiJing [email protected]
002 1 [email protected]
003 0 ShegZhen [email protected]
004 1 NanJing [email protected]
005 0 GuangDong [email protected]
006 1 HaiNan [email protected]
[root@liuyazhuang121 ~]#
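Because each offset points at the start of a line, a reader can jump straight to a record instead of scanning the whole file. The sketch below illustrates only what the offsets mean: the function lookup is hypothetical, and an in-memory io.BytesIO stands in for the HDFS file. Hive itself uses the index to narrow down which parts of the input a query reads, not to fetch single rows like this.

```python
import io


def lookup(index_rows, files, key):
    """Return the raw records whose indexed value equals key.

    index_rows: (key, file_name, offsets) tuples shaped like index_table_student.
    files: hypothetical mapping from file name to file bytes (stands in for HDFS).
    """
    records = []
    for row_key, file_name, offsets in index_rows:
        if row_key != key:
            continue
        with io.BytesIO(files[file_name]) as f:
            for offset in offsets:
                f.seek(offset)  # jump directly to the start of the record
                records.append(f.readline().rstrip(b"\n"))
    return records


# Made-up sample data, not the real student file.
data = b"001 0 BeiJing a@example.com\n002 1 b@example.com\n"
rows = [(1, "sutdent.txt", [0]), (2, "sutdent.txt", [28])]
print(lookup(rows, {"sutdent.txt": data}, 2))  # [b'002 1 b@example.com']
```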

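Viewing the indexes on a table

hive> SHOW INDEX on student;
OK
index_studentid         student               studentid               index_table_student    compact                 
Time taken: 0.487 seconds, Fetched: 1 row(s)
hive>

The columns are the index name, the indexed table, the indexed column, the index table, and the index type.

Dropping an index

DROP INDEX index_studentid on student;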