大数据开发之Hive篇13-Hive的索引

备注:
Hive 版本 2.1.1

一.Hive索引简介

Hive索引的目标是提高对表的某些列进行查询查找的速度。如果没有索引,则使用类似于“WHERE tab1.col1 = 10”这样的谓词进行查询’加载整个表或分区并处理所有行。但是如果col1存在索引,那么只需要加载和处理文件的一部分。

索引所能提供的查询速度的提高是以创建索引和存储索引的磁盘空间的额外处理为代价的。

Hive 3.0开始将 移除index的功能,取而代之的是Hive 2.3版本开始的物化视图,自动重写的物化视图替代了index的功能。

索引是传统RDBMS的标准技术,用来加速查询
Hive自0.7版本开始支持索引,但提供的功能很有限,效率也并不高,因此Hive索引很少使用

Hive索引原理:
1) 在指定列上建立索引,生成一张索引表(Hive的一张物理表),记录以下三个字段:索引列的值、该值对应的HDFS文件路径、该值在文件中的偏移量
2 ) 在执行索引字段查询时候,首先额外生成一个MapReduce job,根据对索引列的过滤条件,从索引表中过滤出索引列的值对应的hdfs文件路径及偏移量,输出到hdfs上的一个文件中,然后根据这些文件中的 hdfs路径和偏移量,筛选原始input文件,生成新的split,作为整个job的split,达到不用全表扫描的目的

Hive索引的弊端:
每次查询时候都要先用一个job扫描索引表,如果索引列的值非常稀疏,那么索引表本身也会非常大
索引表不会自动rebuild,如果表有数据新增或删除,那么必须手动rebuild索引表数据

二.Hive 索引案例

2.1 索引语法

语法:

-- 创建索引
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]

-- 删除索引
DROP INDEX [IF EXISTS] index_name ON table_name;

-- 更改索引属性
ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;

2.2 索引相关操作

  1. 创建/构建,查看和删除索引
CREATE INDEX table01_index ON TABLE table01 (column2) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
SHOW INDEX ON table01;
DROP INDEX table01_index ON table01;
  1. 创建和构建,指定列格式化查看和删除索引
CREATE INDEX table02_index ON TABLE table02 (column3) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
ALTER INDEX table02_index ON table2 REBUILD;
SHOW FORMATTED INDEX ON table02;
DROP INDEX table02_index ON table02;
  1. 创建位图,查看和删除索引
CREATE INDEX table03_index ON TABLE table03 (column4) AS 'BITMAP' WITH DEFERRED REBUILD;
ALTER INDEX table03_index ON table03 REBUILD;
SHOW FORMATTED INDEX ON table03;
DROP INDEX table03_index ON table03;
  1. 在新表中创建索引
CREATE INDEX table03_index ON TABLE table03 (column4) AS 'BITMAP' WITH DEFERRED REBUILD;
ALTER INDEX table03_index ON table03 REBUILD;
SHOW FORMATTED INDEX ON table03;
DROP INDEX table03_index ON table03;
  1. 创建rc文件格式的索引
CREATE INDEX table05_index ON TABLE table05 (column6) AS 'COMPACT' STORED AS RCFILE;
  1. 创建textfile文件格式的索引
CREATE INDEX table06_index ON TABLE table06 (column7) AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
  1. 创建索引带索引属性
CREATE INDEX table07_index ON TABLE table07 (column8) AS 'COMPACT' IDXPROPERTIES ("prop1"="value1", "prop2"="value2");
  1. 创建索引带表属性
CREATE INDEX table08_index ON TABLE table08 (column9) AS 'COMPACT' TBLPROPERTIES ("prop3"="value3", "prop4"="value4");
  1. 如存在删除索引
DROP INDEX IF EXISTS table09_index ON table09;
  1. 在一个分区重建索引
ALTER INDEX table10_index ON table10 PARTITION (columnX='valueQ', columnY='valueR') REBUILD;

2.3 创建索引案例

代码:

create index ods_fact_sale_idx02 on table ods_fact_sale (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild; 
alter index ods_fact_sale_idx02 on ods_fact_sale rebuild;
SHOW FORMATTED INDEX ON ODS_FACT_SALE;
DROP INDEX  ods_fact_sale_idx02 on ods_fact_sale;

测试记录:

hive> 
    > DROP INDEX  ods_fact_sale_idx02 on ods_fact_sale;
OK
Time taken: 0.126 seconds
hive> 
    > create index ods_fact_sale_idx02 on table ods_fact_sale (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild; 
OK
Time taken: 0.255 seconds
hive> alter index ods_fact_sale_idx02 on ods_fact_sale rebuild;
Query ID = root_20201222153341_45c09d53-2fe9-4316-8ef4-6b36f79242d5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 469
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1606698967173_0354, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0354/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0354
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 469
2020-12-22 15:33:50,110 Stage-1 map = 0%,  reduce = 0%
2020-12-22 15:34:13,830 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 61.66 sec
2020-12-22 15:34:32,360 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 118.75 sec
2020-12-22 15:35:00,105 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 194.41 sec
2020-12-22 15:35:30,915 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 288.08 sec
2020-12-22 15:35:46,308 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 338.34 sec
2020-12-22 15:36:10,946 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 409.04 sec
2020-12-22 15:36:23,270 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 443.45 sec
2020-12-22 15:36:50,995 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 529.32 sec
2020-12-22 15:37:04,332 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 571.73 sec
2020-12-22 15:37:31,002 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 648.36 sec
2020-12-22 15:37:58,701 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 735.16 sec
2020-12-22 15:38:09,966 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 769.21 sec
2020-12-22 15:38:37,652 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 850.71 sec
2020-12-22 15:38:49,957 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 892.39 sec
2020-12-22 15:39:16,572 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 968.21 sec
2020-12-22 15:39:45,276 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 1059.42 sec
2020-12-22 15:39:55,520 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 1084.93 sec
2020-12-22 15:40:25,223 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 1173.97 sec
2020-12-22 15:40:38,550 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 1217.52 sec
2020-12-22 15:41:12,398 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 1311.11 sec
2020-12-22 15:41:22,641 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 1344.35 sec
2020-12-22 15:41:49,273 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 1421.17 sec
2020-12-22 15:42:15,892 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 1506.27 sec
2020-12-22 15:42:28,190 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 1540.37 sec
2020-12-22 15:42:54,829 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 1625.41 sec
2020-12-22 15:43:07,140 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 1667.05 sec
2020-12-22 15:43:33,773 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 1743.24 sec
2020-12-22 15:44:04,508 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 1836.61 sec
2020-12-22 15:44:13,725 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 1861.45 sec
2020-12-22 15:44:43,418 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 1950.29 sec
2020-12-22 15:44:57,770 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 1989.13 sec
2020-12-22 15:45:24,410 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 2072.34 sec
2020-12-22 15:45:54,127 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 2165.23 sec
2020-12-22 15:46:05,399 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 2191.15 sec
2020-12-22 15:46:35,076 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 2281.29 sec
2020-12-22 15:46:41,231 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 2306.47 sec
2020-12-22 15:47:17,071 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 2409.37 sec
2020-12-22 15:47:22,199 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 2436.0 sec
2020-12-22 15:47:51,918 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 2519.65 sec
2020-12-22 15:48:18,555 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 2602.14 sec
2020-12-22 15:48:33,925 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 2644.48 sec
2020-12-22 15:49:01,573 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 2728.82 sec
2020-12-22 15:49:14,890 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 2769.67 sec
2020-12-22 15:49:39,457 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 2844.61 sec
2020-12-22 15:50:08,161 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 2918.23 sec
2020-12-22 15:50:20,453 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 2966.91 sec
2020-12-22 15:50:52,200 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 3065.02 sec
2020-12-22 15:51:01,420 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 3091.15 sec
2020-12-22 15:51:31,093 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 3180.02 sec
2020-12-22 15:51:42,352 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 3219.38 sec
2020-12-22 15:52:02,842 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 3282.47 sec
2020-12-22 15:52:25,360 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 3346.31 sec
2020-12-22 15:52:49,915 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 3412.03 sec
2020-12-22 15:53:09,353 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 3466.34 sec
2020-12-22 15:53:39,022 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 3559.84 sec
2020-12-22 15:53:55,424 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 3600.18 sec
2020-12-22 15:54:16,894 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 3677.0 sec
2020-12-22 15:54:40,450 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 3742.23 sec
2020-12-22 15:55:06,022 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 3809.95 sec
2020-12-22 15:55:27,531 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 3872.88 sec
2020-12-22 15:55:46,970 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 3949.54 sec
2020-12-22 15:56:11,541 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 4019.95 sec
2020-12-22 15:56:35,074 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 4079.36 sec
2020-12-22 15:56:50,434 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 4134.21 sec
2020-12-22 15:57:17,045 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 4218.52 sec
2020-12-22 15:57:29,327 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 4250.03 sec
2020-12-22 15:57:59,029 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 4343.61 sec
2020-12-22 15:58:14,383 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 4372.78 sec
2020-12-22 15:58:39,990 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 4457.12 sec
2020-12-22 15:59:03,531 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 4516.39 sec
2020-12-22 15:59:20,938 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 4579.95 sec
2020-12-22 15:59:47,535 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 4662.51 sec
2020-12-22 16:00:00,865 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 4704.85 sec
2020-12-22 16:00:28,513 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 4781.9 sec
2020-12-22 16:00:58,207 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 4859.82 sec
2020-12-22 16:01:12,536 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 4915.46 sec
2020-12-22 16:01:40,186 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 5000.19 sec
2020-12-22 16:01:49,392 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 5032.13 sec
2020-12-22 16:02:17,051 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 5116.37 sec
2020-12-22 16:02:46,743 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 5185.63 sec
2020-12-22 16:03:01,095 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 5241.31 sec
2020-12-22 16:03:34,886 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 5312.93 sec
2020-12-22 16:04:10,729 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 5368.73 sec
2020-12-22 16:04:45,541 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 5422.23 sec
2020-12-22 16:05:26,495 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 5485.99 sec
2020-12-22 16:06:25,883 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 5576.06 sec
2020-12-22 16:07:00,664 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 5630.26 sec
2020-12-22 16:07:43,677 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 5692.55 sec
2020-12-22 16:08:24,613 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 5752.85 sec
2020-12-22 16:09:05,589 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 5814.03 sec
2020-12-22 16:09:40,413 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 5869.31 sec
2020-12-22 16:10:40,832 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 5953.63 sec
2020-12-22 16:10:44,928 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 5961.29 sec
2020-12-22 16:11:25,875 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 6024.9 sec
2020-12-22 16:12:07,848 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 6088.08 sec
2020-12-22 16:12:42,648 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 6142.66 sec
2020-12-22 16:13:23,613 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 6206.25 sec
2020-12-22 16:14:01,477 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 6263.55 sec
2020-12-22 16:14:57,801 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 6347.73 sec
2020-12-22 16:15:33,637 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 6401.32 sec
2020-12-22 16:16:28,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6484.47 sec
2020-12-22 16:16:44,291 Stage-1 map = 100%,  reduce = 1%, Cumulative CPU 6520.9 sec
2020-12-22 16:17:05,796 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 6582.22 sec
2020-12-22 16:17:24,237 Stage-1 map = 100%,  reduce = 3%, Cumulative CPU 6631.31 sec
2020-12-22 16:17:46,772 Stage-1 map = 100%,  reduce = 4%, Cumulative CPU 6692.69 sec
2020-12-22 16:18:09,259 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 6756.62 sec
2020-12-22 16:18:28,730 Stage-1 map = 100%,  reduce = 6%, Cumulative CPU 6806.64 sec
2020-12-22 16:18:53,284 Stage-1 map = 100%,  reduce = 7%, Cumulative CPU 6869.09 sec
2020-12-22 16:19:13,775 Stage-1 map = 100%,  reduce = 8%, Cumulative CPU 6932.39 sec
2020-12-22 16:19:33,219 Stage-1 map = 100%,  reduce = 9%, Cumulative CPU 6980.84 sec
2020-12-22 16:19:55,741 Stage-1 map = 100%,  reduce = 10%, Cumulative CPU 7042.75 sec
2020-12-22 16:20:20,321 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 7106.03 sec
2020-12-22 16:20:37,736 Stage-1 map = 100%,  reduce = 12%, Cumulative CPU 7155.75 sec
2020-12-22 16:20:59,240 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 7218.24 sec
2020-12-22 16:21:23,794 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 7280.81 sec
2020-12-22 16:21:44,279 Stage-1 map = 100%,  reduce = 15%, Cumulative CPU 7341.54 sec
2020-12-22 16:22:02,715 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 7391.08 sec
2020-12-22 16:22:27,287 Stage-1 map = 100%,  reduce = 17%, Cumulative CPU 7453.5 sec
2020-12-22 16:22:48,813 Stage-1 map = 100%,  reduce = 18%, Cumulative CPU 7515.72 sec
2020-12-22 16:23:06,232 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 7565.05 sec
2020-12-22 16:23:30,816 Stage-1 map = 100%,  reduce = 20%, Cumulative CPU 7628.2 sec
2020-12-22 16:23:52,322 Stage-1 map = 100%,  reduce = 21%, Cumulative CPU 7691.45 sec
2020-12-22 16:24:10,772 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 7742.07 sec
2020-12-22 16:24:33,332 Stage-1 map = 100%,  reduce = 23%, Cumulative CPU 7803.63 sec
2020-12-22 16:24:55,863 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 7865.39 sec
2020-12-22 16:25:14,322 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 7915.86 sec
2020-12-22 16:25:36,863 Stage-1 map = 100%,  reduce = 26%, Cumulative CPU 7978.59 sec
2020-12-22 16:25:58,383 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 8039.64 sec
2020-12-22 16:26:16,826 Stage-1 map = 100%,  reduce = 28%, Cumulative CPU 8090.56 sec
2020-12-22 16:26:40,365 Stage-1 map = 100%,  reduce = 29%, Cumulative CPU 8153.82 sec
2020-12-22 16:27:01,879 Stage-1 map = 100%,  reduce = 30%, Cumulative CPU 8215.84 sec
2020-12-22 16:27:25,454 Stage-1 map = 100%,  reduce = 31%, Cumulative CPU 8276.43 sec
2020-12-22 16:27:42,863 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 8325.88 sec
2020-12-22 16:28:05,402 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 8391.38 sec
2020-12-22 16:28:28,991 Stage-1 map = 100%,  reduce = 34%, Cumulative CPU 8454.08 sec
2020-12-22 16:28:47,436 Stage-1 map = 100%,  reduce = 35%, Cumulative CPU 8504.74 sec
2020-12-22 16:29:09,986 Stage-1 map = 100%,  reduce = 36%, Cumulative CPU 8567.02 sec
2020-12-22 16:29:33,571 Stage-1 map = 100%,  reduce = 37%, Cumulative CPU 8627.9 sec
2020-12-22 16:29:51,003 Stage-1 map = 100%,  reduce = 38%, Cumulative CPU 8677.2 sec
2020-12-22 16:30:13,552 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 8739.33 sec
2020-12-22 16:30:36,087 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 8802.55 sec
2020-12-22 16:30:54,539 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 8851.75 sec
2020-12-22 16:31:17,098 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 8914.12 sec
2020-12-22 16:31:39,653 Stage-1 map = 100%,  reduce = 43%, Cumulative CPU 8976.57 sec
2020-12-22 16:32:02,203 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 9038.99 sec
2020-12-22 16:32:20,654 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 9089.78 sec
2020-12-22 16:32:43,206 Stage-1 map = 100%,  reduce = 46%, Cumulative CPU 9154.28 sec
2020-12-22 16:33:06,798 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 9217.02 sec
2020-12-22 16:33:25,263 Stage-1 map = 100%,  reduce = 48%, Cumulative CPU 9264.79 sec
2020-12-22 16:33:46,816 Stage-1 map = 100%,  reduce = 49%, Cumulative CPU 9326.92 sec
2020-12-22 16:34:11,410 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 9389.93 sec
2020-12-22 16:34:29,847 Stage-1 map = 100%,  reduce = 51%, Cumulative CPU 9440.26 sec
2020-12-22 16:34:52,404 Stage-1 map = 100%,  reduce = 52%, Cumulative CPU 9502.12 sec
2020-12-22 16:35:14,992 Stage-1 map = 100%,  reduce = 53%, Cumulative CPU 9563.36 sec
2020-12-22 16:35:34,477 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 9613.78 sec
2020-12-22 16:35:58,060 Stage-1 map = 100%,  reduce = 55%, Cumulative CPU 9675.66 sec
2020-12-22 16:36:20,623 Stage-1 map = 100%,  reduce = 56%, Cumulative CPU 9738.18 sec
2020-12-22 16:36:39,069 Stage-1 map = 100%,  reduce = 57%, Cumulative CPU 9787.32 sec
2020-12-22 16:37:01,654 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 9849.5 sec
2020-12-22 16:37:24,207 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 9911.15 sec
2020-12-22 16:37:45,743 Stage-1 map = 100%,  reduce = 60%, Cumulative CPU 9973.2 sec
2020-12-22 16:38:04,194 Stage-1 map = 100%,  reduce = 61%, Cumulative CPU 10022.28 sec
2020-12-22 16:38:27,756 Stage-1 map = 100%,  reduce = 62%, Cumulative CPU 10084.13 sec
2020-12-22 16:38:49,296 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 10143.47 sec
2020-12-22 16:39:07,755 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 10194.02 sec
2020-12-22 16:39:31,358 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 10254.1 sec
2020-12-22 16:39:52,885 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 10315.81 sec
2020-12-22 16:40:10,353 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 10366.31 sec
2020-12-22 16:40:34,963 Stage-1 map = 100%,  reduce = 68%, Cumulative CPU 10427.09 sec
2020-12-22 16:40:57,525 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 10488.52 sec
2020-12-22 16:41:15,979 Stage-1 map = 100%,  reduce = 70%, Cumulative CPU 10538.81 sec
2020-12-22 16:41:38,569 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 10601.0 sec
2020-12-22 16:42:01,113 Stage-1 map = 100%,  reduce = 72%, Cumulative CPU 10661.02 sec
2020-12-22 16:42:24,693 Stage-1 map = 100%,  reduce = 73%, Cumulative CPU 10724.19 sec
2020-12-22 16:42:44,185 Stage-1 map = 100%,  reduce = 74%, Cumulative CPU 10774.42 sec
2020-12-22 16:43:03,667 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 10835.19 sec
2020-12-22 16:43:28,281 Stage-1 map = 100%,  reduce = 76%, Cumulative CPU 10897.09 sec
2020-12-22 16:43:46,763 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 10946.29 sec
2020-12-22 16:44:06,238 Stage-1 map = 100%,  reduce = 78%, Cumulative CPU 11008.59 sec
2020-12-22 16:44:31,881 Stage-1 map = 100%,  reduce = 79%, Cumulative CPU 11070.09 sec
2020-12-22 16:44:50,338 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 11119.49 sec
2020-12-22 16:45:10,864 Stage-1 map = 100%,  reduce = 81%, Cumulative CPU 11182.32 sec
2020-12-22 16:45:35,471 Stage-1 map = 100%,  reduce = 82%, Cumulative CPU 11244.0 sec
2020-12-22 16:45:52,893 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 11293.35 sec
2020-12-22 16:46:14,420 Stage-1 map = 100%,  reduce = 84%, Cumulative CPU 11356.49 sec
2020-12-22 16:46:38,045 Stage-1 map = 100%,  reduce = 85%, Cumulative CPU 11419.4 sec
2020-12-22 16:46:56,528 Stage-1 map = 100%,  reduce = 86%, Cumulative CPU 11468.41 sec
2020-12-22 16:47:20,132 Stage-1 map = 100%,  reduce = 87%, Cumulative CPU 11531.36 sec
2020-12-22 16:47:42,714 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 11593.23 sec
2020-12-22 16:48:05,312 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 11654.95 sec
2020-12-22 16:48:23,791 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 11704.69 sec
2020-12-22 16:48:46,388 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 11766.5 sec
2020-12-22 16:49:08,944 Stage-1 map = 100%,  reduce = 92%, Cumulative CPU 11829.9 sec
2020-12-22 16:49:26,402 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 11878.99 sec
2020-12-22 16:49:48,986 Stage-1 map = 100%,  reduce = 94%, Cumulative CPU 11941.07 sec
2020-12-22 16:50:13,611 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 12004.41 sec
2020-12-22 16:50:32,065 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 12054.25 sec
2020-12-22 16:50:54,635 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 12116.61 sec
2020-12-22 16:51:18,221 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 12180.35 sec
2020-12-22 16:51:35,661 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 12230.74 sec
2020-12-22 16:52:07,473 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 12317.39 sec
MapReduce Total cumulative CPU time: 0 days 3 hours 25 minutes 17 seconds 390 msec
Ended Job = job_1606698967173_0354
Loading data to table test.test__ods_fact_sale_ods_fact_sale_idx02__
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 469   Cumulative CPU: 12317.39 sec   HDFS Read: 31439348928 HDFS Write: 73491333534 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 days 3 hours 25 minutes 17 seconds 390 msec
OK
Time taken: 4709.654 seconds
hive> SHOW FORMATTED INDEX ON ODS_FACT_SALE;
OK
idx_name        tab_name        col_names       idx_tab_name    idx_type        comment
idx_name                tab_name                col_names               idx_tab_name            idx_type                comment             
                                         
                                         
ods_fact_sale_idx01     ods_fact_sale           sale_date               test__ods_fact_sale_ods_fact_sale_idx01__       compact             
ods_fact_sale_idx02     ods_fact_sale           id                      test__ods_fact_sale_ods_fact_sale_idx02__       compact             
Time taken: 0.069 seconds, Fetched: 5 row(s)
hive> 

查看表和索引的大小
表的大小30G左右
索引的大小70G左右,id列创建索引,居然比表都大

[root@hp1 ~]# hadoop fs -ls /user/hive/warehouse/test.db/ods_fact_sale |awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a}'
31421093662
[root@hp1 ~]# hadoop fs -ls /user/hive/warehouse/test.db/test__ods_fact_sale_ods_fact_sale_idx02__ |awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a}'      
73491280068
[root@hp1 ~]# 

2.4 使用索引案例

使用索引需要调整如下参数:

-- 默认输入格式。如果遇到使用CombineHiveInputFormat的问题,将此设置为HiveInputFormat。
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- 是否启用索引的自动使用  默认为false
SET hive.optimize.index.filter=true;
-- 自动使用压缩索引的输入的最小大小(以字节为单位)
SET hive.optimize.index.filter.compact.minsize=0;
-- 查询能使用的最大的压缩索引的空间直接数,默认为10737418240,为负值的时候代表无穷大
set hive.index.compact.query.max.size=-1;

测试记录:
如下的测试记录,ods_fact_sale表7亿左右数据,id为主键,在hive表中创建索引,索引居然比表占的存储都大。
而且单行的查询,用索引居然比直接查询更慢,当然更快的还是spark。
这也就是hive会在后面的版本中去掉索引,用物化视图的自动查询重写来代替。

-- 未使用索引
hive> 
    > select * from ods_fact_sale where id = 10000;
Query ID = root_20201222152519_ed85b5ad-000e-41dc-a85a-7e0bbc9e0176
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0353, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0353/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0353
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2020-12-22 15:25:27,751 Stage-1 map = 0%,  reduce = 0%
2020-12-22 15:25:39,162 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 15.37 sec
2020-12-22 15:25:46,394 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 22.64 sec
2020-12-22 15:25:53,616 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 37.16 sec
2020-12-22 15:25:54,665 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 44.45 sec
2020-12-22 15:26:00,842 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 51.79 sec
2020-12-22 15:26:01,870 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 59.02 sec
2020-12-22 15:26:09,059 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 73.6 sec
2020-12-22 15:26:17,276 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 88.21 sec
2020-12-22 15:26:25,487 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 102.7 sec
2020-12-22 15:26:33,694 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 117.33 sec
2020-12-22 15:26:39,838 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 124.44 sec
2020-12-22 15:26:47,012 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 138.83 sec
2020-12-22 15:26:49,061 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 146.02 sec
2020-12-22 15:26:54,177 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 153.39 sec
2020-12-22 15:26:56,224 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 160.72 sec
2020-12-22 15:27:01,364 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 167.89 sec
2020-12-22 15:27:04,444 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 175.21 sec
2020-12-22 15:27:12,640 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 189.69 sec
2020-12-22 15:27:16,749 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 196.86 sec
2020-12-22 15:27:19,825 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 204.02 sec
2020-12-22 15:27:22,893 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 211.12 sec
2020-12-22 15:27:25,969 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 218.38 sec
2020-12-22 15:27:35,189 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 232.69 sec
2020-12-22 15:27:38,248 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 239.79 sec
2020-12-22 15:27:42,338 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 246.99 sec
2020-12-22 15:27:45,409 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 254.03 sec
2020-12-22 15:27:50,531 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 261.5 sec
2020-12-22 15:27:52,579 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 268.68 sec
2020-12-22 15:27:59,745 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 283.25 sec
2020-12-22 15:28:06,918 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 290.47 sec
2020-12-22 15:28:07,944 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 297.58 sec
2020-12-22 15:28:14,091 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 304.7 sec
2020-12-22 15:28:15,116 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 312.05 sec
2020-12-22 15:28:21,254 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 319.32 sec
2020-12-22 15:28:28,412 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 333.58 sec
2020-12-22 15:28:30,451 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 340.7 sec
2020-12-22 15:28:35,566 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 347.92 sec
2020-12-22 15:28:37,615 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 355.18 sec
2020-12-22 15:28:42,760 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 362.32 sec
2020-12-22 15:28:45,837 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 369.46 sec
2020-12-22 15:28:53,000 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 383.79 sec
2020-12-22 15:28:57,099 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 390.96 sec
2020-12-22 15:29:00,171 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 398.23 sec
2020-12-22 15:29:03,239 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 405.41 sec
2020-12-22 15:29:07,333 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 412.59 sec
2020-12-22 15:29:09,381 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 418.3 sec
2020-12-22 15:29:17,571 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 433.16 sec
2020-12-22 15:29:21,670 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 440.77 sec
2020-12-22 15:29:24,739 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 448.0 sec
2020-12-22 15:29:29,852 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 455.16 sec
2020-12-22 15:29:32,923 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 462.21 sec
2020-12-22 15:29:37,015 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 469.44 sec
2020-12-22 15:29:45,208 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 484.18 sec
2020-12-22 15:29:46,232 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 491.29 sec
2020-12-22 15:29:53,402 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 505.98 sec
2020-12-22 15:30:00,561 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 520.64 sec
2020-12-22 15:30:08,750 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 535.06 sec
2020-12-22 15:30:16,921 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 549.64 sec
2020-12-22 15:30:25,110 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 564.1 sec
2020-12-22 15:30:33,297 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 578.87 sec
2020-12-22 15:30:40,468 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 586.16 sec
2020-12-22 15:30:41,492 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 593.42 sec
2020-12-22 15:30:48,656 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 607.97 sec
2020-12-22 15:30:56,844 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 622.63 sec
2020-12-22 15:31:05,035 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 637.24 sec
2020-12-22 15:31:11,176 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 644.51 sec
2020-12-22 15:31:13,226 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 651.73 sec
2020-12-22 15:31:18,340 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 658.77 sec
2020-12-22 15:31:21,408 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 666.09 sec
2020-12-22 15:31:29,616 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 680.5 sec
2020-12-22 15:31:31,665 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 687.82 sec
2020-12-22 15:31:36,784 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 694.88 sec
2020-12-22 15:31:39,855 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 701.86 sec
2020-12-22 15:31:44,974 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 709.08 sec
2020-12-22 15:31:47,020 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 716.29 sec
2020-12-22 15:31:55,214 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 730.6 sec
2020-12-22 15:32:00,337 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 737.95 sec
2020-12-22 15:32:02,385 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 745.08 sec
2020-12-22 15:32:08,514 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 752.31 sec
2020-12-22 15:32:09,538 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 759.48 sec
2020-12-22 15:32:15,677 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 766.75 sec
2020-12-22 15:32:22,843 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 781.36 sec
2020-12-22 15:32:23,866 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 788.45 sec
2020-12-22 15:32:30,009 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 795.75 sec
2020-12-22 15:32:32,057 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 802.85 sec
2020-12-22 15:32:38,211 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 810.04 sec
2020-12-22 15:32:40,254 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 817.29 sec
2020-12-22 15:32:47,426 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 831.76 sec
2020-12-22 15:32:52,545 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 838.89 sec
2020-12-22 15:32:55,612 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 846.16 sec
MapReduce Total cumulative CPU time: 14 minutes 6 seconds 160 msec
Ended Job = job_1606698967173_0353
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117   Cumulative CPU: 846.16 sec   HDFS Read: 31436963448 HDFS Write: 10228 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 14 minutes 6 seconds 160 msec
OK
ods_fact_sale.id        ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums
10000   2012-03-11 00:00:00.0   PROD5   96
Time taken: 457.342 seconds, Fetched: 1 row(s)

-- 使用索引
hive> 
    > SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> SET hive.optimize.index.filter=true;
hive> SET hive.optimize.index.filter.compact.minsize=0;
hive> set hive.index.compact.query.max.size=-1;
hive> 
    > 
    > select * from ods_fact_sale where id = 10000;
Query ID = root_20201222165727_6d0369ce-6498-4bed-8242-b52be9de77a8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0355, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0355/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0355
Hadoop job information for Stage-1: number of mappers: 237; number of reducers: 0
2020-12-22 16:57:34,782 Stage-1 map = 0%,  reduce = 0%
2020-12-22 16:57:44,081 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 11.95 sec
2020-12-22 16:57:51,278 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 23.66 sec
2020-12-22 16:57:56,418 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 35.37 sec
2020-12-22 16:58:07,711 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 52.9 sec
2020-12-22 16:58:13,853 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 64.76 sec
2020-12-22 16:58:22,039 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 82.49 sec
2020-12-22 16:58:28,188 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 94.33 sec
2020-12-22 16:58:34,335 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 105.97 sec
2020-12-22 16:58:43,550 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 123.51 sec
2020-12-22 16:58:49,692 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 135.15 sec
2020-12-22 16:58:55,842 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 147.0 sec
2020-12-22 16:59:04,028 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 164.52 sec
2020-12-22 16:59:10,178 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 176.03 sec
2020-12-22 16:59:16,310 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 187.32 sec
2020-12-22 16:59:26,541 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 204.7 sec
2020-12-22 16:59:32,680 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 216.24 sec
2020-12-22 16:59:39,835 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 233.71 sec
2020-12-22 16:59:45,971 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 245.21 sec
2020-12-22 16:59:52,106 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 256.37 sec
2020-12-22 17:00:03,363 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 274.03 sec
2020-12-22 17:00:09,513 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 285.58 sec
2020-12-22 17:00:15,656 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 303.05 sec
2020-12-22 17:00:21,796 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 314.47 sec
2020-12-22 17:00:27,932 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 325.46 sec
2020-12-22 17:00:39,192 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 342.82 sec
2020-12-22 17:00:45,334 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 354.32 sec
2020-12-22 17:00:51,479 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 365.88 sec
2020-12-22 17:00:58,635 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 383.29 sec
2020-12-22 17:01:04,782 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 395.04 sec
2020-12-22 17:01:09,911 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 406.56 sec
2020-12-22 17:01:21,149 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 423.83 sec
2020-12-22 17:01:27,298 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 435.55 sec
2020-12-22 17:01:34,472 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 452.98 sec
2020-12-22 17:01:40,616 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 464.47 sec
2020-12-22 17:01:46,756 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 476.0 sec
2020-12-22 17:01:57,998 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 499.32 sec
2020-12-22 17:02:03,113 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 505.22 sec
2020-12-22 17:02:09,262 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 516.84 sec
2020-12-22 17:02:16,418 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 534.09 sec
2020-12-22 17:02:22,550 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 545.71 sec
2020-12-22 17:02:28,696 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 557.23 sec
2020-12-22 17:02:38,940 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 574.68 sec
2020-12-22 17:02:45,070 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 586.31 sec
2020-12-22 17:02:52,230 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 603.61 sec
2020-12-22 17:02:58,369 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 615.2 sec
2020-12-22 17:03:04,502 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 626.88 sec
2020-12-22 17:03:15,736 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 644.23 sec
2020-12-22 17:03:21,867 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 655.32 sec
2020-12-22 17:03:28,012 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 666.74 sec
2020-12-22 17:03:35,180 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 683.84 sec
2020-12-22 17:03:40,291 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 695.25 sec
2020-12-22 17:03:51,544 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 712.8 sec
2020-12-22 17:03:57,688 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 724.58 sec
2020-12-22 17:04:03,803 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 736.14 sec
2020-12-22 17:04:10,966 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 753.53 sec
2020-12-22 17:04:17,094 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 764.89 sec
2020-12-22 17:04:23,248 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 776.53 sec
2020-12-22 17:04:33,482 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 794.16 sec
2020-12-22 17:04:39,621 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 805.71 sec
2020-12-22 17:04:46,787 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 822.53 sec
2020-12-22 17:04:52,933 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 834.34 sec
2020-12-22 17:04:59,072 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 846.03 sec
2020-12-22 17:05:10,317 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 863.38 sec
2020-12-22 17:05:15,432 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 875.03 sec
2020-12-22 17:05:21,579 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 886.8 sec
2020-12-22 17:05:28,732 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 904.16 sec
2020-12-22 17:05:34,870 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 915.76 sec
2020-12-22 17:05:41,000 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 927.28 sec
2020-12-22 17:05:52,264 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 944.79 sec
2020-12-22 17:05:58,401 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 956.18 sec
2020-12-22 17:06:04,544 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 973.61 sec
2020-12-22 17:06:10,688 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 985.31 sec
2020-12-22 17:06:16,822 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 996.76 sec
2020-12-22 17:06:28,072 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 1014.18 sec
2020-12-22 17:06:34,210 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 1025.82 sec
2020-12-22 17:06:40,339 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 1037.42 sec
2020-12-22 17:06:47,496 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 1054.79 sec
2020-12-22 17:06:52,614 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 1066.27 sec
2020-12-22 17:07:03,879 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 1083.42 sec
2020-12-22 17:07:10,021 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 1095.07 sec
2020-12-22 17:07:16,162 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 1106.66 sec
2020-12-22 17:07:23,329 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 1123.82 sec
2020-12-22 17:07:29,475 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 1135.42 sec
2020-12-22 17:07:35,621 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 1147.24 sec
2020-12-22 17:07:45,851 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 1164.55 sec
2020-12-22 17:07:51,997 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 1175.9 sec
2020-12-22 17:07:59,167 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 1193.34 sec
2020-12-22 17:08:05,300 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 1204.96 sec
2020-12-22 17:08:11,440 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 1216.65 sec
2020-12-22 17:08:22,708 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 1234.08 sec
2020-12-22 17:08:27,837 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 1245.61 sec
2020-12-22 17:08:33,979 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 1257.17 sec
2020-12-22 17:08:41,147 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 1274.05 sec
2020-12-22 17:08:47,299 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 1285.58 sec
2020-12-22 17:08:53,447 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 1297.34 sec
2020-12-22 17:09:04,704 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 1314.97 sec
2020-12-22 17:09:09,823 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 1326.53 sec
2020-12-22 17:09:16,989 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 1343.78 sec
2020-12-22 17:09:23,135 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 1354.25 sec
2020-12-22 17:09:31,324 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1366.84 sec
MapReduce Total cumulative CPU time: 22 minutes 46 seconds 840 msec
Ended Job = job_1606698967173_0355
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 237   Cumulative CPU: 1366.84 sec   HDFS Read: 31437557556 HDFS Write: 20668 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 22 minutes 46 seconds 840 msec
OK
ods_fact_sale.id        ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums
10000   2012-03-11 00:00:00.0   PROD5   96
Time taken: 725.125 seconds, Fetched: 1 row(s)
hive> -

-- 使用spark更快
-- 删除索引后spark依旧很快,证明spark未使用到索引
hive>  
    > 
    > set hive.execution.engine=spark;
hive> select * from ods_fact_sale where id = 10000;
Query ID = root_20201222171114_26a785b8-b6a9-4780-96d2-f868d89d0dc9
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1606698967173_0356
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1606698967173_0356
Hive on Spark Session Web UI URL: http://hp4:36726

Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED    237        237        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 61.24 s    
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 61.24 second(s)
Spark Job[0] Metrics: TaskDurationTime: 439651, ExecutorCpuTime: 343803, JvmGCTime: 5379, BytesRead / RecordsRead: 31437553053 / 767830000, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
OK
ods_fact_sale.id        ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums
10000   2012-03-11 00:00:00.0   PROD5   96
Time taken: 79.735 seconds, Fetched: 1 row(s)
hive> 
    > DROP INDEX  ods_fact_sale_idx02 on ods_fact_sale;
OK
Time taken: 0.112 seconds
hive> select * from ods_fact_sale where id = 10000;
Query ID = root_20201222171254_29ea80f3-e924-439d-b423-8b55363382bb
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1606698967173_0356
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1606698967173_0356
Hive on Spark Session Web UI URL: http://hp4:36726

Query Hive on Spark job[1] stages: [1]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-1 ........         0      FINISHED    237        237        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 50.13 s    
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 50.13 second(s)
Spark Job[1] Metrics: TaskDurationTime: 394418, ExecutorCpuTime: 336367, JvmGCTime: 3504, BytesRead / RecordsRead: 31437707340 / 767830000, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
OK
ods_fact_sale.id        ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums
10000   2012-03-11 00:00:00.0   PROD5   96
Time taken: 50.279 seconds, Fetched: 1 row(s)
hive> 

猜你喜欢

转载自blog.csdn.net/u010520724/article/details/112257719