Hive returns an incorrect count(*) result when running on TEZ

Symptom

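Hive uses TEZ as its default execution engine. After records have been inserted into a table, count(*) returns a number that differs from the actual row count. Running the same count(*) with MR as the execution engine returns the actual row count.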
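
The comparison can be reproduced in one Hive session by switching the engine before each query. This is a minimal sketch, assuming the table test1 used in this post and that both engines are available on the cluster:

```sql
-- Run the count on TEZ (the default engine in this setup)
set hive.execution.engine=tez;
select count(*) from test1;

-- Run the same count on MapReduce for comparison
set hive.execution.engine=mr;
select count(*) from test1;
```

If the two numbers differ, the behavior described here is in play.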

Solution

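Running count(*) on TEZ is very fast because it skips the MapReduce job entirely, yet the result is wrong. The likely cause is a mechanism that answers count(*) directly from the table statistics instead of scanning the data. Those statistics are not up to date, so count(*) returns a stale value.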
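
Hive has a setting, hive.compute.query.using.stats, that allows queries such as count(*) to be answered from the statistics stored in the metastore. Whether that setting caused the result seen here is an assumption; the session output in this post does not confirm it. A sketch of how one might force a real scan to check:

```sql
-- Assumption: the stale result comes from statistics-based answering.
-- Turn it off for this session so count(*) has to read the data.
set hive.compute.query.using.stats=false;
select count(*) from test1;
```

If the count changes once the setting is off, stale statistics are the probable explanation.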

hive> select count(*) from test1;
OK
1131921

Looking at the table definition, we find that the number returned by count(*) matches numRows in the table properties:

hive> show create table test1;
OK
CREATE TABLE `test1`(
  `pripid` string, 
  `uniscid` string, 
  `entname` string, 
  ...)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  LINES TERMINATED BY '\n' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop1/apps/hive/warehouse/default.db/test1'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='28', 
  'numRows'='1131921', 
  'rawDataSize'='685459303', 
  'totalSize'='2323131590', 
  'transient_lastDdlTime'='1531319725')
Time taken: 0.227 seconds, Fetched: 48 row(s)
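
The same properties can also be read with DESCRIBE FORMATTED, which lists numRows, numFiles and totalSize under the table parameters. A short sketch, again using test1:

```sql
-- Show table metadata, including the statistics kept in the table parameters
describe formatted test1;
```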

After the table statistics are refreshed with the ANALYZE command, rerunning the count gives the correct result:

hive> analyze table test1 compute statistics;
Query ID = trafodion_20180711104240_02eb6fb5-f53c-454f-aa1e-8c6ca157b21c
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1531148517927_2403)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED    146        146        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 12.62 s    
--------------------------------------------------------------------------------
Table test1 stats: [numFiles=28, numRows=5562243, totalSize=2323131590, rawDataSize=2317569347]
OK
Time taken: 14.247 seconds
hive> select count(*) from test1;
OK
5562243
Time taken: 0.045 seconds, Fetched: 1 row(s)
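
The table in this post appears to be unpartitioned. For a partitioned table the same command takes a partition spec. The sketch below uses a hypothetical table test_part with a hypothetical partition column dt, neither of which comes from the original session:

```sql
-- Hypothetical table and partition column, for illustration only.
-- Refresh statistics for a single partition:
analyze table test_part partition(dt='2018-07-11') compute statistics;

-- Refresh statistics for all partitions:
analyze table test_part partition(dt) compute statistics;
```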

Reposted from blog.csdn.net/post_yuan/article/details/80998806