Hive strict mode

Strict mode imposes restrictions in the following three cases:
(1) A query on a partitioned table must include a partition filter, so that partition pruning applies.
(2) ORDER BY runs on a single reducer, so a LIMIT clause must be added.
(3) A join that produces a Cartesian product also runs on a single reducer, so it is not allowed.
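
For reference, strict mode is controlled by the hive.mapred.mode property, the same property named in the error message quoted further down. A minimal sketch of turning it on for the current session, assuming a Hive version that still honors this property:

```sql
-- Turn strict mode on for the current session only.
SET hive.mapred.mode=strict;

-- Print the current value to confirm the setting took effect.
SET hive.mapred.mode;
```

Setting the property in a session affects only that session. A cluster-wide default would normally go in hive-site.xml instead.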

A few additions:

In my tests on hive 1.6 with strict mode turned on, the following two kinds of query are also not supported:

(1) IN / NOT IN subqueries. Use a LEFT JOIN instead.

(2) Joins between two partitioned tables where the partition conditions of both tables are in the WHERE clause. Only the partition condition of the main (left) table may go in the WHERE clause; the partition condition of the joined table has to go in the ON clause.

Fails: FROM table1 a LEFT JOIN table2 b ON a.id=b.id
WHERE b.dt = 20161026 AND a.dt=20161026;
Succeeds: FROM table1 a LEFT JOIN table2 b ON a.id=b.id
AND b.dt = 20161026 WHERE a.dt=20161026;
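
For case (1) above, the rewrite might look like the sketch below. It reuses table1, table2, id and dt from the example and assumes that id is never NULL in either table; with NULLs present, NOT IN and the LEFT JOIN form can return different rows. The LEFT SEMI JOIN variant for IN is a substitution of mine rather than the plain LEFT JOIN mentioned above, chosen because it does not duplicate rows when table2 contains repeated ids.

```sql
-- Instead of:
--   SELECT a.* FROM table1 a WHERE a.id NOT IN (SELECT id FROM table2 ...);
-- keep the rows of table1 that find no match in table2.
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
  ON a.id = b.id AND b.dt = 20161026   -- partition filter of the joined table goes in ON
WHERE a.dt = 20161026                  -- partition filter of the main table goes in WHERE
  AND b.id IS NULL;                    -- no match found, equivalent to NOT IN

-- Instead of:
--   SELECT a.* FROM table1 a WHERE a.id IN (SELECT id FROM table2 ...);
-- LEFT SEMI JOIN returns each row of table1 at most once.
SELECT a.*
FROM table1 a
LEFT SEMI JOIN table2 b
  ON a.id = b.id AND b.dt = 20161026
WHERE a.dt = 20161026;
```

Both queries follow the placement rule from case (2): the partition condition on table2 sits in the ON clause and the one on table1 in the WHERE clause.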

 

 

-------------------------------------------------------------------------------------------------------------------------

  Hive provides a strict mode that prevents users from running queries that may have unexpected and undesirable effects. In other words, certain kinds of query
cannot be executed in strict mode.
1) Queries on partitioned tables
        A query on a partitioned table is rejected unless its WHERE clause contains a filter on a partition column that limits the range of data read. In other words,
the user is not allowed to scan all partitions. The reason for this restriction is that partitioned tables typically hold very large datasets that grow rapidly.
       A query without a partition filter could consume an unacceptably large amount of resources to process such a table:
       hive> SELECT DISTINCT(planner_id) FROM fracture_ins WHERE planner_id=5;
       FAILED: Error in semantic analysis: No Partition Predicate Found for Alias "fracture_ins" Table "fracture_ins"
       The following statement adds a partition filter to the WHERE clause (that is, it restricts which partitions of the table are read):
       hive> SELECT DISTINCT(planner_id) FROM fracture_ins
       > WHERE planner_id=5 AND hit_date=20120101;
       ... normal results...
 2) Queries with ORDER BY
         A query that uses ORDER BY must also have a LIMIT clause. To perform the sort, ORDER BY sends all the results to a single reducer,
  so strict mode requires the user to add a LIMIT clause to keep that reducer from running for an excessively long time:
hive> SELECT * FROM fracture_ins WHERE hit_date>2012 ORDER BY planner_id ;
FAILED: Error in semantic analysis: line 1:56 In strict mode,
limit must be specified if ORDER BY is present planner_id
        Adding a LIMIT clause solves the problem:
hive> SELECT * FROM fracture_ins WHERE hit_date>2012 ORDER BY planner_id
        > LIMIT 100000;
        ... normal results ...
  3) Cartesian product queries are restricted
         Users who know relational databases well may expect to be able to put the join condition in a WHERE clause instead of an ON clause, because in those systems the
  query optimizer efficiently converts the WHERE condition into an ON condition. Unfortunately, Hive does not perform this optimization, so if the tables are large enough the query can get
  out of control:
  hive> SELECT * FROM fracture_act JOIN fracture_ads
> WHERE fracture_act.planner_id = fracture_ads.planner_id;
FAILED: Error in semantic analysis: In strict mode, cartesian product
is not allowed. If you really want to perform the operation,
set hive.mapred.mode=nonstrict
        The following is the correct form of the query, using JOIN with an ON clause:
hive> SELECT * FROM fracture_act JOIN fracture_ads
        > ON (fracture_act.planner_id = fracture_ads.planner_id);
        ... normal results ...
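
If a Cartesian product really is what you want, the error message above points at the way out: switch the session to nonstrict mode for that one query. A hedged sketch, reusing the two tables from the example and assuming your Hive version accepts the explicit CROSS JOIN keyword:

```sql
-- Temporarily relax strict mode for an intentional Cartesian product.
SET hive.mapred.mode=nonstrict;

-- Every row of fracture_act is paired with every row of fracture_ads,
-- so keep both inputs small or expect a very long-running job.
SELECT * FROM fracture_act CROSS JOIN fracture_ads;

-- Restore strict mode so later queries in the session are checked again.
SET hive.mapred.mode=strict;
```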
