Big Data Development Training Course: Static and Dynamic Partitioning in Hive

  Partitioning is a way for Hive to organize stored data. Each distinct value of a partition column becomes a directory, and the rows carrying that value are stored under it. When a query filters on the partition column, Hive scans only the directories that match the filter and skips all other partitions, which improves query efficiency. There are two kinds of partitions, static and dynamic:

  1. Static partition: the partition value is fixed in advance. When you add a partition or load data into one, the partition is named explicitly in the statement.

  create table if not exists day_part1(

  uid int,

  uname string

  )

  partitioned by(year int,month int)

  row format delimited fields terminated by '\t';

  ##Load data into an explicitly specified partition

  load data local inpath '/root/Desktop/student.txt' into table day_part1 partition(year=2017,month=4);

  ##Add partitions by naming their values explicitly

  alter table day_part1 add partition(year=2017,month=1) partition(year=2016,month=12);
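
  To check the effect of the statements above, the partitions can be listed and then queried with a filter on the partition columns. This is a minimal sketch that reuses the day_part1 table from this section; the storage location that describe formatted reports depends on your warehouse configuration.

```sql
-- List every partition registered for the table
show partitions day_part1;

-- Filter on the partition columns so that only the matching directory is scanned
select * from day_part1 where year = 2017 and month = 4;

-- Show the metadata of one partition, including its storage location
describe formatted day_part1 partition(year=2017, month=4);
```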

  2. Dynamic partition: the partition value is not fixed in the statement; Hive derives it from the input data at insert time.

  2.1 Related properties of dynamic partitions:

  hive.exec.dynamic.partition=true : whether to allow dynamic partitioning

  hive.exec.dynamic.partition.mode=strict : dynamic partition mode

  strict: at least one partition column must be static

  nonstrict: all partition columns may be dynamic

  hive.exec.max.dynamic.partitions=1000 : maximum number of dynamic partitions allowed

  hive.exec.max.dynamic.partitions.pernode=100 : maximum number of dynamic partitions that each mapper/reducer node may create
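
  These properties can be set per session with the set command before running a dynamic-partition insert. The values below simply repeat the ones listed above and are illustrative rather than tuning advice.

```sql
-- Allow dynamic partitioning in this session
set hive.exec.dynamic.partition=true;

-- strict: at least one partition column in the insert must be static
set hive.exec.dynamic.partition.mode=strict;

-- Upper bounds on the number of partitions one statement may create
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;

-- Running set with only the property name prints its current value
set hive.exec.dynamic.partition.mode;
```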

  2.2 Operation of Dynamic Partitioning

  ##Create a staging table that holds the source data

  create table if not exists tmp

  (uid int,

  commentid bigint,

  recommentid bigint,

  year int,

  month int,

  day int)

  row format delimited fields terminated by '\t';

  ##Load data into the staging table

  load data local inpath '/root/Desktop/comm' into table tmp;

  ##Create dynamic partition table

  create table if not exists dyp1

  (uid int,

  commentid bigint,

  recommentid bigint)

  partitioned by(year int,month int,day int)

  row format delimited fields terminated by '\t';

  ##Strict mode: year is static, month and day are dynamic

  insert into table dyp1 partition(year=2016,month,day)

  select uid,commentid,recommentid,month,day from tmp;
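
  Note that in the statement above every row of tmp lands under year=2016, whatever its own year value is, because the static value comes from the partition clause. Hive maps the dynamic partition columns by position: they must be the last columns of the select list, in the same order as in the partition clause. A sketch of a variant that keeps only the matching source rows follows; the where filter is an addition for illustration and is not part of the original example.

```sql
-- year is static; month and day are taken from the last two select columns
insert into table dyp1 partition(year=2016, month, day)
select uid, commentid, recommentid, month, day
from tmp
where year = 2016;  -- keep only rows that really belong to 2016
```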

  ## non-strict mode

  ##Set non-strict mode dynamic partition

  set hive.exec.dynamic.partition.mode=nonstrict;

  ##Create dynamic partition table

  create table if not exists dyp2

  (uid int,

  commentid bigint,

  recommentid bigint)

  partitioned by(year int,month int,day int)

  row format delimited fields terminated by '\t';

  ##Load data for non-strict mode dynamic partitions

  insert into table dyp2 partition(year,month,day)

  select uid,commentid,recommentid,year,month,day from tmp;
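
  After the insert finishes, one way to confirm that the partitions were created from the data is to list them and compare row counts per partition with the source table. This sketch uses only the dyp2 and tmp tables defined above.

```sql
-- Partitions created by the dynamic insert
show partitions dyp2;

-- Row counts per partition in the target table
select year, month, day, count(*) as row_count
from dyp2
group by year, month, day;

-- The same grouping on the source table should give matching counts
select year, month, day, count(*) as row_count
from tmp
group by year, month, day;
```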

  3. Points to note about partitions

  (1) Use dynamic partitioning sparingly. During a dynamic-partition insert, reducers are allocated for the partitions being written, so when the number of partitions is large the number of reducers grows with it, which puts a heavy load on the cluster.
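
  One commonly used mitigation, offered here as an assumption rather than something this course text prescribes, is to add distribute by on the partition columns so that all rows of one partition are routed to the same reducer. This tends to reduce the number of files each task has to keep open; whether it helps depends on the data distribution and the Hive version.

```sql
set hive.exec.dynamic.partition.mode=nonstrict;

-- Route all rows of one (year, month, day) combination to the same reducer
insert into table dyp2 partition(year, month, day)
select uid, commentid, recommentid, year, month, day
from tmp
distribute by year, month, day;
```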

  (2) Difference between dynamic and static partitions: a static partition is created whether or not there is any data for it, while a dynamic partition is created only when the result set contains rows for it.
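
  The difference can be observed with the tables from the earlier sections. In this sketch the partition values year=2015, month=6 and the always-false where 1 = 0 filter are arbitrary choices made for illustration.

```sql
-- Static: the partition is registered even though no data is loaded into it
alter table day_part1 add if not exists partition(year=2015, month=6);
show partitions day_part1;

-- Dynamic: an empty result set creates no partition
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table dyp2 partition(year, month, day)
select uid, commentid, recommentid, year, month, day
from tmp
where 1 = 0;  -- selects no rows
show partitions dyp2;
```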

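  (3) Do not confuse the strict mode of dynamic partitioning (hive.exec.dynamic.partition.mode) with the strict mode that Hive provides through hive.mapred.mode.

  Hive provides a strict mode to prevent users from accidentally submitting harmful HQL, that is, queries that are unexpectedly expensive to run.

  hive.mapred.mode=strict : enables strict mode (nonstrict disables it)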

  If the mode value is strict, the following three kinds of queries are blocked:

  (1) A query on a partitioned table whose where clause does not filter on a partition column.

  (2) A Cartesian-product join, that is, a join statement with neither an on condition nor a where condition.

  (3) An order by query without a limit clause.
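
  The three rules can be illustrated with the tables defined earlier. In this sketch the statements that strict mode is expected to reject are left as comments, each followed by a variant that satisfies the rule; the uid = 1 filter and the limit 10 value are arbitrary, and the exact error text depends on the Hive version.

```sql
set hive.mapred.mode=strict;

-- (1) Rejected: no filter on a partition column of a partitioned table
-- select * from dyp1 where uid = 1;
select * from dyp1 where year = 2016 and uid = 1;

-- (2) Rejected: join without a join condition (Cartesian product)
-- select a.uid, b.commentid from dyp1 a join tmp b;
select a.uid, b.commentid
from dyp1 a join tmp b on a.uid = b.uid
where a.year = 2016;

-- (3) Rejected: order by without limit
-- select * from dyp1 where year = 2016 order by uid;
select * from dyp1 where year = 2016 order by uid limit 10;
```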
