Hive MapReduce reducer tuning

Details of Hive parameter configuration can be found in the official documentation: Hive Configuration Properties.

This article focuses on the tuning of the reducer and mainly involves the following three parameters:

hive.exec.reducers.bytes.per.reducer

  • Default Value: 1,000,000,000 prior to Hive 0.14.0; 256 MB (256,000,000) in Hive 0.14.0 and later
  • Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Size per reducer. The default prior to Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

Description: The amount of input data that each reducer handles. Hive uses this parameter, together with the total size of the input, to decide how many reducers a job gets. The official default is 1 GB prior to Hive 0.14.0 and 256 MB in Hive 0.14.0 and later.
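The arithmetic behind the two examples above can be sketched in a few lines of Python. This is only an illustration of the rounding-up division described in the documentation, not Hive's own code; the function name reducers_for_input is made up for this example, and sizes are in decimal bytes (1 GB = 1,000,000,000).

```python
def reducers_for_input(input_bytes: int, bytes_per_reducer: int) -> int:
    """Rough reducer count: input size / bytes per reducer, rounded up.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Integer ceiling division; a job always gets at least one reducer.
    return max(1, -(-input_bytes // bytes_per_reducer))


# 10 GB of input with the old 1 GB default
print(reducers_for_input(10_000_000_000, 1_000_000_000))  # 10
# 1 GB of input with the 256 MB default (Hive 0.14.0 and later)
print(reducers_for_input(1_000_000_000, 256_000_000))     # 4
```

Lowering hive.exec.reducers.bytes.per.reducer therefore raises the number of reducers for the same input, and raising it lowers the number.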

View the current value of the setting:

hive> set hive.exec.reducers.bytes.per.reducer;
hive.exec.reducers.bytes.per.reducer=1024000000

Temporary tuning:

hive> set hive.exec.reducers.bytes.per.reducer=15364000000;
hive> set hive.exec.reducers.bytes.per.reducer;
hive.exec.reducers.bytes.per.reducer=15364000000

mapred.reduce.tasks

  • Default Value: -1
  • Added In: Hive 0.1.0

The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. When this property is set to -1, Hive automatically works out the number of reducers.

Meaning: The number of reduce tasks for each job. The official default is -1, which leaves the choice to Hive.

View the current value of the setting:

hive> set mapred.reduce.tasks;
mapred.reduce.tasks=-1

Temporary tuning:

hive> set mapred.reduce.tasks=100;
hive> set mapred.reduce.tasks;
mapred.reduce.tasks=100

hive.exec.reducers.max

  • Default Value: 999 prior to Hive 0.14.0; 1009 in Hive 0.14.0 and later
  • Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Maximum number of reducers that will be used. If the one specified in the configuration property mapred.reduce.tasks is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.

Meaning: The upper limit on the number of reducers. If the number of reducers that Hive estimates for a job exceeds this value, this value is used instead.
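How the three parameters interact can be summarized with a small Python sketch. It is a simplified model based only on the descriptions quoted above, not Hive's actual implementation; the function choose_reducer_count and its argument names are invented for illustration, and the default arguments assume the Hive 0.14.0+ defaults.

```python
def choose_reducer_count(
    input_bytes: int,
    reduce_tasks: int = -1,                # mapred.reduce.tasks
    bytes_per_reducer: int = 256_000_000,  # hive.exec.reducers.bytes.per.reducer
    max_reducers: int = 1009,              # hive.exec.reducers.max
) -> int:
    """Simplified model of the reducer count; hypothetical, not Hive code."""
    # A non-negative mapred.reduce.tasks is taken as-is.
    if reduce_tasks >= 0:
        return reduce_tasks
    # Otherwise estimate from the input size (ceiling division, at least 1) ...
    estimated = max(1, -(-input_bytes // bytes_per_reducer))
    # ... and cap the estimate at hive.exec.reducers.max.
    return min(estimated, max_reducers)


print(choose_reducer_count(1_000_000_000))                        # 4
print(choose_reducer_count(1_000_000_000_000))                    # 1009 (capped)
print(choose_reducer_count(1_000_000_000_000, reduce_tasks=100))  # 100
```

In this model the cap only matters when mapred.reduce.tasks is negative, which matches the wording of the official description of hive.exec.reducers.max.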

View the current value of the setting:

hive> set hive.exec.reducers.max;
hive.exec.reducers.max=1099

Temporary tuning:

hive> set hive.exec.reducers.max=999;
hive> set hive.exec.reducers.max;
hive.exec.reducers.max=999

All of the values set above are temporary adjustments that apply only to the current session; they do not change the configuration file. When you open a new Hive session, the settings are back to the values from the configuration file.

Origin blog.csdn.net/fly_time2012/article/details/108234443