Hive common parameter configuration



Reprinted from http://dacoolbaby.iteye.com/blog/1880089

hive.exec.mode.local.auto
Determines whether Hive automatically runs jobs locally (on the GateWay) based on the input file size.
true

hive.exec.mode.local.auto.inputbytes.max
If hive.exec.mode.local.auto is true, when the input file size is smaller than this threshold, the query can automatically run in local mode; the default is 128 MB.
134217728L

hive.exec.mode.local.auto.tasks.max
If hive.exec.mode.local.auto is true, when the number of Hive tasks (Hadoop jobs) is smaller than this threshold, the query can automatically run in local mode.
4
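
For example, a minimal sketch of turning on automatic local mode for a session; the values shown are the defaults listed above:

    -- run small jobs locally instead of on the cluster
    SET hive.exec.mode.local.auto=true;
    -- the input must be under 128 MB and need at most 4 tasks
    SET hive.exec.mode.local.auto.inputbytes.max=134217728;
    SET hive.exec.mode.local.auto.tasks.max=4;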

hive.auto.convert.join
Whether to automatically convert a Common Join on the Reduce side into a Map Join, based on the size of the small input table, to speed up joins of a large table with a small table.
false
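
As an illustration (the fact and dim tables here are hypothetical): with the flag on, a join against a small table is converted automatically; with it off, the same effect requires an explicit hint:

    SET hive.auto.convert.join=true;
    SELECT f.id, d.name
    FROM fact f JOIN dim d ON f.dim_key = d.dim_key;
    -- without the flag, the hint /*+ MAPJOIN(d) */ would be needed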

hive.mapred.local.mem
The maximum amount of memory for a Mapper/Reducer in local mode, in bytes; 0 means no limit.
0

mapred.reduce.tasks
The number of reducers for the submitted Job, using the Hadoop Client configuration.
The default is -1, meaning Hive determines the number of reducers automatically.
-1

hive.exec.scratchdir
HDFS path to store execution plans of different map/reduce stages and intermediate output results of these stages.
/tmp/<user.name>/hive

hive.metastore.warehouse.dir
The default data file storage path for Hive, usually a path writable by HDFS.
"

hive.groupby.skewindata
Determines whether group by operations handle skewed data.
When enabled, the group by runs as two jobs: the first distributes rows randomly and produces partial aggregates, and the second completes the aggregation by key, smoothing out skewed groups.
false
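
A minimal sketch of combining this with Map-side aggregation (hive.map.aggr, described later in this list) for a skewed aggregation; the table and column names are hypothetical:

    SET hive.map.aggr=true;
    SET hive.groupby.skewindata=true;
    SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;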

hive.merge.mapfiles
Whether to merge small files produced on the Map side. For Hadoop versions before 0.20 a new Map/Reduce Job is started; for 0.20 and later a Map-only Job using CombineInputFormat is started.
true

hive.merge.mapredfiles
Whether to merge small files produced by a Map/Reduce job. For Hadoop versions before 0.20 a new Map/Reduce Job is started; for 0.20 and later a Map-only Job using CombineInputFormat is started.
false
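
For example, to have Hive merge the small files a job leaves behind (the merge sizes are tuned by hive.merge.size.per.task and hive.merge.smallfiles.avgsize, described further down):

    SET hive.merge.mapfiles=true;
    SET hive.merge.mapredfiles=true;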

hive.default.fileformat
Hive's default output file format, used when none is specified at table creation; the options are 'TextFile', 'SequenceFile' or 'RCFile'.
'TextFile'

hive.mapred.mode
Map/Reduce mode; if set to strict, Cartesian products are not allowed.
'nonstrict'
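
A sketch of the effect (tables a and b are hypothetical):

    SET hive.mapred.mode=strict;
    -- rejected in strict mode: a Cartesian product (join with no ON clause)
    -- SELECT * FROM a JOIN b;
    SET hive.mapred.mode=nonstrict;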

hive.exec.parallel
Whether to enable concurrent submission of map/reduce jobs.
By default Map/Reduce jobs are executed sequentially; the default degree of concurrency is 8 and is configurable.
false

hive.security.authorization.enabled
Whether Hive enables authorization.
After authorization is enabled, authorization statements must be executed before operating on tables; see hive.apache.org for details.
false

hive.exec.plan
The path of the Hive execution plan; it is set automatically by the program.
null

hive.exec.submitviachild
Decides whether map/reduce jobs are submitted from their own separate JVM (child process); by default they are submitted from the same JVM as the HQL compiler.
false

hive.exec.script.maxerrsize
The maximum number of serialization errors allowed for user scripts executed via TRANSFORM/MAP/REDUCE.
100000

hive.exec.script.allow.partial.consumption
Whether a script is allowed to consume only part of its input; if set to true, data left unprocessed because of a broken pipe or similar is considered normal.
false

hive.exec.compress.output
Determines whether the output of the last map/reduce job in the query is compressed.
false

hive.exec.compress.intermediate
Determines whether the output of the query's intermediate map/reduce stages is compressed.
It is unnecessary for Map-only jobs. Compression reduces network IO and improves efficiency.
false
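
A minimal sketch of compressing both final and intermediate job output; the codec line uses the standard Hadoop property, with GzipCodec only as an illustrative choice:

    SET hive.exec.compress.output=true;
    SET hive.exec.compress.intermediate=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;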

hive.intermediate.compression.codec
The class name of the compression codec for intermediate map/reduce jobs (one codec may support multiple compression types); this value may be set automatically by the program.
LZO

hive.intermediate.compression.type
The compression type for intermediate map/reduce jobs, such as "BLOCK" or "RECORD".

hive.exec.reducers.bytes.per.reducer
The average number of bytes processed per reducer.
Changing this parameter affects how many reducers Hive starts; by default each reducer processes 1 GB of data.
1000000000

hive.exec.reducers.max
The upper limit of the number of reducers.
999
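
For example, either fix the reducer count directly or let Hive derive it from the per-reducer load (a sketch; 500 MB is just an illustrative value):

    -- explicit count
    SET mapred.reduce.tasks=8;
    -- or let Hive decide, roughly doubling the reducers by halving the load
    SET mapred.reduce.tasks=-1;
    SET hive.exec.reducers.bytes.per.reducer=500000000;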

hive.exec.pre.hooks
Class names of statement-level hooks invoked before the entire HQL statement executes.
''

hive.exec.post.hooks
Class names of statement-level hooks invoked after the entire HQL statement completes.


hive.exec.parallel.thread.number
The number of threads used when submitting jobs concurrently.
8
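
For example, to let independent stages of one query run at the same time:

    SET hive.exec.parallel=true;
    SET hive.exec.parallel.thread.number=8;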

hive.mapred.reduce.tasks.speculative.execution
Whether to enable speculative execution of reducers; equivalent to mapred.reduce.tasks.speculative.execution.
false

hive.exec.counters.pull.interval
The interval, in milliseconds, at which the client pulls progress counters.
1000L

hive.exec.dynamic.partition
Whether to enable dynamic partitioning; it must be turned on before dynamic partitions can be used.
false

hive.exec.dynamic.partition.mode
The dynamic partitioning mode used once dynamic partitioning is enabled; the two options are strict and nonstrict. strict requires at least one static partition column, nonstrict does not.
Setting it to nonstrict is recommended; under strict, queries against the table must specify a partition.
strict


hive.exec.max.dynamic.partitions
The maximum number of dynamic partitions allowed; partitions can still be added manually.
1000

hive.exec.max.dynamic.partitions.pernode
The maximum number of dynamic partitions allowed by a single reduce node.
100

hive.exec.default.partition.name
The name of the default dynamic partition, used when the dynamic partition column is '' or null.
'__HIVE_DEFAULT_PARTITION__'
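
A minimal sketch of a dynamic-partition insert under these settings (the tables sales and staging and the partition column dt are hypothetical):

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    -- dt comes last in the SELECT and drives the partition value
    INSERT OVERWRITE TABLE sales PARTITION (dt)
    SELECT id, amount, dt FROM staging;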

hadoop.bin.path
The path to the Hadoop Client executable script used to submit jobs through a separate JVM, using the Hadoop Client configuration.
$HADOOP_HOME/bin/hadoop

hadoop.config.dir
The path to the Hadoop Client configuration file, using the Hadoop Client configuration.
$HADOOP_HOME/conf

fs.default.name
The URL of the Namenode, using the Hadoop Client configuration.
file:///

map.input.file
The input file for the Map, using the Hadoop Client configuration.
null

mapred.input.dir
Map's input directory, using the Hadoop Client configuration.
null

mapred.input.dir.recursive
Whether the input directory is searched recursively, using the Hadoop Client configuration.
false

mapred.job.tracker
The URL of the Job Tracker, using the Hadoop Client configuration, if this configuration is set to 'local', the local mode will be used.
local

mapred.job.name
The Map/Reduce job name; if not set, a generated name is used. Uses the Hadoop Client configuration.
null

mapred.reduce.tasks.speculative.execution
Map/Reduce speculative execution, using Hadoop Client configuration.
null

hive.metastore.metadb.dir
The path where the Hive metastore database is located.
''

hive.metastore.uris
Hive metadata URI, multiple thrift:// addresses, separated by commas.
"

hive.metastore.connect.retries
Maximum number of retries to connect to the Thrift metadata service.
3

javax.jdo.option.ConnectionPassword
The JDO connection password.
''

hive.metastore.ds.connection.url.hook
The class name of the JDO connection URL Hook, which is used to obtain the connection string of the JDO metastore, and is a class that implements the JDOConnectionURLHook interface.
"

javax.jdo.option.ConnectionURL
The JDO connection URL for the metastore database.
''

hive.metastore.ds.retry.attempts
The maximum number of attempts to connect to the background data store when there are no JDO data connection errors.
1

hive.metastore.ds.retry.interval
The time interval in milliseconds between each attempt to connect to the background datastore.
1000

hive.metastore.force.reload.conf
Whether to force reloading of the metadata configuration; once reloaded, the value is reset to false.
false

hive.metastore.server.min.threads
Thrift service thread pool minimum number of threads.
8

hive.metastore.server.max.threads
Thrift service thread pool maximum number of threads.
0x7fffffff

hive.metastore.server.tcp.keepalive
Whether the Thrift service keeps TCP connections alive.
true

hive.metastore.archive.intermediate.original
The suffix for the original intermediate directories used during archive compression; the actual names do not matter as long as conflicts are avoided.
'_INTERMEDIATE_ORIGINAL'

hive.metastore.archive.intermediate.archived
The suffix for the compressed intermediate directories used during archive compression; the actual names do not matter as long as conflicts are avoided.
'_INTERMEDIATE_ARCHIVED'

hive.metastore.archive.intermediate.extracted
The suffix for the extracted intermediate directories used during archive compression; the actual names do not matter as long as conflicts are avoided.
'_INTERMEDIATE_EXTRACTED'

hive.cli.errors.ignore
Whether to ignore errors; for files containing many SQL statements, a failing line can be skipped and execution continues with the next line.
false

hive.session.id
The identifier of the current session, in the format "username_time"; it is recorded in the job conf and generally does not need to be set manually.
''

hive.session.silent
Whether the current session runs in silent mode. When not silent, info-level log messages are also written to the console on the standard error stream.
false

hive.query.string
The query string currently being executed.
"

hive.query.id
The ID of the query currently being executed.
"

hive.query.planid
ID of the map/reduce plan currently being executed.
"

hive.jobname.length
The maximum length of the current job name, hive will omit the middle part of the job name according to this length.
50

hive.jar.path
The path where hive_cli.jar is located when submitting jobs through a separate JVM.
''

hive.aux.jars.path
The path where various plug-in jar packages composed of user-defined UDFs and SerDes are located.
"

hive.added.files.path
The path to files added via ADD FILE.
''

hive.added.jars.path
The path to files added via ADD JAR.
''

hive.added.archives.path
The path to files added via ADD ARCHIVE.
''

hive.table.name
The name of the current Hive table; this configuration is passed to user scripts via the ScriptOperator.
''

hive.partition.name
The name of the current Hive partition; this configuration is passed to user scripts via the ScriptOperator.
''

hive.script.auto.progress
Whether the script periodically sends heartbeats to the Job Tracker, to avoid the Job Tracker assuming the script has died when it runs for a long time.
false

hive.script.operator.id.env.var
The name of the environment variable used to identify the ScriptOperator ID.
'HIVE_SCRIPT_OPERATOR_ID'

hive.alias
The current Hive alias, passed to user scripts via the ScriptOperator.
''

hive.map.aggr
Determines whether aggregation can be performed on the Map side.
true

hive.join.emit.interval
The emission interval for Hive Join operations, in milliseconds.
1000

hive.join.cache.size
Cache size for Hive Join operations, in bytes.
25000

hive.mapjoin.bucket.cache.size
The cache size of the Hive Map Join bucket, in bytes.
100

hive.mapjoin.size.key
The size of each row key of Hive Map Join, in bytes.
10000

hive.mapjoin.cache.numrows
Number of rows cached by Hive Map Join.
25000

hive.groupby.mapaggr.checkinterval
The check interval for Map-side aggregation of Group By operations, in number of rows.
100000

hive.map.aggr.hash.percentmemory
The percentage of JVM memory that the hash table for Hive Map-side aggregation may occupy.
0.5

hive.map.aggr.hash.min.reduction
The minimum reduce ratio of hash storage for Hive Map-side aggregation.
0.5

hive.udtf.auto.progress
Whether a Hive UDTF reports heartbeats periodically; useful when the UDTF runs for a long time without emitting rows.
false

hive.fileformat.check
Whether Hive checks the output file format.
true

hive.querylog.location
The directory where the Hive real-time query log is located. If the value is empty, no real-time query log will be created.
'/tmp/$USER'

hive.script.serde
The SerDe for Hive user scripts.
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

hive.script.recordreader
The RecordReader for Hive user scripts.
'org.apache.hadoop.hive.ql.exec.TextRecordReader'

hive.script.recordwriter
The RecordWriter for Hive user scripts.
'org.apache.hadoop.hive.ql.exec.TextRecordWriter'

hive.hwi.listen.host
The host or IP address to which the HWI (Hive Web Interface) binds.
'0.0.0.0'

hive.hwi.listen.port
The HTTP port on which the HWI listens.
9999

hive.hwi.war.file
The path where the war file of the HWI is located.
$HWI_WAR_FILE

hive.test.mode
Whether to run Hive in test mode.
false

hive.test.mode.prefix
Hive test mode prefix.
'test_'

hive.test.mode.samplefreq
The sampling frequency for Hive test mode; input tables are sampled as one bucket out of this value.
32

hive.test.mode.nosamplelist
Comma-separated list of exclusions for Hive test mode sampling.
"

hive.merge.size.per.task
The target size of each task's merged file; the number of reducers for the merge is determined from this size. The default is 256 MB.
256000000

hive.merge.smallfiles.avgsize
The average size below which a group of small files is merged; the default is 16 MB.
16000000
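
Continuing the merge settings from hive.merge.mapfiles / hive.merge.mapredfiles above, a sketch with the default sizes:

    -- merge outputs up to 256 MB per task,
    -- triggered when the files average under 16 MB
    SET hive.merge.size.per.task=256000000;
    SET hive.merge.smallfiles.avgsize=16000000;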

hive.optimize.skewjoin
Whether to optimize joins over skewed data; a separate Map/Reduce job is started to process the skewed keys.
false

hive.skewjoin.key
The threshold on the number of rows for a single join key; exceeding it marks the query as a skewed join.
1000000

hive.skewjoin.mapjoin.map.tasks
The upper limit on the number of Map tasks for the Map Join that processes skewed data.
10000

hive.skewjoin.mapjoin.min.split
The minimum data split size for the Map Join that processes skewed data, in bytes; the default is 32 MB.
33554432
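
For example, a sketch of enabling the skew-join optimization with the defaults above:

    SET hive.optimize.skewjoin=true;
    -- keys with more than one million rows are treated as skewed
    SET hive.skewjoin.key=1000000;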

mapred.min.split.size
The minimum input split size for a Map/Reduce job, using the Hadoop Client configuration.
1

hive.mergejob.maponly
Whether to enable Map-only merge jobs.
true

hive.heartbeat.interval
Heartbeat interval for Hive Job, in milliseconds.
1000

hive.mapjoin.maxsize
Maximum number of rows processed by Map Join. If the number of rows is exceeded, the Map Join process will exit abnormally.
1000000

hive.hashtable.initialCapacity
Hive's Map Join will dump the small table into an in-memory HashTable whose initial size is specified by this parameter.
100000

hive.hashtable.loadfactor
Hive's Map Join will dump the small table into an in-memory HashTable whose load factor is specified by this parameter.
0.75

hive.mapjoin.followby.gby.localtask.max.memory.usage
The maximum proportion of memory that may be used when a MapJoinOperator is followed by a GroupByOperator.
0.55

hive.mapjoin.localtask.max.memory.usage
The maximum proportion of heap memory that a Map Join local task may use.
0.9

hive.mapjoin.localtask.timeout
The Map Join local task timeout; a Taobao-specific feature.
600000

hive.mapjoin.check.memory.rows
The number of rows after which memory usage is checked; if usage exceeds hive.mapjoin.localtask.max.memory.usage, the local task exits abnormally and the Map Join fails.
100000
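
A sketch of how the two local-task memory guards interact: every hive.mapjoin.check.memory.rows rows the task checks the heap limit and aborts the Map Join if it is exceeded:

    SET hive.mapjoin.localtask.max.memory.usage=0.9;
    SET hive.mapjoin.check.memory.rows=100000;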

hive.debug.localtask
Whether to debug local tasks; this parameter currently has no effect.
false

hive.task.progress
Whether to enable counters that record Job execution progress; the client also pulls these progress counters.
false

hive.input.format
Hive's InputFormat.
The default is org.apache.hadoop.hive.ql.io.HiveInputFormat; an alternative is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.

hive.enforce.bucketing
Whether to enable forced bucketing.
false

hive.enforce.sorting
Whether to enable forced sorting.
false
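
For example, with both flags on, an insert into a bucketed, sorted table (the table definition here is hypothetical) automatically uses the matching number of reducers:

    SET hive.enforce.bucketing=true;
    SET hive.enforce.sorting=true;
    CREATE TABLE user_bucketed (id INT, name STRING)
    CLUSTERED BY (id) SORTED BY (id) INTO 32 BUCKETS;
    INSERT OVERWRITE TABLE user_bucketed SELECT id, name FROM users;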

hive.mapred.partitioner
Hive's Partitioner class.
'org.apache.hadoop.hive.ql.io.DefaultHivePartitioner'

hive.exec.script.trust
Whether the Hive Script Operator trusts the user script.
false

hive.hadoop.supports.splittable.combineinputformat
Whether splittable CombineInputFormat is supported.
false

hive.optimize.cp
Whether to optimize with column pruning.
true

hive.optimize.ppd
Whether to optimize with predicate pushdown.
true

hive.optimize.groupby
Whether to optimize group by.
true

hive.optimize.bucketmapjoin
Whether to optimize with bucket map joins.
false

hive.optimize.bucketmapjoin.sortedmerge
Whether to try a forced sorted-merge bucket map join when optimizing bucket map joins.
false

hive.optimize.reducedduplication
Whether to optimize away redundant reduce stages (reduce deduplication).
true

hive.hbase.wal.enabled
Whether writes through the HBase Storage Handler go to the write-ahead log (WAL).
true

hive.archive.enabled
Whether to enable har files.
false

hive.archive.har.parentdir.settable
Whether the parent directory of har files can be set.
false

hive.outerjoin.supports.filters
Whether outer joins support filter conditions.
true

hive.fetch.output.serde
The SerDe class used by the Fetch Task.
'org.apache.hadoop.hive.serde2.DelimitedJSONSerDe'

hive.semantic.analyzer.hook
Hive semantic-analysis hooks, invoked before and after the semantic analysis phase to inspect and modify the AST and the generated execution plan; class names are separated by commas.
null

hive.cli.print.header
Whether to display column names in query results; they are not displayed by default.
false

hive.cli.encoding
Hive default command line character encoding.
'UTF8'

hive.log.plan.progress
Whether to log the progress of plan execution.
true

hive.pull.progress.counters
Whether to pull counters from the Job Tracker; a Taobao-specific configuration item.
true

hive.job.pre.hooks
List of Hooks executed before each Job is submitted, separated by commas, Taobao-specific configuration items.
"

hive.job.post.hooks
List of Hooks executed after each Job is completed, separated by commas, Taobao-specific configuration items.
"

hive.max.progress.counters
Hive maximum number of progress counters, Taobao-specific configuration item.
100

hive.exec.script.wrapper
The wrapper for script invocations by the Script Operator, usually a script interpreter. For example, if set to "python", the script passed to the Script Operator is invoked as "python <script command>"; if this value is null or unset, the script is invoked directly as "<script command>".
null
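
A minimal sketch of the wrapper in use with a TRANSFORM query (my_script.py and the table t are hypothetical):

    ADD FILE my_script.py;
    SET hive.exec.script.wrapper=python;
    -- the script is now invoked as "python my_script.py"
    SELECT TRANSFORM (col1, col2)
    USING 'my_script.py'
    AS (out1, out2)
    FROM t;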

hive.check.fatal.errors.interval
The interval, in milliseconds, at which the client checks for fatal errors by pulling counters; a Taobao-specific configuration item.
5000L
