The meaning of the configuration items in the Hive configuration file, explained in detail.

Almost all of Hive's configuration items are listed here. The question guide below only touches on the purpose of a few of them; read the full list for more.
Question guide:
1. Which configuration item controls Hive's output format?
2. Which configuration item lets Hive call scripts written in other languages?
3. Are jobs submitted by Hive executed inside the Hive JVM or handed off to Hadoop?
4. Which configuration item controls whether the output of a query's final map/reduce task is compressed?
5. When a user defines a custom UDF or SerDe, which configuration item points to the directory where the plug-in jars must be placed?
6. Each reducer handles 1 GB by default, so a 10 GB input produces 10 reducers; which configuration item controls this?
7. Which configuration item tells Hive to handle data skew in group by operations?
8. How is the memory available to map/reduce in local mode configured?
9. Which configuration item changes the number of rows cached in memory during a table join (default 25000)?
10. Which configuration item enables the skew-join optimization?
11. When parallel execution is enabled, how many jobs may run at the same time (default 8), and which configuration item changes it?
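
Most of the items discussed below can be inspected and overridden per session from the Hive CLI with the SET command. A minimal sketch, assuming the values are illustrative and that permanent settings belong in hive-site.xml:

```sql
-- Print the current value of a property, then override it for this session only.
-- Session-level SET does not modify hive-site.xml.
SET hive.exec.compress.output;                        -- show current value
SET hive.exec.compress.output=true;                   -- override for this session
SET hive.exec.reducers.bytes.per.reducer=1000000000;  -- roughly 1 GB per reducer
```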

Hive configuration:

hive.ddl.output.format: the output format of Hive DDL statements; the default is text (plain text), and json is also available. This configuration was added after 0.9.0;
hive.exec.script.wrapper: the wrapper used when Hive calls a script; the default is null. If set to python, a script invocation becomes python <script command>; if null, <script command> is executed directly;
hive.exec.plan: the file path of the Hive execution plan; the default is null, and it is set automatically at runtime to something like hdfs://xxxx/xxx/xx;
hive.exec.scratchdir: the directory Hive uses to store the execution plans of the different map/reduce stages as well as intermediate output; the default is /tmp/<user.name>/hive. In practice we divide it by group and create a tmp directory per group for storage;
hive.exec.submitviachild: in non-local mode, decides whether Hive executes map/reduce in a separate JVM; the default is false, meaning map/reduce jobs are submitted from the Hive JVM;
hive.exec.script.maxerrsize: the maximum number of serialization errors allowed when a user calls transform, map, or reduce to execute a script; the default is 100000 and generally does not need to be changed;
hive.exec.compress.output: whether the output of a query's final map/reduce task is compressed; the default is false, but it is usually turned on, since compression saves space and reduces I/O when CPU pressure is not a concern;
hive.exec.compress.intermediate: similar to the previous item, controls whether the output of the intermediate map/reduce tasks within a query is compressed; the default is false;
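
As a sketch, enabling final and intermediate compression for one session might look like the following; the codec choice is an assumption and depends on what the cluster has installed:

```sql
-- Compress the final query output and the intermediate map/reduce output.
SET hive.exec.compress.output=true;
SET hive.exec.compress.intermediate=true;
-- The codec itself is a Hadoop-side setting (assumed codec, adjust to the cluster):
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```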
hive.jar.path: the location of hive_cli.jar when jobs are submitted from a separate JVM; no default value;
hive.aux.jars.path: when a user defines a custom UDF or SerDe, the plug-in jars must be placed in this directory; no default value;
hive.partition.pruning: when the compiler finds that a query uses a partitioned table but supplies no partition predicate, an error is thrown to protect the partitioned table; the default is nonstrict (to be refined after reading the source code; there is very little information about it online);
hive.map.aggr: whether map-side aggregation is enabled; enabled by default;
hive.join.emit.interval: how many rows the right-most operand of a join caches before join results are emitted; the default is 1000. There is a bugfix in the Hive JIRA for cases where this value is set too small;
hive.map.aggr.hash.percentmemory: the percentage of memory the hash table may occupy during map-side aggregation; the default is 0.5. It only takes effect when map-side aggregation is enabled;
hive.default.fileformat: the default file format for CREATE TABLE statements; the default is TextFile, and the other options are SequenceFile, RCFile, and ORC;
hive.merge.mapfiles: merge small files at the end of a map-only job; enabled (true) by default;
hive.merge.mapredfiles: merge small files at the end of a map/reduce job; disabled (false) by default;
hive.merge.size.per.task: the target size of merged files at the end of a job; the default is 256 MB;
hive.merge.smallfiles.avgsize: when the average size of a job's output files is below this value, an extra map/reduce job is started to merge the small files into larger ones. This is the basic threshold for what counts as a small file; setting it higher reduces the number of small files. It only takes effect when hive.merge.mapfiles and hive.merge.mapredfiles are true; the default is 16 MB;
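
A hedged example of turning on small-file merging for both map-only and map/reduce jobs; the byte sizes below are illustrative:

```sql
-- Merge small output files at the end of a job.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;      -- target size of merged files
SET hive.merge.smallfiles.avgsize=16000000;  -- merge when average output file is smaller than this
```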
mapred.reduce.tasks: the number of reduce tasks per job; the default is taken from the Hadoop client configuration, which is 1;
hive.exec.reducers.bytes.per.reducer: the amount of data handled by each reducer; the default is 1 GB, so a 10 GB input produces 10 reducers;
hive.exec.reducers.max: the maximum number of reducers. If mapred.reduce.tasks is set to a negative value, Hive uses this as the upper bound on the possible number of reducers; the actual number is the smaller of this value and (input size / hive.exec.reducers.bytes.per.reducer). The Hive default is 999;
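
In other words, the reducer count is roughly input size divided by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. A minimal sketch of tuning it, with illustrative values:

```sql
-- With a 10 GB input, 500 MB per reducer yields about 20 reducers.
SET hive.exec.reducers.bytes.per.reducer=500000000;
SET hive.exec.reducers.max=200;   -- illustrative cap
-- Or force an explicit count and bypass the estimate entirely:
SET mapred.reduce.tasks=50;
```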
hive.fileformat.check: whether to verify the file format when loading data files; the default is true;
hive.groupby.skewindata: whether group by operations should handle skewed data; the default is false. When set to true, the execution plan generates two map/reduce jobs: in the first job the map output is distributed randomly among the reducers to balance the load and work around the skew;
hive.groupby.mapaggr.checkinterval: during map-side aggregation, if the number of rows for a group by key exceeds this value, the aggregation is split; the default is 100000;
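
A minimal sketch of letting Hive rewrite a skewed group by into the two-job plan described above; the table page_views and column user_id are hypothetical:

```sql
-- First job spreads map output randomly across reducers; second job does the final group by.
SET hive.groupby.skewindata=true;
SET hive.map.aggr=true;
SELECT user_id, COUNT(*) AS pv
FROM page_views            -- hypothetical table with a skewed user_id
GROUP BY user_id;
```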
hive.mapred.local.mem: the memory available to map/reduce in local mode; the default is 0, meaning unlimited;
hive.mapjoin.followby.map.aggr.hash.percentmemory: the hash-table memory percentage for map-side aggregation when the group by follows a map join; otherwise hive.map.aggr.hash.percentmemory determines the percentage. The default is 0.3;
hive.map.aggr.hash.force.flush.memory.threshold: the maximum memory the hash table may use during map-side aggregation; if this value is exceeded, the data is flushed. The default is 0.9;
hive.map.aggr.hash.min.reduction: if the ratio of hash-table size to input rows exceeds this value, map-side hash aggregation is turned off; the default is 0.5. Set it to 1 to guarantee hash aggregation is never turned off;
hive.optimize.groupby: whether to use bucketed group by when querying bucketed partitions and tables; the default is true;
hive.multigroupby.singlemr: whether to plan multiple group bys into a single map/reduce job, with the constraint that the group bys share the same keys; the default is false;
hive.optimize.cp: column pruning; the default is true, so a query reads only the columns it actually uses. A useful optimization;
hive.optimize.index.filter: whether indexes are used automatically; disabled (false) by default;
hive.optimize.index.groupby: whether a clustered index is used to optimize group-by queries; disabled (false) by default;
hive.optimize.ppd: whether predicate pushdown is supported; enabled by default. Predicate pushdown moves predicates from the WHERE clause of an outer query block into the inner query block it contains (such as a view), so data can be filtered earlier and indexes can be used more effectively.
hive.optimize.ppd.storage: when predicate pushdown is enabled, whether predicates are pushed down to the storage handler; enabled by default, and has no effect when predicate pushdown is off;
hive.ppd.recognizetransivity: whether predicate filters are replicated transitively over equi-join conditions; enabled by default;
hive.join.cache.size: the number of rows cached in memory during a table join; the default is 25000;
hive.mapjoin.bucket.cache.size: how many values the in-memory cache stores per key during a map join; the default is 100;
hive.optimize.skewjoin: whether the skew-join optimization is enabled; disabled (false) by default;
hive.skewjoin.key: the threshold for detecting data skew: if the same key appears in a join more than this many times, it is treated as a skewed join key; the default is 100000;
hive.skewjoin.mapjoin.map.tasks: the number of map tasks used by the map join that handles a skewed join; the default is 10000;
hive.skewjoin.mapjoin.min.split: the minimum split size of the map-join map tasks in a skewed join; the default is 33554432 (32 MB). Use this together with the previous parameter for fine-grained control;
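
A sketch of enabling the skew-join optimization; the values are simply the defaults quoted above:

```sql
-- Keys seen more than hive.skewjoin.key times are handled by a separate map join.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;
SET hive.skewjoin.mapjoin.map.tasks=10000;
SET hive.skewjoin.mapjoin.min.split=33554432;  -- 32 MB
```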
hive.mapred.mode: the mode Hive runs in; the default is nonstrict. In strict mode many risky queries are forbidden, such as Cartesian-product joins and fully dynamic partitioning;
hive.exec.script.maxerrsize: the maximum number of bytes a map/reduce task may print to standard error, to prevent a script from filling the partition log; the default is 100000;
hive.exec.script.allow.partial.consumption: whether Hive allows a script to exit successfully without having read anything from standard input; disabled (false) by default;
hive.script.operator.id.env.var: when a user customizes map/reduce with the transform function, the name of the environment variable that holds the unique script identifier; the default is HIVE_SCRIPT_OPERATOR_ID;
hive.exec.compress.output: controls whether Hive's query output is compressed; the codec is configured through Hadoop's mapred.output.compress settings. Not compressed (false) by default;
hive.exec.compress.intermediate: controls whether the intermediate results of Hive queries are compressed, configured the same way as above; not compressed (false) by default;
hive.exec.parallel: whether the stages of a Hive job execute in parallel; disabled (false) by default. In many operations, such as joins, subqueries with no dependency on each other can run independently, and enabling parallel execution can speed things up considerably;
hive.exec.parallel.thread.number: when parallel execution is enabled, how many jobs may run at the same time; the default is 8;
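
For instance, a union of two independent subqueries can run its stages concurrently once parallel execution is on; a minimal sketch with hypothetical table names:

```sql
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;
-- The two branches below do not depend on each other, so their jobs can run in parallel.
SELECT src, COUNT(*) AS cnt FROM (
  SELECT 'a' AS src, id FROM table_a
  UNION ALL
  SELECT 'b' AS src, id FROM table_b
) t GROUP BY src;
```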
hive.exec.rowoffset: whether to provide a virtual column for the row offset; not provided (false) by default. Hive has two virtual columns: INPUT__FILE__NAME, the path of the input file, and BLOCK__OFFSET__INSIDE__FILE, the block offset of the record within the file. They are very helpful for troubleshooting queries that return unexpected or null results;
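
A hedged example of using those virtual columns to locate a suspicious record; the table and columns are hypothetical:

```sql
-- Find which input file and block offset a bad row came from.
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, user_id
FROM page_views              -- hypothetical table
WHERE user_id IS NULL
LIMIT 10;
```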
hive.task.progress: controls whether Hive periodically updates task progress counters during execution. Turning this on helps the job tracker monitor task execution better, but it carries some performance cost. It is turned on automatically when the dynamic partition flag hive.exec.dynamic.partition is enabled;
hive.exec.pre.hooks: pre-execution hooks: a comma-separated list of Java classes implementing the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface. Once configured, the hooks run before each Hive task executes. The default is empty;
hive.exec.post.hooks: same as above, hooks that run after execution; the default is empty;
hive.exec.failure.hooks: same as above, hooks that run when the program throws an exception; the default is empty;
hive.mergejob.maponly: try to generate a map-only job for the merge, provided CombineHiveInputFormat is supported; the default is true;
hive.mapjoin.smalltable.filesize: the map-join threshold for input table files: if an input file is smaller than this value, Hive tries to convert the ordinary join into a map join; the default is 25 MB;
hive.mapjoin.localtask.max.memory.usage: the maximum fraction of memory the key/value hash table may use while a map-join local task runs; if this value is exceeded, the local task exits automatically. The default is 0.9;
hive.mapjoin.followby.gby.localtask.max.memory.usage: similar to the above, except that when a group by follows the map join, this configuration controls the local memory limit for such a query; the default is 0.55;
hive.mapjoin.check.memory.rows: how many rows are processed between memory-usage checks; the default is 100000;
hive.heartbeat.interval: the interval for sending heartbeats, used in map join and filter operations; the default is 1000;
hive.auto.convert.join: whether to convert an ordinary join into a map join based on the size of the input files; disabled (false) by default;
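
A sketch of letting Hive convert a join against a small dimension table into a map join; the table names are hypothetical and the size threshold is the default quoted above:

```sql
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- ~25 MB
-- dim_country is small enough to be loaded into memory and joined on the map side.
SELECT f.order_id, d.country_name
FROM fact_orders f
JOIN dim_country d ON f.country_code = d.country_code;
```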
hive.script.auto.progress: whether Hive's transform/map/reduce scripts automatically send progress information to the TaskTracker so that seemingly unresponsive tasks are not killed by mistake. Originally, progress was reported whenever the script wrote to standard error; with this option on, writing to standard error no longer triggers progress reporting, so a script stuck in an infinite loop may keep running without the TaskTracker noticing;
hive.script.serde: the SerDe used when a user script converts input to output; the default is org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;
hive.script.recordreader: the default reader for reading data from the script; the default is org.apache.hadoop.hive.ql.exec.TextRecordReader;
hive.script.recordwriter: the default writer for writing data to the script; the default is org.apache.hadoop.hive.ql.exec.TextRecordWriter;
hive.input.format: the input format; the default is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. If it causes problems, org.apache.hadoop.hive.ql.io.HiveInputFormat can be used instead;
hive.udtf.auto.progress: whether Hive sends progress information to the TaskTracker while a UDTF executes; the default is false;
hive.mapred.reduce.tasks.speculative.execution: whether speculative execution of reduce tasks is enabled; the default is true;
hive.exec.counters.pull.interval: the interval at which a running job polls the JobTracker. A small value adds load on the JobTracker; a large value may hide information about running tasks, so it needs to be balanced. The default is 1000;
hive.enforce.bucketing: whether bucketing is enforced; the default is false. If enabled, data written to a table is bucketed;

hive.enforce.sorting: whether sorting is enforced; if enabled, data inserted into a table is force-sorted. The default is false;
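
A hedged sketch of writing into a bucketed, sorted table with enforcement turned on; the table and column names are hypothetical:

```sql
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
CREATE TABLE user_bucketed (id BIGINT, name STRING)
CLUSTERED BY (id) SORTED BY (id) INTO 32 BUCKETS;
-- With enforcement on, Hive launches 32 reducers so each bucket file is populated correctly.
INSERT OVERWRITE TABLE user_bucketed
SELECT id, name FROM users_raw;   -- hypothetical source table
```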

hive.optimize.reducededuplication: if the data has already been aggregated by the same key, remove the redundant map/reduce job. The documentation recommends keeping this on; the default is true;

hive.exec.dynamic.partition: whether dynamic partitions are supported in DML/DDL; the default is false;

hive.exec.dynamic.partition.mode: the default is strict; in strict mode, dynamic partitions may only be used if at least one static partition is specified, and the remaining partitions can then be dynamic;

hive.exec.max.dynamic.partitions: the upper limit on dynamic partitions; the default is 1000;

hive.exec.max.dynamic.partitions.pernode: the maximum number of dynamic partitions each mapper/reducer node may create; the default is 100;

hive.exec.max.created.files: the maximum number of HDFS files a MapReduce job may create; the default is 100000;

hive.exec.default.partition.name: when dynamic partitioning is enabled, rows whose partition column is null or an empty string are inserted into this partition; the default name is __HIVE_DEFAULT_PARTITION__;
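
A minimal sketch of a dynamic-partition insert under the settings above; the table names are hypothetical, and note that in strict mode at least one static partition would have to be given:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;   -- allow all partitions to be dynamic
SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=100;
-- The partition column dt is taken from the last column of the SELECT.
INSERT OVERWRITE TABLE page_views_part PARTITION (dt)
SELECT user_id, url, dt FROM page_views_staging;  -- hypothetical tables
```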

hive.fetch.output.serde: the SerDe used when FetchTask serializes the fetched output; the default is org.apache.hadoop.hive.serde2.DelimitedJSONSerDe;

hive.exec.mode.local.auto: whether Hive decides automatically to run in local mode; the default is false;

hive.exec.drop.ignorenonexistent: whether to ignore (rather than report an error for) dropping a table or view that does not exist; the default is true;

hive.exec.show.job.failure.debug.info: whether to provide task debug information when a job fails; the default is true;

hive.auto.progress.timeout: how long to run the automatic progressor; the default is 0, which means forever;

hive.table.parameters.default: default values for the property fields of new tables; the default is empty;

hive.variable.substitute: whether variable substitution is supported; if enabled, syntax such as ${var}, ${system:var}, and ${env:var} is supported. The default is true;
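
With substitution enabled, a session variable can parameterize a query; a small sketch with hypothetical names:

```sql
SET hive.variable.substitute=true;
SET hivevar:run_date=2021-02-01;
-- ${hivevar:run_date} is replaced textually before the query is compiled.
SELECT COUNT(*) FROM page_views WHERE dt = '${hivevar:run_date}';
```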

hive.error.on.empty.partition: whether to report an error when a dynamic partition produces an empty result; the default is false;

hive.exim.uri.scheme.whitelist: a whitelist of URI schemes allowed when importing and exporting data, with items separated by commas; the default is hdfs,pfile;

hive.limit.row.max.size: literally, the minimum amount of data guaranteed per row when using LIMIT to query a subset of the data; the default is 100000;

hive.limit.optimize.limit.file: when using LIMIT to query a data subset, the maximum number of files that may be sampled; the default is 10;

hive.limit.optimize.enable: whether to enable the optimization of sampling the data when a simple LIMIT is used; the default is false. The Programming Hive book notes a drawback of this feature: the sampling is nondeterministic, so it carries a risk warning;

hive.limit.optimize.fetch.max: the maximum number of rows a simple LIMIT may sample; the default is 50000. Only queries are limited; inserts are unaffected;
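
A hedged example of the LIMIT sampling optimization just described; as noted, the sampling is nondeterministic, so results may silently miss rows, and the values are simply the defaults listed above:

```sql
SET hive.limit.optimize.enable=true;
SET hive.limit.row.max.size=100000;
SET hive.limit.optimize.limit.file=10;
SET hive.limit.optimize.fetch.max=50000;
-- Hive may read only a sample of the input files to satisfy this LIMIT.
SELECT * FROM page_views LIMIT 100;   -- hypothetical table
```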

hive.rework.mapredwork: whether to rework the MapReduce work; the default is false;

hive.sample.seednumber: a number used to differentiate samples; the default is 0;

hive.io.exception.handlers: the list of I/O exception handlers; the default is empty. When the record reader throws an I/O exception, these handlers handle it;

hive.autogen.columnalias.prefix.label: the prefix used when column aliases are generated automatically during execution. When an aggregate function such as count runs without an explicit alias (count(a) as xxx), an alias is generated from the column's position: the first such column becomes _c0, and so on. The default prefix is _c; most people have seen these aliases during data development;

hive.autogen.columnalias.prefix.includefuncname: whether the function name is included in automatically generated column aliases; the default is false;

hive.exec.perf.logger: the name of the logger class responsible for recording client performance metrics. It must be a subclass of org.apache.hadoop.hive.ql.log.PerfLogger; the default is org.apache.hadoop.hive.ql.log.PerfLogger;

hive.start.cleanup.scratchdir: whether to clear Hive's scratch directory when the Hive service starts; the default is false;

hive.output.file.extension: the extension of output files; the default is empty;

hive.insert.into.multilevel.dirs: whether inserting into multilevel directories is allowed; the default is false;

hive.files.umask.value: the dfs.umask value used when Hive creates folders; the default is 0002;

hive.metastore.local: controls whether Hive connects to a remote metastore server or starts a local client JVM for the metastore; the default is true. Hive 0.10 removed this configuration item;

javax.jdo.option.ConnectionURL: the JDBC connection string; the default is jdbc:derby:;databaseName=metastore_db;create=true;

javax.jdo.option.ConnectionDriverName: the JDBC driver; the default is org.apache.derby.jdbc.EmbeddedDriver;

javax.jdo.PersistenceManagerFactoryClass: the name of the class implementing the JDO PersistenceManagerFactory; the default is org.datanucleus.jdo.JDOPersistenceManagerFactory;

javax.jdo.option.DetachAllOnCommit: detach all objects once a transaction is committed; the default is true;

javax.jdo.option.NonTransactionalRead: whether non-transactional reads are allowed; the default is true;

javax.jdo.option.ConnectionUserName: the username; the default is APP;

javax.jdo.option.ConnectionPassword: the password; the default is mine;

javax.jdo.option.Multithreaded: whether concurrent access to the metastore is supported; the default is true;

datanucleus.connectionPoolingType: use a connection pool to access the JDBC metastore; the default is DBCP;

datanucleus.validateTables: check whether the table schema exists; the default is false;

datanucleus.validateColumns: check whether the column schema exists; the default is false;

datanucleus.validateConstraints: check whether the constraint schema exists; the default is false;

datanucleus.storeManagerType: the metadata storage type; the default is rdbms;

datanucleus.autoCreateSchema: whether to create the necessary schema automatically when it does not exist; the default is true;

datanucleus.autoStartMechanismMode: throw an exception if the metadata tables are incorrect; the default is checked;

datanucleus.transactionIsolation: the default transaction isolation level; the default is read-committed;

datanucleus.cache.level2: use the second-level cache; the default is false;

datanucleus.cache.level2.type: the type of the second-level cache; the two options are SOFT (soft references) and WEAK (weak references); the default is SOFT;

datanucleus.identifierFactory: the name of the identifier factory that produces table and column names; the default is datanucleus;

datanucleus.plugin.pluginRegistryBundleCheck: the behavior when a duplicate plugin is found; the default is LOG;

hive.metastore.warehouse.dir: the location of the data warehouse; the default is /user/hive/warehouse;

hive.metastore.execute.setugi: in non-secure mode, setting this to true causes the metastore to perform DFS operations with the client's user and group permissions; the default is false. This property must be set on both the server and the client;

hive.metastore.event.listeners: a comma-separated list of event listeners for the metastore; empty by default;

hive.metastore.partition.inherit.table.properties: the list of table-property keys that newly created partitions automatically inherit; the default is empty;

hive.metastore.end.function.listeners: the list of listeners invoked when a metastore function finishes executing; the default is empty;

hive.metastore.event.expiry.duration: the expiration time of events in the event table; the default is 0;

hive.metastore.event.clean.freq: the period of the timer that cleans expired events from the metastore; the default is 0;

hive.metastore.connect.retries: the number of retries when creating a metastore connection; the default is 5;

hive.metastore.client.connect.retry.delay: how long the client waits between consecutive connection retries; the default is 1;

hive.metastore.client.socket.timeout: the client socket timeout; the default is 20 seconds;

hive.metastore.rawstore.impl: the storage implementation class of the raw metastore; the default is org.apache.hadoop.hive.metastore.ObjectStore;

hive.metastore.batch.retrieve.max: the maximum number of records that can be retrieved from the metastore in one batch; the default is 300;

hive.metastore.ds.connection.url.hook: the name of the hook used to look up the JDO connection URL; the default is javax.jdo.option.ConnectionURL;

hive.metastore.ds.retry.attempts: the number of times to retry the connection when a connection error occurs; the default is 1;

hive.metastore.ds.retry.interval: the interval between metastore connection retries; the default is 1000 milliseconds;

hive.metastore.server.min.threads: the minimum number of worker threads in the Thrift service pool; the default is 200;

hive.metastore.server.max.threads: the maximum number of threads; the default is 100000;

hive.metastore.server.tcp.keepalive: whether the metastore server uses TCP keepalive on its connections; keepalive can prevent half-open connections from accumulating. The default is true;

hive.metastore.sasl.enabled: the security policy of the metastore Thrift interface. When enabled, the interface is secured with SASL and clients must authenticate with the Kerberos mechanism; disabled (false) by default;

hive.metastore.kerberos.keytab.file: when SASL is enabled, the path to the Kerberos keytab file; the default is empty;

hive.metastore.kerberos.principal: the Kerberos principal; the _HOST part is replaced dynamically. The default is hive-metastore/_HOST@EXAMPLE.COM;

hive.metastore.cache.pinobjtypes: the metastore object types kept in the cache, separated by commas; the default is Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order;

hive.metastore.authorization.storage.checks: whether the metastore checks permissions for operations such as drop partition; the default is false;

hive.metastore.schema.verification: enforce metastore schema consistency. If enabled, the version information stored in the metastore is verified against the version in the Hive jars, and automatic schema migration is disabled, so the user must upgrade Hive and migrate the schema manually. If disabled, only a warning is issued when the versions differ. The default is false;

hive.index.compact.file.ignore.hdfs: if enabled, the HDFS addresses stored in the index file are ignored at runtime; if the data has been migrated, the index file remains usable. The default is false;

hive.optimize.index.filter.compact.minsize: the minimum input size at which compact indexes are applied automatically; the default is 5368709120;

hive.optimize.index.filter.compact.maxsize: same as above with the opposite meaning; a negative value means positive infinity. The default is -1;

hive.index.compact.query.max.size: the maximum amount of data a query using a compact index may retrieve; the default is 10737418240 bytes; a negative value means infinity;

hive.index.compact.query.max.entries: the maximum number of index entries that may be read when querying with a compact index; the default is 10000000; a negative value means infinity;

hive.index.compact.binary.search: whether binary search is used for index-entry lookups in the index table; the default is true;

hive.exec.concatenate.check.index: if set to true, an error is thrown when ALTER TABLE tbl_name CONCATENATE is run on a table or partition that has an index; this helps users avoid having to drop and rebuild the index;

hive.stats.dbclass: the database used to store temporary Hive statistics; the default is jdbc:derby;

hive.stats.autogather: automatically gather statistics during INSERT OVERWRITE commands; the default is true;

hive.stats.jdbcdriver: the JDBC driver of the database that temporarily stores Hive statistics;

hive.stats.dbconnectionstring: the connection string of the temporary statistics database; the default is jdbc:derby:;databaseName=TempStatsStore;create=true;

hive.stats.defaults.publisher: if dbclass is neither jdbc nor hbase, this class is used as the default publisher; it must implement the StatsPublisher interface. The default is empty;

hive.stats.defaults.aggregator: if dbclass is neither jdbc nor hbase, this aggregator is used; it must implement the StatsAggregator interface. The default is empty;

hive.stats.jdbc.timeout: the JDBC connection timeout; the default is 30 seconds;

hive.stats.retries.max: the maximum number of retries when an exception occurs while publishing or aggregating statistics to the update database; the default is 0, meaning no retries;

hive.stats.retries.wait: the wait between retries; the default is 3000 milliseconds;

hive.client.stats.publishers: a comma-separated list of statistics publisher classes for jobs that perform counts; the default is empty. They must implement the org.apache.hadoop.hive.ql.stats.ClientStatsPublisher interface;

hive.client.stats.counters: of no practical use;

hive.security.authorization.enabled: whether authorization is enabled for the Hive client; the default is false;

hive.security.authorization.manager: the authorization manager class for the Hive client; the default is org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider. Custom implementations must implement org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider;

hive.security.authenticator.manager: the authentication manager class for the Hive client; the default is org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator. Custom implementations must implement org.apache.hadoop.hive.ql.security.HiveAuthenticatorProvider;

hive.security.authorization.createtable.user.grants: privileges automatically granted to users when a table is created; the default is empty;

hive.security.authorization.createtable.group.grants: same as above, but automatically granted to groups; the default is empty;

hive.security.authorization.createtable.role.grants: same as above, but automatically granted to roles; the default is empty;

hive.security.authorization.createtable.owner.grants: same as above, but automatically granted to the owner; the default is empty;

hive.security.metastore.authorization.manager: the authorization manager class for the metastore; the default is org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider. Custom implementations must implement the org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider interface; the same package also provides org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider, which uses HDFS permissions for authorization instead of Hive's grant-based approach;

hive.security.metastore.authenticator.manager: the authentication manager class for the metastore; the default is org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator. Custom implementations must implement the org.apache.hadoop.hive.ql.security.HiveAuthenticatorProvider interface;

hive.metastore.pre.event.listeners: the list of event listener classes run before the metastore performs any database operation;

fs.har.impl: the implementation class for accessing Hadoop Archives; it is not compatible with Hadoop versions below 0.20. The default is org.apache.hadoop.hive.shims.HiveHarFileSystem;

hive.archive.enabled: whether archive operations are allowed; the default is false;

hive.archive.har.parentdir.settable: when creating a HAR file, a parent directory must be provided and currently has to be set manually; newer Hadoop versions will support setting it automatically. The default is false;

hive.support.concurrency: whether Hive supports concurrency; the default is false. To support read/write locks, ZooKeeper must be running;

hive.lock.mapred.only.operation: controls whether locks are acquired only for operations that launch map/reduce jobs; the default is false;

hive.lock.numretries: the number of attempts made when acquiring a lock; the default is 100;

hive.lock.sleep.between.retries: the sleep time between retries; the default is 60 seconds;

hive.zookeeper.quorum: the list of ZooKeeper addresses; the default is empty;

hive.zookeeper.client.port: the port for connecting to the ZooKeeper server; the default is 2181;

hive.zookeeper.session.timeout: the ZooKeeper client session timeout; the default is 600000;

hive.zookeeper.namespace: the parent node under which all ZooKeeper nodes are created; the default is hive_zookeeper_namespace;

hive.zookeeper.clean.extra.nodes: clean up all extra nodes at the end of a session;

hive.cluster.delegation.token.store.class: the storage implementation class for delegation tokens; the default is org.apache.hadoop.hive.thrift.MemoryTokenStore. It can be set to org.apache.hadoop.hive.thrift.ZooKeeperTokenStore for load-balanced clusters;

hive.cluster.delegation.token.store.zookeeper.connectString: the ZooKeeper connection string for token storage; the default is localhost:2181;

hive.cluster.delegation.token.store.zookeeper.znode: the ZooKeeper node and path for token storage; the default is /hive/cluster/delegation;

hive.cluster.delegation.token.store.zookeeper.acl: the ACL for token storage; the default is sasl:hive/_HOST@EXAMPLE.COM:cdrwa,sasl:hive/_HOST@EXAMPLE.COM:cdrwa;

hive.use.input.primary.region: when creating a table from an input table, create it in the input table's primary region; the default is true;

hive.default.region.name: the name of the default region; the default is default;

hive.region.properties: the default file system and jobtracker of a region; the default is empty;

hive.cli.print.header: whether to print column names when outputting query results; the default is false;
hive.cli.print.current.db: whether the Hive prompt includes the current database; the default is false;

hive.hbase.wal.enabled: whether writes to HBase force a write-ahead log (WAL) entry; the default is true;

hive.hwi.war.file: the path of the war file used by the Hive web interface; the default is lib/hive-hwi-xxxx(version).war;

hive.hwi.listen.host: the host address HWI listens on; the default is 0.0.0.0;

hive.hwi.listen.port: the port HWI listens on; the default is 9999;

hive.test.mode: whether Hive runs in test mode; the default is false;

hive.test.mode.prefix: the prefix string for table names when running in test mode; the default is test_;

hive.test.mode.samplefreq: if Hive runs in test mode and a table is not bucketed, the sampling frequency; the default is 32;

hive.test.mode.nosamplelist: the list of tables that are not sampled when running in test mode; the default is empty;

Origin blog.csdn.net/Baron_ND/article/details/113631977