Notes on Hive

These notes collect Hive points and pitfalls that are easy to overlook; more may be added in later updates.

 

1, Hive uses an InputFormat object to split the input stream into records and an OutputFormat object to format records into the output stream. A SerDe parses each record into columns when reading data and encodes columns back into records when writing data.
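The division of labour in item 1 can be sketched roughly as follows. This is a toy Python model, not Hive's Java API: the function names `split_records`, `deserialize`, `serialize`, and `write_records` are hypothetical, and a newline-delimited, tab-separated text layout is assumed.

```python
from typing import Iterable, Iterator, List

FIELD_DELIMITER = "\t"  # assumed column separator for this sketch


def split_records(stream: str) -> Iterator[str]:
    """Role of the InputFormat: cut the raw input stream into records (here, lines)."""
    for line in stream.splitlines():
        if line:
            yield line


def deserialize(record: str) -> List[str]:
    """Role of the SerDe on read: parse one record into columns."""
    return record.split(FIELD_DELIMITER)


def serialize(columns: Iterable[str]) -> str:
    """Role of the SerDe on write: encode columns back into one record."""
    return FIELD_DELIMITER.join(columns)


def write_records(records: Iterable[str]) -> str:
    """Role of the OutputFormat: lay the records out in the output stream."""
    return "".join(record + "\n" for record in records)


raw = "1\talice\n2\tbob\n"
rows = [deserialize(record) for record in split_records(raw)]
round_trip = write_records(serialize(row) for row in rows)
```

Here `rows` holds the parsed columns of each record, and `round_trip` is the same data encoded back into the assumed text layout.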

2, Using a UDF on the partition column in a WHERE condition may result in a full table scan. A timestamp conversion function is a typical example, because the conversion has to be applied row by row.
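The effect described in item 2 can be illustrated with a small simulation. This is only a toy model of partition pruning in Python, not Hive's query planner: the `partitions` dictionary, the `dt`-style day keys, and both scan functions are hypothetical names introduced for the sketch.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (epoch seconds, payload)

# Hypothetical table partitioned by a day string such as '2019-11-03'.
partitions: Dict[str, List[Row]] = {
    "2019-11-02": [(1572652800, "a"), (1572700000, "b")],
    "2019-11-03": [(1572739200, "c")],
    "2019-11-04": [(1572825600, "d")],
}


def to_day(epoch: int) -> str:
    """Stand-in for a timestamp conversion function (UTC day of an epoch value)."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


def scan_with_row_function(day: str) -> Tuple[List[Row], int]:
    """Condition depends on a per-row conversion, so every partition is read."""
    partitions_read = 0
    matches: List[Row] = []
    for rows in partitions.values():
        partitions_read += 1
        matches.extend(row for row in rows if to_day(row[0]) == day)
    return matches, partitions_read


def scan_with_partition_literal(day: str) -> Tuple[List[Row], int]:
    """Condition compares the partition value with a constant, so other partitions are skipped."""
    if day not in partitions:
        return [], 0
    return list(partitions[day]), 1


slow_matches, slow_reads = scan_with_row_function("2019-11-03")
fast_matches, fast_reads = scan_with_partition_literal("2019-11-03")
```

Both scans return the same rows, but the first one touches all three partitions while the second touches only one. In practice the usual remedy is to compute the converted value once, outside the query, and compare the partition column against that constant.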

3, Bucketing

4, Views can be used to reduce query complexity and to restrict the visible data through a filter condition.

5, Hive can create indexes, but in practice they are rarely needed.

6, EXPLAIN EXTENDED

7, Parallel execution and JVM reuse

8, Groovy

9, HCatalog

10, Security

Components such as the Hive CLI create local files such as .hivehistory, as well as entries in the /tmp directory and the hadoop.tmp.dir directory.

The Hive metastore is accessed either by connecting directly to a JDBC database or through Thrift, and in both cases operations are carried out under the user's identity.

The Hadoop file permission model (user, group, other) is very different from the privilege models of many other databases, which usually control access at the table, row, or column level by granting and revoking privileges.
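As a rough illustration of the user, group, other model, the sketch below checks a request against POSIX-style mode bits, which is the scheme HDFS file permissions follow. The function `can_access` and the user and group names are hypothetical, and special cases such as the superuser or ACLs are deliberately ignored.

```python
from typing import Set

PERMISSION_BITS = {"r": 4, "w": 2, "x": 1}


def can_access(mode: int, owner: str, group: str,
               user: str, user_groups: Set[str], want: str) -> bool:
    """Return True if `user` holds the `want` permission ('r', 'w' or 'x').

    Exactly one permission class applies: the owner's bits if the user owns
    the file, otherwise the group's bits if the user belongs to the file's
    group, otherwise the bits for everyone else.
    """
    if user == owner:
        shift = 6
    elif group in user_groups:
        shift = 3
    else:
        shift = 0
    return bool((mode >> shift) & PERMISSION_BITS[want])


# A hypothetical warehouse directory: rwx for the owner, r-x for the group, nothing for others.
mode = 0o750
owner_can_write = can_access(mode, "hive", "analysts", "hive", {"hive"}, "w")
member_can_read = can_access(mode, "hive", "analysts", "alice", {"analysts"}, "r")
member_can_write = can_access(mode, "hive", "analysts", "alice", {"analysts"}, "w")
outsider_can_read = can_access(mode, "hive", "analysts", "bob", {"guests"}, "r")
```

The check works on whole files and directories only, which is why it cannot express the table, row, or column level grants found in other databases.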

hive.files.umask.value

hive.metastore.authorization.storage.checks

hive.metastore.execute.setugi

Hive's authorization mode is disabled by default. The related properties are hive.security.authorization.enabled and hive.security.authorization.createtable.owner.grants.

hive.start.cleanup.scratchdir defaults to false; when it is set to true, the scratch (temporary) directory is cleaned out each time the HiveServer service restarts.

Default authorization grants can be configured through properties.

11, Hive is a fat client: each web interface, CLI, and Thrift server instance is completely independent of the other instances.

12, Hive integrates with ZooKeeper to implement locking; the relevant properties are hive.zookeeper.quorum and hive.support.concurrency.

13, HiveServiceBAction

14, Compressed archives

15, Hive Streaming (TRANSFORM)

Streaming is usually less efficient than writing a UDF or a custom InputFormat object, because serializing and deserializing the data passed to the external process is slow. It is also harder to debug.
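For reference, a streaming script is simply a program that reads rows from standard input and writes rows to standard output, with columns separated by tabs. The sketch below is a minimal example under assumed conditions: a two-column input (a name and an integer value), and the hypothetical function names `transform` and `run`.

```python
import sys
from typing import Iterable, Iterator, TextIO


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn each 'name<TAB>value' row into 'NAME<TAB>value*2'."""
    for line in lines:
        name, value = line.rstrip("\n").split("\t")
        yield f"{name.upper()}\t{int(value) * 2}"


def run(stdin: TextIO, stdout: TextIO) -> None:
    """Stream rows from stdin to stdout one at a time, as a TRANSFORM script would."""
    for row in transform(stdin):
        stdout.write(row + "\n")


# As a streaming script, the file would end with: run(sys.stdin, sys.stdout)
```

Assuming the script is saved as double_value.py and shipped with ADD FILE, it could be invoked with a query along the lines of SELECT TRANSFORM (name, value) USING 'python double_value.py' AS (name, doubled) FROM some_table; where the file, table, and column names are all hypothetical. Every row makes a round trip through text and a separate process, which is where the overhead mentioned above comes from.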

16, RegexSerDe, CSVSerDe, JSONSerDe

Origin www.cnblogs.com/GodMode/p/11789726.html