Follow my WeChat official account pythonislover to get video materials on Python, big data, and SQL optimization!~

Python Big Data and SQL Optimization Notes, QQ group: 771686295

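Below are some pitfalls I hit when setting up Spark to read Hive data on Ambari 2.7 + HDP 3.0.1. Someone recently ran into the same problem I did, so I am writing it down here in the hope that anyone running the same big data stack can avoid it.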

To access Hive databases/tables from Spark Shell, two Spark configuration items need attention:

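(1) hive.metastore.uris

Set to: thrift://xxxxx:9083

(2) metastore.catalog.default

Set to: hive

This option defaults to spark, meaning Spark reads Spark SQL's own metastore_db. After changing it to hive, Spark Shell reads the Hive metastore instead, so databases and tables created through Hive SQL become accessible from Spark Shell.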
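The two settings can also be passed when a session is launched instead of being edited in Ambari. The sketch below rests on an assumption not stated in the original: both properties are given Spark's `spark.hadoop.` prefix. spark-shell/spark-submit ignore non-`spark.*` keys passed with `--conf`, and the prefix copies the key into the Hadoop configuration that the Hive client reads. `hive_catalog_conf`, `to_cli_args` and `build_session` are hypothetical helpers.

```python
# Hypothetical helpers. Property names follow the post; the "spark.hadoop."
# prefix is an assumption so that spark-shell / spark-submit accept the keys
# and forward them to the Hive client configuration.


def hive_catalog_conf(metastore_uri):
    """Spark properties that point Spark SQL at the Hive catalog."""
    return {
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        "spark.hadoop.metastore.catalog.default": "hive",
    }


def to_cli_args(conf):
    """Render properties as --conf flags for spark-shell / spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", "{}={}".format(key, value)]
    return args


def build_session(app_name, conf):
    """Create a Hive-enabled SparkSession (requires pyspark and a cluster)."""
    from pyspark.sql import SparkSession  # imported lazily, only when needed

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `"spark-shell " + " ".join(to_cli_args(hive_catalog_conf("thrift://xxxxx:9083")))` yields a command line with both `--conf` flags. The helpers keep the properties in one dict so the same values feed both the CLI and the programmatic builder.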

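Reading Hive tables still fails with an error until Hive ACID is turned off, i.e. transactional mode is disabled. HDP 3.0 creates managed Hive tables as transactional (ACID) tables by default, and Spark cannot read them directly.

When creating a table, add a table property that makes it non-transactional, as follows: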

create table xxx.***(....) stored as orc TBLPROPERTIES('transactional'='false')
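Before reading a table from Spark, you can check whether Hive created it as transactional. In this small sketch, `is_transactional` is a hypothetical helper. It takes (key, value) pairs such as the rows of `spark.sql("SHOW TBLPROPERTIES db.tbl").collect()`, where `db.tbl` is a placeholder.

```python
def is_transactional(tblproperties):
    """Return True if Hive table properties mark the table as ACID.

    `tblproperties` is any iterable of (key, value) pairs, for example the
    Row objects returned by SHOW TBLPROPERTIES (Rows unpack like tuples).
    """
    # Normalize case and whitespace, since Hive accepts 'TRUE' as well as 'true'.
    props = {str(k).strip().lower(): str(v).strip().lower() for k, v in tblproperties}
    return props.get("transactional") == "true"
```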

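To disable ACID at the cluster level, change the following Hive settings in Ambari. Background on these properties is in the Hive Transactions documentation:

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Transaction/LockManager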

| Ambari config group | Property (Ambari UI label) | Backend config parameter | Value |
|---|---|---|---|
| Advanced hive-site | Use Locking (uncheck) | hive.support.concurrency | false |
| Custom hiveserver2-site | hive.enforce.bucketing | hive.enforce.bucketing | false |
| General | Allow All Partitions to be Dynamic (no change) | hive.exec.dynamic.partition.mode | nonstrict |
| General | Transaction Manager | hive.txn.manager | org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager |
| General & Advanced hivemetastore-site | Run Compactor (uncheck) | hive.compactor.initiator.on | false |
| General & Advanced hivemetastore-site | Number of Threads Used by Compactor | hive.compactor.worker.threads | 0 |
| Advanced hive-interactive-site & Advanced hive-site | hive.strict.managed.tables | hive.strict.managed.tables | false |
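On a cluster not managed by Ambari, these properties would normally be set in Hive's XML config files. This sketch is an assumption beyond the original setup. It renders them in Hadoop's `<configuration>`/`<property>` format as a single file, although Ambari splits them across hive-site, hiveserver2-site and hivemetastore-site. `NON_ACID_SETTINGS` and `to_hadoop_xml` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Values follow the table; hive.exec.dynamic.partition.mode is marked
# "no change" there, so it is left out. An unchecked Ambari box maps to false.
NON_ACID_SETTINGS = {
    "hive.support.concurrency": "false",
    "hive.enforce.bucketing": "false",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
    "hive.compactor.initiator.on": "false",
    "hive.compactor.worker.threads": "0",
    "hive.strict.managed.tables": "false",
}


def to_hadoop_xml(props):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```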

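In PySpark, create the session with Hive support enabled:

from pyspark import SparkContext
from pyspark.sql import SparkSession

# Set the metastore URI as a JVM system property *before* any SparkContext
# exists, so the Hive client picks it up when the session is created.
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://localhost:9083")

sparkSession = (SparkSession
                .builder
                .appName('xxx')
                .master('yarn')
                # Deploy mode is decided by spark-submit; this only matches
                # a job submitted with --deploy-mode cluster.
                .config('spark.submit.deployMode', 'cluster')
                .config('spark.eventLog.enabled', 'false')
                .enableHiveSupport()  # use the Hive metastore as the catalog
                .getOrCreate())
sc = sparkSession.sparkContext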

Spark SQL is used the same way as before:

sparkSession.sql(xxxxxxx)


