【Spark 17】Spark SQL Part 3: Working with Hive

Hive On Spark

The Spark distribution comes with Hive support bundled in. Does that mean there is no need to install Hive separately in order to use Hive?

Spark SQL supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly. In order to use Hive you must first run “sbt/sbt -Phive assembly/assembly” (or use -Phive for maven). This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive. Configuration of Hive is done by placing your hive-site.xml file in conf/.
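As a minimal sketch of that configuration step: a conf/hive-site.xml pointing Spark at an existing Hive metastore service might look like the following. The thrift host and port here are assumptions for illustration (9083 is the conventional metastore default), not values taken from this post:

<configuration>
  <!-- Assumed example: URI of an already-running Hive metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

If no hive-site.xml is supplied, Spark falls back to the embedded-metastore behavior described next.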

When working with Hive one must construct a HiveContext, which inherits from SQLContext, and adds support for finding tables in the MetaStore and writing queries using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.xml, the context automatically creates metastore_db and warehouse in the current directory.
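The shell session below relies on the sc that spark-shell creates for you. For a standalone application, a minimal sketch of the same setup (Spark 1.x API, matching the 1.2.0 build used in this post; the object and app names are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextExample {
  def main(args: Array[String]): Unit = {
    // In spark-shell, sc already exists; in a standalone app we build it ourselves.
    val conf = new SparkConf().setAppName("HiveContextExample")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // With no hive-site.xml on the classpath, this creates a local Derby
    // metastore_db and a warehouse directory under the current directory.
    hiveContext.sql("SHOW DATABASES").collect().foreach(println)

    sc.stop()
  }
}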

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val dbs = hiveContext.sql("show databases")

// Before doing anything, only the default database exists
scala> dbs.collect

// List all tables
scala> hiveContext.sql("show tables").collect

You can also issue queries through hiveContext's hql method:

scala> import hiveContext._

// Create a table
scala> hql("CREATE TABLE IF NOT EXISTS person(name STRING, age INT)")

scala> hql("select * from person");

scala> hql("show tables");

// Load data. When loading, what are Hive's default line terminator and field delimiter?
// Syntax for specifying the field delimiter: row format delimited fields terminated by '\t'

scala> hql("LOAD DATA LOCAL INPATH '/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/data/person.txt' INTO TABLE person;"); 

Questions:

1. In the operations above, which database is Hive actually working against?

2. If Hive has already been installed separately, can Spark be made to operate on that existing Hive deployment?

3. 

To be continued.

Reposted from bit1129.iteye.com/blog/2174614