Spark SQL: Distributed SQL Engine

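Spark SQL can also act as a distributed query engine through its JDBC/ODBC or command-line interface. In this mode, end users or applications interact with Spark SQL directly to run SQL queries, without writing any code.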

Running the Thrift JDBC/ODBC server

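The Thrift JDBC/ODBC server implemented here corresponds to HiveServer2 in Hive 1.2.1. You can test the JDBC server with the beeline script that ships with either Spark or Hive 1.2.1.
To start the JDBC/ODBC server, run the following command in the Spark directory: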

./sbin/start-thriftserver.sh

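This script accepts all bin/spark-submit command-line options, plus a --hiveconf option for specifying Hive properties. Run ./sbin/start-thriftserver.sh --help for the complete list of available options. By default, the server listens on localhost:10000. You can override this through either environment variables, i.e.: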

export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...

or system properties:

./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=<listening-port> \
  --hiveconf hive.server2.thrift.bind.host=<listening-host> \
  --master <master-uri> \
  ...
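
When the server is launched from a script, the --hiveconf pairs shown above can be assembled programmatically. The following is a minimal sketch, not part of Spark: the helper name thriftserver_command, the port 10015 and the master local[2] are illustrative assumptions, and the function only builds the argument list without starting anything.

```python
import shlex
from typing import Dict, List, Optional


def thriftserver_command(
    hiveconf: Dict[str, str],
    master: Optional[str] = None,
    script: str = "./sbin/start-thriftserver.sh",
) -> List[str]:
    """Build an argv list for start-thriftserver.sh (hypothetical helper).

    Each Hive property becomes its own "--hiveconf key=value" pair.
    """
    argv = [script]
    for key, value in hiveconf.items():
        argv += ["--hiveconf", f"{key}={value}"]
    if master is not None:
        argv += ["--master", master]
    return argv


if __name__ == "__main__":
    cmd = thriftserver_command(
        {
            "hive.server2.thrift.port": "10015",        # assumed example port
            "hive.server2.thrift.bind.host": "0.0.0.0",  # assumed example host
        },
        master="local[2]",  # assumed example master URI
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run from the Spark directory if the script should actually be executed.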

Now you can use beeline to test the Thrift JDBC/ODBC server:

./bin/beeline

Connect to the JDBC/ODBC server in beeline with:

beeline> !connect jdbc:hive2://localhost:10000

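Beeline will ask for a username and password. In non-secure mode, simply enter the username on your machine and a blank password. For secure mode, follow the instructions in the beeline documentation.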
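
If the beeline connection fails, it can help to first confirm that something is listening on the Thrift port at all. The sketch below is a plain TCP check using only the Python standard library; the function name is_listening is an assumption made for illustration, and a successful check only shows that the port is open, not that the process behind it is the Thrift server.

```python
import socket


def is_listening(host: str = "localhost", port: int = 10000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # localhost:10000 is the default address mentioned above.
    state = "open" if is_listening() else "closed or unreachable"
    print(f"localhost:10000 is {state}")
```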

Hive is configured by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in the conf/ directory.
You can also use the beeline script that ships with Hive.

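The Thrift JDBC server also supports sending Thrift RPC messages over HTTP transport. To enable HTTP mode, use the following settings, either as system properties or in the hive-site.xml file in conf/: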

hive.server2.transport.mode - Set this to value: http
hive.server2.thrift.http.port - HTTP port number to listen on; default is 10001
hive.server2.http.endpoint - HTTP endpoint; default is cliservice
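
For the hive-site.xml route, the three settings above go into the usual configuration/property/name/value layout. The following sketch generates such a fragment with the Python standard library; the helper name hive_site_xml is hypothetical, and the values used are simply the defaults listed above.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hive_site_xml(props: Dict[str, str]) -> str:
    """Render properties as a hive-site.xml style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hive_site_xml({
        "hive.server2.transport.mode": "http",
        "hive.server2.thrift.http.port": "10001",
        "hive.server2.http.endpoint": "cliservice",
    }))
```

The output is a single unindented line; merge the property elements into an existing hive-site.xml rather than overwriting the file.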

To test, use beeline to connect to the JDBC/ODBC server in HTTP mode with:

beeline> !connect jdbc:hive2://<host>:<port>/<database>?hive.server2.transport.mode=http;hive.server2.thrift.http.path=<http_endpoint>
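
The HTTP-mode URL is easy to mistype, so a small helper that fills in the placeholders may be useful. This is a sketch under assumptions: the function name hive2_http_url is hypothetical, the port and endpoint defaults are the ones listed above, and "default" is assumed as the database name.

```python
def hive2_http_url(
    host: str,
    port: int = 10001,
    database: str = "default",      # assumed database name
    endpoint: str = "cliservice",
) -> str:
    """Build a jdbc:hive2 URL for HTTP transport, following the template above."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f"?hive.server2.transport.mode=http"
        f";hive.server2.thrift.http.path={endpoint}"
    )


if __name__ == "__main__":
    # Paste the printed URL after "!connect" at the beeline prompt.
    print(hive2_http_url("localhost"))
```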

Running the Spark SQL CLI

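The Spark SQL CLI is a convenient tool for running the Hive metastore service in local mode and executing queries entered on the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
To start the Spark SQL CLI, run the following command in the Spark directory: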

./bin/spark-sql

Hive is configured by placing your hive-site.xml, core-site.xml and hdfs-site.xml files in the conf/ directory. Run ./bin/spark-sql --help for the complete list of available options.
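
The CLI can also be driven non-interactively. The sketch below assumes the -e (inline query) and -f (script file) options that ./bin/spark-sql --help lists among its CLI options; the helper name spark_sql_command and the sample query are illustrative, and the function only builds the argument list.

```python
from typing import List, Optional


def spark_sql_command(
    query: Optional[str] = None,
    script_file: Optional[str] = None,
    binary: str = "./bin/spark-sql",
) -> List[str]:
    """Build an argv list that runs one inline query (-e) or one SQL file (-f)."""
    if (query is None) == (script_file is None):
        raise ValueError("pass exactly one of query or script_file")
    if query is not None:
        return [binary, "-e", query]
    return [binary, "-f", script_file]


if __name__ == "__main__":
    # Sample query chosen for illustration only.
    print(spark_sql_command(query="SHOW TABLES"))
```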