Understanding Spark SQL (Part 1): CLI and ThriftServer

Spark SQL provides two main tools for accessing data in Hive: the CLI and the ThriftServer. The prerequisites are that Spark is built with Hive support (compiled with the hive and hive-thriftserver options), and that the hive-site.xml configuration file (it can be copied over from the Hive installation) is placed in the $SPARK_HOME/conf directory. The main things configured in that file are the URI of the Hive metastore (needed by both the Spark CLI and the ThriftServer) and the ThriftServer-related items (such as hive.server2.thrift.bind.host, hive.server2.thrift.port, etc.). Note that if you run Hive's ThriftServer and Spark's ThriftServer on the same machine, the hive.server2.thrift.port configured for Spark must differ from the one configured for Hive, to avoid a port conflict when both are started at the same time.
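A minimal hive-site.xml sketch covering the items mentioned above (the host names and ports are assumptions borrowed from this article's examples; adjust them to your cluster):

```xml
<configuration>
  <!-- URI of the Hive metastore; needed by both the Spark CLI and the ThriftServer -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://BruceCentOS.Hadoop:9083</value>
  </property>
  <!-- Spark ThriftServer bind host and port; choose a port different from Hive's own ThriftServer -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>BruceCentOS4.Hadoop</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10003</value>
  </property>
</configuration>
```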

Before starting either the CLI or the ThriftServer, you need to start the Hive metastore. Run the following command to start it:

[root@BruceCentOS ~]# nohup hive --service metastore &

After a successful start, a RunJar process appears and listens on port 9083 (the default Hive metastore port).
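The value of hive.metastore.uris in hive-site.xml must point at this listener, written as a thrift://&lt;host&gt;:&lt;port&gt; URI. A small sketch of how such a URI breaks down into host and port (the host name is an assumption, matching this article's examples):

```shell
# Hypothetical metastore URI, as it would appear in hive.metastore.uris
METASTORE_URI="thrift://BruceCentOS.Hadoop:9083"

# Strip the thrift:// scheme, then split off host and port
HOST="${METASTORE_URI#thrift://}"
HOST="${HOST%%:*}"
PORT="${METASTORE_URI##*:}"

echo "metastore host=$HOST port=$PORT"
```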


First, let's look at the CLI, which is used through the spark-sql script. Execute the following command:

[root@BruceCentOS4 spark]# $SPARK_HOME/bin/spark-sql --master yarn

After the above command executes, it starts a Spark application in yarn-client mode.

At the same time it connects to the Hive metastore, and you can run Hive SQL statements at the spark-sql> prompt that appears.

Each SQL statement entered and executed is equivalent to running one Spark job.

In other words, running the spark-sql script starts a Spark application in yarn-client mode and then shows the spark-sql> prompt. Each SQL statement executed at the prompt runs as a job inside Spark, but all of them belong to that same application. The application keeps running, and you can keep entering SQL statements to execute jobs, until you enter "quit;", at which point spark-sql exits and the Spark application finishes.
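For example, a session at the prompt might look like this (the table name is hypothetical); each of the two statements runs as its own Spark job:

spark-sql> show tables;
spark-sql> select count(*) from test_table;
spark-sql> quit;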

 

Another, better way to use Spark SQL is through the Spark ThriftServer: you first start the Spark ThriftServer, and then use Spark SQL either with beeline or by writing your own program that connects via JDBC.

Start the Spark ThriftServer with the following command:

[root@BruceCentOS4 spark]# $SPARK_HOME/sbin/start-thriftserver.sh --master yarn

After executing the above command, a SparkSubmit process is created, which actually starts a Spark application in yarn-client mode.
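If Hive's own ThriftServer runs on the same machine (see the note earlier about port conflicts), the conflicting settings can also be overridden on the command line, since start-thriftserver.sh accepts --hiveconf options. A sketch, with assumed host and port values:

[root@BruceCentOS4 spark]# $SPARK_HOME/sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10003 --hiveconf hive.server2.thrift.bind.host=BruceCentOS4.Hadoop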

It provides a JDBC/ODBC interface, through which users can connect to the ThriftServer to access Spark SQL data, either with the beeline tool provided by Spark or from their own programs using JDBC. For example, after the Spark ThriftServer has started, you can access Spark SQL data with beeline using the following command:

[root@BruceCentOS3 spark]# $SPARK_HOME/bin/beeline -n root -u jdbc:hive2://BruceCentOS4.Hadoop:10003

The beeline command above connects to port 10003 on BruceCentOS4, i.e. the Spark ThriftServer. All beeline clients and JDBC programs connected to the ThriftServer share the same Spark application; each SQL statement submitted through beeline or a JDBC program is executed as one job of that application. At the prompt, enter the "!exit" command to quit beeline.
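A hypothetical beeline session against the ThriftServer might look like this (the query is assumed):

0: jdbc:hive2://BruceCentOS4.Hadoop:10003> show tables;
0: jdbc:hive2://BruceCentOS4.Hadoop:10003> !exit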

Finally, if you want to stop the ThriftServer (that is, stop its Spark application), execute the following command:

[root@BruceCentOS4 spark]# $SPARK_HOME/sbin/stop-thriftserver.sh

 

In summary, of Spark SQL's CLI and ThriftServer, the latter is the more recommended, because it is more lightweight: a single started ThriftServer (corresponding to one Spark application) can serve multiple beeline clients or JDBC client programs, whereas the CLI starts one Spark application each time and serves only a single user.


Origin www.cnblogs.com/roushi17/p/sparksql_cli_thriftserver.html