spark notes of Spark SQL

1. Spark SQL Overview
1.1. Spark SQL Past and Present

Shark was a large-scale data warehouse system designed for Spark and compatible with Hive. It was built on the Hive code base and worked by swapping out parts of Hive's physical execution plan. This approach let Shark users accelerate Hive queries, but Shark also inherited Hive's large and complex code base, which made it hard to optimize and maintain, and it was tied to specific versions of Spark. As performance optimization approached its upper limit and more complex analytics had to be integrated with SQL, it became clear that Hive's MapReduce-oriented design was limiting Shark's development. On July 1, 2014, at Spark Summit 2014, Databricks announced that it was ending development of Shark and shifting its focus to Spark SQL.

1.2. What is Spark SQL

Spark SQL is a Spark module for processing structured data. It provides a programming abstraction called the DataFrame and also acts as a distributed SQL query engine.

Compared with the Spark RDD API, Spark SQL has more information about the structure of the data and of the computation being performed. Spark SQL uses this information for additional optimization, which makes operations on structured data more efficient and more convenient.

There are several ways to use Spark SQL, including SQL, the DataFrame API, and the Dataset API. Whichever API or programming language you choose, they all run on the same execution engine, so you can switch easily between them. Each has its own characteristics; pick the style you prefer.

1.3. Why learn Spark SQL

We have already learned Hive, which converts Hive SQL into MapReduce jobs and submits them to the cluster, greatly reducing the complexity of writing MapReduce programs. Because the MapReduce computation model executes relatively slowly, Spark SQL came into being: it converts SQL into RDD operations and submits them to the cluster, which generally runs much faster than MapReduce.

1. Easy integration

Spark SQL queries mix seamlessly with Spark programs, and the API can be used from languages such as Java, Scala, Python, and R.

2. Unified data access

Connect to any data source in the same way.

3. Hive compatibility

Supports HiveQL syntax.

4. Standard data connection

You can connect through industry-standard JDBC or ODBC.

Origin blog.51cto.com/14473726/2429343