1. SparkSQLDriver
For each SQL statement, except those handled by a processor from CommandProcessorFactory (such as dfs), a SparkSQLDriver object is created, and then its init method and run method are called.
override def run(command: String): CommandProcessorResponse = {
  try {
    val substitutorCommand = SQLConf.withExistingConf(context.conf) {
      new VariableSubstitution().substitute(command)
    }
    context.sparkContext.setJobDescription(substitutorCommand)
    val execution = context.sessionState.executePlan(context.sql(command).logicalPlan)
    hiveResponse = SQLExecution.withNewExecutionId(execution) {
      hiveResultString(execution.executedPlan)
    }
    tableSchema = getResultSetSchema(execution)
    new CommandProcessorResponse(0)
  } catch {
    //
  }
}
The key line is:
val execution = context.sessionState.executePlan(context.sql(command).logicalPlan)
First, context.sql(command) is executed. The SQLContext.sql method simply delegates to SparkSession:
def sql(sqlText: String): DataFrame = sparkSession.sql(sqlText)
sparkSession.sql
Here plan is the parsed unresolved logical plan:
def sql(sqlText: String): DataFrame = withActive {
  val tracker = new QueryPlanningTracker
  val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
    sessionState.sqlParser.parsePlan(sqlText)
  }
  Dataset.ofRows(self, plan, tracker)
}
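The tracker-based phase measurement above can be sketched in isolation. The following is a hypothetical miniature (PhaseTracker and its string phase names are simplified stand-ins, not Spark's QueryPlanningTracker API):

```scala
import scala.collection.mutable

// Simplified stand-in for QueryPlanningTracker: run a block of code,
// record how long the named phase took, and return the block's result.
class PhaseTracker {
  val durationsNs = mutable.Map.empty[String, Long]

  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    durationsNs(phase) = durationsNs.getOrElse(phase, 0L) + (System.nanoTime() - start)
    result
  }
}

val tracker = new PhaseTracker
// Stand-in for sessionState.sqlParser.parsePlan(sqlText)
val plan = tracker.measurePhase("parsing") { "unresolved-logical-plan" }
```

The by-name parameter `f: => T` is what lets the block be timed: it is not evaluated until measurePhase runs it between the two System.nanoTime() calls.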
Dataset.ofRows
In ofRows, qe.assertAnalyzed() triggers analysis of the plan:
def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan, tracker: QueryPlanningTracker)
  : DataFrame = sparkSession.withActive {
  val qe = new QueryExecution(sparkSession, logicalPlan, tracker)
  qe.assertAnalyzed()
  new Dataset[Row](qe, RowEncoder(qe.analyzed.schema))
}
context.sql(command).logicalPlan
This is the logicalPlan field of the Dataset. The code is as follows:
@transient private[sql] val logicalPlan: LogicalPlan = {
  val plan = queryExecution.commandExecuted
  if (sparkSession.sessionState.conf.getConf(SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED)) {
    val dsIds = plan.getTagValue(Dataset.DATASET_ID_TAG).getOrElse(new HashSet[Long])
    dsIds.add(id)
    plan.setTagValue(Dataset.DATASET_ID_TAG, dsIds)
  }
  plan
}
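The tagging pattern above (getTagValue / setTagValue with a java.util.HashSet of Dataset ids) can be illustrated with a toy plan class. ToyPlan and the string tag key below are hypothetical simplifications, not Catalyst's TreeNode API:

```scala
import java.util.HashSet
import scala.collection.mutable

// Toy plan node with a tag map, mimicking TreeNode's get/setTagValue.
class ToyPlan {
  private val tags = mutable.Map.empty[String, Any]
  def getTagValue[T](tag: String): Option[T] = tags.get(tag).map(_.asInstanceOf[T])
  def setTagValue[T](tag: String, value: T): Unit = tags(tag) = value
}

val DATASET_ID_TAG = "dataset_id"
val plan = new ToyPlan
val id = 7L // stand-in for this Dataset's id

// Same shape as the snippet above: fetch-or-create the id set, add, store back.
val dsIds = plan.getTagValue[HashSet[Long]](DATASET_ID_TAG).getOrElse(new HashSet[Long])
dsIds.add(id)
plan.setTagValue(DATASET_ID_TAG, dsIds)
```

The getOrElse-then-setTagValue dance means the first Dataset built on a plan creates the set, and later Datasets sharing that plan just add their ids to it, which is how ambiguous self-joins can be detected.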
QueryExecution
The logicalPlan of the Dataset uses queryExecution.commandExecuted. This field is lazy, so it is initialized the first time it is used. It in turn uses analyzed, which is also lazy.
analyzed converts the unresolved logical plan into a resolved logical plan; commandExecuted then runs eagerlyExecuteCommands(analyzed).
The first access to commandExecuted produces a CommandResult object. Any later access to the Dataset's logicalPlan returns that same CommandResult object.
lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) {
  // We can't clone `logical` here, which will reset the `_analyzed` flag.
  sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
}

lazy val commandExecuted: LogicalPlan = mode match {
  case CommandExecutionMode.NON_ROOT => analyzed.mapChildren(eagerlyExecuteCommands)
  case CommandExecutionMode.ALL => eagerlyExecuteCommands(analyzed)
  case CommandExecutionMode.SKIP => analyzed
}
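The caching behavior described here is plain Scala lazy val semantics: the body runs once on first access and the result is cached for every later access. A minimal illustration (ToyQueryExecution is a made-up name):

```scala
var initCount = 0

class ToyQueryExecution {
  // Like commandExecuted: computed on first access, cached afterwards.
  lazy val commandExecuted: String = {
    initCount += 1
    "CommandResult" // stand-in for the CommandResult plan node
  }
}

val qe = new ToyQueryExecution
val first = qe.commandExecuted  // body runs here
val second = qe.commandExecuted // cached value; the body does not run again
```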
eagerlyExecuteCommands replaces each Command node in the plan with a CommandResult object holding the already-computed rows:
private def eagerlyExecuteCommands(p: LogicalPlan) = p transformDown {
  case c: Command =>
    val qe = sparkSession.sessionState.executePlan(c, CommandExecutionMode.NON_ROOT)
    val result = SQLExecution.withNewExecutionId(qe, Some(commandExecutionName(c))) {
      qe.executedPlan.executeCollect()
    }
    CommandResult(
      qe.analyzed.output,
      qe.commandExecuted,
      qe.executedPlan,
      result)
  case other => other
}
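The node replacement eagerlyExecuteCommands performs can be mimicked on a toy plan tree. Everything below (Plan, Project, the transformDown helper) is a simplified sketch of the idea, not Catalyst's TreeNode machinery:

```scala
sealed trait Plan
case class Command(name: String) extends Plan
case class CommandResult(name: String, rows: Seq[String]) extends Plan
case class Project(child: Plan) extends Plan

// Top-down transform: apply the rule to a node, then recurse into children.
def transformDown(p: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
  val applied = if (rule.isDefinedAt(p)) rule(p) else p
  applied match {
    case Project(child) => Project(transformDown(child)(rule))
    case leaf           => leaf
  }
}

// "Execute" each Command eagerly, keeping its rows in a CommandResult node.
def eagerlyExecuteCommands(p: Plan): Plan = transformDown(p) {
  case Command(name) => CommandResult(name, Seq(s"rows-of-$name"))
}

val result = eagerlyExecuteCommands(Project(Command("SHOW TABLES")))
```

After the transform, the Command leaf is gone and a CommandResult carrying the rows sits in its place, which is exactly why a later visit to the plan sees a CommandResult rather than re-running the command.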
SparkSQLDriver.run
Continue back to the main process. The argument passed to context.sessionState.executePlan is now a CommandResult object:
val execution = context.sessionState.executePlan(context.sql(command).logicalPlan)
hiveResponse = SQLExecution.withNewExecutionId(execution) {
  hiveResultString(execution.executedPlan)
}
sessionState.executePlan
The default mode is CommandExecutionMode.ALL. plan is a CommandResult object.
def executePlan(
    plan: LogicalPlan,
    mode: CommandExecutionMode.Value = CommandExecutionMode.ALL): QueryExecution =
  createQueryExecution(plan, mode)
protected def createQueryExecution:
  (LogicalPlan, CommandExecutionMode.Value) => QueryExecution =
  (plan, mode) => new QueryExecution(session, plan, mode = mode)
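The indirection through createQueryExecution (a function value rather than a direct constructor call) lets a session-state builder swap in a different QueryExecution factory. A hypothetical miniature of this pattern, with made-up names:

```scala
case class ToyQueryExecution(plan: String, mode: String)

// Function-valued factory, mirroring the createQueryExecution hook above.
// A subclass or test could assign a different function here.
val createQueryExecution: (String, String) => ToyQueryExecution =
  (plan, mode) => ToyQueryExecution(plan, mode)

// Mirrors executePlan: defaults the mode and delegates to the factory.
def executePlan(plan: String, mode: String = "ALL"): ToyQueryExecution =
  createQueryExecution(plan, mode)

val execution = executePlan("CommandResult-plan")
```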
So after val execution = context.sessionState.executePlan(context.sql(command).logicalPlan) runs, execution is a QueryExecution object.
SparkSQLDriver.run
hiveResponse = SQLExecution.withNewExecutionId(execution) {
  hiveResultString(execution.executedPlan)
}
tableSchema = getResultSetSchema(execution)