Environment:
- Hive client (version 1.2.1)
- Spark version 2.1.1.2.6.1.0-129
An existing test table, datadev.t_student, has the following fields:
col_name | data_type | comment
id       | string    |
score    | int       |
Executing the following statement in Spark SQL reports an error:
create view datadev.t_student_view as select NVL(id, 'xx') as id from datadev.t_student;
The error message is as follows:
21/12/20 16:57:44 ERROR SparkSQLDriver: Failed in [create view datadev.t_student_view as select NVL(id, 'xx') as id from datadev.t_student]
java.lang.RuntimeException: Failed to analyze the canonicalized SQL: SELECT `gen_attr_0` AS `id` FROM (SELECT nvl(t_student.`id`, 'xx') AS `gen_attr_0` FROM (SELECT `id` AS `gen_attr_1`, `score` AS `gen_attr_2` FROM `datadev`.`t_student`) AS gen_subquery_0) AS gen_subquery_1
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`t_student.id`' given input columns: [gen_attr_1, gen_attr_2]; line 1 pos 45;
'Project ['gen_attr_0 AS id#31]
+- 'SubqueryAlias gen_subquery_1
   +- 'Project ['nvl('t_student.id, xx) AS gen_attr_0#30]
      +- SubqueryAlias gen_subquery_0
         +- Project [id#32 AS gen_attr_1#28, score#33 AS gen_attr_2#29]
            +- MetastoreRelation datadev, t_student
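The error log shows what goes wrong. When creating the view, Spark SQL rewrites the query into a canonicalized form in which the innermost subquery aliases id as gen_attr_1 and score as gen_attr_2. The NVL call in the outer subquery, however, still refers to the original name t_student.`id`, which is not among the available input columns [gen_attr_1, gen_attr_2], so the analysis of the canonicalized SQL fails.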
The same statement runs without error in Hive, which suggests that the problem lies in how this Spark SQL version generates the canonicalized SQL for the view rather than in the statement itself.
Solution: Replace the NVL function with the COALESCE function in the view definition.
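As a minimal sketch of this workaround, the failing statement can be rewritten with COALESCE in place of NVL. The table and view names are the ones used earlier in this note. With two arguments, COALESCE returns the first non-NULL value, so the result should match what NVL(id, 'xx') was meant to produce.

```sql
-- Same view as in the failing statement, with NVL swapped for COALESCE.
-- COALESCE(id, 'xx') yields id when it is not NULL, otherwise 'xx'.
create view datadev.t_student_view as
select COALESCE(id, 'xx') as id
from datadev.t_student;
```

Once the view is created, a query such as select id from datadev.t_student_view limit 10; can be used to confirm that it resolves correctly.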