Spark writes to a Hive table

1. Problem Description
        After Spark has produced a DataFrame/Dataset, the usual way to store it in Hive is:

DF.write.format("orc").mode(SaveMode.Append).saveAsTable("default.student")


        1. If the table does not already exist in Hive, Spark automatically creates the corresponding table in Hive and stores the data.

        2. If the table already exists in Hive, there are two cases to consider.

        First case: the existing student table was originally created by a Spark program writing to Hive. In this case the write succeeds normally.

DF.write.format("orc").mode(SaveMode.Append).saveAsTable("default.student")

        Second case: the existing student table was created with a Hive DDL command. In this case an error is reported.

create table student(name string,sex string) stored as orc; 

        The error message is:

The format of the existing table default.student is `HiveFileFormat`. It doesn't match the specified format `OrcFileFormat`.;
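The failing scenario can be sketched end to end as follows. This is a minimal illustration, assuming a Hive-enabled SparkSession; the object name, app name, and sample rows are invented for the example and are not from the original post:

```scala
// Sketch: reproduce the format-mismatch error against a Hive-created table.
// Assumes a Hive metastore is available to Spark.
import org.apache.spark.sql.{SparkSession, SaveMode}

object WriteOrcToHiveTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-write-demo")
      .enableHiveSupport()  // required so saveAsTable goes through the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Suppose default.student was created in Hive beforehand with:
    //   create table student(name string, sex string) stored as orc;
    val df = Seq(("Tom", "M"), ("Amy", "F")).toDF("name", "sex")

    // This throws an AnalysisException: "The format of the existing table
    // default.student is `HiveFileFormat`. It doesn't match the specified
    // format `OrcFileFormat`."
    df.write.format("orc").mode(SaveMode.Append).saveAsTable("default.student")
  }
}
```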

2. Reason analysis
1. Error message analysis
"The format of the existing table default.student is `HiveFileFormat`" means that, from the Spark program's point of view, even an ORC table created with a command in Hive has the storage format HiveFileFormat.
"It doesn't match the specified format `OrcFileFormat`" means that the student DataFrame the Spark program is trying to store is in OrcFileFormat, which does not match the storage format of the table created in Hive.
2. Question
        Why does the student table created with a command in Hive, although stored as ORC, fail to match the ORC format specified in the Spark program?

3. Conclusion
A table created with a command in Hive, whatever its storage format, appears to a Spark program as a HiveFileFormat table.
Spark can only match specific formats such as OrcFileFormat for tables that it created itself through code.
For any table created with a Hive command, the format Spark must match is uniformly HiveFileFormat.
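One way to see this from inside Spark is to inspect the table's provider in the metastore. This is a sketch, assuming an active Hive-enabled SparkSession named `spark`; the exact output wording varies by Spark version:

```scala
// Sketch: inspect how Spark records an existing table's storage provider.
// Assumes a Hive-enabled SparkSession named `spark` is in scope.
import org.apache.spark.sql.functions.col

spark.sql("DESCRIBE FORMATTED default.student")
  .filter(col("col_name") === "Provider")
  .show(truncate = false)
// For a table created by a Hive DDL command, the provider is typically
// reported as "hive" (i.e. HiveFileFormat), even though the underlying
// data files are ORC.
```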
3. Solution
        For a table in any storage format that was created with a Hive command, write to it from Spark as follows:

DF.write.format("Hive").mode(SaveMode.Append).saveAsTable("default.student")
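With format("hive"), Spark defers to the table's own Hive SerDe, so the same append works whether the table is stored as ORC, Parquet, or text. An alternative worth noting is `insertInto`, which also writes through the existing table's format. This is a sketch with invented sample data, assuming a Hive-enabled SparkSession named `spark`:

```scala
// Alternative sketch: insertInto appends to an existing table using the
// table's own on-disk format. Note: columns are matched by POSITION,
// not by name, so the DataFrame column order must match the table schema.
import org.apache.spark.sql.SaveMode

import spark.implicits._
val df = Seq(("Tom", "M"), ("Amy", "F")).toDF("name", "sex")
df.write.mode(SaveMode.Append).insertInto("default.student")
```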


 



Origin blog.csdn.net/eylier/article/details/130505534