Switching the SerDe library for CSV files in AWS Athena

By default, a Glue Crawler uses the following SerDe serialization library as the parsing engine for CSV files:

org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

However, if the CSV uses double quotes to enclose field values, this engine treats the double quotes as part of the data and does not recognize them as the enclosing (quote) character.
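The difference between the two behaviours can be illustrated with a small sketch in Python. This is only an analogy built on Python's standard csv module, not the actual SerDe implementation, and the sample line is made up for illustration.

```python
import csv
import io

# Hypothetical CSV line whose second field is enclosed in double quotes.
line = '1,"Smith, John",42'

# Delimiter-only splitting: the quotes stay in the data, and the comma
# inside the quoted field produces an extra column.
naive_fields = line.split(",")

# Quote-aware parsing: double quotes enclose a field and are stripped.
quoted_fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(naive_fields)   # ['1', '"Smith', ' John"', '42']
print(quoted_fields)  # ['1', 'Smith, John', '42']
```

The first result resembles what a delimiter-only engine returns for quoted data. The second resembles what a quote-aware engine returns.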

The fix is to switch to a different engine:

org.apache.hadoop.hive.serde2.OpenCSVSerde

By default, this engine uses double quotes as the enclosing (quote) character and a comma as the CSV delimiter, so it can be used without configuring any SerDe parameters.
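If the table definition is managed from a script rather than the console, the switch could look roughly like the sketch below. It only rewrites a Glue-style table definition held in a Python dict. The helper name use_open_csv_serde and the sample table are hypothetical, and the call that would send the result to AWS (for example boto3's Glue update_table) is deliberately left out.

```python
import copy

OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"


def use_open_csv_serde(table_input):
    """Return a copy of a Glue-style table definition that uses OpenCSVSerde."""
    updated = copy.deepcopy(table_input)
    serde_info = updated.setdefault("StorageDescriptor", {}).setdefault("SerdeInfo", {})
    serde_info["SerializationLibrary"] = OPEN_CSV_SERDE
    # These values mirror the defaults described above, so setting them
    # explicitly is optional; they are shown only for clarity.
    serde_info["Parameters"] = {"separatorChar": ",", "quoteChar": '"'}
    return updated


# Hypothetical table definition, roughly as a crawler might have created it.
crawled = {
    "Name": "example_table",
    "StorageDescriptor": {
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "Parameters": {"field.delim": ","},
        }
    },
}

switched = use_open_csv_serde(crawled)
print(switched["StorageDescriptor"]["SerdeInfo"]["SerializationLibrary"])
```

The original dict is left untouched, which makes it easy to compare the definition before and after the change.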

However, if a column with a numeric type contains empty values, a query may then fail with an error: HIVE_BAD_DATA: Error parsing field value '' for field x: For input string: ""

Solving the "HIVE_BAD_DATA: Error parsing field value '' for field x: For input string: """ error in Athena

The solution is to click "Edit Schema" and change the columns that may contain empty values to the string type.

To summarize in two steps:

1. Edit table

2. Edit schema

Modify the column type from "bigint/double" to "string".
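The same schema change can be sketched in code for cases where clicking through the console is not practical. The helper below is a hypothetical illustration that works on a Glue-style table definition held in a Python dict. The table, column names and the helper name numeric_columns_to_string are assumptions, and writing the result back to AWS is not shown.

```python
import copy

NUMERIC_TYPES = {"bigint", "double"}


def numeric_columns_to_string(table_input, nullable_columns):
    """Return a copy where the listed bigint/double columns are typed as string."""
    updated = copy.deepcopy(table_input)
    for column in updated["StorageDescriptor"]["Columns"]:
        if column["Name"] in nullable_columns and column["Type"] in NUMERIC_TYPES:
            column["Type"] = "string"
    return updated


# Hypothetical schema: "amount" and "quantity" may be empty in the CSV.
table = {
    "Name": "example_table",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "id", "Type": "bigint"},
            {"Name": "amount", "Type": "double"},
            {"Name": "quantity", "Type": "bigint"},
            {"Name": "label", "Type": "string"},
        ]
    },
}

fixed = numeric_columns_to_string(table, {"amount", "quantity"})
print([(c["Name"], c["Type"]) for c in fixed["StorageDescriptor"]["Columns"]])
```

Only the columns named in nullable_columns are changed, so columns that never contain empty values can keep their numeric type.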

Origin blog.csdn.net/rav009/article/details/126362329