CSV content being read as null with Spark

A Beginner :

I am trying to read a CSV file so that I can query it using Spark SQL. The CSV looks like this:

16;10;9/6/2018

The CSV file has no header row, but we know that the first column is a department code, the second column is a building code, and the third column is a date in the format m/d/YYYY.
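For reference, here is a plain-Java sketch (no Spark involved) of what one line should become once each column has its intended type. The class names LineParseSketch and ParsedLine are hypothetical, and the date is parsed with java.time's DateTimeFormatter using M/d/yyyy, on the assumption that the dates are month first, as the m/d/YYYY description suggests.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LineParseSketch {

    // Hypothetical holder for one parsed line; fields mirror the
    // department / building / date columns described in the question.
    static final class ParsedLine {
        final int department;
        final int building;
        final LocalDate date;

        ParsedLine(int department, int building, LocalDate date) {
            this.department = department;
            this.building = building;
            this.date = date;
        }
    }

    // Upper-case M is month (lower-case m would be minutes); single letters
    // accept one- or two-digit values such as 9 or 12.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Splits a ';'-delimited line and converts each column to its type.
    static ParsedLine parse(String line) {
        String[] cols = line.split(";");
        if (cols.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        return new ParsedLine(
                Integer.parseInt(cols[0].trim()),
                Integer.parseInt(cols[1].trim()),
                LocalDate.parse(cols[2].trim(), DATE));
    }

    public static void main(String[] args) {
        ParsedLine p = parse("16;10;9/6/2018");
        // Prints: 16 10 2018-09-06
        System.out.println(p.department + " " + p.building + " " + p.date);
    }
}
```

This is the target a correctly configured Spark reader should reproduce for the sample line: department 16, building 10, and the date 2018-09-06.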

I wrote the following code to load the CSV file with a custom schema:

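// Imports used by this snippet
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Explicit schema: the file has no header row, so names and types come from here
StructType sch = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("department", DataTypes.IntegerType, true),
        DataTypes.createStructField("building", DataTypes.IntegerType, false),
        DataTypes.createStructField("date", DataTypes.DateType, true)
});
Dataset<Row> csvLoad = sparkSession.read().format("csv")
        .option("delimiter", ";")
        .schema(sch)
        .option("header", "false")
        .load(somefilePath);
csvLoad.show(2);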

When I call csvLoad.show(2), it only shows the following output:

+----------+--------+----+
|department|building|date|
+----------+--------+----+
|      null|    null|null|
|      null|    null|null|
+----------+--------+----+

Can anyone tell me what is wrong with the code? I am using Spark 2.4.

TheWhiteRabbit :

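The issue is with your date field. Spark's CSV reader expects dates in its default dateFormat, yyyy-MM-dd, so a value like 9/6/2018 cannot be parsed as DateType. In Spark 2.4 the reader's default PERMISSIVE mode turns a malformed row into a row of all nulls, which is why department and building come back as null too, even though they are valid integers. Since your dates use a custom format, you need to pass it with the dateFormat option. Note that the pattern letters are case-sensitive: M is month (m is minute) and yyyy is the calendar year (YYYY is the week-based year):

Dataset<Row> csvLoad = sparkSession.read().format("csv")
        .option("delimiter", ";")
        .schema(sch)
        .option("header","false")
        .option("dateFormat", "M/d/yyyy")
        .load(somefilePath);

This results in the following output:

+----------+--------+----------+
|department|building|      date|
+----------+--------+----------+
|        16|      10|2018-09-06|
+----------+--------+----------+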
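The effect of the pattern letters is easy to check outside Spark. This is only an approximation: it uses the JDK's java.text.SimpleDateFormat, on the assumption that Spark 2.4's dateFormat option follows SimpleDateFormat-style patterns, and the names DatePatternCheck and toIsoDate are made up for illustration. It parses 9/6/2018 with Spark's default yyyy-MM-dd, with M/d/yyyy, and with m/d/YYYY as the format is written in the question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DatePatternCheck {

    // Hypothetical helper: parses `value` with a SimpleDateFormat-style
    // pattern and re-formats it as ISO yyyy-MM-dd, or returns null if the
    // value cannot be parsed with that pattern.
    static String toIsoDate(String value, String pattern) {
        // Same locale and default time zone on both sides, so the calendar
        // date survives the round trip.
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.US);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        try {
            Date parsed = in.parse(value);
            return out.format(parsed);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String raw = "9/6/2018";
        // Default-style pattern: fails on the '/' separators, so this prints null.
        System.out.println("yyyy-MM-dd -> " + toIsoDate(raw, "yyyy-MM-dd"));
        // M = month, d = day of month, yyyy = calendar year: prints 2018-09-06.
        System.out.println("M/d/yyyy   -> " + toIsoDate(raw, "M/d/yyyy"));
        // m = minute, YYYY = week year: no month is ever read, so the
        // resulting date is not September 6.
        System.out.println("m/d/YYYY   -> " + toIsoDate(raw, "m/d/YYYY"));
    }
}
```

Only M/d/yyyy gives back September 6. With m/d/YYYY the 9 is taken as minutes, so whatever date comes out is not the one in the file.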
