Spark SQL: the general way to load and save data

Load Data

spark.read.format("…")[.option("…")].load("…")
  • format("..."): Specify the type of data to be loaded, including "csv", "jdbc", "json", "orc", "parquet" and "textFile".
  • load("..."): In the "csv", "jdbc", "json", "orc", "parquet" and "textFile" formats, the path to load data needs to be passed in. The parquet file is loaded by default.
  • option("..."): In the "jdbc" format, the corresponding JDBC parameters, such as url, user, password and dbtable, need to be passed in.
df.write.save("D:\\develop\\workspace\\bigdata2021\\spark2021\\out")

Save Data

df.write.format("…").mode("...")[.option("…")].save("…")
  • The format and option parameters are the same as above.
  • save("..."): specifies the storage path. By default, data is saved as Snappy-compressed Parquet files.
  • mode("..."): specifies how to handle data that already exists at the target path. The default is "error" (also "errorifexists"), which throws an exception if the path already exists; "append" appends the new data; "overwrite" replaces the existing data; "ignore" leaves the existing data untouched and skips the write.
// Save with the default format (Parquet)
df.write.save("D:\\develop\\workspace\\bigdata2021\\spark2021\\out")

// Use format to save the file in a specified format
df.write.format("json").save("D:\\develop\\workspace\\bigdata2021\\spark2021\\out")

// Use mode to specify the save behavior
df.write.format("json").mode("append").save("D:\\develop\\workspace\\bigdata2021\\spark2021\\out")

df.write.format("json").mode("overwrite").save("D:\\develop\\workspace\\bigdata2021\\spark2021\\out")
