A Few Things About Dataset

About Dataset

  • Evolution: SchemaRDD -> DataFrame -> Dataset
  • Working with a Dataset is almost the same as working with a DataFrame; Dataset first appeared in Spark 1.6
  • Dataset is strongly typed: every record has a type the compiler knows, such as a case class
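Since Spark 2.0, DataFrame is simply a type alias for Dataset[Row], which is why the two APIs feel almost identical. The sketch below assumes Spark 2.x or later on the classpath and local mode. The Person class and its rows are invented for illustration. It converts between the untyped and typed views and contrasts looking up a field by name on a Row with plain field access on a case class.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type for this example only.
case class Person(name: String, salary: Long)

object DataFrameVsDataset {
  // Local session for experimenting; "local[*]" runs Spark inside this JVM.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("dataframe-vs-dataset")
    .master("local[*]")
    .getOrCreate()

  // Brings in the implicit Encoders needed by toDF() and as[...].
  import spark.implicits._

  // A DataFrame: rows are generic Row objects, fields are found by name at runtime.
  lazy val df: DataFrame = Seq(Person("Michael", 3000L), Person("Andy", 4500L)).toDF()

  // DataFrame is only an alias, so this assignment compiles with no conversion.
  lazy val sameThing: Dataset[Row] = df

  // as[Person] gives the rows a static type the compiler knows about.
  lazy val ds: Dataset[Person] = df.as[Person]

  // toDF() drops the static type again.
  lazy val backToDf: DataFrame = ds.toDF()

  // Untyped: the column name is a String and the caller chooses the result type.
  def firstNameUntyped: String = sameThing.collect().head.getAs[String]("name")

  // Typed: ordinary field access on a Person object.
  def firstNameTyped: String = ds.collect().head.name
}
```

Both accessors return the same value; the difference is that a typo in "name" inside getAs is only noticed when the code runs, while a typo in .name is rejected by the compiler.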

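Why did Spark introduce Dataset?

Take a SQL statement such as "selec a from table": the keyword "selec" is misspelled (a syntax error), and the column name "a" might be wrong as well (an analysis error). Which API you use determines when such mistakes are discovered. With a SQL string, both surface only at runtime. With the DataFrame API, syntax errors are caught by the compiler, but a wrong column name still surfaces only at runtime. With typed Dataset operations, both are caught at compile time.

Dataset was introduced so that errors that would otherwise appear only at runtime, such as a wrong column name, are caught at compile time, before you submit a job and request cluster resources.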
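To make the difference in error timing concrete, here is a minimal sketch assuming Spark 2.x or later in local mode. The Employee class, its data, and the misspelled column "nmae" are invented for the demonstration. A string-based select with the typo compiles fine and is rejected by Spark's analyzer at runtime with an AnalysisException; the typed equivalent is only shown as a comment, because it would not compile at all.

```scala
import org.apache.spark.sql.{AnalysisException, Dataset, SparkSession}

// Hypothetical record type for this example only.
case class Employee(name: String, salary: Long)

object ErrorTiming {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("error-timing")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._

  lazy val ds: Dataset[Employee] =
    Seq(Employee("Michael", 3000L), Employee("Andy", 4500L)).toDS()

  // Untyped, DataFrame-style: "nmae" is just a String, so the compiler accepts it.
  // Spark's analyzer rejects it when the query is built, i.e. at runtime.
  def untypedTypo(): Either[String, Long] =
    try Right(ds.select("nmae").count())
    catch { case e: AnalysisException => Left(e.getMessage) }

  // Typed, Dataset-style: the same typo written as a lambda,
  //   ds.map(_.nmae)
  // is a compile error because Employee has no field called nmae,
  // so it can never reach a cluster. The correct version runs normally:
  def typedCorrect(): Long = ds.map(_.name).count()
}
```

In a submitted application the runtime failure in untypedTypo would only happen after the job had been submitted and resources granted, which is exactly the cost the typed API avoids.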

Dataset vs. DataFrame in practice

scala> case class People(name: String, salary: String)
defined class People

scala> val ds = spark.read.format("JSON").load("/user/hadoop/examples/src/main/resources/employees.json").as[People]
ds: org.apache.spark.sql.Dataset[People] = [name: string, salary: bigint]

scala> ds.select("name").show() // same call as on a DataFrame; the column name is only checked at runtime
+-------+
|   name|
+-------+
|Michael|
|   Andy|
| Justin|
|  Berta|
+-------+

scala> ds.map(_.name).show() // idiomatic Dataset usage: the field access _.name is checked at compile time
+-------+
|  value|
+-------+
|Michael|
|   Andy|
| Justin|
|  Berta|
+-------+
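The transcript relies on spark-shell, which creates the spark session and imports spark.implicits._ automatically. In a compiled application both must be done explicitly, otherwise .as[People] and .map cannot find an Encoder. The following is a minimal standalone sketch assuming Spark 2.x or later, local mode, and a JSON Lines file whose path is passed as the first argument. Here salary is declared as Long to match the bigint column Spark infers from the JSON integers.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same shape as the People class in the transcript; salary is Long to match bigint.
case class People(name: String, salary: Long)

object PeopleApp {
  // Outside spark-shell you build the session yourself.
  // Assumption: local mode, for trying the example out on one machine.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("people-dataset")
    .master("local[*]")
    .getOrCreate()

  // spark-shell does this import for you; an application must do it explicitly.
  import spark.implicits._

  // Read JSON Lines and attach the People type.
  def load(path: String): Dataset[People] =
    spark.read.json(path).as[People]

  // Untyped column selection, identical to the DataFrame API.
  def namesUntyped(ds: Dataset[People]): Seq[String] =
    ds.select("name").as[String].collect().toSeq

  // Typed transformation: the compiler checks that People has a name field.
  def namesTyped(ds: Dataset[People]): Seq[String] =
    ds.map(_.name).collect().toSeq

  def main(args: Array[String]): Unit = {
    val ds = load(args(0))
    ds.select("name").show()
    ds.map(_.name).show()
    spark.stop()
  }
}
```

Packaged into a jar (the jar name is up to you), it could be launched with something like spark-submit --class PeopleApp your-app.jar /user/hadoop/examples/src/main/resources/employees.json.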

Source: www.cnblogs.com/xuziyu/p/11139570.html