Create a Dataset from an existing collection with spark.createDataset(collection)
scala> val ds1 = spark.createDataset(1 to 10)
ds1: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> ds1.show
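Because ds1 is a typed Dataset[Int], collection-style operations apply directly to the element values. A minimal sketch, assuming the same spark-shell session (where the SparkSession is bound to `spark` and its implicits are already imported):

```scala
val ds1 = spark.createDataset(1 to 10)

// Typed transformations: the lambdas receive plain Int values.
val evens   = ds1.filter(_ % 2 == 0)  // Dataset[Int]: 2, 4, 6, 8, 10
val doubled = ds1.map(_ * 2)          // Dataset[Int]: 2, 4, ..., 20

evens.show()
```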
Create a Dataset from an existing RDD with spark.createDataset(rdd)
scala> val personRDD = sc.textFile("file:///export/person.txt")
personRDD: org.apache.spark.rdd.RDD[String] = file:///export/person.txt MapPartitionsRDD[33] at textFile at <console>:24
scala> val ds2 = spark.createDataset(personRDD)
ds2: org.apache.spark.sql.Dataset[String] = [value: string]
scala> ds2.show
+-------------+
| value|
+-------------+
|1 zhangsan 20|
| 2 lisi 29|
| 3 wangwu 25|
| 4 zhaoliu 30|
| 5 tianqi 35|
| 6 kobe 40|
+-------------+
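Each row of ds2 is a single string holding the whole line of person.txt. To work with typed fields, the lines can be split and mapped to a case class. This is a sketch assuming the whitespace-separated layout shown in the output above:

```scala
case class Person(id: Int, name: String, age: Int)

// Split each line on whitespace and build a typed Person per row.
val typedDS = ds2.map { line =>
  val fields = line.trim.split("\\s+")
  Person(fields(0).toInt, fields(1), fields(2).toInt)
}

typedDS.show()  // columns: id, name, age — one row per input line
```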
Call the toDS method on a collection of case-class instances to obtain a Dataset
scala> case class Person(name:String,age:Int)
scala> val personDataList = List(Person("zhangsan",18),Person("lisi",28))
scala> val personDS = personDataList.toDS
personDS: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
scala> personDS.show
+--------+---+
| name|age|
+--------+---+
|zhangsan| 18|
| lisi| 28|
+--------+---+
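toDS also works on an RDD of case-class instances, not only on a local collection, since spark.implicits._ (auto-imported in spark-shell) provides the conversion for both. A sketch under the same session assumptions:

```scala
// Distribute the same sample data as an RDD, then convert it.
val personRDD2 = sc.parallelize(Seq(Person("zhangsan", 18), Person("lisi", 28)))

val personDS2 = personRDD2.toDS  // Dataset[Person], same schema as personDS
personDS2.show()
```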
Convert a DataFrame into a Dataset
scala> case class Person(name:String,age:Long)
defined class Person
scala> val jsonDF = spark.read.json("file:///export/servers/spark-2.2.0-bin-2.6.0-cdh5.14.0/examples/src/main/resources/people.json")
jsonDF: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala> val jsonDS = jsonDF.as[Person]
jsonDS: org.apache.spark.sql.Dataset[Person] = [age: bigint, name: string]
scala> jsonDS.show
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
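Note that people.json contains a null age (Michael), which show can display but which would break typed operations that deserialize whole rows into Person(age: Long). A sketch of a safer variant using Option, plus the reverse Dataset-to-DataFrame conversion with toDF; the case-class name SafePerson is hypothetical, not from the source:

```scala
// Hypothetical variant: Option[Long] tolerates the null age in people.json
// when rows are deserialized by typed map/filter.
case class SafePerson(name: String, age: Option[Long])

val safeDS = jsonDF.as[SafePerson]
val adults = safeDS.filter(_.age.exists(_ >= 21))  // drops null-age rows safely
adults.show()

// The conversion also runs the other way: back to an untyped DataFrame.
val backToDF = safeDS.toDF()
```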
Diagram