Several ways to create a DataFrame/Dataset in Spark SQL

1. Creating a DataFrame

1. Plain creation (from an RDD via a case class)

case class Calllog(fromtel: String, totel: String, time: String, duration: Int)

// Read the CSV into an RDD and split each line into fields
// (note: this variable holds an RDD, not a Dataset)
val rdd = sc.textFile("/user/data/calllog.csv").map(x => x.split(","))

// Map each field array onto the Calllog case class
val log = rdd.map(x => Calllog(x(0), x(1), x(2), x(3).toInt))

// toDF comes from import spark.implicits._ (pre-imported in spark-shell)
val df = log.toDF
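A quick way to confirm the conversion worked (a minimal check, not part of the original post): the column names should match the Calllog field names.

df.printSchema()
df.show(5)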

2. Creating with a SparkSession

// The code for this approach is longer, so the original post omits it; it's enough to know it exists (a hedged sketch follows below)
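To make the omitted approach concrete, here is a minimal sketch of building a DataFrame through SparkSession with an explicitly programmatic schema. This is an illustration under assumptions (the appName, file path, and field names are made up to mirror the call-log example above), not the original author's code.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumed session setup; in spark-shell a SparkSession named `spark` already exists
val spark = SparkSession.builder().appName("create-df-example").getOrCreate()

// Programmatic schema mirroring the Calllog fields used earlier
val schema = StructType(Seq(
  StructField("fromtel", StringType),
  StructField("totel", StringType),
  StructField("time", StringType),
  StructField("duration", IntegerType)
))

// Build an RDD[Row] and pair it with the schema
val rowRDD = spark.sparkContext.textFile("/user/data/calllog.csv")
  .map(_.split(","))
  .map(a => Row(a(0), a(1), a(2), a(3).toInt))

val callDF = spark.createDataFrame(rowRDD, schema)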

3. Creating from a file with a self-describing format, e.g. JSON

val df1 = spark.read.json("/user/data/people.json")

val df2 = spark.read.format("json").load("/user/data/people.json")
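Both reads produce the same DataFrame. As a quick hedged usage example (assuming people.json carries name and age fields, as in Spark's bundled sample data), the result can be inspected or queried through a temp view:

df1.printSchema()

// Register a view so the DataFrame can be queried with plain SQL
df1.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 20").show()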

2. Creating a Dataset

1. From a local collection with toDS


// mydata is not defined in the original post; a case class matching the usage is assumed
case class mydata(id: Int, name: String)

val ds = Seq(mydata(1, "tom"), mydata(2, "jerry")).toDS
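Since ds is a typed Dataset[mydata] (under the assumed case class above), its fields can be used directly in transformations, for example:

// map operates on mydata objects, not on untyped Rows
ds.map(d => d.name.toUpperCase).show()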

2. Converting a DataFrame with as[T]

// Spark infers JSON integer fields as a long type; BigInt (like Long) can receive it
case class People(name: String, age: BigInt)

val data = spark.read.json("/user/data/people.json")

// as[People] turns the untyped DataFrame into a typed Dataset[People]
data.as[People]
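A short hedged follow-up (assuming the same name/age fields, where age may be null as in Spark's sample file) showing what the typed Dataset buys you:

val people = data.as[People]

// Compile-time checked field access instead of string column names
people.filter(p => p.age != null && p.age > 20).map(_.name).show()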


Reprinted from www.cnblogs.com/xxfxxf/p/12093873.html