Spark 2.3.3: 14 Ways to Create a DataFrame, with Source-Code Analysis (Part 4)

Contents

1. Problem Analysis

2. @BeanProperty Analysis

3. Creating a DataFrame from a Scala Class: Code

4. Results



1. Problem Analysis

Note: a "plain class" here means a Scala class that is not a case class. Under the hood, the framework treats such a class as a standard Java bean. A plain Scala class, however, does not expose standard Java getters and setters for its fields, so the conversion fails. This can be fixed with the @BeanProperty annotation.
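To make the failure concrete, here is a minimal sketch (the class names PlainStu and BeanStu are hypothetical, not from the original code):

import scala.beans.BeanProperty

// A plain Scala class: the compiler generates the Scala-style accessor
// `def name: String`, but no JavaBean-style `getName`. Spark's bean
// introspection therefore finds no properties and produces an empty schema.
class PlainStu(val name: String)

// With @BeanProperty the compiler additionally emits `getName`
// (and `setName` if the field were a var), which introspection can find.
class BeanStu(@BeanProperty val name: String)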

2. @BeanProperty Analysis

The annotation's source in the Scala standard library (scala/beans/BeanProperty.scala) documents exactly what it generates:

package scala.beans

/** When attached to a field, this annotation adds a setter and a getter
 *  method following the Java Bean convention. For example:
 *  {{{
 *    @BeanProperty
 *    var status = ""
 *  }}}
 *  adds the following methods to the class:
 *  {{{
 *    def setStatus(s: String) { this.status = s }
 *    def getStatus: String = this.status
 *  }}}
 *  For fields of type `Boolean`, if you need a getter named `isStatus`,
 *  use the `scala.beans.BooleanBeanProperty` annotation instead.
 */
@scala.annotation.meta.field
class BeanProperty extends scala.annotation.StaticAnnotation
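Note that for a var the annotation generates both methods, while for a val only the getter is generated (there is nothing to set). A quick illustration with a hypothetical class:

import scala.beans.BeanProperty

class Status {
  @BeanProperty var status: String = ""  // generates getStatus and setStatus
  @BeanProperty val code: Int = 0        // val: only getCode is generated
}

This is why the StuScala class below, whose fields are all vals, still works: to infer the schema and read row values, Spark only needs the getters.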

3. Creating a DataFrame from a Scala Class: Code

package blog

import scala.beans.BeanProperty

/**
 * @author: 余辉  
 * @blog: https://blog.csdn.net/silentwolfyh
 * @create: 2019-12-29 11:25
 * @description:
 **/
class StuScala(
                @BeanProperty
                val id: Int,

                @BeanProperty
                val name: String,

                @BeanProperty
                val age: Int,

                @BeanProperty
                val city: String,

                @BeanProperty
                val score: Double)

object StuScala {
  def apply(id: Int, name: String, age: Int, city: String, score: Double): StuScala =
    new StuScala(id, name, age, city, score)
}
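The companion object's apply method mirrors case-class syntax, so the driver below can build instances with StuScala(...) instead of new StuScala(...). The driver reads stu.csv, parses each line into a bean, and creates the DataFrame: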
package blog

import cn.doit.sparksql.day01.utils.SparkUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * @author: 余辉  
 * @blog: https://blog.csdn.net/silentwolfyh
 * @create: 2019-12-29 11:23
 * @description:
 **/
object DF04_Create_scala_Class {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkUtils.getSparkSession()

    // Read the CSV file as raw text lines.
    val rdd: RDD[String] = spark.sparkContext.textFile("spark_sql/doc/stu.csv")

    // Parse each comma-separated line into a StuScala bean.
    val data: RDD[StuScala] = rdd.map(line => {
      val arr = line.split(",")
      StuScala(arr(0).toInt, arr(1), arr(2).toInt, arr(3), arr(4).toDouble)
    })

    // Build the DataFrame from the RDD of beans; the schema is inferred
    // from StuScala's JavaBean getters (hence the @BeanProperty annotations).
    val frame: DataFrame = spark.createDataFrame(data, classOf[StuScala])
    frame.printSchema()
    frame.show()
  }
}
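The SparkUtils helper is not shown in the original post; a plausible minimal sketch (the actual cn.doit.sparksql.day01.utils.SparkUtils may differ) is:

package cn.doit.sparksql.day01.utils

import org.apache.spark.sql.SparkSession

object SparkUtils {
  // Build (or reuse) a local SparkSession for the demo;
  // the app name and master are assumptions, not from the original post.
  def getSparkSession(): SparkSession =
    SparkSession.builder()
      .appName("DF04_Create_scala_Class")
      .master("local[*]")
      .getOrCreate()
}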

4. Results

[Screenshot in the original post: output of printSchema() and show().]
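For reference, the schema printed by printSchema() should look roughly like this (a reconstruction, not the original screenshot; bean introspection returns properties in alphabetical order, and getters returning primitives come out non-nullable):

root
 |-- age: integer (nullable = false)
 |-- city: string (nullable = true)
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- score: double (nullable = false)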


Reprinted from blog.csdn.net/silentwolfyh/article/details/103836925