SparkSQL: Create a literal column

  When building a data model with SparkSQL, or when writing data to a database, a number of technical fields are typically needed. The values of these fields are usually supplied as literals.

SparkSQL provides a specific solution for this situation.

  The org.apache.spark.sql.functions singleton object provides a function for creating literal fields. It is defined as follows:

  /**
   * Creates a [[Column]] of literal value.
   *
   * The passed in object is returned directly if it is already a [[Column]].
   * If the object is a Scala Symbol, it is converted into a [[Column]] also.
   * Otherwise, a new [[Column]] is created to represent the literal value.
   * The difference between this function and [[lit]] is that this function
   * can handle parameterized scala types e.g.: List, Seq and Map.
   *
   * @group normal_funcs
   * @since 2.2.0
   */
  def typedLit[T : TypeTag](literal: T): Column = literal match {
    case c: Column => c
    case s: Symbol => new ColumnName(s.name)
    case _ => Column(Literal.create(literal))
  }

  As the source code comments show, this function creates a column from a literal value. If the object passed in is already a Column, it is returned directly; if it is a Scala Symbol, it is converted into a Column. Otherwise, a new Column is created to represent the literal value. The resulting column can be renamed with its as or alias method.
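The three cases described above can be sketched as follows. This is a minimal sketch, assuming Spark 2.2 or later with spark-sql on the classpath; the object name TypedLitSketch, the method addLiterals, and all column names are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, typedLit}

// Hypothetical helper object, not part of Spark.
object TypedLitSketch {
  // Expects a DataFrame that has a column named "id" (an assumption of this sketch).
  def addLiterals(df: DataFrame): DataFrame =
    df.select(
      col("id"),
      // Case 1: an existing Column is returned unchanged, then renamed with as.
      typedLit(col("id")).as("same_id"),
      // Case 2: a Scala Symbol is converted into a Column of that name.
      typedLit(Symbol("id")).alias("id_from_symbol"),
      // Case 3: any other value becomes a literal Column.
      typedLit("Y").alias("is_valid"),
      // Parameterized Scala types, as mentioned in the doc comment.
      typedLit(Seq(1, 2, 3)).alias("numbers"),
      typedLit(Map("a" -> 1)).alias("lookup")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TypedLitSketch")
      .master("local[1]")
      .getOrCreate()

    val result = addLiterals(spark.range(3).toDF("id"))
    result.printSchema()
    result.show(false)

    spark.stop()
  }
}
```

Symbol("id") is used instead of the 'id literal syntax because the latter is deprecated in newer Scala versions.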

Examples are given below:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkSqlTest {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession
      .builder()
      .appName("SparkTest")
      .master("local[4]")
      .getOrCreate()

    // Read the city table of the world database over JDBC
    val city_data_model = sparkSession
      .read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/world")
      .option("user", "root")
      .option("password", "root")
      .option("dbtable", "city")
      .load()

    val city_data_model_new_column = city_data_model
      .filter(city_data_model("countrycode") === "CHN")
      .select(
        city_data_model("name"),
        city_data_model("district"),
        city_data_model("population"),
        // Create a new column to serve as the statistics date.
        // current_timestamp() is already a Column, so typedLit returns it as is.
        typedLit(current_timestamp()).alias("statistic_time"),
        // A plain value becomes a literal column
        typedLit("Y").alias("is_valid")
      )

    city_data_model_new_column.printSchema()
    /**
     * Output of city_data_model_new_column.printSchema():
     * root
     * |-- name: string (nullable = true)
     * |-- district: string (nullable = true)
     * |-- population: integer (nullable = true)
     * |-- statistic_time: timestamp (nullable = false)
     * |-- is_valid: string (nullable = false)
     */

    city_data_model_new_column.show(10)
    /**
     * Output of city_data_model_new_column.show(10):
     * +------------------+------------+----------+--------------------+--------+
     * |              name|    district|population|      statistic_time|is_valid|
     * +------------------+------------+----------+--------------------+--------+
     * |          Shanghai|    Shanghai|   9696300|2020-02-11 14:34:...|       Y|
     * |            Peking|      Peking|   7472000|2020-02-11 14:34:...|       Y|
     * |         Chongqing|   Chongqing|   6351600|2020-02-11 14:34:...|       Y|
     * |           Tianjin|     Tianjin|   5286800|2020-02-11 14:34:...|       Y|
     * |             Wuhan|       Hubei|   4344600|2020-02-11 14:34:...|       Y|
     * |            Harbin|Heilongjiang|   4289800|2020-02-11 14:34:...|       Y|
     * |          Shenyang|    Liaoning|   4265200|2020-02-11 14:34:...|       Y|
     * |Kanton [Guangzhou]|   Guangdong|   4256300|2020-02-11 14:34:...|       Y|
     * |           Chengdu|     Sichuan|   3361500|2020-02-11 14:34:...|       Y|
     * | Nanking [Nanjing]|     Jiangsu|   2870300|2020-02-11 14:34:...|       Y|
     * +------------------+------------+----------+--------------------+--------+
     */
  }
}
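
The example above needs a running MySQL instance. As a sketch that runs without one, the same two technical fields can be appended to a small in-memory DataFrame. Note that this variant uses withColumn and lit rather than select and typedLit, which is a substitution made here for brevity. The object name LiteralColumnSketch and the method withTechnicalFields are hypothetical, and the two rows are copied from the output shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{current_timestamp, lit}

// Hypothetical helper object, not part of Spark.
object LiteralColumnSketch {
  // Appends the two technical fields from the article to any DataFrame.
  def withTechnicalFields(df: DataFrame): DataFrame =
    df.withColumn("statistic_time", current_timestamp()) // already a Column
      .withColumn("is_valid", lit("Y"))                  // literal string column

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LiteralColumnSketch")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Two sample rows standing in for the JDBC table.
    val cities = Seq(
      ("Shanghai", "Shanghai", 9696300),
      ("Peking", "Peking", 7472000)
    ).toDF("name", "district", "population")

    val result = withTechnicalFields(cities)
    result.printSchema()
    result.show(truncate = false)

    spark.stop()
  }
}
```

Keeping the technical fields in one small function means every model that is written to the database can reuse the same definitions.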

Origin www.cnblogs.com/Lovette-Liu/p/12295014.html