When building a data model with SparkSQL, or when writing data to a database, you often need to add technical fields such as a statistics timestamp or a validity flag. These fields are usually filled with literal values.
SparkSQL provides a dedicated function for this case.
In the org.apache.spark.sql.functions singleton object, the function that creates a literal column, typedLit, is defined as follows:
/**
 * Creates a [[Column]] of literal value.
 *
 * The passed in object is returned directly if it is already a [[Column]].
 * If the object is a Scala Symbol, it is converted into a [[Column]] also.
 * Otherwise, a new [[Column]] is created to represent the literal value.
 * The difference between this function and [[lit]] is that this function
 * can handle parameterized scala types e.g.: List, Seq and Map.
 *
 * @group normal_funcs
 * @since 2.2.0
 */
def typedLit[T : TypeTag](literal: T): Column = literal match {
  case c: Column => c
  case s: Symbol => new ColumnName(s.name)
  case _ => Column(Literal.create(literal))
}
The source code comments describe the behavior: the function creates a Column from a literal value. If the object passed in is already a Column, it is returned directly; if it is a Scala Symbol, it is converted into a Column. Otherwise, a new Column is created to represent the literal value. Unlike lit, typedLit can also handle parameterized Scala types such as List, Seq and Map. The resulting column can be renamed with its as or alias method.
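The three branches of typedLit can be tried out on a small in-memory DataFrame. What follows is a minimal sketch rather than part of the original program: the object name TypedLitBranches, the helper withTechnicalColumns, and the sample rows are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{current_timestamp, typedLit}

object TypedLitBranches {
  // Hypothetical helper: builds one column through each branch of typedLit.
  def withTechnicalColumns(df: DataFrame): DataFrame =
    df.select(
      // Symbol branch: the symbol becomes a reference to the existing "name" column.
      typedLit(Symbol("name")),
      // Column branch: current_timestamp() is already a Column and is passed through.
      typedLit(current_timestamp()).as("statistic_time"),
      // Literal branch: a new Column representing the constant "Y".
      typedLit("Y").alias("is_valid")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TypedLitBranches")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for a real table.
    val cities = Seq(("Shanghai", 9696300), ("Peking", 7472000))
      .toDF("name", "population")

    withTechnicalColumns(cities).printSchema()
    spark.stop()
  }
}
```

Because a Column argument is passed through unchanged, wrapping current_timestamp() in typedLit is optional; the wrapper only matters for plain Scala values.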
Examples are given below:
import org.apache.spark.sql.SparkSession

object SparkSqlTest {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession
      .builder()
      .appName("SparkTest")
      .master("local[4]")
      .getOrCreate()

    // Read the city table from the MySQL world database over JDBC.
    val city_data_model = sparkSession
      .read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/world")
      .option("user", "root")
      .option("password", "root")
      .option("dbtable", "city")
      .load()

    import org.apache.spark.sql.functions._

    val city_data_model_new_column = city_data_model
      .filter(city_data_model("countrycode") === "CHN")
      .select(
        city_data_model("name"),
        city_data_model("district"),
        city_data_model("population"),
        // Create a new column holding the statistics date.
        // current_timestamp() already returns a Column, so typedLit passes it through.
        typedLit(current_timestamp()).alias("statistic_time"),
        // Create a new column holding the constant validity flag "Y".
        typedLit("Y").alias("is_valid")
      )

    city_data_model_new_column.printSchema()
    /**
     * Output of city_data_model_new_column.printSchema():
     * root
     *  |-- name: string (nullable = true)
     *  |-- district: string (nullable = true)
     *  |-- population: integer (nullable = true)
     *  |-- statistic_time: timestamp (nullable = false)
     *  |-- is_valid: string (nullable = false)
     */

    city_data_model_new_column.show(10)
    /**
     * Output of city_data_model_new_column.show(10):
     * +------------------+------------+----------+--------------------+--------+
     * |              name|    district|population|      statistic_time|is_valid|
     * +------------------+------------+----------+--------------------+--------+
     * |          Shanghai|    Shanghai|   9696300|2020-02-11 14:34:...|       Y|
     * |            Peking|      Peking|   7472000|2020-02-11 14:34:...|       Y|
     * |         Chongqing|   Chongqing|   6351600|2020-02-11 14:34:...|       Y|
     * |           Tianjin|     Tianjin|   5286800|2020-02-11 14:34:...|       Y|
     * |             Wuhan|       Hubei|   4344600|2020-02-11 14:34:...|       Y|
     * |            Harbin|Heilongjiang|   4289800|2020-02-11 14:34:...|       Y|
     * |          Shenyang|    Liaoning|   4265200|2020-02-11 14:34:...|       Y|
     * |Kanton [Guangzhou]|   Guangdong|   4256300|2020-02-11 14:34:...|       Y|
     * |           Chengdu|     Sichuan|   3361500|2020-02-11 14:34:...|       Y|
     * | Nanking [Nanjing]|     Jiangsu|   2870300|2020-02-11 14:34:...|       Y|
     * +------------------+------------+----------+--------------------+--------+
     */
  }
}
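
The program above passes only a plain string and an existing Column to typedLit. According to the Scaladoc of typedLit, the function also accepts parameterized Scala types such as Seq and Map. The sketch below illustrates that case under stated assumptions: the object name TypedLitCollections, the helper withCollectionLiterals, the column names tags and meta, and the sample row are all hypothetical and not part of the original program.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.typedLit

object TypedLitCollections {
  // Hypothetical helper: appends an array-valued and a map-valued literal column.
  def withCollectionLiterals(df: DataFrame): DataFrame =
    df.withColumn("tags", typedLit(Seq("batch", "daily")))
      .withColumn("meta", typedLit(Map("source" -> "demo")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TypedLitCollections")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row standing in for a real table.
    val cities = Seq("Shanghai").toDF("name")

    val result = withCollectionLiterals(cities)
    result.printSchema()
    result.show(truncate = false)
    spark.stop()
  }
}
```

Every row receives the same array and the same map, which can be handy for tagging a whole batch with constant metadata before it is written out.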