Spark: Structured Streaming Sink Summary


Tested component version:

Spark: 2.4.0


As of Spark 2.4, Structured Streaming ships with the following built-in sinks: File, Kafka, Foreach, ForeachBatch, Console, and Memory. Third-party sinks such as Elasticsearch are available through external connectors.

ForeachBatchSink is only available in Spark 2.4 and later.
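
All of the snippets below write out a streaming DataFrame called name (or df), which is never defined in this post. A minimal sketch of how such a frame might be created, assuming a Kafka source with the broker addresses used later in this post (the application name and source topic are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("StructuredStreamingSinks")  // hypothetical app name
  .getOrCreate()

// Streaming DataFrame with the standard Kafka columns (key, value, topic, partition, offset, ...)
val name = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "BigData-Dev-5:9092,BigData-Dev-4:9092,BigData-Dev-3:9092,BigData-Dev-2:9092")
  .option("subscribe", "some_source_topic")  // hypothetical topic
  .load()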

ElasticSearchSink implementation:

val esOptions = Map(
  // "es.nodes" / "es.port" normally need to be set as well; the connector defaults to localhost:9200
  "es.write.operation" -> "upsert",  // upsert by document id instead of plain insert
  "es.mapping.id"      -> "id"       // use the "id" column as the Elasticsearch document id
)

name.writeStream
  .options(esOptions)
  .format("org.elasticsearch.spark.sql")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start("shopforce/m_retail")  // target index/type
  .awaitTermination()

ForeachSink implementation (writing to Phoenix as an example):

import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

name.writeStream.outputMode("append").foreach(new ForeachWriter[Row] {

  // Schema of the Phoenix target table
  val phoenixSchema = StructType(Array(
    StructField("ID", StringType),
    StructField("NAME", StringType)))

  def open(partitionId: Long, version: Long): Boolean = true

  def process(value: Row): Unit = {
    // NOTE: process() runs on the executors, so getting hold of a SparkSession and
    // building a one-row DataFrame per record is only reliable in local mode;
    // on a cluster, prefer a plain JDBC connection (see the sketch below) or ForeachBatchSink.
    val spark = SparkSession.builder.getOrCreate()
    println(value.toString())
    spark.createDataFrame(
        spark.sparkContext.parallelize(Seq(value))
          .map(x => Row(x.apply(0).toString, x.apply(1).toString)),
        phoenixSchema)
      .write.format("org.apache.phoenix.spark")
      .mode("overwrite")
      .option("table", "test1")
      .option("zkUrl", "BigData-Dev-1:2181")
      .save()
  }

  def close(errorOrNull: Throwable): Unit = {
    println("close")
  }
}).option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()
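
For cluster deployments, the more typical ForeachWriter pattern opens one connection per partition in open() and reuses it in process(). A minimal sketch against the same table using Phoenix's JDBC driver; the connection URL and UPSERT statement are assumptions derived from the zkUrl and schema above, and the checkpoint path is a placeholder:

import java.sql.{Connection, DriverManager}
import org.apache.spark.sql.{ForeachWriter, Row}

name.writeStream.outputMode("append").foreach(new ForeachWriter[Row] {
  var conn: Connection = _

  def open(partitionId: Long, version: Long): Boolean = {
    // Phoenix JDBC URL format: jdbc:phoenix:<zookeeper quorum> (assumed from the zkUrl above)
    conn = DriverManager.getConnection("jdbc:phoenix:BigData-Dev-1:2181")
    true
  }

  def process(value: Row): Unit = {
    val stmt = conn.prepareStatement("UPSERT INTO test1 (ID, NAME) VALUES (?, ?)")
    stmt.setString(1, value.apply(0).toString)
    stmt.setString(2, value.apply(1).toString)
    stmt.executeUpdate()
    conn.commit()  // Phoenix connections are not auto-commit by default
    stmt.close()
  }

  def close(errorOrNull: Throwable): Unit = {
    if (conn != null) conn.close()
  }
}).option("checkpointLocation", "hdfs://zt01/tmp/kafka_jdbc").start().awaitTermination()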

KafkaSink implementation:

import org.apache.spark.sql.streaming.Trigger

// The input DataFrame must expose a "value" column (and optionally "key"),
// of string or binary type, for the Kafka sink to accept it.
val query = df.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "BigData-Dev-5:9092,BigData-Dev-4:9092,BigData-Dev-3:9092,BigData-Dev-2:9092")
  .option("topic", "my_first_topic")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .trigger(Trigger.ProcessingTime(300))  // fire a micro-batch every 300 ms
  .start()

ConsoleSink implementation:

name.writeStream.outputMode("append")
  .format("console")
  .option("checkpointLocation", "hdfs://zt01/tmp/kafka")
  .start()
  .awaitTermination()

ForeachBatchSink implementation:

name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "NT_SALE_ORDER_REPLICATION")
    .option("zkUrl", "BigData-Dev-1:2181")
    .save()
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka").start().awaitTermination()
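
A key advantage of foreachBatch is that each micro-batch is an ordinary DataFrame, so it can be reused across several batch writers. A sketch of that pattern; the archive path and checkpoint path are hypothetical:

name.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.persist()  // reuse the micro-batch across both sinks without recomputation
  batchDF.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "NT_SALE_ORDER_REPLICATION")
    .option("zkUrl", "BigData-Dev-1:2181")
    .save()
  // hypothetical archive path, partitioned by batch id
  batchDF.write.mode("append").parquet(s"hdfs://zt01/tmp/archive/batch_$batchId")
  batchDF.unpersist()
}.option("checkpointLocation", "hdfs://zt01/tmp/kafka_multi").start().awaitTermination()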

More sinks to be added!
