Transformations on DStreams: using transform to implement a blacklist / custom filtering
https://blog.csdn.net/qq_43688472/article/details/86616864
Stateless: only the current batch's data is processed; data comes in once and is processed once, with nothing carried over.
Stateful: the current batch's data needs to be "accumulated" with data from previous batches.
Example: how many times some value has appeared between one point in time today and another.
Two ways to implement this:
1. Attach a timestamp to each record, store the records somewhere external, and do the accumulation there.
2. Do it directly in Spark Streaming with a stateful operator.
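The difference between the two modes can be sketched in plain Scala, with no Spark involved. The two batches below are made-up data simulating two micro-batches of a word stream:

```scala
object StateDemo {
  def main(args: Array[String]): Unit = {
    // Two simulated micro-batches of words (hypothetical data)
    val batch1 = Seq("spark", "flink", "spark")
    val batch2 = Seq("spark", "kafka")

    // Stateless: each batch is counted on its own; the results are independent
    val perBatch = Seq(batch1, batch2).map(_.groupBy(identity).map { case (k, v) => (k, v.size) })
    println(perBatch)    // counts for batch1, then separately for batch2

    // Stateful: each batch's counts are merged into the running state
    val finalState = Seq(batch1, batch2).foldLeft(Map.empty[String, Int]) { (state, batch) =>
      batch.foldLeft(state)((s, w) => s.updated(w, s.getOrElse(w, 0) + 1))
    }
    println(finalState)  // "spark" -> 3, "flink" -> 1, "kafka" -> 1
  }
}
```

In the stateless case "spark" is reported as 2 and then 1; in the stateful case the second batch is added on top of the first, giving 3. `updateStateByKey` automates exactly this kind of merge.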
官网:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
updateStateByKey(func)
Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.
In short: the new batch's values are merged into each key's previously accumulated state.
Code in IDEA:
package g5.learning

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateByKeyApp {

  def main(args: Array[String]): Unit = {
    // Setup
    val conf = new SparkConf().setMaster("local[2]").setAppName("UpdateStateByKeyApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    // A checkpoint directory is required here: this is a stateful operation, so the
    // old state must be persisted somewhere for the accumulation to work.
    ssc.checkpoint("hdfs://hadoop001:8020/ss/logs")

    // Business logic
    val lines = ssc.socketTextStream("hadoop001", 9999)
    val results = lines.flatMap(_.split(",")).map((_, 1))
    val state = results.updateStateByKey(updateFunction)
    state.print()

    // Start the streaming job
    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }

  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    // Add the new values to the previous running count to get the new count
    val newCount = newValues.sum
    val pre = runningCount.getOrElse(0)
    Some(newCount + pre)
  }
}
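The update function itself is pure Scala, so its merge semantics can be checked without a cluster. A minimal sketch, feeding it two hypothetical batches for one key the way Spark Streaming would:

```scala
object UpdateFunctionDemo {
  // Same signature Spark Streaming invokes per key on every batch
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    val newCount = newValues.sum
    val pre = runningCount.getOrElse(0)
    Some(newCount + pre)
  }

  def main(args: Array[String]): Unit = {
    // First batch: the key appears 3 times, and there is no previous state yet
    val afterBatch1 = updateFunction(Seq(1, 1, 1), None)
    println(afterBatch1) // Some(3)

    // Second batch: the key appears twice more; the previous state is the last result
    val afterBatch2 = updateFunction(Seq(1, 1), afterBatch1)
    println(afterBatch2) // Some(5)
  }
}
```

Note that `runningCount` is an `Option` because a key may be appearing for the first time; `getOrElse(0)` handles that case.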
Problem:
You will notice that this produces many small files on HDFS, because a checkpoint is written under the checkpoint directory for every batch.
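One common mitigation (a sketch, not from the original post) is to checkpoint the stateful DStream less often than once per batch. `DStream.checkpoint(interval)` sets a per-stream checkpoint interval; the Spark Streaming tuning guide suggests 5-10x the batch interval. This fragment slots into the program above:

```scala
// Sketch: with a 10s batch interval, checkpoint the state only every 100s
// (10x the batch interval), so far fewer small files land on HDFS.
val state = results.updateStateByKey(updateFunction)
state.checkpoint(Seconds(100))
state.print()
```

Fewer checkpoints means more batches to replay on recovery, so this trades fault-recovery time for fewer files.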