Custom Kafka serialization in Spring Boot and custom Kafka deserialization in Flink
In real-time computing scenarios, most of us use a combination such as Storm + Kafka, Spark + Kafka, or Flink + Kafka. Among these, Flink is currently a popular big-data computing framework with several advantages over the alternatives.
In a Flink + Kafka streaming pipeline, Kafka's default serializer and deserializer work with strings: producers and consumers exchange strings even when objects need to be passed. We could, of course, transfer objects by converting them to JSON first, but this article describes how to transfer objects directly by customizing Kafka's serialization, eliminating the object → JSON → object detour.
Custom serialization for the Kafka producer
The producer-side custom serialization is configured in a Spring Boot project.
Maven dependency
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
Custom Kafka serializer class
The code here is Kotlin; it is close enough to Java that it should be easy to follow.
import com.junwei.pojo.TravelerData
import org.apache.kafka.common.serialization.Serializer
import java.io.ByteArrayOutputStream
import java.io.ObjectOutputStream

class TravelerDataSerializer : Serializer<TravelerData> {
    override fun serialize(topic: String?, data: TravelerData?): ByteArray? {
        if (null == data) {
            return null
        }
        // Serialize the object to a byte array using standard Java serialization;
        // .use closes (and thus flushes) the stream before toByteArray() is read
        val output = ByteArrayOutputStream()
        ObjectOutputStream(output).use { it.writeObject(data) }
        return output.toByteArray()
    }

    override fun close() {}

    override fun configure(configs: MutableMap<String, *>?, isKey: Boolean) {}
}
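The serializer relies on standard Java serialization, which means TravelerData itself must implement java.io.Serializable. As a minimal sketch of the underlying technique (with a hypothetical Traveler class standing in for TravelerData), the same byte round trip can be written in plain Java:

```java
import java.io.*;

public class RoundTripDemo {
    // Hypothetical stand-in for TravelerData; the real class must implement Serializable too
    static class Traveler implements Serializable {
        final String userId;
        Traveler(String userId) { this.userId = userId; }
    }

    // Same technique as the serializer above: object -> byte[]
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj); // the stream is flushed on close, before toByteArray()
        }
        return out.toByteArray();
    }

    // Inverse technique, used later on the consumer side: byte[] -> object
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Traveler original = new Traveler("u-42");
        Traveler copy = (Traveler) deserialize(serialize(original));
        System.out.println(copy.userId); // prints u-42
    }
}
```

Because both producer and consumer use Java serialization, the POJO must be available on both classpaths, ideally from a shared module (as common-pojo is here), and should declare a serialVersionUID so that class changes do not break deserialization.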
application.yml configuration
spring:
  kafka:
    topic: traveler-data
    bootstrap-servers: bigdata01:9092,bigdata02:9092,bigdata03:9092
    producer:
      retries: 1
      batch-size: 16384
      buffer-memory: 33554432
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      # Fully qualified name of the custom serializer
      value-serializer: com.junwei.browse.util.TravelerDataSerializer
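Spring Boot translates these relaxed-binding keys (batch-size, value-serializer, ...) into the Kafka client's native property names. As a rough sketch of the equivalent raw client configuration, using the class names from this article:

```java
import java.util.Properties;

public class ProducerProps {
    // Mirrors the application.yml above as raw Kafka client properties
    static Properties producerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "bigdata01:9092,bigdata02:9092,bigdata03:9092");
        props.put("retries", "1");
        props.put("batch.size", "16384");      // yml: batch-size
        props.put("buffer.memory", "33554432"); // yml: buffer-memory
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Fully qualified name of the custom serializer
        props.put("value.serializer", "com.junwei.browse.util.TravelerDataSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProperties().getProperty("value.serializer"));
    }
}
```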
Kafka producer
import com.junwei.pojo.TravelerData
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.beans.factory.annotation.Value
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.stereotype.Component

@Component
class KafkaUtil {
    @Autowired
    lateinit var kafkaTemplate: KafkaTemplate<String, TravelerData>

    // Topic name read from application.yml
    @Value("\${spring.kafka.topic:0}")
    private lateinit var topic: String

    fun sendMsg(message: TravelerData) {
        kafkaTemplate.send(topic, message)
    }
}
Sending data through the producer
@ApiOperation(value = "Query attraction details by id")
@GetMapping("{id}")
fun searchById(@PathVariable id: String, request: HttpServletRequest): Result<*> {
    val userId = HeaderUtil.getUserIdFromToken(request)
    val travelInfo = travelInfoService.searchById(id, userId)
    if (travelInfo != null) {
        // Send the object to Kafka
        kafkaUtil.sendMsg(TravelerData(userId, id, travelInfo.title, travelInfo.city, travelInfo.topic))
    }
    return if (travelInfo != null) Result.success(travelInfo) else Result.fail()
}
Custom deserialization for the Kafka consumer
The consumer-side custom deserialization is configured in a Flink project.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.junwei</groupId>
<artifactId>flink-kafka</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.10.1</flink.version>
<scala.binary.version>2.11</scala.binary.version>
<scala.version>2.11.12</scala.version>
<kafka.version>1.1.1</kafka.version>
</properties>
<dependencies>
<dependency>
<groupId>com.junwei</groupId>
<artifactId>common-pojo</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<executions>
<execution>
<id>scala-compile</id>
<phase>compile</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>test-compile</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<executions>
<execution>
<id>default-compile</id>
<phase>none</phase>
</execution>
<execution>
<id>default-testCompile</id>
<phase>none</phase>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>2.8</version>
<configuration>
<downloadSources>true</downloadSources>
<projectnatures>
<projectnature>org.scala-ide.sdt.core.scalanature</projectnature>
<projectnature>org.eclipse.jdt.core.javanature</projectnature>
</projectnatures>
<buildcommands>
<buildcommand>org.scala-ide.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<classpathContainers>
<classpathContainer>org.scala-ide.sdt.launching.SCALA_CONTAINER</classpathContainer>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
</classpathContainers>
<excludes>
<exclude>org.scala-lang:scala-library</exclude>
<exclude>org.scala-lang:scala-compiler</exclude>
</excludes>
<sourceIncludes>
<sourceInclude>**/*.scala</sourceInclude>
<sourceInclude>**/*.java</sourceInclude>
</sourceIncludes>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>1.7</version>
<executions>
<execution>
<id>add-source</id>
<phase>generate-sources</phase>
<goals>
<goal>add-source</goal>
</goals>
<configuration>
<sources>
<source>src/main/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.junwei.manager.TravelerDataKafkaConsumer</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Custom Kafka deserializer
import java.io.{ByteArrayInputStream, ObjectInputStream}
import java.util
import com.junwei.pojo.TravelerData
import org.apache.kafka.common.serialization.Deserializer

class TravelerDataDeserializer extends Deserializer[TravelerData] {
  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {}

  override def deserialize(topic: String, data: Array[Byte]): TravelerData = {
    if (null == data) return null
    // Rebuild the object from the byte array using standard Java deserialization
    val objectInput = new ObjectInputStream(new ByteArrayInputStream(data))
    try objectInput.readObject().asInstanceOf[TravelerData]
    finally objectInput.close()
  }

  override def close(): Unit = {}
}
Flink's Kafka connector reads records through a DeserializationSchema, so we also provide one that performs the same deserialization:
import java.io.{ByteArrayInputStream, ObjectInputStream}
import com.junwei.pojo.TravelerData
import org.apache.flink.api.common.serialization.DeserializationSchema
import org.apache.flink.api.common.typeinfo.{TypeHint, TypeInformation}

class TravelerDataSchema extends DeserializationSchema[TravelerData] {
  override def deserialize(message: Array[Byte]): TravelerData = {
    val byteArray = new ByteArrayInputStream(message)
    val objectInput = new ObjectInputStream(byteArray)
    objectInput.readObject().asInstanceOf[TravelerData]
  }

  override def isEndOfStream(nextElement: TravelerData): Boolean = false

  override def getProducedType: TypeInformation[TravelerData] =
    TypeInformation.of(new TypeHint[TravelerData] {})
}
Kafka consumer configuration
import java.util.Properties
import com.junwei.constant.Constant
import com.junwei.pojo.TravelerData
import com.junwei.serialization.{TravelerDataDeserializer, TravelerDataSchema}
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object KafkaConfig {
  def getKafkaTravelerConsumer(groupId: String, topic: String): FlinkKafkaConsumer[TravelerData] = {
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", Constant.KAFKA_IP_PORT)
    properties.setProperty("zookeeper.connect", Constant.ZK_IP_PORT)
    properties.setProperty("key.deserializer", classOf[StringDeserializer].getName)
    // Configure the custom deserializer class here
    properties.setProperty("value.deserializer", classOf[TravelerDataDeserializer].getName)
    // Start from the latest offset when no committed offset exists
    properties.setProperty("auto.offset.reset", "latest")
    properties.setProperty("group.id", groupId)
    // Pass the custom schema here; FlinkKafkaConsumer deserializes records through this schema
    new FlinkKafkaConsumer[TravelerData](topic, new TravelerDataSchema(), properties)
  }
}
Flink job
This Flink job mainly classifies and aggregates the consumed data per user.
import com.junwei.config.KafkaConfig
import com.junwei.entity.{CityData, ResultData, TopicData, TravelsData}
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object TravelerDataKafkaConsumer {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.addSource(KafkaConfig.getKafkaTravelerConsumer("0", "traveler-data"))
      // Thanks to the custom Kafka deserialization, the consumer receives ready-made objects
      .map(it => (it.getUserId, it.getTravelId, it.getTravelName, it.getTravelCity.split("·")(0).substring(2), it.getTravelTopic))
      .keyBy(_._1).process(new KeyedProcessFunction[String, (String, String, String, String, String), (Boolean, String, ResultData)] {
        // Per-key state holding the running aggregate for each user
        var resultData: ValueState[ResultData] = _

        override def open(parameters: Configuration): Unit = {
          resultData = getRuntimeContext.getState(new ValueStateDescriptor[ResultData]("resultData", classOf[ResultData]))
        }

        override def processElement(value: (String, String, String, String, String),
                                    ctx: KeyedProcessFunction[String, (String, String, String, String, String),
                                      (Boolean, String, ResultData)]#Context,
                                    out: Collector[(Boolean, String, ResultData)]): Unit = {
          var data = resultData.value()
          val name = List[TravelsData](TravelsData(value._2, value._3, 1))
          val topic = value._5.split(",").map(it => TopicData(it, 1)).toList
          val city = List[CityData](CityData(value._4, 1))
          var insertFlag = false
          if (null == data) {
            // First event for this user: create the aggregate
            insertFlag = true
            data = ResultData(value._1, topic, city, name)
          } else {
            // Merge the new event into the existing aggregate: union, group, sum the counts
            data.cityDataList = data.cityDataList.union(city)
              .groupBy(_.cityName).map(it => CityData(it._1, it._2.map(_.count).sum)).toList
            data.topicDataList = data.topicDataList.union(topic)
              .groupBy(_.topicName).map(it => TopicData(it._1, it._2.map(_.count).sum)).toList
            data.traversDataList = data.traversDataList.union(name)
              .groupBy(_.travelId).map(it => TravelsData(it._1, it._2.head.travelName, it._2.map(_.count).sum)).toList
          }
          resultData.update(data)
          out.collect(insertFlag, resultData.value().userId, resultData.value())
        }
      }).print("result")
    env.execute("traveler")
  }
}
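The merge step above (union the old and new lists, group by key, sum the counts) can be sketched standalone; CityData here is a hypothetical stand-in for the article's entity class:

```java
import java.util.*;
import java.util.stream.*;

public class MergeCountsDemo {
    // Hypothetical stand-in for the article's CityData entity: a name plus a visit count
    static class CityData {
        final String cityName;
        final int count;
        CityData(String cityName, int count) { this.cityName = cityName; this.count = count; }
    }

    // The same merge the Flink job performs: union both lists, group by city, sum the counts
    static List<CityData> merge(List<CityData> existing, List<CityData> incoming) {
        Map<String, Integer> summed = Stream.concat(existing.stream(), incoming.stream())
                .collect(Collectors.groupingBy(c -> c.cityName, Collectors.summingInt(c -> c.count)));
        return summed.entrySet().stream()
                .map(e -> new CityData(e.getKey(), e.getValue()))
                .sorted(Comparator.comparing(c -> c.cityName)) // deterministic order for the demo
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CityData> state = Arrays.asList(new CityData("Beijing", 2), new CityData("Xi'an", 1));
        List<CityData> update = Collections.singletonList(new CityData("Beijing", 1));
        for (CityData c : merge(state, update)) {
            System.out.println(c.cityName + "=" + c.count);
        }
    }
}
```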
With that, custom Kafka serialization and deserialization are fully configured. Using the key code above, you should be able to implement this yourself.