6. Kafka serialization and deserialization

https://blog.csdn.net/weixin_33690963/article/details/91698279

Kafka serialization: before a producer sends a message to Kafka, the message must be serialized into a byte[]. For example, if the value of the original message is a Thrift struct, it first has to go through a custom serializer.

Kafka deserialization: after a consumer fetches data from Kafka, it must deserialize that data before it can run the relevant business logic.

 

I. Related APIs:

ConsumerRecords API: 

The ConsumerRecord API is used to receive a record from the Kafka cluster. A record consists of the topic name, the partition number the record was received from, and an offset pointing to the record within that Kafka partition. The ConsumerRecord class creates a consumer record with a specific topic name, partition number, and <key, value> pair. It has the following signature.

public ConsumerRecord(String topic, int partition, long offset, K key, V value)
  • Topic - the name of the topic the consumer record was received from.

  • Partition - the partition of that topic.

  • Key - the key of the record; null is returned if the record has no key.

  • Value - the contents of the record.

The ConsumerRecords API acts as a container for ConsumerRecord. It holds the list of ConsumerRecord objects for each partition of a particular topic. Its constructor is defined as follows.

public ConsumerRecords(java.util.Map<TopicPartition, java.util.List<ConsumerRecord<K, V>>> records)

    • TopicPartition - the map key, identifying one partition of a specific topic.
    • Records - the map value, the list of ConsumerRecord objects for that partition.

II. Serialization

1. Traditional serialization

 

Clearly this kind of serialization has a problem: although the storage supports appending, you cannot start reading at the n-th object. Every read has to start from the beginning.
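A minimal sketch of this limitation, assuming that "traditional serialization" here means Java's built-in ObjectOutputStream and ObjectInputStream. The class and method names are hypothetical and only illustrate the point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; not part of Kafka.
class SequentialReadDemo {

    // Appends each value to a single object stream and returns the raw bytes.
    static byte[] writeAll(String... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (String value : values) {
                out.writeObject(value);
            }
        }
        return bytes.toByteArray();
    }

    // Returns the n-th object (0-based). The stream carries no index, so every
    // earlier object has to be deserialized and discarded first.
    static Object readNth(byte[] data, int n) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < n; i++) {
                in.readObject(); // cannot be skipped without parsing it
            }
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = writeAll("first", "second", "third");
        System.out.println(readNth(data, 2)); // prints "third"
    }
}
```

Reaching the third object costs two full deserializations, which is the cost the paragraph above complains about.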

Kafka uses a client/server (C/S) architecture, so the client (C) and the server (S) have to communicate. Serialized objects are sent to the server in batches: data is transmitted once a batch reaches batch.size (8K) or the wait time reaches linger.ms (5ms). From this one can infer that the client and the server should use a long-lived connection rather than opening a new socket for every transmission. Compression is also supported. Efficient, stable storage and communication is one of the essential characteristics of this kind of software.
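The batching settings mentioned above could be expressed roughly as follows. This is a sketch using only java.util.Properties: the keys batch.size, linger.ms and compression.type are Kafka producer configuration names, the values 8192 and 5 mirror the figures in the text, and the gzip codec and the helper class name are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical helper; only the three configuration key names come from Kafka.
class ProducerBatchConfig {

    static Properties batchingProps() {
        Properties props = new Properties();
        props.setProperty("batch.size", "8192");       // 8K in bytes, as in the text
        props.setProperty("linger.ms", "5");           // wait at most 5 ms for a batch to fill
        props.setProperty("compression.type", "gzip"); // assumed codec, chosen for illustration
        return props;
    }

    public static void main(String[] args) {
        batchingProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A real producer would also need at least bootstrap.servers and the key and value serializer classes before these properties could be passed to it.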

2. Transaction commits

  

 

 

As can be seen, both the key and the value are serialized into byte[]. Each one is preceded by its length, so skipping that many bytes skips the field, and skipping a key and a value skips one whole key-value pair. The n-th key-value pair can therefore be located by offset in this way.

How is an Object serialized into a byte[]? Clearly every field of the Object has to be serialized, and in the end this comes down to serializing primitive data types and types such as String.
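The length-prefix idea can be sketched as follows. The layout used here, a 4-byte length followed by the bytes for each key and each value, is an assumption chosen for illustration and is not Kafka's actual record format. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical length-prefixed layout: [keyLen][key][valueLen][value]...
class LengthPrefixedLog {

    // Encodes each {key, value} pair with a 4-byte length in front of each field.
    static byte[] encode(String[][] pairs) {
        int size = 0;
        byte[][][] raw = new byte[pairs.length][2][];
        for (int i = 0; i < pairs.length; i++) {
            raw[i][0] = pairs[i][0].getBytes(StandardCharsets.UTF_8);
            raw[i][1] = pairs[i][1].getBytes(StandardCharsets.UTF_8);
            size += 4 + raw[i][0].length + 4 + raw[i][1].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[][] pair : raw) {
            buf.putInt(pair[0].length).put(pair[0]);
            buf.putInt(pair[1].length).put(pair[1]);
        }
        return buf.array();
    }

    // Returns the value of the n-th pair (0-based). Earlier pairs are skipped
    // by jumping over their bytes; their contents are never decoded.
    static String valueAt(byte[] data, int n) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        for (int i = 0; i < n; i++) {
            skipField(buf); // key
            skipField(buf); // value
        }
        skipField(buf); // key of the wanted pair
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        return new String(value, StandardCharsets.UTF_8);
    }

    // Reads a length and advances the position past that many bytes.
    private static void skipField(ByteBuffer buf) {
        int length = buf.getInt();
        buf.position(buf.position() + length);
    }

    public static void main(String[] args) {
        byte[] data = encode(new String[][] {
            {"key1", "value1"}, {"key2", "value2"}, {"key3", "value3"}});
        System.out.println(valueAt(data, 2)); // prints "value3"
    }
}
```

Unlike the object-stream approach, finding a pair here only reads the length fields in front of it.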

 

The following shows serialization and deserialization of a Double.
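A minimal sketch of such a round trip, using only java.nio.ByteBuffer. It is written in the same spirit as Kafka's built-in DoubleSerializer and DoubleDeserializer (8 bytes, big-endian), but it is not Kafka's source code, and the class name DoubleCodec is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical codec: a double becomes exactly 8 big-endian bytes and back.
class DoubleCodec {

    // Writes the IEEE 754 bit pattern of the value into an 8-byte array.
    static byte[] serialize(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Rebuilds the double from exactly 8 bytes; any other length is rejected.
    static double deserialize(byte[] data) {
        if (data.length != Double.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + data.length);
        }
        return ByteBuffer.wrap(data).getDouble();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(3.14);
        System.out.println(bytes.length);       // prints 8
        System.out.println(deserialize(bytes)); // prints 3.14
    }
}
```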

 

For comparison: with Kafka's serialization, a Double takes 8 bytes.

Serialized with writeDouble, it takes 14 bytes.

 

Serialized with writeObject, it takes 84 bytes. Be careful with this: it takes up many more bytes, which costs memory and bandwidth.
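These sizes can be checked with a short program. The sketch below assumes that writeDouble and writeObject refer to the methods of java.io.ObjectOutputStream; the class and method names are hypothetical. The extra bytes come from the stream header and block framing and, for writeObject, from the class descriptor written along with the value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

// Hypothetical size comparison for one double value.
class DoubleSizeComparison {

    // Raw 8-byte encoding, comparable to what a dedicated double serializer emits.
    static int rawSize(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array().length;
    }

    // ObjectOutputStream.writeDouble: stream header plus a block-data frame.
    static int writeDoubleSize(double value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeDouble(value);
        }
        return bytes.size();
    }

    // ObjectOutputStream.writeObject: header plus the java.lang.Double class descriptor.
    static int writeObjectSize(double value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Double.valueOf(value));
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("raw:         " + rawSize(1.0));
        System.out.println("writeDouble: " + writeDoubleSize(1.0));
        System.out.println("writeObject: " + writeObjectSize(1.0));
    }
}
```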

 

Perhaps the essence of object serialization is converting an object into a byte stream that can be recognized and parsed back.

Sometimes one wonders: why not store the data like this?

key1

value1

key2

value2

key3

value3

 

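First, this storage format faces a problem: line feed and carriage return characters have to be escaped before they are stored. Second, is it efficient to store and read data this way, processing it line by line? This is only speculation; I have not checked whether any real implementation works like this. Reading line by line is really just reading byte by byte until a \n marks the end of a line, and that way of reading is probably not very clever.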
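The escaping concern can be sketched as follows. The escape scheme (a backslash followed by n, r or another backslash) and the class name are assumptions for illustration, not something Kafka does.

```java
// Hypothetical escaping for a one-field-per-line text format.
class LineEscaping {

    // Escapes backslash, line feed and carriage return so the text fits on one line.
    static String escape(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\n", "\\n")
                   .replace("\r", "\\r");
    }

    // Reverses escape(): a backslash always introduces a two-character sequence.
    static String unescape(String line) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                out.append(next == 'n' ? '\n' : next == 'r' ? '\r' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "line one\nline two";
        String stored = escape(value);
        System.out.println(stored);                         // one physical line
        System.out.println(unescape(stored).equals(value)); // prints true
    }
}
```

Every byte of every field has to be inspected both when writing and when reading, whereas a length prefix lets the reader jump over a field without looking inside it.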

 

III. Practice

https://www.jianshu.com/p/35a432bcb006

https://www.jianshu.com/p/85c66aa52e52


Origin www.cnblogs.com/Lee-yl/p/11466457.html