Quick start to Kafka: notes from the one-hour intensive Kafka course on Bilibili (station B)

Introduction: these are follow-along notes for the Bilibili (station B) video "One-hour Kafka intensive course (HD remake, no-nonsense edition)".
If you have time, I recommend watching the original video.

1. Kafka origin and development

insert image description here
origin:

  • LinkedIn
  • Apache
  • Confluent

Introduction:

  • 0.9.0.x distributed message system
  • 0.10.0.x Distributed stream processing platform

Advantages of Kafka

  • High throughput and good performance
  • Good scalability, support online horizontal expansion
  • Fault Tolerance and Reliability
  • Closely integrated with the big data ecosystem; it connects seamlessly with Hadoop, Storm, Spark, etc.

Release versions

  • Confluent Platform
  • Cloudera Kafka
  • Hortonworks Kafka

2. Common message queues

2.1 JMS (Java Message Service) specification

(1). Queue - point-to-point

insert image description here

(2). Topic - Publish Subscribe

insert image description here

(3). Apache ActiveMQ

2.2 AMQP (Advanced Message Queuing Protocol)

2.2.1 AMQP model

  • Queue
  • Exchange
  • Binding

insert image description here

Features: supports transactions and strong data consistency; mostly used in banking and the financial industry

Typical middleware: RabbitMQ

2.3 MQTT (Message Queuing Telemetry Transport)

Widely used in IoT (Internet of Things).
Designed for sending short messages between small, resource-constrained devices over low-bandwidth links.

3. Topics, partitions, replicas, message brokers

3.1 Topic

A topic can be understood as a table in a database: messages of the same type are usually stored in the same topic. The difference is that database tables are structured, while the data in a topic is semi-structured. In special cases, different types of messages can also be stored in the same topic.
insert image description here

3.2 Partition

A topic can contain multiple partitions. Kafka is a distributed messaging system, and partitioning is the basis of its distribution: Kafka splits a topic into multiple partitions, and different partitions are stored on different servers, which is what makes Kafka horizontally scalable.
insert image description here
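To make this concrete, here is a minimal sketch that creates a topic with several partitions using the kafka-clients AdminClient; the broker address localhost:9092 and the topic name "user-events" are assumptions, not part of the course:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // assumption: a broker is reachable on localhost:9092
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // hypothetical topic "user-events" with 3 partitions and replication factor 1
            NewTopic topic = new NewTopic("user-events", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
            System.out.println("topic created");
        }
    }
}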

3.3 Offset

A partition is a commit log that grows linearly: messages are appended to it, and once a message is stored in a partition it cannot be changed.
Kafka records the position of each message with an offset. A message can be fetched by its offset, but messages cannot be looked up or queried by their content.
Offsets are unique, non-repeating, and monotonically increasing within a partition.
Offsets can repeat across different partitions.
insert image description here

3.4 Record

Messages in Kafka are stored as key-value pairs. If no key is specified, the key defaults to null.
insert image description here
When no key is specified, Kafka writes messages to the partitions in a round-robin fashion.
insert image description here
If a key is specified, messages with the same key go to the same partition and are written sequentially within that partition.
insert image description here
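As a small illustration, the sketch below sends one record without a key and one with a key; the broker address localhost:9092 and the topic name "demo" are placeholders:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedMessageExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // no key: the partitioner spreads records over the partitions
            producer.send(new ProducerRecord<>("demo", "message without key"));
            // with a key: every record keyed "user-1" goes to the same partition
            producer.send(new ProducerRecord<>("demo", "user-1", "message with key"));
        }
    }
}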

3.5 Replication

If a partition has only one copy, message reliability cannot be guaranteed once that copy goes down or its data is damaged or lost. Kafka guarantees reliability through Replication (replicas).
The number of replicas is set with replication-factor.

Example: here replication-factor = 3 means there are 3 replicas of the partition in total.
insert image description here
Kafka usually maintains partition replicas with a leader-follower mechanism.
The primary replica is called the leader and handles the writing and reading of messages;
the other replicas are called followers and are only responsible for copying data from the leader to keep it consistent.

ISR: the set of replicas that are in sync, in this case [101,102,103].
If a follower replica goes down and can no longer synchronize, or its data falls too far behind the leader's, it is removed from the ISR; it is added back to the ISR only after the network recovers or its data catches up.
insert image description here
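If you want to check the leader, replicas, and ISR from code rather than from the CLI, a small sketch with the AdminClient could look like this; the topic name "demo" and the broker address are assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class DescribeTopicExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("demo"))
                    .all().get().get("demo");
            for (TopicPartitionInfo p : desc.partitions()) {
                // print leader, replicas and in-sync replicas (ISR) for each partition
                System.out.printf("partition %d leader=%d replicas=%s isr=%s%n",
                        p.partition(), p.leader().id(), p.replicas(), p.isr());
            }
        }
    }
}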

3.6 Broker

A Broker handles read and write requests and writes data to disk. Usually one Broker instance runs on each server,
so we often say that a server is a Broker.

insert image description here
Example:
Kafka cluster A contains 8 servers, i.e. 8 Brokers. A topic in the cluster has 8 partitions, p0-p7, with replication factor = 3, so each partition has 3 replicas: 1 leader and 2 followers.
Take the first Broker as an example: its replica of p1 is the leader, so this Broker serves read and write requests for p1, while its replicas of p0 and p2 are followers, for which the Broker only copies data from their leaders.
insert image description here
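A quick way to see the Brokers of a cluster from code is describeCluster(); a minimal sketch, with the broker address again being an assumption:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

import java.util.Properties;

public class ListBrokersExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption

        try (AdminClient admin = AdminClient.create(props)) {
            // each node returned here corresponds to one Broker in the cluster
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d host=%s port=%d%n", node.id(), node.host(), node.port());
            }
        }
    }
}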

3.7 Segment

3.8 Producer

3.9 Consumer

3.10 Consumer Group

4. Environment construction - local pseudo-distributed

For the latest (KRaft) version, go directly to the last chapter; the setup here requires ZooKeeper to be installed
Build a cluster deployment diagram:
insert image description here

Build in linux
Attach address:
kafka download
Extraction code: nmrs

After downloading, extract it into the /opt directory
insert image description here

tar -zxvf kafka_2.11-2.4.1.tgz

insert image description here

Kafka depends on zookeeper.
insert image description here
Let's start zookeeper first (installed before)

cd /opt/Zookeeper/apache-zookeeper-3.5.6-bin/bin

Start the zookeeper server

# start the zookeeper server
./zkServer.sh start
# check the zookeeper server status
./zkServer.sh status

insert image description here

We simulate a cluster by configuring three Kafka brokers on different ports.
Create an etc directory for storing the configuration files

mkdir etc

insert image description here
copy configuration file

cp config/zookeeper.properties etc

insert image description here
View the contents of the configuration file

vim zookeeper.properties

insert image description here
The configuration is as follows:
insert image description here
Create a zookeeper data directory file

mkdir zkdata_kafka

copy configuration file

cp config/server.properties etc/server-0.properties
cp config/server.properties etc/server-1.properties
cp config/server.properties etc/server-2.properties

insert image description here
Enter the etc directory

cd etc

insert image description here

vim server-0.properties

The configuration is as follows:
insert image description here
Create a log directory
insert image description here

Configure log location

log.dirs=/opt/kafka_2.11-2.4.1/logs/kafka-logs-0

insert image description here
Modify server-1 in the same way

vim server-1.properties

The configuration is as follows: ensure that the port does not conflict
insert image description here
log directory:

log.dirs=/opt/kafka_2.11-2.4.1/logs/kafka-logs-1

insert image description here
Modify server-2 in the same way

vim server-2.properties

The configuration is as follows: ensure that the port does not conflict
insert image description here
log directory:

log.dirs=/opt/kafka_2.11-2.4.1/logs/kafka-logs-2

insert image description here
Enter the bin directory to start

cd /opt/kafka_2.11-2.4.1/bin

In this directory, run the following to start ZooKeeper

./zookeeper-server-start.sh  ../etc/zookeeper.properties 

insert image description here
Start Kafka:
open three new terminal windows and enter the bin directory

cd /opt/kafka_2.11-2.4.1/bin

Start separately

./kafka-server-start.sh  ../etc/server-0.properties 
./kafka-server-start.sh  ../etc/server-1.properties 
./kafka-server-start.sh  ../etc/server-2.properties 

insert image description here
insert image description here
insert image description here

After starting, you can create a new topic. Open a new session and enter the bin directory.

cd /opt/kafka_2.11-2.4.1/bin

create topic

./kafka-topics.sh --zookeeper localhost:2181 --create --topic test --partitions 3 --replication-factor 2

This creates a topic named test with 3 partitions and a replication factor of 2
insert image description here
Check the partition status of the topic

./kafka-topics.sh --zookeeper localhost:2181 --describe --topic test

insert image description here

Create a new session window as a producer

cd /opt/kafka_2.11-2.4.1/bin
./kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test

send message
insert image description here

Create a new session window as a consumer

cd /opt/kafka_2.11-2.4.1/bin

monitor

./kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic test

Note: Start the consumer to listen to the message first, and then start the producer to send the message

For example: we let the producer send a message
insert image description here
and the consumer successfully listens
insert image description here

5. Listeners and internal and external networks

Configuration file settings
insert image description here
Listener configuration:
insert image description here
insert image description here

5.1. Listener configuration and access (intranet)

insert image description here

5.2. Aliyun server configuration scenario (combination of external network and internal network)

insert image description here
This differs from the previous scenario:
insert image description here
Example:

listeners=INTERNAL://:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://kafka-0:9092,EXTERNAL://<public IP>:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL

6. Environment construction - docker deployment kafka⭐

Go to Docker Hub to pull the image
insert image description here
Visit
kafka-github

The contents of the copied docker-compose.yml
insert image description here
are as follows:

# common KRaft configuration
x-kraft: &common-config
  ALLOW_PLAINTEXT_LISTENER: yes
  KAFKA_ENABLE_KRAFT: yes
  KAFKA_KRAFT_CLUSTER_ID: MTIzNDU2Nzg5MGFiY2RlZg
  KAFKA_CFG_PROCESS_ROLES: broker,controller
  KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
  KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: BROKER:PLAINTEXT,CONTROLLER:PLAINTEXT
  KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9091,2@kafka-2:9091,3@kafka-3:9091
  KAFKA_CFG_INTER_BROKER_LISTENER_NAME: BROKER

# common image configuration
x-kafka: &kafka
  image: 'bitnami/kafka:3.3.1'
  networks:
    net:

# custom network
networks:
  net:

# project name
name: kraft
services:
  
  # combined server
  kafka-1:
    <<: *kafka
    container_name: kafka-1
    ports:
      - '9092:9092'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 1
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9092
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://192.168.2.187:9092 # host IP

  kafka-2:
    <<: *kafka
    container_name: kafka-2
    ports:
      - '9093:9093'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 2
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://192.168.2.187:9093 # host IP

  kafka-3:
    <<: *kafka
    container_name: kafka-3
    ports:
      - '9094:9094'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 3
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9094
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://192.168.2.187:9094 # host IP

  #broker only
  kafka-4:
    <<: *kafka
    container_name: kafka-4
    ports:
      - '9095:9095'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 4
      KAFKA_CFG_PROCESS_ROLES: broker
      KAFKA_CFG_LISTENERS: BROKER://:9095
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://192.168.2.187:9095

insert image description here

After adding the internal and external network, the configuration file is as follows:

version: "3"

# common configuration
x-common-config: &common-config
  ALLOW_PLAINTEXT_LISTENER: yes
  KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
  KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL
  KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT

# common kafka image configuration
x-kafka: &kafka
  image: bitnami/kafka:3.2
  networks:
    net:
  depends_on:
    - zookeeper

services:

  zookeeper:
    container_name: zookeeper
    image: bitnami/zookeeper:3.8
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - net
    volumes:
      - zookeeper_data:/bitnami/zookeeper

  kafka-0:
    container_name: kafka-0
    <<: *kafka
    ports:
      - "9093:9093"
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 0
      KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://0.0.0.0:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka-0:9092,EXTERNAL://192.168.131.10:9093 # change to your host IP
    volumes:
      - kafka_0_data:/bitnami/kafka

  kafka-1:
    container_name: kafka-1
    <<: *kafka
    ports:
      - "9094:9094"
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 1
      KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://0.0.0.0:9094
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka-1:9092,EXTERNAL://192.168.131.10:9094 # change to your host IP
    volumes:
      - kafka_1_data:/bitnami/kafka

  kafka-2:
    container_name: kafka-2
    <<: *kafka
    ports:
      - "9095:9095"
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 2
      KAFKA_CFG_LISTENERS: INTERNAL://:9092,EXTERNAL://0.0.0.0:9095
      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka-2:9092,EXTERNAL://192.168.131.10:9095 # change to your host IP
    volumes:
      - kafka_2_data:/bitnami/kafka
      
  nginx:
      container_name: nginx
      hostname: nginx
      image: nginx:1.22.0-alpine
      volumes:
        - ./nginx.conf:/etc/nginx/nginx.conf:ro
      ports:
        - "9093-9095:9093-9095"
      depends_on: 
        - kafka-0
        - kafka-1
        - kafka-2


volumes:
  zookeeper_data:
    driver: local
  kafka_0_data:
    driver: local
  kafka_1_data:
    driver: local
  kafka_2_data:
    driver: local


networks:
  net:

Configure nginx.conf

stream {
	upstream kafka-0 {
		server kafka-0:9093;
	}
	upstream kafka-1 {
		server kafka-1:9094;
	}
	upstream kafka-2 {
		server kafka-2:9095;
	}
	server {
		listen 9093;
		proxy_pass kafka-0;
	}
	server {
		listen 9094;
		proxy_pass kafka-1;
	}
	server {
		listen 9095;
		proxy_pass kafka-2;
	}
}

create folder

mkdir docker-kafka

inside the directory run

docker-compose up -d

Then check whether the containers were created successfully

docker ps -a

create topic

docker exec -it kafka-0 /opt/bitnami/kafka/bin/kafka-topics.sh \
--create --bootstrap-server kafka-0:9092 \
--topic my-topic \
--partitions 3 --replication-factor 2

create consumer

docker exec -it kafka-0 /opt/bitnami/kafka/bin/kafka-console-consumer.sh \
--bootstrap-server kafka-0:9092 \
--topic my-topic

create producer

docker exec -it kafka-0 /opt/bitnami/kafka/bin/kafka-console-producer.sh \
--bootstrap-server kafka-0:9092 \
--topic my-topic

insert image description here

7. Message model and message sequence

A partition is the smallest unit of parallelism.
A consumer can consume multiple partitions.
A partition can be consumed by consumers in multiple consumer groups.
However, a partition cannot be consumed by multiple consumers in the same consumer group at the same time.

Example:
Consumer group A: C1, C2
Consumer group B: C3, C4, C5, C6
For example, partition P0 can be consumed by C1 of consumer group A and C3 of consumer group B.
But the P0 partition cannot be consumed by C1 and C2, because C1 and C2 are in the same consumer group.
insert image description here
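A minimal sketch of this behaviour: run two copies of the program below with the same group.id and Kafka assigns each copy a disjoint subset of the topic's partitions; give each copy a different group.id and both receive all messages. The broker address is an assumption and the topic "test" is the one created earlier.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class GroupConsumerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");   // assumption
        props.setProperty("group.id", "group-A");                   // consumers sharing this id split the partitions
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition = %d, offset = %d, value = %s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}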

7.1 Point-to-point

All consumers belong to the same consumer group

If a new consumer, consumer 4, joins at this point,
insert image description here
consumer 4 can take over and consume partition P3.
insert image description here
If consumer 2 goes down at this point,
insert image description here
consumer 1 can take over and consume partition P1.
insert image description here

7.2 Publish Subscribe

Each consumer belongs to a different consumer group
insert image description here

7.3 Partitioning and message ordering

producer

  • For messages sent by the same producer to the same partition, the offset sent first is smaller than the offset sent later

insert image description here

Here offset(M1) < offset(M2)

  • The order of messages sent by the same producer to different partitions cannot be guaranteed

insert image description here

Here messages M3 and M4 are placed in different partitions, so no ordering between their offsets is guaranteed

consumer

  • Consumers consume according to the order in which messages are stored in the partition

insert image description here

Consumption order: M1, M2, M3

  • Kafka only guarantees the order of messages within a partition, not the order of messages between partitions
    insert image description here

Here the consumption order might be M4, M1, M2, M3,
because M4 and M1 are in different partitions, so it is not certain which will be consumed first.
However, M1, M2, and M3 are in the same partition, so their relative order is guaranteed, although messages from other partitions may be interleaved between them.

Note:
1. Use a single partition, so that the order of all messages is guaranteed, at the cost of scalability and performance.
2. Set the message key: messages with the same key are sent to the same partition.

8. Message Passing Semantics

  • At least once
    messages are not lost, but may be duplicated
  • At most once
    messages may be lost, but are never resent
  • Exactly once
    the message is delivered to the server exactly once and is not duplicated on the server

Both the producer and the consumer are involved in providing these guarantees.

producer

(1). At most once

The message is sent only once, regardless of whether the Broker actually received it.
insert image description here

(2).At least once

The producer sends a message and the Broker receives it, but the acknowledgment fails. After waiting until the timeout, the producer assumes the Broker did not receive the message and resends it, so the message may be duplicated.
insert image description here

consumer

(1). At most once

The consumer commits the consumption position first (offset + 1) and then reads and processes the message. If reading or processing fails, the message is lost and will not be read again.
insert image description here

(2).At least once

The consumer first reads the message, and then submits the consumption location. If it fails during the submission process, then the next time the message is read, the offset will not change, and it will continue to read the message from the previous time.
insert image description here

exactly once

Only implemented in Kafka 0.11.0 and later versions

9. Producer API

Asynchronous sending model

(1). Introduce dependencies

<dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.8.1</version>
        </dependency>

(2). Write producer code

public class AvroProducer {
    
    
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093");
        props.put("linger.ms", 1);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", AvroSerializer.class.getName());

        User user = User.newBuilder().setFavoriteNumber(1).setUserId(10001l).setName("jeff").setFavoriteColor("red").build();
        ProductOrder order = ProductOrder.newBuilder().setOrderId(2000l).setUserId(user.getUserId()).setProductId(101l).build();

        Producer<String, Object> producer = new KafkaProducer<>(props);
        // send user messages
        for (int i = 0; i < 10; i++) {
    
    
            Iterable<Header> headers = Arrays.asList(new RecordHeader("schema", user.getClass().getName().getBytes()));
            producer.send(new ProducerRecord<String, Object>("my-topic", null, "" + user.getUserId(), user, headers));
        }
        // send order messages
        for (int i = 10; i < 20; i++) {
    
    
            Iterable<Header> headers = Arrays.asList(new RecordHeader("schema", order.getClass().getName().getBytes()));
            producer.send(new ProducerRecord<String, Object>("my-topic", null, "" + order.getUserId(), order, headers));
        }

        System.out.println("send successful");
        producer.close();

    }
}

The producer mainly uses send() to send messages.
After the producer puts a message into the buffer of the corresponding partition, send() returns and the producer continues with the next message.
The messages in the buffer are handed to the Broker by a background thread.
insert image description here

(3).View the result

1. Initialize the connection
2. At the same time, the producer puts the message into the buffer
3. Send a success message after the connection is created

# initialize the connection
[2023-04-02 15:04:12,467] TRACE [main] Added sensor with name connections-created: (org.apache.kafka.common.metrics.Metrics)
# meanwhile, the producer puts the messages into the buffer
[name=record-queue-time-avg, group=producer-metrics, description=The average time in ms record batches spent in the send buffer., tags={client-id=producer-1}] (org.apache.kafka.common.metrics.Metrics)
# sent successfully
[RecordHeader(key = schema, value = [111, 110, 101, 104, 111, 117, 114, 46, 107, 97, 102, 107, 97, 46, 101, 120, 97, 109, 112, 108, 101, 46, 97, 118, 114, 111, 46, 80, 114, 111, 100, 117, 99, 116, 79, 114, 100, 101, 114])], isReadOnly = true), key=10001, value={"order_id": 2000, "product_id": 101, "user_id": 10001}, timestamp=null) with callback null to topic my-topic partition 0 (org.apache.kafka.clients.producer.KafkaProducer)
send successful

send synchronously

Future<RecordMetadata> result =
        producer.send(new ProducerRecord<String, String>("mytopic", "" + (i % 5), Integer.toString(i)));
try {
    RecordMetadata recordMetadata = result.get();
} catch (ExecutionException e) {
    e.printStackTrace();
}

The full code is as follows:

 /**
     * Send messages synchronously
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.10.17:9093");
        props.put("linger.ms", 1);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", AvroSerializer.class.getName());

        Producer<String, String> producer = new KafkaProducer<>(props);
        // send messages
        for (int i = 0; i < 20; i++) {
    
    
            Future<RecordMetadata> result =
                    producer.send(new ProducerRecord<String, String>("mytopic", "" + (i % 5), Integer.toString(i)));
            try {
    
    
                RecordMetadata recordMetadata = result.get();
            } catch (ExecutionException | InterruptedException e) {
    
    
                e.printStackTrace();
            }
        }

        System.out.println("send successful");
        producer.close();

    }

Result:
Each message is sent only after the previous one has completed

[2023-04-02 15:16:43,381] DEBUG [kafka-producer-network-thread | producer-1] [Producer clientId=producer-1] Sending METADATA request with header RequestHeader(apiKey=METADATA, apiVersion=9, clientId=producer-1, correlationId=1) and timeout 30000 to node -1: MetadataRequestData(topics=[MetadataRequestTopic(topicId=AAAAAAAAAAAAAAAAAAAAAA, name='mytopic')], allowAutoTopicCreation=true, includeClusterAuthorizedOperations=false, includeTopicAuthorizedOperations=false) (org.apache.kafka.clients.NetworkClient)

[2023-04-02 15:16:43,382] DEBUG [kafka-producer-network-thread | producer-1] [Producer clientId=producer-1] Sending transactional request InitProducerIdRequestData(transactionalId=null, transactionTimeoutMs=2147483647, producerId=-1, producerEpoch=-1) to node 192.168.10.17:9093 (id: -1 rack: null) with correlation ID 2 (org.apache.kafka.clients.producer.internals.Sender)

Batch sending

 /**
     * Send messages in batches
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.10.17:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // batch sending
        // maximum size (in bytes) of each batch
        props.put("batch.size", 16384);
        // linger time (ms) before a batch is sent
        props.put("linger.ms", 1000);

        Producer<String, String> producer = new KafkaProducer<>(props);
        // send messages
        for (int i = 0; i < 20; i++) {
    
    
            producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
            try {
    
    
                Thread.sleep(100);
            } catch (InterruptedException e) {
    
    
                e.printStackTrace();
            }
        }

        System.out.println("send successful");
        producer.close();

    }

view log

[2023-04-02 15:29:25,218] TRACE [kafka-producer-network-thread | producer-1] Added sensor with name topic.my-topic.bytes (org.apache.kafka.common.metrics.Metrics)

[2023-04-02 15:29:25,218] TRACE [kafka-producer-network-thread | producer-1] Registered metric named MetricName [name=byte-total, group=producer-topic-metrics, description=The total number of bytes sent for a topic., tags={client-id=producer-1, topic=my-topic}] (org.apache.kafka.common.metrics.Metrics)

acks attribute and retries attribute

The acks attribute is very important. It controls how sends are acknowledged:
acks is the acknowledgment of the message
acks: -1  the leader and the followers have all received the message (equivalent to acks: all)
acks: 0   the producer returns as soon as the message is put into the buffer (at most once)
acks: 1   the leader has received the message, but it is unknown whether the followers have synchronized it

Here we set it to all
insert image description here
retries: how many times to retry after a failure. Older clients defaulted to 0; recent clients default to a very large value, bounded by delivery.timeout.ms.

  • at most once

acks = 0 or acks = 1

  • at least once

acks = -1 (or all) and retries > 0
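Putting it together, a minimal sketch of at-least-once producer settings; the broker address is an assumption, and the at-most-once variant is noted in the comments:

Properties props = new Properties();
props.put("bootstrap.servers", "192.168.10.17:9093");            // assumption: same broker as above
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
props.put("acks", "all");      // at least once: leader and in-sync followers must acknowledge
props.put("retries", 3);       // retry on failure (may produce duplicates)
// for at-most-once you would instead use acks=0 (or acks=1) and retries=0
Producer<String, String> producer = new KafkaProducer<>(props);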

10. Consumer API

(1). Introduce dependencies

consistent with the producer

<dependency>
          <groupId>org.apache.kafka</groupId>
          <artifactId>kafka-clients</artifactId>
          <version>2.8.1</version>
</dependency>

Kafka has an internal topic, __consumer_offsets,
which records, for each consumer, which topic, which partition, and which position it has consumed up to.
This allows quick recovery after a restart;
the committed position is what counts.

Auto commit (at most once)

enable.auto.commit: enables automatic offset commits
auto.commit.interval.ms: the auto-commit interval in milliseconds

props.put("enable.auto.commit", "true"); 
props.put("auto.commit.interval.ms", "1000"); 

Once the consumer has polled messages, the offset position is committed automatically

ConsumerRecords<String, String> records = consumer.poll(100); 

The complete code is as follows:

 /**
     * Auto commit (at most once)
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.10.17:9093");
        props.setProperty("group.id", "group-1");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());
        // enable auto commit
        props.setProperty("enable.auto.commit", "true");
        // auto-commit interval in milliseconds
        props.setProperty("auto.commit.interval.ms", "1000");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        while (true) {
    
    
            ConsumerRecords<String, String> records = consumer.poll(100);
            // print the messages
            for (ConsumerRecord<String, String> record : records) {
    
    
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
        }
    }

Commit manually (at least once)

turn autocommit off

props.put("enable.auto.commit", "false");

For manual commit, call commitSync() to commit offsets in batches

consumer.commitSync(); // commit the batch

The complete code is as follows:

/**
     * Manual commit (at least once)
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.10.17:9093");
        props.setProperty("group.id", "group-1");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());
        // disable auto commit
        props.setProperty("enable.auto.commit", "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        final int minBatchSize = 20;
        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();

        while (true) {
    
    
            ConsumerRecords<String, String> records = consumer.poll(100);
            // print the messages
            for (ConsumerRecord<String, String> record : records) {
    
    
                buffer.add(record);
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
            if (buffer.size() >= minBatchSize) {
    
    
                consumer.commitSync(); // commit the batch
                buffer.clear();
            }

        }
    }

But if we want to commit offsets one by one (per partition), we need to pass parameters

long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                 consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));

The complete code is as follows:

    /**
     * Manual commit (at least once), committing one by one
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.10.17:9093");
        props.setProperty("group.id", "group-1");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());
        // disable auto commit
        props.setProperty("enable.auto.commit", "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));

        while (true) {
    
    
            ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
            for (TopicPartition partition : records.partitions()) {
    
    
                List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);

                for (ConsumerRecord<String, String> record : partitionRecords) {
    
    
                    System.out.println(record.offset() + ": " + record.value());
                }

                long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
            }
        }
    }

Kafka also supports manually specifying which partitions to consume and the consumption position.
Specify the partitions to consume:

String topic = "foo";
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(partition0, partition1));

Specify the consumption position:

seek(TopicPartition, long)
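A small sketch combining the two; it assumes a KafkaConsumer<String, String> named consumer, configured as in the examples above, and an arbitrary example offset of 42:

TopicPartition partition0 = new TopicPartition("foo", 0);
consumer.assign(Arrays.asList(partition0));
consumer.seek(partition0, 42L);   // start reading partition 0 of "foo" at offset 42 (example value)
ConsumerRecords<String, String> records = consumer.poll(100);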

11. Exactly once

producer

The configuration is as follows:
enable.idempotence: enables the idempotent producer

props.setProperty("enable.idempotence", "true");
props.setProperty("acks", "all");

consumer

Relying on offsets alone to prevent duplicate consumption is not a good approach.
Usually a unique ID (such as a flow ID or order ID) is added to the message; when processing the business logic, this ID is checked to avoid handling the same message twice.
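An illustrative sketch of this idea; it assumes the order ID travels in the message key and reuses the records and consumer from the manual-commit example above. In production the set of processed IDs would live in a durable store such as a database, not in memory:

Set<String> processedIds = new HashSet<>();   // in production: a durable store, e.g. a database table

for (ConsumerRecord<String, String> record : records) {
    String orderId = record.key();            // assumption: the key carries the order id
    if (!processedIds.add(orderId)) {
        continue;                             // already processed, skip the duplicate
    }
    // ... process the order here ...
}
consumer.commitSync();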

12. Transactional messages

Transactions must satisfy atomicity, either all succeed or all fail
insert image description here
Code example:

 /**
     * Transactions
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.10.17:9093");
        props.put("transactional.id", "my-transactional-id");
        Producer<String, String> producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
        producer.initTransactions();
        try {
    
    
            producer.beginTransaction();
            for (int i = 0; i < 100; i++) {
    
    
                producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
            }
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    
    
            // We can't recover from these exceptions, so our only option is to close the producer and exit.
            producer.close();
        } catch (KafkaException e) {
    
    
            // For all other exceptions, just abort the transaction and try again.
            producer.abortTransaction();

        } finally {
    
    
            producer.close();

        }
    }

Transaction isolation level
isolation.level
The default is read_uncommitted (dirty reads are possible).
read_committed reads only successfully committed data, with no dirty reads.

For example, we configure as follows:
Run in the Kafka bin directory

cd /opt/kafka_2.11-2.4.1/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --isolation-level read_committed
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --isolation-level read_uncommitted
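The same isolation level can be set on the Java consumer; a minimal sketch, added to the consumer Properties used in the earlier examples:

// read only messages from committed transactions; the default is read_uncommitted
props.setProperty("isolation.level", "read_committed");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);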

13. Serialization and Avro

Objects are transferred between networks in binary form or saved to files, and can be restored according to specific rules.
insert image description here
insert image description here

13.1 Advantages of Serialization

1. Save space and improve network transmission efficiency
2. Cross-platform
3. Cross-language

Format of kafka Record message
insert image description here

13.2 Nine serialization types provided by kafka

  • Kafka provides serializers and deserializers for 9 basic types, under the org.apache.kafka.common.serialization package

Serializer / Deserializer
ByteArraySerializer / ByteArrayDeserializer
ByteBufferSerializer / ByteBufferDeserializer
BytesSerializer / BytesDeserializer
ShortSerializer / ShortDeserializer
IntegerSerializer / IntegerDeserializer
LongSerializer / LongDeserializer
FloatSerializer / FloatDeserializer
DoubleSerializer / DoubleDeserializer
StringSerializer / StringDeserializer (String uses the UTF-8 character set by default)
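For example, to use Long keys with String values, a sketch of the producer configuration; props is a producer Properties object like the ones in the earlier examples, and the topic name "demo" is a placeholder:

props.put("key.serializer", LongSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
Producer<Long, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("demo", 42L, "value for key 42"));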

13.3 Custom serialization

Serialization needs to implement

Package org.apache.kafka.common.serialization
Interface Serializer<T>

Deserialization needs to implement

Package org.apache.kafka.common.serialization
Interface Deserializer<T>
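A bare-bones skeleton of a custom serializer; MyType is a hypothetical class, and a full Avro-based implementation follows in 13.5:

import org.apache.kafka.common.serialization.Serializer;

import java.nio.charset.StandardCharsets;

public class MySerializer implements Serializer<MyType> {   // MyType is hypothetical

    @Override
    public byte[] serialize(String topic, MyType data) {
        // turn the object into bytes however you like; configure() and close() have default implementations
        return data == null ? null : data.toString().getBytes(StandardCharsets.UTF_8);
    }
}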

13.4 Common message formats

  • CSV
    suitable for simple messages

  • JSON
    highly readable, but takes up more space;
    well suited to ElasticSearch

  • Serialized messages
    Avro: well supported by Hadoop and Hive
    Protobuf

  • Custom serialization
    Avro with a Schema

13.5 Use of Avro

When working with big data systems, Avro is commonly used.
(1) First, add the dependencies

<dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.11.0</version>
</dependency>
<plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.11.0</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>./src/main/avro/</sourceDirectory>
                            <outputDirectory>./src/main/java/</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

(2) Define a Schema
and create a new User.avsc
insert image description here

{
  "namespace": "onehour.kafka.example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

(3) Run the plug-in
insert image description here
After the plug-in runs, it generates:
insert image description here
Copy this class to:
insert image description here

(4) Write the producer

 /**
     * Send messages with Avro
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.10.17:9093");
        props.put("linger.ms", 1);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", AvroSerializer.class.getName());

        User user = User.newBuilder()
                .setName("jeff")
                .setFavoriteColor("red")
                .setFavoriteNumber(7)
                .build();

        Producer<String, Object> producer = new KafkaProducer<>(props);
        // send user messages
        for (int i = 0; i < 10; i++) {
    
    
            producer.send(new ProducerRecord<String, Object>("my-topic", Integer.toString(i), user));
        }

        System.out.println("send successful");
        producer.close();

    }

(5) You need to write your own class that implements the Serializer interface for serialization.
The AvroSerializer.java code is as follows:

package onehour.kafka.example.serialization;

import onehour.kafka.example.avro.v1.User;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.IOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class AvroSerializer implements Serializer {
    
    

    public static final StringSerializer Default = new StringSerializer();
    private static final Map ENCODERS = new HashMap();

    @Override
    public void configure(Map map, boolean b) {
    
    

    }

    @Override
    /**
     * Serialize according to the type associated with the topic
     */
    public byte[] serialize(String topic, Object o) {
    
    
        if (topic.equals("my-topic")) {
    
    
            try {
    
    
                return User.getEncoder().encode((User) o).array();
            } catch (IOException e) {
    
    
                throw new RuntimeException(e);
            }
        }

        return Default.serialize(topic, o.toString());
    }

    @Override
    public void close() {
    
    

    }
}


(6) You need to write your own class that implements the Deserializer interface for deserialization.
The AvroDeserializer.java code is as follows:

package onehour.kafka.example.serialization;

import onehour.kafka.example.avro.v1.User;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class AvroDeserializer implements Deserializer {
    
    

    public static final StringDeserializer Default = new StringDeserializer();
    private static final Map DECODERS = new HashMap<>();

    @Override
    public void configure(Map map, boolean b) {
    
    

    }

    @Override
    /**
     * Deserialize according to the type associated with the topic
     */
    public Object deserialize(String topic, byte[] bytes) {
    
    
        if (topic.equals("my-topic")) {
    
    
            try {
    
    
                return User.getDecoder().decode(bytes);
            } catch (IOException e) {
    
    
                throw new RuntimeException(e);
            }
        }

        return Default.deserialize(topic, bytes);
    }

    @Override
    public void close() {
    
    

    }
}

(7) Write consumers

 /**
     * Avro
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.10.17:9093");
        props.setProperty("group.id", "test");
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", AvroDeserializer.class.getName());

        KafkaConsumer<String, User> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        while (true) {
    
    
            ConsumerRecords<String, User> records = consumer.poll(100);
            // print the messages
            for (ConsumerRecord<String, User> record : records) {
    
    
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
        }
    }

Result (consumer output):

offset = 241, key = 0, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 242, key = 1, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 243, key = 2, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 244, key = 3, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 245, key = 4, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 246, key = 5, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 247, key = 6, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 248, key = 7, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 249, key = 8, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}
offset = 250, key = 9, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red"}

14. Record Header usage

insert image description here
If you follow the previous explanation, each topic corresponds to an entity class, and you can use the topic name when writing the serialization method.
insert image description here
But in reality this is often not the case.
Example:
A new user registers, then places an order, and then cancels the order.
The order of these events matters.
Kafka does not guarantee ordering across partitions. If the cancel-order message arrives before the registration or the purchase message, the processing logic breaks.
insert image description here
The solution is as follows:
In this case, to keep consumption correct, all of these events are placed in the same partition of the same topic: the user ID is used as the partition key so that they end up in the same partition.
insert image description here
In this case, the same topic contains multiple different types of messages, and the earlier approach of choosing the serialization method by topic name no longer works.

Confluent provides a solution: Schema Registry.

  • Before sending a message, the producer registers the message's schema (structure) with the Registry. The Registry returns a schema id, and the producer sends the id together with the data to Kafka.
  • After consumers get the message, they first read the id, look up the message schema in the Registry, and then decode the data using that schema.

Disadvantages:
1. Parsing the data depends heavily on the schema registry
2. It alters the message payload itself (a schema id is added to the data)
insert image description here

Another solution is to use the Record Header.
insert image description here
Modify the code and add an Avro schema for orders,
product_order.avsc
insert image description here

{
  "namespace": "onehour.kafka.example.avro",
  "type": "record",
  "name": "ProductOrder",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "product_id", "type": "long"},
    {"name": "user_id", "type": "long"}
  ]
}

Modify the User schema and add user_id
User.v1.avsc
insert image description here
The code is as follows:

{
  "namespace": "onehour.kafka.example.avro.v1",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]},
    {"name": "user_id", "type": "long"}
  ]
}

Use the plug-in to generate the entity classes,
insert image description here
copy them over,
insert image description here
and modify the producer code

    /**
     * Send messages with Avro (Record Header)
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.10.15:9093");
        props.put("linger.ms", 10);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", AvroSerializer.class.getName());

        onehour.kafka.example.avro.v1.User user = User.newBuilder()
                .setUserId(10001L)
                .setName("jeff")
                .setFavoriteColor("red")
                .setFavoriteNumber(7)
                .build();
        ProductOrder order = ProductOrder.newBuilder().setOrderId(2000L).setUserId(user.getUserId()).setProductId(101L).build();

        Producer<String, Object> producer = new KafkaProducer<>(props);
        // send user messages
        for (int i = 0; i < 10; i++) {
    
    
            Iterable<Header> headers = Arrays.asList(new RecordHeader("schema", user.getClass().getName().getBytes()));
            producer.send(new ProducerRecord<String, Object>("my-topic", null, "" + user.getUserId(), user, headers));
        }
        // send order messages
        for (int i = 10; i < 20; i++) {
    
    
            Iterable<Header> headers = Arrays.asList(new RecordHeader("schema", order.getClass().getName().getBytes()));
            producer.send(new ProducerRecord<String, Object>("my-topic", null, "" + order.getUserId(), order, headers));
        }

        System.out.println("send successful");
        producer.close();

    }

Change the serialization class
AvroSerializerHeader.java
Here I created a new class

package onehour.kafka.example.serialization;

import onehour.kafka.example.avro.User;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.ExtendedSerializer;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.IOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class AvroSerializerHeader implements ExtendedSerializer {
    
    

    public static final StringSerializer Default = new StringSerializer();
    private static final Map ENCODERS = new HashMap();

    @Override
    public void configure(Map map, boolean b) {
    
    

    }

    @Override
    /**
     * Serialize according to the type associated with the topic
     */
    public byte[] serialize(String topic, Object o) {
    
    
        if (topic.equals("my-topic")) {
    
    
            try {
    
    
                return User.getEncoder().encode((User) o).array();
            } catch (IOException e) {
    
    
                throw new RuntimeException(e);
            }
        }

        return Default.serialize(topic, o.toString());
    }

    @Override
    public void close() {
    
    

    }

    @Override
    /**
     * Serialize using the schema information in the header
     */
    public byte[] serialize(String topic, Headers headers, Object o) {
    
    

        if (o == null) {
    
    
            return null;
        }

        // read the schema from the header
        String className = null;
        for (Header header : headers) {
    
    
            if (header.key().equals("schema")) {
    
    
                className = new String(header.value());
            }
        }

        // serialize using the className from the schema header
        if (className != null) {
    
    
            try {
    
    
                BinaryMessageEncoder encoder = (BinaryMessageEncoder) ENCODERS.get(className);
                if (encoder == null) {
    
    
                    Class cls = Class.forName(className);
                    Method method = cls.getDeclaredMethod("getEncoder");
                    encoder = (BinaryMessageEncoder) method.invoke(cls);
                    ENCODERS.put(className, encoder);
                }
                return encoder.encode(o).array();
            } catch (Exception e) {
    
    
                throw new RuntimeException(e);
            }
        }

        // if there is no schema information in the header, serialize according to the topic's type
        return this.serialize(topic, o);
    }
}

Change the deserialization class
Here I created a new class
AvroDeserializerHeader.java

package onehour.kafka.example.serialization;

import onehour.kafka.example.avro.User;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.ExtendedDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class AvroDeserializerHeader implements ExtendedDeserializer {
    
    

    public static final StringDeserializer Default = new StringDeserializer();
    private static final Map DECODERS = new HashMap<>();

    @Override
    public void configure(Map map, boolean b) {
    
    

    }

    @Override
    /**
     * Deserialize according to the type associated with the topic
     */
    public Object deserialize(String topic, byte[] bytes) {
    
    
        if (topic.equals("my-topic")) {
    
    
            try {
    
    
                return User.getDecoder().decode(bytes);
            } catch (IOException e) {
    
    
                throw new RuntimeException(e);
            }
        }

        return Default.deserialize(topic, bytes);
    }

    @Override
    public void close() {
    
    

    }

    //@Override

    /**
     * Deserialize using the schema information in the header
     */
    @Override
    public Object deserialize(String topic, Headers headers, byte[] bytes) {
    
    
        if (bytes == null) {
    
    
            return null;
        }

        // read the schema from the header
        String className = null;
        for (Header header : headers) {
    
    
            if (header.key().equals("schema")) {
    
    
                className = new String(header.value());
            }
        }

        // deserialize using the className from the schema header
        if (className != null) {
    
    
            try {
    
    
                BinaryMessageDecoder decoder = (BinaryMessageDecoder) DECODERS.get(className);
                if (decoder == null) {
    
    
                    Class cls = Class.forName(className);
                    Method method = cls.getDeclaredMethod("getDecoder");
                    decoder = (BinaryMessageDecoder) method.invoke(cls);
                    DECODERS.put(className, decoder);
                }
                return decoder.decode(bytes);
            } catch (Exception e) {
    
    
                throw new RuntimeException(e);
            }
        }

        // if there is no schema information in the header, deserialize according to the topic's type
        return this.deserialize(topic, bytes);
    }
}

modify consumer

/**
     * Avro
     *
     * @param args
     */
    public static void main(String[] args) {
    
    
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "192.168.10.15:9093");
        props.setProperty("group.id", "test");
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", AvroDeserializerHeader.class.getName());

        KafkaConsumer<String, User> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        while (true) {
    
    
            ConsumerRecords<String, User> records = consumer.poll(100);
            // print the messages
            for (ConsumerRecord<String, User> record : records) {
    
    
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                for (Header header : record.headers()) {
    
    
                    System.out.println("headers -->" + header.key() + ":" + new String(header.value()));
                }
            }
        }
    }

operation result:

offset = 251, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 252, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 253, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 254, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 255, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 256, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 257, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 258, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 259, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 260, key = 10001, value = {"name": "jeff", "favorite_number": 7, "favorite_color": "red", "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.v1.User
offset = 261, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 262, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 263, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 264, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 265, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 266, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 267, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 268, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 269, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}
headers -->schema:onehour.kafka.example.avro.ProductOrder
offset = 270, key = 10001, value = {"order_id": 2000, "product_id": 101, "user_id": 10001}

Note: with client version 1.0 the header-aware serializer implements ExtendedSerializer / ExtendedDeserializer, while with version 3.0 it implements Serializer / Deserializer directly: the Extended* interfaces were removed and their header-aware methods were folded into Serializer and Deserializer.
insert image description here
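For reference, here is a rough sketch of the 3.0-style interface. The class name SchemaHeaderStringDeserializer and the "schema" header key are illustrative assumptions, not the exact classes used in the example above; the idea is simply to override the overload of deserialize that receives the record headers.

import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Deserializer;

// Rough sketch of a header-aware deserializer in the 3.0 style.
// An Avro deserializer like the one above would read the schema name from the
// header to decide how to decode the bytes; here we just print it.
public class SchemaHeaderStringDeserializer implements Deserializer<String> {

    @Override
    public String deserialize(String topic, byte[] data) {
        // fallback used when no headers are available
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(String topic, Headers headers, byte[] data) {
        // the record headers are available here, e.g. the schema name written by the producer
        if (headers.lastHeader("schema") != null) {
            String schema = new String(headers.lastHeader("schema").value(), StandardCharsets.UTF_8);
            System.out.println("deserializing with schema " + schema);
        }
        return deserialize(topic, data);
    }
}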

15. Introduction to Kafka KRaft mode

Kafka 2.8 introduced a major improvement: KRaft mode. At that point the feature was still experimental.
On October 3, 2022, Kafka 3.3.1 was released, officially declaring KRaft mode ready for production environments.
In KRaft mode, all cluster metadata is stored in Kafka's internal topics and managed by Kafka itself, with no dependency on ZooKeeper.


KRaft mode has several advantages:

  • Simpler cluster deployment and management – ZooKeeper is no longer required, which simplifies deploying and operating a Kafka cluster and reduces its resource footprint.
  • Better scalability and resiliency – the number of partitions in a single cluster can scale to millions, and cluster restart and failure recovery are faster.
  • More efficient metadata propagation – log-based, event-driven metadata propagation improves the performance of many core Kafka functions.

At present, KRaft is only suitable for new clusters; migrating an existing cluster from ZooKeeper mode to KRaft mode has to wait for version 3.5.
3.5 is a bridge release that officially deprecates ZooKeeper mode.
Kafka 4.0 (expected, at the time of writing, in August 2023) will remove ZooKeeper mode entirely and support only KRaft mode.

Note: Kafka 3.3.0 has serious bugs; it is recommended not to use it.

15.1 KRaft Deployment

(1). Single node deployment

  • Generate cluster uuid

Use the tool provided by Kafka:

./bin/kafka-storage.sh random-uuid
# the output looks like this
# xtzWWN4bTjitpL3kfd9s5g

You can also generate one yourself: the cluster uuid should be the base64 encoding of 16 bytes, giving a string 22 characters long.

# the cluster uuid should be the base64 encoding of 16 bytes, 22 characters long
echo -n "1234567890abcdef" | base64 | cut -b 1-22
# MTIzNDU2Nzg5MGFiY2RlZg
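
If you prefer to generate the id from code, here is a minimal Java sketch (JDK only, nothing else assumed) that produces the same kind of 22-character, unpadded base64 string as the shell one-liner above:

import java.security.SecureRandom;
import java.util.Base64;

// Minimal sketch: base64url-encode 16 random bytes without padding
// to obtain a 22-character cluster id.
public class ClusterIdGenerator {
    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        String clusterId = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        System.out.println(clusterId); // e.g. xtzWWN4bTjitpL3kfd9s5g
    }
}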
  • Format the storage directory
./bin/kafka-storage.sh format -t xtzWWN4bTjitpL3kfd9s5g \
                       -c ./config/kraft/server.properties
# Formatting /tmp/kraft-combined-logs

NOTE: when deploying multiple nodes, every node must be formatted with the same cluster uuid.

  • Start Kafka
./bin/kafka-server-start.sh ./config/kraft/server.properties
  • Configuration file (config/kraft/server.properties)
# The role of this server. Setting this puts us in KRaft mode
process.roles=broker,controller
# The node id associated with this instance's roles
node.id=1
# The connect string for the controller quorum
controller.quorum.voters=1@localhost:9093
# Combined nodes (i.e. those with `process.roles=broker,controller`) must list the controller listener here at a minimum.
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT
# to access from another host, change localhost to your host's IP
advertised.listeners=PLAINTEXT://localhost:9092
# This is required if running in KRaft mode.
controller.listener.names=CONTROLLER
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kraft-combined-logs

(2). Docker Compose deployment

insert image description here

In KRaft mode, a cluster node can act as a controller, as a broker, or as both at the same time.
Brokers handle message requests and store topic partition logs; controllers manage the metadata and direct brokers to react to metadata changes.
Controllers make up only a small part of the cluster, generally an odd number of nodes (1, 3, 5, 7), and the quorum can tolerate the failure of fewer than half of the controller nodes.

# common KRaft configuration
x-kraft: &common-config
  ALLOW_PLAINTEXT_LISTENER: yes
  KAFKA_ENABLE_KRAFT: yes
  KAFKA_KRAFT_CLUSTER_ID: MTIzNDU2Nzg5MGFiY2RlZg
  KAFKA_CFG_PROCESS_ROLES: broker,controller
  KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
  KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: BROKER:PLAINTEXT,CONTROLLER:PLAINTEXT
  KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9091,2@kafka-2:9091,3@kafka-3:9091
  KAFKA_CFG_INTER_BROKER_LISTENER_NAME: BROKER

# common image configuration
x-kafka: &kafka
  image: 'bitnami/kafka:3.3.1'
  networks:
    net:

# custom network
networks:
  net:

# project name
name: kraft
services:
  
  # combined server
  kafka-1:
    <<: *kafka
    container_name: kafka-1
    ports:
      - '9092:9092'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 1
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9092
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://10.150.36.72:9092 # host machine IP

  kafka-2:
    <<: *kafka
    container_name: kafka-2
    ports:
      - '9093:9093'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 2
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9093
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://10.150.36.72:9093 # host machine IP

  kafka-3:
    <<: *kafka
    container_name: kafka-3
    ports:
      - '9094:9094'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 3
      KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://:9094
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://10.150.36.72:9094 # host machine IP

  #broker only
  kafka-4:
    <<: *kafka
    container_name: kafka-4
    ports:
      - '9095:9095'
    environment:
      <<: *common-config
      KAFKA_CFG_BROKER_ID: 4
      KAFKA_CFG_PROCESS_ROLES: broker
      KAFKA_CFG_LISTENERS: BROKER://:9095
      KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://10.150.36.72:9095

Note: if deploying on a server or in the public cloud, make the following changes (the advertised listener must use the BROKER listener name defined above):

KAFKA_CFG_LISTENERS: CONTROLLER://:9091,BROKER://0.0.0.0:9092
KAFKA_CFG_ADVERTISED_LISTENERS: BROKER://<server IP or public IP>:9092

(3). View metadata

# create a topic
docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-topics.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           --create --topic my-topic \
           --partitions 3 --replication-factor 2

# producer
docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-console-producer.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           --topic my-topic

# consumer
docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-console-consumer.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           --topic my-topic

# view the status of the metadata quorum
docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-metadata-quorum.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           describe --status

# view the metadata replicas
docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-metadata-quorum.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           describe --replication

# view the metadata
# the metadata is stored on every node and can be inspected on any of them
docker exec -it kafka-1 \
            /opt/bitnami/kafka/bin/kafka-metadata-shell.sh  \
           --snapshot /bitnami/kafka/data/__cluster_metadata-0/00000000000000000000.log
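
The quorum information can also be read programmatically. Below is a minimal Java sketch using the AdminClient (available since Kafka 3.3); the bootstrap address is an assumption matching the compose file above, and this is roughly equivalent to kafka-metadata-quorum.sh describe --replication.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

import java.util.Properties;

// Minimal sketch: query the KRaft metadata quorum with the AdminClient.
public class DescribeQuorum {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "10.150.36.72:9092");
        try (Admin admin = Admin.create(props)) {
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("leader id: " + quorum.leaderId());
            quorum.voters().forEach(v ->
                    System.out.println("voter " + v.replicaId() + ", log end offset " + v.logEndOffset()));
            quorum.observers().forEach(o ->
                    System.out.println("observer " + o.replicaId() + ", log end offset " + o.logEndOffset()));
        }
    }
}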

15.2 The journey with ZooKeeper

From the day Kafka was born it was inseparable from ZooKeeper, but as Kafka developed, ZooKeeper's drawbacks gradually surfaced.
In the very beginning, Kafka stored both the cluster metadata and the consumers' consumption positions (offsets) in ZooKeeper.

(1). Offset management

The consumption position is frequently updated data. For ZooKeeper, writes are expensive and frequent writes can cause performance problems; every write has to be executed by the ZooKeeper leader, so writes cannot be scaled horizontally.
Starting from version 0.8.2, consumer positions are no longer written to ZooKeeper but are recorded in the internal topic __consumer_offsets, which is created with 50 partitions by default. With <consumer group.id, topic, partition number> as the message key, commit requests can be handled by multiple brokers at the same time, giving much better write performance and scalability. Kafka also caches a view of the latest consumption positions in memory, so offsets can be read quickly.
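
These committed positions can be read back through the AdminClient without touching __consumer_offsets directly. A minimal sketch follows; the group id "my-group" and the bootstrap address are assumptions for illustration.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;

// Minimal sketch: list the committed offsets of a consumer group,
// which Kafka stores in the internal __consumer_offsets topic.
public class ShowCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("my-group")
                         .partitionsToOffsetAndMetadata()
                         .get();
            offsets.forEach((tp, om) ->
                    System.out.println(tp + " -> committed offset " + om.offset()));
        }
    }
}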

(2). Metadata management

ZooKeeper mode

Before Kafka 3.3.0, metadata was stored in ZooKeeper, with the following structure:
insert image description here

Each cluster has a broker as the controller. The controller not only undertakes the work of the broker, but also maintains the metadata of the cluster, such as broker id, topic, partition, leader and in-sync replica set (ISR), and other information. The controller saves this information in ZooKeeper, and most of the read and write traffic of ZooKeeper is done by the controller. When metadata changes, the controller propagates the latest metadata to other brokers.
insert image description here

Note: each broker also communicates directly with ZooKeeper; the figure above omits those connections.
For example, when a broker starts it creates an ephemeral node /brokers/ids/{id} in ZooKeeper, and the leader of each partition also updates the in-sync replica set (ISR) information there.

ZooKeeper is like a work-order system, the controller is the administrator of that system who assigns the work, and the brokers do the work, operating in an A/B role arrangement (leader and follower).
The controller has the following duties:

  • Monitor whether brokers are alive (brokers clock in by registering in ZooKeeper, and the controller takes attendance).
  • When topics, partitions, replicas, or brokers change, elect a new leader for a partition if necessary and update its follower list (when the work order or the staff changes, the controller reassigns the work).
  • Notify the relevant brokers via RPC requests to become leader or follower (tell the relevant staff to start working).
  • Write the latest metadata to ZooKeeper and send it to the other brokers (update the work-order system and inform everyone else of the latest assignments).

Note: the new leader is not chosen by voting; the first replica in the ISR set is selected as the leader. This selection method tolerates more failures: with 2N+1 replicas it can tolerate up to 2N replica failures, whereas a majority-vote election can tolerate at most N.
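
A toy sketch of that selection rule (this is only an illustration of the idea, not Kafka's actual implementation): the new leader is the first replica in the ISR that is still alive.

import java.util.List;
import java.util.Optional;

// Toy illustration: pick the first in-sync replica that is still alive.
public class LeaderSelection {
    static Optional<Integer> pickLeader(List<Integer> isr, List<Integer> liveBrokers) {
        return isr.stream().filter(liveBrokers::contains).findFirst();
    }

    public static void main(String[] args) {
        // ISR is ordered; broker 2 is down, so broker 3 becomes the new leader.
        System.out.println(pickLeader(List.of(2, 3, 1), List.of(1, 3))); // Optional[3]
    }
}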

Problems

  • As the number of nodes and partitions grows linearly, the metadata becomes larger and larger, and it takes longer for the controller to propagate the metadata to the broker.
  • ZooKeeper is not suitable for storing large amounts of data, and frequent data changes may cause performance bottlenecks. Additionally, Znode size limits and maximum number of observers can both be constraints.
  • Metadata is stored in ZooKeeper. Each broker obtains the latest metadata from the controller and caches it in its own memory. When updates are delayed or reordered, the data may be inconsistent, and additional verification checks are required to ensure data consistency.
  • When the controller fails or restarts, the new controller has to re-pull all the metadata from ZooKeeper. When the cluster has many partitions (hundreds of thousands or even millions), loading the metadata takes a very long time, during which the controller cannot respond or work, which affects the availability of the whole cluster.

Note: when the controller fails or restarts, the other brokers are notified as ZooKeeper watchers. Each broker then tries to create the /controller node in ZooKeeper; whoever creates it first becomes the new controller.
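
A simplified sketch of that race using the plain ZooKeeper Java client is shown below. The connect string, broker id, and payload are illustrative assumptions; Kafka's real controller election additionally handles watches, controller epochs, and retries.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Simplified sketch: every broker races to create the ephemeral /controller znode;
// the one that succeeds becomes the controller, the others fail with NodeExistsException.
public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
        try {
            zk.create("/controller", "{\"brokerid\":1}".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("this broker is now the controller");
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("another broker is already the controller");
        } finally {
            zk.close();
        }
    }
}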

15.3 KRaft mode

insert image description here

(1). Metadata management

insert image description here

Based on the Raft consensus protocol, KRaft elects an active controller through a quorum mechanism. All metadata writes are handled by the active controller, which appends metadata change records to the internal topic __cluster_metadata. To guarantee write ordering, this topic has only one partition; the active controller is the leader of that partition and the other controllers act as followers, replicating the data into their local logs. Once more than half of the controllers have replicated a record, the write is considered successful and the active controller responds to the client.
All controllers cache the metadata log in memory and keep it up to date dynamically. When the active controller fails, any of the other controllers can immediately take over as the new active controller.
Besides the controllers, every broker also synchronizes the metadata to a local replica as an observer and caches it in memory.

docker run -it --rm --network=kraft_net \
           bitnami/kafka:3.3.1 \
           /opt/bitnami/kafka/bin/kafka-metadata-quorum.sh \
           --bootstrap-server kafka-1:9092,kafka-2:9093 \
           describe --replication

insert image description here

View the metadata (the ls / cd / cat commands below are run at the kafka-metadata-shell prompt started above):

docker exec -it kafka-1 bash
ls
cd topics/
ls
cat my-topic/0/data

insert image description here

The way metadata is propagated changes from RPC requests to replicating the metadata log, so there is no need to worry about data divergence: the materialized metadata views on each broker are eventually consistent because they come from the same log, and differences can easily be tracked down and resolved using timestamps and offsets.
insert image description here

Controllers and brokers periodically write their in-memory metadata snapshots to checkpoint files. A checkpoint file name contains the last consumed position of the snapshot and the controller ID. When a controller or broker restarts, it does not need to fetch the metadata from scratch: it loads the latest local checkpoint file into memory and then continues reading metadata from the last consumed position recorded in the checkpoint, which shortens the startup time.

15.4 Version selection

At present, KRaft is only suitable for new clusters; migrating an existing cluster from ZooKeeper mode to KRaft mode has to wait for version 3.5.
3.5 is a bridge release that officially deprecates ZooKeeper mode.
Kafka 4.0 (expected, at the time of writing, in August 2023) will remove ZooKeeper mode entirely and support only KRaft mode.
Note: Kafka 3.3.0 has serious bugs; it is recommended not to use it.

Origin blog.csdn.net/sinat_38316216/article/details/129898946