Kafka source code analysis: Consumer

Preface

When producers send messages to brokers, these messages are stored on disk. How do consumers consume these messages?

Consumer consumption process

From the source-code perspective, the Consumer can be divided into the following core parts:

  1. Consumer initialization
  2. How the Consumer Leader is elected
  3. How the Consumer Leader assigns partitions
  4. How the Consumer pulls data
  5. Consumer offset commits

Consumer initialization

Starting from the KafkaConsumer constructor, we can trace into the core implementation.


The first part of this method is configuration handling.
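As a rough sketch of what that configuration looks like from the caller's side: the property keys below are the standard consumer configuration names, while the broker address and group id are made-up placeholders for illustration only.

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    public static Properties baseConfig() {
        Properties props = new Properties();
        // Placeholder broker address and group id, for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        // Deserializers for record keys and values.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // In real code this Properties object is passed to new KafkaConsumer<>(props).
        System.out.println(baseConfig().getProperty("group.id"));
    }
}
```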

Core communication components between Consumer and Broker


ConsumerCoordinator: Kafka consumption is group-based, so before any actual consumption happens, the coordinator has to do a great deal of organization and coordination work.


Fetcher: Kafka consumers pull data from brokers, and Fetcher is the core class responsible for fetching.


This core class takes many parameters, which correspond to the consumer configuration options below.

fetch.min.bytes

The minimum number of bytes the server should return for a fetch request. If not enough data is available, the request waits until enough has accumulated; the default is 1 byte. With many consumers, raising this value can reduce load on the broker.

fetch.max.bytes

The maximum number of bytes the server should return for a fetch request. This parameter bounds how large a message the consumer can successfully fetch.

For example, if it is set to 50 MB, the consumer can consume messages smaller than 50 MB, but a message larger than 50 MB gets stuck and is retried endlessly. fetch.max.bytes must therefore be greater than or equal to the size of the largest single message.


fetch.max.wait.ms

If there is not enough data to satisfy fetch.min.bytes, this is the maximum time the server will block before responding to the fetch request. The default is 500 milliseconds. It works together with fetch.min.bytes above.

max.partition.fetch.bytes

Specifies the maximum number of bytes the server returns to the consumer from each partition. The default is 1 MB.

Suppose a topic has 20 partitions and 5 consumers: each consumer then needs at least 4 MB of memory available for records, and if a consumer crashes, the survivors need even more. Note that this parameter must be larger than the broker's message.max.bytes; otherwise the consumer may be unable to read some messages.

max.poll.records

Controls the maximum number of records returned by a single poll() call.
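The fetch-related options above can be put together in one place. The keys below are the standard consumer configuration names; the values are illustrative tuning choices, not recommendations from this analysis.

```java
import java.util.Properties;

public class FetchTuningSketch {
    public static Properties fetchTuning() {
        Properties props = new Properties();
        // Wait until at least 1 KB is available per fetch (the default is 1 byte)...
        props.put("fetch.min.bytes", "1024");
        // ...but block at most 500 ms waiting for it (the default).
        props.put("fetch.max.wait.ms", "500");
        // Upper bound for one fetch response; must cover the largest single message.
        props.put("fetch.max.bytes", String.valueOf(50 * 1024 * 1024));
        // Per-partition cap; should exceed the broker's message.max.bytes.
        props.put("max.partition.fetch.bytes", String.valueOf(1024 * 1024));
        // At most 500 records per poll() call (the default).
        props.put("max.poll.records", "500");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(fetchTuning().getProperty("fetch.max.bytes"));
    }
}
```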


How to elect Consumer Leader

Communication between consumer coordinator and group coordinator


Send request to Broker

Process Broker response

The consumer coordinator initiates a join-group request


Consumer Partitioning Strategy

partition.assignment.strategy: the strategy for assigning partitions to consumers. The default is Range, and custom strategies are also allowed.

Range

Assigns contiguous blocks of a topic's partitions to consumers. (If the partition count is not evenly divisible by the consumer count, the first consumers each receive one extra partition.)

RoundRobin

Distributes topic partitions to consumers one by one, in round-robin fashion.
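The arithmetic behind these two strategies can be sketched in a self-contained way. This is a simplification (one topic for Range, a flat partition list for RoundRobin) and does not use the real RangeAssignor/RoundRobinAssignor classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Range: give each consumer a contiguous block; the first
    // (numPartitions % numConsumers) consumers get one extra partition.
    public static Map<String, List<Integer>> range(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int per = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = per + (i < extra ? 1 : 0);
            List<Integer> block = new ArrayList<>();
            for (int j = 0; j < count; j++) block.add(next++);
            result.put(consumers.get(i), block);
        }
        return result;
    }

    // RoundRobin: deal partitions out one at a time, like cards.
    public static Map<String, List<Integer>> roundRobin(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) result.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("C0", "C1", "C2");
        // Range over 7 partitions: C0 gets [0,1,2], C1 gets [3,4], C2 gets [5,6].
        System.out.println(range(7, consumers));
        // RoundRobin over 7 partitions: C0 gets [0,3,6], C1 gets [1,4], C2 gets [2,5].
        System.out.println(roundRobin(7, consumers));
    }
}
```

Note how Range concentrates the "extra" partitions on the first consumers, which is why the first consumer ends up with more partitions when the division is uneven.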


StickyAssignor: the initial assignment is similar to RoundRobin; each subsequent reassignment changes as little as possible from the previous one.

Goals:

1. The distribution of partitions should be as balanced as possible.

2. The result of each reallocation should be as consistent as possible with the result of the previous allocation.

When these two goals conflict, the first goal will be given priority.

For example, suppose there are 3 consumers (C0, C1, C2) and 4 topics (T0, T1, T2, T3), each topic with 2 partitions (P0, P1).
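The "minimal movement" idea can be sketched with this example. The code below is not the real StickyAssignor algorithm; it only illustrates the principle: when a consumer leaves, survivors keep their partitions and only the orphaned partitions are dealt to the least-loaded survivors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StickySketch {
    // The round-robin-like starting assignment for 3 consumers and
    // 4 topics (T0..T3) with 2 partitions (P0, P1) each.
    public static Map<String, List<String>> exampleAssignment() {
        Map<String, List<String>> a = new LinkedHashMap<>();
        a.put("C0", new ArrayList<>(List.of("T0-P0", "T1-P1", "T3-P0")));
        a.put("C1", new ArrayList<>(List.of("T0-P1", "T2-P0", "T3-P1")));
        a.put("C2", new ArrayList<>(List.of("T1-P0", "T2-P1")));
        return a;
    }

    // Keep survivors' partitions in place; move only the departed
    // consumer's partitions, one by one, to the least-loaded survivor.
    public static Map<String, List<String>> onConsumerLeft(
            Map<String, List<String>> current, String departed) {
        Map<String, List<String>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            if (!e.getKey().equals(departed)) next.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (String orphan : current.get(departed)) {
            String least = null;
            for (String c : next.keySet()) {
                if (least == null || next.get(c).size() < next.get(least).size()) least = c;
            }
            next.get(least).add(orphan);
        }
        return next;
    }

    public static void main(String[] args) {
        // C0 and C2 keep everything they had; only C1's three partitions move.
        System.out.println(onConsumerLeft(exampleAssignment(), "C1"));
    }
}
```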


Custom strategy

Extend AbstractPartitionAssignor, then add the following parameter on the consumer side (where MyAssignor stands for your custom assignor class):

properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, MyAssignor.class.getName());

Partition strategy source code analysis


Consumer pulls data

The core class for pulling data is Fetcher.


commit offset


Of course, offsets can also be committed automatically; the interval is controlled by auto.commit.interval.ms, which defaults to 5 seconds.

maybeAutoCommitOffsetsAsync is invoked during poll(): an automatic commit happens only once the auto.commit.interval.ms interval has elapsed; if the next auto-commit deadline has not yet been reached, nothing is committed.
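That deadline-based behavior can be sketched in isolation. This is not the real client code, just the timing logic it describes (commit on poll only when the interval has elapsed, then schedule the next deadline):

```java
public class AutoCommitSketch {
    private final long intervalMs;
    private long nextDeadline;
    private int commits = 0;

    public AutoCommitSketch(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.nextDeadline = nowMs + intervalMs;
    }

    // Called on every poll(): commit only when the deadline has passed,
    // then schedule the next deadline; otherwise do nothing.
    public boolean maybeAutoCommit(long nowMs) {
        if (nowMs < nextDeadline) return false;
        commits++;
        nextDeadline = nowMs + intervalMs;
        return true;
    }

    public int commits() { return commits; }

    public static void main(String[] args) {
        AutoCommitSketch ac = new AutoCommitSketch(5000, 0);
        System.out.println(ac.maybeAutoCommit(1000));  // false: interval not yet reached
        System.out.println(ac.maybeAutoCommit(5000));  // true: 5 s elapsed, commit
        System.out.println(ac.maybeAutoCommit(7000));  // false: next deadline is at 10000
    }
}
```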


Origin blog.csdn.net/qq_28314431/article/details/133070210