Kafka - new consumer

1. Data sources

The data source is the Flume agent configured in the previous blog post, which writes text data to Kafka. This time, however, the monitored directory has changed, and the topic written to Kafka has changed to A25.

(Figure: Flume writes data to Kafka)

Here we can see that Flume has finished consuming the newly uploaded A91 data.

2. Consumer code

2.1 Create a consumer

The properties used to create a consumer are not very different from those used for a producer:

  1. bootstrap.servers: Specifies the connection string to the Kafka cluster.
  2. key.deserializer and value.deserializer: the counterparts of the producer's serializers, except that the specified classes convert byte arrays back into Java objects instead of converting Java objects into byte arrays.
  3. group.id: optional, specifies which consumer group the KafkaConsumer belongs to.

The code to create the consumer is as follows:

Properties props = new Properties();
props.put("bootstrap.servers", "master:9092");
props.put("group.id", "TestConsumer");
props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

2.2 Subscribing to topics

consumer.subscribe(Collections.singletonList("A25"));

Because the Flume sink is configured to write to the topic A25, we subscribe to A25 here as well.

You can also pass a regular expression when subscribing, matching and subscribing to multiple topics at once, as in the sketch below.
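For example, if related topics share a naming prefix, a single pattern subscription can cover all of them. A minimal sketch, where the "A.*" pattern is only an illustrative assumption (older kafka-clients versions require the subscribe overload that also takes a ConsumerRebalanceListener):

// Subscribe to every topic whose name starts with "A", e.g. A25.
// Pattern is java.util.regex.Pattern.
consumer.subscribe(Pattern.compile("A.*"));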

2.3 Polling consumption

try {
    while(true) {
        ConsumerRecords<String, String> records = consumer.poll(100);

        logger.info("records length = {}", records.count());

        for (ConsumerRecord<String, String> record : records) {
            logger.info("topic = {}, partition = {}, offset = {}, key = {}, value = {}\n",
                    record.topic(), record.partition(), record.offset(),
                    record.key(), record.value());
        }
    }
} finally {
    consumer.close();
}

This is an infinite loop. A consumer is actually a long-running application that requests data from Kafka by continuously polling.

The consumer must keep polling Kafka, or it will be considered dead and its partitions will be handed over to other consumers in the group. The parameter passed to poll() is a timeout that controls how long poll() blocks when there is no data available in the consumer's buffer. If it is set to 0, poll() returns immediately; otherwise it waits up to the specified number of milliseconds for the broker to return data.
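The liveness behaviour described above is governed by consumer properties set before the KafkaConsumer is created. A small sketch with illustrative values (the numbers are assumptions, not recommendations):

// Set on the props object before new KafkaConsumer<>(props); values are examples only.
// max.poll.interval.ms: maximum allowed gap between two poll() calls before
// the consumer is considered failed and its partitions are rebalanced.
props.put("max.poll.interval.ms", "300000");
// session.timeout.ms: how long the group coordinator waits for heartbeats
// before declaring the consumer dead.
props.put("session.timeout.ms", "10000");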

The poll() method returns a list of records. Each record contains the topic it belongs to, the partition it came from, its offset within that partition, and its key-value pair. We generally traverse this list and process the records one by one.

Close the consumer with the close() method before the application exits. This closes the network connections and sockets and triggers a rebalance immediately, rather than waiting for the group coordinator to discover that the consumer has stopped sending heartbeats and declare it dead, which would take longer and leave the group unable to read messages from that consumer's partitions for that period.
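Since the polling loop above never exits on its own, a common way to reach close() cleanly (not covered in the original loop) is to call wakeup() from a shutdown hook: poll() then throws a WakeupException and the finally block closes the consumer. A minimal sketch under that assumption:

// wakeup() is the only KafkaConsumer method that is safe to call from another thread.
// WakeupException is org.apache.kafka.common.errors.WakeupException.
final Thread mainThread = Thread.currentThread();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    consumer.wakeup();        // makes the blocked poll() throw WakeupException
    try {
        mainThread.join();    // wait until the poll loop has closed the consumer
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}));

try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        // ... process records as above ...
    }
} catch (WakeupException e) {
    // expected during shutdown; nothing to do
} finally {
    consumer.close();         // triggers the immediate rebalance described above
}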

The results are as follows:

(Figure: Kafka consumption result)

3. Other configuration

3.1 pom file

<properties>
    <java.version>1.8</java.version>
    <kafka.version>1.1.0</kafka.version>
</properties>


<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>${kafka.version}</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.21</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.21</version>
    </dependency>
</dependencies>

3.2 log4j.properties

log4j.rootLogger=INFO,console
log4j.additivity.org.apache=true

# Console appender
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Threshold=DEBUG
log4j.appender.console.ImmediateFlush=true
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%p] %c  -  %m%n
