Apache Kafka (11): Topic configuration and composition


In previous posts we introduced Kafka Producer- and Consumer-related configuration, but did not cover topic configuration in much detail. Topic configuration is also crucial in practice, because these parameters can affect both cluster performance and topic behavior.

After a topic is created, it uses the default parameters, but some of them may still need to be adjusted to the actual situation, for example:

  • Replication Factor
  • Partition count
  • Message Size
  • Compression level
  • Log Cleanup Policy
  • Min Insync Replicas

Replication factor and partition count have already been discussed, so here we focus on the configuration parameters not covered previously.

 

1. How to configure a Kafka Topic

Here we will briefly describe how to modify a topic's configuration with the Kafka CLI. First, create a topic:

kafka-topics.sh --zookeeper 172.31.24.148:2181 --create --topic configured-topic --partitions 3 --replication-factor 1

 

We can then use the kafka-configs CLI to configure topic parameters, for example:

kafka-configs.sh --zookeeper 172.31.24.148:2181 --entity-type topics --entity-name configured-topic --add-config min.insync.replicas=2 --alter

 

Then describe the topic:

kafka-topics.sh --zookeeper 172.31.24.148:2181 --describe --topic configured-topic

Topic:configured-topic  PartitionCount:3    ReplicationFactor:1    Configs:min.insync.replicas=2

    Topic: configured-topic Partition: 0    Leader: 0    Replicas: 0    Isr: 0

    Topic: configured-topic Partition: 1    Leader: 0    Replicas: 0    Isr: 0

    Topic: configured-topic Partition: 2    Leader: 0    Replicas: 0    Isr: 0

 

As shown in the Configs column, the topic now carries one additional configuration.

You can also use the --delete-config option to delete a configuration:

kafka-configs.sh --zookeeper 172.31.24.148:2181 --entity-type topics --entity-name configured-topic --delete-config min.insync.replicas --alter

Completed Updating config for entity: topic 'configured-topic'.

 

2. Partitions and Segments

We know that a topic is composed of one or more partitions. Each partition, in turn, is composed of one or more segments (which are essentially files on disk), as shown below:

 

Each segment has a starting offset and an ending offset; the starting offset of a segment is the ending offset of the previous segment plus 1. The last segment is called the active segment, meaning it is still being written to: any new record is appended to the active segment. Once the active segment reaches its configured limit, it is closed and a new segment is opened.

So at any moment, only one segment per partition is in the active state (i.e., having data written to it).

Two parameters govern segments:

  • log.segment.bytes : the maximum amount of data a single segment can hold (default: 1 GB)
  • log.segment.ms : how long Kafka waits before closing (rolling) a segment that is not yet full (default: 7 days)
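To make the rolling rules concrete, here is a minimal Python sketch of the idea (not Kafka's actual implementation; the class and attribute names are made up for illustration): a log appends records to its active segment and rolls to a new segment once the active one exceeds a size or age limit.

```python
import time

class Segment:
    """A simplified in-memory stand-in for a Kafka log segment file."""
    def __init__(self, base_offset, created_at):
        self.base_offset = base_offset   # first offset stored in this segment
        self.created_at = created_at     # used for time-based rolling
        self.records = []                # record payloads
        self.size = 0                    # total bytes, compared to the size limit

class Log:
    """Sketch of size/time based segment rolling (log.segment.bytes / log.segment.ms)."""
    def __init__(self, segment_bytes, segment_ms, clock=time.time):
        self.segment_bytes = segment_bytes
        self.segment_ms = segment_ms
        self.clock = clock
        self.next_offset = 0
        self.segments = [Segment(0, clock())]

    @property
    def active(self):
        return self.segments[-1]   # only the last segment is ever written to

    def append(self, payload: bytes) -> int:
        now = self.clock()
        full = self.active.size + len(payload) > self.segment_bytes
        stale = (now - self.active.created_at) * 1000 >= self.segment_ms
        if full or stale:
            # close the active segment and open a new one at the next offset
            self.segments.append(Segment(self.next_offset, now))
        self.active.records.append(payload)
        self.active.size += len(payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset
```

With `segment_bytes=100`, appending 30-byte records rolls to a new segment after every third record, and each new segment's base offset picks up exactly where the previous one ended.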

 

3. Segments and Indexes

Each segment has two corresponding index files:

  • An offset-to-position index: lets Kafka locate a message within a segment by its offset
  • A timestamp-to-offset index: lets Kafka locate a message by a timestamp

It is thanks to these index files that Kafka can locate data in constant time. Once the data is found, reading continues sequentially from there. This is why Kafka is designed around sequential reads and writes rather than random access.
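The offset index is sparse: only some records get an entry, and a lookup binary-searches for the greatest indexed offset not larger than the target, then scans forward sequentially in the .log file. A small Python sketch with hypothetical index entries:

```python
import bisect

# Sparse offset index, like a segment's .index file: only every Nth record
# gets an entry mapping (relative offset -> byte position in the .log file).
# These entries are hypothetical, for illustration only.
index = [(0, 0), (4, 120), (8, 260)]
offsets = [entry[0] for entry in index]

def position_for(target_offset: int) -> int:
    """Byte position to start scanning from when looking for target_offset."""
    # binary search for the greatest indexed offset <= target
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[i][1]
```

For example, `position_for(6)` returns 120: Kafka would seek to byte 120 (the entry for offset 4) and scan forward sequentially until it reaches offset 6.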

 

We can go further and see which files these concepts actually correspond to. Earlier we defined Kafka's log.dirs as:

/home/hadoop/kafka_2.12-2.3.0/data/kafka-logs

 

Entering this directory, we can see all the topics listed, along with their corresponding partitions:

(screenshot: directory listing showing one folder per topic partition, e.g. kafka_demo-0)

Entering the kafka_demo-0 partition directory, we can see:

(screenshot: the partition directory containing .log, .index, and .timeindex files)

Here, the .log files store the messages themselves, the .index file holds the offset-to-position index, and the .timeindex file holds the timestamp-to-offset index.
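Segment files are named after their base offset, zero-padded to 20 digits (e.g. 00000000000000000000.log), so the directory listing alone tells you which segment file contains a given offset. A small Python sketch, using hypothetical base offsets:

```python
# Hypothetical directory listing of a partition; real base offsets depend
# on how many records each rolled segment held.
filenames = [
    "00000000000000000000.log",
    "00000000000000170053.log",
    "00000000000000239521.log",
]

def segment_for(offset: int, log_files) -> str:
    """Name of the segment file containing the given offset."""
    bases = sorted(int(name.split(".")[0]) for name in log_files)
    # the containing segment is the one with the greatest base offset <= offset
    base = max(b for b in bases if b <= offset)
    return f"{base:020d}.log"
```

So offset 200000 would live in 00000000000000170053.log, since that segment starts at 170053 and the next one starts at 239521.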

 

4. Segment Configuration

Now that we understand segments in more depth, let's return to the two segment configurations, log.segment.bytes and log.segment.ms.

If log.segment.bytes (the size limit, default: 1 GB) is reduced, it means:

  • Each partition will have more segments
  • Log compaction will happen more frequently
  • Kafka will have to keep more files open (possibly triggering the error: Too many open files)

When sizing this parameter, take the business throughput into account. If the workload produces about 1 GB of data per day, the default setting fits this scenario; if it produces about 1 GB per week, this value can be reduced accordingly.
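The arithmetic behind this sizing advice can be sketched in a couple of lines of Python (the throughput figures are the hypothetical ones from the paragraph above):

```python
GB = 1024 ** 3

def days_to_fill(segment_bytes: float, bytes_per_day: float) -> float:
    """How long one partition's active segment takes to fill at a steady rate."""
    return segment_bytes / bytes_per_day

# 1 GB/day workload: a default 1 GB segment fills (and rolls on size) about daily
daily = days_to_fill(GB, GB)
# 1 GB/week workload: a default segment would need 7 days to fill, so the
# 7-day log.segment.ms limit rolls it first; shrinking log.segment.bytes
# makes size-based rolling meaningful again for such a workload
weekly = days_to_fill(GB, GB / 7)
```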

 

If log.segment.ms (default: one week) is reduced, it means:

  • Log compaction is triggered more frequently
  • More files are generated

When deciding this parameter, consider how frequently the business needs log compaction to occur. Log compaction will be introduced in a later post.

 


Origin www.cnblogs.com/zackstang/p/11627884.html