Kafka study notes (4) - partitions and replicas

Previous: Kafka study notes (3) - sending and consuming messages with the Java API

This note briefly walks through how Kafka works, to deepen our understanding of Kafka.

Partitions and replicas

As mentioned before, a topic has multiple partitions; the partition is in fact the smallest unit into which messages are divided. Each partition has multiple replicas (in common usage the replica count includes the leader replica). A message sent by a producer lands in a partition on some broker and is finally pulled by a consumer.

Each partition has one leader (the primary replica) and zero or more followers (the other replicas). Each leader and each follower lives on a broker. Kafka spreads the partition leaders evenly across the brokers. By default, all reads and writes are handled by the leader; followers only synchronize messages from the leader and do not serve clients.
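
The leader/follower split can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the class names `Replica` and `Partition` and the broker ids are made up for illustration, and real Kafka replication (fetch requests, ISR tracking, acknowledgements) is far more involved.

```python
class Replica:
    """One copy of a partition's log, hosted on a single broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []  # messages in offset order


class Partition:
    """A partition with one leader replica and zero or more followers."""

    def __init__(self, leader_broker, follower_brokers):
        self.leader = Replica(leader_broker)
        self.followers = [Replica(b) for b in follower_brokers]

    def append(self, message):
        """Clients write only to the leader; returns the new message's offset."""
        self.leader.log.append(message)
        return len(self.leader.log) - 1

    def read(self, offset):
        """Clients read only from the leader."""
        return self.leader.log[offset]

    def sync_followers(self):
        """Each follower copies whatever it is missing from the leader."""
        for follower in self.followers:
            missing = self.leader.log[len(follower.log):]
            follower.log.extend(missing)


# Hypothetical layout: leader on broker 0, followers on brokers 2 and 1.
partition = Partition(leader_broker=0, follower_brokers=[2, 1])
first = partition.append("hello")
second = partition.append("world")
partition.sync_followers()
```

After `sync_followers()` every follower holds the same log as the leader, but clients never touched a follower directly.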

Look at this diagram to reinforce the idea:

How does a producer know which broker is the leader of a partition? When configuring a producer we must supply a list of brokers through the bootstrap.servers parameter. This tells the producer about a few brokers; the producer pulls the list of leaders for all partitions from one of them and caches it, so the producer can then send messages directly to the leader.
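
The lookup-then-cache idea might be sketched as follows. Everything here is hypothetical: `MetadataCache`, `fake_fetch` and the second broker address are invented for illustration, and the fake fetch function stands in for the real metadata request that a Kafka client sends over the network.

```python
class MetadataCache:
    """Hypothetical producer-side cache of partition leaders."""

    def __init__(self, bootstrap_servers, fetch_metadata):
        self.bootstrap_servers = list(bootstrap_servers)
        # fetch_metadata(address) -> {(topic, partition): leader_broker_id}
        self.fetch_metadata = fetch_metadata
        self.leaders = {}

    def refresh(self):
        """Ask the bootstrap brokers in turn until one answers, then cache the result."""
        for address in self.bootstrap_servers:
            try:
                self.leaders = dict(self.fetch_metadata(address))
                return
            except ConnectionError:
                continue  # this broker is unreachable, try the next one
        raise ConnectionError("no bootstrap broker reachable")

    def leader_for(self, topic, partition):
        """Return the cached leader, fetching metadata first if it is unknown."""
        if (topic, partition) not in self.leaders:
            self.refresh()
        return self.leaders[(topic, partition)]


def fake_fetch(address):
    """Stand-in for the network call; pretends only localhost:9093 is up."""
    if address != "localhost:9093":
        raise ConnectionError(address)
    return {("test", 0): 0, ("test", 1): 2, ("test", 2): 1}


cache = MetadataCache(["localhost:9092", "localhost:9093"], fake_fetch)
leader = cache.leader_for("test", 1)
```

Listing more than one broker in bootstrap.servers matters for exactly the reason the sketch shows: if the first address is down, the client can still bootstrap from another.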

As mentioned before, messages within a partition are ordered. The producer uses its own partitioning algorithm to work out which partition a message should go to, finds the leader (broker) of that partition, and sends the message directly to that broker. Within a consumer group only one consumer subscribes to a given partition, so messages within a partition are also consumed in order.
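
A minimal sketch of key-based partitioning, assuming a simple "hash the key, take it modulo the partition count" rule. The function name `choose_partition` is made up, and `zlib.crc32` is only a stand-in hash (Kafka's Java client uses murmur2 for keyed messages), so the partition numbers here will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # Stand-in hash for illustration; not the hash Kafka actually uses.
    return zlib.crc32(key) % num_partitions


messages = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-1", "logout"),
]

# Simulate three partitions as append-only lists.
partitions = {p: [] for p in range(3)}
for key, value in messages:
    partitions[choose_partition(key, 3)].append((key, value))

# All events for user-1 sit in one partition, in the order they were sent.
target = choose_partition(b"user-1", 3)
user1_events = [value for key, value in partitions[target] if key == b"user-1"]
```

Because the same key always maps to the same partition, ordering is preserved per key, not across the whole topic.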

Message persistence

As the overview said, Kafka is itself a storage system: a broker persists the messages it receives to disk. Here we look at Kafka's persistence together with partitions.

First we start three brokers, then create a topic with 3 partitions and a replication factor of 3

> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test

Now look at the details of this topic

> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test

Topic:test	PartitionCount:3	ReplicationFactor:3	Configs:segment.bytes=1073741824
	Topic: test	Partition: 0	Leader: 0	Replicas: 0,2,1	Isr: 0,2,1
	Topic: test	Partition: 1	Leader: 2	Replicas: 2,1,0	Isr: 2,1,0
	Topic: test	Partition: 2	Leader: 1	Replicas: 1,0,2	Isr: 1,0,2

As expected: three partitions, three leaders spread evenly across the three brokers, and three replicas for each partition
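
The same check can be done mechanically. The sketch below parses the three per-partition lines from the describe output above and counts leaders per broker; the regular expression and the helper name `parse_partition_line` are assumptions written for this particular output layout, which may differ between Kafka versions.

```python
import re
from collections import Counter

# Per-partition lines copied from the describe output above.
DESCRIBE_LINES = [
    "Topic: test\tPartition: 0\tLeader: 0\tReplicas: 0,2,1\tIsr: 0,2,1",
    "Topic: test\tPartition: 1\tLeader: 2\tReplicas: 2,1,0\tIsr: 2,1,0",
    "Topic: test\tPartition: 2\tLeader: 1\tReplicas: 1,0,2\tIsr: 1,0,2",
]

LINE_PATTERN = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
)


def parse_partition_line(line):
    """Turn one describe line into a dict of ints and lists of broker ids."""
    match = LINE_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    partition, leader, replicas, isr = match.groups()
    return {
        "partition": int(partition),
        "leader": int(leader),
        "replicas": [int(b) for b in replicas.split(",")],
        "isr": [int(b) for b in isr.split(",")],
    }


partitions = [parse_partition_line(line) for line in DESCRIBE_LINES]
leaders_per_broker = Counter(p["leader"] for p in partitions)
fully_in_sync = all(sorted(p["isr"]) == sorted(p["replicas"]) for p in partitions)
```

Here each broker leads exactly one partition, and every replica appears in the Isr (in-sync replicas) list.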

In the configuration file we set log.dirs, which specifies where Kafka stores its logs (Kafka writes message data to disk in the form of logs). The data is placed in separate folders named in the format topic-partition. For example, for the topic test above with its three partitions, in the log.dirs directory you can see

> ls
test-0
test-1
test-2
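
The naming rule is simple enough to sketch. The helper names below are made up for illustration, and `/tmp/kafka-logs` is just an example value for log.dirs, not necessarily what your brokers use.

```python
from pathlib import PurePosixPath


def partition_dirs(log_dir, topic, num_partitions):
    """Build the expected 'topic-partition' folder paths under log_dir."""
    return [PurePosixPath(log_dir) / f"{topic}-{p}" for p in range(num_partitions)]


def split_dir_name(name):
    """Split a 'topic-partition' folder name back into (topic, partition)."""
    # Split on the LAST hyphen, because topic names may contain hyphens too.
    topic, _, partition = name.rpartition("-")
    return topic, int(partition)


dirs = partition_dirs("/tmp/kafka-logs", "test", 3)
names = [d.name for d in dirs]
```

Splitting on the last hyphen matters for topics such as `my-topic`, whose partition 12 would live in a folder named `my-topic-12`.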

These log files are the carrier for messages persisted to disk. One might ask: isn't persisting to disk much slower than memory? In fact, Kafka relies heavily on the disk, and this design is how it achieves high throughput. Modern disks are very well optimized, and sequential disk writes can in some cases be faster than random memory access. A typical example is that operating systems readily use disk as virtual memory. In addition, because Kafka does not keep large amounts of data in memory, it does not have to worry about GC problems (Scala also runs on the JVM). Kafka also uses the sendfile system call, which avoids unnecessary switches between kernel mode and user mode and unnecessary data copies. Finally, another cost of a messaging system is network bandwidth; Kafka supports message compression, and the compression algorithm can be specified.
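
To get a feel for why compression saves bandwidth, here is a small sketch that compresses a batch of similar messages with gzip from the Python standard library. The message contents and the `encode_batch` helper are invented for illustration; this is not how Kafka encodes record batches on the wire, only a demonstration that batches of similar messages compress well.

```python
import gzip
import json


def encode_batch(messages):
    """Serialize a batch of messages as newline-separated JSON bytes."""
    return "\n".join(json.dumps(m) for m in messages).encode("utf-8")


# A batch of similar messages, as is common in logging or event streams.
messages = [
    {"user": f"user-{i % 10}", "event": "page_view", "path": "/home"}
    for i in range(1000)
]

raw = encode_batch(messages)
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.2%}")
```

In a real producer the algorithm is chosen with the compression.type setting, and gzip is one of the supported values.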

The above is my summary of the official documentation

Reproduced from: https://juejin.im/post/5cfe48985188257fff23a78d


Origin blog.csdn.net/weixin_34365635/article/details/91472933