One. Kafka Glossary
1. topic
A topic corresponds to a queue in a traditional message queue (MQ) system. A producer must specify which topic each message is sent to. In a large application, topics are split by function (an order topic, a login topic, an amount topic, and so on).
2. partition
A topic can have multiple partitions. When a message is sent to a topic, it is load-balanced across the topic's partitions: for a message with a key, the partition is hash(key) % [partition_num], which spreads the messages evenly over the partitions.
The number of partitions is generally set to match the number of nodes in the Kafka cluster (i.e. the number of brokers).
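The hash-and-modulo rule above can be sketched in a few lines. This is only an illustration, not Kafka's own code: the class name, the method partitionFor, and the sample keys are hypothetical, and Arrays.hashCode stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitionSketch {

    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode is a stand-in for the murmur2 hash
    // that Kafka's default partitioner applies to the serialized key.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // hypothetical topic with three partitions
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Whatever the hash function, the point is that the result depends only on the key and the partition count, so the first and third lines printed always name the same partition.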
3. partition replica
A partition replica is a copy of a partition's data, kept to prevent data loss. The copies of one partition are not placed on the same broker, so the replication factor cannot exceed the number of brokers. To achieve high availability, the number of replicas is generally kept consistent with the number of partitions.
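The rule that copies of one partition sit on different brokers can be illustrated with a small placement sketch. It is a deliberate simplification under stated assumptions: the class and method names are hypothetical, brokers are numbered from 0, and copy i of partition p simply goes to broker (p + i) % numBrokers. Kafka's own assignment is more involved (for example, it randomizes the starting broker).

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaPlacementSketch {

    // Returns the broker ids that host the copies of one partition.
    // Simplification (assumption): copy i of partition p is placed on
    // broker (p + i) % numBrokers, so no two copies share a broker.
    static List<Integer> brokersFor(int partition, int replicationFactor, int numBrokers) {
        if (replicationFactor > numBrokers) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<Integer> brokers = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            brokers.add((partition + i) % numBrokers);
        }
        return brokers;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 3 brokers, 3 partitions, 2 copies each.
        for (int p = 0; p < 3; p++) {
            System.out.println("partition " + p + " -> brokers " + brokersFor(p, 2, 3));
        }
    }
}
```

With these numbers, losing any single broker still leaves one copy of every partition on another broker.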
4. broker
A Kafka node. Each Kafka node is a broker, and multiple brokers can form a cluster. The broker id is generally taken from the last 3 digits of the node's IP address.
5. segment
Physically, a partition is divided into multiple segments; each segment stores message data.
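As a rough illustration of how a partition split into segments can be navigated, the sketch below keeps the segments in a sorted map keyed by the offset of their first message and finds the segment that would hold a given offset. The class, its methods, and the roll points are hypothetical; the file naming (first offset zero-padded to 20 digits, with a .log suffix) follows the naming Kafka uses in its log directories.

```java
import java.util.Map;
import java.util.TreeMap;

class SegmentLookupSketch {

    // Segments of one partition, keyed by the offset of their first message.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset) {
        // File name: base offset zero-padded to 20 digits, plus ".log".
        segments.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // Returns the file of the segment that would hold the given offset,
    // or null if the offset is smaller than every base offset.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SegmentLookupSketch log = new SegmentLookupSketch();
        log.addSegment(0L);
        log.addSegment(1000L); // hypothetical points at which a new segment began
        log.addSegment(2500L);
        System.out.println(log.segmentFor(1500L)); // 00000000000000001000.log
    }
}
```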
6. producer
Produces messages and sends them to a topic.
7. consumer
Subscribes to the specified topics and consumes the messages on them.
8. consumer group
A consumer group can be made up of multiple consumers.
Two. Terms and Principles Explained
1. partition
A Kafka message is a key-value pair. Only the topic and the value are required; when no key is given, the key defaults to null. In most cases a key is assigned, and it serves two purposes: 1. it carries metadata; 2. it helps with partitioning, acting as the routing key so that a related batch of data is written to the same partition.
A message is a ProducerRecord object. It must include the topic and the value; the partition and the key are optional. All messages with the same key are assigned to the same partition.
When the key is null, the default partitioner is used: it places the ProducerRecord in one of the topic's partitions more or less at random, so that the data is spread as evenly as possible over the topic and data skew is avoided. When a key is specified explicitly, the partitioner hashes the key and takes the result modulo the number of partitions to decide which partition of the topic stores the message.
Let's run a test: which partition does the data go to when the message has a key, and when it has none?
When the message has a key
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des Tests Kafka partition assignment
 * @author zhao
 * @date 2019-06-27 00:17:55
 */
public class PartitionExample {
    private final static Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        // First record, sent with an explicit key
        ProducerRecord<String, String> record = new ProducerRecord<String, String>("test_partition", "appointKey", "hello");
        Future<RecordMetadata> future = producer.send(record);
        // Block until the broker acknowledges, then log the partition that was chosen
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        // Second record, same key
        record = new ProducerRecord<String, String>("test_partition", "appointKey", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return prop;
    }
}
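Both records share the key "appointKey", so with the default partitioner they are expected to hash to the same partition. The log recorded for this run shows two different partitions (1 and 0), which is the behavior expected of records without a key; this log and the one from the test without a key appear to have been swapped.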
22:21:06.231 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 1
22:21:06.258 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 0
When the message has no key
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @des Tests Kafka partition assignment
 * @author zhao
 * @date 2019-06-27 00:17:55
 */
public class PartitionExample {
    private final static Logger LOG = LoggerFactory.getLogger(PartitionExample.class);

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Properties properties = initProp();
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        // First record, sent without a key (the key is null)
        ProducerRecord<String, String> record = new ProducerRecord<String, String>("test_partition", "hello");
        Future<RecordMetadata> future = producer.send(record);
        // Block until the broker acknowledges, then log the partition that was chosen
        RecordMetadata recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        // Second record, also without a key
        record = new ProducerRecord<String, String>("test_partition", "world");
        future = producer.send(record);
        recordMetadata = future.get();
        LOG.info(">>>>>>>>>>>>>>>>>> {}", recordMetadata.partition());

        producer.flush();
        producer.close();
        System.out.println("====================================");
    }

    private static Properties initProp() {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.199.11:9092,192.168.199.12:9092,192.168.199.13:9092");
        prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return prop;
    }
}
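Neither record has a key, so the default partitioner does not guarantee that they land on the same partition. The log recorded for this run shows both records on partition 2, which is the behavior expected of records that share a key; this log and the one from the test with a key appear to have been swapped.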
22:29:29.963 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2
22:29:29.969 [main] INFO com.zpb.kafka.PartitionExample - >>>>>>>>>>>>>>>>>> 2
Conclusions from the test above:
A given key (or group of keys) always maps to the same partition. The mapping is computed over all partitions of the topic, not only over the available ones: when one of several partitions is down, it still takes part in the calculation, so a record whose key maps to that partition will fail to send.
Within a consumer group, a partition is read by only one consumer; several consumers of the same group cannot read the same partition.
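The one-consumer-per-partition rule can be sketched as a simple round-robin assignment inside a group. This is an illustration only: the class, the method assign, and the consumer names are hypothetical, and Kafka's real assignment strategies are configurable and more elaborate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAssignmentSketch {

    // Hands out partitions 0..numPartitions-1 to the consumers of one group
    // in round-robin order, so every partition has exactly one owner.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>();
        group.add("consumer-a"); // hypothetical consumer names
        group.add("consumer-b");
        System.out.println(assign(group, 3)); // {consumer-a=[0, 2], consumer-b=[1]}
    }
}
```

One consequence visible in the sketch: if the group has more consumers than the topic has partitions, the extra consumers receive an empty list and stay idle.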