NetEase interview question: how does Kafka track consumption state?
Analysis: In most messaging systems, the broker keeps the record of which messages have been consumed: when a message is handed out, the broker either marks it immediately or waits for the consumer's acknowledgement before marking it. A message can then be deleted right after it is consumed, which reduces the space used.
But does this cause any problems? If the broker marks a message as consumed as soon as it is sent, the message is lost whenever the consumer fails to process it (for example, because it crashes). To solve this, many messaging systems add an acknowledgement step: when a message is sent it is only marked as sent, and it is marked as consumed only after the consumer reports that it was processed successfully. This fixes message loss but creates new problems. First, if the consumer processes the message successfully but fails before its acknowledgement reaches the broker, the message will be consumed twice. Second, the broker must maintain the state of every message, and each time it has to lock the message, change its state, and release the lock. That is costly, and it means keeping a large amount of state data: for instance, a message that was sent but never acknowledged remains locked.
Kafka takes a different approach. A topic is divided into several partitions, and each partition is consumed by only one consumer (within a consumer group) at any given time. This means the consumer's position in each partition's log is just a single integer: the offset. Recording the consumption state of a partition therefore takes only one integer, which makes tracking it very cheap.
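
To make the "one integer per partition" idea concrete, here is a minimal sketch in Python. It is an in-memory toy, not Kafka's implementation: the `PartitionLog` and `GroupOffsets` classes and the group name `billing` are made up for illustration.

```python
class PartitionLog:
    """Append-only log for one partition; a message's offset is its index."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just appended

    def read(self, offset, max_messages=10):
        # Reading never changes the log; the broker keeps no per-message state.
        return self.messages[offset:offset + max_messages]


class GroupOffsets:
    """Consumption state: a single integer per (group, partition) pair."""

    def __init__(self):
        self.committed = {}

    def position(self, group, partition):
        # Offset of the next message to consume; 0 if nothing was committed.
        return self.committed.get((group, partition), 0)

    def commit(self, group, partition, next_offset):
        self.committed[(group, partition)] = next_offset


log = PartitionLog()
for text in ["a", "b", "c", "d"]:
    log.append(text)

offsets = GroupOffsets()
start = offsets.position("billing", 0)
batch = log.read(start, max_messages=3)
# Commit only after the batch has been processed.
offsets.commit("billing", 0, start + len(batch))
```

If the consumer crashes before the commit, the next read simply starts again from the last committed offset, so nothing is lost; the price is that the same batch may be processed again.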
Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all of the activity stream data that users generate on a website. This activity (page views, searches, and other user actions) is a key ingredient of many social features on the modern web. Because of the throughput required, this data is usually handled through log processing and log aggregation. Kafka comes up frequently in interviews at large companies. While studying Kafka I put together some notes and collected a lot of real interview questions, which I hope will be of some help.
Some real Kafka interview questions from recent years (with analysis):
How do you get the list of topics?
What are the command-line tools for producers and consumers?
Does the consumer use push or pull?
Describe how Kafka tracks consumer consumption state
Describe master-slave (leader-follower) synchronization
Why use a messaging system? Can't MySQL meet the need?
What role does Zookeeper play in Kafka?
What are the three transactional definitions for data transmission?
Which two conditions does Kafka use to decide whether a node is still alive?
What are the three key differences between Kafka and a traditional MQ messaging system?
Describe Kafka's three ack mechanisms
When a consumer fails and a livelock occurs, how do you resolve it?
How do you control the consumer's position?
In a distributed (not single-node) Kafka deployment, how do you guarantee the order in which messages are consumed?
What is Kafka's high-availability mechanism?
How does Kafka minimize data loss?
How do you avoid consuming the same data twice in Kafka? For a debit, for example, the charge must not be applied twice.
Kafka interview questions with analysis
My Kafka notes (core concepts plus an XMind mind map):
Kafka concepts:
Kafka is a high-throughput, distributed, publish/subscribe messaging system. It was originally developed at LinkedIn in Scala and is now an Apache open source project.
broker: a Kafka server, responsible for storing and forwarding messages
topic: a message category; Kafka classifies messages by topic
partition: a shard of a topic; a topic can contain multiple partitions, and the topic's messages are stored across them
offset: a message's position in the log; it can be understood as the message's offset within its partition and serves as the message's unique sequence number
Producer: the message producer
Consumer: the message consumer
Consumer Group: a group of consumers; every Consumer must belong to a group
Zookeeper: stores cluster metadata such as brokers, topics, and partitions; it is also responsible for broker failure detection, partition leader election, load balancing, and similar functions
Kafka data storage design:
Partition data files (each message records offset, MessageSize, data)
Data files split into segments (sequential reads and writes, segments named by offset, binary search)
Data file indexes (one index per segment, sparse storage)
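
The binary search and sparse index above can be sketched in Python roughly as follows. The base offsets and index entries are invented sample data, and `find_segment` / `find_scan_start` are hypothetical helpers rather than Kafka APIs. The 20-digit zero-padded file name follows Kafka's usual segment naming.

```python
from bisect import bisect_right

# Sample data (invented): base offset of each segment in one partition.
segment_base_offsets = [0, 1000, 2000]

# Sample sparse index for the segment starting at offset 1000:
# only some messages get an entry of (offset, byte position in the data file).
sparse_index = [(1000, 0), (1040, 4096), (1080, 8192)]


def segment_file_name(base_offset):
    # Segment files are named after their first offset, zero-padded to 20 digits.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Binary search for the segment whose base offset is <= target_offset."""
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is before the first segment")
    return base_offsets[i]


def find_scan_start(index, target_offset):
    """Return the last index entry at or before target_offset.

    The reader then scans the data file forward from that byte position,
    because a sparse index has no entry for most offsets.
    """
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is before the first index entry")
    return index[i]


base = find_segment(segment_base_offsets, 1055)
entry = find_scan_start(sparse_index, 1055)
```

Keeping the index sparse keeps it small; the trade-off is a short sequential scan after each lookup.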
Producer Design:
Load balancing (partitions are distributed evenly across the brokers)
Batch sending
Compression (GZIP or Snappy)
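
To show why batching and compression go together, here is a rough Python sketch that uses only the standard library. It is not the Kafka producer: `build_batches`, `encode_batch`, and `decode_batch` are hypothetical helpers, and JSON stands in for Kafka's own batch format.

```python
import gzip
import json


def build_batches(messages, batch_size):
    """Group messages into lists of at most batch_size items."""
    return [messages[i:i + batch_size]
            for i in range(0, len(messages), batch_size)]


def encode_batch(batch):
    # Compress the whole batch at once: similar messages compress much
    # better together than one at a time.
    raw = json.dumps(batch).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload):
    return json.loads(gzip.decompress(payload).decode("utf-8"))


messages = [{"user": i % 3, "action": "page_view"} for i in range(100)]
batches = build_batches(messages, batch_size=50)
payloads = [encode_batch(batch) for batch in batches]
```

Each payload here would be one network request instead of fifty, and the repeated field names inside a batch are what make the compression effective.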
Consumer Design:
Consumer Group
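
A consumer group spreads a topic's partitions across its members so that each partition is read by only one member at a time. Below is a simplified round-robin sketch in Python. `assign_partitions` is a hypothetical helper; Kafka's real assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Six partitions shared by four consumers in one group.
assignment = assign_partitions(range(6), ["c1", "c2", "c3", "c4"])

# More consumers than partitions: the extra consumer gets nothing.
small = assign_partitions(range(2), ["a", "b", "c"])
```

Because a partition never has two readers within the same group, adding consumers beyond the number of partitions does not add parallelism.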
For readers:
There are too many questions to fit in one article, so not every interview question here comes with a detailed analysis. I hope you understand.
I have, however, compiled them into a detailed PDF document that I can share with everyone.
How to get it: follow my public account (Java Zhou people).
A compilation of recent interview questions from major companies (PDF), with detailed analysis