Kafka study notes

What are the traditional messaging models?

There are two traditional messaging models:

· Queuing: a group of consumers reads messages from the server, and each message is delivered to only one of them.

· Publish-subscribe: the message is broadcast to all consumers.
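The difference between the two models can be sketched in a few lines of Python. This is only an in-memory illustration: the class names `PointToPointQueue` and `PubSubTopic` are hypothetical, and the round-robin choice of receiver is an assumption (any policy that picks a single receiver fits the queuing model).

```python
class PointToPointQueue:
    """Queuing model: each message goes to exactly one consumer."""

    def __init__(self, inboxes):
        self.inboxes = inboxes
        self.next_index = 0

    def send(self, message):
        # Round-robin is an assumption made for this sketch.
        inbox = self.inboxes[self.next_index % len(self.inboxes)]
        self.next_index += 1
        inbox.append(message)


class PubSubTopic:
    """Publish-subscribe model: every subscriber receives every message."""

    def __init__(self, inboxes):
        self.inboxes = inboxes

    def publish(self, message):
        for inbox in self.inboxes:
            inbox.append(message)


messages = ["m1", "m2", "m3", "m4"]

queue_inboxes = [[], []]
queue = PointToPointQueue(queue_inboxes)
for m in messages:
    queue.send(m)

topic_inboxes = [[], []]
topic = PubSubTopic(topic_inboxes)
for m in messages:
    topic.publish(m)

print(queue_inboxes)  # each message appears in one inbox only
print(topic_inboxes)  # each message appears in every inbox
```

Kafka combines both: consumers inside one consumer group share the messages as in a queue, while separate groups each receive every message as in publish-subscribe.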

Why use Kafka? Why use a message queue?

Buffering and peak shaving: when upstream data arrives in bursts, the downstream may be unable to handle it, or may not have enough machines to provide spare capacity. Kafka can act as an intermediate buffer: messages are stored temporarily in Kafka, and downstream services process them at their own pace.

Decoupling and scalability: at the start of a project, the specific requirements cannot be determined. A message queue can serve as an interface layer that decouples important business processes. Each side only needs to follow the agreed contract and program against the data, which leaves room to scale.

Redundancy: a one-to-many pattern can be used. A message that a producer publishes to a topic can be consumed by multiple subscribing services, serving several unrelated businesses.

Robustness: the message queue can accumulate requests, so even if the consumer side goes down for a short time, the main business keeps running normally.

Asynchronous communication: in many cases, users neither want nor need to process a message immediately. A message queue provides an asynchronous processing mechanism: a user can put a message on the queue without processing it right away, enqueue as many messages as desired, and process them later when needed.
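The buffering and asynchronous points can be illustrated with a small in-memory sketch. A `deque` stands in for the Kafka topic here, and the `drain` helper and the "3 messages per tick" consumer capacity are assumptions chosen for the example.

```python
from collections import deque


def drain(buffer, per_tick):
    """Process at most per_tick messages and return the ones handled."""
    handled = []
    while buffer and len(handled) < per_tick:
        handled.append(buffer.popleft())
    return handled


buffer = deque()

# Upstream burst: ten messages arrive at once and are only enqueued.
for i in range(10):
    buffer.append(f"order-{i}")

# Downstream works through the backlog at its own pace.
ticks = []
while buffer:
    ticks.append(drain(buffer, per_tick=3))

print([len(t) for t in ticks])
```

The producer never waits for the consumer; the backlog simply shrinks over several ticks, and the messages come out in the order they went in.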

Why is Kafka so fast?

Cache: Kafka relies on the filesystem cache, that is, the operating system's page cache (PageCache).

Sequential writes: because modern operating systems provide read-ahead and write-behind techniques, sequential disk writes can in some cases be even faster than random memory writes.

Zero-copy: zero-copy technology reduces the number of times data is copied.

Batching of messages: small requests are merged into batches and exchanged in a streaming fashion, pushing throughput toward the network limit.

Pull mode: consumers fetch messages by pulling, so the consumption rate matches the consumer's processing capacity.
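The batching idea above can be sketched as follows. This is a simplified model, not the Kafka client: the `BatchingSender` class is hypothetical, and it batches by record count, whereas the real producer batches by bytes (`batch.size`) and time (`linger.ms`).

```python
class BatchingSender:
    """Collects records and hands them to the transport in batches."""

    def __init__(self, batch_size, transport):
        self.batch_size = batch_size
        self.transport = transport  # callable that receives one list per request
        self.pending = []

    def send(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is pending as a single request.
        if self.pending:
            self.transport(list(self.pending))
            self.pending.clear()


requests = []  # each element represents one network request
sender = BatchingSender(batch_size=4, transport=requests.append)

for i in range(10):
    sender.send(i)
sender.flush()  # push out the final partial batch

print(len(requests), "requests for 10 records")
```

Ten records travel in three requests instead of ten, which is where the saving in per-request overhead comes from.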

Kafka producer acks when sending data

· acks = 1 (the default before Kafka 3.0): after sending data to Kafka, the producer treats the send as successful once the Leader acknowledges receipt. In this case, if the Leader goes down before the Followers have copied the message, the data is lost.
· acks = 0: the producer sends the data and moves on without waiting for any response. Transmission efficiency is highest in this case, but reliability is lowest.
· acks = -1 (all): the producer waits until all Followers in the ISR have confirmed receipt before treating the send as complete; reliability is highest. Once every replica in the ISR has sent an ACK to the Leader, the Leader commits, and only then does the producer consider the message committed.
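The trade-off between the three settings can be made concrete with a toy model. Everything here is an assumption for illustration: `ToyPartition` is a hypothetical class, replication happens only when `replicate()` is called, and a leader failure simply promotes the first follower.

```python
class ToyPartition:
    """A leader log plus follower logs; followers copy only on replicate()."""

    def __init__(self, follower_count=2):
        self.leader = []
        self.followers = [[] for _ in range(follower_count)]

    def replicate(self):
        for follower in self.followers:
            follower[:] = self.leader

    def produce(self, message, acks):
        """Append to the leader; return True if the producer gets an ack."""
        self.leader.append(message)
        if acks == 0:
            return False  # fire and forget: nothing is awaited
        if acks == 1:
            return True  # acked as soon as the leader has the message
        if acks == -1:
            self.replicate()  # acked only after the followers have it too
            return True
        raise ValueError("acks must be 0, 1 or -1")

    def fail_leader(self):
        """The leader crashes and the first follower takes over."""
        self.leader = self.followers.pop(0)


p1 = ToyPartition()
acked_with_1 = p1.produce("payment-42", acks=1)
p1.fail_leader()  # crash before any replication
lost_with_1 = "payment-42" not in p1.leader

p2 = ToyPartition()
acked_with_all = p2.produce("payment-42", acks=-1)
p2.fail_leader()
kept_with_all = "payment-42" in p2.leader

print(acked_with_1, lost_with_1)
print(acked_with_all, kept_with_all)
```

With acks = 1 the producer was told the send succeeded, yet the message is gone after failover; with acks = -1 the acknowledged message survives on the new leader.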

Can Kafka messages be lost or consumed more than once?

To determine whether Kafka can lose or duplicate messages, look at two aspects: sending messages and consuming messages.

1. Sending messages

Kafka sends messages in two modes: synchronous (sync) and asynchronous (async). The default is sync, and the mode can be configured through the producer.type property. Kafka confirms message production according to the request.required.acks property:

0 --- no acknowledgment that the message was received;
1 --- acknowledged once the Leader has received it successfully;
-1 --- acknowledged once both the Leader and the Followers have received it successfully;

In summary, there are six message production scenarios (two send modes times three acks settings). Messages can be lost in the following scenarios:

(1) acks = 0: the producer does not confirm receipt with the Kafka cluster, so when the network is abnormal, the buffer is full, or similar, messages may be lost;

(2) acks = 1 in sync mode: only the Leader confirms receipt; if it then goes down before the replicas have synchronized, data may be lost;

2. Consuming messages

Kafka offers two consumer interfaces, the Low-level API and the High-level API:

Low-level API: the consumer maintains the offset and related state itself, which gives complete control over Kafka;

High-level API: encapsulates the management of partitions and offsets, and is simple to use;

With the High-level API, one problem can arise: if the consumer fetches messages from the cluster and commits the new offset, but crashes before it has processed them, then the next time it consumes, the messages that were never successfully processed have "mysteriously" disappeared;
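The effect of the commit order can be traced with a minimal simulation. The `consume` function and its `crash_at` parameter are hypothetical; the list `log` stands in for one partition and `committed` for the stored offset.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Consume from the committed offset; return (processed, committed).

    crash_at is the offset at which the consumer dies between the
    commit step and the processing step.
    """
    processed = []
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed  # committed but never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return processed, committed  # processed but never committed
            committed = offset + 1
        offset += 1
    return processed, committed


log = ["a", "b", "c"]

# Commit before processing, crash at offset 1, then restart.
first, committed = consume(log, 0, commit_first=True, crash_at=1)
second, _ = consume(log, committed, commit_first=True)
commit_then_process = first + second

# Process before committing, crash at offset 1, then restart.
first, committed = consume(log, 0, commit_first=False, crash_at=1)
second, _ = consume(log, committed, commit_first=False)
process_then_commit = first + second

print(commit_then_process)
print(process_then_commit)
```

Committing first loses "b" (the case described above); processing first never loses a message but delivers "b" twice, which is why duplicate handling is needed on the consumer side.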

Solution:

For lost messages: in sync mode, set the acknowledgment mechanism to -1, so that a send is confirmed as successful only after the message has been written to both the Leader and the Followers; in async mode, to avoid losing messages when the buffer fills up, configure the blocking timeout as unlimited so that the producer stays blocked while the buffer is full;

For duplicate messages: save each message's unique identifier in an external store, and on every consumption check whether the message has already been processed.

Reference on duplicate messages and handling duplicate consumption: https://www.javazhiyin.com/22910.html
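A minimal sketch of the deduplication approach follows. The `IdempotentHandler` class is hypothetical, and the in-memory `set` is only a stand-in for the external store (for example a database table or a cache), which must survive consumer restarts in a real system.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its unique identifier."""

    def __init__(self, seen_store=None):
        # Stand-in for an external store of processed message IDs.
        self.seen = set() if seen_store is None else seen_store
        self.applied = []

    def handle(self, message_id, payload):
        if message_id in self.seen:
            return False  # duplicate delivery: skip it
        self.applied.append(payload)  # the actual business action
        self.seen.add(message_id)
        return True


handler = IdempotentHandler()
deliveries = [
    ("id-1", "debit 10"),
    ("id-2", "debit 5"),
    ("id-1", "debit 10"),  # redelivered after a consumer restart
]
results = [handler.handle(mid, payload) for mid, payload in deliveries]

print(results)
print(handler.applied)
```

In this sketch the business action and the recording of the ID are two separate steps; in practice they should be applied atomically (for example in one database transaction), otherwise a crash between them reintroduces the problem.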

How does Kafka guarantee message ordering?

Messages within each Kafka partition are written in order. On the consuming side, each partition can be consumed by only one consumer in each consumer group, which ensures that messages are also consumed in order.
Ordering across the whole topic is not guaranteed. To guarantee ordering for the whole topic, set the number of partitions to 1.
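The per-partition guarantee can be illustrated with a small sketch in which records with the same key always land in the same partition. The helper names are hypothetical, and `zlib.crc32` is only a stand-in hash chosen for determinism; it is not the hash that Kafka's default partitioner uses.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in hash for illustration; not Kafka's actual partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce_all(events, num_partitions):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def values_for(key, partitions):
    """Values for one key, in the order its partition stores them."""
    partition = partitions[partition_for(key, len(partitions))]
    return [value for k, value in partition if k == key]


events = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]

three = produce_all(events, num_partitions=3)
one = produce_all(events, num_partitions=1)

print(values_for("user-a", three))  # per-key order is preserved
print(one[0] == events)             # a single partition preserves global order
```

With several partitions, only the relative order of records that share a partition is preserved; with one partition, the whole topic is ordered, at the cost of parallelism.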

How do Kafka consumers consume messages?

Kafka transfers messages by using the sendfile API. It moves bytes from disk (the page cache) to the socket inside kernel space, which saves the extra copies and the switches between kernel space and user space.
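The same mechanism is available from Python through `socket.sendfile`, which uses the operating system's `sendfile` call where available and otherwise falls back to ordinary `send`. The sketch below is only an analogy: a temporary file stands in for a log segment and a local socket pair stands in for the broker-to-consumer connection.

```python
import socket
import tempfile

payload = b"kafka log segment bytes\n" * 4

with tempfile.TemporaryFile() as segment:
    # Write the "log segment" to disk and rewind to the start.
    segment.write(payload)
    segment.flush()
    segment.seek(0)

    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        # The file content goes to the socket without being read
        # into this program's own buffers (when os.sendfile is used).
        sent = broker_side.sendfile(segment)
        broker_side.shutdown(socket.SHUT_WR)

        received = b""
        while True:
            chunk = consumer_side.recv(4096)
            if not chunk:
                break
            received += chunk

print(sent, received == payload)
```

The program never calls `read()` on the file; it only tells the kernel which file and which socket to connect, which is the essence of zero-copy.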

Origin www.cnblogs.com/yizhou35/p/12026744.html