1. Introduction to Kafka
Reasons for high throughput:
1. Consumers read data in batches.
2. Consumers in a consumer group can consume messages from different partitions in parallel.
3. The producer puts messages into a buffer before sending; the drawback is that buffered messages are easy to lose.
4. Zero copy, supported natively by NIO, reduces the number of data copies.
5. Data compression reduces the bandwidth used for transmission.
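The asynchronous buffer in point 3 can be sketched in a few lines. This is a toy illustration, not the real Kafka client API: messages accumulate in memory and are shipped in batches; anything still sitting in the buffer when the process crashes is lost.

```python
class BufferedProducer:
    """Toy producer: buffers messages and sends them in batches.
    Messages still in the buffer when the process dies are lost,
    which is the durability trade-off noted above."""

    def __init__(self, send_batch, batch_size=3):
        self._send_batch = send_batch  # callable that ships one batch
        self._batch_size = batch_size
        self._buffer = []

    def send(self, msg):
        self._buffer.append(msg)       # buffered only, not yet durable
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send_batch(self._buffer)
            self._buffer = []
```

The batching amortizes per-request overhead (network round trips, syscalls) across many messages, which is where the throughput gain comes from.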
Without zero copy, sending a disk file over the network takes four copies:
1. First copy: read the disk file into the operating system's kernel buffer.
2. Second copy: copy the data from the kernel buffer into the application's buffer.
3. Third copy: copy the data from the application's buffer into the socket send buffer (which belongs to the operating system kernel).
4. Fourth copy: copy the data from the socket buffer to the network card, which transmits it over the network.
Kafka uses zero copy on two paths:
1. Persisting network data to disk (Producer to Broker)
2. Sending disk files over the network (Broker to Consumer)
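The Broker-to-Consumer path above maps directly onto the sendfile(2) system call, exposed in Python as os.sendfile. The sketch below shows the idea; Kafka's broker itself is JVM code and uses FileChannel.transferTo for the same effect.

```python
import os
import socket

def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a file over a socket via sendfile(2):
    disk -> kernel page cache -> socket buffer -> NIC,
    skipping the application-buffer copies listed above."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:  # peer closed or nothing left to send
                break
            offset += sent
    return offset
```

Because the data never enters user space, copies 2 and 3 from the list above disappear, and the CPU cost per byte drops accordingly.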
The role of broker.id: it uniquely identifies each broker in the cluster.
Kafka does not follow the JMS (Java Message Service) specification; it supports only publish/subscribe.
Fast, scalable, partitioned.
A Kafka architecture has multiple producers, multiple brokers, and multiple consumers. Each producer can write to multiple topics, and each consumer can belong to only one consumer group.
Author: Big Data Chief Data Engineer
Link: https://www.jianshu.com/p/4bf007885116
Source: Jianshu
The copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please indicate the source.
2. How is RabbitMQ clustered?
Similar to Redis sharding (16384 hash slots), a sharded cluster is best: there is no redundant data.
What about downtime? Use replicated storage.
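The Redis Cluster slot assignment mentioned above is CRC16(key) mod 16384; a minimal sketch of that mapping:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    return crc16(key) % 16384
```

Because every node agrees on this function, any node can compute which shard owns a key without consulting the others, which is why no data is duplicated across shards.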
3. Do queues and exchanges persist messages?
4. Terminology
Broker: one Broker is one MQ server; multiple Brokers form a cluster of MQ servers.
Topic: one MQ server can hold multiple different topics; each topic is in effect a message category.
Message: the data passed in asynchronous communication.
Partition: a way to split a large data set, like sharding 10 million database rows across 10 tables.
How partitioning is implemented in Kafka: each Broker holds a partition of the topic's data.
Producer: delivers messages to MQ.
Consumer: consumes messages from MQ (some MQs push messages; Kafka consumers pull them).
Consumer Group: a grouping of our consumers.
Offset: the offset is the index position of a message within a partition.
Kafka has partitions and topics; it has no queues, only topics for publish/subscribe.
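The terms above fit together in a few lines. This is a toy model, not real Kafka: a partition is an append-only list, the offset is simply an index into it, and the partition-picking hash is a stand-in for the real partitioner.

```python
class Partition:
    """Toy partition: an append-only log; an offset is an index into it."""
    def __init__(self):
        self._log = []

    def append(self, message) -> int:
        self._log.append(message)
        return len(self._log) - 1          # offset of the new message

    def read(self, offset: int):
        return self._log[offset]

class Topic:
    """Toy topic: a named group of partitions; the key picks the partition."""
    def __init__(self, name: str, num_partitions: int = 2):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def produce(self, key: str, message):
        idx = sum(key.encode()) % len(self.partitions)  # stand-in hash
        return idx, self.partitions[idx].append(message)
```

Note how order is only meaningful within one partition: offsets grow per partition, not per topic.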
5. Distributed transactions
Solving distributed transactions is an approach that is independent of any particular framework.
Common solutions are 2PC / 3PC / MQ-based eventual consistency.
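Of the options listed, 2PC is the easiest to sketch. This is a toy coordinator with a made-up Participant class, not any specific framework's API: phase 1 collects votes, phase 2 commits only if every participant voted yes.

```python
class Participant:
    """Toy transaction participant; a real one would wrap a DB connection."""
    def __init__(self, can_commit: bool = True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:          # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                   # phase 2: global commit
        self.state = "committed"

    def rollback(self):                 # phase 2: global abort
        self.state = "rolled_back"

def two_phase_commit(participants) -> bool:
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The sketch also hints at 2PC's weakness: the coordinator is a single point of failure between the two phases, which is what 3PC and MQ-based eventual consistency try to mitigate.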
6. Why Kafka relies on ZooKeeper
All brokers register with ZooKeeper, which is what forms the cluster; node event notifications tell the brokers which nodes belong to the group.
Because:
1. Kafka stores MQ metadata in ZooKeeper, and consumers also register with ZooKeeper. Consumers do not need to care how many brokers there are; they get broker information directly from ZooKeeper.
2. To make the whole cluster easy to scale, ZooKeeper event notifications let the nodes perceive each other.
To add a node, Kafka only needs a distinct broker.id to tell it apart.
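The register-and-notify pattern can be mimicked with a toy registry. This is a stand-in for ZooKeeper's ephemeral nodes under /brokers/ids and their watches; the class and method names here are invented for illustration.

```python
class MiniRegistry:
    """Toy stand-in for ZooKeeper: brokers register themselves, and
    watchers are called back whenever cluster membership changes."""
    def __init__(self):
        self._brokers = {}
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def _notify(self):
        snapshot = dict(self._brokers)
        for callback in self._watchers:
            callback(snapshot)

    def register(self, broker_id: int, address: str):
        self._brokers[broker_id] = address
        self._notify()

    def unregister(self, broker_id: int):   # ~ ephemeral node expiring
        self._brokers.pop(broker_id, None)
        self._notify()
```

In real ZooKeeper the unregister step happens automatically when a broker's session dies, which is how the cluster detects failed nodes without polling.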
7. Does cluster registration happen on every node or just one node?
In a master-slave setup, registration goes to the single master node;
in a peer-to-peer registry such as Eureka, it happens on every node.
8. The difference between Kafka and RabbitMQ
9. How does Kafka guarantee message order?
10. The difference between a queue and a topic
Queue: first in, first out.
Topic: an encapsulation of queues.
11. Why does MQ have message-ordering problems?
Background:
1. Consumer cluster
2. MQ server cluster
The case where message order is not disrupted:
1. The messages delivered by the producer land on the same broker and are consumed by the same consumer.
How to solve it:
1. Use the same key: the same hash routes the messages to the same Broker, provided there is only one consumer.
2. Fetch in batches and dispatch to per-key in-memory queues on the consumer side.
Why does MQ cause message-ordering problems?
1. If the messages a producer sends are independent of one another, order does not matter; when they depend on one another, order must be considered.
2. By default MQ stores messages with an inherent order, following the first-in-first-out principle (in the single-MQ case).
3. If multiple consumers subscribe to the same queue, consumption order may be disrupted.
4. If the broker is a cluster, different messages may be stored on different brokers, and individual consumers may see a disrupted order when fetching messages.
How to solve it?
Ensure that the messages delivered by the producer land on the same broker and are consumed by only one consumer (in Kafka, setting the same key on messages guarantees delivery to the same partition, and therefore the same broker).
But a single consumer's throughput is very low. Instead, the consumer can fetch messages in batches and distribute them into in-memory queues (again choosing the queue by key hash); each in-memory queue is processed by exactly one dedicated thread.
In this model, each broker in Kafka corresponds to one consumer.
Within the same group, only one consumer ends up consuming a given message.
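The batch-fetch-plus-memory-queue idea above can be sketched like this (a toy dispatcher; the queue count and the CRC32 hash are illustrative choices, not what any specific client library uses):

```python
import zlib

NUM_QUEUES = 4  # one worker thread would own each queue

def queue_index(key: str, num_queues: int = NUM_QUEUES) -> int:
    """Stable hash so the same key always maps to the same queue."""
    return zlib.crc32(key.encode()) % num_queues

def dispatch(batch):
    """Fan a fetched batch out to in-memory queues by key hash; messages
    sharing a key land in one queue and keep their relative order."""
    queues = [[] for _ in range(NUM_QUEUES)]
    for key, value in batch:
        queues[queue_index(key)].append((key, value))
    return queues
```

Since each queue is drained by exactly one thread, per-key order is preserved while unrelated keys are still processed in parallel.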
12. Reasons for Kafka's high throughput
1. Sequential disk writes for storing data.
2. Both producers and consumers support batching. Producers deliver messages asynchronously through a buffer; the drawback is that data may be lost.
3. Zero copy of data, supported natively by NIO.
4. Topic data is stored across partitions.
5. Data is compressed, reducing bandwidth.
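Point 5 is easy to demonstrate: a batch of similar messages compresses dramatically before being sent. Gzip is used here for illustration; Kafka itself supports gzip, snappy, lz4, and zstd codecs.

```python
import gzip
import json

# A batch of near-identical messages, like a burst of page-view events.
batch = json.dumps(
    [{"id": i, "event": "page_view", "page": "/home"} for i in range(1000)]
).encode()

compressed = gzip.compress(batch)
ratio = len(compressed) / len(batch)
```

Compressing whole batches (rather than single messages) is what makes this pay off: the repetition across messages is exactly what the codec exploits.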