Basic concepts and architecture of Kafka

1. Overview of Kafka

1.1 Definition

Kafka is an open-source distributed event streaming platform, widely used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

1.2 Comparison of message queues

At present, the most common message queue products include Kafka, RabbitMQ, RocketMQ, and so on.

In big data scenarios, Kafka is the main choice of message queue. In JavaEE development, RabbitMQ and RocketMQ are mainly used.

A comparison of several common MQs:

Feature                 RabbitMQ                 RocketMQ         Kafka
Company/Community       Rabbit                   Alibaba          Apache
Development language    Erlang                   Java             Scala & Java
Protocol support        AMQP, XMPP, SMTP, STOMP  Custom protocol  Custom protocol
Availability            High                     High             High
Single-node throughput  Average                  High             Very high
Message latency         Microseconds             Milliseconds     Within milliseconds
Message reliability     High                     High             Average

Prioritizing availability: Kafka, RocketMQ, RabbitMQ

Prioritizing reliability: RabbitMQ, RocketMQ

Prioritizing throughput: RocketMQ, Kafka

Prioritizing low message latency: RabbitMQ, Kafka

1.3 Application Scenarios of Traditional Message Queuing

The main application scenarios of traditional message queues are buffering/peak shaving, decoupling, and asynchronous communication.

1) Buffering/peak shaving

For example, suppose the concurrent request volume during Double Eleven reaches 200 million per second, while the business system can process only 10 million per second.

The number of requests far exceeds the system's capacity, and without protection the system would crash.

If a message queue receives these requests and buffers them, the system only needs to consume data at its own processing speed.

Processing takes a little longer, but the availability of the entire business system is guaranteed.

A message queue helps control and smooth the rate at which data flows through the system, and it resolves the mismatch between the rate at which messages are produced and the rate at which they are consumed.
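The buffering effect described above can be illustrated with a minimal Python sketch. It is a plain in-memory simulation rather than Kafka code; the function name `simulate_peak_shaving` and the request numbers are illustrative assumptions, not values from a real system.

```python
from collections import deque


def simulate_peak_shaving(arrivals, capacity_per_tick):
    """Simulate a queue absorbing a burst of requests.

    arrivals: number of requests arriving in each tick.
    capacity_per_tick: how many requests the backend can process per tick.
    Returns the backlog size recorded after every tick, continuing
    until the queue has been fully drained.
    """
    queue = deque()
    backlog = []
    tick = 0
    next_id = 0
    while tick < len(arrivals) or queue:
        # Producers enqueue everything that arrives in this tick.
        if tick < len(arrivals):
            for _ in range(arrivals[tick]):
                queue.append(next_id)
                next_id += 1
        # The backend consumes at its own pace, never above its capacity.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
        backlog.append(len(queue))
        tick += 1
    return backlog


# A burst of 20 requests in one tick; the backend handles 5 per tick.
history = simulate_peak_shaving([20, 0, 0, 0], capacity_per_tick=5)
print(history)  # [15, 10, 5, 0]
```

The burst is never dropped and the backend is never asked to do more than 5 units of work per tick; the cost is that the last requests wait a few ticks in the queue.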

2) Decoupling

No matter how the producer side and the consumer side change, neither needs multiple sets of implementations; each only needs to interact with the message queue.

This greatly reduces the coupling of the system and the development cost.

The processing on either side can be extended or modified independently, as long as both sides keep to the same interface contract.

3) Asynchronous communication

For example, in a recharge flow, the recharge itself is the most important task and must be executed immediately, while sending the SMS notification is less important.

So there is no need to execute the recharge and the SMS sending sequentially, which would only increase the pressure on the system.

After the recharge succeeds, the SMS request can be written to the message queue, and a consumer service can work through these requests at its own pace.

Even if a message is lost and the SMS is not sent, the core business (the recharge) is unaffected, and no system error results.


Asynchronous communication lets a user put a message into the queue without processing it immediately, and then process it when needed.
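The recharge example can be sketched with Python's standard `queue` and `threading` modules. This is only an in-process stand-in for a message queue; the names `recharge`, `sms_worker`, and `sms_queue` are hypothetical, and appending to a list takes the place of calling a real SMS gateway.

```python
import queue
import threading

sms_queue = queue.Queue()
sent = []  # records the SMS requests handled by the background worker


def recharge(user, amount, balances):
    """Core task: update the balance right away, then defer the SMS."""
    balances[user] = balances.get(user, 0) + amount
    # Only enqueue the notification; do not wait for it to be sent.
    sms_queue.put((user, f"Recharge of {amount} succeeded"))
    return balances[user]


def sms_worker():
    """Consumer: drains SMS requests at its own pace."""
    while True:
        item = sms_queue.get()
        if item is None:  # sentinel value: stop the worker
            sms_queue.task_done()
            break
        sent.append(item)  # stand-in for calling a real SMS gateway
        sms_queue.task_done()


balances = {}
worker = threading.Thread(target=sms_worker, daemon=True)
worker.start()
recharge("alice", 50, balances)
recharge("alice", 30, balances)
sms_queue.put(None)
sms_queue.join()  # wait here only so the demo finishes deterministically
print(balances, sent)
```

`recharge` returns as soon as the balance is updated; whether the SMS goes out quickly, slowly, or not at all has no effect on the value it returns.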

2. Two publishing modes of Kafka

2.1 Point-to-point mode

The consumer actively pulls data, and the message is deleted from the queue once it has been received.
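As a rough illustration of point-to-point semantics, the following Python sketch removes each message as soon as a consumer pulls it. The class `PointToPointQueue` is a hypothetical in-memory model, not part of any Kafka client.

```python
from collections import deque


class PointToPointQueue:
    """In-memory queue where every message is delivered at most once."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def pull(self):
        """Return the oldest message and remove it, or None if empty."""
        if not self._messages:
            return None
        return self._messages.popleft()


q = PointToPointQueue()
q.send("order-1")
q.send("order-2")
first = q.pull()   # one consumer receives "order-1"
second = q.pull()  # the next pull receives "order-2", never "order-1" again
third = q.pull()   # the queue is now empty, so this is None
```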

2.2 Publish/Subscribe Mode

There can be multiple topics (browse, like, favorite, comment, etc.).

After one consumer consumes the data, the data is not deleted, so other consumers can still consume it. When the data is deleted is handled separately, later on.

Each consumer is therefore independent of the others, and all of them can consume the data.

Because this mode adapts better to complex business environments, the publish/subscribe mode is used in most cases.
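One way to picture the difference from point-to-point mode is an append-only log where each consumer keeps its own read position. The sketch below is a simplified in-memory model under that assumption; the classes `Topic` and `Subscriber` are hypothetical and data retention is left out entirely.

```python
class Topic:
    """Append-only log; reading a message does not delete it."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def publish(self, message):
        self.log.append(message)


class Subscriber:
    """Tracks its own position in the topic, independent of other subscribers."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return all messages this subscriber has not seen yet."""
        new_messages = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return new_messages


likes = Topic("like")
a, b = Subscriber(likes), Subscriber(likes)
likes.publish("u1 liked p9")
likes.publish("u2 liked p9")
seen_by_a = a.poll()   # both messages
seen_by_b = b.poll()   # the same two messages, still available
likes.publish("u3 liked p9")
later_for_a = a.poll()  # only the new message
```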


3. The basic architecture of Kafka

3.1 Basic architecture

1. To make scaling easier and improve throughput, one topic is divided into multiple partitions, and the partitions are stored on different Kafka nodes.

The benefit of partitioning: if one Broker node can store only 1 TB of data but there is 2 TB of data, the data can be partitioned and stored across two Broker nodes.

2. Building on the partition design, the concept of a consumer group is introduced: the consumers in a group consume in parallel.

3. To improve availability, each partition is given several replicas. Only one replica is the Leader and the others are Followers; consumers only consume data from the Leader. If the Leader goes down, a Follower is elected as the new Leader.

4. ZK (ZooKeeper) records which replica is the Leader. Starting with Kafka 2.8.0, however, Kafka can also be configured to run without ZK, and moving away from ZK is the trend, because it has become a bottleneck for Kafka.
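Points 1 and 2 can be sketched in a few lines of Python. Both functions are simplified assumptions for illustration: `choose_partition` uses CRC32 as a stand-in stable hash, and `assign_partitions` uses a plain round-robin split, which is not necessarily the strategy a real Kafka client or cluster applies.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Spread the partitions over the consumers of one group, round-robin.

    Each partition is given to exactly one consumer in the group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# The same key always lands in the same partition.
p1 = choose_partition("user-42", 3)
p2 = choose_partition("user-42", 3)

# Three partitions shared by a group of two consumers.
group = assign_partitions([0, 1, 2], ["c1", "c2"])
print(group)  # {'c1': [0, 2], 'c2': [1]}
```

With more consumers than partitions, the extra consumers in this sketch receive an empty list, which matches the rule that a partition is consumed by only one consumer in the group.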

3.2 Role Description

  • Producer: the message producer, i.e. the client that sends messages to the Kafka broker.
  • Consumer: the message consumer, i.e. the client that fetches messages from the Kafka broker.
  • Consumer Group (CG): a consumer group, made up of multiple consumers. Each consumer in a group is responsible for consuming data from different partitions, and a partition can be consumed by only one consumer in the group; consumer groups do not affect each other. Every consumer belongs to some consumer group, so a consumer group is a logical subscriber.
  • Broker: one Kafka server is one broker. A cluster consists of multiple brokers, and one broker can hold multiple topics.
  • Topic: can be thought of as a queue; producers and consumers both work against a topic.
  • Partition: for scalability, a very large topic can be distributed across multiple brokers (i.e. servers). A topic can be divided into multiple partitions, and each partition is an ordered queue.
  • Replica: a copy. Each partition of a topic has several replicas, consisting of one Leader and several Followers.
  • Leader: the "primary" among the replicas of a partition. Producers send data to the Leader, and consumers consume data from the Leader.
  • Follower: a "slave" among the replicas of a partition. It synchronizes data from the Leader and stays in sync with it. When the Leader fails, one Follower becomes the new Leader.
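The Leader and Follower roles can be modeled with a small Python sketch. It makes strong simplifying assumptions that are stated here and in the comments: replication is done synchronously inside `produce`, and the "election" simply promotes the first remaining replica. The class `Partition` and its methods are hypothetical and do not reflect how Kafka actually performs replication or elections.

```python
class Partition:
    """Toy model of one partition with a Leader and several Followers."""

    def __init__(self, replicas):
        # replicas: broker ids holding a copy; the first one starts as Leader.
        self.replicas = list(replicas)
        self.leader = self.replicas[0]
        self.logs = {broker: [] for broker in self.replicas}

    def followers(self):
        return [b for b in self.replicas if b != self.leader]

    def produce(self, message):
        """Write to the Leader, then copy to every Follower (assumed synchronous)."""
        self.logs[self.leader].append(message)
        for broker in self.followers():
            self.logs[broker].append(message)

    def consume(self):
        """Consumers read from the Leader's log only."""
        return list(self.logs[self.leader])

    def fail_leader(self):
        """Drop the current Leader and promote the first remaining Follower."""
        failed = self.leader
        self.replicas.remove(failed)
        del self.logs[failed]
        if not self.replicas:
            raise RuntimeError("no replica left to become Leader")
        self.leader = self.replicas[0]
        return self.leader


partition = Partition([101, 102, 103])
partition.produce("m1")
new_leader = partition.fail_leader()  # broker 101 fails, 102 takes over
after_failover = partition.consume()  # "m1" is still readable
```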


Origin blog.csdn.net/qq_44749491/article/details/129911221