Kafka Series (1): Introduction to Kafka

I. Introduction

Apache Kafka is a distributed stream processing platform. It has the following features:

  • Supports publish/subscribe messaging, similar to message queues such as RabbitMQ and ActiveMQ;
  • Supports real-time data processing;
  • Guarantees reliable message delivery;
  • Supports persistent message storage, with fault tolerance provided by keeping multiple distributed copies of each message;
  • High throughput: a single broker can easily handle thousands of partitions and millions of messages per second.

II. Basic Concepts

2.1 Messages And Batches

Kafka's basic unit of data is called a message. To reduce network overhead and improve efficiency, multiple messages are grouped into a single batch before being written.
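The batching idea can be sketched in a few lines of plain Python. This is only an illustration of the principle, not Kafka client code: `BatchingSender`, `batch_size`, and `send_batch` are hypothetical names made up for this sketch, and the "network" is just a list that records each call.

```python
from typing import Callable, List


class BatchingSender:
    """Collects messages and hands them to `send_batch` in groups.

    Hypothetical helper for illustration; not a Kafka API.
    """

    def __init__(self, batch_size: int, send_batch: Callable[[List[bytes]], None]):
        self.batch_size = batch_size
        self.send_batch = send_batch
        self.pending: List[bytes] = []

    def send(self, message: bytes) -> None:
        # Buffer the message; only "go to the network" when the batch is full.
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as one batch, then start a new buffer.
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []


network_calls: List[List[bytes]] = []  # each entry stands for one network round trip
sender = BatchingSender(batch_size=3, send_batch=network_calls.append)
for i in range(7):
    sender.send(f"msg-{i}".encode())
sender.flush()  # push the final, partially filled batch

print(len(network_calls))  # 3 simulated round trips instead of 7
```

The trade-off is the usual one for batching: fewer round trips per message, at the cost of a message possibly waiting in the buffer until its batch is sent.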

2.2 Topics And Partitions

Kafka messages are categorized by topic. A topic can be divided into several partitions, and each partition is a commit log. Messages are appended to a partition and then read in first-in, first-out order. Kafka achieves redundancy and scalability through partitioning: partitions can be distributed across different servers, which means a topic can span multiple servers and deliver more performance than a single server could.

Because a topic contains multiple partitions, message order cannot be guaranteed across the topic as a whole, but it is guaranteed within a single partition.
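To make the ordering guarantee concrete, here is a minimal in-memory model of a topic as a set of append-only logs. The `Topic` class and its methods are hypothetical and exist only for this sketch; they say nothing about how Kafka stores data on disk.

```python
from typing import List


class Topic:
    """Toy model: one append-only list per partition."""

    def __init__(self, num_partitions: int):
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        # Messages are only ever added at the end of a partition's log.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # position of the message within this partition

    def read(self, partition: int, start: int = 0) -> List[str]:
        # Reading returns messages in the order they were appended (FIFO).
        return self.partitions[partition][start:]


topic = Topic(num_partitions=2)
for n in range(6):
    topic.append(n % 2, f"event-{n}")

print(topic.read(0))  # ['event-0', 'event-2', 'event-4']
print(topic.read(1))  # ['event-1', 'event-3', 'event-5']
```

Each partition returns its own messages in write order, but nothing in the model records how `event-1` relates to `event-2` across partitions, which is why ordering holds per partition and not per topic.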

2.3 Producers And Consumers

1. Producers

Producers create messages. Under normal circumstances, a producer distributes messages evenly across all partitions of a topic and does not care which partition a given message is written to. To write messages to a specific partition, you can use a custom partitioner.
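The difference between even distribution and a custom partitioner can be sketched as two small functions. This is a simplified model under stated assumptions: the `Partitioner` type and both functions are hypothetical, and `zlib.crc32` is used only as a convenient deterministic hash, not because it is the hash Kafka uses.

```python
import itertools
import zlib
from typing import Callable, Optional

# Hypothetical signature for this sketch: (key, number of partitions) -> partition index
Partitioner = Callable[[Optional[bytes], int], int]


def make_round_robin() -> Partitioner:
    """Spreads messages evenly by cycling through the partitions."""
    counter = itertools.count()

    def choose(key: Optional[bytes], num_partitions: int) -> int:
        return next(counter) % num_partitions

    return choose


def key_hash_partitioner(key: Optional[bytes], num_partitions: int) -> int:
    """Custom rule: the same key always maps to the same partition."""
    if key is None:
        raise ValueError("this partitioner requires a key")
    return zlib.crc32(key) % num_partitions  # crc32 is a stand-in hash


round_robin = make_round_robin()
print([round_robin(None, 3) for _ in range(6)])  # [0, 1, 2, 0, 1, 2]

first = key_hash_partitioner(b"user-42", 3)
second = key_hash_partitioner(b"user-42", 3)
print(first == second)  # True
```

Routing by key is a common reason to customize partitioning: since order is only guaranteed inside one partition, sending all messages for one key to the same partition keeps them in order relative to each other.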

2. Consumers

Consumers are part of a consumer group and are responsible for consuming messages. A consumer can subscribe to one or more topics and reads messages in the order in which they were produced. Consumers use the message offset to tell which messages have already been read. The offset is a monotonically increasing value that Kafka adds to each message when it is created; within a given partition, each message's offset is unique. A consumer saves the last offset it read for each partition in ZooKeeper or in Kafka, so if the consumer shuts down or restarts, it can retrieve that offset and its read position is not lost.
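A rough sketch of how a saved offset lets a consumer resume after a restart. Everything here is an in-memory stand-in: `log` plays the role of one partition, `committed` plays the role of the offset store (ZooKeeper or Kafka in the text above), and `poll` is a hypothetical helper, not a client API. As a simplification, the stored value is the position of the next message to read.

```python
from typing import Dict, List

log: List[str] = [f"m{i}" for i in range(5)]  # one partition; list index = offset
committed: Dict[str, int] = {}                # group name -> next offset to read


def poll(group: str, max_messages: int) -> List[str]:
    """Read from the saved position and save the new position afterwards."""
    start = committed.get(group, 0)
    batch = log[start:start + max_messages]
    committed[group] = start + len(batch)
    return batch


before_restart = poll("billing", 2)
# ... the consumer process stops here; only `committed` survives ...
after_restart = poll("billing", 10)

print(before_restart)  # ['m0', 'm1']
print(after_restart)   # ['m2', 'm3', 'm4']
```

Because the position lives outside the consumer process, the restarted consumer continues at `m2` rather than starting over or skipping messages.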

A partition can be read by only one consumer within a given consumer group, but it can be read at the same time by consumers that belong to different consumer groups. When consumers in multiple consumer groups read the same topic, they do not affect one another.
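The group rule can be illustrated with a simple assignment function. This is a hypothetical round-robin scheme written for this sketch; real Kafka clients choose among several assignment strategies, and none of the names below come from the Kafka API.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer of a single group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two independent groups reading the same topic.
group_a = assign(partitions, ["a1", "a2"])
group_b = assign(partitions, ["b1", "b2", "b3"])

print(group_a)  # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)  # {'b1': [0, 3], 'b2': [1], 'b3': [2]}
```

Within each group, every partition has exactly one owner, yet partition 0 is read by both `a1` and `b1` because the two groups are assigned independently.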

2.4 Brokers And Clusters

A standalone Kafka server is called a broker. A broker receives messages from producers, assigns an offset to each message, and commits the messages to disk. A broker also serves consumers: it responds to requests to read partitions and returns messages that have been committed to disk.

A broker is part of a cluster. Each cluster elects one broker as the cluster controller, which is responsible for administrative tasks, including assigning partitions to brokers and monitoring brokers.

Within a cluster, a partition belongs to one broker, which is called the leader of that partition. A partition can also be assigned to multiple brokers, in which case the partition is replicated. This replication mechanism provides redundancy for the messages in the partition: if a broker fails, another broker can take over leadership.
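The takeover can be pictured with a deliberately simplified rule: the leader is the first broker in the partition's replica list that is still alive. This rule, the `current_leader` function, and the broker IDs are assumptions made for this sketch; Kafka's actual leader election is more involved than this.

```python
from typing import List, Optional, Set


def current_leader(replicas: List[int], live_brokers: Set[int]) -> Optional[int]:
    """Return the first replica that is still alive, or None if all are down."""
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return None


replicas = [1, 2, 3]  # brokers holding a copy of this partition
live = {1, 2, 3}

leader_before = current_leader(replicas, live)
live.discard(1)       # broker 1 fails
leader_after = current_leader(replicas, live)

print(leader_before)  # 1
print(leader_after)   # 2
```

With a single copy, losing broker 1 would make the partition unavailable; with three copies, another broker that already holds the data can start serving it.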

Reference material

Neha Narkhede, Gwen Shapira, Todd Palino (authors), Xue Mingdeng (translator). Kafka: The Definitive Guide. Posts & Telecom Press. 2017-12-26

More articles in this big data series can be found in the open source GitHub project Big Data Getting Started.

Origin www.cnblogs.com/heibaiying/p/11371328.html