Kafka (1): Introduction

Introduction

  • Kafka is a distributed streaming platform (a messaging system).

  • Kafka is used for stream processing, website activity tracking, metrics collection and monitoring, log aggregation, real-time analytics, CEP (complex event processing), loading data into Spark and Hadoop, CQRS, message replay, error recovery, and as a distributed commit log for in-memory computing (microservices).

  • Kafka is used to handle real-time data streams, to collect big data, to perform real-time analysis, or both.

  • Kafka is a distributed streaming platform used to publish and subscribe to streams of records.

    • Publish and subscribe: records can be published to Kafka and read back from it as record streams.

    • Replication for data safety: Kafka provides fault-tolerant storage by replicating topic log partitions to multiple servers.

    • High throughput, low latency: Kafka is designed to let applications process records as they occur, in real time.

    • Efficient persistence for stability and speed: Kafka is fast and uses IO efficiently by batching and compressing records.

    • Decoupling: Kafka is used to decouple data streams.

  • Kafka is used to stream data into data lakes, applications, and real-time stream analytics systems (for example, Hadoop).

    • High concurrency

  • For log data and offline analysis systems such as Hadoop that must also meet real-time processing requirements, Kafka is a viable solution. Kafka aims to unify online and offline message processing through a parallel loading mechanism, and to provide real-time consumption across a cluster of machines.

    Kafka official website: http://kafka.apache.org/

    Help documentation page: http://kafka.apache.org/documentation.html

    wiki page: https://cwiki.apache.org/confluence/display/KAFKA/Index
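Why batching helps compression can be illustrated with a small sketch. It is not Kafka code: the page-view records are hypothetical, and Python's zlib stands in for the codecs Kafka actually supports (gzip, snappy, lz4, zstd). It compares compressing every record on its own with compressing records in batches of 100.

```python
import json
import zlib

def make_records(n):
    # Hypothetical page-view events, similar in shape to website activity tracking
    return [{"user": f"user-{i % 10}", "page": "/home", "action": "view"} for i in range(n)]

def bytes_sent_individually(records):
    # Compress each record separately, as if every record were sent on its own
    return sum(len(zlib.compress(json.dumps(r).encode())) for r in records)

def bytes_sent_batched(records, batch_size):
    # Group records into batches and compress each batch as a single unit
    total = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        payload = "\n".join(json.dumps(r) for r in batch).encode()
        total += len(zlib.compress(payload))
    return total

records = make_records(1000)
individual = bytes_sent_individually(records)
batched = bytes_sent_batched(records, batch_size=100)
print(f"individual: {individual} bytes, batched: {batched} bytes")
```

Records in a batch tend to share field names and values, so the compressor can exploit the repetition across records, and per-message overhead is paid once per batch instead of once per record.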

Kafka structure


  1. Broker: A Kafka cluster contains one or more servers. These servers are called brokers.

  2. Topic: Every message published to the Kafka cluster has a category, called its Topic. (Physically, messages of different topics are stored separately. Logically, although a topic's messages may be stored on one or more brokers, users only need to specify the topic to produce or consume data, without caring where the data is stored.)

  3. Partition: A physical concept. Each Topic is divided into one or more Partitions.

  4. Producer: Message producer, responsible for publishing messages to Kafka broker.

  5. Consumer: A message consumer, a client that reads messages from Kafka broker.

  6. Consumer Group: Each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if none is specified, it belongs to the default group).

Topic

  • Each topic is a named category for a group of messages. Kafka divides each topic into partitions.

  • Each partition is composed of a series of ordered, immutable messages, which are continuously appended to the partition.

  • Each message in a partition is assigned a sequential ID number called the offset, which uniquely identifies the message within that partition.

  • The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time.

    • For example, if the retention policy is set to 2 days, a message can be consumed for two days after it is published; after that it is discarded to free up space. Kafka's performance is effectively constant regardless of the amount of data stored, so retaining a lot of data is not a problem (as long as there is enough disk space).

  • The only state each consumer needs to maintain is its position in the log, that is, the offset. By resetting this value, the consumer can re-read older messages.
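The topic behavior described above (append-only partitions, sequential offsets, time-based retention, and consumers that only track an offset) can be modeled in a few lines. This is a minimal in-memory sketch, not Kafka's storage engine; the class names `PartitionLog` and `Consumer` are hypothetical, and timestamps are plain numbers supplied by the caller.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    offset: int
    timestamp: float  # seconds, supplied by the caller
    value: str

@dataclass
class PartitionLog:
    retention_seconds: float
    messages: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, value, timestamp):
        # Messages are only ever appended; each gets the next sequential offset
        msg = Message(self.next_offset, timestamp, value)
        self.messages.append(msg)
        self.next_offset += 1
        return msg.offset

    def enforce_retention(self, now):
        # Drop messages older than the retention period, consumed or not
        self.messages = [m for m in self.messages if now - m.timestamp <= self.retention_seconds]

    def read_from(self, offset, max_count=10):
        # Return up to max_count messages whose offset is >= the requested offset
        return [m for m in self.messages if m.offset >= offset][:max_count]

class Consumer:
    # The only state a consumer keeps is its position (offset) in the log
    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_count=10):
        batch = self.log.read_from(self.position, max_count)
        if batch:
            self.position = batch[-1].offset + 1
        return [m.value for m in batch]

    def seek(self, offset):
        # Resetting the offset lets the consumer re-read older messages
        self.position = offset

DAY = 24 * 60 * 60
log = PartitionLog(retention_seconds=2 * DAY)  # "retain messages for 2 days"
for day in range(4):
    log.append(f"event-{day}", timestamp=day * DAY)

consumer = Consumer(log)
first = consumer.poll()             # reads event-0 .. event-3
consumer.seek(1)                    # rewind to offset 1
replay = consumer.poll()            # re-reads event-1 .. event-3
log.enforce_retention(now=3 * DAY)  # event-0 is now older than 2 days
print(first, replay, [m.offset for m in log.messages])
```

Note that retention removes old messages without renumbering the survivors: after `event-0` expires, the remaining offsets are still 1, 2 and 3, which is why a stored offset stays meaningful.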

Partition

  • Each partition is replicated on several servers in the Kafka cluster, so the servers holding replicas can jointly handle data and requests. The number of replicas is configurable, and replication is what makes Kafka fault-tolerant.

  • Each partition has one server acting as the "leader" and zero or more servers acting as "followers".

    • The leader handles all reads and writes for the partition.

    • Followers replicate the leader.

    • If the leader goes down, one of the followers will automatically become the leader.

    • Each server in the cluster plays two roles at once: it is the leader for some of the partitions it holds and a follower for others, which balances load across the cluster.

  • Leaders and followers are managed through a ZooKeeper (zk) cluster.
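The leader/follower arrangement and failover can be illustrated with a toy model. This is a hypothetical simplification, not Kafka's actual algorithm: real Kafka uses a controller and elects the new leader from the in-sync replica set, and its replica placement differs in detail. Here replicas are placed round-robin, and the leader is simply the first live broker in a partition's replica list.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    # Round-robin placement: partition p's replicas start at broker index p mod N,
    # so the first replica (the preferred leader) is spread across brokers
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

def elect_leaders(assignment, live_brokers):
    # The leader is the first replica that is still alive; None if all replicas are down
    return {
        p: next((b for b in replicas if b in live_brokers), None)
        for p, replicas in assignment.items()
    }

brokers = [1, 2, 3]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)
before = elect_leaders(assignment, live_brokers={1, 2, 3})
after = elect_leaders(assignment, live_brokers={2, 3})  # broker 1 goes down
print(assignment)
print(before, after)
```

With all brokers up, each broker leads one partition and follows another, matching the load-balancing point above. When broker 1 fails, its follower for partition 0 (broker 2) takes over as leader.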

Producer

  • Producer refers to the message producer of the Kafka cluster. Producer pushes the message to the topic it specifies and is responsible for deciding which partition to publish to.

    • The partition can be chosen by a load-balancing mechanism that spreads messages across partitions, or by a specific partition function (for example, one based on the message key). Today the second approach is the more common one.
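The two ways of choosing a partition can be contrasted in a short sketch. The classes are hypothetical. Round-robin is one simple load-balancing strategy; recent Kafka clients use a "sticky" strategy for keyless records instead. For the keyed case, zlib's crc32 is only a stand-in: Kafka's default partitioner hashes the key with murmur2, so real partition numbers will differ.

```python
import itertools
import zlib

class RoundRobinPartitioner:
    # Load balancing: spread records evenly across partitions, ignoring the key
    def __init__(self, num_partitions):
        self._cycle = itertools.cycle(range(num_partitions))

    def partition(self, key):
        return next(self._cycle)

class KeyHashPartitioner:
    # Partition function: the same key always maps to the same partition
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def partition(self, key):
        # crc32 is a deterministic stand-in for Kafka's murmur2-based key hashing
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

rr = RoundRobinPartitioner(3)
keyless = [rr.partition(None) for _ in range(6)]  # [0, 1, 2, 0, 1, 2]

kh = KeyHashPartitioner(3)
p_first = kh.partition("user-42")
p_again = kh.partition("user-42")  # same key, same partition
print(keyless, p_first, p_again)
```

A key-based partition function is usually preferred because Kafka only guarantees ordering within a partition: sending all records for one key (for example, one user) to the same partition keeps them in order.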

Consumer

  • Messaging systems usually offer two models for consuming (pulling) messages: queuing and publish-subscribe.

  • In the queuing model, a pool of consumers reads from the server concurrently, and each message is read by only one of them.

  • In the publish-subscribe model, messages are broadcast to all consumers.

  • Consumers can join a consumer group; within each group, each message is delivered to only one consumer.

  • If every consumer is in a different group, this becomes the publish-subscribe model, and every message is broadcast to all consumers.

  • If all consumers are in the same group, it is the queue mode.
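How consumer groups yield both models can be shown with a toy delivery function. It is a hypothetical simplification: in Kafka, partitions (not individual messages) are assigned to the members of a group, and round-robin is only one of the available assignment strategies. Every group independently receives all partitions, while inside a group each partition goes to exactly one member.

```python
def assign_partitions(partitions, members):
    # Round-robin: the i-th partition goes to member i mod (number of members)
    return {p: members[i % len(members)] for i, p in enumerate(partitions)}

def deliver(partition_messages, groups):
    # partition_messages: partition id -> list of messages in that partition
    # groups: group name -> list of consumer names (names assumed unique overall)
    received = {c: [] for members in groups.values() for c in members}
    for members in groups.values():
        # Each group divides the partitions among its own members independently
        owners = assign_partitions(sorted(partition_messages), members)
        for p in sorted(partition_messages):
            received[owners[p]].extend(partition_messages[p])
    return received

partitions = {0: ["m0", "m3"], 1: ["m1", "m4"], 2: ["m2", "m5"]}

# All consumers in one group: queue model, each message is read once
queue = deliver(partitions, {"g1": ["c1", "c2", "c3"]})

# Every consumer in its own group: publish-subscribe, each consumer sees everything
pubsub = deliver(partitions, {"g1": ["c1"], "g2": ["c2"], "g3": ["c3"]})
print(queue)
print(pubsub)
```

Because assignment works at the partition level, a group with more members than partitions leaves the extra members idle; in this sketch they simply receive an empty list.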

Difference from Flume


Origin www.cnblogs.com/renzhongpei/p/12749032.html