Overview of message queues and the next generation after Kafka

Commonly used messaging middleware

Messaging middleware is an important component of today's big data systems. It addresses application decoupling, asynchronous communication, and flow control, and it makes it possible to build efficient, flexible, scalable, and eventually consistent systems with synchronous and asynchronous message delivery and store-and-forward. Distributed messaging middleware widely used in industry includes ActiveMQ, RabbitMQ, Kafka, and RocketMQ. Although all of them are distributed messaging middleware, they differ considerably from one another.

ActiveMQ

  • Advantages: Apache open source, full feature set, extensive documentation, long history, clients in many languages, easy to use.

  • Disadvantages: relatively low performance, supports only a master-slave architecture, poor scalability.

RabbitMQ

  • Advantages: written in Erlang, performs better than ActiveMQ, feature-rich, multi-protocol support (AMQP, XMPP, SMTP).

  • Disadvantages: although it performs better than ActiveMQ, it still lags behind Kafka and RocketMQ; supports only master-slave mode; poor scalability.

Kafka

  • Advantages: Apache open source, very high performance, reliable, distributed and scalable, clients in many languages.

  • Disadvantages: few management tools and few supported protocols.

RocketMQ

  • Advantages: Apache open source, written in Java, borrows concepts from Kafka and inherits its high performance and scalability. Better support for enterprise application features, such as scheduled (timed) messages.

Kafka design goals

Provide message persistence with O(1) time complexity, so that access stays constant-time even for terabytes of data.

High throughput: even on inexpensive commodity hardware, a single machine can transfer 100,000 messages per second.

Support message partitioning and distributed consumption, while guaranteeing message order within each partition.

Support both offline and real-time data processing.

Support online horizontal scaling.
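The partitioning and ordering goals can be illustrated with a small in-memory model. This is only a sketch, not Kafka's implementation: ToyTopic, partition_for, and send are hypothetical names, and CRC32 is used here as a stand-in for whatever hash a real producer applies to the key.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned topic (hypothetical, not a Kafka API)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; list.append is amortized O(1).
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A stable hash, so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # position inside that partition
        return p, offset


topic = ToyTopic(num_partitions=3)
for i in range(6):
    topic.send("user-a", f"a{i}")
    topic.send("user-b", f"b{i}")

# All messages for one key sit in one partition, in the order they were sent.
pa = topic.partition_for("user-a")
values_a = [v for k, v in topic.partitions[pa] if k == "user-a"]
print(values_a)  # ['a0', 'a1', 'a2', 'a3', 'a4', 'a5']
```

Order holds per partition only: messages for "user-a" and "user-b" may land in different partitions, and nothing in this model relates their relative order.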

Characteristics of Kafka

  • High throughput, low latency: Kafka can process hundreds of thousands of messages per second with latency as low as a few milliseconds. Each topic can be divided into multiple partitions, and the consumers in a consumer group consume from those partitions.

  • Scalability: a Kafka cluster can be expanded without downtime (hot expansion).

  • Durability and reliability: messages are persisted to local disk, and data replication is supported to prevent data loss.

  • Fault tolerance: nodes in the cluster are allowed to fail (with n replicas, up to n-1 nodes may fail).

  • High concurrency: supports thousands of clients reading and writing at the same time.
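The "n replicas tolerate n-1 failures" rule from the fault tolerance item can be sketched as follows. This is a deliberately simplified model that assumes every replica holds a full, up-to-date copy; ToyPartition and the broker names are hypothetical.

```python
class ToyPartition:
    """One partition replicated across brokers (hypothetical model)."""

    def __init__(self, replica_brokers):
        # Live replicas; the first entry acts as the leader.
        self.alive = list(replica_brokers)

    @property
    def leader(self):
        return self.alive[0] if self.alive else None

    def fail(self, broker):
        # Remove a failed broker. If it was the leader,
        # the next live replica takes over.
        if broker in self.alive:
            self.alive.remove(broker)

    def available(self):
        return self.leader is not None


partition = ToyPartition(["broker-1", "broker-2", "broker-3"])  # n = 3
partition.fail("broker-1")
partition.fail("broker-2")
print(partition.leader, partition.available())  # broker-3 True
partition.fail("broker-3")
print(partition.available())  # False
```

With three replicas the partition survives two failures and becomes unavailable only when the last replica is gone.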

Scenarios

Message System:

Kafka is an excellent messaging system. With high throughput, built-in partitioning, and distributed, redundant replication, it is a good solution for large-scale message processing.

Application Monitoring:

Use Kafka to collect application and server health metrics, such as CPU usage, I/O, memory, connection counts, TPS, and QPS, and then process the collected metrics.

Stream processing:

Streaming data that has been collected, such as click and page-view events, is made available to stream processing frameworks such as Spark Streaming, Storm, and Flink.

Persistent log:

Kafka can provide a persistent log for external distributed systems. Logs can be replicated across multiple nodes.

Kafka concepts

  • Message: a unit of data. Each message has a key and a corresponding value.

  • Producer: publishes messages to topics. The producer decides which partition of the topic a message is sent to.

  • Consumer: subscribes to messages.

  • Consumer Group: a logical group of consumers. A message is consumed by only one consumer within the same Consumer Group.

  • Topic: a category of messages. Consumers read data by subscribing to a topic.

  • Partition: a topic has at least one partition. Message order is not guaranteed across different partitions. More partitions mean higher throughput, but also more open file handles, and the number of file handles a process can open is a major bottleneck for Kafka. Kafka brokers use the local file system, which affects Kafka's evolution toward a streaming architecture; perhaps Kafka will later support a distributed file system, as MapR Stream does.

  • Broker: Kafka runs as a distributed system (cluster). Each node in the cluster is called a broker and is responsible for persisting messages. Brokers can be scaled horizontally.

  • Replication: backup copies of Kafka messages. Data redundancy ensures, as far as possible, that data is not lost.

  • Offset: the position of a message within a partition on the broker.

  • ISR (In-Sync Replicas): the list of replicas that are currently in sync with the leader; the ISR is a subset of all replicas. While replicas synchronize messages, some nodes may sync slowly. If a node's replica falls too far behind the leader, it is removed from the ISR; once it catches up with the leader, it rejoins the ISR.
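The ISR behaviour described in the last item can be sketched with a toy model. Everything here is an assumption made for illustration: ToyReplicaSet is a hypothetical class, and max_lag is a made-up threshold measured in messages, not the criterion a real broker applies.

```python
class ToyReplicaSet:
    """Leader plus followers for one partition (hypothetical model)."""

    def __init__(self, leader, followers, max_lag):
        self.leader = leader
        self.max_lag = max_lag  # hypothetical threshold, in messages
        # Log end position per replica: how many messages each one holds.
        self.log_end = {leader: 0, **{f: 0 for f in followers}}

    def append_to_leader(self, count=1):
        self.log_end[self.leader] += count

    def follower_fetch(self, follower, count):
        # A follower copies up to `count` messages, never past the leader.
        leader_end = self.log_end[self.leader]
        self.log_end[follower] = min(leader_end, self.log_end[follower] + count)

    def isr(self):
        # Replicas whose lag behind the leader is within the threshold.
        leader_end = self.log_end[self.leader]
        return sorted(
            r for r, end in self.log_end.items() if leader_end - end <= self.max_lag
        )


replicas = ToyReplicaSet("b1", ["b2", "b3"], max_lag=2)
replicas.append_to_leader(10)
replicas.follower_fetch("b2", 10)  # b2 keeps up with the leader
replicas.follower_fetch("b3", 3)   # b3 is 7 messages behind
isr_when_slow = replicas.isr()
replicas.follower_fetch("b3", 7)   # b3 catches up
isr_after_catch_up = replicas.isr()
print(isr_when_slow)        # ['b1', 'b2']
print(isr_after_catch_up)   # ['b1', 'b2', 'b3']
```

The slow follower drops out of the ISR while it lags and is included again after it catches up, which matches the remove-and-rejoin cycle described above.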

Kafka innovations

Time-based message retention: Kafka does not need to track whether a particular message has been read. Instead, a retention time is set for messages, and a message is deleted once that time has passed, which leaves consumers a window in which to read it.

Consumers manage their own message offsets. Kafka stores messages on the file system, and reads go through the same files in order, so messages are read sequentially. This is why Kafka processes messages so quickly.
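A minimal sketch of these two ideas, assuming an in-memory list in place of on-disk log files: ToyLog, its methods, and the timestamps are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class ToyLog:
    """Append-only log with time-based retention (hypothetical model)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest retained message
        self.records = []     # (timestamp, value) in append order

    def append(self, value, now):
        self.records.append((now, value))

    def expire(self, now):
        # Drop messages older than the retention time, whether read or not.
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.pop(0)
            self.base_offset += 1

    def read(self, offset, max_records=10):
        # Sequential read starting at `offset`; expired offsets are skipped.
        start = max(offset, self.base_offset) - self.base_offset
        batch = [v for _, v in self.records[start:start + max_records]]
        return batch, self.base_offset + start + len(batch)


log = ToyLog(retention_seconds=60)
for t, v in [(0, "m0"), (10, "m1"), (50, "m2"), (70, "m3")]:
    log.append(v, now=t)

# Each consumer keeps its own position; the log stores no per-consumer state.
fast_offset = 0
slow_offset = 0
fast_batch, fast_offset = log.read(fast_offset)  # reads before expiry
log.expire(now=75)                               # m0 and m1 are older than 60 s
slow_batch, slow_offset = log.read(slow_offset)  # reads after expiry
print(fast_batch)   # ['m0', 'm1', 'm2', 'm3']
print(slow_batch)   # ['m2', 'm3']
```

The slow consumer misses the expired messages because deletion depends only on age, which is the trade-off of not tracking reads per message.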

Drawbacks of Kafka

Topic and partition count: when faced with thousands of topics, Kafka's performance drops significantly.

Partition load balancing is manual; there is no automatic load balancing.

There is no fixed serialization mechanism. In large-scale use, components that use different serialization mechanisms cannot communicate with each other.

Limited mirroring: Kafka simply forwards the messages, and producers and consumers cannot be carried over with them.

Next-generation messaging system: MapR Stream

MapR Stream is described as a next-generation messaging system. It is built on the MapR distributed file system, which gives it stronger storage than Kafka. It is worth considering only when Kafka and other message queues cannot meet the requirements. Current architectures are request-response based and cannot satisfy the real-time needs found in practice. Live broadcast systems and other real-time systems place high demands on message timeliness and carry large volumes of data, and Kafka's message queuing may not be able to meet them either.

Origin www.cnblogs.com/Java-no-1/p/11029041.html