Kafka Architecture Design of Distributed Publish-Subscribe Messaging System

Consumer state

In Kafka, the consumers are responsible for maintaining state information (offsets) on what has been consumed. Typically, the Kafka consumer library writes its state data to zookeeper. However, it may be beneficial for consumers to write state data into the same datastore where they are writing the results of their processing. For example, the consumer may simply be entering some aggregate value into a centralized transactional OLTP database. In this case the consumer can store the state of what is consumed in the same transaction as the database modification. This solves a distributed consensus problem, by removing the distributed part! A similar trick works for some non-transactional systems as well. A search system can store its consumer state with its index segments. Though it may provide no durability guarantees, this means that the index is always in sync with the consumer state: if an unflushed index segment is lost in a crash, the indexes can always resume consumption from the latest checkpointed offset. Likewise our Hadoop load job, which does parallel loads from Kafka, does a similar trick. Individual mappers write the offset of the last consumed message to HDFS at the end of the map task. If a job fails and gets restarted, each mapper simply restarts from the offsets stored in HDFS.
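To make the OLTP example concrete, the following is a minimal sketch assuming a JDBC connection to the same database that receives the aggregates; the helper name and the table and column names (aggregates, consumer_offsets, next_offset) are hypothetical. Because the aggregate update and the offset update commit or roll back together, the stored offset can never diverge from the stored results:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalCheckpoint {

  /* Applies one consumed batch: the aggregate change and the new offset
   * are written in a single transaction, so they can never diverge. */
  public static void recordBatch(Connection db, String topic, int partitionId,
                                 long nextOffset, long aggregateDelta) throws SQLException {
    db.setAutoCommit(false);
    try (PreparedStatement agg = db.prepareStatement(
           "UPDATE aggregates SET value = value + ? WHERE id = 1");
         PreparedStatement off = db.prepareStatement(
           "UPDATE consumer_offsets SET next_offset = ? WHERE topic = ? AND partition_id = ?")) {
      agg.setLong(1, aggregateDelta);
      agg.executeUpdate();
      off.setLong(1, nextOffset);
      off.setString(2, topic);
      off.setInt(3, partitionId);
      off.executeUpdate();
      db.commit();    /* both writes become visible atomically */
    } catch (SQLException e) {
      db.rollback();  /* neither write survives; the batch is simply reprocessed */
      throw e;
    }
  }
}
On restart, the consumer reads next_offset from consumer_offsets and resumes from there; no separate coordination service is involved in the checkpoint itself.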
There is a side benefit of this decision. A consumer can deliberately rewind to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if a bug in the consumer code is discovered after some messages have been consumed, the consumer can re-consume those messages once the bug is fixed.
push vs. pull

A related question is whether consumers should pull data from brokers or brokers should push data to the subscriber. In this respect Kafka follows a more traditional design, shared by most messaging systems, where data is pushed to the broker from the producer and pulled from the broker by the consumer. Some recent systems, such as scribe and flume, focusing on log aggregation, follow a very different push-based path where each node acts as a broker and data is pushed downstream. There are pros and cons to both approaches. However, a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate; unfortunately, in a push system this means the consumer tends to be overwhelmed when its rate of consumption falls below the rate of production (a denial of service attack, in essence). A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model.
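As an illustration of the pull model's "fall behind and catch up" property, here is a minimal sketch of a consumer poll loop with exponential backoff; the FetchSource interface is a hypothetical stand-in for whatever fetch API the consumer library exposes:
import java.util.List;

public class PullLoop {

  /* Hypothetical stand-in for the consumer library's fetch API. */
  interface FetchSource {
    List<byte[]> fetch(long offset) throws Exception;  /* empty list if nothing new */
  }

  public static void run(FetchSource source, long startOffset) throws Exception {
    long offset = startOffset;
    long backoffMs = 10;
    while (!Thread.currentThread().isInterrupted()) {
      List<byte[]> batch = source.fetch(offset);
      if (batch.isEmpty()) {
        Thread.sleep(backoffMs);                     /* nothing new: wait a little */
        backoffMs = Math.min(backoffMs * 2, 1000);   /* back off, up to one second */
      } else {
        for (byte[] message : batch) {
          process(message);
          offset++;              /* the consumer, not the broker, tracks its position */
        }
        backoffMs = 10;          /* data is flowing again: reset the backoff */
      }
    }
  }

  private static void process(byte[] message) { /* application logic goes here */ }
}
A slow consumer simply sleeps more often; a fast one spins through batches at full speed. The broker never has to guess the consumer's capacity.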

Distribution

Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at any time without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.
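The following sketch shows what such a registration might look like with the standard Zookeeper client; the znode path is illustrative rather than Kafka's exact layout. An ephemeral node disappears automatically when the broker's session dies, which is what lets brokers come and go without manual configuration:
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistration {

  /* Advertise this broker under an ephemeral znode; Zookeeper deletes the
   * node automatically if the broker's session dies, so membership and
   * liveness are tracked without any manual configuration. */
  public static void register(ZooKeeper zk, int brokerId, String hostPort) throws Exception {
    zk.create("/brokers/ids/" + brokerId,    /* illustrative path */
              hostPort.getBytes("UTF-8"),
              ZooDefs.Ids.OPEN_ACL_UNSAFE,
              CreateMode.EPHEMERAL);
  }
}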
Producer

Automatic producer load balancing

Kafka supports client-side load balancing for message producers, or the use of a dedicated load balancer to balance TCP connections. A dedicated layer-4 load balancer works by balancing TCP connections over Kafka brokers. In this configuration all messages from a given producer go to a single broker. The advantage of using a layer-4 load balancer is that each producer only needs a single TCP connection, and no connection to zookeeper is needed. The disadvantage is that the balancing is done at the TCP connection level, and hence it may not be well balanced (if some producers produce many more messages than others, evenly dividing up the connections per broker may not result in evenly dividing up the messages per broker).
Client-side zookeeper-based load balancing solves some of these problems. It allows the producer to dynamically discover new brokers, and balance load on a per-request basis. Likewise it allows the producer to partition data according to some key instead of randomly, which enables stickiness on the consumer (eg partitioning data consumption by user id). This feature is called "semantic partitioning", and is described in more detail below.
The working of the zookeeper-based load balancing is described below. Zookeeper watchers are registered on the following events—
a new broker comes up
a broker goes down
a new topic is registered
a broker gets registered for an existing topic
Internally, the producer maintains an elastic pool of connections to the brokers, one per broker. This pool is kept updated to establish/maintain connections to all the live brokers, through the zookeeper watcher callbacks. When a producer request for a particular topic comes in, a broker partition is picked by the partitioner (see section on semantic partitioning). The available producer connection is used from the pool to send the data to the selected broker partition.
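A minimal sketch of that watcher mechanism, using the plain Zookeeper client: the BrokerPool interface is a hypothetical stand-in for the producer's internal connection pool, and the znode path is illustrative. Re-registering the watch inside the callback is the standard Zookeeper idiom, since a watch fires only once:
import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class BrokerWatcher implements Watcher {

  /* Hypothetical stand-in for the producer's internal connection pool. */
  interface BrokerPool {
    void syncTo(List<String> liveBrokerIds);  /* connect to new brokers, drop dead ones */
  }

  private final ZooKeeper zk;
  private final BrokerPool pool;

  public BrokerWatcher(ZooKeeper zk, BrokerPool pool) {
    this.zk = zk;
    this.pool = pool;
  }

  /* Reads the current broker list and re-arms the watch in a single call. */
  public void refresh() throws Exception {
    List<String> ids = zk.getChildren("/brokers/ids", this);  /* illustrative path */
    pool.syncTo(ids);
  }

  public void process(WatchedEvent event) {
    try {
      refresh();  /* a broker came up or went down: resync the pool */
    } catch (Exception e) {
      /* a real client would retry with backoff here */
    }
  }
}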
Asynchronous send

Asynchronous non-blocking operations are fundamental to scaling messaging systems. In Kafka, the producer provides an option to use asynchronous dispatch of produce requests (producer.type=async). This allows buffering of produce requests in an in-memory queue and batch sends that are triggered by a time interval or a pre-configured batch size. Since data is typically published from a set of heterogeneous machines producing data at variable rates, this asynchronous buffering helps generate uniform traffic to the brokers, leading to better network utilization and higher throughput.
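A minimal sketch of turning on the asynchronous dispatch path, using the 0.7-era Java API and the config names from this document. The zookeeper address, topic name, and tuning values are placeholders, and the serializer.class setting (with kafka.serializer.StringEncoder) is an assumption from the 0.7-era client, not something named in this document:
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class AsyncSendExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("zk.connect", "localhost:2181");                       /* placeholder address */
    props.put("serializer.class", "kafka.serializer.StringEncoder"); /* assumed 0.7-era encoder */
    props.put("producer.type", "async");                             /* buffer and batch in background */
    props.put("queue.time", "500");                                  /* flush at least every 500 ms... */
    props.put("batch.size", "200");                                  /* ...or once 200 messages accumulate */

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));
    producer.send(new ProducerData<String, String>("page-views", "a message"));
    producer.close();                                                /* flushes any buffered messages */
  }
}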
Semantic partitioning

Consider an application that would like to maintain an aggregation of the number of profile visitors for each member. It would like to send all profile visit events for a member to a particular partition and, hence, have all updates for a member to appear in the same stream for the same consumer thread. The producer has the capability to be able to semantically map messages to the available kafka nodes and partitions. This allows partitioning the stream of messages with some semantic partition function based on some key in the message to spread them over broker machines. The partitioning function can be customized by providing an implementation of the kafka.producer.Partitioner interface, default being the random partitioner. For the example above, the key would be member_id and the partitioning function would be hash(member_id)%num_partitions.
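For the member_id example, a minimal Partitioner implementation (using the kafka.producer.Partitioner interface shown later in this document) might look like the following; masking the sign bit keeps the result non-negative, a detail the hash(member_id)%num_partitions formula glosses over:
import kafka.producer.Partitioner;

public class MemberIdPartitioner implements Partitioner<Integer> {

  /* hash(member_id) % num_partitions: every event for a given member
   * lands in the same partition, and hence in the same consumer stream. */
  public int partition(Integer memberId, int numPartitions) {
    return (memberId.hashCode() & 0x7fffffff) % numPartitions;
  }
}
It would be plugged in through the partitioner.class config parameter described below.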
Support for Hadoop and other batch data loads

Scalable persistence allows for the possibility of supporting batch data loads that periodically snapshot data into an offline system for batch processing. We make use of this for loading data into our data warehouse and Hadoop clusters.
Batch processing happens in stages beginning with the data load stage and proceeding in an acyclic graph of processing and output stages (eg as supported here). An essential feature of support for this model is the ability to re-run the data load from a point in time (in case anything goes wrong).
In the case of Hadoop we parallelize the data load by splitting the load over individual map tasks, one for each node/topic/partition combination, allowing full parallelism in the loading. Hadoop provides the task management, and tasks which fail can restart without danger of duplicate data.
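A minimal sketch, using Hadoop's FileSystem API, of a mapper checkpointing its last consumed offset; the path layout is hypothetical. Writing the checkpoint once, at the end of the task, is what makes restarts safe: a restarted task re-reads from exactly this offset:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OffsetCheckpoint {

  /* Called once, at the end of a map task; overwrites the previous checkpoint. */
  public static void write(Configuration conf, String topic, int partition, long offset)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/checkpoints/" + topic + "/" + partition);  /* hypothetical layout */
    FSDataOutputStream out = fs.create(path, true);                   /* true = overwrite */
    try {
      out.writeLong(offset);
    } finally {
      out.close();
    }
  }
}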
Implementation Details

The following gives a brief description of some relevant lower-level implementation details for parts of the system described above.
API Design

Producer APIs

The Producer API wraps the 2 low-level producers - kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer.
class Producer {

  /* Sends the data, partitioned by key to the topic using either the */
  /* synchronous or the asynchronous producer */
  public void send(kafka.javaapi.producer.ProducerData producerData);

  /* Sends a list of data, partitioned by key to the topic using either */
  /* the synchronous or the asynchronous producer */
  public void send(java.util.List<kafka.javaapi.producer.ProducerData> producerData);

  /* Closes the producer and cleans up */
  public void close();

}
The goal is to expose all the producer functionality through a single API to the client. The new producer -
can handle queueing/buffering of multiple producer requests and asynchronous dispatch of the batched data -
kafka.producer.Producer provides the ability to batch multiple produce requests (producer.type=async), before serializing and dispatching them to the appropriate kafka broker partition. The size of the batch can be controlled by a few config parameters. As events enter a queue, they are buffered in a queue, until either queue.time or batch.size is reached. A background thread (kafka.producer.async.ProducerSendThread) dequeues the batch of data and lets the kafka.producer.EventHandler serialize and send the data to the appropriate kafka broker partition. A custom event handler can be plugged in through the event.handler config parameter. At various stages of this producer queue pipeline, it is helpful to be able to inject callbacks, either for plugging in custom logging/tracing code or custom monitoring logic. This is possible by implementing the kafka.producer.async.CallbackHandler interface and setting the callback.handler config parameter to that class.
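A minimal sketch collecting these pipeline tunables in one place; the parameter names (producer.type, queue.time, batch.size, event.handler, callback.handler) come from the text above, while the handler class names are hypothetical placeholders:
import java.util.Properties;

public class AsyncPipelineConfig {
  public static Properties build() {
    Properties props = new Properties();
    props.put("producer.type", "async");
    props.put("queue.time", "250");     /* max ms an event waits in the queue */
    props.put("batch.size", "100");     /* max events per dispatched batch */
    props.put("event.handler", "com.example.LoggingEventHandler");       /* hypothetical class */
    props.put("callback.handler", "com.example.MetricsCallbackHandler"); /* hypothetical class */
    return props;
  }
}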
handles the serialization of data through a user-specified Encoder -
interface Encoder<T> {
  public Message toMessage(T data);
}
The default is the no-op kafka.serializer.DefaultEncoder
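As an example, a minimal custom Encoder that serializes a String payload as UTF-8 bytes might look like the following; it assumes the Encoder interface lives in kafka.serializer alongside DefaultEncoder, and the 0.7-era kafka.message.Message(byte[]) constructor:
import kafka.message.Message;
import kafka.serializer.Encoder;

public class StringUtf8Encoder implements Encoder<String> {

  /* Wraps the UTF-8 bytes of the string in a Kafka Message. */
  public Message toMessage(String data) {
    try {
      return new Message(data.getBytes("UTF-8"));
    } catch (java.io.UnsupportedEncodingException e) {
      throw new RuntimeException(e);  /* UTF-8 is always available */
    }
  }
}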
provides zookeeper based automatic broker discovery -
The zookeeper based broker discovery and load balancing can be used by specifying the zookeeper connection url through the zk.connect config parameter. For some applications, however, the dependence on zookeeper is inappropriate. In that case, the producer can take in a static list of brokers through the broker.list config parameter. Each produce request gets routed to a random broker partition in this case. If that broker is down, the produce request fails.
provides software load balancing through an optionally user-specified Partitioner -
The routing decision is influenced by the kafka.producer.Partitioner.
interface Partitioner<T> {
   int partition(T key, int numPartitions);
}
The partition API uses the key and the number of available broker partitions to return a partition id. This id is used as an index into a sorted list of broker_ids and partitions to pick a broker partition for the producer request. The default partitioning strategy is hash(key)%numPartitions. If the key is null, then a random broker partition is picked. A custom partitioning strategy can also be plugged in using the partitioner.class config parameter.
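Pulling the discovery and partitioning options together, a minimal sketch of the two configuration styles: the host names and values are placeholders, the broker.list entry format shown is the 0.7-era brokerid:host:port convention (an assumption; this document does not spell it out), and MemberIdPartitioner is the example class from the semantic partitioning section above:
import java.util.Properties;

public class ProducerWiring {

  /* Zookeeper-based discovery with a custom partitioning strategy. */
  public static Properties zkBased() {
    Properties props = new Properties();
    props.put("zk.connect", "localhost:2181");          /* placeholder address */
    props.put("partitioner.class", "MemberIdPartitioner");
    return props;
  }

  /* Static broker list; each request goes to a random partition on these brokers. */
  public static Properties staticList() {
    Properties props = new Properties();
    props.put("broker.list", "1:broker1:9092,2:broker2:9092");  /* assumed id:host:port format */
    return props;
  }
}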
