Kafka's Good Design

Excellent design: NIO-based programming

Kafka's underlying I/O is built on NIO. This is a simple point, but it is worth mentioning. When developing a distributed file system, you inevitably have to decide which I/O model to use. BIO has poor performance; NIO performs much better than BIO and is not too difficult to program against; AIO of course has the best performance, but it is harder to program and makes the code design more complex. So Kafka chose NIO, and for the same reasons many other open-source technologies also use NIO.

Excellent design: high-performance network architecture

Personally I think the network code is the most essential part of Kafka. Let us analyze, step by step, how Kafka designed its server-side network architecture to support extremely high concurrency.

Before looking at Kafka's own network architecture, let us briefly review the Reactor pattern:

   Figure 1 Reactor model

(1) First, the server creates a ServerSocketChannel object and registers an OP_ACCEPT event on the Selector. The ServerSocketChannel is responsible for listening for connections on the specified port.
(2) When a client initiates a network connection to the server, the server's Selector detects the OP_ACCEPT event and triggers the Acceptor to handle it.
(3) When the Acceptor receives the connection request from the client, it creates the corresponding SocketChannel, sets it to non-blocking mode, and registers the I/O events it cares about (such as OP_READ and OP_WRITE) on the Selector. At this point the socket connection between client and server is formally established.
(4) When the client sends a request over the established socket, the server's Selector detects the OP_READ event and triggers the corresponding processing logic (the read handler). When the server needs to send a response back to the client, the Selector detects the OP_WRITE event and triggers the corresponding processing logic (the write handler). A minimal NIO sketch of this single-threaded Reactor follows.
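For reference, here is a minimal, self-contained sketch of this single-threaded Reactor in Java NIO. It is illustrative only (the port number and buffer size are arbitrary), not Kafka's actual code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SingleThreadReactor {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9092));
        server.configureBlocking(false);
        // Step 1: register OP_ACCEPT so the selector watches for new connections.
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                       // block until some event fires
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    // Steps 2-3: accept the connection, switch it to non-blocking mode,
                    // and register OP_READ for it on the same selector.
                    SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // Step 4: the read handler reads the request and, in this toy
                    // example, writes the same bytes straight back as the response.
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n < 0) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);
                }
            }
        }
    }
}
```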

Notice that in this design, all event processing happens on the same thread. That is fine when the number of concurrent clients is relatively small. But when concurrency grows, or some request's processing logic is complex and time-consuming, it delays every subsequent request and leads to a large number of timeouts. To solve this problem, we make some adjustments to the structure above, as shown below:

Figure 2 Improved Reactor model

The Acceptor runs on a single thread, implemented with an ExecutorService, so that if the accept thread exits abnormally the ExecutorService creates a new thread to replace it. The read handler is a thread pool; every thread in it registers OP_READ events and is responsible for receiving the requests that clients send over, with one thread serving multiple socket connections. After a read-handler thread receives a request, it stores the request in a MessageQueue. Threads in the handler pool then take requests from that queue and process them. With this design, even if one request is very time-consuming, the other threads in the handler pool can still process the requests behind it, so the whole server is not blocked. When a handler-pool thread finishes processing a request, it registers an OP_WRITE event in order to send the response back to the client. A structural sketch of this arrangement follows.
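Below is a structural skeleton of this improved Reactor, not working server code: a single acceptor thread, a read-handler pool, a shared message queue, and a handler pool. The class name and pool sizes are illustrative assumptions:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ImprovedReactorSkeleton {
    // Requests read off the sockets are parked here instead of being processed inline.
    private final BlockingQueue<Runnable> messageQueue = new LinkedBlockingQueue<>();

    // Single acceptor thread; if it dies, the executor replaces it with a new thread.
    private final ExecutorService acceptor = Executors.newSingleThreadExecutor();

    // Read handlers: each thread owns a selector registered for OP_READ on its sockets.
    private final ExecutorService readHandlers = Executors.newFixedThreadPool(3);

    // Handler pool: polls the queue and runs the (possibly slow) business logic.
    private final ExecutorService handlerPool = Executors.newFixedThreadPool(8);

    public void start() {
        acceptor.submit(() -> { /* accept connections, hand sockets to readHandlers */ });
        for (int i = 0; i < 3; i++) {
            readHandlers.submit(() -> { /* select OP_READ, parse request, messageQueue.put(...) */ });
        }
        for (int i = 0; i < 8; i++) {
            handlerPool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Runnable request = messageQueue.take(); // blocks until a request arrives
                        request.run();  // process, then register OP_WRITE to send the response
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }
}
```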

This design removes the processing bottleneck, but if a large amount of network I/O arrives suddenly, the single Selector can itself become the bottleneck when dispatching events. The obvious next step is to extend that single Selector into several, which gives the architecture shown below:

Figure 3 Further improved Reactor model (multiple Selectors)

Once we understand the designs above, Kafka's network architecture becomes much easier to follow, as shown below:

Figure 4 Kafka network model

This is Kafka's server-side network architecture, and it is evolved directly from the designs above. After the Acceptor starts, it receives connection requests and hands each accepted connection to a thread in a thread pool (the Processor threads). The Processor thread that receives a connection wraps the SocketChannel and caches it in its own queue, then registers OP_READ on those SocketChannels so it can receive the requests clients send over them. Each Processor thread packages the requests it receives into Request objects and puts them into the RequestQueue of the RequestChannel. A separate handler thread pool, 8 threads by default, then processes the requests from that queue. Once a request has been processed, the corresponding response is placed into the ResponseQueue of the corresponding Processor. Each Processor thread takes the responses from its own ResponseQueue, registers OP_WRITE, and finally sends the response to the client. A simplified model of these queues follows.
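The following is a simplified model of the queue structure described above, not Kafka's real classes; the defaults quoted in the comments (3 network threads, 8 I/O threads, a request queue of 500) correspond to Kafka's num.network.threads, num.io.threads, and queued.max.requests settings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RequestChannelModel {
    static class Request  { /* a parsed client request */ }
    static class Response { /* the response to write back */ }

    // Shared request queue, drained by the handler pool
    // (num.io.threads, 8 by default; capacity queued.max.requests, 500 by default).
    final BlockingQueue<Request> requestQueue = new ArrayBlockingQueue<>(500);

    // One response queue per Processor thread (num.network.threads, 3 by default);
    // the Processor that owns the connection registers OP_WRITE and sends the response.
    final List<BlockingQueue<Response>> responseQueues = new ArrayList<>();

    RequestChannelModel(int numProcessors) {
        for (int i = 0; i < numProcessors; i++) {
            responseQueues.add(new LinkedBlockingQueue<>());
        }
    }
}
```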

Personally I feel Kafka's network code is beautifully designed, because this network architecture is what guarantees Kafka's high performance.

Excellent design: sequential writes

Many people question Kafka at first: how can a framework that writes to disk guarantee performance? This needs some explanation. Data from the client is first written into the operating system's page cache (which is why the write returns so quickly); the cached data is then written to disk according to a flush policy, and those disk writes are sequential. When writes are sequential and the disk can keep up, the write speed gets close to that of writing to memory. A minimal sketch of this append-only write path follows.
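As a minimal illustration of this append-only write path (the segment file name is made up, and this is not Kafka's code), a sequential append with Java NIO looks like this:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SequentialAppend {
    public static void main(String[] args) throws IOException {
        try (FileChannel segment = FileChannel.open(
                Paths.get("00000000000000000000.log"),   // segment name is illustrative
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < 10; i++) {
                ByteBuffer msg = ByteBuffer.wrap(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8));
                segment.write(msg);   // sequential append; the bytes land in the page cache first
            }
            // segment.force(false); // an explicit fsync would flush the page cache to disk
        }
    }
}
```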

Excellent design: skip list, sparse index, zero-copy

Above we saw how Kafka's sequential-write design guarantees efficient write performance; how is read performance designed? Kafka is a messaging system, and every message in it has an offset. When a consumer wants to consume the message at a certain offset, how does it locate that message quickly?

 

01 / Skip list

The screenshot below shows the files Kafka stores on our machine. There are two important kinds of file: the index file and the log file.

 

Figure 5 Kafka storage files

 

The log file stores the messages, and the index file stores index information. The two files share the same file name and always appear in pairs. The file name is the offset of the first message in that log file; for example, a file named 00000000000012768089 means that the offset of its first message is 12768089, which in turn means the offset of its second message is 12768090. An illustrative partition directory might therefore look like the listing below.
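For illustration only (the directory contents are hypothetical), a partition directory with two segments might look like this:

```
00000000000000000000.index
00000000000000000000.log
00000000000012768089.index
00000000000012768089.log
```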

 

In Kafka's code, the log files are stored in a ConcurrentSkipListMap, a map structure whose key is the file name (that is, the base offset) and whose value is the corresponding log file (segment). ConcurrentSkipListMap is a data structure designed on top of a skip list.

 

Figure 6 ConcurrentSkipListMap design

 

This way, when we want to consume from a certain offset, the skip list lets us locate the right log file quickly, as the sketch below shows.
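Here is a minimal sketch of that lookup using Java's ConcurrentSkipListMap; the base offsets and file names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SegmentLookup {
    public static void main(String[] args) {
        // Keys are segment base offsets (the file names); values stand in for the segments.
        ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L,      "00000000000000000000.log");
        segments.put(368769L, "00000000000000368769.log");
        segments.put(737337L, "00000000000000737337.log");   // illustrative base offsets

        long targetOffset = 368776L;
        // floorEntry returns the entry with the greatest key <= targetOffset,
        // i.e. the segment whose base offset sits just below the offset we want.
        Map.Entry<Long, String> entry = segments.floorEntry(targetOffset);
        System.out.println("offset " + targetOffset + " lives in " + entry.getValue());
        // prints: offset 368776 lives in 00000000000000368769.log
    }
}
```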

02 / Sparse index

The step above only locates the log file; where exactly inside that file is the data we want to consume? For that we rely on Kafka's sparse index. Suppose we have determined that the offset to consume lies in the 00000000000000368769.log file. If we had to traverse the entire file, comparing offset after offset, performance would obviously be poor. This is where the index file comes in: it stores message offsets together with their physical positions in the log file. But the index does not store an entry for every message; it stores one entry only every few messages, which keeps the index file very small. That is exactly why we call it a sparse index.

 

Figure 7 Sparse index

 

For example, suppose we now want to consume the message whose offset is 368776. How do we locate it using the index file? (1) First, look in the index file. Index entries are stored in pairs; an entry such as "1,0" means that the message with offset 368769 + 1 = 368770 is stored at physical position 0. The message we want has offset 368776, and 368776 minus 368769 equals 7, so we look for the entry whose relative offset is 7. Because the index is sparse, there is no such entry, but we do find the closest smaller entry, relative offset 6, whose physical position is 1407.

(2) Next, go to the log file, seek to position 1407, and scan forward through the offsets from there; very soon we reach the message whose relative offset is 7 (absolute offset 368776), and the consumer can start consuming from that point. A small sketch of this two-step lookup follows.
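Here is a small sketch of the two-step lookup, with a TreeMap standing in for the sparse index of segment 00000000000000368769; the index entries are example values only:

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseIndexLookup {
    public static void main(String[] args) {
        long baseOffset = 368769L;
        // relative offset -> physical byte position in the .log file
        TreeMap<Long, Long> sparseIndex = new TreeMap<>();
        sparseIndex.put(1L, 0L);
        sparseIndex.put(3L, 497L);
        sparseIndex.put(6L, 1407L);   // example entries; only every few messages are indexed

        long target = 368776L;
        long relative = target - baseOffset;                  // 7
        // Step 1: find the indexed entry with the greatest relative offset <= 7 -> (6, 1407).
        Map.Entry<Long, Long> entry = sparseIndex.floorEntry(relative);
        long startPosition = entry.getValue();
        // Step 2: scan the .log file forward from byte 1407 until the message with
        // relative offset 7 (absolute 368776) is found, then start consuming there.
        System.out.println("scan " + baseOffset + ".log from position " + startPosition);
    }
}
```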

03 / Zero-copy

When the consumer then reads the data, the transfer uses zero-copy technology. Let us first look at the non-zero-copy flow:

(1) The operating system loads the data from the disk file into the page cache in kernel space;
(2) the application reads the data from kernel space into a buffer in user space;
(3) the application writes the data back into kernel space, into the socket buffer;
(4) the operating system copies the data from the socket buffer to the network interface card, at which point the data can be sent over the network.

 

Figure 8 Non-zero-copy process

 

From the figure above we can see that this flow involves two unnecessary copies of the data through user space. To improve performance, Kafka adopts zero-copy: the data is copied from the disk file into the page cache only once, and is then sent directly from the page cache to the network (when transmitting to different subscribers, the same page cache can be reused), avoiding the duplicate copies and improving read performance. A minimal sketch follows.
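In Java, this zero-copy path is exposed as FileChannel.transferTo (the sendfile mechanism Kafka relies on when sending log data to consumers). A minimal sketch, with a made-up host, port, and file name:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Paths.get("00000000000000368769.log"),
                                                StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0;
            long remaining = log.size();
            while (remaining > 0) {
                // transferTo asks the OS to move bytes from the page cache straight to the
                // socket, never copying them through a user-space buffer; it may send fewer
                // bytes than requested, so loop until everything has gone out.
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```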

 

Figure 9 Zero-copy process

Excellent design: batching

In the Kafka 0.8 design, the producer sent data to the server one message at a time, so throughput was low. Later versions added the concepts of a buffer and batched submission, and throughput suddenly improved a great deal. The figure below is a schematic of how the producer sends messages after that change:

(1) The message is first packaged into a ProducerRecord object.
(2) The message is serialized (since network transmission is involved).
(3) The partitioner picks a partition (this is where it is decided which partition the message will be sent to).
(4) The message is then not sent immediately, but saved into the buffer.
(5) A sender thread takes data out of the buffer, packages multiple records into a batch, and sends them in one go; thanks to this batched transmission, throughput improves enormously. A sketch of the batching-related producer settings follows the list.
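Below is a sketch of the producer settings that control this buffering and batching, assuming the kafka-clients library is on the classpath; the broker address and topic name are placeholders, and the commented defaults for buffer.memory and batch.size are the standard ones:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("buffer.memory", 33554432);  // total buffer the sender drains (32 MB, the default)
        props.put("batch.size", 16384);        // bytes per partition batch (16 KB, the default)
        props.put("linger.ms", 5);             // wait up to 5 ms to fill a batch before sending
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                // Records accumulate in the buffer; the sender thread ships them in batches.
                producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i));
            }
        }
    }
}
```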

Figure 10 Buffer design

The code for this buffer area is quite sophisticated; interested readers can go read the source code.



Origin blog.csdn.net/hellozhxy/article/details/103979771