Why Is Kafka So Fast?

Compiled by: Zhong Tao

Source: https://developer.51cto.com/art/202003/613487.htm

Blog: https://blog.yilon.top

In the past few years, the field of software architecture has undergone tremendous change. People no longer believe that all systems should share a single database.

Microservices, event-driven architecture, and CQRS (Command Query Responsibility Segregation) are the main tools for building modern business applications.

In addition, the Internet of Things and the proliferation of mobile and wearable devices have further challenged systems to deliver near-real-time capability.

First, let us reach a consensus on the word "fast". It is a multifaceted, complex, and highly ambiguous word. One interpretation treats "latency, throughput, and jitter" as the metrics that define "fast".

Others, such as industrial applications, bring their own norms and expectations of what "fast" means. So "fast" depends largely on your frame of reference.

Apache Kafka is optimized for throughput at the expense of latency and jitter, while preserving other desirable qualities such as durability, strict record order, and at-least-once delivery semantics.

When someone says "Kafka is fast", and assuming they are at least minimally competent, you can take them to mean Kafka's ability to distribute a large number of records in a short amount of time.

Kafka was born at LinkedIn, which needed to move vast amounts of data efficiently, on the order of terabytes per hour.

At the time, the message propagation latency was considered acceptable. After all, LinkedIn is not a financial institution engaged in high-frequency trading, nor an industrial control system operating within deterministic deadlines. Kafka is suited to near-real-time systems.

Note: The "real-time" does not mean "fast", which means "predictable." Specifically, the real-time action with the means to complete a time limit, which is the deadline.

A system that cannot meet this requirement cannot be classified as "real-time". Systems that can tolerate latency within a certain range are called "near-real-time" systems. Put bluntly, real-time systems are often slower than their non-real-time or near-real-time counterparts.

There are two significant aspects of Kafka's speed that deserve to be discussed separately:

  • The efficiency of its client and server implementations.
  • The parallelism it derives from stream processing.

Server-side optimizations

Log storage

Kafka uses a segmented, append-only log, largely limiting itself to sequential I/O for both reads and writes, which is fast across a wide variety of storage media. There is a widely held misconception that disks are slow.

In reality, the performance of storage media depends largely on the pattern in which data is accessed. On the same conventional 7,200 RPM SATA disk, random I/O is three to four orders of magnitude slower than sequential I/O.

Furthermore, modern operating systems provide read-ahead and write-behind techniques that prefetch data in large block multiples and coalesce small logical writes into larger physical writes.

Because of this, the difference between sequential and random I/O is still evident in flash and other forms of non-volatile solid-state media, although it is far less dramatic there than on rotating media such as spinning hard drives.

Record batching

Sequential I/O is blazingly fast on most storage media, comparable to the peak performance of network I/O. In practice, this means a well-designed log persistence layer can keep up with network reads and writes. In fact, Kafka's performance bottleneck is often not the disk, but the network.

So, beyond the low-level batching provided by the operating system, Kafka clients and brokers accumulate multiple records in a batch, for both reads and writes, before sending them over the network.

Batching records amortizes the overhead of network round trips, using larger packets and improving bandwidth efficiency.

Batch compression

The effect of batching is particularly pronounced when compression is enabled, because compression generally becomes more effective as the amount of data grows.

Especially with text-based formats such as JSON, the effect of compression can be dramatic, with compression ratios typically ranging from 5x to 7x.

Furthermore, record batching is largely a client-side operation, which shifts load onto the client and has a positive effect not only on network bandwidth but also on the brokers' disk I/O utilization.
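To make this concrete, here is a minimal producer sketch wiring together the batching and compression settings discussed above. The broker address, topic, key, and values are placeholders for illustration, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("batch.size", "65536");     // accumulate up to 64 KB per partition batch
            props.put("linger.ms", "20");         // wait up to 20 ms for a batch to fill
            props.put("compression.type", "lz4"); // compress whole batches, not single records
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "some-key", "some-value"));
            }
        }
    }

Larger batch.size and linger.ms values favor throughput over latency, echoing the trade-off described at the start of this article.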

Cheap consumers

Unlike traditional message queue models, which delete messages once they are consumed (incurring random I/O in the process), Kafka does not delete messages after consumption; instead, it independently tracks offsets for each consumer group.

You can look at Kafka's internal topic __consumer_offsets to learn more. Since it is append-only, it is fast. Its size is further reduced in the background (using Kafka's compaction feature) to retain only the last known offset for any given consumer group.

Compare this model with traditional message brokers, which typically offer several distinct message distribution topologies.

One is the message queue, a durable transport for point-to-point messaging with no point-to-multipoint capability.

The other is the publish-subscribe topic, which allows point-to-multipoint messaging but pays for it with durability. Implementing a durable point-to-multipoint model in a traditional broker means maintaining a dedicated message queue for each stateful consumer.

This amplifies both reads and writes: the producer is forced to write to multiple queues. The alternative is a fan-out relay, which consumes records from one queue and writes them into several others, but that only adds latency.

Moreover, some consumers generate load on the server: a mix of read and write I/O, both sequential and random.

Consumers in Kafka are "cheap" insofar as they do not mutate the log files (only the producer or internal Kafka processes are allowed to do that).

This means a large number of consumers can concurrently read from the same topic without overwhelming the cluster. There is still some cost to adding a consumer, but it is mostly sequential reads with a small rate of sequential writes (for offset commits).

So it is quite normal to see a single topic shared across a diverse ecosystem of consumers.
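As a sketch of what cheap fan-out looks like in client code: the two consumers below belong to different groups (group names, topic, and broker are placeholders), so each independently reads every record of the same topic with its own offsets, and neither mutates the log:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class FanOut {
        static KafkaConsumer<String, String> consumerIn(String groupId) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", groupId); // offsets tracked per group in __consumer_offsets
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("auto.offset.reset", "earliest"); // start from the head of the log
            return new KafkaConsumer<>(props);
        }

        public static void main(String[] args) {
            // Two independent groups reading the same topic: each sees every record.
            try (KafkaConsumer<String, String> analytics = consumerIn("analytics");
                 KafkaConsumer<String, String> audit = consumerIn("audit")) {
                analytics.subscribe(List.of("events"));
                audit.subscribe(List.of("events"));
                ConsumerRecords<String, String> a = analytics.poll(Duration.ofSeconds(1));
                ConsumerRecords<String, String> b = audit.poll(Duration.ofSeconds(1));
                System.out.printf("analytics: %d records, audit: %d records%n",
                        a.count(), b.count());
            }
        }
    }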

Unflushed buffered writes

Another fundamental reason for Kafka's performance, and one worth exploring further: Kafka does not call fsync before acknowledging a write. The only requirement for an ACK is that the record has been written to the I/O buffer.

This is a little-known fact, but a crucial one. It is what lets Kafka perform as if it were an in-memory queue; in effect, Kafka is a disk-backed in-memory queue (bounded by the size of the buffer/page cache).

On its own, however, this form of writing is unsafe: a replica failure can cause data loss even though the record appears to have been acknowledged.

In other words, unlike a relational database, a buffered write alone does not imply durability. What makes Kafka durable is running several in-sync replicas.

Even if one fails, the others (assuming there is more than one) keep running, provided whatever caused the failure does not take the other replicas down with it.

Thus, the combination of fsync-free non-blocking I/O and redundant in-sync replicas gives Kafka its blend of high throughput, durability, and availability.
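Picking up the producer sketch from the batching section, the knob that pairs with this replication-based durability is acks. A hedged reading of the guarantee, not official wording: an acknowledgment means the in-sync replicas have the record in their buffers, not that any of them has fsynced it.

    // With acks=all, the partition leader responds only after all in-sync
    // replicas have received the record (buffered, not necessarily fsynced).
    props.put("acks", "all");

    // A companion topic-level setting (applied at topic creation):
    // with replication factor 3 and min.insync.replicas=2, an ACK implies
    // at least two brokers hold a copy of the record.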

Client-side optimizations

Most databases, queues, and other forms of persistent middleware are designed around the notion of an all-knowing server (or a cluster of servers) and thin clients.

Client implementations are generally considered far simpler than the server. The server handles the bulk of the load, and clients act merely as facades for the server side.

Kafka takes a different approach to client design. A substantial amount of work happens on the client before a record ever reaches the server.

This includes staging records in an accumulator, hashing record keys to arrive at the correct partition index, checksumming records, and compressing record batches.

The client is aware of the cluster metadata and refreshes it periodically to keep up with changes to the broker topology. This lets the client make more accurate forwarding decisions.

Rather than blindly sending a record to the cluster and relying on the latter to forward it to the appropriate node, the producer client forwards write requests directly to partition leaders.

Similarly, consumer clients can make more informed decisions when fetching records, such as serving read queries from replicas geographically closer to the client. (This feature is available from Kafka 2.4.0.)
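A short sketch of both client-side behaviors just described (topic, key, and rack names are placeholders; producer and consumerProps stand in for the objects built in the earlier sketches). The producer computes the target partition from the key locally and routes straight to that partition's leader; the consumer declares its location via client.rack so that, with a rack-aware replica selector configured on the brokers, fetches can be served by a nearby replica (Kafka 2.4.0+):

    // Producer: the client hashes the key itself and routes the record
    // directly to the leader of the resulting partition.
    producer.send(new ProducerRecord<>("orders", "customer-42", "payload"));

    // Consumer: declare where this client runs; with the brokers configured
    // for rack-aware replica selection, reads may come from a closer replica.
    consumerProps.put("client.rack", "us-east-1a");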

Zero-copy

A typical source of inefficiency is copying byte data between buffers. Kafka uses a binary message format shared by producers, brokers, and consumers alike, so data chunks can flow end to end without modification, even when compressed.

While eliminating structural differences between the communicating parties is an important step, it does not by itself avoid copying the data.

Kafka solves this problem on Linux and UNIX systems through Java's NIO framework, specifically the transferTo() method of java.nio.channels.FileChannel.

This method transfers bytes from a source channel to a sink channel without involving the application as an intermediary.

To appreciate the difference NIO makes, consider the traditional approach, where a source channel is read into a byte buffer and then written to a sink channel as two separate operations:

File.read(fileDesc, buf, len);   // copy file bytes into a user-space buffer
Socket.send(socket, buf, len);   // copy the buffer back out to the socket

This is represented by the figure below. [Figure: the traditional copy path from file to socket]

Although the picture looks simple, internally the copy operation requires four context switches between user mode and kernel mode, and the data is copied four times before the operation completes.

The diagram below summarizes each context switching step:

[Figure: the context switches at each step of the traditional read/send]

Detailed description:

  • The initial read() causes a context switch from user mode to kernel mode. The file is read, and its contents are copied into a buffer in the kernel address space by the DMA (Direct Memory Access) engine. This is not the same buffer as the one used in the code snippet.
  • Before read() returns, the data is copied from the kernel buffer into the user-space buffer. At this point, our application can read the contents of the file.
  • The subsequent send() switches back into kernel mode, copying the data from user space into a kernel address space buffer again; this time, a different buffer associated with the destination socket. Behind the scenes, the DMA engine takes over, asynchronously copying the data from the kernel buffer to the protocol stack. send() does not wait for this to complete before returning.
  • The send() call returns, switching back to user mode.

Despite the inefficiency of the mode switching and the extra copying, in many cases the intermediate kernel buffer can actually improve performance.

It can act as a read-ahead cache, asynchronously prefetching blocks and front-running requests from the application. However, when the amount of data requested vastly exceeds the kernel buffer size, the kernel buffer becomes a performance bottleneck.

Rather than copying the data directly, the system is forced to toggle between user mode and kernel mode until all the data has been transferred.

By contrast, zero copy handles this in a single operation. The snippet from the earlier example can be rewritten as a one-liner:

fileDesc.transferTo(offset, len, socket); 

The zero-copy approach is explained in detail below:

[Figure: the zero-copy transfer path via transferTo()]

In this model, the number of context switches is reduced. Specifically, the transferTo() method instructs the DMA engine to read the data from the block device into a read buffer.

Then, the data is copied from the read buffer into the socket buffer. Finally, the data is copied from the socket buffer to the NIC buffer via DMA.

As a result, we have reduced the number of copies from four to three, and only one of those copies involves the CPU. We have also reduced the number of context switches from four to two.

This is a huge improvement, but it is not yet true zero copy. On Linux kernels 2.4 and later, with network interface cards that support gather operations, this can be optimized further.

As shown below:

[Figure: zero-copy transfer with gather support on the NIC]

As in the previous example, calling transferTo() causes the data to be read from the device into the kernel read buffer by the DMA engine.

However, with the gather operation, there is no copying between the read buffer and the socket buffer. Instead, the NIC is given a pointer to the read buffer, along with an offset and a length. At no point is the CPU involved in copying buffers.

Comparisons of traditional copying and zero copy on file sizes ranging from a few megabytes to a gigabyte show zero copy improving performance by a factor of two to three.

Even more impressive: Kafka achieves this with the plain JVM, with no native libraries or JNI code.
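For the curious, here is a minimal, self-contained sketch of the same transferTo() pattern outside of Kafka (the file name, host, and port are placeholders). Note the loop: transferTo() may transfer fewer bytes than requested, so callers must retry until done:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            Path file = Path.of("segment.log"); // placeholder file
            try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ);
                 SocketChannel sink = SocketChannel.open(
                         new InetSocketAddress("localhost", 9000))) {
                long position = 0;
                long remaining = source.size();
                while (remaining > 0) {
                    // Bytes flow from the page cache to the socket without
                    // ever passing through a user-space buffer.
                    long sent = source.transferTo(position, remaining, sink);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }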

Avoiding garbage collection

The heavy use of channels, native buffers, and the page cache has one additional benefit: it reduces the load on the garbage collector.

For example, Kafka running on a machine with 32 GB of RAM will typically produce 28 to 30 GB of page cache, entirely outside the garbage collector's scope.

The difference in throughput is minimal (a few percentage points at most), since the throughput of a properly tuned garbage collector can be quite high, especially when dealing with short-lived objects. The real gain is the reduction in jitter.

By avoiding the garbage collector, brokers are less likely to suffer GC-induced pauses that affect clients and inflate the end-to-end propagation latency of records.

Compared to Kafka's early days, avoiding garbage collection is much less of a problem now. Modern garbage collectors such as Shenandoah and ZGC scale to huge, multi-terabyte heaps and, in the worst case, can automatically tune garbage-collection pauses down to a few milliseconds.

It is now common to see JVM-based applications using on-heap caches rather than off-heap caches.

Stream parallelism

The efficiency of log I/O is one crucial aspect of performance, and its effects are felt mostly on writes. Kafka's treatment of parallelism in the topic structure and the consumer ecosystem is fundamental to its read performance.

The combination produces extremely high end-to-end messaging throughput. Concurrency is baked into the partitioning scheme and the operation of consumer groups, which together form Kafka's load-balancing mechanism, distributing partitions approximately evenly among consumers.

Compare this to a traditional message queue: in RabbitMQ's setup, multiple concurrent consumers may read from a queue in round-robin fashion, but at the cost of losing message ordering.

The partitioning mechanism also benefits the horizontal scalability of Kafka brokers. Every partition has a dedicated leader, so any nontrivial topic with multiple partitions can utilize the entire cluster of brokers for writes.

This is yet another contrast between Kafka and traditional message queues: where the latter use clustering for availability, Kafka balances load for availability, durability, and throughput.

When publishing a record to a topic with multiple partitions, the producer specifies the partition for the record. (A single-partition topic is possible, in which case this is a non-issue.)

This can be done directly, by specifying the partition index, or indirectly, via a record key, which deterministically hashes to a partition index. Records sharing the same hash are guaranteed to occupy the same partition.

Assuming a topic with multiple partitions, records with different keys will likely end up in different partitions.

However, records with different hash values may also end up in the same partition, due to hash collisions. Such is the nature of hashing. If you understand how a hash table works, this will feel natural.
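A simplified illustration of that mapping (Kafka's actual default partitioner hashes the serialized key bytes with murmur2; this sketch borrows String.hashCode() purely to show the idea):

    // Map a record key to a partition index. Same key -> same partition;
    // different keys may still collide on the same partition.
    static int partitionFor(String key, int numPartitions) {
        int hash = key.hashCode();                  // stand-in for Kafka's murmur2
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit, then wrap
    }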

The actual processing of records is done by consumers operating within an (optional) consumer group. Kafka guarantees that a partition is assigned to at most one consumer within its group. (We say "at most" because if all consumers are offline, the count is zero.)

When the first consumer in a group subscribes to a topic, it receives all the partitions of that topic. When a second consumer subsequently joins, it gets roughly half of the partitions, relieving the first consumer of half of its load.

This lets you process an event stream in parallel, adding consumers as necessary (ideally with an auto-scaling mechanism), provided you have adequately partitioned the stream.
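A minimal worker sketch (group name, topic, and broker are placeholders): run several copies of this process with the same group.id and Kafka splits the topic's partitions among them, rebalancing automatically as instances join or leave:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "order-processors");        // shared by every worker
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders"));
                while (true) {
                    // Each worker only ever polls the partitions assigned to it.
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        System.out.printf("partition=%d offset=%d key=%s%n",
                                record.partition(), record.offset(), record.key());
                    }
                }
            }
        }
    }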

Record throughput is controlled in two ways:

① The topic's partitioning scheme. Topics should be partitioned to maximize the number of independent event streams. In other words, record order should be preserved only where it is absolutely necessary.

If any two records are not legitimately related, they should not be bound to the same partition. This implies using different keys, because Kafka uses a record's key hash as the basis for the partition mapping.

② The number of consumers in the group. You can increase the number of consumers to balance the load of inbound records, up to the number of partitions in the topic. (You can add even more consumers, but each partition can have at most one active consumer; the remainder will stay idle.)

Note that a consumer can be a process or a thread, so depending on the workload, you could serve consumers out of a thread pool.

If you have been wondering whether Kafka is fast, how it achieves its performance, and whether it fits your use case, you should now have the answer.

To make things abundantly clear: Kafka is not the fastest messaging middleware, nor the one with the greatest throughput. Other platforms offer higher throughput, some software-based, some hardware-based.

It is hard to achieve both high throughput and low latency at once. Apache Pulsar [1] is a promising technology: scalable, with a better throughput-latency profile, while offering comparable ordering and durability guarantees.

The reason to use Kafka is that, as a complete ecosystem, it remains unmatched overall.

It exhibits excellent performance while offering a rich and mature environment, and Kafka is still growing at an enviable pace.

Kafka's designers and maintainers have done outstanding work crafting a performance-centric solution. Few of its design elements feel like afterthoughts or bolt-ons.

From offloading work to clients, to the persistent log on the broker, batching, compression, zero-copy I/O, and parallel stream processing, Kafka throws down the gauntlet to the rest of the messaging middleware field, commercial or open source.

Most impressively, it does all this without sacrificing durability, record order, or at-least-once delivery semantics.

Kafka is not the simplest messaging platform, and there is a lot to learn. One must come to grips with total and partial order, topics, partitions, consumers and consumer groups before designing and building high-performance event-driven systems.

Although the knowledge curve is steep, it is well worth your time to learn. If you know the proverbial "red pill" (choosing to explore something in depth and keep going, even when the road gets hard), read "Introduction to Event Streaming with Kafka and Kafdrop [2]".

Related Links:

[1] Apache Pulsar: https://pulsar.apache.org
[2] Introduction to Event Streaming with Kafka and Kafdrop
