This article is shared from Huawei Cloud Community "Illustrating Kafka Server Network Model", author: Shi Zhenzhen's grocery store.
Kafka's network model
Kafka's network model is built on the master-slave (main/sub) Reactor multi-threading pattern. Before describing the model as a whole, we first go through the relevant classes in the source code and explain what each one does.
Key class analysis
SocketServer
This is the core class for network communication. It holds the Acceptor and Processor objects.
ConnectionQuotas
This class enforces connection quotas.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
max.connections.per.ip | Maximum number of connections allowed from each IP address. Once the limit is reached, new connections from that IP address are dropped. | 2147483647 |
max.connections.per.ip.overrides | Per-IP or per-hostname overrides of the maximum number of connections, as a comma-separated list, for example "hostname:100,127.0.0.1:200". An override takes precedence over max.connections.per.ip. | "" |
max.connections | Maximum number of connections allowed in the Broker. A listener-level limit can also be set by prefixing the property with the listener name, for example listener.name.<listener name>.max.connections=xxx. Even when the broker-wide limit has been reached, connections on the inter-broker listener should still be allowed; in that case the least recently used connection on another listener is closed. The inter-broker listener is determined by inter.broker.listener.name. | 2147483647 |
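As a rough illustration of how the per-IP limits in the table interact, the sketch below counts connections per address and rejects an address once it reaches its limit. `QuotaSketch`, `tryAcquire` and `release` are hypothetical names made up for this example; this is not the API of Kafka's `ConnectionQuotas`, which also handles the broker-wide and per-listener limits.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of per-IP connection limits.
// defaultMaxPerIp stands in for max.connections.per.ip and overrides for
// max.connections.per.ip.overrides. This is not Kafka's ConnectionQuotas.
final class QuotaSketch(defaultMaxPerIp: Int, overrides: Map[String, Int]) {
  private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Returns true and counts the connection if the address is under its limit.
  def tryAcquire(ip: String): Boolean = synchronized {
    val limit = overrides.getOrElse(ip, defaultMaxPerIp)
    if (counts(ip) >= limit) false
    else {
      counts(ip) += 1
      true
    }
  }

  // Called when a connection from the address is closed.
  def release(ip: String): Unit = synchronized {
    if (counts(ip) > 0) counts(ip) -= 1
  }
}
```

An override for a given address replaces the default limit for that address only; every other address keeps the default.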
AbstractServerThread
AbstractServerThread is the abstract base class of the Acceptor and Processor threads. It declares an abstract method wakeup(), which is mainly used to wake up the Selector of the corresponding Acceptor or Processor thread. It also provides some common methods shared by both.
Acceptor and Processor
Acceptor thread class: it extends AbstractServerThread and is the thread that accepts external TCP connections. Each SocketServer instance usually creates one Acceptor thread (if more than one entry is configured in listeners, one Acceptor is created per listener). Its only job is to accept connections and hand each resulting SocketChannel (the channel over which data is transferred) to a downstream Processor thread. The Processor handles everything after the connection is established, such as read and write I/O.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
listeners | Listener configuration. Multiple listeners can be configured; one Acceptor is created per listener. | listeners = PLAINTEXT://:9092 |
socket.send.buffer.bytes | SO_SNDBUF buffer size of the SocketServer sockets. If the value is -1, the OS default is used. | 102400 (100 kibibytes) |
socket.receive.buffer.bytes | SO_RCVBUF buffer size of the SocketServer sockets. If the value is -1, the OS default is used. | 102400 (100 kibibytes) |
num.network.threads | Number of Processor threads created by each Acceptor | 3 |
Processor thread class: this is the thread that handles all requests on each TCP connection assigned to it. Each Acceptor instance creates several (num.network.threads) Processor threads. A Processor takes the SocketChannel it receives, registers it for read and write events, reads the Request data as soon as it arrives, parses it, and adds it to the requestQueue queue in RequestChannel. It is also responsible for returning the Response to the sender of the Request.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
socket.request.max.bytes | Maximum number of bytes in a socket request. | 104857600 (100 mebibytes) |
connections.max.idle.ms | The Processor thread closes connections that have been idle for longer than this value. | 600000 (10 minutes) |
connection.failed.authentication.delay.ms | Time (in milliseconds) by which closing the connection is delayed after an authentication failure. It must be less than connections.max.idle.ms to prevent connection timeouts. | 100 |
The relationship between the two classes, in brief:
- Both classes are implementations of AbstractServerThread, whose supertype is Runnable, so both can run as threads.
- Each Acceptor holds num.network.threads Processor threads. If multiple listeners are configured, the total number of Processor threads is (number of listeners) * num.network.threads.
- The Acceptor creates a ServerSocketChannel, the channel used to listen for new incoming TCP connections. Calling serverSocketChannel.accept() returns the SocketChannel used for data transfer.
- Each Processor thread has a unique id. A SocketChannel obtained by the Acceptor is temporarily placed in that Processor's newConnections queue.
- Each Processor creates its own Selector.
- The Processor continuously takes new SocketChannels from its own newConnections queue and registers them for read and write events. When data arrives, it reads the data and parses it into a Request.
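The thread counts implied by the list above can be worked out with a few lines of code. This is only a sketch: `ThreadCountSketch` and its methods are hypothetical helpers, it assumes `listeners` is a plain comma-separated value, and it ignores any listener that is treated specially (such as a dedicated controller listener).

```scala
// Hypothetical helper: derives Acceptor and Processor thread counts from the
// listeners value and num.network.threads, as described in the list above.
object ThreadCountSketch {
  // Counts the non-empty, comma-separated entries of a listeners value.
  def listenerCount(listeners: String): Int =
    listeners.split(",").map(_.trim).count(_.nonEmpty)

  // One Acceptor per listener.
  def acceptorThreads(listeners: String): Int = listenerCount(listeners)

  // num.network.threads Processors per Acceptor.
  def processorThreads(listeners: String, numNetworkThreads: Int): Int =
    listenerCount(listeners) * numNetworkThreads
}
```

With two listeners and the default `num.network.threads` of 3, this gives 2 Acceptor threads and 6 Processor threads.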
Since both are runnable threads, let's look at what the run method of each one does.
Acceptor.run
def run(): Unit = {
  // Register serverChannel with nioSelector for the Accept event: the server has seen a client connection and can accept it
  serverChannel.register(nioSelector, SelectionKey.OP_ACCEPT)
  try {
    var currentProcessorIndex = 0
    while (isRunning) {
      try {
        // Returns the number of ready keys; the only event of interest here is SelectionKey.OP_ACCEPT, i.e. a new connection
        val ready = nioSelector.select(500)
        if (ready > 0) {
          // Get all ready keys
          val keys = nioSelector.selectedKeys()
          val iter = keys.iterator()
          // Iterate over all ready keys
          while (iter.hasNext && isRunning) {
            try {
              val key = iter.next
              iter.remove()
              // Only Accept events are handled; anything else throws. ServerSocketChannel is the channel that listens for TCP connections
              if (key.isAcceptable) {
                // accept(key) obtains the SocketChannel via serverSocketChannel.accept(); then iterate over the result
                accept(key).foreach { socketChannel =>
                  // Assign the socketChannel to a processor, round-robin across the processors.
                  // If a processor's newConnections queue is full, try the next one.
                  // If none has room, the last attempt (retriesLeft == 0) blocks until that processor can take it.
                  var retriesLeft = synchronized(processors.length)
                  var processor: Processor = null
                  do {
                    retriesLeft -= 1
                    // Round-robin over the processors
                    processor = synchronized {
                      // adjust the index (if necessary) and retrieve the processor atomically for
                      // correct behaviour in case the number of processors is reduced dynamically
                      currentProcessorIndex = currentProcessorIndex % processors.length
                      processors(currentProcessorIndex)
                    }
                    currentProcessorIndex += 1
                  } while (!assignNewConnection(socketChannel, processor, retriesLeft == 0))
                }
              } else
                throw new IllegalStateException("Unrecognized key state for acceptor thread.")
            } catch {
              case e: Throwable => error("Error while accepting connection", e)
            }
          }
        }
      }
      catch {
        // omitted
      }
    }
  } finally {
    // omitted
  }
}
- Register the ServerSocketChannel with nioSelector, with SelectionKey.OP_ACCEPT as the event of interest:
    serverChannel.register(nioSelector, SelectionKey.OP_ACCEPT)
- In a while loop, keep blocking on the selector for events, with a 500 ms timeout:
    // Block until the Selector sees new events or the timeout expires
    val ready = nioSelector.select(500)
    // If there are events, fetch the ready keys
    if (ready > 0) {
      // Get all ready keys for processing
      val keys = nioSelector.selectedKeys()
    }
- Iterate over the ready keys. If a SelectionKey is not an OP_ACCEPT (connection establishment) event, an IllegalStateException is thrown with the message "Unrecognized key state for acceptor thread."; this normally does not happen.
- If the SelectionKey is an OP_ACCEPT event, get the ServerSocketChannel from the key, call accept() on it to get the SocketChannel, and set the SocketChannel to non-blocking mode:
    val serverSocketChannel = key.channel().asInstanceOf[ServerSocketChannel]
    // accept() returns the SocketChannel
    val socketChannel = serverSocketChannel.accept()
    // Non-blocking mode allows connect(), read() and write() to be called asynchronously
    socketChannel.configureBlocking(false)
- Hand the SocketChannel to one of the Acceptor's Processors, chosen round-robin, for all subsequent processing. Concretely, the SocketChannel is placed in the Processor's newConnections blocking queue. The capacity of newConnections is hard-coded to 20, so each Processor can hold at most 20 accepted connections that are waiting to be registered, and all Processors of an Acceptor together can hold at most (number of Processors) * 20. This is a limit on pending connections, not on the connections a Processor serves. If new connections arrive at a very high rate, consider increasing num.network.threads.
- Finally, once a new SocketChannel has been placed in newConnections, the wakeup() method of the corresponding Processor is called.
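The round-robin hand-off with a bounded newConnections queue can be sketched in isolation as follows. `ProcessorSketch` and `AcceptorSketch` are hypothetical stand-ins that use strings in place of SocketChannels; the sketch only mirrors the "offer first, block on the last retry" behaviour of the loop in Acceptor.run and leaves out the selector, wakeup() and metrics.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical stand-in for a Processor: only the bounded newConnections queue.
final class ProcessorSketch(val id: Int, capacity: Int = 20) {
  val newConnections = new ArrayBlockingQueue[String](capacity)

  // Non-blocking offer first; fall back to a blocking put only when mayBlock is true.
  def accept(conn: String, mayBlock: Boolean): Boolean =
    if (newConnections.offer(conn)) true
    else if (mayBlock) { newConnections.put(conn); true }
    else false
}

// Hypothetical stand-in for the assignment loop of an Acceptor.
final class AcceptorSketch(processors: Vector[ProcessorSketch]) {
  private var currentProcessorIndex = 0

  // Tries each processor once in round-robin order; only the last attempt may block.
  // Returns the id of the processor that took the connection.
  def assign(conn: String): Int = {
    var retriesLeft = processors.length
    var assigned = -1
    while (assigned < 0) {
      retriesLeft -= 1
      val processor = processors(currentProcessorIndex % processors.length)
      currentProcessorIndex += 1
      if (processor.accept(conn, mayBlock = retriesLeft == 0)) assigned = processor.id
    }
    assigned
  }
}
```

Because the index keeps advancing between connections, a full Processor is simply skipped and the next connection starts from the Processor after the one used last.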
Processor.run
override def run(): Unit = {
  startupComplete()
  try {
    while (isRunning) {
      try {
        // setup any new connections that have been queued up
        // Register the TCP connections accepted earlier (held in newConnections) for OP_READ on this Processor's Selector
        configureNewConnections()
        // register any new responses for writing
        processNewResponses()
        // Perform whatever I/O can be done on each connection without blocking. This includes completing connections,
        // completing disconnections, initiating new sends, and making progress on in-flight sends or receives.
        // When this call completes, completedSends(), completedReceives(), connected() and disconnected() report what has finished.
        poll()
        // Parse the received requests and put them on the requestChannel's queue for asynchronous handling
        processCompletedReceives()
        // Handle sends that have completed
        processCompletedSends()
        processDisconnected()
        closeExcessConnections()
      } catch {
        // We catch all the throwables here to prevent the processor thread from exiting. We do this because
        // letting a processor exit might cause a bigger impact on the broker. This behavior might need to be
        // reviewed if we see an exception that needs the entire broker to stop. Usually the exceptions thrown would
        // be either associated with a specific socket channel or a bad request. These exceptions are caught and
        // processed by the individual methods above which close the failing channel and continue processing other
        // channels. So this catch block should only ever see ControlThrowables.
        case e: Throwable => processException("Processor got uncaught exception.", e)
      }
    }
  } finally {
    debug(s"Closing selector - processor $id")
    CoreUtils.swallow(closeAll(), this, Level.ERROR)
    shutdownComplete()
  }
}
- configureNewConnections(): the SocketChannels accepted by the Acceptor were placed in the Processor's newConnections blocking queue. This method takes them out one by one and registers each SocketChannel with the Processor's Selector, with SelectionKey.OP_READ (read) as the event of interest.
- processNewResponses(): takes RequestChannel.Response objects from the Processor's unbounded blocking queue responseQueue. If a Response has to be sent back, it is sent through the channel: the previously created KafkaChannel is looked up by connectionId, the KafkaChannel registers interest in SelectionKey.OP_WRITE, and the data is then written with writeTo. How entries get into responseQueue is covered later.
- poll(): a lot happens in this method. Underneath, it calls selector.poll(), which processes the ready events in batches and is where I/O is actually performed on each connection: completing connections, completing disconnections, starting new sends, and so on. Authentication and handshakes are also carried out here.
- processCompletedReceives(): processes all completedReceives (requests that have been fully received). Each one is parsed and finally passed to requestChannel.sendRequest(req), so every request ends up in the requestQueue blocking queue of RequestChannel. The size of this queue is set by queued.max.requests (500 by default), the number of queued requests allowed on the data plane before the network threads block. Note: the elements of completedReceives are added inside poll().
- processCompletedSends(): runs the callback logic for Responses. It iterates over the completedSends collection, removes each corresponding entry from inflightResponses to get the response object, and invokes its callback. Note: the elements of completedSends are added inside poll().
- processDisconnected(): for connections that have been disconnected, decrements the connection count in ConnectionQuotas and removes the corresponding entries from inflightResponses.
- closeExcessConnections(): closes connections that exceed the limit. When the total number of connections > max.connections && (inter.broker.listener.name != listener || number of listeners == 1), some connections have to be closed. Put simply, even when the Broker has reached its maximum number of connections, connections on the inter-broker listener should still be allowed, and in that case the least recently used connection on another listener is closed. The inter-broker listener is determined by inter.broker.listener.name. "Least recently used" means the TCP connection over which no Request has been sent to the Processor thread for the longest time.
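The condition and the "least recently used" choice described for closeExcessConnections() can be written down as two small functions. This is a sketch under assumptions: `ExcessConnectionSketch`, its parameter names, and the map of last-activity timestamps are invented for illustration and do not correspond to Kafka's internal data structures.

```scala
// Hypothetical model of the closeExcessConnections() decision described above.
object ExcessConnectionSketch {
  // True when the broker is over max.connections and this listener is either
  // not the inter-broker listener or is the only listener.
  def shouldClose(total: Int, maxConnections: Int, listener: String,
                  interBrokerListener: String, listenerCount: Int): Boolean =
    total > maxConnections && (listener != interBrokerListener || listenerCount == 1)

  // Picks the connection id with the oldest last-activity timestamp, if any.
  def leastRecentlyUsed(lastActiveMs: Map[String, Long]): Option[String] =
    if (lastActiveMs.isEmpty) None else Some(lastActiveMs.minBy(_._2)._1)
}
```

With two listeners, a connection on the inter-broker listener is never selected for closing, while a single-listener broker has no other listener to take connections from and so closes its own.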
RequestChannel
This class holds all the Processors and a blocking queue of pending requests. The maximum length of this queue is controlled by queued.max.requests; when the number of pending requests reaches this value, the network threads block.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
queued.max.requests | Number of queued requests allowed on the data plane before the network threads block | 500 |
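What "blocking the network thread" means for a bounded queue can be seen with a plain `ArrayBlockingQueue`. In the sketch below, `RequestQueueSketch` and `offerWhenFull` are hypothetical names; a timed `offer` is used instead of `put` only so that the example returns rather than blocking forever.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Hypothetical demonstration of a bounded request queue that is already full.
object RequestQueueSketch {
  // Fills a queue of the given capacity, then reports whether one more element
  // could be added within timeoutMs. A plain put() would block the caller instead.
  def offerWhenFull(capacity: Int, timeoutMs: Long): Boolean = {
    val requestQueue = new ArrayBlockingQueue[String](capacity)
    (1 to capacity).foreach(i => requestQueue.put(s"request-$i"))
    requestQueue.offer("one-too-many", timeoutMs, TimeUnit.MILLISECONDS)
  }
}
```

As long as no handler thread takes a request out of the queue, the extra element cannot be added, which is the back-pressure that queued.max.requests applies to the Processor threads.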
KafkaApis
The class that actually handles Requests. The handling logic for every request type lives here.
KafkaRequestHandlerPool
The thread pool of KafkaRequestHandler threads. The number of KafkaRequestHandler threads is set by num.io.threads.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
num.io.threads | The number of threads the server uses to process requests, possibly including disk I/O | 8 |
KafkaRequestHandler
The request-handling thread. Each handler polls requests from the requestQueue queue of RequestChannel and processes them. The method that ultimately does the processing is KafkaApis.handle().
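The handler pool idea, several threads draining one shared request queue, can be sketched as below. `HandlerPoolSketch` is a hypothetical stand-in: requests are plain strings, `handle` plays the role of KafkaApis.handle(), and the code assumes Scala 2.13 or later for `scala.jdk.CollectionConverters`.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.jdk.CollectionConverters._

// Hypothetical handler pool: numIoThreads threads poll one shared request queue.
object HandlerPoolSketch {
  // Applies handle to every request and returns the results (order not guaranteed).
  def run(requests: Seq[String], numIoThreads: Int)(handle: String => String): Set[String] = {
    val requestQueue = new LinkedBlockingQueue[String]()
    val results = new ConcurrentLinkedQueue[String]()
    val done = new CountDownLatch(requests.size)
    val running = new AtomicBoolean(true)

    val handlers = (1 to numIoThreads).map { i =>
      val task: Runnable = () => {
        while (running.get()) {
          // Poll with a timeout so the thread can notice shutdown.
          val req = requestQueue.poll(50, TimeUnit.MILLISECONDS)
          if (req != null) {
            results.add(handle(req))
            done.countDown()
          }
        }
      }
      val t = new Thread(task, s"handler-sketch-$i")
      t.start()
      t
    }

    requests.foreach(r => requestQueue.put(r))
    done.await(5, TimeUnit.SECONDS) // wait until every request has been handled
    running.set(false)
    handlers.foreach(_.join())
    results.asScala.toSet
  }
}
```

Any handler thread may pick up any request, so the network threads and the request-processing threads are decoupled by the queue between them.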
The relationship between these classes is as follows
Summary of Communication Process
- When KafkaServer starts, it initializes the corresponding instances according to the listeners configuration.
- Each entry in listeners corresponds to one Acceptor, and each Acceptor holds several (num.network.threads) Processor instances.
- The nioSelector in the Acceptor registers the ServerSocketChannel and listens for OP_ACCEPT events. It is only responsible for establishing TCP connections and does not read or write data.
- When the Acceptor detects a new connection, it gets the SocketChannel by calling socketChannel = serverSocketChannel.accept() and stores it in the newConnections queue of a Processor. Which Processor? They are chosen round-robin to balance the load. The newConnections queue of each Processor has a hard-coded capacity of 20. If one Processor's queue is full, the Acceptor tries the next one, and if all are full it blocks. At most 20 * num.network.threads accepted connections can therefore be queued across the Processors of one Acceptor while waiting to be registered.
- The Processor keeps polling its own newConnections queue. For each SocketChannel it gets, it registers the channel with its own Selector and listens for OP_READ. If newConnections is empty, the timeout of the Selector poll is 300 ms.
- When a new event such as READ is detected, the Processor reads the data, parses it into a Request, and puts the Request into the requestQueue blocking queue in RequestChannel. All pending requests are held here. This queue also has a maximum size, queued.max.requests (default 500), beyond which the Processor blocks.
- KafkaRequestHandlerPool creates num.io.threads (default 8) KafkaRequestHandlers to process Requests. They keep polling new Requests from the requestQueue in RequestChannel.
- The actual logic for processing a Request is in KafkaApis. When a Request has been processed, requestChannel.sendResponse() is called to return the Response.
- Requests and responses correspond one-to-one: the Processor that received a request must be the one that returns its response. Processors are identified by id.
- The Response is not sent back directly. It is first placed in the responseQueue of the Processor and then returned to the client by that Processor.
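The last two points, routing a Response back to the Processor that read the Request, can be sketched with one response queue per Processor id. `ResponseSketch` and `ResponseRouterSketch` are hypothetical names for this example; the real RequestChannel and Processor classes carry much more state and also wake up the Processor's Selector.

```scala
import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingDeque}

// Hypothetical response carrying the id of the Processor that read the request.
final case class ResponseSketch(processorId: Int, connectionId: String, payload: String)

// Hypothetical router: one unbounded response queue per Processor id.
final class ResponseRouterSketch {
  private val responseQueues =
    new ConcurrentHashMap[Int, LinkedBlockingDeque[ResponseSketch]]()

  def addProcessor(id: Int): Unit =
    responseQueues.putIfAbsent(id, new LinkedBlockingDeque[ResponseSketch]())

  // Enqueues the response on the queue of the Processor that read the request.
  // Returns false if no Processor with that id is registered.
  def sendResponse(response: ResponseSketch): Boolean = {
    val queue = responseQueues.get(response.processorId)
    if (queue == null) false
    else { queue.put(response); true }
  }

  // What a Processor would call from its own loop to pick up pending responses.
  def dequeueResponse(processorId: Int): Option[ResponseSketch] =
    Option(responseQueues.get(processorId)).flatMap(q => Option(q.poll()))
}
```

Only the Processor whose id matches ever sees the response, which keeps all I/O for a connection on the thread that owns its Selector.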
DataPlane
The data plane is the network module that handles communication between a Broker and other Brokers or Clients, as opposed to the controller plane.
The controller plane is the network module dedicated to communication between the Controller and the Brokers.
In essence the two are the same; they exist as separate concepts only to isolate Controller traffic from ordinary traffic.
The network model analyzed above is that of the data plane. Because the controller plane works the same way and differs only in some configuration, the data plane is not discussed further here.
We focus on what is different about the controller plane.
Controller Plane
The controller plane is an independent communication module dedicated to handling Controller-related requests.
The Controller plays a very important role: most requests that coordinate the whole cluster involve it, such as creating a topic, deleting a topic, and reassigning partition replicas. All of these are important.
The data plane, however, usually carries a lot of requests. If Controller-related requests are blocked behind them and cannot be executed, the cluster may be affected, so Controller requests can be given a separate communication module.
To enable the controller plane, control.plane.listener.name must be configured, and the listener name must also appear in listeners.
Otherwise there is no dedicated EndPoint for controller connections.
For example:
Broker configuration
## All listeners
listeners = INTERNAL://192.1.1.8:9092, EXTERNAL://10.1.1.5:9093, CONTROLLER://192.1.1.8:9094
## Security protocol for each listener
listener.security.protocol.map = INTERNAL:PLAINTEXT, EXTERNAL:SSL, CONTROLLER:SSL
## Controller plane listener
control.plane.listener.name = CONTROLLER
On startup, the broker starts listening on "192.1.1.8:9094" using the security protocol "SSL".
On the controller side, when it discovers a broker's published endpoints via ZooKeeper, it uses control.plane.listener.name to find the endpoint and establishes its connection to the broker through it.
- control.plane.listener.name must be configured to use a separate controller plane.
- The requestQueue of the RequestChannel in the controller plane is not controlled by queued.max.requests; its size is hard-coded to 20, because controller requests do not have that much concurrency.
- It is isolated from the data plane, and the two do not affect each other. However, the connection quota ConnectionQuotas is shared: when limits are applied, connections on both planes are counted together.
- The controller plane has only one Acceptor and one Processor, whereas the data plane can have multiple Processors.
The Broker configurations involved are:
Property | Description | Default |
---|---|---|
control.plane.listener.name | Listener name of the separate controller plane. If configured, Controller-related requests get an independent, dedicated communication module. | null |
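Looking up the controller endpoint by listener name, as in the example configuration above, amounts to a simple lookup in the parsed listeners value. `ControlPlaneSketch` and its methods are hypothetical helpers written for this illustration; they are not how Kafka parses its configuration.

```scala
// Hypothetical helper: finds the control-plane address in a listeners value.
object ControlPlaneSketch {
  // Parses "NAME://host:port, ..." into a map from listener name to address.
  // Entries that do not contain "://" are ignored.
  def parseListeners(listeners: String): Map[String, String] =
    listeners.split(",").toList
      .map(_.trim.split("://", 2))
      .collect { case Array(name, address) => name -> address }
      .toMap

  // Returns the control-plane address, or None when control.plane.listener.name
  // is not set or names a listener that is missing from listeners.
  def controlPlaneEndpoint(listeners: String,
                           controlPlaneListenerName: Option[String]): Option[String] =
    controlPlaneListenerName.flatMap(parseListeners(listeners).get)
}
```

When the name is not set the result is `None`, which corresponds to the default case where Controller requests share the data plane.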
Above we analyzed the network communication model in Kafka. As you may have noticed, it is implemented with the Reactor pattern as its threading model.
Threading Model: Reactor Pattern
For details of this module, please refer to Reactor Model
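The Reactor pattern is an event-driven pattern for handling service requests that are delivered concurrently to a service handler by one or more inputs.
The server program demultiplexes the incoming requests and dispatches them synchronously to the handler thread associated with each request. The Reactor pattern is also called the Dispatcher pattern.
In other words, I/O multiplexing listens for events in one place and dispatches each event to some process (or thread) when it arrives. It is one of the essential techniques for writing high-performance network servers.
Depending on the number of Reactors and the number of threads in the processing pool, there are three typical implementations:
- Single Reactor, single thread;
- Single Reactor, multiple threads;
- Master-slave Reactor, multiple threads.
We focus on the master-slave Reactor multi-threaded model.
In the single-Reactor multi-threaded model the Reactor runs in a single thread, which easily becomes a performance bottleneck under high concurrency. The remedy is to run Reactors in multiple threads.
How it works:
- The MainReactor object in the main Reactor thread monitors connection-establishment events with Select and, when one arrives, accepts it through the Acceptor;
- After the Acceptor has handled the connection-establishment event, the MainReactor assigns the connection to a SubReactor running in a Reactor sub-thread;
- The SubReactor adds the connection to its connection queue to listen on it, and creates a Handler for the connection's events;
- When a new event occurs, the SubReactor calls the Handler of that connection to respond;
- After reading the data with Read, the Handler dispatches it to the Worker thread pool for business processing;
- The Worker thread pool assigns a separate thread to do the actual business processing and then sends the result back to the Handler;
- On receiving the result, the Handler returns it to the Client with Send.
For a more detailed introduction, see Reactor Model.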
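The steps above can be sketched with java.nio as a tiny server that upper-cases whatever it receives. This is a minimal sketch under assumptions, not Kafka's implementation: `MainReactorSketch` and `SubReactorSketch` are hypothetical names, error handling is left out, the worker writes the reply directly instead of handing it back to a Handler, and partial writes are ignored.

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.concurrent.{ConcurrentLinkedQueue, ExecutorService}

// SubReactor: owns a Selector, registers handed-over channels for OP_READ,
// reads bytes, and delegates the "business logic" to a worker pool.
final class SubReactorSketch(workers: ExecutorService) extends Thread {
  private val selector = Selector.open()
  private val newConnections = new ConcurrentLinkedQueue[SocketChannel]()
  @volatile var running = true

  // Called by the MainReactor thread: queue the channel and wake up the selector.
  def accept(channel: SocketChannel): Unit = {
    newConnections.add(channel)
    selector.wakeup()
  }

  override def run(): Unit = {
    while (running) {
      // Register queued channels on this thread, before selecting.
      var ch = newConnections.poll()
      while (ch != null) {
        ch.register(selector, SelectionKey.OP_READ)
        ch = newConnections.poll()
      }
      selector.select(100)
      val it = selector.selectedKeys().iterator()
      while (it.hasNext) {
        val key = it.next()
        it.remove()
        if (key.isValid && key.isReadable) handleRead(key)
      }
    }
    selector.close()
  }

  private def handleRead(key: SelectionKey): Unit = {
    val channel = key.channel().asInstanceOf[SocketChannel]
    val buffer = ByteBuffer.allocate(1024)
    val n = channel.read(buffer)
    if (n < 0) { key.cancel(); channel.close() } // peer closed the connection
    else if (n > 0) {
      buffer.flip()
      val text = UTF_8.decode(buffer).toString
      // A worker thread processes the data and writes the reply.
      workers.submit(new Runnable {
        def run(): Unit = channel.write(UTF_8.encode(text.toUpperCase))
      })
    }
  }
}

// MainReactor: only accepts connections and hands them out round-robin.
final class MainReactorSketch(subReactors: Vector[SubReactorSketch]) extends Thread {
  private val selector = Selector.open()
  private val server = ServerSocketChannel.open()
  server.bind(new InetSocketAddress("127.0.0.1", 0)) // port 0: any free port
  server.configureBlocking(false)
  server.register(selector, SelectionKey.OP_ACCEPT)
  @volatile var running = true

  def port: Int = server.socket().getLocalPort

  override def run(): Unit = {
    var next = 0
    while (running) {
      selector.select(100)
      val it = selector.selectedKeys().iterator()
      while (it.hasNext) {
        val key = it.next()
        it.remove()
        if (key.isValid && key.isAcceptable) {
          val channel = server.accept()
          if (channel != null) {
            channel.configureBlocking(false)
            subReactors(next % subReactors.length).accept(channel)
            next += 1
          }
        }
      }
    }
    server.close()
    selector.close()
  }
}
```

To try it, create a worker pool (for example with `Executors.newFixedThreadPool`), start a few `SubReactorSketch` threads and one `MainReactorSketch`, then connect a client socket to `port`. Registering channels on the SubReactor's own thread avoids contending with a `select` call that is in progress on the same Selector.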
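Q&A
- Which implementation of the Reactor pattern does Kafka's network model use?
1. Single Reactor, single thread;
2. Single Reactor, multiple threads;
3. Master-slave Reactor, multiple threads.
Answer: 3. It uses the master-slave Reactor multi-threaded implementation.
The MainReactor (Acceptor) only listens for OP_ACCEPT events and, on receiving one, passes the SocketChannel to a SubReactor (Processor). Each Processor has its own Selector. The SubReactor listens for and handles all other events and finally passes the actual requests to KafkaRequestHandlerPool.
This is a textbook master-slave Reactor multi-threaded model.
- What is the ControllerPlane, and what is the DataPlane?
Controller plane: mainly handles controller requests.
Data plane: mainly handles data requests.
They are isolated so that they do not affect each other. For example, if there are so many ordinary requests that processing blocks, Controller-related requests could be blocked as well; isolating the two prevents this.
By default, however, the ControllerPlane is not configured, so Controller-related requests still go through the DataPlane. To isolate them, control.plane.listener.name must be set.
- control.plane.listener.name must be configured.
- The requestQueue of the RequestChannel in the controller plane is not controlled by queued.max.requests; its size is hard-coded to 20, because controller requests do not have that much concurrency.
- It is isolated from the DataPlane, and the two do not affect each other. However, the connection quota ConnectionQuotas is shared: when limits are applied, connections on both planes are counted together.
- The controller plane has only one Acceptor and one Processor, whereas the DataPlane can have multiple Processors.
- What does Kafka's overall request flow look like?
See the Summary of Communication Process section above.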