High-performance network communication theory



Foreword

I had long wanted to study the Netty source code, but it pays to first learn and sort out the underlying principles it builds on; with those fundamentals clear, studying a network communication framework becomes far more effective.

This chapter explores the operating-system fundamentals that a high-performance network communication framework depends on. Before discussing how it is done, let us first discuss why.

With the rapid development of the Internet, user numbers have grown exponentially, and computing has shifted from the PC era to ubiquitous mobile devices; user bases are now measured in the millions or even billions. In particular, with the appearance of instant messaging and real-time interactive online applications, the number of concurrent online users per service grew from a handful to hundreds of thousands or even tens of millions, and a single server quickly runs into performance and network-bandwidth bottlenecks. Application architecture accordingly evolved from a monolithic application, to separating the application from its data tier, to highly available distributed clusters. When one server is not enough, scaling out into a service cluster relieves the application-level bottleneck well, but that scale comes with a direct economic cost, so it is still worth squeezing the most performance out of each machine.

A high-performance network communication framework requires careful design all the way from the hardware, through the operating-system kernel, up to user mode. From low-level I/O access, to the kernel's I/O model and thread scheduling, to the user-mode framework itself, every layer must be well designed; any single omission becomes the short board that limits the whole system.

I/O access

When we read data from a socket, our code merely calls a read operation, but the operating system does a great deal of work underneath. First it must switch from user mode to kernel mode; the processor then drives the NIC through the NIC's controller via the device driver.

The processor does not directly control hardware.

To improve CPU utilization, the way I/O is accessed has also evolved considerably:

  1. Early on, the CPU controlled peripheral devices directly; later a controller, or I/O module, was added, and the processor performed I/O by sending commands to the I/O module. With this programmed I/O, however, the I/O module does not notify the processor, so the processor must regularly check the module's status in a busy-wait loop, which is inefficient.
  2. Interrupt-driven I/O came next: the processor no longer waits for the I/O operation to finish; instead, an interrupt signal raised by the interrupt controller announces that the operation is complete, greatly improving processor utilization. Here I/O is performed either with dedicated in/out instructions against I/O ports, or with ordinary memory reads and writes (memory-mapped I/O). Either way, the processor still moves data one I/O register or memory cell at a time, so efficiency is limited and CPU clock cycles are still consumed by the transfer itself.
  3. To improve efficiency further, the DMA controller was added. It can take control of the memory bus from the processor and perform reads and writes itself. The processor hands the request over to the DMA controller, which moves the data between the device's hardware buffer and memory without consuming processor cycles; when the DMA operation completes, it notifies the processor with an interrupt.


The trend in I/O access is to minimize the processor's involvement in I/O operations, freeing the CPU from I/O tasks so that it can do other work, thereby improving overall performance.

Readers interested in I/O access can consult Chapter 11 (I/O management) of "Operating Systems: Internals and Design Principles (5th Edition)" and the I/O discussion in Chapter 6 of "Windows Kernel Principles and Implementation".

I/O models

Before discussing I/O models, let us first introduce the problem known as C10K. Early services used the synchronous blocking I/O model, assigning one thread to each newly accepted TCP connection. As connections increased, so did threads, and the cost of frequent memory copies and context switches dragged performance down. How to support 10K concurrent connections on a single machine thus became a hot topic among network programmers.

Synchronous blocking

As mentioned earlier, in the most primitive I/O model a thread that reads from or writes to a device waits synchronously on the operating-system kernel: even when the device has no data to read, the thread stays blocked, and the CPU it occupies sits idle throughout the wait. To support many concurrent connections, you must therefore start a large number of threads, one per connection. This inevitably causes heavy thread context switching, and as concurrency rises, performance gets worse and worse.
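
To make the cost concrete, here is a minimal thread-per-connection echo server sketch in Python (the names `handle` and `serve` are illustrative, not from any particular framework):

```python
import socket
import threading

def handle(conn: socket.socket) -> None:
    """Serve one connection; recv() blocks this thread whenever no data is pending."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:            # peer closed the connection
                break
            conn.sendall(data)      # echo the bytes back

def serve(server: socket.socket) -> None:
    """Accept loop: one OS thread per connection, so 10K connections need 10K threads."""
    while True:
        conn, _addr = server.accept()   # blocks until a client connects
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

Each blocked `recv` pins an entire thread, which is exactly the memory and context-switch cost described above.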

The select and poll models

To solve the performance problem that synchronous blocking causes through too many threads, a synchronous non-blocking scheme emerged: a single thread continuously checks an array of file descriptors to determine which devices are ready, so each connection no longer needs its own waiting thread. Fewer threads means less performance lost to thread context switching and better thread utilization. This approach is called I/O multiplexing. However, the select array has a fixed maximum length (1024 by default on Linux), and select must traverse the whole array on every call, giving \(O(n)\) time complexity, so under high concurrency the performance of the select model degrades steadily.
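
A single-threaded sketch of the select approach using Python's `select.select` (the helper `poll_once` is a name of my own invention; it processes one round of ready sockets for an echo service):

```python
import select
import socket

def poll_once(server: socket.socket, clients: list, timeout: float = 0.5) -> None:
    """One round of I/O multiplexing with select().

    select() re-scans every descriptor on each call (O(n)), and the fd_set
    it is built on is capped (1024 by default on Linux)."""
    ready, _, _ = select.select([server] + clients, [], [], timeout)
    for sock in ready:
        if sock is server:
            conn, _addr = sock.accept()   # a new connection is ready
            clients.append(conn)
        else:
            data = sock.recv(4096)        # readable now, so recv won't block
            if data:
                sock.sendall(data)        # echo
            else:
                clients.remove(sock)
                sock.close()
```

One thread now multiplexes every connection, at the price of the full scan on each call.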

The poll model is similar to the select model, but it stores the descriptors in a linked list rather than a fixed-size array, which removes the upper limit on concurrent connections. It does not, however, solve select's fundamental problem under high concurrency: the full traversal on every call.

epoll model

Linux 2.6 introduced the epoll model, which removes the select model's performance bottleneck. epoll works by registering a callback per event: when data becomes readable, the callback adds that descriptor to a ready-event queue. The user therefore no longer needs to traverse all descriptors to find the ready ones, and the per-event cost drops to \(O(1)\). As a result, epoll's performance does not degrade as concurrency grows, and with the epoll model the C10K problem was solved.
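
A small epoll-based echo server sketch (Linux-only; `EpollEchoServer` and `poll_once` are illustrative names, and the tiny `sendall` on a non-blocking socket is a simplification a real server would replace with a write queue):

```python
import select
import socket

class EpollEchoServer:
    """Single-threaded echo server driven by Linux epoll."""

    def __init__(self, server: socket.socket) -> None:
        server.setblocking(False)
        self.server = server
        self.epoll = select.epoll()
        # Register interest once; the kernel keeps the interest list, so each
        # wait returns only the descriptors whose events actually fired.
        self.epoll.register(server.fileno(), select.EPOLLIN)
        self.conns = {}

    def poll_once(self, timeout: float = 0.5) -> None:
        for fd, _events in self.epoll.poll(timeout):
            if fd == self.server.fileno():
                conn, _addr = self.server.accept()
                conn.setblocking(False)
                self.conns[conn.fileno()] = conn
                self.epoll.register(conn.fileno(), select.EPOLLIN)
            else:
                conn = self.conns[fd]
                data = conn.recv(4096)
                if data:
                    conn.sendall(data)   # echo (small write, simplification)
                else:
                    self.epoll.unregister(fd)
                    del self.conns[fd]
                    conn.close()
```

Unlike select, idle descriptors cost nothing per wait; only ready ones are returned.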

Asynchronous I / O model

The models discussed so far are all synchronous I/O models. In an asynchronous I/O model, a read or write involves no blocking synchronous wait at all; in other words, the transfer of data from the NIC all the way into user space is fully asynchronous and does not block the calling thread. To make the distinction between synchronous and asynchronous I/O clearer, here is a practical example.

When an application needs to read data from the NIC, the user first allocates a block of user-space memory to hold the data. The operating system kernel reads the NIC's data into a kernel-space buffer and then copies it to user space. Within this process, under synchronous blocking I/O the thread blocks until the data has reached user space; under synchronous non-blocking I/O the thread learns only that the data is ready, and still blocks during the copy from the kernel buffer to user space; under asynchronous I/O, by the time the completion notification arrives, the data has already been delivered to user space. The whole operation is thus fully asynchronous, which is why the asynchronous I/O model performs best.
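
For flavor, here is an echo round trip written against Python's asyncio, which offers an asynchronous programming interface. One caveat worth stating plainly: on Linux the default asyncio event loop is built on epoll readiness, so under the taxonomy above it is still synchronous I/O at the kernel level; true kernel asynchronous I/O means facilities like Windows IOCP or Linux io_uring. The `roundtrip` helper is my own illustrative name:

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    data = await reader.read(4096)   # the coroutine suspends; no thread blocks
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def roundtrip(payload: bytes) -> bytes:
    """Start an echo server on an ephemeral port, send payload, return the echo."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(payload)
    await writer.drain()
    echoed = await reader.read(4096)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return echoed
```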


My other article, "Windows Kernel Principles: Synchronous I/O and Asynchronous I/O", gives a brief account of how the Windows operating system handles I/O; interested readers can take a look.

I/O threading models

The common threading models are the Reactor model and the Proactor model. Both are built on I/O multiplexing: a single thread turns I/O readiness or completion into read/write events and dispatches them, and this thread is called the demultiplexer.

Mapped onto the I/O models above, the Reactor model is a synchronous I/O model, while the Proactor model is an asynchronous I/O model.

Reactor model

In the Reactor model, we register interest in readiness events. When the NIC receives data, DMA transfers it from the NIC buffer to the kernel buffer, and the demultiplexer is notified that a read event is ready; we then still have to read the data from kernel space into user space ourselves.
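
A compact Reactor sketch using Python's `selectors` module (epoll-backed on Linux): handlers are registered per descriptor, and the demultiplexer dispatches them when readiness events fire. The handler names `on_accept`, `on_read`, and `dispatch_once` are illustrative:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll-backed on Linux

def on_accept(server: socket.socket) -> None:
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_read)

def on_read(conn: socket.socket) -> None:
    # Readiness only says data sits in the kernel buffer; this recv()
    # still performs the kernel-to-user copy (hence synchronous I/O).
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # echo
    else:
        sel.unregister(conn)
        conn.close()

def dispatch_once(timeout: float = 0.5) -> None:
    """The demultiplexer: wait for ready events and dispatch their handlers."""
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)   # key.data holds the registered handler
```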

Synchronous I/O uses buffered I/O: a buffer is first requested from the kernel to hold the input or output, and data is staged in that kernel buffer before being copied onward.

Proactor model

In the Proactor model, we register for I/O completion events and supply a user-space buffer up front to hold the data to be received. We then issue the read operation; when the NIC receives data, DMA transfers it from the NIC buffer directly into the user buffer, and a completion notification is generated: by then, the read operation has already finished.
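
The completion-style contract can be sketched as follows. Note this is a simulation of the Proactor interface using a worker thread, since the standard library exposes no portable kernel async I/O; real implementations sit on Windows IOCP or Linux io_uring. The function `async_read` and its callback protocol are my own illustration:

```python
import socket
import threading

def async_read(conn: socket.socket, buf: bytearray, on_complete) -> None:
    """Proactor-style read: the caller supplies the destination buffer up
    front and is notified only after data has already been copied into it.
    (Simulated with a worker thread; real systems use IOCP or io_uring.)"""
    def worker():
        n = conn.recv_into(buf)   # the copy lands directly in the user buffer
        on_complete(n)            # completion notification: the read is done
    threading.Thread(target=worker, daemon=True).start()
```

By the time `on_complete` runs, the data is already in `buf`; the caller never waits on readiness or performs the copy itself, which is the essential difference from the Reactor contract.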

Asynchronous I/O uses direct I/O for both input and output: the user buffer's address is passed to the device driver, and data is read into or written out of the user buffer directly, saving a memory copy compared with buffered I/O.

Summary

This article walked through I/O access methods, I/O models, and threading models, explaining what the operating system does in each of these three areas for high-performance I/O. Using the CPU efficiently and reducing memory copies are the keys to improving performance.

Reference Documents

  1. Getting started: the most thorough analysis to date of Netty's high-performance principles and framework architecture
  2. High-performance network programming (2): the famous C10K concurrent-connection problem, ten years on
  3. NIO's epoll empty-polling bug
  4. Two efficient server design models: Reactor and Proactor
  5. TCP send and receive buffers
  6. IDE (Integrated Drive Electronics)
  7. "Operating Systems: Internals and Design Principles (5th Edition)"
  8. "Windows Kernel Principles and Implementation"

Source: https://www.cnblogs.com/Jack-Blog/p/11923838.html
Author: 杰哥很忙
This article is licensed under the "CC BY 4.0" Creative Commons license. Reprinting is welcome; please credit the author and link to the original in a prominent place.
