Linux IO mode

User space and kernel space
Modern operating systems are based on virtual memory, so on a 32-bit system the virtual address space is 2^32 = 4 GB. The kernel is the core of the operating system; it is independent of ordinary applications, can access the protected memory space, and has access to all of the underlying hardware. To prevent user processes from manipulating the kernel directly and to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space. On Linux, the highest 1 GB (virtual addresses 0xC0000000 to 0xFFFFFFFF) is reserved for the kernel and is called kernel space, while the lower 3 GB (0x00000000 to 0xBFFFFFFF) is used by each process and is called user space.
Process switch
To control process execution, the kernel must be able to suspend a process running on the CPU and resume a previously suspended one. This behavior is known as a process switch. Any process therefore runs with the support of the kernel and is closely tied to it. Switching from one running process to another involves the following steps:
1. Save the processor context, including the program counter and other registers.
2. Update the PCB (process control block) of the current process.
3. Move the PCB to the appropriate queue, such as the ready queue or the queue of processes blocked on some event.
4. Select another process to execute and update its PCB.
5. Update the memory-management data structures.
6. Restore the processor context of the new process.
In short, a process switch is expensive and consumes system resources.
Blocked processes
While a process is executing, some expected event may fail to occur: a request for a system resource fails, the process must wait for an operation to complete, new data has not yet arrived, or there is no new work to do. The process then calls the blocking primitive (block) itself and changes from the running state to the blocked state. Blocking is therefore an active behavior of the process itself, and only a process in the running state (one that holds the CPU) can put itself into the blocked state. Once a process is blocked, it does not occupy any CPU resources.
Linux IO mode
Basic concepts
A file descriptor (fd) is a computer-science term: an abstraction used to refer to a file. It takes the form of a non-negative integer and is in fact an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process.
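For illustration, here is a minimal sketch of obtaining a file descriptor from the kernel with open(); the file name is just an assumption:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* The kernel returns a small non-negative integer that indexes this
       process's open-file table; -1 indicates an error. */
    int fd = open("example.txt", O_RDONLY);   /* hypothetical file */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("kernel returned file descriptor %d\n", fd);
    close(fd);
    return 0;
}
```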
Buffered IO
Buffered IO, also known as standard IO, is the default file IO mode in most operating systems. In Linux's buffered IO mechanism, the operating system caches IO data in the file system's page cache: the data is first copied into the operating system kernel's buffer, and only then copied from the kernel buffer into the application's address space.
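As a rough sketch of that buffered path (the file path and buffer size are arbitrary choices), a plain read() lets the kernel fill the page cache first and then copy the data into the caller's buffer:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY);   /* any readable file */
    if (fd < 0) { perror("open"); return 1; }

    /* With buffered (standard) IO the kernel first brings the data into the
       page cache, then copies it from that kernel buffer into buf in user space. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0) { perror("read"); return 1; }

    printf("read %zd bytes through the page cache\n", n);
    close(fd);
    return 0;
}
```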
IO mode
As just described, for a single IO access (a read, for example) the data is first copied into the kernel buffer and then copied from the kernel buffer into the application's address space. A read operation therefore goes through two stages:
1. Waiting for the data to be ready
2. Copying the data from the kernel to the process
Because of these two stages, Linux offers the following five network IO models:
1. Blocking IO (blocking IO)
2. Non-blocking IO (nonblocking IO)
3. IO multiplexing (IO multiplexing)
4. Signal-driven IO (signal driven IO)
5. Asynchronous IO (asynchronous IO)
Signal-driven IO is rarely used in practice and is not described here.
Blocking IO
In Linux, all sockets are blocking by default. A typical read proceeds roughly as follows: when the user process calls the recvfrom system call, the kernel begins the first stage of IO, preparing the data (for network IO the data often has not arrived yet, for example a complete UDP packet has not been received, so the kernel must wait for enough data to arrive). This wait, during which the data is copied into a buffer in the kernel, takes time; on the user side the whole process is blocked (by its own choice, of course). When the kernel has the data ready, it copies it from kernel space to user memory and then returns the result; only then does the user process leave the blocked state and run again.
The defining characteristic of blocking IO is that the process is blocked during both stages of the IO.
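Below is a minimal sketch of a blocking read on a UDP socket, assuming a hypothetical port 9000; recvfrom() does not return until the kernel has both waited for a datagram and copied it into the user buffer:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);   /* blocking by default */
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);          /* hypothetical port */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    char buf[2048];
    /* Stage 1: the kernel waits for a complete datagram to arrive.
       Stage 2: the kernel copies it into buf.
       The process is blocked for the whole time. */
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
    printf("received %zd bytes\n", n);

    close(sock);
    return 0;
}
```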
Nonblocking IO
Under Linux, a socket can be set to non-blocking. A read on a non-blocking socket proceeds like this: when the user process issues a read and the kernel's data is not ready, the call does not block the user process but returns an error immediately. From the user process's point of view, it gets a result right away instead of having to wait; seeing the error, it knows the data is not ready yet and can issue the read again. Once the kernel has the data ready and receives another system call from the user process, it immediately copies the data into user memory and returns.
The defining characteristic of non-blocking IO is that the user process must repeatedly ask the kernel whether the data is ready.
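The polling pattern might look roughly like this (again assuming a hypothetical UDP socket on port 9000): the socket is switched to non-blocking with fcntl(), and the process keeps retrying while the kernel reports EAGAIN/EWOULDBLOCK:

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);          /* hypothetical port */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    /* Switch the socket to non-blocking mode. */
    int flags = fcntl(sock, F_GETFL, 0);
    fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    char buf[2048];
    for (;;) {
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0) {                    /* the kernel had the data ready */
            printf("received %zd bytes\n", n);
            break;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Data not ready yet: the call returned immediately, so the
               process can do other work and then ask again. */
            usleep(1000);
            continue;
        }
        perror("recvfrom");
        break;
    }
    close(sock);
    return 0;
}
```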
Asynchronous IO
After the user process initiates the read, it can immediately go on doing other things. From the kernel's point of view, when it receives the asynchronous read request it returns right away, so the user process is never blocked. The kernel then waits for the data to be ready, copies it into user memory, and when everything is done sends the user process a signal telling it that the read is complete.
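One way to illustrate the pattern is a sketch with the POSIX AIO interface (aio_read from <aio.h>); the file name and buffer size are assumptions, older glibc needs -lrt at link time, and this user-level API is only a stand-in for the general asynchronous model described above:

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char buf[4096];
    int fd = open("data.txt", O_RDONLY);        /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* Initiate the read; the call returns immediately. */
    if (aio_read(&cb) < 0) { perror("aio_read"); return 1; }

    /* The process is free to do other work here while the kernel
       prepares the data and copies it into buf. */

    while (aio_error(&cb) == EINPROGRESS)
        ;   /* a real program would use aio_suspend() or a completion
               signal instead of spinning */

    ssize_t n = aio_return(&cb);
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}
```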

IO multiplexing
IO multiplexing is what we usually mean by select, poll and epoll. The benefit of select/epoll is that a single process can handle many network IO connections at the same time. The basic principle is that select, poll or epoll continuously polls all the sockets it is responsible for and notifies the user process when data arrives on any of them.
When the user process calls select, the whole process is blocked; at the same time, the kernel monitors all the sockets that select is responsible for, and as soon as the data on any of them is ready, select returns. The user process then issues a read, and the data is copied from the kernel to the user process.
IO multiplexing is characterized by waiting on multiple file descriptors at once through a single mechanism: as soon as any of the file descriptors (socket descriptors) becomes ready for reading, the select() call can return.
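A minimal sketch of multiplexing with select(), watching standard input and a hypothetical UDP socket on port 9000 from a single blocking call:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* One UDP socket plus standard input, both watched by a single select(). */
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);          /* hypothetical port */
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);
    FD_SET(sock, &readfds);
    int maxfd = sock > STDIN_FILENO ? sock : STDIN_FILENO;

    /* The process blocks here; the kernel watches both descriptors and
       select() returns as soon as any of them becomes readable. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
        perror("select");
        return 1;
    }

    char buf[2048];
    if (FD_ISSET(sock, &readfds)) {
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
        printf("socket readable: %zd bytes\n", n);
    }
    if (FD_ISSET(STDIN_FILENO, &readfds)) {
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
        printf("stdin readable: %zd bytes\n", n);
    }
    close(sock);
    return 0;
}
```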
If the number of connections to handle is not very high, a web server using select/epoll does not necessarily perform better than one using multiple threads with blocking IO, and its latency may even be greater. The advantage of select/epoll is not faster handling of any single connection, but the ability to handle more connections.
In IO programming, when multiple client requests must be handled, they can be processed either with multiple threads or with IO multiplexing. IO multiplexing multiplexes several blocking IO operations onto a single blocking select call, so the system can serve multiple client requests at the same time in a single thread. Compared with the traditional multi-threaded/multi-process model, the biggest advantage of IO multiplexing is its low system overhead: the system does not need to create extra processes or threads, nor maintain them while they run, which reduces maintenance work and saves system resources. The main application scenarios of IO multiplexing are as follows:
1. The server needs to handle multiple sockets that are in the listening or connected state at the same time.
2. The server needs to handle sockets for several different network protocols.


Differences between select, poll and epoll
The biggest drawback of select is that a single process is limited in the number of FDs it can open, which is set by FD_SETSIZE with a default of 1024. For large servers that need to support tens of thousands of TCP connections this is clearly too few. The macro can be modified and the kernel recompiled, but that tends to reduce network efficiency. The problem can also be worked around with a multi-process design (as the traditional Apache model does), but although creating a process on Linux is relatively cheap, the cost is still not negligible, and exchanging data between processes is cumbersome; for Java, which has no shared memory between processes, data must be synchronized through sockets or other mechanisms, which adds performance loss and program complexity, so this is not a perfect solution.
epoll has no such FD limit; the maximum number of FDs it supports is the operating system's maximum number of open file handles, which is far larger than 1024. For example, on a machine with 1 GB of RAM it is roughly 100,000; the exact value can be checked with cat /proc/sys/fs/file-max and generally scales with system memory.
The IO efficiency of epoll does not decrease linearly as the number of FDs grows
Another fatal weakness of select and poll is that when the socket set is large, network latency or idle links mean only a small fraction of the sockets are active at any moment, yet every select or poll call linearly scans the entire set, so efficiency declines linearly. epoll does not have this problem: it only operates on the active sockets, because the kernel's epoll implementation registers a callback function on each fd, and only active sockets trigger their callbacks.
epoll uses mmap to speed up message passing between kernel and user space
Whether it is select, poll or epoll, the kernel must notify user space of FD events, so avoiding unnecessary memory copies is important; epoll achieves this through memory shared between kernel and user space via mmap (mapping a file or other object into memory).
The epoll API is simpler
It consists of creating an epoll descriptor, adding event listeners, blocking while waiting for events, and closing the epoll descriptor.
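A minimal sketch of that sequence, using a hypothetical UDP socket on port 9000 as the descriptor being watched:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);              /* hypothetical port */
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    /* 1. Create an epoll descriptor. */
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }

    /* 2. Add an event listener for the socket (level-triggered by default). */
    struct epoll_event ev;
    ev.events  = EPOLLIN;
    ev.data.fd = sock;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);

    /* 3. Block waiting for events. */
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, -1);
    for (int i = 0; i < n; i++) {
        char buf[2048];
        ssize_t len = recvfrom(events[i].data.fd, buf, sizeof(buf), 0, NULL, NULL);
        printf("fd %d readable: %zd bytes\n", events[i].data.fd, len);
    }

    /* 4. Close the epoll descriptor. */
    close(epfd);
    close(sock);
    return 0;
}
```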
It is worth noting that epoll is not the only way to overcome the shortcomings of select and poll; it is simply the Linux implementation. FreeBSD has kqueue.
epoll edge-triggered (ET) and level-triggered (LT)
An epoll file descriptor has two operating modes: LT (level-triggered) and ET (edge-triggered). LT is the default mode, and the two differ as follows:
LT mode: when epoll_wait detects an event on a descriptor and notifies the application, the application does not have to handle the event immediately; the next call to epoll_wait will report the event to the application again.
ET mode: when epoll_wait detects an event on a descriptor and notifies the application, the application must handle the event immediately; if it does not, the next call to epoll_wait will not report the event again.
ET mode greatly reduces the number of times an epoll event is triggered repeatedly, so it is more efficient than LT mode. When epoll works in ET mode, non-blocking sockets must be used, so that a blocking read or write on one file handle does not starve the task that is serving multiple file descriptors.
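For example, the usual ET pattern looks roughly like this (hypothetical UDP socket on port 9000, made non-blocking first): since the readiness edge is reported only once, each notification must drain the socket until the kernel returns EAGAIN:

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(9000);              /* hypothetical port */
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    /* ET mode requires a non-blocking socket. */
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    int epfd = epoll_create1(0);
    struct epoll_event ev;
    ev.events  = EPOLLIN | EPOLLET;                  /* edge-triggered */
    ev.data.fd = sock;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);

    struct epoll_event events[16];
    for (;;) {   /* serve events forever; a real server would add a shutdown path */
        int n = epoll_wait(epfd, events, 16, -1);
        for (int i = 0; i < n; i++) {
            /* The readiness edge is reported only once, so drain the
               socket until the kernel says there is nothing left. */
            for (;;) {
                char buf[2048];
                ssize_t len = recvfrom(events[i].data.fd, buf, sizeof(buf),
                                       0, NULL, NULL);
                if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                    break;                           /* fully drained */
                if (len < 0) { perror("recvfrom"); break; }
                printf("got %zd bytes\n", len);
            }
        }
    }
}
```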

Origin www.cnblogs.com/caohongchang/p/11588588.html