Linux IO Schedulers

Preface:

Each partition of a block device corresponds to its own request queue (request_queue), and each request queue can select an I/O scheduler to coordinate the requests submitted to it. The basic goal of an I/O scheduler is to arrange requests according to their sector numbers on the block device, so as to reduce head movement and improve efficiency. The requests in each device's request queue are responded to in order. In fact, besides this queue, each scheduler maintains a number of queues of its own, which are used to process the submitted requests; requests at the front of those queues are moved into the request queue in due time to wait for their response.
The kernel implements four main IO schedulers: Noop, Deadline, CFQ, and Anticipatory.

1. Noop algorithm

Noop is the simplest IO scheduling algorithm in the kernel. The Noop scheduler, also called the elevator scheduling algorithm, puts IO requests into a FIFO queue and then executes them one after another; of course, for contiguous IO requests on the disk, Noop also does some appropriate merging. This scheduling algorithm is particularly suitable for applications that do not want the IO scheduler to reorder their requests.

The advantages of this scheduling algorithm are most apparent in the following scenarios:

1) There is a more intelligent IO scheduler in the device below. If your block device driver sits on RAID, or on a SAN, NAS, or similar storage device, the device itself will organize the IO requests better, and the IO scheduler does not need to do extra scheduling work;

2) The upper-layer application knows the IO better than the underlying device's scheduler does. If the IO requests arriving at the IO scheduler have already been carefully optimized by the application, the scheduler does not need to gild the lily; it only needs to execute the IO requests in the order the upper layer delivers them.

3) For storage devices without a rotating head, Noop is the better choice. For a disk with a rotating head, reorganizing IO requests in the scheduler costs some CPU time; with an SSD, that CPU time can be saved, because the SSD provides its own more intelligent request scheduling and the kernel does not need to gild the lily. A referenced article mentions that using Noop with SSDs works better.

NOOP stands for No Operation. The algorithm implements the simplest possible FIFO queue: all IO requests are handled roughly in first-come-first-served order. "Roughly", because on top of the FIFO, NOOP also merges adjacent IO requests, so it does not satisfy requests in strictly first-in-first-out order.
Suppose the following sequence of IO requests (by sector number):
100, 500, 101, 10, 56, 1000
NOOP will satisfy them in this order:
100(101), 500, 10, 56, 1000
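The merge behavior above can be sketched in Python. This is a toy model, not kernel code; the function name and the list-of-batches representation are illustrative assumptions:

```python
def noop_schedule(requests):
    """Toy model of NOOP: FIFO order, except that a new request which is
    contiguous with an already-queued one is merged into that batch."""
    queue = []  # each entry is a list of contiguous sector numbers
    for sector in requests:
        merged = False
        for batch in queue:
            if sector == batch[-1] + 1:   # back-merge: continues a batch
                batch.append(sector)
                merged = True
                break
            if sector == batch[0] - 1:    # front-merge: precedes a batch
                batch.insert(0, sector)
                merged = True
                break
        if not merged:
            queue.append([sector])        # otherwise plain FIFO append
    return queue

print(noop_schedule([100, 500, 101, 10, 56, 1000]))
# [[100, 101], [500], [10], [56], [1000]]
```

Running it on the example sequence reproduces the order shown above: 101 is merged into the batch started by 100, and everything else stays first-come-first-served.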

2. Deadline algorithm

The core of the Deadline algorithm is to guarantee that every IO request is served within a certain time limit, in order to avoid request starvation.

The Deadline algorithm introduces four queues, which fall into two categories, each category consisting of one read queue and one write queue. One pair sorts requests by starting sector number and is organized as red-black trees, called sort_list; the other pair sorts requests by generation time and is organized as linked lists, called fifo_list. Once a transfer direction (read or write) is determined, a number of consecutive requests are dispatched from the corresponding sort_list into the device's request_queue; the exact number is determined by fifo_batch. Only three situations end a batch transfer:

1) the corresponding sort_list has no more requests;

2) the next request's sector does not continue incrementally from the previous one;

3) the request is the last request of the batch transfer.

Every request is assigned a deadline value at generation time (based on jiffies) and sorted by that deadline in fifo_list. The deadline for a read request defaults to 500 ms, and for a write request to 5 s, so the kernel clearly favors reads. And not only that: the deadline scheduler also defines starved and writes_starved. writes_starved defaults to 2 and can be understood as a write-starvation threshold. The kernel always serves read requests first; starved counts how many batches of read requests have been processed, and only when starved exceeds writes_starved are write requests considered. Therefore, even if a write request has exceeded its deadline, it will not necessarily be served immediately: the current batch of reads must finish first, and even then the write must wait until starved exceeds writes_starved before it has a chance to be served. Why does the kernel favor read requests? This is a consideration of overall performance. Reads are synchronous with the application: the application must wait for the read to finish before it can take its next step, so a read request blocks the process. Writes are different: once the application issues a write, exactly when the content reaches the block device has little effect on the program, so the scheduler gives reads priority.

By default, the read request timeout is 500 ms and the write request timeout is 5 s.

One referenced article reports that for some multi-threaded applications, the Deadline algorithm outperforms CFQ; another reports the same for some database applications.

On top of CFQ's design, DEADLINE solves the extreme case of IO request starvation. In addition to the sorted IO queue that CFQ already has, DEADLINE provides separate FIFO queues for read IO and write IO. The maximum wait time in the read FIFO queue is 500 ms; in the write FIFO queue it is 5 s. IO requests in the FIFO queues have higher priority than those in the CFQ queue, and the read FIFO queue in turn has higher priority than the write FIFO queue. The priority can be expressed as:
FIFO(Read) > FIFO(Write) > CFQ
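The read-preference rule with the writes_starved threshold can be sketched as a toy model in Python. The function names and the batch-count simulation are illustrative assumptions, not kernel code:

```python
def choose_direction(reads_pending, writes_pending, starved, writes_starved=2):
    """Pick the next dispatch direction: writes win only when there are
    no reads pending, or when reads have starved writes for more than
    writes_starved consecutive read batches."""
    if writes_pending and (not reads_pending or starved > writes_starved):
        return "write"
    return "read" if reads_pending else None

def dispatch_order(read_batches, write_batches, writes_starved=2):
    """Simulate batch-by-batch dispatch; returns the direction sequence."""
    order, starved = [], 0
    while read_batches or write_batches:
        d = choose_direction(read_batches, write_batches, starved, writes_starved)
        if d == "read":
            read_batches -= 1
            starved += 1           # one more read batch served
        else:
            write_batches -= 1
            starved = 0            # writes finally got their turn
        order.append(d)
    return order

print(dispatch_order(5, 2))
# ['read', 'read', 'read', 'write', 'read', 'read', 'write']
```

With 5 read batches and 2 write batches pending, writes only slip in after writes_starved read batches have gone first, which matches the text's description of the kernel's read bias.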

3. Anticipatory algorithm

The core principle of the Anticipatory algorithm is locality: it expects that a process which has just issued one IO request will continue to issue IO requests nearby. In IO there is a phenomenon called "deceptive idleness": a process that has just finished a burst of reads appears idle, issuing no reads, but in fact it is processing the data; once the data has been processed, it will read again. If at that moment the IO scheduler goes off to handle another process's request, then when the "deceptively idle" process issues its next request, the head has to seek back to the previous location, greatly increasing seek time and rotation time. So after completing a read request, the Anticipatory algorithm waits a short time t (typically 6 ms); if within those 6 ms the same process issues another read request, it continues to serve it; otherwise it moves on to the next process's read or write requests.

In some scenarios, the Anticipatory algorithm performs notably well; a referenced article discusses this and includes a review.

It is worth mentioning that the Anticipatory algorithm was removed as of Linux 2.6.33, because the same effect can be achieved by configuring CFQ.

CFQ and DEADLINE focus on satisfying scattered IO requests; they do no optimization for continuous IO such as sequential reads. To cover scenarios that mix random and sequential IO, Linux also supports the ANTICIPATORY scheduling algorithm. On top of DEADLINE, ANTICIPATORY sets a 6 ms wait window for each read IO: if within those 6 ms the OS receives a read IO request for an adjacent location, it can be satisfied immediately.
The choice of IO scheduler algorithm depends both on the hardware's characteristics and on the application scenario.
On traditional SAS disks, CFQ, DEADLINE, and ANTICIPATORY are all reasonable choices; for a dedicated database server, DEADLINE shows good throughput and response times. On emerging solid-state storage such as SSDs and Fusion-io, however, the simplest NOOP may actually be the best algorithm, because the optimizations in the other three are based on shortening seek time, while solid-state drives have no seek time to speak of and very short IO response times.
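The anticipation window can be modeled with a small Python sketch. This is a toy with zero service time and an assumed event format of `(arrival_ms, pid)`; it is not kernel code:

```python
def anticipatory_order(requests, window_ms=6):
    """Serve requests in arrival order, but after serving process P,
    prefer another request from P that arrives within window_ms over
    earlier-arriving requests from other processes."""
    pending = sorted(requests)            # (arrival_ms, pid), by arrival
    order, clock, last_pid = [], 0, None
    while pending:
        pick = None
        if last_pid is not None:
            for req in pending:           # anticipate the last process
                if req[1] == last_pid and req[0] <= clock + window_ms:
                    pick = req
                    break
        if pick is None:
            pick = pending[0]             # otherwise, earliest arrival
        pending.remove(pick)
        clock = max(clock, pick[0])
        last_pid = pick[1]
        order.append(pick[1])
    return order

print(anticipatory_order([(0, "A"), (1, "B"), (4, "A"), (5, "B")]))
# ['A', 'A', 'B', 'B']
```

Process A's second read at t=4 ms falls inside the 6 ms window, so it is served before B's earlier request, avoiding the back-and-forth seek; with `window_ms=0` the order degrades to strict arrival order A, B, A, B.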

4. CFQ algorithm

CFQ (Completely Fair Queuing) is, as the name suggests, a completely fair algorithm. It tries to assign every process competing for the block device its own request queue and its own time slice. Within the time slice the scheduler allocates to it, a process can send read and write requests to the underlying block device; once its time slice is consumed, its request queue is suspended, waiting to be scheduled again. Each process's time slice and queue length depend on the process's IO priority, which the CFQ scheduler considers as one of the factors when deciding when the process's request queue may acquire the right to use the block device. IO priorities divide, from highest to lowest, into three classes: RT (real time), BE (best effort), and IDLE, where RT and BE are each subdivided into eight sub-priorities. Note that CFQ's fairness is with respect to processes, and only synchronous requests (reads or sync writes) belong to a process; these are placed in the process's own request queue, while all asynchronous requests of the same priority, whichever process they come from, are placed in a shared public queue. There are 8 (RT) + 8 (BE) + 1 (IDLE) = 17 such asynchronous queues in total.
CFQ has been the default IO scheduling algorithm since Linux 2.6.18.
For general-purpose servers, CFQ is a good choice.

CFQ stands for Completely Fair Queuing. A notable feature of this algorithm is that it sorts IO requests by address rather than responding in first-come-first-served order.
Suppose the following sequence of IO requests (by sector number):
100, 500, 101, 10, 56, 1000
CFQ will satisfy them in this order:
100, 101, 500, 1000, 10, 56
On traditional SAS disks, disk seeks take up the vast majority of IO response time. CFQ's starting point is to sort requests by IO address, so as to satisfy as many IO requests as possible with as little head movement as possible. Under CFQ, the throughput of SAS disks improves greatly. The drawback compared with NOOP is that an IO request that arrived earlier is not guaranteed to be served promptly, so starvation can occur.
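The address sorting above behaves like a one-way elevator scan. A minimal Python sketch follows; the starting head position of 100 is an assumption chosen to reproduce the example, not something CFQ specifies:

```python
def elevator_order(requests, head=100):
    """One-way elevator scan (a toy model of sorting by IO address):
    serve requests at or above the current head position in ascending
    order, then wrap around to the lowest remaining sectors."""
    ahead  = sorted(s for s in requests if s >= head)
    behind = sorted(s for s in requests if s < head)
    return ahead + behind

print(elevator_order([100, 500, 101, 10, 56, 1000]))
# [100, 101, 500, 1000, 10, 56]
```

Note how 10 and 56 are served last even though they arrived early; this is exactly the starvation risk the paragraph above describes.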

Origin blog.csdn.net/qq_23929673/article/details/96728753