"Linux Kernel Design and Implementation" Reading Notes-Block IO Layer

Block device

Hardware devices that support random access to fixed-size chunks of data are called block devices. Examples include hard disks, floppy disks, CDs, and flash memory.

The other basic type of device is the character device, such as a keyboard or a serial port, which is accessed as a sequential stream of data.

The smallest addressing unit of a block device is a sector.

The sector size is a power of two, most commonly 512 bytes.

The sector size is a physical property of the device, and the sector is the fundamental unit of all block devices.

All disk operations performed by the kernel are carried out in units of blocks.

A block cannot be smaller than a sector; its size must be a power-of-two multiple of the sector size.

The block cannot exceed the length of one page.

The block size is usually 512 bytes, 1KB and 4KB.
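The size rules above can be checked mechanically. The following is a minimal userspace sketch, not kernel code: the 512-byte sector and 4096-byte page are assumed values, and the names block_size_is_valid() and sectors_per_block() are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SECTOR_SIZE 512u   /* assumed sector size in bytes */
#define DEMO_PAGE_SIZE   4096u  /* assumed page size in bytes */

/* True if n is a nonzero power of two. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * A block size is accepted when it is a power of two, at least one
 * sector, and at most one page. Because the sector size is itself a
 * power of two, such a block is always a whole number of sectors.
 */
static bool block_size_is_valid(size_t block_size)
{
    return is_power_of_two(block_size) &&
           block_size >= DEMO_SECTOR_SIZE &&
           block_size <= DEMO_PAGE_SIZE;
}

/* Number of sectors that make up one block of the given size. */
static size_t sectors_per_block(size_t block_size)
{
    return block_size / DEMO_SECTOR_SIZE;
}
```

Under these assumptions 512, 1024 and 4096 pass the check, while 256 (smaller than a sector), 1536 (not a power of two) and 8192 (larger than a page) do not.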

The relationship between blocks and sectors: each block consists of one or more whole sectors.

 

Buffer

When a block is read into memory, it is stored in a buffer.

The buffer is equivalent to the representation of the disk block in memory.

A page can hold one or more blocks in memory.

Each buffer has an associated descriptor, called the buffer head and represented by the buffer_head structure (include/linux/buffer_head.h):

struct buffer_head {
    unsigned long b_state;      /* buffer state bitmap (see above) */
    struct buffer_head *b_this_page;/* circular list of page's buffers */
    struct page *b_page;        /* the page this bh is mapped to */
    sector_t b_blocknr;     /* start block number */
    size_t b_size;          /* size of mapping */
    char *b_data;           /* pointer to data within the page */
    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;      /* I/O completion */
    void *b_private;        /* reserved for b_end_io */
    struct list_head b_assoc_buffers; /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is
                           associated with */
    atomic_t b_count;       /* users using this buffer_head */
};

b_state: the state of the buffer, a combination of one or more of the bh_state_bits flags defined in the same header (for example BH_Uptodate, BH_Dirty, BH_Lock and BH_Mapped).

The bh_state_bits enumeration also ends with BH_PrivateStart. It is not a real state flag; it marks the first bit that other code is free to use for its own purposes.
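How a state bitmap and a "private start" marker fit together can be sketched in userspace as follows. This is an illustration only: the DEMO_ names, the bit numbering and the helper functions are made up, only a few flags are modelled, and the kernel manipulates b_state with atomic bit operations rather than the plain ones used here.

```c
#include <stdbool.h>

/* A few flag names modelled on the kernel's, renumbered for illustration. */
enum demo_bh_state_bits {
    DEMO_BH_Uptodate,      /* buffer contains valid data */
    DEMO_BH_Dirty,         /* buffer has been modified and must be written back */
    DEMO_BH_Lock,          /* buffer is locked while I/O is in progress */
    DEMO_BH_PrivateStart   /* not a state: first bit free for other code */
};

/* A hypothetical driver-private flag placed at the first free bit. */
enum { DEMO_BH_MyDriverFlag = DEMO_BH_PrivateStart };

struct demo_buffer {
    unsigned long state;   /* plays the role of b_state */
};

/* Set one flag bit (not atomic, unlike the kernel's bit operations). */
static void demo_set_flag(struct demo_buffer *b, int bit)
{
    b->state |= 1UL << bit;
}

/* Clear one flag bit. */
static void demo_clear_flag(struct demo_buffer *b, int bit)
{
    b->state &= ~(1UL << bit);
}

/* Test one flag bit. */
static bool demo_test_flag(const struct demo_buffer *b, int bit)
{
    return ((b->state >> bit) & 1UL) != 0;
}
```

Because private flags start at DEMO_BH_PrivateStart, they can never collide with the state flags listed before it, even if more state flags are added later.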

b_count: the usage count of the buffer, incremented and decremented with get_bh() and put_bh(). Before manipulating a buffer head, call get_bh() to increment its reference count so that the buffer head is not freed while it is still in use, and call put_bh() when finished.
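The get/put pattern can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the kernel implementation: struct demo_bh and the demo_ functions are hypothetical stand-ins for buffer_head, get_bh() and put_bh().

```c
#include <stdatomic.h>

/* Stand-in for a buffer head; only the usage count is modelled. */
struct demo_bh {
    atomic_int count;   /* plays the role of b_count */
};

/* Take a reference before touching the buffer head. */
static void demo_get_bh(struct demo_bh *bh)
{
    atomic_fetch_add(&bh->count, 1);
}

/* Drop the reference once the caller is done with it. */
static void demo_put_bh(struct demo_bh *bh)
{
    atomic_fetch_sub(&bh->count, 1);
}

/* Current number of users; the buffer head may only be freed at zero. */
static int demo_bh_users(struct demo_bh *bh)
{
    return atomic_load(&bh->count);
}
```

Every demo_get_bh() must be paired with exactly one demo_put_bh(); an unbalanced get keeps the object alive forever, and an unbalanced put allows it to be freed while still in use.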

b_blocknr: the logical block number, on the device given by b_bdev, of the disk block that the buffer corresponds to.

b_bdev: the block device that the block belongs to.

b_page: the physical page in memory that holds the buffer.

b_data: a pointer to the start of the block's data inside b_page.

b_size: the size of the block in bytes. The block occupies the memory from b_data to b_data + b_size.

The buffer head describes the mapping between a disk block and its buffer in physical memory.
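One consequence of this mapping is that a block number plus a block size is enough to locate the data on disk. The sketch below shows the arithmetic, assuming 512-byte sectors; the function names are made up for illustration and are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u   /* assumed sector size in bytes */

/*
 * First 512-byte sector of logical block `blocknr` when every block is
 * `block_size` bytes long (block_size is assumed to be a multiple of
 * the sector size).
 */
static uint64_t demo_block_to_sector(uint64_t blocknr, size_t block_size)
{
    return blocknr * (block_size / DEMO_SECTOR_SIZE);
}

/* Byte offset of the same block from the start of the device. */
static uint64_t demo_block_to_byte_offset(uint64_t blocknr, size_t block_size)
{
    return blocknr * (uint64_t)block_size;
}
```

For example, with 4KB blocks, block 10 starts at sector 80, which is byte offset 40960 on the device.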

 

bio

The basic container for block IO operations in the kernel is represented by the bio structure.

This structure represents an in-flight (active) block IO operation as a list of segments.

A segment is a chunk of a buffer that is contiguous in memory, so the buffer as a whole does not need to be contiguous.

Because a buffer is described segment by segment, the kernel can perform a block IO operation on a single buffer even when that buffer is scattered across multiple locations in memory.

The bio structure is as follows (include/linux/bio.h):

struct bio {
    sector_t        bi_sector;  /* device address in 512 byte
                           sectors */
    struct bio      *bi_next;   /* request queue link */
    struct block_device *bi_bdev;
    unsigned long       bi_flags;   /* status, command, etc */
    unsigned long       bi_rw;      /* bottom bits READ/WRITE,
                         * top bits priority
                         */
    unsigned short      bi_vcnt;    /* how many bio_vec's */
    unsigned short      bi_idx;     /* current index into bvl_vec */
    /* Number of segments in this BIO after
     * physical address coalescing is performed.
     */
    unsigned int        bi_phys_segments;
    unsigned int        bi_size;    /* residual I/O count */
    /*
     * To keep track of the max segment size, we account for the
     * sizes of the first and last mergeable segments in this bio.
     */
    unsigned int        bi_seg_front_size;
    unsigned int        bi_seg_back_size;
    unsigned int        bi_max_vecs;    /* max bvl_vecs we can hold */
    unsigned int        bi_comp_cpu;    /* completion CPU */
    atomic_t        bi_cnt;     /* pin count */
    struct bio_vec      *bi_io_vec; /* the actual vec list */
    bio_end_io_t        *bi_end_io;
    void            *bi_private;
#if defined(CONFIG_BLK_DEV_INTEGRITY)
    struct bio_integrity_payload *bi_integrity;  /* data integrity */
#endif
    bio_destructor_t    *bi_destructor; /* destructor */
    /*
     * We can inline a number of vecs at the end of the bio, to avoid
     * double allocations for a small number of bio_vecs. This member
     * MUST obviously be kept at the very end of the bio.
     */
    struct bio_vec      bi_inline_vecs[0];
};

bi_io_vec points to an array of bio_vec structures, which together list all the segments that a particular block IO operation uses (include/linux/bio.h):

struct bio_vec {
    struct page *bv_page;
    unsigned int    bv_len;
    unsigned int    bv_offset;
};

bv_page is the physical page in which the segment resides, bv_offset is the offset of the segment within that page, and bv_len is the length of the segment in bytes, starting at that offset. The data described by a bio therefore does not need to be stored contiguously.

bi_vcnt is the number of bio_vec entries in the array that bi_io_vec points to.

bi_idx is the index of the current entry in bi_io_vec.
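To make the segment array concrete, here is a userspace model that walks the entries from the current index to the end. It is a sketch under assumptions: the demo_ structures only mirror the three bio_vec fields and the two bio counters discussed here (offset within the page, length in bytes, entry count, current index), the 4096-byte page is an assumed size, and the helper functions are not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

struct demo_page { unsigned char data[4096]; };   /* assumed page size */

struct demo_bio_vec {
    struct demo_page *page;   /* page holding the segment (bv_page) */
    unsigned int len;         /* segment length in bytes (bv_len) */
    unsigned int offset;      /* segment offset within the page (bv_offset) */
};

struct demo_bio {
    unsigned short vcnt;          /* number of entries (bi_vcnt) */
    unsigned short idx;           /* current entry (bi_idx) */
    struct demo_bio_vec *io_vec;  /* the segment array (bi_io_vec) */
};

/* Bytes still to be transferred: the lengths of entries idx..vcnt-1. */
static unsigned int demo_bio_remaining(const struct demo_bio *bio)
{
    unsigned int total = 0;

    for (unsigned short i = bio->idx; i < bio->vcnt; i++)
        total += bio->io_vec[i].len;
    return total;
}

/*
 * Gather the remaining segments into one contiguous buffer.
 * Stops before a segment that would not fit; returns bytes copied.
 */
static size_t demo_bio_gather(const struct demo_bio *bio,
                              unsigned char *out, size_t cap)
{
    size_t copied = 0;

    for (unsigned short i = bio->idx; i < bio->vcnt; i++) {
        const struct demo_bio_vec *bv = &bio->io_vec[i];

        if (copied + bv->len > cap)
            break;
        memcpy(out + copied, bv->page->data + bv->offset, bv->len);
        copied += bv->len;
    }
    return copied;
}
```

The two segments in a typical use can live in unrelated pages; the walk only needs each entry's page, offset and length, which is why the underlying memory does not have to be contiguous.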

bi_cnt records the reference count of the bio structure. When it reaches 0, the bio structure should be destroyed and its memory freed.

Each block IO request is represented by a bio structure.

 

IO scheduling

Block devices keep their pending block IO requests in a request queue, which is represented by the request_queue structure.

Each request can be composed of more than one bio structure.

Disk seeks are among the slowest operations in the entire computer.

To minimize seeks, the kernel neither issues requests in the order in which they are received nor submits them to the disk immediately.

Instead, it first merges and sorts the pending requests.
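Merging and sorting can be illustrated with a small userspace queue. This is a deliberately simplified sketch, not any kernel scheduler: the demo_ types, the fixed-size array and the function name are assumptions, and it ignores request direction, overlapping ranges, starvation and the case where a merge makes two queued requests adjacent to each other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_MAX 16

struct demo_request {
    uint64_t sector;       /* first sector of the request */
    uint32_t nr_sectors;   /* number of sectors it covers */
};

struct demo_queue {
    struct demo_request reqs[DEMO_QUEUE_MAX];  /* kept sorted by sector */
    size_t count;
};

/* Add a request: merge with an adjacent one if possible, else insert sorted. */
static bool demo_queue_add(struct demo_queue *q, uint64_t sector,
                           uint32_t nr_sectors)
{
    size_t i;

    for (i = 0; i < q->count; i++) {
        struct demo_request *r = &q->reqs[i];

        if (r->sector + r->nr_sectors == sector) {
            /* Back merge: the new request starts where r ends. */
            r->nr_sectors += nr_sectors;
            return true;
        }
        if (sector + nr_sectors == r->sector) {
            /* Front merge: the new request ends where r starts. */
            r->sector = sector;
            r->nr_sectors += nr_sectors;
            return true;
        }
    }

    if (q->count == DEMO_QUEUE_MAX)
        return false;   /* queue full */

    /* Sorted insert: shift larger sectors up to open a slot. */
    i = q->count;
    while (i > 0 && q->reqs[i - 1].sector > sector) {
        q->reqs[i] = q->reqs[i - 1];
        i--;
    }
    q->reqs[i].sector = sector;
    q->reqs[i].nr_sectors = nr_sectors;
    q->count++;
    return true;
}
```

Adding requests at sectors 100, 20, 108 and 12 (8 sectors each) leaves two queued requests, one covering sectors 12 to 27 and one covering 100 to 115, so the disk head sweeps in one direction and issues two larger transfers instead of four small ones.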

The IO scheduler is responsible for submitting IO requests, and its job is to manage the request queue of the block device.

IO scheduling algorithms resemble the way an elevator serves floors, so IO schedulers are also called elevators.

The following four IO schedulers are covered:

  • Linus Elevator
  • Deadline IO scheduler
  • Completely Fair Queuing (CFQ) IO scheduler
  • Noop (no-operation) IO scheduler

 

Origin blog.csdn.net/jiangwei0512/article/details/106150991