Operating System 17: External memory organization and file storage management

Table of contents

1. Organization of external storage

(1) Continuous organization

(2) Link organization

2.1 - Implicit linking

2.2 - Explicit linking

(3) Index organization method

3.1 - Single-level index organization

3.2 - Multi-level index organization

3.3 - Incremental index organization

2. Management of file storage space

(1) Free list method and free linked list method

1.1 - Free List Method

1.2 - Free linked list method

(2) bitmap method

(3) group link method

3.1 - Organization of free blocks

3.2 - Allocation and reclamation of free disk blocks


1. Organization of external storage

        The physical structure of a file depends directly on how external storage is organized: each organization method produces a different physical file structure. The commonly used external storage organization methods are:

  • Continuous organization. A contiguous region of disk space is allocated to each file, and the resulting physical structure is a sequential file structure.
  • Link organization. Non-contiguous disk space can be allocated to a file; all disk blocks of the file are linked together through link pointers, forming a linked file structure.
  • Index organization. When the index organization method is adopted for files, an indexed file structure is formed.

//The strengths and weaknesses of these file organization methods parallel those of the corresponding data structures, such as arrays, linked lists and hash tables.

(1) Continuous organization

        The continuous organization method, also called the continuous allocation method, requires that a group of adjacent disk blocks be allocated to each file. These blocks usually lie on the same track, so the head does not need to move while reading or writing them. With continuous organization, the records of the logical file are stored in order in adjacent physical disk blocks. The file structure formed in this way is called a sequential file structure, and the physical file is called a sequential file. //A sequential file keeps the logical record order consistent with the order of the disk blocks

        This organization ensures that the order of records in the logical file matches the order of the disk blocks that the file occupies on storage. So that the system can find where the file is stored, the "file physical address" field of the directory entry should record the number of the disk block holding the first record and the file length (in disk blocks).
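        Because the directory entry holds only a start block and a length, address translation for a continuously allocated file is a single addition. A minimal sketch, assuming 0-based logical block numbers; the function name and the sample values are illustrative and not from the original article:

```python
def contiguous_block(start_block: int, length: int, logical_block: int) -> int:
    """Map a logical block number to a physical block for a contiguous file."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block lies outside the file")
    return start_block + logical_block

# Hypothetical directory entry: the file starts at block 14 and spans 3 blocks.
print(contiguous_block(14, 3, 2))  # 16
```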

        The main advantage of the continuous organization method is that sequential access is easy and fast. The disk blocks of a continuously allocated file lie on one track or on a few adjacent tracks, so head movement is minimal. As a result, access to such files is the fastest among the storage space allocation methods. //Fastest

        The main disadvantages of the continuous organization method:

  • Allocating contiguous storage space produces many external fragments, which seriously reduces the utilization of external storage. Eliminating the fragments by periodic compaction costs a lot of machine time.
  • The length of the file must be known in advance. Often the size can only be estimated, and the estimate is usually made larger than the actual size (to guard against running short), which wastes space.
  • Records cannot be deleted or inserted flexibly. To keep the file in order, deleting or inserting a record requires physically moving the adjacent records and dynamically changing the size of the file.

(2) Link organization

        With the link organization method, multiple non-contiguous disk blocks can be allocated to a file. The discrete disk blocks belonging to the same file are then linked into a list through the link pointer in each block; a physical file formed in this way is called a linked file.

        The main advantages of the link organization method are: //Solve the shortcomings of the continuous allocation method

  • The external fragmentation of the disk is eliminated, and the utilization rate of the external memory is improved.
  • It is very easy to insert, delete and modify records.
  • It can adapt to the dynamic growth of files without knowing the size of files in advance.

        Linking methods can be divided into two forms: implicit linking and explicit linking .

2.1 - Implicit linking

        With implicit linking, each directory entry in the file directory must contain pointers to the first and last disk blocks of the linked file, and each disk block must contain a pointer to the next disk block.

// In the figure's example, the link sequence is 9->16->1->10->25, so the file uses 5 disk blocks in total

        If a pointer occupies 4 bytes, only 508 bytes per block are available to the user for a disk with a block size of 512 bytes. // Too many pointers, wasting disk space

        The main problem with the implicit link organization is that it is only suitable for sequential access, and it is extremely inefficient for random access . In addition, linking a large number of discrete disk blocks only through link pointers is less reliable , because as long as any one of the pointers fails, the entire chain will be disconnected. //Random access is inefficient + poor reliability
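        The cost of random access under implicit linking can be sketched as follows. The pointer table reproduces the 9->16->1->10->25 chain and the 512-byte block with a 4-byte pointer from the text; the dictionary standing in for the disk, the -1 end marker and the function name are assumptions made for illustration:

```python
BLOCK_SIZE = 512
POINTER_SIZE = 4
PAYLOAD = BLOCK_SIZE - POINTER_SIZE  # 508 bytes of user data per block

END = -1  # assumed end-of-chain marker
# Pointer stored inside each disk block: block number -> next block number.
NEXT_POINTER = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}

def locate(first_block, byte_offset):
    """Return (physical block, offset inside the block, blocks read from disk)."""
    block = first_block
    reads = 1  # the first block itself has to be read
    for _ in range(byte_offset // PAYLOAD):
        block = NEXT_POINTER[block]  # the pointer is only known after a read
        if block == END:
            raise IndexError("offset lies beyond the end of the file")
        reads += 1
    return block, byte_offset % PAYLOAD, reads

print(locate(9, 0))     # (9, 0, 1)
print(locate(9, 2100))  # (25, 68, 5)
```

        Reaching a byte in the fifth block costs five block reads, which is why the method only suits sequential access.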

        Reducing the disk space occupied by pointers:

        To improve retrieval speed and reduce the storage space taken by pointers, several disk blocks can be grouped into a cluster and disk space allocated in units of clusters, although this increases internal fragmentation. //Several disk blocks are bundled together and share one pointer, a compromise between continuous allocation and linked allocation

2.2 - Explicit linking

        The pointers used to link each physical block of the file are explicitly stored in a link table in the memory , and only one table is set in the entire disk. //The pointer is no longer implicitly placed in the physical disk block, but placed in a table in memory

        The index of each table entry is a physical disk block number, running from 0 to N - 1 (N is the total number of disk blocks). Each entry stores a link pointer, that is, the number of the next disk block. //In the figure's example, the link sequence is 2->4->5->1

        The first disk block number of each file's chain in this table is filled into the "physical address" field of the file's FCB as the file address. Since the search for a record is carried out in memory, retrieval is significantly faster and the number of disk accesses is greatly reduced. //Faster address lookup

        Since all disk block numbers assigned to files are placed in the table, the table is called the file allocation table FAT (File Allocation Table).
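        A minimal sketch of explicit linking, using the 2->4->5->1 chain mentioned above. The table size of 8, the -1 end marker and the use of None for entries that belong to no file are assumptions for illustration, not the on-disk encoding of any real FAT:

```python
END_OF_FILE = -1  # assumed end-of-chain marker

# One entry per disk block; the entry holds the number of the next block.
fat = [None] * 8
fat[2] = 4
fat[4] = 5
fat[5] = 1
fat[1] = END_OF_FILE

def file_blocks(fat, first_block):
    """Follow the chain entirely in memory, starting from the FCB's first block."""
    blocks = []
    block = first_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]
    return blocks

print(file_blocks(fat, 2))  # [2, 4, 5, 1]
```

        The chain is still walked entry by entry, but every step is a memory access rather than a disk read.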

//In practical use of the file allocation table, there is always a tension between the size of the FAT, the size of the disk, and disk utilization

FAT12 -> FAT16 -> FAT32 -> NTFS in Windows systems:

        FAT12 uses the disk block as the basic allocation unit, and each FAT entry occupies 12 bits, so the table has at most 4096 (2^12) entries. //With a fixed disk block size the addressable storage space is limited, so clusters (groups of several disk blocks) are used

        FAT16 widens each table entry to 16 bits, so the maximum number of table entries grows to 65536 (2^16), which supports larger hard disks. //With a fixed number of entries, a larger disk needs larger clusters (groups of several disk blocks), and larger clusters mean more internal fragmentation

        FAT32 widens each table entry to 32 bits, which allows smaller clusters and gives higher disk space utilization. //The main purpose is to curb the growth of internal fragmentation as disks get larger. Because the FAT32 table is larger than the FAT16 table, access is slower than with FAT16.
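        The trade-off between entry width, cluster size and disk capacity is plain arithmetic. The sketch below uses the simplified model of this article (every entry value addresses one cluster, 512-byte disk blocks); real FAT variants reserve some entry values, so the figures are upper bounds under that assumption, and the function names are illustrative:

```python
BLOCK = 512  # assumed disk block size in bytes

def max_volume_bytes(entry_bits, blocks_per_cluster):
    """Largest volume addressable when each FAT entry names one cluster."""
    return (2 ** entry_bits) * blocks_per_cluster * BLOCK

def min_blocks_per_cluster(entry_bits, volume_bytes):
    """Smallest power-of-two cluster size (in blocks) that covers the volume."""
    blocks_per_cluster = 1
    while max_volume_bytes(entry_bits, blocks_per_cluster) < volume_bytes:
        blocks_per_cluster *= 2
    return blocks_per_cluster

MIB = 2 ** 20
GIB = 2 ** 30
print(max_volume_bytes(12, 1) // MIB)       # 2  (MiB, one block per cluster)
print(max_volume_bytes(16, 1) // MIB)       # 32 (MiB, one block per cluster)
print(min_blocks_per_cluster(16, 2 * GIB))  # 64 blocks = 32 KiB clusters
print(min_blocks_per_cluster(32, 2 * GIB))  # 1
```

        With 16-bit entries a 2 GiB disk forces 32 KiB clusters, so even a tiny file wastes most of a cluster; 32-bit entries let the same disk keep one-block clusters.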

        NTFS uses the cluster as the basic unit of disk space allocation and reclamation. A file occupies a number of clusters, and a cluster belongs to only one file. When allocating disk space for a file, the system therefore does not need to know the size of a disk block; it only needs to choose a cluster size suited to the disk capacity, which makes NTFS independent of the physical block size of the disk. //NTFS (New Technology File System) takes the cluster as the basic unit

(3) Index organization method

        Although the link organization method solves the problems existing in the continuous organization method, there are two other problems, namely:

  • It cannot support efficient direct access . To access a larger file, it is necessary to sequentially search for many disk block numbers in the FAT.
  • FAT needs to occupy a large memory space . When the disk capacity is large, FAT may occupy more than several MB of memory space.

3.1 - Single-level index organization

        Idea: when a file is opened, only the numbers of the disk blocks occupied by that file need to be loaded into memory; there is no need to load the entire FAT.

        The index allocation method is to allocate an index block (table) for each file, and record all disk block numbers allocated to the file in the index block . When creating a file, it is only necessary to fill in the pointer to the index block in the directory entry created for it.

        The main advantage of the index organization is that it supports direct access . When the i-th disk block of a file is to be read, the disk block number of the i-th disk block can be found directly from the index block of the file; in addition, the index allocation method will not generate external fragments . When the file is large, the index allocation method is undoubtedly better than the link allocation method. // suitable for medium and large files

        The main problem with the index organization method is that every file needs its own index block, in which all the disk block numbers assigned to the file are recorded. Each index block can hold hundreds of block numbers, but small and medium-sized files usually occupy only a few to a few dozen disk blocks, or even fewer, and an index block must still be allocated for each of them. When the index allocation method is used for small files, the utilization of the index blocks is therefore extremely low. //For small files, index block space is wasted
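        Both the direct access and the waste for small files can be shown in a few lines. This is a sketch under assumed sizes (1 KB blocks, 4-byte block numbers, so 256 entries per index block); the block numbers in the example are made up:

```python
BLOCK_SIZE = 1024  # assumed disk block size in bytes
ENTRY_SIZE = 4     # assumed size of one block number in bytes
ENTRIES_PER_INDEX_BLOCK = BLOCK_SIZE // ENTRY_SIZE  # 256

def physical_block(index_block, i):
    """Direct access: logical block i is simply entry i of the index block."""
    return index_block[i]

def index_utilization(file_blocks):
    """Fraction of the index block used by a file occupying file_blocks blocks."""
    return file_blocks / ENTRIES_PER_INDEX_BLOCK

small_file_index = [9, 16, 1]  # hypothetical block numbers of a 3-block file
print(physical_block(small_file_index, 2))                  # 1
print(f"{index_utilization(len(small_file_index)):.2%}")    # 1.17%
```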

3.2 - Multi-level index organization

        When a file is so large that it needs many index blocks, a further level of index should be built over those index blocks. The system allocates another index block to serve as the first-level index, and the disk block numbers of the first, second, ... index blocks are filled into this index table, forming a two-level index allocation method. If the file is very large, three-level and four-level index allocation can also be used. //Create an index for the index blocks

        The main advantage of multi-level indexing is that it greatly speeds up lookups in large files. Its main disadvantage is that the number of disk start-ups needed to access a disk block grows with the number of index levels, even for small files. In practice, small and medium-sized files are the majority and large files are rare, so a file system that uses only multi-level indexing does not give ideal results. //The multi-level index method is suitable for searching large files
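        How capacity grows with the number of index levels, and where a logical block sits in a two-level index, can be worked out as below. The figure of 256 entries per index block (1 KB blocks, 4-byte block numbers) is an assumption chosen for the example, as are the function names:

```python
ENTRIES = 256  # assumed block numbers per index block

def max_file_blocks(levels):
    """Number of data blocks addressable with the given number of index levels."""
    return ENTRIES ** levels

def two_level_path(logical_block):
    """Return (slot in the first-level index block, slot in the second-level one)."""
    if not 0 <= logical_block < max_file_blocks(2):
        raise IndexError("logical block outside the two-level range")
    return divmod(logical_block, ENTRIES)

print(max_file_blocks(1))    # 256      blocks (256 KB with 1 KB blocks)
print(max_file_blocks(2))    # 65536    blocks (64 MB)
print(max_file_blocks(3))    # 16777216 blocks (16 GB)
print(two_level_path(1000))  # (3, 232)
```

        Each extra level multiplies the capacity by 256 but adds one more index block read before the data block can be reached.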

3.3 - Incremental index organization

        Idea: to serve small, medium, large and extra-large files alike, several organization methods can be combined to form the physical structure of a file. //Small files use direct addressing, medium files use a single-level index, and large files use multi-level indexes

        The so-called incremental index organization method is organized based on the above-mentioned basic idea. It not only adopts the direct addressing method, but also adopts the single-level and multi-level index organization method (indirect addressing) . This type of organization is often referred to as a hybrid organization . This is the organization used in UNIX systems.

        There are 13 address items in the index node of UNIX System V, namely addr-0 ~ addr-12 . The external memory organization of the system is as follows:

  1. Direct addresses. To speed up file retrieval, direct address items (generally 10, addr-0 ~ addr-9) are set in the index node to store direct addresses (disk block numbers), so the disk block address of the file can be read straight from the index node. This is generally called direct addressing.
  2. Single indirect address. For large and medium-sized files, direct addresses alone are not enough. The address item addr-10 in the index node therefore provides a single indirect address. In essence this is the single-level index allocation method.
  3. Multiple indirect addresses. When the file is so large that one single indirect address and 10 direct address items still do not provide enough address space, the system uses double indirect addressing: the address item addr-11 provides the double indirect address. In essence this is the two-level index allocation method. Similarly, the address item addr-12 is used as the triple indirect address.
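        The three cases above can be sketched as a function that decides which address item covers a given logical block and which slots to follow in the index blocks. It assumes 10 direct items and 256 block numbers per index block (an illustrative value, not stated in the article); the function name is hypothetical:

```python
DIRECT = 10    # addr-0 ~ addr-9
ENTRIES = 256  # assumed block numbers per index block

def resolve(logical_block):
    """Return (address item, slots to follow through successive index blocks)."""
    n = logical_block
    if n < 0:
        raise IndexError("negative block number")
    if n < DIRECT:
        return f"addr-{n}", []            # direct address, no index block
    n -= DIRECT
    if n < ENTRIES:
        return "addr-10", [n]             # single indirect
    n -= ENTRIES
    if n < ENTRIES ** 2:
        return "addr-11", list(divmod(n, ENTRIES))  # double indirect
    n -= ENTRIES ** 2
    if n < ENTRIES ** 3:
        first, rest = divmod(n, ENTRIES ** 2)
        second, third = divmod(rest, ENTRIES)
        return "addr-12", [first, second, third]    # triple indirect
    raise IndexError("file too large for this index node")

print(resolve(5))    # ('addr-5', [])
print(resolve(265))  # ('addr-10', [255])
print(resolve(266))  # ('addr-11', [0, 0])
```

        The length of the returned slot list equals the number of index blocks that must be read, so small files pay nothing extra while very large files pay up to three reads.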

2. Management of file storage space

        Whichever of the above file organization methods is used, disk blocks must be allocated to files, so the system must know which disk blocks are available for allocation. In addition to the file allocation table, the system should therefore set up a data structure for the allocatable storage space, namely a disk allocation table, to record which storage space is currently available. //Allocating disk space to files -> file allocation table + disk allocation table

        In addition, means should be provided for allocating and reclaiming disk blocks. Regardless of the allocation and recovery methods, the basic allocation unit of storage space is disk blocks rather than bytes .

        The following introduces several commonly used file storage space management methods.

(1) Free list method and free linked list method

1.1 - Free List Method

        What is a free list?

        The system builds a free list covering all free areas on external storage. Each free area has one entry in the list, which includes the entry number, the first disk block number of the free area, the number of free blocks in the area, and other information. All free areas are arranged in increasing order of their starting disk block numbers to form the free disk block table. //The free list is used with continuous allocation of files

        Allocation and recovery of storage space

        Allocating free areas is similar to dynamic partition allocation of memory: the first-fit and best-fit algorithms are used as well. The two give roughly the same storage utilization, and both are better than the worst-fit algorithm.

        When the system allocates free disk blocks for a newly created file, it searches the entries of the free list in order until it finds the first free area large enough to meet the request, allocates that area to the user (process), and updates the free list. //First-fit allocation

        When the system reclaims storage space released by a user, it proceeds much as in memory reclamation: it checks whether the reclaimed area is adjacent to the areas before and after its insertion point in the free list, and merges it with any that are. //Merge adjacent free areas on reclamation
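        First-fit allocation and merging on reclamation can be sketched as follows. The free list is modelled as a Python list of [first block, block count] pairs kept in increasing order of first block; this representation, the function names and the sample values are assumptions for illustration:

```python
def allocate_first_fit(free_list, blocks_needed):
    """Take blocks from the first free area that is large enough.

    Returns the first allocated block number, or None if no area fits.
    """
    for i, (start, count) in enumerate(free_list):
        if count >= blocks_needed:
            if count == blocks_needed:
                del free_list[i]  # the area is used up entirely
            else:
                free_list[i] = [start + blocks_needed, count - blocks_needed]
            return start
    return None

def release(free_list, start, count):
    """Return an area to the free list and merge it with adjacent free areas."""
    free_list.append([start, count])
    free_list.sort()
    merged = []
    for s, c in free_list:
        if merged and merged[-1][0] + merged[-1][1] == s:
            merged[-1][1] += c  # touches the previous area: extend it
        else:
            merged.append([s, c])
    free_list[:] = merged

free_list = [[2, 4], [9, 3], [15, 5]]
print(allocate_first_fit(free_list, 3))  # 2
print(free_list)                         # [[5, 1], [9, 3], [15, 5]]
release(free_list, 6, 3)
print(free_list)                         # [[5, 7], [15, 5]]
```

        In the last step the released area 6 ~ 8 touches the free areas on both sides, so three entries collapse into one.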

        Note that although continuous allocation is rarely used for memory, it still has its place in external storage management, because it allocates quickly and reduces the I/O needed to access the disk.

1.2 - Free linked list method

        The free linked list method links all free areas into a free chain. Depending on the basic element that makes up the chain, there are two forms: the free disk block chain and the free extent chain. //The units that make up the chain differ

        Free disk block chain

        The free disk block chain pulls all the free space on the disk into a chain in units of disk blocks , and each disk block has pointers to subsequent disk blocks.

        When a user requests to allocate storage space for creating a file, the system starts from the head of the chain and sequentially removes an appropriate number of free disk blocks and allocates them to the user. When the user releases storage space due to file deletion, the system hangs the reclaimed disk blocks at the end of the free disk block chain in turn. //Disk allocation and recycling

        The advantage of this method is that allocating or reclaiming a single disk block is very simple. However, allocating disk blocks for a file may require repeating the operation many times, so allocation and reclamation are inefficient. And because the chain is built from individual disk blocks, it can be very long. //Low efficiency when allocating and reclaiming multiple disk blocks
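        A rough sketch of the free disk block chain: blocks are taken one at a time from the head and returned one at a time to the tail. A deque stands in for the on-disk chain here, and the block numbers are made up; on a real disk each step would follow a pointer stored in the block itself:

```python
from collections import deque

def allocate(chain, n):
    """Remove n blocks from the head of the chain, one block per step."""
    if n > len(chain):
        raise MemoryError("not enough free disk blocks")
    return [chain.popleft() for _ in range(n)]

def release(chain, blocks):
    """Hang reclaimed blocks on the tail of the chain, one after another."""
    for block in blocks:
        chain.append(block)

free_chain = deque([7, 3, 12, 5])  # hypothetical free block numbers
print(allocate(free_chain, 2))     # [7, 3]
release(free_chain, [3])
print(list(free_chain))            # [12, 5, 3]
```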

        Free extent chain

        The free extent chain pulls all free extents (each extent can contain several disk blocks) on the disk into a chain. In addition to containing a pointer for indicating the next free extent, each extent should also have information that can indicate the size of the extent (the number of disk blocks).

        Extents are allocated much as in dynamic partition allocation of memory, usually with the first-fit algorithm. When an extent is reclaimed, it should likewise be merged with adjacent free extents. To speed up the search for free extents under first-fit, explicit linking can be used, that is, a linked list of the free extents is kept in memory.

        The advantages and disadvantages of the free extent chain are the opposite of those of the free disk block chain: allocation and reclamation are more complicated, but they can be more efficient because several contiguous blocks are allocated to a file at a time, and the free extent chain is shorter.

(2) bitmap method

        What is a bitmap?

        A bit map uses a binary bit to represent the usage of a disk block in the disk . When its value is "0", it means that the corresponding disk block is free; when it is "1", it means it has been allocated. The essence is to use two states of one bit to mark the two situations of free and allocated .

        Every disk block on the disk has a corresponding binary bit, and the set of bits for all disk blocks is called a bitmap. Usually m × n bits form the bitmap, where m × n equals the total number of disk blocks. The bitmap can also be described as a two-dimensional array map[m, n]. //One binary bit represents one disk block

        Disk allocation

        Allocating a disk block with the bitmap takes three steps: //Search -> Convert -> Modify

  1. Scan the bitmap in order to find a binary bit whose value is "0".
  2. Convert the position of that bit into the corresponding disk block number.
  3. Modify the bitmap so that map[i, j] = 1.

        Disk reclamation

        Reclaiming a disk block takes two steps: //Convert -> Modify

  1. Convert the block number of the reclaimed block into a row and column number in the bitmap.
  2. Modify the bitmap so that map[i, j] = 0.
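        The allocation and reclamation steps can be sketched in a few lines. The sketch assumes rows, columns and block numbers all start at 0, so the conversion is b = i * n + j and, in reverse, i = b // n, j = b % n; a textbook that numbers from 1 would use a shifted formula. The function names are illustrative:

```python
def allocate_block(bitmap):
    """Search -> convert -> modify. Returns a block number, or None if full."""
    n = len(bitmap[0])  # number of columns
    for i, row in enumerate(bitmap):
        for j, bit in enumerate(row):
            if bit == 0:          # step 1: first free bit found
                row[j] = 1        # step 3: mark it allocated
                return i * n + j  # step 2: position -> block number
    return None

def free_block(bitmap, block):
    """Convert -> modify."""
    i, j = divmod(block, len(bitmap[0]))  # block number -> row, column
    bitmap[i][j] = 0

bitmap = [[1, 1, 1, 1],
          [1, 0, 0, 0]]
print(allocate_block(bitmap))  # 5
free_block(bitmap, 5)
print(bitmap[1])               # [1, 0, 0, 0]
```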

        The main advantage of the bitmap is that it is easy to find one free disk block or a group of adjacent free disk blocks in it. For example, to find 6 adjacent free disk blocks, we only need to find 6 consecutive bits whose value is "0" in the bitmap.
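        Searching for a run of adjacent free blocks is a single linear scan. In this sketch the bitmap is flattened into one list of bits indexed by block number, which is an assumption made to keep the example short; the function name and sample bits are illustrative:

```python
def find_free_run(bits, k):
    """Return the first block number of k consecutive free blocks, or None."""
    run_start = 0
    run_length = 0
    for position, bit in enumerate(bits):
        if bit == 0:
            if run_length == 0:
                run_start = position  # a new run of zeros begins here
            run_length += 1
            if run_length == k:
                return run_start
        else:
            run_length = 0            # an allocated block breaks the run
    return None

bits = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(find_free_run(bits, 6))  # 4
print(find_free_run(bits, 7))  # None
```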

        In addition, because the bitmap is small, it can be kept in memory, so the disk allocation table does not have to be read into memory each time disk space is allocated, which saves many disk start-up operations.

//The bitmap takes little space and can be searched quickly

        Therefore, bitmaps are often used in microcomputers and minicomputers , such as CP/M, Apple-DOS and other OSs.

(3) group link method

        Neither the free list method nor the free linked list method suits large file systems, because the free list or free linked list becomes too long. UNIX systems use the group link method, a free disk block management method that combines the two: it keeps the advantages of both and avoids their shared disadvantage of an overly long table. //free list + free linked list

3.1 - Organization of free blocks

        Free disk block number stack: stores the disk block numbers of the group of free disk blocks currently available (at most 100 numbers), together with the count N of free disk block numbers still in the stack. N also serves as the stack top pointer; for example, when N = 100 the top of the stack is S.free(99). //A stack holds the disk block numbers, up to 100 per stack

        Since the stack is a critical resource, only one process is allowed to access it at a time, so the system sets a lock for the stack . The figure below shows the structure of the free disk block number stack. Among them, S.free(0) is the bottom of the stack, and the top of the stack when the stack is full is S.free(99) .

        File area: all free disk blocks in the file area are divided into groups, for example 100 disk blocks per group. Assume the disk has 10,000 disk blocks of 1 KB each, of which blocks No. 201 ~ 7999 store files, that is, form the file area. The last group in this area is then blocks 7901 ~ 7999, the second-to-last group is 7801 ~ 7900, ..., the second group is 301 ~ 400, and the first group is 201 ~ 300.

        Disk block chain: record the total number N of disk blocks contained in each group and all the disk block numbers of this group into S.free(0) ~ S.free(99) of the first disk block of the previous group . In this way, the first disk block of each group can be linked into a chain.

        Free disk block number: record the total number of disk blocks and all disk block numbers of the first group into the free disk block number stack as the number of free disk blocks currently available for allocation.

        End mark: the last group has only 99 disk blocks. Their numbers are recorded in S.free(1) ~ S.free(99) of the previous group, and "0" is stored in S.free(0) as the end mark of the free disk block chain. (Note: the last group holds 99 disk blocks rather than 100 because the count refers to usable free disk blocks, which occupy entries 1 ~ 99; entry 0 holds the end mark of the free disk block chain.)

3.2 - Allocation and reclamation of free disk blocks

        When the system allocates the disk blocks a file needs, it calls the disk block allocation procedure.

        The procedure first checks whether the free disk block number stack is locked. If it is not, the procedure takes a free disk block number from the top of the stack, assigns the corresponding disk block to the user, and moves the stack top pointer down by one.

        If that block number is at the bottom of the stack, that is, S.free(0), it is the last block number that can be allocated from the current stack. The disk block with this number records the next group of available disk block numbers, so the disk read procedure must be called to read the contents of that block into the stack as the new contents of the disk block number stack; the block at the original stack bottom is then allocated (its useful data has already been read into the stack). Next, a buffer is allocated for the disk block. Finally, the number of free disk blocks in the stack is decremented by 1 and the procedure returns. //S.free(0) needs special handling because that disk block holds the next group of available disk block numbers

        When the system reclaims a free disk block, it calls the disk block reclamation procedure. The procedure records the number of the reclaimed disk block at the top of the free disk block number stack and adds 1 to the number of free disk blocks. If the stack already holds 100 disk block numbers, it is full: the 100 numbers in the stack are written into the newly reclaimed disk block, and that block's number becomes the bottom of the new stack.
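        The allocation and reclamation logic can be simulated roughly as follows. This is a sketch, not the UNIX implementation: the group size is reduced from 100 to 3 so the behaviour is visible, a dictionary stands in for the contents written into disk blocks, block number 0 is reserved as the end mark, and the stack lock and buffer handling are left out. The class and attribute names are hypothetical:

```python
GROUP_SIZE = 3  # the text uses 100; kept small so the example is easy to trace

class GroupLinkedFreeSpace:
    def __init__(self):
        self.stack = [0]  # only the end mark: no free blocks yet
        self.disk = {}    # block number -> group list stored inside that block

    def release(self, block):
        if len(self.stack) == GROUP_SIZE:
            # Stack full: save it in the reclaimed block, which becomes
            # the bottom (and only entry) of the new stack.
            self.disk[block] = self.stack
            self.stack = [block]
        else:
            self.stack.append(block)

    def allocate(self):
        if len(self.stack) > 1:
            return self.stack.pop()  # ordinary case: take the top entry
        block = self.stack[0]        # bottom of the stack, S.free(0)
        if block == 0:
            raise MemoryError("no free disk blocks")
        # The bottom block holds the next group: load it, then hand the block out.
        self.stack = self.disk.pop(block)
        return block

space = GroupLinkedFreeSpace()
for block in range(201, 208):
    space.release(block)
print([space.allocate() for _ in range(7)])
# [207, 206, 205, 204, 203, 202, 201]
```

        Blocks 206 and 203 are the ones that carried a saved group, so allocating them triggers the "read the next group into the stack" step before they are handed out.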

Origin blog.csdn.net/swadian2008/article/details/131706926