Operating System Notes - File Management

4. File management

4.1 File system basics

4.1.1 Basic concepts of files

File concept

Data processing is one of the main functions of computers, and managing and storing that data are essential parts of it. In computers, large amounts of data and information are stored and managed in the form of files, and files are the basic unit of user input and output. The file system is responsible for managing files and providing users with methods to access, share, and protect them.

A file is a named collection of related information. It is the largest data unit in the file system and describes a set of objects. Each file has a file name, and users access the file through that name.

File structure:

  • Data item. Data items are the lowest level of data organization in the file system and can be divided into the following two types.
    • Basic data item: a value that describes a certain attribute of an object, such as a name, a date, or a certificate number. It is the smallest logical data unit that can be named, that is, atomic data.
    • Combined data item: composed of multiple basic data items.
  • Record. A record is a collection of related data items that describes an object in some respect. For example, a candidate registration record includes fields such as the candidate's name, date of birth, school code, and ID number.
  • File. A file is a set of related information defined by its creator. Logically, files can be divided into structured files and unstructured files.
    • A structured file consists of a group of similar records, such as the application records of all candidates who applied to a certain school; it is also called a record file.
    • An unstructured file is treated as a character stream, such as a binary file or a character file; it is also called a streaming file.

Files cover a very wide range. The system or a user can name any program or data collection with a certain function as a file. For example, a named source program, an object program, a batch of data, or a system program can all be considered files. In some operating systems, a device is also regarded as a special file; in this way, the system can manage devices and files uniformly, which both simplifies the system design and is convenient for users.

File properties

  • Name. The file name is unique and is kept in a human-readable form.
  • Identifier. A unique label for the file within the system, usually a number, which is transparent to the user.
  • File type. Used by file systems that support different types of files.
  • File location. A pointer to the file's location on a device.
  • File size, creation time, user ID, and so on.

Classification of files

  • Classified by use

    • System files. Files made up of system software. Most system files only allow users to call and execute them, not to read or modify them.
    • Library files. The standard subroutines, functions, and utility files that the system provides to users. Users may call and execute such files but are not allowed to modify them.
    • User files. Files that users entrust to the file system for safekeeping, such as source programs, object programs, and raw data. Such files can be used only by the file owner or by users authorized by the owner.
  • Classification by protection level

    • Read-only files. The owner or authorized users may read the file but may not write to it (note that only authorized users or the owner may read it, not any user; the same applies below).
    • Read-write files. The owner or authorized users may read and write the file; unauthorized users may not read or write it.
    • Executable files. Authorized users may call and execute the file, but reading and writing are not allowed (be clear that reading, writing, and executing are different operations; do not assume reading and executing are the same).
    • Unprotected files. Files without any access restrictions.
  • Classification by information flow direction

    • Input file. For example, files on a card reader or keyboard can only be read, so such files are input files.
    • Output file. For example, files on the printer can only be written out, so these files are output files.
    • Input/output files. For example, files on disks and tapes can be read and written, so such files are input/output files.
  • Classify by data form

    • Source files. Files made up of source programs and data. Files formed from source programs and data entered through a terminal or input device are generally source files, usually consisting of ASCII characters or Chinese characters.
    • Object files. Files formed from the object code produced by compiling a source file but not yet linked. Object files are binary files.
    • Executable files. Files formed by linking the compiled object code with the linker.

File operations

  • Basic file operations
    • Create a file. When creating a new file, the system must first allocate the necessary external memory space for it and create a directory entry for it in the directory.
    • Delete a file. When deleting a file, the directory entry of the file is first deleted, making it an empty entry, and the storage space occupied by the file is then reclaimed.
    • Read a file. The caller specifies the file name and the memory address into which the data should be read. The system searches the directory, sets a read pointer based on the file's external memory address, and updates the read pointer as read operations are performed.
    • Write a file. The caller specifies the file name and the memory address of the data to be written. The system searches the directory, sets a write pointer based on the file's external memory address, and updates the write pointer as write operations are performed.
    • Truncate a file. When the file content is no longer needed or must be completely replaced, the file can either be deleted and re-created, or all of its attributes can be kept unchanged while its content is deleted, that is, its length is set to 0 and its space is released.
    • Set the read/write position of the file. By setting the read/write position, each operation on the file does not have to start from the beginning of the file but can start from a specified position.
  • File opening and closing operations
    • Open a file. The system copies the file's attributes from external storage into memory and returns a number (an index) to the user. Afterwards, whenever the user wants to operate on the file, the user only needs to present this number (index) to the system. This saves the system from searching the directory again, which both reduces retrieval overhead and speeds up file operations. A minimal sketch of an open-file table entry follows this list.
      • File pointer. The system tracks the last read/write position as the current file position pointer. This pointer is unique to each process that has the file open and therefore must be kept separate from the on-disk file attributes.
      • File open count. When a file is closed, the operating system must be able to reuse its entry in the open-file table (the table holding information about all open files), otherwise the table would run out of space. Because several processes may have the same file open, the system must wait until the last process closes the file before deleting the open-file entry. This counter tracks opens and closes; when it reaches 0, the system closes the file and deletes the entry.
      • File disk location. Most file operations require the system to access file data. This location information is kept in memory to avoid reading it from disk for every operation.
      • Access rights. Every process opening a file needs an access mode (create, read-only, read-write, append, and so on). This information is kept in the process's open-file table so that the operating system can allow or deny subsequent I/O requests.
    • Close a file. The system releases the number (index) of the open file and discards its in-memory file control block. If the file has been modified, the modifications must be written back to external storage.
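
As a rough illustration of the bookkeeping described above, the sketch below shows what an entry in a per-process open-file table might look like. It is a minimal, hypothetical layout with invented field names, not the structure used by any particular operating system.

```c
/* Hypothetical per-process open-file table entry (illustration only). */
struct inode;                  /* in-memory copy of the file's attributes */

struct open_file_entry {
    int           in_use;      /* is this slot occupied?                     */
    long          offset;      /* current read/write position (file pointer) */
    int           access_mode; /* e.g. read-only, read-write, append         */
    struct inode *ip;          /* points at the cached file attributes       */
};

/* A system-wide table would additionally keep an open count per file;
 * the cached attributes are discarded only when that count drops to 0. */
```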

4.1.2 Logical and physical structure of files

The logical structure of a file is the organizational form of the file as seen from the user's point of view, that is, the data and structure that the user can work with directly. Because it is independent of the physical characteristics of the file, it is also called the file organization. From the computer's point of view, the way a file is stored and organized on external memory is called the physical structure of the file.

The logical structure of a file has nothing to do with the characteristics of the storage device, while the physical structure has a lot to do with the characteristics of the storage device.

From the perspective of logical structure, files come in two forms: structured record files and unstructured streaming files. The logical structure of record files usually includes the sequential, index, and indexed sequential organizations.

From the perspective of physical structure, the organization forms of files include continuous allocation, linked allocation, and indexed allocation.

4.1.3 Logical structure of files

Usually, a structured file consists of several records, so it is called a record file. A record is a collection of related data items, and a data item is the smallest named logical unit in the data organization. For example, each employee record consists of data items such as name, gender, date of birth, and salary, and the employee records of a unit constitute a file. In summary, data items make up records, and records make up files. Record files can be divided into fixed-length record files and variable-length record files: all records in a fixed-length record file have the same length, while the records in a variable-length record file may have different lengths.

An unstructured file is composed of a sequence of characters and can be regarded as a character stream, so it is called a streaming file. Streaming files can be thought of as a special case of record files. In UNIX systems, all files are treated as streaming files, and the system imposes no format on them.

File storage devices are usually divided into equal-sized physical blocks, which are the basic units for allocating and transmitting information. The size of the physical block is related to the device, but has nothing to do with the size of the logical record. Therefore, several logical records can be stored in one physical block, and one logical record can also be stored in several physical blocks. In order to effectively utilize external storage devices and facilitate system management, file information is generally divided into logical blocks equal to the size of physical storage blocks.

Sequential files

The sequential structure, also known as the continuous structure, is the simplest file structure, which continuously stores the information of a logical file. Files stored in a sequential structure are called sequential files or continuous files.

Depending on whether the records are of fixed length, sequential files are divided into fixed-length record sequential files and variable-length record sequential files.

Depending on whether the records in the file are sorted by keyword, sequential files are divided into the string structure and the sequential structure: in the string structure the order of the records has nothing to do with the keywords, while in the sequential structure all records are sorted by keyword.

The main advantage of sequential files is fast sequential access; if the file is a fixed-length record file, random access is also possible based on the file's starting address and the record length. However, because the file requires contiguous storage space, fragmentation occurs, and dynamic growth of the file is difficult.
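
For a fixed-length record file, the random-access claim above reduces to simple address arithmetic: record i starts at the file's starting address plus i times the record length. A minimal sketch, with illustrative names only:

```c
/* Address of record i in a fixed-length record sequential file.
 * start   - starting address (byte offset) of the file
 * rec_len - length of each record in bytes
 * The names are illustrative, not from any particular system. */
long record_address(long start, long rec_len, long i)
{
    return start + i * rec_len;   /* records are stored back to back */
}
```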

Index files

The index structure builds an index table for the information of a logical file. Each entry in the index table stores the length of a record and a pointer to its starting position, so the length information no longer needs to be stored in the logical file itself. The index table is a fixed-length file, while each record can be of variable length. The index table and the logical file together constitute the index file.

The advantage of the index file is that it supports random access and makes it easy to insert and delete records. However, the index table increases the storage space overhead, and the search strategy used on the index table has a great impact on the efficiency of the file system.

Indexed sequential files

Indexed sequential files combine sequential files and index files. An indexed sequential file divides all the records of a sequential file into several groups and creates an index table for the file, with one index entry for the first record of each group; the entry contains that record's keyword and a pointer to the record.

The index table contains two data items: the keyword and the pointer, and the index entries are arranged in keyword order. The logical file (main file) of an indexed sequential file is a sequential file; the keywords within each group need not be in order, but the groups themselves are ordered by keyword.

Indexed sequential files greatly improve retrieval speed compared with purely sequential files, but they require an index table, which increases storage overhead.

Direct files and hash files

A direct file establishes a correspondence between a keyword and the physical address of the corresponding record, so that the record's physical address can be obtained directly from the value of the keyword; that is, the keyword value determines the physical address of the record. This keyword-to-address mapping has no notion of ordering, which distinguishes it from sequential files and index files.

A hash file is a typical direct file. The keyword is transformed by a hash function, and the result of the transformation directly determines the physical address of the record. Hash files offer fast access, but collisions may occur when different keywords produce the same hash value.

4.1.4 Directory structure

File Directory

Computer systems contain many kinds of files in large numbers. To manage these files effectively and help users find the files they need, the files must be organized properly, which is done through directories. A collection of file description information is called a file directory. The most basic function of a directory is to locate a file by its file name.

  • Implement "access by name". Users only need to provide the file name to operate the file. This is not only the most basic function of directory management, but also the most basic service provided by the file system to users.
  • Improve retrieval speed. This requires a reasonable design of the directory structure when designing the file system. This is an important design goal for large file systems.
  • Files with the same name are allowed. In order to make it easier for users to name and use files according to their own habits, the file system should allow the use of the same name for different files. At this time, the file system can distinguish this by different working directories.
  • Allow file sharing. In a multi-user system, multiple users should be allowed to share a file, which can save file storage space and facilitate users to share file resources. Of course, corresponding security measures need to be adopted to ensure that users with different permissions can only obtain corresponding file operation permissions to prevent unauthorized behavior.

Usually, a file directory is itself treated as a file, called a directory file. Since a file system generally contains many files and its directories are correspondingly large, file directories are kept not in main memory but in external memory.

File control blocks and index nodes

  • file control block

    From the perspective of file management, a file consists of two parts: the file control block (FCB) and the file body. The file body is the file itself, and the file control block (also called the file descriptor) is the data structure that holds the file's attribute information. Its exact contents vary by operating system, but it should contain at least the following information.

    • File name. This information identifies the symbolic name of the file. Each file must have a unique name so that users can operate on it by name.
    • The structure of the file. This information indicates whether the logical structure of the file is a record file or a streaming file; for a record file, it further indicates whether the records are fixed-length, as well as the length and number of records. It also indicates whether the physical structure of the file is a sequential file, an indexed sequential file, or an index file.
    • The physical location of the file. This information indicates where the file is stored on external memory, including the name of the device holding the file, the storage address of the file on that device, and the file length. The form of the physical address depends on the physical structure: for a contiguous file, the physical address of the first block and the number of blocks occupied are given; for a linked file, the physical address of the first block is given; for an index file, the address of the index table is given.
    • Access control information. This information indicates the access permissions of the file, including the permissions of the file owner, the permissions of users in the owner's group, and the permissions of other general users.
    • Management information. This includes the date and time the file was created, the date and time it was last accessed, and the current usage status of the file.
  • index node

    When searching a directory file, only the file name is used; only when a matching directory entry is found does the file's physical address need to be read from that entry. In other words, the rest of the file's description information is not used during directory retrieval and therefore does not need to be loaded into memory. For this reason, some systems separate the file name from the file description information and put the description information in a separate index node, or i-node for short. Each directory entry then consists only of the file name and a pointer to the file's i-node.

    The index nodes stored on disk are called disk index nodes. Each file has exactly one disk index node, which mainly contains the following contents (a simplified C sketch appears after this list).

    • File owner identifier. The identifier of the person or group that owns the file.
    • File type. Ordinary file, directory file, or special file.
    • File access permissions. The access rights of various classes of users to the file.
    • File physical address. Each index node gives, directly or indirectly, the numbers of the disk blocks holding the file's data.
    • File length. The file length in bytes.
    • File link count. The number of file names (directory entries) in the file system that point to this file.
    • File access times. The time the file was last accessed, the time it was last modified, and the time the index node itself was last modified.

    When the file is opened, the disk index node is copied into a memory index node for use. The index node kept in memory is called the memory index node, and it adds the following content.

    • Index node number. Used to identify the memory index node.
    • State. Indicates whether the i-node is locked or has been modified.
    • Access count. The number of processes currently accessing the file.
    • Logical device number. The logical device number of the file system to which the file belongs.
    • Link pointer. Set pointers to the free list and hash queue respectively.
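
The fields listed above map naturally onto a structure. The sketch below is a simplified, hypothetical disk index node; real UNIX i-node layouts differ in field names, sizes, and the number of block addresses.

```c
#include <stdint.h>
#include <time.h>

/* Simplified disk index node (illustration only, not a real on-disk layout). */
struct disk_inode {
    uint32_t owner_uid;   /* file owner identifier                          */
    uint32_t group_gid;   /* owning group identifier                        */
    uint16_t type;        /* ordinary file, directory file, or special file */
    uint16_t mode;        /* access permissions                             */
    uint32_t addr[13];    /* direct/indirect physical block numbers         */
    uint64_t size;        /* file length in bytes                           */
    uint16_t nlink;       /* link count: directory entries pointing here    */
    time_t   atime;       /* last access time                               */
    time_t   mtime;       /* last modification time                         */
    time_t   ctime;       /* last i-node change time                        */
};
```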

Single-level directory structure

The single-level directory structure (or one-level directory structure) is the simplest directory structure. The entire file system has only one directory table, and each file occupies one entry in it.


When creating a new file, the system first checks that the file name does not already exist in the directory; if there is no conflict with an existing name, it finds an empty entry in the directory table and fills in the new file's information. When deleting a file, the system finds the file's entry in the directory table, obtains the file's physical address, reclaims the storage space occupied by the file, and then clears the directory entry. When accessing a file, the system searches the directory table by file name to determine whether the file exists; if it does, the system obtains the file's physical address and then completes the operation on the file.

The advantage of a single-level directory structure is that it is easy to implement and manage, but it has the following disadvantages.

  • Duplicate file names are not allowed (obviously). A file in a single-level directory is not allowed to have the same name as another file. But for multi-user systems, this is difficult to avoid. Even in a single-user environment, when the number of files is large, it is difficult to figure out which files are there, which makes the file system extremely difficult to manage.

  • File search is slow. For a slightly larger file system, because it has a large number of directory entries, it may take a long time to find a specified directory entry.

Two-level directory structure

The two-level directory structure divides the file directory into a master file directory and user file directories. The system creates a separate user file directory (UFD) for each user, whose entries record all the files created by that user and their description information. The master file directory (MFD) records each user's file directory: each user occupies one entry, which contains the user name and the storage location of that user's directory. Together these form a two-level directory structure.


When a user wants to access a file, the system first searches the master file directory by user name to find that user's file directory, then searches the user file directory by file name to find the corresponding entry and obtain the file's physical address, and finally completes the access to the file.

When a user wants to create a file, if the user is new, that is, there is no entry for this user in the master file directory, the system allocates an entry in the master directory for the user and allocates storage space for the user's file directory; it then allocates an entry for the new file in the user file directory and fills in the relevant information.

When a file is deleted, its directory entry in the user's file directory is simply removed. If the user directory table becomes empty after the deletion, the user is considered to have left the system, and the user's entry in the master file directory can then be deleted as well.

Tree directory structure

To allow the system and users to organize, manage, and use files more flexibly and conveniently, the hierarchical relationship of the two-level directory is generalized to form a multi-level directory structure, also known as a tree directory structure.


In the tree directory structure, the first-level directory is called the root directory (the tree root), the non-leaf nodes are directory files (also called subdirectories), and the leaf nodes are data files. The system assigns each file a unique internal identifier that is transparent to the user.

The tree directory structure introduces the following concepts:

  • Path name. In a tree directory structure, a path name is often used to uniquely identify a file. The path name of a file is a string formed by concatenating the names of all directories on the path from the root directory to the file, together with the data file name, separated by the delimiter "\". A path starting from the root directory is called an absolute path, and a path starting from the current directory is a relative path.
  • Current directory. When the directory tree has many levels, it is inconvenient for users to spell out the complete path name every time, and the system also spends a lot of time searching the directory. To address this, note that the files accessed by a process within a period of time are usually localized, so a particular directory can be designated as the current directory (or working directory) for that period. The process then accesses each file relative to the current directory, using a relative path. The system also allows a path to go upward, using ".." to denote the parent directory of the current directory (file).

The tree directory structure can easily classify files, the hierarchical structure is clear, and it can also manage and protect files more effectively. However, when searching for a file in a tree directory, intermediate nodes need to be accessed step by step according to the path name, which increases the number of disk accesses and thus affects the query speed.

Graph directory structure

The tree directory structure is convenient for classifying files but not for sharing them. To support sharing, directed edges pointing to the same node are added to the tree, turning the directory into a directed acyclic graph; this is the graph directory structure. The purpose of introducing this structure is to enable file sharing.


When a user requests the deletion of a shared node, the system cannot simply delete it, otherwise other users would no longer be able to find the node. To handle this, a sharing counter can be attached to each shared node: whenever a new link to the node is added, the counter is incremented by 1; whenever a user requests deletion, the counter is decremented by 1. Only when the sharing count reaches 0 is the node actually deleted; otherwise only the link of the user who requested the deletion is removed.

4.1.5 File sharing

Supporting file sharing is an important function of the file system. File sharing means that different users can use the same file. It saves a great deal of external and main memory space, reduces input/output operations, and makes cooperation between users easier. File sharing does not mean that users may use files without restriction, otherwise the security and confidentiality of files could not be guaranteed; in other words, file sharing must be conditional and controlled. File sharing therefore has to solve two problems: how to implement sharing, and how to control access for the various users who share a file.

Motivation for sharing

  • Different users in a multi-user operating system need to share some files to complete tasks together.
  • Communication between different computers on the network requires the support of the sharing function of the remote file system.

Sharing method based on index nodes (hard link)

In a traditional tree directory, sharing is achieved by having different users' FCBs record the same physical addresses, that is, different directory entries point to the same physical blocks. The problem is that when one directory entry gains a new physical block (new content is appended to the file), the other directory entry is not updated, so the new physical block cannot be shared through both directory entries.

The index node separates the file description information from the FCB into its own data structure, so the physical block information lives in the index node and a directory entry contains only the file name and a pointer to the index node. Two different directory entries need only point to the same index node to achieve sharing; that is, a shared file has only one index node, and if directory entries with different file names are to share the file, each of their pointers simply points to that index node.


An additional count is kept in the index node to record the number of directory entries pointing to it. The count must therefore be checked before a file is deleted: only when the count is 1 is the index node actually deleted; if the count is greater than 1, the count is simply decremented by 1. This method allows a file to be shared under different names, but when the file is shared by multiple users, the file owner cannot delete the file outright.

Use symbolic links to achieve file sharing (soft links)


The method is to create a new type of directory entry called a link. For example, to let user B share one of user C's files, the system creates a new directory entry pointing to that file and places it in user B's directory. The new directory entry contains the path name of the shared file, which can be either an absolute path or a relative path.

When a file is accessed, the directory table is searched; if the directory entry is marked as a link, the name of the real file (or directory) is obtained from it and the directory search continues with that name. A link can be marked by its directory-entry format (or by a special file type); it is essentially an indirect pointer with a name. When traversing the directory tree, the system ignores these links so as to preserve the acyclic structure of the directory.

When symbolic links are used to share a file, only the file owner has a pointer to its index node; other users who share the file have only the file's path name, not a pointer to its index node. This avoids the situation where the owner deletes the shared file and leaves dangling pointers behind. When the owner deletes a shared file and another user later tries to access it through the symbolic link, the access simply fails because the system cannot find the file; the symbolic link can then be deleted without any further impact. A great advantage of the symbolic link method is that it can link (through a computer network) to files on computers anywhere in the world: one only needs to provide the network address of the machine where the file resides and the file's path on that machine.
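
On POSIX systems the two sharing mechanisms described above correspond to the link() and symlink() system calls. The sketch below assumes a file named data.txt already exists in the current directory; it is a minimal demonstration with almost no error handling, not a complete program.

```c
#include <stdio.h>
#include <unistd.h>     /* link(), symlink() */
#include <sys/stat.h>   /* stat(), struct stat */

int main(void)
{
    struct stat st;

    /* Hard link: a second directory entry pointing at the same i-node. */
    if (link("data.txt", "data_hard.txt") != 0)
        perror("link");

    /* Symbolic link: a new file whose content is the path of the target. */
    if (symlink("data.txt", "data_soft.txt") != 0)
        perror("symlink");

    /* The i-node's link count reflects the hard link only; the symbolic
     * link does not touch the target file's i-node at all. */
    if (stat("data.txt", &st) == 0)
        printf("link count of data.txt: %ld\n", (long)st.st_nlink);

    return 0;
}
```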

4.1.6 File protection

File protection is used to protect files from physical damage and illegal access.

Access types

The protection of files can start from restricting the access types to files. The access types that can be controlled include read, write, execute, add, delete, list (listing file names and file attributes), etc. In addition, you can also control the renaming, copying, editing, etc. of files.

Access control

Access control means granting different access types to different users of the same file. Users can be divided by permission into owners, workgroup users, other users, and so on, and different access types are then granted to the different user groups to prevent files from being accessed illegally.

There are usually four access control methods: the access control matrix, the access control list, the user permission table, and passwords and encryption.

The access control matrix, access control list, and user permission table are quite similar: each uses a data structure to record the operation permissions of every user or user group for every file, and these structures are checked on each access to decide whether the user has the corresponding permission. Passwords and encryption are the other access control methods.

With passwords, the user supplies a password when creating a file, and the system records it in the file's FCB; the user must supply the matching password when requesting access. This method has little overhead, but the password is stored inside the system, which is not secure enough.

With encryption, the user encrypts the file, and the key is required when the file is accessed. This method offers strong confidentiality and saves storage space, but encoding and decoding take time.

4.2 File system and implementation

4.2.1 Hierarchical structure of file system


A file system is the collection of software and data in an operating system that is related to file management. From the system's perspective, the file system organizes and allocates the storage space of files, stores files, and protects and retrieves the stored files; concretely, it creates, deletes, reads, writes, modifies, and copies files on behalf of users. From the user's perspective, the file system mainly provides access by name: when the user asks the system to save a named file, the file system stores it at an appropriate location in file storage according to a certain format, and when the user asks to use a file, the system finds the required file in file storage according to the file name supplied by the user.

A reasonable hierarchical structure of the file system can be divided into the user interface, the file directory system, access control verification, the logical file system and file information buffer, and the physical file system.

  • User interface. The operating system usually uses the graphical desktop as an interface. Of course, the black cmd under Windows and the easy-to-use command windows on Linux and Mac are all user interfaces. This user is a broad concept and does not only refer to programmers. For example, to view the contents of file F, you can issue commands to the operating system through interface operations. This is the first layer, the most abstract and top-level user-facing interface, which connects the real world and the virtual world.
  • File directory system. After receiving the command, the operating system searches the directory to obtain the index information of file F. This index information is reached through the FCB or the index node; access by name ultimately yields a pointer to the i-node. A file has one FCB or one i-node (index node). This is the work of the second layer, the file directory system.
  • Access control verification. After finding the FCB, not everyone is qualified to see the F file, and your qualifications need to be checked. The FCB has permission information about whether you can access this file. This is access control verification.
  • Logical file system and file information buffer. After confirming that you can enter, we will start to actually help you find the specific physical address. We should establish such a concept: the operating system usually manages the logical address first, and then obtains the physical address according to the corresponding policy. The function of this part is to obtain the logical address of the corresponding file, and the specific physical address needs to be obtained in the physical file system.
  • Physical file system. This is the underlying implementation, which is divided into two parts: auxiliary storage allocation management and device management. Under UNIX, devices are also files.

4.2.2 Implementation of directory

Linear list

The simplest way to implement a directory is a linear list (array, linked list, and so on) of file names with pointers to the data blocks. When creating a new file, the directory table must first be searched to make sure no file with the same name exists, and a new directory entry is then added at the end of the table. To delete a file, the directory table is searched by the given file name and the space allocated to the file is released. Using a linked structure can reduce the time needed to delete a file. The advantage of this approach is its simplicity; however, because a linear list must be searched sequentially to find a particular entry, it is time-consuming at run time.

Hash table

A hash table computes a value from the file name and returns a pointer to the corresponding element in the linear list. This greatly shortens directory search time, and insertion and deletion are also fairly simple, but some measure is needed to handle collisions (two files with different names hashing to the same value). A characteristic of this method is that the hash table has a fixed length and the hash function depends on the table length.
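
A minimal sketch of this idea: the file name is hashed to a bucket, and each bucket chains the entries whose names collide. The structure and names below are illustrative assumptions, not the layout of any real file system.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 128   /* fixed table length, as noted above */

struct dir_entry {
    char              name[32];   /* file name                            */
    long              inode_no;   /* index of the file's FCB or i-node    */
    struct dir_entry *next;       /* chain of entries whose names collide */
};

static struct dir_entry *buckets[NBUCKETS];

/* Simple string hash; note that the function depends on the table length. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Look up a directory entry by file name; returns NULL if not found. */
struct dir_entry *dir_lookup(const char *name)
{
    struct dir_entry *e = buckets[hash_name(name)];
    while (e != NULL && strcmp(e->name, name) != 0)
        e = e->next;
    return e;
}
```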

4.2.3 Implementation of files

The implementation of files mainly refers to their implementation on external memory, that is, the physical structure of files, which covers external memory allocation methods and file storage space management.

External memory allocation method

The physical structure of a file refers to the storage organization form of a file on external memory, and is related to the external memory allocation method. The external memory allocation method refers to how disk blocks are allocated for files. Different allocation methods will form different physical structures of files.

Generally speaking, external memory can be allocated in two ways: static allocation and dynamic allocation. Static allocation allocates all the required space at once when the file is created; dynamic allocation allocates space as the file grows, possibly one physical block at a time. The size of the allocation unit can also differ: a whole contiguous area can be allocated to hold an entire file, which is the contiguous allocation of files, but file storage space is usually allocated in units of blocks or clusters (a cluster is a fixed number of consecutive physical blocks).

Commonly used external memory allocation methods include continuous allocation, linked allocation, and indexed allocation.

  • continuous allocation

    Continuous allocation is the simplest disk space allocation strategy. It requires that a contiguous disk area be allocated to each file. Before allocation, the user must state the size of the storage space required for the file to be created; the system then searches the free-area management table to see whether there is a free area large enough. If there is, the required storage space is allocated to the file; if not, the file cannot be created and the user process must wait.

    When using the continuous allocation method, records in the logical file can be stored sequentially in adjacent physical disk blocks. The file structure formed in this way is called a sequential file structure, and the physical file at this time is called a sequential file . This allocation method ensures that the order of records in the logical file is consistent with the order of the disk blocks occupied by the files in the storage.

    The advantages of continuous allocation are that the search speed is faster than other methods (only the starting block number and file size are required), and the information about the physical storage location of the file in the directory is also relatively simple. Its main disadvantage is that it is prone to fragmentation and requires regular storage space compaction .

  • Link assignment

    For situations where the file length needs to be dynamically increased or decreased and the user does not know the file size in advance , the allocation strategy of link allocation is often used. There are two implementation solutions.

    • Implicit linking. In this implementation, the pointers that link the physical blocks are placed implicitly inside each physical block: the directory entry holds pointers to the first and last disk blocks of the file, and each disk block contains a pointer to the next disk block. To access a particular disk block, the pointers must be followed block by block starting from the first one, so random access is inefficient; moreover, an error in any one pointer loses the locations of all the following disk blocks, so this implementation is also less reliable.


    • Explicit linking. In this implementation, the pointers to the physical blocks are stored explicitly in a linked table, one table per disk. This table is also called the file allocation table (FAT); operating systems such as MS-DOS, Windows, and OS/2 use the FAT. Since this is still a linking method, finding the physical block of a particular record in the FAT still requires following the chain entry by entry rather than searching randomly. Compared with implicit linking, however, the search takes place in memory instead of on disk, which saves a great deal of time.


      The advantages of link allocation are simplicity (only the starting position is required), and file creation and growth are easy to implement. The disadvantage is that disk blocks cannot be accessed randomly, the link pointer will occupy some storage space, and there are reliability issues.

  • Index allocation

    Although linked allocation solves the problems of continuous allocation, it introduces new ones. First, random access to a record in the file requires following the link pointers sequentially, so searching is slow. Second, the link pointers themselves take up disk space. The index allocation method was introduced to solve these problems: the system allocates an index block to each file and stores the file's index table in it, with each entry of the index table corresponding to one physical block allocated to the file.


    The index allocation method not only supports direct access but also produces no external fragmentation, and the problem of limited file length is solved as well. The disadvantage is that the allocation of index blocks increases the system's storage overhead. For the index allocation method, the choice of index block size is a very important issue: to save disk space, the index block should be as small as possible, but if it is too small it cannot support large files, so some technique must be used to solve this problem. In addition, accessing a file requires two accesses to external memory, first to read the contents of the index block and then to access the specific disk block, which reduces file access speed.

    In order to use the index table more effectively and avoid accessing external memory twice when accessing an indexed file, the index table can first be brought into memory when the file is accessed. In this way, file access requires only one access to external memory.

    When a file is large, its index table will also be large. If the index table exceeds one physical block, the index table itself can be treated as a file and an "index table" can be created for it; this "index table" serves as an index to the file's index, forming a two-level index. The entries of the first-level index table point to second-level index blocks, and the entries of the second-level index tables point to the physical blocks holding the file's data. By analogy, indexes can be built level by level to form a multi-level index.

    • Single-level index allocation. The single-level index allocation method gathers the disk block numbers of a file together: an index block (table) is allocated to the file, and all the disk block numbers assigned to the file are recorded in it, so the index block is simply an array of disk block numbers.


    • Two-level index allocation. When a file is so large that one index block cannot hold all of its block numbers, the index blocks themselves can be indexed to form a two-level index. The index address in the file's directory entry is the block number of the first-level index block; each entry of the first-level index block is the block number of a second-level index block, and the block numbers recorded in the second-level index blocks form the sequence of the file's data blocks (a sketch of this translation follows this list).

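To make the two-level index concrete, the sketch below translates a logical block number into the pair (first-level entry, second-level entry), assuming that each index block holds ENTRIES block numbers. The constant and helper names are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

#define ENTRIES 256   /* assumed number of block-number entries per index block */

/* Translate a logical block number into its position in a two-level index. */
void two_level_index(long logical_block, long *l1_entry, long *l2_entry)
{
    *l1_entry = logical_block / ENTRIES;  /* which second-level index block      */
    *l2_entry = logical_block % ENTRIES;  /* which entry inside that index block */
}

int main(void)
{
    long l1, l2;
    two_level_index(1000, &l1, &l2);
    /* With 256 entries per block, logical block 1000 maps to entry 232
     * of second-level index block 3 (numbering from 0). */
    printf("level-1 entry %ld, level-2 entry %ld\n", l1, l2);
    return 0;
}
```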

File storage space management

In order to realize the management of free storage space, the system should record the status of free storage space in order to implement storage space allocation.

The following introduces several commonly used free storage space management methods.

  • Free file table method

    A contiguous free area on a file storage device can be regarded as a free file (also called a blank file). The free file table method creates a separate directory for all free files, and each free file occupies one entry in this directory; the entry records the physical block number of the first free block and the number of free blocks.


    When a user requests to allocate storage space, the system scans the free file directory in sequence until it finds a free file that meets the requirements. When the user revokes a file, the system reclaims the space occupied by the file. At this time, it is also necessary to sequentially scan the free file directory to find an empty entry, and fill the first physical block number of the released space and the number of blocks it occupies into this entry.

    This free file directory method is similar to the dynamic partition management of memory. When the number of blocks requested exactly equals the number of free blocks in a directory entry, all of those blocks are allocated to the file and the entry is marked empty. If the entry contains more blocks than requested, the excess block numbers remain in the table and the entry is updated. Similarly, when space is released, if the released physical blocks are adjacent to those in an existing entry, the free files are merged. This method works well only when the file storage space contains a small number of free files; if there are many small free files, the free file directory becomes very large and efficiency drops sharply. This management method is suitable only for contiguous files.

  • Free block linked list method

    The free block linked list method links all the free blocks on the file storage device into a free block chain and keeps a head pointer to the first physical block of the chain. When a user creates a file, the required number of free blocks are taken from the head of the chain and allocated to the file. When a file is deleted, its storage space is reclaimed, and the reclaimed free blocks are linked back into the free block list. The free disk blocks in the list can also be replaced by free disk extents (each extent consisting of several consecutive free disk blocks), giving a free extent chain; besides the pointer to the next free extent, each extent must also record its own size. Extents are allocated in a way similar to the dynamic partition allocation of memory, usually with the first-fit algorithm, and when extents are reclaimed they must be merged with adjacent free extents.

  • Bitmap method

    The bitmap method maintains a bitmap for file storage. Each binary bit in the bitmap corresponds to one physical block: a bit value of 1 means the corresponding physical block has been allocated, and 0 means it is free.


    When a request is made to allocate storage space, the system scans the bitmap sequentially and finds the required number of bits whose value is 0; a simple conversion then yields the corresponding disk block numbers, and those bits are set to 1. When storage space is reclaimed, the corresponding bits of the bitmap are simply cleared back to 0 (a minimal C sketch of this scan appears after this list).

    The size of the bitmap is determined by the size of the disk space (the total number of physical blocks). Because the bitmap uses only one binary bit per physical block, it is usually small and can be kept in main memory, which makes allocation and reclamation of storage space fast. When implementing this method, however, conversions are needed between bit positions in the bitmap and disk block numbers.

  • Group linking method (UNIX file storage space management method)

    The group linking method is suitable for large file systems. It divides all the free blocks into groups of 100 blocks each and records the number of disk blocks in each group, together with all of that group's block numbers, in the first disk block of the previous group; the block count and block numbers of the first group are recorded in the superblock. In this way the first disk block of each group links the groups into a chain, while the blocks within a group form a stack: the first block of each group is a stack holding the block numbers of the next group. The stack is a critical resource that only one process may access at a time, so the system protects it with a lock for mutual exclusion.


    • Allocating free blocks. When the system allocates a free block to a file, it first checks the block count of the first group. If more than one block remains, it decrements the free-block count in the superblock by 1 and allocates the block whose number is at the top of the stack. If only one block is left in the first group (the block holding the count and block numbers of the next group, which is not an ordinary free block) and the block number at the top of the stack is not the end mark 0 (meaning this group is not the last one), the contents of that block are first read into the superblock (the next group becomes the new first group, so its block count and block numbers must be placed in the superblock) and the block itself is then allocated (its information is no longer needed, so it becomes a free block). If the block number at the top of the stack is the end mark 0, the disk has no free blocks and the allocation fails.
    • Reclaiming free blocks. When the system reclaims a free block, if the first group has fewer than 100 blocks, the block number of the reclaimed block is simply pushed onto the free-block stack in the superblock and the free-block count is incremented by 1. If the first group already has 100 blocks, the block count and block numbers of the first group are first written into the reclaimed block, and then "block count = 1, top-of-stack block number = the reclaimed block's number" is written into the superblock (the reclaimed block becomes the new first group, and the original first group becomes the second group).

    The group link method occupies a small space, and the super block is not large and can be placed in the memory. This allows most of the work of allocating and recycling free disk blocks to be performed in the memory, improving efficiency.
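
Returning to the bitmap method described earlier, the sketch below scans a bitmap for a free block, converts the bit position into a block number, and marks the block as allocated. The bit convention (1 = allocated, 0 = free) follows the description above; the total block count and word size are illustrative assumptions.

```c
#include <stdint.h>

#define NBLOCKS   4096                     /* assumed total number of physical blocks */
#define WORD_BITS 32

static uint32_t bitmap[NBLOCKS / WORD_BITS];   /* bit = 1: allocated, bit = 0: free */

/* Find a free block, mark it allocated, and return its block number (-1 if full). */
long alloc_block(void)
{
    for (long i = 0; i < NBLOCKS; i++) {
        uint32_t mask = 1u << (i % WORD_BITS);
        if ((bitmap[i / WORD_BITS] & mask) == 0) {   /* a 0 bit means the block is free */
            bitmap[i / WORD_BITS] |= mask;           /* mark it as allocated            */
            return i;                                /* bit position -> block number    */
        }
    }
    return -1;                                       /* no free block left */
}

/* Reclaim a block: clear its bit back to 0. */
void free_block(long blk)
{
    bitmap[blk / WORD_BITS] &= ~(1u << (blk % WORD_BITS));
}
```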

4.3 Disk organization and management

4.3.1 Disk structure

The physical structure of the disk

A disk is a typical direct-access device: the file system can directly access any physical block on it. A disk drive generally consists of several platters that rotate at high speed in a fixed direction. Each disk surface has a corresponding read/write head, and the arm carrying the heads can move along the radius. The concentric circles on a disk surface are called tracks, and each track is divided radially into sectors of equal size. All the tracks at the same distance from the center across the surfaces form a cylinder. Therefore, each physical block on the disk can be identified by a cylinder number, a head number, and a sector number.
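
As a rough illustration of this addressing, the sketch below converts a linear block (sector) number into a (cylinder, head, sector) triple, assuming a fixed number of heads and sectors per track. The geometry constants are invented for illustration; real drives hide their geometry behind logical block addressing.

```c
/* Hypothetical disk geometry, used only for illustration. */
#define HEADS             16   /* disk surfaces, one head per surface */
#define SECTORS_PER_TRACK 63   /* sectors on each track               */

struct chs { int cylinder, head, sector; };

/* Convert a linear block (sector) number into cylinder/head/sector form. */
struct chs block_to_chs(long block)
{
    struct chs a;
    a.cylinder = (int)(block / (HEADS * SECTORS_PER_TRACK));
    a.head     = (int)((block / SECTORS_PER_TRACK) % HEADS);
    a.sector   = (int)(block % SECTORS_PER_TRACK) + 1;  /* sectors are usually numbered from 1 */
    return a;
}
```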


Information in the disk structure

  • Boot control block. Usually it is the first block of the partition. If the partition does not have an operating system, it will be empty.
  • Partition control block. This includes detailed information about the partition, such as the number of blocks in the partition, the size of the blocks, the number of free blocks and pointers, etc.
  • Directory Structure. Use directory file organization.
  • File control block. This includes file information such as file name, owner, file size, and data block location.

Disk access time Ta

Access time Ta = seek time Ts + rotational delay Tr + transfer time Tt

  • Seek time Ts

    After the disk receives a read command, the time needed for the head to move from its current position to the target track is the seek time Ts. It is the sum of the time s needed to start the arm and the time needed for the head to cross n tracks, where m is the time to cross one track:

    Ts = s + n × m

  • Rotational delay Tr

    The time needed for the disk to rotate until the sector containing the data is under the head is the rotational delay Tr. If the rotation speed of the disk is r, then

    Tr = (1/r)/2 = 1/(2r)

    The physical meaning of Tr is the time it takes the disk to rotate half a revolution.

  • Transfer time Tt

    The time needed to read the data from the disk is the transfer time Tt. It depends on the number of bytes b read or written each time and the rotation speed of the disk:

    Tt = b/(rN)

    where r is the rotation speed and N is the number of bytes on a track. A worked example follows this list.
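
Plugging sample numbers into the formulas above: suppose starting the arm takes s = 2 ms, crossing one track takes m = 0.2 ms, the head must cross n = 50 tracks, the disk spins at r = 120 revolutions per second (7200 RPM), a track holds N = 512 KB, and b = 4 KB are transferred. These figures are illustrative only, not measurements of any real drive.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative figures, not taken from any particular drive. */
    double s = 2e-3;          /* arm start-up time (s)                   */
    double m = 0.2e-3;        /* time to cross one track (s)             */
    double n = 50;            /* number of tracks to cross               */
    double r = 120;           /* rotation speed (revolutions per second) */
    double N = 512 * 1024;    /* bytes per track                         */
    double b = 4 * 1024;      /* bytes to transfer                       */

    double Ts = s + n * m;        /* seek time                              */
    double Tr = 1.0 / (2.0 * r);  /* average rotational delay: half a turn  */
    double Tt = b / (r * N);      /* transfer time                          */

    printf("Ta = %.2f ms\n", (Ts + Tr + Tt) * 1e3);  /* prints about 16.23 ms */
    return 0;
}
```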

4.3.2 Scheduling algorithm

A disk is a device that can be shared by multiple processes. When several processes request access to the disk, an appropriate scheduling algorithm should be used to minimize each process's average access time to the disk (mainly the seek time).

First-come, first-served (FCFS) algorithm

The FCFS algorithm is the simplest disk scheduling algorithm: it services requests in the order in which processes ask to access the disk. It is fair and simple, but it does no seek optimization.

Shortest seek time first (SSTF) algorithm

The SSTF algorithm selects as the next request to service the one closest to the track where the head currently sits. Its seek performance is better than that of the FCFS algorithm, but it cannot guarantee the shortest average seek time, and some processes' requests may keep being overtaken by other requests and go unserved for a long time (a phenomenon called "starvation").

Scan (SCAN) algorithm, or elevator scheduling algorithm

The SCAN algorithm selects as the next request to service the one closest to the current head position in the head's current direction of movement. Because the head's movement pattern resembles the operation of an elevator, the algorithm is also called the elevator scheduling algorithm. It has good seek performance and avoids starvation, but it is unfair to some requests: requests at the two ends of the disk are usually served last.
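
A minimal sketch of the elevator idea: service all pending requests in the current direction of head movement, then reverse. The request list, its values, and the assumption that the head starts by moving toward larger track numbers are purely illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Print the order in which SCAN services the pending track requests.
 * head: current head position; the head is assumed to be moving toward
 * larger track numbers first. */
void scan_order(int *req, int n, int head)
{
    qsort(req, n, sizeof(int), cmp_int);      /* sort requests by track number */

    /* First pass: requests at or beyond the head, in increasing order. */
    for (int i = 0; i < n; i++)
        if (req[i] >= head)
            printf("%d ", req[i]);

    /* After reaching the end, reverse: remaining requests in decreasing order. */
    for (int i = n - 1; i >= 0; i--)
        if (req[i] < head)
            printf("%d ", req[i]);
    printf("\n");
}

int main(void)
{
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};  /* pending tracks (example) */
    scan_order(req, sizeof(req) / sizeof(req[0]), 53);
    /* Service order: 65 67 98 122 124 183 37 14 */
    return 0;
}
```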

Circular scan (C-SCAN) algorithm

The C-SCAN algorithm is an improvement on the SCAN algorithm. It stipulates that the head moves in only one direction, for example from the inside outward; when the head reaches the outermost track it immediately returns to the innermost track and scans again, cycling in this way. This eliminates the unfairness toward requests at the two ends of the disk.



4.3.3 Disk management

Disk formatting

A new disk is just a blank platter of magnetic recording material. Before it can store data, it must be divided into sectors that the disk controller can read and write, a process called low-level formatting. Low-level formatting fills the disk with a special data structure for each sector, usually consisting of a header, a data area (typically 512 B), and a trailer; the header and trailer hold information used by the disk controller.

In order to use a disk to store files, the operating system also needs to record its own data structures on the disk .

  • Divide the disk into partitions consisting of one or more cylinders (common C drive, D drive, etc.).
  • To logically format the physical partition (create a file system), the operating system stores the initial file system data structures on the disk, which include free and allocated space and an initially empty directory.

boot block

When the computer starts, it needs to run an initialization program (bootloader), which initializes the CPU, registers, device controllers, memory, etc., and then starts the operating system. To do this, the bootloader should find the operating system kernel on disk, load it into memory, and go to the initial address to start running the operating system.

The bootloader is usually stored in ROM. To avoid having to change the ROM hardware whenever the bootloader code changes, only a small bootstrap loader is kept in ROM, while the full-featured bootloader is stored in the boot block, a fixed location on the disk. A disk with a boot partition is called a boot disk or system disk.

bad sectors

Because disks have moving parts and small tolerances, one or more sectors can easily become damaged. Depending on the disk and controller used, there are various ways to handle these bad blocks.

  • For simple disks, such as IDE (Integrated Drive Electronics) disks, bad sectors can be handled manually. For example, the MS-DOS Format command scans the disk for bad sectors while performing logical formatting; bad sectors are marked in the FAT and therefore are not used by programs.
  • For complex disks, such as Small Computer System Interface (SCSI), the controller maintains a linked list of disk bad blocks. This linked list is initialized during low-level formatting at the factory and is continuously updated throughout the disk's lifetime. Low-level formatting reserves some blocks as spares, transparent to the operating system. The controller can logically replace bad blocks with spare blocks, a scheme called sector sparing.


Origin blog.csdn.net/pipihan21/article/details/129809281