2024 Postgraduate Entrance Examination 408-Operating System Chapter 4-File Management Study Notes


1. File system basics

1.1. First introduction to file management

image-20230716003325187

Five questions :

Question 1: Various files are stored in a computer. What attributes does a file have?

Question 2: How should the data inside the file be organized?

Question 3: How should the files be organized?

Question 4: Looking upward, what functions should the OS provide so that users and applications can use files conveniently?

Question 5: Looking downward, how should file data be stored in external memory (disk)?


1.1.1. File attributes

File name: The file name is chosen by the user who creates the file, mainly so that users can find the file easily. Files with the same name are not allowed in the same directory.

  • The file name alone cannot tell files apart, because files in different directories may share a name. Behind the scenes, the operating system assigns an identifier to each file.

image-20230716004541050

Identifier: Each file's identifier is unique within a system and is not readable by users. The identifier is just an internal name the operating system uses to distinguish files.

Type: Specifies the type of file.

  • The operating system can set a default opening application for different types of files. For example, txt files are opened by Notepad by default.

image-20230716004623378

Location: The path where the file is stored (for users) and its address in external memory (for the operating system, not visible to users).

  • When we double-click to open a file, the operating system needs to read the file data from external storage into memory, so the operating system also cares about the location of the file in external storage.

Size: Specifies the file size.

Creation time, last modification time, and file owner information

Protection information: Access control information that protects the file.

  • The operating system divides users into groups. Users in different groups have different permissions on a file. The protection information lets the operating system protect the file.

image-20230716004637336


1.1.2. How should the data inside the file be organized (unstructured vs. structured)

Unstructured file: Such as a text file, consisting of a binary or character stream, also known as a "stream file".

image-20230716131256882

Structured file: Such as a database table, which consists of a group of similar records, also known as a "record file".

  • A record is a collection of related data items, such as name, student number, and gender.
  • Data items are the most basic data units in files.

image-20230716131414593

Here is the mind map:

image-20230716131539864

In a structured file, how should the records be organized? Should they be stored in sequence, or located through an index table?


1.1.3. How should files be organized?

Windows uses a directory format, with the top level being the root directory:

image-20230716131718953


1.1.4. What functions should the operating system provide upwards?

Practical examples of four system calls:

① Create system call: When we choose New > Folder from the right-click menu in the file manager, the graphical interface actually invokes the create system call behind the scenes.

image-20230716132126677

② Read system call: When we open a text file, its data is read into memory for the CPU to process. This is done through the read system call, which reads the file data from external storage into memory so that it can be displayed on the screen.

③ Write system call: When we edit an open text document, the copy in memory is what is actually modified. When the save button is clicked, Notepad invokes the "write file" function provided by the operating system, that is, the write system call, which writes the file data from memory back to external memory.

image-20230716132139728

④ Delete system call: When we select a file, right-click, and click Delete, the graphical interface invokes the delete system call provided by the operating system, which deletes the file data from external memory.

image-20230716132522294

image-20230716132047968

These are the most basic file functions provided by the operating system. More complex operations, such as copying and cutting files, can be built on top of them.

  • Copy function: involves creating a file, reading the source file into memory, and then writing the data in memory to a new file.
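The copy operation described above can be sketched in Python. This is only an illustration of how create, read, and write combine; the function name `copy_file` and the `chunk_size` parameter are made up for this example, and Python's built-in `open`, `read`, and `write` stand in for the underlying system calls.

```python
def copy_file(src_path: str, dst_path: str, chunk_size: int = 4096) -> int:
    """Copy src_path to dst_path and return the number of bytes copied."""
    copied = 0
    # Opening the destination in "wb" mode creates the new file (the "create" step).
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # "read": external storage -> memory
            if not chunk:
                break
            dst.write(chunk)              # "write": memory -> the new file
            copied += len(chunk)
    return copied
```

Reading in chunks keeps memory use bounded even for large files; a real file manager would also copy attributes such as timestamps, which this sketch ignores.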

1.1.5. Looking from top to bottom, how should files be stored in external storage?

Like memory, external memory is composed of storage units. Each storage unit stores a certain amount of data (such as 1B) and corresponds to a physical address.

image-20230716133817316

External memory is divided into disk blocks. The operating system allocates storage space for files in units of blocks, so even a file of only 10B still occupies a whole 1KB disk block. Data is also read from external memory into main memory in units of blocks.

  • Just as memory is divided into "memory blocks", external memory is divided into "blocks/disk blocks/physical blocks". All disk blocks are the same size, and each block generally contains a power-of-2 number of addresses. In the figure below, a block contains 2^10 addresses, which is 1KB.
  • The number of bits in an address within a block depends on the size of the disk block.

The file logical address can also be divided into: (logical block number, address within the block).

The operating system also needs to convert the logical address into the physical address of the external memory: (physical block number, intra-block address).
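As a small worked example of the (logical block number, address within the block) split, here is a sketch assuming 1KB blocks as in the text. The function names are illustrative only.

```python
BLOCK_SIZE = 1024  # bytes per block (1KB, as in the example above)


def split_logical_address(logical_address: int, block_size: int = BLOCK_SIZE):
    """Return (logical block number, address within the block)."""
    return divmod(logical_address, block_size)


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks allocated to a file of file_size bytes."""
    return -(-file_size // block_size)  # ceiling division


print(split_logical_address(2500))  # (2, 452)
print(blocks_needed(10))            # 1: a 10B file still occupies one block
```

With a block size of 2^10, the low 10 bits of the address are the address within the block and the remaining high bits are the block number.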

Questions to explore: Is the file data placed in contiguous disk blocks? In what order are the disk blocks arranged? How does the operating system manage free disk blocks?

image-20230716133947715

The physical structure of a file concerns how the file data is physically stored and organized. The logical structure of a file, mentioned earlier, concerns how the records of the file are logically organized and related to each other.


1.1.6. Other file management functions that need to be implemented by the operating system

File sharing: Enables multiple users to share a file.

File protection: How to ensure that different users have different permissions on a file.

From here on, the discussion mainly draws on practical examples from the Windows operating system.


Knowledge review and important test points

image-20230716134118940


1.2. Logical structure of files

image-20230716134808728

Glossary :

  • Logical structure: refers to how the data inside the file is organized from the user's perspective.
  • Physical structure: refers to how the file data is placed in external storage from the perspective of the operating system.

Practical example :

  • Logical structure: The linear table in the data structure is a logical structure. From the user's perspective, the linear table is a sequence of elements with a sequential relationship, such as a, b, c, d, e.
  • Physical structure: A linear table can be implemented with different physical structures, such as a sequential list or a linked list. In a sequential list, logically adjacent elements are also physically adjacent; in a linked list, elements do not need to be physically adjacent. The sequential list therefore supports "random access", while the linked list does not.

1.2.1. Unstructured files and structured files (involving fixed-length and variable-length records)

According to whether files have a structure , they can be divided into unstructured files and structured files.

Unstructured file: The data inside the file is a stream of binary data or characters, also known as a "stream file", such as .txt files in Windows.

image-20230716135222348

Structured file: It consists of a group of similar records, also known as a "record file". Each record consists of several data items. For example, in a database table, each record usually has a data item that can serve as a keyword; the student ID in the picture below identifies the different records.

image-20230716135358939

Depending on whether the length (storage space occupied) of each record is equal, it can be divided into : fixed-length records and variable-length records.

①For fixed-length records, as shown below, we can set a specified length for each data item:

image-20230716135632882

②Variable-length records: If some fields in the records we store are of uncertain length, they are called variable-length records:

image-20230716135744715


1.2.2. Sequential files (string structure, sequential structure)

Structured files fall into three categories according to how their records are logically organized: sequential files, index files, and index sequential files.

Sequential file : The records in the file are arranged sequentially (logically) one after another. The records can be fixed-length or variable-length. Individual records can be physically stored sequentially or in a chain.

image-20230716140121285

  • Sequential storage: similar to a sequential table, logically adjacent and physically adjacent.
  • Linked storage: similar to a linked list, logically adjacent but not physically adjacent.

Depending on whether the record is sorted in the order of keywords, the sequential file can be divided into : string structure and sequential structure.

image-20230716140202747

Question: Suppose the starting address of the file, that is, where the first record is stored, is already known.

Question 1: Can the address of the i-th record be found quickly, that is, is random access possible?

Question 2: Can the location of the record with a given keyword be found quickly?

image-20230716141252597

Characteristics of variable-length records: The location of a given record can only be derived from the lengths stored with the records before it.

image-20230716141134764

Characteristics of fixed-length records: The length of each record is fixed, so we can directly perform random access

image-20230716141211206

Conclusion :

1. Physically, chain storage is used, and random access cannot be achieved.

2. Physically, sequential storage is used. Variable-length records cannot achieve random access, while fixed-length records can. With fixed-length records, the string structure cannot quickly locate the record for a given keyword, because the records are not sorted by keyword; the sequential structure can, using binary search.
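The difference between fixed-length and variable-length records can be illustrated with a short sketch. It assumes records are numbered from 0 and stored back to back; the function names are made up for this example.

```python
def fixed_record_offset(i: int, record_length: int) -> int:
    """Offset of record i when every record has the same length: one multiplication."""
    return i * record_length


def variable_record_offset(i: int, lengths: list[int]) -> int:
    """Offset of record i when record lengths differ.

    The lengths of all i preceding records have to be examined first,
    which is why random access is not possible.
    """
    offset = 0
    for length in lengths[:i]:
        offset += length
    return offset


print(fixed_record_offset(3, 32))                    # 96
print(variable_record_offset(3, [10, 25, 7, 40]))    # 42
```

The fixed-length case costs the same no matter how large i is, while the variable-length case grows with i.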

Note: For the exam, "sequential file" generally refers to a file that is physically stored sequentially. Adding/deleting a record is difficult for the sequential structure and relatively simple for the string structure.

  • Why it is difficult for the sequential structure: the records must remain sorted by keyword.
  • Why it is relatively simple for the string structure: the records are not sorted by keyword, so a record can simply be deleted or appended at the end.

1.2.3. Index file (index table)

Motivation: In a file of variable-length records, finding the i-th record requires first going through the previous i-1 records. Yet some scenarios do need variable-length records, so how can a record be located quickly?

Solution: We can design an index table. Each entry of the index table holds an index number, which enables fast retrieval; a length m, which is the length of the corresponding record in the logical file; and a pointer ptr, which points to the corresponding record in the logical file.

  • For example, if each field of an index entry occupies 4 bytes, one index entry is 12 bytes.

image-20230716143005138

The index table itself is a sequential file of fixed-length records, so the index entry for the i-th record can be found quickly. If the keyword is used as the index number and the index entries are sorted by keyword, binary search by keyword is also supported.
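A minimal sketch of an index table over variable-length records follows. The sample records and the names `index_table`, `read_record`, and `find_by_key` are invented for illustration; the entry layout (length, ptr) follows the description above, with the index number being the position in the list.

```python
from bisect import bisect_left

# Logical file: variable-length records stored back to back.
records = [b"alice,20", b"bob,19,transfer", b"carol,21"]
data = b"".join(records)

# Index table: one fixed-size entry (length, ptr) per record.
index_table = []
position = 0
for rec in records:
    index_table.append((len(rec), position))
    position += len(rec)


def read_record(i: int) -> bytes:
    """Fetch record i through its index entry, without scanning earlier records."""
    length, ptr = index_table[i]
    return data[ptr:ptr + length]


# A second index sorted by keyword (here: the name) supports binary search.
keys = sorted((rec.split(b",")[0], i) for i, rec in enumerate(records))


def find_by_key(key: bytes):
    """Binary-search the keyword index; return the record or None."""
    pos = bisect_left(keys, (key, -1))
    if pos < len(keys) and keys[pos][0] == key:
        return read_record(keys[pos][1])
    return None


print(read_record(2))        # b'carol,21'
print(find_by_key(b"bob"))   # b'bob,19,transfer'
```

Because every index entry has the same size, entry i is found by position alone, even though the records themselves differ in length.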

Add/delete function: If you want to add/delete a record, you need to modify the index table.

Application scenario: Because index files have a very fast retrieval speed, they are mainly used in situations that require high timeliness of information processing.

Index tables keyed on data items rather than index numbers: Multiple index tables can be created for the same file, each built on a different data item. For example, for the student table you can create one index table keyed on "student number" and another keyed on "name" (less suitable when names repeat) to retrieve records quickly.


1.2.4. Index sequential file (optimized index file, grouped index table -> multi-level index)

Disadvantage of the index file: Since the index file has one index entry per record, the index table can be very large.

  • For example: if each record of the file occupies 8B and each index table entry occupies 32B, the index table is 4 times the size of the file content itself, so storage utilization is too low.

Idea: An index table is still used, but it no longer has one entry per record of the logical file. Instead, the records are divided into groups according to some characteristic, and each group gets one index entry. Each group is handled like a small sequential file, and the records within a group do not need to be sorted by keyword.

image-20230716144114982

  • For example, if there are 1,000 records and a group of 100 records, then we can divide it into 10 groups, and then the index table will only have ten records.

Index sequential file: It combines the ideas of the index file and the sequential file. An index table is still created for the file, but instead of one index entry per record, one index entry corresponds to a group of records.

Question: This strategy does make the index table smaller, but does retrieval become as slow as in a sequential file of variable-length records?

Prove it with an example :

image-20230716144445769

  • Without grouping: If a sequential file has 10,000 records and is searched sequentially by keyword, an average of 5,000 records must be examined. (Sum the number of comparisons needed to reach each record and divide by the number of records.)
  • With an index sequential file: The 10,000 records can be split into 100 groups of 100. First search the index table sequentially (100 entries, 50 searches on average); once the group is determined, search the records within the group (100 records, 50 searches on average). The total average number of searches drops to 50+50=100.

This raises another question: If the file has 10^6 records divided into 1000 groups of 1000 records each, retrieving a record by keyword still requires an average of 500+500=1000 searches. How can this be reduced?

  • Use a **multi-level index sequential file**.

Optimization idea: Build a two-level index. For a file of 10^6 records, first create a low-level index table with one entry per group of 100 records, giving 10,000 entries. These 10,000 fixed-length entries are then grouped again, 100 per group, and a top-level index is created with one entry per group, giving 100 entries.

image-20230716144919231

In the end, looking up a record in a file of 10^6 records takes only 50+50+50=150 searches on average.
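The averages used in this section can be reproduced with a few lines of arithmetic. This sketch follows the text's approximation that a sequential search over n items examines n/2 of them on average; the function names are illustrative.

```python
def avg_sequential(n: int) -> float:
    """Average searches when scanning n records from the start (approximated as n/2)."""
    return n / 2


def avg_index_sequential(n: int, group_size: int) -> float:
    """One-level index: scan the index table, then scan inside the group."""
    groups = n / group_size
    return groups / 2 + group_size / 2


def avg_two_level(n: int, group_size: int, top_group_size: int) -> float:
    """Two-level index: top-level index, one low-level index group, then one record group."""
    low_entries = n / group_size
    top_entries = low_entries / top_group_size
    return top_entries / 2 + top_group_size / 2 + group_size / 2


print(avg_sequential(10_000))               # 5000.0
print(avg_index_sequential(10_000, 100))    # 100.0
print(avg_index_sequential(10**6, 1000))    # 1000.0
print(avg_two_level(10**6, 100, 100))       # 150.0
```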


Review of knowledge points and important test points

image-20230716145354750


1.3. File directory

image-20230716152200444

image-20230716152431625

image-20230716152612713

To implement the file directory function, a key data structure is required: the file control block (FCB).

As operating systems developed, a variety of directory structures emerged.

Index node: An optimization of the file control block.


1.3.1. File control block

First, when we click on the D drive, we can see that there are many file directories and text documents. In fact, these are all recorded in a file directory table :

image-20230716153430190

When you double-click to open the "photos" folder, the operating system searches the directory file of the D drive's root directory. After finding the directory entry for "photos", it uses the storage location recorded in that entry to read the "photos" directory file from external storage, and so knows what to display for that directory.

Similarly, the directory file for "photos" is itself made up of directory entries. The following is the directory file corresponding to the "photos" directory:

image-20230716153455157

A record in a directory file is a "file control block (FCB)".

  • An ordered collection of FCBs is the so-called "file directory", and one FCB is one file directory entry. Each file corresponds to one FCB.

FCB content: basic file information (file name, physical address, logical structure, physical structure, etc.), access control information (whether it is readable/writable, list of users denied access, etc.), and usage information (such as the creation time and modification time of the file).

FCB function: It implements the mapping between file names and files, so that users (user programs) can "access by name".


1.3.2. Functions of file directories

A file directory supports searching, creating files, deleting files, displaying the directory, and modifying the directory.

What actually happens when we perform corresponding operations on a directory?

image-20230716160044698

  • Search: When a user wants to use a file, the system searches the directory based on the file name and finds the directory entry corresponding to the file.
  • Create a file: When creating a new file, you need to add a directory entry to the directory to which it belongs.
  • Deleting files: When deleting a file, the corresponding directory entry needs to be deleted in the directory.
  • Display directory: Users can request to display the contents of the directory, such as displaying all files and corresponding attributes in the directory.
  • Modify the directory: Certain file attributes are stored in the directory, so changing them requires modifying the corresponding directory entry (e.g. renaming a file).
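The five directory operations listed above can be sketched with a dictionary that maps file names to FCBs. The `FCB` fields and the `Directory` class are hypothetical simplifications for illustration, not how any particular operating system lays out its directory entries.

```python
from dataclasses import dataclass


@dataclass
class FCB:
    name: str
    location: int  # hypothetical: starting block number in external storage
    size: int = 0


class Directory:
    def __init__(self):
        self.entries = {}  # file name -> FCB

    def search(self, name):
        """Search: find the directory entry for a file name (None if absent)."""
        return self.entries.get(name)

    def create(self, name, location):
        """Create a file: add a directory entry; duplicate names are rejected."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = FCB(name, location)

    def delete(self, name):
        """Delete a file: remove its directory entry."""
        del self.entries[name]

    def display(self):
        """Display directory: list the names of all files in it."""
        return sorted(self.entries)

    def rename(self, old, new):
        """Modify directory: renaming changes an attribute stored in the entry."""
        if new in self.entries:
            raise FileExistsError(new)
        fcb = self.entries.pop(old)
        fcb.name = new
        self.entries[new] = fcb
```

The dictionary plays the role of "access by name": every operation starts from a file name and reaches the FCB through the directory.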

1.3.3. Directory structure

① Single-level directory structure

Early operating systems did not support multi-level directories. Only one directory table was created in the entire system, and each file occupied one directory entry.

image-20230716160535254

Duplicate file names are not allowed in a single-level directory structure.

When creating a file : You need to first check whether there are files with the same name in the directory table. Only after confirming that there are no files with the same name can the file be created, and the directory entry corresponding to the new file be inserted into the directory table.

Problem: With many users, file names will inevitably collide, so the single-level directory is not suitable for multi-user operating systems.


②Two-level directory structure

Early multi-user operating systems : adopted a two-level directory structure, divided into a master file directory (MFD, Master File Directory) and a user file directory (UFD, User File Directory).

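Composition: The master file directory records each user name and the location of that user's file directory; each user file directory consists of the FCBs of that user's files.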

image-20230716161237624

Features provided :

  1. Files of different users are allowed to have the same name. Although the names are the same, they refer to different files.
  2. This directory structure can implement access restrictions: when a file is accessed, the system checks whether the requester is the owning user. If not, access is denied.

③Multi-level directory structure (tree directory structure)

image-20230716162645198

Access path : When a user (or user process) wants to access a file, he or she must use the file path name to identify the file. The file path name is a string. Directories at all levels are separated by "/". The path starting from the root directory is called an absolute path .

  • For example: the absolute path of selfie.jpg is "/photos/2015-08/selfie.jpg".

Search process: The system follows the absolute path down one level at a time. ① Read the directory table of the root directory from external storage; ② find the storage location of the "photos" directory in it and read that directory table from external storage; ③ find the storage location of the "2015-08" directory and read its directory table from external storage; finally the storage location of the file "selfie.jpg" is found. The whole process requires three disk reads.

  • Each step reads one directory table from external storage.

Note: Users often access several files in the same directory in a row, such as several photos in the "2015-08" directory. Searching from the root directory every time is clearly inefficient, so a current directory can be set.

Example: If the "photos" directory file has been opened and its directory table is already in memory, it can be set as the "current directory". When users access a file, they can use a "relative path" starting from the current directory.

For example, in Linux "./" denotes the current directory. Only the "photos" directory table already in memory needs to be searched to find the storage location of the "2015-08" directory table; after that table is read from external storage, the location of "selfie.jpg" is known.

  • The relative path is: "./2015-08/selfie.jpg"

Benefits : By using "current directory" and "relative path", the number of disk I/O is reduced, which improves the efficiency of file access.
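The saving in disk reads can be made concrete with a toy model. The `DISK` dictionary below is a hypothetical stand-in for directory tables stored in external memory, and `resolve` counts one disk read for every directory table it has to load; the table of the current directory is assumed to be in memory already.

```python
# Hypothetical layout: each directory table maps a name to the next table (or a file location).
DISK = {
    "/": {"photos": "/photos"},
    "/photos": {"2015-08": "/photos/2015-08"},
    "/photos/2015-08": {"selfie.jpg": "file@block 42"},
}


def resolve(path, current_dir=None):
    """Return (file location, number of directory tables read from disk)."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if path.startswith("/"):
        table_id, reads = "/", 1          # the root table must be read from disk
    else:
        table_id, reads = current_dir, 0  # the current directory table is already in memory
    table = DISK[table_id]
    for name in parts[:-1]:
        table_id = table[name]
        table = DISK[table_id]            # read the next directory table from disk
        reads += 1
    return table[parts[-1]], reads


print(resolve("/photos/2015-08/selfie.jpg"))        # ('file@block 42', 3)
print(resolve("./2015-08/selfie.jpg", "/photos"))   # ('file@block 42', 1)
```

The absolute path costs three reads, matching the search process described earlier, while the relative path from "photos" costs one.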

Existing problems: The tree structure makes it easy to classify files, has a clear hierarchy, and allows files to be managed and protected effectively, but it is not convenient for sharing files.

  • Proposed solution: acyclic graph directory structure.

④Acyclic graph directory structure

image-20230716163518885

Essence: Entries in the directory tables of several users point directly to the same file (or to the same directory, in which case everything in that directory is shared).

How is a shared file deleted when several users' directory entries point to it?

  • A sharing counter is kept for each shared node, recording how many places currently share the node. If two users' directory entries point to the shared file, the counter is 2.
  • When a user asks to delete the node, only that user's FCB is deleted and the sharing counter is decremented by 1. The shared node itself is not deleted directly.

image-20230716163952297

**When are shared files actually deleted?** The node is deleted only when the sharing counter drops to 0.
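The sharing counter can be sketched as simple reference counting. `SharedNode`, `link`, and `unlink` are hypothetical names for this illustration, and each user's directory is modeled as a plain dictionary.

```python
class SharedNode:
    """A shared file node with a sharing counter."""

    def __init__(self, data):
        self.data = data
        self.count = 0        # how many directory entries point here
        self.deleted = False


def link(directory: dict, name: str, node: SharedNode) -> None:
    """Add a directory entry pointing at the shared node."""
    directory[name] = node
    node.count += 1


def unlink(directory: dict, name: str) -> None:
    """Remove one user's entry; free the node only when no entries remain."""
    node = directory.pop(name)
    node.count -= 1
    if node.count == 0:
        node.deleted = True
        node.data = None


user_a, user_b = {}, {}
report = SharedNode("draft")
link(user_a, "report.txt", report)
link(user_b, "shared-report.txt", report)

unlink(user_a, "report.txt")
print(report.count, report.deleted)   # 1 False
unlink(user_b, "shared-report.txt")
print(report.count, report.deleted)   # 0 True
```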

Note : Sharing a file is not the same as copying a file. In a shared file, each user points to the same file, so as long as one of the users modifies the file data, all users can see the changes in the file data.

  • If you copy a file, the files pointed to by each user are independent copies. Modifying the content of a file will not affect the original files of other users.

1.3.4. Index node (improvement of FCB)

The file directory is actually composed of multiple FCBs. Each FCB holds many pieces of information, but while searching a directory only the file name matters; the other fields are not needed until the matching entry is found. Reading every full entry of the directory table during a search therefore wastes a lot of I/O. Can the file names be kept in a compact table, with the remaining information stored elsewhere and read only after the name matches?

  • This is where the index node comes in.

Index node: All information other than the file name is moved into the index node.

Process: The directory file now records only the file names and index node pointers of the files in the directory. We first read this directory file and compare the target name with the file names in it. If a match is found, the index node is read from external storage through the index node pointer, and all other fields of the file can then be read from the index node.

image-20230716164812381

Actual case comparison :

① Without index nodes: Assume an FCB is 64B and the disk block size is 1KB, so each disk block stores 16 FCBs. If a file directory has 640 directory entries, it occupies 40 disk blocks. Searching the directory by file name examines 320 directory entries on average (16x40/2=320), so the disk must be accessed 20 times on average (320/16=20; each disk I/O reads one block).

  • 20 disk I/Os.

② With index nodes: The file name occupies 14B and the index node pointer occupies 2B, so each disk block stores 64 directory entries (file name + index node pointer = 16B, 1KB/16B = 64). A directory with 640 entries then occupies 10 disk blocks. Searching by file name still examines 320 entries on average (640/2=320), so 5 disk blocks must be read on average (320/64=5).

  • 5 disk I/Os.

Process: When the directory entry matching the file name is found, the index node is loaded into memory. The index node records the various attributes of the file, including its storage location in external memory, and the file is finally located through the physical location recorded in the index node.

Conclusion : By using the index node mechanism, only the file name + index node is read each time, which greatly reduces the amount of data read, so our disk IO will be less and the performance will be better.
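The two cases above reduce to the same small calculation, sketched here. The function name is illustrative, and the model assumes, as the text does, that on average half of the directory entries are examined.

```python
def avg_disk_reads(num_entries: int, entry_size: int, block_size: int = 1024) -> float:
    """Average number of disk blocks read when searching a directory by file name."""
    entries_per_block = block_size // entry_size
    avg_entries_examined = num_entries / 2
    return avg_entries_examined / entries_per_block


print(avg_disk_reads(640, 64))  # 20.0 (full 64B FCBs in the directory)
print(avg_disk_reads(640, 16))  # 5.0  (14B name + 2B index node pointer)
```

Shrinking each directory entry from 64B to 16B packs four times as many entries into a block, which is exactly where the fourfold reduction in I/O comes from.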

Noun description :

  • Disk index node: An index node stored in external memory.
  • Memory index node: An index node after it has been loaded into memory. Some information is added to the memory index node, such as whether the file has been modified and how many processes are currently accessing the file.

Knowledge review and important test points

image-20230716173218144


1.4. Physical structure of files

image-20230716173930133


1.4.1. File allocation methods

image-20230716174035341

The main question is how file data should be stored in external memory.


Understand file blocks and disk blocks

First, let’s take a look at the external memory structure. The external memory structure is similar to the main memory. The storage unit in the disk will also be divided into blocks :

image-20230716174611784

Note : In most operating systems, disk blocks are the same size as memory blocks (pages).

The interaction between actual memory and external memory is carried out in units of "blocks", and one block is read/written at a time :

image-20230716174753583

  • In such systems, a disk block is the same size as a memory block.

In memory management, the process logical address space is divided into pages. In external memory management, in order to facilitate the management of file data, the logical address space of the file is also divided into file "blocks" one by one.

The logical address of a file is written as: (logical block number, address within the block)

image-20230716175054467

The operating system allocates storage space for files in units of blocks.

Usage process: The user operates on a file using (logical block number, address within the block). The operating system converts this into (physical block number, address within the block), the location where the data is actually stored.


Method 1: Continuous allocation

The core issue for the physical structure of a file is how to map logical block numbers to physical block numbers.

The idea of continuous allocation: each file is required to occupy a set of consecutive blocks on the disk.

  • For example: file aaa is logically divided into three blocks. If continuous allocation is used, logically adjacent blocks must also be physically adjacent and must occupy a group of consecutive blocks.

image-20230716175858522

For example, to look up (aaa, 2), that is, logical block 2 of file aaa, whose starting block number is 4: the physical block number = 4 + 2 = 6, so the physical block is now determined.

image-20230716180419157

Search process: The user gives the logical block number to be accessed. The operating system finds the directory entry (FCB) of the file, reads the starting block number from it, and computes physical block number = starting block number + logical block number.

  • During the search, the logical block number provided by the user must be checked for validity: logical block number >= length is illegal.
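The mapping for continuous allocation is a single addition plus a bounds check, as this sketch shows. The function and parameter names are illustrative; `start_block` and `length` are the two values the text says are kept in the file's directory entry.

```python
def contiguous_physical_block(start_block: int, length: int, logical_block: int) -> int:
    """Map a logical block number to a physical block number under continuous allocation."""
    if not 0 <= logical_block < length:
        raise ValueError(f"illegal logical block number: {logical_block}")
    return start_block + logical_block


# File aaa: starting block 4, length 3 blocks.
print(contiguous_physical_block(4, 3, 2))  # 6
```

No disk access is needed to compute the mapping, which is why continuous allocation supports random access.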

Advantage 1 : The continuous allocation method supports sequential access and direct access (ie random access).

  • Sequential access: to reach block No. 2, the blocks before it must be visited first.
  • Random access: block No. 2 can be located directly, without first visiting the blocks before it.

Advantage 2 : Using continuously allocated files is the fastest in sequential reading/writing.

When we read the yellow blocks, which are stored contiguously, the magnetic head moves only a short distance and the read takes little time. When we read the purple blocks, which are not contiguous, the head must travel farther to reach each block, and the farther the head moves, the longer it takes.

image-20230716180743647

Disadvantage 1 : Physically, continuously allocated files are inconvenient to expand.

The scenario is as follows: if the file occupying the three yellow blocks needs to grow by one block, the blocks must stay contiguous, so the three blocks of data have to be moved as a whole to the green area, where there is room for the additional block.

image-20230716181255902

Disadvantage 2 : Physically continuous allocation is used, the storage space utilization is low, and hard-to-use disk fragments will be generated.

The scenario is as follows: the disk has two kinds of areas; orange blocks are in use and green blocks are free. None of the free green blocks are adjacent; they are all scattered. If a new file needs 3 blocks, it cannot be allocated, because no 3 free blocks are contiguous, and the scattered free blocks are wasted.

image-20230716181628294

Solution: You can use compaction to deal with fragmentation, but it takes a lot of time.

Continuous allocation summary :

  • Advantages: Supports sequential access and direct access (that is, random access); continuously allocated files are the fastest when accessed sequentially.
  • Disadvantages: Inconvenient file expansion, low storage space utilization, and disk fragmentation.

Method 2: Link method

implicit link

With implicit linking, random access is not supported; a file can only be read sequentially. The corresponding chain is shown in the figure below :

image-20230716194010033

The file directory entry has an additional ending block number field. With implicit linking, both the starting block number and the ending block number of a file are recorded. Reading begins at the starting block and follows the chain until the ending block is reached:

image-20230716194058903

To convert a logical block number to a physical block number : first, the user gives the logical block number to be accessed, and the operating system finds the directory entry (FCB) of the file. It then takes the starting block number from the directory entry and follows the next-block pointer stored in each block, one block at a time, until it reaches the requested logical block.

Number of disk I/Os : Reading logical block number i requires i+1 disk I/Os.

  • For example, in the picture above the chain is 9->2->23->14->16. To read block No. 23 we must first read block No. 9 to obtain the pointer to block No. 2, then read block No. 2 to obtain the pointer to block No. 23, and only then read block No. 23 itself. Block No. 23 is logical block 2 (the third block of the file), so 3 disk I/Os are needed.
  • Note: Reading logical block i requires i+1 disk I/Os because logical block numbers start at 0, so blocks 0 through i (i+1 blocks in total) must all be read.
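The traversal described above can be sketched in C. This is only an illustration under stated assumptions: the Disk structure, make_example_disk, and implicit_lookup are hypothetical names, and the "disk" is an in-memory array in which each element stands for the next-block pointer stored inside a physical block.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS   32
#define END_OF_CHAIN (-1)

/* Hypothetical stand-in for the disk: next_ptr[b] is the "next block"
 * pointer stored inside physical block b. */
typedef struct {
    int next_ptr[NUM_BLOCKS];
} Disk;

/* Builds the chain 9 -> 2 -> 23 -> 14 -> 16 from the example. */
Disk make_example_disk(void)
{
    Disk d;
    for (int b = 0; b < NUM_BLOCKS; b++)
        d.next_ptr[b] = END_OF_CHAIN;
    d.next_ptr[9] = 2;
    d.next_ptr[2] = 23;
    d.next_ptr[23] = 14;
    d.next_ptr[14] = 16;
    return d;
}

/* Returns the physical block holding logical block i, or -1 if the chain
 * has fewer than i + 1 blocks. *io_count receives the number of block
 * reads performed, counting the read of the target block itself. */
int implicit_lookup(const Disk *disk, int start_block, int i, int *io_count)
{
    assert(disk != NULL && io_count != NULL && i >= 0);
    int block = start_block;
    *io_count = 0;
    for (int step = 0; step <= i; step++) {
        if (block < 0 || block >= NUM_BLOCKS)
            return -1;                  /* ran off the end of the chain */
        (*io_count)++;                  /* read this block from disk */
        if (step == i)
            return block;               /* this is logical block i */
        block = disk->next_ptr[block];  /* follow the pointer inside it */
    }
    return -1;                          /* not reached */
}
```

Calling implicit_lookup with start block 9 and i = 2 walks 9, 2, 23 and reports three reads, which agrees with the i+1 rule.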

Conclusion : Files using chain allocation (implicit linking) only support sequential access, not random access, and the search efficiency is low. In addition, the pointer to the next disk block also consumes a small amount of storage space.

Is it convenient to expand the file in this way?

  • If you want to expand the file at this time, you can just find a free disk block, hang it at the end of the file's disk block chain, and modify the FCB of the file.

For example: suppose the file aaa is extended with disk block 8. Only two changes are needed: the next pointer in block 16 is set to 8, and the ending block number in the file's directory entry is changed to 8.

image-20230716195136606

Conclusion : The implicit link allocation method is very convenient for file expansion. In addition, all free disk blocks can be utilized, there will be no fragmentation problems, and the external memory utilization rate is high.

Advantages : It is very convenient to expand files, there will be no fragmentation problems, and the external memory utilization rate is high.

Disadvantages : Only sequential access is supported, random access is not supported, search efficiency is low, and the pointer to the next disk block also consumes a small amount of storage space.


explicit link

Explicit linking: The pointers that link the physical blocks of a file are stored explicitly in a table, the File Allocation Table (FAT). Only one file allocation table is created for a disk. It is read into memory when the computer starts and stays resident in memory.

The format of the file directory entry is as shown in the figure below. Compared with the previous link allocation, it only contains the starting block number:

image-20230716195909723

Example 1 : Assume that a newly created file "aaa" stores disk blocks in order 2->5->0->1, then the corresponding FAT allocation table and disk status are as shown in the figure below

image-20230716200040324

  • In the table, -1 marks the end of the file's chain.

Example 2 : Assume that a newly created file "bbb" is stored in disk blocks 4->23->3.

image-20230716200134728

File allocation table (FAT): The link information of every disk block of every file is recorded explicitly in one and the same file allocation table.

  • This is why it is called explicit linking.

Note : Only one FAT is set for a disk. When the computer is turned on, the FAT is read into the memory and resides in the memory. Each entry of FAT is physically stored continuously, and the length of each entry is the same, so the "physical block number" field can be implicit (default starts in the order 0, 1, 2...).

How to convert the logical block number of a file to the physical block number?

  1. The user gives the logical block number i to be accessed, and the operating system finds the directory entry (FCB) corresponding to the file.
  2. Find the starting block number from the directory entry. If i>0, follow the chain in the in-memory FAT for i steps from the starting block to obtain the physical block number of logical block i. (Converting the logical block number to the physical block number requires no disk read.)

Conclusion : Files using chain allocation (explicit linking) support both sequential access and random access (to access logical block i, there is no need to read logical blocks 0 to i-1 from disk first). Since the block number conversion does not require disk access, access is much faster than with implicit linking.
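A minimal C sketch of the FAT lookup, assuming the FAT is a plain in-memory array indexed by physical block number. The names fill_example_fat and fat_lookup, the table size, and the FAT_UNUSED marker are hypothetical; the two chains are the "aaa" and "bbb" examples above.

```c
#include <assert.h>
#include <stddef.h>

#define FAT_SIZE   32
#define FAT_END    (-1)  /* end-of-file marker, as in the example */
#define FAT_UNUSED (-2)  /* hypothetical marker for blocks in no chain */

/* Fills the FAT with the two example files. */
void fill_example_fat(int fat[FAT_SIZE])
{
    for (int b = 0; b < FAT_SIZE; b++)
        fat[b] = FAT_UNUSED;
    /* aaa: 2 -> 5 -> 0 -> 1 */
    fat[2] = 5;  fat[5] = 0;  fat[0] = 1;  fat[1] = FAT_END;
    /* bbb: 4 -> 23 -> 3 */
    fat[4] = 23; fat[23] = 3; fat[3] = FAT_END;
}

/* Returns the physical block of logical block i, or -1 if the file is
 * shorter than that. Only the in-memory table is consulted, so the
 * conversion itself costs no disk I/O. */
int fat_lookup(const int fat[FAT_SIZE], int start_block, int i)
{
    assert(fat != NULL && i >= 0);
    int block = start_block;
    for (int step = 0; step < i; step++) {
        if (block < 0 || block >= FAT_SIZE)
            return -1;
        block = fat[block];
    }
    return (block >= 0 && block < FAT_SIZE) ? block : -1;
}
```

The loop still walks the chain entry by entry, but every step is a memory access, which is why explicit linking can offer random access at a reasonable cost.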

Advantages : Explicit linking produces no external fragments, files can be expanded easily, external memory utilization is high, and random access is supported. Compared with implicit linking, address conversion does not need to access the disk, so file access is more efficient.

Disadvantages : The file allocation table requires a certain amount of storage space.


Method 3: Index allocation

Understand the principles of index allocation

Index allocation allows files to be allocated discretely in various disk blocks. The system will create an index table for each file, and the index table records the physical blocks corresponding to each logical block of the file.

  • The function of the index table is similar to the page table in memory management: establishing a mapping relationship between logical pages and physical pages.

Terms:

  • Index block: The disk block that stores the index table.
  • Data block: The disk block where the file data is stored.

For the difference between index blocks and data blocks, let's look at the figure below. Here we assume that the data of a newly created file "aaa" is stored in disk blocks 2->15->13->9, and that disk block No. 7 is the index block of "aaa". The index block holds the contents of the index table, which is the table to the right of the black horizontal line.

image-20230716203714576

A data block is a disk block whose physical block number appears in the index table.

Note: With explicit linking, there is one file allocation table (FAT) per disk. With index allocation, there is one index table per file.

Is the length of the physical block number fixed?

  • You can use a fixed length to represent the physical block number. For example: assuming that the total capacity of drive C is 1TB = 2^40 B and the disk block size is 1KB = 2^10 B, there are 2^30 disk blocks in total. Representing every disk block number therefore takes at least 30 bits, so 4B (32 bits) can be used for each disk block number.

  • The logical block number in the index table can be implicit: because every physical block number has the same fixed length, the entry for logical block i sits at a known offset in the index table and can be located directly.

image-20230716204352873
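The arithmetic above can be checked with a short C sketch. The function names are hypothetical; the sketch assumes block numbers are stored in a whole number of bytes and that entry i of the index table starts at byte i × (entry size).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can represent block numbers
 * 0 .. num_blocks - 1. */
unsigned bits_for_block_numbers(uint64_t num_blocks)
{
    assert(num_blocks > 0);
    unsigned bits = 0;
    while (bits < 64 && (UINT64_C(1) << bits) < num_blocks)
        bits++;
    return bits;
}

/* Bytes needed per block number for a disk of disk_bytes bytes that is
 * divided into blocks of block_bytes bytes. */
unsigned bytes_for_block_numbers(uint64_t disk_bytes, uint64_t block_bytes)
{
    assert(block_bytes > 0 && disk_bytes >= block_bytes);
    unsigned bits = bits_for_block_numbers(disk_bytes / block_bytes);
    return (bits + 7) / 8;  /* round up to whole bytes */
}

/* Byte offset of the entry for logical block i inside the index table.
 * The logical block number itself never has to be stored. */
uint64_t index_entry_offset(uint64_t i, unsigned entry_bytes)
{
    return i * entry_bytes;
}
```

For the 1TB disk with 1KB blocks, bits_for_block_numbers gives 30 and bytes_for_block_numbers gives 4, which matches the figures in the text.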

If a file needs one more disk block, is it difficult to expand?

  • It's very simple. We only need to find any free disk block and then append one logical block number -> physical block number record to the end of the index table stored in the file's index block.

image-20230716205128729

How to convert the logical block number of a file to the physical block number?

  1. The user gives the logical block number i to be accessed, and the operating system finds the directory entry (FCB) corresponding to the file...
  2. The storage location of the index table is obtained from the directory entry. The index table is read from external storage into memory, and entry i of the table gives the physical block number of logical block i.
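Once the index table is in memory, the conversion is a single array access. A minimal C sketch, assuming an index table that fits in one 1KB index block with 4B entries; IndexTable, index_lookup, and index_append are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES_PER_INDEX_BLOCK 256  /* 1KB block / 4B per entry */

typedef struct {
    int entries[ENTRIES_PER_INDEX_BLOCK];  /* entries[i]: physical block of logical block i */
    int used;                              /* number of valid entries */
} IndexTable;

/* Returns the physical block of logical block i, or -1 if i is outside
 * the file. No earlier entry has to be visited: this is random access. */
int index_lookup(const IndexTable *table, int i)
{
    assert(table != NULL);
    if (i < 0 || i >= table->used)
        return -1;
    return table->entries[i];
}

/* File expansion: record a newly allocated physical block at the end of
 * the table. Returns 0 on success, -1 if the index block is full. */
int index_append(IndexTable *table, int physical_block)
{
    assert(table != NULL);
    if (table->used == ENTRIES_PER_INDEX_BLOCK)
        return -1;
    table->entries[table->used++] = physical_block;
    return 0;
}
```

With the table for "aaa" holding 2, 15, 13, 9, index_lookup for logical block 2 returns 13.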

Advantages: It can be seen that index allocation can support random access, and file expansion is also easy to implement. You only need to allocate a free block to the file and add an index entry.

Disadvantages: Index tables require a certain amount of storage space.


Three index allocation schemes

Problem: If each disk block is 1KB and an index table entry is 4B, then one disk block can only store 256 index entries. If a file is larger than 256 blocks, one disk block cannot hold the file's entire index table. How can this be solved?

Three solutions: ① Linking scheme. ② Multi-level index. ③ Hybrid index.


Solution 1: Link solution

Linking scheme : If the index table is too large to fit in one index block, multiple index blocks can be linked together for storage.

For example : Assuming that the disk block size is 1KB and one index entry occupies 4B, one disk block can only store 256 index entries.

Linking method : The entries in one index block occupy positions 0-255. The entry at position 255 can be used to store the physical address of the next index block.

image-20230716210036050

Corresponding scenario problem : To access logical block 256 of the file "aaa", we must first read index block No. 7, follow the pointer stored in its last entry (position 255) to the next index block, and only then can we find and read the data block.

  • For example: Assuming that the disk block size is 1KB and one index entry occupies 4B, one disk block can only store 256 index entries. If the size of a file is 256*256KB=65536KB=64MB, the file has 256*256 blocks and therefore 256*256 index entries. Since one index block holds 256 index entries, such an index table needs 256 index blocks (ignoring the space taken by the link pointers), chained together through the pointer at the end of each index block. To reach the last (256th) index block, the first 255 index blocks must be read in order first.

Evaluation : This kind of efficiency is very low. How to solve it?

  • Solution: Multi-level index solution.

Option 2: Multi-level index

Multi-level index: Create a multi-level index (the principle is similar to a multi-level page table). The first-level index block points to second-level index blocks, and third-level and fourth-level index blocks can also be created according to the file size requirements.

image-20230716235014226

  • The first-level index table cannot exceed one index block, and each second-level index table likewise cannot exceed one index block.

For example : Assuming that the disk block size is 1KB and an index entry is 4B, then one disk block can only store 256 index entries. If the file can use two-level indexing, the maximum length of the file can reach 256*256*1KB=65536KB=64MB.

To find the index entries for a given logical block number, for example logical block No. 1026, compute:

1026/256=4
1026%256=2

First read the primary index table into memory and look up entry No. 4, then read the secondary index table it points to into memory and look up entry No. 2 of that table. This gives the disk block number where logical block No. 1026 is stored.

Number of I/Os: 3 disk I/Os required (the primary index block, the secondary index block, and the data block).

Note : If a multi-level index is used, the size of each level index table cannot exceed one disk block.

If a three-level index is used, the maximum file length is 256*256*256*1KB=16GB. At this time, if the target data block is accessed, 4 disk I/Os are required.

Conclusion : If a K-level index structure is used and the top-level index table has not been read into memory, accessing a data block requires K+1 disk read operations.
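The division and remainder steps generalize to any number of levels. The following C sketch uses hypothetical names (max_data_blocks, index_path) and assumes every index block holds the same number of entries and that no index block is cached in memory.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVELS 4

/* Largest number of data blocks a file can have with `levels` index
 * levels, when no index table may exceed one block. */
uint64_t max_data_blocks(unsigned levels, unsigned entries_per_block)
{
    assert(levels >= 1 && levels <= MAX_LEVELS);
    uint64_t blocks = 1;
    for (unsigned l = 0; l < levels; l++)
        blocks *= entries_per_block;
    return blocks;
}

/* Splits a logical block number into one entry index per level.
 * path[0] is the entry in the top-level index table, path[levels - 1]
 * the entry in the lowest-level table. Returns the number of disk reads
 * needed when nothing is in memory yet: one per level plus the data
 * block, that is, levels + 1. */
unsigned index_path(uint64_t logical_block, unsigned levels,
                    unsigned entries_per_block, unsigned path[MAX_LEVELS])
{
    assert(levels >= 1 && levels <= MAX_LEVELS);
    assert(logical_block < max_data_blocks(levels, entries_per_block));
    for (unsigned l = levels; l-- > 0; ) {
        path[l] = (unsigned)(logical_block % entries_per_block);
        logical_block /= entries_per_block;
    }
    return levels + 1;
}
```

For logical block 1026 with two levels of 256 entries, the path is entry 4 in the primary table and entry 2 in the secondary table, with 3 disk reads. With 1KB blocks, max_data_blocks(3, 256) blocks correspond to the 16GB limit quoted above.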

Small question : If a file is small, say only one 1KB data block, but the physical structure uses a two-level index, reading that 1KB still requires three disk read operations. How can this be solved?

  • Solution: Use hybrid indexing.

Option 3: Hybrid index

Hybrid index: A combination of several index allocation methods. For example, the top-level index table of a file can contain direct address entries (pointing directly to data blocks), a first-level indirect entry (pointing to a single-level index table), and a two-level indirect entry (pointing to a two-level index table).

image-20230717000059963

The structure above has eight direct address entries, which point directly to 8 data blocks. The first-level indirect entry points to a single-level index table, which corresponds to at most 256 data blocks. The two-level indirect entry therefore corresponds to at most 256*256 data blocks.

The maximum file length for the structure above is 8+256+256*256=65800 blocks.

Next, calculate the number of I/O required to access different logical blocks :

  • Prerequisite: The top-level index table has not been read into memory.
  • Accessing logical blocks 0-7: two disk reads (the top-level index table, then the data block).
  • Accessing logical blocks 8-263: three disk reads (the top-level index table, the first-level indirect index table, then the data block).
  • Accessing logical blocks 264-65799: four disk reads (the top-level index table, two levels of indirect index tables, then the data block).
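The three ranges can be expressed as a small C function. This is a sketch with hypothetical names (hybrid_max_blocks, hybrid_disk_reads); it assumes one single-level indirect entry and one two-level indirect entry, as in the structure above, and that the top-level index table is not yet in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum file length in blocks for `direct` direct entries plus one
 * single-level and one two-level indirect entry. */
uint64_t hybrid_max_blocks(unsigned direct, unsigned entries_per_block)
{
    assert(entries_per_block > 0);
    return (uint64_t)direct + entries_per_block
         + (uint64_t)entries_per_block * entries_per_block;
}

/* Number of disk reads needed to reach a logical block, or -1 if the
 * block lies beyond the maximum file length. */
int hybrid_disk_reads(uint64_t logical_block, unsigned direct,
                      unsigned entries_per_block)
{
    assert(entries_per_block > 0);
    uint64_t single_limit = (uint64_t)direct + entries_per_block;
    if (logical_block < direct)
        return 2;  /* top-level table + data block */
    if (logical_block < single_limit)
        return 3;  /* + one indirect index table */
    if (logical_block < hybrid_max_blocks(direct, entries_per_block))
        return 4;  /* + two indirect index tables */
    return -1;
}
```

With 8 direct entries and 256 entries per index block, the boundaries fall at blocks 8, 264, and 65800, matching the list above.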

Advantages : For small files, fewer disk reads are required to access the target data block (generally there are more small files in computers).


Index allocation (summary)

image-20230717001047432

Super important test points :

  1. You must be able to calculate the maximum length of the file based on the structure of multi-level indexes and hybrid indexes (key: the index table at each level cannot exceed one block at most).
  2. Able to analyze the number of disk reads required to access a certain data block (key: the FCB holds a pointer to the top-level index block, so the top-level index block can be read according to the FCB. Each lower-level index block then requires one more disk read, in order. In addition, pay attention to the question conditions: whether the top-level index block has already been read into memory).

Knowledge review and important test points

image-20230717001008763

Note: If the question directly mentions link allocation, then the default is implicit link.


1.4.2. File storage space management

image-20230717134751103

image-20230717135101590

It refers to the management of free disk blocks.


1.4.2.1. Division and initialization of storage space

When installing the Windows operating system, a necessary step is to partition the disk, such as C drive, D drive, E drive, etc.

What is a file volume ?

  • File volume: A physical disk is divided into file volumes (also called logical volumes or logical disks).

Question: The file volume is divided into directory area and file area. What are their functions and what data are stored?

A physical disk is divided into multiple file volumes (in practice, partitions), and each file volume contains a directory area and a file area.

  • Directory area: mainly stores file directory information (FCBs) and the information used for disk storage space management.
  • File area: stores file data.

image-20230717135718348

Some systems support very large files and allow a single file volume to be made up of multiple physical disks.


Management method one: free list method

Suitable for continuous allocation implementation.

How to record free areas : Each entry of the free table has two fields. The first is the number of the first free disk block of a free area, and the second is the number of consecutive free blocks starting from that block.

image-20230717142615317

How to allocate disk blocks : This is very similar to dynamic partition allocation in memory management. To allocate contiguous storage space for a file, the first-fit, best-fit, or worst-fit algorithm can be used to decide which free area to allocate to the file.

Example : If the first-fit allocation algorithm is used, a newly created file requires 3 blocks:

image-20230717142950238
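First-fit allocation over a free table can be sketched in C as follows. FreeArea, FreeTable, and first_fit_alloc are hypothetical names, and the table is assumed to be kept sorted by first block number.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AREAS 16

typedef struct {
    int first_block;  /* first free disk block of the area */
    int length;       /* number of consecutive free blocks */
} FreeArea;

typedef struct {
    FreeArea areas[MAX_AREAS];  /* kept sorted by first_block */
    int count;
} FreeTable;

/* First-fit: takes k blocks from the first area that is large enough.
 * Returns the first allocated block number, or -1 if no area fits. */
int first_fit_alloc(FreeTable *t, int k)
{
    assert(t != NULL && k > 0);
    for (int i = 0; i < t->count; i++) {
        if (t->areas[i].length < k)
            continue;
        int start = t->areas[i].first_block;
        t->areas[i].first_block += k;
        t->areas[i].length -= k;
        if (t->areas[i].length == 0) {
            /* The area is used up: remove its entry from the table. */
            for (int j = i; j + 1 < t->count; j++)
                t->areas[j] = t->areas[j + 1];
            t->count--;
        }
        return start;
    }
    return -1;
}
```

As an illustration with made-up table contents, a table holding the areas (0,2), (5,1), (13,2), (18,3), (23,1) satisfies a 3-block request from the area starting at block 18, and that entry then disappears from the table.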

How to recycle disk blocks : As with dynamic partition allocation in memory management, four situations must be handled when an area is reclaimed.

1. There is no adjacent free area before or after the reclaimed area: a new entry is added to the free table.

2. There are free areas both before and after the reclaimed area: the three are merged into one free area, so the free table has one entry fewer.

3. Only the area before the reclaimed area is free: the two are merged, and the number of entries in the free table does not change.

4. Only the area after the reclaimed area is free: the two are merged, and the number of entries in the free table does not change.

Example : If the occupied blocks 15, 16, and 17 are recycled.

image-20230717143133126
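The four recycling cases can be sketched in C. The names FreeArea, FreeTable, and free_table_release are hypothetical, and the sketch assumes the table is sorted by first block number and that the released blocks are not already free.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AREAS 16

typedef struct {
    int first_block;
    int length;
} FreeArea;

typedef struct {
    FreeArea areas[MAX_AREAS];  /* kept sorted by first_block */
    int count;
} FreeTable;

/* Returns `length` blocks starting at first_block to the free table.
 * Returns 0 on success, -1 if a new entry is needed but the table is full. */
int free_table_release(FreeTable *t, int first_block, int length)
{
    assert(t != NULL && first_block >= 0 && length > 0);
    int i = 0;  /* index of the first area that lies after the released one */
    while (i < t->count && t->areas[i].first_block < first_block)
        i++;
    int merge_prev = i > 0 &&
        t->areas[i - 1].first_block + t->areas[i - 1].length == first_block;
    int merge_next = i < t->count &&
        first_block + length == t->areas[i].first_block;

    if (merge_prev && merge_next) {         /* case 2: one entry fewer */
        t->areas[i - 1].length += length + t->areas[i].length;
        for (int j = i; j + 1 < t->count; j++)
            t->areas[j] = t->areas[j + 1];
        t->count--;
    } else if (merge_prev) {                /* case 3: count unchanged */
        t->areas[i - 1].length += length;
    } else if (merge_next) {                /* case 4: count unchanged */
        t->areas[i].first_block = first_block;
        t->areas[i].length += length;
    } else {                                /* case 1: one entry more */
        if (t->count == MAX_AREAS)
            return -1;
        for (int j = t->count; j > i; j--)
            t->areas[j] = t->areas[j - 1];
        t->areas[i].first_block = first_block;
        t->areas[i].length = length;
        t->count++;
    }
    return 0;
}
```

As an illustration with made-up table contents: if the table contains the areas (13,2) and (18,3), releasing blocks 15, 16, and 17 hits case 2 and leaves a single area (13,8).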


Management method two: free linked list method

Understand the two free linked list methods and their differences

The free linked list method comes in two forms : the free disk block chain and the free extent chain.

image-20230717143341715

Difference: One links free space in units of disk blocks, and the other links it in units of extents, as shown in the figure below.

image-20230717143518347

What are the differences between these two methods for disk block allocation and recycling?


①Free disk block chain

The chain head and chain tail pointers are stored in the operating system.

image-20230717143921881

How to allocate : If a file applies for K disk blocks, K free disk blocks will be allocated starting from the chain head, and the chain head pointer of the free chain will be modified.

How to recycle : The recycled disk blocks are hung to the end of the chain in turn, and the chain tail pointer of the free chain is modified.

Usage scenario : Suitable for discretely allocated physical structures.

Disadvantages : Allocating multiple disk blocks to a file may require repeated operations.


②Free extent chain

image-20230717144624101

The operating system stores chain head and chain tail pointers.

How to allocate : If a file applies for K disk blocks, first-fit, best-fit, or a similar algorithm can be used to search from the chain head for a free extent whose size meets the requirement, and that extent is allocated to the file. If no single free extent is large enough, disk blocks from several extents can be allocated to the file together.

  • Note: The corresponding chain pointer, extent size and other data may need to be modified after allocation.

How to recycle : If the recycling area is adjacent to a free extent, the recycling area needs to be merged into the free extent. If the recovery area is not adjacent to any free area, the recovery area will be hung to the end of the chain as a separate free extent.

Applicable scenarios : Applicable to both discrete allocation and continuous allocation, it is more efficient when allocating multiple disk blocks to a file.

Compared with the free disk block chain : The free extent chain is more efficient when allocating multiple disk blocks to a file.

  • A free disk block chain links free blocks one at a time, while a free extent chain links a whole extent at a time.

Management method three: bitmap method

Bitmap: Each binary bit corresponds to a disk block. In this example, "0" means that the disk block is free, and "1" means that the disk block has been allocated.

image-20230717145610853

Generally, consecutive "words" are used to represent the bitmap. In this example a word is 16 bits long and each bit in a word corresponds to one disk block, so a (word number, bit number) pair identifies a disk block number. Some questions describe the same position as (row number, column number).

Important test point : How to convert between (word number, bit number) and disk block number.

The following formulas assume that disk block numbers, word numbers, and bit numbers all start from 0, and that n is the word length :

Formulas :

  • (Word number, bit number) -> disk block number: the bit at (word number, bit number) = (i, j) corresponds to disk block number b = ni + j.
  • Disk block number -> (word number, bit number): disk block b corresponds to word number i = b / n and bit number j = b % n.

Examples of actual mutual conversions are as follows :

image-20230717145954277

How to allocate : If the file requires K blocks, ① scan the bitmap sequentially and find K adjacent or non-adjacent "0"s. ② Calculate the corresponding disk block numbers from the word numbers and bit numbers, and assign those disk blocks to the file. ③ Set the corresponding bits to "1".

How to recycle : ① Calculate the word number and bit number from the number of the disk block being recycled. ② Set the corresponding binary bit to "0".

Applicable scenarios : Suitable for both continuous allocation and discrete allocation.


Management method four: Group link method (applicable to large file systems)

Learn about group chaining

Reason : Free list method and free linked list method are not suitable for large file systems because the free list or free linked list may be too large. The group link method is used in UNIX systems to manage disk free blocks.

Group link method: A disk block in the directory area of the file volume is set aside as the "super block". When the system starts, the super block is read into memory, and the copies of the super block in memory and in external storage must be kept consistent.

image-20230717150459162

The distribution of the corresponding super blocks is shown in the figure below :

image-20230717153054903

In the super block, the first (red) entry records how many free disk blocks the first group contains, here 100. The entries that follow are the numbers of those free blocks, 201-300.

  • One block of each group (block 300 in this example) stores the same kind of information for the following group: the number of free blocks in it and their block numbers.

Note 1 : The block numbers in a group do not need to be consecutive, they can point to different disk blocks discretely.

Note 2 : If there is no next group of free blocks, a special value is stored instead. For example, the -1 in the upper right corner means there are no more free blocks.


How to distribute? (two cases)

Requirement 1: Need a free block

practice:

  1. First check whether the first group has enough free blocks for the request. Because the super block has already been read into memory, no disk read is needed: look at the super block in memory and check whether the number of free disk blocks in the first group is at least the number of blocks requested (here 1<100).
  2. Block 201, the first block listed in the super block, is allocated, and the free block count in the super block is decreased by 1.

image-20230717153948092

Requirement 2: Allocate 100 free blocks

practice:

  1. Check whether the first group has enough blocks: 100=100, enough.
  2. Blocks 201-300 (all 100 free blocks of the first group) are allocated. However, block 300 stores the information about the next group, so its contents (the count and the block numbers 301-400) must first be copied into the super block, which then describes the remaining free blocks.

image-20230717154609706


How to recycle? (two cases)

Recycling case 1 : Assume that each group has a maximum of 100 free blocks. At this time, the first group already has 99 blocks, and one more block needs to be recycled.

Method: The number of the recycled block is added at the last position in the super block, and the free block count is increased by 1.

image-20230717154844457

Recycling case 2 : Assume that each group has a maximum of 100 free blocks. At this time, the first group already has 100 blocks, and one more block needs to be recycled.

Method: The data in the super block is copied into the newly recycled block, and the content of the super block is then modified so that the newly recycled block becomes the first group.

image-20230717155220181
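The allocation and recycling rules can be sketched in C as a UNIX-style stack. This is an illustration under assumptions, not the exact layout of the figures: Group, Volume, group_alloc, and group_free are hypothetical names, the "disk" is an in-memory array, slot 0 of each group is assumed to hold the block that stores the next group's information (so that block is handed out last), and an empty group (count 0) stands in for the "no more free blocks" marker.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_MAX  100  /* at most 100 free blocks per group */
#define NUM_BLOCKS 512  /* hypothetical volume size */

/* Contents of the super block, or of a block that describes the next group. */
typedef struct {
    int count;              /* number of valid entries in blocks[] */
    int blocks[GROUP_MAX];  /* blocks[0] stores the next group's information */
} Group;

typedef struct {
    Group super;                /* in-memory copy of the super block */
    Group on_disk[NUM_BLOCKS];  /* on_disk[b]: group information kept in block b */
} Volume;

/* Allocates one free block, or returns -1 if none is left. */
int group_alloc(Volume *v)
{
    assert(v != NULL);
    if (v->super.count == 0)
        return -1;
    int b = v->super.blocks[--v->super.count];
    if (v->super.count == 0) {
        /* b was the last block of the group: it holds the next group's
         * information, which is copied into the super block first. */
        v->super = v->on_disk[b];
    }
    return b;
}

/* Recycles block b. */
void group_free(Volume *v, int b)
{
    assert(v != NULL && b >= 0 && b < NUM_BLOCKS);
    if (v->super.count == GROUP_MAX || v->super.count == 0) {
        /* The first group is full (or there is none): the super block is
         * copied into b, and b becomes the only member of a new first group. */
        v->on_disk[b] = v->super;
        v->super.count = 0;
    }
    v->super.blocks[v->super.count++] = b;
}
```

If blocks 400 down to 201 are recycled into an empty volume, the super block ends up describing blocks 201-300 with block 300 holding the information for 301-400. Allocation then hands out 201 first, and taking the 100th block (300) copies the next group into the super block, as in the two allocation cases above.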


Review of knowledge points and important test points

image-20230717155417540


1.5. Logical structure vs physical structure

image-20230717104454288

Case 1: Use C language to create unstructured files

Case description

Example : Use C language to create unstructured files

image-20230717105654374


Logical structure (user perspective)

image-20230717105803001

If we want to find the 16th character o, we can read this o based on the 16th position pointed to by the logical address:

image-20230717105921957

  • fp is the file pointer for the file we want to operate on.
  • Each character occupies one byte. fseek can move the file position to 16, that is, to the byte at logical address 16 of the file.

Phenomenon : To the user, all characters are stored contiguously and occupy a contiguous logical address space. We only need to supply the logical address of the character we want in order to access any data in the file.
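A minimal C sketch of this user-level view, using the standard fopen, fputs, fseek, and fgetc functions. The helper names write_text_file and read_char_at are hypothetical, and offsets are counted from 0.

```c
#include <assert.h>
#include <stdio.h>

/* Creates an unstructured (stream) file containing `text`.
 * Returns 0 on success, -1 on failure. */
int write_text_file(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fputs(text, fp) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns the byte stored at logical address `offset` (counted from 0),
 * or EOF if the file cannot be opened or the offset is past the end. */
int read_char_at(const char *path, long offset)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return EOF;
    int c = EOF;
    if (fseek(fp, offset, SEEK_SET) == 0)  /* move to the logical address */
        c = fgetc(fp);                     /* read the byte stored there */
    fclose(fp);
    return c;
}
```

The caller only ever supplies a logical address. Which disk blocks hold the file is never visible at this level.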


Physical structure (from operating system perspective)

image-20230717110332136

The size of a disk block is 1KB. The entire file is split into logical blocks, numbered 0, 1, 2, and 3. The operating system then decides, according to its file management strategy, whether to use contiguous allocation or another allocation method. The disk layout in the picture above is that of contiguous allocation, in which the file is also contiguous in its physical structure.

Let's analyze what happens underneath the C library functions used earlier :

image-20230717110537507

fgetc uses the read system call underneath, and the operating system converts (logical block number, intra-block offset) into (physical block number, intra-block offset).

  • The user supplies a logical address. The logical address passed in through fgetc is converted into (logical block number, offset within the block), and the operating system then maps the logical block number to the corresponding physical block number according to the storage allocation method it uses.
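The first half of this conversion is plain integer arithmetic, sketched below in C with hypothetical names. The second function shows the mapping for contiguous allocation only, as one possible strategy; link and index allocation would replace it with a chain walk or a table lookup.

```c
#include <assert.h>

#define BLOCK_SIZE 1024L  /* 1KB disk blocks, as in the example */

typedef struct {
    long block;   /* logical block number */
    long offset;  /* offset within the block */
} BlockAddress;

/* Splits a logical address into (logical block number, intra-block offset). */
BlockAddress split_logical_address(long logical_address)
{
    assert(logical_address >= 0);
    BlockAddress a = { logical_address / BLOCK_SIZE,
                       logical_address % BLOCK_SIZE };
    return a;
}

/* Contiguous allocation: physical block = starting block + logical block. */
long contiguous_physical_block(long start_block, long logical_block)
{
    assert(start_block >= 0 && logical_block >= 0);
    return start_block + logical_block;
}
```

The intra-block offset is unchanged by the second step, so only the block number needs translating.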

The physical structure can also use link allocation to store the actual data. In that case the operating system follows the chain of blocks, starting from the starting block number recorded in the directory entry, to find the requested logical block:

image-20230717110635179

Index allocation can be used in the same way: the operating system maintains an index table for each file, which records the mapping logical block number -> physical block number.

image-20230717110831490

Note : No matter which allocation method the operating system uses, as long as we provide the logical address we want to access in the file, the operating system can always convert it into the corresponding physical block number and intra-block offset.


Case 2: Creating sequential files in C language

Case description

The following is to create a structure and write multiple structure data to a file:

image-20230717111920691

  • For the Student_info structure defined in the above figure, the occupied space is 64B.

The parameters of fwrite(parameter 1, parameter 2, parameter 3, parameter 4) are as follows :

  • Parameter 1: Pointer indicating the data block that needs to be written to the file. This can usually be a pointer to data in memory.
  • Parameter 2: Indicates the size of each data block (in bytes).
  • Parameter 3: Number of data blocks to be written.
  • Parameter 4: Pointer indicating the file to which data is to be written.

Logical structure (user perspective)

From the user's perspective, each record is stored continuously, and each user record occupies the same space.

Next, let's take a look at how to access the record of student 5 using C library functions :

image-20230717112237783

  • The fseek function moves the file position of fp to the record of student 5; its second parameter is the logical address of that record.
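A C sketch of writing fixed-length records and seeking to one of them. The field layout of Student_info is an assumption chosen so that a record is 64B on typical platforms with a 4-byte int; the helper names write_students and read_student are hypothetical, and records are numbered from 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int  number;
    char name[30];
    char major[30];
} Student_info;  /* 4 + 30 + 30 = 64 bytes with a 4-byte int */

/* Writes `count` records, numbered 0 .. count - 1, one after another. */
int write_students(const char *path, int count)
{
    assert(path != NULL && count >= 0);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        Student_info s;
        memset(&s, 0, sizeof s);
        s.number = i;
        snprintf(s.name, sizeof s.name, "student-%d", i);
        snprintf(s.major, sizeof s.major, "CS");
        /* fwrite(data pointer, size of one item, number of items, file) */
        if (fwrite(&s, sizeof s, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Reads record number `index`. Its logical address is
 * index * sizeof(Student_info) because all records have the same length. */
int read_student(const char *path, int index, Student_info *out)
{
    assert(path != NULL && out != NULL && index >= 0);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long address = (long)index * (long)sizeof(Student_info);
    int ok = fseek(fp, address, SEEK_SET) == 0 &&
             fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Because every record occupies the same space, the logical address of any record can be computed directly, which is what makes random access to a fixed-length sequential file possible.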

Physical structure (from operating system perspective)

In fact, in terms of physical structure, the operating system stores data in physical addresses in blocks:

image-20230717112406316

There are also many options for final data storage: link, index, continuous

image-20230717112504561


Point 1: Sequential files can be stored sequentially and chained.

From a user perspective, we can implement sequential storage and chain storage :

image-20230717114018637

From the perspective of the operating system, the actual physical address can be allocated continuously or chained to be stored on the disk :

image-20230717131233862

PS

  1. Whether the records in the file are organized as a sequential list or as a linked structure is decided by the user who created the file.
  2. Whether the file as a whole (its data blocks at actual physical addresses) is allocated contiguously or by linking is decided by the operating system.

Point 2: Distinguish between index tables created from the user perspective and index tables allocated by operating system indexes

image-20230717131703851

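The structure of this kind of index file is decided entirely by the creator of the file.

Process: first read all of the index entries stored in the first 1MB of the file, then find the index entry of the target student among them. The information in that entry determines the logical address at which the target student's record is stored, and the record can then be read from the actual physical location.

Physical structure: an index file can likewise be allocated space using any of the allocation methods.

Here is an example of an index file that uses index allocation: both the user's view and the operating system's view maintain an index table.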

image-20230717132817837

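Distinguish the index table of an index file from the index table of index allocation:

  • Index table of an index file: built by the user. Mapping: keyword -> logical address where the record is stored.
  • Index table of index allocation: built by the operating system. Mapping: logical block number -> physical block number.

Summary: As users, we can build our own index table at the logical level. In the physical structure, if the operating system uses index allocation, it also builds an index table based on where the data is actually stored. These two tables must be told apart.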


Knowledge review

image-20230717132158629


1.6. Basic file operations

image-20230717160212576

image-20230717160157385


1.6.1. Create files (create system call)

Create file system call : create system call

image-20230717161530716

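Parameters required to initiate the create system call :

  • Parameter 1: The amount of external storage space required, such as one disk block, that is, 1KB.
  • Parameter 2: File storage path, such as "D:/Demo".
  • Parameter 3: File name (the default for a newly created file is "新建文本文档.txt", that is, "New Text Document.txt").

When the operating system handles the create system call, it mainly does two things :

  1. Find the space the file needs in external storage (using a management strategy such as the free linked list method, the bitmap method, or the group link method to find free space).
  2. Find the directory file of the target directory according to the file storage path, and create a directory entry for the file in that directory. The directory entry contains information such as the file name and the location of the file in external storage.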

1.6.2. Delete files (delete system call)

Delete file system call : delete system call

image-20230717163905899

Parameters required to initiate the Delete system call :

  • Parameter 1: File storage path, such as "D:/Demo".
  • Parameter 2: File name, such as "test.txt".

When the operating system handles the Delete system call, it mainly does several things :

  1. Find the corresponding directory file according to the file storage path, and find the directory entry corresponding to the file name from the directory .
  2. According to the file storage location, file size and other information recorded in the directory entry, the disk blocks occupied by the file are recycled . [When recycling, different management strategies such as free list method, free linked list method, and bitmap method can be used for different processing]
  3. Delete the directory entry corresponding to the file from the directory .

1.6.3. Open file (open system call)

Open file system call : open system call

In many operating systems, before operating on a file, the user is required to first use the open system call to "open the file". Several main parameters need to be provided:

  • Parameter 1: File storage path, such as "D:/Demo".
  • Parameter 2: File name, such as "test.txt".
  • Parameter 3: The type of operation to be performed on the file, such as: r read-only, rw read-write, etc.

When the operating system handles the open system call, it mainly does several things :

1. Find the corresponding directory file according to the file storage path, find the directory entry corresponding to the file name in the directory, and check whether the user has the specified operation permission.

image-20230717164519278

2. If the user has operation permission, the directory entry is copied to the "open file table" in memory, and the index of the corresponding entry is returned to the user. The user then uses this index into the open file table to specify the file to operate on.

image-20230717164633098

Once the index in the in-memory open file table has been obtained, the directory no longer has to be searched on every access, which speeds up file access.

Introducing another open file table : the system open file table

System open file table: there is one for the entire system, and it records information about every file currently in use by any process. In addition, each process has its own open file table, which records the files opened by that process.

image-20230717165131857

  • System table index: each entry in a process's open file table contains a system table index, which points to the corresponding entry in the system open file table.
  • Read/write pointer: records the position of the process's read/write operation within the file.
  • Access permission: if the process declared "read-only" when opening the file, it cannot write to the file.

Each entry in the system open file table contains an open counter. Every time a process opens the file, the counter of that entry is incremented by 1. As shown in the figure below, two processes have opened the same file, so its open counter is 2:

image-20230717165403317

The function of the open counter : If we try to delete a file while its open counter is not 0, the system prompts "The file cannot be deleted temporarily". Behind the scenes, the system first checks the system open file table and uses the open counter to confirm whether any process is still using the file.
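
The relationship between the two tables and the open counter can be sketched as follows. This is a simplified model under stated assumptions: `Kernel`, `SystemEntry`, `ProcessEntry`, and the `allowed` field of the directory entry are hypothetical names, and the permission check is reduced to comparing mode letters.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    """One entry of the system-wide open file table."""
    name: str
    fcb: dict            # copy of the directory entry
    open_count: int = 0  # number of process-level entries referring to this file


@dataclass
class ProcessEntry:
    """One entry of a per-process open file table."""
    system_index: int    # index into the system open file table
    mode: str            # access declared at open time, e.g. "r" or "rw"
    offset: int = 0      # read/write pointer


class Kernel:
    def __init__(self, directory):
        self.directory = directory   # file name -> directory entry
        self.system_table = []       # shared by all processes
        self.process_tables = {}     # pid -> list of ProcessEntry

    def open(self, pid, name, mode):
        fcb = self.directory[name]
        if not set(mode) <= set(fcb["allowed"]):
            raise PermissionError(name)
        # Reuse the system entry if the file is already open, else create one.
        for index, entry in enumerate(self.system_table):
            if entry.name == name:
                break
        else:
            self.system_table.append(SystemEntry(name, dict(fcb)))
            index = len(self.system_table) - 1
        self.system_table[index].open_count += 1
        table = self.process_tables.setdefault(pid, [])
        table.append(ProcessEntry(index, mode))
        return len(table) - 1        # index the process uses from now on

    def can_delete(self, name):
        """A file may be deleted only if no process has it open."""
        return all(e.name != name or e.open_count == 0 for e in self.system_table)


kernel = Kernel({"test.txt": {"allowed": "rw", "blocks": [4, 5]}})
fd_a = kernel.open(pid=1, name="test.txt", mode="r")
fd_b = kernel.open(pid=2, name="test.txt", mode="rw")
count = kernel.system_table[0].open_count
```

Both processes get index 0 in their own tables, yet both entries point at the same system entry, whose counter is 2, so `can_delete("test.txt")` is false.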


1.6.4. Close the file (close system call)

Close system call : close system call

When the process has finished using the file and wants to "close the file", the operating system mainly does the following things when processing the Close system call:

1. Delete the corresponding entry in the open file table of the process. For example, process B's entry for the file is deleted as follows:

image-20230717165915745

2. Reclaim the memory space and other resources allocated to the file.

3. Decrement the open counter (count) of the corresponding entry in the system open file table by 1. If count=0, the system open file table entry is deleted as well.
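
A compact sketch of these three steps, using plain dictionaries for both tables. The function name `close_file` and the field names are assumptions made for illustration only.

```python
def close_file(system_table, process_table, fd):
    """Close descriptor `fd`; both tables are plain dicts keyed by index."""
    entry = process_table.pop(fd)                 # steps 1-2: drop the process entry and its resources
    system_entry = system_table[entry["system_index"]]
    system_entry["open_count"] -= 1               # step 3: count - 1
    if system_entry["open_count"] == 0:
        del system_table[entry["system_index"]]   # no process uses the file any more


system_table = {0: {"name": "test.txt", "open_count": 2}}
table_a = {0: {"system_index": 0, "mode": "r"}}
table_b = {0: {"system_index": 0, "mode": "rw"}}

close_file(system_table, table_b, 0)              # process B closes first
count_after_b = system_table[0]["open_count"]
close_file(system_table, table_a, 0)              # then process A closes
```

After process B closes the file the counter is 1; after process A closes it too, the system entry disappears.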


1.6.5. Reading files (read system call)

Read file system call : read system call

Effect : The file data can be read into the memory. When a document is double-clicked, the system's file reading function is triggered, which is the read system call. The file data is read from the external storage into the memory and displayed on the screen.

For example : when you double-click the test text document, the corresponding file directory entry will be found first, and then it will be read into the "Open File Table" of the "Notepad" process.

image-20230717170214652

The process uses the read system call to complete the read operation :

1. The process must indicate ① which file to read, ② how much data to read (for example, 1KB), and ③ where in memory the data that is read should be placed.

  • Indicate which file it is: In systems that support the "open file" operation, you only need to provide the index number of the file in the open file table.

2. When processing the read system call, the data of the user-specified size will be read from the external memory pointed to by the read pointer into the user-specified memory area.
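
Python's standard `os` module exposes thin wrappers around these calls, which makes the behaviour easy to observe. The sketch below uses `os.open`, `os.read`, and `os.close`; the file name and its content are arbitrary sample data. One difference from the description above: `os.read` returns a new bytes object instead of filling a caller-supplied memory area.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")
    with open(path, "wb") as f:
        f.write(b"hello, file system")   # sample content to read back

    fd = os.open(path, os.O_RDONLY)      # open once; later calls use only fd
    first = os.read(fd, 5)               # (1) which file: fd, (2) how much: 5 bytes
    second = os.read(fd, 6)              # continues from where the read pointer stopped
    os.close(fd)
```

The second call returns the next six bytes rather than the start of the file, because the read pointer advanced after the first call.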


1.6.6. File writing function (write system call)

Write file system call : write system call

image-20230717171043426

Function : You can write changed file data back to external storage.

  • For example: Edit the file content in the "Notepad" application. After clicking save, the "Notepad" application uses the "write file" function provided by the operating system, which is the write system call, to write the file data from the memory back to the external storage.

The process uses the write system call to complete the write operation :

1. You need to specify which file (usually the index number of the file table), how much data to write (such as 1KB), and where in the memory the data written back to the external storage is placed.

2. When the operating system processes the write system call, it will write data of the specified size from the memory area specified by the user back to the external memory pointed to by the write pointer.
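
The write pointer can be observed in the same way with `os.write` and `os.lseek` from Python's standard `os` module. The file name and the data are arbitrary; `os.lseek(fd, 0, os.SEEK_CUR)` is used here only to query the current position without moving it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.txt")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    written = os.write(fd, b"abc")            # data comes from a memory buffer
    pointer = os.lseek(fd, 0, os.SEEK_CUR)    # current position of the write pointer
    os.write(fd, b"def")                      # lands where the pointer is
    os.close(fd)

    with open(path, "rb") as f:
        content = f.read()
```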


Knowledge review and important test points

image-20230717171431548

Comparison between opening a file and reading a file: only reading a file actually transfers file data from external storage into memory. For read/write operations, the user does not need to provide the file name or file path, only the file descriptor, that is, the index number in the open file table, from which the physical address of the file can be determined.


1.7. File sharing

image-20230717172110946


1.7.1. Sharing method based on index nodes (hard link)

Index node: a strategy for slimming down the file directory. Since only the file name is needed when searching a directory, all other information can be placed in the index node, so that a directory entry only needs to contain the file name and a pointer to the index node.

image-20230717182951656

Link counter : A link counter count is set in the index node to represent the number of user directory items linked to this index node. If count=2, it means that there are two user directories linked to the index node at this time or that there are two users sharing this file.

Remove link :

  • count=2: Two users currently share this file. If one user wants to "delete" the file, the directory entry for the file is first removed from that user's directory, and the count value of the corresponding index node is decremented by 1.
  • count>0: Other users still need the file, so the file data cannot be deleted yet; otherwise their directory entries would become dangling pointers.
  • count=0: The system deletes the file.

Deleting a shared file from one user's directory looks as follows :

image-20230717183403098
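
On systems whose file system supports hard links, the link counter can be observed directly with `os.link` and the `st_nlink` field returned by `os.stat`. The sketch below assumes such a file system; the names "aaa" and "bbb" are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "aaa")
    second = os.path.join(folder, "bbb")
    with open(first, "w") as f:
        f.write("shared data")

    os.link(first, second)                    # second directory entry, same index node
    count_shared = os.stat(first).st_nlink    # link counter after sharing

    os.remove(first)                          # removes one directory entry only
    count_after = os.stat(second).st_nlink    # link counter after the "delete"
    with open(second) as f:
        still_there = f.read()                # data survives while count > 0
```

Removing "aaa" only lowers the counter from 2 to 1; the data stays reachable through "bbb" until the last link is removed.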


1.7.2. Sharing method based on symbolic chain (soft link)

When a file is shared through a symbolic link, the newly created file is not an ordinary file but a Link-type file, which records the storage path of file 1, such as "C:/User1/aaa":

image-20230717183418765

This type of Link method is similar to the shortcut in Windows.

  • The soft link method does not point its directory entry directly at the file's index node. Instead it creates a new Link-type file that records the storage path of the shared file, and the operating system uses this path to find the shared file.

For example : When User3 accesses "ccc", the operating system sees that "ccc" is a Link-type file, so it searches the directories layer by layer along the path recorded in it, finds the "aaa" entry in User1's directory, and from there finds the index node of file 1.

Actual Windows case: For example, the shortcut icon on our desktop

image-20230717183823410

Deleting the target of a soft link : If the file that a soft link points to is deleted, the soft link file becomes invalid!

Example: If we delete the QQ startup .exe in advance, and then double-click to open the corresponding shortcut, the following prompt will appear

image-20230717183951583
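
The same behaviour can be reproduced with `os.symlink` and `os.readlink` from Python's standard library. This sketch assumes a platform where the current user may create symbolic links (on Windows this can require extra privileges); the names "aaa" and "ccc" follow the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "aaa")
    link = os.path.join(folder, "ccc")
    with open(target, "w") as f:
        f.write("file 1")

    os.symlink(target, link)                    # Link-type file that stores a path
    same_target = os.readlink(link) == target   # the link records the target's path
    with open(link) as f:
        via_link = f.read()                     # the OS follows the recorded path

    os.remove(target)                           # delete the file the link points to
    link_entry_remains = os.path.islink(link)   # the link file itself still exists
    link_usable = os.path.exists(link)          # following it no longer finds a file
```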


Knowledge review and important test points

image-20230717184141630

Note : With the soft link method, every access to a shared file requires searching the directories layer by layer. This search requires disk I/O operations, so accessing a shared file through a soft link is slower than through a hard link.


1.8. File protection

image-20230717184533187

image-20230717185246337


1.8.1. Password protection

Method : Set a "password" for the file, such as abc111232. The user must provide the "password" when requesting access to the file.

Principle : The password is generally stored in the FCB or index node corresponding to the file. The user must enter the "password" before accessing the file, and the operating system compares the password provided by the user with the stored one. If they match, the user is allowed to access the file.

Advantages : There is not much overhead in saving passwords, and the time overhead in verifying passwords is very small.

Disadvantages : The correct "password" is stored inside the system, which is not safe enough.


1.8.2. Encryption protection

Principle : Use a "password" to encrypt the file (for example, if the password is 5 digits long and the file is 100 digits long, the file is encrypted five digits at a time). When accessing the file, the correct "password" must be provided before the file can be decrypted correctly.

File content : The actual content is encrypted.

Practical process : If the user can provide the correct password, the encrypted file can be decrypted into the original data form.

Case 1 : Use the correct password to decrypt, and the final file content is the original data

image-20230717190342943

Case 2 : Using the wrong password resulted in the final decrypted file content being still incorrect.

image-20230717190412454

Advantages : Strong confidentiality, no need to store "password" in the system. When the user wants to view the file, he can use the corresponding password to decrypt it.

Disadvantages : Encoding/decoding, or encryption/decryption, takes a certain amount of time.
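
The notes do not name a specific cipher. As an illustration only, the sketch below assumes a simple XOR with a repeating key, which matches the "encrypt the file one key-length at a time" idea but is not secure in practice. The function name `xor_with_key` and the sample data are hypothetical.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plaintext = b"exam notes"
ciphertext = xor_with_key(plaintext, b"12345")   # what is stored on disk
right = xor_with_key(ciphertext, b"12345")       # correct password restores the data
wrong = xor_with_key(ciphertext, b"54321")       # wrong password gives wrong content
```

The key never has to be stored in the system: decryption with the wrong key does not fail with an error, it simply yields data that differs from the original.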


1.8.3. Access control (understanding streamlined access control)

Implementation method : Add an access control list (Access-Control List, ACL) to the FCB (or index node) of each file, which records the operations each user is allowed to perform on the file.

Process principle : When a user accesses a file, he can query the access control list of the corresponding file to see whether he has permission. If he does not have permission, access is denied.

The access types are as follows :

image-20230717190529200

Each file has a corresponding access control list :

image-20230717190534871

Disadvantages : Some computers have many users, so the access control list may become very large. A streamlined access list solves this problem.

Streamlined access list: permissions are recorded per "group", marking which operations the users of each "group" can perform on the file.

For example, groups: system administrator, file owner, file owner's partners, and other users.

We only need to set the corresponding access permissions for each group:

image-20230717191403425

Query permission process : When a user wants to access a file, the system will check whether the group to which the user belongs has the corresponding access permissions.
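
A minimal sketch of this group-based check. The group names follow the example groups above; the permission sets, the user names, and the function `is_allowed` are made up for illustration.

```python
# Hypothetical streamlined access list: permissions are stored per group, not per user.
file_acl = {
    "administrator": {"read", "write", "execute"},
    "owner": {"read", "write"},
    "partner": {"read"},
    "other": set(),
}

# Hypothetical mapping from user to group.
user_groups = {"root": "administrator", "alice": "owner", "bob": "partner"}


def is_allowed(user, operation):
    """Look up the user's group, then check that group's permissions."""
    group = user_groups.get(user, "other")   # unknown users fall into "other"
    return operation in file_acl[group]
```

For example, `is_allowed("alice", "write")` is true, while `is_allowed("bob", "write")` is false. The list stays at four entries however many users the system has.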


Knowledge review and important test points

image-20230717191651783

Encryption protection is more secure than password protection, but it is more expensive.

Access control is more flexible and divides access rights into multiple types.

Note : If access permissions are controlled for a certain directory, then the same access permissions must be controlled for all files in the directory.


2. File system

2.1. Hierarchical structure of file system

image-20230717192338482


2.1.1. File system is divided from top to bottom

The file system is divided into various levels from top to bottom as shown below :

image-20230717194016863

User interface [Chapter on basic file operations]: The file system needs to provide some simple and easy-to-use functional interfaces to upper-level users. This layer is used to handle system call requests issued by users, such as Read, Write, Open, Close and other system calls.

File Directory System [File Directory Chapter]: Users access files through file paths, so this layer needs to find the corresponding FCB or index node based on the file path given by the user. All management work related to directories and directory items is completed at this layer, such as managing active file directory tables, managing open files, etc.

Access control module [File Protection Section]: In order to ensure the security of file data, it is also necessary to verify whether the user has access rights. This layer mainly completes the related functions of file protection.

Logical file system and file information buffer [Logical structure of file]: The user specifies the file record number that he wants to access. This layer needs to convert the record number into the corresponding logical address.

Physical file system [Section on the physical structure of files]: This layer needs to convert the logical address of the file provided by the previous layer into the actual physical address.

Auxiliary allocation module [File storage space management]: Responsible for the management of file storage space, responsible for allocating and reclaiming storage space.

  • If records are added to the file, new physical blocks must be allocated to it; if records are deleted, some of the physical blocks originally occupied by the file must be reclaimed.

Device management module [Disk Management Chapter]: Interacts directly with the hardware and is responsible for some management tasks directly related to the hardware, such as allocating devices, allocating device buffers, disk scheduling, starting devices, releasing devices, etc.


2.1.2. Case review of all hierarchical structures

Case : Suppose a user requests to delete the last 100 records of the file "D:/working directory/student information.xlsx".

Process :

  1. Users need to make the above request through the interface provided by the operating system [User Interface].
  2. Since the user provides the storage path of the file, the operating system needs to search the directory layer by layer to find the directory item [File Directory System].
  3. Different users have different operating permissions on files, so in order to ensure security, it is necessary to check whether the user has access permissions [access control module/access control verification layer].
  4. After verifying user permissions, the "record number" provided by the user needs to be converted into the corresponding logical address [logical file system and file information buffer].
  5. After knowing the logical address of the target record, it needs to be converted into the actual physical address [physical file system].
  6. To delete these records, you must make a request to the disk device [device management module].
  7. After deleting these records, some disk blocks will be free, so these free disk blocks must be recycled [auxiliary allocation module].

2.2. Global structure (layout) of the file system

image-20230717195120097


2.2.1. Physical formatting

image-20230717195442666

When disks were first produced, they were not divided into sectors.

Physical formatting: low-level formatting, used to divide the disk into sectors, detect bad sectors, and replace bad sectors with spare sectors.

  • Bad sectors are transparent to the operating system: after physical formatting, the disk already knows which sectors are bad. When the operating system tries to access a bad sector, the access is redirected to a spare good sector, and the replacement is completed quietly behind the scenes.

2.2.2. Logical formatting

Logical formatting: divide the disk into partitions. The number of partitions on a disk, the size of each partition, and its address range are all recorded in the partition table.

  • The gray area contains actual data, and the white area is used to save other files and other directories. It is temporarily blank after logical formatting, and can only be slowly filled later by creating and copying files.

image-20230717200551711

  • Note: For the role of "Master Boot Record MBR, Boot Block", you can learn it in conjunction with the "Operating System Boot" section in Chapter 1.

An independent file system can be established in each partition. For example, a UNIX file system can be established in the C drive partition. The corresponding UNIX file system structure is as follows :

  • Boot block: responsible for initializing the operating system when booting.
  • Super block: the super block makes it possible to quickly find free blocks on the disk. [Quickly find several free disk blocks]
  • Free space management: a bitmap can quickly determine whether a specific disk block is currently free. [Quickly determine whether a disk block is free]
  • i-node area: each file has a corresponding index node, and all index nodes are stored contiguously in this area. Because every index node has the same size, the address of an index node can be computed quickly from the index node size and its index (subscript).
  • Root directory: in any file system, creating next-level directories or storing new files starts from the root directory.

Super block and free space management have functional overlap, but there are certain differences in actual use.
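
Because index nodes have a fixed size and are stored contiguously, locating one is plain arithmetic. The sketch below shows this calculation; the function name, the parameter names, and the example values (64-byte index nodes, 1 KB blocks, i-node area starting at block 10) are assumptions, and the index node size is assumed to divide the block size evenly.

```python
def inode_location(inode_number, inode_size, block_size, inode_area_start_block):
    """Return (block number, byte offset within that block) of an index node."""
    byte_offset = inode_number * inode_size            # offset inside the i-node area
    block = inode_area_start_block + byte_offset // block_size
    return block, byte_offset % block_size


# Index node 37 with the assumed layout described above.
location = inode_location(inode_number=37, inode_size=64,
                          block_size=1024, inode_area_start_block=10)
```

Here 37 × 64 = 2368 bytes into the area, which is 2 whole blocks plus 320 bytes, so `location` is `(12, 320)`.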


2.2.3. The structure of the file system in memory and external memory

Memory and external memory structure distribution

Memory is divided into : the user area and the kernel area.

Kernel area

  • Directory caching: recently accessed directory files are cached in memory so that they do not have to be read from disk every time. This speeds up directory retrieval.
  • System open file table: one table for the entire system.
  • User process open file table: one table per process. It records which files the process has opened.

image-20230717202501756


The process behind opening a file using the open system call

The process behind opening a file with the open system call :

image-20230717203855420

  1. When the open function is called, the system first goes to file directory A in external storage and reads the directory into memory, where it is cached as directory M.
  2. At this time, you can check each directory entry, find the FCB and copy it to the system open file table, indicating that the file has been opened and the open count is 1.
  3. The process that initiates the open system call has a process open file table and creates a new entry in the open file table. This entry will record its opening method (permission) and record the corresponding system open file table index.

After the open function call completes, a file descriptor is returned. This file descriptor is the index into the process's open file table. Through it, the corresponding entry in the system open file table can be located, and from that entry the physical address of the file can be determined directly.


The execution flow process of the read function

The execution flow process of the read function :

The file descriptor fd obtained from the open call above identifies an entry in the process open file table in the kernel area. The system table index stored in that entry leads to the corresponding entry in the system open file table, which determines the physical address of the file, and the data is then read from external storage. [The whole process involves only one disk I/O access]

The whole process is shown in the figure below :

image-20230717203704514


2.3. Virtual file system & file system mounting (installation)

image-20230717204252719


2.3.1. No virtual file system: ordinary file system

First look at a normal file system :

image-20230717205129269

We can see that different file systems, such as the UFS, NTFS, and FAT file systems, each provide a different function for opening a file.


2.3.2. Understanding virtual file systems

To perform the same operation on different file systems, we would have to write different function calls, and whenever the underlying file system changed we would have to modify the code. How can this problem be solved?

  • The operating system kernel provides a unified, standard function call interface to upper-level users. For this purpose the virtual file system, abbreviated VFS, is introduced into the operating system.

Virtual file system

image-20230717205704470


2.3.3. Characteristics of virtual file system (four points)

Features of the virtual file system :

1. Provide a unified and standard system call interface to the upper-level user process to shield the implementation differences of the underlying specific file systems.

2. VFS requires the underlying file systems to implement certain specified functions, for example open/write/read. If a new file system wants to be used on a certain operating system, it must meet the VFS requirements of that operating system.

  • Problem solved : If we want to support more file systems, do we have to modify the virtual file system code every time a new one is added? No. An interface specification is defined, and any file system that wants to be used on this operating system must implement that specification.

3. Every time a file is opened, VFS creates a new vnode in main memory and uses this unified data structure to represent the file, no matter which file system the file is stored in.

Different file systems represent directory entries differently. The UFS file system uses inodes, while the directory entries of the FAT file system contain a variety of fields:

image-20230718001541593

In the end, the directory entries need to be read in a uniform way. VFS provides the vnode for this purpose: the attributes of the file's directory entry, whatever its format, are copied into the fields of the vnode. VFS can then use the single data structure vnode to represent the information of any file.

image-20230718001724722

Every time a file is opened, a new vnode is created in the main memory.

  • vnode and inode are two completely different things. A vnode exists only in main memory: every opened file has a corresponding vnode in main memory. An inode is stored in external memory and is also loaded into main memory when needed.

4. The vnode also contains a function pointer, which points to the function list of the specific file system .

The advantage of including such a function pointer: once a file has been opened, any operation on it, such as read or write, can first find the vnode of the file, then follow the function pointer recorded in the vnode to the function list of the corresponding file system, and execute the specific function.

image-20230718002848026
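
The idea of a vnode that carries a pointer to a file-system-specific function list can be sketched as follows. All names here (`Vnode`, `UFS_OPS`, `FAT_OPS`, `vfs_read`) are hypothetical, and the "file system functions" only return a string saying which file system handled the call.

```python
class Vnode:
    """In-memory, file-system-independent description of an open file."""

    def __init__(self, name, size, ops):
        self.name = name
        self.size = size
        self.ops = ops   # "function pointer": table of file-system-specific functions


# Hypothetical per-file-system function lists.
UFS_OPS = {"read": lambda vnode, n: f"ufs read {n} bytes of {vnode.name}"}
FAT_OPS = {"read": lambda vnode, n: f"fat read {n} bytes of {vnode.name}"}


def vfs_read(vnode, n):
    """Uniform entry point: dispatch through the vnode's function list."""
    return vnode.ops["read"](vnode, n)


from_ufs = vfs_read(Vnode("a.txt", 100, UFS_OPS), 10)
from_fat = vfs_read(Vnode("b.txt", 200, FAT_OPS), 10)
```

The caller uses the same `vfs_read` in both cases; which implementation runs is decided entirely by the function list attached to the vnode.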


2.3.4. File system mounting

File system mounting: also called file system installation.

How to mount a file system into the operating system?

Things to do when mounting a file system :

① Register the newly mounted file system in VFS . The in-memory mount table contains information about each file system, including the file system type, capacity, etc.

  • In memory, the virtual file system manages a data structure called the mount table. In this example, the mount table has three entries, which point to the three file systems UFS, NTFS, and FAT respectively. To mount a new file system, a new table entry must be created for it.

image-20230718003753621

② The newly mounted file system must provide a function address list to VFS .

③ Add the new file system to the mount point , that is, mount the new file system under a parent directory.

For example: when a USB flash drive is inserted into a Windows computer, it needs to be mounted into the operating system's virtual file system (in practice it is mounted at a newly assigned drive letter); on a Mac, the mount point is the Volumes directory under the root directory.

image-20230718003441526
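
The three mounting steps can be sketched with a dictionary acting as the mount table. Everything here is hypothetical: the names `mount_table`, `mount`, and `resolve`, the capacities, and the placeholder function lists. Choosing the file system by the longest matching mount point is an assumption of this sketch, not something stated in these notes.

```python
mount_table = {}   # mount point -> registered file system information


def mount(mount_point, fs_type, capacity, ops):
    """Register the file system, keep its function list, attach it to a directory."""
    mount_table[mount_point] = {"type": fs_type, "capacity": capacity, "ops": ops}


def resolve(path):
    """Return the type of the file system whose mount point is the longest prefix of path."""
    matches = [
        point for point in mount_table
        if path == point or path.startswith(point.rstrip("/") + "/")
    ]
    return mount_table[max(matches, key=len)]["type"]


mount("/", "ufs", 500, {"read": None})              # placeholder function list
mount("/Volumes/usb", "fat", 32, {"read": None})    # e.g. a newly inserted USB drive

on_usb = resolve("/Volumes/usb/photo.jpg")
on_root = resolve("/home/notes.txt")
```

Paths under "/Volumes/usb" resolve to the newly mounted FAT file system, while every other path still resolves to the file system mounted at the root.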


Organized by: Long Road Time: 2023.7.16-18

Origin blog.csdn.net/cl939974883/article/details/131773252