【Wang Dao Notes-Operating System】Chapter 4 File Management

1. The concept of files

(1) Definition of a file

  While the system is running, the process is the basic unit of resource scheduling and allocation. For user input and output, the file is the basic unit: it stores information long term so that it can be accessed later. The file system is the part of the operating system that manages files.

  A file can be compared to a book in a library:

  • First, the main body of a book is its content, which corresponds to the data in a file;
  • Second, books of different categories are shelved in different stack rooms and numbered, and the numbers are registered in the library management system so that readers can look them up. This corresponds to the classification and lookup of files;
  • Finally, some out-of-print or relatively expensive foreign-language books can be lent only to VIP members or other readers with higher privileges, while ordinary books can be borrowed by anyone. This corresponds to the access permissions of a file.

  The structure of a file:

  • Data item. Either a basic data item (the smallest logical unit, such as a name or a date) or a combined data item (composed of several basic data items).
  • Record. A set of related data items that describes the attributes of an object in some respect. For example, an examinee's registration record includes the examinee's name, date of birth, school code, ID number, and so on.
  • File. Files are divided into structured files (composed of a group of similar records, such as the registration records of all examinees) and unstructured files (stream files, such as binary files or character files).

  In fact, there is no strict definition of a file. The file can be numbers, letters, or binary codes, and the basic access unit can be bytes, lines, or records. Files can be stored in hard drives or other secondary storage for a long time, allowing controllable shared access between processes, and can be organized into complex structures.

(2) File attributes

  • Name. Unique.
  • Identifier. The unique label of the file, usually a number; it is not human-readable.
  • Type.
  • Location. A pointer to the file's location.
  • Size. The current size of the file (in bytes, words, or blocks); it can also include the maximum size allowed for the file.
  • Protection. Access control information.
  • Time, date, and user ID. Information about file creation, last modification, and last access, used to protect the file and track its use.

  The information of all files is stored in the directory structure, and the directory structure is stored on the external storage. File information is transferred to memory when needed. Generally, directory entries include the file name and its unique identifier, and the identifier locates information about other attributes.

2. File operations

1. File creation and deletion

  • Create file: Creating a file takes two steps. The first is to find space for the file in the file system; the second is to create an entry for the new file in the directory, recording the file name, its location in the file system, and any other relevant information.
  • Delete file: First find the directory entry of the file to be deleted and clear it, then reclaim the storage space occupied by the file (including the file control block, buffers, etc.).

2. Opening of the file

  Many systems require an open system call before a file is used for the first time. The operating system maintains an open-file table. When a file operation is requested, the file is specified by an index into this table, so the directory search is skipped. When the file is no longer in use, the process closes it and the operating system removes the entry from the open-file table.
  If the requested open mode (create, read-only, read-write, append, etc.) is permitted, the process can open the file, and open usually returns a pointer to an entry in the open-file table. All I/O operations then use this pointer instead of the file name, which simplifies the steps and saves resources.
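
  The role of the open-file table can be sketched in Python. This is a toy model, not how any real kernel lays out its tables: the class name OpenFileTable, the dictionary standing in for the on-disk directory, and the file name notes.txt are all assumptions made for illustration.

```python
class OpenFileTable:
    """Toy open-file table: open() returns an index, later calls use only that index."""

    def __init__(self, directory):
        self.directory = directory   # file name -> contents (stand-in for the on-disk directory)
        self.entries = {}            # index -> {"name": ..., "mode": ..., "offset": ...}
        self.next_index = 0

    def open(self, name, mode="r"):
        if name not in self.directory:   # the directory is searched once, at open time
            raise FileNotFoundError(name)
        index = self.next_index
        self.next_index += 1
        self.entries[index] = {"name": name, "mode": mode, "offset": 0}
        return index

    def read(self, index, size):
        entry = self.entries[index]      # no search by name: the index locates the entry
        start = entry["offset"]
        data = self.directory[entry["name"]][start:start + size]
        entry["offset"] += len(data)
        return data

    def close(self, index):
        del self.entries[index]          # the entry is removed when the file is closed


table = OpenFileTable({"notes.txt": b"file management"})
fd = table.open("notes.txt")
first = table.read(fd, 4)     # reads the first 4 bytes
second = table.read(fd, 11)   # continues from the saved offset
table.close(fd)
```

  Only open takes the file name; read and close work purely on the returned index, which is the saving the paragraph above describes.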

3. Commonly used system calls

[Figures: commonly used system calls]

3. Directory structure

1. Absolute path and relative path

  When a user wants to access a file, the file's path name identifies it. The path name is a character string formed by joining, with the separator "/", all the directory names and the data file name on the path from the root directory to the file.
  Absolute path: the path starting from the root directory.
  Relative path: the path from the current directory of the user (process) to the file, with all directory names and the data file name joined by the separator "/". Every file access by a process is relative to the current directory. Setting a current directory speeds up file retrieval.
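
  As a small illustration, Python's standard posixpath module can resolve a relative path against a current directory. The directory and file names below are hypothetical.

```python
import posixpath

current_dir = "/home/alice/os"       # hypothetical current directory of the process
relative = "chapter4/notes.txt"      # relative path: interpreted from current_dir

# Joining the current directory and the relative path gives the absolute path.
absolute = posixpath.normpath(posixpath.join(current_dir, relative))

# Directory names and the file name, in the order they are looked up from the root.
components = [part for part in absolute.split("/") if part]

print(absolute)
print(components)
```

  The relative path names only two components, while the absolute path names five, which is why a current directory shortens the lookup.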

2. File control block

  The file control block (FCB) is a data structure used to store various information needed to control the file to achieve "access by name". An ordered collection of FCBs is called a file directory, and an FCB is a file directory entry. In order to create a new file, the system will allocate an FCB and store it in the file directory as a directory entry.

  The FCB mainly contains the following information:

  • Basic information, such as the file name, the physical location of the file, the logical structure of the file, and the physical structure of the file.
  • Access control information, such as file access permissions, etc.
  • Use information, such as file creation time, modification time, etc.
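
  The three groups of FCB information and "access by name" can be sketched as follows. The field names, the Directory class, and the sample values are assumptions for illustration; real file systems store different and more detailed fields.

```python
from dataclasses import dataclass


@dataclass
class FCB:
    # Basic information
    name: str
    first_block: int      # physical location of the file
    size: int
    # Access control information
    permissions: str
    # Usage information
    created: str
    modified: str


class Directory:
    """A file directory modeled as an ordered collection of FCBs."""

    def __init__(self):
        self.entries = []

    def create(self, fcb):
        if self.lookup(fcb.name) is not None:
            raise FileExistsError(fcb.name)
        self.entries.append(fcb)         # the new FCB becomes a directory entry

    def lookup(self, name):
        for fcb in self.entries:         # sequential search by file name
            if fcb.name == name:
                return fcb
        return None


directory = Directory()
directory.create(FCB("a.txt", first_block=120, size=2048, permissions="rw-",
                     created="2020-10-30", modified="2020-10-30"))
found = directory.lookup("a.txt")
missing = directory.lookup("b.txt")
```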

3. Several directory structures

  • Single-level directory structure

  Only one directory table is created in the entire file system, and each file occupies a directory entry.

  The single-level directory structure implements "access by name", but searching is slow, duplicate file names are not allowed, and sharing files is inconvenient. It is not suitable for multi-user systems.

  • Two-level directory structure (to solve the problem of duplicate names)

  The two-level directory structure solves the problem of duplicate file names among multiple users, and the file system can enforce access restrictions on directories to ensure security. However, the two-level directory structure lacks flexibility and cannot classify files.

  • Multi-level directory structure (UNIX system) (clear hierarchy)

  The tree-shaped directory structure can easily classify files, and the hierarchical structure is clear, and it can also manage and protect files more effectively. However, when searching for a file in the tree-shaped directory, it is necessary to access the intermediate nodes level by level according to the path name, which increases the number of disk accesses, which will undoubtedly affect the query speed.

  • Acyclic graph directory structure (realize sharing)

  On the basis of the multi-level directory structure, the acyclic graph directory structure facilitates the sharing of files, but makes the management of the system more complicated (for example, the deletion of shared nodes requires attention).

4. File sharing and file protection

1. File sharing

(1) Hard link file sharing

  Hard-link sharing uses the index node method. A directory entry in the tree structure holds only the file name and a pointer to the corresponding index node. To share a file, a pointer to the file's index node is placed in the directory of each sharing user. The index node also holds a link count, count, which gives the number of user directory entries linked to this index node (that is, to this file). When count > 1, the file owner cannot delete the file, as shown in the figure.

[Figure: hard-link file sharing through index nodes]

  Advantages: a file can be shared under different names.
  Disadvantages: the file owner cannot delete a file that is shared with others.

(2) Soft link (symbolic link) file sharing

  When symbolic links are used for file sharing, only the owner of the file has a pointer to its index node. When user B wants to share file F of user A, the system creates in user B's directory a new file of type LINK that contains only the path name of the shared file F. This linking method is called a symbolic link.
  Advantages: the file owner can delete a file that is shared with others.
  Disadvantages: when other users read the shared file, the system must look up each component of the path name in turn, which makes access expensive.
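
  The difference between the two kinds of link can be modeled in a few lines of Python. This is a simplified in-memory sketch: the Inode and FileSystem classes and the paths /a/F1, /b/F2, /b/F3 are hypothetical, and hard_link assumes its target is a regular file rather than a symbolic link.

```python
class Inode:
    def __init__(self, data):
        self.data = data
        self.count = 1                   # link count: directory entries pointing here


class FileSystem:
    def __init__(self):
        # path -> ("inode", Inode) for regular entries, ("link", target_path) for symbolic links
        self.entries = {}

    def create(self, path, data):
        self.entries[path] = ("inode", Inode(data))

    def hard_link(self, target, path):
        _, inode = self.entries[target]  # assumes target is a regular entry
        inode.count += 1                 # the new entry points at the same index node
        self.entries[path] = ("inode", inode)

    def soft_link(self, target, path):
        self.entries[path] = ("link", target)   # stores only the path name

    def delete(self, path):
        kind, value = self.entries.pop(path)
        if kind == "inode":
            value.count -= 1             # data can be reclaimed only when count reaches 0

    def read(self, path):
        kind, value = self.entries[path]
        if kind == "link":
            return self.read(value)      # follow the stored path name; KeyError if it is gone
        return value.data


fs = FileSystem()
fs.create("/a/F1", "data")
fs.soft_link("/a/F1", "/b/F2")
fs.hard_link("/a/F1", "/b/F3")
fs.delete("/a/F1")

f3_data = fs.read("/b/F3")               # still readable through the hard link
f3_count = fs.entries["/b/F3"][1].count  # link count after deleting /a/F1
try:
    fs.read("/b/F2")
    dangling = False
except KeyError:
    dangling = True                      # the symbolic link now points at nothing
```

  After the original entry is removed, the hard link still reaches the data because the index node survives, while the symbolic link fails because the path name it stores no longer exists.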

2. File protection

  File protection is achieved through password protection, encryption protection, and access control. Among them, password protection and encryption protection are used to prevent user files from being accessed or stolen by others, while access control is used to control how users access files.

5. File implementation

1. Management of disk non-free blocks: file allocation method

[Figures: file allocation methods]

2. Management of free disk blocks: file storage space management

Skipped for now.

6. Disk organization and management

1. The structure of the disk

[Figure: structure of the disk]

2. Disk read and write time

  One disk read and write operation time = seek time + delay time + transmission time.

  • Seek time (usually the largest component): the time needed to move the head to the target track. Ts = m×n + s, where n is the number of tracks crossed, m is the time to cross one track, and s is the start-up time of the disk arm.
  • Delay time: the time needed for the target sector to rotate under the head, on average half a revolution. Tr = 1/(2r), where r is the disk rotation speed.
  • Transmission time: the time needed to read from or write to the disk, determined by the number of bytes b and the rotation speed r. Tt = b/(rN), where N is the number of bytes on a track. The processing time of sector data has a greater impact.
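
  The three formulas can be combined in a short Python sketch. The drive parameters below are hypothetical and chosen only to keep the arithmetic simple; times are in seconds and r is in revolutions per second.

```python
def seek_time(m, n, s):
    """Ts = m*n + s: cross n tracks at m seconds each, plus arm start-up time s."""
    return m * n + s


def rotational_delay(r):
    """Tr = 1/(2r): on average half a revolution at r revolutions per second."""
    return 1 / (2 * r)


def transfer_time(b, r, N):
    """Tt = b/(rN): transfer b bytes from a track that holds N bytes."""
    return b / (r * N)


# Hypothetical parameters, not taken from any real drive.
ts = seek_time(m=0.0001, n=100, s=0.002)
tr = rotational_delay(r=100)
tt = transfer_time(b=10_000, r=100, N=100_000)
total = ts + tr + tt

print(f"seek={ts:.3f}s delay={tr:.3f}s transfer={tt:.3f}s total={total:.3f}s")
```

  With these numbers the seek time is the largest of the three terms, which matches the remark in the first bullet.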

3. Disk scheduling algorithm

  Several common scheduling algorithms are as follows:

[Figures: common disk scheduling algorithms]

4. Measures to reduce delay time: alternate (interleaved) sector numbering

  The disk rotates continuously. After the head reads or writes one physical block, a short processing time passes before it can start reading or writing the next block. Numbering sectors alternately, so that logically adjacent sectors are physically separated, effectively reduces the delay time of consecutive accesses.
  The transmission time is determined by the disk itself, and the seek time and delay time can be optimized by the operating system through algorithm scheduling.


Appendix: Wang Dao multiple-choice question notes

1. In the UNIX operating system, input/output devices are regarded as (special files).

2. The logical structure of a file is designed for the user's convenience, while the physical structure is designed around the characteristics of the storage medium and the needs of operating system management.

3. A user process reads data from a disk file through the read system call. Of the following statements about this process, the correct ones are (AB).
A. If the file's data is not in memory, the process enters the sleeping (blocked) state.
B. Requesting the read system call switches the CPU from user mode to kernel mode.
C. The parameters of the read system call should include the file name. (Incorrect: the parameters of open include the file's path name and file name, while read only needs the file descriptor returned by open.)
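
The point of option C can be seen with Python's os module, whose os.open, os.read, and os.close functions wrap the corresponding system calls. The file name demo.txt and its contents are made up for this sketch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello, file system")

    fd = os.open(path, os.O_RDONLY)   # open takes the path name and returns a descriptor
    try:
        first = os.read(fd, 5)        # read takes only the descriptor and a byte count
        second = os.read(fd, 2)       # continues from the offset kept for the open file
    finally:
        os.close(fd)
```

The path name appears only in the call to os.open; every later call identifies the file by the integer descriptor.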

4. In the UNIX operating system, the index structure of a file is stored in the (index node).
The UNIX system uses a tree directory structure.

5. Let the current reference count of file F1 be 1. First create a symbolic link (soft link) F2 to F1, then create a hard link F3 to F1, and then delete F1. The reference counts of F2 and F3 are then (1, 1) respectively.
When a symbolic link is created, the reference count is copied directly; when a hard link is created, the reference count is incremented by 1. Deleting a file is invisible to its symbolic links, which does not affect the file system; when the file is later accessed through a symbolic link and found not to exist, the symbolic link is simply deleted. A file with hard links, however, cannot be deleted directly: the reference count is decremented by 1, and if the result is not 0, the file cannot be deleted because other hard links still point to it.
When F2 is created, the reference counts of F1 and F2 are both 1. When F3 is then created, the reference counts of F1 and F3 both become 2. When F1 is deleted, the reference count of F3 becomes 2-1=1, and the reference count of F2 is unchanged.

6. If f2 is a hard link of file f1, and two processes open f1 and f2 respectively, obtaining file descriptors fd1 and fd2, then the correct statements below are (BC).
A. The read/write pointers of f1 and f2 stay at the same position.
B. f1 and f2 share the same in-memory index node.
C. fd1 and fd2 each point to an entry in their own process's user open-file table.

7. Access to a file is jointly restricted by (user access rights and file attributes).

8. System-level security management includes registration and login.

9. In a file system, the users of each file are divided into 4 categories: security administrator, file owner, file owner's partner, and other users. Access permissions are divided into 5 types: full control, execute, modify, read, and write. If a binary bit string in the file control block represents file permissions, then to represent the access permissions of each category of user to a file, the number of bits describing file permissions must be at least (20).
Each of the 4 user categories needs one bit for each of the 5 permission types, so 4×5=20 bits are required.

10. Suppose a file index node has 7 address items: 4 are direct address indexes, 2 are first-level indirect address indexes, and 1 is a second-level indirect address index. Each address item is 4B. If disk index blocks and disk data blocks are both 256B, the maximum length of a single file is (1057KB).
Each disk index block and disk data block is 256B, so each disk index block holds 256/4=64 address items. The 4 direct address indexes point to 4×256B of data. The 2 first-level indirect indexes contain 2×(256/4) direct address indexes, which point to 2×(256/4)×256B of data. The second-level indirect index contains (256/4)×(256/4) direct address indexes, which point to (256/4)×(256/4)×256B of data. The 7 address items therefore address 4×256+2×(256/4)×256+(256/4)×(256/4)×256=1082368B=1057KB in total.
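
The sum can be checked with a few lines of Python. The variable names are illustrative only.

```python
BLOCK = 256                      # size of a disk index block and of a disk data block, in bytes
ADDR = 4                         # size of one address item, in bytes
per_block = BLOCK // ADDR        # address items per index block

direct = 4 * BLOCK                        # 4 direct address items
single = 2 * per_block * BLOCK            # 2 first-level indirect address items
double = 1 * per_block ** 2 * BLOCK       # 1 second-level indirect address item

total_bytes = direct + single + double
total_kb = total_bytes // 1024

print(per_block, direct, single, double, total_bytes, total_kb)
```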

11. In a file system, an FCB occupies 64B, a disk block is 1KB, and a single-level directory is used. Assuming the file directory has 3200 directory entries, finding a file takes (100) disk accesses on average.
The 3200 directory entries occupy 3200×64B/1KB=200 disk blocks. The directory entries (each one an FCB) are searched in order, so the average number of blocks read from a single-level directory is half the number of disk blocks, that is, 200/2=100 disk accesses.
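
The same arithmetic as a minimal Python sketch, following the approximation used in the explanation (half of the directory blocks on average). The variable names are illustrative.

```python
ENTRIES = 3200        # directory entries, one FCB each
FCB_SIZE = 64         # bytes per FCB
BLOCK = 1024          # bytes per disk block

fcbs_per_block = BLOCK // FCB_SIZE                 # FCBs that fit in one disk block
directory_blocks = ENTRIES // fcbs_per_block       # disk blocks holding the directory
average_accesses = directory_blocks // 2           # half of the blocks on average

print(fcbs_per_block, directory_blocks, average_accesses)
```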

12. After the search is completed using the sequential search method, the logical address of the file can be obtained.

13. A record file uses linked allocation. Its logical records have a fixed length of 100B, and records are blocked and unblocked when stored on disk. The disk block length is 512B. If the file's directory entry has already been read into memory, then modifying the 22nd logical record starts the disk (6) times.
The 22nd logical record ends at byte 22×100=2200, and 2200/512=4 with remainder 152, so it lies past 4 full physical blocks, in the fifth physical block. Because the file is a linked file, the blocks must be read in order starting from the first physical block that the directory entry points to; the address of the fifth block is obtained only when the fourth block is read. Reading therefore starts the disk 5 times. The modified block must also be written back; its physical address is already known, so the write-back needs only one more disk access. The disk is started 6 times in total.
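
The count can be reproduced with a short Python sketch that follows the explanation's arithmetic. The variable names are illustrative, and the sketch assumes the record lies entirely inside one block, which holds for record 22.

```python
RECORD = 100          # bytes per logical record
BLOCK = 512           # bytes per disk block
record_no = 22        # records are numbered from 1

full_blocks, remainder = divmod(record_no * RECORD, BLOCK)
# If the record ends exactly on a block boundary, it is in the last full block.
target_block = full_blocks + 1 if remainder else full_blocks

reads = target_block  # linked allocation: follow the chain from the first block
writes = 1            # write the modified block back; its address is already known
total_starts = reads + writes

print(full_blocks, remainder, target_block, total_starts)
```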

14. The index node of a file stores 10 direct index pointers, 1 first-level index pointer, and 1 second-level index pointer. The disk block size is 1KB and each index pointer occupies 4B. If the file's index node is already in memory, then reading into memory the disk blocks containing file offsets 1234 and 307400 (byte-addressed) requires (1, 3) disk block accesses respectively.
The 10 direct index pointers address 10×1KB=10KB of data. Each index pointer occupies 4B, so each disk block holds 1KB/4B=256 index pointers. The first-level index pointer addresses 256×1KB=256KB of data, and the second-level index pointer addresses 256×256×1KB=2^16KB=64MB of data.
With byte addressing, offset 1234 satisfies 1234B<10KB, so the address of its disk block comes from a direct index pointer. The index node is already in memory, so the address is available immediately and only 1 disk access is needed.
Offset 307400 satisfies 10KB+256KB<307400B<64MB, so it lies in a disk block reached through the second-level index pointer. The index node is already in memory, so 2 disk accesses obtain the address of the disk block holding the data, and 1 more access reads it: 3 disk accesses in total.
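
The lookup rule can be written as a small Python function. The function name disk_accesses and the constants are illustrative; the sketch assumes, as the question does, that the index node is already in memory.

```python
BLOCK = 1024                     # bytes per disk block
PTR = 4                          # bytes per index pointer
DIRECT = 10                      # number of direct index pointers
per_block = BLOCK // PTR         # index pointers per disk block


def disk_accesses(offset):
    """Number of disk blocks read to fetch the data block that holds a byte offset."""
    block_no = offset // BLOCK               # logical block number within the file
    if block_no < DIRECT:
        return 1                             # data block only
    block_no -= DIRECT
    if block_no < per_block:
        return 2                             # one index block, then the data block
    block_no -= per_block
    if block_no < per_block ** 2:
        return 3                             # two index blocks, then the data block
    raise ValueError("offset is beyond the maximum file size")


answers = (disk_accesses(1234), disk_accesses(307400))
print(answers)
```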

15. A file system uses a bitmap to record the allocation of disk space. The bitmap is stored in disk blocks 32 to 127. Each disk block is 1024B, and disk blocks and the bytes within a block are numbered from 0. If the disk block to be released is number 409612, the number of the disk block holding the bitmap bit to be modified and the byte number within that block are (82, 1) respectively.
Bitmap disk block number = starting block number + [disk block number/(1024×8)] = 32+[409612/(1024×8)] = 32+50 = 82, where [x] denotes rounding down. The question asks for the byte number within the block, not the bit number, so the bit position must be divided by 8 (1B=8 bits): byte number within the block = [(disk block number % (1024×8))/8] = [12/8] = 1.
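
A short Python sketch of the same calculation. The function name locate is illustrative, and the third return value (the bit position inside the byte, counted from 0) goes beyond what the question asks; it is included only to show where the remainder ends up.

```python
BLOCK_BYTES = 1024       # bytes per disk block
BITMAP_START = 32        # first disk block that stores the bitmap


def locate(block_no):
    """Return (bitmap disk block, byte within that block, bit within that byte)."""
    bits_per_block = BLOCK_BYTES * 8
    bitmap_block = BITMAP_START + block_no // bits_per_block
    bit_in_block = block_no % bits_per_block
    return bitmap_block, bit_in_block // 8, bit_in_block % 8


bitmap_block, byte_in_block, bit_in_byte = locate(409612)
print(bitmap_block, byte_in_block, bit_in_byte)
```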

16. Among the following options, the data structure that can be used for file system management of free disk blocks is (ACD).
A. Bitmap B. Index node C. Free disk block chain D. File allocation table (FAT)

17. The disk is a time-sharing shared device, but at most one job can start it at each moment.

18. CDs, U disks, and magnetic disks can be accessed both sequentially and randomly.
The tape can only be accessed sequentially.

19. Files on disk are stored in units of blocks, and they are read and written in units of blocks as well.

20. The system keeps accessing one track of the disk and does not respond to access requests for other tracks. This phenomenon is called arm stickiness. Among the following disk scheduling algorithms, the one that does not cause arm stickiness is (A).
A. First-come, first-served (FCFS)
B. Shortest seek time first (SSTF)
C. Scanning algorithm (SCAN)
D. Cyclic scanning algorithm (CSCAN)
When requests for one track keep arriving, they keep satisfying the selection conditions of shortest seek time first, the scanning algorithm, and the cyclic scanning algorithm, so those algorithms keep serving that track. First-come, first-served schedules requests in arrival order, which is fairer, so the answer is A.
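
To make the difference between arrival order and nearest-first order concrete, here is a minimal Python sketch of FCFS and SSTF. The starting head position and the request queue are hypothetical example values, and the function names are illustrative; SCAN and CSCAN are not modeled.

```python
def fcfs(head, requests):
    """First-come, first-served: serve the requests in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Shortest seek time first: always serve the pending request nearest the head."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def total_movement(head, order):
    """Total number of tracks the head crosses while serving the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


start = 100                                             # hypothetical starting track
queue = [55, 58, 39, 18, 90, 160, 150, 38, 184]         # hypothetical request queue

fcfs_order = fcfs(start, queue)
sstf_order = sstf(start, queue)
fcfs_total = total_movement(start, fcfs_order)
sstf_total = total_movement(start, sstf_order)
print(fcfs_order, fcfs_total)
print(sstf_order, sstf_total)
```

SSTF moves the head far less here, but it picks requests by distance, so a request far from the head can keep being passed over while nearer ones arrive. FCFS never reorders, which is why it cannot stick.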

21. The cluster size and disk sector size of a file system are 1KB and 512B respectively. If a file is 1026B, the disk space the system allocates to it is (2048B).
To improve disk access time, most operating systems allocate space in clusters. A 1026B file needs 2 clusters, that is, 2×1KB=2048B, so the answer is D.
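
The rounding can be sketched in Python. The function name allocated_bytes is illustrative; the sector-based figure is shown only for comparison and is not the answer to the question.

```python
def allocated_bytes(file_size, unit_size):
    """Space allocated when a file is rounded up to whole allocation units."""
    units = -(-file_size // unit_size)      # integer ceiling division
    return units * unit_size


by_cluster = allocated_bytes(1026, 1024)    # allocation in 1KB clusters
by_sector = allocated_bytes(1026, 512)      # what sector-sized allocation would give
print(by_cluster, by_sector)
```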

Origin blog.csdn.net/Tracycoder/article/details/109393972