Operating system: file management, file directories, physical file structure, file sharing, and file protection

File management

Computers store many kinds of files. What attributes does a file have? How should the data inside a file be organized? How should the files themselves be organized?

File attributes

What attributes does a file have?

File name: chosen by the user who creates the file, mainly so that the user can find the file easily. Two files in the same directory may not have the same name.

Identifier: each file's identifier is unique within a system and is not readable by users. The identifier is only an internal name that the operating system uses to distinguish files.

Type: the type of the file.

Location: the path where the file is stored (used by the user) and its address in external storage (used by the operating system, invisible to the user).

Size: the size of the file.

Other information: creation time, last modification time, and file owner information.

Protection information: access control information used to protect the file.

The organization structure of the internal data of the file

Files are organized like the Composite pattern from design patterns: each file is a leaf node and each directory is a parent node.

Functions provided by the operating system to files

Create file (create system call)

Delete file (delete system call)

Read file (read system call)

Write file (write system call)

Open file (open system call)

Close file (close system call)
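
The six operations above can be tried from Python, whose `os` module wraps the corresponding operating system calls. This is only a minimal sketch: the file name `demo.txt` and the helper `demo_file_calls` are made up for illustration, and the exact system calls behind each function depend on the platform.

```python
import os
import tempfile

def demo_file_calls(directory):
    """Create, write, read, close, and delete one file.

    Returns the bytes that were read back and whether the file
    still exists after deletion.
    """
    path = os.path.join(directory, "demo.txt")  # hypothetical file name
    # "create" and "open": O_CREAT creates the file if it does not exist.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"hello")        # "write"
        os.lseek(fd, 0, os.SEEK_SET)  # move back to the start before reading
        data = os.read(fd, 5)         # "read"
    finally:
        os.close(fd)                  # "close"
    os.remove(path)                   # "delete"
    return data, os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    result = demo_file_calls(tmp)
    print(result)  # (b'hello', False)
```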

How to store files in external storage

External storage resembles memory: it too is made up of storage units, each of which holds a fixed amount of data (such as 1B) and has its own physical address.

The operating system allocates storage space to files in units of "blocks", so with 1KB blocks even a file of only 10B still occupies a whole 1KB disk block. Data in external storage is also read into memory one block at a time.

Just as memory is divided into "memory blocks", external storage is divided into "blocks" (also called disk blocks or physical blocks). All disk blocks are the same size, and a block generally contains a power-of-2 number of addresses (in this example, 2^10 addresses, that is, 1KB). Likewise, a file's logical address can be split into (logical block number, offset within the block), and the operating system must convert it into an external storage physical address of the form (physical block number, offset within the block). The number of bits in the offset depends on the disk block size.
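
The arithmetic behind these statements is short enough to sketch. The function names below are illustrative, and the 1KB block size is the one assumed in the example above.

```python
BLOCK_SIZE = 1024  # 1KB blocks, as assumed in the text

def split_logical_address(logical_address, block_size=BLOCK_SIZE):
    """Split a logical address into (logical block number, offset in block)."""
    return divmod(logical_address, block_size)

def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of whole blocks a file of file_size bytes occupies."""
    return -(-file_size // block_size)  # ceiling division

def offset_bits(block_size=BLOCK_SIZE):
    """Bits needed for the offset when block_size is a power of 2."""
    return (block_size - 1).bit_length()

print(split_logical_address(2500))  # (2, 452)
print(blocks_needed(10))            # 1
print(offset_bits())                # 10
```

Converting the logical block number into a physical block number is the job of the allocation scheme; the offset within the block is carried over unchanged.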

File Directory

When we double-click "Photo", the operating system finds the directory entry (that is, the record) whose key is "Photo" in this directory table, then reads the "Photo" directory's information from external storage into memory, so that the contents of the "Photo" directory can be displayed.

Each record in a directory table is a "file control block" (FCB).

An ordered collection of FCBs is called a "file directory", and one FCB is one file directory entry. An FCB contains the file's basic information (file name, physical address, logical structure, physical structure, etc.), access control information (whether the file is readable/writable, the list of users who are denied access, etc.), and usage information (such as the file's creation time, modification time, etc.).
The most basic and important of these are the file name and the physical address where the file is stored.
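
As a rough illustration, an FCB can be modelled as a record and a directory as a list of such records that is searched by file name. The field selection and the block numbers below are assumptions made for the sketch; a real FCB holds more fields in a fixed on-disk layout.

```python
from dataclasses import dataclass

@dataclass
class FCB:
    """A simplified file control block (only a few of the fields listed above)."""
    name: str
    physical_address: int  # starting block number, hypothetical values below
    readable: bool = True
    writable: bool = True

def find_entry(directory, name):
    """Search a directory (a list of FCBs) for the entry with the given name."""
    for fcb in directory:
        if fcb.name == name:
            return fcb
    return None

root = [FCB("Photo", 120), FCB("Music", 300, writable=False)]
print(find_entry(root, "Photo").physical_address)  # 120
print(find_entry(root, "Video"))                   # None
```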

Multi-level directory

When a user (or user process) wants to access a file, the file is identified by its path name. The path name is a string in which the directories at each level are separated by "/". A path that starts from the root directory is called an absolute path.

For example: the absolute path of selfie.jpg is "/photo/2015-08/selfie.jpg"

The system follows the absolute path and finds the next directory level by level. First it reads the root directory's directory table from external storage; after finding the storage location of the "photo" directory, it reads the corresponding directory table from external storage, and so on down the path, until it finds the storage location of the file "selfie.jpg". Each directory table read is one disk I/O operation, so the whole process requires 3 disk read I/O operations.

Users often access several files in the same directory one after another (for example, viewing several photos in the "2015-08" directory in succession). Searching from the root directory every time is clearly inefficient, so a "current directory" can be set.

For example, suppose the directory file of "photo" has already been opened, that is, its directory table has been loaded into memory. It can then be set as the "current directory". When a user wants to access a file, a "relative path" starting from the current directory can be used.

In Linux, "." denotes the current directory, so if "photo" is the current directory, the relative path of "selfie.jpg" is "./2015-08/selfie.jpg". Starting from the current directory, the system only needs to query the "photo" directory table already in memory to learn the storage location of the "2015-08" directory table, then read that directory table from external storage to learn the storage location of "selfie.jpg".

As this shows, introducing the "current directory" and "relative paths" reduces the number of disk I/O operations, which improves the efficiency of file access.
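
The difference in disk reads can be checked with a small simulation. Everything here is a toy model: the `Disk` class, the table locations 0 to 2, and the file location 99 are invented for illustration, and each call to `read_table` stands for one disk I/O.

```python
class Disk:
    """Toy external storage holding directory tables; counts table reads."""
    def __init__(self, tables):
        self.tables = tables  # location -> {name: location of child}
        self.reads = 0

    def read_table(self, location):
        self.reads += 1       # one disk I/O per directory table read
        return self.tables[location]

ROOT = 0
TABLES = {
    0: {"photo": 1},          # root directory table
    1: {"2015-08": 2},        # "photo" directory table
    2: {"selfie.jpg": 99},    # "2015-08" directory table
}

def lookup(disk, table, components):
    """Follow path components from a directory table that is already in memory."""
    location = None
    for i, name in enumerate(components):
        location = table[name]
        if i < len(components) - 1:  # intermediate component: load its table
            table = disk.read_table(location)
    return location

def lookup_absolute(disk, path):
    root_table = disk.read_table(ROOT)  # the root table must be read first
    return lookup(disk, root_table, path.strip("/").split("/"))

def lookup_relative(disk, current_table, path):
    parts = [p for p in path.split("/") if p != "."]
    return lookup(disk, current_table, parts)

disk_a = Disk(TABLES)
lookup_absolute(disk_a, "/photo/2015-08/selfie.jpg")
print(disk_a.reads)  # 3

disk_r = Disk(TABLES)
lookup_relative(disk_r, TABLES[1], "./2015-08/selfie.jpg")  # "photo" is in memory
print(disk_r.reads)  # 1
```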

Index node (improvement of FCB)

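With index nodes, a directory entry holds only the file name and a pointer to the index node; the file's other descriptive information moves into the index node.
Think about it: what is the benefit of this?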

Assume an FCB is 64B and the disk block size is 1KB, so each disk block holds only 16 FCBs. If a file directory has 640 directory entries in total, it occupies 640/16 = 40 disk blocks. Retrieving a file by name then requires examining 320 directory entries on average, so the disk must be accessed 20 times on average (each disk I/O reads one block).

With the index node mechanism, where the file name occupies 14B and the index node pointer occupies 2B, each disk block holds 64 directory entries, so retrieving a file by name requires reading only 320/64 = 5 disk blocks on average. Clearly, this greatly speeds up file retrieval.
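
The two cases can be recomputed with one helper. The function name is illustrative, and the model follows the text: a linear search examines half of the entries on average.

```python
def directory_search_cost(entries, entry_size, block_size=1024):
    """Return (entries per block, blocks occupied, average blocks read).

    Assumes a linear search that examines half of the entries on average,
    as in the text.
    """
    per_block = block_size // entry_size
    blocks = -(-entries // per_block)        # ceiling division
    avg_reads = (entries / 2) / per_block
    return per_block, blocks, avg_reads

print(directory_search_cost(640, 64))       # plain FCBs: (16, 40, 20.0)
print(directory_search_cost(640, 14 + 2))   # name + pointer: (64, 10, 5.0)
```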

The physical structure of the file

Similar to memory paging, the storage units on a disk are also divided into "blocks" (disk blocks, physical blocks). In many operating systems, the disk block size is the same as the size of a memory block and of a page.

So once a file has been cut into block-sized pieces, how are disk blocks allocated to it?

Linked allocation: implicit linking

The user gives the logical block number i to be accessed, and the operating system finds the directory entry (FCB) corresponding to the file.

The operating system takes the starting block number (that is, the location of logical block 0) from the directory entry and reads logical block 0 into memory, which reveals the physical block number of logical block 1. It then reads logical block 1, which reveals the storage location of logical block 2, and so on.
Therefore, reading logical block i requires i+1 disk I/Os in total.
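
A small simulation makes the i+1 count concrete. The block numbers and contents are invented for the sketch; each disk block is modelled as a pair (data, pointer to the next block).

```python
class LinkedDisk:
    """Toy disk where each block stores (data, next physical block or None)."""
    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, physical):
        self.reads += 1  # one disk I/O per block read
        return self.blocks[physical]

def read_logical_block(disk, start_block, i):
    """Read logical block i of a file whose first block is start_block."""
    physical = start_block
    for _ in range(i):                 # walk the chain through blocks 0..i-1
        _, physical = disk.read(physical)
        if physical is None:
            raise IndexError("logical block is past the end of the file")
    data, _ = disk.read(physical)      # finally read block i itself
    return data

# Hypothetical file stored in physical blocks 9 -> 4 -> 7.
disk = LinkedDisk({9: ("A", 4), 4: ("B", 7), 7: ("C", None)})
print(read_logical_block(disk, 9, 2), disk.reads)  # C 3
```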

Conclusion: files that use linked allocation with implicit links support only sequential access, not random access, and lookups are inefficient. In addition, the pointer to the next disk block consumes a small amount of storage space in every block.

On the other hand, implicit linking makes it very easy to extend a file. Any free disk block can be used, so there is no external fragmentation problem and external storage utilization is high.

Linked allocation: explicit linking

The pointers that link a file's physical blocks are stored explicitly in a table, the File Allocation Table (FAT).
Assume a newly created file "aaa" is stored, in order, in disk blocks 2→5→0→1.
Assume a newly created file "bbb" is stored, in order, in disk blocks 4→23→3.
Note: a disk has only one FAT. It is read into memory at boot and stays resident there. The FAT entries are stored physically contiguously and all have the same length, so the "physical block number" field can be implicit.

The user gives the logical block number i to be accessed, and the operating system finds the directory entry (FCB) corresponding to the file.

The operating system takes the starting block number from the directory entry. If i>0, it queries the FAT in memory to find the physical block number corresponding to logical block i. Converting the logical block number into a physical block number requires no disk read.
Conclusion: files that use linked allocation with explicit links support both sequential access and random access (to access logical block i there is no need to read logical blocks 0 to i-1 first). Because the block number conversion does not access the disk, access is much faster than with implicit links.
Clearly, explicit linking produces no external fragmentation either, and files are easy to extend.
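
The FAT for the two example files can be sketched as a plain list indexed by physical block number. The end-of-file marker `-1`, the use of `None` for free blocks, and the table size of 24 are assumptions of this sketch.

```python
END = -1  # assumed end-of-file marker

def build_fat(size, chains):
    """Build a FAT of the given size; None marks a block not used by any file."""
    fat = [None] * size
    for chain in chains:
        for current, following in zip(chain, chain[1:]):
            fat[current] = following
        fat[chain[-1]] = END
    return fat

def logical_to_physical(fat, start_block, i):
    """Follow the in-memory FAT from start_block to logical block i (no disk reads)."""
    physical = start_block
    for _ in range(i):
        physical = fat[physical]
        if physical == END or physical is None:
            raise IndexError("logical block is past the end of the file")
    return physical

fat = build_fat(24, [[2, 5, 0, 1], [4, 23, 3]])  # files "aaa" and "bbb"
print(logical_to_physical(fat, 2, 3))  # "aaa", logical block 3 -> 1
print(logical_to_physical(fat, 4, 1))  # "bbb", logical block 1 -> 23
```

The chain is still followed entry by entry, but every step is a memory access, which is why the conversion needs no disk I/O.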

Index allocation

Index allocation allows a file to be stored in disk blocks scattered across the disk. The system creates an index table for each file, which records the physical block corresponding to each of the file's logical blocks (the index table plays a role similar to the page table in memory management, which maps logical pages to physical pages). The disk block that holds the index table is called the index block. A disk block that holds file data is called a data block.
Assume the data of a newly created file "aaa" is stored, in order, in disk blocks 2→5→13→9. Disk block 7 serves as the index block of "aaa", and the content of the index table is stored in that index block.

Note: in linked allocation with explicit links, the FAT corresponds to one disk. In index allocation, an index table corresponds to one file.

The storage location of the index table is found from the directory entry. The index table is read from external storage into memory, and looking up the index table gives the location of logical block i in external storage.

As this shows, index allocation supports random access. Extending a file is also easy (just allocate a free block to the file and add an index table entry), but the index table takes up a certain amount of storage space.
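
A minimal sketch of the "aaa" example follows. The class name is illustrative, and block 21 used in the extension step is a made-up free block.

```python
class IndexedFile:
    """Toy model of index allocation: one index table per file."""
    def __init__(self, index_block, data_blocks):
        self.index_block = index_block        # where the index table is stored
        self.index_table = list(data_blocks)  # logical block i -> physical block

    def physical_block(self, i):
        """Random access: a single table lookup, no chain to follow."""
        return self.index_table[i]

    def extend(self, free_block):
        """Grow the file by one block: add one index table entry."""
        self.index_table.append(free_block)

aaa = IndexedFile(index_block=7, data_blocks=[2, 5, 13, 9])
print(aaa.physical_block(2))  # 13
aaa.extend(21)                # 21 is a hypothetical free block
print(aaa.physical_block(4))  # 21
```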

If the size of a file exceeds 256 blocks, then one disk block cannot fit the entire index table of the file. How to solve this problem?

Multi-level index

① Link scheme: if the index table is too large to fit in one index block, multiple index blocks can be allocated and linked together.
Assume the disk block size is 1KB and one index entry occupies 4B, so one disk block can store only 256 index entries.
If a file's size is 256 × 256KB = 65,536KB = 64MB, the file has 256 × 256 blocks in total, which correspond to 256 × 256 index entries, so 256 index blocks are needed to store them. These index blocks are connected using the link scheme.
To access the last logical block of the file, the last index block (the 256th) must be found, and because the index blocks are linked by pointers, the first 255 index blocks must be read in sequence first.
This is obviously very inefficient. How can it be solved?

② Multi-level index: build multi-level index (the principle is similar to multi-level page table). Make the first-level index block point to the second-level index block. The third and fourth layer index blocks can also be created according to the requirements of the file size.

Assume the disk block size is 1KB and one index entry occupies 4B, so one disk block can store only 256 index entries.
If a file uses a two-level index, its maximum length is 256 × 256 × 1KB = 65,536KB = 64MB.
The logical block number determines which index table entries to look up. For example, to access logical block 1026: 1026/256 = 4 and 1026%256 = 2.
Therefore, first load the primary index table into memory and query entry 4, then load the corresponding secondary index table into memory and query its entry 2 to learn the disk block number where logical block 1026 is stored. Accessing the target data block requires 3 disk I/Os in total.
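
The index arithmetic can be reproduced in a few lines. The function names are illustrative; the block and entry sizes are the ones assumed in the text.

```python
BLOCK_SIZE = 1024                               # 1KB disk blocks
ENTRY_SIZE = 4                                  # 4B per index entry
ENTRIES_PER_BLOCK = BLOCK_SIZE // ENTRY_SIZE    # 256

def two_level_lookup(logical_block):
    """Return (entry in the primary table, entry in the secondary table)."""
    return divmod(logical_block, ENTRIES_PER_BLOCK)

def max_file_size_kb(levels):
    """Largest file, in KB, addressable with the given number of index levels."""
    return ENTRIES_PER_BLOCK ** levels * BLOCK_SIZE // 1024

print(two_level_lookup(1026))  # (4, 2)
print(max_file_size_kb(2))     # 65536
```

With K index levels and no index table cached in memory, reaching a data block takes K+1 disk reads, which matches the 3 reads in the two-level example.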

File Sharing

Note: Multiple users share the same file, which means that there is only "one copy" of file data in the system. And as long as a user modifies the data of the file, other users can also see the changes in the file data.
If multiple users have "copied" the same file, there will be "several copies" of file data in the system. One of the users modified his own file data, which has no effect on the file data of other users.

Sharing method based on index node (hard link)

Knowledge review: index nodes are a way of slimming down the file directory. Since only the file name is needed when searching for a file, all other information can be moved into the index node. A directory entry then only needs to contain the file name and a pointer to the index node.

The index node contains a link count variable, count, which records how many user directory entries link to this index node.
If count = 2, two user directory entries currently link to the index node; in other words, two users are sharing the file.
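
The link count can be modelled with a few lines of Python. The names `IndexNode`, `link`, and `unlink` are made up for this sketch. Reclaiming the file only when count drops to 0 is the usual behaviour of hard links; it is not spelled out in the text above.

```python
class IndexNode:
    """Toy index node: file data plus a count of directory entries linking to it."""
    def __init__(self, data):
        self.data = data
        self.count = 0

def link(directory, name, inode):
    """Add a directory entry pointing at inode and bump its link count."""
    directory[name] = inode
    inode.count += 1

def unlink(directory, name):
    """Remove a directory entry; return True when no links remain."""
    inode = directory.pop(name)
    inode.count -= 1
    return inode.count == 0  # only then may the file itself be reclaimed

user1, user2 = {}, {}
file1 = IndexNode("original data")
link(user1, "aaa", file1)
link(user2, "bbb", file1)
print(file1.count)            # 2

user1["aaa"].data = "changed by user1"
print(user2["bbb"].data)      # changed by user1 (one shared copy)

print(unlink(user1, "aaa"))   # False: user2 still links to the file
print(unlink(user2, "bbb"))   # True: count reached 0
```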

Sharing method based on symbolic links (soft link)

When User3 accesses "ccc", the operating system determines that "ccc" is a Link-type file, so it looks up the directories level by level along the path recorded in it, finally finds the "aaa" entry in User1's directory table, and thereby finds the index node of file 1.

File protection

Password protection

Set a "password" for the file (for example: abc112233). The user must provide this "password" when requesting access to the file.
The password is generally stored in the FCB or index node corresponding to the file. Before accessing the file the user must enter the "password", and the operating system compares it with the password stored in the FCB. If they match, the user is allowed to access the file.
Advantage: the time overhead of verifying the password is very small. Disadvantage: the correct "password" is stored inside the system, which is not secure enough.

Encryption protection

Encrypt the file with a "password". To access the file, the correct "password" must be provided in order to decrypt it correctly.
Example: one of the simplest encryption algorithms is XOR encryption.
Assume the "password" used for encryption/decryption is "01001".

Advantages: strong confidentiality, and the "password" does not need to be stored in the system.
Disadvantages: encoding/decoding (that is, encryption/decryption) takes a certain amount of time.
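
XOR encryption with the key "01001" can be sketched on bit strings. The plaintext below is made up, and repeating the key over the data is an assumption about how the key is applied. XOR offers no real security; it only illustrates the idea.

```python
def xor_bits(data, key):
    """XOR a bit string with a repeating key; the same call decrypts."""
    return "".join(
        str(int(bit) ^ int(key[i % len(key)]))
        for i, bit in enumerate(data)
    )

KEY = "01001"
plaintext = "1011001110"               # hypothetical file contents
ciphertext = xor_bits(plaintext, KEY)
print(ciphertext)                      # 1111100111
print(xor_bits(ciphertext, KEY))       # 1011001110 (original restored)
print(xor_bits(ciphertext, "11111"))   # a wrong key yields different bits
```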

Access control

Add an access control list (Access-Control List, ACL) to the FCB (or index node) of each file. The table records which operations each user can perform on the file.

Simplified access list: permissions are recorded per "group", marking which operations the users of each group may perform on the file.
For example, users can be divided into these groups: system administrator, file owner, the file owner's partners, and other users.
When a user wants to access a file, the system checks whether the group the user belongs to has the corresponding access permission.
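
A group-based check might look like the sketch below. The group names follow the example above, but the permissions granted to each group and the user names are invented for illustration.

```python
# Simplified access list for one file: group -> allowed operations (assumed values).
ACL = {
    "administrator": {"read", "write", "execute"},
    "owner": {"read", "write"},
    "partner": {"read"},
    "other": set(),
}

# Hypothetical mapping from users to the group they belong to for this file.
USER_GROUPS = {"root": "administrator", "alice": "owner", "bob": "partner"}

def can_access(user, operation):
    """Check whether the user's group is allowed to perform the operation."""
    group = USER_GROUPS.get(user, "other")  # unknown users fall into "other"
    return operation in ACL[group]

print(can_access("alice", "write"))  # True
print(can_access("bob", "write"))    # False
print(can_access("carol", "read"))   # False
```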

File system hierarchy

Use an example to help memorize the hierarchical structure of the file system. Suppose a user requests to delete the last 100 records of the file "D:/working directory/student information.xlsx".
1. The user sends the request through the interface provided by the operating system (user interface).
2. Because the user provides the file's storage path, the operating system must look up the directories level by level to find the corresponding directory entry (file directory system).
3. Different users have different permissions on a file, so to ensure security the system must check whether the user has access permission (access control module, that is, the access control verification layer).
4. After verifying the user's access permission, the "record number" provided by the user must be converted into the corresponding logical address (logical file system and file information buffer).
5. Once the logical address of the target record is known, it must be converted into the actual physical address (physical file system).
6. To delete the records, a request must be issued to the disk device (device management program module).
7. After the records are deleted, some disk blocks become free, and these free blocks must be reclaimed (auxiliary allocation module).

Origin blog.csdn.net/Shangxingya/article/details/113810781