File system and soft and hard links

1. File system

File operations describe the relationship between a process and the files it has opened. But the operating system cannot open every file on the disk at the same time. Opened files must be managed, and unopened files on the disk must be managed too, so that we can find and read them at any time.

1. Understand the physical structure of the disk

The disk is almost the only mechanical device in the computer, and it is a peripheral. It is slow, but it is cheap and offers a lot of storage, so it is still the storage device of choice for enterprises. We should understand the following about the disk:

1. Data is written to the disk by the magnetic head, which changes the magnetic polarity (north or south) of small regions on the platter surface.

2. A disk is not a single platter but a stack of platters. Each platter has two surfaces, and each surface has its own head, so there are as many heads as there are surfaces.

3. There is a very small gap between the magnetic head and the platter surface; they do not touch. (If the head touches the surface, it may scratch it and cause data loss.)

4. All the magnetic heads are connected and can only move together (advance and retreat together), so they always point to the same position on their respective surfaces.

2. Disk storage structure

A disk surface is composed of multiple tracks, and each track is divided into multiple sectors. The tracks with the same radius on all the surfaces together form a cylinder.

Disk data is read in units of sectors. In this classic model, although the tracks closer to the center have a smaller circumference, every sector stores the same amount of data, traditionally 512 bytes.

a. Disk read

The disk relies on the magnetic head both to write and to read data. When the disk is operating, the platters spin at high speed and the heads swing back and forth. The swing of the heads is the process of locating the track (cylinder). Once the track (cylinder) is located, the heads stay on it and stop moving. Next the head is selected, and finally the sector on that track that holds the data is located.

Because a disk has multiple surfaces, it has multiple heads, so after locating the track you need to select the head. Selecting the head also determines which surface the data is on, which is a prerequisite for locating the sector.

This addressing method is called CHS addressing: C (cylinder), H (head), S (sector).

3. The logical structure of the disk

CHS addressing is three-dimensional, which is neither efficient enough nor convenient to manage. So when the operating system manages the disk, it abstracts the disk into a linear structure (an array). To read the data in a certain area, we only need the index of that area. This index is called the LBA (Logical Block Address).

The LBA is what the operating system uses internally. When data is actually read from the disk, the LBA has to be converted into a CHS address, which is obtained by calculation.
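The calculation mentioned above can be sketched as follows. This is a minimal illustration, not how any particular driver does it: the geometry constants (16 heads per cylinder, 63 sectors per track) are assumed values chosen only for the example, and the function names are hypothetical.

```python
# Hypothetical disk geometry, chosen only for illustration.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63


def lba_to_chs(lba):
    """Convert a linear block address into (cylinder, head, sector)."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = lba % SECTORS_PER_TRACK + 1  # CHS sectors are numbered from 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector):
    """Inverse conversion: (cylinder, head, sector) back to a linear address."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


print(lba_to_chs(0))     # (0, 0, 1)
print(lba_to_chs(1008))  # (1, 0, 1), because 16 * 63 = 1008 sectors fill one cylinder
```

The one-dimensional index is what the rest of the system works with; only this small conversion needs to know the geometry.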

a. Why doesn't the operating system use the CHS address directly?

1. Ease of management: CHS addressing is three-dimensional, while an array index is one-dimensional.

2. Decoupling: the operating system code does not depend on the hardware geometry, so changes in the hardware do not affect the operating system.

b. The size of a single I/O

Although a sector is 512 bytes, that is still too small a unit for files. After all, most of our files range from several megabytes to several gigabytes. Therefore, each time the operating system reads from the disk, it uses 1KB, 2KB, or 4KB as the basic unit (most commonly 4KB). That is, even if you only need a single bit in one sector, the operating system loads a whole 4KB.

The loaded 4KB may not be fully used, but that does not necessarily mean it is wasted. According to the principle of locality, when we access some data, there is a high probability that we will soon access the data near it. So loading 4KB at a time is also a form of preloading.
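To make the unit of I/O concrete, here is a small sketch that computes which 4KB block a byte offset falls into and which 512-byte sectors that block covers. The 4KB block size is the common case described above, and the function name is hypothetical.

```python
SECTOR_SIZE = 512
BLOCK_SIZE = 4096  # assumed I/O unit (the common 4KB case)


def block_for_offset(offset):
    """Return (block index, first byte, end byte, first sector, sector count)
    for the block that must be loaded to read the byte at `offset`."""
    block_index = offset // BLOCK_SIZE
    start = block_index * BLOCK_SIZE
    end = start + BLOCK_SIZE              # exclusive
    first_sector = start // SECTOR_SIZE
    sector_count = BLOCK_SIZE // SECTOR_SIZE
    return block_index, start, end, first_sector, sector_count


# Reading one byte at offset 5000 loads bytes 4096..8191, that is, sectors 8..15.
print(block_for_offset(5000))  # (1, 4096, 8192, 8, 8)
```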

The same principle is behind the claim, made when comparing the sequence table (array-based list) with the linked list, that the sequence table has a higher CPU cache hit rate. Its elements are stored in contiguous space, so when one element is loaded, its neighbors are loaded along with it (a cache line at the CPU level, a 4KB block at the disk level), and later accesses find the subsequent elements already loaded.

Memory in the operating system is also divided into blocks of 4KB each. These blocks are page frames.

Files on the disk, especially binary executable files, are likewise divided into 4KB blocks, which correspond to pages.

4. Disk partition management

Disks today easily reach 512GB or more, and it is not easy to manage such a large space. The operating system manages the disk by divide and conquer: first the disk is partitioned (for example, into a C drive and a D drive), and then each partition is divided into block groups.

4.1. ext file system

Note : The Boot Block is 1KB in size. It is specified by the PC standard and stores disk partition information and boot information. No file system may modify this block. After it, every Block Group has the same structure.

Super Block : Stores the structural information of the file system itself, mainly the total number of Data Blocks and inodes, the number of unused Data Blocks and inodes, the size of a Data Block and of an inode, the time of the last mount, the time of the last write, the time of the last disk check, and other file system information. There is a Super Block in each Block Group (newer ext versions keep backups only in some groups); the one in Group 0 is the primary copy and the others are backups, because once the Super Block is damaged, the entire file system is affected.

Group Descriptor Table : The block group descriptor table, which stores the attribute information of all block groups in the partition.

Block Bitmap : A bitmap in which 1 means the Data Block at that position is occupied and 0 means it is free.

inode Bitmap : A bitmap in which 1 means the inode is occupied and 0 means it is free.

inode Table : Stores all the inodes in the group (used and unused). Each inode is 128 or 256 bytes. inodes and files correspond one-to-one, and each inode stores almost all the attributes of its file (inside the operating system, the inode is the unique identifier of the file). The file name is not stored in the inode.

Data Blocks : Stores all the Data Blocks in the group. A Data Block holds the data of a file, and all Data Blocks are the same size. (Large files occupy more Data Blocks, small files fewer.)

From the above we should take away the following:

1. Formatting rewrites the file system. Recovering a file system means using an undamaged backup Super Block to restore the damaged one.

2. Creating a new file means finding an unoccupied inode in the inode Bitmap to store the file's attributes, then finding unoccupied Data Blocks in the Block Bitmap to hold the file's data, and finally recording the mapping between the inode and those Data Blocks.
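The file-creation steps in point 2 can be sketched with a toy in-memory model. This is only an illustration of the bitmap bookkeeping: the class name, the sizes, and the dictionary used as an "inode" are all assumptions made for the example and do not reflect the real on-disk ext layout.

```python
class ToyBlockGroup:
    """A toy model of one block group. Sizes are arbitrary, not real ext values."""

    def __init__(self, inode_count=8, block_count=16):
        self.inode_bitmap = [0] * inode_count   # 1 = occupied, 0 = free
        self.block_bitmap = [0] * block_count   # 1 = occupied, 0 = free
        self.inode_table = [None] * inode_count
        self.data_blocks = [b""] * block_count

    def create_file(self, attrs, chunks):
        """Create a file whose content is given as a list of block-sized chunks."""
        # Step 1: find a free inode (list.index raises ValueError if none is left).
        ino = self.inode_bitmap.index(0)
        self.inode_bitmap[ino] = 1
        # Step 2: find a free data block for each chunk and write the data.
        blocks = []
        for chunk in chunks:
            blk = self.block_bitmap.index(0)
            self.block_bitmap[blk] = 1
            self.data_blocks[blk] = chunk
            blocks.append(blk)
        # Step 3: record the inode -> data block mapping together with the attributes.
        self.inode_table[ino] = {"attrs": attrs, "blocks": blocks}
        return ino


group = ToyBlockGroup()
ino = group.create_file({"size": 10}, [b"hello", b"world"])
print(ino, group.inode_table[ino]["blocks"])  # 0 [0, 1]
```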

a. File search

inodes and files correspond one-to-one. inode numbers are contiguous within a partition, and inodes in different partitions are unrelated. To find a file by its inode, check whether the inode's bit in the inode Bitmap is 1. If it is, the file exists. The offset of that bit from the first bit of the bitmap is also the position of the inode in the inode Table, which gives us the file's attributes.
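As a sketch of this lookup, the following function turns an inode number into a block group and a position inside that group's inode Table. It assumes the ext2/3/4 convention that inode numbers start at 1; the values of INODES_PER_GROUP and INODE_SIZE are hypothetical (on a real file system they are read from the Super Block).

```python
INODES_PER_GROUP = 8192  # hypothetical; a real value comes from the Super Block
INODE_SIZE = 256         # bytes per inode; also an assumed value


def locate_inode(ino):
    """Return (block group, index in that group's inode table, byte offset in the table)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, INODES_PER_GROUP)
    return group, index, index * INODE_SIZE


print(locate_inode(1))     # (0, 0, 0)
print(locate_inode(8194))  # (1, 1, 256)
```

The same index is the bit position to test in that group's inode Bitmap.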

The inode contains a block[15] array. Entries 0 to 11 store the addresses of data blocks used directly by the file. The data block that block[12] points to does not store file content; it stores the addresses of other data blocks, and the file's data is written into those blocks. This is a single indirect (first-level) index. block[13] is a double indirect index, and block[14] is a triple indirect index. Expanded level by level in this way, the array can address a very large amount of data.

A data block is only 4KB, but the address of a data block takes only 4 bytes, so one index block can hold many addresses. Building an index in this way solves the storage and lookup of large files.
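A quick calculation shows how far this indexing reaches. The sketch assumes the figures given above (4KB data blocks, 4-byte block addresses, 12 direct entries) and that block[12], block[13] and block[14] add one, two and three levels of indirection respectively. It only measures what the index structure can address; real file systems impose other limits as well.

```python
BLOCK_SIZE = 4096     # bytes per data block
POINTER_SIZE = 4      # bytes per block address
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 addresses fit in one block


def index_capacity():
    """Number of data blocks reachable through each part of the block[15] array."""
    return {
        "direct (block[0..11])": 12,
        "one level (block[12])": POINTERS_PER_BLOCK,
        "two levels (block[13])": POINTERS_PER_BLOCK ** 2,
        "three levels (block[14])": POINTERS_PER_BLOCK ** 3,
    }


capacity = index_capacity()
for name, blocks in capacity.items():
    print(f"{name}: {blocks} blocks = {blocks * BLOCK_SIZE} bytes")

total_blocks = sum(capacity.values())
print("total addressable bytes:", total_blocks * BLOCK_SIZE)
```

With these numbers the direct entries cover 48KB, and each extra level multiplies the reach by 1024: 4MB, 4GB, then 4TB.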

b. File deletion

To delete a file, the file system only needs to change the file's bit in the inode Bitmap from 1 to 0 (and likewise the bits of its data blocks in the Block Bitmap). The file is then considered deleted.

Once an inode's bit is set from 1 to 0 in the inode Bitmap, the operating system treats that inode as unoccupied and may allocate it to a newly created file, overwriting the old file attributes.

Almost all deletions in the computer are lazy deletions of this kind.

Creating a file requires actually writing its data, but deleting a file only clears the corresponding bits in the bitmaps. This is why deleting a file is much faster than downloading it.

Because deletion only clears bits in the bitmaps, file recovery is possible. After accidentally deleting a file, it is best to do nothing except recover it; otherwise new data may overwrite the data blocks (or the inode) of the deleted file.
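The lazy deletion described above can be sketched with plain lists standing in for the bitmaps and tables. The state and the delete_file helper are hypothetical and only illustrate the idea: the bits are cleared, but the inode and the data stay where they were until something overwrites them.

```python
def delete_file(inode_bitmap, block_bitmap, inode_table, ino):
    """Lazy delete: mark the inode and its data blocks as free, touch nothing else."""
    for blk in inode_table[ino]["blocks"]:
        block_bitmap[blk] = 0
    inode_bitmap[ino] = 0


# Toy state: one file (inode 0) that occupies data blocks 0 and 1.
inode_bitmap = [1, 0, 0, 0]
block_bitmap = [1, 1, 0, 0]
inode_table = [{"blocks": [0, 1]}, None, None, None]
data_blocks = [b"hello ", b"world", b"", b""]

delete_file(inode_bitmap, block_bitmap, inode_table, 0)
print(inode_bitmap, block_bitmap)  # [0, 0, 0, 0] [0, 0, 0, 0]

# Nothing was erased, so the content can still be reassembled ("recovered").
recovered = b"".join(data_blocks[b] for b in inode_table[0]["blocks"])
print(recovered)  # b'hello world'
```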

4.2 Properties and data of directories

The operating system identifies files by inode, but users identify them by file name, and the inode does not store the file name. A directory is also a file, with its own inode and data blocks. The directory's inode stores the attributes of the directory, and the directory's data blocks store the mapping between the names and the inodes of all the files in the directory.

This is why two files in the same directory cannot have the same name: the name is the key of this mapping, so each name must map to exactly one inode.

When learning Linux permissions, we saw that creating a file in a directory requires write permission on the directory. This is because creating a file means writing a new mapping between the file name and its inode into the directory's data blocks.
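The name-to-inode mapping stored in a directory can be inspected from user space. The sketch below uses Python's standard os.scandir to list a temporary directory; the helper name directory_entries and the file names are made up for the example.

```python
import os
import tempfile


def directory_entries(path):
    """Return {file name: inode number} for every entry in the directory."""
    with os.scandir(path) as entries:
        return {e.name: e.stat(follow_symlinks=False).st_ino for e in entries}


with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write(name)
    # Each name is a key that maps to exactly one inode number.
    print(directory_entries(workdir))
```

Because the result is a mapping keyed by name, a second "a.txt" in the same directory has nowhere to go, which mirrors the rule about duplicate names.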

2. Soft and hard links

Create and delete soft links

1. Use ls -li to view the inodes of files

2. A soft link is an independent file with its own inode and data block

3. To delete a soft link, use unlink, or delete the soft link file directly with rm
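The same three points can be checked from a program. This is a minimal sketch using Python's os module on a temporary directory; the file names file.txt and soft.link are made up, and os.symlink plays roughly the role of ln -s.

```python
import os
import tempfile


def make_symlink(workdir):
    """Create a regular file and a soft link pointing at it; return both paths."""
    target = os.path.join(workdir, "file.txt")
    link = os.path.join(workdir, "soft.link")
    with open(target, "w") as f:
        f.write("hello")
    os.symlink(target, link)  # roughly what `ln -s` does
    return target, link


with tempfile.TemporaryDirectory() as workdir:
    target, link = make_symlink(workdir)
    # lstat inspects the link itself instead of following it.
    print(os.stat(target).st_ino, os.lstat(link).st_ino)  # two different inode numbers
    print(os.path.islink(link))                            # True
    print(os.readlink(link) == target)                     # True: the link stores the path
    os.unlink(link)                                        # deleting the link file itself
```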

Soft link usage scenarios

Soft links are similar to shortcuts under Windows.

Deleting a soft link does not affect the source file, but once the source file is deleted, the soft link becomes invalid (a dangling link).
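Both directions can be demonstrated in a few lines. The sketch below is only illustrative; the function and file names are hypothetical. It relies on os.path.lexists, which checks the link entry itself, and os.path.exists, which follows the link.

```python
import os
import tempfile


def symlink_lifecycle(workdir):
    """Return (source survives link deletion, link entry remains, link still resolves)."""
    source = os.path.join(workdir, "source.txt")
    link = os.path.join(workdir, "source.link")
    with open(source, "w") as f:
        f.write("data")

    # Deleting the soft link leaves the source file untouched.
    os.symlink(source, link)
    os.unlink(link)
    source_survives = os.path.exists(source)

    # Deleting the source leaves the link in place, but it no longer resolves.
    os.symlink(source, link)
    os.remove(source)
    link_entry_remains = os.path.lexists(link)
    link_resolves = os.path.exists(link)
    return source_survives, link_entry_remains, link_resolves


with tempfile.TemporaryDirectory() as workdir:
    print(symlink_lifecycle(workdir))  # (True, True, False)
```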

Create and delete hard links

The hard link has the same inode as the source file. A hard link does not have its own inode and is not an independent file; it is just another mapping from a file name to an existing inode.

The hard link count is the number of file names pointing to this inode. The inode contains a counter (call it ref): creating a new hard link increments it (ref++), and deleting one decrements it (ref--). Only when ref drops to 0 is the file truly deleted. (Closing a file works the same way: the file is really closed only when no process is using it.) This technique is called reference counting.

In other words, a hard link is an alias for the file that increases ref by one, so after the original name is deleted, the file's data can still be accessed through the hard link.
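The reference count is visible from user space as st_nlink. Here is a minimal sketch using Python's os.link on a temporary directory; the function and file names are hypothetical, and it assumes a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo(workdir):
    """Return (same inode?, link count with two names, link count with one name, content)."""
    original = os.path.join(workdir, "file.txt")
    alias = os.path.join(workdir, "file.hard")
    with open(original, "w") as f:
        f.write("payload")

    os.link(original, alias)  # roughly what `ln file.txt file.hard` does
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    links_with_two_names = os.stat(alias).st_nlink

    os.remove(original)       # removes one name; the count drops but is not yet 0
    links_with_one_name = os.stat(alias).st_nlink
    with open(alias) as f:
        content = f.read()    # the data is still reachable through the alias
    return same_inode, links_with_two_names, links_with_one_name, content


with tempfile.TemporaryDirectory() as workdir:
    print(hard_link_demo(workdir))  # (True, 2, 1, 'payload')
```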

Hard link usage scenarios

When we create a regular file, its hard link count is 1 (its own name). But when we create a directory, the default hard link count is 2: besides the directory's own name, an entry named . is created inside it by default, which refers to the directory itself.

When a directory is created, the entries . and .. are generated by default: . represents the current directory and .. represents the parent directory, which is why cd .. returns to the parent directory.

If we create subdirectories under the directory dir, the hard link count of dir keeps increasing, because each new subdirectory contains a .. entry that points to dir.
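This growth can be observed with st_nlink as well. The sketch below is illustrative and uses made-up names. On ext-family file systems we would expect the count to go from 2 to 3 after one subdirectory is created, but this is an assumption about the underlying file system: some file systems do not maintain the count this way for directories, so the code only reports what it sees.

```python
import os
import tempfile


def directory_link_counts(workdir):
    """Return the hard link count of `dir` before and after creating a subdirectory."""
    d = os.path.join(workdir, "dir")
    os.mkdir(d)
    before = os.stat(d).st_nlink      # name in the parent + its own "." entry
    os.mkdir(os.path.join(d, "sub"))  # "sub/.." now also refers to dir
    after = os.stat(d).st_nlink
    return before, after


with tempfile.TemporaryDirectory() as workdir:
    before, after = directory_link_counts(workdir)
    print("before:", before, "after:", after)
```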

Origin blog.csdn.net/m0_62633482/article/details/130402469