[Linux] How to find files in the disk -- the physical structure and logical structure of the disk


Hello, this is Ppeua. I regularly post about C, C++, data structures, and algorithms. If you are interested, please follow me! You won't be disappointed.



0. Introduction to disk physical structure

Older computers and many back-end servers do not use today's high-speed SSDs. They use mechanical hard disks.


A mechanical hard disk consists of magnetic heads, platters, head arms, and other mechanical parts. There is one magnetic head for each platter surface, and the head floats a tiny distance above the surface so that it can read at high speed. For this reason the disk must be assembled and sealed in a dust-free environment. Even a speck of dust caught between the head and the platter can scratch the surface and destroy data.

The disk is a permanent storage device. The platter surface is covered with countless tiny magnetic regions, and the polarity of each region represents a 0 or a 1. The magnetic head writes 0s and 1s by changing the polarity of these regions and reads them by sensing it. This is how data is stored.


The disk is read and written one sector at a time. Locating a sector takes the following three steps:

  • Determine which platter surface it is on (which head to use)
  • Determine which cylinder (track) it is on
  • Determine which sector on that track it is

This is the Cylinder-Head-Sector (CHS) addressing method.

So how does the CPU connect to the disk to read data?


The CPU does not touch the platters directly. It talks to the disk through a set of registers:

  • Control register: controls the I/O direction (read or write)
  • Data register: holds the data to be written to or read from the disk
  • Address register: holds the address to write to or read from
  • Status register: reports whether the task has completed

1. Disk logical structure


If we imagine each track unrolled into a straight strip, like a tape, the strip is a linear structure: an array whose index starts from 0. Each disk surface can be viewed as several such strips spliced together.

The whole disk is therefore a sector-based array. Given an array index, we know which sector is meant: from the index we can work out which disk surface it is on, which track, and which sector on that track. The array index is the LBA (Logical Block Address), and LBA and CHS addresses can be converted to each other.
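The conversion between an array index (LBA) and a CHS address can be sketched as follows. This is only an illustration: the geometry constants are hypothetical values chosen for the example, and the function names are not from the original article.

```python
# Hypothetical disk geometry, chosen only for illustration.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63


def chs_to_lba(cylinder, head, sector):
    """Convert a CHS address to an LBA. CHS sectors are numbered from 1."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba):
    """Convert an LBA back to a (cylinder, head, sector) tuple."""
    cylinder, rest = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector_index = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1


print(chs_to_lba(0, 0, 1))  # the very first sector -> 0
print(lba_to_chs(1008))     # -> (1, 0, 1), the first sector of cylinder 1
```

With 16 heads and 63 sectors per track, one cylinder holds 1008 sectors, which is why LBA 1008 lands on the first sector of cylinder 1.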

2. File system partitioning

So how does the operating system manage such a large disk?

We usually partition our disks. For example, divide an 800GB disk into partitions of 200GB, 300GB, 100GB, and 200GB.

This is what partitioning looks like under Windows, where the disk is divided into drives C, D, E, F, and so on.

In essence it is still one physical disk. The system just adds a descriptor recording where each logical disk starts and ends.

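struct partition {
    unsigned long long start;  /* first sector (LBA) of the partition */
    unsigned long long end;    /* last sector (LBA) of the partition */
};
struct partition parts[4];     /* one descriptor per partition */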
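As a rough sketch of how such descriptors could be used, the example below lays out the 200GB, 300GB, 100GB, and 200GB partitions from the text and looks up which partition a sector belongs to. The 512-byte sector size, the exclusive end bound, and the helper names are assumptions made for the illustration.

```python
from collections import namedtuple

SECTOR_SIZE = 512   # bytes per sector; an assumed value
GB = 1024 ** 3

# "end" is exclusive here, which keeps the range check simple.
Partition = namedtuple("Partition", ["start", "end"])


def make_partitions(sizes_gb):
    """Lay the partitions out back to back, measured in sectors."""
    parts, start = [], 0
    for size in sizes_gb:
        count = size * GB // SECTOR_SIZE
        parts.append(Partition(start, start + count))
        start += count
    return parts


def find_partition(parts, lba):
    """Return the index of the partition that contains the sector, or None."""
    for index, part in enumerate(parts):
        if part.start <= lba < part.end:
            return index
    return None


parts = make_partitions([200, 300, 100, 200])
print(parts[0])                               # sector range of the first partition
print(find_partition(parts, parts[1].start))  # this sector belongs to partition 1
```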

We take the 200GB partition as an example below. The file system divides it into one Boot Block followed by several Block Groups.

Each Block Group is divided into a Super Block, Group Descriptor Table, Block Bitmap, Inode Bitmap, Inode Table, and Data Blocks.

This is divide and conquer: managing a large disk is reduced to managing one small Block Group.

So what does each region of a Block Group store?

Inode Table: stores all the attributes of each file, including the addresses of its data on the disk.

The structure below is the information in one inode, and the Inode Table stores many of them. Note that the file name is not stored here!

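struct Inode {
    unsigned int       uid;        /* owner */
    unsigned int       gid;        /* owning group */
    unsigned short     mode;       /* permissions */
    unsigned long long size;       /* size in bytes */
    long               atime;      /* last access time */
    long               ctime;      /* last attribute change time */
    long               mtime;      /* last content modification time */
    unsigned int       block[15];  /* addresses of the file's blocks on disk */
};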

Inode Bitmap: one bit per inode in the Inode Table, recording whether that inode is in use or free.
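A bitmap of this kind can be sketched in a few lines. The class and method names below are made up for the illustration and do not correspond to any real file system code.

```python
class Bitmap:
    """One bit per slot: 1 means in use, 0 means free."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # 8 slots per byte

    def is_used(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def set(self, index):
        self.bits[index // 8] |= 1 << (index % 8)

    def clear(self, index):
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

    def first_free(self):
        """Return the lowest free slot, or None if every slot is in use."""
        for index in range(self.size):
            if not self.is_used(index):
                return index
        return None


inode_bitmap = Bitmap(32)
inode_bitmap.set(0)
inode_bitmap.set(1)
print(inode_bitmap.first_free())  # -> 2
```

Tracking 32 inodes this way costs only 4 bytes, which is why a bitmap is a cheap way to record usage.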

Data Blocks: stores the file content. The smallest unit the file system allocates is a block. A block is usually 4KB, and Data Blocks consists of many such blocks.

The disk itself works in sectors, and one block is made up of several consecutive sectors. A file may be stored across multiple blocks. Even if a file is only 1KB, a whole block is allocated to it for convenience of management.
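The arithmetic behind block allocation can be sketched like this. The 4KB block size comes from the text, while the 512-byte sector size and the function names are assumptions for the example.

```python
BLOCK_SIZE = 4096   # bytes per block, as in the text
SECTOR_SIZE = 512   # bytes per sector; an assumed value

SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def blocks_needed(file_size):
    """Number of whole blocks needed to hold file_size bytes (rounded up)."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE


def allocated_bytes(file_size):
    """Disk space actually reserved for the file's content."""
    return blocks_needed(file_size) * BLOCK_SIZE


print(SECTORS_PER_BLOCK)      # -> 8
print(blocks_needed(1024))    # a 1KB file still takes 1 block
print(allocated_bytes(1024))  # -> 4096
print(blocks_needed(10000))   # -> 3
```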

The block address array stored in the inode usually has 15 entries, which does not mean that a file can only occupy 15 blocks. In ext2, for example, the first 12 entries hold the addresses of data blocks directly. When a file needs more blocks than that, two-level and three-level indexing are used.

For example, the 13th entry holds the address of a block that stores no file data. That block is filled with block addresses, which are followed to reach the real data blocks. This is the secondary index.

The 14th entry also holds the address of a block that stores no file data. That block stores the addresses of secondary index blocks, which in turn map to the real data blocks. This is the third-level index. The 15th entry adds one more level in the same way.
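To see how far this indexing reaches, here is a small sketch. It assumes the ext2-style layout (12 direct entries, then one entry each for one, two, and three levels of index blocks), 4KB blocks, and 4-byte block addresses. The names are illustrative only.

```python
BLOCK_SIZE = 4096     # bytes per block, as in the text
ADDRESS_SIZE = 4      # bytes per block address; an assumption
DIRECT_ENTRIES = 12   # ext2-style layout; an assumption

POINTERS_PER_BLOCK = BLOCK_SIZE // ADDRESS_SIZE  # addresses held by one index block


def index_blocks_to_read(block_index):
    """How many index blocks must be read to reach the file's n-th block (0-based)."""
    limit = DIRECT_ENTRIES
    if block_index < limit:
        return 0
    for level in (1, 2, 3):
        limit += POINTERS_PER_BLOCK ** level
        if block_index < limit:
            return level
    raise ValueError("block index is beyond what this scheme can address")


MAX_BLOCKS = DIRECT_ENTRIES + sum(POINTERS_PER_BLOCK ** n for n in (1, 2, 3))

print(POINTERS_PER_BLOCK)        # -> 1024
print(index_blocks_to_read(11))  # -> 0, still a direct entry
print(index_blocks_to_read(12))  # -> 1, reached through the 13th entry
print(MAX_BLOCKS)                # blocks addressable by one inode
```

Under these assumptions each extra level multiplies the reach by 1024, so a handful of entries in the inode is enough to address very large files.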

Block Bitmap: one bit per block in Data Blocks, recording whether that block is in use or free.

Group Descriptor Table: records the resource counts of the whole Block Group (the total number of inodes and blocks, and how many of each are in use).

Super Block: records the storage rules of the entire partition (how many Block Groups there are and how each Block Group is laid out), along with the resource counts of the whole partition. It is not necessarily stored in every Block Group. Copies are kept in several Block Groups so that the file system can be recovered if one copy is accidentally damaged.

Therefore, before we use a disk we must format it: we choose a file system, and formatting writes this management information to the partition in advance.

3. How to understand file directories

Under Linux, we can use ls -i to view the inode number of a file.

ls -i shows that a directory also has its own inode. So a directory is also a file!

Its content is a set of key-value pairs that map each file name in the directory to an inode number. That is why we normally work with file names: the system looks up the corresponding inode for us and carries on from there.

So we can understand:

  • Why can't we list the files in a directory without r permission? Because we cannot read the directory's data blocks, where the file names are stored.

  • Why can't we create files in a directory without w permission? Because we cannot write to the directory's data blocks to add a new key-value pair.

  • Why can't we enter a directory without x permission? Because x on a directory is the permission to look up its entries and reach their inodes, which entering the directory requires.
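The idea that a directory's content is a mapping from file names to inode numbers can be observed from Python. This is a small sketch that uses a temporary directory; the helper name directory_entries is made up for the example.

```python
import os
import tempfile


def directory_entries(path):
    """Return the directory's content as a {file name: inode number} mapping."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}


with tempfile.TemporaryDirectory() as demo_dir:
    file_path = os.path.join(demo_dir, "hello.txt")
    with open(file_path, "w") as f:
        f.write("hello\n")

    entries = directory_entries(demo_dir)
    print(entries)                     # the name mapped to its inode number
    print(os.stat(file_path).st_ino)   # the inode number reported for the file itself
```

Reading the entries corresponds to the r permission on the directory, and resolving a name to its inode with os.stat corresponds to the x permission.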

4. Creating, deleting, finding, and modifying files

  • What does the system do when creating a new file?

    First, allocate an inode for the file and add the corresponding key-value pair (file name and inode number) to the current directory.

    Determine which Block Group the inode belongs to, and set the corresponding bit in the Inode Bitmap to 1, indicating that the inode is in use.

    Fill in the file's attributes, such as its permissions, in the Inode Table, and allocate data blocks to it.

    Mark those blocks as used in the Block Bitmap.

    Write the file content into the Data Blocks.

  • What does the system do when a file is deleted?

    Record the current path, walk up from the current directory to the root directory, and then walk back down the recorded path, looking up each name in its directory, until the file's inode is found.

    Determine which Block Group the inode belongs to.

    Clear the corresponding bits in the Inode Bitmap and Block Bitmap and remove the file name from the directory. The contents of the Data Blocks do not need to be erased.

  • What does the system do when searching for a file?

    Record the current path, walk up from the current directory to the root directory, and then walk back down the recorded path to get the file's inode.

    Determine which Block Group the inode belongs to.

  • What does the system do when modifying a file?

    Record the current path, walk up from the current directory to the root directory, and then walk back down the recorded path to get the file's inode.

    Determine which Block Group the inode belongs to.

    Consult, in order, the Inode Bitmap (to check that the file exists), the Inode Table (to access the file's attributes), the Block Bitmap (to check that the file content exists), and the Data Blocks (to access the file content).
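The create and delete steps above can be sketched with a toy in-memory model. Everything here is a simplification made up for illustration (a single directory, one Block Group, tiny 4-byte blocks, no handling of a full disk); it is not how a real file system is implemented.

```python
class ToyFS:
    """A toy model of one Block Group with a single directory."""

    def __init__(self, inode_count=8, block_count=16, block_size=4):
        self.block_size = block_size
        self.inode_bitmap = [False] * inode_count
        self.block_bitmap = [False] * block_count
        self.inode_table = [None] * inode_count
        self.data_blocks = [b""] * block_count
        self.directory = {}  # file name -> inode number

    def create(self, name, content):
        ino = self.inode_bitmap.index(False)   # find a free inode
        self.inode_bitmap[ino] = True          # mark it as used
        blocks = []
        for offset in range(0, len(content), self.block_size):
            block = self.block_bitmap.index(False)  # find a free block
            self.block_bitmap[block] = True         # mark it as used
            self.data_blocks[block] = content[offset:offset + self.block_size]
            blocks.append(block)
        self.inode_table[ino] = {"size": len(content), "blocks": blocks}
        self.directory[name] = ino             # add the name -> inode pair
        return ino

    def read(self, name):
        inode = self.inode_table[self.directory[name]]
        return b"".join(self.data_blocks[b] for b in inode["blocks"])

    def delete(self, name):
        ino = self.directory.pop(name)         # remove the name -> inode pair
        for block in self.inode_table[ino]["blocks"]:
            self.block_bitmap[block] = False   # only the bitmap bit is cleared
        self.inode_bitmap[ino] = False


fs = ToyFS()
fs.create("hello.txt", b"hello world")
print(fs.read("hello.txt"))
fs.delete("hello.txt")
print(fs.data_blocks[0])  # the old bytes are still there after deletion
```

Because delete only flips bitmap bits, the old content stays in the data blocks until something overwrites it, which is why deleting is fast and why deleted data can sometimes be recovered.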

5. Soft links and hard links

5.1 Soft links

We can create soft links using the following command

ln -s src target

A soft link with target pointing to src will be created.

For example:

We created a file named hello.txt in the current directory with the content hello.

When creating a soft link, it is safest to give src as an absolute path. A relative src is stored as written and resolved relative to the directory that contains the link, so the link can end up pointing at nothing.

Run:

ln -s $PWD/hello.txt soft-hello

Let's check the inode numbers. The two inodes are different, so the soft link is a separate file.

Now let's read the content through the soft link.

The content is the same. So is this a copy of the src file?

We run a loop that keeps appending "hello" to src, which is the hello.txt file.

while :; do echo "hello" >> hello.txt ; done	

After a few seconds, hello.txt contains many lines. Let's check the size of each file.

The soft link's size has not changed, so it is not a copy. It can be understood as a "shortcut": its content is the path of the file it links to. Every access to the soft link actually follows the path stored in it to reach the src file.
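The same observations can be reproduced from Python. This is a small sketch run in a temporary directory; the file names mirror the ones used above, and the variable names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "hello.txt")
    link = os.path.join(demo_dir, "soft-hello")

    with open(target, "w") as f:
        f.write("hello\n" * 1000)

    os.symlink(target, link)  # same effect as: ln -s <target> <link>

    stored_path = os.readlink(link)       # the content of the link is a path
    link_size = os.lstat(link).st_size    # size of the link itself
    target_size = os.stat(link).st_size   # os.stat follows the link to the target
    different_inodes = os.lstat(link).st_ino != os.lstat(target).st_ino

    print(stored_path == target)   # the link stores the target's path
    print(link_size, target_size)  # the link is tiny compared with the target
    print(different_inodes)        # the link is a separate file
```

os.lstat inspects the link itself while os.stat follows it, which is the same distinction as looking at the shortcut versus looking at the file it points to.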

5.1.1 Soft link application scenarios

Its application scenarios are similar to shortcuts under Windows.

When an executable file is buried in a deep path, we can place a soft link to it somewhere convenient so that it can be reached quickly.

For example:

There is an executable program named myls in this path, and I want to be able to run it directly from my home directory.

So:

ln -s $PWD/myls ~/mylsq

Now it can be run from the home directory without typing the full path.

5.2 Hard links

We can create hard links using the following command

ln src target		

Similarly, we create a hello.txt that contains hello, then create a hard link to it:

ln hello.txt hard-link

If you print the two files separately, you will find that their contents are the same.

Now let's run the loop script above again and see what happens.

Both files grow.

Therefore, a hard link is another name for the original file, similar to a reference (&) in C++. Modifying the file through the hard link modifies the original file.

The number after the permissions in the ls -l output is the number of hard links to the file, similar to a reference count.

The content of a directory is the key-value structure of file names and inodes. The essence of a hard link is that multiple file names point to the same inode.
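The behaviour of hard links can be checked from Python as well. This is a small sketch in a temporary directory that reuses the file names from the example above; the variable names are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    original = os.path.join(demo_dir, "hello.txt")
    hard = os.path.join(demo_dir, "hard-link")

    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, hard)  # same effect as: ln hello.txt hard-link

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink  # two names now point to the inode

    with open(hard, "a") as f:               # write through one name...
        f.write("hello\n")
    with open(original) as f:                # ...and read through the other
        content_via_original = f.read()

    os.remove(original)                      # drop one of the names
    with open(hard) as f:
        content_after_remove = f.read()      # the data is still reachable
    link_count_after = os.stat(hard).st_nlink

    print(same_inode, link_count, link_count_after)
```

Removing hello.txt only deletes one name. The inode and its data blocks stay in place because hard-link still points to them.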

5.2.1 Hard link application scenarios

Hard links have fewer application scenarios.

But a hard link saves space, because it occupies no additional data blocks, and it can protect specific data: the inode is only freed once every file name that points to it has been deleted.

The .. and . entries in a directory represent the parent directory and the current directory respectively. Both are hard links to those directories, which is why we can freely use relative paths to access files.

The .. entry of each subdirectory is a hard link to the current directory, so every subdirectory adds one to the current directory's link count.

Origin blog.csdn.net/qq_62839589/article/details/134843933