How does Linux read a file, and how can data be recovered?

A note up front.
This article will continue to be updated. I am studying Mr. Zhu Youpeng's course on Linux multi-process and multi-threaded programming, and what follows is my simplest understanding of the Linux OS.

How Linux reads a file

File IO efficiency and standard IO

Functions provided by Linux itself are called APIs (system calls); functions provided by the C language are called library functions.

(1) File IO refers to the API functions we are currently talking about: open, close, write, read and so on.

Together they make up a complete system for reading and writing files. This system does the job well, but it is not the most efficient option, because every call goes into the kernel.
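
To make the file IO calls concrete, here is a minimal sketch in C that copies one file to another using only open, read, write and close. The helper name copy_with_file_io, the 4096-byte buffer and the 0644 permission bits are assumptions chosen for illustration; they do not come from the course.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src to dst using only file IO calls.
 * Returns the number of bytes copied, or -1 on error.
 * copy_with_file_io is a hypothetical helper name. */
long copy_with_file_io(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    char buf[4096];          /* buffer size is an arbitrary choice */
    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {   /* write may accept fewer bytes than asked */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            done += w;
        }
        total += n;
    }

    close(in);
    close(out);
    return n < 0 ? -1 : total;
}
```

Every read and write here is a separate trip into the kernel, which is the cost the text refers to when it says file IO is not the most efficient option.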

(2) The C library at the application layer provides a set of functions for reading and writing files, called standard IO.

Standard IO consists of a series of C library functions (fopen, fclose, fwrite, fread).

These standard IO functions are wrappers built on top of file IO

(fopen calls open internally, and fwrite uses write to actually write the file).

The main thing the standard IO wrapper adds is a buffer at the application layer.

As a result, what we write through fwrite does not go straight into the buffer in the kernel.

Instead, it first enters the buffer maintained by the standard IO library in the application layer.

The standard IO library then picks a suitable moment, based on the operating system's optimal size for a single write, to hand its buffer over to the kernel

(and the kernel in turn picks a suitable moment, based on the characteristics of the hard disk, to finally write its own buffer to the disk).

As shown in the figure "Document flow": the first buffer is in the application layer, and the second buffer is in the kernel.
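
The application-layer buffer can be observed directly. The sketch below, in C, writes a few bytes with fwrite and asks the kernel for the file size before and after fflush. The helper names demo_stdio_buffer and size_on_kernel_side and the 4096-byte buffer size are assumptions made for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
static long size_on_kernel_side(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Writes msg with fwrite and records the file size before and after fflush.
 * Returns 0 on success, -1 on error. demo_stdio_buffer is a hypothetical name. */
int demo_stdio_buffer(const char *path, const char *msg, size_t len,
                      long *before_flush, long *after_flush)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Request full buffering; the 4096-byte size is an arbitrary choice. */
    if (setvbuf(fp, NULL, _IOFBF, 4096) != 0) {
        fclose(fp);
        return -1;
    }

    if (fwrite(msg, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    /* A short message is still sitting in the stdio buffer at this point. */
    *before_flush = size_on_kernel_side(path);

    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    /* fflush has handed the data to the kernel. */
    *after_flush = size_on_kernel_side(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

For a message shorter than the buffer, the size before fflush is expected to be 0 and the size after it to equal the message length. Note that fflush only moves data from the first buffer to the second; when the kernel writes it to the disk is a separate decision.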

Having covered how a file is read, let's look at how a file is stored on the hard disk:

File storage on the hard disk

(1) Files are usually stored on the hard disk. A file on the hard disk is kept in a fixed form, which we call a static file.

(2) A hard disk can be divided into two major areas:

One holds the hard disk content management table, and the other is the area where the content is actually stored.

When the operating system accesses the hard disk, it first reads the content management table

to find the sector-level information of the file we want to access,

then uses this information to look in the area where the content is actually stored, and finally gets the file we want.

(3) What the operating system starts with is the file name, and what it ends up with is the file content.

The first step is to query the hard disk content management table. This table records information about each file, one entry per file.

Each file has an information list (we call it an inode, or i-node, which is essentially a structure.
This structure has many members. Each member records some piece of information about the
file, such as its size, its permissions, and the sector numbers and block numbers it occupies on the hard disk. On Linux file systems the file name itself is kept in the directory entry that maps the name to the inode number.)

To emphasize: hard disk management is file-based, and each file has an inode.

Each inode has a numeric ID and corresponds to a structure, and the file's various information is recorded in that structure.
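
Some of what an inode records can be read from user space with the stat call. The following C sketch is only an illustration: read_inode, print_inode and same_file are hypothetical helper names, and the fields printed are a small subset of struct stat.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Fills *st from the file's inode. Returns 0 on success, -1 on error. */
int read_inode(const char *path, struct stat *st)
{
    return stat(path, st) == 0 ? 0 : -1;
}

/* Prints the inode number and a few other fields stored in the inode. */
void print_inode(const char *path)
{
    struct stat st;
    if (read_inode(path, &st) != 0) {
        perror(path);
        return;
    }
    /* On Linux, st_blocks counts 512-byte units. */
    printf("%s: inode=%llu size=%lld bytes blocks=%lld links=%lu\n",
           path,
           (unsigned long long)st.st_ino,
           (long long)st.st_size,
           (long long)st.st_blocks,
           (unsigned long)st.st_nlink);
}

/* Returns 1 if both paths name the same inode on the same device,
 * 0 if they do not, and -1 if either path cannot be looked up. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Because same_file compares device and inode numbers rather than names, two different path strings that lead to one file compare as equal, which matches the idea that the inode, not the name, identifies the file.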

Opened files and vnodes in memory

(1) A running program is a process, and a file we open in the program belongs to that process.

Each process has a data structure that records all the information about the process (called the process information table).

A pointer in this table points to a file management table,

which records all the files opened by the current process, along with their related information.

The index used to look up each open file in the file management table is the file descriptor, fd,

and what it finally leads to is the management structure of the opened file, the vnode.

(2) A vnode records various information about an opened file.

So we only need to know the fd of the file

to easily find its vnode and perform various operations on the file.
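
One way to see that an fd is just an index leading to per-open bookkeeping is to compare a duplicated fd with a second, independent open of the same file. This is a rough sketch in C; demo_fd_sharing is a hypothetical helper, and the current read offset is used as a visible stand-in for the information recorded about an opened file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads 4 bytes through one fd, then reports the offset seen through
 * a dup of that fd and through a separate open of the same path.
 * Returns 0 on success, -1 on error. */
int demo_fd_sharing(const char *path, off_t *via_dup, off_t *via_reopen)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int copy = dup(fd);               /* second index, same open file */
    int again = open(path, O_RDONLY); /* separate open of the same file */
    int rc = -1;
    char buf[4];

    if (copy >= 0 && again >= 0 &&
        read(fd, buf, sizeof buf) == (ssize_t)sizeof buf) {
        *via_dup = lseek(copy, 0, SEEK_CUR);     /* moved along with fd */
        *via_reopen = lseek(again, 0, SEEK_CUR); /* still at the start */
        rc = 0;
    }

    if (copy >= 0)
        close(copy);
    if (again >= 0)
        close(again);
    close(fd);
    return rc;
}
```

For a file of at least 4 bytes, the offset seen through the dup is 4 while the offset seen through the second open stays at 0: two fds can lead to the same open-file record, and two opens of one file keep separate records.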

Conclusion

  1. Creating a process gives us a process information table.
  2. Opening a file, which is located by where it is stored on the hard disk, gives us its fd (file descriptor).
  3. Whenever a file is opened, a management structure (vnode) is created for it, which records various information about the opened file.
  4. With the three pieces of information above, you can read and write the file.
  5. To avoid accessing the hardware on every operation, a caching mechanism is used. Each cache is 2M, and the position you want to modify is located with the lseek function.
  6. Two processes can share the same file by opening it with the O_APPEND flag.
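
Points 5 and 6 can be tried out with a small C sketch. It imitates two processes by opening the same file twice within one process, which is an assumption made to keep the example short; write_from_two_opens is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens path twice, writes 4 bytes through each fd, and returns the
 * final file size, or -1 on error. Pass O_APPEND in extra_flags to
 * make every write go to the end of the file. */
long write_from_two_opens(const char *path, int extra_flags)
{
    int fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd1 < 0)
        return -1;

    int fd2 = open(path, O_WRONLY | extra_flags);
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    long size = -1;
    if (write(fd1, "aaaa", 4) == 4 && write(fd2, "bbbb", 4) == 4) {
        /* lseek to the end of the file returns its size. */
        size = (long)lseek(fd1, 0, SEEK_END);
    }

    close(fd1);
    close(fd2);
    return size;
}
```

With extra_flags set to 0, each fd keeps its own offset starting at 0, so the second write overwrites the first and the size is 4. With O_APPEND, each write is positioned at the current end of the file and the size is 8.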

Principles of Data Recovery

From everyday experience: when you format a hard disk (or a USB flash drive), you will find two options, quick format and low-level format.

Quick format is very fast. It takes only about 1 second to format a 32GB USB flash drive. Low-level format is slow.

What is the difference between the two? A quick format only deletes the hard disk content management table (the inodes) on the drive.

The actual stored content is unchanged, so content formatted this way can often be recovered.

After a low-level format, the data is hard to recover. Recovery has to work at the level of the flash memory itself, which is very expensive.
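
The difference between the two kinds of formatting can be mimicked with a toy model in C. This is not a real file system: the struct layout, the sizes and all toy_ names are invented for illustration, and the low-level format is assumed to overwrite the data area.

```c
#include <string.h>

#define TOY_DISK_SIZE 64
#define TOY_MAX_FILES 4

/* One entry of the toy "content management table". */
struct toy_entry {
    char name[16];
    int start;
    int len;
    int used;
};

/* A toy disk: management table plus the area holding the content. */
struct toy_disk {
    struct toy_entry table[TOY_MAX_FILES];
    char data[TOY_DISK_SIZE];
    int next_free;
};

void toy_init(struct toy_disk *d)
{
    memset(d, 0, sizeof *d);
}

/* Stores content under name. Returns its start offset, or -1 if full. */
int toy_store(struct toy_disk *d, const char *name, const char *content)
{
    int len = (int)strlen(content);
    if (strlen(name) >= sizeof d->table[0].name ||
        d->next_free + len > TOY_DISK_SIZE)
        return -1;

    for (int i = 0; i < TOY_MAX_FILES; i++) {
        if (!d->table[i].used) {
            strcpy(d->table[i].name, name);
            d->table[i].start = d->next_free;
            d->table[i].len = len;
            d->table[i].used = 1;
            memcpy(d->data + d->next_free, content, (size_t)len);
            d->next_free += len;
            return d->table[i].start;
        }
    }
    return -1;
}

/* Looks the name up in the table. Returns its start offset, or -1. */
int toy_lookup(const struct toy_disk *d, const char *name)
{
    for (int i = 0; i < TOY_MAX_FILES; i++)
        if (d->table[i].used && strcmp(d->table[i].name, name) == 0)
            return d->table[i].start;
    return -1;
}

/* Quick format: clear only the management table. */
void toy_quick_format(struct toy_disk *d)
{
    memset(d->table, 0, sizeof d->table);
    d->next_free = 0;
}

/* Low-level format (as modelled here): clear the data area as well. */
void toy_low_level_format(struct toy_disk *d)
{
    toy_quick_format(d);
    memset(d->data, 0, sizeof d->data);
}
```

After toy_quick_format, toy_lookup no longer finds the file, yet its bytes are still sitting in the data array, which is what a recovery tool relies on. After toy_low_level_format the bytes themselves are gone.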

Criticism and corrections are welcome.

over

Origin blog.csdn.net/Vast_Wang/article/details/102263198