C# Stream and IO Detailed Explanation (5) - Detailed process of reading files

【Foreword】

This article covers blocking file reads and writes. It describes only the main stages and leaves out the finer-grained handling each stage performs for special cases.

【Read file】

  • In the application (user layer)
    • C# code calls FileStream.Read (or a related method), passing the file handle, the number of bytes to read (count), the destination array together with the start index at which the data should be placed, and other parameters down to the ReadFile function of the C interface.
    • The C function initiates a system call, that is, it calls a library function such as read(). (A simple way to understand system calls: the operating system is the manager of application programs, and system calls are the communication interface it provides to them, much like the interface a Manager class exposes to the object instances it manages.)
  • System call (entering the kernel layer)
    • CPU soft interrupt: When the CPU is running the application and reaches the code that reads the file, it executes a system call, which raises a soft interrupt (a trap into the kernel). Because this is a blocking IO read, the operating system saves the state of the current process and lets the CPU run other processes until the data is ready.
      • Without interrupts, the CPU would have to poll the IO device's status and wait for it to complete the read request. The CPU could do nothing else in the meantime, which is a huge waste of CPU resources.
      • Switching processes also costs some time, but far less than waiting for a typical IO device to complete a request.
      • Generally speaking, the CPU is several orders of magnitude faster than IO devices. With a fast IO device, however, handling the interrupt and switching processes may take longer than polling, so interrupts are not always better than polling. When programming, avoid issuing a large number of IO requests in a short period of time; otherwise the CPU is interrupted continuously, which overloads the operating system and can cause livelock.
      • For the same reason, relying on interrupts alone is problematic in network scenarios: when a large number of packets arrive in a short time and each one raises an interrupt, the system can livelock.
    • Virtual file system (VFS): There are many file system implementations. The virtual file system is an abstract file system model that provides a common interface, so the operating system only needs to call that interface and does not need to care about how a specific file system is implemented. It plays the same role as a base class in everyday programming.
    • Page cache: If the data to be read is already in the page cache, it is copied from kernel space into the memory of the user process.
    • File system: The file system provides a concrete implementation of the VFS interface. If the data to be read is not in the page cache, the file system's own implementation of the read interface is called.
    • Generic block layer: The generic block layer gives file system implementers a unified interface so that they do not need to care about the differences between device drivers, and the resulting file system can be used on any block device. With the device abstracted away, the file system uses the same interface to read and write logical data blocks, whether the device is a solid state disk or a mechanical hard disk. The generic block layer then issues an IO request to the IO scheduling layer.
    • IO scheduling layer: IO requests issued by the generic block layer may not be executed immediately. Other applications may have issued IO requests shortly before, so requests are queued and dispatched together when the operating system deems it appropriate. When is the right time? That is the problem IO scheduling algorithms solve. Common algorithms include first come first served (FCFS), shortest seek time first (SSTF), and the elevator algorithm.
    • Device driver: The operating system has to interact with hardware devices, and the device driver is responsible for that. The operating system defines a standard interface, and the device manufacturer writes the driver that implements it. This lets the operating system work with functionally equivalent devices from different manufacturers without caring about their implementation details, which is why a device plugged into the system first goes through driver installation. Interfaces have a general drawback: they cannot expose the special capabilities of a particular implementer, because an interface abstracts what the majority has in common rather than everything. In programming, for example, when there are many subclasses there is always one with a special method that most of the others lack, and that method cannot be exposed on the interface for callers to use. For hardware, this means a device may have a special feature that the operating system cannot use. For file systems, the driver in question is the disk driver.
  • Hardware layer
    • Find the sector number where the data is located
      • Disks here means hard disks, which are divided into mechanical hard disks (HDD) and solid state disks (SSD). A disk with n sectors numbers them from 0 to n-1, and this range is the disk's address space.
      • A sector is generally 512 bytes. The manufacturer guarantees that reading or writing a single 512-byte sector is atomic: it either completes in full or does not happen at all. The operating system generally reads and writes 4 KB (or more) at a time.
    • The disk drive finds the track that holds the data from the sector number to be read
      • There is a fixed mapping between sector numbers and tracks, so once the sector is known, the disk drive knows which track the data is on. The disk arm still has to move to that track before the data can be read; this is the seek. If the data is spread across different tracks, the arm has to move between them, so there is a seek time.
      • A track holds multiple sectors. Once the disk arm is on the track, it has to wait for the platter to rotate until the target sector is under the head, so there is also a rotational delay.
    • The magnetic head reads or writes the data
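
The IO scheduling layer and the seek step described above can be tied together with a small sketch. It compares first come first served (FCFS) with shortest seek time first (SSTF) by adding up how far the disk arm travels. The function names, the starting head position, and the request queue are illustrative assumptions, and the model ignores rotational delay and transfer time; a real scheduler is considerably more involved.

```python
def fcfs_order(requests):
    """First come first served: serve requests in arrival order."""
    return list(requests)


def sstf_order(requests, head):
    """Shortest seek time first: always serve the pending track nearest the arm."""
    pending = list(requests)
    order = []
    position = head
    while pending:
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def total_seek_distance(order, head):
    """Sum of arm movements (in tracks) needed to serve the requests in this order."""
    distance = 0
    position = head
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


# Hypothetical state: arm on track 53, eight queued requests (track numbers).
head = 53
queue = [98, 183, 37, 122, 14, 124, 65, 67]

fcfs_distance = total_seek_distance(fcfs_order(queue), head)
sstf_sequence = sstf_order(queue, head)
sstf_distance = total_seek_distance(sstf_sequence, head)

print(fcfs_distance)   # 640
print(sstf_sequence)   # [65, 67, 37, 14, 98, 122, 124, 183]
print(sstf_distance)   # 236
```

In this toy queue, reordering cuts arm travel from 640 tracks to 236, which is why the scheduling layer may hold requests back for a moment instead of dispatching them in arrival order.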

【The process in more detail】

Interrupt process:

  • The hardware provides different execution modes to assist the operating system. In user mode, applications cannot fully access hardware resources; in kernel mode, the operating system can access all resources of the machine. The hardware also provides instructions for trapping into the kernel and for returning from a trap to user mode.
  • System calls have a clear calling convention: the arguments and return values must be placed in specific registers, and the trap instruction must be executed. This part is hand-coded in assembly.
  • When the machine boots, the operating system sets up a trap table that maps each kind of trap to its trap handler. The operating system tells the hardware where the trap handlers are through a special instruction. All of this is done in kernel mode.
  • When the trap occurs, the operating system takes control of the CPU and the current process becomes blocked (it becomes ready again once the data has been read). The operating system picks the next process to run with its scheduling algorithm and then returns to user mode with the return-from-trap instruction.
  • Even when no system call traps into the kernel, the operating system still regains control of the CPU whenever a clock interrupt occurs.
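
Application code rarely executes the trap instruction directly; a library wrapper does it. As a rough illustration, Python's os module exposes thin wrappers around the operating system's open, lseek, read, and close calls, so a blocking read can be sketched as below. This is not the code path FileStream.Read takes in .NET; the helper read_blocking and the temporary sample file are assumptions made for the example.

```python
import os
import tempfile


def read_blocking(path, count, file_offset=0):
    """Read up to `count` bytes from `path`, starting at `file_offset`."""
    fd = os.open(path, os.O_RDONLY)              # returns a file descriptor
    try:
        os.lseek(fd, file_offset, os.SEEK_SET)   # set the read position
        # The calling thread waits here until the kernel has the data,
        # either from the page cache or from the device.
        return os.read(fd, count)
    finally:
        os.close(fd)                             # release the descriptor


# Create a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, kernel")
    sample_path = tmp.name

head_bytes = read_blocking(sample_path, 5)
tail_bytes = read_blocking(sample_path, 6, file_offset=7)
os.remove(sample_path)

print(head_bytes)   # b'hello'
print(tail_bytes)   # b'kernel'
```

Each os call in the sketch crosses from user mode into kernel mode and back, following the trap and return-from-trap steps listed above.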

Memory and device interaction:

  • DMA (Direct Memory Access): Data has to be transferred between memory and the device. Originally the CPU did this itself; to improve efficiency, the work is handed over to the DMA engine.
    • When reading data: The operating system tells the DMA engine the starting memory address where the data should be stored, the size, and which device to read from.
    • When the transfer is finished, the DMA engine raises an interrupt to tell the operating system that the task is complete.
  • Ways of interacting with devices:
    • Privileged instructions: The hardware provides explicit instructions that the operating system uses to send data to specific device registers; the sequence of register reads and writes forms the protocol between the two sides.
    • Memory-mapped I/O: The hardware exposes device registers as memory addresses. To access a device register, the operating system reads or writes the corresponding memory address.
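
The DMA hand-off described above can be imitated with a toy model. Here a background thread stands in for the DMA engine, a bytearray stands in for main memory, and a threading.Event stands in for the completion interrupt. The class ToyDmaEngine and every name in it are hypothetical; real DMA is programmed through device registers, not Python objects.

```python
import threading


class ToyDmaEngine:
    """Toy stand-in for a DMA engine attached to one device."""

    def __init__(self, device_data):
        self.device_data = device_data

    def start_read(self, memory, start_address, size, on_complete):
        """Copy `size` bytes from the device into `memory` without the caller waiting."""
        def transfer():
            memory[start_address:start_address + size] = self.device_data[:size]
            on_complete()   # plays the role of the completion interrupt

        worker = threading.Thread(target=transfer)
        worker.start()
        return worker


memory = bytearray(16)                # pretend main memory
transfer_done = threading.Event()     # set when the "interrupt" arrives
engine = ToyDmaEngine(device_data=b"SECTOR-DATA")

# The "operating system" supplies the start address, the size, and the device.
worker = engine.start_read(memory, start_address=4, size=6,
                           on_complete=transfer_done.set)

other_work = sum(range(1000))         # the CPU is free to do something else
transfer_done.wait()                  # resume once the transfer is reported done
worker.join()

print(bytes(memory[4:10]))   # b'SECTOR'
```

The point of the sketch is the division of labor: the caller only describes the transfer and is notified at the end, while the copy itself happens elsewhere.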

How to find sectors:

  • The file system records information about each file: its size, which data blocks (i.e., sectors) hold its data, its owner, access rights, access and modification times, and so on. This information is stored in a data structure called an inode.
  • The inodes of all files must also be saved on the disk. Each file and folder has an inode number (inumber), and the inode can be looked up by its inumber. Together the inodes form an inode table, which is read from the disk and loaded into memory when the file system is initialized. File system initialization generally happens when the operating system starts, that is, when the computer is turned on.
  • Folders (i.e., directories) also have inodes. A folder contains a list of (folder name or file name, inumber) pairs, from which the folders and files under it can be found; together they form a directory tree.
  • When a file is opened, its inumber is found by recursively walking the directory tree along the given path, and the file's inode is then looked up in the inode table.
  • Opening a file also creates a file object. This data structure stores the inumber of the file to be read, and a file descriptor (called a file handle on Windows) is returned to the caller.
  • When reading a file, the file object is found from the file handle, the inumber from the file object, and the inode from the inumber; the sectors to read are then determined from the starting position of the read and the length passed in.
  • With multiple processes, an extra layer of indirection lets different processes share file objects. Each process has its own descriptor table indexed by file descriptor, all processes share one file table, and each descriptor table entry points to a file object in the file table.
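
The lookup chain above (path, inumber, inode, file object, file descriptor, blocks) can be sketched with plain dictionaries. Everything here is a simplified assumption: the 4096-byte block size, the layout of inode_table, the block numbers, and the helper names are all made up for illustration, and a real file system also handles indirect blocks, permissions, and caching.

```python
BLOCK_SIZE = 4096   # assumed block size for this sketch
ROOT_INUMBER = 0

# Toy inode table: inumber -> inode.
inode_table = {
    0: {"type": "dir", "entries": {"home": 1}},
    1: {"type": "dir", "entries": {"notes.txt": 2}},
    2: {"type": "file", "size": 10000, "blocks": [17, 42, 8]},
}

open_file_table = []    # shared file objects
descriptor_table = {}   # one process's table: file descriptor -> file object index


def resolve_path(path):
    """Walk the directory tree from the root and return the inumber of `path`."""
    inumber = ROOT_INUMBER
    for name in path.strip("/").split("/"):
        if not name:
            continue
        inode = inode_table[inumber]
        if inode["type"] != "dir" or name not in inode["entries"]:
            raise FileNotFoundError(path)
        inumber = inode["entries"][name]
    return inumber


def open_file(path):
    """Create a file object for `path` and return a file descriptor for it."""
    open_file_table.append({"inumber": resolve_path(path), "position": 0})
    fd = len(descriptor_table) + 3   # 0, 1, 2 are conventionally stdin/stdout/stderr
    descriptor_table[fd] = len(open_file_table) - 1
    return fd


def blocks_for_read(fd, count):
    """Return the block numbers a read of `count` bytes touches, then advance the position."""
    file_object = open_file_table[descriptor_table[fd]]
    inode = inode_table[file_object["inumber"]]
    start = file_object["position"]
    end = min(start + count, inode["size"])
    if end <= start:
        return []                    # already at end of file
    first = start // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    file_object["position"] = end
    return inode["blocks"][first:last + 1]


fd = open_file("/home/notes.txt")
first_read = blocks_for_read(fd, 5000)     # bytes 0..4999
second_read = blocks_for_read(fd, 10000)   # bytes 5000..9999, clipped at file size

print(fd)            # 3
print(first_read)    # [17, 42]
print(second_read)   # [42, 8]
```

The second read starts in the middle of block 42, so that block is touched again even though part of it was already read.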

【Reference】

"Introduction to Operating Systems"

Origin blog.csdn.net/enternalstar/article/details/133172685