[OS Notes] File Management 1

These notes were written while reviewing the course; most of the illustrations are screenshots from the course.


Introduction

Figure: File functions of the operating system

The file system manages external storage, hiding the complicated low-level details from users:

  • Convenient for users
  • File security is guaranteed
  • System resource utilization is improved


Issues to consider:

  • How the data inside a file is organized (unstructured or structured)
  • How files themselves are organized (tree, hierarchy, directories)
  • Which functions the OS should provide so that users and applications can use files (the 5 basic file operations, plus open, close, etc.)
  • How files are stored on external storage

Five basic file operations

  • Create a file
  • Delete a file
  • Read a file
  • Write a file
  • Set the file's read/write position (this turns sequential access into random access)

Other operations include opening (open) and closing (close) files, file attribute operations, and directory operations
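
As a rough illustration of the operations listed above, here is a minimal sketch using Python's `os` module, which wraps the usual system calls. The file name `demo.txt` and the helper `demo_basic_operations` are made up for this example; they are not from the course.

```python
import os
import tempfile

def demo_basic_operations(directory):
    """Walk through create, write, seek, read, close and delete on one file."""
    path = os.path.join(directory, "demo.txt")
    # Create the file and open it for reading and writing.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"hello file system")  # write
        os.lseek(fd, 6, os.SEEK_SET)        # set the read/write position
        data = os.read(fd, 4)               # read 4 bytes starting at offset 6
    finally:
        os.close(fd)                        # close
    os.remove(path)                         # delete
    return data, os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    print(demo_basic_operations(tmp))
```

The `os.lseek` call is what makes the read non-sequential: without it, the read would continue from wherever the previous write stopped.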


The logical structure of the file:

  • By whether it is structured (structured files and unstructured files)
  • By organization (sequential, indexed, indexed sequential)

The physical structure of the file:

  • Stored contiguously on disk
  • Discretely stored on disk (read first order)
  • Management of free disk blocks

Basic concepts

The management of files by the operating system is divided into two aspects:

  • Management of non-free disk blocks (physical structure/allocation of files)
  • Management of free disk blocks (file storage space management)

The smallest unit of disk reads and writes is the sector, which is typically 512 B. Several sectors are combined into one data block (logical block).

block = disk block = physical block

The basic unit the file system operates on is the data block, and during I/O each memory block corresponds to one disk block.

Because of this correspondence, the user can express a logical address within a file as (logical block number, offset within block); mapping the file's logical addresses to physical addresses is the operating system's job.

In most cases, the disk block size is the same as the memory block (or memory page) size
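
The (logical block number, offset within block) pair can be derived from a byte offset with one division. A small sketch, assuming a 4 KB block size (the size and the function name `to_logical_address` are chosen for illustration only):

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only

def to_logical_address(byte_offset, block_size=BLOCK_SIZE):
    """Split a byte offset within a file into (logical block number, offset in block)."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return divmod(byte_offset, block_size)

print(to_logical_address(10000))  # (2, 1808)
```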

Now that we know files are stored block by block, the key question for the physical structure of a file is simply which data blocks the file contains and in what order those blocks follow one another.


There are three ways to physically structure/allocate files:

  • Contiguous allocation
  • Linked allocation (divided into explicit linking and implicit linking)
  • Indexed allocation

From the perspective of physical space, these fall into:

  • Contiguous space allocation (storage)
  • Non-contiguous space allocation (storage)

Three logical structures (file organization)


Sequential file

Only sequential files with fixed-length records support random access; sequential files with variable-length records do not. The string structure has to be scanned from the beginning every time.

The key to random access is whether the address of the i-th record can be computed directly from i.
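
To make the contrast concrete, here is a small sketch. It assumes 8-byte fixed-length records, and for the variable-length case it assumes each record is prefixed by a 1-byte length; both layouts are invented for this example.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length in bytes

def read_fixed_record(f, i, record_size=RECORD_SIZE):
    """Fixed-length records: the address of record i is computed directly."""
    f.seek(i * record_size)
    return f.read(record_size)

def read_variable_record(f, i):
    """Variable-length records: every earlier record must be skipped one by one."""
    f.seek(0)
    for _ in range(i):
        length = f.read(1)[0]        # 1-byte length prefix (assumed layout)
        f.seek(length, io.SEEK_CUR)  # skip over the record body
    length = f.read(1)[0]
    return f.read(length)

fixed = io.BytesIO(b"AAAAAAAABBBBBBBBCCCCCCCC")
variable = io.BytesIO(bytes([2]) + b"hi" + bytes([5]) + b"hello" + bytes([1]) + b"x")
print(read_fixed_record(fixed, 2))        # b'CCCCCCCC'
print(read_variable_record(variable, 1))  # b'hello'
```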

Adding and deleting records in a sequential file is difficult (it is slightly easier with the string structure)

Index file

An index file consists of an index table plus the actual data. The data can be stored non-contiguously, and the index table itself is a sequential file with fixed-length records.

It does not matter if the records are of variable length, because the entries in the index table are of fixed length. Once the i-th entry is found, it gives the start address and length of the record, so lookups are fast.
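
A minimal sketch of this idea, assuming each index entry is a pair of 4-byte integers (start offset, length). The entry layout and the function names are illustrative choices, not something the notes specify.

```python
import struct

ENTRY = struct.Struct("<II")  # assumed entry layout: (start offset, record length)

def build_index_file(records):
    """Pack variable-length records into a data area plus a fixed-length index table."""
    index = bytearray()
    data = bytearray()
    for record in records:
        index += ENTRY.pack(len(data), len(record))
        data += record
    return bytes(index), bytes(data)

def read_record(index, data, i):
    """Entry i sits at a computable position, so no scan of the data is needed."""
    offset, length = ENTRY.unpack_from(index, i * ENTRY.size)
    return data[offset:offset + length]

index, data = build_index_file([b"a", b"bcd", b"ef"])
print(read_record(index, data, 1))  # b'bcd'
```

Note the cost: here each 8-byte entry is larger than the records it describes, which is exactly the overhead problem that arises when index entries are not much smaller than records.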

Multiple index tables can be created based on different keywords

If an index entry is not significantly smaller than the record it points to, storage utilization is poor (the index table adds a large storage overhead)

Indexed sequential file

An indexed sequential file combines the ideas of the sequential file and the index file. An index table is still created for the file, but instead of one index entry per record, there is one index entry per group of records (usually the entry corresponds to the first record of each group).
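
A rough sketch of a lookup in such a file. It assumes the records are sorted by key and split into groups of equal size, which is one common way to set this up but is an assumption here; the names `build_group_index` and `lookup` are hypothetical.

```python
from bisect import bisect_right

def build_group_index(records, group_size):
    """One index entry per group: (key of the group's first record, start position)."""
    return [(records[start][0], start) for start in range(0, len(records), group_size)]

def lookup(records, index, group_size, key):
    """Find the group through the index, then scan only inside that group."""
    first_keys = [first_key for first_key, _ in index]
    group = bisect_right(first_keys, key) - 1
    if group < 0:
        return None  # key is smaller than every key in the file
    start = index[group][1]
    for record_key, value in records[start:start + group_size]:
        if record_key == key:
            return value
    return None

records = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e")]  # sorted by key (assumed)
index = build_group_index(records, 2)
print(lookup(records, index, 2, 7))  # d
```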

If the file is particularly large, multi-level indexes can be used to further optimize query efficiency

File Directory


Files are managed through file control blocks (FCBs), much as processes are managed through PCBs. Each file corresponds to exactly one FCB.

  • An ordered set of FCBs is called a file directory
  • An FCB is a file directory entry
  • After the file is changed, the corresponding FCB must also be changed
  • A file directory is itself treated as a file (a directory file)

The FCB contains the file's basic information (file name, physical address, logical structure, physical structure, etc.), access control information (whether it is readable/writable, the list of users denied access, etc.), and usage information (such as the file's creation time and modification time).

The most important and most basic items are the file name and the physical address where the file is stored.

Directory operations:

  • Search: when the user wants to use a file, the system searches the directory by file name to find the directory entry for that file
  • Create a file: when creating a new file, a directory entry must be added to the directory it belongs to
  • Delete a file: when deleting a file, the corresponding directory entry must be removed from the directory
  • Display directory: users can ask to see the contents of a directory, such as all files in it and their attributes
  • Modify directory: some file attributes are stored in the directory, so when they change, the corresponding directory entry must be updated (for example, when a file is renamed)

Before creating a new file, check whether there is a duplicate name in the corresponding directory
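
The directory operations above, including the duplicate-name check, can be sketched as a small in-memory model. The `FCB` fields and the `Directory` class are simplified stand-ins invented for this example; a real FCB holds far more information.

```python
from dataclasses import dataclass

@dataclass
class FCB:
    name: str
    start_block: int       # stands in for the file's physical address
    readable: bool = True
    writable: bool = True

class Directory:
    """A file directory modelled as a set of FCBs keyed by file name."""

    def __init__(self):
        self._entries = {}

    def search(self, name):
        return self._entries.get(name)

    def create(self, name, start_block):
        if name in self._entries:  # check for a duplicate name first
            raise FileExistsError(name)
        fcb = FCB(name, start_block)
        self._entries[name] = fcb
        return fcb

    def delete(self, name):
        del self._entries[name]

    def names(self):
        return sorted(self._entries)

    def rename(self, old, new):
        if new in self._entries:
            raise FileExistsError(new)
        fcb = self._entries.pop(old)
        fcb.name = new             # the FCB changes along with the file
        self._entries[new] = fcb

d = Directory()
d.create("a.txt", 10)
d.create("b.txt", 20)
d.rename("a.txt", "c.txt")
print(d.names())  # ['b.txt', 'c.txt']
```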

An acyclic-graph directory is essentially a tree directory that allows "one child, multiple parents"


The original wording on page 255 of the book: "When a user wants to create a new file, they only need to check whether their own UFD and its subdirectories already contain a file with the same name as the new file; if not, a new directory entry can be added to the UFD or one of its subdirectories."

But the course diagram says that in a tree structure, files in different directories can have the same name. Personally, I think duplicate names are fine as long as the files are not in the same directory.

Three major physical structures (file allocation methods)

Similar to memory paging, the storage units on a disk are also divided into "blocks/disk blocks/physical blocks". In many operating systems, a disk block is the same size as a memory block or page.

Data exchange between memory and disk (that is, read/write operations, disk I/O) is performed in units of "blocks". That is, read in one block at a time, or write out one block at a time

The operating system allocates storage space to files in units of blocks.

Users operate their own files through logical addresses, and the operating system is responsible for the mapping from logical addresses to physical addresses

Supporting random access means that the disk address can be calculated instead of having to access it sequentially

Contiguous allocation

A file is stored in a contiguous region of physical disk space; the starting block and the length need to be recorded.

Physical block number = starting block number + logical block number

It is necessary to verify the legality of the logical block number (whether it exceeds the file length)
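
The mapping and the legality check fit in a few lines. A minimal sketch, with the function name `contiguous_physical_block` chosen for illustration:

```python
def contiguous_physical_block(start_block, length, logical_block):
    """Contiguous allocation: physical block = starting block + logical block."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block number exceeds the file length")
    return start_block + logical_block

print(contiguous_physical_block(start_block=100, length=5, logical_block=3))  # 103
```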

Advantages: highest read/write efficiency; supports both sequential access and direct (that is, random) access

Disadvantages: disk space becomes fragmented, files are hard to extend, compaction is expensive, and storage utilization is low

Linked allocation

Linked allocation is divided into implicit linking and explicit linking. If an exam question does not say which one is meant, implicit linking is assumed by default.

FAT and NTFS technology

Implicit linking

Similar to a linked list.

Features:

  • Only sequential access is supported, not random access
  • Accessing logical block i requires i + 1 disk I/O operations

Advantages:

  • No external fragmentation
  • Easy to add and delete
  • No need to know the file size in advance; the file is easy to extend

Disadvantages:

  • There is internal fragmentation (more of it when allocating in units of clusters)
  • No random access (access overhead is still somewhat high)
  • Not all of the space holds data (each block also stores a pointer to the next data block)
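
The i + 1 disk I/O count can be seen in a toy model. The sketch below represents the disk as a dictionary mapping a block number to (data, next block number); this representation and the function name are assumptions made for the example.

```python
def read_logical_block(disk, start_block, i):
    """Follow the chain stored inside the blocks; each lookup in `disk` is one disk read."""
    if i < 0:
        raise IndexError("logical block number must be non-negative")
    current = start_block
    reads = 0
    for _ in range(i):
        _, next_block = disk[current]  # must read the block just to find its pointer
        reads += 1
        if next_block is None:
            raise IndexError("logical block number exceeds the file length")
        current = next_block
    data, _ = disk[current]
    reads += 1
    return data, reads

# A three-block file stored in disk blocks 7 -> 2 -> 9.
disk = {7: (b"A", 2), 2: (b"B", 9), 9: (b"C", None)}
print(read_logical_block(disk, 7, 2))  # (b'C', 3)
```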

Explicit linking

Explicit linking is implemented with the file allocation table (FAT, File Allocation Table), which stays resident in memory after boot (improving efficiency). Because entries have a fixed length and the table is stored contiguously, the location of the table entry for disk block i can be computed directly.

Features:

  • One disk corresponds to one FAT
  • The FCB records the number i of the first disk block the file occupies. Read that block to get its data; to get the data that follows, look up entry i of the FAT, which records the number of the next disk block

Advantages:

  • Very convenient for file expansion
  • No external fragmentation
  • High storage utilization
  • Supports random access
  • Compared with implicit linking, address translation requires no disk access, so file access is more efficient

Disadvantage:

  • FAT occupies a certain amount of storage space
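
A simplified sketch of address translation with an in-memory FAT. The table is modelled as a Python list indexed by disk block number, with -1 as an assumed end-of-file marker; real FAT variants use their own reserved values.

```python
END_OF_FILE = -1  # assumed end-of-chain marker for this sketch

def fat_physical_block(fat, start_block, i):
    """Walk the chain in the in-memory FAT; no disk read is needed for translation."""
    if i < 0:
        raise IndexError("logical block number must be non-negative")
    current = start_block
    for _ in range(i):
        current = fat[current]  # entry for block `current` holds the next block number
        if current == END_OF_FILE:
            raise IndexError("logical block number exceeds the file length")
    return current

fat = [END_OF_FILE] * 10
fat[7], fat[2], fat[9] = 2, 9, END_OF_FILE  # file occupies blocks 7 -> 2 -> 9
print(fat_physical_block(fat, 7, 2))  # 9
```

Compared with implicit linking, the chain is still followed link by link, but every step is a memory lookup, so only the final data block costs a disk read.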

Actual development:

  1. FAT12
  2. Cluster-based FAT12
  3. FAT16
  4. FAT32

Indexed allocation


index block + data block

Features:

  • Files are stored non-contiguously
  • Each file has its own index table
  • The block that stores the index table is called the index block (one block is the maximum size of a single index table)
  • The disk blocks that store file data are called data blocks

Advantages:

  • Easy to extend
  • Supports random access

Disadvantage:

  • The index table needs to occupy a certain amount of storage space
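
A minimal sketch of single-level indexed allocation. The 1 KB block size and 4-byte index entry used in the example call are illustrative numbers, not values given in the notes, and the function names are hypothetical.

```python
def max_file_blocks(block_size, entry_size):
    """Number of data blocks that a single index block can describe."""
    return block_size // entry_size

def indexed_physical_block(index_table, logical_block):
    """Random access: the logical block number is simply a position in the index table."""
    if not 0 <= logical_block < len(index_table):
        raise IndexError("logical block number exceeds the file length")
    return index_table[logical_block]

index_table = [7, 2, 9, 30]  # logical block 0 is in disk block 7, and so on
print(indexed_physical_block(index_table, 2))  # 9
print(max_file_blocks(1024, 4))                # 256
```

The second function shows why a single index block limits the file size, which is the motivation for the improved schemes.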

Each entry in an index block points to one disk block. When one index block is not enough for a file, an improved scheme is needed:

  • Linked scheme
  • Multi-level index
  • Mixed index

Linked scheme: chain several index tables together. To access the second table you must first read the first table to learn where the second one is, so the later a table sits in the chain, the slower it is to reach.

Multi-level index: build an index with several levels (the principle is similar to a multi-level page table). Entries in the first-level index block point to second-level index blocks; third- and fourth-level index blocks can be added as the file size requires.

With a K-level index structure, if the top-level index table is not already in memory, accessing a data block requires K + 1 disk reads.
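
A small sketch of how a logical block number is split across index levels, and of the read count. It assumes every index block holds the same number of entries (256 in the example); the function names are hypothetical.

```python
def index_path(logical_block, entries_per_block, levels):
    """Return the entry to follow at each level, from the top level down."""
    capacity = entries_per_block ** levels
    if not 0 <= logical_block < capacity:
        raise IndexError("logical block number exceeds what the index can address")
    path = []
    for _ in range(levels):
        logical_block, entry = divmod(logical_block, entries_per_block)
        path.append(entry)
    path.reverse()  # the lowest level was computed first
    return path

def disk_reads(levels, top_index_in_memory=False):
    """One read per index level plus one for the data block itself."""
    return levels + (0 if top_index_in_memory else 1)

print(index_path(1026, 256, 2))  # [4, 2]: entry 4 at the top level, entry 2 below it
print(disk_reads(2))             # 3
```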


Logical Structure vs Physical Structure


The logical structure is the user's perspective. Users do not care how the data is stored, only how to interpret the information in a stream of bits. The disk (via the operating system) only has to hand the complete bit stream back to the user; only the user knows what information those bits represent.

The physical structure is the disk's perspective. It describes how the bit stream of a file is stored, not how it is interpreted. When the bit stream is needed, it can be restored intact. The disk only knows that the user (via the operating system) gave it a complete bit stream; it does not know how to interpret it.

The bridge between the two is the operating system. The operating system holds the bit stream: it must process it for the user (read and display it) and tell the disk how to save it (manage the disk). It must also manage the relationships between files, that is, the "file directory".


Unstructured file: (figures not reproduced)

Sequential file: (figures not reproduced)


Origin blog.csdn.net/qq_39377889/article/details/128321688