Operating System 6 - File Management

This series of blog posts summarizes the core content of the operating systems course at Shenzhen University, based on the textbook "Computer Operating System" (if you have any questions, please raise them in the comment area or contact me directly by private message).


Synopsis

This post covers Chapter 7 (file management) and Chapter 8 (disk storage management) of the operating system textbook.

Table of contents

1. Files and file systems

2. The logical structure of the file

3. Directory management

1. File Control Block (FCB)

2. Index node

3. Various file directories

3.1 Single-level file directory

3.2 Two-level directory

3.3 Tree structure directory

3.4 Directory query technology

4. External memory allocation method

1. Contiguous external memory allocation

2. Linked allocation

3. Indexed allocation

4. Examples of external storage allocation

5. File storage space management

1. Free table method

2. Free linked list method

2.1 Free disk block linked list method

2.2 Free area linked list method

3. Bitmap method

4. Group Linking Method - Key Points

6. File sharing

1. Detour file sharing method -> directed acyclic graph DAG

2. How to share index nodes

3. Symbolic link for file sharing

7. Disk Fault Tolerance Technology (SFT)

1. First-level fault-tolerant technology (SFT-I)

2. Second level fault tolerance technology (SFT-Ⅱ)

3. Third-level fault-tolerant technology (SFT-Ⅲ)

8. Data consistency control

1. Transactions

2. Concurrency control

3. Data consistency of disk block numbers (duplicate data)


The operating system manages the system's software resources (both application software and system software) in the form of files, and the part of the operating system responsible for this is called the file system. This chapter covers the logical organization of files and their physical organization on storage; the structure and management of file directories, which enable "access by name" as well as file sharing and protection; the algorithms for allocating and reclaiming file storage space and the on-disk format of the file system; and file system security.

1. Files and file systems

The file system manages programs and data by organizing them into files. Data is organized at three levels: data items, records, and files.

  • Data item: the lowest level of data organization. A basic data item is a set of characters describing one attribute of an object (such as a student number); a combined data item (group item) is a collection of several basic data items, such as salary (base pay + bonus)
  • Record: a collection of related data items describing one aspect of an object's properties; a key (one or several data items) uniquely identifies a record
  • File: a named collection of related information, such as a source program, an executable binary program, a batch of data, a table, or an article

Files have the following basic properties:

Files are storable: they allow information to be preserved long-term

Users access files by name: users can access information without knowing where on disk the file is stored

Files are mainly divided into structured files and unstructured (stream) files

1. Structured files:

Structured files are composed of records (fixed-length or variable-length).

2. Unstructured files:

An unstructured file is composed directly of characters (or bytes). It can be regarded as a special case of a record file: a file containing a single record with no identifier

Stream files are easy for the OS to manage and let users organize the internal logical structure of a file flexibly. Unix, DOS, Windows, and other OSes all use stream files.

The part of the OS that implements file management is called the file management system, or file system for short. Its main aspects are as follows:

1. File system functions:

  • Allocate and reclaim file storage space (that is, disk management)
  • Map file names to file storage space
  • Provide file sharing as well as protection and confidentiality measures
  • Implement the file operations that users need

2. File system interface:

  • Command interface: the interface between the user and the file system, such as searching for files (dir/ls), etc.
  • Program interface: the interface between the user program and the file system, implemented through system calls, such as open, read, write, close, etc.

2. The logical structure of the file

The logical structure of a file: the file organization, that is, the data and structure that the user can process directly

The physical structure of a file: the storage structure, that is, how the file is organized on external storage

Logical structures fall into two types according to whether the file has structure (structured or unstructured), and structured files fall into three types according to how they are organized (sequential, index, index sequential)

Structured files:

  • Sequential file: records are arranged in order; they may be fixed-length or variable-length
  • Index file: an index table is built for variable-length records, with one entry per record, to speed up retrieval
  • Index sequential file: records are divided into groups, and an index entry is created for the first record of each group

Unstructured files:

  • Stream file

1. Sequential file:

The records of a sequential file can be arranged in various orders, generally one of two:

(1) Serial structure: records are ordered by the time they were stored, regardless of key

(2) Sequential structure: records are ordered by a user-specified key, such as a positive integer id

When reading/writing a sequential file, fixed-length records can be located directly by calculation, while each variable-length record carries its own length and records must be read/written one after another. Sequential files are best suited to batch access to the records in a file.

  • Disadvantages: reading/writing a single record is slow (especially for variable-length records), and adding or deleting a record is difficult

2. Index file:

For fixed-length records, random access is easy to achieve by calculation even in a sequential file, but this is difficult for variable-length records.

An index table is therefore created for a variable-length record file. The index table is sorted by record key and is itself a sequential file of fixed-length records

Tips: To support retrieval by different users for different purposes, multiple index tables (one per search key) can be created for one file.

  • Advantages: faster searching, and records are easy to insert and delete.
  • Disadvantages: besides the main file, an index table must be maintained with one index entry per record, which increases storage overhead.

3. Index sequential file:

Divide the records of a variable-length record file into groups, create an index entry for the first record of each group, and collect these entries into an index table. The index table is sorted by record key and is itself a sequential file of fixed-length records

Tips: For relatively large files, multi-level indexes can be established.

The search costs of the three file types are compared below for a file of N = 1,000,000 records:

Sequential file: an average search takes N/2 = 500,000 comparisons

Index file: searched through the index, but an additional 1,000,000 index entries must be stored

Index sequential file:

With a one-level index (groups of sqrt(N) = 1000 records), an average search takes about sqrt(N) = 1000 comparisons (500 in the index table plus 500 within the group)

With a two-level index:

  • Every 100 records form a group, giving 10,000 groups and therefore 10,000 low-level index entries
  • Every 100 low-level index entries form a group, giving 100 groups and therefore 100 high-level index entries
  • The average search cost is (3/2) × (cube root of N) comparisons
  • Search order: high-level index table, then low-level index table, then the records of the group
  • 50 + 50 + 50 = 150 comparisons
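The figures above can be reproduced with a short calculation. This is only a sketch: the function names are made up for illustration, and it assumes that every table (each index level and the final record group) has the same size and is scanned linearly, so half of its entries are examined on average.

```python
def sequential_cost(n):
    # Average number of comparisons when scanning a sequential file of n records.
    return n / 2

def indexed_sequential_cost(n, levels):
    # Assumption: with `levels` index levels the file is split evenly, so each of
    # the (levels + 1) tables holds about n ** (1 / (levels + 1)) entries,
    # and each table is scanned halfway on average.
    table_size = round(n ** (1 / (levels + 1)))
    return (levels + 1) * table_size / 2

n = 1_000_000
print(sequential_cost(n))             # sequential file
print(indexed_sequential_cost(n, 1))  # one-level index
print(indexed_sequential_cost(n, 2))  # two-level index
```

For N = 1,000,000 this yields 500,000, 1000, and 150 comparisons, matching the comparison above.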

3. Directory management

Modern computer systems must store a large number of files. To manage them effectively they must be properly organized, which is done mainly through file directories.

A directory is a data structure that organizes the FCBs of files together. It can itself be regarded as a file, called a directory file. The main requirements of directory management are as follows:

  • Access by name: users only need to provide the file name to operate on the file
  • Fast directory retrieval
  • File sharing: multiple users can share the same file
  • Duplicate file names allowed: different users can use the same file name

1. File Control Block (FCB)

An FCB is a data structure that describes and controls a file. It contains basic information (such as the file name), access control information, and usage information

The file system operates on files through their FCBs. There is one FCB per file; each FCB is a directory entry, and a collection of FCBs constitutes a directory.

1. Basic information:

  • File name: identifies the file
  • File physical location: Indicates the storage location of the file on the external storage
  • Logical structure of the file
  • The physical structure of the file

2. Access control information:

  • Access permissions of the file owner (read-only, read-write, execute, etc.)
  • Access permissions of authorized users
  • Access permissions of general users

3. Usage information:

  • File creation time
  • Time of last modification (or access)
  • Current usage information

4. FCB structure:

5. MS file system:

2. Index node

When the file name is stored together with the rest of the FCB's descriptive information, each directory entry occupies a lot of space

Searching for a file only requires comparing names at first. If the file name is stored separately from the rest of the FCB, the cost of finding a file is reduced

When FCBs are large, few fit in a single disk block, so a file search has to access many disk blocks

For example, with 64B FCBs and 1KB disk blocks, a single disk block holds only 16 FCBs

If a directory contains 640 files, 40 disk blocks are required, and a search accesses 20 disk blocks on average

  • Introducing the index node: the file description information is stored in an index node, and the directory consists only of file names and index node pointers, which makes searching cheaper

A directory entry in the UNIX system occupies 16 bytes: 14B for the file name and 2B for the index node pointer

A 1KB disk block then holds 64 directory entries. The example above needs only 10 disk blocks, and a search accesses 5 disk blocks on average.

Core: the FCB's descriptive information other than the file name is gathered into an index node data structure (Unix uses the index node structure)
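The block counts in this example follow from simple arithmetic, sketched below. The function name and its parameters are illustrative only, and the average assumes a linear search that examines half of the directory blocks.

```python
import math

def directory_search_cost(num_files, entry_size, block_size=1024):
    """Return (entries per block, blocks needed, average blocks read per lookup)."""
    entries_per_block = block_size // entry_size
    blocks_needed = math.ceil(num_files / entries_per_block)
    # A linear search examines half of the directory blocks on average.
    return entries_per_block, blocks_needed, blocks_needed / 2

fcb_directory = directory_search_cost(640, 64)    # directory of full 64B FCBs
inode_directory = directory_search_cost(640, 16)  # 14B name + 2B index node pointer
print(fcb_directory)
print(inode_directory)
```

With 640 files, the FCB directory needs 40 blocks (20 read on average), while the name-plus-pointer directory needs 10 blocks (5 read on average).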

3. Various file directories

Different directory forms correspond to different file system management methods. There are several common file directories and directory queries as follows.

3.1 Single-level file directory

For the whole system, a single directory table is created, and each file is assigned one directory entry (FCB)

Status bit: indicates whether the directory entry is in use (1 = used, 0 = unused). In MS-DOS, a first byte of 0xE5 in the file name marks the entry as unused

1. Create a new file:

Check the directory; if there is no file with the same name, find an empty directory entry (status bit 0), write the new file's name, physical address, and other attributes into it, and set the status bit to 1

2. Delete a file:

Find the directory entry for the file, reclaim its external storage space, and set the status bit to 0

  • Advantages: simple
  • Disadvantages: slow search, no duplicate names allowed, inconvenient for file sharing

3.2 Two-level directory

The system keeps one master file directory (MFD) and a separate user file directory (UFD) for each user; each UFD is one directory entry in the MFD

  • Advantages: compared with single-level directories, two-level directories have the following advantages:

1. Faster directory retrieval (by a factor of about n/2)

2. The same file name can be used in different user directories

3. Different users can access the same shared file under different file names

  • Disadvantages:

1. Users cannot create subdirectories of their own

2. When multiple users collaborate, the isolation becomes a drawback, and sharing files between users is inconvenient

3.3 Tree structure directory

In modern OS, the most common and practical file directory is the tree structure directory, which can significantly improve the retrieval speed of the directory and the performance of the file system.

A directory structure with three or more levels is called a tree-structured directory. In the tree directory structure, it is allowed to create a new subdirectory in any level of subdirectory 

1. Path:

There is exactly one path from the root directory to any file

Joining all directory names from the root directory down to the file, followed by the file name, with the "/" symbol gives the path name of the file

Each file therefore has a unique path name (such as B/B/L or C/I/L)

2. Current directory:

Starting every file access from the root directory would be cumbersome

A user's file operations are usually concentrated in one directory for a period of time, so this directory is set as the working directory (current directory)

If a file is accessed without a path name, the operation applies to the current directory by default

3. Add a directory:

Users can add new directories as needed

A new directory can be added with a file system command (such as md in MS-DOS or mkdir in UNIX) or with a system call

4. Delete a directory:

Users can delete old directories as needed

1. Refuse to delete non-empty directories: if the directory contains files or subdirectories, it cannot be deleted

2. Allow deleting non-empty directories: the directory is deleted together with any files or subdirectories it contains (more dangerous)

3.4 Directory query technology

When a user wants to access a stored file, the general process is as follows:

  • Look up the file's FCB or index node by name
  • Convert the physical address (block number) recorded in the FCB or index node into the physical location of the file on the disk
  • Start the disk driver and read the required file into memory

The main search methods are linear search and hashing.

The linear search method to find /usr/ast/mbox is as follows:

With the hash method, a hash index is built over the file directory, so a file name maps directly to its directory entry and queries are very fast.
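As a rough illustration of the linear search, the sketch below resolves /usr/ast/mbox one path component at a time. The directory contents and index node numbers are hypothetical values chosen for the example, and the in-memory dictionary stands in for directory blocks that would really be read from disk.

```python
# Hypothetical directory tables: index node number -> list of (name, index node) entries.
directories = {
    1: [("bin", 4), ("dev", 7), ("usr", 6)],              # root directory
    6: [("dick", 19), ("ast", 26), ("jim", 51)],          # /usr
    26: [("grants", 64), ("mbox", 60), ("src", 92)],      # /usr/ast
}
ROOT_INODE = 1

def lookup(path):
    """Resolve an absolute path to an index node number by linear search."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(name)
        # Linear search: compare the name against each directory entry in turn.
        for entry_name, entry_inode in entries:
            if entry_name == name:
                inode = entry_inode
                break
        else:
            raise FileNotFoundError(path)
    return inode

mbox_inode = lookup("/usr/ast/mbox")
print(mbox_inode)
```

Each path component costs one scan of a directory table, which is why long paths and large directories make linear search slow and motivate the hash method.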

4. External memory allocation method

The goals of external storage allocation are to improve external storage utilization and file access speed

Different allocation methods (organization methods) produce different physical file structures, generally of three types:

(1) Contiguous organization: allocate contiguous disk space to each file, forming a sequential file

(2) Linked organization: allocate non-contiguous disk blocks and chain them together with link pointers, forming a linked file

(3) Indexed organization: record the file's disk blocks in an index, forming an index file

Tips: A modern OS can organize files in more than one of these ways.

1. Contiguous external memory allocation

Contiguous external memory allocation allocates a group of adjacent disk blocks to a file (the middle can be empty)

  • Advantages: easy sequential access, fast access to large batches of data
  • Disadvantages: contiguous external storage space is required (which also causes a "fragmentation" problem), the file length must be known in advance, and records cannot be inserted or deleted flexibly

2. Linked allocation

Linked allocation chains together the disk blocks belonging to the same file, so the allocated blocks need not be contiguous. This overcomes the shortcomings of contiguous allocation.

1. Implicit link:

The link pointer is stored directly in the data area of each disk block

Tips: To reach a given block of a file, all the preceding blocks must be read first. For details, see the examples in Section 4

  • Advantages: ⑴ no "fragmentation"; ⑵ no contiguous external storage space required; ⑶ no need to know the file length in advance
  • Disadvantages: ⑴ only suitable for sequential access, not random access (low efficiency); ⑵ poor reliability (if any pointer in the chain is corrupted, the rest of the chain is lost); ⑶ the usable data area of a block is no longer a power of 2 in size (the pointer occupies part of it), so blocks do not map neatly onto memory pages

2. Explicit link:

The pointers linking a file's disk blocks are stored explicitly in a link table, the file allocation table (FAT), where -1 marks the end of a chain

  • Advantages: ⑴ the FAT is small and can be loaded into memory; ⑵ searching is fast, so random access is practical. As with implicit links, there is no "fragmentation", no contiguous external storage space is required, and the file length need not be known in advance
  • Disadvantages: ⑴ the FAT occupies external storage space (using clusters as the allocation unit reduces this); ⑵ on a large disk the FAT takes up a lot of memory, so only part of it can be loaded at a time; ⑶ direct access is still not efficient

3. Indexed allocation

Although the linked organization removes the need for contiguous storage, it has two major problems: ① direct access is inefficient, and ② the pointers/FAT occupy a large amount of space. Indexed allocation addresses both: when a file is opened, only the disk block numbers of that file need to be loaded into memory, not the entire FAT (which holds the block links of all files).

Core: collect the disk block numbers assigned to the file and store them, in order, in an index block

1. Single-level index allocation:

Each file is first allocated one index block; when that index block cannot hold all of the file's disk block numbers, another index block is allocated

  • Advantages: convenient direct access
  • Disadvantages: ⑴ compared with the FAT method, it takes up more external storage space (index blocks are often not full); ⑵ when the file is large, single-level index allocation is still inefficient

2. Two-level (multi-level) index allocation:

The index block that the file points to stores the disk block numbers of the next-level index blocks

  • Advantages: direct access still works for larger files
  • Disadvantages: compared with FAT and single-level index allocation, more external storage space is required, and a file larger than the index levels can address cannot be allocated

3. Hybrid (incremental) index allocation:

To serve different file sizes well, several organization methods can be combined to form the physical structure of a file. The maximum file length is the sum of the maximum lengths supported by each method.

Suppose each disk block is 4KB and each disk block address is 4B (32 bits, so up to 4G disk blocks can be addressed):

Each index block can then hold 4KB/4B = 1K disk block numbers

  • Direct address (direct allocation): 10 direct address items are provided for small files, each storing a disk block number directly. The largest file is 10 × 4KB = 40KB
  • Single indirect address (single-level index allocation): ⑴ when the file is larger than 40KB, the single indirect address is used, pointing to an index block that stores 1K disk block numbers; ⑵ the largest file is 40KB + 1K × 4KB = 40KB + 4MB ≈ 4.04MB
  • Double indirect address (two-level index allocation): ⑴ when the file is larger than 4.04MB, the double indirect address is used, pointing to an index block that stores the numbers of 1K second-level index blocks; ⑵ the largest file is 4.04MB + 1K × 1K × 4KB = 4.04MB + 4GB
  • Triple (multiple) indirect address (multi-level index allocation): ⑴ when the file is larger than about 4GB, the triple (multiple) indirect address is used, each level multiplying the number of addressable blocks by 1K; ⑵ the largest file is about 4GB + 1K × 1K × 1K × 4KB = 4GB + 4TB
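The size limits above can be checked with a few lines of arithmetic. This is a minimal sketch using the parameters of the example (4KB blocks, 4B addresses, 10 direct address items); the constant and function names are illustrative.

```python
BLOCK = 4 * 1024            # bytes per disk block
ADDR = 4                    # bytes per disk block address
PER_INDEX = BLOCK // ADDR   # block numbers held by one index block (1K)
DIRECT = 10                 # number of direct address items

KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

def max_file_size(indirect_levels):
    """Largest file in bytes when indirect addresses up to the given level are used."""
    blocks = DIRECT
    for level in range(1, indirect_levels + 1):
        # A level-k indirect address reaches PER_INDEX ** k data blocks.
        blocks += PER_INDEX ** level
    return blocks * BLOCK

for levels in range(4):
    print(levels, max_file_size(levels))
```

The four results correspond to 40KB, 40KB + 4MB, 40KB + 4MB + 4GB, and 40KB + 4MB + 4GB + 4TB.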

4. Examples of external storage allocation

A file system contains a 20MB large file and a 20KB small file. For each of the contiguous, linked, two-level index, and mixed index allocation schemes (each block is 4096B and each block address takes 4B), answer:

1. What is the largest file each scheme can manage?

2. For the large and the small file, how many dedicated blocks does each scheme need to record the physical addresses of the file (explain the purpose of each block)?

3. To read the information at offset 5.5KB and at offset 16MB+5.5KB in the large file, how many disk I/O operations does each scheme require?

Block numbers per index block = 4KB / 4B = 1K

The results are summarized in the figure below. The UNIX mixed allocation scheme can manage large files and access files quickly at a small cost.

1. The largest file each scheme can manage:

  • Contiguous: unlimited; a file can be as large as the entire file area of the disk
  • Linked: same as above
  • Two-level index: 1K × 1K × 4KB = 4GB
  • Mixed index allocation: with the UNIX mixed scheme, 40KB + 4MB + 4GB + 4TB

2. The answer is as follows:

  • Contiguous: for both files, two items in the file control block (FCB) suffice: the first physical block number and the total number of blocks. No dedicated block is needed to record physical addresses
  • Linked: for both files, two items in the FCB suffice: the first physical block number and the total number of blocks. In addition, each physical block of the file stores a pointer holding the number of the next block
  • Two-level index: the two-level structure is used for both files regardless of size. The 20KB file uses one block as the first-level index and one as the second-level index, so two dedicated physical blocks serve as index blocks. The 20MB file uses one block as the first-level index and five blocks as second-level indexes, for a total of six dedicated index blocks
  • Mixed index allocation: the 20KB file uses only the first 5 entries (direct address items) of i_addr[13] in the FCB to store its physical block numbers, so no dedicated block is needed. For the 20MB file, the first 10 entries of i_addr[13] store its first 10 physical block numbers, one single indirect index block stores the next 1K block numbers, and the double indirect index stores the remaining block numbers, using 1 first-level index block and 4 second-level index blocks. A total of 6 dedicated physical blocks are therefore needed

Tips: one index block addresses 1K × 4KB = 4MB of data

3. The analysis is as follows:

5.5KB / 4KB gives logical block 1; (16MB + 5.5KB) / 4KB gives logical block 4097 (counting from block 0)

  • Contiguous: compute the relative block number of the information within the file, then the physical block number = first block number of the file + relative block number, and read that block with one disk I/O. Each read therefore takes one I/O operation.
  • Linked: to read the information at 5.5KB, read the first block (logical block 0) to obtain the number of the next block, then read logical block 1: two I/O operations. To read the information at 16MB + 5.5KB, all the blocks before the target must be read first, which takes 4097 disk I/O operations to obtain the target block number, plus one more to read the target block. A total of 4098 disk I/Os are needed.
  • Two-level index: each read goes through the first-level index block and a second-level index block before reading the data, so each read takes three I/O operations
  • Mixed index allocation: offset 5.5KB falls within the direct addresses, so one I/O is needed; offset 16MB+5.5KB falls within the double indirect range, so three I/Os are needed.
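The block counts and I/O counts in this example can be recomputed with the sketch below. It makes several assumptions: the FCB or index node is already in memory, the mixed scheme has 10 direct items followed by one single indirect and one double indirect address, files never reach the triple indirect range, and all function names are invented for illustration.

```python
import math

BLOCK = 4096              # bytes per block
PER_INDEX = BLOCK // 4    # block numbers per index block (1K)
DIRECT = 10               # direct address items in the mixed scheme

def two_level_index_blocks(file_size):
    # One first-level index block plus enough second-level index blocks.
    data_blocks = math.ceil(file_size / BLOCK)
    return 1 + math.ceil(data_blocks / PER_INDEX)

def mixed_index_blocks(file_size):
    # Dedicated index blocks in the mixed scheme (up to double indirect only).
    remaining = math.ceil(file_size / BLOCK) - DIRECT
    if remaining <= 0:
        return 0
    used = 1                        # the single indirect index block
    remaining -= PER_INDEX
    if remaining > 0:
        # Double indirect: one first-level block plus its second-level blocks.
        used += 1 + math.ceil(remaining / PER_INDEX)
    return used

def linked_reads(offset):
    # Implicit links: every block up to and including the target is read.
    return offset // BLOCK + 1

def mixed_reads(offset):
    # Index blocks on the path to the data block, plus the data block itself.
    block = offset // BLOCK
    if block < DIRECT:
        return 1
    if block < DIRECT + PER_INDEX:
        return 2
    return 3

SMALL, LARGE = 20 * 1024, 20 * 1024 * 1024
NEAR = 5632                        # 5.5KB in bytes
FAR = 16 * 1024 * 1024 + 5632      # 16MB + 5.5KB in bytes
print(two_level_index_blocks(SMALL), two_level_index_blocks(LARGE))
print(mixed_index_blocks(SMALL), mixed_index_blocks(LARGE))
print(linked_reads(NEAR), linked_reads(FAR))
print(mixed_reads(NEAR), mixed_reads(FAR))
```

The printed values agree with the answers to questions 2 and 3: 2 and 6 index blocks for the two-level index, 0 and 6 for the mixed index, 2 and 4098 reads for linked allocation, and 1 and 3 reads for the mixed index.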

Detailed reference: Operating system error record_There is a 20mb large file and a 20kb small file in a file system, when using continuous, link, i-node respectively (_MarshaZheng's blog-CSDN blog

Consider the FAT shown below, where -1 marks the end of a file, -2 marks a bad disk block, and 0 marks a free disk block. How many files does the FAT describe? Which disk blocks does each file occupy? After saving another file that needs 2 disk blocks, what does the FAT contain?

The FAT contains three -1 entries, so there are 3 files

To find the blocks of each file, trace the chain backwards from each -1 entry

(1) File A: 2, 9, 1, 3 (2) File B: 11, 5 (3) File C: 7, 4, 12

To save another file of 2 disk blocks, the free blocks 6 and 10 are used: entry 6 changes from 0 to 10, and entry 10 changes from 0 to -1
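The same reasoning can be sketched in code. The original FAT figure is not reproduced here, so the table below is a reconstruction: the entries for blocks 1 to 7 and 9 to 12 follow from the answer above, while marking blocks 0 and 8 as bad is purely an assumption for illustration. Instead of tracing backwards from each -1, the sketch finds each chain's first block (a used block that no other entry points to) and follows it forward, which gives the same chains.

```python
END, BAD, FREE = -1, -2, 0

# Block number -> FAT entry. Blocks 0 and 8 are ASSUMED bad for illustration;
# the remaining entries are derived from the stated answer.
fat = {0: BAD, 1: 3, 2: 9, 3: END, 4: 12, 5: END, 6: FREE,
       7: 4, 8: BAD, 9: 1, 10: FREE, 11: 5, 12: END}

def file_chains(fat):
    """Return the block chain of every file recorded in the FAT."""
    in_use = {b for b, nxt in fat.items() if nxt not in (BAD, FREE)}
    pointed_to = {fat[b] for b in in_use if fat[b] != END}
    chains = []
    for start in sorted(in_use - pointed_to):   # chain heads
        chain, block = [], start
        while block != END:
            chain.append(block)
            block = fat[block]
        chains.append(chain)
    return chains

def allocate(fat, count):
    """Chain together the first `count` free blocks and return their numbers."""
    free = [b for b in sorted(fat) if fat[b] == FREE][:count]
    if count < 1 or len(free) < count:
        raise RuntimeError("not enough free blocks")
    for current, following in zip(free, free[1:]):
        fat[current] = following
    fat[free[-1]] = END
    return free

chains = file_chains(fat)
new_blocks = allocate(fat, 2)
print(chains)
print(new_blocks, fat[6], fat[10])
```

The number of chains equals the number of -1 entries, and allocating two blocks updates entries 6 and 10 exactly as described above.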

5. File storage space management

Every file organization method (external storage allocation method) in Section 4 needs to allocate disk blocks to files. The system must therefore know which disk blocks are free and provide the means to allocate and reclaim them. This is file storage space management; several common methods are introduced below.

1. Free table method

The free table method belongs to contiguous allocation: the system builds a free table with one entry for each free area on external storage

  • Allocation algorithms for free extents (similar to dynamic memory partition allocation): 1. first fit; 2. next fit (circular first fit); 3. best fit; 4. worst fit
  • Reclaiming free extents: merge the released extent with adjacent free extents

Contiguous allocation is rarely used for main memory, but for external storage the free table method can speed up allocation in the swap area (which is accessed frequently and in bulk) and for small files.

2. Free linked list method

All free disk space is chained into a free list. Depending on the basic unit that makes up the chain, there are two forms.

2.1 Free disk block linked list method

All free storage space on the disk is linked into a list, with the disk block as the basic unit

When a user needs disk space, disk blocks are allocated from the head of the list. When a user releases disk space, the released blocks are appended to the tail of the list

  • Advantages: allocating and reclaiming disk blocks is simple
  • Disadvantages: the free block list may be very long, and a corrupted pointer anywhere in the list loses all the free blocks after it

2.2 Free area linked list method

All free extents on the disk (each consisting of several contiguous disk blocks) are linked into a list

When a user needs disk space, first fit is used to search the list for an extent. When a user releases disk space, the released extent is inserted at the appropriate position in the list (and may need to be merged with its neighbors)

  • Advantages: the free extent list is likely to be shorter
  • Disadvantages: allocating and reclaiming extents is more complicated, and a corrupted pointer anywhere in the list loses all the free extents after it.
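A minimal sketch of first-fit allocation and merging on release is given below. As a simplification, a sorted Python list of [start, length] pairs stands in for the on-disk linked list of extents, and the function names are illustrative.

```python
def first_fit(free_extents, size):
    """Allocate `size` blocks from the first extent that is large enough.

    free_extents is a list of [start, length] pairs sorted by start block.
    Returns the first allocated block number, or None if nothing fits.
    """
    for i, (start, length) in enumerate(free_extents):
        if length >= size:
            if length == size:
                del free_extents[i]                       # extent used up entirely
            else:
                free_extents[i] = [start + size, length - size]
            return start
    return None

def release(free_extents, start, size):
    """Return an extent to the free list, merging it with adjacent extents."""
    free_extents.append([start, size])
    free_extents.sort()
    merged = [free_extents[0]]
    for ext_start, ext_len in free_extents[1:]:
        last = merged[-1]
        if last[0] + last[1] == ext_start:                # touches the previous extent
            last[1] += ext_len
        else:
            merged.append([ext_start, ext_len])
    free_extents[:] = merged

free_list = [[0, 3], [10, 5], [20, 8]]
allocated_start = first_fit(free_list, 4)
after_allocation = [list(e) for e in free_list]
release(free_list, allocated_start, 4)
print(allocated_start, after_allocation, free_list)
```

Allocating 4 blocks skips the 3-block extent and splits the extent at block 10; releasing the same blocks merges them back so the free list returns to its initial state.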

3. Bitmap method

A bitmap uses one binary bit to record the state of each disk block, with 0 and 1 indicating whether the block is free (here 0 = free, 1 = allocated).

Tips: Every disk block has one corresponding bit, so rows × columns of the bitmap = total number of blocks

  • Disk blocks are allocated as follows:

1. Scan the bitmap to find a bit whose value is 0

2. Convert the position found into a disk block number n. For example, for bit j of byte i, n = i × 8 + j

3. Set the corresponding bit in the bitmap to 1 for the newly allocated block

  • Disk blocks are reclaimed as follows:

1. Convert the block number n into a position: i = n / 8 (integer division); j = n mod 8

2. Set bit j of byte i in the bitmap to 0

  • Advantages: 1. the bitmap takes up little space and can be kept in memory; 2. allocation and reclamation are simple and efficient
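The allocation and reclamation steps can be sketched as follows. The class name is made up for the example, and the sketch assumes that bit j is counted from the least significant bit of byte i and that 0 means free.

```python
class BlockBitmap:
    """One bit per disk block: 0 = free, 1 = allocated."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.bits = bytearray((total_blocks + 7) // 8)

    def allocate(self):
        # Scan for the first 0 bit, mark it 1, and return block n = i * 8 + j.
        for n in range(self.total):
            i, j = divmod(n, 8)
            if not self.bits[i] & (1 << j):
                self.bits[i] |= 1 << j
                return n
        return None   # no free block left

    def free(self, n):
        # i = n // 8, j = n mod 8; clear bit j of byte i.
        i, j = divmod(n, 8)
        self.bits[i] &= ~(1 << j) & 0xFF

bitmap = BlockBitmap(16)
first_three = [bitmap.allocate() for _ in range(3)]
bitmap.free(1)
reused = bitmap.allocate()
print(first_three, reused)
```

Freeing block 1 and allocating again returns block 1, because the scan always picks the lowest-numbered free block.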

Therefore, the bitmap method is often used in microcomputers and minicomputers, such as MS OS systems

4. Group Linking Method - Key Points

Neither the free table method nor the free linked list method suits large file systems, because the table or list becomes too long. The group linking method used in UNIX combines the advantages of both.

Free disk blocks are divided into groups of N (such as 100), and the groups are chained into a linked list

Free disk block number stack: holds the block numbers of the currently available group of free disk blocks

Tips: s.free(0) is the bottom of the stack, and s.free(99) is the top of the stack when the stack is full.

Tips: In each group, only the block whose number sits at the bottom of the stack holds the block numbers of the next group. For example, block 300 records the group 301->400, whereas block 299 is just an ordinary free block.

  • Disk blocks are allocated as follows:

1. If the stack holds only one block number (the bottom entry), read the contents of that disk block into the free disk block number stack, then allocate that disk block

2. Otherwise, allocate the disk block whose number is on top of the stack and decrement the count s.free by 1

  • Disk blocks are reclaimed as follows:

1. If s.free = N (the stack is full), write the contents of the free disk block number stack into the newly released disk block, set s.free = 1, and make this block number the new bottom of the stack

2. Otherwise, increment s.free by 1 and push the released block number onto the top of the stack
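The allocation and reclamation rules can be simulated with the sketch below. It is a simplified model, not the UNIX implementation: a dictionary stands in for the contents of disk blocks, the group size N is kept at 3 so the behaviour is easy to follow, and block number 0 at the bottom of the stack is assumed to mark the end of the chain of groups.

```python
N = 3   # group size (the text uses 100; kept small here for readability)

class GroupLinkedFreeList:
    def __init__(self):
        self.stack = [0]   # stack[0] is the bottom; 0 means "no further group"
        self.disk = {}     # block number -> stack saved in that block (simulated disk)

    def release(self, block):
        if len(self.stack) == N:
            # Stack full: save it in the released block, which becomes the new bottom.
            self.disk[block] = self.stack
            self.stack = [block]
        else:
            self.stack.append(block)

    def allocate(self):
        if len(self.stack) == 1:
            block = self.stack[0]
            if block == 0:
                raise RuntimeError("no free blocks")
            # Only the bottom entry is left: load the next group from that block,
            # then hand out the block itself.
            self.stack = self.disk.pop(block)
            return block
        return self.stack.pop()

free_list = GroupLinkedFreeList()
for block in range(1, 8):
    free_list.release(block)
allocation_order = [free_list.allocate() for _ in range(7)]
print(allocation_order)
```

Only one group of block numbers is held in memory at a time; the rest stay in the free blocks themselves, which is what keeps the structure small even on large disks.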

Detailed allocation and recovery cases can be found at:

Operating System - Group Linking Method - LengDanRan's Blog - CSDN Blog

Examples can be seen: examples explain group link method_Ajay666's Blog-CSDN Blog 

6. File sharing

In a modern computer system, file sharing means must be provided for multiple users to share the same file, and only one copy needs to be saved in the system to save storage space.

1. Detour file sharing method -> directed acyclic graph DAG

This method is a traditional tree directory method, and the file sharing is poor.

The working directory of each user is the current directory. If the file is not in the current directory, start from the current directory and look up (parent directory) or down (subdirectory)

If the current directory is PA, to find the B file in JBK, you can use cd ..\D\B

A directed acyclic graph DAG allows a file to have multiple parent directories.

However, linking a subdirectory to multiple parent directories in a DAG is difficult, and modifications are troublesome.

2. How to share index nodes

The directory entry contains only the file name and a pointer to the index node. Different directory entries that share the same file may have different file names, but they hold the same pointer to the index node. A link counter in the index node records the number of directory entries that share it.

An example of creating and deleting is as follows:

  • Advantages: any user who shares the file can modify it, and the other users see the modified file
  • Disadvantage: even if the user who created the shared file deletes his own entry, the owner of the file does not change as long as other users still share it (the creating user must continue to pay for the file)
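A minimal sketch of index node sharing with a link counter follows. The class names `IndexNode` and `Directory`, the user names, and the file names are hypothetical, chosen only to illustrate the idea.

```python
class IndexNode:
    """Holds the file's attributes; directory entries only point to it."""

    def __init__(self, owner, data=""):
        self.owner = owner
        self.data = data
        self.link_count = 0  # number of directory entries sharing this node


class Directory:
    def __init__(self):
        self.entries = {}  # file name -> IndexNode

    def link(self, name, inode):
        self.entries[name] = inode
        inode.link_count += 1

    def unlink(self, name):
        """Remove an entry; return True if the file can now be reclaimed."""
        inode = self.entries.pop(name)
        inode.link_count -= 1
        return inode.link_count == 0


alice_dir, bob_dir = Directory(), Directory()
node = IndexNode(owner="alice", data="v1")
alice_dir.link("report", node)
bob_dir.link("alice_report", node)      # different name, same index node

bob_dir.entries["alice_report"].data = "v2"
print(alice_dir.entries["report"].data)  # v2

reclaimed = alice_dir.unlink("report")
print(reclaimed, node.owner, node.link_count)  # False alice 1
```

The last line shows the disadvantage listed above: after the creator removes her entry, the file still exists and is still owned by her, because the link counter has not reached zero.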

3. Symbolic link for file sharing

With this method, users share a file through a symbolic link (Link). The symbolic link is a pointer to the location of the directory entry of the file (such as a path name in the same system, or the IP address and path name on another host in the network).

Neither the file owner nor the file itself knows how many users are sharing the file. The file owner can delete the shared file completely. When another user later accesses the deleted file through a symbolic link, an access error occurs.

  • Advantages: files anywhere on the network can be shared
  • Disadvantage: slower access (the path must be searched along the link on every access)
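The behaviour described above, including the access error after the owner deletes the file, can be illustrated with a toy in-memory file system. The dictionary layout, the `resolve` function, and the paths are assumptions made for this sketch only.

```python
def resolve(fs, path, max_hops=8):
    """Follow symbolic links until a regular file is reached."""
    for _ in range(max_hops):
        if path not in fs:
            raise FileNotFoundError(path)
        kind, value = fs[path]
        if kind == "file":
            return value
        path = value  # symbolic link: continue with the stored path name
    raise OSError("too many levels of symbolic links")


fs = {
    "/home/alice/report": ("file", "quarterly numbers"),
    "/home/bob/report_link": ("symlink", "/home/alice/report"),
}

content = resolve(fs, "/home/bob/report_link")
print(content)  # quarterly numbers

# The owner deletes the file without knowing that a link points to it.
del fs["/home/alice/report"]
try:
    resolve(fs, "/home/bob/report_link")
    dangling = False
except FileNotFoundError:
    dangling = True
print(dangling)  # True
```

Each access through the link costs at least one extra lookup, which is the source of the slower access speed.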

7. Disk Fault Tolerance Technology (SFT)

Disk fault tolerance technology (SFT) is required to ensure that, when some part of the disk has a defect or fails, (1) the disk can still work normally and (2) no data is corrupted or lost. It is divided into three levels, from low to high.

1. First-level fault-tolerant technology (SFT-I)

SFT-I mainly prevents data loss when disk surface errors occur.

1. Double directory and double file allocation table (FAT):

The directory (file name, file data, and file attributes) and the file allocation table (file location) are the most important data in the file system.

Two copies of the directory and the file allocation table are stored in different places on the disk.

Operations are normally carried out on the main directory and the main FAT, and the modified data is regularly saved to the backup directory and backup FAT. When the disk is damaged, the backup directory and backup FAT are put into use, and a new directory and FAT are created.

At every startup, the two directories and the two FATs are checked for consistency.

2. Hot fix redirection:

Set aside a hot repair area on the disk (a small part of the disk capacity, such as 2%~3%). When a disk block is found to be damaged, its data is written into the hot repair area instead. Every later access to the damaged disk block is automatically redirected to the corresponding disk block in the hot repair area.

3. Read-after-write verification:

Write the disk block from a memory buffer A, immediately read the content of the disk block back into a memory buffer B, and compare the contents of buffers A and B. If they are inconsistent, write again; if they are still inconsistent, the disk block is considered damaged, and the content is written to the hot repair area.
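Hot fix redirection and read-after-write verification work together, which the following simulation tries to show. It is only a sketch under stated assumptions: the class `FaultTolerantDisk`, the way a defective block is simulated (it stores zero bytes instead of the data), and the single retry are all invented for the example.

```python
class FaultTolerantDisk:
    """Simulated disk with a hot fix area (hypothetical model)."""

    def __init__(self, bad_blocks=()):
        self.surface = {}            # normal area: block number -> bytes
        self.hotfix = {}             # hot fix area: damaged block number -> bytes
        self._bad = set(bad_blocks)  # simulated surface defects

    def _raw_write(self, block, data):
        # A defective block silently stores zero bytes instead of the data.
        self.surface[block] = bytes(len(data)) if block in self._bad else data

    def write(self, block, data, retries=1):
        if block in self.hotfix:             # block was redirected earlier
            self.hotfix[block] = data
            return
        for _ in range(retries + 1):
            self._raw_write(block, data)     # buffer A -> disk block
            read_back = self.surface[block]  # disk block -> buffer B
            if read_back == data:            # compare A and B
                return
        # Still inconsistent after the retry: treat the block as damaged
        # and store the content in the hot fix area.
        self.hotfix[block] = data

    def read(self, block):
        # Accesses to a damaged block are redirected automatically.
        if block in self.hotfix:
            return self.hotfix[block]
        return self.surface[block]


disk = FaultTolerantDisk(bad_blocks={7})
disk.write(3, b"good")
disk.write(7, b"data")
print(disk.read(3), disk.read(7))  # b'good' b'data'
print(sorted(disk.hotfix))         # [7]
```

The caller reads block 7 as usual and never learns that the data actually lives in the hot fix area.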

2. Second level fault tolerance technology (SFT-Ⅱ)

SFT-II mainly solves the problem that data cannot be read and written normally when the disk drive fails.

1. Disk mirroring:

Connect two disk drives to one disk controller. Every time data is written to the primary disk, it is also written to the same location on the backup disk, so all data on the primary disk and the backup disk is exactly the same. When the primary disk drive is damaged, the backup disk is enabled.

2. Disk duplex:

Connect the two disk drives to two separate disk controllers. Every time data is written to the primary disk, it is also written to the same location on the backup disk, so all data on the primary disk and the backup disk is exactly the same. When the primary disk system is damaged, the backup disk system is enabled.

For the two SFT-II techniques, the analysis is as follows:

  • Pros: fault-tolerant
  • Disadvantages: serious waste (only 50% of the disk capacity is usable), and I/O speed is not improved
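The write path shared by disk mirroring and disk duplexing can be sketched as follows. The class `MirroredDisk` and its method names are hypothetical, and the model ignores the controllers, which are the only difference between the two techniques.

```python
class MirroredDisk:
    """Every write goes to both disks at the same block number."""

    def __init__(self):
        self.primary = {}
        self.backup = {}
        self.primary_ok = True

    def write(self, block, data):
        if self.primary_ok:
            self.primary[block] = data
        self.backup[block] = data  # same location on the backup disk

    def read(self, block):
        source = self.primary if self.primary_ok else self.backup
        return source[block]

    def fail_primary(self):
        """Simulate damage to the primary disk drive."""
        self.primary_ok = False
        self.primary.clear()


pair = MirroredDisk()
pair.write(0, "boot record")
pair.write(1, "user data")
pair.fail_primary()
print(pair.read(1))  # user data
```

Two physical disks store one disk's worth of data here, which is the 50% efficiency mentioned above.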

3. Third-level fault-tolerant technology (SFT-Ⅲ)

SFT-III achieves fault tolerance based on cluster technology. There are three main modes: dual-machine hot standby, dual-machine mutual backup, and shared disk.

8. Data consistency control

In practical applications, multiple files often contain the same data. Data consistency control ensures that this data stays identical in all of those files under any circumstances. There are several core issues:

  • When the same data is stored in different files, how to ensure that a modification is applied consistently in every file
  • How to keep multiple backups of a file consistent
  • How to ensure that shared data is modified in sequence

1. Transactions

1. Transaction:

A transaction is a program unit that accesses and modifies various data items. Transaction operations are "atomic", that is, either all of the data is modified or none of it is modified.

2. Transaction records:

The transaction record stores all information about the data item modifications made while the transaction is running, and must be kept in stable storage. It includes: ⑴ transaction name, ⑵ data item name, ⑶ old value, ⑷ new value.

3. Recovery algorithm:

  • Undo(Ti): restore all data modified by transaction Ti to the old values
  • Redo(Ti): set all data modified by transaction Ti to the new values

Transactions achieve data consistency through transaction records and the recovery algorithm.
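The transaction record and the two recovery operations can be sketched as follows. The `LogRecord` fields mirror the four items listed above; the class name, the dictionary used as the data store, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    transaction: str  # transaction name
    item: str         # data item name
    old_value: int
    new_value: int


def undo(data, log, transaction):
    """Restore every item modified by the transaction to its old value."""
    for rec in reversed(log):
        if rec.transaction == transaction:
            data[rec.item] = rec.old_value


def redo(data, log, transaction):
    """Set every item modified by the transaction to its new value."""
    for rec in log:
        if rec.transaction == transaction:
            data[rec.item] = rec.new_value


# T1 moves 30 from A to B; both modifications are logged first.
data = {"A": 100, "B": 50}
log = [LogRecord("T1", "A", 100, 70), LogRecord("T1", "B", 50, 80)]

data["A"] = 70  # the system fails before B is updated

undo(data, log, "T1")
after_undo = dict(data)
redo(data, log, "T1")
after_redo = dict(data)
print(after_undo, after_redo)  # {'A': 100, 'B': 50} {'A': 70, 'B': 80}
```

Either recovery operation leaves the data in a state where the transaction has happened completely or not at all, which is the atomicity property described above.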

2. Concurrency control

In a real computer system, multiple users execute different transactions at the same time, so the order in which transactions run must be controlled. This is concurrency control. The general implementation is based on "locks".

1. Use a mutex to achieve "sequence":

Set a "lock" for each object. To access the object, a transaction must first obtain the "lock" (close the lock). After accessing the object, it releases the "lock" (unlock).

2. Use the semaphore mechanism to achieve "sequence":

Use a mutual exclusion semaphore (Mutex) to achieve sequential access to objects, and use a general semaphore to record the number of visitors.
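A minimal sketch of lock-based sequential access, using Python's standard `threading.Lock`. The `Account` class and the numbers are hypothetical; the point is only that each update acquires the lock, accesses the object, and then releases the lock.

```python
import threading


class Account:
    """Shared object protected by its own lock."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:            # obtain the lock (close the lock)
            self.balance += amount  # access the shared object
        # the lock is released (unlocked) when the with-block ends


def worker(account, times):
    for _ in range(times):
        account.deposit(1)


account = Account()
threads = [threading.Thread(target=worker, args=(account, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.balance)  # 40000
```

Because every update runs while holding the lock, the four threads modify the balance one at a time and no increment is lost.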

3. Data consistency of disk block numbers (duplicate data)

  • Free disk blocks: realized through the free disk block management mechanism
  • Data disk block management: realized through directories and FAT (or index nodes)

Tips: Under normal circumstances, free disk blocks and data disk blocks are complementary, that is, disk blocks are either free disk blocks or data disk blocks

  • Free disk block counter: through free disk block management, record the situation of free disk blocks, each disk block corresponds to a counter (1-free disk block, 0-non-free disk block)
  • Data disk block counter: record data disk blocks through FAT (or index node), each disk block corresponds to a counter (1-data disk block, 0-non-data disk block)
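The two counter tables can be compared block by block to detect inconsistencies. In the sketch below the function name `check_blocks` and the sample block lists are hypothetical, and the counters count occurrences (so a value of 2 reveals a block recorded twice), which goes slightly beyond the 0/1 description above.

```python
def check_blocks(total_blocks, free_blocks, data_blocks):
    """Return {block: (free_count, data_count)} for inconsistent blocks."""
    free_count = [0] * total_blocks
    data_count = [0] * total_blocks
    for b in free_blocks:  # from the free disk block management structure
        free_count[b] += 1
    for b in data_blocks:  # from the FAT or index nodes
        data_count[b] += 1

    problems = {}
    for b in range(total_blocks):
        # A consistent block is counted exactly once, in exactly one table.
        if free_count[b] + data_count[b] != 1:
            problems[b] = (free_count[b], data_count[b])
    return problems


problems = check_blocks(
    total_blocks=6,
    free_blocks=[0, 3, 3],     # block 3 appears twice in the free list
    data_blocks=[1, 2, 4, 4],  # block 4 is used by two files
)
print(problems)  # {3: (2, 0), 4: (0, 2), 5: (0, 0)}
```

Block 5 appears in neither table, so it is reported as well: it is neither free nor in use, which violates the complementary relationship described in the tips above.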

 


Origin blog.csdn.net/weixin_51426083/article/details/131430820