[Linux] Basic I/O in one article: concepts and operations

Basic IO

1. Operating on files with C library functions

Related post: C language file operations

C library I/O function | Description
FILE *fopen(const char *path, const char *mode); | Open the file at path in the given mode and return a stream (FILE *) associated with it
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream); | Write nmemb items of size bytes each from the memory at ptr to stream
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream); | Read up to nmemb items of size bytes each from stream into the memory at ptr
int fseek(FILE *stream, long int offset, int whence); | Move the stream's file position to offset bytes relative to whence
int fclose(FILE *stream); | Flush and close stream

2. Operating on files with system calls

System call | Description
int open(const char *pathname, int flags, mode_t mode); | Open the file at pathname and return a file descriptor; all later operations go through that descriptor
ssize_t read(int fd, void *buf, size_t count); | Read up to count bytes from the file into buf
ssize_t write(int fd, const void *buf, size_t count); | Write up to count bytes from buf to the file
off_t lseek(int fd, off_t offset, int whence); | Reposition the file offset of fd to offset bytes relative to whence
int close(int fd); | Close a file descriptor

2.1 open function

int open(const char *pathname, int flags, mode_t mode);

Function: open the file at pathname and return a new file descriptor; the file is then manipulated through that descriptor

Header file:

  • fcntl.h

Parameters:

  • pathname: the file to open (path + name)

  • flags: how to open the file, defined as macros in the header

    Required macro (choose exactly one) | Meaning
    O_RDONLY | Open read-only
    O_WRONLY | Open write-only
    O_RDWR | Open for reading and writing
    Optional macro (zero or more) | Meaning
    O_APPEND | Append: every write goes to the end of the file
    O_TRUNC | Truncate: if the file exists and is opened for writing, cut it to length 0
    O_CREAT | Create the file if it does not exist

    The required macro and any optional macros are combined with bitwise OR, for example O_RDONLY | O_CREAT

  • mode: permissions for a newly created file, passed as an octal number (for example 0664); it only takes effect together with O_CREAT, and the process umask is applied to it

Return value:

  • -1: open failed (errno is set)
  • >= 0: the file descriptor, which identifies the open file within this process

Q: Why are the macros combined with bitwise OR?

The macros are defined in /usr/include/bits/fcntl-linux.h, where all the values are octal:

#define O_RDONLY             00
#define O_WRONLY             01
#define O_RDWR               02
#define O_CREAT            0100
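Macro | Octal | Binary
O_RDONLY | 00 | 00000000 00000000 00000000 00000000
O_WRONLY | 01 | 00000000 00000000 00000000 00000001
O_RDWR | 02 | 00000000 00000000 00000000 00000010
O_CREAT | 0100 | 00000000 00000000 00000000 01000000

A: Each optional flag such as O_CREAT occupies its own bit, so bitwise OR sets several flags in a single int without them interfering, and the kernel can test each one with bitwise AND. The access mode (O_RDONLY, O_WRONLY, O_RDWR) is stored as a value in the two lowest bits, which is why exactly one of the three must be chosen. For example:

O_RDWR | O_CREAT = 02 | 0100 = 0102 = 00000000 00000000 00000000 01000010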

2.2 read function

ssize_t read(int fd, void *buf, size_t count);

Function: read up to count bytes from the file into buf

Header file:

  • unistd.h

Parameters:

  • fd: file descriptor, the return value of open
  • buf: buffer that receives the data read from the file
  • count: maximum number of bytes to read

Return value: the number of bytes read (possibly fewer than count); 0 means end of file, -1 means an error occurred

2.3 write function

ssize_t write(int fd, const void *buf, size_t count);

Function: write up to count bytes from buf to the file

Header file:

  • unistd.h

Parameters:

  • fd: file descriptor, the return value of open
  • buf: the data to be written to the file
  • count: number of bytes to write

Return value: the number of bytes actually written (possibly fewer than count); -1 means an error occurred

2.4 lseek function

off_t lseek(int fd, off_t offset, int whence);

Function: reposition the file offset of the file associated with fd, according to offset and whence

Header file:

  • unistd.h

Parameters:

  • fd: file descriptor, the return value of open
  • offset: number of bytes to move, relative to whence (may be negative)
  • whence: the position the offset is measured from

    Macro | Meaning
    SEEK_SET | Start of the file
    SEEK_CUR | Current offset
    SEEK_END | End of the file

Return value: the resulting offset, measured in bytes from the start of the file; -1 on error

2.5 close function

int close(int fd);

Function: close a file descriptor

Header file:

  • unistd.h

Parameters:

  • fd: file descriptor, the return value of open

Return value:

  • 0: the descriptor was closed successfully
  • -1: an error occurred

2.6 Test program

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
  // 1. Test open: open (or create) the file and get a file descriptor
  int fd = open("./testfile", O_RDWR | O_CREAT, 0664);
  if (fd < 0)
  {
    perror("open");
    return 1;
  }
  printf("1. File opened, file descriptor fd = %d\n", fd);

  // 2. Test write: write data to the file
  const char* w_buf = "Hello World!";
  ssize_t w_ret = write(fd, w_buf, strlen(w_buf));
  if (w_ret < 0)
    perror("write");
  else
    printf("2. Write succeeded, wrote %zd bytes\n", w_ret);

  // 3. Test lseek: move the offset back to the start of the file
  off_t l_ret = lseek(fd, 0, SEEK_SET);
  if (l_ret < 0)
    perror("lseek");
  else
    printf("3. Offset is %lld\n", (long long)l_ret);

  // 4. Test read: read the data back from the file
  char r_buf[1024] = { 0 };
  ssize_t r_ret = read(fd, r_buf, sizeof(r_buf) - 1);
  if (r_ret < 0)
    perror("read");
  else
    printf("4. Read succeeded, read %zd bytes, content: %s\n", r_ret, r_buf);

  // 5. Test close: close the file descriptor
  if (close(fd) == 0)
    printf("5. File descriptor %d closed\n", fd);
  else
    perror("close");

  return 0;
}
[gongruiyang@localhost TestSysFile]$ gcc test.c -o test
[gongruiyang@localhost TestSysFile]$ ./test 
1. File opened, file descriptor fd = 3
2. Write succeeded, wrote 12 bytes
3. Offset is 0
4. Read succeeded, read 12 bytes, content: Hello World!
5. File descriptor 3 closed
[gongruiyang@localhost TestSysFile]$ ls
test  test.c  testfile
[gongruiyang@localhost TestSysFile]$ ll
total 20
-rwxrwxr-x. 1 gongruiyang gongruiyang 8872 Dec 31 11:56 test
-rw-rw-r--. 1 gongruiyang gongruiyang 1159 Dec 31 11:56 test.c
-rw-rw-r--. 1 gongruiyang gongruiyang   12 Dec 31 11:56 testfile

3. File descriptor

3.1 View file descriptor

Q: How can we see which file descriptors a process has open?

A:

  1. Add an infinite loop before close in the test program, so the process keeps running with the descriptor still open

  2. Use ps to find the process ID (PID) of the running program

  3. Use cd /proc/<PID>/fd to enter the directory that lists the process's file descriptors

For example, to view the file descriptors opened by process 6606:

[gongruiyang@localhost TestSysFile]$ ps -aux | grep test
gongrui+   6606  0.0  0.0   4216   348 pts/0    S+   12:02   0:00 ./test
gongrui+   6686  0.0  0.0 112828   976 pts/1    R+   12:02   0:00 grep --color=auto test
[gongruiyang@localhost TestSysFile]$ cd /proc/6606/fd
[gongruiyang@localhost fd]$ ll
total 0
lrwx------. 1 gongruiyang gongruiyang 64 Dec 31 12:08 0 -> /dev/pts/0
lrwx------. 1 gongruiyang gongruiyang 64 Dec 31 12:08 1 -> /dev/pts/0
lrwx------. 1 gongruiyang gongruiyang 64 Dec 31 12:02 2 -> /dev/pts/0
lrwx------. 1 gongruiyang gongruiyang 64 Dec 31 12:08 3 -> /home/gongruiyang/ClassLinunx/TestSysFile/testfile

  1. For every process the kernel exposes a directory named after its PID under /proc (a virtual file system kept in memory, not on disk). Its fd subdirectory holds one symbolic link per file descriptor the process has open.
  2. A newly started process normally already has three file descriptors open: standard input (0), standard output (1) and standard error (2)

3.2 The relationship between PCB and file descriptor

The source code and diagram of the relationship between PCB task_struct and file descriptor:

Part of the source code of task_struct

struct task_struct {
    ..........
    /* open file information */
    struct files_struct *files;
    ..........
};

Part of the source code of files_struct

struct files_struct {
    /*
     * read mostly part
     */
    atomic_t count;
    struct fdtable *fdt;
    struct fdtable fdtab;
    /*
     * written part on a separate cache line in SMP
     */
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    int next_fd;
    struct embedded_fd_set close_on_exec_init;
    struct embedded_fd_set open_fds_init;
    struct file * fd_array[NR_OPEN_DEFAULT];
};

It can be seen from the source code:

  1. The task_struct of each process contains a pointer to a files_struct. The files_struct contains an array fd_array whose elements are pointers to file structures, and each file structure represents one open file.
  2. A file descriptor is simply an index into the fd_array array in the kernel
  3. The first three elements of fd_array correspond to standard input, standard output and standard error

Conclusion: a newly created process has three file descriptors open by default: standard input, standard output and standard error

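3.3 File descriptor allocation rule

Rule: lowest unused number first

That is, the kernel searches upward from 0 and gives the file being opened the first file descriptor that is not in use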

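3.4 File descriptor leaks

A file descriptor is also called a file handle

Every time a file is opened, the operating system allocates a file descriptor for it. If the program does not close the file once it is done with it, the descriptor stays occupied; this is a file handle leak

Q: What is the maximum number of files one process can have open?

A: Check with the ulimit command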

[gongruiyang@localhost ClassLinunx]$ ulimit -a
................
open files                      (-n) 1024
................

The output shows that the open files limit is 1024

This limit can be changed (for example with ulimit -n for the current shell)

Demo code

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
  int count = 0;

  // Keep opening the file without ever closing it, until open fails
  while(1)
  {
    int fd = open("testfile",O_RDWR | O_CREAT, 0664);
    if(fd < 0)
    {
      perror("open");
      break;
    }
    else
    {
      printf("fd:%d\n",fd);
      count++;
    }
  }
  printf("count = %d\n",count);

  return 0;
}

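fd:1023
open: Too many open files
count = 1021

The loop opens 1021 files rather than 1024 because descriptors 0, 1 and 2 were already taken by standard input, standard output and standard error, so open returned 3 through 1023 before hitting the limit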

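4. File descriptors versus file stream pointers

File stream pointer: returned by fopen and maintained by the C library

File descriptor: returned by open and maintained by the operating system

Source of the file stream pointer type, in /usr/include/stdio.h:

typedef struct _IO_FILE FILE;

So FILE is an alias for _IO_FILE. Part of the source of _IO_FILE, in /usr/include/libio.h: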

struct _IO_FILE {
    ...............
    char* _IO_read_ptr; /* Current read pointer */
    char* _IO_read_end; /* End of get area. */
    char* _IO_read_base;  /* Start of putback+get area. */

    char* _IO_write_base; /* Start of put area. */
    char* _IO_write_ptr;  /* Current put pointer. */
    char* _IO_write_end;  /* End of put area. */

    int _fileno;
    ...............
};

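The _IO_FILE source shows that the read and write buffers are each described by three char pointers, and that _fileno stores the file descriptor

Summary: a file stream pointer wraps a file descriptor

The buffer belongs to the file stream and is maintained by the C library

exit flushes these buffers when the process exits, because it is a C library function and works on the file streams

_exit does not flush them when the process exits: _exit is a system call, and the kernel has no knowledge of buffers maintained by the C library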

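5. Redirection

5.1 Truncating redirection >

Sends the output to a file instead of the terminal; the previous contents of the file are cleared each time before the new data is written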

[gongruiyang@localhost TestDup]$ echo "123"
123
[gongruiyang@localhost TestDup]$ echo "123" > testfile
[gongruiyang@localhost TestDup]$ cat testfile 
123
[gongruiyang@localhost TestDup]$ echo "456" > testfile 
[gongruiyang@localhost TestDup]$ cat testfile 
456

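5.2 Appending redirection >>

Sends the output to a file instead of the terminal; each time, the new data is appended after the existing contents of the file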

[gongruiyang@localhost TestDup]$ echo "123" >> testfile
[gongruiyang@localhost TestDup]$ cat testfile 
123
[gongruiyang@localhost TestDup]$ echo "456" >> testfile 
[gongruiyang@localhost TestDup]$ cat testfile 
123
456

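5.3 How redirection works

echo normally prints its data through standard output, file descriptor 1, which refers to the terminal. After redirection, the pointer to a file structure stored in slot 1 of the descriptor table no longer points to the original file but to the redirection target

Redirection interface:

int dup2(int oldfd, int newfd);

Function: dup2() makes newfd be the copy of oldfd, closing newfd first if necessary

Header file:

  • unistd.h

Parameters:

  • oldfd: the descriptor to copy, i.e. the file the output should go to
  • newfd: the descriptor number that becomes a copy of oldfd, for example 1 for standard output

Return value: newfd on success, -1 on failure

Demo program: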

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
  int fd = open("./testfile", O_RDWR | O_CREAT, 0664);
  if (fd < 0)
  {
    perror("open");
    return 1;
  }

  // Redirect standard output to the file
  if (dup2(fd, 1) < 0)
  {
    perror("dup2");
    return 1;
  }
  printf("test message!\n");

  return 0;
}

[gongruiyang@localhost TestDup]$ gcc test.c -o test
[gongruiyang@localhost TestDup]$ ./test 
[gongruiyang@localhost TestDup]$ cat testfile 
test message!

6. The ext2 file system

  • Block Bitmap: a bitmap in which each bit records whether one block in Data blocks is in use: 1 means in use, 0 means free
  • inode Bitmap: a bitmap in which each bit records whether one inode in the inode Table is in use: 1 means in use, 0 means free
  • Data blocks: the area that actually stores file contents; it is divided into small blocks
  • inode Table: the collection of inodes; an inode describes how a file is stored (which blocks hold its contents)

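Storing a file

  1. Look in the Block Bitmap for free blocks and store the file contents in them
  2. Get a free inode from the inode Bitmap and record in it where the file is stored in the Data blocks area
  3. Save the inode number together with the file name as an entry in the directory

Storage principle: scattered storage. The blocks of a file need not be contiguous, which leaves far fewer unusable gaps (fragments) on the disk than contiguous storage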

Retrieving a file

  1. Look up the file name in the directory to get its inode number, and use that number to find the inode holding the file's information
  2. Read the blocks listed in the inode from the Data blocks area and join them in order to obtain the complete file content

7. Soft links and hard links

7.1 Soft link

A soft link (symbolic link) is the equivalent of a shortcut to a file

Features:

  1. A soft link has its own inode number

  2. When the source file is deleted, the soft links that point to it should be deleted too; otherwise they are left dangling: the link still exists, but accessing it fails with "No such file or directory"

Command to create a soft link:

ln -s source_file soft_link_file

For example:

[gongruiyang@localhost TestLink]$ ln -s sourceFile softLinkFile
[gongruiyang@localhost TestLink]$ ll
total 0
lrwxrwxrwx. 1 gongruiyang gongruiyang 10 Dec 31 23:28 softLinkFile -> sourceFile
-rw-rw-r--. 1 gongruiyang gongruiyang  0 Dec 31 23:25 sourceFile

7.2 Hard link

A hard link is another name for the same file: it is a directory entry that refers to the same inode as the source file, so apart from the file name everything (contents, permissions, timestamps) is identical. It is not a separate copy of the data

Command to create a hard link:

ln source_file hard_link_file

For example:

[gongruiyang@localhost TestLink]$ ln sourceFile hardLinkFile
[gongruiyang@localhost TestLink]$ ll
total 0
-rw-rw-r--. 2 gongruiyang gongruiyang 0 Dec 31 23:38 hardLinkFile
-rw-rw-r--. 2 gongruiyang gongruiyang 0 Dec 31 23:38 sourceFile

7.3 Inode comparison

[gongruiyang@localhost TestLink]$ ls -li
total 0
33554820 -rw-rw-r--. 2 gongruiyang gongruiyang  0 Dec 31 23:38 hardLinkFile
33554821 lrwxrwxrwx. 1 gongruiyang gongruiyang 10 Dec 31 23:43 softLinkFile -> sourceFile
33554820 -rw-rw-r--. 2 gongruiyang gongruiyang  0 Dec 31 23:38 sourceFile

The output of the above command shows:

inode[sourceFile] = 33554820

inode[softLinkFile] = 33554821

inode[hardLinkFile] = 33554820

Summary: a soft link has its own inode, different from that of the source file, while a hard link shares the inode of the source file

Origin blog.csdn.net/weixin_45437022/article/details/112060560