Study notes on 《Linux C 编程实战》 (Linux C Programming in Practice) - Chapter 6: File Operations (Part 1)

Chapter 6: File Operations (Part 1)

6.1 Overview of System Programming

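The C library is implemented on top of system calls: it wraps them, and the system calls themselves are exposed to programs as library functions.

For libraries that gcc does not link automatically, add -l<library name> -L<library directory> when compiling.

Once you know a function's name, you can use man to look up its prototype, parameters, and related information.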

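Library functions

man 3 <function name>
Names that are both a system call and a Linux command

man 2 <function name>

Without the 2, man shows the page for the Linux command.
Everything else

man <function name>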

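Running man man prints:

man is the system's manual pager. Each page argument given to man is normally the name of a program, utility, or function. The manual page associated with each of these arguments is then found and displayed. If a section is given, man looks only in that section of the manual.
   1.   Executable programs or shell commands
   2.   System calls (functions provided by the kernel)
   3.   Library calls (functions within program libraries)
   4.   Special files (usually found in /dev)
   5.   File formats and conventions, e.g. /etc/passwd
   6.   Games
   7.   Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
   8.   System administration commands (usually only for root)
   9.   Kernel routines [non-standard]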

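6.2 Linux File Structure

A programmer can operate on files either through system calls or through C library functions.

What a file contains

The file's own data
The file's attributes (metadata)
- File access permissions
- Owner
- File size
- Creation date

The chmod/fchmod functions change a file's permissions.

man 2 chmod shows: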

// NAME
// chmod, fchmod, fchmodat - change permissions of a file
// SYNOPSIS
#include <sys/stat.h>
int chmod(const char *pathname, mode_t mode);
int fchmod(int fd, mode_t mode);
#include <fcntl.h>           /* Definition of AT_* constants */
#include <sys/stat.h>
int fchmodat(int dirfd, const char *pathname, mode_t mode, int flags);

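Both chmod and fchmod are declared in the header sys/stat.h. Their second parameter is the same: a value of type mode_t (defined in sys/types.h).

The first parameter of chmod() is the name of the file whose permissions are to be changed; the first parameter of fchmod() is an integer, namely a file descriptor.

Both functions return an int: 0 if the permissions were changed, -1 on failure, with the error code stored in the variable errno (declared in <errno.h>).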

Values that can be combined in the mode parameter

Mode	Octal	Meaning
S_ISUID	04000	Set user ID on execution (setuid)
S_ISGID	02000	Set group ID on execution (setgid)
S_ISVTX	01000	Sticky bit
S_IRUSR, S_IREAD	00400	Read by owner
S_IWUSR, S_IWRITE	00200	Write by owner
S_IXUSR, S_IEXEC	00100	Execute by owner
S_IRGRP	00040	Read by group
S_IWGRP	00020	Write by group
S_IXGRP	00010	Execute by group
S_IROTH	00004	Read by others
S_IWOTH	00002	Write by others
S_IXOTH	00001	Execute by others

With the Linux chmod command, only the file's owner and the superuser can change the permissions of a file or directory.

Implementing a simple chmod command

/**
* 6.1-my_chmod.c     
* @author: PhoenixXC
* @email: [email protected]
* @description: [Linux_C] -- change file permissions (handles the special bits; numeric modes only)
* @created: Mon Jul 23 2018 11:29:40 GMT+0800 (CST)
* @last-modified: Mon Jul 23 2018 11:29:40 GMT+0800 (CST)
*/

#include <stdio.h>      // perror(), fprintf()
#include <stdlib.h>     // exit()
#include <string.h>     // strlen()
#include <sys/types.h>  // basic system data types on Unix/Linux (mode_t)
#include <sys/stat.h>   // chmod()

int main(int argc, char *argv[])
{
    mode_t mode = 0;    // permission bits: special bits, owner, group, others
    size_t len;         // length of the mode argument
    char *path;         // file path

    // Check the arguments
    if (argc < 3)
    {
        if (argc == 2)
            fprintf(stderr, "my_chmod: missing operand after \"%s\"\n", argv[1]);
        else
            fprintf(stderr, "my_chmod: missing operand\n");
        exit(1);
    }

    // The mode must be 3 or 4 octal digits; with 3 digits the special bits stay 0
    len = strlen(argv[1]);
    if (len > 4 || len < 3)
    {
        fprintf(stderr, "my_chmod: invalid mode: %s\n", argv[1]);
        exit(1);
    }

    for (size_t i = 0; i < len; i++)
    {
        if (argv[1][i] < '0' || argv[1][i] > '7')
        {
            fprintf(stderr, "my_chmod: invalid mode: %s\n", argv[1]);
            exit(1);
        }
        // Octal conversion; strictly speaking this should be an OR of the S_I* macros
        mode = mode * 8 + (mode_t)(argv[1][i] - '0');
    }

    for (int i = 2; i < argc; i++)
    {
        path = argv[i];                 // file whose mode is to be changed
        if (chmod(path, mode) == -1)
        {
            perror("chmod error!");     // perror() prints: <message>: <reason>\n
            // exit(1);                 // no exit: carry on with the next file
        }
    }

    return 0;
}

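The program contains the line perror("chmod error!");

If changing the permissions fails, the screen shows:

chmod error!: Operation not permitted

chmod error! is the text we supplied in the program, but the screen also shows something we did not write: Operation not permitted

man perror gives the following information: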

NAME
       perror - print a system error message
SYNOPSIS
       #include <stdio.h>
       void perror(const char *s);
DESCRIPTION
       The perror() function produces a message on standard error describing the last error encountered during a call to a system or library function.
       First (if s is not NULL and *s is not a null byte ('\0')), the argument string s is  printed,  followed  by  a colon and a blank.         
       Then an error message corresponding to the current value of errno and a new-line.
       To be of most use, the argument string should include the name of the function that incurred the error.

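perror(s) prints the reason the previous function call failed to standard error (stderr). The string s is printed first, followed by a colon and a space, and then the error description. That description is chosen by the value of the global variable errno (it comes up often later; the possible error codes are listed in each function's man page), so it tells us what went wrong while the program was running.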

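6.3 File Input and Output

File I/O functions: creat, open, close, read, write, lseek…

The C standard library's I/O functions are in fact wrappers around the functions above. (Those functions rely on file descriptors, which are specific to UNIX/Linux, so they are not portable to other platforms.)

File descriptors:

Before reading or writing a file, call open or creat to open it. On success the call returns a non-negative integer, the file descriptor, which is then passed to read or write to operate on the file. The book gives the descriptor range as 0 to NR_OPEN with NR_OPEN equal to 255, so that a program can have at most 256 files open. On current Linux systems the per-process limit is instead the RLIMIT_NOFILE resource limit (shown by ulimit -n, commonly 1024 by default).

File descriptor 0: standard input (usually the keyboard)

File descriptor 1: standard output (usually the display)

File descriptor 2: standard error (usually the display)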

Creating, opening, and closing files

1. open() creat()

man 2 open shows the following:

OPEN(2)                                               System calls                                              

NAME

       open, creat - open and possibly create a file or device

SYNOPSIS
       #include <sys/types.h>
       #include <sys/stat.h>
       #include <fcntl.h>

       int open(const char *pathname, int flags);
       int open(const char *pathname, int flags, mode_t mode);
       int creat(const char *pathname, mode_t mode);

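DESCRIPTION
       open() is normally used to convert a pathname into a file descriptor (a small non-negative integer used in later I/O calls such as read and write).
       When the call succeeds, it returns a new file descriptor (always the lowest-numbered descriptor not currently in use by the process).
       The call creates a new open file that is not shared with any other running process
       (although sharing can arise through the fork(2) system call).
       The new file descriptor is used by later functions that operate on the open file (see fcntl(2)). The file offset is set to the beginning of the file.

Note that the actual permissions of a newly created file are computed from mode and the process umask as (mode & ~umask).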

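Both functions return -1 on error and store the error code in errno.

creat can only open the file it creates write-only (it is equivalent to open with O_WRONLY | O_CREAT | O_TRUNC), and it cannot create device files.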

2. close()

man 2 close shows the following:

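NAME
       close - close a file descriptor
SYNOPSIS
      #include <unistd.h>
      int close(int fd);
DESCRIPTION
       close() closes a file descriptor, so that it no longer refers to any file and may be reused by a later file operation.
       Any record locks held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
       If fd is the last file descriptor referring to the underlying resource, the resource is freed. If the descriptor was the last reference to a file that has been removed using unlink(2), the file is deleted.
RETURN VALUE
       close() returns 0 on success, or -1 if an error occurred.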

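A successful close does not guarantee that all data has been written to disk. When a process exits, the kernel automatically closes every file it still has open.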

Using the functions above to create a new file:

/**
* 6-2-my_create.c
* @author: PhoenixXC
* @email: [email protected]
* @description: 
* @created: Tue Jul 24 2018 10:15:26 GMT+0800 (CST)
* @last-modified: Tue Jul 24 2018 10:15:26 GMT+0800 (CST)
*/

#include <stdio.h>          // perror()
#include <string.h>         // strerror()
#include <stdlib.h>         // exit()
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>          // errno
#include <unistd.h>

// > en.wikipedia.org
// The following lists typical operations on file descriptors on modern Unix-like systems. 
// Most of these functions are declared in the <unistd.h> header, but some are in the <fcntl.h> header instead.
// fcntl.h is the header in the C POSIX library for the C programming language that contains constructs that 
// refer to file control, e.g. opening a file, retrieving and changing the permissions of file, locking a file for edit, etc.

int main(void)
{
    int fd;

    // if ((fd = creat("6.1-my_chmod.c", S_IRWXU)) == -1)
    // # define S_IRWXU (__S_IREAD|__S_IWRITE|__S_IEXEC)  available through <fcntl.h> (and <sys/stat.h>)
    // #define  __S_IREAD   0400    /* Read by owner.  */
    // #define  __S_IWRITE  0200    /* Write by owner.  */
    // #define  __S_IEXEC   0100    /* Execute by owner.  */
    if ((fd = open("6.1-my_chmod.c",  O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) == -1)      
    {
        // perror("open");
        printf("open:%s  with errno:%d\n", strerror(errno), errno);
        exit(1);
    }
    else
        printf("create file success\n");

    close(fd);
    return 0;
}

Reading and writing files

read() write()

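NAME
   read - read from a file descriptor
SYNOPSIS
   #include <unistd.h>
   ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
   read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
   If count is zero, read() returns 0 and does nothing else. If count is greater than SSIZE_MAX, the result is unspecified.
RETURN VALUE
   On success, the number of bytes read is returned (zero indicates end of file); the value is bounded by the number of bytes remaining in the file.
   A return value smaller than the number of bytes requested is not an error; it can happen because fewer bytes are currently available (for example near end of file, when reading from a pipe or a terminal, or because read() was interrupted by a signal).
   On error, -1 is returned and errno is set accordingly. In that case it is unspecified whether the file position has changed.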

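NAME
   write - write to a file descriptor
SYNOPSIS
   #include <unistd.h>
   ssize_t write(int fd, const void *buf, size_t count);
DESCRIPTION
   write() writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd. POSIX requires that a read() performed after a write() has returned sees the new data. Note, however, that not all file systems are POSIX conforming.
RETURN VALUE
   On success, the number of bytes written is returned (zero means nothing was written).
   On error, -1 is returned and errno is set accordingly.
   If count is zero, there is no effect on a regular file, but for a special file the results are unpredictable.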

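Every open file has a current read/write position. Right after a file is opened, the position normally points to the start of the file; if the file is opened in append mode (O_APPEND), the position is moved to the end of the file before each write. The position advances as data is read or written.

Linux provides the lseek system call to move a file's read/write position.

The manual gives the following information about this function: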

NAME
     lseek - reposition read/write file offset    

SYNOPSIS         

   #include <sys/types.h>         

   #include <unistd.h>           

   off_t lseek(int fd, off_t offset, int whence);

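The parameters mean the following:

Field	Description
int fd	File descriptor whose read/write position is to be moved
off_t offset	Offset of the position, in bytes
int whence	How the offset is interpreted (relative to the start of the file, the end, or the current position); three values are provided
return value	On success, the resulting offset measured in bytes from the start of the file. On error, -1 is returned and the error code is stored in errno

Values that whence can take

Value	Meaning
SEEK_SET	offset is counted from the start of the file
SEEK_CUR	offset is counted from the current read/write position (offset may be negative)
SEEK_END	offset is counted from the end of the file (offset may be negative)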

Here is an example that uses these functions:

/**
   * 6-3_my_rwl.c
   * @author: PhoenixXC
   * @email: [email protected]
   * @description: reading and writing a file, and moving the read/write position
   * @created: Tue Jul 24 2018 11:31:26 GMT+0800 (CST)
   * @last-modified: Tue Jul 24 2018 11:31:26 GMT+0800 (CST)
   */

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <sys/types.h>
   #include <sys/stat.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <errno.h>

   // Custom error-handling function
   void my_err(const char * err_string, int line)
   {
       fprintf(stderr, "line:%d ", line);
       perror(err_string);
       exit(1);
   }

   // Custom read function
   int my_read(int fd)
   {
       int     len;
       int     ret;
       int     i;
       char    read_buf[64];

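       // Get the file length, then put the read/write position back at the start
       if (lseek(fd, 0, SEEK_END) == -1)
           my_err("lseek", __LINE__);
       // __LINE__ is a macro built into the compiler; it expands to the line number
       // The position is now at the end of the file, so it equals the file size in bytes
       if ((len = lseek(fd, 0, SEEK_CUR)) == -1)
           my_err("lseek", __LINE__);
       // Move the read/write position back to the start of the file
       if ((lseek(fd, 0, SEEK_SET)) == -1)
           my_err("lseek", __LINE__);
       // Print the file length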
       printf("len: %d\n", len);

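       // Read the data (never more than read_buf can hold)
       if ((ret = read(fd, read_buf, (size_t)len < sizeof(read_buf) ? (size_t)len : sizeof(read_buf))) < 0)
           my_err("read", __LINE__);

       // Print the bytes that were actually read
       for (i = 0; i < ret; i++)
           printf("%c", read_buf[i]);
       printf("\n");

       // Return the number of bytes actually read
       return ret;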
   }

   int main()
   {
       int fd;
       char write_buf[32] = "Hello World";

       // if ((fd = creat("example_63.c", S_IRWXU)) == -1)
       // Read/write, create the file if it does not exist, truncate it to zero length if it does  |  owner may read, write, and execute
       if ((fd = open("example_63.c", O_RDWR | O_CREAT | O_TRUNC , S_IRWXU)) == -1)
           my_err("open", __LINE__);
       else
           printf("create file success\n");

       // Write the contents of write_buf to the file; strlen() leaves out the terminating null byte
       if (write(fd, write_buf, strlen(write_buf)) != strlen(write_buf))
           my_err("write", __LINE__);

       my_read(fd);

       // Demonstrate a gap in the file
       printf("/*---------------------*/\n");
       if (lseek(fd, 10, SEEK_END) == -1)
           my_err("lseek", __LINE__);
       if (write(fd, write_buf, strlen(write_buf)) != strlen(write_buf))
           my_err("write", __LINE__);

       my_read(fd);

       close(fd);
       return 0;
   }

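The code uses something not seen before: __LINE__. VS Code cannot find a definition for it either.

It turns out to be one of the standard predefined macros (Standard Predefined Macros).

These macros are built into the compiler, and there are many others like it. __LINE__ expands to the current line number; used well, these built-in macros help a lot in locating errors.

The code is simple, but one point deserves attention: if we comment out the open call and use creat instead, we get an error: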

   create file success
   len: 11
   line:50 read: Bad file descriptor

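Creating the file and writing to it both work; the message says that the read failed.

The manual says:

EBADF fd is not a valid file descriptor, or is not open for reading.

creat() opens the newly created file write-only, so the descriptor cannot be read from, and that is why the read fails.

The code shows the general use of lseek. Note also that some files do not support lseek: on a pipe, FIFO, or socket (and on some device files) it fails with the error code ESPIPE.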

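On line 81, lseek moves the read/write position past the end of the file, and data is then written at that position. This leaves a gap between the old end of the file and the newly written data; the gap reads back as '\0' bytes.

od -c example_63.c dumps the file contents character by character, which makes the gap visible: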

0000000   H   e   l   l   o       W   o   r   l   d  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0   H   e   l   l   o       W   o   r   l   d
0000040