Summary of Basic I/O Operations in the C Standard Library

Input and output matter a great deal in the design of any system. For example, when designing a C interface function, you should first define its input parameters, output parameters and return value, and only then design the implementation. The interfaces provided by the C standard library are limited compared with, say, Python's library, yet using them well is not easy. This article summarizes the common operations of basic I/O in the C standard library, along with some pitfalls that deserve special attention. Unless you are already an expert, I believe you will gain a lot from reading the whole article.

1. File handles

Opening a file allocates resources in the operating system that hold the file's state information and an identifier for it. The program then uses this identifier for all subsequent reads and writes, and closing the file releases those resources.

The function to open a file:

#include <stdio.h>
FILE *fopen(const char *path, const char *mode);

FILE is a structure type defined by the C standard library. It holds the file's identifier in the kernel (on POSIX systems, a file descriptor), the I/O buffer and the current read/write position. These members are maintained internally by the library, and callers should not access them directly. A pointer such as FILE * is called a handle.

Opening a file operates on real resources and can fail (the file may not exist, permissions may be missing, and so on). Always check the return value of fopen, which is NULL on failure, and report an error message immediately so the problem can be located quickly.

Every open should be paired with a close. Although all resources are released when the program exits, a long-running service that keeps opening files without closing them will eventually exhaust its resources, because the number of file descriptors a process may hold is limited. Closing files as soon as you are done with them is good practice.

The function to close the file:

#include <stdio.h>
int fclose(FILE *fp);

Summary of the fopen mode parameter:

  • "r": read only; the file must exist.
  • "w": write only; the file is created if it does not exist and truncated to zero length if it does.
  • "a": append (every write goes to the end of the file); the file is created if it does not exist.
  • "r+": read and write; the file must exist.
  • "w+": read and write; the file is created if it does not exist and truncated if it does.
  • "a+": read and append; the file is created if it does not exist.

2. About stdin/stdout/stderr

Before main starts executing, three FILE * streams are opened automatically: stdin, stdout and stderr. These pointers are provided by libc and declared in stdio.h. printf writes to stdout and scanf reads from stdin, and user programs can also use these three pointers directly.

  • stdin is only used for read operations and is called standard input
  • stdout is only used for write operations and is called standard output
  • stderr is also used for write operations and is called standard error output

A program usually prints its results to standard output and its error messages to standard error. By default both refer to the terminal. Standard output is often redirected to a regular file while standard error still goes to the terminal, which keeps results separate from error messages.

3. Byte-oriented I/O functions

The fgetc function reads one byte from the specified file, and getchar reads one byte from standard input. Calling getchar() is equivalent to calling fgetc(stdin).

#include <stdio.h>
int fgetc(FILE *stream);
int getchar(void);

The fputc function writes one byte to the specified file, and putchar writes one byte to the standard output. Calling putchar() is equivalent to calling fputc(c, stdout).

#include <stdio.h>
int fputc(int c, FILE *stream);
int putchar(int c);

Why are the parameter and return types int rather than unsigned char? On error or at end of file, these functions return EOF, a negative value that is -1 on common implementations. If the return type were unsigned char, EOF would become 0xff and be indistinguishable from actually reading the byte 0xff. An int can hold every byte value from 0 to 255 plus EOF, which avoids the problem.

4. Functions that manipulate the read/write position

While a file is being read or written, the library keeps a file position indicator (often loosely called the "file pointer", not to be confused with the FILE * handle) that records the current position. For example, right after opening a file and calling fgetc once, it points just past the first byte. Note that the position is counted in bytes.

The functions that change the file position:

#include <stdio.h>
int fseek(FILE *stream, long offset, int whence);
/* whence: where the move starts from: SEEK_SET | SEEK_CUR | SEEK_END */
/* offset: how many bytes to move; may be positive or negative */
void rewind(FILE *stream);

Here are a few simple examples:

fseek(fp, 5, SEEK_SET);     // move to 5 bytes after the start of the file
fseek(fp, 6, SEEK_CUR);     // move 6 bytes forward from the current position
fseek(fp, -3, SEEK_END);    // move 3 bytes back from the end of the file

The offset can be positive or negative: a negative value moves toward the beginning of the file and a positive value toward the end. Seeking before the beginning of the file fails, and fseek returns a nonzero value. Seeking past the end is allowed (POSIX guarantees this), and a subsequent write extends the file; the gap in between reads back as zero bytes. For example:

$ echo "5678" > file.txt

FILE *fp = fopen("file.txt", "r+");   // file.txt holds "5678\n" (5 bytes)
fseek(fp, 10, SEEK_SET);              // 5 bytes past the end of the file
fputc('K', fp);
fclose(fp);

// The result shows that K was written at offset 10 (0x0a),
// and offsets 5 to 9 were filled with zero bytes
liwei:/tmp$ od -tx1 -tc -Ax file.txt 
0000000    35  36  37  38  0a  00  00  00  00  00  4b                    
           5   6   7   8  \n  \0  \0  \0  \0  \0   K

rewind(fp) is equivalent to (void)fseek(fp, 0, SEEK_SET), except that it also clears the stream's error indicator.

The ftell(fp) function is simple: it returns the current position in the file as a long, or -1L on error.

// compute the size of the file in bytes
fseek(fp, 0, SEEK_END);
long size = ftell(fp);

5. String-oriented I/O functions

fgets reads one line from the specified file into a buffer supplied by the caller, storing at most size - 1 characters plus the terminating '\0'.

char *fgets(char *s, int size, FILE *stream);
char *gets(char *s);

Note first that gets() is strongly discouraged (it was removed from the language in C11). Like strcpy, it gives the caller no way to specify the buffer size, so it easily overflows the buffer. With strcpy a careful programmer can still check lengths beforehand, but with gets the input is controlled by the user, who can supply an arbitrarily long line. The only way to stay safe is not to use gets at all and to call fgets(buf, size, stdin) instead.

fgets reads a line ending in '\n' from the file pointed to by stream, keeps the '\n', stores the line in the buffer and appends a '\0' to form a complete string. If the line is too long, that is, fgets has read size - 1 characters without encountering '\n', it stores those size - 1 characters plus a '\0' in the buffer; the rest of the line is returned by the next call to fgets.

If fgets reaches the end of the file after reading some characters, it stores those characters plus '\0' in the buffer and returns normally. A further call returns NULL, which is how you detect the end of the file. (NULL is also returned on a read error; feof and ferror tell the two cases apart.)

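fputs writes a '\0'-terminated string to the specified file (the '\0' itself is not written). Unlike fgets, fputs gives '\n' no special treatment: it writes exactly the characters in the string. puts, by contrast, writes the string to standard output and then appends a newline.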

int fputs(const char *s, FILE *stream);
int puts(const char *s);

6. Record-oriented I/O functions

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

fread and fwrite read and write records, where a record is a fixed-length sequence of bytes, such as an int, a structure or a fixed-length array.

The parameter size indicates the length of a record, and nmemb indicates how many records to read or write. These records are stored continuously in the memory space pointed to by ptr, occupying a total of size * nmemb bytes.

fread and fwrite return the number of complete records actually read or written, which may be less than nmemb. For example, if only one record remains before the end of the file and fread is called with nmemb set to 2, it returns 1. If an error occurs while writing, fwrite returns a value less than nmemb.

struct t{
    int   a;
    short b;
};
struct t val = {1, 2};
FILE *fp = fopen("file.txt", "w");
fwrite(&val, sizeof(val), 1, fp);
fclose(fp);

liwei:/tmp$ od -tx1 -tc -Ax file.txt 
0000000    01  00  00  00  02  00  00  00                                
         001  \0  \0  \0 002  \0  \0  \0

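The output shows that 8 bytes were written: 4 for a, 2 for b and 2 bytes of padding, which the compiler adds so that the structure's size is a multiple of int's alignment. The bytes 01 00 00 00 for a = 1 also show that this machine is little-endian. Interested readers can dig further into byte order (endianness) and structure alignment and padding.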

7. Formatted I/O functions

(1). printf / scanf

int printf(const char *format, ...);
int scanf(const char *format, ...);

These are usually the first two functions we meet when learning C and the ones we use most, so there is not much to add. printf writes formatted output to standard output. Some common printf usages:

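printf("%d\n", 5);             // print the integer 5
printf("-%10s-\n", "hello");   // width 10, right-aligned: -     hello-
printf("-%-10s-\n", "hello");  // width 10, left-aligned:  -hello     -
printf("%#x\n", 0xff);         // prints 0xff; without # it prints ff
printf("%p\n", (void *)main);  // address of main (function-to-void * conversion is a common extension, e.g. POSIX)
printf("%%\n");                // print a single %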

scanf reads formatted data from standard input and returns the number of fields it converted, for example:

int year, month, day;
if (scanf("%d/%d/%d", &year, &month, &day) == 3)   // all 3 fields were read
    printf("year = %d, month = %d, day = %d\n", year, month, day);

(2). sprintf / sscanf / snprintf

sprintf prints not to a file but to a buffer supplied by the caller, and appends '\0' at the end. Because the length of the formatted string is hard to predict, sprintf can easily overflow the buffer, so snprintf is strongly recommended instead: its size parameter gives the buffer length. If the formatted string would not fit, snprintf truncates it to size - 1 bytes and appends '\0', so the result is always a '\0'-terminated string (for size > 0). Its return value is the length the full string would have had, not counting the '\0'; if that value is greater than or equal to size, truncation occurred.

int sscanf(const char *str, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);

sscanf reads data from an input string according to the given format. It is quite powerful: scan sets such as %[...] provide a limited, regular-expression-like form of character matching (ranges such as a-z inside a scan set are implementation-defined, though glibc and most libraries support them). See the official manual for the full format syntax; below are some of the most common and important usages.

  • The most basic usage

    char buf[1024] = {0};
    sscanf("123456", "%s", buf);
    printf("%s\n", buf);
    // result: 123456
  • Read a string of at most a given length

    sscanf("123456", "%4s", buf);
    printf("%s\n", buf);
    // result: 1234
  • Read the first word

    sscanf("hello world", "%s", buf);
    printf("%s\n", buf);
    // result: hello, because %s stops at whitespace and reads only the first word
  • Read a string up to a given character

    sscanf("123456#abcdef", "%[^#]", buf);
    // result: 123456
    // %[^#] reads until it reaches #, not including the #
  • Read a string made only of characters from a given set

    sscanf("123456abcdefBCDEF", "%[1-9a-z]", buf);
    // result: 123456abcdef
    // the set matches digits 1-9 and lowercase letters; matching stops at the first uppercase letter
  • Read a string up to any character from a given set

    sscanf("123456abcdefBCDEF", "%[^A-Z]", buf);
    // result: 123456abcdef
  • Read the content between two symbols (between @ and .)

    sscanf("user@linuxblogs.com", "%*[^@]@%[^.]", buf);   // hypothetical address
    // result: linuxblogs
    // first read and discard everything before @, then match @, then read everything before '.' (linuxblogs), excluding the '.'
  • Skip a string

    sscanf("hello, world", "%*s%s", buf);
    // result: world
    // %*s reads "hello," and discards it (the * suppresses assignment); the space is skipped, then %s stores world in buf
    // without the space, %*s would consume the whole input and nothing would be stored in buf
  • A slightly more complicated case

    sscanf("ABCabcAB=", "%*[A-Z]%*[a-z]%[^a-z=]", buf);
    // result: AB (try working out why yourself)
  • Handling special characters

    sscanf("201*1b_-cdZA&", "%[0-9|_|--|a-z|A-Z|&|*]", buf);
    // result: 201*1b_-cdZA&
    // note: | is not an "or" operator inside %[...]; it is just one more character in the set

If you understand the examples above, you have basically mastered sscanf. Still, practice is the real test: only by using it often and thinking it through will you truly understand it.

(3). fprintf / fscanf

fprintf prints to the specified file stream, and fscanf reads formatted data from a file, just as scanf does from standard input. They are declared as follows:

int fprintf(FILE *stream, const char *format, ...);
int fscanf(FILE *stream, const char *format, ...);

Again, a simple example illustrates the basic usage.

FILE *fp = fopen("file.txt", "w");
fprintf(fp, "%d-%s-%f\n", 32, "hello", 0.12);
fclose(fp);

liwei:/tmp$ cat file.txt 
32-hello-0.120000

fscanf is used in essentially the same way as sscanf, except that it reads from a file stream instead of a string.

8. I/O buffers

Another very important I/O concept is the I/O buffer.

The C standard library allocates an I/O buffer for each open file. Most read and write calls made by the program only operate on this buffer, and only a few requests are passed on to the kernel.

Take fgetc/fputc as an example. The first time fgetc is called to read a byte, it may enter the kernel through a system call, read a block of data (say 1 KB) into the buffer, and return the first byte of the buffer. Subsequent fgetc calls read directly from the buffer.

Conversely, fputc usually just writes into the buffer. When the buffer is full, fputc passes its contents to the kernel through a system call, and the kernel later writes the data to disk. To pass buffered data to the kernel immediately, call fflush. Note that fflush only hands the data to the kernel; forcing it onto the disk is a separate step (for example, fsync on POSIX systems).

The C standard library has three kinds of I/O buffering: fully buffered, line buffered and unbuffered, each with different behavior.

  • Fully buffered: data is passed to the kernel when the buffer is full. Regular files are usually fully buffered.
  • Line buffered: when the program writes a newline, the line is passed to the kernel; data is also passed when the buffer fills. Standard input and standard output are usually line buffered when they refer to a terminal.
  • Unbuffered: every write call by the program is passed straight to the kernel through a system call. Standard error is usually unbuffered, so that error messages reach the device as quickly as possible.
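
For example, with standard output connected to a terminal:

printf("hello world");
while(1);
// Running this, you will find that hello world never appears on the screen,
// because the buffer is not full and the text contains no '\n'.
// Calling fflush(stdout); right after printf makes it appear immediately.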

Besides a full buffer or a newline, line-buffered output is typically flushed automatically in these cases:

  • the program reads from an unbuffered stream;
  • the program reads from a line-buffered stream and that read has to fetch data from the kernel (for example, scanf waiting for terminal input), in which case all line-buffered output is flushed before the read;
  • the program exits normally (returning from main or calling exit), which flushes all open streams.

If you would rather not rely entirely on automatic flushing, call fflush to flush manually; fflush(NULL) flushes all open output streams. The buffering mode and buffer itself can also be customized with setvbuf (or setbuf), but the default is usually fine.

Welcome to follow the WeChat public account "linuxblogs".
