[Linux] Understand basic IO and simulate implementation in one article


Hello, this is Ppeua. I regularly post about C, C++, data structures and algorithms... If you are interested, please follow me! You won't be disappointed.



0. File interface in C language

We have already studied the file interfaces of the C language. Let's review them:

  1. Open a file through fopen

It is declared in the header file stdio.h. Two commonly used modes are w (clear the file, then write) and a (append to the end of the file).

Usage:

#include<stdio.h>
int main()
{
    
    
    const char* path="./log.txt";
    FILE* f=fopen(path,"a");
    fclose(f);
    return 0;
}

If the file does not exist in the current working directory (CWD), it will be created and opened. If it already exists, it will be opened directly.

  2. Write to a file through fwrite

fwrite takes the data at ptr and writes nmemb items of size bytes each (nmemb * size bytes in total) to the FILE stream.

#include<stdio.h>
#include<string.h>
int main()
{
    
    
    const char* path="./log.txt";
    FILE* f=fopen(path,"a");
    const char* info="hello,linux\n";
    fwrite(info,strlen(info),1,f);
    fclose(f);
    return 0;
}

Note!! strlen does not need to be incremented by one here, that is, there is no need to count the '\0'. The '\0' is how the C language marks the end of a string in memory; it is not part of the content, and a file has no use for it, so it should not be written.

In this way, we successfully wrote "hello,linux" to log.txt.
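To make the difference between the w and a modes and the byte count concrete, here is a minimal sketch. The helpers write_text and file_size are names made up for this illustration; only fopen, fwrite, fseek, ftell and fclose are real stdio.h calls.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path with the given mode, write text
   (without its trailing '\0'), and close. Returns 0 on success, -1 on failure. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    if (f == NULL) {
        perror("fopen");
        return -1;
    }
    size_t len = strlen(text);              /* '\0' is not counted */
    size_t n = fwrite(text, 1, len, f);     /* len items of 1 byte each */
    if (fclose(f) != 0 || n != len)
        return -1;
    return 0;
}

/* Hypothetical helper: size of the file in bytes, or -1 if it cannot be opened. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fclose(f);
    return size;
}
```

Called from main, write_text("./log.txt", "w", "hello,linux\n") leaves a 12-byte file; a following call with mode "a" grows the file, while another call with mode "w" starts from empty again.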


1. System file interface

After reviewing the file interface of the C language, let's look at the file interface provided by the system.

The C file interface ultimately has to access hardware resources, and, as I said a long time ago when discussing the operating system, the system does not trust anyone: if you need to reach the underlying hardware, you can only do so through system-level interfaces. The same is true of similar library functions such as fprintf and printf.

So the file interface of the C language must be a second-level wrapper around the system interface.

1.1 open: opening a file

It is similar to fopen in the C language. You pass in the pathname of the file, then the flags that select how it is opened, and, when the file may be created, its permission bits mode (written in octal).

There are several types of flags:

  • O_APPEND – append to the end of the file
  • O_CREAT – create the file if it does not exist
  • O_RDONLY – read only
  • O_WRONLY – write only
  • O_TRUNC – clear the file first, then write

Each of these flags is a macro, and you can combine several of them with |. (Most of them occupy a distinct bit, so each bit can switch on a different behavior.)
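To see why | can combine several options in one int, here is a small sketch that tests individual bits of a flags value. The helper names (wants_create and so on) are made up for illustration; only the O_* macros and the O_ACCMODE mask come from fcntl.h.

```c
#include <fcntl.h>

/* Hypothetical helpers that test individual bits of an open() flags value. */
int wants_create(int flags)   { return (flags & O_CREAT)  != 0; }
int wants_truncate(int flags) { return (flags & O_TRUNC)  != 0; }
int wants_append(int flags)   { return (flags & O_APPEND) != 0; }

/* The access mode is not a single bit (O_RDONLY is typically 0),
   so it is extracted with the O_ACCMODE mask and then compared. */
int is_write_only(int flags)  { return (flags & O_ACCMODE) == O_WRONLY; }
```

For example, with flags = O_CREAT | O_WRONLY | O_TRUNC, the first two helpers and is_write_only return 1, while wants_append returns 0.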

The return value is an integer, the file descriptor (fd), or -1 on failure.

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
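int main()
{
    // C library: "w" creates the file if needed and clears it
    const char* path="./log.txt";
    FILE* f=fopen(path,"w");
    if(f!=NULL)fclose(f);
    // system call: the same behavior spelled out with flags
    const char* pathos="./logos.txt";
    int fd=open(pathos,O_CREAT|O_WRONLY|O_TRUNC,0666);
    if(fd!=-1)close(fd);
    return 0;
}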

So the w mode in the C language is a wrapper around O_CREAT|O_WRONLY|O_TRUNC.

Since the umask in the system is 0002 at this time, the file is created with mode 0664 (0666 with the umask bits cleared), so other users have only read permission.
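The permission bits a new file actually receives are the requested mode with the umask bits cleared, that is, mode & ~umask. The sketch below checks this with fstat. The helper name mode_after_umask is hypothetical, and the result assumes an ordinary file system without default ACLs.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path with the requested mode while the
   process umask is set to mask, and return the permission bits the
   file actually received (or -1 on error). */
int mode_after_umask(const char *path, mode_t requested, mode_t mask)
{
    unlink(path);                      /* make sure open() really creates the file */
    mode_t old = umask(mask);          /* umask() returns the previous mask */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, requested);
    umask(old);                        /* restore the caller's mask */
    if (fd == -1)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (rc == -1)
        return -1;
    return (int)(st.st_mode & 0777);   /* keep only the rwx bits */
}
```

With requested = 0666, a mask of 0002 gives 0664 and a mask of 0022 gives 0644.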

1.2 write: writing to a file

It is used in almost the same way as fwrite, so I won't go into detail: pass the file descriptor, a pointer to the data, and the number of bytes to write.

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    
    
    const char* pathos="./logos.txt";
    int fd=open(pathos,O_CREAT|O_WRONLY|O_TRUNC,0666);
    write(fd,pathos,strlen(pathos));
    close(fd);
    return 0;
}

Even if we run the program several times, the file contains only one copy of the content, because O_TRUNC clears the file each time it is opened.

This also shows that the w mode in the C language is a wrapper around O_CREAT|O_WRONLY|O_TRUNC.

2. Introduction to file system

So what is the file descriptor mentioned above? Why does the C language return a FILE* structure pointer while the underlying interface only returns a number? What is this number?

If the operating system wants to manage the files that a process has opened, it must first describe them and then organize them.

Today we only focus on how a process (its task_struct) manages the files it has opened. We do not care about how the operating system manages the entire file system.

The process's PCB holds a pointer to a files_struct, which manages the set of files the process has opened. The integer fd we get back is the index of our file in the array inside that files_struct, and through this index we can reach the file we just opened.

Each time, a new fd is given the smallest index that is not currently in use. Usually what we get is 3.

That is because 0, 1 and 2 are already taken when the program starts: 0 is standard input (stdin), 1 is standard output (stdout), and 2 is standard error (stderr). These three standard streams are opened for every process by default.
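The "smallest unused index" rule can be pictured with a toy table. This is only a sketch of the idea, not kernel code: struct open_file, struct fd_table, fd_alloc, fd_release and MAX_FD are all names invented for the illustration.

```c
#include <stddef.h>

#define MAX_FD 8   /* hypothetical table size for the sketch */

/* Stand-in for the kernel's per-file object. */
struct open_file { const char *name; };

/* Stand-in for the per-process table: the fd is just an index into slot[]. */
struct fd_table { struct open_file *slot[MAX_FD]; };

/* Install f in the lowest-numbered free slot and return its index, or -1 if full. */
int fd_alloc(struct fd_table *t, struct open_file *f)
{
    for (int i = 0; i < MAX_FD; i++) {
        if (t->slot[i] == NULL) {
            t->slot[i] = f;
            return i;
        }
    }
    return -1;
}

/* Free a slot so that its index can be handed out again. */
void fd_release(struct fd_table *t, int fd)
{
    if (fd >= 0 && fd < MAX_FD)
        t->slot[fd] = NULL;
}
```

If slots 0, 1 and 2 are filled first, the next fd_alloc returns 3; after fd_release(&t, 1), the next fd_alloc returns 1 again.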

In addition, although standard output and standard error both print to the screen (they share the screen as a resource), closing one does not affect the other.

This is because the shared resource is reference counted (many shared resources in the system use this scheme): closing one descriptor only decrements the reference count, and the resource is released when the count reaches zero.
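The reference-counting behavior can be observed from user space with dup, which makes a second descriptor refer to the same open file. A minimal sketch, assuming a writable path; the helper name write_after_closing_original is made up for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open path, duplicate the descriptor, close the
   original, then write through the duplicate. Returns the number of
   bytes written through the duplicate, or -1 on error. */
ssize_t write_after_closing_original(const char *path, const char *msg)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    if (fd == -1)
        return -1;

    int copy = dup(fd);       /* both descriptors now refer to the same open file */
    if (copy == -1) {
        close(fd);
        return -1;
    }

    close(fd);                /* drops one reference; the open file stays alive */
    ssize_t n = write(copy, msg, strlen(msg));
    close(copy);              /* last reference gone: the open file is released */
    return n;
}
```

The write through copy still succeeds after close(fd), which is the same reason closing stdout leaves stderr usable.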

We can verify that fd 1 really is standard output by writing data directly to fd 1 and checking whether it appears on the screen.

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    
    
    const char* pathos="./logos.txt";
    write(1,pathos,strlen(pathos));
    return 0;
}

Next we close fd 1 and then open a file, to see whether the newly allocated file descriptor is 1.

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    
    
    close(1);
    const char* pathos="./logos.txt";
    int fd=open(pathos,O_CREAT|O_WRONLY|O_TRUNC,0666);
    fprintf(stderr,"fd is %d\n",fd);
    close(fd);
    return 0;
}

The function fprintf is used here. It writes formatted output to the given stream; we print to stderr because fd 1 has been closed.

2.1 How to understand that everything is a file?

We need a unified way to control all kinds of hardware.

Therefore, the operating system is designed with three layers of encapsulation.

  1. The lowest layer describes each piece of hardware and implements its own read and write methods.

  2. Every kind of hardware is different, so there is a second layer: the operations on a file are wrapped in a uniform set of function pointers, which point to the specific methods of the first layer. At this level the names are the same for every device.

  3. The third layer is the struct file exposed to the upper layers, which holds a pointer to the second layer's operations together with other information describing the file.

This is the same idea as polymorphism in C++.
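The three layers can be sketched in plain C with function pointers. This is a toy model rather than the kernel's real definitions: struct file_ops, struct my_file, the "screen" and "disk" devices and every function name below are invented for the illustration, and the "devices" simply format a string so the result is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Second layer: one table of operations with the same names for every device. */
struct file_ops {
    int (*write)(const char *data, char *out, size_t cap);
};

/* First layer: device-specific methods (hypothetical "screen" and "disk"). */
int screen_write(const char *data, char *out, size_t cap)
{
    return snprintf(out, cap, "screen<%s>", data);
}

int disk_write(const char *data, char *out, size_t cap)
{
    return snprintf(out, cap, "disk<%s>", data);
}

const struct file_ops screen_ops = { screen_write };
const struct file_ops disk_ops   = { disk_write };

/* Third layer: the object handed to users, pointing at an operations table. */
struct my_file {
    const char *name;
    const struct file_ops *ops;
};

/* Callers use one uniform entry point regardless of the device behind it. */
int file_write(const struct my_file *f, const char *data, char *out, size_t cap)
{
    return f->ops->write(data, out, cap);
}
```

The caller of file_write never names a device; which method runs depends only on the ops pointer stored in the object, much like a virtual call in C++.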


3. Input and output redirection

As we introduced before, the fd of stdout is 1. What will happen if I close 1 first, then open a file, and then write to 1?

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    
    
    close(1);
    const char* pathos="./logos.txt";
    int fd=open(pathos,O_CREAT|O_WRONLY|O_TRUNC,0666);
    write(1,pathos,strlen(pathos));
    close(fd);
    return 0;
}

The information that was supposed to be printed on the screen was written to the file.

Because stdout is closed first and file descriptors are allocated starting from the smallest free value, the new file receives fd 1, so writing to 1 now means writing to the file.

This is the essence of input and output redirection: make the standard input or output descriptor refer to the desired file.

There is also a system call, dup2(oldfd, newfd), which copies oldfd to newfd.

This may be a bit difficult to understand. oldfd can be understood as src (the source file descriptor) and newfd as tar (the target file descriptor): the file pointer stored in the source descriptor's slot is copied into the target descriptor's slot, overwriting what was there.

We can use the following code to achieve the above effect

#include<stdio.h>
#include<string.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<unistd.h>
int main()
{
    
    
    const char* pathos="./logos.txt";
    int fd=open(pathos,O_CREAT|O_WRONLY|O_TRUNC,0666);
    dup2(fd,1);
    close(fd);
    write(1,pathos,strlen(pathos));
    return 0;
}

After redirection, you can close the originally allocated descriptor, that is, close(fd).

So what we typed on the command line before

command < file
command >> file
command > file

are all redirections.

The first redirects the input stream to a file, that is, the command reads its input from the file.

The second and third write the output of the command to the file instead of the screen: >> appends to the file, while > clears it first.
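A shell could implement > and >> roughly as follows: fork a child, point fd 1 at the file with dup2, then exec the command. This is only a sketch of the idea, not how any particular shell is written; run_redirected is a hypothetical name, and the exit code 127 for failures in the child is just a convention chosen here.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper that mimics `command > file` (append = 0) or
   `command >> file` (append = 1). Returns the command's exit status,
   or -1 on error. */
int run_redirected(char *const argv[], const char *path, int append)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: make fd 1 refer to the file, then replace the process image. */
        int flags = O_CREAT | O_WRONLY | (append ? O_APPEND : O_TRUNC);
        int fd = open(path, flags, 0666);
        if (fd == -1)
            _exit(127);
        if (fd != 1) {                 /* fd is already 1 if stdout was closed */
            if (dup2(fd, 1) == -1)
                _exit(127);
            close(fd);                 /* fd 1 keeps the file open */
        }
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if execvp failed */
    }

    /* Parent: its own fd 1 is untouched; just wait for the child. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the redirection happens in the child after fork, the shell's own standard output is not affected.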

4. User buffer and system buffer

To understand the buffer, let us first review a concept.

As mentioned before, most computers follow the von Neumann architecture: before data can take part in computation (that is, before the CPU uses it), it must first be loaded into memory.

Because the CPU runs much faster than every other component, we pick the fastest of the slow devices as its working storage, and that is memory.

Our data ultimately lives on the disk. When moving it between memory and the disk, we face a question: should we write as soon as there is any data, or wait until the data reaches a certain size and write it all at once? The answer is obviously the latter, because then the data is transferred once instead of many times.

The buffer exists to hold data that has not yet reached that size, and to format the data supplied by the user.

So where are the buffers in the language and the system we use every day?

Let’s first take a look at the following phenomenon

#include<stdio.h>
#include <string.h>
#include<sys/types.h>
#include<unistd.h>
int main()
{
    
    
    const char * buff="hello linux\n";
    
    printf("hello io\n");
    fprintf(stdout,"hello linux i am %d\n",getpid());
    write(1,buff,strlen(buff));

    return 0;
}

All three lines are printed normally.

Now change the code to this:

#include<stdio.h>
#include <string.h>
#include<sys/types.h>
#include<unistd.h>
int main()
{
    
    
    const char * buff="hello linux";
    
    printf("hello io");
    fprintf(stdout,"hello linux i am %d",getpid());
    write(1,buff,strlen(buff));
    close(1);
    return 0;
}

What happens if we remove the '\n' and manually close standard output (fd 1) before the process ends?

Surprisingly, only the content from write is output.

What if we add the '\n' back?

#include<stdio.h>
#include <string.h>
#include<sys/types.h>
#include<unistd.h>
int main()
{
    
    
    const char * buff="hello linux\n";
    
    printf("hello io\n");
    fprintf(stdout,"hello linux i am %d\n",getpid());
    write(1,buff,strlen(buff));
    close(1);
    return 0;
}

We found that without '\n' the output of the C library functions is not flushed.

The C library has its own buffer (the user buffer); data in it is not handed to the system buffer until the user buffer is flushed.

The flush strategies for this buffer are:

  • No buffering: flush immediately (higher cost)
  • Line buffering: flush at each newline. This is usually the default when printing to the screen
  • Full buffering: flush when the buffer is full. This is usually the default when writing to a file
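The three strategies correspond to the setvbuf modes _IONBF, _IOLBF and _IOFBF from stdio.h. The sketch below writes a short message under a chosen mode and uses stat to see how many bytes have already reached the file before any flush or close. The helper names size_on_disk and bytes_visible_before_flush are made up, and the exact moment data leaves the user buffer is up to the C library, so treat the observations as typical (glibc on Linux) rather than guaranteed.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file as the kernel currently sees it. */
long size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical helper: write msg to path using the given buffering mode
   (_IONBF, _IOLBF or _IOFBF) and report how many bytes had reached the
   kernel before any explicit flush or close. Returns -1 on error. */
long bytes_visible_before_flush(const char *path, int mode, const char *msg)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(f, NULL, mode, BUFSIZ) != 0) {
        fclose(f);
        return -1;
    }

    fwrite(msg, 1, strlen(msg), f);
    long visible = size_on_disk(path);   /* what got past the user buffer so far */

    fclose(f);                           /* flushes whatever is still buffered */
    return visible;
}
```

With a short message ending in '\n', the unbuffered and line-buffered streams typically show all bytes immediately, while the fully buffered stream shows 0 until fclose flushes it.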

The C library buffer is also flushed before the process exits. When '\n' is removed, however, fd 1 has already been closed by the time that flush happens, so even though the buffer is flushed at exit, the data can no longer reach the screen.

The system's buffer has its own set of rules. For now we only need to know that when the buffer in the system is flushed, the content is written to the file.

There is an fflush interface, which manually flushes the user buffer of a stream.

There is also fclose.

If we rewrite the code like this

#include<stdio.h>
#include <string.h>
#include<sys/types.h>
#include<unistd.h>
int main()
{
    
    
    const char * buff="hello linux\n";
    
    printf("hello io\n");
    fprintf(stdout,"hello linux i am %d\n",getpid());
    fclose(stdout);
    return 0;
}

The result is printed correctly.

So the essence of fclose is fflush + close: it flushes the user-level buffer before closing the descriptor.

The buffer space is maintained through FILE* in C. Therefore, each opened file has its own buffer, stored in its FILE structure.

Finally, let’s take a look at this phenomenon.

#include<stdio.h>
#include <string.h>
#include<sys/types.h>
#include<unistd.h>
int main()
{
    
    
    const char * buff="hello linux\n";
    FILE* f=fopen("./helloio.txt", "w");
    dup2(fileno(f), 1);  // fileno() is the portable way to get the fd behind a FILE*
    printf("hello io\n");
    fprintf(stdout,"hello linux i am %d\n",getpid());
    write(1,buff,strlen(buff));
    fork();
    return 0;
}

The output stream is redirected so that it writes to a file, and then a child process is created.

Result:

The content from write appears once, while the content from the C library functions appears twice.

This is because when the child process is created it copies the entire state of the parent process, including the C library buffer. As mentioned before, writes to ordinary files are fully buffered, so even with '\n' the buffer is not flushed immediately; it is flushed when the process exits. At that point both the parent and the child hold a copy of the buffer (flushing modifies the shared data, which triggers copy-on-write), so the same content is flushed to the system buffer twice.
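One way to avoid the duplicated output is to empty the user buffer with fflush before calling fork, so that there is nothing left to copy. The sketch below contrasts the two cases; write_then_fork and file_size are hypothetical helper names, and the byte counts assume the stream is fully buffered, which is the usual default for regular files.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: size of the file in bytes, or -1 on error. */
long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical helper: write one line to path through stdio, fork, and let
   both processes close the stream. If flush_first is nonzero, the buffer
   is flushed before fork(). Returns 0 on success, -1 on error. */
int write_then_fork(const char *path, int flush_first)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    fprintf(f, "hello io\n");      /* 9 bytes, held in the user buffer */
    if (flush_first)
        fflush(f);                 /* buffer is empty before it gets copied */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(f);
        return -1;
    }
    if (pid == 0) {
        fclose(f);                 /* child flushes its own copy of the buffer */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    fclose(f);                     /* parent flushes its copy */
    return 0;
}
```

Without the early flush the file ends up with the line twice (18 bytes); with it, the line appears once (9 bytes).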

Therefore, we can roughly understand what the C language IO interface looks like underneath.

  1. FILE* stores the fd and the fields describing the buffer.
  2. fopen = open + malloc(FILE)
  3. fclose = fflush + close
  4. fflush = write
  5. fwrite = copy into the buffer, plus write when a flush is triggered

Then we can implement a simple version of stdio.h ourselves.

5. Implement Stdio.h

Makefile:

test:main.c Mystdio.c
	gcc -o $@ $^ -std=c99
.PHONY:clean
clean:
	rm -rf test

Mystdio.h:

#pragma once

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 1024
#define MODE 0666
#define MALLOC_FAILED 2

// Flush strategies: immediate, line-buffered, fully buffered
#define NOW_FLUSH 1
#define LINE_FLUSH 2
#define ALL_FLUSH 4

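typedef struct _FILE{
    int fileno;         // file descriptor returned by open
    char buffer[SIZE];  // user-level buffer
    int end_pos;        // number of bytes currently held in buffer
    int flush_mode;     // one of NOW_FLUSH, LINE_FLUSH, ALL_FLUSH
}_FILE;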

_FILE* _fopen(const char * path,const char * mode);

size_t _fwrite(const void *ptr, size_t len, _FILE *stream);

int _fclose(_FILE* stream);

int _fflush(_FILE* stream);

Mystdio.c:

// #pragma once belongs in headers only; Mystdio.h already includes the system headers used below
#include "Mystdio.h"


_FILE* _fopen(const char * path,const char * mode)
{
    
    
    if(path==NULL)return NULL;
    
    int fd = 0;
    
    if(strcmp(mode,"w") == 0)
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC ,MODE);
    else if(strcmp(mode,"a") == 0)
        fd = open(path,O_WRONLY | O_APPEND | O_CREAT, MODE);
    else if(strcmp(mode,"r") == 0)
        fd = open(path,O_RDONLY );
    else return NULL;
    
    if(fd == -1)return NULL;

    _FILE* f = (_FILE*)malloc(sizeof(_FILE));
    
    if(f == NULL)
    {
    
    
        perror("malloc failed: ");
        exit(MALLOC_FAILED);
    }

    f->fileno=fd;
    
    f->end_pos=0;
    
    // Line buffering is the default flush mode
    f->flush_mode=LINE_FLUSH;

    return f;

} 

size_t _fwrite(const void *ptr, size_t len, _FILE *stream)
{
    if(stream == NULL || ptr == NULL)return 0;

    const char *src = (const char *)ptr;
    size_t left = len;

    while(left > 0)
    {
        // Append after the data already buffered, never past the end of buffer
        size_t room = (size_t)(SIZE - stream->end_pos);
        size_t n = left < room ? left : room;
        memcpy(stream->buffer + stream->end_pos, src, n);
        stream->end_pos += (int)n;
        src += n;
        left -= n;

        // Flush when the buffer is full, or when the flush strategy asks for it
        if(stream->end_pos == SIZE)
            _fflush(stream);
        else if(stream->flush_mode & NOW_FLUSH)
            _fflush(stream);
        else if((stream->flush_mode & LINE_FLUSH) && stream->buffer[stream->end_pos - 1] == '\n')
            _fflush(stream);
    }
    return len;
}

int _fflush(_FILE* stream)
{
    
    
    if(stream == NULL)return -1;
    if(stream->end_pos > 0)
    {
    
    
        write(stream->fileno,stream->buffer,stream->end_pos);
        stream->end_pos=0;
    }
    return 0;
} 

int _fclose(_FILE* stream)
{
    if(stream == NULL)return -1;
    _fflush(stream);                  // write out anything still buffered
    int ret = close(stream->fileno);
    free(stream);
    return ret;                       // 0 on success, -1 if close failed
}


Origin blog.csdn.net/qq_62839589/article/details/134627347