[Linux] Basic IO, soft and hard links, dynamic and static libraries

1. Get to know IO

What is IO

I/O stands for Input and Output: the exchange of data between a computer system and its external environment (usually hardware devices or other computer systems).

I/O can be divided into two main types:

  1. Input: Input refers to the process by which a computer system receives data or information from the external environment. For example, when a user enters text on a keyboard, these input characters are considered input operations. Input can come from a variety of external sources, including keyboard, mouse, sensors, network connections, disk drives, etc. Computer systems need to be able to receive and process this input data in order to perform appropriate operations.
  2. Output: Output refers to the process by which the computer system sends processed data or information to the external environment. For example, when a computer system displays text on the screen, prints a document to a printer, or writes data to a disk drive, these are output operations. Output enables the computer system to communicate effectively with users or other devices.

In computer programs, I/O operations are often relatively slow operations because they involve communicating with external devices or reading and writing from storage media, and these operations are more time-consuming than the processing of data in memory. Therefore, when writing efficient programs, you need to pay attention to how to minimize the number of I/O operations to improve performance. In addition, for some applications, such as database management systems and network communications applications, effective I/O management is critical to the system's response time and throughput.

2. I/O with the C file interface

write file

#include <stdio.h>
#include <string.h>
int main()
{
    FILE* fp=fopen("myfile","w");
    if(fp==NULL){
        perror("open");
        return 1;
    }

    const char* str="hello world\n";
    int cnt=5;
    while(cnt--)
        fwrite(str,strlen(str),1,fp);

    fclose(fp);
    return 0;
}


read file

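#include <stdio.h>
#include <string.h>
int main()
{
    FILE* fp=fopen("myfile","r");
    if(fp==NULL){
        perror("open");
        return 1;
    }

    const char* msg="hello world\n";
    char str[1024];   // a char buffer, not an array of pointers
    while (1)
    {
        // read up to strlen(msg) bytes; s is the number of bytes actually read
        size_t s=fread(str,1,strlen(msg),fp);
        if(s>0)
        {
            str[s]=0;   // terminate the string before printing
            printf("%s",str);
        }
        if(feof(fp)||ferror(fp)) break;
    }

    fclose(fp);
    return 0;
}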


3. I/O with the system file interface

Besides the C library interface shown above (other languages have their own equivalents), files can also be accessed directly through the system call interface.

Interface introduction

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
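pathname: the target file to open or create
flags: options for opening the file; combine one or more of the constants below with bitwise OR to form flags.
Flags:
    O_RDONLY: open read-only
    O_WRONLY: open write-only
    O_RDWR  : open for reading and writing
    Exactly one of these three constants must be specified.
    O_CREAT : create the file if it does not exist; the mode argument is then required to set the access permissions of the new file
    O_APPEND: append on write
Return value:
    Success: the newly opened file descriptor
    Failure: -1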

About mode_t: read the man page (man 2 open), which explains it more clearly than anything else.
Which form of open to use depends on the situation. If the target file may not exist and open has to create it (O_CREAT), the third argument sets the permissions of the new file (the process umask is still applied). Otherwise, use the two-argument form.

write file

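#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(){
    // myfile must already exist here; add O_CREAT and a mode to create it
    int fd=open("myfile",O_WRONLY);
    if(fd<0){
        perror("open");
        return 1;
    }

    const char* msg="hello world!!\n";
    int cnt=5;
    while (cnt--){
        // fd: explained later, msg: start address of the buffer,
        // strlen(msg): number of bytes we want to write this time.
        // Return value: number of bytes actually written
        write(fd,msg,strlen(msg));
    }

    close(fd);
    return 0;
}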

read file

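#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
    int fd=open("myfile",O_RDONLY);
    if(fd<0){
        perror("open");
        return 1;
    }

    const char* msg="hello world!!\n";
    char str[1024];   // a char buffer, not an array of pointers
    ssize_t s;
    // read returns the number of bytes read, 0 at end of file, -1 on error
    while ((s=read(fd,str,strlen(msg)))>0)
    {
        str[s]=0;   // read does not add a terminating '\0'
        printf("%s",str);
    }

    close(fd);
    return 0;
}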

open function return value

Before looking at the return value, let's first distinguish two concepts: system calls and library functions.

fopen, fclose, fread, and fwrite above are all functions in the C standard library (libc); we call them library functions.
open, close, read, write, and lseek, on the other hand, are interfaces provided by the operating system; they are called system calls.

The f-prefixed functions can therefore be regarded as wrappers around the system calls, provided to make application development more convenient.

file descriptor fd

What is the number of this file descriptor fd?

We can try to print it out and see

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
    int fd=open("myfile",O_RDONLY);
    if(fd<0){
        perror("open");
        return 1;
    }
    printf("%d\n",fd);
    close(fd);
    return 0;
}
// Output:
// 3

The file descriptor is a small integer, but why is it 3? Anyone who has studied C will recognize the reason.

By default, a C program has three input/output streams open: stdin, stdout, and stderr. All three have type FILE*, the same file pointer type that fopen returns. Underneath, these three streams correspond to file descriptors 0, 1, and 2. From this we can infer that an fd is essentially an index into an array, an array that the system maintains for us.


So file descriptors are small integers starting from 0. When we open a file, the operating system creates a data structure in memory to describe it: the file structure, which represents one open file object. Because it is the process that executes the open system call, the process and the file must be associated. Each process has a pointer, *files, to a table of type files_struct. The most important part of that table is an array of pointers, each element of which points to an open file. In essence, then, a file descriptor is an index into this array, and holding the file descriptor is enough to find the corresponding file.

By default, a Linux process will have three open file descriptors, namely standard input 0, standard output 1, and standard error 2.

The physical devices behind 0, 1, and 2 are usually the keyboard, the display, and the display, respectively.

Therefore, input and output can also be used in the following ways:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
    char buffer[1024];
    // leave room for the terminating '\0'
    ssize_t s=read(0,buffer,sizeof(buffer)-1);
    if(s>0){
        buffer[s]=0;
        write(1,buffer,strlen(buffer));
        write(2,buffer,strlen(buffer));
    }
    return 0;
}
// Run result
// Input: hello world!!!
// Output:
// hello world!!!
// hello world!!!

File descriptor allocation rules

In the above example, after we opened a new file and output it, we found that it was fd: 3

Now close file descriptor 0 or 2 and take a look again

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   // declares close
int main()
{
    close(0);
    //close(2);
    int fd = open("myfile", O_RDONLY);
    if(fd < 0){
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

The result is fd: 0 (or fd: 2 if descriptor 2 is closed instead). This shows the allocation rule for file descriptors: the smallest index in the files_struct array that is not currently in use becomes the new file descriptor.

What if we close standard output 1?

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>   // declares close
int main()
{
    close(1);
    int fd = open("myfile", O_WRONLY|O_CREAT, 00644);
    if(fd < 0){
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    fflush(stdout);   // flush before close(fd), otherwise the buffered text is lost
    close(fd);
    return 0;
}


This time, the content that would normally go to the display was written to the file myfile, and fd is 1. This is called output redirection. The common shell redirections are >, >>, and <.

// For example, to write the output of ls -l into myfile
[hdm@centos7 BasicIO]$ ls -l > myfile 
[hdm@centos7 BasicIO]$ cat myfile 
total 28
-rw-r--r-- 1 hdm wheel    65 Sep  9 15:56 makefile
-rw-r--r-- 1 hdm wheel     0 Sep 10 12:40 myfile
-rwxr-xr-x 1 hdm wheel 16576 Sep 10 12:33 test
-rw-r--r-- 1 hdm wheel   652 Sep 10 12:33 test.c

Redirect

What is the nature of redirection?

image-20230910125105172

In practice, however, redirection is not implemented by closing a file descriptor and then opening a file as in the example above. The system provides a dedicated interface for it.

dup2 system call

The function prototype is as follows:

#include <unistd.h>
int dup2(int oldfd, int newfd);

dup2 makes newfd refer to the same open file as oldfd, closing newfd first if it was open. Example:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
    int fd=open("myfile",O_WRONLY | O_CREAT,0644);
    if(fd<0){
        perror("open");
        return 1;
    }
    dup2(fd,1);   // fd 1 now refers to myfile
    printf("fd: %d\n", fd);
    fflush(stdout);
    const char* str="hello\n";
    const char* str1="nihao\n";

    fprintf(stdout,"%s",str1);   // buffered; written to myfile when the process exits
    write(1,str,strlen(str));
    write(fd,str,strlen(str));

    return 0;
}


printf is an I/O function in the C library and writes to stdout. Underneath, stdout still accesses its file through fd 1. By now, however, the entry at index 1 no longer holds the address of the display file but the address of myfile, so everything written through it goes to the file. That completes the output redirection.

FILE

The library I/O functions are wrappers around the system call interface, and the system calls access files through an fd. The FILE structure in the C library must therefore contain an fd.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
    const char* str1="hello printf\n";
    const char* str2="hello fprintf\n";
    const char* str3="hello write\n";

    printf("%s",str1);
    fprintf(stdout,"%s",str2);
    write(1,str3,strlen(str3));

    fork();
    return 0;
}
// Run result:
// hello printf
// hello fprintf
// hello write

But what happens if we redirect the program's output with ./test > myfile? The result becomes:

[hdm@centos7 BasicIO]$ ./test  >myfile 
[hdm@centos7 BasicIO]$ cat myfile 
hello write
hello printf
hello fprintf
hello printf
hello fprintf

Both printf and fprintf (library functions) produced their output twice, while write (a system call) produced its output only once.

Why? It must be related to fork!

  • C library functions are generally fully buffered when writing to a regular file and line buffered when writing to the display.

  • printf and fprintf have their own buffer. When output is redirected to a regular file, the buffering mode changes from line buffering to full buffering.

  • The data placed in the buffer is therefore not flushed immediately; it is still in the buffer when fork is called.

  • When a process exits, its buffer is flushed and the data is written to the file.

  • fork gives the child a copy-on-write copy of the parent's data, including the still unflushed buffer. The parent and the child each flush their own copy on exit, so the data appears twice.

  • The output of write does not change, which indicates that it has no such buffering.

To sum up: library functions such as printf and fprintf have their own buffer, while the write system call does not. The buffers discussed here are all user-level buffers. To improve the performance of the whole machine, the OS also provides kernel-level buffers, but those are beyond the scope of this discussion.
Who provides this buffer? printf and fprintf are library functions, and write is a system call. The library functions sit "above" the system call and wrap it. Since write has no such buffer while printf and fprintf do, the buffer must be added by the wrapping layer: it is provided by the C standard library.

4. Understand the file system

When we use ls -l, in addition to the file name, we also see other data of the file.

[hdm@centos7 BasicIO]$ ls -l
total 12
-rw-r--r-- 1 hdm wheel   65 Sep 10 12:57 makefile
-rw-r--r-- 1 hdm wheel   66 Sep 10 15:50 myfile
-rw-r--r-- 1 hdm wheel 1101 Sep 10 15:48 test.c

Each row contains 7 columns:

  • Mode (file type and permissions)

  • Number of hard links

  • File owner

  • Group

  • size

  • Last Modified

  • file name


Besides ls -l, there is also the stat command, which shows even more information.

[hdm@centos7 BasicIO]$ stat test.c 
 File: ‘test.c’
 Size: 1101            Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 1582421     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     hdm)   Gid: (   10/   wheel)
Access: 2023-09-10 15:48:26.249722588 +0800
Modify: 2023-09-10 15:48:26.086715074 +0800
Change: 2023-09-10 15:48:26.086715074 +0800
Birth: -

There are several pieces of information in the above execution results that need to be explained clearly.

inode

In order to explain inode clearly, let us first briefly understand the file system.

image-20230910165605740

The picture above shows the on-disk layout of the Linux ext2 file system (the in-memory image in the kernel is different). A disk is a typical block device, and a disk partition is divided into blocks. The block size is fixed when the partition is formatted and cannot be changed afterwards.

  • Block Group: the ext2 file system is divided into several Block Groups according to the size of the partition. Every Block Group has the same structure, much as a government manages a country district by district.
  • Super Block: stores the structural information of the file system itself. It mainly records the total number of blocks and inodes, the number of unused blocks and inodes, the size of a block and of an inode, the most recent mount time, the most recent write time, the most recent disk check time, and other file system information. If the Super Block is destroyed, the entire file system structure is effectively destroyed.
  • GDT, Group Descriptor Table: block group descriptor, describing block group attribute information
  • Block Bitmap: Block Bitmap records which data block in the Data Block has been occupied and which data block has not been occupied.
  • inode bitmap: each bit indicates whether an inode is free and available
  • i-node table: stores file attributes such as file size, owner, last modification time, etc.
  • Data area: stores file content

The idea of keeping attributes and data separate seems simple, but how does it actually work? Let's find out by creating a new file with touch.

[hdm@centos7 BasicIO]$ touch myfile
[hdm@centos7 BasicIO]$ ls -i myfile 
1574424 myfile

In order to illustrate the problem, we simplify the above figure

image-20230910170312651

Creating a new file involves four main operations:

  1. Store the attributes
    The kernel first finds a free inode (here 1574424) and records the file information in it.

  2. Store the data
    Suppose the file needs three disk blocks, and the kernel finds three free blocks: 300, 500, and 800. It copies the first block of data from the kernel buffer to block 300, the next to block 500, and so on.

  3. Record the allocation
    The file contents are stored in blocks 300, 500, and 800, in that order. The kernel records this block list in the disk allocation area of the inode.

  4. Add the file name to the directory
    The new file is named myfile. How does Linux record this file in the current directory? The kernel adds the entry (1574424, myfile) to the directory file. This correspondence between file name and inode connects the file name to the file's contents and attributes.

Understanding soft and hard links

As we have seen, what actually locates a file on disk is not the file name but the inode. In fact, in Linux several file names can correspond to the same inode.

// ln <existing file> <name of the hard link>
ln file file.link
[hdm@centos7 BasicIO]$ touch myfile
[hdm@centos7 BasicIO]$ ls -li 
1574424 -rw-r--r-- 1 hdm wheel   66 Sep 11 13:27 myfile
[hdm@centos7 BasicIO]$ ln myfile myfile.link 
[hdm@centos7 BasicIO]$ ls -li 
1574424 -rw-r--r-- 2 hdm wheel   66 Sep 11 13:27 myfile
1574424 -rw-r--r-- 2 hdm wheel   66 Sep 11 13:27 myfile.link
[hdm@centos7 BasicIO]$ unlink myfile.link 
[hdm@centos7 BasicIO]$ ls -li
1574424 -rw-r--r-- 1 hdm wheel   66 Sep 11 13:27 myfile
[hdm@centos7 BasicIO]$ rm myfile 
[hdm@centos7 BasicIO]$ ls -li

myfile and myfile.link have exactly the same link status; both are hard links to the same file. The kernel records the number of links: the hard link count of inode 1574424 is 2.
Deleting a file does two things:

1. Remove the corresponding entry from the directory.

2. Decrease the hard link count by 1. If it reaches 0, release the corresponding disk space.


Why does the hard link count of the current directory increase by one when a subdirectory is created in it?

Suppose a directory a is created under the current directory test. Inside a there are two hidden entries, . and .., which refer to a itself and to its parent directory. The .. entry is a hard link to the parent directory test, so the link count of test goes up by one. Ordinary users cannot create such hard links to directories themselves; only the operating system creates them.

Why can't we hardlink directories ourselves?

  1. Circular link issues: allowing directory hard links may create cycles. If one directory contains a hard link to another directory, and that directory contains a hard link back to the first, the links form an endless loop. Such a cycle is a serious threat to the stability and availability of the file system, because operations that walk the directory tree can get stuck in it forever.
  2. File system consistency and security: A directory is a structure used to organize files and subdirectories, including the hierarchy and metadata of files and subdirectories (such as permissions, owners, etc.). If hard link directories are allowed, confusion and permissions issues may result. For example, different users or processes can create hard links in different locations, which can lead to confusion and permission conflicts. Directory hard links can have serious consequences for file system consistency and security.
  3. Performance issues: If the directory can be hard linked, then the file system must handle the hard link situation when dealing with the directory structure, which can lead to performance issues and more complex directory operation implementations.

Soft link:
A hard link refers to a file through its inode, while a soft link refers to another file through its name. In the shell, a soft link is created with ln -s.

A soft link is much like a file shortcut on Windows. If the file it points to is deleted, the soft link becomes invalid.

// ln -s <target file> <name of the soft link>
ln -s file file.link
[hdm@centos7 BasicIO]$ touch myfile
[hdm@centos7 BasicIO]$ ls -li
1574424 -rw-r--r-- 1 hdm wheel    0 Sep 11 13:34 myfile
[hdm@centos7 BasicIO]$ ln -s myfile myfile.link.s
[hdm@centos7 BasicIO]$ ls -li
1574424 -rw-r--r-- 1 hdm wheel    0 Sep 11 13:34 myfile
1574967 lrwxrwxrwx 1 hdm wheel    6 Sep 11 13:34 myfile.link.s -> myfile


When we use the stat command to view file attributes, three timestamp lines appear. What do they mean?


  • Access: last access time

    Access time: the last time the file or directory was accessed. Opening, reading, or executing a file counts as an access. The access time helps in understanding how often a file or directory is used, but updating it on every access can hurt performance, so Linux systems commonly limit these updates with mount options such as relatime or noatime. You can use the stat or ls -lu command to view the access time of a file.

  • Modify: last modification time of the file content

    Modify time: the last time the content of the file was changed. Whenever data is written to the file, the modify time is updated. You can use the stat or ls -l command to view the modify time of a file.

  • Change: last change time of the file attributes

    Change time: the last time the metadata of a file or directory changed. Metadata includes information such as permissions, owner, and number of links. When the metadata changes (for example, the permissions are modified), the change time is updated. Modifying the content also updates the change time, because the size and modify time stored in the inode change with it. The change time is set when the file is created, but it is not a creation time. You can use the stat or ls -lc command to view the change time of a file.

5. Dynamic libraries and static libraries

Dynamic libraries and static libraries are two types of library files commonly used in programming. They have different characteristics and uses.

  1. Static Library:
    • A static library is a collection of compiled object files, usually with the .a (Unix/Linux) or .lib (Windows) file extension.
    • When you link a static library into an executable, the linker copies the needed code and data from the library into the final executable.
    • An executable linked against static libraries contains all the code it needs, so it is usually larger.
    • The advantage of static libraries is independence: the program does not rely on external library files at run time. The price is a larger executable.
  2. Dynamic library (shared library; DLL on Windows):
    • A dynamic library is also built from compiled object files, usually with the .dll (Windows) or .so (Unix/Linux) file extension.
    • Unlike static libraries, the code and data of dynamic libraries are not copied into the executable file. Instead, the executable contains references to libraries, which are loaded into memory at runtime.
    • Dynamic libraries allow multiple programs to share the same library, saving disk and memory space.
    • The disadvantage of dynamic libraries is that if the library file does not exist or the version does not match, the program may not run.

Whether to use a dynamic library or a static library depends on the needs and design of the project. In general:

  • Using dynamic libraries can reduce executable file size because multiple programs can share the same library.
  • Using static libraries ensures the independence of the executable file from external libraries, but may cause the executable file to become larger.

Generate static library

We can write a small library ourselves and build it both as a static library and as a dynamic library for testing, for example:

// add.h
#ifndef __ADD_H__
#define __ADD_H__
int add(int a, int b);
#endif // __ADD_H__
// add.c
#include "add.h"
int add(int a, int b){
    return a + b;
}
// sub.h
#ifndef __SUB_H__
#define __SUB_H__
int sub(int a, int b);
#endif // __SUB_H__
// sub.c
#include "sub.h"
int sub(int a, int b){
    return a - b;
}
// main.c
#include <stdio.h>
#include "add.h"
#include "sub.h"
int main(void)
{
    int a = 10;
    int b = 20;
    printf("add(%d,%d)=%d\n", a, b, add(a, b));
    a = 100;
    b = 200;
    printf("sub(%d,%d)=%d\n", a, b, sub(a, b));
    return 0;
}

[hdm@centos7 myfile]$ ls
add.c  add.h  main.c  sub.c  sub.h
[hdm@centos7 myfile]$ gcc -c add.c -o add.o 
[hdm@centos7 myfile]$ gcc -c sub.c -o sub.o

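Generate the static library
[hdm@centos7 myfile]$ ar -rc libmymath.a add.o sub.o
ar is the GNU archiver; rc means replace and create

List the contents of the static library
[hdm@centos7 myfile]$ ar -tv libmymath.a 
rw-r--r-- 1000/10   1240 Sep 11 15:49 2023 add.o
rw-r--r-- 1000/10   1232 Sep 11 15:50 2023 sub.o
t: list the files in the static library
v: verbose, show detailed information

[hdm@centos7 myfile]$ gcc main.c -L. -lmymath
-L specifies the library path
-l specifies the library name
Once the executable has been built, the static library can be deleted and the program still runs.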

library search path

  • The directories specified by -L, searched from left to right.

  • The directories specified by the environment variable LIBRARY_PATH.

  • The system directories /usr/lib and /usr/local/lib.

Generate dynamic library

-shared: generate a shared library
-fPIC: generate position-independent code
Library naming rule: libxxx.so

[hdm@centos7 myfile]$ ls
add.c  add.h  add.o  libmymath.a  main.c  sub.c  sub.h  sub.o
[hdm@centos7 myfile]$  gcc -fPIC -c sub.c add.c
[hdm@centos7 myfile]$ gcc -shared -o libmymath.so *.o
[hdm@centos7 myfile]$ ls -l
total 44
-rw-r--r-- 1 hdm wheel    57 Sep 11 15:42 add.c
-rw-r--r-- 1 hdm wheel     0 Sep 11 15:36 add.h
-rw-r--r-- 1 hdm wheel  1240 Sep 11 15:57 add.o
-rw-r--r-- 1 hdm wheel  2680 Sep 11 15:50 libmymath.a
-rwxr-xr-x 1 hdm wheel 15824 Sep 11 15:58 libmymath.so
-rw-r--r-- 1 hdm wheel   236 Sep 11 15:52 main.c
-rw-r--r-- 1 hdm wheel    57 Sep 11 15:43 sub.c
-rw-r--r-- 1 hdm wheel    66 Sep 11 15:42 sub.h
-rw-r--r-- 1 hdm wheel  1232 Sep 11 15:57 sub.o

Use dynamic libraries

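Compilation options
-l: the library to link against; only the library name is given (without the lib prefix, the .so suffix, and the version number)
-L: the directory in which the library is located

Example: gcc main.c -lmymath -L.

Running a program that uses the dynamic library
The system must be able to find the .so file when the program starts. Two common ways:
1. Copy the .so file to a system shared library path, usually /usr/lib (administrator rights are required)
2. Add the library's directory to the LD_LIBRARY_PATH environment variable, for example export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH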

Origin blog.csdn.net/dongming8886/article/details/132997945