[Linux] Basic IO --- soft and hard links, ACM times, creating static and dynamic libraries, static and dynamic linking, how static and dynamic libraries are loaded...

I burned all the childishness and willfulness with my persistence, and the wilderness slowly grew rational indifference and sobriety.

1. Soft and hard links

Linux file type    Description
b    Block device file, typically a storage device such as a hard disk or floppy disk.
c    Character device file, typically a serial-interface device such as a keyboard, mouse, printer, or tty terminal.
d    Directory file, similar to a Windows folder.
l    Link file (soft link), similar to a Windows shortcut.
s    Socket file, mainly used for communication, especially over a network.
p    Pipe file, mainly used for inter-process communication.
-    Regular file, either plain text (ASCII) or binary.

1. The difference between soft and hard links (whether it has an independent inode)

1. Below, a soft link and a hard link are created for myfile.txt.

[wyn@VM-8-2-centos lesson21]$ ln -s myfile.txt soft_file.link    # soft link

[wyn@VM-8-2-centos lesson21]$ ln myfile.txt hard_file.link       # hard link

2.
The soft link soft_file.link has its own inode, so it is an independent file.
The hard link hard_file.link has no inode of its own: it shares the inode of myfile.txt, so any change made to the content through myfile.txt is visible through hard_file.link as well. Creating a hard link therefore does not create a new file, because no new inode is allocated.

3.
Since a hard link does not create a new file, it has no attribute set or data blocks of its own and must use the inode and data blocks of an existing file. In essence, creating a hard link adds one more mapping from a file name to an inode in the specified directory, so a single inode can be pointed to by several file names.

4.
That is why the reference count (also called the number of hard links) shown for inode 791188 is 2: two file names point to that inode.

5.
After myfile.txt is deleted, the hard link count of hard_file.link drops to 1, and the data is still reachable through that name. Deleting a name only removes one mapping; the file itself is actually deleted only when the hard link count of its inode drops to 0.
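The link-count behavior described above can be reproduced with a short script. This is a minimal sketch that assumes GNU stat (for the -c format option) and works in a scratch directory created by mktemp; the file names follow the example above.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > myfile.txt
ln myfile.txt hard_file.link            # hard link: a second name for the same inode

# %i = inode number, %h = hard link count (GNU stat format codes)
inode_src=$(stat -c '%i' myfile.txt)
inode_hard=$(stat -c '%i' hard_file.link)
links_before=$(stat -c '%h' hard_file.link)

rm myfile.txt                           # removes one name, not the data
links_after=$(stat -c '%h' hard_file.link)
content_after=$(cat hard_file.link)

echo "inodes: $inode_src and $inode_hard"
echo "link count: $links_before -> $links_after"
echo "content still readable: $content_after"
```

Both names report the same inode number, and the content stays readable through the surviving name after the other one is removed.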

2. The role of soft and hard links

2.1 The role of soft links (creating shortcuts)

1.
After the soft link's target myfile.txt is deleted, the soft link itself still exists because its own inode is still there, but cat on the soft link reports that the file does not exist. The inode of the original file is actually still alive, since the hard link hard_file.link still refers to it, so the soft link clearly does not identify its target by inode. A soft link identifies its target by file name (path).

2.
The data block of the soft link stores the path of the target file it points to, so once the file at that path is deleted, the soft link immediately becomes invalid because it can no longer find its target.

3.
Deleting a soft link does not affect the source file, so a soft link is equivalent to a shortcut on Windows.
The properties of the Microsoft Edge shortcut contain a Target field, which corresponds to the target path stored in a soft link. Double-clicking the shortcut opens Microsoft Edge because that target is the Microsoft Edge executable.
If running a program meant first locating the directory it was installed in and then double-clicking the executable there, users would find it unbearably tedious. Shortcuts exist to avoid that, and they work much like soft links on Linux.

[wyn@VM-8-2-centos lesson21]$ unlink soft_file.link    # delete the soft link

4.
This is the main use of a soft link: create a soft link in a convenient directory that points to an executable buried in a deep directory, and the program can then be run quickly from the convenient directory.
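The points in this subsection (a soft link has its own inode, stores a path, dangles once the target is gone, and works as a shortcut) can be checked with a short script. A minimal sketch, assuming GNU stat and readlink; the deep/nested/dir path and the tool.sh script are made-up names for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > myfile.txt
ln -s myfile.txt soft_file.link

# stat without -L describes the link itself, not its target.
inode_src=$(stat -c '%i' myfile.txt)
inode_soft=$(stat -c '%i' soft_file.link)
target=$(readlink soft_file.link)           # the path stored in the link
link_size=$(stat -c '%s' soft_file.link)    # length of that stored path in bytes

rm myfile.txt                               # the link now dangles
if cat soft_file.link >/dev/null 2>&1; then readable=yes; else readable=no; fi

# A soft link as a "shortcut" to an executable buried in a deep directory.
mkdir -p deep/nested/dir
printf '#!/bin/sh\necho "running from a deep directory"\n' > deep/nested/dir/tool.sh
chmod +x deep/nested/dir/tool.sh
ln -s deep/nested/dir/tool.sh tool
shortcut_output=$(./tool)

echo "link inode $inode_soft, target inode $inode_src"
echo "stored path: $target ($link_size bytes), readable after rm: $readable"
echo "$shortcut_output"
```

The size of the link equals the length of the stored path, which is a quick way to see that the link's data is the path itself.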

2.2 The role of hard links (guarding important files against accidental deletion; fast path lookup and switching through . and ..)

1.
On the surface, a hard link looks like another name for the source file. Its actual function is to give one file several valid path names, so a user can create a hard link to an important file to guard against accidental deletion.
Suppose myfile.txt is a very important file and we create a hard link to it named hard_file.txt. If myfile.txt is deleted by mistake, nothing is lost: hard_file.txt still refers to the same inode and the same data. Note that this protects against deleting a name, not against changes to the content, because both names share one copy of the data.

2.
The regular file file.txt is referenced only by its own name, so its hard link count is 1. The empty directory named empty is referenced by its own name and also by the hidden entry . inside it, which points to the same inode, so the hard link count of empty is 2.
If another directory dir is created inside empty, the hard link count of empty becomes 3, because the hidden entry .. inside dir also points to the inode of empty.
In other words, the . entry inside empty and the .. entry inside empty/dir are hard links to the empty directory.

3.
Supplementary knowledge: the size of a directory file is a multiple of 4096 bytes, with a minimum of 4096 (this is what the ext family of file systems reports with the usual 4 KB block size; other file systems may report different sizes). 4096 bytes is one IO block: the file system typically reads and writes in blocks of 8 sectors of 512 bytes each, which is exactly 4096 bytes. The operating system knows that files will be created under the directory, so it allocates one block for the directory up front.

The following article contains a few mistakes about units, so read it critically.
Why is the size of the directory always 4096 under Linux

4.
Linux does not allow ordinary users to create hard links to directories, yet Linux itself creates hard links to directories (the hidden entries . and ..). As the saying goes, the officials may set fires while the common people may not even light lamps.

Why can't Linux hard link directories? (From an article by the Zhihu blogger Zuiwo battlefield; the explanation is very detailed.)

Why can't ln hard link directories under normal circumstances (reproduced from blogger Kproxy's article)

5.
Soft links to directories, however, are allowed. To remove a soft link to a directory, use unlink, but do not add a trailing / to the link name, because the soft link itself is a file, not a directory.
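The points about directories in this subsection (the link counts of a directory, the refusal to hard-link a directory, and the trailing-slash rule for unlink) can be checked with a short script. A minimal sketch, assuming GNU coreutils; empty and dir follow the example above, while empty_hard and empty.link are made-up names. The counts 2 and 3 are what ext4 and XFS typically report; some file systems (Btrfs, for example) do not follow this convention, so the script only prints them.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir empty
links_empty=$(stat -c '%h' empty)           # typically 2: "empty" and "empty/."
mkdir empty/dir
links_with_subdir=$(stat -c '%h' empty)     # typically 3: "empty/dir/.." is added

# All three names resolve to the same inode.
inode_empty=$(stat -c '%i' empty)
inode_dot=$(stat -c '%i' empty/.)
inode_dotdot=$(stat -c '%i' empty/dir/..)

# A user-created hard link to a directory is refused.
if ln empty empty_hard 2>/dev/null; then dir_hardlink=allowed; else dir_hardlink=refused; fi

# A soft link to a directory is fine; remove it WITHOUT a trailing slash.
ln -s empty empty.link
if unlink empty.link/ 2>/dev/null; then with_slash=removed; else with_slash=refused; fi
unlink empty.link

echo "link count of empty: $links_empty, then $links_with_subdir"
echo "inodes: $inode_empty $inode_dot $inode_dotdot"
echo "hard link to directory: $dir_hardlink, unlink with trailing slash: $with_slash"
```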

Here is a well-written article for interested readers.
Linux soft link and hard link (from an article by the back-end engineer and blogger Heropoo)

2. The ACM times shown by the stat command

1.
Access is the time the file was last accessed (read).
Change is the time the file attributes were last changed.
Modify is the time the file content was last modified.

2.
If the file content is modified, attributes such as the file size change as well, so after a content modification the Change time is updated together with the Modify time.

3.
In early Linux systems, the Access time was updated every time a file was accessed. But files are read far more often than their attributes or content are changed, so refreshing the access time on every read would turn each read into an extra disk write, and since the disk is a slow peripheral, this frequent IO hurts the efficiency of the operating system.
Linux therefore changed the strategy. Under the relatime policy, the default mount behavior on current kernels, the access time is refreshed only when it is not newer than the Modify or Change time or when it is more than about a day old, so the Access time is not updated in real time.
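The relationship between the Modify and Change times can be observed directly. A minimal sketch, assuming GNU stat, whose %Y and %Z format codes print the Modify and Change times in seconds since the epoch; the one-second sleeps only make the differences visible at that resolution.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > file.txt
mtime0=$(stat -c '%Y' file.txt)    # Modify time
ctime0=$(stat -c '%Z' file.txt)    # Change time

sleep 1
chmod 600 file.txt                 # attribute change only
mtime1=$(stat -c '%Y' file.txt)
ctime1=$(stat -c '%Z' file.txt)

sleep 1
echo "second line" >> file.txt     # content change (the size changes too)
mtime2=$(stat -c '%Y' file.txt)
ctime2=$(stat -c '%Z' file.txt)

echo "after chmod:  Modify $mtime0 -> $mtime1, Change $ctime0 -> $ctime1"
echo "after append: Modify $mtime1 -> $mtime2, Change $ctime1 -> $ctime2"
```

chmod moves only the Change time, while appending to the file moves both the Modify and the Change time.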

3. The difference between dynamic and static libraries (link stage, link result, difference in link method)

1.
A static library has the suffix .a. The code of the library is copied into the executable file during the compile-and-link phase, so the static library is no longer needed when the program is loaded into memory and runs as a process.
A dynamic library has the suffix .so. Its code is linked only when the program is loaded into memory and becomes a process. If several processes in memory need the same dynamic library, they share a single copy of its code.

2.
An executable that uses dynamic linking contains only a table of entry addresses for the library functions it calls, not the machine code of those external functions.

3.
After the executable is loaded into memory and becomes a process, the operating system loads the machine code of the external functions from the library file on disk into memory. This process is called dynamic linking.

4.
Because a dynamic library can be shared among processes, dynamically linked executables are smaller and save disk space. The virtual memory mechanism of the operating system also lets a single copy of the library in physical memory be shared by multiple processes, which saves memory.

4. What is the nature of the library? (a collection of .o files)

A few small source files (main.c, my_add.c, my_sub.c) are used below to help us understand what a library is and what it does.

1.
The two ways of generating the executable mymath are essentially the same: one combines compiling and linking in a single command, while the other separates them. In the second way, each source file is first compiled into a relocatable object file (.o), and the .o files are then linked, which merges their symbol tables. Linking itself comes in two kinds, dynamic linking and static linking.

Way 1:
gcc -o mymath main.c my_sub.c my_add.c

Way 2:
gcc -c main.c
gcc -c my_sub.c
gcc -c my_add.c
gcc -o mymath main.o my_sub.o my_add.o

2.
If we do not want to hand our source code to the other party, we can give them the relocatable object files (.o) and the header files (.h) and let them do the linking themselves. An executable can be produced this way as well.
In the demonstration, the left terminal plays the person who uses the library and the right terminal plays the person who wrote it.
Providing the other party with a set of .o files (the implementations) and .h files (the declarations of what is available) is exactly the idea behind a library.

3.
Once there are many source files, all the .o files can be packed into a single file for convenience, and that file full of .o files is the library file. Depending on the packaging tool and method, the result is either a dynamic library or a static library. In essence, a library is a collection of .o files.

5. Static library and static link (ar command, archive the .o file)

1. Make a static library (archive the .o files into a library file and ship it together with the .h header files)

Detailed explanation of tar command, packing, compressing and unpacking
1.
The name ar comes from the first two letters of the word archive; a static library is simply an archive file.
In the options, r stands for replace and c for create: if the archive does not exist it is created, and if it already exists the given .o files are replaced in it.
The generated libmymath.a is such an archive.

# Build the static library libmymath.a
libmymath.a:my_add.o my_sub.o
	ar -rc $@ $^
my_add.o:my_add.c
	gcc -c my_add.c
my_sub.o:my_sub.c
	gcc -c my_sub.c

.PHONY:clean
clean:
	rm -f *.o libmymath.a

2.
Having libmymath.a, the library file we wrote ourselves, is not enough. Just as the C language gives users both the library file libc.a and header files such as stdio.h, our library still needs its header files.
In practice, giving a library to someone means giving them both the library file (.a or .so) and the matching header files.

[wyn@VM-8-2-centos lesson22]$ ls /usr/include/stdio.h
/usr/include/stdio.h
[wyn@VM-8-2-centos lesson22]$ ls /lib64/libc.a
/lib64/libc.a

3.
After make output is executed, the deliverable is assembled in a directory named mylib, which contains the library file and the header files. mylib can be handed to the other party for use as a library.
If the library is large, it can also be delivered to the user as a compressed package.

# Build the static library libmymath.a
libmymath.a:my_add.o my_sub.o
	ar -rc $@ $^
my_add.o:my_add.c
	gcc -c my_add.c
my_sub.o:my_sub.c
	gcc -c my_sub.c

# Collect the library file and the header files under mylib/
# (depends on libmymath.a so that the library is built first)
.PHONY:output
output:libmymath.a
	mkdir -p mylib/include
	mkdir -p mylib/lib
	cp -f *.a mylib/lib
	cp -f *.h mylib/include
.PHONY:clean
clean:
	rm -f *.o libmymath.a

Compress the mylib directory into the package mylib.tgz. When delivering the library, give this package to the other party.

[wyn@VM-8-2-centos lesson22]$ tar -czvf mylib.tgz mylib

When deleting multiple files, we can use the wildcard * for efficient deletion.

The user of the library (on the left) gets the mylib directory after decompressing the package.

4.
If the user of the library wants to install it, all that is needed is to copy the corresponding files into the system directories. The essence of installation is therefore copying. Writing to these system directories requires root privileges, hence sudo.

[wyn@VM-8-2-centos test]$ sudo cp mylib/include/*.h /usr/include/
[wyn@VM-8-2-centos test]$ sudo cp mylib/lib/libmymath.a /lib64/

2. Problems the user runs into when compiling and linking against the library

2.1 gcc cannot find the header file

1.
If the user compiles and links directly with gcc, an error is reported saying that the header file cannot be found.
gcc has two places to search for header files: the current path (the directory containing the source file) and the system default include paths. The headers inside mylib are in neither place, so gcc cannot find them.

2.
So we need the -I option to tell gcc where to search for header files:

[wyn@VM-8-2-centos test]$ gcc -o mymath main.c -I ./mylib/include/

2.2 Link error: undefined reference to function (library file not found, library search path)

1.
This time a link error is reported, which means the preprocessing, compiling, and assembling stages all succeeded.
If a library file is in a system path (/usr/lib64 or /usr/lib), the linker can always find it, but our library file is not there, and the linker does not search the current path, so it cannot find the library.

2.
So the -L option is needed to tell the linker where to search for library files. That alone is not enough, though: the library name must be specified too.
To link a third-party library, the name of the library must be given explicitly.

[wyn@VM-8-2-centos test]$ gcc -o mymath main.c -I ./mylib/include/ -L ./mylib/lib/

With only the library path specified, the linker still reports an error.

3.
For header files only the search path is needed, not the file names, because the source file main.c already tells the compiler which headers to include; gcc then looks for exactly those headers in the given path.
Nothing, however, tells the linker which library files to link, so for a library both the path and the name must be specified.

4.
When we wrote code before, we never specified a library name. That is because we were not using any third-party library: we used only the standard library of C or C++ and the system-level interfaces provided by the operating system, and gcc or g++ links those by default. The library we are linking now is not a standard library but a third-party one.

First-party library: provided by the system
Second-party library: written by ourselves
Third-party library: written by others

5.
When the -l option is used to specify the library name, strip the prefix lib and the suffix .a or .so from the file name; what remains in the middle is the library name.

A space between an option and its argument is optional. Without the space, the shell's tab completion of the path no longer works, but the command looks more compact.

[wyn@VM-8-2-centos test]$ gcc -o mymath main.c -I./mylib/include/ -L./mylib/lib/ -lmymath

With the prefix and suffix removed from the name, the library libmymath.a links successfully and mymath can be used normally.

2.3 What does the specific linking method depend on? (depending on provided libraries and options compiled with)

1.
Inspecting mymath with ldd (which lists the shared libraries an executable depends on) and with file reveals some interesting details.
gcc links dynamically by default, but what if we supply no dynamic library and give gcc only a static one? And since an executable may depend on more than one library, how should gcc link if 100 libraries are involved, 70 of them static and 30 dynamic?

Linux command (61) - ldd command (reproduced from the article of csdn blogger Lian Miao Big Carp)

2.
gcc's default of dynamic linking is therefore only a preference; whether a given library is linked dynamically or statically depends on which form of the library is available.
If only the dynamic library is provided and no option is given, the link is dynamic. If -static is added in that case, linking fails with an error.
If only the static library is provided and no option is given, gcc has no choice but to link that library statically. Adding -static is the more standard practice, provided that static versions of every other library the program needs are available as well.
If both the dynamic and the static library are available, -static produces a static link and omitting it produces a dynamic link.

3.
As long as at least one of the linked libraries is dynamic, the resulting executable is reported as dynamically linked.
The executable mymath links our own static library libmymath.a but also the C dynamic library libc.so.6, so the result is reported as dynamically linked.

2.4 Copy the library into the system default path (the essence of installation is copying)

1.
Copying the library files into the system default paths is, in essence, installation: copy = install.

2.
Even after the library has been copied into the system default path, compiling without naming the library still produces the same link error, undefined reference to the function. The reason is the one given above: the source code names the header files to include, but nothing names the library to link, and our library is a third-party library rather than the standard library. So at least the -l option must be given when compiling.

3.
Deleting the library files from the system default paths is, in effect, uninstalling.
It is not recommended to copy test code we wrote ourselves into the system default paths. The libraries there have gone through strict testing and a proper release process, while our own test code is not secure or robust enough, so do not copy it into the system default paths.

6. Dynamic library and dynamic link (gcc -shared generates dynamic library)

1. Generate position-independent code + archive .o files to form a dynamic library (gcc -fPIC -c *.c and gcc -shared -o libxxx.so *.o)

gcc -fPIC -c *.c                    # generate position-independent .o files
gcc -shared -o libmymath.so *.o     # combine the .o files into a dynamic library

-shared: generate a shared library
-fPIC: generate position-independent code
library naming rule: libxxx.so

2. During the running of the program, when loading the dynamic library, the OS and shell cannot find the library file (four solutions)

1.
Next we package the library file and the header files into the mylib directory. If desired, this directory can be compressed and given to the user of the library, who gets the library file and header files after downloading and decompressing it.

2.
A dynamic library is used in much the same way as a static library: with the corresponding options at compile time, the executable mymath is generated.
Running the program, however, fails. mymath is indeed dynamically linked, but the system cannot find our dynamic library file libmymath.so.

3.
At compile time gcc knows the path and name of the library file, but gcc plays no part once the program runs. A dynamic library is loaded while the program is running, and at that point the OS (more precisely, the dynamic loader) does not know where our library is: it is not in a system path, so it cannot be found.

2.1 Add the library path to the environment variable LD_LIBRARY_PATH (not permanently, just temporarily)

1.
When a program starts, the dynamic loader searches for libraries not only in the system default paths but also in the directories listed in the environment variable LD_LIBRARY_PATH, so adding the directory of our dynamic library to that variable solves the problem.

2.
The output of ldd shows that the library file is now found.

2.2 Add a configuration file in the /etc/ld.so.conf.d/ directory, and manually call ldconfig to update it

1.
The next time we log in through xshell, however, the path added to the environment variable is gone, so mymath again fails with the error that the library file cannot be found. Making the path permanent would require editing the configuration files that set the environment variable, which is troublesome, so the environment-variable solution is better suited to everyday testing and is valid only for the current login session.

2.
First enter /etc/ld.so.conf.d/. The directory contains many configuration files, and each file stores dynamic library search paths. If we write the path of our own dynamic library into a file and put that file in /etc/ld.so.conf.d/, the OS can find the library file.

3.
After the configuration file is added, the dynamic library of the executable is still reported as missing. One step remains: ldconfig must be run manually, because a new dynamic library has been installed and the system has to be told to refresh its cache. After the refresh the program runs normally and the dynamic library is loaded at run time.

Linux: Introduction to the use of ldconfig (reproduced from the article of csdn blogger technology explorer)

2.3 Under the system or current path, create a soft link of the dynamic library file

1.
We can also create a soft link to the dynamic library file, a shortcut through which the system finds the real library file at run time. Note that a soft link in the current path works only if the current directory is on the loader's search path (for example, through an empty entry in LD_LIBRARY_PATH); the loader does not search the current directory by default.

2.
Besides the current path, the soft link can also be created in a system path, where the loader finds the dynamic library at run time.

2.4 Copy the dynamic library file into the system default path (in other words, install the dynamic library)

This solution is simple and needs little explanation: just cp the file, running the command with sudo. It is not recommended, though, because our own library can easily get mixed up with the system's libraries, so do not install dynamic libraries carelessly.

3. Installing another third-party library: ncurses

[wyn@VM-8-2-centos Use_libraries]$ sudo yum install -y ncurses-devel

1.
After the ncurses library is installed, its header files and library files can be found under the system default header and library paths.

2.
The following demo code uses the ncurses library; you can type it in with vim. When compiling, gcc must be told the library name (-lncurses), otherwise a link error is reported: undefined reference to the function.

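//test.c
// Build with: gcc -o test test.c -lncurses

#include <string.h>
#include <ncurses.h>

int main(void)
{
    initscr();      // enter curses mode
    raw();          // deliver key presses immediately, without line buffering
    noecho();       // do not echo typed characters
    curs_set(0);    // hide the cursor

    const char* c = "Hello, World!";

    // Print the string centered; "%s" keeps c from being used as a format string.
    mvprintw(LINES/2, (COLS-(int)strlen(c))/2, "%s", c);
    refresh();

    getch();        // wait for a key press
    endwin();       // restore the terminal

    return 0;
}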

Running the demo prints Hello, World! in the center of the screen and waits for a key press.

Introduction to installation of curses ncurses library (reproduced from the article of csdn blogger whatday)

7. In-depth understanding of the dynamic and static library loading process (absolute addressing, relative addressing: fPIC generates position-independent code)

1.
A static library does not need to be loaded separately. At link time, the code of the static library is copied into the code segment of the executable. (An executable file has no stack or heap segment, only a code segment, a data segment, which can be subdivided into .data and .rodata, and a BSS segment.)
The code of the static library therefore ends up in physical memory as part of the program: it is loaded into the process's virtual address space along with the rest of the executable and mapped to physical memory through the page table. Its addresses are fixed at link time, so this loading scheme is an absolute addressing scheme.

Program or - memory area allocation (five segments) - finally figured it out (reproduced from the article of csdn blogger helmsgao)

2.
Unlike static linking, dynamic linking copies into the executable only the offset addresses of the library functions it uses. Every function in a dynamic library is addressed relatively, as the library's start address plus an offset.
When execution reaches a call to such a function, the address recorded at compile-and-link time is only the function's offset inside the dynamic library. The dynamic library is loaded into physical memory, and the OS maps it through the page table into the shared area of the process's virtual address space. As soon as the library is mapped into the shared area, its start address is determined. The process already holds the function's offset, so it now has both the start address of the library and the offset of the function: execution jumps to the shared area, finds the machine code of the library function there and runs it, then jumps back to the code segment and continues with the remaining code.

3.
This also explains why the -fPIC option is needed when compiling the .o files for a dynamic library: it makes the functions in the library use relative addressing, which is what allows dynamic linking to be completed later at run time.

4.
When the .o files are combined, gcc -shared gives the result the specific format of a dynamic library, so that the operating system can later load it into memory as a library. It is then mapped into the shared area through the page table, the start address of the library is determined, execution jumps from the code segment to the shared area, uses the function's offset and the start address to run the machine code of the library function, and then jumps back to the code segment to run the remaining code.

5.
Suppose 100 programs use a static library: each of the 100 processes being scheduled carries its own copy of the static library's code, and nothing is shared between them.
If the 100 programs use a dynamic library instead, only one copy of the library's code is needed in physical memory, and all 100 processes share it.
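To see the "start address plus offset" idea on a live system, the sketch below reads /proc/self/maps, which lists every region mapped into the reading process's virtual address space. It assumes a Linux procfs and a dynamically linked grep; the variable names are only illustrative.

```shell
# Each line of /proc/<pid>/maps describes one mapped region:
#   start-end  perms  offset  dev  inode  path
# /proc/self refers to the process doing the reading (grep here).
first_mapping=$(grep -m1 '\.so' /proc/self/maps)
lib_base=$(echo "$first_mapping" | cut -d- -f1)          # start address of the mapping
lib_path=$(echo "$first_mapping" | awk '{print $NF}')    # file the mapping comes from

echo "library file: $lib_path"
echo "mapped at:    0x$lib_base"
```

Because of address space layout randomization, the start address usually differs from run to run, which is why library code cannot depend on where it is loaded and has to find its functions as start address plus offset.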

Origin blog.csdn.net/erridjsis/article/details/128797445