[Linux] The teacher asked me what love is, and I said: soft and hard links, dynamic and static libraries

 

 

Foreword

At the end of the previous article we explained file inodes. So how does a file name relate to an inode? The Linux kernel identifies a file only by its inode number. The file name is not stored among the inode's attributes; it exists for the user's benefit.

We have talked about Linux directories before. Is a directory a file? Yes. A directory is also a file, with its own inode and its own data blocks. Every file lives in some directory, so what does a directory contain? Its data blocks store the mappings from file names to inode numbers. Within one directory each file name is unique, so it works as a key that leads to exactly one inode number.

Accessing a file therefore always happens inside a specific directory: we look up the file name in that directory to get the inode number. The directory itself belongs to a partition, and inode numbers are unique only within one file system. The inode number tells us which block group in that partition holds the inode. We then:

1. find the file's inode in that group's inode table;
2. follow the inode's mapping to its data blocks;
3. load the data into memory;
4. display it on the screen.

Now let's get into the main part of this article.


 

1. Soft and hard links

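Understanding how files are added, deleted, and modified

To delete a file, the file system first looks up the file name in its directory to get the inode number. From that number it finds the inode and the data blocks the inode maps to. Assuming no other hard link still refers to the inode, it then:

1. clears (sets to 0) the file's bit in the inode bitmap;
2. clears the bits of its data blocks in the block bitmap;
3. removes the name-to-inode entry from the directory.

The data blocks themselves are not erased, so deleting a file essentially means modifying bitmaps. This is why deletion is fast, and why deleted data can sometimes be recovered before its blocks are reused.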

Let's look at how an inode maps to its data blocks.

In the simplest picture, the inode stores the numbers of its data blocks directly (direct index). That alone would address too little data, so an ext2-style inode also has indirect pointers:

- a single-indirect pointer refers to a data block whose content is not file data but a list of further block numbers;
- a double-indirect pointer adds one more level of such lists;
- a triple-indirect pointer adds another.

We will not go deeper into these here and will move straight on to soft and hard links.

Let's first demonstrate how to make a soft or hard link:

The command to create a soft link is: ln -s target_file link_name

In the ls -l output, the leading l in the permission string marks the file as a symbolic link. Let's look at what the link contains: ls -l shows that it points to the original file we just created.

Now compare the inode numbers, which ls -i displays. The soft link has its own inode number, different from the target's. A soft link is an independent file, with its own inode, its own attributes, and its own data, and that data is simply the path of the file it points to.

Next, let's demonstrate a hard link.

A hard link is created with the same command without the -s option: ln target_file link_name

Let's look at the relationship between the hard link and the original file. The hard link shows exactly the same inode number as the original file. A hard link does not get an inode of its own. It is just another file name (directory entry) that maps to the target's existing inode.

Notice also that the number right after the permission string has become 2. What does that mean? Let's delete the original file and see what changes. After the deletion, the number drops back from 2 to 1, but the inode still exists.

Can the data still be read? Reading through the hard link still prints the original file's content. Reading through the soft link fails with an error saying the file does not exist.

So what is a hard link? It is essentially reference counting:

- every name that refers to the same inode increments the count;
- deleting a file removes one name and decrements the count;
- the file's inode and data blocks are released only when the count reaches 0.

The number after the permission string is therefore the hard link count, and every extra hard link we create increases it. A soft link stores the path of the file it points to, while a hard link is a reference to the original file's inode.

Does a soft link remind you of something in Windows? It is essentially a program shortcut. Deleting the shortcut does not affect the real file, and as long as you know the file's path you can still open the program. A soft link behaves the same way, and it stops working once the file it points to is removed.

All of the tests above used regular files. Let's see what directories look like.

Why is the hard link count of a newly created directory 2? Following the reasoning above, the 2 means that two names map to the directory's inode. The first is the directory's own name in its parent directory. To find the second, let's enter the directory and see which entries point to it.

Even this empty directory contains two hidden entries, named . and .. (a single dot and a double dot). As we said before, the single dot stands for the current directory and the double dot for the parent directory. Most importantly, the inode number of the single-dot entry is exactly the same as that of the directory itself, so it is the second hard link.

Does the double-dot entry likewise refer to the parent directory? Yes: .. has the inode number of the parent directory. On file systems such as ext4, this is also why each subdirectory created inside a directory adds one to that directory's link count: the subdirectory's .. entry refers back to it.

78fdbb7c47044ed1a8b27b58eeec6624.png

We can add hard links to regular files. Can we do the same for a directory? Let's try. As you can see, ln refuses to create a hard link to a directory.

Why can . and .. be hard links to directories when we cannot create one ourselves? Because the operating system does not let users create hard links to directories. Doing so could easily create loops in the directory tree.

For example, suppose a directory deep in some path contained a hard link back to one of its own ancestors. A tool such as find, walking down the path, would reach that hard link and be sent back up to the ancestor. It would then walk down to the link again and never finish.

Why don't . and .. cause this problem? Because the file system creates and maintains them itself, and tools that traverse directories know to skip them. If arbitrary users could create such links, the operating system would have a hard time telling whether a new link forms a loop, or how to resolve it.

The three timestamps of a file:

The stat command shows a file's timestamps: stat file_name

Change is the time the file's attributes (metadata) last changed, for example after changing its permissions. Modify is the time its content last changed. Let's modify the content and look again.

Notice that Change was updated too when we changed the content. Changing the content also changes attributes such as the file size, so the attributes change as well. Access is the time the file was last accessed. Let's try it.

Why didn't Access change after we read the file? Files are read far more often than they are modified. If every read updated the access time, the new attribute would have to be written back to disk each time, adding a lot of I/O cost. So Linux updates Access only occasionally rather than on every read.

On most modern Linux systems the default mount option is relatime. With relatime, the access time is updated only when it is older than the modification or change time, or more than 24 hours old.

2. Dynamic library and static library

Static library (.a): the library code the program uses is copied into the executable at link time, so the static library is no longer needed when the program runs.
Dynamic library (.so): the library code is linked only when the program runs, and multiple programs share one copy of it.
An executable linked against a dynamic library contains only a table of the entry addresses of the functions it uses, not the full machine code of the object files where those external functions live.
Before the executable starts running, the dynamic linker maps the needed dynamic library from disk into the process's memory and resolves those function addresses. This process is called dynamic linking.
Dynamic libraries are shared among programs, so dynamic linking makes executables smaller and saves disk space. Through virtual memory, the operating system lets a single copy of a dynamic library in physical memory be shared by all processes that use it, which saves memory as well.

We are already familiar with libraries, because we have been using the standard library whenever we wrote C/C++ code. Let's take a look at the libraries on Linux.

The system comes with C/C++ header files and library files preinstalled. Header files declare the functions, and libraries provide their implementations. The two correspond to each other and must be used together. Headers are pulled in during preprocessing, while linking is essentially about linking against libraries. So when we installed a development environment such as VS2019, we were actually installing the compiler tools plus the header files and libraries of the language we wanted to use. The header files are also what let an editor offer automatic completion hints; without the right headers included, we get no such hints.

A library's file name starts with the prefix lib and ends with .so (dynamic library) or .a (static library), possibly followed by a version number. To get the library's real name, remove the prefix lib and everything from the suffix .so onward. For example, the real name of libstdc++.so.6 is stdc++.

Note that cloud servers usually have only the dynamic versions of the standard libraries installed by default. The static versions have to be installed separately.

Next, we build a simple library of our own to show how libraries are used. We first create a header file and a .c file each for addition and subtraction, then write a simple test program.

First, the header and source file for add:

/* myadd.h */
#ifndef MYADD_H
#define MYADD_H

int add(int a, int b);

#endif /* MYADD_H */

/* myadd.c */
#include "myadd.h"

int add(int a, int b)
{
    return a + b;
}

The header uses conditional compilation (an include guard) so that it is not included more than once in the same translation unit, and the .c file includes its own .h file.

Then the header and source file for sub:

/* mysub.h */
#ifndef MYSUB_H
#define MYSUB_H

int sub(int a, int b);

#endif /* MYSUB_H */

/* mysub.c */
#include "mysub.h"

int sub(int a, int b)
{
    return a - b;
}

Let's implement the main function below:

/* main.c */
#include <stdio.h>
#include "myadd.h"
#include "mysub.h"

int main(void)
{
    int a = 10;
    int b = 20;
    printf("add(%d,%d)=%d\n", a, b, add(a, b));
    a = 100;
    b = 20;
    printf("sub(%d,%d)=%d\n", a, b, sub(a, b));
    return 0;
}

Below we use these three .c files to generate an executable program:

Compiling the three .c files together produces an executable, and of course it runs without problems.

How do we turn our code into a library of our own? Before doing that, let's try another way of letting someone else use our code without giving them the source. We create two directories: mylib for the library author, and another one for the person who will use the code.

Next, we put main.c into the other person's directory, since that is the program that will use our code. We put the .c and .h files for add and sub into mylib.

The next step is to package the files in mylib. Each .c file is preprocessed, compiled, and assembled into a .o file, which is what gcc -c does. A .o file is called a relocatable object file: it already contains binary machine code, but it cannot run on its own yet. We do this for myadd.c and then, in the same way, for mysub.c.

Next, we copy all the .h files (together with the .o files we just built) to the other person and switch to their directory. At this point the other person can already use our code without ever seeing our .c files. They compile their own main.c into a .o file as well, and then link all the .o files together into an executable.

Running this executable shows that our code is used normally.

Now let's package a real library, starting with a static library.

As mentioned earlier, a static library's file name starts with lib and ends with .a. We pack all the .o files into our static math library with the ar archiving tool, for example ar -rc libmymath.a myadd.o mysub.o. Next, in the other person's directory, we delete everything except main.c. We then copy over only the .h files and the .a static library.

With the header files and the static library in place, how does the other person use them? Simply running gcc main.c fails. The linker reports undefined references to add and sub, because by default gcc links only the C standard library and knows nothing about our library. We have to name the library with the -l option and tell gcc where to find it with -L. The program then compiles and runs successfully.

The two options work together when forming the executable:

- -L adds a directory to the list the linker searches for libraries (here the current directory, .);
- -l tells the linker which library to link from those directories.

Why don't we write the prefix lib and the suffix .a after -l? Because, as we said earlier, the real name of the library excludes them. For libmymath.a we write -lmymath.

So if we want to share our library with others, we only need to put the .a file in one directory and the .h header files in another, and send both directories to the other person. Alternatively, we can pack them into an archive and upload it for the other person to download and extract.

Summary of using third-party libraries:

1. You need both the library files and the corresponding header files.

2. If they are not installed in the default search paths of gcc/g++, you must pass the corresponding options to tell the compiler (1) where the header files are (-I), (2) where the library files are (-L), and (3) which library to link (-l).

3. Alternatively, copy the downloaded header files and library files into the system's default paths. This is what installing a library on Linux means: installing and uninstalling are essentially copying files to, and removing them from, specific system paths.

4. A third-party library is one not provided by the language, the operating system, or the system interfaces. Even when such a library is installed in the system paths, gcc/g++ still needs -l to name the specific library.

Now let's walk through a dynamic library.

First, we delete the .o and .a files from before. Object files for a dynamic library are also produced with gcc, this time adding the -fPIC option. The two options do different jobs:

- -c makes gcc stop at a .o file;
- -fPIC makes it generate position-independent code, which still works wherever the library ends up being loaded in memory.

We explain why this matters at the end of the article.

Packaging the .o files into a dynamic library is also done directly with gcc, using the -shared option. It tells gcc to produce a shared library rather than an executable. The library we want to build is mymath, so remember to add the prefix and suffix to the output file name, giving libmymath.so.

Next, we create two directories for the other person: one for the .h files and one for the .so file. We pack them with the tar command. Once the archive is ready, we send it to the other person, who can unpack it and use our library directly.

However, when the program is run, loading the shared library fails. Why? The error shows that the dynamic library cannot be found when the program starts. The gcc command only told the compiler (more precisely, the linker) where our library is. The operating system loads the program at run time, and it does not know where the library is. Our .so is not in a default system library path, so the library still cannot be found when the program runs.

Then why did the static library work? Static linking copies the binary code the program uses directly into the executable, whereas dynamic linking does not.

So how does the operating system find a dynamic library at run time? There are three methods.

1. Environment variable: LD_LIBRARY_PATH

The dynamic loader also searches the directories listed in this environment variable. We append the directory that contains our .so to LD_LIBRARY_PATH, for example with export. Checking the variable again, we find that it now contains our path.

Now we run the executable again, and this time it starts successfully.

Of course, this is only a temporary solution. The environment variable is valid only in the current login session, so after we log out and log back in, the program can no longer run.

2. Soft link scheme:

First, we log out and log back in so that the environment variable from before no longer applies. Then we add our library to a system library directory by creating a soft link there. The ln -s command takes two paths:

- the first is the location of our own library;
- the second is the system's lib64 directory, where the soft link to our library is placed, so that it sits among the system libraries.

Listing that directory with ls -l shows that the new soft link points to our library. When we run the program again it works. Unlike the environment variable, it keeps working after we exit Xshell and log in again.

3. Configuration file scheme

First, we undo the soft link from the previous scheme. On most distributions, the loader's configuration files live in /etc/ld.so.conf.d/. Each .conf file there lists library directories, one per line. We create a configuration file of our own in that directory.

In our new file, we simply write the path of the directory that holds our library. Then we must reload the configuration, which is what the ldconfig command does: it rebuilds the loader's cache (/etc/ld.so.cache) from these files. Note that I am running as root, the superuser. As an ordinary user, you must prefix the command with sudo.

Like the second scheme, this one keeps working even after you log out of Xshell and log back in.

Any of the above three methods can find the dynamic library. Let's talk about the loading of dynamic and static libraries.

A program built by static linking contains its own copy of the library functions it uses. But static libraries are expensive in resources:

- the executable gets bigger and takes more disk space;
- loading it takes more memory;
- it takes longer to download and uses more network bandwidth.

Picture it like this. The disk holds our static library, and many people use it. Each of them copies the library's code into their own executable. When all those programs run, each copy is loaded into memory, so memory ends up holding several identical copies of the same code. If a library is used by many programs this way, and each copy is large, loading them inevitably consumes a lot of disk, memory, and network resources.

 Let's look at the loading problem of the dynamic library:

Picture the disk holding our dynamic library, next to physical memory. When an executable that uses methods from the dynamic library is built, the library's code is not copied into the executable as it would be with a static library. Instead, the executable records which library it needs and which external symbols (such as printf) it references. The concrete addresses of those methods inside the library are filled in later, when the program is loaded.

When the program runs, it is loaded into memory and becomes a process. That means not only loading its code and data but also creating the corresponding PCB, so the process gets a task_struct, a process address space, and a page table. The code area contains calls to library functions such as printf. Before main runs, the dynamic loader finds the dynamic library on disk and maps it, through the page table, into the process's shared region (the area between the heap and the stack). The library's pages are brought into physical memory on demand. If another process has already loaded the same library, its physical pages are simply mapped into the new process as well, so all processes share one copy.

This description exposes a problem that every dynamic library must face. Different processes run different programs and use different sets of third-party libraries, so the free space in each process's shared region differs. As a result, the address at which a library will be mapped cannot be known in advance. That is why code in a dynamic library uses offsets relative to the start of the library (address 0 by default) instead of absolute addresses, which is what -fPIC asks the compiler to produce. Only when the library is actually mapped into an address space is its starting address determined. Each address is then the start address plus the offset.


Summary

This article is relatively difficult. Most of its concepts build on earlier knowledge and are quite abstract, so it helps to draw the logic diagrams yourself. Be sure to try all three methods of letting the operating system find a dynamic library, because you will certainly run into them in future projects. The next Linux article covers inter-process communication, where I will introduce pipes in detail.

 


Origin blog.csdn.net/Sxy_wspsby/article/details/130048151