Notes on a so that is not unloaded after calling dlclose
Problem Description
We have a plug-in-like feature that loads a so with dlopen. To upgrade the so, the program first calls dlclose and then calls dlopen again, so the so can be replaced without restarting the program. This worked well until we hit one odd so: after upgrading it, some functions still ran the implementation from the old so instead of the one in the new so.
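The reload step described above can be sketched roughly as follows. This is only an illustration, not the original program's code: the helper name reload_plugin and the choice of dlopen flags are assumptions.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: replace *handle with a freshly loaded copy of the
 * library at `path`. Returns 0 on success, -1 on failure.
 */
int reload_plugin(void **handle, const char *path)
{
    if (*handle != NULL) {
        /* dlclose only drops one reference; the dynamic loader decides
         * whether the library is really unmapped. */
        if (dlclose(*handle) != 0) {
            fprintf(stderr, "dlclose: %s\n", dlerror());
            return -1;
        }
        *handle = NULL;
    }

    /* Load whatever file is at `path` now (flags are an assumption). */
    *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (*handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    return 0;
}
```

Note that a successful return from dlclose does not prove the old code is gone. If the loader keeps the library resident, the following dlopen simply hands back the copy that is already in memory. On older glibc versions the program has to be linked with -ldl.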
Identifying the problem
Use the info sharedlibrary command in gdb to list the loaded so files, then look at the same so with lsof:
(gdb) info sharedlibrary
0x00007fff2132a2c0 0x00007fff213bc5f8 Yes /xxxx/xxx/libxxx.so
root@probe:~# lsof |grep libxxx.so
nginx 3391245 root mem REG 253,0 8604544 5250054 /xxx/xxx/libxxx.so
This shows that the so currently in use has inode 5250054. However, running stat on the so that is now on disk reports inode 5377891, as shown below:
root@probe:~# stat /xxx/xxx/libxxx.so
File: /xxx/xxx/libxxx.so
Size: 8604544 Blocks: 16808 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 5377891 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-10-21 17:01:57.404283135 +0800
Modify: 2022-10-21 15:48:29.000000000 +0800
Change: 2022-10-21 17:00:23.423822609 +0800
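As an aside, the same lsof/stat comparison can be done from inside a process by reading /proc/self/maps. The sketch below assumes Linux with procfs mounted; the helper names mapped_inode and mapping_is_current are made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: inode of the first mapping in this process whose
 * path contains `name`, or 0 if there is no such mapping. */
unsigned long mapped_inode(const char *name)
{
    char line[4096];
    unsigned long found = 0;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (fp == NULL)
        return 0;
    while (found == 0 && fgets(line, sizeof line, fp) != NULL) {
        unsigned long ino = 0;
        int path_off = 0;

        /* Fields: address perms offset dev inode pathname */
        if (sscanf(line, "%*s %*s %*s %*s %lu %n", &ino, &path_off) == 1
            && path_off > 0 && strstr(line + path_off, name) != NULL)
            found = ino;
    }
    fclose(fp);
    return found;
}

/* Hypothetical helper: 1 if the file at `path` has the same inode as the
 * mapping that matches `name`, 0 if they differ, -1 if either is missing. */
int mapping_is_current(const char *path, const char *name)
{
    struct stat st;
    unsigned long ino = mapped_inode(name);

    if (ino == 0 || stat(path, &st) != 0)
        return -1;
    return (unsigned long)st.st_ino == ino;
}
```

A result of 0 from mapping_is_current corresponds to the situation in this article: the process still maps a file whose inode differs from the one currently on disk.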
The two inodes differ, which means the process is not using the so file that is currently on disk: it still has the old, already deleted so mapped. Yet the program had called dlclose and then dlopen again. Searching online turned up a suggestion to check whether the so carries the NODELETE flag, so I checked with readelf:
root@probe:~# readelf -d libxxx.so |grep NODELETE
0x000000006ffffffb (FLAGS_1) Flags: NODELETE
When a so has this flag, the dynamic loader has been told not to unload it, so the so stays in the process even after dlclose is called.
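The readelf check can also be done in code by reading the dynamic section of the file. The following is a minimal sketch: the helper name has_nodelete is hypothetical, it handles only 64-bit ELF files, and it assumes the file has the same byte order as the host.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if DT_FLAGS_1 contains DF_1_NODELETE,
 * 0 if it does not, -1 on error or if the file is not 64-bit ELF. */
int has_nodelete(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    Elf64_Dyn dyn;
    int result = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, fp) != 1
        || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(fp);
        return -1;
    }
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(fp, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, fp) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_DYNAMIC)
            continue;
        /* Walk the dynamic entries until DT_FLAGS_1 or DT_NULL. */
        if (fseek(fp, (long)ph.p_offset, SEEK_SET) != 0) {
            result = -1;
            break;
        }
        for (Elf64_Xword n = ph.p_filesz / sizeof dyn; n > 0; n--) {
            if (fread(&dyn, sizeof dyn, 1, fp) != 1) {
                result = -1;
                break;
            }
            if (dyn.d_tag == DT_NULL)
                break;
            if (dyn.d_tag == DT_FLAGS_1) {
                result = (dyn.d_un.d_val & DF_1_NODELETE) != 0;
                break;
            }
        }
        break;
    }
    fclose(fp);
    return result;
}
```

A plug-in host could call a check like this before loading a so and log a warning that the so cannot be hot-replaced.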
Manual testing
Write a small test file, test.c, with the following code:
#include <stdio.h>

int test(void)
{
    printf("this is test function.\n");
    return 0;
}
Compile with the following compile command:
gcc -fpic -shared -o libtest.so test.c
Then use the readelf command to see whether the flag is present:
readelf -d libtest.so |grep NODELETE
There is no output.
Re-use the following command to compile and use the readelf command to view:
gcc -fpic -shared -znodelete -o libtest.so test.c
readelf -d libtest.so |grep NODELETE
0x000000006ffffffb (FLAGS_1) Flags: NODELETE
After adding the -znodelete option, the compiled so has the NODELETE flag.
The -znodelete option is an option of the linker, ld; ld --help describes its effect. A so linked with this option stays resident in the process and is not unloaded when dlclose is called. Besides nodelete, ld has other -z options such as nodlopen.
Why does this SO add the -znodelete option?
The developer of the so told me that he had not added this option himself. His so is written in Go and built as a C shared library with go build -buildmode=c-shared.
So I searched online for related issues and found that a so generated by Go does not currently support dlclose, and that the Go toolchain's approach is to link such a so with the -znodelete option. (https://github.com/golang/go/commit/bd7de94d7fe8a0ba7742e90b1d6a09baa468bb58)
As of October 24, 2022, dlclose support for Go is still an open issue on Go's GitHub: https://github.com/golang/go/issues/11100
References
https://stackoverflow.com/questions/45967961/how-to-unload-all-the-dependent-shared-libraries-from-a-process
https://www.coder.work/article/1518255
https://docs.oracle.com/cd/E19683-01/816-0210/6m6nb7mcs/index.html