Libco source code reading (8): hook mechanism

1. What is the hook mechanism

2. Dynamic linking and static linking

    2.1 Compilation process

    2.2 Static link

    2.3 Dynamic link

3. Implementation of hook mechanism

    3.1 Use the environment variable LD_PRELOAD

    3.2 Source code intrusion


    I have been reading the libco source code step by step, following a blog by an expert in the field. I thought co_eventloop was the end of the source, but reading further I found that there is also a hook mechanism. In all my years of programming I had never used anything like it; hooking really is one of the cleverer tricks in programming. In short, reading excellent source code does a great deal to improve your own programming. Let's take a look at what a hook is.

1. What is the hook mechanism

    The hook mechanism is essentially a function hijacking technique. For example, we normally call malloc to allocate memory. We can write our own malloc with the same name, the same parameters, and the same return type to replace the system malloc. Our version can implement some extra logic and can still call the system malloc underneath. This is the hook mechanism.

    The dlopen and dlsym family of functions provided by the system can be used to work with dynamic link libraries. For example, to hook the read system call wrapper, we can use dlsym to get the address of the original function. Our own read can then call the original function and add some extra logic, and at run time calls to read reach our version.
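
    As a minimal sketch of that lookup, assuming a glibc-based Linux system (where g++ defines _GNU_SOURCE, which makes RTLD_NEXT available), the address of the original read could be obtained roughly like this. The helper name lookup_original_read is made up for illustration and is not part of libco.

```cpp
#include <dlfcn.h>    // dlsym, RTLD_NEXT
#include <unistd.h>   // ssize_t

// Signature of read(2), used to cast the untyped pointer returned by dlsym.
typedef ssize_t (*read_pfn_t)(int fd, void *buf, size_t nbyte);

// Hypothetical helper: returns the address of the next definition of "read"
// in symbol lookup order after the object containing this code (normally
// libc's), or nullptr if no such symbol is found.
read_pfn_t lookup_original_read() {
    return reinterpret_cast<read_pfn_t>(dlsym(RTLD_NEXT, "read"));
}
```

    The returned pointer can be called exactly like read itself. On glibc versions before 2.34 the program has to be linked with -ldl.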

    Through the hook mechanism, libco can turn synchronous code into asynchronous code without the user noticing, which is also why Tencent's engineers wrote libco. The libco library hooks the socket family of functions, so a back-end service can be made asynchronous with almost no changes to its logic code. It is claimed that a single machine can handle tens of millions of connections.
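
    To make the idea of "synchronous code, asynchronous behavior" a little more concrete, here is a rough sketch of what the body of a hooked read can do instead of blocking inside the kernel. This is not libco's code. The function wait_until_readable is a hypothetical stand-in for the point where a coroutine library would register the descriptor with its event loop and switch to another coroutine; in this sketch it simply calls poll(). The name read_after_wait is also made up.

```cpp
#include <poll.h>     // poll, pollfd, POLLIN
#include <unistd.h>   // read, ssize_t

// Hypothetical stand-in for "register fd with the event loop and yield".
// A coroutine library would switch to another coroutine here; this sketch
// just waits in poll(). Returns true if poll reports an event on fd before
// the timeout expires.
bool wait_until_readable(int fd, int timeout_ms) {
    pollfd pf{};
    pf.fd = fd;
    pf.events = POLLIN;
    return poll(&pf, 1, timeout_ms) > 0;
}

// Sketch of the body of a hooked read: wait for readiness first, then call
// the real read, which should now return without blocking.
ssize_t read_after_wait(int fd, void *buf, size_t nbyte, int timeout_ms) {
    if (!wait_until_readable(fd, timeout_ms)) {
        return -1;  // timed out or poll failed; a real hook would also set errno
    }
    return read(fd, buf, nbyte);
}
```

    The caller still writes what looks like an ordinary blocking read; only the waiting strategy behind it changes.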

2. Dynamic linking and static linking

2.1 Compilation process

    The compilation process generally consists of four stages: preprocessing, compilation, assembly, and linking, as shown in the figure below.

    

2.2 Static link

    Static linking means that in the linking stage, the object files (.o) produced by the assembler are linked into the executable (for example a.out) together with the libraries they reference. Since a static library is linked together with object files into an executable, its format must be close to that of .o files. In fact, a static library can simply be regarded as a collection of object files (.o files), that is, one archive file into which many object files have been packed.

    Summary of static library features:

  •  The program is linked against the static library at build time, in the linking stage
  •  The program does not depend on the library at run time, so it is easy to port
  •  Wastes space and resources, because the object files and the library code they use are all copied into every executable
  •  Makes updating, deploying, and releasing the program harder. If the static library liba.lib is updated, every application that uses it must be relinked and released to users again (for players, a small change may force the whole program to be downloaded and updated in full)

2.3 Dynamic link

    Dynamic linking means that the dynamic library is not copied into the executable when the program is built, but is loaded when the program runs. If different applications use the same library, only one copy of the shared library's code is needed in memory, which avoids the wasted space. Because the dynamic library is loaded at run time, it also avoids the update, deployment, and release trouble that static libraries cause: users only need to update the dynamic library.

    The characteristics of the dynamic library include:

  • The dynamic library postpones the linking and loading of library functions until the program runs
  • Code can be shared between processes, which is why dynamic libraries are also called shared libraries
  • Program upgrades become simpler, and the programmer can even control linking and loading explicitly from program code (explicit loading)
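
    The explicit loading mentioned in the last point can be sketched with dlopen, dlsym, and dlclose. The example below is only an illustration: it assumes a glibc-based Linux system where the math library is installed as "libm.so.6", and the function name call_cos_explicitly is made up.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose

// Loads the math library at run time, looks up "cos", and calls it.
// Returns false if the library or the symbol cannot be found.
// Assumption: the library is installed under the glibc name "libm.so.6".
bool call_cos_explicitly(double x, double *result) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == nullptr) {
        return false;
    }
    typedef double (*cos_pfn_t)(double);
    cos_pfn_t cos_fn = reinterpret_cast<cos_pfn_t>(dlsym(handle, "cos"));
    bool ok = (cos_fn != nullptr);
    if (ok) {
        *result = cos_fn(x);
    }
    dlclose(handle);  // drop our reference to the library
    return ok;
}
```

    Here the program decides at run time which library to open and which symbol to use, instead of leaving that to the loader at startup. On glibc versions before 2.34 the program has to be linked with -ldl.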

3. Implementation of hook mechanism

    There are two ways to implement a hook: the first is to use the environment variable LD_PRELOAD, and the second is to intrude into the source code directly.

3.1 Use the environment variable LD_PRELOAD

    LD_PRELOAD is an environment variable on Linux that affects the program's runtime linker. It lets you name dynamic link libraries to be loaded before all others when the program starts. It is mainly used to choose which definition wins when the same function exists in different dynamic link libraries. With this environment variable we can load another dynamic link library between the main program and its regular dynamic link libraries, and even override functions from the normal libraries. On the one hand, we can use it to substitute our own or better functions (without needing anyone else's source code); on the other hand, it can also be used to inject code into other people's programs for specific purposes.

    When the system looks for a dynamic library, it normally searches the directories in LD_LIBRARY_PATH. If LD_PRELOAD is set, however, the libraries it names are loaded first, and a symbol found there is used without searching any further. The order is: LD_PRELOAD ----> LD_LIBRARY_PATH ----> /etc/ld.so.cache ----> /lib ----> /usr/lib.

// hookread.cpp
#include <dlfcn.h>
#include <unistd.h>

#include <iostream>

// Function pointer type matching the signature of read(2).
typedef ssize_t (*read_pfn_t)(int fildes, void *buf, size_t nbyte);

// Address of the next "read" after this library in lookup order (libc's).
static read_pfn_t g_sys_read_func = (read_pfn_t)dlsym(RTLD_NEXT,"read");

// Our replacement: same name and signature as the system read.
ssize_t read( int fd, void *buf, size_t nbyte ){
    // Resolve lazily in case read is called before the initializer above has run.
    if( !g_sys_read_func ) g_sys_read_func = (read_pfn_t)dlsym(RTLD_NEXT,"read");
    std::cout << "entering hooked read\n";
    return g_sys_read_func(fd, buf, nbyte);
}

void co_enable_hook_sys(){
    std::cout << "hook enabled\n";
}

// main.cpp
#include <sys/socket.h>
#include <unistd.h>

int main(){
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    char buffer[10000];

    // The socket is not connected, so read fails; we only care that the hook runs.
    ssize_t res = read(fd, buffer, sizeof(buffer));
    (void)res;
    return 0;
}

   Compile and run:

g++ -o main main.cpp
g++ -o hookread.so -fPIC -shared -D_GNU_SOURCE hookread.cpp -ldl
LD_PRELOAD=./hookread.so ./main

Output:
entering hooked read

 

3.2 Source code intrusion

    However, libco does not do it this way; LD_PRELOAD does not appear anywhere in libco. libco takes a different approach: the hook functions defined in co_hook_sys_call.cpp are linked into the user's program, which also lets our own versions replace those of the system library. Let's see how this is done:

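// hookread.cpp
#include <dlfcn.h>
#include <unistd.h>

#include <iostream>

#include "hookread.h"

// Function pointer type matching the signature of read(2).
typedef ssize_t (*read_pfn_t)(int fildes, void *buf, size_t nbyte);

// Address of the next "read" after the executable in lookup order (libc's).
static read_pfn_t g_sys_read_func = (read_pfn_t)dlsym(RTLD_NEXT,"read");

// Our replacement: linked into the executable, so it is found before libc's read.
ssize_t read( int fd, void *buf, size_t nbyte ){
    // Resolve lazily in case read is called before the initializer above has run.
    if( !g_sys_read_func ) g_sys_read_func = (read_pfn_t)dlsym(RTLD_NEXT,"read");
    std::cout << "entering hooked read\n";
    return g_sys_read_func(fd, buf, nbyte);
}

void co_enable_hook_sys(){
    std::cout << "hook enabled\n";
}

// hookread.h
void co_enable_hook_sys();


// main.cpp
#include <sys/socket.h>
#include <unistd.h>

#include "hookread.h"

int main(){
    co_enable_hook_sys();
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    char buffer[10000];

    // The socket is not connected, so read fails; we only care that the hook runs.
    ssize_t res = read(fd, buffer, sizeof(buffer));
    (void)res;
    return 0;
}

  Compile and run:

g++ hookread.cpp -o hookread.i -E     # preprocess
g++ hookread.i -o hookread.s -S       # compile to assembly
g++ hookread.s -o hookread.o -c       # assemble into an object file
g++ main.cpp hookread.o -ldl          # link the hook into the executable
./a.out

Output:
hook enabled
entering hooked read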

    We can see that this also achieves the hook. Compared with the previous method it intrudes into the user's code, but the advantage is that users do not have to configure an environment variable themselves, which makes it easier to use.
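
    In the example above, co_enable_hook_sys only prints a message. A switch function like this can also gate the extra logic, so that the hooked function behaves exactly like the original until the hook is turned on. The sketch below illustrates that idea under my own assumptions: the names enable_hook_sketch and hooked_call_count and the per-thread flag are made up for the illustration, and this is not libco's implementation.

```cpp
#include <dlfcn.h>    // dlsym, RTLD_NEXT
#include <unistd.h>   // ssize_t and the declaration of read

typedef ssize_t (*read_pfn_t)(int fd, void *buf, size_t nbyte);

// Hypothetical per-thread state for this sketch.
static thread_local bool g_hook_enabled = false;
static thread_local int g_hooked_calls = 0;   // how often the extra logic ran

void enable_hook_sketch() { g_hook_enabled = true; }
int hooked_call_count() { return g_hooked_calls; }

// Same name and signature as the libc function, so calls to read from this
// program bind to this definition.
ssize_t read(int fd, void *buf, size_t nbyte) {
    // Look up the original read once, on first use.
    static read_pfn_t sys_read =
        reinterpret_cast<read_pfn_t>(dlsym(RTLD_NEXT, "read"));
    if (g_hook_enabled) {
        ++g_hooked_calls;   // the extra logic would go here
    }
    return sys_read(fd, buf, nbyte);
}
```

    Until enable_hook_sketch() is called, read simply forwards to the original function; afterwards every call on that thread also runs the extra logic.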

References:

    https://blog.csdn.net/liushengxi_root/article/details/78798130

    https://blog.csdn.net/weixin_43705457/article/details/106895038

    https://blog.csdn.net/shixin_0125/article/details/78848561

    


Origin blog.csdn.net/MOU_IT/article/details/115050472