Implementing DLL Injection by Embedding Assembly in C++

      There are many ways to implement DLL injection, and all of them are fairly mature. The most common one is to inject code through a remote thread and have that thread load the DLL file. A remote thread has to solve two key problems: how to access global variables and strings, and how to relocate addresses. Both are easy to handle in assembly but awkward in a high-level language. An earlier article implemented this technique in C++, but it used a workaround: it replaced global variables with local variables and passed them to the remote thread when the thread was created. That approach kills two birds with one stone, because local variables live on the stack, so they involve neither absolute addresses nor address relocation. Looking back, however, it sidesteps these limitations of C++ rather than overcoming them. After studying the thread hiding chapter in Luo Yunbin's "32-bit Assembly Language Programming in Windows Environment", the author proposes a way to write the remote thread itself in C++ with embedded assembly. Because this article involves a good deal of assembly, I believe its value lies not only in presenting a DLL injection method but, more importantly, in exercising programming skills and deepening the reader's understanding of the lower layers of the operating system.
        How variables are handled when C++ is compiled
        Data in a C++ program falls into three cases: global variables, strings and local variables. In C++ we simply declare variables and use them, and we never need to think about where they end up once the generated binary is loaded into memory. For a remote thread, however, this matters. The compiler treats data and code differently: global variables and strings are placed in the data segment, local variables are stored on the stack, and code is placed in the code segment. The data segment, the stack and the code segment are separate regions of memory, and their addresses are not contiguous. At compile time, every access to a global variable or string is turned into an access to an absolute address. That address is fixed at compile time, and the operating system allocates and initializes the memory at that address when the program runs. Accesses to local variables, by contrast, are turned into operations relative to the stack registers, so the address of a local variable depends on the value of those registers at run time, and different register values give different absolute addresses.
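        The difference between fixed and stack-relative addresses can be observed from portable C++. The following is only a rough sketch: the names g_counter and record_addresses are made up for illustration, and the exact addresses depend on the compiler and platform. It records the address of a global and of a local at several nesting levels.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

int g_counter = 0;  // global: placed in the data segment at one fixed address

// Records the address of the global and of this call's local at each level.
// The volatile read after the recursive call keeps every frame alive until
// the deeper calls return, so each level needs its own stack slot.
int record_addresses(int depth,
                     std::vector<std::uintptr_t>& globals,
                     std::vector<std::uintptr_t>& locals) {
    assert(depth >= 0);
    volatile int local = depth;  // local: lives in this call's stack frame
    globals.push_back(reinterpret_cast<std::uintptr_t>(&g_counter));
    locals.push_back(reinterpret_cast<std::uintptr_t>(&local));
    int deeper = depth > 0 ? record_addresses(depth - 1, globals, locals) : 0;
    return local + deeper;
}
```

        Calling record_addresses(2, globals, locals) should leave three identical entries in globals and three different entries in locals, because each nested call runs with a different stack pointer.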
        In addition, when a function is called in a high-level language, the compiler generates code that does the following:
1. Push the arguments onto the stack;
2. Use the call instruction to jump to the function address and execute the function;
3. After the function has executed, restore the balance of the stack.
        The second of these three steps can itself be divided into several smaller steps:
1. Push the address of the instruction following the call instruction onto the stack;
2. Jump to the function address and execute;
3. On reaching a return instruction such as ret, pop the return address from the stack;
4. Jump to the popped address (that is, the instruction after the call) and continue execution.
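        The call and ret steps above can be modelled with a toy machine. This is a simplified sketch, not real CPU behaviour: ToyCpu and its members are hypothetical names, and the addresses used with it are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A toy 32-bit machine: just an instruction pointer and a stack.
struct ToyCpu {
    std::uint32_t eip = 0;
    std::vector<std::uint32_t> stack;

    void push(std::uint32_t value) { stack.push_back(value); }

    // call: push the address of the next instruction, then jump to the target.
    void call(std::uint32_t target, std::uint32_t next_instruction) {
        push(next_instruction);
        eip = target;
    }

    // ret: pop the return address and continue execution there.
    void ret() {
        assert(!stack.empty());
        eip = stack.back();
        stack.pop_back();
    }
};
```

        After push(arg) and call(target, next), the top of the stack holds next; after ret(), eip equals next and only the argument remains on the stack, which is why someone still has to remove it.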
        Now take LoadLibrary as an example. This function is a Windows API located in Kernel32.dll, and in this example its entry address is 7C801D7B. Because Kernel32.dll is normally loaded at the same address in every process, this is generally the entry address of LoadLibrary in the target process as well, but that is not guaranteed, and the value differs between Windows versions. To call this function in C++ we can write:

LoadLibrary("Dll.dll");

        In assembly it would be written like this:

push  <address of the string "Dll.dll" in memory, normally in the data section>
call  <entry address of LoadLibrary, e.g. 7C801D7B>

        Because Windows APIs use the standard calling convention (stdcall), the called function restores the stack. Functions you write yourself in C++ generally use the C calling convention (cdecl) by default, so if you call one of those from assembly, the caller has to restore the stack. One last point: the return value of an API called from assembly is generally placed in the eax register, so to check whether the call succeeded you only need to test the value of eax.
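        The difference between the two conventions comes down to who adds the argument bytes back to esp. The sketch below only models the arithmetic, assuming 4-byte arguments; Convention, CallResult and simulate_call are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <cstdint>

enum class Convention { Cdecl, Stdcall };

struct CallResult {
    std::uint32_t esp_after_ret;      // esp right after the callee returns
    std::uint32_t esp_after_cleanup;  // esp once the caller has finished
};

// Models esp around a call that passes arg_count 4-byte arguments.
CallResult simulate_call(std::uint32_t esp, std::uint32_t arg_count, Convention conv) {
    assert(esp >= 4 * (arg_count + 1));
    const std::uint32_t arg_bytes = 4 * arg_count;
    esp -= arg_bytes;  // caller pushes the arguments
    esp -= 4;          // call pushes the return address
    esp += 4;          // ret pops the return address
    if (conv == Convention::Stdcall) {
        esp += arg_bytes;  // "ret n": the callee removes the arguments
    }
    CallResult result{esp, esp};
    if (conv == Convention::Cdecl) {
        result.esp_after_cleanup += arg_bytes;  // caller's "add esp, n"
    }
    return result;
}
```

        With stdcall, esp is already back at its starting value when the callee returns; with cdecl it is still short by the size of the arguments until the caller cleans up.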
        Problems encountered by remote threads in high-level languages
        First, let's recall how a remote thread is implemented:
1. Write remote thread code somewhere in the program;
2. Use WriteProcessMemory to copy the remote thread code written in the previous step to the target process;
3. Create a remote thread with CreateRemoteThread and make it run.
        Now let's analyze this carefully. Assume our own process is process A and the host process to be injected is process B. The remote thread code written in the first step is located in A. The absolute addresses of the global variables and strings it uses were fixed at compile time for process A, and they point into the data section of process A. In the second step we copy only the bytes of the code to process B; the data section is not copied. When the copied code runs, it still tries to access those absolute addresses. In process B, such an address may hold unrelated data or may not be mapped at all, which causes wrong results or access violations. In assembly language you can reserve space for variables inside the code section, and these variables are copied together with the code, so the problem of absolute addresses in the data section does not arise. C++, however, does not let you declare variables inside the code section, so the first problem to solve is how to make the variables travel with the code.
        Second, suppose we have managed to store the variables in the code section. The compiler still turns every access to such a variable into an access to an absolute address, and that address is again an address in process A. The memory allocated in process B will be at an address we cannot know in advance, so after the code is copied to process B, accesses to the variable still go wrong for the same reason as above. We therefore have to relocate the addresses of these variables.
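        To see why a plain copy is not enough, consider the bytes of a single push instruction. The sketch below is illustrative only: the operand value 0x00402030 is a made-up data address in process A, and read_operand and make_code_in_a are hypothetical helper names. Copying the bytes does not change the address embedded in them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit operand embedded in a code image.
std::uint32_t read_operand(const std::vector<std::uint8_t>& code, std::size_t offset) {
    assert(offset + 4 <= code.size());
    return static_cast<std::uint32_t>(code[offset]) |
           (static_cast<std::uint32_t>(code[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(code[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(code[offset + 3]) << 24);
}

// "push imm32" (opcode 0x68) followed by the absolute address the compiler
// chose for a string in process A's data section (hypothetical value).
std::vector<std::uint8_t> make_code_in_a() {
    return {0x68, 0x30, 0x20, 0x40, 0x00};  // push 0x00402030
}
```

        A byte-for-byte copy of make_code_in_a() still yields 0x00402030 from read_operand(copy, 1), wherever the copy ends up, so the copied instruction keeps pointing at an address that only means something in process A.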

        Implementation of remote thread code
        Because our goal is to load our own DLL file into the target process, the remote thread code is fairly simple: it calls LoadLibrary, as in the example above. The code needs to store two variables. One is the address of LoadLibrary (this address is basically fixed, but to be safe we obtain it dynamically and save it; otherwise this variable would not be needed). The other is a string holding the DLL file name. The full listing below also reserves a slot for the address of FreeLibrary in the same way. To reserve memory for these variables inside the code section, we fill their space with no-op instructions, copy everything to process B, and then overwrite the no-ops with the real values. For example, the address of an API takes four bytes and the assembly instruction nop takes one byte, so we use four nops to "reserve" space for it. Our DLL file name is "Dll.dll", which takes seven bytes; since a string must end with a terminating 0, it takes eight bytes in total, so we use eight nops to "reserve" space for it.
        The next problem is address relocation. This technique is widely used in viruses, Trojan horses and elsewhere. It is not the author's invention; the author learned it from other sources and only briefly introduces the principle here.
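        The placeholder idea can be tried out on an ordinary byte buffer, with no second process involved. This is a minimal sketch under simplifying assumptions: it models only one 4-byte address slot and the 8-byte name slot, and kNop, make_placeholders and patch are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint8_t kNop = 0x90;   // x86 opcode for nop, one byte long
constexpr std::size_t kAddrSlot = 4;  // room for a 32-bit address
constexpr std::size_t kNameSlot = 8;  // "Dll.dll" plus the terminating 0

// Builds the reserved area: 4 nops followed by 8 nops.
std::vector<std::uint8_t> make_placeholders() {
    return std::vector<std::uint8_t>(kAddrSlot + kNameSlot, kNop);
}

// Overwrites the placeholders with the real values.
void patch(std::vector<std::uint8_t>& image, std::uint32_t api_address, const char* dll_name) {
    assert(image.size() >= kAddrSlot + kNameSlot);
    assert(std::strlen(dll_name) + 1 <= kNameSlot);
    std::memcpy(image.data(), &api_address, kAddrSlot);
    std::memcpy(image.data() + kAddrSlot, dll_name, std::strlen(dll_name) + 1);
}
```

        The second assert in patch reflects the constraint in the text: a file name longer than seven characters would not fit into the eight reserved bytes.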
Take a look at the following code:

1                call              relocal
2                relocal:
3                pop               ebx
4                sub               ebx , offset relocal

        Now take a closer look. When line 1 executes, the runtime address of line 3 is pushed onto the stack (note that this is the address at run time, not the address fixed at compile time, so it differs between process A and process B). Execution then continues at line 3, which pops that address into the ebx register. In line 4, the compiler replaces offset relocal with the address that line 3 has in process A, as fixed at compile time. If the code is running in process B, ebx is therefore not 0 after the subtraction in line 4: it holds the difference between the address of this code in process B and its address in process A. Once we have this difference, we obtain the correct address in process B by adding it to any absolute address the compiler generated for a variable.
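        The arithmetic behind the four lines can be written out in plain C++. This is only a sketch of the calculation, not of the instructions themselves; the function names and every address used with them are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of ebx after "pop ebx" and "sub ebx, offset relocal".
// Unsigned arithmetic wraps around, just like the 32-bit register does.
std::uint32_t relocation_delta(std::uint32_t runtime_label, std::uint32_t compiled_label) {
    return runtime_label - compiled_label;
}

// Equivalent of "mov eax, ebx" followed by "add eax, offset Variable".
std::uint32_t relocate(std::uint32_t compiled_address, std::uint32_t delta) {
    return compiled_address + delta;
}

// In process A itself nothing has moved, so the delta is zero.
void check_no_move(std::uint32_t label) {
    assert(relocation_delta(label, label) == 0);
}
```

        For example, if relocal was compiled at 0x00401010 but runs at 0x00A50010, the delta is 0x0064F000, and a variable compiled at 0x00401000 is found at 0x00A50000. The same formulas work when the copy lands at a lower address, because the subtraction and addition wrap around consistently.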
        With these key techniques in place, the code of the remote thread is as follows:

REMOTE_THREAD_BEGIN:                        // marks the start of the remote thread code
        _asm        
        {
                //*******placeholder for the address of LoadLibrary*******
LoadLibraryAddr:
                nop
                nop
                nop
                nop
                //*******placeholder for the address of FreeLibrary*******
FreeLibraryAddr:
                nop
                nop
                nop
                nop
                //*******placeholder for the DLL file name*******
LibraryName:
                nop
                nop
                nop
                nop
                nop
                nop
                nop
                nop

                //*******where the executable code really starts*******
REMOTE_THREAD_CODE:
                //*******address relocation: ebx holds the difference*******
                call                relocal
relocal:
                pop                ebx
                sub                ebx , offset relocal
                //*******1. call LoadLibrary*******
                //*******1.1. push the LoadLibrary argument (the DLL file name)*******
                mov                eax , ebx                                        
                add                eax , offset LibraryName        // variable address plus ebx gives the relocated address
                push                eax
                //*******1.2. call LoadLibrary*******
                mov                eax , ebx
                add                eax , offset LoadLibraryAddr                // relocated in the same way
                mov                eax , [eax]                // fetch the address of LoadLibrary from the variable
                call                eax
                //*******1.3. check the result; on failure return at once to avoid a crash*******
                or                eax , eax
                jnz                NEXT1                        // success: jump to NEXT1 and continue
                ret
NEXT1:
                // *******2. release the DLL*******
                // *******2.1. push the FreeLibrary argument*******
                push                eax
                // *******2.2. call FreeLibrary*******
                mov                eax , ebx
                add                eax , offset FreeLibraryAddr                // address relocation
                mov                eax , [eax]                // fetch the address of FreeLibrary from the variable
                call                eax

                ret
        }
REMOTE_THREAD_END:

        Because a DLL automatically executes the code in DllMain when it is first loaded, we put the code we want to run into that function, so that it executes as soon as the DLL is loaded. Otherwise we would also have to obtain the address of a function inside the DLL, which means more work. Of course, if you don't mind the trouble, you can try that as well.

        Implementation of the main program
        Once the remote thread code is written, the rest is much less technically demanding. The main program does the following:
1. Allocate code space in the host process;
2. Copy the code of the remote thread to the host process;
3. Patch the values of the remote thread's variables;
4. Create a remote thread so that the remote code runs.

        The key code is as follows:

//*******1. Allocate code space in the host process*******
//*******1.1. Open the process by its process ID and obtain a process handle*******
        HANDLE        hSelectedProcHandle;                // handle of the host process
        hSelectedProcHandle = OpenProcess(PROCESS_ALL_ACCESS , FALSE , 
nSelectedThreadId);        // holds the process ID; how it is obtained is shown in the full source code

//*******1.2. Get the length of the remote thread code, i.e. the size to allocate******
        int         nRemoteThreadCodeLength;                        // length of the code
        _asm 
        {
                        mov                eax , offset REMOTE_THREAD_END
                        mov                ebx , offset REMOTE_THREAD_BEGIN
                        sub                eax , ebx                // end offset minus start offset gives the code length
                        mov                nRemoteThreadCodeLength , eax
        }

//*******1.3. Allocate space in the host process*******
        LPVOID         pRemoteThreadAddr;          // base address of the allocated space
        pRemoteThreadAddr = VirtualAllocEx(hSelectedProcHandle , NULL , nRemoteThreadCodeLength , MEM_COMMIT,PAGE_EXECUTE_READWRITE);

//*******2. Copy the code of the remote thread to the host process*******
//*******2.1. Get the start address of the remote thread code in this process*******
        LPVOID        pRemoteThreadCodeBuf;        // start of the remote thread code in this process
        DWORD        nWritenNum ,        nSuccess;                // temporary variables
        _asm mov        eax , offset REMOTE_THREAD_BEGIN
        _asm mov        pRemoteThreadCodeBuf , eax        

//*******2.2. Copy the code to the host process*******
        nSuccess = WriteProcessMemory(hSelectedProcHandle , pRemoteThreadAddr , pRemoteThreadCodeBuf , nRemoteThreadCodeLength , &nWritenNum);
        
// *******3. Patch the values of the variables in the remote thread*******
// *******3.1. First get the addresses of the two key functions*******
        HMODULE hKernel32;
        hKernel32 = LoadLibrary("Kernel32.dll");
        LPVOID pLoadLibrary , pFreeLibrary;
        pLoadLibrary = (LPVOID)GetProcAddress(hKernel32 , "LoadLibraryA");
        pFreeLibrary = (LPVOID)GetProcAddress(hKernel32 , "FreeLibrary");
        
// *******3.2. Patch the code*******
        PBYTE pRemoteAddrMove;                // pointer that moves through the remote thread's memory
        pRemoteAddrMove = (PBYTE)pRemoteThreadAddr;

// *******3.2.1. Patch the LoadLibrary address*******
        nSuccess = WriteProcessMemory(hSelectedProcHandle , 
                pRemoteAddrMove , 
                &pLoadLibrary ,
                4 , 
                &nWritenNum);

//*******3.2.2. Patch the FreeLibrary address*******
        pRemoteAddrMove +=4;                // move to the variable that stores FreeLibrary
        nSuccess = WriteProcessMemory(hSelectedProcHandle ,
                pRemoteAddrMove , 
                &pFreeLibrary ,
                4 ,
                &nWritenNum);

//*******3.2.3. Patch the DLL file name*******
        char szDllName[8] = {"Dll.dll"};                // note: this must be exactly 8 characters
// and must match the name of your DLL file
        pRemoteAddrMove +=4;
        nSuccess = WriteProcessMemory(hSelectedProcHandle , 
                        pRemoteAddrMove , 
                        szDllName ,
                        8 , 
                        &nWritenNum);

        //*******4. Create the remote thread so that the remote code runs*******
        //*******4.1. Move the pointer to the start of the remote thread code*******
        pRemoteAddrMove +=8;
        HANDLE hRemoteThreadHandle;                // handle of the remote thread

        // *******4.2. Define the type of the remote thread function*******
        typedef unsigned long (WINAPI *stRemoteThreadProc)(LPVOID);
        stRemoteThreadProc pRemoteThreadProc;

        // *******4.3. Assign the entry address to the declared function pointer*******
        pRemoteThreadProc = (stRemoteThreadProc)pRemoteAddrMove;

        //*******4.4. Create the remote thread*******
        hRemoteThreadHandle = CreateRemoteThread(hSelectedProcHandle , NULL , 0 , 
                pRemoteThreadProc , 0 , 0 , NULL);
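        The pointer arithmetic in the listing (+4, +4, +8) depends on the order and size of the nop placeholders at the start of the remote thread code. One way to make that layout explicit is a struct with compile-time checks. This is only a sketch: RemoteData and entry_offset are hypothetical names, not part of the original source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct mirroring the bytes reserved before REMOTE_THREAD_CODE.
struct RemoteData {
    std::uint32_t load_library_addr;  // 4 nops at LoadLibraryAddr
    std::uint32_t free_library_addr;  // 4 nops at FreeLibraryAddr
    char library_name[8];             // 8 nops at LibraryName
};

static_assert(offsetof(RemoteData, load_library_addr) == 0, "first slot");
static_assert(offsetof(RemoteData, free_library_addr) == 4, "second slot");
static_assert(offsetof(RemoteData, library_name) == 8, "name slot");
static_assert(sizeof(RemoteData) == 16, "code starts right after the data");

// Offset of REMOTE_THREAD_CODE from the start of the copied block.
constexpr std::size_t entry_offset() { return sizeof(RemoteData); }
```

        If a placeholder were ever added or resized, the static_asserts would fail at compile time instead of the thread silently starting at the wrong byte.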

        Because this module mainly calls a few APIs whose usage can be looked up in the documentation, I will not describe them in detail here. The attached source code is a dialog-based MFC program. It also contains a module that lists the processes currently running on the system, whose implementation is not covered here, and it comes with a simple DLL project for testing. When running the program, be sure to put the DLL file somewhere on the system search path; otherwise the injection fails because the DLL file cannot be found.
        All the functional modules have now been introduced. Overall, this method achieves what we set out to do. Its drawback is that it is more complicated to implement, but from a learning point of view it is a good method. If anything in this article is missing or wrong, corrections are welcome.

