Ptrace: An Application of a Code Injection Technique on Linux

1. Abstract
2. Basics
3. ptrace Parameters
4. Reading System Call Parameters
5. Some Fun Experiments
6. Single-Stepping
References and Links
Document Info
Notes

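In previous work I ran into the following requirement: being able to turn the output of an already running process on or off at will.
After going through related blog posts and open-source projects, I settled on ptrace as the implementation. On the way from understanding it to actually applying it, the article Playing with ptrace (by Pradeep Padala) answered many of my questions. This post presents the main content of that article and adds some of my own understanding and modifications.
In addition, the original code ran on i386, whereas my machine is 64-bit, so I ported the original code.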

1. Abstract

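Have you ever wondered how system calls can be intercepted? Have you ever tried fooling the kernel by changing system call arguments? Have you ever wondered how a debugger stops a running process and lets you take control of it?
If you are thinking of using complex kernel programming to accomplish such tasks, think again: Linux provides an elegant mechanism for them, the ptrace system call. ptrace provides a means by which a Tracer can observe and control a Tracee. It can examine and change the Tracee's memory (core image) and registers, and it is mainly used to implement breakpoint debugging and system call tracing.
In this article we learn how to intercept a system call and change its arguments. In the next article we will study more advanced techniques: setting breakpoints in a running program and injecting code into it. With these we can peek into the Tracee's registers and data segment and modify their contents. We will also describe a way to stop the Tracee and make it execute arbitrary instructions.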

2. Basics

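Operating systems offer their services through a standard mechanism called system calls. System calls provide a standard API for accessing the underlying hardware and low-level services such as the file system. When a process wants to invoke a system call, it puts the arguments into registers and raises software interrupt 0x80. This software interrupt is like a gate into kernel mode: the kernel checks the arguments and then executes the system call. On the i386 architecture, the system call number is placed in the %eax register, and the arguments are placed, in order, in %ebx, %ecx, %edx, %esi and %edi. For example: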

write(2, "Hello", 5)

translates roughly into the following assembly:

movl   $4, %eax			/* __NR_write is system call number 4 on i386 */
movl   $2, %ebx
movl   $hello,%ecx
movl   $5, %edx
int    $0x80

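where $hello points to the string "Hello".
So where does ptrace come into the picture? Before executing the system call, the kernel checks whether the process is being traced. If it is, the kernel stops the Tracee and hands control to the Tracer, which can then examine and modify the Tracee's registers.

On 64-bit systems things are different. User-level applications pass arguments in the registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9, whereas the kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9. System calls are made with the syscall instruction instead of the int 0x80 software interrupt. What stays the same is that the accumulator register (%rax on 64-bit, %eax on 32-bit) holds both the system call number and the return value.
For more on the differences between the 32-bit and 64-bit conventions, see the Stack Overflow summary [What are the calling conventions for UNIX & Linux system calls on i386 and x86-64].
My environment is 64-bit Linux, so everything below uses a 64-bit system as the example.
Source: Linux Hook 笔记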
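
To make the point concrete that a system call is identified by its number rather than by a libc function name, the sketch below issues write through the generic syscall() wrapper. The helper name raw_write is my own and not part of any library. On x86_64 Linux the wrapper places the number in %rax and the arguments in the kernel-interface registers before executing the syscall instruction.

```c
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: invoke write(2) by system call number,
 * bypassing the dedicated libc write() wrapper.
 * Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const char *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(2, "Hello", 5) has the same effect as the write(2, "Hello", 5) call shown earlier.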

A simple example shows how this works:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/reg.h>   /* For constants  ORIG_RAX etc */
#include <stdio.h>
int main()
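{
    pid_t child;
    long orig_rax;
    child = fork();
    if(child == 0)
    {
        /* Child: ask to be traced by the parent, then exec the target */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        printf("Try to call: execl\n");
        execl("/bin/ls", "ls", NULL);
        printf("child exit\n");   /* Reached only if execl fails */
    }
    else
    {
        wait(NULL);   /* Returns once the child has stopped at execve */
        /* Each slot in the USER area is 8 bytes wide on x86_64 */
        orig_rax = ptrace(PTRACE_PEEKUSER,
                          child, 8 * ORIG_RAX,
                          NULL);
        printf("The child made a "
               "system call %ld\n", orig_rax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
        printf("Try to call:ptrace\n");
    }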
    return 0;
}

The program output is as follows:
Try to call: execl
The child made a system call 59
Try to call:ptrace
main.o  Makefile  Ptarce

Note:
1. The source code was modified when porting to a 64-bit machine; for the reasons, see: Porting ptrace to 64-bit machines

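System call 59 is execve, the first system call executed by the child. For reference, the system call numbers for 64-bit machines can be found in /usr/include/asm/unistd_64.h.
As the example shows, the parent forks a child, and the child runs the program we want to trace. Before calling execl(), the child calls ptrace with PTRACE_TRACEME as the first argument. This tells the kernel that the process is to be traced, so when the child executes the execve system call it hands control over to its parent, which is notified through its wait() call.
The parent can then examine the arguments of the child's system call or do other things, such as inspecting the child's registers.
When a system call occurs, the kernel saves the original system call number from the %rax register. As shown above, we can read this value from the child's USER area by calling ptrace with PTRACE_PEEKUSER as the first argument.
After examining the system call, the parent can call ptrace with PTRACE_CONT as the first argument to let the child continue.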
The typical flow of system call tracing is shown in the figure below:
[Figure: system call tracing flow chart]
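
The flow described above can be condensed into a single helper. This is a minimal sketch for x86_64 Linux and not the code from the original article: the name first_syscall and its error conventions are hypothetical. It forks, lets the child request tracing and exec the given program, reads orig_rax at the first stop, then kills and reaps the child.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/reg.h>     /* ORIG_RAX (x86_64 only) */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of the system call at which
 * the traced child first stops, or -1 on any error. */
long first_syscall(const char *path)
{
    int status;
    long nr;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: give up early if tracing cannot be enabled */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execl(path, path, (char *)NULL);
        _exit(127);                      /* exec failed */
    }

    /* Parent: wait for the stop that follows a successful execve */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))             /* child exited instead of stopping */
        return -1;

    /* -1 may be valid data for PEEK requests, so consult errno as well */
    errno = 0;
    nr = ptrace(PTRACE_PEEKUSER, child, (void *)(8L * ORIG_RAX), NULL);
    if (nr == -1 && errno != 0)
        nr = -1;

    /* This sketch does not let the child run any further */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return nr;
}
```

On the 64-bit setup used in this post, first_syscall("/bin/ls") should report 59, the number of execve.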

3. ptrace Parameters

ptrace is declared as follows:

long ptrace(enum __ptrace_request request,
            pid_t pid,
            void *addr,
            void *data);

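The first argument determines the behavior of ptrace and how the remaining arguments are used. It can take one of the following values:

Request	Description
PTRACE_TRACEME	Indicates that this process is to be traced by its parent. The remaining arguments are ignored. This is the only request issued by the Tracee; all other requests are issued only by the Tracer. In those requests, `pid` specifies the process ID of the Tracee.
PTRACE_PEEKTEXT	Reads a word at the address `addr` in the Tracee's memory and returns it as the result of the call. Linux does not have separate text and data address spaces, so the two requests `PTRACE_PEEKTEXT` and `PTRACE_PEEKDATA` are equivalent (`data` is ignored).
PTRACE_PEEKDATA	Same as above.
PTRACE_PEEKUSER	Reads a word at offset `addr` in the Tracee's USER area, which holds the registers and other information about the process. The word is returned as the result of the call. The offset `addr` must typically be word-aligned, though this may vary by architecture (`data` is ignored).
PTRACE_POKETEXT	Copies the word `data` to the address `addr` in the Tracee's memory. Linux does not have separate text and data address spaces, so the two requests `PTRACE_POKETEXT` and `PTRACE_POKEDATA` are equivalent.
PTRACE_POKEDATA	Same as above.
PTRACE_POKEUSER	Copies the word `data` to offset `addr` in the Tracee's USER area. For this request, the offset `addr` must typically be word-aligned. To maintain the integrity of the kernel, some modifications to the USER area are disallowed.
PTRACE_GETREGS	Copies the Tracee's general-purpose registers to the address `data` in the Tracer. See `<sys/user.h>` for the format of this data (`addr` is ignored). Note: on SPARC systems the meanings of `data` and `addr` are reversed; `data` is ignored and the registers are copied to the address `addr`. `PTRACE_GETREGS` and `PTRACE_GETFPREGS` are not present on all architectures.
PTRACE_GETFPREGS	Used like `PTRACE_GETREGS`, but fetches the Tracee's floating-point registers.
PTRACE_SETREGS	Modifies the Tracee's general-purpose registers using the data at the address `data` in the Tracer. As for `PTRACE_POKEUSER`, some general-purpose register modifications may be disallowed (`addr` is ignored). Note: on SPARC systems the meanings of `data` and `addr` are reversed; `data` is ignored and the registers are copied from the address `addr`. `PTRACE_SETREGS` and `PTRACE_SETFPREGS` are not present on all architectures.
PTRACE_SETFPREGS	Used like `PTRACE_SETREGS`, but modifies the Tracee's floating-point registers.
PTRACE_CONT	Restarts the stopped Tracee so that it continues running. If `data` is nonzero, it is interpreted as the number of a signal to be delivered to the Tracee; otherwise no signal is delivered. In other words, the Tracer can control whether a signal is delivered to the Tracee (`addr` is ignored).
PTRACE_SYSCALL, PTRACE_SINGLESTEP	Restart the stopped Tracee as `PTRACE_CONT` does, but arrange for the Tracee to be stopped at the next entry to or exit from a system call (`PTRACE_SYSCALL`), or after executing a single instruction (`PTRACE_SINGLESTEP`). From the Tracer's perspective, the Tracee appears to have been stopped by receipt of a `SIGTRAP`. One use of `PTRACE_SYSCALL` is therefore to inspect the arguments on entry to a system call and the return value on exit from it. The `data` argument is treated as for `PTRACE_CONT` (`addr` is ignored).
PTRACE_DETACH	Restarts the stopped Tracee as `PTRACE_CONT` does, but first detaches it so that it is no longer traced. Under Linux, a Tracee can be detached in this way regardless of which method was used to start tracing (`addr` is ignored).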

4. Reading System Call Parameters

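With PTRACE_PEEKUSER as the first argument to ptrace, we can examine the USER area, which contains the register contents and other information. The kernel stores the register contents in this area so that the parent (the Tracer) can examine them through ptrace.
The following example shows how: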

#include <sys/types.h>  /* For pid_t */
#include <unistd.h>     /* For fork() */
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/reg.h>   /* For constants ORIG_RAX etc */
#include <sys/user.h>
#include <sys/syscall.h> /* SYS_write */
#include <stdio.h>
int main() {
    pid_t child;
    long orig_rax;
    int status;
    int iscalling = 0;
    struct user_regs_struct regs;

    child = fork();
    if(child == 0)
    {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", "-l", "-h", NULL);
    }
    else
    {
        while(1)
        {
            wait(&status);
            if(WIFEXITED(status))
            {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER,
                              child, 8 * ORIG_RAX,
                              NULL);
            if(orig_rax == SYS_write)
            {
                ptrace(PTRACE_GETREGS, child, NULL, &regs);			//Fetch all general-purpose registers
                if(!iscalling)			//Entering the system call
                {
                    iscalling = 1;			
                    printf("[Enter SYS_write call] with regs.rdi [%ld], regs.rsi[%ld], regs.rdx[%ld], regs.rax[%ld], regs.orig_rax[%ld]\n",
                            (long)regs.rdi, (long)regs.rsi, (long)regs.rdx, (long)regs.rax, (long)regs.orig_rax);
                }
                else			//Leaving the system call
                {
                    printf("[Leave SYS_write call] return regs.rax [%ld], regs.orig_rax [%ld]\n", (long)regs.rax, (long)regs.orig_rax);
                    iscalling = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        }
    }
    return 0;
}

The program output is as follows:
[Enter SYS_write call] with regs.rdi [1], regs.rsi[140309977006080], regs.rdx[10], regs.rax[-38], regs.orig_rax[1]
total 40K
[Leave SYS_write call] return regs.rax [10], regs.orig_rax [1]
[Enter SYS_write call] with regs.rdi [1], regs.rsi[140309977006080], regs.rdx[49], regs.rax[-38], regs.orig_rax[1]
-rw-r--r--. 1 root root 7.5K Oct 7 16:56 main.o
[Leave SYS_write call] return regs.rax [49], regs.orig_rax [1]
[Enter SYS_write call] with regs.rdi [1], regs.rsi[140309977006080], regs.rdx[51], regs.rax[-38], regs.orig_rax[1]
-rw-r--r--. 1 root root 17K Oct 6 16:58 Makefile
[Leave SYS_write call] return regs.rax [51], regs.orig_rax [1]
[Enter SYS_write call] with regs.rdi [1], regs.rsi[140309977006080], regs.rdx[49], regs.rax[-38], regs.orig_rax[1]
-rwxr-xr-x. 1 root root 11K Oct 7 16:56 Ptarce
[Leave SYS_write call] return regs.rax [49], regs.orig_rax [1]
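In the program above we traced the Tracee's write system calls; the ls -l -h command issued four of them. To read registers we can either use the PTRACE_PEEKUSER request introduced earlier to fetch a single register, or use the PTRACE_GETREGS request to read all registers at once into a struct user_regs_struct, which is defined in sys/user.h.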
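
The two ways of reading a register are tied together by the layout of struct user: with glibc on x86_64, struct user begins with a struct user_regs_struct, and the constants in sys/reg.h are indices into it. The sketch below, with helper names of my own choosing, computes the USER-area offsets that the examples write as 8 * ORIG_RAX and 8 * RDI.

```c
#include <stddef.h>     /* offsetof */
#include <sys/reg.h>    /* ORIG_RAX, RDI (x86_64 register indices) */
#include <sys/user.h>   /* struct user, struct user_regs_struct */

/* Hypothetical helpers: byte offset of a register inside the USER area,
 * i.e. the value passed as addr to PTRACE_PEEKUSER. */
long user_offset_orig_rax(void)
{
    return (long)offsetof(struct user, regs.orig_rax);
}

long user_offset_rdi(void)
{
    return (long)offsetof(struct user, regs.rdi);
}
```

Deriving the offset with offsetof avoids hard-coding the slot width of 8, at the cost of a slightly longer expression.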
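In the example above, the status value obtained from wait is used to check whether the Tracee has exited. This is the typical way to tell whether a process was stopped by ptrace or has exited. For details on the WIFEXITED macro, see the wait(2) man page.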
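
Checking only WIFEXITED, as the examples here do, does not cover a Tracee that is killed by a signal. A slightly fuller classification of the status word might look like the sketch below; the enum and function names are illustrative only.

```c
#include <sys/wait.h>

/* Illustrative names, not part of any library */
enum status_kind {
    STATUS_EXITED,     /* normal termination, see WEXITSTATUS() */
    STATUS_SIGNALED,   /* killed by a signal, see WTERMSIG() */
    STATUS_STOPPED,    /* stopped, e.g. by ptrace, see WSTOPSIG() */
    STATUS_OTHER
};

/* Classify a status value filled in by wait() or waitpid() */
enum status_kind classify_status(int status)
{
    if (WIFEXITED(status))
        return STATUS_EXITED;
    if (WIFSIGNALED(status))
        return STATUS_SIGNALED;
    if (WIFSTOPPED(status))
        return STATUS_STOPPED;
    return STATUS_OTHER;
}
```

A tracing loop would typically leave the loop on STATUS_EXITED or STATUS_SIGNALED and keep inspecting the Tracee only on STATUS_STOPPED.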
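One point deserves attention: the output shows that %orig_rax holds the system call number both on entry to and on exit from a system call, whereas %rax holds the return value when the call returns. For details, see Why is orig_eax provided in addition to eax?, which explains why 32-bit machines need %orig_eax even though they already have %eax. These two registers correspond to %rax and %orig_rax on 64-bit machines. Here is a passage quoted from that answer: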

"Ptrace needs to be able to read both all registers state before syscall and the return value of syscall; but the return value is written to %eax. Then original eax, used before syscall will be lost. To save it, there is a orig_eax field."
Source: Why is orig_eax provided in addition to eax?
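
The output in this section shows regs.rax as -38 at every syscall entry. That value is -ENOSYS, which the x86_64 kernel places in rax before the call runs. It can serve as a hint for telling entry stops from exit stops without keeping a toggle variable. The following sketch shows the idea; the function name is hypothetical, and the check is only a heuristic because a system call can also legitimately return -ENOSYS.

```c
#include <errno.h>      /* ENOSYS */
#include <sys/user.h>   /* struct user_regs_struct */

/* Hypothetical heuristic: nonzero if the registers look like a
 * syscall-entry stop, where rax still holds -ENOSYS on x86_64. */
int looks_like_syscall_entry(const struct user_regs_struct *regs)
{
    return (long long)regs->rax == -ENOSYS;
}
```

The toggle used in the examples of this post remains the more dependable approach; this check is best treated as a cross-check.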

5. Some Fun Experiments

Now it is time for some fun. In the next example we reverse the string passed to the write system call.

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
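/* Word size used by PTRACE_PEEKDATA and PTRACE_POKEDATA (8 bytes on x86_64).
   An enum constant can size the arrays below in both C and C++. */
enum { long_size = sizeof(long) };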
void reverse(char *str)
{   int i, j;
    char temp;
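    for(i = 0, j = strlen(str) - 2; //j starts at the second-to-last character, so the final one (the trailing '\n' of the ls output) stays in place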
        i <= j; ++i, --j)
    {
        temp = str[i];
        str[i] = str[j];
        str[j] = temp;
    }
}
void getdata(pid_t child, long addr,
             char *str, int len)
{   char *laddr;
    int i, j;
    union u
     {
        long val;
        char chars[long_size];
    }data;
    i = 0;
    j = len / long_size;
    laddr = str;
    while(i < j)
     {
        data.val = ptrace(PTRACE_PEEKDATA,
                          child, addr + i * 8,
                          NULL);    //Read one word from the Tracee's memory
        memcpy(laddr, data.chars, long_size);
        ++i;
        laddr += long_size;
    }
    j = len % long_size;
    if(j != 0) 
    {
        data.val = ptrace(PTRACE_PEEKDATA,
                          child, addr + i * 8,
                          NULL);
        memcpy(laddr, data.chars, j);
    }
    str[len] = '\0';
}
void putdata(pid_t child, long addr,
             char *str, int len)
{   char *laddr;
    int i, j;
    union u 
    {
        long val;
        char chars[long_size];
    }data;
    i = 0;
    j = len / long_size;
    laddr = str;
    while(i < j)
     {
        memcpy(data.chars, laddr, long_size);
        ptrace(PTRACE_POKEDATA, child,
               addr + i * 8, data.val);
        ++i;
        laddr += long_size;
    }
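    j = len % long_size;
    if(j != 0) 
    {
        //Read the existing word first so that the bytes beyond the string are preserved
        data.val = ptrace(PTRACE_PEEKDATA, child,
                          addr + i * 8, NULL);
        memcpy(data.chars, laddr, j);
        ptrace(PTRACE_POKEDATA, child,
               addr + i * 8, data.val);
    }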
}
int main()
{
    pid_t child;
    printf("******Get Arch Info Begin******\n");
    if ( -1 == system("cat /proc/version "))
    {
        exit(-1);
    }
    printf("******Get Arch Info End******\n");
    printf("sizeof(long) is [%zu]\n", sizeof(long));
    child = fork();

    if(child == 0)
    {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execlp("/bin/ls", "ls", NULL);
        printf("child exit\n");
    }
    else
    {
        long orig_rax;
        long params[3];
        int status;
        char *str, *laddr;
        int toggle = 0;   //Tracks whether we are entering or leaving a system call
        while(1)
        {
            wait(&status);
            if(WIFEXITED(status))  //Has the child exited?
            {
                break;
            }
            orig_rax = ptrace(PTRACE_PEEKUSER,
                              child, 8 * ORIG_RAX,
                              NULL);   //Fetch the system call number
            if(orig_rax == SYS_write)
            {
                if(toggle == 0)
                {
                    toggle = 1;
                    //Fetch the system call arguments: as described earlier, on x86_64 the first three arguments are held in %rdi, %rsi and %rdx, in that order
                    params[0] = ptrace(PTRACE_PEEKUSER,
                                       child, 8 * RDI,
                                       NULL);
                    params[1] = ptrace(PTRACE_PEEKUSER,
                                       child, 8 * RSI,
                                       NULL);
                    params[2] = ptrace(PTRACE_PEEKUSER,
                                       child, 8 * RDX,
                                       NULL);
                    //%X prints only the low 32 bits of the buffer address
                    printf("[Enter SYS_write call] with regs.rdi [%ld], regs.rsi[0X%X], regs.rdx[%ld], regs.orig_rax[%ld]\n",
                           params[0], (unsigned int)params[1],  params[2],orig_rax);
                    str = (char *)calloc((params[2]+1), sizeof(char));
                    getdata(child, params[1], str,
                            params[2]);
                    printf("Original str is: [%s]\n", str);
                    reverse(str);
                    putdata(child, params[1], str,
                            params[2]);
                    free(str);   //The local copy is no longer needed once the reversed text is written back
                }
                else
                {
                    toggle = 0;
                }
            }
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        }
    }
    return 0;
}

The program output is as follows:

******Get Arch Info Begin******
Linux version 2.6.32-431.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013
******Get Arch Info End******
sizeof(long) is [8]
[Enter SYS_write call] with regs.rdi [1], regs.rsi[0X994CE000], regs.rdx[54], regs.orig_rax[1]
Original str is: [Eclipse hello.asm Reverse Reverse.cpp~ Test.cpp ]
ppc.tseT ~ppc.esreveR esreveR msa.olleh espilcE
[Enter SYS_write call] with regs.rdi [1], regs.rsi[0X994CE000], regs.rdx[70], regs.orig_rax[1]
Original str is: [gnome-terminal.desktop hello.asm~ Reverse.cpp Test Test.cpp~ ]
~ppc.tseT tseT ppc.esreveR ~msa.olleh potksed.lanimret-emong

This example uses the concepts discussed so far plus a few new ones. We call ptrace with PTRACE_POKEDATA to change values in the Tracee's data segment, and with PTRACE_PEEKDATA to read values from it.
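
Because PTRACE_POKEDATA always writes a whole word, writing a string whose length is not a multiple of sizeof(long) touches bytes beyond the string. One way to avoid clobbering them is to read the existing word, overlay only the leading bytes, and write the result back. The pure helper below sketches the overlay step; merge_tail is a name I made up for illustration.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy */

/* Hypothetical helper: return old_word with its first n bytes
 * (in memory order) replaced by src[0..n-1]. The remaining bytes
 * keep the values they had in the Tracee. */
long merge_tail(long old_word, const char *src, size_t n)
{
    if (n > sizeof(long))
        n = sizeof(long);          /* never copy past one word */
    memcpy(&old_word, src, n);
    return old_word;
}
```

In a tracer, old_word would come from PTRACE_PEEKDATA at the tail address, and the returned value would be passed to PTRACE_POKEDATA for the same address.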

6. Single-Stepping

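ptrace provides a way to single-step through the Tracee's code. Calling ptrace with PTRACE_SINGLESTEP tells the kernel to stop the Tracee after each instruction and hand control to the Tracer. The example below shows one way to read the instruction being executed.
To make it easier to see what is going on, we use a simple program as the one being debugged: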

#include <stdio.h>
int main(void)
{
	printf("Hello World\n");
	return 0;
}

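The original article used a piece of assembly code here, but that assembly does not run on x86_64, so a similar C program is used instead.
We single-step it with the following code: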

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
int main()
{   pid_t child;
    child = fork();
    if(child == 0)
    {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./hello", "hello", NULL);
    }
    else {
        int status;
        struct user_regs_struct regs;
        int start = 0;   //Set to 1 once a write system call has been seen
        long ins;
        while(1)
        {
            wait(&status);
            if(WIFEXITED(status))
            {
                break;
            }
            ptrace(PTRACE_GETREGS,
                   child, NULL, &regs);
            if(start == 1)
            {
                ins = ptrace(PTRACE_PEEKTEXT,
                             child, regs.rip,
                             NULL);
                printf("RIP: %lx Instruction "
                       "executed: %lx\n",
                       regs.rip, ins);
            }

            //%orig_rax holds the system call number
            if(regs.orig_rax == SYS_write)
            {
                start = 1;
                ptrace(PTRACE_SINGLESTEP, child,
                       NULL, NULL);
            }
            else
            {
                ptrace(PTRACE_SYSCALL, child,
                       NULL, NULL);
            }
        }
    }
    return 0;
}

When run, the program prints the following:
Hello World
RIP: 3ac1adb790 Instruction executed: 3173fffff0013d48
RIP: 3ac1adb796 Instruction executed: e808ec8348c33173
RIP: 3ac1aad028 Instruction executed: e076fffff0003d48
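I do not yet fully understand how the %rip pointer is used in this program. The code is posted here as tested on x86_64, and the underlying details still need further verification.
To understand what these instructions mean, we may need to consult the relevant processor manual. Single-stepping a complex program requires a more careful design and more complex code.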
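
A first step toward reading these values: the word returned by PTRACE_PEEKTEXT holds the bytes starting at %rip, and on little-endian x86_64 the byte at the lowest address is the least significant one. The sketch below formats a word's bytes in memory order so they can be compared with a disassembly. The helper word_to_hex is hypothetical.

```c
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* snprintf */
#include <string.h>   /* memcpy */

/* Hypothetical helper: write the bytes of `word`, in memory order,
 * into `out` as space-separated hex pairs. Output is truncated at a
 * byte boundary if `out` is too small. */
void word_to_hex(long word, char *out, size_t out_size)
{
    unsigned char bytes[sizeof(long)];
    size_t pos = 0;
    size_t i;

    if (out_size == 0)
        return;
    out[0] = '\0';
    memcpy(bytes, &word, sizeof(word));
    /* Each step needs room for a space, two hex digits and the NUL */
    for (i = 0; i < sizeof(bytes) && pos + 3 < out_size; ++i) {
        int n = snprintf(out + pos, out_size - pos, "%s%02x",
                         i ? " " : "", bytes[i]);
        pos += (size_t)n;
    }
}
```

For the first value in the output above, 3173fffff0013d48, this gives 48 3d 01 f0 ff ff 73 31, which begins with the encoding of a cmp of %rax against a 32-bit immediate.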
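That concludes the first part of this article. In the next article we will see how to insert breakpoints and how to inject code into a running program.
All of the programs above were tested on x86_64. My knowledge is limited, so corrections to any mistranslation or faulty explanation in this post are welcome. Thank you.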

References and Links

Linux Hook 笔记: http://www.cnblogs.com/pannengzhi/p/5203467.html
What are the calling conventions for UNIX & Linux system calls on i386 and x86-64: https://stackoverflow.com/questions/2535989/what-are-the-calling-conventions-for-unix-linux-system-calls-on-i386-and-x86-6
Porting ptrace to 64-bit machines: https://stackoverflow.com/questions/22278858/ptrace-linux-user-h-no-such-file-or-directory
execve: https://www.cnblogs.com/jxhd1/p/6706701.html
the wait(2) man page: https://linux.die.net/man/2/wait
Why is orig_eax provided in addition to eax?: https://stackoverflow.com/questions/6468896/why-is-orig-eax-provided-in-addition-to-eax

Document Info

Published: October 14, 2018
More: Litost_Cheng's blog

Notes

Tracer: the tracing process
Tracee: the process being traced and observed
