Learn some technology|Intercept system calls

1. What is a system call

A system call is a function that the kernel provides to applications. Applications normally run in user mode, where many operations are restricted (for example, they cannot perform I/O directly), so some tasks must be carried out by the kernel. The kernel exposes system calls to the application layer so that programs can get done the work they cannot do in user mode.

Put simply, a system call is a function call, but to a function that runs in kernel mode. Unlike an ordinary function call, it cannot be made with the call instruction. It has to go through a special trap instruction instead. On Linux, system calls are typically made with the int 0x80 software interrupt on 32-bit x86, or with the dedicated syscall instruction on x86-64.

Below, we use the int 0x80 (x86) mechanism as an example to explain how system calls work.

2. System call principle

In the Linux kernel, all system calls are stored in the sys_call_table array. Each element of the array is the entry point of one system call. The array is defined as follows:

typedef void (*sys_call_ptr_t)(void);

const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
    ...
};

To make a system call, an application first places the number of the system call (that is, its index in sys_call_table) in the eax register. It then executes the int 0x80 instruction to trigger software interrupt 0x80.

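In the handler for interrupt 0x80, the kernel dispatches to the requested system call with code like the following:

...
call *sys_call_table(,%eax,4)
...

This instruction uses the value in eax as an index into sys_call_table and calls the function stored at that position. On 32-bit x86 each entry is a 4-byte pointer, hence the scale factor 4. The x86-64 kernel uses 8-byte entries (call *sys_call_table(,%rax,8)). The process is shown in the following figure: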

[Figure: dispatching a system call through sys_call_table]
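The dispatch step can be modeled in user space with an ordinary array of function pointers. The sketch below is only an illustration. demo_table, dispatch() and the two handlers are hypothetical names, and real sys_call_table entries have kernel-specific signatures. Like the kernel, it returns -ENOSYS for a number that has no entry.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: two arguments, long result. */
typedef long (*handler_t)(long a, long b);

static long h_add(long a, long b) { return a + b; }
static long h_sub(long a, long b) { return a - b; }

/* Hypothetical "system call numbers", i.e. indexes into the table. */
#define NR_ADD 0
#define NR_SUB 1
#define NR_MAX 1

/* Stand-in for sys_call_table: slot n holds the entry for number n. */
handler_t demo_table[NR_MAX + 1] = {
    [NR_ADD] = h_add,
    [NR_SUB] = h_sub,
};

/* Plays the role of "call *sys_call_table(,%eax,4)": the number selects
 * the slot, and the pointer stored there is called. */
long dispatch(long nr, long a, long b)
{
    if (nr < 0 || nr > NR_MAX || demo_table[nr] == NULL)
        return -ENOSYS;
    return demo_table[nr](a, b);
}
```

For example, dispatch(NR_ADD, 2, 3) calls h_add through slot 0 and returns 5.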

3. System call interception

Once you understand how system calls work, intercepting them is simple. How do we do it?

We just need to replace a system call entry in sys_call_table with the entry point of a function we write ourselves. For example, to intercept the write() system call, we replace the element at index 1 of sys_call_table with our function. This works because on x86-64 the number of write() (__NR_write) is 1, so that is its index in sys_call_table. On 32-bit x86, __NR_write is 4.
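The whole interception idea can be tried safely in user space first: save the original entry, put our own function in the slot, and restore the original later. In this sketch, table stands in for sys_call_table and slot 1 plays the role of write(). saved mirrors the orig_syscall_saved pointer used by the kernel module later in the article. All of these names are hypothetical. The hook forwards each call to the original function, which is what a practical interceptor usually does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: slot 1 stands in for __NR_write on x86-64. */
typedef long (*handler_t)(long arg);

/* Pretend "write" that reports all n bytes as written. */
static long real_write(long n) { return n; }

handler_t table[2] = { NULL, real_write };

static handler_t saved;  /* plays the role of orig_syscall_saved */
int hook_hits;           /* number of calls seen by the hook */

/* Replacement entry: record the call, then forward to the original. */
static long hooked_write(long n)
{
    hook_hits++;
    return saved(n);
}

/* Save the original entry, then put the hook in its slot. */
void install_hook(void) { saved = table[1]; table[1] = hooked_write; }

/* Put the original entry back. */
void remove_hook(void)  { table[1] = saved; }

/* Dispatch through the table, returning -ENOSYS for empty or unknown slots. */
long call_slot(int nr, long arg)
{
    if (nr < 0 || nr > 1 || table[nr] == NULL)
        return -ENOSYS;
    return table[nr](arg);
}
```

After install_hook(), every call through slot 1 passes through hooked_write() but still returns the original result. After remove_hook(), the table is back to its initial state.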

To modify an element of sys_call_table, proceed as follows:

1. Get the address of the sys_call_table array

Modifying sys_call_table generally has to be done from a kernel module. Memory protection prevents a user-mode program from writing kernel data. A kernel module runs in kernel mode, so it is not subject to this restriction.

Before modifying sys_call_table, we first need its virtual memory address. sys_call_table is not an exported symbol, so a kernel module cannot reference it directly.

There are two ways to obtain the virtual memory address of sys_call_table:

Method 1: read it from the System.map file

System.map is the kernel symbol table. It lists the addresses of the kernel's variables and functions, and it is generated automatically each time the kernel is built. To get the virtual address of sys_call_table, use the following command:

sudo cat /boot/System.map-`uname -r` | grep sys_call_table

The result is shown in the figure below:

[Figure: System.map grep output showing the sys_call_table entry]

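As the figure shows, the virtual address of sys_call_table is ffffffff818001c0. This is the link-time address. On kernels with KASLR enabled, the runtime address is shifted by a random offset, so the System.map value can only be used directly when KASLR is disabled.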
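To extract the address programmatically instead of with grep, note that each System.map line has the form "address type name". The following sketch parses one such line. parse_map_line() is a hypothetical helper. The sample line in the usage note is illustrative: it is built from the address shown above, and the symbol type letter is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Parse one System.map line ("<hex address> <type> <name>").
 * Returns 1 and stores the address in *addr when the name equals want;
 * returns 0 otherwise, including for malformed lines. */
int parse_map_line(const char *line, const char *want, unsigned long long *addr)
{
    unsigned long long a;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &a, &type, name) != 3)
        return 0;
    (void)type;  /* symbol type letter, not needed here */
    if (strcmp(name, want) != 0)
        return 0;
    *addr = a;
    return 1;
}
```

For example, parse_map_line("ffffffff818001c0 R sys_call_table", "sys_call_table", &addr) returns 1 and sets addr to 0xffffffff818001c0.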

Method 2: obtain it with the kallsyms_lookup_name() function

Reading the address from System.map is not very elegant, so the kernel also provides a function called kallsyms_lookup_name(). Since Linux 5.7, however, kallsyms_lookup_name() is no longer exported to modules, so this method only works on older kernels.

kallsyms_lookup_name() is simple to use: pass in the symbol name and it returns the symbol's virtual memory address, as shown in the following code:

#include <linux/kallsyms.h>

void func() {
    ...
    unsigned long *sys_call_table;

    // Get the virtual memory address of sys_call_table
    sys_call_table = (unsigned long *)kallsyms_lookup_name("sys_call_table");
    ...
}

2. Set the sys_call_table array to be writable

Once we have the virtual address of sys_call_table, can we simply modify its elements? Not quite.

sys_call_table lives in a write-protected (read-only) memory region, so its contents cannot be modified directly. There are, however, two ways to temporarily disable the write protection:

Method 1: clear bit 16 of the CR0 register

Bit 16 of the CR0 control register is the WP (Write Protect) bit. When WP is cleared, code running at supervisor privilege (ring 0) can write to read-only pages. So we clear bit 16 of CR0 before modifying sys_call_table, which lets us change the array's contents. Once the modification is complete, we set the bit again.

The code is as follows:

/*
 * Clear bit 16 (WP, Write Protect) of CR0 and return the original value
 */
unsigned long clear_and_return_cr0(void)
{
    unsigned long cr0;
    unsigned long ret;

    /* Read the current value of the CR0 register into the cr0 variable */
    asm volatile ("mov %%cr0, %0" : "=r"(cr0));

    ret = cr0;
    cr0 &= ~0x10000UL;  /* clear bit 16 (WP) of the value */

    /* Write the modified value back into CR0 */
    asm volatile ("mov %0, %%cr0" :: "r"(cr0) : "memory");

    return ret;
}

/*
 * Restore the CR0 register to val
 */
void setback_cr0(unsigned long val)
{
    asm volatile ("mov %0, %%cr0" :: "r"(val) : "memory");
}
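You can check the effect of the bit operation without touching the real register. The sketch below models only the arithmetic. CR0_WP is bit 16 (mask 0x10000), and the 0xfffeffff used in the original version of this code is simply its complement in 32 bits. The sample value 0x80050033 in the usage note is an illustrative CR0 value with PE, MP, ET, NE, WP, AM and PG set. It is an assumption, not a value read from a machine.

```c
#include <stdint.h>

/* CR0.WP is bit 16 (counting from bit 0), i.e. mask 0x10000. */
#define CR0_WP (UINT64_C(1) << 16)

/* Pure helpers modeling what clear_and_return_cr0() does to the value;
 * they never read or write the real CR0 register. */
uint64_t cr0_without_wp(uint64_t cr0) { return cr0 & ~CR0_WP; }
int cr0_wp_enabled(uint64_t cr0)      { return (cr0 & CR0_WP) != 0; }
```

cr0_without_wp(0x80050033) yields 0x80040033. Only the WP bit changes, so writing the saved value back afterwards restores CR0 exactly.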

Method 2: make the page table entry for the virtual address writable

x86 CPUs enforce memory protection through the virtual memory page tables (see the article "Talk about memory mapping"). So we only need to set the read/write flag in the page table entry that maps sys_call_table. The code is as follows:

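/*
 * Make the page containing a virtual address writable
 */
int make_rw(unsigned long address)
{
    unsigned int level;

    /* Look up the page table entry that maps the virtual address */
    pte_t *pte = lookup_address(address, &level);

    if (!pte)
        return -EINVAL;

    if (!(pte->pte & _PAGE_RW))  /* set the read/write flag if it is not already set */
        pte->pte |= _PAGE_RW;

    return 0;
}

/*
 * Make the page containing a virtual address read-only
 */
int make_ro(unsigned long address)
{
    unsigned int level;

    pte_t *pte = lookup_address(address, &level);

    if (!pte)
        return -EINVAL;

    pte->pte &= ~_PAGE_RW;  /* clear the read/write flag (read-only) */

    return 0;
}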
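As with CR0, the page table approach comes down to flipping one bit. This user-space sketch assumes the x86 layout, where the present flag is bit 0 and the read/write flag (_PAGE_RW in the kernel) is bit 1. The DEMO_ names are hypothetical stand-ins so the code compiles outside the kernel, and the functions operate on a plain 64-bit value rather than a real page table entry.

```c
#include <stdint.h>

/* Assumed x86 flag positions: present = bit 0, read/write = bit 1. */
#define DEMO_PAGE_PRESENT UINT64_C(0x001)
#define DEMO_PAGE_RW      UINT64_C(0x002)

/* Set or clear only the R/W bit; frame address and other flags are kept. */
uint64_t pte_make_rw(uint64_t pte) { return pte | DEMO_PAGE_RW; }
uint64_t pte_make_ro(uint64_t pte) { return pte & ~DEMO_PAGE_RW; }

/* The correct writability test masks with the flag itself. */
int pte_is_writable(uint64_t pte)  { return (pte & DEMO_PAGE_RW) != 0; }
```

The writability test must be pte & _PAGE_RW. An expression such as pte & ~_PAGE_RW tests all the other bits instead, so it is nonzero for practically any valid entry, writable or not.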

3. Modify the contents of the sys_call_table array

Everything is now in place. With the preparation done, all that remains is to replace the system call entry in sys_call_table with the entry point of our own function.

We can modify sys_call_table in the module's init function and restore the original value in its exit function. The complete code is as follows:

/*
 * File: syscall.c
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/preempt.h>
#include <linux/unistd.h>
#include <linux/kallsyms.h>

/* Virtual address of sys_call_table, looked up when the module loads */
static unsigned long *sys_call_table;

/* Function pointer that saves the original system call */
static int (*orig_syscall_saved)(void);

/*
 * Clear bit 16 (WP) of CR0 and return the original CR0 value
 */
static unsigned long clear_and_return_cr0(void)
{
    unsigned long cr0;
    unsigned long ret;

    /* Read the current value of the CR0 register */
    asm volatile ("mov %%cr0, %0" : "=r"(cr0));

    ret = cr0;
    cr0 &= ~0x10000UL;  /* clear bit 16 (WP) of the value */

    /* Write the modified value back into CR0 */
    asm volatile ("mov %0, %%cr0" :: "r"(cr0) : "memory");

    return ret;
}

/*
 * Restore the CR0 register to val
 */
static void setback_cr0(unsigned long val)
{
    asm volatile ("mov %0, %%cr0" :: "r"(val) : "memory");
}

/*
 * Our own replacement system call
 */
static int sys_hackcall(void)
{
    printk(KERN_INFO "Hack syscall is successful!!!\n");
    return 0;
}

/*
 * Module init (entry) function, called when the module is loaded
 */
static int __init init_hack_module(void)
{
    unsigned long orig_cr0;

    printk(KERN_INFO "Hack syscall is starting...\n");

    /* Get the virtual memory address of sys_call_table */
    sys_call_table = (unsigned long *)kallsyms_lookup_name("sys_call_table");
    if (!sys_call_table)
        return -ENOENT;

    /* Save the original system call */
    orig_syscall_saved = (int (*)(void))(sys_call_table[__NR_perf_event_open]);

    preempt_disable();                   /* CR0 is per-CPU: stay on this CPU */
    orig_cr0 = clear_and_return_cr0();   /* clear bit 16 (WP) of CR0 */
    sys_call_table[__NR_perf_event_open] = (unsigned long)&sys_hackcall;  /* install our function */
    setback_cr0(orig_cr0);               /* restore CR0 */
    preempt_enable();

    return 0;
}

/*
 * Module exit function, called when the module is unloaded
 */
static void __exit exit_hack_module(void)
{
    unsigned long orig_cr0;

    preempt_disable();
    orig_cr0 = clear_and_return_cr0();
    sys_call_table[__NR_perf_event_open] = (unsigned long)orig_syscall_saved;  /* restore the original system call */
    setback_cr0(orig_cr0);
    preempt_enable();

    printk(KERN_INFO "Hack syscall is exited....\n");
}

module_init(init_hack_module);
module_exit(exit_hack_module);
MODULE_LICENSE("GPL");

In the code above, we replace the perf_event_open() system call with our own function.

Note: when testing, it is best to hook a rarely used system call. Hooking a frequently used one may crash the system.

4. Write the Makefile

To make compilation easier, we write a Makefile:

obj-m:=syscall.o
PWD:= $(shell pwd)
KERNELDIR:= /lib/modules/$(shell uname -r)/build
EXTRA_CFLAGS= -O0

all:
    make -C $(KERNELDIR)  M=$(PWD) modules
clean:
    make -C $(KERNELDIR) M=$(PWD) clean

Note the EXTRA_CFLAGS= -O0 option, which turns off gcc optimization to avoid errors when inserting the module.

5. Test program

Now we write a test program to check whether the system call interception works. The code is as follows:

#include <sys/syscall.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* syscall() returns long; print it as such instead of truncating to int */
    long ret = syscall(__NR_perf_event_open, NULL, 0, 0, 0, 0);
    printf("%ld\n", ret);
    return 0;
}
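The replacement function returns 0, while the real perf_event_open() rejects a NULL attribute pointer. So the return value alone tells you whether the hook is active. The sketch below wraps the same call and also captures errno. probe_perf_event_open() is a hypothetical helper. The exact errno from an unhooked kernel, or from a container that filters perf_event_open, may vary, so only the -1 versus 0 distinction is relied on.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Make the same call as the test program and capture errno as well.
 * Unhooked: expected to fail with -1. Hooked: sys_hackcall returns 0. */
long probe_perf_event_open(int *err)
{
    long ret;

    errno = 0;
    ret = syscall(SYS_perf_event_open, NULL, 0, 0, 0, 0);
    *err = errno;
    return ret;
}
```

With the module loaded, probe_perf_event_open() is expected to return 0. Without it, the call is expected to return -1 with a nonzero errno.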

6. Running results

Step 1: Install the interception kernel module

Install the kernel module with the following command:

root# insmod syscall.ko

Then check the system log with the dmesg command. You should see output like this:

...
[  133.564652] Hack syscall is starting...

This shows that our kernel module was loaded successfully.

Step 2: Run the test program

Next, run the test program we just wrote and check the system log again. The output is as follows:

...
[  532.243714] Hack syscall is successful!!!

This shows that the system call was intercepted successfully.


Origin blog.csdn.net/weiqifa0/article/details/130023138