How to Replace the Implementation of a Linux Kernel Function: The Principle Behind Hot Patching

Last night I swore I would stop writing about binary hooks. Then today a reader asked me about the technical details, and in the end I could not resist ...

So as not to break an oath I made so casually, I will not write a binary hook today. Today's code is in C, and binary only comes in at the edges!

Look at the subject: replacing the implementation of a Linux kernel function. Isn't that exactly what kpatch does? Yes, it is what we call a hot patch. When we hot-patch the kernel, nobody writes assembly and nobody hand-assembles binary logic. We usually just modify the C code of the kernel function, generate a patch file, and then ... and then ... go read the kpatch documentation.

In this article I describe the principle behind hot patching. It is not a how-to for using kpatch, let alone a source-level technical analysis of kpatch.

Let's begin our story with a real bugfix hot patch for the 3.10 kernel.

In this example, we modify the implementation of set_next_buddy:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
...
@@ -4537,8 +4540,11 @@ static void set_next_buddy(struct sched_entity *se)
    if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
        return;

-   for_each_sched_entity(se)
+   for_each_sched_entity(se) {
+       if (!se->on_rq)
+           return;
        cfs_rq_of(se)->next = se;
+   }
 }

It seems that to fix a known bug we only need to add a few lines of code to set_next_buddy. Clearly, this is easy.

With those few lines added we have a new set_next_buddy function. To get the new function running, we now face three problems:

  • How do we compile the new set_next_buddy function into binary code?
  • How do we inject the binary code of the new set_next_buddy into the running kernel?
  • How do we replace the old set_next_buddy with the new set_next_buddy binary?

Let's take them one at a time.

The first problem is easy to solve.

We modified one C file, kernel/sched/fair.c. To resolve the compile-time dependencies, it is enough to apply the patch to the source tree of the running kernel and compile it there. Then, with objdump-like tooling, we can pull the compiled set_next_buddy out of the resulting object file and wrap it into a .ko, which is not hard. The result is a kernel module with a name like kpatch-y8u59dkv.ko.

Next, the second problem: how do we inject the set_next_buddy binary from the .ko built in the first step into the kernel?

This is not hard either; it is what the kpatch module loading mechanism does. Once the hot patch is loaded, the kernel contains two set_next_buddy functions:

crash> dis set_next_buddy
dis: set_next_buddy: duplicate text symbols found:
// The old set_next_buddy
ffffffff810b9450 (t) set_next_buddy /usr/src/debug/kernel-3.10.0/linux-3.10.0.x86_64/kernel/sched/fair.c: 4536
// The new set_next_buddy
ffffffffa0382410 (t) set_next_buddy [kpatch_y8u59dkv]

The third problem is a little trickier. How does the new set_next_buddy binary replace the old one?

Obviously we cannot simply overwrite it. Kernel functions are laid out compactly and contiguously, and the space each function occupies is fixed once the kernel is built. If the new function is larger than the old one, overwriting in place would run past its boundary and clobber the functions that follow.

The binary hook techniques described in my earlier articles would work, for example the method in this one:
https://blog.csdn.net/dog250/article/details/105206753
Do a binary diff, then precisely poke only the places that need to change. Undoubtedly a neat trick! However, this method is not elegant. It is clever but impractical, and its biggest problem is that it goes against what managers want.

The standard method is to hook through ftrace: rewrite the 5-byte ftrace stub at the start of the old function into a "jmp/call to the new function" instruction, and have the stub function skip the old function's stack frame. This bypasses the old function completely.
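To make the 5-byte rewrite concrete, here is a minimal user-space sketch of how such a "call/jmp rel32" instruction could be encoded. The helper name encode_rel32 and its signature are my own illustration, not something taken from kpatch or ftrace. The only facts it relies on are that 0xe8 is call rel32, 0xe9 is jmp rel32, and the displacement is measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_CALL  0xe8 /* call rel32 */
#define OP_JMP   0xe9 /* jmp rel32 */
#define INSN_LEN 5    /* 1 opcode byte + 4 displacement bytes */

/*
 * Hypothetical helper: encode "opcode rel32" for an instruction located at
 * address `from` that transfers control to address `to`.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_rel32(uint8_t out[INSN_LEN], uint8_t opcode,
                        uint64_t from, uint64_t to)
{
    /* The displacement is relative to the end of the instruction. */
    int64_t disp = (int64_t)(to - (from + INSN_LEN));
    uint32_t rel;

    assert(out != NULL);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)disp;
    out[0] = opcode;
    /* x86 stores the displacement in little-endian byte order. */
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}
```

The range check matters in practice: a rel32 displacement only reaches about 2 GB in either direction, so the patched function and its target have to live close enough together in the address space.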

Let's look at the binary code of the two set_next_buddy functions above:

// The old set_next_buddy:
crash> dis ffffffff810b9450 4
// Note that the old function's ftrace stub has already been replaced
0xffffffff810b9450 <set_next_buddy>:    callq  0xffffffff81646df0 <ftrace_regs_caller>
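// How are the instructions below bypassed? How are they skipped after ftrace_regs_caller returns? That takes a stack-balancing trick!
// A worked example later shows how to balance the stack and bypass the old function.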
0xffffffff810b9455 <set_next_buddy+5>:  push   %rbp
0xffffffff810b9456 <set_next_buddy+6>:  cmpq   $0x0,0x150(%rdi)
0xffffffff810b945e <set_next_buddy+14>: mov    %rsp,%rbp
// The new set_next_buddy:
crash> dis ffffffffa0382410 4
// The new function is the one ftrace_regs_caller ultimately calls
0xffffffffa0382410 <set_next_buddy>:    nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa0382415 <set_next_buddy+5>:  push   %rbp
0xffffffffa0382416 <set_next_buddy+6>:  cmpq   $0x0,0x150(%rdi)
0xffffffffa038241e <set_next_buddy+14>: mov    %rsp,%rbp

That is the principle of a hot patch.

If the article stopped here it would be nothing but armchair talk, and ending that way would be embarrassing and a pity, so next I illustrate the idea with a practical example. The example is very simple: a few steps are enough to get it running and see the effect.

I rather dislike source code analysis, so I will not annotate the source of ftrace_regs_caller. Instead I implement a similar mechanism in my own, much simpler way, which helps us workers understand the essence of the matter.

My example does not patch a kernel function. What it patches is a function in a simple kernel module I wrote. The module code is as follows:

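#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/uaccess.h>

// sample_read below is the function I am going to patch
static ssize_t sample_read(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
{
	int n = 0;
	char kb[16];

	if (*ppos != 0) {
		return 0;
	}

	n = sprintf(kb, "%d\n", 1234);
	// ubuf is a user-space pointer, so use copy_to_user instead of memcpy
	if (count < n)
		return -EINVAL;
	if (copy_to_user(ubuf, kb, n))
		return -EFAULT;
	*ppos += n;
	return n;
}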

static struct file_operations sample_ops = {
	.owner = THIS_MODULE,
	.read = sample_read,
};

static struct proc_dir_entry *ent;
static int __init sample_init(void)
{
	ent = proc_create("test", 0660, NULL, &sample_ops);
	if (!ent)
		return -ENOMEM;

	return 0;
}

static void __exit sample_exit(void)
{
	proc_remove(ent);
}

module_init(sample_init);
module_exit(sample_exit);
MODULE_LICENSE("GPL");

Load it, then read /proc/test:

[root@localhost test]# insmod sample.ko
[root@localhost test]# cat /proc/test
1234

OK, everything works as expected. At this point, let's look at the first 5 bytes of sample_read:

crash> dis sample_read 1
0xffffffffa038c000 <sample_read>:       nopl   0x0(%rax,%rax,1) [FTRACE NOP]
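Before overwriting those five bytes, it may be worth checking that they really are the ftrace NOP. The sketch below is a user-space illustration of such a check. The helper name starts_with_ftrace_nop is hypothetical, and I am assuming the NOP is encoded as 0f 1f 44 00 00, which is the byte sequence that disassembles to nopl 0x0(%rax,%rax,1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FTRACE_NOP_LEN 5

/* Assumed encoding of "nopl 0x0(%rax,%rax,1)", the 5-byte NOP shown above. */
static const uint8_t ftrace_nop5[FTRACE_NOP_LEN] = {
    0x0f, 0x1f, 0x44, 0x00, 0x00
};

/*
 * Hypothetical helper: returns 1 if the first five bytes of `text` are the
 * 5-byte NOP, 0 otherwise (including when fewer than five bytes are given).
 */
static int starts_with_ftrace_nop(const uint8_t *text, size_t len)
{
    assert(text != NULL);
    if (len < FTRACE_NOP_LEN)
        return 0;
    return memcmp(text, ftrace_nop5, FTRACE_NOP_LEN) == 0;
}
```

A patcher could refuse to touch a function whose entry fails this check, for example because something else has already rewritten the stub.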

With sample.ko already loaded, we now patch it. My goal is to fix the sample_read function so that it returns 4321 instead of 1234.

Here is the complete code; the key points are in the comments:

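// hijack.c
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/cpu.h>
#include <linux/uaccess.h>

char *stub;
char *addr = NULL;

// Either JMP mode or CALL mode can be used
//#define JMP	1

// A sample_read function with the same name as the one in the sample module
static ssize_t sample_read(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
{
	int n = 0;
	char kb[16];

	if (*ppos != 0) {
		return 0;
	}
	// Here the output 1234 is fixed to 4321
	n = sprintf(kb, "%d\n", 4321);
	// ubuf is a user-space pointer, so use copy_to_user instead of memcpy
	if (count < n)
		return -EINVAL;
	if (copy_to_user(ubuf, kb, n))
		return -EFAULT;
	*ppos += n;
	return n;
}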

// hijack_stub plays a role similar to ftrace_regs_caller in ftrace-based kpatch
static ssize_t hijack_stub(struct file *file, char __user *ubuf, size_t count, loff_t *ppos)
{
	// Pad with nops. Together with the function prologue the C compiler generates, the function should be big enough to hold the stub.
	asm ("nop; nop; nop; nop; nop; nop; nop; nop;");
	return 0;
}

#define FTRACE_SIZE   	5
#define POKE_OFFSET		0
#define POKE_LENGTH		5
#define SKIP_LENGTH		8

static unsigned long *(*_mod_find_symname)(struct module *mod, const char *name);
static void *(*_text_poke_smp)(void *addr, const void *opcode, size_t len);
static struct mutex *_text_mutex;
unsigned char saved_inst[POKE_LENGTH];
struct module *mod;

static int __init hotfix_init(void)
{
	unsigned char jmp_call[POKE_LENGTH];
	unsigned char e8_skip_stack[SKIP_LENGTH];
	s32 offset, i = 5;

	// Note: find_module() expects module_mutex to be held; that is omitted here for brevity
	mod = find_module("sample");
	if (!mod) {
		printk("The sample module is not loaded; what are you trying to patch?\n");
		return -1;
	}
	_mod_find_symname = (void *)kallsyms_lookup_name("mod_find_symname");
	if (!_mod_find_symname) {
		printk("It is over before it even began.\n");
		return -1;
	}
	addr = (void *)_mod_find_symname(mod, "sample_read");
	if (!addr) {
		printk("Nothing is ready yet! Please load the sample module first.\n");
		return -1;
	}
	_text_poke_smp = (void *)kallsyms_lookup_name("text_poke_smp");
	_text_mutex = (void *)kallsyms_lookup_name("text_mutex");
	if (!_text_poke_smp || !_text_mutex) {
		printk("It is over before it even began.\n");
		return -1;
	}

	stub = (void *)hijack_stub;

	offset = (s32)((long)sample_read - (long)stub - FTRACE_SIZE);

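	// The code below fills in the final contents of the stub function; it plays a role similar to ftrace_regs_caller!
	memset(e8_skip_stack, 0x90, SKIP_LENGTH); // pad with nop so JMP mode does not write uninitialized bytes
	e8_skip_stack[0] = 0xe8;
	(*(s32 *)(&e8_skip_stack[1])) = offset;
#ifndef JMP	// In CALL mode the stack must be balanced by hand to skip the original function's stack frame
	e8_skip_stack[i++] = 0x41; // pop %r11
	e8_skip_stack[i++] = 0x5b; // r11 is a scratch register, saved by the caller if needed
#endif
	e8_skip_stack[i++] = 0xc3;
	// text_poke_smp expects get_online_cpus() and text_mutex to be held
	get_online_cpus();
	mutex_lock(_text_mutex);
	_text_poke_smp(&stub[0], e8_skip_stack, SKIP_LENGTH);
	mutex_unlock(_text_mutex);
	put_online_cpus();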

	offset = (s32)((long)stub - (long)addr - FTRACE_SIZE);

	memcpy(&saved_inst[0], addr, POKE_LENGTH);
#ifndef JMP
	jmp_call[0] = 0xe8;
#else
	jmp_call[0] = 0xe9;
#endif
	(*(s32 *)(&jmp_call[1])) = offset;
	get_online_cpus();
	mutex_lock(_text_mutex);
	_text_poke_smp(&addr[POKE_OFFSET], jmp_call, POKE_LENGTH);
	mutex_unlock(_text_mutex);
	put_online_cpus();

	return 0;
}

static void __exit hotfix_exit(void)
{
	mod = find_module("sample");
	if (!mod) {
		printk("It is all over already!\n");
		return;
	}
	addr = (void *)_mod_find_symname(mod, "sample_read");
	if (!addr) {
		printk("It is all over already!\n");
		return;
	}
	get_online_cpus();
	mutex_lock(_text_mutex);
	_text_poke_smp(&addr[POKE_OFFSET], &saved_inst[0], POKE_LENGTH);
	mutex_unlock(_text_mutex);
	put_online_cpus();
}

module_init(hotfix_init);
module_exit(hotfix_exit);
MODULE_LICENSE("GPL");
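The stub bytes that hotfix_init writes into hijack_stub can be reproduced in user space, which makes the layout easier to inspect. The following is only a sketch: build_stub is a hypothetical helper of mine, and padding the unused tail with nop (0x90) is my own choice. The opcodes themselves (e8 for call rel32, 41 5b for pop %r11, c3 for ret) are the ones used in the module above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STUB_LEN 8

/*
 * Hypothetical user-space mirror of the stub fill in hotfix_init.
 * stub_addr: address of hijack_stub
 * new_fn:    address of the new sample_read
 * call_mode: nonzero if the old function entry is patched with call,
 *            zero if it is patched with jmp
 */
static void build_stub(uint8_t out[STUB_LEN], uint64_t stub_addr,
                       uint64_t new_fn, int call_mode)
{
    /* Displacement is relative to the end of the 5-byte call. */
    uint32_t rel = (uint32_t)(new_fn - (stub_addr + 5));
    int i = 0;

    assert(out != NULL);
    memset(out, 0x90, STUB_LEN);      /* pad everything with nop first */

    out[i++] = 0xe8;                  /* call rel32 -> new function */
    out[i++] = (uint8_t)(rel & 0xff);
    out[i++] = (uint8_t)((rel >> 8) & 0xff);
    out[i++] = (uint8_t)((rel >> 16) & 0xff);
    out[i++] = (uint8_t)((rel >> 24) & 0xff);
    if (call_mode) {
        out[i++] = 0x41;              /* pop %r11: drop the return address */
        out[i++] = 0x5b;              /* pushed by the patched call        */
    }
    out[i++] = 0xc3;                  /* ret */
}
```

With a stub at 0xffffffffa03a2000 and a new function at 0xffffffffa03a2020, CALL mode gives e8 1b 00 00 00 41 5b c3, and JMP mode gives e8 1b 00 00 00 c3 followed by two nop bytes.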

OK, now load it and read /proc/test again:

[root@localhost test]# insmod ./hijack.ko
[root@localhost test]# cat /proc/test
4321

As you can see, the patch succeeded. What exactly happened? Let's look at the disassembly:

crash> dis sample_read
dis: sample_read: duplicate text symbols found:
ffffffffa039d000 (t) sample_read [sample]
ffffffffa03a2020 (t) sample_read [hijack]
crash>

Ah, there are now two function symbols named sample_read: the one in the sample module is the old function, and the one in the hijack module is the new, fixed function. Let's look at each:

// First the old sample_read: its ftrace stub has been changed to call hijack_stub
crash> dis ffffffffa039d000 1
0xffffffffa039d000 <sample_read>:       callq  0xffffffffa03a2000 <hijack_stub>
// Then the new sample_read: it is the function that ultimately gets executed
crash> dis ffffffffa03a2020 1
0xffffffffa03a2020 <sample_read>:       nopl   0x0(%rax,%rax,1) [FTRACE NOP]
crash>

When the new sample_read finishes, it returns into hijack_stub. In CALL mode, the call planted in the old sample_read has pushed a return address pointing back into the old function, so the stub must skip the old sample_read's stack frame: a single pop %r11 discards that address, and a plain ret then returns to the original caller. In JMP mode a plain ret is enough and there is nothing to skip, because the JMP instruction pushes nothing.
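The stack balancing can be checked with a toy model that tracks only return addresses. This is a simulation sketch, not kernel code: the three addresses are made-up placeholders, and simulate is a hypothetical helper that returns the address where control lands after the stub's final ret.

```c
#include <assert.h>
#include <stdint.h>

#define CALLER_RET 0x1000u /* hypothetical: instruction after the caller's call */
#define OLD_PLUS5  0x2005u /* hypothetical: old sample_read + 5 */
#define STUB_PLUS5 0x3005u /* hypothetical: hijack_stub + 5 */

struct stack {
    uint64_t slot[8];
    int top;
};

static void push(struct stack *s, uint64_t v)
{
    assert(s->top < 8);
    s->slot[s->top++] = v;
}

static uint64_t pop(struct stack *s)
{
    assert(s->top > 0);
    return s->slot[--s->top];
}

/*
 * call_mode:     nonzero if the old entry is patched with call, zero for jmp
 * stub_pops_r11: nonzero if the stub executes pop %r11 before its ret
 * Returns the address that the stub's final ret jumps to.
 */
static uint64_t simulate(int call_mode, int stub_pops_r11)
{
    struct stack s = { .top = 0 };
    uint64_t pc;

    push(&s, CALLER_RET);      /* caller: call sample_read (old) */
    if (call_mode)
        push(&s, OLD_PLUS5);   /* patched entry: call hijack_stub */
                               /* (jmp hijack_stub pushes nothing) */
    push(&s, STUB_PLUS5);      /* stub: call sample_read (new) */
    pc = pop(&s);              /* new sample_read: ret, back in the stub */
    assert(pc == STUB_PLUS5);
    if (stub_pops_r11)
        (void)pop(&s);         /* stub: pop %r11 */
    return pop(&s);            /* stub: ret */
}
```

In this model, simulate(1, 1) and simulate(0, 0) both land on CALLER_RET, while simulate(1, 0) lands on OLD_PLUS5. The last case corresponds to forgetting the pop in CALL mode, in which case the body of the old function would run after the new one.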

Well, that is the story I wanted to tell. To put it plainly, what this article describes is a handicraft that is still alive. I just wanted to show the principle behind the relatively complex hot patch mechanism in the simplest way, one that everyone can understand. I think workers need a deep knowledge of the underlying principles.

Managers love to eat chili too, though not entirely; but obviously, managers cannot sprinkle water.


Wenzhou leather shoes get wet; rain gets in, but they won't get fat!


Origin blog.csdn.net/dog250/article/details/105254739