My first experience using Linux kernel tracepoints

I am not ashamed, not at all.

I have worked on the Linux kernel for many years, yet I used a tracepoint for the first time only last week.

This is not surprising. There are many things I don't know; for example, as I have always said, I can't program and I can't use git.

Back to the point: if I haven't used tracepoints all these years, what have I used instead?

When I debug the kernel or a kernel module, my usual method is to add a printk to the code (in a user-space program, a printf). Here is why:

printk/printf needs no extra packages installed.
printk/printf is always at hand, with no need to write over-engineered code designed for reuse.
printk/printf is very friendly to people who don't know how to program.
…

Therefore, I love printk. I don't like the gdb/crash tools, kprobe/systemtap, or bpftrace. Of course, I know these tools very well, but I find them too complicated, with too much syntax to remember. In that respect I am like a homeless person: homeless people don't like to carry any baggage.

After all this talk, what does any of it have to do with tracepoints?

Quite a lot: I found that adding a tracepoint where I would otherwise add a printk takes about the same amount of work, and the result is better. A tracepoint is more like a hook: you decide from the outside what to do with it, instead of being stuck with printk printing a message.

If all you want from a tracepoint is the effect of printk, you can read the output in /sys/kernel/debug/tracing/trace. If you want to do something else, you can attach eBPF to it.
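To make the "hook" idea concrete, here is a userspace C sketch of the shape of it: the call site only announces the event, and whoever registered a probe decides what to do with the arguments. All names are hypothetical, and the sketch deliberately ignores how the kernel really does it (static keys so that a disabled tracepoint costs almost nothing, RCU-protected probe arrays).

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_PROBES 4

/* Probe signature mirroring TP_PROTO(unsigned short dest, unsigned short source). */
typedef void (*myprot_port_probe_t)(void *data, unsigned short dest,
                                    unsigned short source);

struct probe_slot {
    myprot_port_probe_t fn;
    void *data;
};

static struct probe_slot probes[MAX_PROBES];
static size_t nr_probes;

/* Attach a probe; returns 0 on success, -1 if the table is full. */
static int register_myprot_port(myprot_port_probe_t fn, void *data)
{
    if (nr_probes == MAX_PROBES)
        return -1;
    probes[nr_probes].fn = fn;
    probes[nr_probes].data = data;
    nr_probes++;
    return 0;
}

/* The call site: does nothing when no probe is attached, otherwise calls each one. */
static void trace_myprot_port(unsigned short dest, unsigned short source)
{
    for (size_t i = 0; i < nr_probes; i++)
        probes[i].fn(probes[i].data, dest, source);
}

/* One consumer: behave like printk. */
static void print_probe(void *data, unsigned short dest, unsigned short source)
{
    (void)data;
    printf("myprot_port: dest:%u, source:%u\n", (unsigned)dest, (unsigned)source);
}

/* Another consumer: just count events, the kind of thing an eBPF program might do. */
static void count_probe(void *data, unsigned short dest, unsigned short source)
{
    (void)dest;
    (void)source;
    (*(unsigned long *)data)++;
}
```

The call site never changes; swapping print_probe for count_probe (or attaching both) changes what happens, which is the property that makes a tracepoint more flexible than a hard-coded printk.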

Like the "Taishigong says" commentary that closes each chapter of the Records of the Grand Historian, the metaphysical part of this article comes at the end. First, something practical.

Adding a tracepoint to the core kernel code is too simple to need explaining; it just requires recompiling the kernel. So here is a how-to for adding a tracepoint to a module.

First, here is the kernel module to which I want to add a tracepoint:

#include <net/protocol.h>
#include <linux/ip.h>
#include <linux/udp.h>

#define IPPROTO_MYPROTO  123
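int myproto_rcv(struct sk_buff *skb)
{
    struct udphdr *uh;
    struct iphdr *iph;

    iph = ip_hdr(skb);
    uh = udp_hdr(skb);

    /* The ports are stored in network byte order; convert them for display. */
    printk(KERN_INFO "proto 123: dest:%u, source:%u\n",
           ntohs(uh->dest), ntohs(uh->source));

    kfree_skb(skb);
    return 0;
}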

static const struct net_protocol myproto_protocol = {
    .handler = myproto_rcv,
    .no_policy = 1,
    .netns_ok = 1,
};

int init_module(void)
{
    int ret = 0;
    ret = inet_add_protocol(&myproto_protocol, IPPROTO_MYPROTO);
    if (ret) {
        printk("failed\n");
        return ret;
    }
    return 0;
}
void cleanup_module(void)
{
    inet_del_protocol(&myproto_protocol, IPPROTO_MYPROTO);
}

int init_module(void);
void cleanup_module(void);
MODULE_LICENSE("GPL");

A realistic example: this code registers a layer-4 protocol on a par with TCP and UDP. It is really just UDP, with the protocol number changed to 123. We expect printk to print the ports whenever a packet of this protocol is received.

I want to use a tracepoint instead of printk. How? Here is the new code:

/*
 * Notes:
 * 1. CREATE_TRACE_POINTS must be defined (in exactly one .c file of the module).
 * 2. CREATE_TRACE_POINTS must be defined before the trace header is included.
 */

#define CREATE_TRACE_POINTS
#include "test_tp.h"

#include <net/protocol.h>
#include <linux/ip.h>
#include <linux/udp.h>

#define IPPROTO_MYPROTO  123
int myproto_rcv(struct sk_buff *skb)
{
    struct udphdr *uh;
    struct iphdr *iph;

    iph = ip_hdr(skb);
    uh = udp_hdr(skb);

    // Add a tracepoint here (the ports are passed in network byte order)
    trace_myprot_port(uh->dest, uh->source);

    kfree_skb(skb);
    return 0;
}

static const struct net_protocol myproto_protocol = {
    .handler = myproto_rcv,
    .no_policy = 1,
    .netns_ok = 1,
};

int init_module(void)
{
    int ret = 0;
    ret = inet_add_protocol(&myproto_protocol, IPPROTO_MYPROTO);
    if (ret) {
        printk("failed\n");
        return ret;
    }
    return 0;
}
void cleanup_module(void)
{
    inet_del_protocol(&myproto_protocol, IPPROTO_MYPROTO);
}

int init_module(void);
void cleanup_module(void);
MODULE_LICENSE("GPL");

Apart from replacing printk with a trace_myprot_port() call, the only changes are one #include and one macro definition. Very simple, isn't it?

All metadata definitions related to tracepoint are in the header file:

// test_tp.h
#undef TRACE_SYSTEM
#define TRACE_SYSTEM myprot

#if !defined(_TEST_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TEST_TRACE_H
#include <linux/tracepoint.h>

// The tracepoint is defined here. These are the available arguments, and the
// debugfs output follows the TP_printk format below.
// An eBPF program reading the event must use the same fields; see the example below.
TRACE_EVENT(myprot_port,
    TP_PROTO(unsigned short dest, unsigned short source),
    TP_ARGS(dest, source),
    TP_STRUCT__entry(
        __field(unsigned short, dest)
        __field(unsigned short, source)
    ),
    TP_fast_assign(
        __entry->dest = dest;
        __entry->source = source;
    ),
    TP_printk("dest:%d, source:%d", __entry->dest, __entry->source)
);

#endif /* _TEST_TRACE_H */

/*
 * The following block is required.
 * A module cannot modify the kernel's header directories, so it keeps its
 * trace header in its own directory, and TRACE_INCLUDE_PATH must be redefined
 * rather than relying on the lookup in the kernel's own header directory.
 *
 * To make this redefinition work, the Makefile must contain:
 * CFLAGS_test.o = -I$(src)
 */

#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
// TRACE_INCLUDE_FILE is the name of this header file (without .h)
#define TRACE_INCLUDE_FILE test_tp
#include <trace/define_trace.h>
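The bpftrace script further down reads args->dest and args->source; those names come from TP_STRUCT__entry. To picture what actually sits in the ring buffer, here is a userspace C mirror of the record. It assumes the common header used by mainstream kernels (common_type, common_flags, common_preempt_count, common_pid), and the struct names are hypothetical; treat the offsets as something to verify against /sys/kernel/debug/tracing/events/myprot/myprot_port/format on your own kernel.

```c
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: the common header every event record starts with
 * (struct trace_entry in mainstream kernels). Verify against the format file. */
struct myprot_trace_entry {
    unsigned short common_type;          /* event id */
    unsigned char  common_flags;         /* irqs-off, need-resched, ... */
    unsigned char  common_preempt_count;
    int            common_pid;
};

/* Fields declared in TP_STRUCT__entry() follow the common header. */
struct myprot_port_record {
    struct myprot_trace_entry ent;
    unsigned short dest;      /* __field(unsigned short, dest)   */
    unsigned short source;    /* __field(unsigned short, source) */
};

/* Offsets the format file is expected to report for this layout. */
_Static_assert(offsetof(struct myprot_port_record, dest) == 8, "dest offset");
_Static_assert(offsetof(struct myprot_port_record, source) == 10, "source offset");

/* Copy the two payload fields out of a raw record taken from the ring buffer. */
static void myprot_port_decode(const void *raw, unsigned short *dest,
                               unsigned short *source)
{
    const unsigned char *p = raw;

    memcpy(dest, p + offsetof(struct myprot_port_record, dest), sizeof(*dest));
    memcpy(source, p + offsetof(struct myprot_port_record, source), sizeof(*source));
}
```

If the format file on your kernel reports different offsets, the _Static_assert lines are the place to update.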

Next is the Makefile; only the CFLAGS line needs attention:

obj-m += test.o
# In CFLAGS_xx.o, xx is the object file name; for this single-file module it is the module name
CFLAGS_test.o = -DDEBUG -I$(src)

OK, compile the module and load it.

First, enable the tracepoint:

echo 1 >/sys/kernel/debug/tracing/events/myprot/enable
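The echo above flips the switch for every event in the myprot system. If you would rather do it from a program, a minimal sketch follows; the helper names are hypothetical. Passing an event name targets only events/myprot/myprot_port/enable instead of the whole system.

```c
#include <stddef.h>
#include <stdio.h>

/* Where the post's tracefs lives; newer systems also expose /sys/kernel/tracing. */
#define TRACING_ROOT "/sys/kernel/debug/tracing"

/*
 * Build the path of an "enable" file. event == NULL gives the per-system
 * switch (events/<system>/enable, all events of the system); a non-NULL
 * event targets a single event (events/<system>/<event>/enable).
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int event_enable_path(char *buf, size_t len, const char *root,
                             const char *system, const char *event)
{
    int n;

    if (event)
        n = snprintf(buf, len, "%s/events/%s/%s/enable", root, system, event);
    else
        n = snprintf(buf, len, "%s/events/%s/enable", root, system);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Programmatic `echo 1 > path` (on = 0 disables). Real tracefs needs root. */
static int write_enable(const char *path, int on)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(on ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, event_enable_path(p, sizeof p, TRACING_ROOT, "myprot", NULL) followed by write_enable(p, 1) does the same as the echo command, provided it runs as root.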

Now send a packet with IP protocol number 123 through a raw socket, and the event shows up in the trace buffer under debugfs:

cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:4
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
     ksoftirqd/1-16      [001] ..s. 174074.362291: myprot_port: dest:36895, source:53764
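The post does not show the program that sends the test packet. Here is a minimal userspace sketch of one, assuming an IPv4 raw socket opened with protocol 123 (so the kernel builds the IP header) and an 8-byte UDP-style header in front of the payload. build_myproto_payload() and send_myproto() are hypothetical names, and sending requires CAP_NET_RAW (usually root).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define IPPROTO_MYPROTO 123 /* same number the module registers */

/* 8-byte UDP-style header; all fields in network byte order. */
struct fake_udphdr {
    uint16_t source;
    uint16_t dest;
    uint16_t len;
    uint16_t check;
};

/* Lay out header + payload in buf; returns total length or -1 if it won't fit.
 * The checksum is left at 0 because the module never checks it. */
static int build_myproto_payload(uint8_t *buf, size_t cap, uint16_t sport,
                                 uint16_t dport, const void *data, size_t dlen)
{
    struct fake_udphdr uh;
    size_t total = sizeof(uh) + dlen;

    if (total > cap || total > UINT16_MAX)
        return -1;
    uh.source = htons(sport);
    uh.dest = htons(dport);
    uh.len = htons((uint16_t)total);
    uh.check = 0;
    memcpy(buf, &uh, sizeof(uh));
    memcpy(buf + sizeof(uh), data, dlen);
    return (int)total;
}

/* Send over a raw socket bound to protocol 123; the kernel adds the IP header. */
static int send_myproto(const char *dst_ip, uint16_t sport, uint16_t dport,
                        const void *data, size_t dlen)
{
    uint8_t buf[1500];
    struct sockaddr_in dst;
    int len, fd, ret;

    len = build_myproto_payload(buf, sizeof(buf), sport, dport, data, dlen);
    if (len < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_RAW, IPPROTO_MYPROTO);
    if (fd < 0)
        return -1;
    ret = sendto(fd, buf, (size_t)len, 0, (struct sockaddr *)&dst,
                 sizeof(dst)) == len ? 0 : -1;
    close(fd);
    return ret;
}
```

For example, send_myproto("127.0.0.1", 1234, 8080, "hello", 5) sends from port 1234 to port 8080. Read in network byte order on a little-endian CPU, those ports appear as source:53764 and dest:36895, which matches the trace output above.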

Of course, this only emulates printk, gathering the printing in one place, which is no big deal. What makes tracepoints more meaningful is that eBPF programs can be attached to them. Let's try:

bpftrace -l | grep 'tracepoint.*myprot'
tracepoint:myprot:myprot_port

We can simply debug with bpftrace:

#!/usr/bin/bpftrace

tracepoint:myprot:myprot_port
{
	$dest = args->dest;
	$source = args->source;
	// The ports are in network byte order; swap the two bytes by hand
	printf("bpftrace:dest:%d  source:%d\n",
		($dest & 0xff) << 8 | ($dest & 0xff00) >> 8,
		($source & 0xff) << 8 | ($source & 0xff00) >> 8);
}

Run it, then send the packet:

./test_bpf_tp.tp
Attaching 1 probe...
bpftrace:dest:8080  source:1234

My only complaint: why is there no builtin for something as commonly used as ntohs?
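The bpftrace expression is a hand-written 16-bit byte swap, which is exactly what ntohs() does on a little-endian host. The C sketch below (function names hypothetical) checks that against the raw values from the debugfs trace above. One alternative, not what the post does, is to call ntohs() in the module before trace_myprot_port(), so that every consumer sees host-order ports.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Same arithmetic as the bpftrace script: exchange the two bytes of a u16. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)(((v & 0x00ff) << 8) | ((v & 0xff00) >> 8));
}

/* Nonzero when the host stores the low byte first (x86, most arm64). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/*
 * What the script really wants is ntohs(): network order -> host order.
 * The hand-written swap equals ntohs() only on a little-endian host; on a
 * big-endian host ntohs() is a no-op and swapping would corrupt the port.
 */
static uint16_t decode_port(uint16_t raw)
{
    return ntohs(raw);
}
```

On x86, swap16(36895) and decode_port(36895) both give 8080, and 53764 gives 1234, the ports the bpftrace script prints.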

How do tracepoints compare with kprobe/kretprobe, and what are their pros and cons? Why don't I like kprobe? I don't even like live patching, and for the same reason.

The only disadvantage of a tracepoint is that it is static: you have to write code to insert it at a specific place in the source, and it takes effect only after recompiling. A kprobe, by contrast, is dynamic: it can hook a specific location at runtime. Unfortunately, that is kprobe's only advantage.

I have never hidden my dislike of kprobe from anyone, and the reason is the same as for live patching: to dynamically insert even a single printk, you pay far too much execution cost just to run the kprobe/live patch framework!

Trace how long the code path is that ftrace takes to reach a live patch's replacement function. Trace how much extra work kprobe/kretprobe does before it runs your handler. All I need is the handler, and the handler merely prints the values of a few variables; I don't want to fight the framework over something that small. If that path to the handler hits yet another spinlock, things get even more embarrassing: the bottleneck introduced to debug a performance bottleneck can be worse than the original bottleneck. In work as delicate as performance tuning, that kind of uncertainty is a severe limitation!

I am a craftsman, and I find it hard to trust kprobe and live patching, which in my view are completely out of my control. Of course, I don't mind being laughed at by people who use these tools proficiently; I have my own way.

If I want to patch a function or insert a printk somewhere, I do it by hand. It is very simple: modify the code at the chosen location to call myhandler, and when myhandler returns, execute the instruction that was replaced. This method is simple and direct, and I trust it. Once I found this method and became proficient with it, I saw no essential difference between tracepoint and kprobe.

Forget kprobe, forget live patch, even forget tracepoint: that may be better than remembering them.

Waving down a taxi may be easier than hailing one with Didi on your phone. Likewise, when you pay with paper money, you don't have to worry about battery life or privacy...

The leather shoes in Wenzhou, Zhejiang are wet, so they won’t get fat in the rain.
