Series Catalog
1. Questions
2. vfsstat_bpf__open
3. bpf_object__load_skeleton loads the bpf program
4. bpf_object__attach_skeleton attaches bpf program
5. Trigger the bpf program
6. Summary
Previous article in this series: The process of injecting eBPF programs into the kernel (Part 1)
3. bpf_object__load_skeleton loads the bpf program
vfsstat_bpf__load calls bpf_object__load_skeleton.
bpf_object__load_skeleton loads the bpf object with bpf_object__load and then initializes the mmaped memory of the maps
3.1 bpf_object__load
bpf_object__load loads the bpf programs and maps. Here we follow only one call, bpf_object__load_progs, which loads the bpf programs
3.2 bpf_object__load_progs loads bpf program
1) Check whether the program should be skipped. In this example, unsupported functions such as fentry_vfs_write are skipped by calling bpf_program__set_autoload.
What this sets is the load flag (so to keep a particular bpf function from being loaded, the flag must be set after open and before load)
2) Load the bpf program with bpf_object_load_prog and store the result in instances.fds (the fd of bpf-prog)
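The skip-then-load loop of section 3.2 can be sketched with a small model. This is not libbpf code: struct model_prog, model_load_prog and model_load_progs are hypothetical names made up for illustration, and the fd numbers are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for libbpf's per-program state. */
struct model_prog {
    const char *name;
    bool autoload;   /* the load flag, set after open and before load */
    int fd;          /* stays -1 until the program is loaded */
};

/* Hypothetical stand-in for bpf_object_load_prog: pretend the kernel
 * handed back next_fd for this program. */
int model_load_prog(struct model_prog *p, int next_fd)
{
    p->fd = next_fd;
    return 0;
}

/* Skip programs whose load flag is off, load the rest.
 * Returns the number of programs loaded, or -1 on failure. */
int model_load_progs(struct model_prog *progs, size_t n)
{
    int loaded = 0;
    int next_fd = 3; /* arbitrary first fd for the model */

    assert(progs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!progs[i].autoload)
            continue;                       /* step 1: skipped */
        if (model_load_prog(&progs[i], next_fd++) != 0)
            return -1;
        loaded++;                           /* step 2: fd recorded */
    }
    return loaded;
}
```

With kprobe_vfs_read marked for loading and fentry_vfs_write not, only the first program ends up with an fd.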
3.3 bpf_object_load_prog loads a single bpf function
Here "bpf program" means a single bpf function
1) Create a new prog->instances.fds array
2) Load the bpf program with bpf_object_load_prog_instance; on success it returns the fd of bpf-prog
3.4 bpf_object_load_prog_instance loads a program instance
1) Initialize load_attr (bpf_prog_load_opts), the parameter structure used to load the bpf program
2) Call the bpf_prog_load function to load the bpf program
=> The specific path is external/libbpf/src/bpf.c => bpf_prog_load_v0_6_0 -> sys_bpf_prog_load -> BPF_PROG_LOAD -> syscall(__NR_bpf, ...)
The kernel function that is finally called is bpf_prog_load (kernel_platform/common/kernel/bpf/syscall.c)
3.5 bpf_prog_load system call
1) On entry, permissions are checked first, such as the CAP_BPF, CAP_SYS_ADMIN, CAP_PERFMON, and CAP_NET_ADMIN capabilities.
2) bpf_prog_alloc creates a new bpf program and sets jit_requested = 1 in bpf_prog
3) bpf_check verifies the bpf program
4) bpf_prog_select_runtime JIT-compiles the bpf program prog and adds the code that adjusts the stack pointer and saves the frame pointer, registers, etc. before the program is entered
5) Create a new bpf_ksym and add it to bpf_kallsyms
6) Create the fd of bpf-prog (bpf_prog_new_fd)
3.6 bpf_prog_select_runtime JIT-compiles the bpf program prog
It mainly calls the bpf_int_jit_compile function
3.7 bpf_int_jit_compile just-in-time compilation
1) Call bpf_jit_blind_insn to blind the immediate values in the eBPF instructions
2) On the first pass, run build_prologue (stack save), build_body (emit the bpf instructions), and build_epilogue (stack restore) to verify that they complete normally.
Because ctx.image = NULL, the instructions passed to emit are not written to ctx.image; only the instruction counter ctx->idx is incremented;
3) Allocate the memory of bpf_binary_header *hdr according to the size given by ctx.idx (the number of instructions) plus extable_size, and set ctx.image to the storage location
4) Reset ctx->idx to zero and call build_prologue (stack save), build_body (emit the bpf instructions), and build_epilogue (stack restore) again; this time the instructions of the bpf function are actually saved
5) Call flush_icache_range to flush the instruction cache for the address range from bpf_binary_header *header to ctx.image + ctx.idx (stale data is flushed out)
6) After JIT compilation, prog->bpf_func is set and jited = 1 is set (meaning that the function has been JIT-compiled)
7) tmp_blinded = true means that instruction blinding succeeded, so the original bpf program orig_prog is released here (the new bpf program prog replaces it)
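The two passes in steps 2) to 4) are easier to see in a reduced model. The sketch below is not the kernel code: struct model_jit_ctx, model_emit, model_build and model_jit_compile are hypothetical names, the prologue and epilogue are single placeholder words, and calloc stands in for the bpf_binary_header allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced version of the JIT context. */
struct model_jit_ctx {
    uint32_t *image; /* NULL during the sizing pass */
    int idx;         /* number of instructions emitted so far */
};

/* Store the instruction only if an image exists; always advance idx. */
void model_emit(uint32_t insn, struct model_jit_ctx *ctx)
{
    if (ctx->image != NULL)
        ctx->image[ctx->idx] = insn;
    ctx->idx++;
}

/* Stand-in for build_prologue + build_body + build_epilogue. */
void model_build(const uint32_t *body, int n, struct model_jit_ctx *ctx)
{
    model_emit(0xAAAAAAAAu, ctx);       /* prologue placeholder */
    for (int i = 0; i < n; i++)
        model_emit(body[i], ctx);       /* body instructions */
    model_emit(0xBBBBBBBBu, ctx);       /* epilogue placeholder */
}

/* Pass 1 only counts, then memory is allocated, then pass 2 fills it.
 * Returns the image (the caller frees it) and its length in *len. */
uint32_t *model_jit_compile(const uint32_t *body, int n, int *len)
{
    struct model_jit_ctx ctx = { NULL, 0 };

    assert(n >= 0 && len != NULL);
    model_build(body, n, &ctx);         /* pass 1: image == NULL */

    ctx.image = calloc((size_t)ctx.idx, sizeof(uint32_t));
    if (ctx.image == NULL)
        return NULL;
    *len = ctx.idx;

    ctx.idx = 0;                        /* reset before pass 2 */
    model_build(body, n, &ctx);         /* pass 2: writes the image */
    return ctx.image;
}
```

Because the same build functions run in both passes, the count from the first pass is exactly the size the second pass needs.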
1. First, the registers of bpf. The bpf JIT currently uses r0-r15, a total of 16 registers:
1) r0 holds the return value
2) r1-r5 hold the arguments of the eBPF program
3) r6-r9 are callee-saved registers
4) r10 is the read-only frame pointer register, used to access the stack
5) Three of the registers after the fp register (r10), namely r12, r13, and r15, are temporary registers used by the bpf JIT
6) The r14 register is used for tail calls
7) BPF_REG_AX = r11 is a temporary register used for blinding
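The register list above can be written down as a lookup. The MODEL_ names below are illustrative and are not the kernel's identifiers; only the index-to-role mapping is taken from the list.

```c
#include <assert.h>

/* Roles from the list above (names are made up for this sketch). */
enum model_role {
    MODEL_ROLE_RETURN,        /* r0 */
    MODEL_ROLE_ARGUMENT,      /* r1-r5 */
    MODEL_ROLE_CALLEE_SAVED,  /* r6-r9 */
    MODEL_ROLE_FRAME_POINTER, /* r10 */
    MODEL_ROLE_BLINDING_TMP,  /* r11 (BPF_REG_AX) */
    MODEL_ROLE_JIT_TMP,       /* r12, r13, r15 */
    MODEL_ROLE_TAIL_CALL      /* r14 */
};

/* Map a register index 0..15 to its role. */
enum model_role model_reg_role(int r)
{
    assert(r >= 0 && r <= 15);
    if (r == 0)
        return MODEL_ROLE_RETURN;
    if (r <= 5)
        return MODEL_ROLE_ARGUMENT;
    if (r <= 9)
        return MODEL_ROLE_CALLEE_SAVED;
    if (r == 10)
        return MODEL_ROLE_FRAME_POINTER;
    if (r == 11)
        return MODEL_ROLE_BLINDING_TMP;
    if (r == 14)
        return MODEL_ROLE_TAIL_CALL;
    return MODEL_ROLE_JIT_TMP;  /* r12, r13, r15 */
}
```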
2. build_prologue saves the original stack information
The operation is as follows:
1) Save A64_FP (frame pointer) and A64_LR (link register)
2) Set the frame pointer A64_FP = A64_SP (the current stack pointer)
3) Save the registers that bpf will use, r6/r7/r8/r9/fp/tcc, and then set fp = A64_SP (this is now the position of the BPF fp register)
4) Below that, stack_depth bytes hold the stack contents of the bpf program; once stack_size has been reserved, A64_SP is the current SP
(SP is the stack pointer register; it points to the current top of the stack, which is the lowest address in use because the stack grows downward)
=> The general structure, from high to low addresses, is: FP/LR, then r6/r7/r8/r9/fp/tcc, then the BPF stack of stack_size bytes, with SP at the bottom
3. The emit function stores one instruction in ctx->image each time it is called
4. JIT compilation of the bpf program's instructions
Traverse the instruction array prog->insnsi[] of the bpf program (a function such as kprobe_vfs_read) and call build_insn to JIT-compile each instruction
5. build_epilogue restores the stack
1) Releasing stack_size from the current SP brings it back to the position of the BPF fp register (the end of the registers we saved in build_prologue)
2) Pop the registers we saved earlier: tcc, r9/r8/r7/r6, FP/LR
6. Finally, bpf_jit_binary_alloc, which allocates the bpf_binary_header memory:
Because at least 128 extra bytes are allocated, the start address image_ptr at which the JIT-compiled bpf instructions are stored is randomized
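A rough sketch of how the extra space makes a random start offset possible. Everything here is an assumption made for illustration: the page size, the header size, the way the hole is capped, and the function names are not taken from the kernel source, and random_value stands in for a kernel random number.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size */
#define MODEL_HDR_SIZE  8u     /* assumed header size */
#define MODEL_SLACK     128u   /* the extra 128 bytes from the text */

size_t model_round_up(size_t v, size_t to)
{
    return (v + to - 1) / to * to;
}

/* Total allocation: program + header + slack, rounded up to a page. */
size_t model_alloc_size(size_t proglen)
{
    return model_round_up(proglen + MODEL_HDR_SIZE + MODEL_SLACK,
                          MODEL_PAGE_SIZE);
}

/* Pick an aligned offset inside the unused part of the allocation.
 * alignment must be a power of two. The result is the offset of the
 * image start, counted from the end of the header. */
size_t model_image_start(size_t proglen, size_t alignment,
                         unsigned int random_value)
{
    size_t size = model_alloc_size(proglen);
    size_t hole = size - (proglen + MODEL_HDR_SIZE);
    size_t max_hole = MODEL_PAGE_SIZE - MODEL_HDR_SIZE;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (hole > max_hole)
        hole = max_hole;
    /* hole is at least MODEL_SLACK, so the modulo is safe */
    return (random_value % hole) & ~(alignment - 1);
}
```

Since the offset is always smaller than the hole, the program still fits inside the allocation wherever it starts.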
3.8 bpf_prog_new_fd returns the bpf-prog fd to the application
Create an fd named "bpf-prog" for the application
3.9 load summary
load mainly loads the maps and JIT-compiles the bpf functions (interacting with the kernel), and returns the bpf-prog fd to the application
4. bpf_object__attach_skeleton attaches bpf program
It calls bpf_program__attach
bpf_program__attach invokes attach_fn. As mentioned earlier (see chapter 2.8), looking up sec_name = 'kprobe/vfs_read' resolves attach_fn to attach_kprobe
4.1 attach_kprobe
1) Determine whether it is a kprobe (entry probe) or a kretprobe (return probe)
2) Call bpf_program__attach_kprobe_opts
4.2 bpf_program__attach_kprobe_opts
1) perf_event_open_probe finds the kernel function to probe, func_name = "vfs_read", creates the corresponding perf_event, and registers the kprobe with register_kprobe
2) bpf_program__attach_perf_event_opts attaches the bpf program to the fd of the perf_event created above (pfd here), which is already tied to the kernel address of "vfs_read"
4.3 perf_event_open_probe
1. Construct the perf_event_attr structure that passes data to the kernel. For a ret probe, attr.config |= 1 is additionally set,
attr.type carries the kprobe type (6), and attr.config1 carries the name of the function to attach to, "vfs_read"
2. Then call the kernel function sys_perf_event_open
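Step 1 can be sketched as follows. This only fills the structure and makes no system call; model_fill_kprobe_attr is a hypothetical helper name, and the type value is passed in by the caller because 6 is what the text observed on this system, not a fixed constant. The sketch assumes a Linux build environment that provides linux/perf_event.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill a perf_event_attr for a kprobe or kretprobe on func_name. */
void model_fill_kprobe_attr(struct perf_event_attr *attr,
                            uint32_t pmu_type,
                            const char *func_name,
                            int retprobe)
{
    assert(attr != NULL && func_name != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = pmu_type;                          /* kprobe PMU type */
    attr->config1 = (uint64_t)(uintptr_t)func_name; /* function name */
    if (retprobe)
        attr->config |= 1;                          /* ret probe bit */
}
```

The pointer stored in config1 must stay valid until the kernel has read it, which is why a string literal such as "vfs_read" is convenient here.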
4.4 sys_perf_event_open
1. perf_event_alloc finds the corresponding kernel function, vfs_read in this example, inserts the BRK64_OPCODE_KPROBES instruction (a breakpoint exception), and then enables the kprobe.
This only inserts the breakpoint; the function to execute has not been added yet (bpf_program__attach_perf_event_opts in libbpf later attaches the binary instructions of the bpf program to the breakpoint handler)
2. Create event_file from the perf_event and associate the perf_event with an fd ("[perf_event]")
4.4.1 perf_event_alloc
Create and initialize perf_event *event, then call perf_init_event
4.4.2 perf_init_event
1. perf_init_event
The pmu is looked up in pmu_idr, where it was registered through start_kernel->perf_event_init->perf_tp_register->perf_pmu_register(&perf_kprobe, "kprobe", -1);
(type = 6: because perf_pmu_register is passed type = -1, an unused id is searched for starting from PERF_TYPE_MAX = 6),
That is, pmu = perf_kprobe.
Then call perf_try_init_event
2. The event_init called by perf_try_init_event is perf_kprobe_event_init
3. perf_kprobe_event_init
Determine whether this is a ret probe according to whether attr.config is 1 (set in chapter 4.3)
Call perf_kprobe_init
4. perf_kprobe_init
1) Obtain the function name func ("vfs_read") from config1 (kprobe_func)
2) create_local_trace_kprobe finds the kernel address of the corresponding function and registers the kprobe. (The single-step breakpoint instruction BRK64_OPCODE_KPROBES_SS is inserted during registration)
3) perf_trace_event_init enables the kprobe and inserts the BRK64_OPCODE_KPROBES instruction at the corresponding address
4.4.3 create_local_trace_kprobe
1) alloc_trace_kprobe allocates the trace kprobe and sets the functions to run when the kprobe breakpoint fires: pre_handler (runs on entry to the probed function) and handler (runs for a ret probe)
2) init_trace_event_call sets the kprobe registration function call->class->reg
3) __register_trace_kprobe registers the kprobe
1. alloc_trace_kprobe allocates the trace kprobe
1) Set the symbol name symbol_name of the kprobe (vfs_read here)
2) Set the kprobe breakpoint handler pre_handler = kprobe_dispatcher and the return handler = kretprobe_dispatcher
3) Initialize tp->event->call
2. init_trace_event_call
1) Set the kprobe registration function to kprobe_register
2) Set TRACE_EVENT_FL_KPROBE in call->flags (this event is of kprobe type)
3. __register_trace_kprobe
1) Set the kprobe flags; the default is KPROBE_FLAG_DISABLED
2) Register the kprobe with register_kprobe
4. register_kprobe
1) kprobe_addr searches the kernel symbol table (kallsyms) for the symbol address of the function ("vfs_read")
2) If the function allows single-stepping, the single-step breakpoint instruction BRK64_OPCODE_KPROBES_SS is inserted, and when the single-step exception is triggered, the exception handling function is entered
(prepare_kprobe->arch_prepare_kprobe->arch_prepare_ss_slot)
3) Since the kprobe is still disabled at this point, arm_kprobe is not called
4.4.4 perf_trace_event_init
The main work here is registering the trace event with perf_trace_event_reg
1. perf_trace_event_reg
In init_trace_event_call (point 2 of chapter 4.4.3), the reg function of the kprobe was set to kprobe_register
Here tp_event->class->reg calls kprobe_register
2. kprobe_register
The type passed is TRACE_REG_PERF_REGISTER, so enable_trace_kprobe runs
3. enable_trace_kprobe
1) Set the TP_FLAG_PROFILE flag
2) Call __enable_trace_kprobe to enable the kprobe
4. __enable_trace_kprobe
Check that the kprobe has been registered and that KPROBE_FLAG_GONE (the flag marking a removed kprobe) is not set
5. enable_kprobe
Enable the kprobe: if the kprobe is disabled, clear the KPROBE_FLAG_DISABLED flag; a removed kprobe (KPROBE_FLAG_GONE) is not enabled again
6. arm_kprobe
Take the text_mutex lock and call __arm_kprobe
7. __arm_kprobe/arch_arm_kprobe
register_kprobe has already found p->addr, the symbol address of the function "vfs_read";
arch_arm_kprobe inserts the BRK64_OPCODE_KPROBES instruction at this address.
From then on, when the brk exception is triggered, the exception handling function is entered first.
4.5 bpf_program__attach_perf_event_opts
bpf_program__attach_kprobe_opts first calls perf_event_open_probe to perform
perf_event_open (find the corresponding kernel function, create and enable the kprobe, and insert the brk exception). The next step is to call bpf_program__attach_perf_event_opts to inject the bpf program into the exception handling function of the kprobe.
1) Take the bpf-prog fd obtained from bpf_prog_load (it refers to the JIT-compiled instructions of the bpf program in the kernel)
2) Initialize bpf_link_perf (the bridge between the bpf program and the perf event)
3) bpf_link_create injects the bpf program into the prog_array that the kprobe/uprobe exception handling function executes
4) Enable the perf event obtained from perf_event_open_probe
We mainly focus on bpf_link_create to see how the bpf program is injected into the kernel function
4.5.1 bpf_link_create
Pass the parameters in the link_create member of union bpf_attr and issue the bpf system call sys_bpf_fd(BPF_LINK_CREATE, ...)
(The bpf system call is handled in kernel_platform/common/kernel/bpf/syscall.c:
__sys_bpf calls link_create here
)
4.5.2 link_create
link_create is the kernel-side handler for this system call command; for our kprobe program it goes on to bpf_perf_link_attach
4.5.3 bpf_perf_link_attach
1) Find the corresponding perf_event through the anon_inode:[perf_event] fd
2) Create the anon_inode:bpf_link fd (the perf_file of bpf_perf_link refers to the perf_event,
bpf_perf_link->link->prog is the bpf program, and bpf_perf_link->link itself is associated with anon_inode:bpf_link)
3) perf_event_set_bpf_prog associates perf_event *event with prog (the bpf program)
4.5.4 perf_event_set_bpf_prog
Both kprobe and uprobe take this path, which ends by calling
perf_event_attach_bpf_prog to attach the bpf program to the perf_event
4.5.5 perf_event_attach_bpf_prog
In addition to storing the bpf program in (perf_event event)->prog,
it also puts the bpf program into event->tp_event->prog_array (the set of programs that the brk exception will execute)
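The idea of "attach puts the program into prog_array, the breakpoint later runs whatever is in it" can be sketched with function pointers. This is a simplified model with hypothetical names and a fixed-size array; it does not reproduce the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PROGS 4

/* A compiled bpf program is modelled as a plain function pointer. */
typedef unsigned int (*model_bpf_func)(void *ctx);

struct model_prog_array {
    model_bpf_func funcs[MODEL_MAX_PROGS];
    size_t cnt;
};

/* Attach: append the program to the array. Returns 0, or -1 if full. */
int model_attach_prog(struct model_prog_array *arr, model_bpf_func fn)
{
    assert(arr != NULL && fn != NULL);
    if (arr->cnt >= MODEL_MAX_PROGS)
        return -1;
    arr->funcs[arr->cnt++] = fn;
    return 0;
}

/* Trigger: run every attached program with the same context.
 * Returns how many programs were run. */
size_t model_run_array(const struct model_prog_array *arr, void *ctx)
{
    size_t ran = 0;

    assert(arr != NULL);
    for (size_t i = 0; i < arr->cnt; i++) {
        (void)arr->funcs[i](ctx);
        ran++;
    }
    return ran;
}

/* Example program: counts how often it was called via an int in ctx. */
unsigned int model_count_prog(void *ctx)
{
    (*(int *)ctx)++;
    return 0;
}
```

Attaching only changes the array; nothing runs until the trigger path walks it, which mirrors the split between chapter 4 and chapter 5.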
4.6 attach summary
The attach stage consists of 2 main steps
1. bpf_program__attach_kprobe_opts calls perf_event_open_probe to perform perf_event_open: find the corresponding kernel function, create and enable the kprobe, and insert the brk exception.
2. bpf_program__attach_perf_event_opts injects the bpf program into the exception handling function of the kprobe.
5. Trigger the bpf program
5.1 brk_handler
brk_handler is the brk exception handling function; it uses call_break_hook to call the hook function
5.2 call_break_hook
Find kprobe_breakpoint_handler, the kprobe handler registered on the kernel_break_hook linked list
ps:
The kprobe handler kprobe_breakpoint_handler is inserted into the kernel_break_hook linked list as follows =>
early_initcall (kernel/kprobes.c) -> init_kprobes -> arch_init_kprobes (probes/kprobes.c)
-> register_kernel_break_hook(&kprobes_break_hook) (debug-monitors.c) -> inserted into the kernel_break_hook list
The immediate value matched by this break_hook is KPROBES_BRK_IMM, and its handler is kprobe_breakpoint_handler
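The hook lookup can be pictured as a linked list searched by the immediate value of the brk instruction. The sketch below uses hypothetical names and an illustrative immediate value; it only models the register-then-lookup pattern described above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KPROBES_IMM 4u  /* illustrative value for this sketch */
#define MODEL_HANDLED     0
#define MODEL_UNHANDLED   (-1)

struct model_break_hook {
    unsigned int imm;             /* immediate this hook responds to */
    int (*fn)(unsigned int imm);  /* handler to call on a match */
    struct model_break_hook *next;
};

/* Registration: push the hook onto the front of the list. */
void model_register_break_hook(struct model_break_hook **list,
                               struct model_break_hook *hook)
{
    assert(list != NULL && hook != NULL);
    hook->next = *list;
    *list = hook;
}

/* Dispatch: call the first hook whose immediate matches. */
int model_call_break_hook(struct model_break_hook *list, unsigned int imm)
{
    for (struct model_break_hook *h = list; h != NULL; h = h->next) {
        if (h->imm == imm)
            return h->fn(imm);
    }
    return MODEL_UNHANDLED;  /* no hook claimed this breakpoint */
}

/* Stand-in for kprobe_breakpoint_handler. */
int model_kprobe_handler(unsigned int imm)
{
    (void)imm;
    return MODEL_HANDLED;
}
```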
5.3 kprobe_breakpoint_handler/kprobe_handler
Find the corresponding kprobe and its brk handler function pre_handler
(alloc_trace_kprobe in section 4.4.3 set tk->rp.kp.pre_handler = kprobe_dispatcher)
5.4 kprobe_dispatcher
TP_FLAG_PROFILE was set in enable_trace_kprobe (section 4.4.4), so kprobe_perf_func runs
5.5 kprobe_perf_func
1) Check whether call->prog_array, the array of bpf programs, is valid
2) trace_call_bpf runs the bpf program
5.6 trace_call_bpf
bpf_prog_run runs the bpf programs in call->prog_array (perf_event_attach_bpf_prog in chapter 4.5.5 put the bpf program into call->prog_array),
so this is where the bpf program we injected starts running
5.7 bpf_prog_run
What finally runs is prog->bpf_func(ctx, insnsi):
bpf_func is the JIT-compiled code generated for the injected function at load time (stack save, bpf program body, stack restore),
and insnsi is the instruction array of the bpf program itself
6. Summary
To summarize the entire injection and execution process of the bpf program:
1. bpf_object__open_skeleton reads and initializes the bpf programs and bpf maps (via libbpf/libelf)
2. bpf_object__load_skeleton JIT-compiles the bpf functions (interacting with the kernel) and returns the fd of bpf-prog to the application
3. bpf_object__attach_skeleton sets up the kprobe brk exception and its handler for the kernel function, injects the bpf program into call->prog_array, and returns the perf_event fd (tied to the probed function) and the bpf_link fd (which associates the bpf program with the perf_event fd)
4. When the brk exception is triggered, the bpf program is executed