The process of injecting eBPF programs into the kernel: a guided walkthrough (Part 2)

Series contents

1. Open questions

2. vfsstat_bpf__open

3. bpf_object__load_skeleton loads bpf

4. bpf_object__attach_skeleton attaches bpf program

5. Trigger the bpf program

6. Summary

Previous article: The process of injecting eBPF programs into the kernel (Part 1)

3. bpf_object__load_skeleton loads bpf

vfsstat_bpf__load calls bpf_object__load_skeleton.

bpf_object__load_skeleton loads the BPF object through bpf_object__load and then initializes the mmaped regions of the maps.

d34111114977cc67ff3c697924f05582.png

3.1 bpf_object__load

bpf_object__load loads the BPF programs and the maps. Here we follow only one call in it, bpf_object__load_progs, which loads the BPF programs.

75b7136d1587e1bcd948e2ccee68e4c8.png

e317e9af3dc07dba66e12ddd42f835ef.png

3.2 bpf_object__load_progs loads bpf program

1) Check whether the program should be skipped. In this example, unsupported functions such as fentry_vfs_write are skipped by calling bpf_program__set_autoload.

This call sets the program's load flag. As you can see, to keep a particular BPF function from being loaded, the flag must be set after open and before load.

2) Load each remaining BPF program with bpf_object_load_prog and store the resulting bpf-prog fd in instances.fds
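The skip-then-load loop can be pictured with a small toy model. This is only a sketch: struct toy_prog, toy_load_one and toy_load_progs are hypothetical stand-ins invented for illustration, not libbpf types or functions, and the fake fd values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a program entry; NOT the real struct bpf_program. */
struct toy_prog {
    const char *name;
    bool autoload;   /* cleared between open and load to skip the program */
    int fd;          /* stays -1 until the program is "loaded" */
};

/* Hypothetical loader: pretends the load succeeds and records a fake fd. */
static int toy_load_one(struct toy_prog *p, int fake_fd)
{
    p->fd = fake_fd;
    return 0;
}

/* Step 1: skip programs whose autoload flag is off.
 * Step 2: load the rest and keep their fds.
 * Returns the number of programs loaded, or -1 on failure. */
static int toy_load_progs(struct toy_prog *progs, size_t n)
{
    int loaded = 0;

    assert(progs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!progs[i].autoload)
            continue;                       /* skipped, fd stays -1 */
        if (toy_load_one(&progs[i], 100 + (int)i) != 0)
            return -1;
        loaded++;
    }
    return loaded;
}
```

With one program left at autoload = true and one switched off, only the first ends up with an fd, which matches the behaviour described for fentry_vfs_write.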

af596cec2696d4e8c4ac7197a9e02275.png

3.3 bpf_object_load_prog loads a single bpf function

Here "bpf program" means a single BPF function.

1) Allocate the prog->instances.fds array

2) Load the program with bpf_object_load_prog_instance; on success it returns the bpf-prog fd

6bfb298c429a22d26b20776f8f923038.png

3.4 bpf_object_load_prog_instance loads one program instance

1) Initialize load_attr (a bpf_prog_load_opts structure), which holds the parameters for loading the BPF program

2) Call bpf_prog_load to load the BPF program

=> The call chain, in external/libbpf/src/bpf.c, is: bpf_prog_load_v0_6_0 -> sys_bpf_prog_load -> BPF_PROG_LOAD -> syscall(__NR_bpf, ...)

The kernel function that finally runs is bpf_prog_load (kernel_platform/common/kernel/bpf/syscall.c)

a3556878f39b932b9857d0546081bfb1.png

7972b54498386364959c20a407c952c5.png

b020c322fa10d303f859fc341509fa5a.png

e92c988ca3129298368053e3347a54e6.png

3.5 bpf_prog_load system call

1) On entry, permissions are checked first: CAP_BPF, CAP_SYS_ADMIN, CAP_PERFMON and CAP_NET_ADMIN.

2) bpf_prog_alloc allocates a new bpf_prog and sets its jit_requested = 1

3) bpf_check runs the BPF verifier on the program

4) bpf_prog_select_runtime JIT-compiles the program; the generated code also saves the stack pointer, registers and so on before entering the BPF code and restores them afterwards

5) Create a new bpf_ksym and add it to bpf_kallsyms

6) Create the bpf-prog fd (bpf_prog_new_fd)

87bf6c6472c05c536951f77c1a093df4.png

39ca41feac4f4dd4f5c877dd11d7447c.png

4936a88b47dd63dada7e9bf27572bf63.png

5267efb31f4fb0c77e5abbd147f68aef.png

26d4e83e353bf56921b3c6890db97dbd.png

c76f18a9e495f884bf123031e62ac1ec.png

3.6 bpf_prog_select_runtime JIT-compiles the bpf program

It mainly calls bpf_int_jit_compile.

3bd6245095e559bb14b52aba4d92c718.png

63473c0bf2270491f2bc4e6cd9dcb81d.png

3.7 bpf_int_jit_compile just-in-time compilation

1) Blind the immediate values in the eBPF instructions (bpf_jit_blind_insn)

2) First pass: run build_prologue (stack save), build_body (translate the BPF instructions) and build_epilogue (stack restore) to check that they all succeed.

Because ctx.image = NULL in this pass, emit does not store any instruction into ctx.image; it only increments ctx->idx, so the pass just counts instructions.

3) Allocate the memory for bpf_binary_header *hdr, sized from ctx.idx (the instruction count) plus extable_size from above, and point ctx.image at the storage location

4) Reset ctx->idx to zero and call build_prologue, build_body and build_epilogue again; this time the instructions of the BPF function are actually stored

5) Call flush_icache_range to flush the instruction cache for the address range from bpf_binary_header *header to ctx.image + ctx.idx, so that stale contents are discarded

6) After JIT compilation, set prog->bpf_func and jited = 1 (the function has been JIT-compiled)

7) tmp_blinded = true means that blinding succeeded; the original program orig_prog is released here, because the new program prog replaces it
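The count-then-fill pattern in steps 2 to 4 can be sketched as a self-contained toy. All names here (toy_ctx, toy_emit, toy_build, toy_jit) and the placeholder opcodes are invented for illustration; the real JIT emits arm64 instructions and allocates through bpf_jit_binary_alloc rather than calloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy JIT context modelled on the two fields named in the text. */
struct toy_ctx {
    uint32_t *image;  /* NULL during the sizing pass */
    int idx;          /* number of instructions emitted so far */
};

/* Store the instruction only when a buffer exists; always advance idx. */
static void toy_emit(uint32_t insn, struct toy_ctx *ctx)
{
    if (ctx->image != NULL)
        ctx->image[ctx->idx] = insn;
    ctx->idx++;
}

/* Hypothetical prologue/body/epilogue using placeholder opcodes. */
static void toy_build(struct toy_ctx *ctx, const uint32_t *body, int n)
{
    toy_emit(0xAAAA0001u, ctx);            /* "prologue" */
    for (int i = 0; i < n; i++)
        toy_emit(body[i], ctx);            /* "body" */
    toy_emit(0xEEEE0001u, ctx);            /* "epilogue" */
}

/* Pass 1 counts, then allocate, then pass 2 fills. Caller frees the result. */
static uint32_t *toy_jit(const uint32_t *body, int n, int *out_len)
{
    struct toy_ctx ctx = { .image = NULL, .idx = 0 };

    assert(n >= 0 && out_len != NULL);
    toy_build(&ctx, body, n);              /* pass 1: only idx changes */
    int count = ctx.idx;

    ctx.image = calloc((size_t)count, sizeof(*ctx.image));
    if (ctx.image == NULL)
        return NULL;

    ctx.idx = 0;                           /* reset before pass 2 */
    toy_build(&ctx, body, n);              /* pass 2: image is filled */
    assert(ctx.idx == count);              /* both passes must agree */

    *out_len = count;
    return ctx.image;
}
```

Running the same builders twice keeps the size calculation and the code generation from drifting apart, which is presumably why the JIT is structured this way.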

cec9bf294d1a1b41f9c3522da70d26cd.png

fadff52c5ce129842e215358420a162d.png

6ba810cf42fa17f028fe414070c21003.png

bd4c067dd410d4acc5a8551d4d368753.png

6936e445000bf5b1d36a0181ad7f8a95.png

b31b1e8fe251fb13aa0595a609797105.png

39da02c6fad627c5cc19bc3580d03715.png

7930c5552bebd302e6241ded6673a345.png

7e922b5c10930b26565b5d5f5b0540f9.png

1. First, the BPF registers. The code here uses register indices r0-r15, 16 in total (r0-r10 are the registers of the BPF instruction set itself; the higher indices are internal to the kernel and the JIT):

1) r0 holds the return value

2) r1-r5 hold the arguments of the eBPF program

3) r6-r9 are callee-saved registers

4) r10 is the read-only frame pointer, used to access the stack

5) Of the registers after the fp register (r10), r12, r13 and r15 are temporaries used by the BPF JIT

6) r14 is used for tail calls

7) BPF_REG_AX = r11 is a temporary register used for blinding
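As a compact restatement of the list above, the mapping from register index to role can be written as a lookup function. The enum and function names are made up for this sketch, and the index assignments simply follow the list in this article; they may differ between kernel versions and architectures.

```c
#include <assert.h>

/* Roles taken from the list above; names are illustrative only. */
enum toy_role {
    ROLE_RETVAL,
    ROLE_ARG,
    ROLE_CALLEE_SAVED,
    ROLE_FRAME_PTR,
    ROLE_BLINDING_TMP,
    ROLE_JIT_TMP,
    ROLE_TAIL_CALL
};

/* Map a register index (0..15) to the role described in the article. */
static enum toy_role toy_reg_role(int r)
{
    assert(r >= 0 && r <= 15);
    if (r == 0)
        return ROLE_RETVAL;         /* r0 */
    if (r <= 5)
        return ROLE_ARG;            /* r1-r5 */
    if (r <= 9)
        return ROLE_CALLEE_SAVED;   /* r6-r9 */
    if (r == 10)
        return ROLE_FRAME_PTR;      /* r10 */
    if (r == 11)
        return ROLE_BLINDING_TMP;   /* BPF_REG_AX */
    if (r == 14)
        return ROLE_TAIL_CALL;      /* r14 */
    return ROLE_JIT_TMP;            /* r12, r13, r15 */
}
```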

2bf33455b5e8e63fbc5ea811fbba7d09.png

2. build_prologue saves the original stack state

The steps are as follows:

1) Save A64_FP (frame pointer) and A64_LR (link register)

2) Set the frame pointer A64_FP = A64_SP (the current stack pointer of the running code)

3) Save the registers that must survive the BPF call, r6/r7/r8/r9/fp/tcc, then set fp = A64_SP (this is now the position of the BPF fp register)

4) Next, reserve the BPF program's own stack: stack_depth is the stack space the BPF program needs, and stack_size is the amount by which the SP pointer is moved to reserve it

(SP is the stack pointer register; it points to the current top of the stack)

=> The general structure is as follows:

45337d73bef820430667a4e6240e2c9f.png

26d2c35bfad76ff715918764ea0136a2.png

48f10a3bcd3e27d1ee6193b06ec166e1.png

af0df729c3bd8a1aabe3de29c697eec9.png

3. The emit function: each call stores one instruction in ctx->image (when ctx->image is set) and advances ctx->idx

4b00bdde0ceb4dda6d7a317e3741bc34.png

4. JIT-compiling the instructions of the BPF program

Walk the instruction array prog->insnsi[] of the BPF program (a function such as kprobe_vfs_read) and call build_insn to JIT-compile each instruction

5d9ef64cbd65b0c909992f6a37c07fa3.png

8bb81b319c9f1102315cdab119a15874.png

5. build_epilogue restores the stack

1) Move the stack pointer back by stack_size, which returns it to the position of the BPF fp register (just past the registers we saved in build_prologue)

2) Pop the registers saved earlier: tcc, r9/r8/r7/r6, FP/LR

3fad6e7557e8586819bf813644dae1c6.png

6. Finally, bpf_jit_binary_alloc, which allocates the bpf_binary_header memory:

The allocation is at least 128 bytes larger than needed, so the start address image_ptr, where the JIT-compiled instructions are stored, lands at a random offset inside it
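The arithmetic behind the random start address can be sketched as follows. This follows my recollection of how the kernel computes the hole and the offset, so treat the formula as an assumption; the page size, header size and function names are placeholders chosen for the example, and the random value is passed in so the result is reproducible.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */
#define TOY_HDR_SIZE  8u      /* assumed header size, illustration only */

static size_t toy_round_up(size_t v, size_t to)
{
    return (v + to - 1) / to * to;
}

/* Total allocation: program + header + 128 spare bytes, rounded to a page. */
static size_t toy_alloc_size(size_t proglen)
{
    return toy_round_up(proglen + TOY_HDR_SIZE + 128u, TOY_PAGE_SIZE);
}

/* Offset of the image inside the allocation, derived from a random value.
 * alignment must be a power of two. */
static size_t toy_image_offset(size_t proglen, unsigned int rnd, size_t alignment)
{
    size_t size = toy_alloc_size(proglen);
    size_t hole = size - (proglen + TOY_HDR_SIZE);   /* unused bytes */
    size_t max_hole = TOY_PAGE_SIZE - TOY_HDR_SIZE;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (hole > max_hole)
        hole = max_hole;
    /* Random position inside the hole, rounded down to the alignment. */
    return (rnd % hole) & ~(alignment - 1);
}
```

Because the spare 128 bytes guarantee a non-empty hole, the modulo is always well defined and the image still fits inside the allocation.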

0409e4f76828a7acafbdf2b84742e864.png

656b2ccf8ad2c2bd73e9f4bd782b26e6.png

3.8 bpf_prog_new_fd returns the bpf-prog fd to the application

It creates an fd named "bpf-prog" for the application

b4f2443b4dd6f7c1faef1acb4a7099b6.png

3.9 load summary

The load step mainly loads the maps and JIT-compiles the BPF functions (in cooperation with the kernel), then returns the bpf-prog fd to the application

4. bpf_object__attach_skeleton attaches bpf program

This calls bpf_program__attach

2af385adbb7d93b854d01b3b9e0b4005.png

The attach itself is done by attach_fn. As mentioned earlier (see section 2.8), looking it up by sec_name = 'kprobe/vfs_read' finds attach_kprobe

a5e49b07092678b116f8e75ed47a076b.png

4.1 attach_kprobe

1) Determine whether this is a kprobe (entry probe) or a kretprobe (return probe)

2) Call bpf_program__attach_kprobe_opts

acd5e069e8c1e659d6289e7ce1118463.png

4.2 bpf_program__attach_kprobe_opts

1) perf_event_open_probe looks up the kernel function to probe, func_name = "vfs_read", creates the corresponding perf_event, and registers the kprobe (register_kprobe)

2) bpf_program__attach_perf_event_opts: step 1 found the kernel address of "vfs_read" and tied it to the perf_event fd (pfd here); this step attaches the BPF program to that fd

d4fa5f6fd4d8c3143aa211e9dcda75ee.png

fff7b219580a21888910a93448f191f0.png

4.3 perf_event_open_probe

1. Build the perf_event_attr structure that carries the parameters to the kernel. A return probe additionally sets attr.config |= 1.

attr.type carries the kprobe type (6), and attr.config1 carries the name of the function to attach to, "vfs_read"

2. Then call the kernel function sys_perf_event_open
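The attribute setup in step 1 might look roughly like this. The helper name toy_fill_kprobe_attr is invented for the sketch, the PMU type is supplied by the caller rather than discovered (6 is simply the value observed in this article), and the system call itself is left out so the example does not need privileges.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill attr as described in the text. pmu_type is an assumption supplied by
 * the caller; this helper does not look it up or open the event. */
static void toy_fill_kprobe_attr(struct perf_event_attr *attr, uint32_t pmu_type,
                                 const char *func_name, int retprobe)
{
    assert(attr != NULL && func_name != NULL);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = pmu_type;                  /* kprobe PMU type */
    if (retprobe)
        attr->config |= 1;                  /* mark as return probe */
    /* The function name is passed to the kernel as a user-space pointer. */
    attr->config1 = (uint64_t)(uintptr_t)func_name;
}
```

For the example in this article the call would be toy_fill_kprobe_attr(&attr, 6, "vfs_read", 0) for the entry probe, and the same with retprobe set to 1 for a kretprobe.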

e7d2ec38fc23e17737d43f104cd033c6.png

adbd5151460a290ad4f7fd405543c693.png

ebbd241bf5d817958eced8eb4cd2d69d.png

4.4 sys_perf_event_open

1. perf_event_alloc finds the target kernel function (vfs_read in this example), inserts the BRK64_OPCODE_KPROBES instruction (a breakpoint exception) there, and then enables the kprobe.

This only inserts the breakpoint; the code to run when it fires has not been added yet (bpf_program__attach_perf_event_opts in libbpf later puts the binary instructions of the BPF program into the breakpoint handler)

2. Create event_file from the perf_event and associate the perf_event with an fd ("[perf_event]")

4215cae520d0133d6d976ce2753de0eb.png

87b1c1e3e409167d087477b4c1768a99.png

5706e25528df593f5c53870b14f6f3e7.png

50d917857e67d90dfc6ffcd36003a6b8.png

a5130d9e59c0c47d3888119da3697049.png

4.4.1 perf_event_alloc

Create and initialize perf_event *event, then call perf_init_event

142d0dd5e2af4513de9ede184d4b53f6.png

2e3b0f2a22237f955870935dfca953dc.png

4.4.2 perf_init_event

1. perf_init_event

The PMU is looked up in pmu_idr, where it was registered through start_kernel->perf_event_init->perf_tp_register->perf_pmu_register(&perf_kprobe, "kprobe", -1);

It was registered with type = 6: because perf_pmu_register is passed type = -1, an unused id is searched for starting from PERF_TYPE_MAX = 6.

So pmu = perf_kprobe.

Then perf_try_init_event is called

40327a2a6ca33d03907258461c997f68.png

2. The event_init called by perf_try_init_event is perf_kprobe_event_init

bc228d63ac1ff005f2ca534acfc7b6de.png

3. perf_kprobe_event_init

Decide whether this is a return probe from whether attr.config is 1 (set in section 4.3)

Then call perf_kprobe_init

d5e3ca9035c6bd74606a633b6ed3d769.png

4. perf_kprobe_init

1) Get the function name func ("vfs_read") from config1 (kprobe_func)

2) create_local_trace_kprobe finds the kernel address of that function and registers the kprobe (the single-step breakpoint instruction BRK64_OPCODE_KPROBES_SS is put in place during registration)

3) perf_trace_event_init enables the kprobe and inserts the BRK64_OPCODE_KPROBES instruction at the probed address

dea72f90d016fea29218aca8f3f51421.png

d9df125927fbf9db51057daa46ef69e7.png

4.4.3 create_local_trace_kprobe

1) alloc_trace_kprobe allocates the trace kprobe and sets the handlers that run when the kprobe breakpoint fires: pre_handler (runs on entry to the probed function) and handler (runs for a return probe)

2) init_trace_event_call sets the kprobe's registration callback call->class->reg

3) __register_trace_kprobe registers the kprobe

1840cfb20d7831fc82dc2eebfe9f87eb.png

273fab9c6d68bcd2918ab6e7dd938cb1.png

1. alloc_trace_kprobe allocates the trace kprobe

1) Set the kprobe's function symbol name symbol_name (vfs_read here)

2) Set the kprobe's breakpoint handler pre_handler = kprobe_dispatcher and its return handler = kretprobe_dispatcher

3) Initialize tp->event->call

2710c78d486b2fd0dee5184f0e3aaacd.png

1638d319326f1e77e103c56dd98069ed.png

2. init_trace_event_call

1) Set the kprobe's registration function to kprobe_register

2) Set TRACE_EVENT_FL_KPROBE in call->flags (this event is of kprobe type)

f6d02eb1c1585509a67eb450dcebab42.png

3. __register_trace_kprobe

1) Set the kprobe's flags; the default is KPROBE_FLAG_DISABLED

2) Register the kprobe with register_kprobe

98d79ca132c1ff0d8f7727cc93850bd1.png

4. register_kprobe

1) Use kprobe_addr to look up the symbol address of the function ("vfs_read") in the kernel symbol table (the data shown in /proc/kallsyms)

2) If single-stepping is allowed for the function, a single-step breakpoint instruction BRK64_OPCODE_KPROBES_SS is set up; when the single-step exception is triggered, execution enters the exception handler

(prepare_kprobe->arch_prepare_kprobe->arch_prepare_ss_slot)

3) Because the kprobe is still disabled at this point, arm_kprobe is not called

74b723e391704175bed9b61cc533a410.png

181355babf3789fe20cae56d6a514211.png

4.4.4 perf_trace_event_init

This mainly registers the trace event through perf_trace_event_reg

68de77ed1e17713d8e7a8a061eeba64f.png

1. perf_trace_event_reg

In item 2 of section 4.4.3 (init_trace_event_call), the kprobe's reg function was set to kprobe_register

So here tp_event->class->reg calls kprobe_register

1d24967a3d7b6f889c4a931ccc5c05eb.png

2. kprobe_register

The type passed in is TRACE_REG_PERF_REGISTER, so enable_trace_kprobe is run

4b06a8693a9f1dacda39b78ea8f24718.png

3. enable_trace_kprobe

1) Set the TP_FLAG_PROFILE flag

2) Call __enable_trace_kprobe to enable the kprobe

9588bfbf5084a60929656a9d4003b116.png

4. __enable_trace_kprobe

Check that the kprobe has been registered and that KPROBE_FLAG_GONE (the flag marking a removed kprobe) is not set; if both checks pass, enable_kprobe is called

cb1060df68d70f5add00a3c025048354.png

5. enable_kprobe

Enable the kprobe: if it is disabled, clear the KPROBE_FLAG_DISABLED flag. A kprobe that has been removed (KPROBE_FLAG_GONE) is never enabled again

b585b89bf98933ec0ddecc30ff13e9ba.png

6. arm_kprobe

Take the text_mutex lock and call __arm_kprobe

4b1bd9761cb4ee89d74b90a09f2d048c.png

7. __arm_kprobe/arch_arm_kprobe

The symbol address p->addr of the function "vfs_read" was already found in register_kprobe.

arch_arm_kprobe inserts the BRK64_OPCODE_KPROBES instruction at this address.

From then on, reaching that address triggers a BRK exception, and execution first enters the exception handler.
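Arming a probe amounts to overwriting one instruction word with a BRK opcode. The sketch below shows the idea on an ordinary array instead of kernel text. The encoding constants are written from my recollection of the arm64 headers and should be treated as assumptions, the struct and function names are invented, and saving the original opcode at arm time is a simplification of what the kernel does during registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings as I recall them from the arm64 headers; treat as assumptions. */
#define TOY_BREAK_MON       0xd4200000u
#define TOY_KPROBES_BRK_IMM 0x004u
#define TOY_BRK_KPROBES     (TOY_BREAK_MON | (TOY_KPROBES_BRK_IMM << 5))

struct toy_probe {
    size_t   index;   /* position of the probed instruction in text[] */
    uint32_t saved;   /* original opcode, kept so it can be restored */
};

/* Arm: remember the original instruction, then overwrite it with BRK. */
static void toy_arm(uint32_t *text, size_t len, struct toy_probe *p)
{
    assert(text != NULL && p != NULL && p->index < len);
    p->saved = text[p->index];
    text[p->index] = TOY_BRK_KPROBES;
}

/* Disarm: put the original instruction back. */
static void toy_disarm(uint32_t *text, size_t len, const struct toy_probe *p)
{
    assert(text != NULL && p != NULL && p->index < len);
    text[p->index] = p->saved;
}
```

In the kernel the write goes through a text-patching helper under text_mutex and is followed by cache maintenance; none of that is modelled here.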

b3831b42c468e0b5bddde24f8cc703f3.png

4.5 bpf_program__attach_perf_event_opts

bpf_program__attach_kprobe_opts first calls perf_event_open_probe, which performs perf_event_open (find the target kernel function, create and enable the kprobe, and insert the BRK exception). The next step is to call bpf_program__attach_perf_event_opts, which injects the BPF program into the kprobe's exception handler.

1) Get the bpf-prog fd obtained from bpf_prog_load (it refers to the BPF program's JIT-compiled instructions inside the kernel)

2) Initialize bpf_link_perf (the bridge between the BPF program and the perf event)

3) bpf_link_create injects the BPF program into the prog_array that the kprobe/uprobe exception handler executes

4) Enable the perf event obtained from perf_event_open_probe

We focus on bpf_link_create to see how the BPF program is injected into the kernel function

e45233c55c08984e6efb8cca566d9a1c.png

42a1b650a7b76bc19fcdf490dd952ae0.png

4.5.1 bpf_link_create

Pass the parameters in the link_create member of union bpf_attr and issue the bpf system call: sys_bpf_fd(BPF_LINK_CREATE, ...)

(In the kernel, the bpf system call arrives at __sys_bpf in kernel_platform/common/kernel/bpf/syscall.c, which calls link_create for this command.)

f9d3a6aafbb542d2b918748dcc115b20.png

4.5.2 link_create

link_create is the kernel-side handler of the BPF_LINK_CREATE command; for this program it goes on to bpf_perf_link_attach

f78cce4108495fdbb43fcef49902724e.png

70b8a09634005d41e32d93cfdfb96a44.png

4.5.3 bpf_perf_link_attach

1) Find the corresponding perf_event through the anon_inode:[perf_event] fd

2) Create the anon_inode:bpf_link fd (the perf_file of bpf_perf_link refers to the perf_event, bpf_perf_link->link->prog is the BPF program, and bpf_perf_link->link itself is tied to anon_inode:bpf_link)

3) perf_event_set_bpf_prog associates perf_event *event with prog (the BPF program)

e892b95d963e0f07c2e7d9f442a0d6e4.png

a4b719b12680d45dd3acdcdf26267e7f.png

4.5.4 perf_event_set_bpf_prog

Both kprobes and uprobes take this path. It ends by calling perf_event_attach_bpf_prog, which attaches the BPF program to the perf_event

1a1b76653d18625660785119453ebe98.png

d6e86d643351df8c4fdbd34eaa8fc703.png

4.5.5 perf_event_attach_bpf_prog

Besides storing the BPF program in event->prog (of the perf_event), this function also adds it to event->tp_event->prog_array, which holds the programs that the BRK exception handler will execute

29a07676228bd0c4804f0ff0d307f5a1.png

4.6 attach summary

attach consists of two main steps:

1. bpf_program__attach_kprobe_opts calls perf_event_open_probe to perform perf_event_open: find the target kernel function, create and enable the kprobe, and insert the BRK exception.

2. bpf_program__attach_perf_event_opts injects the BPF program into the kprobe's exception handler.

5. Trigger the bpf program

5.1 brk_handler

brk_handler is the handler for the BRK exception; it calls call_break_hook to invoke the registered hook function

4aa3d73cc2c7b80029a6f2b12cecc83b.png

5.2 call_break_hook

Look up the kprobe handler kprobe_breakpoint_handler, which is registered on the kernel_break_hook list

55144d00d70a57afcb9f83a23001f2cd.png

Note:

kprobe_breakpoint_handler is inserted into the kernel_break_hook list as follows =>

early_initcall(kernel/kprobes.c) -> init_kprobes -> arch_init_kprobes(probes/kprobes.c)

-> register_kernel_break_hook(&kprobes_break_hook)(debug-monitors.c) -> insert into the kernel_break_hook list

The immediate in this break_hook is KPROBES_BRK_IMM, and its handler function is kprobe_breakpoint_handler
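The register-then-dispatch relationship between register_kernel_break_hook and call_break_hook can be illustrated with a tiny linked list. Everything below is a toy: the struct layout, the function names and the exact-match rule on the immediate are simplifications invented for this sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*toy_hook_fn)(uint16_t imm);

/* Toy hook entry: an immediate to match and the function to call. */
struct toy_break_hook {
    uint16_t imm;
    toy_hook_fn fn;
    struct toy_break_hook *next;
};

static struct toy_break_hook *toy_hooks;   /* head of the hook list */
static int toy_hits;                       /* how often the sample hook ran */

/* Sample handler standing in for kprobe_breakpoint_handler. */
static int toy_kprobe_handler(uint16_t imm)
{
    (void)imm;
    toy_hits++;
    return 0;
}

/* Registration: push the hook onto the list. */
static void toy_register_hook(struct toy_break_hook *h)
{
    assert(h != NULL && h->fn != NULL);
    h->next = toy_hooks;
    toy_hooks = h;
}

/* Dispatch: returns the hook's result, or -1 when no hook claims imm. */
static int toy_call_break_hook(uint16_t imm)
{
    for (struct toy_break_hook *h = toy_hooks; h != NULL; h = h->next)
        if (h->imm == imm)
            return h->fn(imm);
    return -1;
}
```

Keying the hooks on the BRK immediate is what allows kprobes and other BRK users to share a single exception entry point.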

2d724e25f10e128c9f96e0d3391d1315.png

5.3 kprobe_breakpoint_handler/kprobe_handler

Find the corresponding kprobe and run its BRK handler function, pre_handler

(alloc_trace_kprobe in section 4.4.3 set tk->rp.kp.pre_handler = kprobe_dispatcher)

3e17e36a5ba5139397d280e1f9c93a75.png

9f3bdf01c67147383d39956ff13e4eba.png

5.4 kprobe_dispatcher

TP_FLAG_PROFILE was set by enable_trace_kprobe (section 4.4.4), so kprobe_perf_func is run

c045b300aa65d70ee6d6e224cd81fff0.png

5.5 kprobe_perf_func

1) Check whether call->prog_array, the array of attached BPF programs, is valid

2) trace_call_bpf runs the BPF programs

8a1ddef9d9746b6d21e4cbe1d12a8b53.png

5.6 trace_call_bpf

trace_call_bpf uses bpf_prog_run to run the BPF programs in call->prog_array (perf_event_attach_bpf_prog in section 4.5.5 put our program into call->prog_array).

So this is where the BPF program we injected starts running

f2f5438254dded0adc8c50921e16b84f.png

5.7 bpf_prog_run

What finally runs is prog->bpf_func(ctx, insnsi).

bpf_func is the code that was JIT-compiled for the injected function at load time (stack save, the BPF program body, stack restore).

insnsi is the instruction array of the BPF program itself
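The link between attaching (section 4.5.5) and triggering (this section) can be condensed into a toy program array. The types and functions below are invented for illustration and only mirror the shape of the bpf_func(ctx, insnsi) call from the text; the real prog_array is RCU-protected and its return values are combined differently.

```c
#include <assert.h>
#include <stddef.h>

struct toy_insn { int code; };           /* stand-in for a BPF instruction */

/* Same shape as the call in the text: bpf_func(ctx, insnsi). */
typedef unsigned int (*toy_bpf_func)(const void *ctx,
                                     const struct toy_insn *insnsi);

struct toy_bpf_prog {
    toy_bpf_func bpf_func;               /* "JIT-compiled" entry point */
    const struct toy_insn *insnsi;       /* original instructions */
};

#define TOY_MAX_PROGS 4
struct toy_prog_array {
    const struct toy_bpf_prog *progs[TOY_MAX_PROGS];
    size_t cnt;
};

/* Attach: append the program to the array that the handler walks. */
static int toy_attach(struct toy_prog_array *arr, const struct toy_bpf_prog *p)
{
    assert(arr != NULL && p != NULL);
    if (arr->cnt == TOY_MAX_PROGS)
        return -1;
    arr->progs[arr->cnt++] = p;
    return 0;
}

/* Trigger: run every attached program; returns how many ran. */
static size_t toy_run_array(const struct toy_prog_array *arr, const void *ctx)
{
    size_t ran = 0;

    assert(arr != NULL);
    for (size_t i = 0; i < arr->cnt; i++) {
        const struct toy_bpf_prog *p = arr->progs[i];
        p->bpf_func(ctx, p->insnsi);
        ran++;
    }
    return ran;
}

/* Example "compiled" program: counts how often it was invoked. */
static unsigned int toy_calls;
static unsigned int toy_count_calls(const void *ctx, const struct toy_insn *insnsi)
{
    (void)ctx;
    (void)insnsi;
    return ++toy_calls;
}
```

Before anything is attached the handler finds an empty array and does nothing; after toy_attach, each trigger calls the program's entry point once.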

c9665fb5babb57490092a96722fa97de.png

6. Summary

To summarize the whole process of injecting and executing a BPF program:

1. bpf_object__open_skeleton reads and initializes the BPF programs and BPF maps (via libbpf/libelf)

2. bpf_object__load_skeleton JIT-compiles the BPF functions (in cooperation with the kernel) and returns the bpf-prog fd to the application

3. bpf_object__attach_skeleton sets up the kprobe BRK exception and its handler for the target kernel function, injects the BPF program into call->prog_array, and returns the perf_event fd (tied to the probed function) and the bpf_link fd (tying the BPF program to the perf_event fd)

4. When the BRK exception is triggered, the BPF program is executed


Origin blog.csdn.net/feelabclihu/article/details/132632839