Linux performance optimization (four)-BCC performance monitoring tool

One, Introduction to BCC

1. Introduction to BCC

BCC (BPF Compiler Collection) is a toolkit that simplifies developing eBPF programs and includes a large collection of eBPF-based performance analysis tools. It provides front ends for BPF development in Python and Lua and takes care of map creation, code compilation, parsing, and loading into the kernel, so developers only need to write the kernel-side code in C.
Most tools in the BCC toolset require Linux kernel 4.1 or later, and full support for all tools requires Linux kernel 4.15 or later.
GitHub: https://github.com/iovisor/bcc
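A minimal sketch for checking a kernel release string against the two version thresholds above. The helper names are hypothetical, and distribution kernels may backport BPF features, so treat the result as a rough guide only.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def bcc_support_level(release):
    """Classify BCC tool support using the 4.1 / 4.15 thresholds from the text."""
    version = kernel_version(release)
    if version >= (4, 15):
        return "full"
    if version >= (4, 1):
        return "most tools"
    return "unsupported (unless the distribution backports BPF)"


if __name__ == "__main__":
    release = platform.release()
    print(release, "->", bcc_support_level(release))
```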

2. BCC installation

yum install bcc-tools
export PATH=$PATH:/usr/share/bcc/tools

Two, Commonly used BCC tools

1、opensnoop

opensnoop traces open() system calls and shows which processes are trying to open which files. It is useful for locating configuration or log files and for troubleshooting applications that fail to start.
opensnoop works by dynamically tracing the kernel sys_open() function, so it may need updating if that function changes. It requires Linux kernel 4.5 or later and, because it uses BPF, root privileges.
opensnoop [-h] [-T] [-U] [-x] [-p PID] [-t TID] [-u UID] [-d DURATION] [-n NAME] [-e] [-f FLAG_FILTER]
-h, --help: view help information
-T, --timestamp: include a timestamp column in the output
-U, --print-uid: print the UID column
-x, --failed: only show failed open() calls
-p PID, --pid PID: only trace the process with this PID
-t TID, --tid TID: only trace the thread with this TID
-u UID, --uid UID: only trace processes owned by this UID
-d DURATION, --duration DURATION: total tracing duration, in seconds
-n NAME, --name NAME: only print processes whose name contains NAME
-e, --extended_fields: show extended fields (such as the open flags)
-f FLAG_FILTER, --flag_filter FLAG_FILTER: only show opens whose flags include FLAG_FILTER, such as O_WRONLY
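opensnoop's output can be post-processed to find files an application failed to open. The sketch below assumes the default column layout `PID COMM FD ERR PATH` and that COMM contains no spaces; the sample lines are illustrative, not captured output. Running `opensnoop -x` already restricts the output to failed calls.

```python
def failed_opens(text):
    """Return (pid, comm, errno, path) for each line whose ERR column is non-zero.

    Assumes whitespace-separated columns PID COMM FD ERR PATH, with PATH last
    (so it may contain spaces) and a header line starting with 'PID'.
    """
    results = []
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0] == "PID":
            continue  # skip the header and malformed lines
        pid, comm, _fd, err, path = fields
        if int(err) != 0:
            results.append((int(pid), comm, int(err), path))
    return results


# Illustrative output only; errno 2 is ENOENT (no such file or directory).
SAMPLE = """\
PID    COMM               FD ERR PATH
1576   snmpd              11   0 /proc/sys/net/ipv6/conf/lo/forwarding
2201   myapp              -1   2 /etc/myapp/config.yaml
"""

for pid, comm, err, path in failed_opens(SAMPLE):
    print(pid, comm, err, path)
```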

2、execsnoop

execsnoop traces new processes by tracing exec() system calls. Processes that fork() but never call exec() are not shown.
Because execsnoop uses BPF, it requires root privileges.
execsnoop [-h] [-T] [-t] [-x] [-q] [-n NAME] [-l LINE] [--max-args MAX_ARGS]
-h: view help information
-T: include a time column (HH:MM:SS)
-t: include a timestamp column
-x: include failed exec() calls
-q: add quotation marks around arguments
-n NAME: only print commands whose name matches NAME (a regular expression)
-l LINE: only print commands whose arguments match LINE
--max-args MAX_ARGS: maximum number of arguments to parse and display (default 20)
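execsnoop is handy for spotting short-lived processes that sampling tools such as top can miss. To see which programs are launched most often, its output can be tallied. This sketch assumes the column layout `PCOMM PID PPID RET ARGS` (layouts differ between BCC versions) and takes the first word of ARGS as the program; the sample lines are made up.

```python
from collections import Counter


def count_programs(text, successful_only=True):
    """Count executed programs (first word of ARGS) in execsnoop output.

    Assumes whitespace-separated columns PCOMM PID PPID RET ARGS with a
    header line starting with 'PCOMM'.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0] == "PCOMM":
            continue  # skip the header and malformed lines
        ret, args = fields[3], fields[4]
        if successful_only and ret != "0":
            continue  # RET != 0 means exec() failed (only shown with -x)
        counts[args.split()[0]] += 1
    return counts


# Illustrative output only.
SAMPLE = """\
PCOMM            PID    PPID   RET ARGS
mkdir            9662   9660     0 /bin/mkdir -p ./main
sed              9664   9660     0 /bin/sed s/x/y/ file
mkdir            9665   9660     0 /bin/mkdir -p ./log
"""

print(count_programs(SAMPLE).most_common())
```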

3、biolatency

biolatency traces block device I/O, records the distribution of I/O latency, and prints it as a histogram. It works by dynamically tracing kernel blk_*() functions, so it may need updating if those functions change.
biolatency requires BPF support, so root privileges are required.
biolatency [-h] [-F] [-T] [-Q] [-m] [-D] [interval [count]]
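-h: print usage message
-T: include a timestamp in the output
-Q: include time spent in the OS queue in the I/O latency
-m: show the histogram in milliseconds instead of microseconds
-D: print a separate histogram for each disk device
-F: print a separate histogram for each set of I/O flags
interval: output interval, in seconds
count: number of outputs before exiting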
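biolatency builds its histogram in the kernel using power-of-two buckets, so only a few counters per bucket are copied to user space. The following pure-Python sketch illustrates log2 bucketing on a list of latencies in microseconds; it is not biolatency's actual code, and its bucket boundaries are a simplification.

```python
def log2_bucket(value):
    """Bucket index k: k=0 covers 0..1, k>=1 covers 2**k .. 2**(k+1)-1."""
    return max(value.bit_length() - 1, 0)


def log2_histogram(latencies_us):
    """Count how many latencies fall into each power-of-two bucket."""
    hist = {}
    for value in latencies_us:
        k = log2_bucket(value)
        hist[k] = hist.get(k, 0) + 1
    return hist


def render(hist, width=30):
    """Print rows of 'low -> high : count |stars|', scaled to the largest bucket."""
    if not hist:
        return
    peak = max(hist.values())
    for k in range(max(hist) + 1):
        low = 0 if k == 0 else 2 ** k
        high = 2 ** (k + 1) - 1
        count = hist.get(k, 0)
        stars = "*" * (count * width // peak)
        print("%10d -> %-10d : %-6d |%-*s|" % (low, high, count, width, stars))


# Made-up latencies in microseconds
render(log2_histogram([0, 1, 3, 120, 250, 260, 700, 710, 720]))
```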

4、ext4slower

ext4slower traces read, write, open, and sync operations on the ext4 file system, measures how long each operation takes, and prints details for operations slower than a threshold. The default threshold is 10 ms; a threshold of 0 prints every operation.
ext4slower requires BPF support, so root privileges are required.
Because it measures latency at the file system layer, ext4slower can help identify slow disk I/O as applications experience it.
ext4slower [-h] [-j] [-p PID] [min_ms]
-h, --help: view help information
-j, --csv: print fields in CSV format
-p PID, --pid PID: only trace the process with this PID
min_ms: latency threshold in milliseconds (default 10)
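With -j, ext4slower prints CSV, which is convenient for post-processing. The sketch below summarizes such output per operation type. It assumes the header contains TYPE and LATENCY_us columns (check the header your BCC version prints), and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text):
    """Return {TYPE: (count, max_latency_ms)} from ext4slower -j output.

    Assumes the CSV header includes TYPE and LATENCY_us columns.
    """
    stats = defaultdict(lambda: [0, 0.0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = stats[row["TYPE"]]
        entry[0] += 1
        entry[1] = max(entry[1], int(row["LATENCY_us"]) / 1000.0)  # us -> ms
    return {op: (count, worst) for op, (count, worst) in stats.items()}


# Illustrative CSV only.
SAMPLE = """ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
1000100,mysqld,4021,W,16384,0,12500,ibdata1
1000900,mysqld,4021,S,0,0,35200,ibdata1
1001500,mysqld,4021,W,16384,16384,10800,ibdata1
"""

for op, (count, worst_ms) in sorted(summarize(SAMPLE).items()):
    print(op, count, worst_ms)
```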

5、biosnoop

biosnoop traces block device I/O and prints one line of summary information for each I/O.
biosnoop works by dynamically tracing kernel blk_*() functions, so it may need updating if those functions change.
biosnoop requires BPF support, so root privileges are required.
biosnoop [-hQ]
-h: view help information
-Q: display time spent in the OS queue
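Because biosnoop prints one line per I/O, its output can be aggregated afterwards, for example to compare average latency across disks. This sketch locates columns by name in the header line, assuming DISK and LAT(ms) columns are present; the exact layout varies between BCC versions (-Q adds a queue-time column), and the sample lines are made up.

```python
from collections import defaultdict


def latency_by_disk(text):
    """Average LAT(ms) per DISK, locating columns by name in the header line."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    disk_col = header.index("DISK")
    lat_col = header.index("LAT(ms)")
    totals = defaultdict(lambda: [0.0, 0])  # disk -> [sum of latencies, count]
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        entry = totals[fields[disk_col]]
        entry[0] += float(fields[lat_col])
        entry[1] += 1
    return {disk: total / n for disk, (total, n) in totals.items()}


# Illustrative output only.
SAMPLE = """\
TIME(s)     COMM           PID    DISK    T SECTOR     BYTES  LAT(ms)
0.000000    kworker/0:1    120    sda     W 1040640    4096      0.50
0.001200    mysqld         4021   sdb     R 2095104    16384     8.00
0.002400    mysqld         4021   sdb     R 2095136    16384     4.00
"""

print(latency_by_disk(SAMPLE))
```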

6、cachestat

cachestat reports Linux page cache hit and miss statistics. It works by dynamically tracing kernel page cache functions, so it may need updating if those functions change.
cachestat requires BPF support, so root privileges are required.
cachestat [-h] [-T] [interval] [count]
-h: View help information
-T, --timestamp: output timestamp
interval: output interval, in seconds
count: output quantity
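cachestat prints one row per interval. When summarizing several rows, it is safer to compute the overall ratio from total hits and misses than to average the per-interval ratios, so that busy intervals carry more weight. This sketch defines the hit ratio as hits / (hits + misses) and uses made-up numbers.

```python
def hit_ratio(hits, misses):
    """Fraction of page cache accesses that hit; 0.0 when there were no accesses."""
    total = hits + misses
    return hits / total if total else 0.0


def overall_hit_ratio(samples):
    """Combine per-interval (hits, misses) rows by summing counts, not averaging ratios."""
    total_hits = sum(h for h, _ in samples)
    total_misses = sum(m for _, m in samples)
    return hit_ratio(total_hits, total_misses)


rows = [(900, 100), (10, 0)]  # illustrative (HITS, MISSES) rows
print(round(overall_hit_ratio(rows), 4))  # weighted by traffic
print(sum(hit_ratio(h, m) for h, m in rows) / len(rows))  # naive mean of ratios
```

The naive mean is pulled up by the nearly idle second interval, while the weighted ratio reflects where most accesses actually happened.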

7、cachetop

cachetop reports page cache hit and miss statistics for each process. Like cachestat, it works by dynamically tracing kernel page cache functions, so it may need updating if those functions change.
cachetop requires BPF support, so root privileges are required.
cachetop [-h] [interval]
-h: view help information
interval: output interval, in seconds
The output fields are:
PID: process ID
UID: user ID of the process
HITS: number of page cache hits
MISSES: number of page cache misses
DIRTIES: number of dirty pages added to the page cache
READ_HIT%: page cache read hit rate
WRITE_HIT%: page cache write hit rate
BUFFERS_MB: buffer size, from /proc/meminfo
CACHED_MB: current page cache size, from /proc/meminfo
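Using the PID, HITS and MISSES fields, one can flag the processes that miss the page cache most. In this sketch the thresholds (minimum access count and maximum hit ratio) are hypothetical choices, and the rows are made up.

```python
def high_miss_processes(rows, min_accesses=100, max_hit_ratio=0.8):
    """Return (pid, hit_ratio) pairs for processes with a poor page cache hit ratio.

    rows: iterable of (pid, hits, misses) tuples, e.g. from cachetop's PID,
    HITS and MISSES columns. Processes with fewer than min_accesses accesses
    are skipped so that nearly idle processes do not dominate the list.
    """
    flagged = []
    for pid, hits, misses in rows:
        total = hits + misses
        if total >= min_accesses and hits / total < max_hit_ratio:
            flagged.append((pid, hits / total))
    return sorted(flagged, key=lambda item: item[1])  # worst hit ratio first


rows = [(101, 950, 50), (202, 300, 700), (303, 5, 5)]  # illustrative values
print(high_miss_processes(rows))
```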

8、tcpconnect

tcpconnect traces active TCP connections (those initiated with connect()) and prints one line per connection. It works by dynamically tracing the kernel tcp_v4_connect() and tcp_v6_connect() functions, so it may need updating if those functions change.
tcpconnect requires BPF support, so root privileges are required.
tcpconnect [-h] [-c] [-t] [-x] [-p PID] [-P PORT]
-h: view help information
-t: print timestamp
-c: count the number of connections of each source IP and destination IP/port
-p PID: only track PID processes
-P PORT: list of destination ports to be tracked, separated by commas
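The -c option counts connections per source and destination. A similar tally can be computed offline from saved per-connection output. This sketch assumes the columns `PID COMM IP SADDR DADDR DPORT` and uses made-up sample lines.

```python
from collections import Counter


def count_connections(text):
    """Count (SADDR, DADDR, DPORT) tuples from tcpconnect's default output.

    Assumes the columns PID COMM IP SADDR DADDR DPORT with a header line.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "PID":
            continue  # skip the header and lines with an unexpected layout
        _pid, _comm, _ip, saddr, daddr, dport = fields
        counts[(saddr, daddr, int(dport))] += 1
    return counts


# Illustrative output only.
SAMPLE = """\
PID    COMM         IP SADDR            DADDR            DPORT
1479   curl         4  10.0.0.5         93.184.216.34    443
1480   curl         4  10.0.0.5         93.184.216.34    443
1512   psql         4  10.0.0.5         10.0.0.9         5432
"""

for (saddr, daddr, dport), n in count_connections(SAMPLE).most_common():
    print(saddr, daddr, dport, n)
```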

9、trace

trace attaches probes to functions and prints their arguments or return values. It requires BPF support, so root privileges are required.
trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE] [-S] [-s SYM_FILE_LIST] [-M MAX_EVENTS] [-t] [-u] [-T] [-C] [-K] [-U] [-a] [-I header] probe [probe ...]
-h: view help information
-b BUFFER_PAGES: number of pages used for the perf ring buffer
-p PID: only trace the process with this PID
-L TID: only trace the thread with this TID
-v: print the generated BPF program (for debugging)
-Z STRING_SIZE: maximum length of captured string arguments
-S: do not filter out trace's own process
-s SYM_FILE_LIST: comma-separated list of symbol files used for symbol resolution
-M MAX_EVENTS: maximum number of events to print before exiting
-t: print a timestamp column (offset from the start of tracing)
-u: print UNIX timestamps instead of offsets from the start of tracing (used with -t)
-T: print a time column
-C: print the CPU ID
-K: print the kernel stack of each event
-U: print the user stack of each event
-a: print virtual addresses in kernel and user stacks
-I header: include an additional header file in the BPF program
probe [probe ...]: one or more probes to attach
Examples:
trace '::do_sys_open "%s", arg2'
traces every call to do_sys_open(), the kernel function behind open(), and prints the file name (its second argument).
trace ':c:malloc "size = %d", arg1'
traces malloc() calls in libc (c) and prints the requested allocation size (its first argument).
trace 'u:pthread:pthread_create "start addr = %llx", arg3'
traces thread creation through the pthread_create USDT probe (u) in libpthread and prints the address of each new thread's start function (arg3).
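Each probe in these examples has the form `type:library:function`, optionally followed by a quoted format string and comma-separated arguments. An empty type, as in the first two examples, is assumed here to mean an entry probe (p), and an empty library means a kernel function. The hypothetical parser below splits such a spec into its parts to make the syntax explicit; it handles only this simplified form and is not trace's own parser.

```python
import re

FORMAT_RE = re.compile(r'^"([^"]*)"\s*(?:,\s*(.*))?$')


def parse_probe(spec):
    """Split a simplified probe such as ':c:malloc "size = %d", arg1'.

    Returns (probe_type, library, function, format_string, args).
    Supports only 'function' or 'type:library:function', followed by an
    optional quoted format string and comma-separated arguments.
    """
    probe, _, rest = spec.strip().partition(" ")
    parts = probe.split(":")
    if len(parts) == 1:
        probe_type, library, function = "p", "", parts[0]
    elif len(parts) == 3:
        probe_type, library, function = parts
        probe_type = probe_type or "p"  # assumed default: entry probe
    else:
        raise ValueError("unsupported probe: %r" % probe)
    fmt, args = "", []
    rest = rest.strip()
    if rest:
        match = FORMAT_RE.match(rest)
        if match is None:
            raise ValueError("unsupported format part: %r" % rest)
        fmt = match.group(1)
        if match.group(2):
            args = [a.strip() for a in match.group(2).split(",")]
    return probe_type, library, function, fmt, args


print(parse_probe(':c:malloc "size = %d", arg1'))
```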

10、deadlock

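deadlock finds potential deadlocks in a running process. It attaches uprobes to the mutex lock and unlock functions, builds a directed graph of the order in which mutexes are acquired, and reports a potential deadlock when that graph contains a cycle. Because it uses BPF, root privileges are required.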
deadlock [-h] [--binary BINARY] [--dump-graph DUMP_GRAPH] [--verbose] [--lock-symbols LOCK_SYMBOLS] [--unlock-symbols UNLOCK_SYMBOLS] pid
-h, --help: View help information
--binary BINARY: path to the pthread library; required when the target process runs a dynamically linked executable
--dump-graph DUMP_GRAPH: Export the mutex graph to the specified file
--verbose: Print mutex statistics
--lock-symbols LOCK_SYMBOLS: comma-separated list of lock functions to trace (default: pthread_mutex_lock)
--unlock-symbols UNLOCK_SYMBOLS: comma-separated list of unlock functions to trace (default: pthread_mutex_unlock)
pid: ID of the process to trace
deadlock 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0
finds potential deadlocks in process 181. If the process runs a dynamically linked executable, --binary must point to the pthread library it is linked against.
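The lock-order idea behind deadlock can be modeled in pure Python. This sketch works on a hypothetical list of (thread, operation, mutex) events instead of attaching uprobes, and it is not the tool's actual code: an edge held → acquired is recorded whenever a thread takes a mutex while holding another, and a cycle in that graph means two code paths take the same mutexes in opposite orders.

```python
from collections import defaultdict


def find_lock_order_cycle(events):
    """Return a list of mutexes forming a lock-order cycle, or None.

    events: iterable of (thread, op, mutex) with op in {"lock", "unlock"}.
    """
    held = defaultdict(list)   # thread -> mutexes currently held
    graph = defaultdict(set)   # mutex -> mutexes acquired while holding it
    for thread, op, mutex in events:
        if op == "lock":
            for other in held[thread]:
                graph[other].add(mutex)
            held[thread].append(mutex)
        elif op == "unlock" and mutex in held[thread]:
            held[thread].remove(mutex)

    visiting, done = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in graph[node]:
            if nxt in visiting:
                return path[path.index(nxt):]  # back edge closes a cycle
            if nxt not in done:
                cycle = dfs(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in list(graph):
        if start not in done:
            cycle = dfs(start, [])
            if cycle:
                return cycle
    return None


# Thread t1 takes A then B, thread t2 takes B then A: a potential deadlock.
events = [
    ("t1", "lock", "A"), ("t1", "lock", "B"), ("t1", "unlock", "B"), ("t1", "unlock", "A"),
    ("t2", "lock", "B"), ("t2", "lock", "A"), ("t2", "unlock", "A"), ("t2", "unlock", "B"),
]
print(find_lock_order_cycle(events))
```

Note that the two threads never actually deadlocked in this event sequence; the cycle only shows that they could under a different interleaving, which is what makes the approach useful.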

11、memleak

memleak traces memory allocation and free calls, matches each allocation with its release, and reports allocations that have not been freed. It requires Linux kernel 4.7 or later.
memleak [-h] [-p PID] [-t] [-a] [-o OLDER] [-c COMMAND] [--combined-only] [-s SAMPLE_RATE] [-T TOP] [-z MIN_SIZE] [-Z MAX_SIZE] [-O OBJ] [INTERVAL] [COUNT]
-h: View help information
-p PID: only trace the process with this PID
-t: trace every allocation and free call and print each one
-a: print the addresses and sizes of outstanding allocations
-z MIN_SIZE: only capture allocations of at least MIN_SIZE bytes
-Z MAX_SIZE: only capture allocations of at most MAX_SIZE bytes
memleak -z 16 -Z 32
captures and analyzes only allocations between 16 and 32 bytes.
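The core bookkeeping behind memleak can be sketched in a few lines: record each allocation by address, remove it when it is freed, and whatever remains is outstanding and therefore a leak candidate. This pure-Python sketch replays a hypothetical event list, applies a size window loosely mirroring -z/-Z, and omits the call stacks that memleak also records.

```python
def outstanding_allocations(events, min_size=None, max_size=None):
    """Replay ('alloc', addr, size) / ('free', addr) events and return live allocations.

    Allocations outside [min_size, max_size] are ignored.
    Returns {addr: size} for allocations that were never freed.
    """
    live = {}
    for event in events:
        if event[0] == "alloc":
            _, addr, size = event
            if min_size is not None and size < min_size:
                continue
            if max_size is not None and size > max_size:
                continue
            live[addr] = size
        elif event[0] == "free":
            live.pop(event[1], None)  # frees of untracked addresses are ignored
    return live


events = [
    ("alloc", 0x1000, 24),
    ("alloc", 0x2000, 4096),
    ("alloc", 0x3000, 16),
    ("free", 0x3000),
]
leaks = outstanding_allocations(events, min_size=16, max_size=32)
print({hex(addr): size for addr, size in leaks.items()})
```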

Three, Developing with BCC

1. How BCC works

BCC is an eBPF toolkit that provides a higher-level wrapper for extracting data with eBPF. A BCC tool is typically a Python program with an embedded BPF program written in C. The Python code gives users a friendly interface to eBPF and processes the collected data, while the BPF program is injected into the kernel to collect it. At run time, LLVM compiles the BPF program into an ELF file containing BPF instructions, the parts that can be loaded into the kernel are extracted from that ELF file, and bpf_load_program is used to inject them.
Loading through bpf_load_program goes through the kernel verifier, which performs a series of safety checks before the program is allowed to run, to protect the system as far as possible. The verified BPF bytecode is then compiled by the kernel JIT into native machine instructions and attached to the chosen kernel hook. Finally, kernel space and user space exchange data through efficient BPF maps, and the BCC tool processes that data in user space with Python.

2. BCC example implementation

The Python part imports the modules it needs and embeds the BPF program, written in C, as a string:
hello_world.py:

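#!/usr/bin/python3
from bcc import BPF

# BPF program in C: runs in the kernel each time sys_clone is called
bpf_program = '''
int kprobe__sys_clone(void *ctx)
{
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}'''

if __name__ == "__main__":
    # compile and load the program, then print messages from the trace pipe
    BPF(text=bpf_program).trace_print()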

kprobe__sys_clone is a shortcut for dynamic kernel tracing with kprobes: if a C function name starts with kprobe__, the rest of the name is treated as the kernel function to instrument.
bpf_trace_printk() writes a message to the kernel trace pipe, and trace_print() reads those messages and prints them.
Run the script as root:
python3 hello_world.py

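3. DDoS defense example

The following program attaches an XDP program to a network interface. When packets arrive faster than a threshold, it drops them and counts the drops per IP protocol number, which the Python loop prints every second.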

#!/usr/bin/python
from bcc import BPF
import time
import sys

flags = 0
def usage():
    print("Usage: {0} [-S] <ifdev>".format(sys.argv[0]))
    print("       -S: use skb mode\n")
    print("e.g.: {0} eth0\n".format(sys.argv[0]))
    sys.exit(1)

if len(sys.argv) < 2 or len(sys.argv) > 3:
    usage()

if len(sys.argv) == 2:
    device = sys.argv[1]

if len(sys.argv) == 3:
    if "-S" in sys.argv:
        # XDP_FLAGS_SKB_MODE
        flags |= 2 << 0

    if "-S" == sys.argv[1]:
        device = sys.argv[2]
    else:
        device = sys.argv[1]

mode = BPF.XDP
ctxtype = "xdp_md"

# load BPF program
b = BPF(text = """
#define KBUILD_MODNAME "foo"
#include <uapi/linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <linux/if_vlan.h>
#include <linux/ip.h>
#include <linux/ipv6.h>

// thresholds used to decide whether traffic looks like a flood
#define MAX_NB_PACKETS 1000
#define LEGAL_DIFF_TIMESTAMP_PACKETS 1000000  // ns, i.e. 1 ms between packets

// maps are accessible from both the kernel and user space
BPF_HASH(rcv_packets);
BPF_TABLE("percpu_array", uint32_t, long, dropcnt, 256);

static inline int parse_ipv4(void *data, u64 nh_off, void *data_end) {
    struct iphdr *iph = data + nh_off;
    if ((void*)&iph[1] > data_end)
        return 0;
    return iph->protocol;
}
static inline int parse_ipv6(void *data, u64 nh_off, void *data_end) {
    struct ipv6hdr *ip6h = data + nh_off;
    if ((void*)&ip6h[1] > data_end)
        return 0;
    return ip6h->nexthdr;
}

// determine ddos
static inline int detect_ddos(){
    // Used to count number of received packets
    u64 rcv_packets_nb_index = 0, rcv_packets_nb_inter=1, *rcv_packets_nb_ptr;
    // Used to measure elapsed time between 2 successive received packets
    u64 rcv_packets_ts_index = 1, rcv_packets_ts_inter=0, *rcv_packets_ts_ptr;
    int ret = 0;

    rcv_packets_nb_ptr = rcv_packets.lookup(&rcv_packets_nb_index);
    rcv_packets_ts_ptr = rcv_packets.lookup(&rcv_packets_ts_index);
    if(rcv_packets_nb_ptr != 0 && rcv_packets_ts_ptr != 0){
        rcv_packets_nb_inter = *rcv_packets_nb_ptr;
        rcv_packets_ts_inter = bpf_ktime_get_ns() - *rcv_packets_ts_ptr;
        if(rcv_packets_ts_inter < LEGAL_DIFF_TIMESTAMP_PACKETS){
            rcv_packets_nb_inter++;
        } else {
            rcv_packets_nb_inter = 0;
        }
        if(rcv_packets_nb_inter > MAX_NB_PACKETS){
            ret = 1;
        }
    }
    rcv_packets_ts_inter = bpf_ktime_get_ns();
    rcv_packets.update(&rcv_packets_nb_index, &rcv_packets_nb_inter);
    rcv_packets.update(&rcv_packets_ts_index, &rcv_packets_ts_inter);
    return ret;
}

// XDP entry point: drop packets during a flood and count drops per IP protocol
int xdp_prog1(struct CTXTYPE *ctx) {
    void* data_end = (void*)(long)ctx->data_end;
    void* data = (void*)(long)ctx->data;
    struct ethhdr *eth = data;
    // default verdict: let the packet pass (XDP_TX would bounce it back out)
    int rc = XDP_PASS;
    long *value;
    uint16_t h_proto;
    uint64_t nh_off = 0;
    uint32_t index;
    nh_off = sizeof(*eth);
    if (data + nh_off  > data_end)
        return rc;
    h_proto = eth->h_proto;
    // pass traffic unless detect_ddos() reports a flood
    if (detect_ddos() == 0){
        return rc;
    }
    rc = XDP_DROP;
    // parse up to two (stacked) VLAN headers to find the L3 protocol
    #pragma unroll
    for (int i=0; i<2; i++) {
        if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
            struct vlan_hdr *vhdr;
            vhdr = data + nh_off;
            nh_off += sizeof(struct vlan_hdr);
            if (data + nh_off > data_end)
                return rc;
            h_proto = vhdr->h_vlan_encapsulated_proto;
        }
    }
    if (h_proto == htons(ETH_P_IP))
        index = parse_ipv4(data, nh_off, data_end);
    else if (h_proto == htons(ETH_P_IPV6))
        index = parse_ipv6(data, nh_off, data_end);
    else
        index = 0;
    value = dropcnt.lookup(&index);
    if (value)
        *value += 1;
    return rc;
}
""", cflags=["-w", "-DCTXTYPE=%s" % ctxtype])

fn = b.load_func("xdp_prog1", mode)
b.attach_xdp(device, fn, flags)

dropcnt = b.get_table("dropcnt")
prev = [0] * 256
print("Printing drops per IP protocol-number, hit CTRL+C to stop")
while True:
    try:
        for k in dropcnt.keys():
            val = dropcnt.sum(k).value  # sum the per-CPU counters
            i = k.value
            if val:
                delta = val - prev[i]
                prev[i] = val
                print("{}: {} pkt/s".format(i, delta))
        time.sleep(1)
    except KeyboardInterrupt:
        print("Removing filter from device")
        break

b.remove_xdp(device, flags)
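To reason about the two thresholds without loading anything into the kernel, the counting logic of detect_ddos() can be modeled in plain Python. This is a user-space re-implementation for illustration only (the class and method names are hypothetical). It ignores concurrency: in the real program several CPUs update the shared BPF_HASH map without synchronization, so the kernel-side counts may be approximate.

```python
# Thresholds copied from the BPF program above.
MAX_NB_PACKETS = 1000
LEGAL_DIFF_TIMESTAMP_PACKETS = 1000000  # nanoseconds (1 ms)


class DdosDetector:
    """User-space model of detect_ddos(): key 0 holds the counter, key 1 the last timestamp."""

    def __init__(self):
        self.counter = None   # models rcv_packets[0]
        self.last_ts = None   # models rcv_packets[1]

    def on_packet(self, now_ns):
        """Return True if this packet would be dropped (detect_ddos() returns 1)."""
        counter = 1
        flood = False
        if self.counter is not None and self.last_ts is not None:
            counter = self.counter
            if now_ns - self.last_ts < LEGAL_DIFF_TIMESTAMP_PACKETS:
                counter += 1   # arrived within 1 ms of the previous packet
            else:
                counter = 0    # gap too long: reset the burst counter
            flood = counter > MAX_NB_PACKETS
        self.counter = counter
        self.last_ts = now_ns
        return flood


fast = DdosDetector()
drops = [fast.on_packet(i * 1000) for i in range(1001)]  # one packet every 1 us
print(drops.index(True))  # position of the first dropped packet
```

With these constants, dropping starts only after more than 1,000 consecutive packets that each arrive less than 1 ms after the previous one, and any gap of 1 ms or more resets the counter.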


Source: blog.51cto.com/9291927/2593705