Introduction to BPF Technology - Ecology and History

The pictures in this article all come from public information on the Internet; individual sources are not indicated. The content stems from development work on Network Detection and Response (NDR).

Historical background

Figure 1: The code size growth of the Linux kernel

What DPDK, PF_RING, and BPF have in common

Separating out the network data plane

  • The price of extending network functionality in the kernel is sacrificed performance
  • All three break through the limits of kernel complexity via kernel bypass
  • They separate the data plane from the kernel
  • Handing it over to user-mode development: DPDK, PF_RING, and BPF
  • Offloading it to the NIC for execution: BPF's XDP

The differences between DPDK, PF_RING, and BPF: different ecosystem goals

  • DPDK: an independent platform
    • Bypasses the kernel networking stack entirely
    • Requires its own dedicated SDK API
    • Hardware-oriented ecosystem
    • Kernel application isolation and security mechanisms no longer apply
  • PF_RING: an independent vendor
    • An independent Italian company
    • The high-performance version is licensed separately
    • An independent ecosystem

  • BPF: an integrated ecosystem
    • Grafted into the kernel
    • Integrates with existing hardware and software
    • A software-oriented ecosystem, developing rapidly

The DPDK and BPF foundations

Infrastructure-oriented: underlying hardware vs. upper-layer business

Figure 2: Two foundations with different interests

DPDK Annual Events and Technical Standards

Annual milestones

  • 2010, Intel: DPDK for first-generation Xeon
  • 2013, 6WIND: the DPDK.org community
  • 2014, OvS: OvS-DPDK distributed virtual switch
  • 2015: ARM version
  • 2017: joined the Linux Foundation
  • 2018, VMware: software-defined infrastructure in the data center
  • 2019: financial applications (high-frequency trading)

Technical standards

  • PCI passthrough: bypasses hypervisor emulation, letting the virtual machine access the PCI device directly
  • SR-IOV: bypasses the hypervisor stack by virtualizing a single NIC into multiple pass-through NICs
  • FD.io/VPP: vector-accelerated data I/O, reducing I-cache misses and read latency
  • vDPA: virtio data path acceleration; the control plane is emulated in software, the data plane implemented in hardware
  • Switchdev: offloads the kernel's switching function to hardware
  • Code contribution proportions (v21.11)

BPF's rapidly developing ecosystem

Technology milestones

  • 1992: the BSD Packet Filter paper
  • 1994: cBPF, implemented in libpcap for tcpdump
  • 2014: eBPF, a general-purpose virtual machine
  • 2015: BCC, development tools and libraries
  • 2016: XDP, the network kernel-bypass module
  • 2017: libbpf, developed independently of the kernel
  • 2017: ARM/MIPS, multi-platform BPF
  • 2018: BTF, a type format usable across kernel versions
  • 2019: tail calls and hot updates
  • 2020: LSM and TCP congestion control
  • 2021: the eBPF Foundation

Product ecosystem

  • 2017, Facebook: Katran load balancer in production
  • 2018, Cloudflare: DDoS mitigation/firewall in production, and more
  • 2018, Android 9: traffic monitoring, including the DHCP client
  • 2018, Netronome: Agilio SmartNIC supports XDP
  • 2018, DPDK: supports BPF (excluding maps and tail calls)
  • 2019, Google: KRSI, Kernel Runtime Security Instrumentation (kernel v5.7)
  • 2019, Sysdig: donated Falco for Kubernetes security (CNCF)
  • 2020, Nvidia: Mellanox SmartNICs support XDP (¥3,000~9,000)
  • 2020, Microsoft: Sysmon for Linux, eBPF for Windows
  • 2020, ByteDance: high-performance network ACL
  • 2020, Alibaba Cloud: acceleration and extension based on Cilium
  • 2021, Cilium: service mesh (sidecar-free)

The DPDK and BPF ecosystems

Different ecosystems, different developers

Product category | DPDK | BPF
Security | — | Falco / Cilium / L4Drop / ...
Observability | — | Hubble / L3AF / Tracee / ...
Network | DPVS / OVS / FD.io/VPP | Katran / ...
SDKs | C++ / Go | C++ / Rust / Go / Python
Kernel TCP stack | F-Stack / mTCP | helper API / Maps / Verifier & JIT
Market ecosystem | Focuses on virtualizing and sharing hardware capabilities; promoted by hardware vendors; the evolution of virtualization technologies such as VT-x/EPT applied to network I/O | Focuses on separating and reusing kernel capabilities; promoted by cloud vendors; the convergence of cloud-native technology evolution with the local ecosystem

Table 1: Two foundations with different interests

Introduction to BPF Technology - Performance Analysis

Similarities and differences between DPDK and PF_RING

Figure 3: Principle of PF_RING

 

Figure 4: DPDK module structure

Similarities

  • UIO + PMD: active polling reduces interrupts and CPU context switches
  • UIO + mmap: achieves zero copy
  • HugePages: reduce TLB misses (see the sketch after this list)
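
As a toy illustration of the HugePages point above (a minimal sketch only, not DPDK's actual allocator, which manages huge-page pools through its EAL), the following program maps one 2 MiB huge page; a single TLB entry then covers the whole buffer instead of 512 entries for 4 KiB pages:

/* hugepage_demo.c - minimal sketch; assumes huge pages are reserved first:
 *   echo 8 > /proc/sys/vm/nr_hugepages
 * Build: cc -o hugepage_demo hugepage_demo.c
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (2UL * 1024 * 1024)              /* one 2 MiB huge page */

int main(void)
{
    void *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }
    memset(buf, 0, LEN);                     /* touches 2 MiB via one TLB entry */
    munmap(buf, LEN);
    return 0;
}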

Differences

  • PF_RING comes in a standard version and a ZC (zero-copy) high-performance version
  • ZC charges a license fee; DPDK is free
  • ZC and DPDK performance is essentially the same
  • ZC's application-layer API is easy to use; development is far less difficult than with DPDK
  • ZC is a product of the Italian company ntop and has a small ecosystem

Network technical characteristics of BPF

Figure 5: BPF module structure

7 types of BPF mount points, covering the entire life cycle (a skeleton sketch follows the list)

  1. probe
  2. syscall
  3. sockmap/sockops
  4. kprobe
  5. cgroups
  6. tc
  7. xdp
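
A minimal skeleton of how a few of these mount points look in libbpf-style C (a sketch only; the section names follow libbpf conventions, and the kprobe target tcp_v4_connect is just an example):

/* hooks.bpf.c - skeleton; build: clang -O2 -g -target bpf -c hooks.bpf.c */
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/tcp_v4_connect")            /* kprobe attach point */
int on_connect(struct pt_regs *ctx)
{
    bpf_printk("tcp_v4_connect hit");
    return 0;
}

SEC("sockops")                          /* sockmap/sockops attach point */
int on_sockop(struct bpf_sock_ops *ops)
{
    return 0;                           /* no-op: just shows the hook type */
}

SEC("xdp")                              /* xdp attach point */
int on_packet(struct xdp_md *ctx)
{
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";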

Figure 6: Four packet processing methods of BPF's XDP program

Four packet processing methods of XDP (a minimal program follows the list)

  1. XDP_PASS: release the packet to the kernel
  2. XDP_DROP: discard, at lower cost than dropping in the kernel
  3. XDP_REDIRECT: forward elsewhere for further processing
  4. XDP_TX: send back out the same NIC (used for blocking and redirection)
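
A minimal XDP sketch exercising these verdicts (assuming a driver with native XDP support; dropping all UDP is only an arbitrary example policy):

/* xdp_verdicts.bpf.c - sketch; build: clang -O2 -g -target bpf -c xdp_verdicts.bpf.c */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_verdicts(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                /* malformed frame: let the kernel decide */

    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;
        if (ip->protocol == IPPROTO_UDP)
            return XDP_DROP;            /* discarded before the kernel stack */
        /* XDP_TX would send the (possibly rewritten) frame back out the same
         * NIC; XDP_REDIRECT (bpf_redirect/bpf_redirect_map) would hand it to
         * another NIC, CPU, or AF_XDP socket. */
    }
    return XDP_PASS;                    /* everything else goes up to the kernel */
}

char LICENSE[] SEC("license") = "GPL";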

XDP performance when offloaded to the NIC

Netronome SmartNIC data

  • XDP has 3 hook points: Offloaded (on the NIC), Native (in the driver, before the kernel), and Generic (inside the kernel)
  • Offloaded mode on 1 core is roughly a third faster than Native mode on 8 cores
  • Native mode is slightly slower than DPDK, but performance is comparable (per Stack Overflow and Red Hat discussions)

Figure 7: Performance of 3 mount modes of XDP

 Figure 8: The position of XDP in the network protocol stack

NAC pressure test record

Throughput in three modes; NAC running in a WSL2 virtual machine

XDP_DROP: drop directly

❯ tcpreplay -t -i lo t.pcap
Actual: 56320 packets (46419288 bytes) sent in 1.10 seconds
Rated: 42062496.6 Bps, 336.49 Mbps, 51033.95 pps
Flows: 1091 flows, 988.60 fps, 56290 flow packets, 30 non-flow

XDP_TX blocking, redirection

Actual: 56320 packets (46419288 bytes) sent in 1.30 seconds
Rated: 35446666.1 Bps, 283.57 Mbps, 43007.04 pps
Flows: 1091 flows, 833.10 fps, 56290 flow packets, 30 non-flow

XDP_TX + map_perf: block and report to the application

Actual: 56320 packets (46419288 bytes) sent in 1.49 seconds
Rated: 31016641.1 Bps, 248.13 Mbps, 37632.14 pps
Flows: 1091 flows, 728.98 fps, 56290 flow packets, 30 non-flow

For comparison, a competing product with 10-gigabit copper ports is rated at 1,000 transactions per second, 500 Mbps maximum throughput, and a maximum of 1,000 concurrent connections.

Introduction to BPF Technology - Case Study

Real-world case: Cloudflare mitigating three Mirai-botnet DDoS attacks

Figure 9: Traffic carried by the network during DDOS

  • 2020.07: 654 Gbps, SYN flood and UDP flood
  • 2021.08: 1.2 Tbps, SYN flood and UDP flood, peaking at 25 million HTTP requests/s; blocked within 3 seconds
  • 2021.11: nearly 2 Tbps, a one-minute DNS amplification attack plus UDP flood from roughly 15,000 bots

DDoS mitigation pipeline: integration across the full BPF product family

Figure 10: Toolchain using XDP when mitigating DDOS

  • Upon receiving samples and identifying an attack, mitigation policies are pushed automatically
  • Each server then executes the local mitigation policy:
  • L4Drop (XDP) for DDoS mitigation
  • Unimog (XDP) for load balancing
  • Magic Firewall (BPF xt_bpf) for firewalling
  • Distribution and monitoring:
  • UDP rate limiting (BPF SO_ATTACH_BPF; see the socket-filter sketch after this list)
  • ebpf_exporter for reporting metrics
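
Cloudflare attaches the rate limiter to sockets via SO_ATTACH_BPF (eBPF). The sketch below uses the older classic-BPF SO_ATTACH_FILTER form of the same socket-filter mechanism to randomly drop about a quarter of incoming datagrams; SKF_AD_RANDOM (Linux >= 3.15) loads a kernel random word, and the 3/4 pass rate is an arbitrary choice:

/* udp_ratelimit.c - sketch of a socket-attached cBPF filter.
 * Build: cc -o udp_ratelimit udp_ratelimit.c
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/filter.h>

int main(void)
{
    struct sock_filter code[] = {
        /* A = kernel random u32 (ancillary load) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_RANDOM),
        /* if (A & 3) != 0 fall through to accept, else jump to drop */
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, 3, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF),   /* accept whole packet */
        BPF_STMT(BPF_RET | BPF_K, 0),            /* drop */
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                             &prog, sizeof(prog)) < 0) {
        perror("socket/setsockopt");
        return 1;
    }
    /* recvfrom() on fd now sees only ~75% of incoming datagrams */
    close(fd);
    return 0;
}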

Figure 11: Protocol discovery technology using XDP when mitigating DDOS

The p0f fingerprint database describes characteristics of the TCP header

DDoS traffic fingerprinting

Fingerprint examples

Windows XP: 4:120+8:0:1452:65535,0   :mss,nop,nop,sok   :df,id+:0
Windows 7:  4:128:0:*     :8192,8    :mss,nop,ws,nop,sok:df,id+:0
Windows 11: 4:128+0:0:1460:mss*44,8  :mss,nop,ws,nop,sok:df,id+:0
ubuntu 14:  4:64   :0     :*:mss*10,6:mss,sok,ts,nop,ws :df,id+:0

TCP fingerprinting rests on the fact that Linux's default TTL is 64 while Windows' is 128, and that the number and position of flags differ between Linux and Windows. The field breakdown below annotates one signature; a simplified TTL check is sketched after it.

Reading the ubuntu signature field by field:

  4: IP version
  64: TTL
  0: IP options length
  *: MSS (maximum segment size)
  mss*10,6: TCP window size and scale
  mss,sok,ts,nop,ws: TCP options (sok = selective ACK permitted)
  df,id+: quirks (df = don't fragment)
  0: TCP payload length
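
A drastically simplified sketch of the TTL heuristic (real p0f matching also checks options, window size, and quirks; and since TTL decreases at every hop, "greater than 64" is only a rough guess at an initial value of 128):

/* ttl_guess.c - fragment sketching the TTL heuristic only */
#include <linux/ip.h>

static inline int looks_like_windows(const struct iphdr *ip)
{
    /* Linux's default initial TTL is 64, Windows' is 128; an observed
     * TTL in (64, 128] therefore most likely started out at 128. */
    return ip->ttl > 64;
}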

Traffic identification in load balancing

Facebook Katran: a custom ID added as a TCP option

Figure 12: The relationship between XDP at the L4 layer and other services during load balancing

  • Implements a stateless routing mechanism by adding a server_id to the TCP header (see the parsing sketch after this list)
  • The CPU/memory overhead of processing the TCP header is very small
  • Selects the backend from the server_id using a Maglev hash variant
  • Only applicable on an intranet without a firewall
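
A sketch of the idea of carrying a server_id in a TCP option, using the kind of bounded option walk a BPF verifier requires. The option kind 0xB6 and the 4-byte payload here are illustrative assumptions, not Katran's actual wire format:

/* server_id_opt.c - illustrative fragment, not Katran's real format */
#include <linux/tcp.h>
#include <linux/types.h>

#define OPT_KIND_SERVER_ID 0xB6            /* hypothetical option kind */

static inline int find_server_id(const struct tcphdr *tcp,
                                 const void *data_end, __u32 *server_id)
{
    const __u8 *opt = (const __u8 *)(tcp + 1);
    int i;

    for (i = 0; i < 10; i++) {             /* bounded loop for the verifier */
        if ((const void *)(opt + 2) > data_end)
            return -1;
        if (opt[0] == 0)                   /* EOL: end of option list */
            return -1;
        if (opt[0] == 1) {                 /* NOP: single-byte padding */
            opt++;
            continue;
        }
        if (opt[1] < 2)                    /* malformed option length */
            return -1;
        if (opt[0] == OPT_KIND_SERVER_ID && opt[1] == 6) {
            if ((const void *)(opt + 6) > data_end)
                return -1;
            __builtin_memcpy(server_id, opt + 2, sizeof(*server_id));
            return 0;                      /* 4-byte id follows kind + length */
        }
        opt += opt[1];                     /* skip to the next option */
    }
    return -1;
}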

Interception event points for reading and writing the server_id

Figure 13: Events used by XDP in L4 layer load balancing

Observability and debugging

Cilium pwru (packet, where are you?)

Traces the kernel function calls a network packet passes through

Introduction to BPF Technology - Technical Analysis

BPF development and operation mechanism

Figure 14: BPF life cycle process (XDP and TC of the network)

1. Write and compile
2. Verify
3. JIT: BPF bytecode to machine code
4. Mount and execute
5. XDP verdict: DROP/PASS/TX/REDIRECT
6. Communicate with the system

A minimal loader sketch of this life cycle follows.
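
The sketch below walks that life cycle from user space with libbpf (assumes libbpf >= 1.0 error conventions; the object file xdp_prog.bpf.o, program name xdp_verdicts, and device eth0 are placeholders):

/* loader.c - libbpf life-cycle sketch; link with -lbpf */
#include <stdio.h>
#include <unistd.h>
#include <net/if.h>
#include <bpf/libbpf.h>

int main(void)
{
    /* 1. the program was written in C and compiled to BPF bytecode offline */
    struct bpf_object *obj = bpf_object__open_file("xdp_prog.bpf.o", NULL);
    if (!obj)
        return 1;

    /* 2-3. load: the kernel verifies the bytecode, then JITs it */
    if (bpf_object__load(obj))
        return 1;

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "xdp_verdicts");
    int ifindex = if_nametoindex("eth0");
    if (!prog || !ifindex)
        return 1;

    /* 4. attach: mount the program on the NIC's XDP hook */
    struct bpf_link *link = bpf_program__attach_xdp(prog, ifindex);
    if (!link)
        return 1;

    /* 5-6. verdicts now run in the kernel per packet; maps (not shown)
     * would be the channel for communicating with user space */
    pause();
    return 0;
}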

cBPF virtual machine principle

Figure 15: How the compiler decomposes filter code

Figure 16: BPF converted to DAG

BPF is based on a register virtual machine, and a filter program is ultimately converted into a DAG. The kernel compiles and decomposes filter expressions in one of two models, as shown in the figures above. Model 1 (cBPF): the biggest performance constraint, because it contains repeated computation. Model 2 (eBPF): computes the same result as Model 1 while traversing the entire tree with only 6 boolean operations.

cBPF virtual machine execution process

Figure 17: BPF internal execution process

Figure 18: BPF instruction code explanation

$ tcpdump tcp port 443 -d
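
The -d flag makes tcpdump print the cBPF instructions that libpcap compiled from the filter expression. The same listing can be produced programmatically; a sketch using libpcap's pcap_compile and bpf_dump:

/* filter_dump.c - dump the cBPF program for "tcp port 443"; link with -lpcap */
#include <stdio.h>
#include <pcap/pcap.h>

int main(void)
{
    /* a dead handle: we only need the compiler, not a live capture */
    pcap_t *p = pcap_open_dead(DLT_EN10MB, 65535);
    struct bpf_program fp;

    if (pcap_compile(p, &fp, "tcp port 443", 1, PCAP_NETMASK_UNKNOWN) < 0) {
        fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
        return 1;
    }
    bpf_dump(&fp, 1);          /* option 1: the multi-line tcpdump -d format */
    pcap_freecode(&fp);
    pcap_close(p);
    return 0;
}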

Existing issues with BPF

Kernel version and security

  • Kernel support requires Linux > v4.18
  • XDP and its map support
  • BTF for running across different kernel versions
  • Security: privilege-escalation vulnerabilities in Linux v5.7~v5.8
  • CVE-2020-27194: incorrect register bounds tracking in the verifier, allowing out-of-bounds reads and writes
  • CVE-2020-8835: a logic error in the value-range tracking of constant variables during verification
  • Container escape: writing a rootkit, modifying user space, hooking network data


Source: blog.csdn.net/zmule/article/details/126549626