Linux tc traffic control (1): classless qdisc

what is tc

The full name of tc is traffic control; it is the tool in the iproute2 package for controlling traffic in the kernel. The kernel's network stack has a dedicated point for traffic processing (after XDP, before netfilter) where tc reads network packets (which are already sk_buffs at this stage) and controls, dispatches, or drops them. Note that tc can handle both outgoing packets (egress) and incoming packets (ingress), but its ingress functionality is more limited. This article does not cover ingress.

Core concepts: qdisc

The first term we run into when using tc is qdisc. We actually see it all the time, namely every time we execute the ip command:

root@ubuntu:~# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:58:48:55 brd ff:ff:ff:ff:ff:ff
    altname enp2s1

Here we can see that every network device has a qdisc field after its mtu, followed by a name: the qdisc of lo above is noqueue, and the qdisc of ens33 is fq_codel.

qdisc is short for queuing discipline. We can think of it as a queue governed by certain rules: when tc processes packets, it enqueues them into the qdisc, and the kernel then dequeues them in an order determined by the qdisc's rules. tc has many different qdiscs built in, and some of them take parameters, as shown by the fq_codel parameters on ens33:

root@ubuntu:~# tc qdisc show dev ens33
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 

Having noticed that the qdisc on lo is noqueue, can we delete the qdisc of ens33? Let's try:

root@ubuntu:~# tc qdisc del dev ens33 root
Error: Cannot delete qdisc with handle of zero.

Ignore root in the command for now. The delete command failed. An article on the Internet concluded that for this kind of non-virtual network device (it is a virtual machine's NIC, but not a virtual device), the qdisc can neither be deleted nor set to noqueue; so to change its qdisc we cannot delete it first and then add a new one, but we can change it with add or replace. The article also claimed that add and replace have the same effect; I disagree, as the experiments below show.

First, try adding a qdisc of a different type:

root@ubuntu:~# tc qdisc add dev ens33 root pfifo
root@ubuntu:~# tc qdisc show dev ens33
qdisc pfifo 8005: root refcnt 2 limit 1000p

The addition succeeded. Now try adding yet another type of qdisc on top:

root@ubuntu:~# tc qdisc add dev ens33 root pfifo_fast
Error: Exclusivity flag on, cannot modify.
root@ubuntu:~# tc qdisc show dev ens33
qdisc pfifo 8005: root refcnt 2 limit 1000p

This reports an error, and the original qdisc remains in effect. Now try deleting with del:

root@ubuntu:~# tc qdisc del dev ens33 root
root@ubuntu:~# tc qdisc show dev ens33
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 

It changed back to fq_codel. If we use replace instead and then delete, the result looks like this:

root@ubuntu:~# tc qdisc replace dev ens33 root pfifo_fast
root@ubuntu:~# tc qdisc show dev ens33
qdisc pfifo_fast 8006: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
root@ubuntu:~# tc qdisc replace dev ens33 root pfifo
root@ubuntu:~# tc qdisc show dev ens33
qdisc pfifo 8007: root refcnt 2 limit 1000p
root@ubuntu:~# tc qdisc del dev ens33 root
root@ubuntu:~# tc qdisc del dev ens33 root
Error: Cannot delete qdisc with handle of zero.
root@ubuntu:~# tc qdisc show dev ens33
qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 

So we can speculate: fq_codel is the default qdisc for non-virtual devices. Initially the device has no qdisc explicitly set, which is why deleting fails; but once we set one (via add or replace), deleting succeeds, and the qdisc reverts to the default. Note that lo's qdisc is noqueue, and devices we create manually with the ip link command also get noqueue, so we can further speculate that the default for virtual devices is noqueue.
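This speculation can be cross-checked on kernels that expose it: the system-wide default qdisc for physical interfaces is a sysctl, and many recent distributions set it to fq_codel (the output shown in the comment is what I would expect here, not captured from this machine):

sysctl net.core.default_qdisc    # e.g. prints: net.core.default_qdisc = fq_codel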

The commands above include a root parameter; it denotes the root node for processing egress traffic. As the commands show, only one qdisc can be set at a device's root. Running tc qdisc show displays the parameters of the current qdisc. Some qdiscs take no parameters and some do; if parameters are needed, they are appended to the add command:

tc qdisc add dev <dev> root <qdisc> <qdisc-param>
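For example, a hypothetical command that installs a pfifo holding at most 100 packets on ens33 (the limit value here is arbitrary):

tc qdisc add dev ens33 root pfifo limit 100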

There are two kinds of qdisc. One kind is relatively simple: as shown above, you set some parameters and it takes effect at the device's (egress) root. These are called classless qdiscs. The other kind is more complex: they contain internal components called classes and can hand packets on to further qdiscs, so packets flow through a tree-like structure. These are called classful qdiscs. This article only covers the simpler classless qdiscs.

some classless qdiscs

tc has the following classless qdiscs built in:

  • choke
  • codel
  • bfifo, pfifo
  • fq
  • fq_codel
  • gred
  • hhf
  • ingress
  • mqprio
  • multiq
  • netem
  • pfifo_fast
  • pie
  • red
  • rr
  • sfb
  • sfq
  • tbf

Let's look at a few of these qdiscs in more detail.

tbf

The full name of tbf is token bucket filter. As the name suggests, tbf involves two concepts: tokens and a bucket. It works as follows.

Each packet to be sent must obtain tokens in proportion to its size; packets of every size consume tokens. If there are not enough tokens for a packet, it is placed in the queue to wait for tokens to be replenished, and packets arriving after it queue up behind it; once the queue is full, further packets are dropped. So by limiting the capacity of the bucket or the replenishment rate of tokens, we can limit the overall rate at which packets are sent (for example, with rate 10mbit, roughly 1.25 MB of packet data can leave per second once the initial burst is used up).

tbf takes the following parameters; a usage sketch follows the list:

  • limit: how many bytes of packets the queue can hold while they wait for tokens. Alternatively, a similar waiting limit can be specified as a time via the latency parameter
  • burst: the capacity of the bucket, in bytes
  • rate: the rate at which tokens are added
  • mpu: the minimum number of tokens any packet consumes (minimum packet unit)
  • peakrate: the maximum rate at which the bucket may be drained; tbf implements this with a second, smaller bucket
  • mtu/minburst: the capacity of the second bucket
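Putting these together, a sketch of a tbf setup modeled on the example in the tc-tbf man page (all values arbitrary):

# 0.5 Mbit/s sustained rate, 5 KB bucket, at most 70 ms of queueing delay,
# peaks clipped to 1 Mbit/s via a second bucket of one 1540-byte packet
tc qdisc add dev ens33 root tbf rate 0.5mbit burst 5kb latency 70ms peakrate 1mbit minburst 1540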

Testing the tbf rate limit

Install iperf3, start the server, and compare throughput on lo first with its default noqueue and then with a tbf limit:

# run iperf3 -s in another terminal first

root@ubuntu:~# ip l show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
root@ubuntu:~# iperf3 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 56294 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.49 GBytes  38.6 Gbits/sec    0   3.25 MBytes       
[  5]   1.00-2.00   sec  4.47 GBytes  38.4 Gbits/sec    0   3.25 MBytes       
[  5]   2.00-3.00   sec  4.41 GBytes  37.9 Gbits/sec    0   3.25 MBytes       
[  5]   3.00-4.00   sec  4.39 GBytes  37.8 Gbits/sec    0   3.25 MBytes       
[  5]   4.00-5.00   sec  4.63 GBytes  39.8 Gbits/sec    0   3.25 MBytes       
[  5]   5.00-6.00   sec  4.72 GBytes  40.5 Gbits/sec    0   3.25 MBytes       
[  5]   6.00-7.00   sec  4.58 GBytes  39.4 Gbits/sec    0   3.25 MBytes       
[  5]   7.00-8.00   sec  4.59 GBytes  39.4 Gbits/sec    0   3.25 MBytes       
[  5]   8.00-9.00   sec  4.61 GBytes  39.6 Gbits/sec    0   3.25 MBytes       
[  5]   9.00-10.00  sec  4.66 GBytes  40.0 Gbits/sec    0   3.25 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  45.6 GBytes  39.1 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  45.6 GBytes  39.1 Gbits/sec                  receiver

iperf Done.
root@ubuntu:~# tc qdisc add dev lo  root tbf  limit 100mb rate 10mbit burst 5mb
root@ubuntu:~# tc qdisc show dev lo
qdisc tbf 800a: root refcnt 2 rate 10Mbit burst 5Mb lat 79.7s 
root@ubuntu:~# iperf3 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 56304 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  8.75 MBytes  73.3 Mbits/sec    1   1.37 MBytes       
[  5]   1.00-2.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   2.00-3.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   3.00-4.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   4.00-5.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   5.00-6.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   6.00-7.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   7.00-8.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   8.00-9.00   sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
[  5]   9.00-10.00  sec  1.25 MBytes  10.5 Mbits/sec    0   1.37 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  20.0 MBytes  16.8 Mbits/sec    1             sender
[  5]   0.00-10.09  sec  16.9 MBytes  14.0 Mbits/sec                  receiver

iperf Done.

The throughput drops sharply, to roughly the configured 10 Mbit/s (the first interval is higher because the bucket starts out full, so the 5 MB burst can be sent immediately).

More curious is that tc's man page lists tbf as a classless qdisc, while the tc-tbf man page calls it classful. Trying to add a class to tbf does report an error, but note that the error below is "File exists" rather than a complaint that tbf cannot contain classes. On recent kernels tbf is in fact nominally classful: it owns exactly one implicit class, so you cannot create classes under it yourself; you can only attach a child qdisc to the class it already has (see the sketch after the listing).

root@ubuntu:~# tc qdisc add dev lo root handle 1:0 tbf limit 100M rate 100Mbit burst 100M
root@ubuntu:~# tc qdisc show dev lo
qdisc tbf 1: root refcnt 2 rate 100Mbit burst 100Mb lat 0us 
qdisc ingress ffff: parent ffff:fff1 ---------------- 
root@ubuntu:~# tc class add dev lo parent 1: classid 1:1 htb rate 100Mbit
RTNETLINK answers: File exists
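Since "File exists" suggests tbf's single class already exists, the way to use it, on kernels where tbf is classful, should be to attach a child qdisc to the implicit class 1:1. A sketch, assuming the tbf with handle 1: from above is still installed (not verified in the original article):

# hang a pfifo under tbf's implicit class 1:1
tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 100
# a classful tbf should then list something like: class tbf 1:1
tc class show dev lo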

netem

netem is short for network emulator; it can simulate network characteristics such as delay and packet loss rate. Again using the loopback interface for testing (first deleting the tbf left over from the previous test), we manually add a 100 ms delay (ping's RTT shows about 200 ms, because both the ICMP request and the reply pass through lo's egress), and then add a packet loss rate:

root@ubuntu:~# tc qdisc del dev lo root
root@ubuntu:~# ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.052 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.040 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2026ms
rtt min/avg/max/mdev = 0.034/0.042/0.052/0.007 ms
root@ubuntu:~# tc qdisc add dev lo root netem delay 100ms
root@ubuntu:~# ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=200 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=200 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=201 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 200.426/200.586/200.855/0.191 ms

root@ubuntu:~# tc qdisc del dev lo root

root@ubuntu:~# iperf3 -c 127.0.0.1 -u -i 10 -b 100M
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 58834 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  3815  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  0.000 ms  0/3815 (0%)  sender
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  0.002 ms  0/3815 (0%)  receiver

iperf Done.
root@ubuntu:~# tc qdisc add dev lo root netem loss 5%
root@ubuntu:~# iperf3 -c 127.0.0.1 -u -i 10 -b 100M
Connecting to host 127.0.0.1, port 5201
iperf3: error - unable to read from stream socket: Resource temporarily unavailable

With 5% loss the test fails outright (presumably even iperf3's TCP control connection cannot get through reliably), so reduce the loss rate and try again:

root@ubuntu:~# tc qdisc change dev lo root netem loss 2%
root@ubuntu:~# iperf3 -c 127.0.0.1 -u -i 10 -b 100M
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 45366 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  3815  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec   119 MBytes   100 Mbits/sec  0.000 ms  0/3815 (0%)  sender
[  5]   0.00-10.00  sec   117 MBytes  97.9 Mbits/sec  0.002 ms  79/3815 (2.1%)  receiver

iperf Done.
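netem can also combine several impairments in one command, and supports jitter, reordering, duplication, and more. A sketch not run in the tests above (syntax per the tc-netem man page):

# 100ms +/- 20ms of normally-distributed delay, plus 1% random loss
tc qdisc add dev lo root netem delay 100ms 20ms distribution normal loss 1%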

bfifo, pfifo

These two are probably the simplest qdiscs: packets are processed strictly first-in, first-out. The only difference is the unit of queue capacity: bfifo measures its limit in bytes of packet data, while pfifo measures it in number of packets.
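For example (values arbitrary, ens33 as before):

# cap the queue at 64 kilobytes of packet data
tc qdisc add dev ens33 root bfifo limit 64kb
# or cap it at 1000 packets
tc qdisc replace dev ens33 root pfifo limit 1000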

pfifo_fast

Many articles say that pfifo_fast is the default qdisc, but my virtual machine uses fq_codel; newer kernels have presumably changed the default. pfifo_fast can be seen as an enhanced fifo: it internally maintains three queues of different priorities (bands 0, 1, 2) and processes packets strictly by priority (0 highest, 2 lowest), so only after the highest-priority queue is empty does it move on to the next one.

Which queue a packet enters depends on the TOS (type of service) field in the IP header, which is laid out as follows:

bit:    7 6 5            4              3                2                 1             0
field:  IP Precedence    TOS(lowdelay)  TOS(throughput)  TOS(reliability)  TOS(lowcost)  MBZ(must be zero)
  • bit 0 (lowest): MBZ, must be zero
  • bits 1-4: the TOS field; any combination of these bits may be 0 or 1. When exactly one bit is set, the meanings are:
    • 1000: minimize delay (md)
    • 0100: maximize throughput (mt)
    • 0010: maximize reliability (mr)
    • 0001: minimize monetary cost (mmc)
  • bits 5-7: IP precedence; the higher this value, the higher the priority with which the packet is handled, 0 meaning normal handling

The TOS field in bits 1-4 determines which band a packet enters. With IP precedence set to 0, the values of the whole 8-bit TOS byte map to bands as follows:

band 0                     band 1            band 2
0x10  md                   0x0               0x8   mt
0x12  mmc + md             0x2   mmc         0xa   mmc + mt
0x14  mr + md              0x4   mr          0xc   mr + mt
0x16  mmc + mr + md        0x6   mmc + mr    0xe   mmc + mr + mt
0x18  mt + md
0x1a  mmc + mt + md
0x1c  mr + mt + md
0x1e  mmc + mr + mt + md

More details can be found in the tc-prio man page.
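To try the mapping out, packets can be marked with a chosen TOS value; for instance, ping's -Q flag sets the QoS/TOS byte. A sketch (192.0.2.1 is a placeholder address):

# install pfifo_fast explicitly, then send pings marked "minimize delay"
# (TOS 0x10), which the table above puts in band 0
tc qdisc replace dev ens33 root pfifo_fast
ping -Q 0x10 -c 3 192.0.2.1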

Summary

With tc we can configure a kind of queuing rule called a classless qdisc on a network device. At the device's tc processing point these rules act on outgoing packets according to different algorithms: they can limit the sending rate, change the sending order, drop packets, and so on. Some classless qdiscs also take parameters, so they can be tuned to the desired behavior. However, a classless qdisc cannot classify packets and therefore cannot exert finer-grained control. A later article will introduce classful qdiscs, which make more functionality possible.

Origin: blog.csdn.net/buhuidage/article/details/128331608