[Kubernetes / k8s source code analysis] CNI bandwidth plugin source code analysis

Disclaimer: This is an original post by the blogger, published under the CC 4.0 BY-SA license. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/zhonglinzhang/article/details/98053900

github: https://github.com/containernetworking/plugins

      The bandwidth plugin limits the ingress/egress bandwidth of Pods in a Kubernetes cluster, using a Token Bucket Filter (TBF) qdisc to shape traffic.

 

1. cmdAdd function

     The plugin configuration is read from standard input; a typical configuration contains the following fields (a sample stdin payload is shown after the list):

  • CNIVersion: "0.3.0"
  • Name: "k8s-pod-network"
  • Type: "bandwidth", Capabilities: map[string]bool{"bandwidth": true}
func cmdAdd(args *skel.CmdArgs) error {
	conf, err := parseConfig(args.StdinData)
	if err != nil {
		return err
	}

    1.1 PluginConf structure

type PluginConf struct {
   types.NetConf

   RuntimeConfig struct {
      Bandwidth *BandwidthEntry `json:"bandwidth,omitempty"`
   } `json:"runtimeConfig,omitempty"`

   *BandwidthEntry
}
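
The bandwidth limits can come from two places: the configuration itself (the embedded *BandwidthEntry) or runtimeConfig.bandwidth, which the runtime fills in from the Pod's bandwidth annotations. A minimal sketch of how the two sources might be merged (the helper name and the precedence are assumptions based on this struct, not a quote of the plugin source):

// getBandwidthSketch returns the effective bandwidth entry: the value from
// runtimeConfig wins when present, otherwise the entry embedded directly in
// the plugin configuration is used (it may be nil if no limits were set).
func getBandwidthSketch(conf *PluginConf) *BandwidthEntry {
	if conf.RuntimeConfig.Bandwidth != nil {
		return conf.RuntimeConfig.Bandwidth
	}
	return conf.BandwidthEntry
}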

    1.2 getHostInterface function

      An interface whose Sandbox field is empty is a host-side interface; the function returns the first such interface that has a valid veth peer (verified with ip.GetVethPeerIfindex).

func getHostInterface(interfaces []*current.Interface) (*current.Interface, error) {
	if len(interfaces) == 0 {
		return nil, errors.New("no interfaces provided")
	}

	var err error
	for _, iface := range interfaces {
		if iface.Sandbox == "" { // host interface
			_, _, err = ip.GetVethPeerIfindex(iface.Name)
			if err == nil {
				return iface, err
			}
		}
	}
	return nil, errors.New(fmt.Sprintf("no host interface found. last error: %s", err))
}

    1.3 If the ingress rate and burst are both greater than 0, create the ingress TBF qdisc

qdisc tbf 1: dev calif779b875532 root refcnt 2 rate 50000bit burst 1073740b lat 4123.2s 
qdisc ingress ffff: dev calif779b875532 parent ffff:fff1 ---------------- 
qdisc tbf 1: dev c9fe root refcnt 2 rate 80000bit burst 2491080b lat 824.7s 

if bandwidth.IngressRate > 0 && bandwidth.IngressBurst > 0 {
	err = CreateIngressQdisc(bandwidth.IngressRate, bandwidth.IngressBurst, hostInterface.Name)
	if err != nil {
		return err
	}
}
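
CreateIngressQdisc only needs to resolve the host-side interface by name and hand its link index to createTBF. A sketch consistent with the call above (parameter types follow the createTBF signature shown below; the upstream source may differ in details):

func CreateIngressQdisc(rateInBits, burstInBits int, hostDeviceName string) error {
	// Look up the host-side veth by name, then attach the TBF qdisc to it.
	hostDevice, err := netlink.LinkByName(hostDeviceName)
	if err != nil {
		return fmt.Errorf("get host device: %s", err)
	}
	return createTBF(rateInBits, burstInBits, hostDevice.Attrs().Index)
}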

     1.3.1 createTBF function

func createTBF(rateInBits, burstInBits, linkIndex int) error {
	// Equivalent to
	// tc qdisc add dev link root tbf
	//		rate netConf.BandwidthLimits.Rate
	//		burst netConf.BandwidthLimits.Burst
	if rateInBits <= 0 {
		return fmt.Errorf("invalid rate: %d", rateInBits)
	}
	if burstInBits <= 0 {
		return fmt.Errorf("invalid burst: %d", burstInBits)
	}

    1.3.2 Call netlink.QdiscAdd to create the token bucket filter (tbf) qdisc

       The tbf qdisc is built as follows:

rateInBytes := rateInBits / 8
burstInBytes := burstInBits / 8
bufferInBytes := buffer(uint64(rateInBytes), uint32(burstInBytes))
latency := latencyInUsec(latencyInMillis)
limitInBytes := limit(uint64(rateInBytes), latency, uint32(burstInBytes))

qdisc := &netlink.Tbf{
	QdiscAttrs: netlink.QdiscAttrs{
		LinkIndex: linkIndex,
		Handle:    netlink.MakeHandle(1, 0),
		Parent:    netlink.HANDLE_ROOT,
	},
	Limit:  uint32(limitInBytes),
	Rate:   uint64(rateInBytes),
	Buffer: uint32(bufferInBytes),
}
err := netlink.QdiscAdd(qdisc)
if err != nil {
	return fmt.Errorf("create qdisc: %s", err)
}
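
The buffer, latencyInUsec, and limit helpers used above convert the configured rate and burst into tbf parameters. The sketches below show roughly what they compute, using the standard tbf formulas and the vishvananda/netlink tick helpers; the default latency value of 25 ms is an assumption, and the upstream implementation may differ in details.

// latencyInMillis is the assumed default queueing latency for the tbf qdisc.
const latencyInMillis = 25

// buffer: the time (in scheduler ticks) needed to send `burst` bytes at `rate` bytes/s.
func buffer(rate uint64, burst uint32) uint32 {
	return time2Tick(uint32(float64(burst) * float64(netlink.TIME_UNITS_PER_SEC) / float64(rate)))
}

// limit: how many bytes may wait in the queue for tokens: `rate * latency` worth of
// data plus the burst (the caller passes burstInBytes as the third argument, as above).
func limit(rate uint64, latency float64, buffer uint32) uint32 {
	return uint32(float64(rate)*latency/float64(netlink.TIME_UNITS_PER_SEC)) + buffer
}

// latencyInUsec converts milliseconds into the time units used by tc.
func latencyInUsec(latencyInMillis float64) float64 {
	return float64(netlink.TIME_UNITS_PER_SEC) * (latencyInMillis / 1000.0)
}

// time2Tick converts time units into kernel scheduler ticks.
func time2Tick(time uint32) uint32 {
	return uint32(float64(time) * float64(netlink.TickInUsec()))
}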

    1.3.3 Create an interface of type ifb (used for the egress direction)

func CreateIfb(ifbDeviceName string, mtu int) error {
	err := netlink.LinkAdd(&netlink.Ifb{
		LinkAttrs: netlink.LinkAttrs{
			Name:  ifbDeviceName,
			Flags: net.FlagUp,
			MTU:   mtu,
		},
	})

	if err != nil {
		return fmt.Errorf("adding link: %s", err)
	}

	return nil
}
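
The ifb device is needed because tbf can only shape traffic leaving an interface: traffic leaving the Pod shows up as ingress on the host-side veth, so it is redirected to the ifb device and shaped there. The following is a rough sketch of that wiring with vishvananda/netlink (filter details simplified, helper name invented; it is not a verbatim copy of the plugin's egress code and assumes the plugin's usual imports: fmt, syscall, github.com/vishvananda/netlink).

// createEgressSketch redirects traffic received on the host-side veth to the
// ifb device and attaches the TBF qdisc there.
func createEgressSketch(egressRate, egressBurst int, hostDevice, ifbDevice netlink.Link) error {
	// 1. Attach an ingress qdisc (handle ffff:) to the host-side veth.
	ingress := &netlink.Ingress{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: hostDevice.Attrs().Index,
			Handle:    netlink.MakeHandle(0xffff, 0),
			Parent:    netlink.HANDLE_INGRESS,
		},
	}
	if err := netlink.QdiscAdd(ingress); err != nil {
		return fmt.Errorf("create ingress qdisc: %s", err)
	}

	// 2. Add a u32 match-all filter whose mirred action redirects packets to the ifb.
	filter := &netlink.U32{
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: hostDevice.Attrs().Index,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Priority:  1,
			Protocol:  syscall.ETH_P_ALL,
		},
		Actions: []netlink.Action{netlink.NewMirredAction(ifbDevice.Attrs().Index)},
	}
	if err := netlink.FilterAdd(filter); err != nil {
		return fmt.Errorf("add egress redirect filter: %s", err)
	}

	// 3. Shape the redirected traffic on the ifb device with the same TBF helper.
	return createTBF(egressRate, egressBurst, ifbDevice.Attrs().Index)
}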

      

bug:

    https://github.com/kubernetes/kubernetes/pull/76584  (the units were off by a factor of 1000)

 

Calico configuration: the bandwidth plugin is chained after the calico and portmap plugins in the CNI conflist:

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "master-node",
      "mtu": 1440,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    { "type": "bandwidth", "capabilities": { "bandwidth": true } }
  ]
}
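
Because the bandwidth plugin declares "capabilities": {"bandwidth": true}, the runtime populates runtimeConfig.bandwidth from the Pod's bandwidth annotations. For example, a Pod limited to 1 Mbit/s in each direction carries annotations like:

metadata:
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M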

 

Introduction to TC (Traffic Control)

    TC (Traffic Control) is the traffic control subsystem of the Linux kernel. It controls traffic by establishing packet queues and defining how the packets in those queues are transmitted.

    The queueing disciplines (qdiscs) that implement TC's traffic control fall into two categories: classless qdiscs and classful qdiscs.

  1.     Classless qdiscs treat all traffic entering a network device (NIC) uniformly. A classless qdisc receives packets and then reorders, delays, or drops them, shaping the traffic of the whole device. Common classless qdiscs are pfifo_fast (first in, first out), TBF (Token Bucket Filter), SFQ (Stochastic Fairness Queueing), RED (Random Early Detection), and so on. Traffic shaping with these qdiscs relies mainly on reordering, rate limiting, and packet dropping.
  2.     Classful qdiscs classify packets entering a network device according to different needs. After a packet enters a classful qdisc it must be classified; the tool that classifies packets is the filter. The filter returns a decision, and according to that decision the qdisc puts the packet into the corresponding class queue. Each class can in turn use further filters for sub-classification, until no further classification is needed and the packet enters the queue contained in its class. Besides containing other qdiscs, most classful qdiscs can also shape traffic; this is useful where both scheduling (for example with SFQ) and rate control are needed.

Traffic control works in the following ways:

SHAPING (shaping)
       When traffic is shaped, its transmission rate is kept below a configured value. The limit can be set well below the available bandwidth, which smooths bursty traffic and makes the network more stable. Shaping applies only to egress traffic.

SCHEDULING (scheduling)
      By scheduling packet transmission, bandwidth can be allocated by priority within the available bandwidth. Scheduling likewise applies only to egress traffic.

POLICING (policing)
      Whereas shaping handles egress traffic, policing handles ingress (received) traffic.

DROPPING (dropping)
       Packets are dropped once traffic exceeds the configured bandwidth, in either the ingress or the egress direction.

Traffic control objects

    Traffic control works with three kinds of objects: qdisc (queueing discipline), class, and filter.

 

    QDISC is short for queueing discipline, and it is fundamental to understanding traffic control. Whenever the kernel needs to send a packet out of an interface, the packet is enqueued into the qdisc configured for that interface. The kernel then dequeues as many packets as possible from the qdisc and hands them to the network driver. The simplest qdisc is pfifo: it does no processing at all and passes packets through in first-in, first-out order.

 


        ifb is a driver that simulates a virtual network device. It can be thought of as a virtual NIC that exists only so TC can filter on it; it does not change the destination of packets. A packet to be sent out is redirected to ifb before it leaves; after TC filtering on ifb it is still sent out through the original NIC. A packet received by a NIC is redirected to ifb; after TC filtering on ifb it is handed back to the original NIC to continue normal receive processing. In both cases, the redirected packet passes through dev_queue_xmit of the ifb virtual device.

 

1. pfifo_fast

      First In, First Out (FIFO): no packet receives special treatment. The queue has three so-called "bands" (channels), and the FIFO rule applies within each band. As long as packets are waiting in band 0, packets in band 1 are not processed, and the same relationship holds between band 1 and band 2.

      The kernel assigns packets to bands according to their TOS flags; packets marked "minimum delay" go into band 0.

      * Parameters and use *

      pfifo_fast is hard-wired as the default queueing discipline; it cannot be configured.

 

2. Token Bucket Filter (TBF)

      The Token Bucket Filter (TBF) is a simple queueing discipline: it only passes packets arriving at a rate no greater than a preconfigured rate, while still allowing short bursts that exceed that value.

      TBF is very precise and puts little load on the network and the CPU, so if you simply want to slow down an interface it is the best choice.

      TBF is implemented with a buffer (the bucket) that is constantly filled with virtual units called tokens at a specific rate (the token rate). The bucket can hold a limited number of tokens.

      Each arriving token lets one incoming data packet leave the queue and is then removed from the bucket. Associating these two flows, the token flow and the data flow, gives three possible scenarios (a small simulation of the algorithm follows below):

      * The data arrives at TBF at a rate equal to the token rate. In this case each incoming packet gets its matching token and passes through the queue without delay.

      * The data arrives at TBF at a rate smaller than the token rate. Only part of the tokens are consumed by the packets passing through the queue, so tokens accumulate up to the bucket size. The saved tokens can then be used to send data faster than the token rate when short bursts occur.

      * The data arrives at TBF at a rate greater than the token rate. This means the bucket will soon run out of tokens, which causes TBF to throttle itself for a while; this is called an "overlimit" situation. If packets keep arriving, packets will start to be dropped.

      The third scenario is the important one, because it is what allows the filter to shape the data rate. Accumulated tokens allow short bursts of over-limit data to pass without loss, but a sustained overload causes packets to be increasingly delayed and then dropped. Note that the real implementation works on bytes of data, not on packet counts.
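
To make the three scenarios concrete, here is a small, self-contained Go sketch of the token-bucket idea (an illustration of the algorithm only; it is not taken from the plugin or the kernel): tokens accumulate at a fixed rate up to the bucket size, and a packet may only leave when enough token bytes are available.

package main

import "fmt"

// tokenBucket is a byte-based token bucket: `tokens` bytes are currently
// available, refilled at `rate` bytes per second, capped at `burst` bytes.
type tokenBucket struct {
	rate   float64 // refill rate in bytes per second
	burst  float64 // bucket size in bytes
	tokens float64 // currently available bytes
}

// tick refills the bucket for an elapsed interval of dt seconds.
func (b *tokenBucket) tick(dt float64) {
	b.tokens += b.rate * dt
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

// send tries to transmit a packet of `size` bytes; it succeeds only if enough
// tokens are available, otherwise the packet must wait (or be dropped).
func (b *tokenBucket) send(size float64) bool {
	if b.tokens >= size {
		b.tokens -= size
		return true
	}
	return false
}

func main() {
	// 1 Mbit/s token rate, 32 kB bucket, 1500-byte packets arriving every 5 ms
	// (about 2.4 Mbit/s offered load): the bucket drains and packets start to block.
	b := &tokenBucket{rate: 1_000_000 / 8, burst: 32_000, tokens: 32_000}
	sent, blocked := 0, 0
	for i := 0; i < 1000; i++ {
		b.tick(0.005)
		if b.send(1500) {
			sent++
		} else {
			blocked++
		}
	}
	fmt.Printf("sent=%d blocked=%d\n", sent, blocked)
}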

      * Parameters and use *

      TBF provides a number of tunable parameters. The first of them is always available:

limit/latency

      limit is the maximum amount of data (in bytes) that can wait in the queue for tokens. Alternatively you can set latency, the maximum time a packet may wait inside TBF before being sent. Computing one from the other takes the bucket size, the rate, and possibly the peak rate into account.

burst/buffer/maxburst

      The size of the bucket, in bytes. This is the maximum number of bytes for which tokens can be instantaneously available. In general, the larger the bandwidth being shaped, the larger the buffer needs to be. On Intel hardware, a 10 Mbit/s rate needs at least a 10 kbyte buffer to reach the configured rate.

      If the buffer is too small, arriving tokens have nowhere to go (the bucket is full), which can cause potential throughput loss.

mpu

      A zero-length packet does not use zero bandwidth; on Ethernet, for example, no frame is smaller than 64 bytes. MPU (Minimum Packet Unit) determines the minimum number of tokens a packet consumes.

rate

      The speed knob. See the remarks about limit above.

      If the bucket contains tokens and is allowed to empty, by default it does so at infinite speed. If that is unacceptable, use the following parameters:

peakrate

      If tokens are available, packets are by default sent out immediately on arrival, at top speed. That may not be what you want, especially if you have a large bucket.

      The peak rate specifies how quickly the bucket is allowed to be drained. In book language: release one packet, then wait just long enough before releasing the next one. The wait time is calculated so that the peak rate is obeyed. For example: the UNIX timer has a resolution of 10 milliseconds, so with packets averaging 10 kbit the peak rate is limited to 1 Mbit/s.

mtu (Maximum Transmission Unit) / minburst

      A 1 Mbit/s peak rate is not very useful if your regular rate is higher than that. A higher peak rate is possible if more packets are sent per timer tick, which effectively means creating a second bucket. This second bucket defaults to a single packet, so it is hardly a bucket at all.

      To calculate the maximum possible peak rate, multiply the configured mtu by 100 (more precisely, by HZ, which is 100 on Intel and 1024 on Alpha).

      * Configuration Examples *

      This is a very simple and practical example:

      # tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540

      Why is it practical? If you have a networking device with a large queue, such as a DSL modem or a cable modem, and you talk to it over a fast link (such as an Ethernet interface), you will find that uploading completely destroys interactivity.

      This is because uploading fills the queue in the modem, which is probably huge because that helps upload throughput. But that is not what you want; you want the queue to stay small so interactivity remains while you are sending data.

      The command above does not directly change the queue in the modem; instead, by slowing down the queue in Linux, it limits how fast data is handed to the modem.

      Change 220kbit to your real upload speed minus a few percent. If your modem is really fast, increase the burst value a bit.

 

Reference:

    https://www.cnblogs.com/CasonChan/p/4919921.html
