[Repost] Linux networking: the packet transmission process

Reposted from: https://segmentfault.com/a/1190000008926093

--------------------------------------------------------------------------------------------------------------------

Following the earlier introduction to how data packets are received, this article describes, step by step, how a packet on a Linux system travels from the application out to the network card.

If reading English is not a problem, I strongly recommend the article listed in the references, which covers this process in much greater detail.

This article covers only physical Ethernet cards and uses the sending of a single UDP packet as its example. I am not very familiar with the protocol stack code, so some parts may be wrong; corrections are welcome.

socket layer

               +-------------+
               | Application |
               +-------------+
                     |
                     |
                     ↓
+------------------------------------------+
| socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) |
+------------------------------------------+
                     |
                     |
                     ↓
           +-------------------+
           | sendto(sock, ...) |
           +-------------------+
                     |
                     |
                     ↓
             +--------------+
             | inet_sendmsg |
             +--------------+
                     |
                     |
                     ↓
             +---------------+
             | inet_autobind |
             +---------------+
                     |
                     |
                     ↓
               +-----------+
               | UDP layer |
               +-----------+

  • socket(...): creates a socket structure and initializes its operation function pointers. Because we asked for a UDP socket, the pointers stored there refer to the UDP-related functions.

  • sendto(sock, ...): the application calls this function to start sending a packet; it in turn calls inet_sendmsg.

  • inet_sendmsg: mainly checks whether the socket already has a source port bound. If not, it calls inet_autobind to assign one, and then calls the UDP-layer send function.

  • inet_autobind: calls the get_port function bound to the socket to obtain an available port. Because this is a UDP socket, get_port resolves to the corresponding function in the UDP code.
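
The automatic port binding described above can be observed from user space. The following is a minimal sketch, assuming a Linux host with a working loopback interface; the function name observe_autobind is made up for this illustration. It reads the sender's local port before and after the first sendto() call.

```python
import socket


def observe_autobind():
    """Send one UDP datagram over loopback; return the sender's port before/after."""
    # Receiver on an OS-chosen loopback port, so the datagram has a destination.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # No bind() was called, so on Linux the local port is still 0 here.
        port_before = sender.getsockname()[1]
        sender.sendto(b"ping", receiver.getsockname())
        # The first send made the kernel pick a source port for the socket.
        port_after = sender.getsockname()[1]
        data, peer = receiver.recvfrom(16)
        return port_before, port_after, peer[1], data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(observe_autobind())
```

The port seen by the receiver should match the port that the sender reports after the send, which is the port chosen during the automatic bind.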

UDP layer

                     |
                     |
                     ↓
              +-------------+
              | udp_sendmsg |
              +-------------+
                     |
                     |
                     ↓
          +----------------------+
          | ip_route_output_flow |
          +----------------------+
                     |
                     |
                     ↓
              +-------------+
              | ip_make_skb |
              +-------------+
                     |
                     |
                     ↓
         +------------------------+
         | udp_send_skb(skb, fl4) |
         +------------------------+
                     |
                     |
                     ↓
                +----------+
                | IP layer |
                +----------+

  • udp_sendmsg: the entry point of the UDP module for sending packets. It is a fairly long function: it first calls ip_route_output_flow to obtain routing information (mainly the source IP and the outgoing network card), then calls ip_make_skb to build the skb, and finally associates the network card information with the skb.

  • ip_route_output_flow: uses the routing table and the destination IP to determine which device the packet should be sent from. If the socket has no bound source IP, this function also picks the most suitable source IP from the routing table. If the socket is bound to a source IP but, according to the routing table, the destination cannot be reached from the network card that owns that source IP, the packet is dropped, the send fails, and sendto returns an error. Finally the function stores the chosen device and source IP in a flowi4 structure and returns it to udp_sendmsg.

  • ip_make_skb: builds the skb. The skb it constructs already has an IP header allocated, with part of it initialized (the source IP in the IP header is set here). It also calls __ip_append_data, which fragments the data if needed and checks whether the socket's send buffer is exhausted; if it is, ENOBUFS is returned.

  • udp_send_skb(skb, fl4): mainly fills in the UDP header inside the skb, handles the checksum, and then calls the corresponding IP-layer send function.
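
To make the "fill in the UDP header and handle the checksum" step more concrete, here is a user-space sketch of the UDP header layout and checksum defined in RFC 768. It is not the kernel's code (the kernel may, for example, leave the checksum to the network card); the helper names and the 192.0.2.x addresses are assumptions chosen only for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header that is covered by the UDP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Return UDP header + payload with the checksum field filled in."""
    length = 8 + len(payload)               # the UDP header is 8 bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if csum == 0:                           # 0 means "no checksum", so send 0xFFFF
        csum = 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def verify_udp_checksum(src_ip, dst_ip, datagram: bytes) -> bool:
    """A valid datagram checksums to 0 when the checksum field is included."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram) == 0


if __name__ == "__main__":
    dgram = build_udp_datagram("192.0.2.1", "192.0.2.2", 40000, 53, b"hello")
    print(dgram.hex(), verify_udp_checksum("192.0.2.1", "192.0.2.2", dgram))
```

The checksum covers the source and destination IP through the pseudo-header, which is one reason the source IP has to be known (from the routing step) before the UDP header can be completed.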

IP layer

          |
          |
          ↓
   +-------------+
   | ip_send_skb |
   +-------------+
          |
          |
          ↓
  +-------------------+       +-------------------+       +---------------+
  | __ip_local_out_sk |------>| NF_INET_LOCAL_OUT |------>| dst_output_sk |
  +-------------------+       +-------------------+       +---------------+
                                                                  |
                                                                  |
                                                                  ↓
 +------------------+        +----------------------+       +-----------+
 | ip_finish_output |<-------| NF_INET_POST_ROUTING |<------| ip_output |
 +------------------+        +----------------------+       +-----------+
          |
          |
          ↓
 +-------------------+      +------------------+       +----------------------+
 | ip_finish_output2 |----->| dst_neigh_output |------>| neigh_resolve_output |
 +-------------------+      +------------------+       +----------------------+
                                     |
                                     |
                                     ↓
                            +----------------+
                            | dev_queue_xmit |
                            +----------------+

  • ip_send_skb: the entry point of the IP module for sending packets; it simply calls the functions that follow.

  • __ip_local_out_sk: sets the length and checksum fields of the IP header, then invokes the netfilter hook below.

  • NF_INET_LOCAL_OUT: a netfilter hook. iptables rules can be configured here to decide how the packet is handled; if the packet is not dropped, processing continues.

  • dst_output_sk: calls the appropriate output function based on the information in the skb. In our UDP-over-IPv4 case, that is ip_output.

  • ip_output: writes the network card information obtained in udp_sendmsg above into the skb, then invokes the NF_INET_POST_ROUTING hook.

  • NF_INET_POST_ROUTING: SNAT may be configured here, which can change the skb's routing information.

  • ip_finish_output: checks whether the routing information changed during the previous step. If it did, dst_output_sk must be called again (this time it may not lead to ip_output but to whatever output function netfilter specified, possibly xfrm4_transport_output); otherwise processing continues.

  • ip_finish_output2: looks up the next hop (nexthop) for the destination IP in the routing table, then calls __ipv4_neigh_lookup_noref to find the next hop's neigh entry in the ARP table. If none is found, it calls __neigh_create to construct an empty neigh structure.

  • dst_neigh_output: if ip_finish_output2 did not obtain usable neigh information in the previous step, control goes to neigh_resolve_output. Otherwise neigh_hh_output is called directly; it fills the MAC address from the neigh entry into the skb and then calls dev_queue_xmit to send the packet.

  • neigh_resolve_output: sends an ARP request to obtain the next hop's MAC address, fills the MAC address into the skb, and then calls dev_queue_xmit.
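
The neigh (ARP) entries that the last three functions consult can be inspected from user space through /proc/net/arp. The sketch below parses that file's text format; the sample table and its addresses are made up for illustration, and the "resolved" field is only a rough user-space analogue of "the neigh entry already has a MAC address".

```python
from typing import Dict, List

ATF_COM = 0x02  # "completed entry" flag, as defined in <linux/if_arp.h>

# Made-up sample in the layout used by /proc/net/arp.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.1        0x1         0x2         52:54:00:12:34:56     *        eth0
192.0.2.7        0x1         0x0         00:00:00:00:00:00     *        eth0
"""


def parse_arp_table(text: str) -> List[Dict[str, object]]:
    """Parse /proc/net/arp style text into a list of entries."""
    entries = []
    for line in text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        entries.append({
            "ip": ip,
            "mac": mac,
            "dev": dev,
            # Entries without ATF_COM still need an ARP exchange before use.
            "resolved": bool(int(flags, 16) & ATF_COM),
        })
    return entries


def read_arp_table(path: str = "/proc/net/arp") -> List[Dict[str, object]]:
    """Read the live table on a Linux host."""
    with open(path) as f:
        return parse_arp_table(f.read())


if __name__ == "__main__":
    for entry in parse_arp_table(SAMPLE):
        print(entry)
```

In terms of the flow above, a next hop whose entry is resolved corresponds to the neigh_hh_output path, while an unresolved one corresponds to the neigh_resolve_output path that has to send an ARP request first.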

netdevice subsystem

                          |
                          |
                          ↓
                   +----------------+
  +----------------| dev_queue_xmit |
  |                +----------------+
  |                       |
  |                       |
  |                       ↓
  |              +-----------------+
  |              | Traffic Control |
  |              +-----------------+
  | loopback              |
  |   or                  +---------------------------------------------------------------+
  | IP tunnels            ↓                                                               |
  |                       ↓                                                               |
  |            +---------------------+  Failed   +----------------------+         +---------------+
  +----------->| dev_hard_start_xmit |---------->| raise NET_TX_SOFTIRQ |- - - - >| net_tx_action |
               +---------------------+           +----------------------+         +---------------+
                          |
                          +----------------------------------+
                          |                                  |
                          ↓                                  ↓
                  +----------------+            +------------------------+
                  | ndo_start_xmit |            | packet taps(AF_PACKET) |
                  +----------------+            +------------------------+

  • dev_queue_xmit: the entry function of the netdevice subsystem. It first fetches the device's qdisc. If the device has none (for example, loopback or IP tunnels), it calls dev_hard_start_xmit directly; otherwise the packet goes through the Traffic Control module.

  • Traffic Control: mainly performs filtering and prioritization. If the queue is full here, the packet is dropped (see the referenced documentation for details). Once this stage is done, control passes to dev_hard_start_xmit.

  • dev_hard_start_xmit: first copies the skb to the "packet taps" (this is where tcpdump gets its data), then calls ndo_start_xmit. If dev_hard_start_xmit returns an error (in most cases NETDEV_TX_BUSY), the caller puts the skb aside, raises the NET_TX_SOFTIRQ soft interrupt, and leaves it to the softirq handler net_tx_action to retry later. (For loopback or IP tunnels there is no retry logic after a failure.)

  • ndo_start_xmit: a function pointer that points to the specific driver's function for transmitting data.
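
The queue, drop, and retry behaviour described above can be pictured with a toy model. This is not kernel code: the classes ToyNic and ToyQueue, their methods, and the sizes used are all invented for illustration, and a single FIFO stands in for the qdisc.

```python
from collections import deque

NETDEV_TX_OK, NETDEV_TX_BUSY = "NETDEV_TX_OK", "NETDEV_TX_BUSY"


class ToyNic:
    """Stand-in for a driver's ndo_start_xmit with a fixed-size send ring."""

    def __init__(self, ring_size):
        self.ring = deque()
        self.ring_size = ring_size

    def start_xmit(self, skb):
        if len(self.ring) >= self.ring_size:
            return NETDEV_TX_BUSY           # ring full: the caller must retry later
        self.ring.append(skb)
        return NETDEV_TX_OK

    def complete(self, count=1):
        """Pretend the hardware finished sending `count` packets."""
        for _ in range(min(count, len(self.ring))):
            self.ring.popleft()


class ToyQueue:
    """Stand-in for dev_queue_xmit plus a single FIFO qdisc."""

    def __init__(self, nic, limit):
        self.nic, self.limit = nic, limit
        self.backlog = deque()
        self.dropped = 0
        self.softirq_pending = False        # models a raised NET_TX_SOFTIRQ

    def xmit(self, skb):
        if len(self.backlog) >= self.limit:
            self.dropped += 1               # queue full: the packet is dropped
            return False
        self.backlog.append(skb)
        self.run()
        return True

    def run(self):
        """Drain the backlog; also plays the role of the net_tx_action retry."""
        self.softirq_pending = False
        while self.backlog:
            skb = self.backlog.popleft()
            if self.nic.start_xmit(skb) == NETDEV_TX_BUSY:
                self.backlog.appendleft(skb)    # put the skb back at the head
                self.softirq_pending = True     # and ask for a later retry
                return


if __name__ == "__main__":
    nic = ToyNic(ring_size=2)
    queue = ToyQueue(nic, limit=2)
    print([queue.xmit(p) for p in ("p1", "p2", "p3", "p4", "p5")], queue.dropped)
    nic.complete(2)
    queue.run()
    print(list(nic.ring))
```

In this model a packet is only dropped when the software queue is full; a busy device merely delays it until the retry runs.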

Device Driver

ndo_start_xmit is bound to the corresponding function of the specific NIC driver. From this step on, control belongs to the NIC driver. Different drivers handle things differently, so no details are given here; the general flow is:

  1. Put the skb into the NIC's own send queue

  2. Notify the NIC to send the packet

  3. The NIC raises an interrupt to the CPU once transmission completes

  4. After receiving the interrupt, clean up the skb

While sending packets, the NIC driver also has to interact with the netdevice subsystem in places. For example, when the NIC's queue is full it must tell the upper layer to stop sending, and once the queue has room again it notifies the upper layer to resume sending.

other

  • SO_SNDBUF: as the flow above shows, UDP has no real send buffer; SO_SNDBUF is only a limit. When the memory occupied by the skbs allocated for this socket exceeds the value, ENOBUFS is returned. So as long as ENOBUFS errors do not occur, there is no point in raising this value. The sendto man page says of ENOBUFS: "(Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.)" The device queue here should be the queue inside Traffic Control, which implies that on Linux the default SO_SNDBUF value is already enough for that queue. What puzzles me is that the queue length and the number of queues are configurable; if they are configured large enough, ENOBUFS should in principle be able to occur.

  • txqueuelen: many sources say this controls the length of the qdisc queue, but it seems that only some qdisc types actually use it, such as Linux's default pfifo_fast.

  • hardware TX ring: a NIC usually has its own ring queue, whose size can be configured with ethtool. When the driver receives a send request, it normally places the packet in this queue and then notifies the NIC to transmit it. When the queue is full, NETDEV_TX_BUSY is returned to the caller in the upper layer.

  • packet taps (AF_PACKET): packets pass through here both on the first send attempt and on a retry. If a retry happens, I am not sure whether tcpdump captures the packet twice. In principle it should not, so I have probably misunderstood something.
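
Two of the settings above can be read from user space. The sketch below assumes a Linux host; the helper names set_sndbuf and read_tx_queue_len are made up for illustration. It sets SO_SNDBUF on a UDP socket and reads a device's txqueuelen from sysfs.

```python
import socket
from pathlib import Path
from typing import Optional, Tuple


def set_sndbuf(requested: int) -> Tuple[int, int]:
    """Return (default, effective) SO_SNDBUF for a fresh UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        # Linux reports back twice the requested value (bookkeeping overhead),
        # subject to the net.core.wmem_max ceiling.
        effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return default, effective


def read_tx_queue_len(dev: str) -> Optional[int]:
    """Read txqueuelen for a device from sysfs, or None if it is unavailable."""
    path = Path("/sys/class/net") / dev / "tx_queue_len"
    try:
        return int(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("SO_SNDBUF (default, after setting 8192):", set_sndbuf(8192))
    print("txqueuelen of lo:", read_tx_queue_len("lo"))
```

Because SO_SNDBUF acts only as a limit on memory charged to the socket, raising it changes when ENOBUFS can appear but does not create a buffer that holds packets on their way to the card.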

reference

Origin www.cnblogs.com/oxspirt/p/12041552.html