Network Device Driver Framework

1. Framework

        

        The Linux network device driver framework is divided into four layers, from top to bottom:

        1) The network protocol interface layer provides a unified packet send and receive interface to the network-layer protocols. Whether the upper-layer protocol is ARP or IP, it sends data through the dev_queue_xmit() function and receives data through the netif_rx() function. This layer makes the upper-layer protocols independent of any specific device.

        2) The network device interface layer presents the protocol interface layer with a single structure, net_device, that describes the attributes and operations of a specific network device. This structure is the container for the functions of the device driver function layer. In effect, the network device interface layer lays out, at a high level, the structure of the device driver function layer that operates the hardware.

        3) Each function in the device driver function layer is a concrete member of the net_device structure defined by the network device interface layer. These functions drive the network hardware to perform the corresponding actions: transmission is started through the hard_start_xmit() function (exposed as ndo_start_xmit() in the net_device_ops structure shown in section 3.1), and reception is triggered by an interrupt from the network device.

        4) The network device and media layer is the physical entity that actually sends and receives packets, including the network adapter and the transmission medium. The adapter is physically driven by the functions of the device driver function layer. On Linux, both the network device and the medium can be virtual.

        When designing a driver for a specific network device, the main work is to write the functions of the device driver function layer, use them to fill in the net_device structure, and register the net_device with the kernel.
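        The way the layers fit together can be sketched in user space. The following is a minimal model under stated assumptions, not kernel code: the model_ types, the demo_ driver functions and model_queue_xmit() are hypothetical names invented for illustration. It only shows the idea that the driver fills a table of function pointers and the upper layer calls through that table without knowing which driver is behind it.

```c
#include <stddef.h>
#include <string.h>

/* User-space model only: these types mimic the shape of net_device and
 * net_device_ops but are NOT the kernel definitions. */
struct model_net_device;

struct model_net_device_ops {
    int (*open)(struct model_net_device *dev);
    int (*start_xmit)(const void *data, size_t len, struct model_net_device *dev);
};

struct model_net_device {
    char name[16];
    const struct model_net_device_ops *ops; /* filled in by the driver */
    int up;                                 /* 1 after open() succeeded */
    size_t tx_bytes;                        /* bytes handed to the "hardware" */
};

/* Device driver function layer: hypothetical functions for a device "demo". */
static int demo_open(struct model_net_device *dev)
{
    dev->up = 1;
    return 0;
}

static int demo_start_xmit(const void *data, size_t len, struct model_net_device *dev)
{
    (void)data;            /* a real driver would copy this to the hardware */
    dev->tx_bytes += len;
    return 0;
}

static const struct model_net_device_ops demo_ops = {
    .open = demo_open,
    .start_xmit = demo_start_xmit,
};

/* Driver setup: fill in the device structure (registration is not modelled). */
static void demo_setup(struct model_net_device *dev)
{
    memset(dev, 0, sizeof(*dev));
    strncpy(dev->name, "demo0", sizeof(dev->name) - 1);
    dev->ops = &demo_ops;
}

/* Protocol interface layer: a device-independent entry point. It only knows
 * the ops table, never the concrete driver. Returns -1 if the device is down. */
static int model_queue_xmit(struct model_net_device *dev, const void *data, size_t len)
{
    if (!dev->up)
        return -1;
    return dev->ops->start_xmit(data, len, dev);
}
```

        In this model, swapping in another driver only means pointing ops at a different table; model_queue_xmit() does not change, which is the device independence described for the protocol interface layer.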

2. Network protocol interface layer

        The main job of the network protocol interface layer is to give the upper-layer protocols a transparent packet send and receive interface. When an upper-layer protocol such as ARP or IP needs to send a packet, it calls the dev_queue_xmit() function of the network protocol interface layer and passes it a pointer to a struct sk_buff. The prototype of dev_queue_xmit() is:

int dev_queue_xmit(struct sk_buff *skb);

        Similarly, a received packet is delivered to the upper layers by passing a pointer to a struct sk_buff to the netif_rx() function. The prototype of netif_rx() is:

int netif_rx(struct sk_buff *skb);

2.1 sk_buff

        The sk_buff structure is central to the design. It is defined in include/linux/skbuff.h, and its name means "socket buffer". It carries data between the layers of the Linux network subsystem and acts as the "central nervous system" of data transfer in that subsystem.

        When sending a packet, the network code of the Linux kernel creates an sk_buff that contains the data to be transmitted and hands it to the layer below. Each layer adds its own protocol header to the sk_buff until it is finally handed to the network device. Similarly, when a network device receives a packet from the medium, the driver wraps the received data in an sk_buff and passes it upward, and each layer strips off its own protocol header until the data reaches the user. The key data members for this purpose are the four pointers head, data, tail and end, described below.

        Each protocol layer in the stack adds or removes its own protocol data by operating on this structure. Because the layers pass the same sk_buff along, the data does not have to be copied back and forth between the layers of the network protocol stack.

        Note in particular that head and end point to the start and end of the allocated buffer, while data and tail point to the start and end of the actual data. A layer places its protocol header in the space between head and data, or appends new protocol data in the space between tail and end.
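        The pointer arithmetic behind head, data, tail and end can be tried out in user space. The sketch below is a simplified model, not the kernel's struct sk_buff: the model_ names are hypothetical, and the functions only imitate the pointer movement of skb_reserve(), skb_put(), skb_push() and skb_pull(). Where the kernel treats overrunning the buffer as a fatal bug, this model simply returns NULL.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* User-space model of the four sk_buff pointers; NOT the kernel structure. */
struct model_skb {
    unsigned char *head; /* start of the allocated buffer */
    unsigned char *data; /* start of the valid data */
    unsigned char *tail; /* end of the valid data */
    unsigned char *end;  /* end of the allocated buffer */
    size_t len;          /* tail - data */
};

/* Allocate a buffer of `size` bytes with no data in it yet. */
static struct model_skb *model_alloc_skb(size_t size)
{
    struct model_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->head = malloc(size);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

static void model_free_skb(struct model_skb *skb)
{
    free(skb->head);
    free(skb);
}

/* Reserve headroom in an empty buffer (imitates skb_reserve()). */
static void model_skb_reserve(struct model_skb *skb, size_t n)
{
    skb->data += n;
    skb->tail += n;
}

/* Append n bytes at the tail (imitates skb_put()); returns where to write them. */
static unsigned char *model_skb_put(struct model_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;
    if ((size_t)(skb->end - skb->tail) < n)
        return NULL; /* not enough tailroom */
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Prepend n bytes in the headroom (imitates skb_push()); returns the new data. */
static unsigned char *model_skb_push(struct model_skb *skb, size_t n)
{
    if ((size_t)(skb->data - skb->head) < n)
        return NULL; /* not enough headroom */
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Strip n bytes from the front (imitates skb_pull()); returns the new data. */
static unsigned char *model_skb_pull(struct model_skb *skb, size_t n)
{
    if (skb->len < n)
        return NULL;
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

        A sending path would typically reserve headroom first, put the payload, and then push one header per layer. A receiving path pulls one header per layer. In both directions only the pointers move; the payload bytes stay where they are.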

3. Network device interface layer

        The sole job of the network device interface layer is to define one unified, abstract data structure, net_device, for the wide variety of network devices, so that a single fixed interface covers all hardware variations and the different kinds of hardware are unified at the software level. The net_device structure represents a network device inside the kernel and is defined in include/linux/netdevice.h. A network device driver only has to fill in the relevant members of net_device and register the structure to connect its hardware operation functions to the kernel. net_device is a very large structure that holds both the attributes and the operation interfaces of a network device. Some of its key members are introduced below.

        Each network device has a corresponding net_device instance, which is registered with the system by calling register_netdevice() (defined in net/core/dev.c) and unregistered by calling unregister_netdevice().

struct net_device {
    /* Device name as shown by ifconfig, e.g. eth0; the letters indicate the device type and the number is the index of the device within that type */
    char            name[IFNAMSIZ];
    /* Hash-list node for lookup by name */
    struct hlist_node    name_hlist;
    /* Alias, used by SNMP */
    char             *ifalias;
    /*
        Describe the shared memory the device uses to communicate with the kernel.
        It is initialized and accessed only inside the device driver.
    */
    unsigned long        mem_end;
    unsigned long        mem_start;
 
    /* Base I/O address: start of the device's own memory as mapped into I/O memory */
    unsigned long        base_addr;
 
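    /*
        Interrupt number the device uses to talk to the kernel; it can be shared by several devices.
        The driver obtains it with request_irq() and releases it with free_irq().
    */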
    int            irq;
 
    /* Number of times the carrier (link) state has changed */
    atomic_t        carrier_changes;
 
    /*
        Set of flags used by the network queueing subsystem,
        identified by __LINK_STATE_xxx
    */
    unsigned long        state;
 
    struct list_head    dev_list;
    struct list_head    napi_list;
    struct list_head    unreg_list;
    struct list_head    close_list;
 
    /* List of handlers for all protocols on this device */
    struct list_head    ptype_all;
    /* List of protocol-specific handlers on this device */
    struct list_head    ptype_specific;
 
    struct {
        struct list_head upper;
        struct list_head lower;
    } adj_list;
 
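    /*
        Stores additional device capabilities.
        Reports the adapter's capabilities so that it can cooperate with the CPU.
        Capabilities are identified by NETIF_F_XXX flags.
    */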
    netdev_features_t    features;
    netdev_features_t    hw_features;
    netdev_features_t    wanted_features;
    netdev_features_t    vlan_features;
    netdev_features_t    hw_enc_features;
    netdev_features_t    mpls_features;
    netdev_features_t    gso_partial_features;
 
    /* Interface index of the network device */
    int            ifindex;
 
    /* Device group; every device belongs to group 0 by default */
    int            group;
 
    struct net_device_stats    stats;
 
    atomic_long_t        rx_dropped;
    atomic_long_t        tx_dropped;
    atomic_long_t        rx_nohandler;
 
#ifdef CONFIG_WIRELESS_EXT
    const struct iw_handler_def *wireless_handlers;
    struct iw_public_data    *wireless_data;
#endif
    /* Device operations, mainly used to operate the NIC hardware */
    const struct net_device_ops *netdev_ops;
    /* ethtool operations */
    const struct ethtool_ops *ethtool_ops;
#ifdef CONFIG_NET_SWITCHDEV
    const struct switchdev_ops *switchdev_ops;
#endif
#ifdef CONFIG_NET_L3_MASTER_DEV
    const struct l3mdev_ops    *l3mdev_ops;
#endif
#if IS_ENABLED(CONFIG_IPV6)
    const struct ndisc_ops *ndisc_ops;
#endif
 
#ifdef CONFIG_XFRM
    const struct xfrmdev_ops *xfrmdev_ops;
#endif
 
    /* Link-layer header operations, e.g. header caching and validation */
    const struct header_ops *header_ops;
 
    /* Interface flags, IFF_XXX, e.g. IFF_UP */
    unsigned int        flags;
 
    /*
        Flags that are not visible to user space,
        used by virtual devices such as VLAN and bridge
    */
    unsigned int        priv_flags;
 
    /* Rarely used; kept for compatibility */
    unsigned short        gflags;
 
    /* Padding added for structure alignment */
    unsigned short        padded;
 
    /* Related to IfOperStatus in the interface group MIB */
    unsigned char        operstate;
    unsigned char        link_mode;
 
    /*
        Port type used by the interface
    */
    unsigned char        if_port;
 
    /*
        DMA channel used by the device.
        Not every device can use DMA; some buses do not support it.
    */
    unsigned char        dma;
 
    /*
        Maximum transmission unit: the largest frame size the device can handle.
        Ethernet: 1500
    */
    unsigned int        mtu;
    /* Minimum MTU; Ethernet: 68 */
    unsigned int        min_mtu;
    /* Maximum MTU; Ethernet: 65535 */
    unsigned int        max_mtu;
 
    /*
        Device type.
        The ARP module uses type to determine the hardware address type of the interface;
        for an Ethernet interface it is ARPHRD_ETHER.
    */
    unsigned short        type;
    /*
        Length of the link-layer header.
        The Ethernet header is ETH_HLEN = 14 bytes.
    */
    unsigned short        hard_header_len;
    unsigned char        min_header_len;
 
    /* Required headroom and tailroom */
    unsigned short        needed_headroom;
    unsigned short        needed_tailroom;
 
    /* Interface address info. */
    /* Hardware address, usually read from the hardware during initialization */
    unsigned char        perm_addr[MAX_ADDR_LEN];
    unsigned char        addr_assign_type;
    /* Length of the hardware address */
    unsigned char        addr_len;
    unsigned short        neigh_priv_len;
    unsigned short          dev_id;
    unsigned short          dev_port;
    spinlock_t        addr_list_lock;
    /* How the device name was assigned, e.g. NET_NAME_UNKNOWN */
    unsigned char        name_assign_type;
    bool            uc_promisc;
    struct netdev_hw_addr_list    uc;
    struct netdev_hw_addr_list    mc;
    struct netdev_hw_addr_list    dev_addrs;
 
#ifdef CONFIG_SYSFS
    struct kset        *queues_kset;
#endif
    /* Number of times promiscuous mode has been enabled */
    unsigned int        promiscuity;
 
    /* When non-zero, the device listens to all multicast addresses */
    unsigned int        allmulti;
 
 
    /* Protocol-specific pointers */
#if IS_ENABLED(CONFIG_VLAN_8021Q)
    struct vlan_info __rcu    *vlan_info;
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
    struct dsa_switch_tree    *dsa_ptr;
#endif
#if IS_ENABLED(CONFIG_TIPC)
    struct tipc_bearer __rcu *tipc_ptr;
#endif
    void             *atalk_ptr;
    /* IPv4: points to the in_device structure */
    struct in_device __rcu    *ip_ptr;
    struct dn_dev __rcu     *dn_ptr;
    struct inet6_dev __rcu    *ip6_ptr;
    void            *ax25_ptr;
    struct wireless_dev    *ieee80211_ptr;
    struct wpan_dev        *ieee802154_ptr;
#if IS_ENABLED(CONFIG_MPLS_ROUTING)
    struct mpls_dev __rcu    *mpls_ptr;
#endif
 
/*
 * Cache lines mostly used on receive path (including eth_type_trans())
 */
    /* Interface address info used in eth_type_trans() */
    unsigned char        *dev_addr;
 
#ifdef CONFIG_SYSFS
    /* Receive queues */
    struct netdev_rx_queue    *_rx;
 
    /* Number of receive queues */
    unsigned int        num_rx_queues;
    unsigned int        real_num_rx_queues;
#endif
 
    struct bpf_prog __rcu    *xdp_prog;
    unsigned long        gro_flush_timeout;
 
    /* Receive callback, used by e.g. bridges */
    rx_handler_func_t __rcu    *rx_handler;
    /* Argument for the callback */
    void __rcu        *rx_handler_data;
 
#ifdef CONFIG_NET_CLS_ACT
    struct tcf_proto __rcu  *ingress_cl_list;
#endif
    struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
    /* Netfilter ingress hooks */
    struct nf_hook_entry __rcu *nf_hooks_ingress;
#endif
 
    /* Link-layer broadcast address */
    unsigned char        broadcast[MAX_ADDR_LEN];
#ifdef CONFIG_RFS_ACCEL
    struct cpu_rmap        *rx_cpu_rmap;
#endif
    /* Hash-list node for lookup by interface index */
    struct hlist_node    index_hlist;
 
/*
 * Cache lines mostly used on transmit path
 */
    /* Transmit queues */
    struct netdev_queue    *_tx ____cacheline_aligned_in_smp;
    /* Number of transmit queues */
    unsigned int        num_tx_queues;
    unsigned int        real_num_tx_queues;
    /* Queueing discipline */
    struct Qdisc        *qdisc;
#ifdef CONFIG_NET_SCHED
    DECLARE_HASHTABLE    (qdisc_hash, 4);
#endif
    /*
        Maximum number of packets that can be queued on the device's transmit queue
    */
    unsigned long        tx_queue_len;
    spinlock_t        tx_global_lock;
 
    /*
        Minimum time before the network layer decides that a transmission has
        timed out and calls the driver's tx_timeout handler
    */
    int            watchdog_timeo;
 
#ifdef CONFIG_XPS
    struct xps_dev_maps __rcu *xps_maps;
#endif
#ifdef CONFIG_NET_CLS_ACT
    struct tcf_proto __rcu  *egress_cl_list;
#endif
 
    /* These may be needed for future network-power-down code. */
    /* Watchdog timer */
    struct timer_list    watchdog_timer;
 
    /* Reference count */
    int __percpu        *pcpu_refcnt;
 
    /*
        Registration and unregistration of a network device are done in two steps;
        this field is used for the second step
    */
    struct list_head    todo_list;
 
    struct list_head    link_watch_list;
 
    /* Registration state of the device */
    enum { NETREG_UNINITIALIZED=0,
           NETREG_REGISTERED,    /* completed register_netdevice */
           NETREG_UNREGISTERING,    /* called unregister_netdevice */
           NETREG_UNREGISTERED,    /* completed unregister todo */
           NETREG_RELEASED,        /* called free_netdev */
           NETREG_DUMMY,        /* dummy device for NAPI poll */
    } reg_state:8;
 
    /* Set when the device is about to be freed */
    bool dismantle;
 
    enum {
        RTNL_LINK_INITIALIZED,
        RTNL_LINK_INITIALIZING,
    } rtnl_link_state:16;
 
    bool needs_free_netdev;
    void (*priv_destructor)(struct net_device *dev);
 
#ifdef CONFIG_NETPOLL
    struct netpoll_info __rcu    *npinfo;
#endif
 
    possible_net_t            nd_net;
 
    /* mid-layer private */
    union {
        void                    *ml_priv;
        struct pcpu_lstats __percpu        *lstats;
        struct pcpu_sw_netstats __percpu    *tstats;
        struct pcpu_dstats __percpu        *dstats;
        struct pcpu_vstats __percpu        *vstats;
    };
 
#if IS_ENABLED(CONFIG_GARP)
    struct garp_port __rcu    *garp_port;
#endif
#if IS_ENABLED(CONFIG_MRP)
    struct mrp_port __rcu    *mrp_port;
#endif
 
    struct device        dev;
    const struct attribute_group *sysfs_groups[4];
    const struct attribute_group *sysfs_rx_queue_group;
 
    const struct rtnl_link_ops *rtnl_link_ops;
 
    /* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE        65536
    unsigned int        gso_max_size;
#define GSO_MAX_SEGS        65535
    u16            gso_max_segs;
 
#ifdef CONFIG_DCB
    const struct dcbnl_rtnl_ops *dcbnl_ops;
#endif
    u8            num_tc;
    struct netdev_tc_txq    tc_to_txq[TC_MAX_QUEUE];
    u8            prio_tc_map[TC_BITMASK + 1];
 
#if IS_ENABLED(CONFIG_FCOE)
    unsigned int        fcoe_ddp_xid;
#endif
#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
    struct netprio_map __rcu *priomap;
#endif
    struct phy_device    *phydev;
    struct lock_class_key    *qdisc_tx_busylock;
    struct lock_class_key    *qdisc_running_key;
    bool            proto_down;
};

3.1 Device operation function

        This structure holds the operations of the network card, such as bringing the interface up or down and sending packets (commands such as ifconfig XXX up and ifconfig XXX down end up calling functions in this structure). The members of net_device_ops are function pointers to the functions that the network driver implements.

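struct net_device_ops {
	/* Initialize the network device: set up struct net_device fields such as the
	   device name, I/O port address and IRQ number (called when the device is registered) */
	int		(*ndo_init)(struct net_device *dev);
	/* Called when the device is unregistered */
	void		(*ndo_uninit)(struct net_device *dev);
	/* Open the network device; only a registered device can be opened */
	int		(*ndo_open)(struct net_device *dev);
	/* Stop the network device; called on shutdown, the reverse of ndo_open */
	int		(*ndo_stop)(struct net_device *dev);
	/* Start transmission of a packet; on success the packet is placed in the
	   network adapter's transmit buffer */
	netdev_tx_t	(*ndo_start_xmit)(struct sk_buff *skb,
					  struct net_device *dev);
	/* Select the transmit queue when the device supports multiple transmit queues */
	u16		(*ndo_select_queue)(struct net_device *dev,
					    struct sk_buff *skb);
	................
};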

        The ndo_select_queue function selects the transmit queue for a packet. The driver's ndo_start_xmit function may be handed a packet from any transmit queue; it can find out which queue the packet belongs to by calling static inline u16 skb_get_queue_mapping(const struct sk_buff *skb).

        The ndo_open function needs to do the following:

        1) Enable the hardware resources used by the device and request the I/O regions, interrupt, DMA channels and so on.

        2) Call the netif_start_queue() function provided by the Linux kernel to activate the device's transmit queue.

        The ndo_stop function needs to do the following:

        1) Call the netif_stop_queue() function provided by the Linux kernel to stop the device from transmitting packets.

        2) Free the I/O regions, interrupt and DMA resources used by the device.
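        The ordering in these two lists matters: resources are acquired before the queue is started, and the queue is stopped before resources are released. The following user-space sketch only models that ordering. It is not kernel code; struct model_dev, model_open(), model_stop() and the model_irq_available switch are hypothetical names, and the boolean fields merely stand in for calls such as request_irq()/free_irq() and netif_start_queue()/netif_stop_queue().

```c
#include <stdbool.h>

/* User-space model of the open/stop pairing; NOT kernel code. The fields
 * stand in for resources a real driver would request from the kernel. */
struct model_dev {
    bool irq_requested;   /* stands in for request_irq()/free_irq() */
    bool queue_running;   /* stands in for netif_start_queue()/netif_stop_queue() */
};

/* Hypothetical switch so that the failure path can be exercised. */
static bool model_irq_available = true;

static int model_open(struct model_dev *dev)
{
    /* 1) Acquire hardware resources first. */
    if (!model_irq_available)
        return -1;                 /* the queue stays stopped on failure */
    dev->irq_requested = true;

    /* 2) Only then allow the upper layers to hand over packets. */
    dev->queue_running = true;
    return 0;
}

static int model_stop(struct model_dev *dev)
{
    /* 1) Stop accepting packets before tearing anything down. */
    dev->queue_running = false;

    /* 2) Release resources in the reverse order of model_open(). */
    dev->irq_requested = false;
    return 0;
}
```

        If model_open() fails, the queue is never started, so no packet can reach a device whose resources are missing.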

4. Device driver function layer

        The members of the net_device structure (the attributes and the function pointers in the net_device_ops structure) must be given concrete values and functions by the device driver function layer. For a specific device XXX, the engineer writes the corresponding device driver function layer functions, such as xxx_open(), xxx_stop() and xxx_tx().

        Because packet reception can be triggered by an interrupt, the other main part of the device driver function layer is the interrupt handler, which reads the packets received by the hardware and passes them to the upper-layer protocol. The driver may therefore contain xxx_interrupt() and xxx_rx() functions: the former handles basic tasks such as determining the interrupt type, and the latter does the heavier work of building the packet and submitting it to the upper layer.

        For a specific device, the engineer can also define private data and operations, package them in a private structure xxx_private, and attach it to the private data area of net_device. The xxx_private structure can hold device-specific attributes and operations, spinlocks and semaphores, timers, statistics and so on, all defined by the engineer. When the driver needs its private data, it uses the interface defined in netdevice.h:

static inline void *netdev_priv(const struct net_device *dev);
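        One common layout, used by many kernel versions, places the private area in the same allocation as net_device, directly after the aligned structure, so that netdev_priv() is just pointer arithmetic. The sketch below models that idea in user space. It is an illustration under assumptions, not the kernel implementation: the model_ names, the 32-byte alignment constant and the fields of xxx_private are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ALIGN 32 /* alignment assumed for this model */
#define MODEL_ALIGN_UP(x) \
    (((x) + (size_t)MODEL_ALIGN - 1) & ~((size_t)MODEL_ALIGN - 1))

/* Stand-in for net_device; NOT the kernel structure. */
struct model_net_device {
    char name[16];
    int irq;
};

/* Allocate the device structure and sizeof_priv bytes of zeroed private
 * data in one block. The caller releases it with free(). */
static struct model_net_device *model_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, MODEL_ALIGN_UP(sizeof(struct model_net_device)) + sizeof_priv);
}

/* Return the private area that follows the aligned device structure. */
static void *model_netdev_priv(const struct model_net_device *dev)
{
    return (char *)dev + MODEL_ALIGN_UP(sizeof(struct model_net_device));
}

/* Hypothetical private data of a driver "xxx". */
struct xxx_private {
    unsigned long rx_packets;
    int phy_addr;
};
```

        A driver written against this model would call model_alloc_netdev(sizeof(struct xxx_private)) once and then fetch its state with model_netdev_priv(dev) wherever it is handed the device pointer, which avoids a second allocation and a separate pointer to manage.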

5. Registration and deregistration of network device drivers

        Network devices are registered and unregistered with the register_netdev() and unregister_netdev() functions. These are wrappers that take the RTNL lock and then call register_netdevice() and unregister_netdevice(), mentioned in section 3. Their prototypes are:

int register_netdev(struct net_device *dev);
void unregister_netdev(struct net_device *dev);

6. Data sending process

        From the structure of a network device driver described above, it follows that when the Linux network subsystem sends a packet it calls the hard_start_xmit() function provided by the driver (the ndo_start_xmit member of net_device_ops), which starts the transmission. When the device is initialized, this function pointer must be set to point to the device's xxx_tx() function.

        The network device driver sends a packet as follows:

        1) The driver obtains the valid data and its length from the sk_buff passed down by the upper-layer protocol and puts the valid data into a temporary buffer.

        2) For Ethernet, if the valid data is shorter than ETH_ZLEN, the minimum frame length required for Ethernet collision detection, the end of the temporary buffer is padded with zeros.

        3) The driver sets the hardware registers so that the network device performs the transmission.

        A template for a driver transmit function that carries out these three steps is shown below:

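netdev_tx_t xxx_tx(struct sk_buff *skb, struct net_device *dev)
{
    int len;
    char *data, shortpkt[ETH_ZLEN];

    if (xxx_send_available(...)) {
        /* The transmit queue is not full, so the frame can be sent */
        /* Get the pointer to the valid data and its length */
        data = skb->data;
        len = skb->len;
        if (len < ETH_ZLEN) {
            /* Frame shorter than the Ethernet minimum: pad with zeros */
            memset(shortpkt, 0, ETH_ZLEN);
            memcpy(shortpkt, skb->data, skb->len);
            len = ETH_ZLEN;
            data = shortpkt;
        }
        /* Record the transmit timestamp (newer kernels use netif_trans_update(dev)) */
        dev->trans_start = jiffies;
        /* Set the hardware registers so the device sends the frame */
        xxx_hw_tx(data, len, dev);
        /* The skb must be freed, e.g. with dev_kfree_skb(), once the hardware no longer needs it */
        return NETDEV_TX_OK;
    } else {
        /* No room in the hardware: tell the upper layer to stop sending */
        netif_stop_queue(dev);
        ...
        return NETDEV_TX_BUSY;
    }
}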
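        The padding step of the template can be exercised on its own in user space. The sketch below is a simplified stand-alone version, not driver code: model_pad_frame() and MODEL_ETH_ZLEN are hypothetical names, with the constant set to 60, the value of the kernel's ETH_ZLEN (minimum Ethernet frame length without the frame check sequence).

```c
#include <stddef.h>
#include <string.h>

/* Same value as the kernel's ETH_ZLEN: minimum frame length without FCS. */
#define MODEL_ETH_ZLEN 60

/*
 * Copy a frame into `out`, zero-padding frames shorter than MODEL_ETH_ZLEN.
 * `out` must hold at least max(len, MODEL_ETH_ZLEN) bytes.
 * Returns the number of bytes to transmit.
 */
static size_t model_pad_frame(const unsigned char *frame, size_t len,
                              unsigned char *out)
{
    size_t out_len = len < MODEL_ETH_ZLEN ? MODEL_ETH_ZLEN : len;

    memset(out, 0, out_len);   /* padding bytes are zero, never stale data */
    memcpy(out, frame, len);
    return out_len;
}
```

        Zeroing the buffer before copying matters: without it, the padding would carry whatever bytes happened to be in the buffer onto the wire.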

7. Data receiving flow

        The main way a network device receives data is through an interrupt raised by the device. The interrupt handler determines the interrupt type. If it is a receive interrupt, the handler reads the received data, allocates an sk_buff and its data buffer, copies the received data into the buffer, and calls netif_rx() to pass the sk_buff to the upper-layer protocol. A code template is shown below:

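static irqreturn_t xxx_interrupt(int irq, void *dev_id)
{
    ...
    switch (status & ISQ_EVENT_MASK) {
    case ISQ_RECEIVER_EVENT:
        /* Fetch the received packet */
        xxx_rx(dev);
        break;
    /* Other interrupt types */
    }
    return IRQ_HANDLED;
}
 
static void xxx_rx(struct net_device *dev)
{
    ...
    length = get_rev_len(...);
    /* Allocate a new socket buffer */
    skb = dev_alloc_skb(length + 2);
    if (!skb)
        return;    /* out of memory: drop the packet */
    /* Alignment: 2 bytes plus the 14-byte Ethernet header puts the IP header on a 16-byte boundary */
    skb_reserve(skb, 2);
    skb->dev = dev;
 
    /* Read the received data from the hardware */
    insw(ioaddr + RX_FRAME_PORT, skb_put(skb, length), length >> 1);
    if (length & 1)
        skb->data[length - 1] = inw(ioaddr + RX_FRAME_PORT);
    /* Determine the upper-layer protocol type */
    skb->protocol = eth_type_trans(skb, dev);
    /* Hand the packet to the upper layer */
    netif_rx(skb);
    /* Record the receive timestamp */
    dev->last_rx = jiffies;
    ...
}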
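        The call to eth_type_trans() in the template is what tells the stack which upper-layer protocol should get the packet. A much simplified user-space model of that lookup is sketched below. It is an assumption-laden illustration, not the kernel function: model_eth_type() and the MODEL_ constants are hypothetical names, the result is returned in host byte order, and the handling of packet type and 802.3 length fields done by the real function is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN  14     /* 6 dst + 6 src + 2 type, same value as ETH_HLEN */
#define MODEL_ETH_P_IP  0x0800 /* EtherType of IPv4 */
#define MODEL_ETH_P_ARP 0x0806 /* EtherType of ARP */

/*
 * Read the EtherType of a received frame and report where the payload starts.
 * Returns 0 and sets *payload to NULL when the frame is shorter than a header.
 */
static uint16_t model_eth_type(const unsigned char *frame, size_t len,
                               const unsigned char **payload)
{
    if (len < MODEL_ETH_HLEN) {
        *payload = NULL;
        return 0;
    }
    *payload = frame + MODEL_ETH_HLEN;
    /* The type field is stored big-endian (network byte order). */
    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

        Moving the payload pointer past the 14-byte header corresponds to the "strip the protocol header" step that each receiving layer performs on the sk_buff.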

Origin blog.csdn.net/qq_41076734/article/details/129064771