Tracking down a kernel bug with SystemTap

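   This week, while debugging a NIC driver, I ran into a puzzling bug. With the help of SystemTap it took me a day to track down, so I am writing it up here.
   The work machines run Ubuntu 14.04 with kernel 3.14.8, and two such machines communicate through our NIC. The NIC is an FPGA-based card with a PCIe interface, and the driver, which I wrote, is a single-queue gigabit NIC driver. The two hosts, client and server, were trying to set up a TCP connection with nc. PCIe communication between host and NIC was already fully working, yet the TCP connection could not be established. Capturing on the server with tcpdump showed a steady stream of SYN packets from the client but no SYN-ACK in reply. That means the link between the NICs and the PCIe path between host and NIC were fine. My first suspicion was a malformed SYN from the client, but compared with a known-good SYN its format was correct, and so was its checksum. So why did the server's TCP layer receive the SYN without answering with a SYN-ACK? Or was the SYN dropped before it ever reached the transport layer?
   Time to dig in with SystemTap.
   First, print the call path of a normal TCP receive: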
   probe kernel.function("tcp_v4_do_rcv")
   {
           printf("_____________________________!\n");
           print_backtrace();
           printf("_____________________________!\n");
           //exit();
   }
           
   On a working stack the receive call path looks like this:
    0xffffffff8161a2e0 : tcp_v4_do_rcv+0x0/0x4c0 [kernel]
    0xffffffff8161c8a0 : tcp_v4_rcv+0x780/0x7a0 [kernel]
    0xffffffff815f7978 : ip_local_deliver_finish+0xa8/0x210 [kernel]
    0xffffffff815f7c78 : ip_local_deliver+0x48/0x80 [kernel]
    0xffffffff815f75fd : ip_rcv_finish+0x7d/0x350 [kernel]
    0xffffffff815f7f48 : ip_rcv+0x298/0x3d0 [kernel]
    0xffffffff815c23e6 : __netif_receive_skb_core+0x666/0x840 [kernel]
    0xffffffff815c25d8 : __netif_receive_skb+0x18/0x60 [kernel]
    0xffffffff815c2643 : netif_receive_skb_internal+0x23/0x90 [kernel]
    0xffffffff815c26cc : netif_receive_skb+0x1c/0x70 [kernel]
    0xffffffffa037daf8 [r8169]
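   The above is the receive call stack on a working setup; r8169 is the driver module of a known-good NIC.
   Using the same stap script with tcp_v4_do_rcv as the probe point and connecting from the client to the server, nothing was printed. So the SYN never enters that function.
   Probing tcp_v4_rcv the same way showed that the function is entered, which means the SYN is dropped inside tcp_v4_rcv.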
   
   Next, list the lines inside tcp_v4_rcv that can be probed:
   stap -L 'kernel.statement("tcp_v4_rcv@ipv4/tcp_ipv4.c:*")'
   The result:
            root@shijie-SL:/home/WORK_DIR/kernel/linux-stable# stap -L 'kernel.statement("tcp_v4_rcv@ipv4/tcp_ipv4.c:*")'
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1935") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1936") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1937") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1947") $pao_ID__:int const $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1949") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1954") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1971") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1972") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1973") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1974") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1976") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1984") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1985") $pao_ID__:int const $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1993") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:1997") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2001") $skb:struct sk_buff* $sk:struct sock* $ret:int
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2012") $skb:struct sk_buff* $sk:struct sock* $ret:int
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2017") $pao_ID__:int const $skb:struct sk_buff* $sk:struct sock* $ret:int
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2022") $skb:struct sk_buff* $sk:struct sock* $ret:int
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2024") $skb:struct sk_buff* $sk:struct sock* $ret:int
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2030") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2032") $pao_ID__:int const $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2034") $pao_ID__:int const $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2036") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2041") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2042") $skb:struct sk_buff*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2045") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2046") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2054") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2055") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2056") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2059") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2060") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2062") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2064") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2066") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2069") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2070") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2071") $skb:struct sk_buff* $sk:struct sock*
           kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:2085") $skb:struct sk_buff*
          
   Quite a lot; debuginfo covers this function well. Next, see which lines are actually executed when the function runs:
   probe kernel.statement("tcp_v4_rcv@net/ipv4/tcp_ipv4.c:*")
   {
           printf("%s!\n", pp());
   }
   For reference, the source of tcp_v4_rcv:
   1935 int tcp_v4_rcv(struct sk_buff *skb)
   1936 {
   1937         const struct iphdr *iph;
   1938         const struct tcphdr *th;
   1939         struct sock *sk;
   1940         int ret;
   1941         struct net *net = dev_net(skb->dev);
   1942 
   1943         if (skb->pkt_type != PACKET_HOST)
   1944                 goto discard_it;
   1945 
   1946         /* Count it even if it's bad */
   1947         TCP_INC_STATS_BH(net, TCP_MIB_INSEGS);
   1948 
   1949         if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
   1950                 goto discard_it;
   1951 
   1952         th = tcp_hdr(skb);
   1953 
   1954         if (th->doff < sizeof(struct tcphdr) / 4)
   1955                 goto bad_packet;
   1956         if (!pskb_may_pull(skb, th->doff * 4))
   1957                 goto discard_it;
   ...
   2039 discard_it:
   2040         /* Discard frame. */
   2041         kfree_skb(skb);
   2042         return 0;
   ...
   2077         case TCP_TW_ACK:
   2078                 tcp_v4_timewait_ack(sk, skb);
   2079                 break;
   2080         case TCP_TW_RST:
   2081                 goto no_tcp_socket;
   2082         case TCP_TW_SUCCESS:;
   2083         }
   2084         goto discard_it;
   2085 }
            
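   Running the stap script showed that execution ends up at line 2039, the discard_it label. So that is where the packet goes.
   Start checking from the first goto discard_it, that is, the test at line 1943.
   Look at the value of skb->pkt_type when the function runs, with this stap script:
   probe kernel.function("tcp_v4_rcv")
   {
           printf("pkt_type:%ld!\n", $skb->pkt_type);
   }
   The value is 2. The possible values of pkt_type are: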
   #define PACKET_HOST             0               /* To us                */
   #define PACKET_BROADCAST        1               /* To all               */
   #define PACKET_MULTICAST        2               /* To group             */
   #define PACKET_OTHERHOST        3               /* To someone else      */
   #define PACKET_OUTGOING         4               /* Outgoing of any type */
   #define PACKET_LOOPBACK         5               /* MC/BRD frame looped back */
   #define PACKET_USER             6               /* To user space        */
   #define PACKET_KERNEL           7               /* To kernel space      */
   Now the problem is obvious: pkt_type should be 0, PACKET_HOST, whereas 2 is PACKET_MULTICAST, a link-layer multicast frame.
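   To make the probe output easier to read, the numeric value can be turned back into its name. Below is a minimal user-space C sketch, assuming the PACKET_* values quoted above; the constants are redeclared locally and the helper pkt_type_name is a hypothetical name, not a kernel API.

```c
/* Values copied from the PACKET_* definitions quoted above,
 * redeclared locally so the sketch builds without kernel headers. */
enum {
    PACKET_HOST      = 0,
    PACKET_BROADCAST = 1,
    PACKET_MULTICAST = 2,
    PACKET_OTHERHOST = 3,
    PACKET_OUTGOING  = 4,
    PACKET_LOOPBACK  = 5,
    PACKET_USER      = 6,
    PACKET_KERNEL    = 7
};

/* Hypothetical helper (not a kernel API): printable name of a pkt_type. */
static const char *pkt_type_name(unsigned int pkt_type)
{
    static const char *const names[] = {
        [PACKET_HOST]      = "PACKET_HOST",
        [PACKET_BROADCAST] = "PACKET_BROADCAST",
        [PACKET_MULTICAST] = "PACKET_MULTICAST",
        [PACKET_OTHERHOST] = "PACKET_OTHERHOST",
        [PACKET_OUTGOING]  = "PACKET_OUTGOING",
        [PACKET_LOOPBACK]  = "PACKET_LOOPBACK",
        [PACKET_USER]      = "PACKET_USER",
        [PACKET_KERNEL]    = "PACKET_KERNEL"
    };

    /* Anything outside the table is reported as unknown. */
    if (pkt_type >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[pkt_type];
}
```

   With the value 2 seen by the probe, pkt_type_name(2) yields "PACKET_MULTICAST".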
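   So where does the skb's pkt_type get assigned? Printing pkt_type in turn in each of the functions upstream of tcp_v4_rcv gave 2 everywhere, so the value is not set inside the protocol stack.
   That means it is set in my driver module. The driver's NAPI poll receive function dev_clean_rx() calls eth_type_trans(), which sets the skb's packet class pkt_type according to whether the destination MAC address is multicast.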
   157 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
   158 {
   159         unsigned short _service_access_point;
   160         const unsigned short *sap;
   161         const struct ethhdr *eth;
   162 
   163         skb->dev = dev;
   164         skb_reset_mac_header(skb);
   165         skb_pull_inline(skb, ETH_HLEN);
   166         eth = eth_hdr(skb);
   167 
   168         if (unlikely(is_multicast_ether_addr(eth->h_dest))) {
   169                 if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
   170                         skb->pkt_type = PACKET_BROADCAST;
   171                 else
   172                         skb->pkt_type = PACKET_MULTICAST;
   173         }
   174         else if (unlikely(!ether_addr_equal_64bits(eth->h_dest,
   175                                                    dev->dev_addr)))
   176                 skb->pkt_type = PACKET_OTHERHOST;
           ...
   }
   The assignment happens at line 172. So why does execution enter the branch at line 168?
   static inline bool is_multicast_ether_addr(const u8 *addr)
   {
           return 0x01 & addr[0];
   }
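   The two quoted kernel snippets can be combined into a small user-space C sketch of the classification order. This is an illustration under assumptions, not the kernel code: classify_dest and the PKT_* names are hypothetical, the broadcast address is assumed to be all ones, and the result is assumed to stay at PACKET_HOST (0) when no branch matches.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical names; the numeric values mirror the PACKET_* constants. */
enum pkt_class {
    PKT_HOST      = 0,
    PKT_BROADCAST = 1,
    PKT_MULTICAST = 2,
    PKT_OTHERHOST = 3
};

/* Mirrors the order of the checks in the quoted eth_type_trans() excerpt:
 * the multicast bit is tested first, the device address only afterwards. */
static enum pkt_class classify_dest(const uint8_t dest[ETH_ALEN],
                                    const uint8_t dev_addr[ETH_ALEN])
{
    /* Assumption: Ethernet broadcast address ff:ff:ff:ff:ff:ff. */
    static const uint8_t broadcast[ETH_ALEN] = {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

    if (dest[0] & 0x01) {
        if (memcmp(dest, broadcast, ETH_ALEN) == 0)
            return PKT_BROADCAST;
        return PKT_MULTICAST;
    }
    if (memcmp(dest, dev_addr, ETH_ALEN) != 0)
        return PKT_OTHERHOST;
    return PKT_HOST;
}
```

   Because the multicast test comes first, a frame whose destination has the lowest bit of its first byte set is classified as multicast even when the destination is identical to the device's own address.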

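   Now everything is clear. We implement our own NIC in hardware, and the MAC address, defined by a hardware colleague, was db:02:03:04:05:06. That is a multicast MAC address; picking a unicast address fixes the problem.
   An Ethernet hardware address is 48 bits (6 bytes) long, and there are three kinds of L2 frames: unicast, multicast and broadcast, where broadcast can be seen as a special case of multicast. The least significant bit of the first byte of the address distinguishes multicast from unicast: 1 means multicast, 0 means unicast.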
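   The rule can be checked against the address from this post with a few lines of user-space C. This is a minimal sketch: mac_is_multicast and mac_make_unicast are hypothetical helper names, and clearing the bit is just one possible way to derive a unicast address, not necessarily the address we ended up using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* Same test as the quoted is_multicast_ether_addr(): the lowest bit
 * of the first byte is 1 for multicast (and broadcast) addresses. */
static bool mac_is_multicast(const uint8_t mac[MAC_LEN])
{
    assert(mac != NULL);
    return (mac[0] & 0x01) != 0;
}

/* Hypothetical fix-up: clear that bit in place, leaving the rest of
 * the address untouched, so the address becomes unicast. */
static void mac_make_unicast(uint8_t mac[MAC_LEN])
{
    assert(mac != NULL);
    mac[0] &= 0xfe;
}
```

   For db:02:03:04:05:06 the first byte is 0xdb, binary 1101 1011, so mac_is_multicast() returns true. Clearing the lowest bit gives 0xda, that is da:02:03:04:05:06, for which the check returns false.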
     

Reposted from blog.csdn.net/hjkfcz/article/details/78572445