raid驱动导致主机crash的问题分析

某外场环境,连续2-3次出现panic,故障当时dmesg日志如下:

[86406.499787] BUG: unable to handle kernel NULL pointer dereference at 0000000000000106
[86406.499808] IP: skb_release_data+0xa3/0x170
[86406.499811] PGD 0 P4D 0 
[86406.499818] Oops: 0000 [#1] SMP PTI
[86406.499822] Modules linked in: sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag bnx2fc(OE) cnic(OE) uio fcoe libfcoe libfc drbd lru_cache ocfs2(OE) quota_tree 8021q mrp garp 
stp llc ocfs2_dlmfs(OE) ocfs2_stack_o2cb(OE) ocfs2_dlm(OE) ocfs2_nodemanager(OE) event_keeper(OE) ocfs2_stackglue(OE) dm_round_robin chbk(OE) ib_iser rdma_cm ib_cm iw_cm iscsi_tcp libiscsi_tcp lib
iscsi scsi_transport_iscsi cls_u32 sch_sfq sch_htb openvswitch(OE) udp_tunnel nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack libcrc32
c dm_multipath vhost_net vhost tap kvm_intel kvm irqbypass ipmi_ssif dm_mod nfsd nfs_acl auth_rpcgss nfs ttm fscache drm_kms_helper lockd drm sunrpc fb_sys_fops sysimgblt sysfillrect syscopyarea s
hpchp mei_me mei grace mxm_wmi
[86406.499903] lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi nls_iso8859_1 acpi_power_meter mac_hid lp parport ext4 jbd2 mbcache ses sd_mod enclosure qla2xxx(OE) ixgbe igb scsi_transport_fc nvme_fc dca i2c_algo_bit ahci nvme_fabrics ptp aacraid(OE) libahci pps_core nvme_core scsi_transport_sas mdio nbd mlx4_ib ib_core mlx4_core devlink
[86406.499942] CPU: 28 PID: 180 Comm: ksoftirqd/28 Tainted: G OE 4.14.0-generic #1
[86406.499947] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.01.23 09/07/2018
[86406.499952] task: ffff9612bb564440 task.stack: ffffabb086974000
[86406.499957] RIP: 0010:skb_release_data+0xa3/0x170
[86406.499962] RSP: 0018:ffffabb086977cb8 EFLAGS: 00010206
[86406.499966] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[86406.499969] RDX: 0000000000000002 RSI: 0000000000000040 RDI: 0000000000000106
[86406.499973] RBP: ffffabb086977cd8 R08: 0000000000000000 R09: ffffffffb4f711c1
[86406.499977] R10: ffffdd858089de00 R11: ffffffffb4c3a380 R12: ffff9612a277cef0
[86406.499980] R13: ffff9611b2a3f100 R14: ffff9612a277cec0 R15: ffff9612b08c3540
[86406.499985] FS: 0000000000000000(0000) GS:ffff9612bfc80000(0000) knlGS:0000000000000000
[86406.499989] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86406.499993] CR2: 0000000000000106 CR3: 00000016c960a003 CR4: 00000000003626e0
[86406.499997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[86406.500000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[86406.500003] Call Trace:
[86406.500011] skb_release_all+0x24/0x30
[86406.500016] napi_consume_skb+0x54/0x110
[86406.500032] ixgbe_poll+0x7d7/0x14a0 [ixgbe]
[86406.500041] ? insert_work+0x69/0x70
[86406.500045] ? __queue_work+0x119/0x410
[86406.500051] ? __switch_to_asm+0x40/0x70
[86406.500056] ? call_timer_fn+0x35/0x140
[86406.500062] net_rx_action+0x140/0x3a0
[86406.500067] __do_softirq+0xe4/0x2c5
[86406.500073] run_ksoftirqd+0x22/0x60
[86406.500079] smpboot_thread_fn+0xfc/0x170
[86406.500084] kthread+0x128/0x140
[86406.500087] ? sort_range+0x30/0x30
[86406.500091] ? kthread_create_on_node+0x70/0x70
[86406.500096] ret_from_fork+0x35/0x40
[86406.500099] Code: 45 fa 0f 1f 44 00 00 f0 ff 4f 1c 75 05 e8 06 5a a6 ff 41 0f b6 46 02 83 c3 01 49 83 c4 10 39 d8 7f ce 49 8b 7e 08 48 85 ff 74 10 <48> 8b 1f e8 f5 f4 ff ff 48 85 db 48 89 df 75
f0 4d 85 ed 74 54 
[86406.500149] RIP: skb_release_data+0xa3/0x170 RSP: ffffabb086977cb8
[86406.500152] CR2: 0000000000000106

在出现几次后,发现有如下规律:

  • 每次出现报错的空指针地址都是0x106
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000106
  • 每次出现的上下文都是网络释放skb_release_all的上下文
    RIP: skb_release_data+0xa3/0x170 RSP: ffffabb086977cb8

查看panic的堆栈,并返汇编可以得到其对应的代码为:

crash> bt
PID: 180 TASK: ffff9612bb564440 CPU: 28 COMMAND: "ksoftirqd/28"
#0 [ffffabb086977930] machine_kexec at ffffffffb48621f9
#1 [ffffabb086977990] __crash_kexec at ffffffffb4926079
#2 [ffffabb086977a58] crash_kexec at ffffffffb4926dd1
#3 [ffffabb086977a78] oops_end at ffffffffb482fcd8
#4 [ffffabb086977aa0] no_context at ffffffffb487275c
#5 [ffffabb086977b08] __bad_area_nosemaphore at ffffffffb4872b13
#6 [ffffabb086977b48] bad_area_nosemaphore at ffffffffb4872be4
#7 [ffffabb086977b58] __do_page_fault at ffffffffb4872fe4
#8 [ffffabb086977bd0] do_page_fault at ffffffffb487340e
#9 [ffffabb086977c00] page_fault at ffffffffb5201605
[exception RIP: skb_release_data+163]
RIP: ffffffffb4f75143 RSP: ffffabb086977cb8 RFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000000040 RDI: 0000000000000106
RBP: ffffabb086977cd8 R8: 0000000000000000 R9: ffffffffb4f711c1
R10: ffffdd858089de00 R11: ffffffffb4c3a380 R12: ffff9612a277cef0
R13: ffff9611b2a3f100 R14: ffff9612a277cec0 R15: ffff9612b08c3540
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffabb086977cb0] skb_release_data at ffffffffb4f751b4
#11 [ffffabb086977ce0] skb_release_all at ffffffffb4f74614
#12 [ffffabb086977cf8] napi_consume_skb at ffffffffb4f782e4
#13 [ffffabb086977d18] ixgbe_poll at ffffffffc0244dd7 [ixgbe]
#14 [ffffabb086977de0] net_rx_action at ffffffffb4f8bf80
#15 [ffffabb086977e60] __softirqentry_text_start at ffffffffb54000e4
#16 [ffffabb086977ec8] run_ksoftirqd at ffffffffb48918b2
#17 [ffffabb086977ed8] smpboot_thread_fn at ffffffffb48b068c
#18 [ffffabb086977f08] kthread at ffffffffb48ac9c8
#19 [ffffabb086977f50] ret_from_fork at ffffffffb5200205
crash> dis -l skb_release_data+163
/linux-iso/code/linux-dev/4.14.88/linux-4.14/net/core/skbuff.c: 666
0xffffffffb4f75143 <skb_release_data+163>:      mov    (%rdi),%rbx
 663 void kfree_skb_list(struct sk_buff *segs)                                                                             
 664 {                                                                                                                     
 665         while (segs) {                                                                                                
 666                 struct sk_buff *next = segs->next;     <<<<---- panic地址                                                                                                                                        
 667                                                                                                                       
 668                 kfree_skb(segs);                                                                                      
 669                 segs = next;                                                                                          
 670         }                                                                                                             
 671 } 

结合代码分析可知: 导致故障的原因应该是入参segs本身变成了0x0000000000000106,然后通过指针访问segs->next出现了空指针panic。kfree_skb_list函数应该被编译器优化掉了,因此无法直接获取到segs的值,必须要通过其上级函数skb_release_data才能解析出segs的具体值。查看代码skb_release_data的入参是skb,只要得到了skb就可以很容易的解析出segs。

static void skb_release_data(struct sk_buff *skb)
{
    struct skb_shared_info *shinfo = skb_shinfo(skb);
    int i;

    if (skb->cloned &&
        atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
                  &shinfo->dataref))
        return;

    for (i = 0; i < shinfo->nr_frags; i++)
        __skb_frag_unref(&shinfo->frags[i]);

    /*
     * If skb buf is from userspace, we need to notify the caller
     * the lower device DMA has done;
     */
    if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
        struct ubuf_info *uarg;

        uarg = shinfo->destructor_arg;
        if (uarg->callback)
            uarg->callback(uarg, true);
    }

    if (shinfo->frag_list)
        kfree_skb_list(shinfo->frag_list);     <<<<----- 反汇编可以看到其被优化掉了

    skb_free_head(skb);
}

将skb_release_data函数进行反汇编,入参skb初始时候是存放在rdi寄存器里面的,

crash> dis -l skb_release_data
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 554
0xffffffff913750a0 <skb_release_data>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff913750a5 <skb_release_data+5>:        push   %rbp
0xffffffff913750a6 <skb_release_data+6>:        mov    %rsp,%rbp
0xffffffff913750a9 <skb_release_data+9>:        push   %r14
0xffffffff913750ab <skb_release_data+11>:       push   %r13
0xffffffff913750ad <skb_release_data+13>:       push   %r12
0xffffffff913750af <skb_release_data+15>:       push   %rbx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750b0 <skb_release_data+16>:       movzbl 0x8e(%rdi),%eax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1248
0xffffffff913750b7 <skb_release_data+23>:       mov    0xcc(%rdi),%r14d
0xffffffff913750be <skb_release_data+30>:       add    0xd0(%rdi),%r14
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750c5 <skb_release_data+37>:       test   $0x1,%al
0xffffffff913750c7 <skb_release_data+39>:       je     0xffffffff913750f8 <skb_release_data+88>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 559
0xffffffff913750c9 <skb_release_data+41>:       and    $0x2,%eax
0xffffffff913750cc <skb_release_data+44>:       cmp    $0x1,%al
0xffffffff913750ce <skb_release_data+46>:       mov    %eax,%edx
0xffffffff913750d0 <skb_release_data+48>:       sbb    %eax,%eax
0xffffffff913750d2 <skb_release_data+50>:       or     $0xfffeffff,%eax
0xffffffff913750d7 <skb_release_data+55>:       cmp    $0x1,%dl
0xffffffff913750da <skb_release_data+58>:       sbb    %edx,%edx
0xffffffff913750dc <skb_release_data+60>:       xor    %dx,%dx
0xffffffff913750df <skb_release_data+63>:       add    $0x10001,%edx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./arch/x86/include/asm/atomic.h: 157
0xffffffff913750e5 <skb_release_data+69>:       lock xadd %eax,0x24(%r14)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750eb <skb_release_data+75>:       cmp    %eax,%edx
0xffffffff913750ed <skb_release_data+77>:       je     0xffffffff913750f8 <skb_release_data+88>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 571
0xffffffff913750ef <skb_release_data+79>:       pop    %rbx
0xffffffff913750f0 <skb_release_data+80>:       pop    %r12
0xffffffff913750f2 <skb_release_data+82>:       pop    %r13
0xffffffff913750f4 <skb_release_data+84>:       pop    %r14
0xffffffff913750f6 <skb_release_data+86>:       pop    %rbp
0xffffffff913750f7 <skb_release_data+87>:       retq   
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 563
0xffffffff913750f8 <skb_release_data+88>:       xor    %ebx,%ebx
0xffffffff913750fa <skb_release_data+90>:       cmpb   $0x0,0x2(%r14)
0xffffffff913750ff <skb_release_data+95>:       mov    %rdi,%r13                 <<<----- rdi被保存到了r13里面
0xffffffff91375102 <skb_release_data+98>:       lea    0x30(%r14),%r12
0xffffffff91375106 <skb_release_data+102>:      je     0xffffffff9137513a <skb_release_data+154>
0xffffffff91375108 <skb_release_data+104>:      mov    (%r12),%rdi
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/compiler.h: 183
0xffffffff9137510c <skb_release_data+108>:      mov    0x20(%rdi),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/page-flags.h: 150
0xffffffff91375110 <skb_release_data+112>:      lea    -0x1(%rax),%rdx
0xffffffff91375114 <skb_release_data+116>:      test   $0x1,%al
0xffffffff91375116 <skb_release_data+118>:      cmovne %rdx,%rdi
0xffffffff9137511a <skb_release_data+122>:      nopl   0x0(%rax,%rax,1)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./arch/x86/include/asm/atomic.h: 118
0xffffffff9137511f <skb_release_data+127>:      lock decl 0x1c(%rdi)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 854
0xffffffff91375123 <skb_release_data+131>:      jne    0xffffffff9137512a <skb_release_data+138>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 855
0xffffffff91375125 <skb_release_data+133>:      callq  0xffffffff90ddab30 <__put_page>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 563
0xffffffff9137512a <skb_release_data+138>:      movzbl 0x2(%r14),%eax
0xffffffff9137512f <skb_release_data+143>:      add    $0x1,%ebx
0xffffffff91375132 <skb_release_data+146>:      add    $0x10,%r12
0xffffffff91375136 <skb_release_data+150>:      cmp    %ebx,%eax
0xffffffff91375138 <skb_release_data+152>:      jg     0xffffffff91375108 <skb_release_data+104>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 566
0xffffffff9137513a <skb_release_data+154>:      mov    0x8(%r14),%rdi
0xffffffff9137513e <skb_release_data+158>:      test   %rdi,%rdi
0xffffffff91375141 <skb_release_data+161>:      je     0xffffffff91375153 <skb_release_data+179>     
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 666
0xffffffff91375143 <skb_release_data+163>:      mov    (%rdi),%rbx      <<<---- rip的值,当前代码运行到了这个地方    
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 668
0xffffffff91375146 <skb_release_data+166>:      callq  0xffffffff91374640 <kfree_skb>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 665
0xffffffff9137514b <skb_release_data+171>:      test   %rbx,%rbx
0xffffffff9137514e <skb_release_data+174>:      mov    %rbx,%rdi
0xffffffff91375151 <skb_release_data+177>:      jne    0xffffffff91375143 <skb_release_data+163>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1277
0xffffffff91375153 <skb_release_data+179>:      test   %r13,%r13
0xffffffff91375156 <skb_release_data+182>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1248
0xffffffff91375158 <skb_release_data+184>:      mov    0xcc(%r13),%eax
0xffffffff9137515f <skb_release_data+191>:      add    0xd0(%r13),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1277
0xffffffff91375166 <skb_release_data+198>:      movzbl 0x3(%rax),%edx
0xffffffff9137516a <skb_release_data+202>:      test   $0x8,%dl
0xffffffff9137516d <skb_release_data+205>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1279
0xffffffff9137516f <skb_release_data+207>:      mov    0x28(%rax),%rdi
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1312
0xffffffff91375173 <skb_release_data+211>:      test   %rdi,%rdi
0xffffffff91375176 <skb_release_data+214>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1313
0xffffffff91375178 <skb_release_data+216>:      mov    (%rdi),%rcx
0xffffffff9137517b <skb_release_data+219>:      cmp    $0xffffffff91374e00,%rcx
0xffffffff91375182 <skb_release_data+226>:      je     0xffffffff913751ed <skb_release_data+333>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1316
0xffffffff91375184 <skb_release_data+228>:      test   $0x1,%dil
0xffffffff91375188 <skb_release_data+232>:      jne    0xffffffff913751a6 <skb_release_data+262>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1317
0xffffffff9137518a <skb_release_data+234>:      mov    $0x1,%esi
0xffffffff9137518f <skb_release_data+239>:      callq  0xffffffff91803040 <__x86_indirect_thunk_rcx>
0xffffffff91375194 <skb_release_data+244>:      mov    0xcc(%r13),%eax
0xffffffff9137519b <skb_release_data+251>:      add    0xd0(%r13),%rax
0xffffffff913751a2 <skb_release_data+258>:      movzbl 0x3(%rax),%edx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1320
0xffffffff913751a6 <skb_release_data+262>:      and    $0xffffffd7,%edx
0xffffffff913751a9 <skb_release_data+265>:      mov    %dl,0x3(%rax)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 570
0xffffffff913751ac <skb_release_data+268>:      mov    %r13,%rdi
0xffffffff913751af <skb_release_data+271>:      callq  0xffffffff913711a0 <skb_free_head>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 571
0xffffffff913751b4 <skb_release_data+276>:      pop    %rbx
0xffffffff913751b5 <skb_release_data+277>:      pop    %r12
0xffffffff913751b7 <skb_release_data+279>:      pop    %r13
0xffffffff913751b9 <skb_release_data+281>:      pop    %r14
0xffffffff913751bb <skb_release_data+283>:      pop    %rbp
0xffffffff913751bc <skb_release_data+284>:      retq   
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 790
0xffffffff913751bd <skb_release_data+285>:      mov    (%rdi),%rax
0xffffffff913751c0 <skb_release_data+288>:      shr    $0x33,%rax
0xffffffff913751c4 <skb_release_data+292>:      and    $0x7,%eax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 165
0xffffffff913751c7 <skb_release_data+295>:      cmp    $0x4,%eax
0xffffffff913751ca <skb_release_data+298>:      jne    0xffffffff9137511f <skb_release_data+127>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 166
0xffffffff913751d0 <skb_release_data+304>:      mov    0x20(%rdi),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 165
0xffffffff913751d4 <skb_release_data+308>:      mov    0x38(%rax),%eax
0xffffffff913751d7 <skb_release_data+311>:      sub    $0x1,%eax
0xffffffff913751da <skb_release_data+314>:      cmp    $0x1,%eax
0xffffffff913751dd <skb_release_data+317>:      ja     0xffffffff9137511f <skb_release_data+127>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 850
0xffffffff913751e3 <skb_release_data+323>:      callq  0xffffffff90dc3b40 <put_zone_device_private_or_public_page>
0xffffffff913751e8 <skb_release_data+328>:      jmpq   0xffffffff9137512a <skb_release_data+138>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1315
0xffffffff913751ed <skb_release_data+333>:      callq  0xffffffff91374f90 <sock_zerocopy_put>
0xffffffff913751f2 <skb_release_data+338>:      mov    0xcc(%r13),%eax
0xffffffff913751f9 <skb_release_data+345>:      add    0xd0(%r13),%rax
0xffffffff91375200 <skb_release_data+352>:      movzbl 0x3(%rax),%edx
0xffffffff91375204 <skb_release_data+356>:      jmp    0xffffffff913751a6 <skb_release_data+262>
0xffffffff91375206 <skb_release_data+358>:      nopw   %cs:0x0(%rax,%rax,1)

从汇编代码可以看到,rdi被保存到了r13寄存器中,而r13寄存器在后面的过程中一直没有被修改过,所以r13寄存器里面的值,肯定就是skb的值。

    RIP: ffffffffb4f75143  RSP: ffffabb086977cb8  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000001
    RDX: 0000000000000002  RSI: 0000000000000040  RDI: 0000000000000106
    RBP: ffffabb086977cd8   R8: 0000000000000000   R9: ffffffffb4f711c1
    R10: ffffdd858089de00  R11: ffffffffb4c3a380  R12: ffff9612a277cef0
    R13: ffff9611b2a3f100  R14: ffff9612a277cec0  R15: ffff9612b08c3540

根据skb_release_data函数的代码,计算可以得到 shinfo = skb->head + skb->end,解析shinfo->frag_list即可得到segs 

crash> sk_buff.head ffff9611b2a3f100
  head = 0xffff9612a277cc00 "\342J\343.\254b\275!"
crash> sk_buff.end ffff9611b2a3f100
  end = 704
crash> eval 0xffff9612a277cc00 + 704
hexadecimal: ffff9612a277cec0       <<<----- shinfo
    decimal: 18446627605512179392  (-116468197372224)
      octal: 1777774541124235747300
     binary: 1111111111111111100101100001001010100010011101111100111011000000
crash> skb_shared_info.frag_list ffff9612a277cec0
  frag_list = 0x106             <<<----   panic时的空指针,这样现象和内存里面的数据就对应起来了

结合内核__alloc_skb 函数代码可知,存放frag_list的skb_shared_info,是通过kmalloc分配。

struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                int flags, int node)
{
。。。
    /* Get the HEAD */
    skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
    if (!skb)
        goto out;
    prefetchw(skb);

    /* We do our best to align skb_shared_info on a separate cache
     * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
     * aligned memory blocks, unless SLUB/SLAB debug is enabled.
     * Both skb->head and skb_shared_info are cache line aligned.
     */
    size = SKB_DATA_ALIGN(size);
    size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
    data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);    <<<---- 分配skb_shared_info的地方,head和end值通过计算得到
。。。
}

从代码可知,kmalloc_reserve的size为:sk_buff.end +sizeof(skb_shared_info)= 1024,即0xffff9612a277cc00为通过kmallc-1024

skb_shared_info的某个数据变成了0x106,出现这种情况又两种可能:

  • sk_buff已经释放,然后该内存被分配给了其它使用kmallc-1024的模块,相关的数据也被初始化成了其它模块的东西,而网络相关模块还在继续引用skb_shared_info,进而导致相关指针访问出现问题。
  • 其它使用kmallc-1024的模块已经释放,相关的内存分配给了sk_buff,然后故障内存是被其它模块相关流程引用访问给破坏了。

这种踩内存的问题不好界定到底是第一种场景,还是第二种场景导致的,除非打开slub_debug=FZPU开关复现,抓取kmallc-1024这个slab上次申请释放的堆栈,才能准确的确认到底是第一种场景,还是第二种场景。

为了找到另一个使用kmallc-1024的模块,强制读取0xffff9612a277cc00的内存,来看看规律。

crash> rd 0xffff9612a277cc00 128
ffff9612a277cc00:  21bd62ac2ee34ae2 ffff9612a277cc00   .J...b.!..w.....
ffff9612a277cc10:  0000000101487d43 0000000000000000   C}H.............
ffff9612a277cc20:  0000000000000000 0000000000000000   ................
ffff9612a277cc30:  ffff9612b54c9920 0000000000000000    .L.............
ffff9612a277cc40:  0000c8010000001c ffffffff00027003   .........p......
ffff9612a277cc50:  0000000000000001 ffffffff00000000   ................
ffff9612a277cc60:  00000000262a1800 0000000000000000   ..*&............
ffff9612a277cc70:  ffff96111aa82d00 ffff9612a277cc78   .-......x.w.....
ffff9612a277cc80:  ffff9612a277cc78 ffff9612a277cc88   x.w.......w.....
ffff9612a277cc90:  0000000000000000 0000000000000000   ................
ffff9612a277cca0:  0000000000000000 0000000000000000   ................
ffff9612a277ccb0:  0000000000000000 0000000000000000   ................
ffff9612a277ccc0:  ffff9622bb065000 ffff9622b5f3f800   .P.."......."...
ffff9612a277ccd0:  0000000101487861 04004e9614f2f7d4   axH..........N..
ffff9612a277cce0:  ffff9612b54c9960 58ba00001d64d97e   `.L.....~.d....X
ffff9612a277ccf0:  6f030b60735b8152 3400004500080be0   R.[s`..o....E..4
ffff9612a277cd00:  b5b5064000405d22 010b19267f0b1926   "]@.@...&...&...
ffff9612a277cd10:  19962441bc0c1ac5 96ce10805e23199f   ....A$....#^....
ffff9612a277cd20:  0a0801010000d862 c5b194890246180f   b.........F.....
ffff9612a277cd30:  0000000000000000 0000000000000000   ................
ffff9612a277cd40:  0000000000000000 020000182a26002a   ........*.&*....
ffff9612a277cd50:  0000000000000000 ffff9612a277cd48   ........H.w.....
ffff9612a277cd60:  0000000000000010 0000000000000000   ................
ffff9612a277cd70:  0000000000000000 ffff9612ba0f4500   .........E......
ffff9612a277cd80:  ffff9612bb8a5800 0000000000000000   .X..............
ffff9612a277cd90:  0000000000000000 ffff9612a277cd98   ..........w.....
ffff9612a277cda0:  ffff9612a277cd98 0000000fffffffe0   ..w.............
ffff9612a277cdb0:  ffff9612a277cdb0 ffff9612a277cdb0   ..w.......w.....
ffff9612a277cdc0:  ffffffffb4e2ff70 0000000000000000   p...............
ffff9612a277cdd0:  0000000000000000 0000000000000000   ................
ffff9612a277cde0:  ffffffffb48a46c0 ffff9612a277cda8   .F........w.....
ffff9612a277cdf0:  000000000020001c 0000000000000000   .. .............
ffff9612a277ce00:  0000000000000000 0000000000000000   ................
ffff9612a277ce10:  0000000000000000 0000000000000000   ................
ffff9612a277ce20:  0000000000000000 0000000101487861   ........axH.....
ffff9612a277ce30:  0000000500000000 0000000a00000000   ................
ffff9612a277ce40:  0000000000000001 ffff9612a277cd48   ........H.w.....
ffff9612a277ce50:  0000000000000000 0000000000000000   ................
ffff9612a277ce60:  0000000000000000 0000000000000000   ................
ffff9612a277ce70:  0000020000040000 ffff9612a277cc00   ..........w.....
ffff9612a277ce80:  ffff9612ba0f4500 ffffffffb4e33610   .E.......6......
ffff9612a277ce90:  0000000000000000 0000000000000000   ................
ffff9612a277cea0:  0000000000000000 0000000000000000   ................
ffff9612a277ceb0:  0000000000000000 0000000000000000   ................
ffff9612a277cec0:  0001000000000000 0000000000000106   ................
ffff9612a277ced0:  0000000000000000 0000000000000001   ................
ffff9612a277cee0:  0000000100000000 0000000000001000   ................
ffff9612a277cef0:  ffffdd8580e53800 0000003000000000   .8..........0...
ffff9612a277cf00:  0000000e89033000 0000000000001000   .0..............
ffff9612a277cf10:  0000000e89034000 0000000000001000   .@..............
ffff9612a277cf20:  0000000e89035000 0000000000001000   .P..............
ffff9612a277cf30:  0000000e89036000 0000000000001000   .`..............
ffff9612a277cf40:  0000000e89037000 0000000000001000   .p..............
ffff9612a277cf50:  0000000e891f0000 0000000000001000   ................
ffff9612a277cf60:  0000000e891f1000 0000000000001000   ................
ffff9612a277cf70:  0000000e891f2000 0000000000001000   . ..............
ffff9612a277cf80:  0000000e891f3000 0000000000001000   .0..............
ffff9612a277cf90:  0000000e891f4000 0000000000001000   .@..............
ffff9612a277cfa0:  0000000e891f5000 0000000000001000   .P..............
ffff9612a277cfb0:  0000000e891f6000 0000000000001000   .`..............
ffff9612a277cfc0:  0000000e891f7000 0000000000001000   .p..............
ffff9612a277cfd0:  0000000eb2538000 0000000000003000   ..S......0......
ffff9612a277cfe0:  ffffdd85b92e29c0 0000000000001000   .)..............
ffff9612a277cff0:  ffffdd85b9abb940 0000000000001000   @...............

 根据经验可知,上面标红的地址都是一些函数地址,解析这种函数地址

crash> sym ffffffffb4e2ff70
ffffffffb4e2ff70 (T) scmd_eh_abort_handler /linux-iso/code/linux-dev/4.14.88/linux-4.14/drivers/scsi/scsi_error.c: 122
crash> sym ffffffffb48a46c0
ffffffffb48a46c0 (T) delayed_work_timer_fn /linux-iso/code/linux-dev/4.14.88/linux-4.14/kernel/workqueue.c: 1498
crash> sym ffffffffb4e33610
ffffffffb4e33610 (t) scsi_done /linux-iso/code/linux-dev/4.14.88/linux-4.14/drivers/scsi/scsi_lib.c: 1840

 这几个函数地址都和io流程有关,直接搜索引用这几个函数地址的内核代码发现:

  • 同时引用到这几个函数的地方很少,满足都引用的只有struct scsi_cmnd

 但是直接把0xffff9612a277cc00当成scsi_cmnd来解析,可以看到数据明显不对?难道之前的分析都是错的?

crash> scsi_cmnd 0xffff9612a277cc00
struct scsi_cmnd {
  req = {
    __cmd = "\342J\343.\254b\275!\000\314w\242\022\226\377\377", 
    cmd = 0x101487d43 <Address 0x101487d43 out of bounds>,     <<<----- scsi_cmd填写的也不对
    cmd_len = 0, 
    result = 0, 
    sense_len = 0, 
    resid_len = 0, 
    retries = 0, 
    sense = 0xffff9612b54c9920
  }, 
  device = 0x0, 
  list = {
    next = 0xc8010000001c,    <<<<---- 这种链表明显不对
    prev = 0xffffffff00027003
  }, 
  eh_entry = {
    next = 0x1,          <<<<----- 这种链表明显不对
    prev = 0xffffffff00000000
  }, 
  abort_work = {
    work = {
      data = {
        counter = 640292864
      }, 
      entry = {
        next = 0x0, 
        prev = 0xffff96111aa82d00
      }, 
      func = 0xffff9612a277cc78
    }, 
   。。。
}

直接解析scsi_done函数也不对,这说明scsi_cmnd不是从0xffff9612a277cc00地址开始的。

crash> scsi_cmnd.scsi_done 0xffff9612a277cc00
  scsi_done = 0x0

结合内核block和scsi层代码可知,scsi命令本身没有单独分配内存,而是公用block层给request分配的kmallc-1024(request占用0x148字节,scsi_cmd占用后面的字节)

因此将0xffff9612a277cc00当成request来解析,可以看到数据基本都是正确的

crash> request 0xffff9612a277cc00
struct request {
  。。。
  q = 0xffff9612b54c9920, 
  mq_ctx = 0x0, 
  cpu = 28, 
  cmd_flags = 51201, 
  rq_flags = 159747, 
  internal_tag = -1, 
  atomic_flags = 1, 
  __data_len = 0, 
  tag = -1, 
  __sector = 640292864, 
  bio = 0x0, 
  biotail = 0xffff96111aa82d00, 
  {
    hash = {
      next = 0xffff9612a277cc78, 
      pprev = 0xffff9612a277cc78
    }, 
    ipi_list = {
      next = 0xffff9612a277cc78, 
      prev = 0xffff9612a277cc78
    }
  }, 
  {
    rb_node = {
      __rb_parent_color = 18446627605512178824, 
      rb_right = 0x0, 
      rb_left = 0x0
    }, 
    special_vec = {
      bv_page = 0xffff9612a277cc88, 
      bv_len = 0, 
      bv_offset = 0
    }, 
    completion_data = 0xffff9612a277cc88, 
    error_count = -1569207160
  }, 
  {
    elv = {
      icq = 0x0, 
      priv = {0x0, 0x0}
    }, 
    flush = {
      seq = 0, 
      list = {
        next = 0x0, 
        prev = 0x0
      }, 
      saved_end_io = 0x0
    }
  }, 
  rq_disk = 0xffff9622bb065000, 
  part = 0xffff9622b5f3f800, 
  start_time = 4316493921, 
  issue_stat = {
    stat = 288316782655240148
  }, 
  rl = 0xffff9612b54c9960, 
  start_time_ns = 6393422621498988926, 
  io_start_time_ns = 7999249872020275538, 
  nr_phys_segments = 3040, 
  nr_integrity_segments = 8, 
  ioprio = 69, 
  timeout = 4218146, 
  special = 0x10b19267f0b1926, 
  extra_len = 3154909893, 
  write_hint = 9281, 
  deadline = 10866641094412802463, 
  timeout_list = {
    next = 0xa0801010000d862, 
    prev = 0xc5b194890246180f
  }, 
  end_io = 0x0, 
  end_io_data = 0x0, 
  next_rq = 0x0
}

 解析里面的request_queue可以看到也是对的

crash> request_queue 0xffff9612b54c9920
struct request_queue {
  queue_head = {
    next = 0xffff9612b54c9920, 
    prev = 0xffff9612b54c9920
  }, 
  last_merge = 0x0, 
  elevator = 0xffff9612b608d800, 
。。。
  request_fn = 0xffffffffb4e33d00, 
  make_request_fn = 0xffffffffb4bcebc0, 
  prep_rq_fn = 0xffffffffb4e34d70, 
  unprep_rq_fn = 0xffffffffb4e30d20, 
  softirq_done_fn = 0xffffffffb4e33b90, 
  rq_timed_out_fn = 0xffffffffb4e2fd00, 
  dma_drain_needed = 0x0, 
  lld_busy_fn = 0xffffffffb4e33530, 
  init_rq_fn = 0xffffffffb4e31de0, 
  exit_rq_fn = 0xffffffffb4e31b60, 
  initialize_rq_fn = 0xffffffffb4e31bb0, 
  mq_ops = 0x0, 
  mq_map = 0x0,
。。。
}

 解析放在requeset后面的scsi_cmnd,也是对的0xffff9612a277cc00 + sizeof(request) = 0xffff9612a277cc00 + 0x148 = ffff9612a277cd48

crash> scsi_cmnd ffff9612a277cd48
struct scsi_cmnd {
  req = {
    __cmd = "*\000&*\030\000\000\002\000\000\000\000\000\000\000", 
    cmd = 0xffff9612a277cd48 "*", 
    cmd_len = 16, 
    result = 0, 
    sense_len = 0, 
    resid_len = 0, 
    retries = 0, 
    sense = 0xffff9612ba0f4500
  }, 
  device = 0xffff9612bb8a5800, 
  。。。
    timer = {
      entry = {
        next = 0x0, 
        pprev = 0x0
      }, 
      expires = 0, 
      function = 0xffffffffb48a46c0, 
      data = 18446627605512179112, 
      flags = 2097180
    }, 
    wq = 0x0, 
    cpu = 0
  }, 
 。。。。
  request = 0xffff9612a277cc00, 
  sense_buffer = 0xffff9612ba0f4500 "&\201\321lt\376", <incomplete sequence \363>, 
  scsi_done = 0xffffffffb4e33610, 
。。。
}

根据上面的信息可以确定,另一个引用故障slab的模块就是request

通过计算可以看到,修改的内容与slab开始相差:ffff9612a277cec8 - 0xffff9612a277cc00 = 0x2c8

crash> skb_shared_info.frag_list -xo ffff9612a277cec0
struct skb_shared_info {
  [ffff9612a277cec8] struct sk_buff *frag_list;
}  

对应到request里面偏移0x2c8的结构体为:

而struct request本身大小只有0x148,后面的空间用来放scsi_cmnd(内核这样设计应该是提升io路劲的性能)

crash> struct request -xo

struct request {

[0x0] struct list_head queuelist;

。。。

  [0x138] void *end_io_data;

  [0x140] struct request *next_rq;

}

SIZE: 0x148

而scsi_cmnd中偏移0x2c8 – 0x148 = 0x180的数据为:scsi_pointer -> phase

crash> struct -xo scsi_cmnd
struct scsi_cmnd {
    [0x0] struct scsi_request req;
。。。
  [0x100] unsigned char *cmnd;
  [0x108] struct scsi_data_buffer sdb;
  [0x120] struct scsi_data_buffer *prot_sdb;
  [0x128] unsigned int underflow;
  [0x12c] unsigned int transfersize;
  [0x130] struct request *request;
  [0x138] unsigned char *sense_buffer;
  [0x140] void (*scsi_done)(struct scsi_cmnd *);
  [0x148] struct scsi_pointer SCp;       <<<----- 0x180所在位置,减去0x148,可知其在scp中偏移0x38     
  [0x188] unsigned char *host_scribble;
  [0x190] int result;
  [0x194] int flags;
  [0x198] unsigned char tag;
}
SIZE: 0x1a0
crash> struct scsi_pointer -xo 

struct scsi_pointer {
。。。
  [0x30] volatile int have_data_in;

  [0x34] volatile int sent_command;

  [0x38] volatile int phase;     <<<<---- 对应到skb中的frag_list
}

phase在io下发的过程中,由底层raid卡驱动填写。在io的正常下发过程中,会将scsi_cmnd传递给底层raid卡驱动,而底层驱动也会将改指针保存下来,便于后续io回来时填写数据。如果底层驱动在io已经完成时,还在引用这个指针,就可能导致我们遇到的这个问题。

看了一下底层raid卡驱动是用的aacraid,代码中存在多处将phase修改为0x106的地方

aacraid.h:2633:#define AAC_OWNER_FIRMWARE      0x106
aachba.c:597: scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:727: scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:767:           scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:1134:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2342:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2434:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2516:                  if (cmd->SCp.phase == AAC_OWNER_FIRMWARE) {
aachba.c:2591:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2668:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:3693:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:3738:        scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;

这个时候恰好家里环境添加 slub_debug=FZPU开关后,也复现了该问题,下面为抓到的日志:

Jan 12 16:11:01 cvk126 kernel: [57117.948006] =============================================================================
Jan 12 16:11:01 cvk126 kernel: [57117.948014] BUG kmalloc-1024 (Tainted: G    B      OE  ): Poison overwritten
Jan 12 16:11:01 cvk126 kernel: [57117.948016] -----------------------------------------------------------------------------
Jan 12 16:11:01 cvk126 kernel: [57117.948016] 
Jan 12 16:11:01 cvk126 kernel: [57117.948019] INFO: 0xffff9561784602d0-0xffff9561784602d3. First byte 0x6 instead of 0x6b
Jan 12 16:11:01 cvk126 kernel: [57117.948028] INFO: Allocated in alloc_request_size+0x2b/0x70 age=36 cpu=1 pid=3011   //分配的堆栈
Jan 12 16:11:01 cvk126 kernel: [57117.948034]      __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948036]      __kmalloc_node+0xb7/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948038]      alloc_request_size+0x2b/0x70
Jan 12 16:11:01 cvk126 kernel: [57117.948045]      mempool_alloc+0x67/0x1a0
Jan 12 16:11:01 cvk126 kernel: [57117.948049]      get_request+0x3b6/0x770
Jan 12 16:11:01 cvk126 kernel: [57117.948052]      blk_queue_bio+0x114/0x400
Jan 12 16:11:01 cvk126 kernel: [57117.948054]      generic_make_request+0x123/0x300
Jan 12 16:11:01 cvk126 kernel: [57117.948056]      submit_bio+0x6c/0x140
Jan 12 16:11:01 cvk126 kernel: [57117.948075]      ext4_writepages+0x463/0xf40
Jan 12 16:11:01 cvk126 kernel: [57117.948078]      do_writepages+0x4b/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948080]      __filemap_fdatawrite_range+0xc1/0x100
Jan 12 16:11:01 cvk126 kernel: [57117.948083] INFO: Freed in __blk_put_request+0xb7/0x190 age=36 cpu=3 pid=0      //释放的堆栈
Jan 12 16:11:01 cvk126 kernel: [57117.948085]      __blk_put_request+0xb7/0x190
Jan 12 16:11:01 cvk126 kernel: [57117.948091]      scsi_end_request+0xf7/0x1e0
Jan 12 16:11:01 cvk126 kernel: [57117.948093]      scsi_io_completion+0x237/0x650
Jan 12 16:11:01 cvk126 kernel: [57117.948096]      blk_done_softirq+0x8e/0xc0
Jan 12 16:11:01 cvk126 kernel: [57117.948101]      __do_softirq+0xde/0x2b3
Jan 12 16:11:01 cvk126 kernel: [57117.948105]      irq_exit+0xae/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948129]      start_secondary+0x1ac/0x200
Jan 12 16:11:01 cvk126 kernel: [57117.948132]      secondary_startup_64+0xa5/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948135] INFO: Slab 0xffffdf1680e11800 objects=24 used=24 fp=0x          (null) flags=0x17ffffc0008100
Jan 12 16:11:01 cvk126 kernel: [57117.948137] INFO: Object 0xffff956178460008 @offset=8 fp=0xffff956178461fe8
Jan 12 16:11:01 cvk126 kernel: [57117.948137] 
Jan 12 16:11:01 cvk126 kernel: [57117.948140] Redzone ffff956178460000: bb bb bb bb bb bb bb bb                          ........
Jan 12 16:11:01 cvk126 kernel: [57117.948142] Object ffff956178460008: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948145] Object ffff956178460018: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
。。。。
Jan 12 16:11:01 cvk126 kernel: [57117.948234] Object ffff9561784602a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948236] Object ffff9561784602b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948238] Object ffff9561784602c8: 6b 6b 6b 6b 6b 6b 6b 6b 06 01 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk    //检查到被改的内容
Jan 12 16:11:01 cvk126 kernel: [57117.948240] Object ffff9561784602d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
。。。。。
Jan 12 16:11:01 cvk126 kernel: [57117.948279] Object ffff9561784603f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
Jan 12 16:11:01 cvk126 kernel: [57117.948282] Redzone ffff956178460408: bb bb bb bb bb bb bb bb                          ........
Jan 12 16:11:01 cvk126 kernel: [57117.948284] Padding ffff956178460548: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
Jan 12 16:11:01 cvk126 kernel: [57117.948287] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B      OE   4.14.0-generic #1
Jan 12 16:11:01 cvk126 kernel: [57117.948288] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.00.23P01 07/07/2017
Jan 12 16:11:01 cvk126 kernel: [57117.948288] Call Trace:
Jan 12 16:11:01 cvk126 kernel: [57117.948289]  <IRQ>
Jan 12 16:11:01 cvk126 kernel: [57117.948294]  dump_stack+0x5c/0x85                                                                        //释放后被分配给了skb_buff,再分配的时候检查到了内存有被修改过
Jan 12 16:11:01 cvk126 kernel: [57117.948295]  check_bytes_and_report+0xbc/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948297]  check_object+0x230/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948301]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948302]  alloc_debug_processing+0x106/0x1b0
Jan 12 16:11:01 cvk126 kernel: [57117.948303]  ___slab_alloc+0x441/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948304]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948305]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948306]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948307]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948308]  ? __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948314]  ? __nla_reserve+0x38/0x50
Jan 12 16:11:01 cvk126 kernel: [57117.948315]  __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948317]  __kmalloc_node_track_caller+0xb9/0x2c0
Jan 12 16:11:01 cvk126 kernel: [57117.948339]  __kmalloc_reserve.isra.43+0x2e/0x80
Jan 12 16:11:01 cvk126 kernel: [57117.948340]  pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948344]  netlink_trim+0xa8/0xc0
Jan 12 16:11:01 cvk126 kernel: [57117.948346]  netlink_unicast+0x4c/0x240
Jan 12 16:11:01 cvk126 kernel: [57117.948352]  queue_userspace_packet+0x2d4/0x5c0 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948355]  ovs_dp_upcall+0x50/0x60 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948357]  ovs_dp_process_packet+0x12b/0x150 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948361]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948362]  ? secondary_startup_64+0xa5/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948366]  ? arp_process+0x13c/0x820
Jan 12 16:11:01 cvk126 kernel: [57117.948367]  ? set_track+0xb5/0x130
Jan 12 16:11:01 cvk126 kernel: [57117.948372]  ? init_object+0x76/0x90
Jan 12 16:11:01 cvk126 kernel: [57117.948378]  ? get_stack_info+0x37/0x160
Jan 12 16:11:01 cvk126 kernel: [57117.948379]  ? common_interrupt+0x7d/0x7d
Jan 12 16:11:01 cvk126 kernel: [57117.948384]  ? __module_text_address+0xe/0x60
Jan 12 16:11:01 cvk126 kernel: [57117.948388]  ? start_kernel+0x4db/0x4fb
Jan 12 16:11:01 cvk126 kernel: [57117.948393]  ? is_bpf_text_address+0xa/0x20
Jan 12 16:11:01 cvk126 kernel: [57117.948399]  ? unwind_next_frame+0xf4/0x660
Jan 12 16:11:01 cvk126 kernel: [57117.948401]  ? set_track+0xb5/0x130
Jan 12 16:11:01 cvk126 kernel: [57117.948402]  ? init_object+0x76/0x90
Jan 12 16:11:01 cvk126 kernel: [57117.948405]  netdev_frame_hook+0x9e/0x140 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948407]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948410]  __netif_receive_skb_core+0x25c/0xb50
Jan 12 16:11:01 cvk126 kernel: [57117.948412]  ? rcu_process_callbacks+0x1b2/0x4c0
Jan 12 16:11:01 cvk126 kernel: [57117.948414]  ? __build_skb+0x25/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948416]  ? netif_receive_skb_internal+0x34/0xd0
Jan 12 16:11:01 cvk126 kernel: [57117.948417]  netif_receive_skb_internal+0x34/0xd0
Jan 12 16:11:01 cvk126 kernel: [57117.948419]  napi_gro_receive+0xb8/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948424]  igb_poll+0x466/0xe70 [igb]
Jan 12 16:11:01 cvk126 kernel: [57117.948426]  ? common_interrupt+0x7d/0x7d
Jan 12 16:11:01 cvk126 kernel: [57117.948428]  net_rx_action+0x149/0x3c0
Jan 12 16:11:01 cvk126 kernel: [57117.948429]  __do_softirq+0xde/0x2b3
Jan 12 16:11:01 cvk126 kernel: [57117.948431]  irq_exit+0xae/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948432]  do_IRQ+0x81/0xd0
Jan 12 16:11:01 cvk126 kernel: [57117.948434]  common_interrupt+0x7d/0x7d
Jan 12 16:11:01 cvk126 kernel: [57117.948434]  </IRQ>
Jan 12 16:11:01 cvk126 kernel: [57117.948436] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0
Jan 12 16:11:01 cvk126 kernel: [57117.948437] RSP: 0018:ffffffff89e03e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffb4
Jan 12 16:11:01 cvk126 kernel: [57117.948439] RAX: ffff95617f822cc0 RBX: 000033f2ceffe9f3 RCX: 000000000000001f
Jan 12 16:11:01 cvk126 kernel: [57117.948440] RDX: 000033f2ceffe9f3 RSI: ffff64404f9e7f32 RDI: 0000000000000000
Jan 12 16:11:01 cvk126 kernel: [57117.948440] RBP: ffffd42b400010b0 R08: 0000000000000e06 R09: 0000000000000c1b
Jan 12 16:11:01 cvk126 kernel: [57117.948441] R10: ffffffff89e03e70 R11: 0000000000000b28 R12: 0000000000000004
Jan 12 16:11:01 cvk126 kernel: [57117.948442] R13: ffffffff89f7af58 R14: 0000000000000000 R15: 000033f2ced8b4bf
Jan 12 16:11:01 cvk126 kernel: [57117.948444]  do_idle+0x181/0x1e0
Jan 12 16:11:01 cvk126 kernel: [57117.948446]  cpu_startup_entry+0x6f/0x80
Jan 12 16:11:01 cvk126 kernel: [57117.948447]  start_kernel+0x4db/0x4fb
Jan 12 16:11:01 cvk126 kernel: [57117.948449]  secondary_startup_64+0xa5/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948451] FIX kmalloc-1024: Restoring 0xffff9561784602d0-0xffff9561784602d3=0x6b     //恢复修改的内存
Jan 12 16:11:01 cvk126 kernel: [57117.948451] 
Jan 12 16:11:01 cvk126 kernel: [57117.948454] FIX kmalloc-1024: Marking all objects used

根据堆栈并结合之前的分析,故障导致的根因应该是request释放后,还在被底层aacraid引用修改了scsi_cmd里面的phase。

分析到这儿感觉故障有很大的概率是aacraid卡驱动导致的,因此升级aacraid为厂商提供的最新驱动,长时间测试故障不再复现。

猜你喜欢

转载自www.cnblogs.com/wendaorenwei/p/10524893.html