Some thoughts prompted by spin_lock_bh

Recently, someone asked me why the critical section in a function hooked at the netfilter PREROUTING point has to be protected with spin_lock_bh/spin_unlock_bh rather than plain spin_lock/spin_unlock.

Faced with this question, I hesitated for a moment. The spin_lock family offers several sets of interfaces:

  • spin_lock/spin_unlock
  • spin_lock_bh/spin_unlock_bh
  • spin_lock_irq/spin_unlock_irq
  • spin_lock_irqsave/spin_unlock_irqrestore

The reason there are so many variants is to prevent a deadlock in which the context holding the lock is interrupted on the same CPU by a higher-priority context that then tries to acquire the same lock again.

But to be convincing you have to give a concrete case, not just the theory.

In fact, it is enough to give a case in which process context reaches the PREROUTING hook:

  1. Process context C1 calls spin_lock(Lx) in the PREROUTING hook and enters the critical section.
  2. Before C1 has left the critical section, the CPU it is running on takes an interrupt, after which softirq processing is scheduled and executes net_rx_action.
  3. Softirq context C2 enters the same PREROUTING hook and calls spin_lock(Lx) to try to enter the critical section.
  4. Since C1 already holds spinlock Lx, C2 starts spinning, waiting for C1 to release Lx.
  5. Since C1 has been preempted by C2, and C2 just keeps spinning, we have a textbook deadlock! (A minimal sketch of such a hook follows below.)
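
To make the scenario concrete, here is a minimal sketch of such a hook. Everything in it (the names lx_lock, pkt_count, lx_prerouting and the counter they protect) is hypothetical and purely for illustration, and it uses the hook prototype and nf_register_net_hook() registration of newer kernels; on the 3.10-era kernel seen in the trace below, nf_hookfn has a different signature and registration goes through nf_register_hook():

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <net/net_namespace.h>

/* Hypothetical shared state; lx_lock plays the role of Lx in the steps above. */
static DEFINE_SPINLOCK(lx_lock);
static unsigned long pkt_count;

static unsigned int lx_prerouting(void *priv, struct sk_buff *skb,
				  const struct nf_hook_state *state)
{
	/*
	 * A plain spin_lock() here would be exactly steps 1/3 above: the hook
	 * runs in process context (C1), a softirq (C2) re-enters it on the
	 * same CPU and spins on a lock that C1 can never release.
	 *
	 * spin_lock_bh() additionally disables bottom halves on this CPU, so
	 * net_rx_action cannot preempt us while the lock is held.
	 */
	spin_lock_bh(&lx_lock);
	pkt_count++;
	spin_unlock_bh(&lx_lock);

	return NF_ACCEPT;
}

static struct nf_hook_ops lx_ops = {
	.hook     = lx_prerouting,
	.pf       = NFPROTO_IPV4,
	.hooknum  = NF_INET_PRE_ROUTING,
	.priority = NF_IP_PRI_FIRST,
};

static int __init lx_init(void)
{
	return nf_register_net_hook(&init_net, &lx_ops);
}

static void __exit lx_exit(void)
{
	nf_unregister_net_hook(&init_net, &lx_ops);
}

module_init(lx_init);
module_exit(lx_exit);
MODULE_LICENSE("GPL");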

But the question remains: under what circumstances can process context reach PREROUTING?

I remember that around this time in 2015 I wrote an article:
https://blog.csdn.net/dog250/article/details/48770481
The case in that article is a scenario in which packet reception is performed in process context, and reception necessarily passes through the PREROUTING point along the way.

Let me quote a passage from that article:

A TCP packet for a connection to the local machine finally reaches the loopback device's xmit function, which simply schedules softirq processing on that CPU and lets it execute after the next interrupt finishes. There is a high probability that this happens in the context of the current sending process, that is to say, the sending process performed the send operation in its own context, and at that moment the softirq borrowed its context to trigger the receive operation, ...

However, there is a problem: what exactly does "there is a high probability that this happens in the context of the current sending process" mean? It feels sloppy, so today I want to dig into this question:

  • Why are the send and receive paths of the loopback device executed in the same process context?

To find out, let's print the stack while pinging the local machine through loopback:

#!/usr/local/bin/stap -g

function dump()
%{
	dump_stack();
%}

probe kernel.function("icmp_rcv")
{
	dump();
	//print_backtrace(); // for some reason this one does not work well...
}

The following is the result after a ping:

[34197.319729]  [<ffffffff8159a145>] ? icmp_rcv+0x5/0x380
[34197.319732]  [<ffffffff81561b84>] ? ip_local_deliver_finish+0xb4/0x1f0
[34197.319735]  [<ffffffff81561e69>] ip_local_deliver+0x59/0xd0
[34197.319738]  [<ffffffff81561ad0>] ? ip_rcv_finish+0x350/0x350
[34197.319741]  [<ffffffff815617fd>] ip_rcv_finish+0x7d/0x350
[34197.319744]  [<ffffffff81562196>] ip_rcv+0x2b6/0x410
[34197.319747]  [<ffffffff81561780>] ? inet_del_offload+0x40/0x40
[34197.319752]  [<ffffffff815267b2>] __netif_receive_skb_core+0x582/0x7d0
[34197.319755]  [<ffffffff81526a18>] __netif_receive_skb+0x18/0x60
[34197.319757]  [<ffffffff815276ee>] process_backlog+0xae/0x180
[34197.319760]  [<ffffffff81526ed2>] net_rx_action+0x152/0x240
[34197.319765]  [<ffffffff8107e02f>] __do_softirq+0xef/0x280
[34197.319768]  [<ffffffff81646b1c>] call_softirq+0x1c/0x30
[34197.319769]  <EOI>  [<ffffffff81017155>] do_softirq+0x65/0xa0
[34197.319777]  [<ffffffff8107d924>] local_bh_enable+0x94/0xa0
[34197.319780]  [<ffffffff81566a00>] ip_finish_output+0x1f0/0x7d0
[34197.319783]  [<ffffffff81567cff>] ip_output+0x6f/0xe0
[34197.319786]  [<ffffffff81566810>] ? ip_fragment+0x8b0/0x8b0
[34197.319789]  [<ffffffff81565971>] ip_local_out_sk+0x31/0x40
[34197.319791]  [<ffffffff81568746>] ip_send_skb+0x16/0x50
[34197.319793]  [<ffffffff815687b3>] ip_push_pending_frames+0x33/0x40
[34197.319797]  [<ffffffff81590fbe>] raw_sendmsg+0x59e/0x620
[34197.319802]  [<ffffffff810af1a9>] ? ttwu_do_wakeup+0x19/0xd0
[34197.319805]  [<ffffffff8159f604>] inet_sendmsg+0x64/0xb0
[34197.319811]  [<ffffffff8150cc90>] sock_sendmsg+0xb0/0xf0
[34197.319814]  [<ffffffff8150d201>] SYSC_sendto+0x121/0x1c0
[34197.319817]  [<ffffffff8150e221>] ? __sys_recvmsg+0x51/0x90
[34197.319820]  [<ffffffff8150dc8e>] SyS_sendto+0xe/0x10
[34197.319823]  [<ffffffff81645189>] system_call_fastpath+0x16/0x1b

Haha, the truth is out! My analysis in 2015 was wrong:

The sending process performed the send operation in its own context, and at that moment the softirq borrowed its context to trigger the receive operation, ...

It is not "borrowing its context" at all; rather, net_rx_action is actively called in this very context!

The calling logic is as follows:

ip_finish_output
    rcu_read_lock_bh
    ...
        dev_queue_xmit
            loopback_xmit
                netif_rx
                    enqueue_to_backlog  # the skb is queued here
                        raise_softirq_irqoff(NET_RX_SOFTIRQ)
                    ...
                ...
            ...
        ...
    ...
    rcu_read_unlock_bh # the unlock triggers receive processing in process context
        local_bh_enable
            do_softirq
                __do_softirq
                    net_rx_action # the queued skb is processed here
                        ...
                        ip_rcv_finish
                            icmp_rcv
                        ...
                    ...
                ...
            ...
        ...
    ...
ip_finish_output returns

OK, so this is a perfectly clear case of process context executing the packet-receive logic; in other words:

  • Since the softirq routine net_rx_action may be executed in process context, the critical section must be protected with the _bh version of the spinlock to prevent deadlock!

rcu_read_unlock_bh, which does quite a bit of work on the unlock path, is not alone; the kernel has many similar cases:

  • spin_unlock may trigger scheduling and cause a task switch.
  • spin_unlock_bh may trigger do_softirq to run softirq routines.
  • release_sock may invoke sk_backlog_rcv and thereby process packets queued in the socket backlog.

This is a compensation effect: since the lock operation forbids certain behaviors while the lock is held, the unlock operation tries to compensate by executing those deferred behaviors immediately instead of leaving them to some later point. This design is quite clever.
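
To illustrate this compensation idea, here is a purely conceptual sketch (not the actual kernel implementation; sketch_spin_unlock_bh is a made-up name) of what the _bh unlock path amounts to:

#include <linux/spinlock.h>

/*
 * Conceptual sketch only: the _bh unlock releases the lock and then
 * re-enables bottom halves, and that enable path is where softirqs that
 * were raised (but deferred) while the lock was held get to run at once.
 */
static inline void sketch_spin_unlock_bh(spinlock_t *lock)
{
	spin_unlock(lock);   /* release the lock itself */
	local_bh_enable();   /* if the BH-disable count drops to zero and a
	                      * softirq such as NET_RX_SOFTIRQ is pending,
	                      * this ends up in do_softirq() right here, in
	                      * the current (possibly process) context */
}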

In addition, there is another typical case of process context executing the packet-receive logic: the TUN/TAP device, whose tun_get_user runs in process context and calls netif_rx_ni directly to receive the packet.
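
For reference, netif_rx_ni() in kernels of that era looked roughly like the following (paraphrased from memory, so treat it as a sketch rather than the exact source): it queues the skb just like netif_rx() and then, still in the caller's process context, runs whatever softirq it has just raised:

#include <linux/interrupt.h>
#include <linux/netdevice.h>

int netif_rx_ni(struct sk_buff *skb)
{
	int err;

	preempt_disable();
	err = netif_rx(skb);          /* enqueue_to_backlog + raise NET_RX_SOFTIRQ */
	if (local_softirq_pending())
		do_softirq();         /* net_rx_action runs here, in process context */
	preempt_enable();

	return err;
}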

Finally, let's look at a strange and interesting property of how the loopback device sends and receives packets:

  • The send path has not yet returned when the receive path has already completed and returned.

What does this imply? I don't know yet, but if you run into inexplicable problems with connections to the local machine, this is a good place to start troubleshooting.


The leather shoes in Wenzhou, Zhejiang are wet, so they won’t get fat in the rain.

Origin: blog.csdn.net/dog250/article/details/108784871