The transmission control block synchronization lock socket_lock_t in the Linux kernel protocol stack socket layer

The struct sock structure in the Linux kernel is an important structure at L3, L4 and the socket layer; it represents a socket. Because the structure is used in both process context and soft interrupt context, keeping its data consistent across these contexts is very important, and understanding its locking mechanism is also very helpful for understanding the code.

Table of Contents

1 Lock structure socket_lock_t

2 Process context access operations: lock_sock() / release_sock()

2.1 lock_sock()

2.2 release_sock()

3 Access operations in soft interrupt context

4 Transmission control block reference count

5 Summary


1 Lock structure socket_lock_t

struct sock {
...
	socket_lock_t		sk_lock;
...
}

The sk_lock member of the transmission control block is used to protect access to the structure.

/* This is the per-socket lock.  The spinlock provides a synchronization
 * between user contexts and software interrupt processing, whereas the
 * mini-semaphore synchronizes multiple users amongst themselves.
 */
typedef struct {
	spinlock_t		slock;
	int			owned;
	wait_queue_head_t	wq;
	/*
	 * We express the mutex-alike socket_lock semantics
	 * to the lock validator by explicitly managing
	 * the slock as a lock variant (in addition to
	 * the slock itself):
	 */
	//Used only for debugging; can be ignored. Note that this #ifdef is
	//arguably buggy: the code that accesses this field does not guard
	//the accesses with the same macro.
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
} socket_lock_t;
  • slock: this spin lock is the key to synchronization between process context and soft interrupt context;
  • owned: a value of 1 means a process context currently holds the transmission control block; 0 means no process holds it;
  • wq: the wait queue; when a process needs to hold the transmission control block but another process already holds it, the process sleeps on this queue.

Let's take a look at how these three fields implement synchronized access between process context and soft interrupt context.

2 Process context access operations: lock_sock() / release_sock()

The process context must call lock_sock() to lock the transmission control block before accessing it, and call release_sock() to release it when the access is complete. The two functions are implemented as follows:

2.1 lock_sock()

static inline void lock_sock(struct sock *sk)
{
	lock_sock_nested(sk, 0);
}

void lock_sock_nested(struct sock *sk, int subclass)
{
	//Note: calling lock_sock() may sleep
	might_sleep();
	//Take the spin lock and disable bottom halves
	spin_lock_bh(&sk->sk_lock.slock);
	//If owned is non-zero, another process holds the transmission control
	//block; call __lock_sock() to wait, see below
	if (sk->sk_lock.owned)
		__lock_sock(sk);
	//When __lock_sock() returns, the context has been restored: the spin
	//lock is held and bottom halves are disabled.

	//Set owned to 1: this process now holds the transmission control block
	sk->sk_lock.owned = 1;
	//Release the spin lock, but do not re-enable bottom halves yet
	spin_unlock(&sk->sk_lock.slock);
	/*
	 * The sk_lock has mutex_lock() semantics here:
	 */
	mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
	//Re-enable bottom halves
	local_bh_enable();
}

//__lock_sock() puts the process on the wait queue wq in sk->sk_lock and
//returns only when no process holds the transmission control block anymore.
//Note: the caller already holds sk->sk_lock.slock; the lock is released
//before sleeping and re-acquired before returning.
static void __lock_sock(struct sock *sk)
{
	//Define a wait queue entry
	DEFINE_WAIT(wait);

	//Loop until sock_owned_by_user() returns 0
	for (;;) {
		//Add the calling process to the lock's wait queue
		prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
					TASK_UNINTERRUPTIBLE);
		//Release the spin lock and re-enable bottom halves
		spin_unlock_bh(&sk->sk_lock.slock);
		//Schedule away
		schedule();
		//When scheduled again, execution resumes here: re-acquire the
		//lock and disable bottom halves
		spin_lock_bh(&sk->sk_lock.slock);
		//If no process holds the transmission control block anymore, stop
		if (!sock_owned_by_user(sk))
			break;
	}
	finish_wait(&sk->sk_lock.wq, &wait);
}

#define sock_owned_by_user(sk)	((sk)->sk_lock.owned)

It can be seen from the above code implementation:

  1. To acquire the lock, the code checks the owned field. If it is not 0, some process already holds the lock and the caller must sleep until the field becomes 0; if owned is 0, no process holds the lock and owned is simply set to 1. The ultimate purpose of acquiring the lock is thus to set owned to 1, and every access to owned (both query and modification) is protected by the spin lock sk->sk_lock.slock.
  2. A very important fact: after owned is set to 1, the spin lock is released and bottom halves (i.e. soft interrupts) are re-enabled, even though the protocol stack processing has not finished yet. The benefit of this design: if the code instead held the spin lock with bottom halves disabled for the whole operation, releasing the lock and re-enabling bottom halves only at the very end, system performance would drop; worse still, disabling bottom halves for a long time could keep the NIC receive soft interrupt from running in time, causing packet loss.
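
To make the calling pattern concrete, below is a minimal, hypothetical sketch of process-context usage. lock_sock(), release_sock() and the sk_rcvlowat field are real, but the helper function itself is illustrative and not taken from the kernel:

//Hypothetical process-context helper that modifies a mutable field of
//the sock. lock_sock() may sleep, so this must never be called from
//soft interrupt context.
static void example_set_rcvlowat(struct sock *sk, int val)
{
	//sleeps until owned can be set to 1 for this process
	lock_sock(sk);
	//mutable state is now safe to modify
	sk->sk_rcvlowat = val;
	//sets owned back to 0, processes the backlog, wakes waiters
	release_sock(sk);
}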

2.2 release_sock()

After the process context finishes operating on the transmission control block, it needs to call release_sock() to release it. As you might expect, the core of the release is to set owned to 0 and to notify other processes waiting for the transmission control block. The code is below.

void release_sock(struct sock *sk)
{
	/*
	 * The sk_lock has mutex_unlock() semantics:
	 */
	//Debug-related; can be ignored
	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);

	//Take the spin lock and disable bottom halves
	spin_lock_bh(&sk->sk_lock.slock);
	//If the backlog queue is not empty, call __release_sock() to process
	//the queued packets (see the packet receive path; a simplified sketch
	//of it follows below)
	if (sk->sk_backlog.tail)
		__release_sock(sk);
	//Set owned to 0: the caller no longer holds the transmission control block
	sk->sk_lock.owned = 0;
	//If the wait queue is not empty, wake up the waiting processes
	if (waitqueue_active(&sk->sk_lock.wq))
		wake_up(&sk->sk_lock.wq);
	//Release the spin lock and re-enable bottom halves
	spin_unlock_bh(&sk->sk_lock.slock);
}
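
release_sock() delegates the backlog processing to __release_sock(). The following is a simplified sketch modeled on kernels of that era (details vary across versions): the backlog list is detached, the spin lock is dropped while each packet is fed back into the normal receive path, and the outer loop repeats because the soft interrupt may queue new packets in the meantime.

static void __release_sock(struct sock *sk)
{
	struct sk_buff *skb = sk->sk_backlog.head;

	do {
		//detach the backlog list, then drop slock so the soft
		//interrupt can queue new packets while we work
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		bh_unlock_sock(sk);

		do {
			struct sk_buff *next = skb->next;

			skb->next = NULL;
			//feed the packet into the normal receive path,
			//e.g. tcp_v4_do_rcv() for TCP
			sk_backlog_rcv(sk, skb);
			skb = next;
		} while (skb != NULL);

		//re-acquire slock; the soft interrupt may have refilled
		//the backlog in the meantime
		bh_lock_sock(sk);
	} while ((skb = sk->sk_backlog.head) != NULL);
}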

3 Access operations in soft interrupt context

The TCP receive path, tcp_v4_rcv(), contains the following code fragment:

int tcp_v4_rcv(struct sk_buff *skb)
{
...
process:
...
	//Take the sk->sk_lock.slock spin lock
	bh_lock_sock_nested(sk);
	//If no process has locked the transmission control block, receive the
	//data into the prequeue or the receive queue
	if (!sock_owned_by_user(sk)) {
		if (!tcp_prequeue(sk, skb))
			ret = tcp_v4_do_rcv(sk, skb);
	} else
		//If a process has locked the transmission control block, queue
		//the data on the backlog queue for now
		sk_add_backlog(sk, skb);
	//Release the spin lock
	bh_unlock_sock(sk);
...

/* BH context may only use the following locking interface. */
#define bh_lock_sock(__sk)	spin_lock(&((__sk)->sk_lock.slock))
#define bh_lock_sock_nested(__sk) \
				spin_lock_nested(&((__sk)->sk_lock.slock), \
				SINGLE_DEPTH_NESTING)
#define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
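
For completeness, queueing to the backlog is itself very simple. On older kernels sk_add_backlog() is roughly the following singly linked list append (a simplified sketch, from before later kernels added backlog length limits):

static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
{
	//the caller holds sk->sk_lock.slock, so editing the list is safe
	if (!sk->sk_backlog.tail)
		sk->sk_backlog.head = skb;	//backlog was empty
	else
		sk->sk_backlog.tail->next = skb;
	sk->sk_backlog.tail = skb;
	skb->next = NULL;
}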

As can be seen from the above code, when the soft interrupt context operates on the transmission control block it holds the spin lock the whole time. Since soft interrupt handlers are written to exit as quickly as possible, this is acceptable.

One remaining question: what exactly the nested version of spin_lock() means is still unclear to me. (It appears to be purely a lockdep annotation: spin_lock_nested() with SINGLE_DEPTH_NESTING tells the lock validator that acquiring a second lock of the same lock class here is intentional, so it does not report a false deadlock warning.)

4 Transmission control block reference count

One more thing worth mentioning here: destruction of the transmission control block is not directly related to the synchronization lock described above. Whether it is destroyed is decided by its reference count member sk_refcnt, an atomic variable whose count can be increased and decreased with sock_hold() and sock_put().
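
For example (a small sketch; the helper names are hypothetical, sock_hold() and sock_put() are real), any context that keeps a pointer to a sock beyond the current critical section takes a reference first and drops it when done:

//Illustrative helpers: keep the sock alive while, say, a timer or
//another context still references it.
static struct sock *example_grab_sock(struct sock *sk)
{
	sock_hold(sk);	//increments the atomic sk->sk_refcnt
	return sk;
}

static void example_drop_sock(struct sock *sk)
{
	sock_put(sk);	//decrements sk->sk_refcnt; when it reaches zero,
			//the sock is freed via sk_free()
}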

5 Summary

When accessing the members of the transmission control block, distinguish two cases. If you only access members that never change once the socket is created, there is no need to lock the transmission control block; you only have to ensure it is not released during the access, i.e. that its reference count does not drop to 0. But if you want to access mutable members, you must lock the transmission control block first. It is very important to keep this principle in mind.
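
As a closing illustration (a hypothetical function; sk_family is fixed when the socket is created, while sk_rcvbuf can change at runtime):

static void example_access_rules(struct sock *sk)
{
	//immutable after creation: holding a reference is enough,
	//no sk_lock needed
	sock_hold(sk);
	pr_info("family=%d\n", sk->sk_family);
	sock_put(sk);

	//mutable state: the socket lock must be held as well
	lock_sock(sk);
	sk->sk_rcvbuf = 4096;	//illustrative value
	release_sock(sk);
}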

Origin: blog.csdn.net/wangquan1992/article/details/108960282