Advanced programming in the Unix environment - the principle of blocking access - waiting queues

  Sometimes a system call cannot obtain or send data immediately. Take a temperature collector that uses neither interrupts nor polling: it only starts a measurement when the user sends a request, and the result becomes available only after a certain delay. If the user program calls read or write and expects the desired result to be available when the call returns, the call should block until a result (or an error) is available. Blocking in the user program is implemented by putting the process to sleep, that is, the system call switches the process state to a sleeping state.
 
  Sleep and wait queues

  Putting a process to sleep means setting its state to one of the sleeping states and removing it from the scheduler's run queue until some event wakes it up. While asleep the process is never scheduled onto the CPU, and if it is never woken up, it will never run again.

  It is easy to put the current process to sleep from a driver by invoking the scheduler (among other methods), but a process cannot enter the sleep state at just any time. The following rules must be observed:

    The first rule is: never sleep while running in an atomic context, for example while holding a spinlock, a seqlock, or an RCU read lock. 

    Nor can the process sleep while interrupts are disabled.

    Sleeping while holding a semaphore is legal, but the semaphore held must not be one that the process performing the wake-up needs. In addition, any other thread waiting on that semaphore will also end up sleeping, so any sleep that occurs while holding a semaphore should be short.

    After the process wakes up, it should re-check the condition it was waiting for to make sure the event really occurred.

  A waiting queue makes it possible both to put a process to sleep and to wake it up when an event occurs; it is essentially a list of waiting processes. In Linux, a waiting queue is managed by a "waiting queue head": 

linux/wait.h
struct __wait_queue_head {
    spinlock_t lock;            /* protects the list of waiters */
    struct list_head task_list; /* list of wait_queue_t entries */
};
typedef struct __wait_queue_head wait_queue_head_t;

  Since the sleeping process usually waits for an interrupt to change some state or announce the occurrence of some event, the waiting queue is likely to be modified from interrupt context. The spinlock in this structure must therefore be taken with interrupts disabled, i.e. with spin_lock_irqsave.

  The members of the queue are instances of the following data structure and are linked into a doubly linked list: 

typedef struct __wait_queue wait_queue_t;
typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key);
int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);
struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE    0x01
    void *private;
    wait_queue_func_t func;
    struct list_head task_list;
};

 

The value of flags is either 0 or WQ_FLAG_EXCLUSIVE. The latter indicates that the waiting process wants to be woken up exclusively.
The private pointer points to the task_struct instance of the waiting process; in principle it can point to arbitrary private data, but that is only rarely used in the kernel.
func is the function that is called to wake up the waiting process.
task_list is the list element used to link wait_queue_t instances into the waiting queue.
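
  For reference, here is a rough sketch of how these fields are typically filled in by the 2.6-era helpers (a simplified paraphrase of linux/wait.h, not a verbatim copy): DEFINE_WAIT pairs the current process with autoremove_wake_function, while init_waitqueue_entry uses default_wake_function.

/* Simplified paraphrase of the 2.6-era helpers in linux/wait.h. */
#define DEFINE_WAIT(name)                                               \
    wait_queue_t name = {                                               \
        .private   = current,                  /* the waiting process */\
        .func      = autoremove_wake_function, /* wake, then dequeue  */\
        .task_list = LIST_HEAD_INIT((name).task_list),                  \
    }

static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
{
    q->flags   = 0;                     /* non-exclusive by default  */
    q->private = p;
    q->func    = default_wake_function; /* wake only, do not dequeue */
}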

 

  To use a waiting queue, the following steps are usually required. First, a waiting queue head must be created:

DECLARE_WAIT_QUEUE_HEAD(name);

  Another way is to declare the queue head and then initialize it explicitly at run time: 

wait_queue_head_t wait_queue;
init_waitqueue_head(&wait_queue);
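
  As a small illustration of why run-time initialization is useful, a driver often embeds the queue head in a per-device structure that is allocated dynamically; the names below (tc_device, tc_setup, data_ready) are hypothetical:

/* Hypothetical per-device structure embedding a waiting queue head. */
struct tc_device {
    wait_queue_head_t readq;   /* readers of this device sleep here */
    int data_ready;            /* condition the readers wait for    */
};

static void tc_setup(struct tc_device *dev)
{
    init_waitqueue_head(&dev->readq);  /* must run before first use */
    dev->data_ready = 0;
}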

  Then, to put the current process to sleep until an event occurs, it must be added to the waiting queue. The kernel provides the following macros for this purpose:

wait_event(queue, condition);
wait_event_interruptible(queue, condition);
wait_event_timeout(queue, condition, timeout);
wait_event_interruptible_timeout(queue, condition, timeout);

  In all forms, the parameter queue is the head of the queue to wait on. Since these calls are implemented as macros, the queue head is passed directly rather than as a pointer. The condition is an arbitrary boolean expression that the macros evaluate before and after sleeping; the process continues to sleep until the condition evaluates to true. 

  A process that goes to sleep through wait_event is uninterruptible: the state member of the process is set to TASK_UNINTERRUPTIBLE. In most cases wait_event_interruptible should be used instead; it can be interrupted by a signal, which means the user can break the wait by sending the program a signal. A program that cannot be interrupted by a signal can easily irritate its users. wait_event has no return value, while wait_event_interruptible returns a nonzero value (-ERESTARTSYS) to indicate that the sleep was interrupted by a signal. 

  wait_event_timeout and wait_event_interruptible_timeout wait only for a limited period of time, expressed in jiffies (clock ticks); once the timeout expires, the macro returns 0 regardless of whether the event has occurred. 
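
  As a quick usage sketch (the names wq, data_ready and TIMEOUT_MS are invented for this example), a caller can distinguish the three possible outcomes of the timeout variant as follows:

#include <linux/wait.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

#define TIMEOUT_MS 500

static DECLARE_WAIT_QUEUE_HEAD(wq);
static int data_ready;

static int wait_for_data(void)
{
    long ret = wait_event_interruptible_timeout(wq, data_ready,
                                                msecs_to_jiffies(TIMEOUT_MS));
    if (ret < 0)        /* -ERESTARTSYS: sleep interrupted by a signal    */
        return ret;
    if (ret == 0)       /* timeout expired, condition still false         */
        return -ETIMEDOUT;
    return 0;           /* ret > 0: condition became true before timeout  */
}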

  Finally, the processes sleeping on these queues must be woken up by some other thread of execution (another process, or quite possibly an interrupt handler) through the corresponding functions. The kernel provides the following: 

void wake_up(wait_queue_head_t *queue);
void wake_up_interruptible(wait_queue_head_t *queue);

wake_up wakes up all processes waiting on the given queue.
wake_up_interruptible wakes up only the processes on the given queue that are in an interruptible sleep.
 

  Although wake_up can do everything wake_up_interruptible does, each should be paired with the matching wait_event variant. It is quite feasible to implement the reading and writing of a pipe with waiting queues; the pipe implementation in fs/pipe.c in the kernel is indeed based on waiting queues, although it is somewhat more involved. In a device driver, for example, after the temperature collector receives a read request the calling process is put on the waiting queue, and the boolean variable that wakes it up is later set to true in the interrupt handler for the device.
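
  The following minimal sketch illustrates that pattern for the temperature collector. It is not a real driver: tc_start_measurement, tc_read_value and the interrupt wiring are invented placeholders; only the waiting queue calls reflect the actual kernel API.

#include <linux/wait.h>
#include <linux/fs.h>
#include <linux/interrupt.h>
#include <linux/uaccess.h>
#include <linux/errno.h>

static DECLARE_WAIT_QUEUE_HEAD(tc_wait);
static int tc_data_ready;                /* the boolean the reader waits for */
static int tc_value;

extern void tc_start_measurement(void);  /* placeholder hardware access */
extern int tc_read_value(void);          /* placeholder hardware access */

static irqreturn_t tc_irq_handler(int irq, void *dev_id)
{
    tc_value = tc_read_value();
    tc_data_ready = 1;                   /* make the condition true ...       */
    wake_up_interruptible(&tc_wait);     /* ... then wake the sleeping reader */
    return IRQ_HANDLED;
}

static ssize_t tc_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *ppos)
{
    tc_start_measurement();

    /* Sleep until the interrupt handler sets tc_data_ready. */
    if (wait_event_interruptible(tc_wait, tc_data_ready))
        return -ERESTARTSYS;             /* woken by a signal instead */

    tc_data_ready = 0;
    if (copy_to_user(buf, &tc_value, sizeof(tc_value)))
        return -EFAULT;
    return sizeof(tc_value);
}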

  Note that a wake_up_interruptible call may wake up several sleeping processes even though they require exclusive access to a single resource. How to make only one process see the condition as true is exactly the role of WQ_FLAG_EXCLUSIVE: only one exclusive waiter is woken, and the other processes continue to sleep.

 

  Waiting queue implementation principle

  The core implementation of the wait_event function is as follows:

#define __wait_event(wq, condition)                     \
do {                                    \
    DEFINE_WAIT(__wait);                        \
                                    \
    for (;;) {                            \
        prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);    \
        if (condition)                        \
            break;                        \
        schedule();                        \
    }                                \
    finish_wait(&wq, &__wait);                    \
} while (0)

DEFINE_WAIT defines a queue element named __wait whose hook function is autoremove_wake_function; when invoked, this function wakes the process and removes the element from the waiting queue.
prepare_to_wait adds the queue element to the waiting queue and sets the process state to TASK_UNINTERRUPTIBLE (for wait_event_interruptible it is TASK_INTERRUPTIBLE instead).
The infinite for loop ensures that, as long as the condition is not met, schedule() is called and other processes run in place of the current one; the loop body only runs again when the process is woken up.
Once the condition is met, finish_wait sets the process state back to TASK_RUNNING and removes the element from the waiting queue.

  What needs careful thought is the execution of the for loop. It may obviously run once or several times: whenever the condition is not satisfied a reschedule occurs, and when the process is scheduled again the next iteration of the loop executes. Doesn't prepare_to_wait then add the __wait element again on every iteration? Looking at the prepare_to_wait code below, the element is added to the current waiting queue only when the list pointed to by wait->task_list is empty, that is, only when __wait has not yet been added to any waiting queue. This also shows that a waiting queue element can be on only one waiting queue at a time. 

void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;
    wait->flags &= ~WQ_FLAG_EXCLUSIVE;      /* non-exclusive wait */
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))       /* only add if not queued yet */
        __add_wait_queue(q, wait);
    set_current_state(state);               /* mark the process as sleeping */
    spin_unlock_irqrestore(&q->lock, flags);
}

  Waking up the processes on a waiting queue is done through the wake_up family of functions, some of which have corresponding interruptible forms: 

#define wake_up(x)            __wake_up(x, TASK_NORMAL, 1, NULL)
#define wake_up_nr(x, nr)        __wake_up(x, TASK_NORMAL, nr, NULL)
#define wake_up_all(x)            __wake_up(x, TASK_NORMAL, 0, NULL)
#define wake_up_locked(x)        __wake_up_locked((x), TASK_NORMAL)

  Their core implementation is analyzed here: 

kernel/sched.c
void __wake_up(wait_queue_head_t *q, unsigned int mode,
            int nr_exclusive, void *key)
{
    unsigned long flags;
    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0, key);
    spin_unlock_irqrestore(&q->lock, flags);
}

  __wake_up first acquires the spinlock and then calls __wake_up_common. That function traverses the waiting queue with list_for_each_entry_safe and wakes each sleeping process that matches mode. nr_exclusive is the number of processes carrying the exclusive flag that should be woken; wake_up sets it to 1, so once one process with WQ_FLAG_EXCLUSIVE has been successfully woken, no further entries are processed - this is exactly the meaning of the exclusive flag. Note also that the real wake-up work is performed through the func pointer. 

kernel/sched.c
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
            int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;
    /* The _safe iterator is needed because curr->func (for example
     * autoremove_wake_function) may remove curr from the list. */
    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;
        /* Stop once nr_exclusive exclusive waiters have been woken. */
        if (curr->func(curr, mode, wake_flags, key) &&
                (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}

  If a process with the exclusive flag is not located at the tail of the queue, the processes behind it that do not carry the flag would never be woken up. prepare_to_wait_exclusive solves this problem: it always inserts entries carrying the exclusive flag at the tail of the queue. This function is called, for example, by the wait_event_interruptible_exclusive macro.
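
  For illustration, the exclusive-wait pattern built on prepare_to_wait_exclusive looks roughly like the following (a simplified sketch modelled on the __wait_event_interruptible_exclusive logic; the queue head q and the flag resource_free are placeholder names):

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/errno.h>

static DECLARE_WAIT_QUEUE_HEAD(q);
static int resource_free;                /* placeholder condition */

static int wait_for_resource(void)
{
    DEFINE_WAIT(wait);
    int ret = 0;

    for (;;) {
        /* Queues the entry at the tail with WQ_FLAG_EXCLUSIVE set. */
        prepare_to_wait_exclusive(&q, &wait, TASK_INTERRUPTIBLE);
        if (resource_free)
            break;
        if (signal_pending(current)) {   /* interruptible sleep */
            ret = -ERESTARTSYS;
            break;
        }
        schedule();
    }
    finish_wait(&q, &wait);
    return ret;
}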

  From: http://blog.chinaunix.net/uid-20608849-id-3126863.html
