Discuss three cases of the return value of read

The copyleft of this article is owned by [email protected], released under GPL, and can be freely copied and reprinted. However, please maintain the integrity of the document when reprinting, indicate the original author and the original link, and strictly prohibit any commercial use.
======================================================================================================
 
Today we will discuss the return value of a seemingly simple API: read. What values can read return, and under what circumstances does each one occur?
 
First, consult the manual: man 2 read
RETURN VALUE
       On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
 
From the above description, there are three situations for the return value of read:
1. Greater than 0: the number of bytes successfully read;
2. Equal to 0: reach the end of the file;
3. -1: An error occurs, and the specific error value is determined by errno.
Note: This discussion is limited to blocking fds, not non-blocking cases.
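
As an illustration of the three cases, here is a minimal sketch of a read loop on a blocking fd. The helper name read_full is hypothetical and not part of any library; it only shows one common way to react to each return value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from a blocking fd.
 * Returns the number of bytes read (short only at end of file),
 * or -1 with errno set on a real error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t total = 0;
    char *p = buf;

    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n > 0) {
            total += (size_t)n;   /* case 1: got bytes, possibly fewer than asked */
        } else if (n == 0) {
            break;                /* case 2: end of file */
        } else if (errno == EINTR) {
            continue;             /* case 3: interrupted before any data, retry */
        } else {
            return -1;            /* case 3: real error, errno says which */
        }
    }
    return (ssize_t)total;
}
```

Called on the read end of a pipe whose writer has written 5 bytes and closed, this sketch should return 5, and a second call should return 0.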
 
Judging from the man page, read looks very simple to use, but is it really? Don't forget that "file" is a very broad concept in Linux: it can be a regular file, a socket, a device, and so on. For a regular file, end-of-file (EOF) is a well-defined condition.
 
But if the fd is a socket, when does read return 0? If the read is interrupted by a signal, does it return -1, a positive value, or 0? And after the peer closes the socket, can the local end still read the data the peer sent before closing?
 
To understand how sockets behave, we need to study the corresponding kernel code. This time we take a unix-domain stream socket (SOCK_STREAM, the connection-oriented type that behaves like a TCP connection) as the example.
 
unix_stream_recvmsg is the read function for unix-domain stream sockets:
static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
             struct msghdr *msg, size_t size,
             int flags)
{
    struct sock_iocb *siocb = kiocb_to_siocb(iocb);
    struct scm_cookie tmp_scm;
    struct sock *sk = sock->sk;
    struct unix_sock *u = unix_sk(sk);
    struct sockaddr_un *sunaddr = msg->msg_name;
    int copied = 0;
    int check_creds = 0;
    int target;
    int err = 0;
    long timeo;
    err = -EINVAL;
    if (sk->sk_state != TCP_ESTABLISHED)
        goto out;
    err = -EOPNOTSUPP;
    if (flags&MSG_OOB)
        goto out;
    target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
    timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
    msg->msg_namelen = 0;
    /* Lock the socket to prevent queue disordering
     * while sleeps in memcpy_tomsg
     */
    if (!siocb->scm) {
        siocb->scm = &tmp_scm;
        memset(&tmp_scm, 0, sizeof(tmp_scm));
    }
    mutex_lock(&u->readlock);
    do {
        int chunk;
        struct sk_buff *skb;
        unix_state_lock(sk);
        skb = skb_dequeue(&sk->sk_receive_queue);
        if (skb == NULL) {
            if (copied >= target)
                goto unlock;
            /*
             *    POSIX 1003.1g mandates this order.
             */
            err = sock_error(sk);
            if (err)
                goto unlock;
            if (sk->sk_shutdown & RCV_SHUTDOWN)
                goto unlock;
            unix_state_unlock(sk);
            err = -EAGAIN;
            if (!timeo)
                break;
            mutex_unlock(&u->readlock);
            timeo = unix_stream_data_wait(sk, timeo);
            if (signal_pending(current)) {
                err = sock_intr_errno(timeo);
                goto out;
            }
            mutex_lock(&u->readlock);
            continue;
 unlock:
            unix_state_unlock(sk);
            break;
        }
        unix_state_unlock(sk);
        if (check_creds) {
            /* Never glue messages from different writers */
            if ((UNIXCB(skb).pid != siocb->scm->pid) ||
                (UNIXCB(skb).cred != siocb->scm->cred)) {
                skb_queue_head(&sk->sk_receive_queue, skb);
                break;
            }
        } else {
            /* Copy credentials */
            scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).cred);
            check_creds = 1;
        }
        /* Copy address just once */
        if (sunaddr) {
            unix_copy_addr(msg, skb->sk);
            sunaddr = NULL;
        }
        chunk = min_t(unsigned int, skb->len, size);
        if (memcpy_toiovec(msg->msg_iov, skb->data, chunk)) {
            skb_queue_head(&sk->sk_receive_queue, skb);
            if (copied == 0)
                copied = -EFAULT;
            break;
        }
        copied += chunk;
        size -= chunk;
        /* Mark read part of skb as used */
        if (!(flags & MSG_PEEK)) {
            skb_pull(skb, chunk);
            if (UNIXCB(skb).fp)
                unix_detach_fds(siocb->scm, skb);
            /* put the skb back if we didn't use it up.. */
            if (skb->len) {
                skb_queue_head(&sk->sk_receive_queue, skb);
                break;
            }
            consume_skb(skb);
            if (siocb->scm->fp)
                break;
        } else {
            /* It is questionable, see note in unix_dgram_recvmsg.
             */
            if (UNIXCB(skb).fp)
                siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
            /* put message back and return */
            skb_queue_head(&sk->sk_receive_queue, skb);
            break;
        }
    } while (size);
    mutex_unlock(&u->readlock);
    scm_recv(sock, msg, siocb->scm, flags);
out:
    return copied ? : err;
}
There is only one exit in this function:
    return copied ? : err;
In unix_stream_recvmsg, copied holds the number of bytes read so far, so the meaning of this line is clear: if any data has been read, read returns the number of bytes read; if no data has been read, it returns err.
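
The two-operand form of the conditional operator is a GNU C extension (accepted by gcc and clang), where a ? : b means a ? a : b with a evaluated only once. The following small sketch uses a hypothetical function named result to show how the return statement picks its value; it is a stand-in for illustration, not kernel code.

```c
/* Hypothetical stand-in for the final statement of unix_stream_recvmsg. */
int result(int copied, int err)
{
    /* GNU C extension: equivalent to copied ? copied : err */
    return copied ? : err;
}
```

With this sketch, result(10, -4) yields 10 (bytes were read, so err is ignored), result(0, -4) yields -4 (nothing read, so the error is reported), and result(0, 0) yields 0.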
 
Let's go back to the three questions raised above:
1. When is the return value 0?
2. What does read return when it is interrupted by a signal?
3. After the peer closes, can the local end still read the data the peer sent before closing?
 
Let's look at the first question: when does read return 0? This requires both copied and err to be 0.
            err = sock_error(sk);
            if (err)
                goto unlock;
            if (sk->sk_shutdown & RCV_SHUTDOWN)
                goto unlock;
These few lines give the answer. First, they are only reached when the socket's receive queue is empty. If the socket has no pending error, the code goes on to check the socket's RCV_SHUTDOWN flag. If the flag is set, it jumps to unlock and from there to the final return statement. At that point, if copied is 0, err is 0 as well, so read returns 0.
The sk->sk_shutdown flag is set when either end calls shutdown or close; see the unix_shutdown function. That path sets not only the sk_shutdown flag of the local end but also the corresponding sk_shutdown flag of the peer. So whether the local end or the peer calls shutdown or close, the local read may return 0. It is only "may" because shutdown takes an argument that selects which direction to shut down: sending, receiving, or both.
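
A minimal user-space sketch of this case, assuming a Linux system and a socketpair of unix-domain stream sockets. The function name read_after_peer_shutdown is hypothetical. The peer shuts down its sending direction while nothing is queued, so the local read should find an empty queue with RCV_SHUTDOWN set.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns what read() reports on one end of a
 * unix-domain stream socketpair after the other end stops sending. */
ssize_t read_after_peer_shutdown(void)
{
    int sv[2];
    char buf[8];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -2;                     /* setup failure, distinct from read()'s -1 */

    shutdown(sv[1], SHUT_WR);          /* peer closes its sending direction */
    n = read(sv[0], buf, sizeof buf);  /* empty queue, receive side shut down */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On Linux this sketch is expected to return 0 without blocking.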
 
The second question, what is the return value when interrupted by a signal?
            timeo = unix_stream_data_wait(sk, timeo);
            if (signal_pending(current)) {
                err = sock_intr_errno(timeo);
                goto out;
            }
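These lines answer the question. They, too, sit inside the branch taken when the receive queue is empty. So if the receive queue already holds data, and more of it than the number of bytes requested, the function never checks for a pending signal at all, and the return value is the number of bytes read. If the receive queue does not hold enough data, execution reaches this point and checks whether a signal is pending. When one is, the return value depends on copied. If some bytes have already been read, the function returns copied, the number of bytes read so far, which is less than the number requested. If no bytes have been read, read returns -1 with errno set to EINTR.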
 
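The third question: after the peer closes, can the local end read the data the peer sent before closing?
Given the answers to the previous two questions, especially the first, the answer here is also clear. In unix_stream_recvmsg, as long as there is data in the receive queue, the sk_shutdown flag is never checked. So the answer is yes: even after the peer closes its socket, the local end can still read the data the peer sent before closing.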
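
A minimal sketch of this behaviour, again assuming Linux and a unix-domain stream socketpair. The function name read_after_peer_close is hypothetical. The peer writes 5 bytes and closes, after which the local end reads twice.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the peer writes 5 bytes and then closes. *first gets
 * the result of the first read(), *second the result of the next one.
 * Returns 0 if the setup succeeded, -1 otherwise. */
int read_after_peer_close(ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (write(sv[1], "hello", 5) != 5) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);                             /* peer is gone, data stays queued */

    *first = read(sv[0], buf, sizeof buf);    /* queued data is delivered first */
    *second = read(sv[0], buf, sizeof buf);   /* queue empty and shut down */

    close(sv[0]);
    return 0;
}
```

The first read is expected to return 5 and the second 0, matching the order of checks in unix_stream_recvmsg.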
 
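This article has only examined reads on unix-domain stream sockets. As for the read behavior of other kinds of sockets, my guess is that Linux behaves consistently; interested readers can check the code to verify this.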
