Notes on reading the socket source code in the Linux kernel

      Once you are familiar with the principles of TCP, you know that its procedures and algorithms are complicated because it has to maintain a reliable connection. In day-to-day development, however, you usually only need to call a few functions that the API provides. Moreover, many frameworks now wrap the network layer and leave only read and write calls to the application layer, which greatly reduces development cost.

      That leaves a question: how are sockets implemented in Linux?


1. Principle and use

      A socket is usually created through the socket interface, using the following function.

 int socket(int domain, int type, int protocol)

      domain takes values such as PF_INET, PF_INET6, and PF_LOCAL, which select IPv4, IPv6, or local (Unix domain) sockets respectively.

      The available values for type include: SOCK_STREAM, a byte stream, corresponding to TCP; SOCK_DGRAM, a datagram, corresponding to UDP; and SOCK_RAW, a raw socket.

      Let's look at an example of creating a server. First, create a socket with the socket interface, then call bind to bind it to a local port.
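      As a quick check of these parameters, here is a minimal sketch that creates a socket for a given domain and type and asks the kernel which type it recorded, using getsockopt with SO_TYPE. The helper name socket_type_of is made up for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket with the given domain and type,
 * read the type back from the kernel, then close the socket.
 * Returns the type reported by SO_TYPE, or -1 on error. */
int socket_type_of(int domain, int type)
{
    int sock = socket(domain, type, 0);   /* protocol 0: let the kernel choose */
    if (sock < 0)
        return -1;

    int reported = -1;
    socklen_t len = sizeof(reported);
    if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &reported, &len) < 0)
        reported = -1;
    else
        assert(len == sizeof(reported));  /* SO_TYPE is returned as an int */

    close(sock);
    return reported;
}
```

      Calling socket_type_of(PF_INET, SOCK_STREAM) should hand back SOCK_STREAM, and the same holds for SOCK_DGRAM.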

#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>

int make_socket(uint16_t port)
{
    int sock;
    struct sockaddr_in name;

    /* Create a byte-stream IPv4 socket. */
    sock = socket(PF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    /* Bind to the port and IP address. */
    name.sin_family = AF_INET;                 /* IPv4 */
    name.sin_port = htons(port);               /* port, in network byte order */
    name.sin_addr.s_addr = htonl(INADDR_ANY);  /* wildcard address */

    /* Cast the IPv4 address to the generic address type and pass its length. */
    if (bind(sock, (struct sockaddr *)&name, sizeof(name)) < 0) {
        perror("bind");
        exit(EXIT_FAILURE);
    }

    return sock;
}

/* The server then needs to listen on the port and accept connections. */
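      The listen and accept steps mentioned above can be sketched as follows. This is an illustration only: the helper name loopback_round_trip is made up, and to keep it self-contained it binds to the loopback address on a port chosen by the kernel and plays both client and server in one process. It also assumes the short message arrives in a single recv call, which TCP does not guarantee in general.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on an ephemeral loopback port, connect to it
 * from the same process, accept the connection, and pass one message across.
 * Returns the number of bytes received on the accepted socket, or -1. */
ssize_t loopback_round_trip(const char *msg, char *out, size_t out_len)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    ssize_t n = -1;
    int client = -1, conn = -1;

    assert(msg != NULL && out != NULL);

    int server = socket(PF_INET, SOCK_STREAM, 0);
    if (server < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* 0: kernel picks a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* bind, start listening, then learn which port the kernel assigned. */
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(server, 1) < 0 ||
        getsockname(server, (struct sockaddr *)&addr, &addr_len) < 0)
        goto done;

    client = socket(PF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* accept returns a new descriptor for the established connection. */
    conn = accept(server, NULL, NULL);
    if (conn < 0)
        goto done;

    if (send(client, msg, strlen(msg), 0) < 0)
        goto done;
    n = recv(conn, out, out_len, 0);

done:
    if (conn >= 0)
        close(conn);
    if (client >= 0)
        close(client);
    close(server);
    return n;
}
```

      A real server would call accept in a loop and hand each connection to its own handler; the single round trip here only shows the order of the calls.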

     

2. Linux source code reading

      In the Linux source code, socket creation starts from the system call, as shown below:

SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
{
    int retval;
    struct socket *sock;
    int flags;

    ......
    if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
        flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;

    retval = sock_create(family, type, protocol, &sock); //①
    ......
    retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
    ......
    return retval;
}

      Here, sock_create creates the socket, and sock_map_fd then binds it to a file descriptor, because everything is a file under Linux. To make the source below easier to read, I drew a flow chart, and the reading follows that chart.

      The sock_create function at ① calls the following __sock_create function.

int __sock_create(struct net *net, int family, int type,
                  int protocol, struct socket **res, int kern)
{
    int err;
    struct socket *sock;
    const struct net_proto_family *pf;

    ......
    sock = sock_alloc();
    ......
    sock->type = type;
    ......
    pf = rcu_dereference(net_families[family]);
    ......
    err = pf->create(net, sock, protocol, kern);
    ......
    *res = sock;
    return 0;
}

      This function mainly calls sock_alloc to allocate a struct socket. It then uses rcu_dereference to fetch the net_families[family] entry. For PF_INET, that entry is the following net_proto_family structure:

static const struct net_proto_family inet_family_ops = {
    .family = PF_INET,
    .create = inet_create, /* Used when the socket system call creates a socket */
    ...
};

      So the net_families array is an array of protocol families, and each element corresponds to one family, such as the IPv4 family or the IPv6 family. The PF_INET parameter passed to socket in our example above selects this net_proto_family structure. In effect, the kernel follows the socket parameters step by step to find the callback it should call. Therefore, pf->create invokes inet_create, the callback registered in this net_proto_family.
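      The family lookup can be pictured with a small user-space model. Everything prefixed with toy_ or TOY_ below is invented for this sketch and is not kernel code; it only mirrors the shape of __sock_create: record the type, index a table by family, and call that entry's create callback.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPROTO   4   /* hypothetical table size for this model */
#define TOY_PF_INET  2   /* hypothetical family numbers for this model */
#define TOY_PF_INET6 3

struct toy_socket {
    int type;
    const char *created_by;   /* records which create callback ran */
};

struct toy_proto_family {
    int family;
    int (*create)(struct toy_socket *sock, int protocol);
};

static int toy_inet_create(struct toy_socket *sock, int protocol)
{
    (void)protocol;
    sock->created_by = "toy_inet_create";
    return 0;
}

static const struct toy_proto_family toy_inet_family_ops = {
    .family = TOY_PF_INET,
    .create = toy_inet_create,
};

/* One slot per family; families that were never registered stay NULL. */
static const struct toy_proto_family *toy_families[TOY_NPROTO] = {
    [TOY_PF_INET] = &toy_inet_family_ops,
};

/* Record the type, look up the family, and call its create callback.
 * Returns -1 when the family is out of range or not registered. */
int toy_sock_create(int family, int type, int protocol, struct toy_socket *sock)
{
    assert(sock != NULL);
    if (family < 0 || family >= TOY_NPROTO || toy_families[family] == NULL)
        return -1;
    sock->type = type;
    return toy_families[family]->create(sock, protocol);
}
```

      In this model TOY_PF_INET6 has no entry, so creating a socket for it fails, while TOY_PF_INET reaches toy_inet_create through the function pointer.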

static int inet_create(struct net *net, struct socket *sock,
                       int protocol, int kern)
{
    struct sock *sk;
    struct inet_protosw *answer;
    struct inet_sock *inet;
    struct proto *answer_prot;
    unsigned char answer_flags;
    int try_loading_module = 0;
    int err;

    /* Look for the requested type/protocol pair. */
lookup_protocol:
    list_for_each_entry_rcu(answer, &inetsw[sock->type], list) {
        err = 0;
        /* Check the non-wild match. */
        if (protocol == answer->protocol) {
            if (protocol != IPPROTO_IP)
                break;
        } else {
            /* Check for the two wild cases. */
            if (IPPROTO_IP == protocol) {
                protocol = answer->protocol;
                break;
            }
            if (IPPROTO_IP == answer->protocol)
                break;
        }
        err = -EPROTONOSUPPORT;
    }
    ......
    sock->ops = answer->ops;
    answer_prot = answer->prot;
    answer_flags = answer->flags;
    ......
    sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
    ......
}

      list_for_each_entry_rcu is a macro that walks the list at inetsw[sock->type]. The inetsw array is indexed by socket type, and each list entry describes one protocol, such as TCP or UDP. In our example, the loop therefore searches the list that corresponds to the SOCK_STREAM parameter. The inetsw array itself is filled in by the inet_init function:

static int __init inet_init(void)
{
    /* Register the socket-side information for inet_create. */
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);

    for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
        inet_register_protosw(q);

    /* Other code omitted ... */
}

      The first for loop initializes every element of the inetsw array as an empty list head, because each socket type can have several protocols. The second loop registers every entry of inetsw_array into the inetsw array. The structure of inetsw_array is shown below. For example, however many TCP sockets are created, each of them is associated with the same inetsw entry for the TCP protocol.
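      The registration and the lookup loop can be modeled in user space. All names beginning with mini_ or MINI_ are invented for this sketch, and a plain singly linked list stands in for the kernel's list_head, so the order of entries within a list is not meant to match the kernel. The matching branches follow the loop shown in inet_create.

```c
#include <assert.h>
#include <stddef.h>

#define MINI_SOCK_MAX    4
#define MINI_SOCK_STREAM 1
#define MINI_SOCK_DGRAM  2
#define MINI_IPPROTO_IP  0    /* wildcard protocol */
#define MINI_IPPROTO_TCP 6
#define MINI_IPPROTO_UDP 17

struct mini_protosw {
    int type;
    int protocol;
    const char *name;
    struct mini_protosw *next;   /* simplified stand-in for list_head */
};

/* One list head per socket type. */
static struct mini_protosw *mini_inetsw[MINI_SOCK_MAX];

static struct mini_protosw mini_inetsw_array[] = {
    { .type = MINI_SOCK_STREAM, .protocol = MINI_IPPROTO_TCP, .name = "TCP" },
    { .type = MINI_SOCK_DGRAM,  .protocol = MINI_IPPROTO_UDP, .name = "UDP" },
};

/* First loop: empty every list. Second loop: link each entry into the
 * list selected by its type. */
void mini_inet_init(void)
{
    size_t i;

    for (i = 0; i < MINI_SOCK_MAX; i++)
        mini_inetsw[i] = NULL;

    for (i = 0; i < sizeof(mini_inetsw_array) / sizeof(mini_inetsw_array[0]); i++) {
        struct mini_protosw *p = &mini_inetsw_array[i];

        assert(p->type >= 0 && p->type < MINI_SOCK_MAX);
        p->next = mini_inetsw[p->type];
        mini_inetsw[p->type] = p;
    }
}

/* Walk the list for this type and apply the exact and wildcard checks.
 * Returns the matching entry, or NULL if no protocol matches. */
const struct mini_protosw *mini_lookup(int type, int protocol)
{
    const struct mini_protosw *p;

    if (type < 0 || type >= MINI_SOCK_MAX)
        return NULL;

    for (p = mini_inetsw[type]; p != NULL; p = p->next) {
        if (protocol == p->protocol) {
            if (protocol != MINI_IPPROTO_IP)
                return p;        /* exact, non-wildcard match */
        } else {
            if (protocol == MINI_IPPROTO_IP)
                return p;        /* caller passed 0: use this entry's protocol */
            if (p->protocol == MINI_IPPROTO_IP)
                return p;        /* the entry itself is a wildcard */
        }
    }
    return NULL;
}
```

      After mini_inet_init(), looking up MINI_SOCK_STREAM with protocol 0 finds the TCP entry, while asking for MINI_IPPROTO_UDP on a stream socket finds nothing.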

static struct inet_protosw inetsw_array[] = {
    {
        .type = SOCK_STREAM,
        .protocol = IPPROTO_TCP,
        .prot = &tcp_prot,
        .ops = &inet_stream_ops,
        .flags = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK,
    },
    /* Other protocols omitted ... */
};


       In the list_for_each_entry_rcu loop of inet_create, the kernel uses type to pick a list in the inetsw array, then checks whether the protocol of each struct inet_protosw in that list is the protocol specified by the user. On a match, answer points to the struct inet_protosw that corresponds to the user's family->type->protocol.
       Next, the ops member of struct socket *sock is set to answer->ops, which for TCP is inet_stream_ops. All later user operations on this socket go through inet_stream_ops. After that, a struct sock *sk object is created.


      struct socket and struct sock look almost the same. In fact, struct socket faces upward: it provides the interface to the user and is associated with the file system. struct sock faces downward: it connects to the kernel network protocol stack. In the sk_alloc function, the tcp_prot from the struct inet_protosw *answer structure is assigned to the sk_prot member of struct sock *sk. tcp_prot is defined as follows. Its many callbacks are the actions of the kernel protocol stack beneath sock, and they cover the parts of the TCP protocol that we are more familiar with.

struct proto tcp_prot = {
    .name        = "TCP",
    .owner       = THIS_MODULE,
    .close       = tcp_close,
    .connect     = tcp_v4_connect,
    .disconnect  = tcp_disconnect,
    .accept      = inet_csk_accept,
    .ioctl       = tcp_ioctl,
    .init        = tcp_v4_init_sock,
    .destroy     = tcp_v4_destroy_sock,
    .shutdown    = tcp_shutdown,
    .setsockopt  = tcp_setsockopt,
    .getsockopt  = tcp_getsockopt,
    .keepalive   = tcp_set_keepalive,
    .recvmsg     = tcp_recvmsg,
    .sendmsg     = tcp_sendmsg,
    .sendpage    = tcp_sendpage,
    .backlog_rcv = tcp_v4_do_rcv,
    .release_cb  = tcp_release_cb,
    .hash        = inet_hash,
    .get_port    = inet_csk_get_port,
    ......
}
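The relationship between the two layers can be sketched with a user-space model. All model_ names are invented here; the model only shows the idea that a call enters through socket->ops and is forwarded to sk->sk_prot. It does not reproduce what the kernel's inet_stream_ops functions actually do.

```c
#include <assert.h>
#include <stddef.h>

struct model_sock;
struct model_socket;

/* Lower layer: protocol-specific operations, in the role of struct proto. */
struct model_proto {
    const char *name;
    int (*sendmsg)(struct model_sock *sk, const char *buf, size_t len);
};

/* Upper layer: socket-facing operations, in the role of socket->ops. */
struct model_proto_ops {
    int (*sendmsg)(struct model_socket *sock, const char *buf, size_t len);
};

struct model_sock {
    const struct model_proto *sk_prot;
    size_t bytes_queued;          /* stands in for the protocol's send queue */
};

struct model_socket {
    const struct model_proto_ops *ops;
    struct model_sock *sk;
};

/* Protocol layer: pretend to queue the data for transmission. */
static int model_tcp_sendmsg(struct model_sock *sk, const char *buf, size_t len)
{
    (void)buf;
    sk->bytes_queued += len;
    return (int)len;
}

/* Socket layer: forward the call to whatever protocol sk_prot points at. */
static int model_inet_sendmsg(struct model_socket *sock, const char *buf, size_t len)
{
    struct model_sock *sk = sock->sk;

    assert(sk != NULL && sk->sk_prot != NULL);
    return sk->sk_prot->sendmsg(sk, buf, len);
}

static const struct model_proto model_tcp_prot = {
    .name = "TCP",
    .sendmsg = model_tcp_sendmsg,
};

static const struct model_proto_ops model_inet_stream_ops = {
    .sendmsg = model_inet_sendmsg,
};

/* Wire both layers together, the way inet_create sets sock->ops and
 * sk_alloc sets sk->sk_prot in the text above. */
void model_socket_init(struct model_socket *sock, struct model_sock *sk)
{
    sk->sk_prot = &model_tcp_prot;
    sk->bytes_queued = 0;
    sock->ops = &model_inet_stream_ops;
    sock->sk = sk;
}
```

After model_socket_init, calling sock.ops->sendmsg(&sock, "hello", 5) passes through the upper layer and ends up in model_tcp_sendmsg. Swapping sk_prot for another protocol would change the behavior without touching the socket-facing layer.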

3. Summary

    The socket system call takes three levels of parameters: family, type, and protocol. family selects the net_proto_family entry in the net_families table, type selects a list in the inetsw array, and protocol selects the matching entry in that list, which supplies the operations. These operations are divided into two layers. For the TCP protocol, the first layer is the inet_stream_ops layer and the second layer is the tcp_prot layer. They correspond to the operations of the application layer and of the kernel layer respectively.



Origin blog.51cto.com/15060546/2641165