TCP source code: system calls

 


Table of Contents

1、socket

2、bind

inet_bind:

inet_csk_get_port: SO_BINDTODEVICE, SO_REUSEADDR, SO_REUSEPORT (the options used most often in production)

inet_get_local_port_range:

inet_is_local_reserved_port:

Q: Can listen be called without bind? Can bind be called repeatedly? Can a socket in the LISTEN state be rebound?

3、listen

inet_listen:

inet_csk_listen_start:

__inet_hash

4、accept

inet_csk_accept:

Q: When multiple processes or threads call accept at the same time, which one is woken first?

Q: If a process forks after accept and only the child process calls close, is the connection closed?

5、connect

__inet_stream_connect:

tcp_v4_connect:

tcp_twsk_unique:

tcp_disconnect:

Q: What happens when connect is called with the destination 0.0.0.0:0?


1、socket

SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)  

sys_socket -> sock_create -> __sock_create (validates the parameters) -> sock_alloc (allocates a new inode, sets inode->i_op = sockfs_inode_ops, and creates the socket) -> [__sock_create] inet_create [pf->create] -> [sys_socket] sock_map_fd (allocates an fd, creates a file, and links the file and the socket to each other) -> sock_alloc_file (creates a file in sockfs, sets sock->file = file and file->private_data = sock) -> alloc_file (creates the file in sockfs and sets file->f_op = socket_file_ops) -> [sock_map_fd] fd_install (associates fd with file: current->files->fdt[fd] = file)

inet_create (below, the struct socket is called sock and the struct sock is called sk; looks up the matching answer in inetsw_array, sets sock->ops = answer->ops, then creates and initializes the sk associated with sock) -> sk_alloc (allocates a tcp_sock from the tcp_prot slab cache and sets sk->sk_prot = sk->sk_prot_creator = answer->prot) -> [inet_create] sock_init_data (sets sk->sk_socket = sock and sock->sk = sk, and initializes other members such as sk_data_ready) -> [inet_create] tcp_v4_init_sock [sk->sk_prot->init] (sets icsk->icsk_af_ops = &ipv4_specific and, for TCP MD5, tcp_sk(sk)->af_specific = &tcp_sock_ipv4_specific) -> tcp_init_sock (initializes TCP parameters such as cwnd and sndbuf)

 

sock_mnt {sock_init} [sock_alloc]: the mounted sock_fs_type file system; its superblock operations are sockfs_ops and its dentry operations are sockfs_dentry_operations

inet_family_ops {inet_init}: the protocol family entry for PF_INET; the sys_socket system call ends up calling its create function pointer, inet_create

 

 

2、bind

SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)

sys_bind -> sockfd_lookup_light (finds the socket that corresponds to fd) -> [sys_bind] move_addr_to_kernel (copies the user-space address into a kernel sockaddr_storage) -> inet_bind [sock->ops->bind] -> inet_csk_get_port [sk->sk_prot->get_port] -> inet_csk_bind_conflict [inet_csk(sk)->icsk_af_ops->bind_conflict]

 

sockfd_lookup_light (finds the socket from fd) -> sock_from_file (returns file->private_data)

 

inet_bind:

1, Depending on the ip_nonlocal_bind sysctl and the IP_TRANSPARENT and IP_FREEBIND options, decides whether binding to a non-local IP address is allowed

2, If the port number is below 1024, checks whether the caller has the privilege to bind it

3, Binding is only allowed when the socket is in the TCP_CLOSE state and has not been bound to a port before (inet->inet_num == 0 means no port is bound)

4, Binds the IP address: inet->inet_rcv_saddr = inet->inet_saddr = addr->sin_addr.s_addr. inet_rcv_saddr is used for hash lookups and inet_saddr for transmission; for broadcast and multicast addresses inet_saddr is set to 0.

5, If the IP_BIND_ADDRESS_NO_PORT option is set to 1 and the requested port is 0, no port is allocated at bind time; otherwise a requested port of 0 means the kernel selects the port automatically

6, Calls inet_csk_get_port [sk->sk_prot->get_port] to bind the port

7, If a valid port number was bound successfully, performs the following updates

 

  if (inet->inet_rcv_saddr)
          sk->sk_userlocks |= SOCK_BINDADDR_LOCK;
  if (snum)
          sk->sk_userlocks |= SOCK_BINDPORT_LOCK;
  inet->inet_sport = htons(inet->inet_num);
  inet->inet_daddr = 0;
  inet->inet_dport = 0;
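Steps 3 and 5 can be observed from user space. This is a small sketch, assuming Linux and the loopback interface; bind_loopback and local_port are hypothetical helper names. Binding to port 0 lets the kernel pick a port, and a second bind on the same socket is rejected because inet->inet_num is no longer 0.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Bind fd to 127.0.0.1:port (port in host byte order); returns 0 or -errno. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? -errno : 0;
}

/* Return the locally bound port in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

After a successful bind_loopback(fd, 0), local_port(fd) returns the port the kernel selected, and a second bind_loopback call on the same fd is expected to return -EINVAL.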

 

inet_csk_get_port: SO_BINDTODEVICE, SO_REUSEADDR, SO_REUSEPORT (the options used most often in production)

SO_BINDTODEVICE: binds the socket to an interface, i.e. sets sk->sk_bound_dev_if. With this option, sockets bound to different interfaces can bind the same port and IP address.

SO_REUSEADDR: as long as no socket is in the LISTEN state, the same address and port can be bound repeatedly. Even with this option set, automatic port selection still tries to avoid an IP and port that are already in use.

SO_REUSEPORT: the port can be taken over as long as the sockets belong to the same user (uid), or the socket already bound is in the TIME_WAIT state. With SO_REUSEPORT set, two sockets of the same user can listen on the same port and address at the same time, which SO_REUSEADDR does not allow.

 

inet_get_local_port_range:

Reads the ip_local_port_range sysctl values under seqlock protection

 

inet_is_local_reserved_port:

The ip_local_reserved_ports sysctl is stored as a bitmap; this function tests whether a port is a reserved port (it only takes effect when the CONFIG_SYSCTL macro is compiled in)

 

inet_bind_hash [inet_csk_get_port]:

Sets inet_sk(sk)->inet_num = snum and adds sk to the owners list of the corresponding tb

 

Q: Can listen be called without bind? Can bind be called repeatedly? Can a socket in the LISTEN state be rebound?

A: If listen is called without bind, a port is selected automatically. bind cannot be repeated on the same socket. bind is only possible in the CLOSE state.
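The answer can be checked with a short experiment. This is a sketch under the assumption of a Linux host; listen_then_bind is a hypothetical helper. It calls listen on a socket that was never bound, reads the automatically selected port, and then tries to bind the listening socket.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * listen() on an unbound TCP socket, then attempt bind() on it.
 * Returns the port the kernel selected for listen (host byte order) or -1.
 * *bind_errno receives the errno of the later bind(), or 0 if it succeeded.
 */
int listen_then_bind(int *bind_errno)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (listen(fd, 16) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    *bind_errno = bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? errno : 0;

    close(fd);
    return port;
}
```

The returned port should be non-zero, and the bind on the listening socket is expected to fail with EINVAL.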

 

 

3、listen

SYSCALL_DEFINE2(listen, int, fd, int, backlog)

sys_listen (uses the smaller of the backlog argument and somaxconn) -> sockfd_lookup_light -> [sys_listen] inet_listen [sock->ops->listen] -> inet_csk_listen_start (reqsk_queue_alloc initializes icsk_accept_queue, including the TFO queue) -> [inet_csk_listen_start] inet_csk_get_port [sk->sk_prot->get_port] -> [inet_csk_listen_start] inet_hash [sk->sk_prot->hash] -> __inet_hash

 

 

sock_net(sock->sk)->core.sysctl_somaxconn: /proc/sys/net/core/somaxconn, default 128

 

inet_listen:

  1. listen is only allowed when the socket state is SS_UNCONNECTED and the type is SOCK_STREAM

  2. listen is only allowed in the TCP CLOSE or LISTEN state. Calling listen again in the LISTEN state only updates the backlog, sk->sk_max_ack_backlog = backlog; the maximum length of the FASTOPEN queue cannot be updated after listen.

  3. tcp_fastopen setting: if TFO_SERVER_ENABLE (2) is set and the fastopen queue length has not been set through a socket option, the queue length is initialized as follows

    If TFO_SERVER_WO_SOCKOPT1 (0x400) is set, fastopenq.max_qlen = min(backlog, somaxconn)

    If TFO_SERVER_WO_SOCKOPT2 (0x800) is set, fastopenq.max_qlen = min(backlog, tcp_fastopen)

 

inet_csk_listen_start:

Initializes the accept queue, the TFO queue logic, the TFO RST queue, the half-open connection queue logic, delayed ACK, and so on.

When inet_csk_get_port [sk->sk_prot->get_port] acquires the port successfully, the sock is added to the listening hash table through inet_hash

 

__inet_hash

Hashes the local port and the net namespace to a hash bucket, sk->sk_prot->h.hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]. The size of the listening_hash table is fixed at INET_LHTABLE_SIZE (32).
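To make the fixed table size concrete, here is a simplified user-space model of the bucket computation. It is a sketch, not the kernel code: listen_bucket is a made-up name, and the net_mix parameter stands in for the per-namespace mix value, which is assumed to be simply added to the port before masking.

```c
#include <stdint.h>

#define INET_LHTABLE_SIZE 32 /* fixed size of listening_hash, as stated above */

/*
 * Simplified model of the listening hash: add the namespace mix value to
 * the local port and mask the sum down to the table size.
 * net_mix is a hypothetical stand-in for the per-namespace value.
 */
unsigned int listen_bucket(uint16_t port, uint32_t net_mix)
{
    return (port + net_mix) & (INET_LHTABLE_SIZE - 1);
}
```

With only 32 buckets, different ports often share a bucket; in this model ports 80 and 8080 both land in bucket 16 when net_mix is 0, so the lookup has to walk the bucket's list and compare each entry.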

 

4、accept

SYSCALL_DEFINE3(accept, int, fd, struct sockaddr __user *, upeer_sockaddr, int __user *, upeer_addrlen) ->sys_accept4(fd, upeer_sockaddr, upeer_addrlen, 0)

SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, int __user *, upeer_addrlen, int, flags)

sys_accept4 -> sockfd_lookup_light -> [sys_accept4] sock_alloc (allocates an inode and creates the new child socket) -> [sys_accept4] get_unused_fd_flags (allocates an fd) -> [sys_accept4] sock_alloc_file (creates a file in the sockfs file system) -> [sys_accept4] inet_accept [sock->ops->accept] (finally sets the state of newsock to SS_CONNECTED) -> inet_csk_accept [sk1->sk_prot->accept] -> inet_csk_wait_for_connect (waits for a connection, subject to the timeout) -> reqsk_queue_remove (takes the request at the head of the accept queue) -> [inet_accept] sock_graft (associates newsock with sk2)

 

inet_csk_accept:

If the sock is not in the LISTEN state, an error is returned

If the accept queue is empty, the call returns immediately when O_NONBLOCK is set and otherwise waits up to the timeout. The timeout is sk->sk_rcvtimeo, which defaults to MAX_SCHEDULE_TIMEOUT (the maximum signed long) and can be set with the SO_RCVTIMEO option; the configured value is rounded up to HZ granularity.

After accept, the req of an ordinary connection is released directly. Under TFO, if the connection has not completed the three-way handshake yet, req->sk is set to NULL instead; otherwise the req is released in the same way.
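The three outcomes described above (not listening, empty queue with O_NONBLOCK, empty queue with a timeout) can be reproduced from user space. This is a minimal sketch assuming Linux; accept_or_errno and set_accept_timeout_ms are hypothetical helper names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>

/* accept() wrapper: returns the new fd, or -errno on failure. */
int accept_or_errno(int fd)
{
    int newfd = accept(fd, NULL, NULL);

    return newfd < 0 ? -errno : newfd;
}

/* Limit how long a blocking accept() may wait, via SO_RCVTIMEO; 0 or -errno. */
int set_accept_timeout_ms(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ? -errno : 0;
}
```

On a socket that is not listening, accept_or_errno is expected to return -EINVAL. On a listening socket with an empty accept queue it should return -EAGAIN, immediately for a non-blocking socket and after the configured timeout for a blocking one.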

 

Q: When multiple processes or threads call accept at the same time, which one is woken first?

A: When the accepting processes come from separately started programs, each with its own sk, the sk is selected by a random number; see __inet_lookup_listener.

When multiple threads accept, they actually share one sk. Each thread's wait entry is added to the tail of the wait queue, and wakeups start from the head, so the thread that called accept first is woken first.

Multiple processes created by fork also share one sk when they accept, so they are handled the same way as multiple threads.

Q: If a process forks after accept and only the child process calls close, is the connection closed?

A: No. After fork, the parent and the child each hold a reference to the same open file, so the TCP connection is closed only after both the parent and the child call close.
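The fork case can be verified on the loopback interface. The following sketch assumes Linux; child_close_keeps_connection is a name invented here. A client connects, the server accepts and forks, and only the child closes the accepted descriptor. The client then checks that no end-of-stream has arrived, and sees it only after the parent closes as well.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the connection survives the child's close() and ends only
 * after the parent's close(), 0 if it does not, -1 on a setup error.
 */
int child_close_keeps_connection(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char byte;
    ssize_t n;
    pid_t pid;
    int alive, result = -1, afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0 || cfd < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick the listening port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        close(afd); /* only the child's reference is dropped */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        goto out;

    /* No FIN was sent: a non-blocking read finds nothing, not end-of-stream. */
    n = recv(cfd, &byte, 1, MSG_DONTWAIT);
    alive = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(afd); /* last reference: the connection is now really closed */
    afd = -1;
    n = recv(cfd, &byte, 1, 0); /* returns 0 once the FIN arrives */
    result = (alive && n == 0);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

The function is expected to return 1, which matches the answer above: the connection is released when the last reference to the open file goes away, not when any single process calls close.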

 

 

 

5、connect

SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr, int, addrlen)

sys_connect->sockfd_lookup_light->[sys_connect]inet_stream_connect[sock->ops->connect]->__inet_stream_connect->tcp_v4_connect[sk->sk_prot->connect]->[__inet_stream_connect]inet_wait_for_connect

 

__inet_stream_connect:

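The destination address family must not be AF_UNSPEC if a connection is wanted: with AF_UNSPEC the kernel calls sk->sk_prot->disconnect (tcp_disconnect) instead of connecting

The connection is only started when the socket state is SS_UNCONNECTED and the TCP state is CLOSE

The timeout is computed according to O_NONBLOCK

After the connection succeeds, the socket state is updated to SS_CONNECTED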
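The state checks listed above show up as error codes in user space. This is a sketch assuming Linux and a listener on the loopback interface; connect_or_errno and disconnect_or_errno are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* connect() wrapper returning 0 on success or -errno. */
int connect_or_errno(int fd, const struct sockaddr_in *addr)
{
    return connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 ? -errno : 0;
}

/* Ask the kernel to disconnect fd by connecting to an AF_UNSPEC address. */
int disconnect_or_errno(int fd)
{
    struct sockaddr addr;

    memset(&addr, 0, sizeof(addr));
    addr.sa_family = AF_UNSPEC;
    return connect(fd, &addr, sizeof(addr)) < 0 ? -errno : 0;
}
```

A second connect_or_errno on an already connected blocking socket is expected to return -EISCONN, because the socket state is SS_CONNECTED rather than SS_UNCONNECTED, and disconnect_or_errno is expected to return 0 and bring the socket back to the unconnected state.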

 

tcp_v4_connect:

If IP options are present (source routing), the next hop address is initialized from them

Looks up the route to the next hop. If the lookup fails, or the route is a multicast or broadcast route, an error is returned. If no source address was specified, the source address is initialized from the route.

 

__inet_check_established->twsk_unique->tcp_twsk_unique[sk->sk_prot->twsk_prot->twsk_unique]

__inet_check_established:

Determines whether a connection using the selected port conflicts with an entry in ehash:

1, If ehash contains an sk2 whose source address, destination address, source port, destination port, interface and net namespace are all identical:

If sk2 is in the TIME_WAIT state and tcp_twsk_unique returns 1, sk is inserted into ehash and tw (sk2) is removed from ehash; depending on the arguments, tw may also be removed from bhash

Otherwise the port cannot be used, i.e. the same connection cannot be established twice

2, If no matching sk2 is found in ehash, the port is assigned successfully and sk is added to ehash.

 

tcp_twsk_unique:

If tcptw has recorded timestamp information, more than 1s has passed since the recorded timestamp, and tcp_tw_reuse is enabled, the timestamp information recorded in tcptw is copied into sk and 1 is returned
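The reuse condition can be written down as a small predicate. This is a simplified model of the three conditions named above, not the kernel function: tw_reusable and its parameters are made up for illustration, and times are plain seconds.

```c
#include <stdbool.h>

/*
 * Simplified model of the TIME_WAIT reuse check described above.
 * ts_recent_stamp: second at which the timestamp was recorded (0 = none).
 * tcp_tw_reuse:    value of the tcp_tw_reuse sysctl (0 = disabled).
 * now:             current time in seconds.
 */
bool tw_reusable(long ts_recent_stamp, int tcp_tw_reuse, long now)
{
    return ts_recent_stamp != 0 &&
           tcp_tw_reuse != 0 &&
           now - ts_recent_stamp > 1;
}
```

For example, a timestamp recorded at second 100 allows reuse at second 102 but not at second 101, because the elapsed time has to exceed 1s.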

 

tcp_connect [tcp_v4_connect]:

Initializes the connection parameters with tcp_connect_init

Allocates an skb, initializes it, adds it to the write queue, and obtains the ECN information for the connection

Transmits the skb: tcp_send_syn_data is used for TFO, tcp_transmit_skb for an ordinary connection

Starts the retransmission timer ICSK_TIME_RETRANS

 

tcp_disconnect:

 

Q: What happens when connect is called with the destination 0.0.0.0:0?

A: The destination port is 0. The destination address and the source address are set from the selected route entry and both end up as 127.0.0.1. The source port is selected by inet_hash_connect: a random offset is first generated from the source address, destination address and destination port, and the source port is then chosen starting from that offset
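The address rewrite can be observed with getpeername. The sketch below assumes Linux and, unlike the question, uses the port of a real listener instead of port 0 so that the connect can succeed; peer_after_connect_to_any is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect a new TCP socket to 0.0.0.0:port and return the peer address the
 * kernel recorded for the connection (host byte order), or 0 on failure.
 */
uint32_t peer_after_connect_to_any(unsigned short port)
{
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof(peer);
    uint32_t result = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* destination 0.0.0.0 */
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        result = ntohl(peer.sin_addr.s_addr);
    close(fd);
    return result;
}
```

With a listener on the given port, the returned peer address is expected to be INADDR_LOOPBACK (127.0.0.1), in line with the answer above.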

 

Q: When the port chosen in __inet_hash_connect conflicts with an entry in bhash, how is sk added to ehash?

A: If sk is the first sk bound to the port, __inet_hash_connect inserts it into ehash directly; otherwise it is inserted into ehash by __inet_check_established

 

 

The earlier question about close now has a natural conclusion: close behaves the same in a single thread (process) and in multiple threads, but differently with multiple processes. When several processes share one socket, every one of them must call close before the connection is really closed.


Origin blog.csdn.net/Windgs_YF/article/details/94739220