Re-read "Top-Down" --- some new understanding about computer network (Part 1)


Foreword

  Recently I took the time to re-read "Computer Networking: A Top-Down Approach". I gained many new understandings in the process and summarize them below, in the hope that they will be useful to others as well as to myself.

Note: this article is a summary written while re-reading and reviewing, and many of the understandings come from the book. The entries below do not strictly follow the book's chapter order (they roughly follow the main line of each network layer), and most are listed as short items. Please bear with me if the reading feels uneven ( / ω\ )


1. Introduction

1. What forms of access network are there?

(1) Dial-up
(2) DSL
(3) Cable
(4) FTTH
(5) Ethernet
(6) WiFi
(7) Wide-Area Wireless Access

2. How is the data transmitted in the network?

  • Circuit switching: time-division multiplexing (TDM) or frequency-division multiplexing (FDM) can be used
  • Packet switching: can be understood as statistical time-division multiplexing

3. How to understand the Internet?

Simply put, the Internet can be understood as a network of interconnected networks.

4. What indicators measure network performance?

  • Several delays during packet transmission

    • processing delay
    • queuing delay
    • transmission delay
    • propagation delay
  • Packet loss rate

  • throughput
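As a small sketch of how these delays add up (using the standard formulas, transmission delay L/R and propagation delay d/s; the link numbers below are made up for illustration):

```python
def nodal_delay(proc_s, queue_s, packet_bits, link_bps, dist_m, speed_mps=2e8):
    """Total nodal delay = processing + queuing + transmission + propagation."""
    d_trans = packet_bits / link_bps   # time to push all bits onto the link (L/R)
    d_prop = dist_m / speed_mps        # time for one bit to cross the link (d/s)
    return proc_s + queue_s + d_trans + d_prop

# Illustrative numbers: a 1500-byte packet on a 10 Mbps link spanning 3000 km,
# with processing and queuing delays taken as zero
delay = nodal_delay(0, 0, 1500 * 8, 10e6, 3_000_000)   # 0.0012 + 0.015 s
```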


5. What is involved in network security?

  • Dissemination of malware: such as viruses, Trojans, worms, etc.
  • Attacks on servers and network infrastructure, such as DoS attacks: sending carefully crafted messages, bandwidth flooding, connection flooding
  • Network monitoring, such as sniff packets
  • Communication masquerading, masquerading as a trusted endpoint


6. When an IP packet is forwarded by routers, do the source IP and destination IP change?
Generally speaking, the IP addresses remain unchanged during packet forwarding, but the MAC addresses change hop by hop, unless NAT (network address translation) is performed. To some extent, a packet crosses from one "local area network" to another as it travels through the network (this is also why the Internet is regarded as a network of networks); throughout this process the IP addresses stay the same, but the destination MAC address is different in each LAN (usually the MAC address of the next-hop router).

For details, see: "When an IP packet is forwarded by routers, do the source IP and destination IP change?"


7. For end users and non-networking people (ordinary programmers), the end system is the focus of attention: that is where the applications run. To some extent, the network in the middle merely provides a "data transmission" service.


2. Application layer

1. What are the two main architectural solutions for network applications?

  • Client-server (C/S)
  • P2P (peer-to-peer)


2. A network application has many components, and the protocol (which specifies the format of messages and the timing of their exchange) may be only one part of it. For example, a web application includes a web server, a web client, HTML (the page format), the HTTP protocol, and so on.

3. Two modes of HTTP connection?

  • Persistent (long-lived) connections
  • Non-persistent (short-lived) connections
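A minimal sketch of how this choice is signaled in practice (the defaults follow common HTTP/1.0 vs HTTP/1.1 behavior; the function is illustrative, not a full header parser):

```python
def connection_is_persistent(http_version, headers):
    """Decide whether the TCP connection stays open after this response."""
    conn = headers.get("Connection", "").lower()
    if http_version == "HTTP/1.1":
        return conn != "close"      # HTTP/1.1: persistent unless told otherwise
    return conn == "keep-alive"     # HTTP/1.0: short-lived unless told otherwise
```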

4. A web cache is essentially a caching server. When the browser is configured to use it as a proxy, it reduces the load on the origin web server to some extent. It is one kind of proxy server (there are other forms of proxy servers).

5. Architecture of the email service

The sending side pushes mail to its mail server using the SMTP protocol (push mode); the receiving side pulls mail from its mail server using POP3, IMAP, or HTTP (pull mode).

6. What is the difference between the SMTP mail protocol and the HTTP protocol?
To some extent, SMTP, like HTTP, is a file transfer protocol. But they differ in a few ways:
(1) HTTP mainly transfers files in pull mode, whereas SMTP sends mail in push mode.
(2) SMTP requires every message to be in 7-bit ASCII; binary data in a message must be re-encoded. HTTP has no such limitation.
(3) If a document contains both text and images, HTTP treats them as separate objects and transmits them separately, while SMTP transmits them in the same message.


7. Several functions of DNS

  • hostname-to-IP-address translation
  • host aliasing
  • load balancing
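A toy resolver can make the last two functions concrete (all names and addresses below are hypothetical; real DNS adds caching, TTLs, and a server hierarchy):

```python
import itertools

# Toy record table: a CNAME alias plus several A records for load balancing
RECORDS = {
    ("www.example.com", "CNAME"): ["srv1.example.com"],
    ("srv1.example.com", "A"): ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}
_rotations = {}

def resolve(name):
    """Follow CNAME aliases to the canonical name, then hand out its A
    records in rotation (a crude form of DNS load balancing)."""
    while (name, "CNAME") in RECORDS:
        name = RECORDS[(name, "CNAME")][0]
    addrs = RECORDS[(name, "A")]
    rotation = _rotations.setdefault(name, itertools.cycle(addrs))
    return next(rotation)
```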


8. DNS record format
A DNS resource record is a four-tuple: (Name, Value, Type, TTL). Common types: A (hostname → IP address), NS (domain → authoritative name server hostname), CNAME (alias → canonical hostname), MX (mail domain → mail server hostname).

9. From a macro point of view, DNS is a distributed database whose data is spread across many servers around the world. It is designed to be distributed rather than centralized because a centralized database would have the following problems:
(1) a single point of failure
(2) performance problems, including request load pressure and the sheer volume of DNS data
(3) distance (a distant centralized database): hosts at different distances from a single database would see very different latencies, losing the short-distance advantage of many distributed servers


3. Transport layer

1. Services the transport layer can provide to the application layer

  • reliable transmission
  • Throughput guarantees (i.e., guaranteeing a certain throughput)
  • Timing guarantees (i.e., bounding delays)
  • Security


2. Services provided by TCP

  • connection-oriented
  • Reliable data transmission (for the host)
  • Congestion control (for the entire network)


3. Services provided by UDP

  • Best-effort delivery (i.e., data may be lost or arrive out of order)


4. How reliable is the reliable data transmission provided by TCP?

    The reliability here mainly refers to whether the data received by the receiver is identical to the data the sender sent. In other words: can transmitted data be corrupted (some bit or combination of bits flipped) by hardware or software errors and still slip past the layer-by-layer checking mechanisms (the Ethernet CRC, the IP header checksum, and the TCP checksum)?

The answer is yes. That is, although TCP is a reliable data transmission mechanism, it is not completely reliable: it is possible (though very unlikely) that the data the receiver gets differs from the data the sender sent.

Generally speaking, for strict network applications, it is necessary to add its own set of detection mechanisms in the application layer, such as md5, checksum, etc.

Of course, as I understand it, even with an application-layer check the probability of an undetected error can be greatly reduced, but whether it can be reduced to zero is still an open question to me. I did not find a definitive answer; if anyone knows, please advise.

References [2]-[6] below give relevant background.
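As a sketch of the application-layer check mentioned above (MD5 here only detects accidental corruption; it is not a defense against a deliberate attacker):

```python
import hashlib

def with_digest(payload: bytes) -> bytes:
    """Prepend an MD5 digest so the receiver can verify end-to-end integrity."""
    return hashlib.md5(payload).digest() + payload

def verify(message: bytes):
    """Return the payload if the digest matches, else None (caller re-requests)."""
    digest, payload = message[:16], message[16:]
    return payload if hashlib.md5(payload).digest() == digest else None
```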


5. The original TCP does not provide secure data transmission services (such as encryption), but SSL/TLS layered on top of TCP (some treat SSL as an application-layer function) can provide secure data transmission. If an application wants to use SSL, it generally needs to include the corresponding library in its application-layer code.


6. On today's Internet, the transport layer provides no throughput or timing guarantees, so applications with low-latency requirements must take this into account at the application layer.


7. Some questions about port reuse
  For a typical network program, a socket determines one endpoint of a connection: the IP identifies the host, and the port identifies the network process on that host. However, most current operating systems provide port reuse, i.e., multiple processes (more precisely, multiple sockets) can bind the same ip+port, which can improve the performance of network programs. In that case, how does the operating system know which process to forward a request to? As I understand it, the kernel keeps a record of the sockets bound to the same port and, when a request arrives, distributes it among those sockets as evenly as possible (round-robin?), achieving load balancing from the kernel's perspective.
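A toy model of that kernel-side record keeping (real kernels, e.g. Linux with SO_REUSEPORT, typically pick a socket by hashing the connection four-tuple rather than strict round-robin; round-robin is used here just to show the idea):

```python
from itertools import cycle

class PortDemux:
    """Distribute incoming requests among the sockets sharing one ip+port."""
    def __init__(self):
        self.bound = {}   # (ip, port) -> round-robin iterator over socket ids

    def bind(self, ip, port, socket_ids):
        self.bound[(ip, port)] = cycle(socket_ids)

    def dispatch(self, ip, port):
        """Pick the next socket in turn for a request arriving at (ip, port)."""
        return next(self.bound[(ip, port)])
```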


While reading the InfluxDB source code recently, I found that its internal TCP service also does application-level port multiplexing: a general listener accepts on the port and reads data, then inspects the first byte and dispatches the request to the corresponding sub-listener (the multiplexing happens only in the connection-accept phase).

As I now understand it, this kind of reuse (especially at the software level) is essentially a switch...case logic: everything comes in through one common entry, and some identifier, carried in the protocol message (such as the port number), kept in the kernel, or read at the application level (such as the first byte of received data here), selects the final handler to forward to.
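A minimal sketch of that switch...case dispatch on the first byte (the opcodes and handlers are hypothetical, not InfluxDB's actual ones):

```python
def dispatch(data: bytes, handlers: dict, default=None):
    """Route a request to a sub-handler keyed by its first byte."""
    if not data:
        return default
    handler = handlers.get(data[0])
    return handler(data[1:]) if handler else default

# Hypothetical sub-protocols, each claiming one leading byte
HANDLERS = {
    0x01: lambda body: ("raft", body),
    0x02: lambda body: ("meta", body),
}
```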


8. For a listening socket, does it go directly from the LISTEN state to CLOSED when the program exits?

    Note that, generally speaking, in a server program the socket in the listening state (call it listenFd) and the sockets later returned by accept (connFd) are not the same socket: the former accepts connections from clients, while the latter carries out the subsequent communication. For the former, as I understand it, closing listenFd does not enter TIME_WAIT; it is closed directly. For a connFd, actively closing it does put it into TIME_WAIT.
At the same time, note that during the three-way handshake, listenFd always stays in the LISTEN state; it does not enter SYN_RCVD just because a client sends a SYN. What actually enters SYN_RCVD is the half-open connection's socket (the one later returned by accept). In other words, as I understand it, listenFd is generally only ever in the LISTEN and CLOSED states.

The above thoughts are all personal opinions, if there is any problem, please correct me (* ^ ▽ ^ *).


9. Push versus pull is a philosophy. For abstract data transfer:

  • Pushing data is convenient for the pushing end, but the peer needs a mechanism to "wait" and be ready to receive; this mode gives good real-time behavior.
  • Pulling data is convenient for the pulling end, but the peer needs a mechanism to "wait" and be ready to transmit; data moves on the puller's demand, so real-time behavior is relatively worse.


10. Generally speaking, the services the transport layer provides are built on the services of the layers below it. If the network layer does not provide a certain service, say throughput or latency guarantees, then neither will the transport layer. But there are exceptions: the network layer provides no reliability, yet the transport layer offers reliable transmission to the upper layer through mechanisms of its own.

11. The network layer provides host-to-host delivery, while the most basic service the transport layer must provide is process-to-process delivery. The transport layer may also provide other services, such as error detection (UDP and TCP) and reliable transmission (TCP).

12. UDP is connectionless: a two-tuple (dst IP, dst port) is enough to determine a socket. In other words, a UDP server generally has just one socket, and all datagrams whose destination IP and port match are delivered to it, regardless of their source IP and port. TCP, by contrast, is connection-oriented: the server creates a new socket for each accepted connection, so (dst IP, dst port) alone cannot pick out the socket; the source must be added, giving the four-tuple (src IP, src port, dst IP, dst port).
From a certain point of view, TCP can be seen as adding an extra layer on top of UDP.
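The difference in demultiplexing keys can be written down directly (segments here are plain dicts standing in for parsed headers):

```python
def udp_demux_key(seg):
    """UDP: only the destination two-tuple selects the receiving socket."""
    return (seg["dst_ip"], seg["dst_port"])

def tcp_demux_key(seg):
    """TCP: the full four-tuple selects the per-connection socket."""
    return (seg["src_ip"], seg["src_port"], seg["dst_ip"], seg["dst_port"])
```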


13. Note that reliable data transmission could in theory be implemented at the data link layer, the network layer, the transport layer, or even the application layer; but it is relatively best placed at the transport layer.

14. Advantages of udp service

  • The application can hand data to the network faster (with TCP, data first goes to the send buffer, and when it actually leaves depends on network conditions)
  • No need to establish a connection, if you want to send data, send it directly
  • No connection state, that is, there is no need to save connection state information in the host
  • less header overhead

15. Compared with IP, UDP provides the most basic process-to-process service, plus a small amount of error handling: the optional checksum. The reason UDP provides its own error check is that some of the protocols underneath UDP may not provide one.


16. If you want reliable transmission over UDP, the application layer must take on retransmission and acknowledgment mechanisms similar to TCP's. (Tencent reportedly likes to ask in interviews: "How would you implement TCP on top of UDP?")
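A stop-and-wait style sketch of what the application layer would have to take on (the lossy channel is simulated with a seeded RNG; real code would use real timers and sockets):

```python
import random

def stop_and_wait(messages, loss_rate=0.3, seed=7):
    """Resend each message (on a simulated timeout) until its ACK arrives."""
    rng = random.Random(seed)
    delivered, expected = [], 0

    def channel_send(seq, msg):
        nonlocal expected
        if rng.random() < loss_rate:   # packet lost in transit: no ACK comes back
            return None
        if seq == expected:            # new in-order data: deliver exactly once
            delivered.append(msg)
            expected ^= 1
        return seq                     # ACK (duplicates get re-ACKed too)

    seq = 0
    for msg in messages:
        while channel_send(seq, msg) != seq:
            pass                       # "timeout": retransmit the same packet
        seq ^= 1                       # alternate the 1-bit sequence number
    return delivered
```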

17. Generally speaking, a UDP socket has no send buffer (not sure?), only a receive buffer. So UDP sends the data handed down by the upper layer right away, regardless of whether the peer can keep up. On the receiving end, if an arriving datagram exceeds the free space in the receive buffer, the whole datagram is discarded; there is no partial delivery.

18. Generally speaking, UDP receives data message by message: each read returns exactly one datagram, and successive datagrams are never merged.
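This is easy to see with two loopback sockets: each recvfrom call returns exactly one datagram (loopback delivery is dependable enough in practice for this demonstration):

```python
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"first", addr)
tx.sendto(b"second", addr)

msg1, _ = rx.recvfrom(4096)        # one call, one whole datagram
msg2, _ = rx.recvfrom(4096)        # the two sends are never merged
rx.close()
tx.close()
```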

19. Several models of reliable transmission

  • Stop-and-wait ARQ
  • GBN (Go-Back-N): a sliding-window protocol
  • SR (Selective Repeat): a selective retransmission mechanism

20. Transmission model: for a channel with only bit errors and no packet loss, how do we ensure reliable transmission?
  Use an ACK/NAK acknowledgment mechanism (checksums detect corruption, a NAK triggers retransmission, and sequence numbers let the receiver discard duplicates)

21. Transmission model: for a channel with both bit errors and packet loss, how do we ensure reliable transmission?
  Add a timer-driven retransmission mechanism and sequence numbers

22. What are the mechanisms involved in reliable transmission?

  • Retransmission, sequence numbers, ACK acknowledgment
  • Timers (needed to trigger retransmission) and checksums (needed to detect corruption)

23. What is the biggest difference between the GBN and SR protocols?

For GBN, the receiver's window size is 1: it only ACKs the smallest-numbered expected packet, even if packets with larger sequence numbers arrive. For SR, the receiver's window size is N: it can individually ACK any of the N packets inside the window. But the SR sender needs a timer for each packet in its send window, and whichever timer expires, that packet alone is retransmitted.

24. For the sender, the timeout-retransmission interval is generally quite long, so to react faster to a sent packet that may have been lost, the sender generally also uses a fast-retransmit mechanism: after receiving three duplicate ACKs, it retransmits the corresponding segment immediately.
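The trigger condition can be sketched as a small scan over the incoming ACK stream (three duplicates means the same ACK number has now been seen four times in total):

```python
def fast_retransmit_target(acks):
    """Return the ACK number whose segment should be retransmitted as soon
    as three duplicate ACKs have arrived, or None if that never happens."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups == 3:          # the original ACK plus 3 duplicates
                return ack
        else:
            last, dups = ack, 0
    return None
```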
  
25. Is real-world TCP a GBN (Go-Back-N) or an SR (Selective Repeat) protocol?

TCP uses cumulative acknowledgment and does not individually acknowledge out-of-order segments, which makes it look like GBN; but unlike GBN, TCP buffers out-of-order segments rather than discarding them. According to the book, a proposed modification called selective acknowledgment lets the TCP receiver acknowledge out-of-order segments selectively, instead of only cumulatively acknowledging the last correctly received in-order segment. When this is combined with selective retransmission (skipping retransmission of segments the receiver has already selectively acknowledged), TCP looks a lot like the usual SR protocol. So to some extent, TCP is best seen as a hybrid of GBN and SR.
  
26. Why doesn't TCP use a NACK mechanism, i.e., a "negative acknowledgment that a certain packet was not received"?
To some extent, an ACK confirms that a segment has been received (and supports cumulative acknowledgment); it is a positive reply. A NACK says that a certain segment has not been received; it is a negative reply. Both are passive feedback the receiver gives based on the segments it has received.

Consider a scenario where a whole run of in-order segments is lost (say the sender sent 1-10 but the receiver only got 1-5). With ACKs, the sender can retransmit based on a timeout (no ACK arriving for a long time). With NACKs things are awkward: the receiver cannot tell whether the sender never sent 6-10 at all or they were lost on the way. [8]

Of course, some other protocols do use NACKs [9]; they presumably take other measures to avoid this problem, which I won't go into here.

27. To prevent heavy packet loss caused by a receiver that cannot keep up, TCP uses flow control: the sender ensures that its outstanding (sent but unacknowledged) data never exceeds the receiver's current receive window (the receiver keeps advertising its free buffer space, the receive window, back to the sender).
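The sender-side constraint can be written in one line: data in flight must stay within the advertised window (variable names follow the book's LastByteSent/LastByteAcked description):

```python
def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How much more the sender may send without overflowing the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet ACKed
    return max(0, rwnd - in_flight)
```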

28. What if both the client and server open a connection at the same time? What about closing at the same time?
The classic form of the question: if ends A and B open or close the connection simultaneously, what happens to the TCP states? The answer:

Simultaneous open:
Both A and B pass through both the SYN_SENT and SYN_RCVD states.

Simultaneous close:
The states of both A and B are similar to a normal active close.


29. For transport-layer congestion control, how does the sender learn that the network is congested?
There are generally two approaches. One works without assistance from the underlying network, inferring congestion mainly from packet loss and delay (TCP does this). The other relies on network assistance: the network can actively report congestion and can even provide information such as the rate at which intermediate routers can forward.


30. TCP treats normal ACKs returned by the receiver as a signal that the network is fine, and treats loss events (three duplicate ACKs, or a timeout) as a signal of network congestion.

31. TCP congestion control transitions among three states, slow start, congestion avoidance, and fast recovery, and cwnd changes accordingly (see the book's state-transition and cwnd-over-time figures).

Note that packet-loss events are not confined to the congestion-avoidance state (followed by entering fast recovery or slow start); they can also occur in the other two states. For example, in the slow-start state, if a timeout-type loss event occurs, ssthresh is set to cwnd/2, the congestion window is cut back to 1 MSS, and slow start begins again.
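A toy trace of those rules (Reno-style, cwnd counted in MSS units; one "ack" event here stands for a full RTT of successful ACKs, which is a simplification):

```python
def cwnd_trace(events, mss=1):
    """Evolve cwnd through slow start / congestion avoidance / loss events."""
    cwnd, ssthresh = 1 * mss, 8 * mss
    trace = [cwnd]
    for ev in events:
        if ev == "ack":
            if cwnd < ssthresh:
                cwnd *= 2                  # slow start: doubling per RTT
            else:
                cwnd += mss                # congestion avoidance: +1 MSS per RTT
        elif ev == "timeout":              # can happen in ANY state
            ssthresh = cwnd // 2
            cwnd = 1 * mss                 # back to slow start
        elif ev == "3dupacks":
            ssthresh = cwnd // 2
            cwnd = ssthresh + 3 * mss      # enter fast recovery
        trace.append(cwnd)
    return trace
```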

32. For the network as a whole, TCP relatively takes fairness into account: it limits how much congestion each connection contributes. UDP has no congestion-control mechanism, so when UDP and TCP coexist in a network, UDP applications may, to some extent, congest the whole network at TCP's expense, which is unfair. For this reason, some routers limit or block UDP traffic.

33. From a god's-eye view, if one application in a network opens more TCP connections than other applications, then to some extent it takes more of the network's bandwidth. This is a bit like multithreading on Linux: if you spawn more threads (a Linux thread is essentially a lightweight process, scheduled together with ordinary processes), you relatively occupy more CPU resources (and of course the Linux scheduler places certain limits on this kind of "encroachment").

34. For TCP congestion control, if the connections have similar sending-rate dynamics, the congestion-control algorithm can be considered roughly fair to each connection, no matter when those connections started sending data.



reference

[1] "TCP novice misunderstandings: the meaning of data verification"
[2] "Linux Multithreaded Server Programming"
[3] "The Limitations of the Ethernet CRC and TCP/IP checksums for error detection"
[4] "Is the TCP protocol 100% reliable?"
[5] "Why TCP/IP is an uncertain network"
[6] "TCP novice misunderstandings: the meaning of data verification"
[7] "Telling you the unknown UDP: stubborn problems and how to use it"
[8] "NACK vs. ACK? When to use one over the other one?"
[9] "On ACK, NACK and REX in network communication"


Origin blog.csdn.net/plm199513100/article/details/120489698