What happens when you enter a URL in the browser and press Enter - a summary of "How the Internet Is Connected"

        I recently read the book "How the Internet Is Connected" and feel I learned a lot from it. But what I don't put into words doesn't sink in deeply, so I wrote this article to consolidate what I learned and to practice my writing at the same time. The book is by a Japanese author and aims to show the whole picture of the Internet. It is really well written: it covers both software and hardware, and it explains not only how things work but also why. It is easy to see that the author is a full-stack engineer. Using "entering a URL in the browser until the content is displayed on the screen" as the thread, the author goes through the network layer by layer and thoroughly explains the role and function of each layer. The following is my reading summary. If anything is wrong, please correct me.

1. Entering the URL and pressing Enter triggers a keyboard interrupt. Here I quote another post (http://blog.csdn.net/xumingjie1658/article/details/6965176):

        When the user presses a key, the keyboard interface obtains a scan code representing that key and raises an interrupt request. The interrupt is dispatched to the keyboard interrupt service routine. The routine first reads the scan code from the keyboard interface, then determines which key was pressed from the scan code and handles it accordingly, and finally notifies the interrupt controller that the interrupt has been handled and returns from the interrupt.

        That is the general process of a keyboard interrupt. Once it completes, that is, after the user presses Enter, the browser's work is triggered, which is what the book "How the Internet Is Connected" describes.

2. The browser parses the URL and generates the corresponding request - the application layer. For example, for an HTTP request to www.baidu.com, the browser first has to resolve the domain name www.baidu.com. If the local DNS cache already holds an entry for this domain name, the HTTP request is handed directly to the TCP transport layer for transmission. If not, a DNS query must be sent to a DNS server to obtain the IP address for the domain name. There are two types of DNS query: recursive and iterative (also called loop queries). In a recursive query, A asks B, B asks C, and C asks D; D finds the result and returns it to C, C returns it to B, and B returns it to A. If the root name servers had to answer this way, the load on them would be too high. In an iterative query, A asks B, and B tells A to ask C; A asks C, and C tells A to ask D; A asks D, and D returns the result to A. DNS query packets are normally sent over UDP. Once DNS has returned the IP address for the domain name, the HTTP request is handed to the TCP/IP protocol stack for transmission.
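
To make the difference between the two query styles concrete, here is a minimal Python sketch. It is only a toy model: the server names, the referral chain, and the address 192.0.2.10 are all invented for illustration, and no real DNS traffic is sent. The log records who asks whom.

```python
# Toy data: server names, referrals and the address are invented for illustration.
SERVERS = {
    "root":         {"www.example.com": ("referral", "com-tld")},
    "com-tld":      {"www.example.com": ("referral", "example-auth")},
    "example-auth": {"www.example.com": ("answer", "192.0.2.10")},
}


def recursive_query(asker, server, name, log):
    """The queried server follows the referral itself and returns the final answer."""
    log.append((asker, server))
    kind, value = SERVERS[server][name]
    if kind == "answer":
        return value
    # This server now becomes the asker on behalf of whoever asked it.
    return recursive_query(server, value, name, log)


def iterative_query(asker, server, name, log):
    """The original asker follows every referral itself, one server after another."""
    while True:
        log.append((asker, server))
        kind, value = SERVERS[server][name]
        if kind == "answer":
            return value
        server = value  # referral: ask the next server directly
```

In the recursive log the asker changes at every hop, so the upstream servers carry the work; in the iterative log the asker is always the client.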

3. The application layer entrusts the protocol stack with transmission, and the TCP module of the protocol stack transmits through a socket - the transport layer. The browser delegates to the protocol stack through the socket library, using calls such as socket(), connect(), read(), write(), and close(). First it calls socket() to obtain a file descriptor, then connect() to perform the TCP three-way handshake. Once that succeeds, data can be sent and received. We send the HTTP request by calling write(), but this actually only copies the data into the send buffer; TCP itself decides when to transmit it, or sends it immediately if the TCP_NODELAY option is set. All of these operations are system calls, which switch between user mode and kernel mode and are therefore relatively expensive. If the data is too large for one packet, the protocol stack automatically splits it according to the MTU (strictly speaking, according to the MSS: MTU - 40 = 1500 - 40 = 1460 bytes, where the 40 bytes are the IP and TCP headers). TCP guarantees reliability through timeout retransmission, acknowledgments, the sliding window, and so on. When the server receives the HTTP request, it returns the data to the client where the browser runs. The client calls read() to receive the data and then calls close() to disconnect, which takes four exchanges (the four-way handshake). connect(), write(), read(), and close() all entrust the IP module with encapsulating and transmitting the data. A socket is identified by four fields: src_ip, src_port, dst_ip, and dst_port. A socket corresponds to the communication link between the application at the source and the application at the destination.
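
The MSS arithmetic and the splitting step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a 1500-byte Ethernet MTU and IP and TCP headers without options (20 bytes each). The function name split_into_segments is my own.

```python
MTU = 1500        # typical Ethernet MTU in bytes
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER  # payload bytes that fit in one segment


def split_into_segments(data: bytes, mss: int = MSS) -> list:
    """Cut a byte stream into (offset, payload) pieces of at most `mss` bytes."""
    return [(offset, data[offset:offset + mss])
            for offset in range(0, len(data), mss)]
```

For example, 4000 bytes of application data would leave in three segments: two full ones and a shorter last one.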

4. The TCP module entrusts the IP module with transmission - the network layer and the data link layer. The TCP module generates the TCP header and the data part and passes them to the IP module. The IP module adds an IP header and a MAC header in front of the TCP header. The IP header is used for routing between networks, that is, for addressing on the Internet. If the machine has several network cards, a suitable one is chosen according to the destination IP address and the local routing table. The MAC header is used for addressing within the local area network. The MAC address is looked up in the ARP cache; if it is not there, an ARP request is broadcast to obtain it (for a destination outside the local network, this is usually the MAC address of the gateway). After the IP module has added these two headers, the packet is handed to the network card.
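
Here is a rough Python sketch of how the network card and the next hop might be chosen. The routing table, the interface names, and the ARP cache entries are all hypothetical, and the longest-prefix-match rule is my assumption about how the table is searched.

```python
import ipaddress

# Hypothetical routing table: (destination network, gateway or None, interface).
# A gateway of None means the network is directly connected.
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), None, "eth0"),
    (ipaddress.ip_network("10.0.0.0/8"), None, "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.1.1"), "eth0"),
]

# Hypothetical ARP cache: IP address -> MAC address.
ARP_CACHE = {
    ipaddress.ip_address("192.168.1.1"): "aa:bb:cc:00:00:01",
    ipaddress.ip_address("192.168.1.20"): "aa:bb:cc:00:00:14",
}


def select_route(dst: str):
    """Return (interface, next-hop IP) using the most specific matching route."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if dst_ip in route[0]]
    _network, gateway, interface = max(matches, key=lambda route: route[0].prefixlen)
    # On a directly connected network the next hop is the destination itself.
    next_hop = gateway if gateway is not None else dst_ip
    return interface, next_hop


def resolve_mac(next_hop):
    """Look up the ARP cache; None means an ARP request would have to be broadcast."""
    return ARP_CACHE.get(next_hop)
```

Note that the MAC header carries the MAC address of the next hop, while the IP header keeps the final destination address.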

5. The IP module entrusts the network card with sending and receiving - the physical layer. After the network card receives the data from the IP module, it adds the preamble and the start frame delimiter (SFD) to the front and the frame check sequence (FCS) to the end. The preamble and the start frame delimiter are used to determine the timing for reading the signal. Imagine a run of electrical signals such as 000011111 on the line. With a continuous high or low level there is no change in current or voltage, so we cannot synchronize the clock and cannot tell where one bit ends and the next begins. The preamble therefore supplies 56 bits of alternating 1s and 0s (101010...), which gives the receiver enough time to lock onto the clock between bits. The FCS is a CRC checksum used to verify the integrity of the data, because noise during transmission causes bit errors. After the preamble, the start frame delimiter, and the FCS have been added, the MAC module of the network card produces a generic signal, which the PHY (MAU) module converts into a format that can travel over the network cable (for example, a signal that superimposes the data signal and the clock signal) and sends out over the cable. Transmission over a network cable is either full-duplex (separate send and receive lines) or half-duplex (send and receive share a line, so collision detection is required). Half-duplex devices such as hubs should be rare nowadays.
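
As a small illustration of the framing, the following Python sketch adds the preamble, the SFD, and the FCS in software. Real network cards do this in hardware; the helper names are mine, and I am assuming the usual software representation in which the CRC-32 is stored in little-endian byte order.

```python
import struct
import zlib

# Byte values as usually written in software. Ethernet sends the least
# significant bit of each byte first, so on the wire these read 101010...
PREAMBLE = bytes([0x55] * 7)  # 56 bits of alternating 1s and 0s
SFD = bytes([0xD5])           # same pattern, but ending in two 1 bits


def add_fcs(frame: bytes) -> bytes:
    """Append the CRC-32 of the MAC header and payload as a 4-byte FCS."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over everything but the last 4 bytes and compare."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


def on_the_wire(frame: bytes) -> bytes:
    """Preamble + SFD in front, FCS at the end."""
    return PREAMBLE + SFD + add_fcs(frame)
```

The receiver strips the first 8 bytes and runs the same CRC check; a single flipped bit is enough to make it fail.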

6. The data is sent to the gateway router on the LAN. The data leaves the network card of the client where the browser runs and arrives at a switch. The switch looks up the destination MAC address in its MAC address table to find the forwarding port, and if there is no entry it sends the frame out of all ports except the one it arrived on. The switching circuit inside the switch is laid out like a grid, and a frame can be forwarded as soon as the switching element at one intersection is closed. So as long as the paths between different lines do not cross, several frames can be forwarded in parallel. The switch forwards the data to the gateway router.
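
The table lookup can be sketched as a tiny Python class. The class and its methods are hypothetical, and I am additionally assuming the usual behavior that the switch fills its table by remembering the source MAC address of every frame it sees.

```python
class Switch:
    """A toy switch: learns source addresses, forwards or floods by destination."""

    def __init__(self, port_count: int):
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: send to every port except the incoming one.
            return [port for port in self.ports if port != in_port]
        if out_port == in_port:
            return []  # destination is on the incoming port: nothing to forward
        return [out_port]
```

The first frame to an unknown address is flooded; once the other side has replied, both addresses are in the table and later frames go to a single port.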

7. The gateway router is connected to the Internet through ADSL or a similar access line, and sends the data to the Internet service provider (ISP).

    When the gateway router forwards a packet to the external network, it performs NAT translation to hide the internal network addresses. If the gateway router's outgoing port is also connected to a local area network, the router picks the adjacent router from the IP header of the packet, removes the MAC header, replaces it with a new MAC header, and sends the packet to that adjacent router. That router then forwards the packet to its next hop in the same way, until the packet finally reaches the network where the server is located.
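
A minimal Python sketch of the NAT step follows. I am assuming the common port-translating variant (NAPT), in which the router rewrites both the source address and the source port; the class name, the starting port 40000, and the addresses are invented for illustration.

```python
class Napt:
    """Toy address and port translation table for one public IP address."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (private_ip, private_port) -> public_port
        self.in_map = {}   # public_port -> (private_ip, private_port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet; reuse an existing mapping."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out_map[key]

    def inbound(self, dst_port: int):
        """Find the private endpoint for a reply, or None if nothing matches."""
        return self.in_map.get(dst_port)
```

A reply addressed to a port with no mapping has nowhere to go, which is why hosts behind NAT are hidden from unsolicited traffic.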

    Generally speaking, if the server is placed inside the company, packets are routed to it in the way described above. When we surf the Internet at home, the gateway router connects to an access network that leads to the Internet. Common household access methods include ADSL, FTTH, CATV, telephone lines, and ISDN, and companies may also use dedicated lines. Let's take ADSL as an example. The gateway router (here it can be called the Internet access router) adds three headers in front of the network packet, a MAC header, a PPPoE header, and a PPP header, and then sends it to the ADSL modem (this is the PPPoE method). The ADSL modem splits the packet into cells, converts them into electrical signals, and sends them to the splitter. Beyond the splitter is the socket where the telephone line is plugged in. From there the signal travels over the indoor telephone line to the outdoor telephone line, along the cable strung on the telephone poles, and into the telephone office, where the data is handed to the ADSL access provider. At the telephone office, the DSLAM converts the signal back into ATM cells, which then reach the BAS (Broadband Access Server). The BAS restores the ATM cells into network packets and forwards them toward the interior of the Internet. A dedicated tunnel (for example, using the L2TP protocol) is established between the BAS and the network operator to carry the packets into the Internet. At this point, the data we sent has truly entered the Internet.
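
The three headers can be illustrated with a short Python sketch. The field values follow my understanding of the PPPoE session format (EtherType 0x8864, version and type both 1, code 0x00, PPP protocol 0x0021 for IPv4); the function name, the MAC addresses, and the session ID used in any call are placeholders, since a real session ID is negotiated during PPPoE discovery.

```python
import struct

ETHERTYPE_PPPOE_SESSION = 0x8864  # EtherType for PPPoE session frames
PPP_PROTOCOL_IPV4 = 0x0021        # PPP protocol number for IPv4


def pppoe_encapsulate(ip_packet: bytes, dst_mac: bytes, src_mac: bytes,
                      session_id: int) -> bytes:
    """Wrap an IP packet as: MAC header | PPPoE header | PPP header | IP packet."""
    ppp = struct.pack("!H", PPP_PROTOCOL_IPV4) + ip_packet
    # version/type = 0x11, code = 0x00 (session data), session ID, payload length
    pppoe = struct.pack("!BBHH", 0x11, 0x00, session_id, len(ppp))
    mac = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_PPPOE_SESSION)
    return mac + pppoe + ppp
```

The PPPoE and PPP headers together take 8 bytes, which is why the usable MTU on such a line is smaller than on plain Ethernet.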

    In the description above, the line between the Internet access router at the user end and the operator's BAS is ADSL access if ADSL is used; if optical fiber is used, it is fiber access, that is, FTTH. Note that the gateway router, that is, the Internet access router, already has a public IP address at this point; it is assigned once the user has configured and established the dial-up connection.

8. The network operators forward the data across the Internet until it finally reaches the operator that serves the server. From there it reaches the BAS on the server side, and then the access router of the network where the server is located. The server could be exposed directly to the public network, but that is unsafe and there are not enough IP addresses, so it is rarely done now. The more common practice is to deploy a firewall behind the access router to protect the local area network. NAT translation may also take place here.

If the volume of client requests is too large for one server to handle, the load has to be shared. The options are: A, load balancing in DNS; B, load balancing with an nginx reverse proxy; C, caching on the client; D, using cache servers to share the load (when the content of the real server changes, the cache server is updated). A cache server can be placed on the network where the client is located (disadvantage: the operator of the real server cannot manage or control it), on the network where the real server is located (disadvantage: it does not reduce traffic on the Internet), or inside the major operators' networks (this is a CDN, Content Delivery Network).
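
Option A, load balancing in DNS, can be pictured with a toy round-robin resolver in Python. This is a simplification: the class is hypothetical, the addresses are documentation examples, and a real DNS server typically returns the whole address list in rotated order rather than a single address.

```python
import itertools


class RoundRobinDns:
    """Hand out the addresses registered for a name in rotating order."""

    def __init__(self, records: dict):
        self._cycles = {name: itertools.cycle(addresses)
                        for name, addresses in records.items()}

    def resolve(self, name: str) -> str:
        return next(self._cycles[name])
```

Each new lookup lands on the next server, so requests are spread evenly, although this scheme knows nothing about whether a server is overloaded or down.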

9. After the data reaches the server, it likewise passes through the network card, the IP module, the TCP module, and the application layer, and the server responds. The server's processing differs slightly from the client's. The server calls listen() in advance to listen on the port, then calls accept() to accept a client connection, and uses I/O multiplexing (select, poll, epoll, and so on) to manage many client connections. Suppose the server calls read() to receive the HTTP request from the client and then parses the request message, for example mapping the path in the URL to the virtual path of the site named by the Host header, performs the relevant processing, and then calls write() to reply. The reply passes through the TCP/IP protocol stack, reaches the network card, and is sent out.
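
The server-side call sequence can be tried out on the loopback interface with Python's socket and selectors modules. This is only a sketch, not a real web server: the "processing" is just upper-casing the request bytes, and the function names and the two example requests are my own.

```python
import selectors
import socket
import threading


def run_server(listener: socket.socket, expected_clients: int) -> None:
    """Multiplex the listening socket and all client sockets with one selector."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    closed = 0
    while closed < expected_clients:
        for key, _events in sel.select(timeout=5):
            sock = key.fileobj
            if sock is listener:
                conn, _addr = listener.accept()  # a handshake has completed
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)           # read() the request
                if data:
                    sock.sendall(data.upper())   # "process" it and write() a reply
                else:                            # client finished: close our side
                    sel.unregister(sock)
                    sock.close()
                    closed += 1
    sel.close()


def request(port: int, payload: bytes) -> bytes:
    """Client side: connect, send, then read until the server closes."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def demo() -> list:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_server, args=(listener, 2), daemon=True)
    server.start()
    replies = [request(port, b"get /a"), request(port, b"get /b")]
    server.join(timeout=5)
    listener.close()
    return replies
```

The selectors module picks epoll, kqueue, or select depending on the platform, so the same loop covers the multiplexing mechanisms mentioned above.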

10. The reply may go straight back to the operator that serves the client, or it may reach that operator by way of other operators' routers. The operator then delivers it through the dedicated tunnel to the BAS of the telephone office serving the client, and the BAS delivers it over the ADSL or FTTH line to the client's gateway router, that is, the Internet access router. After the router's NAT translation, it is sent to the client.

11. The client's network card receives the signal from the network cable, converts it into digital data, places it in a buffer, checks the FCS, and determines whether the frame is addressed to itself (whether the destination MAC address is its own MAC address). If not, the frame is discarded; otherwise it is kept in the network card's buffer and an interrupt is raised to tell the computer that data has arrived. The interrupt handler calls the network card driver, which fetches the data and hands it to the TCP/IP protocol stack: the IP module checks the IP header, accepts valid packets, and reassembles fragments if necessary; the TCP module then finds the corresponding socket from the IP addresses in the IP header and the port numbers in the TCP header, sends back an acknowledgment packet, places the data in the receive buffer, and waits for the application to read it.
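
The socket lookup at the end of this step can be sketched as a dictionary keyed by the four fields. The SocketTable class is hypothetical and leaves out everything else TCP does (sequence numbers, acknowledgments, listening sockets); it only shows how an incoming segment finds its receive buffer.

```python
class SocketTable:
    """Toy demultiplexer: one receive buffer per (src, sport, dst, dport) tuple."""

    def __init__(self):
        self.connections = {}  # (src_ip, src_port, dst_ip, dst_port) -> bytearray

    def open(self, local_ip, local_port, remote_ip, remote_port):
        # Incoming segments have the remote end as source and us as destination.
        self.connections[(remote_ip, remote_port, local_ip, local_port)] = bytearray()

    def deliver(self, src_ip, src_port, dst_ip, dst_port, payload: bytes) -> bool:
        """Append the payload to the matching buffer; False if no socket matches."""
        buffer = self.connections.get((src_ip, src_port, dst_ip, dst_port))
        if buffer is None:
            return False
        buffer.extend(payload)
        return True

    def read(self, local_ip, local_port, remote_ip, remote_port) -> bytes:
        """What the application's read() would get: drain the receive buffer."""
        buffer = self.connections[(remote_ip, remote_port, local_ip, local_port)]
        data = bytes(buffer)
        buffer.clear()
        return data
```

Because the key contains all four fields, two browser tabs talking to the same server port still get separate buffers: their local ports differ.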

12. The browser calls read(), which traps into the kernel, copies out the data returned by the server, and returns to user mode for further processing. After the page is rendered, its text, pictures, audio, and video are displayed.

 

To summarize, the brief process is:

1. Pressing Enter generates an interrupt. The keyboard interrupt service routine identifies the key, which triggers the browser's parsing of the URL (I don't know exactly how the browser captures the Enter event).

2. The browser parses the domain name in the URL and needs the IP address that corresponds to it. It first checks the local DNS cache; if there is no entry, it sends a DNS query packet.

3. The browser builds the HTTP request, calls connect() to perform the TCP three-way handshake with the server, and then calls write() to send the HTTP request. The request passes through the transport layer, the network layer, the data link layer, and the physical layer, is converted from a digital signal into an electrical signal, and is sent out over the network cable.

4. The data is forwarded by the switch to the gateway router. Assuming the gateway router uses an ADSL line and accesses the Internet with PPPoE, it strips the MAC header from the received data, adds the PPP and PPPoE headers and a new MAC header, and sends the result toward the BAS at the telephone office. From there it travels through the dedicated tunnel to the operator and into the Internet.

5. The data is routed and forwarded between operators on the Internet until it reaches the operator that serves the server, passes through the dedicated tunnel to the BAS, and then reaches the server-side gateway router.

6. The gateway router performs NAT translation, possibly followed by an nginx reverse proxy, and the data passes through the firewall to the server.

7. The server's network card receives the data and passes it up layer by layer to the IP layer, the transport layer, and the application layer.

8. The server's application layer processes the request and replies; the reply is handed down layer by layer through the transport layer, the IP layer, the data link layer, and the physical layer, and is sent to the gateway router.

9. The gateway router sends it to the BAS, which sends it through the tunnel to the operator and into the Internet.

10. The reply is routed and forwarded across the Internet to the operator that serves the client, passes through the tunnel to the BAS, and then reaches the client's gateway router.

11. The gateway router performs NAT translation and sends the data to the client.

12. The client's network card receives the data and passes it up layer by layer to the IP layer, the transport layer, and the application layer (that is, the browser). Once the browser has the data, it renders and displays the page. Afterwards, the TCP connection used by HTTP is closed.

At this point, the process ends.
