"How the Network is Connected" Chapter 1 Reading Notes (to be updated...)

Chapter One


1. Generate HTTP request message

1.1 Enter the URL

1.1.1 What is URL

On the WWW, every information resource has a single, unique address on the Internet. This address is called a URL (Uniform Resource Locator); in everyday terms, it is the network address of the resource.

1.1.2 URL format

The way a URL is written depends on the access target; a URL for a web server (http:) looks different from one for an FTP server (ftp:) or a mail address (mailto:).
(Figure: common URL formats)

1.2 Browser parsing URL

1.2.1 A complete network address

protocol://hostname[:port]/path/[;parameters][?query][#fragment]

For example: http://www.aspxfans.com:80/news/index.html?boardID=5&ID=24618&page=1#name

  1. Protocol part: The protocol part of this URL is "http:", which means the page is accessed using the HTTP protocol. Various protocols can be used on the Internet, such as HTTP and FTP. The "//" after "http:" is a delimiter.
  2. Domain name part: The domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of the domain name.
  3. Port part: The port follows the domain name, with ":" as the separator between them. The port is not a mandatory part of a URL. If it is omitted, the default port is used, which is 80 for HTTP.
  4. Virtual directory (path) part: From the first "/" after the domain name to the last "/" is the virtual directory part. It is also not a required part of a URL. The virtual directory in this example is "/news/".
  5. File name part: From the last "/" after the domain name to "?" is the file name part. If there is no "?", it runs from the last "/" to "#"; if there is neither "?" nor "#", it runs from the last "/" to the end of the URL. The file name in this example is "index.html". The file name is also not a mandatory part of a URL; if it is omitted, the default file name is used.
  6. Parameter part: The part from "?" to "#" is the parameter part, also known as the search part or query part. In this example it is "boardID=5&ID=24618&page=1". Multiple parameters are allowed, with "&" as the separator between them.
  7. Anchor part: From "#" to the end is the anchor part. The anchor in this example is "name". It is also not a required part of a URL.
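
To check the breakdown above, here is a minimal sketch using Python's standard urllib.parse module. The helper name split_url and the dictionary keys are my own labels for illustration, not something from the book; a real browser does much more than this.

```python
from urllib.parse import urlsplit, parse_qs

def split_url(url: str) -> dict:
    """Split a URL into the parts described in the notes (illustrative helper)."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the virtual directory, the rest is the file name
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,               # None when the URL omits the port
        "directory": directory + "/",
        "file": file_name,                # empty string when the file name is omitted
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }

result = split_url(
    "http://www.aspxfans.com:80/news/index.html?boardID=5&ID=24618&page=1#name"
)
for key, value in result.items():
    print(f"{key}: {value}")
```

Note that parse_qs returns each parameter value as a list, because the same parameter name may appear more than once in a query.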

1.2.2 Browser Parsing URL

The browser splits the URL into its parts to determine the domain name of the target server and the path of the file to access.

1.2.3 When the URL omits the file name

http://www.lab.glasscom.com/dir/

A URL ending with "/" means the file name is omitted. In this case the server returns a default file configured in advance, which on most servers is index.html or default.htm.

http://www.lab.glasscom.com/

"/" means the root directory, so this URL accesses the default file in the root directory.

http://www.lab.glasscom.com

Same as above: this URL also accesses the default file in the root directory.

http://www.lab.glasscom.com/whatisthis

If there is a file named whatisthis on the web server, whatisthis is treated as a file name;
if there is a directory named whatisthis on the web server, whatisthis is treated as a directory.
Note: A file and a directory with the same name cannot exist in the same place, so there can never be both a file and a directory named whatisthis.
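
The rules above can be sketched as a small lookup function. This is only my guess at how a server might do it: the name resolve_request_path, the DEFAULT_FILES tuple, and the document root are all assumptions, and a real server would also have to guard against things like ".." in the path.

```python
import os
import tempfile

# Assumed server configuration: default file names tried in this order
DEFAULT_FILES = ("index.html", "default.htm")

def resolve_request_path(doc_root: str, url_path: str):
    """Return the file a server might serve for url_path, or None if nothing matches."""
    # An empty path (http://host) is handled the same way as "/"
    target = os.path.join(doc_root, url_path.lstrip("/"))
    if os.path.isdir(target):
        # Directory: the file name was omitted, so fall back to a default file
        for name in DEFAULT_FILES:
            candidate = os.path.join(target, name)
            if os.path.isfile(candidate):
                return candidate
        return None
    if os.path.isfile(target):
        # Plain file: the last path segment is the file name
        return target
    return None

# Small demonstration with a throwaway document root
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "dir"))
for relative in ("index.html", os.path.join("dir", "default.htm"), "whatisthis"):
    open(os.path.join(root, relative), "w").close()

for path in ("", "/", "/dir/", "/whatisthis", "/missing"):
    print(repr(path), "->", resolve_request_path(root, path))
```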

1.3 HTTP protocol

After parsing the URL, we know where the target should be accessed. Next, the browser uses the HTTP protocol to access the web server.

1.3.1 What is the HTTP protocol

HTTP (Hypertext Transfer Protocol) is a set of rules for communication between a client and a server.

A rough request process
  1. First, the client sends a request message to the server. The request message says "what to operate on" and "how to operate on it". The "what" part is called the URI (Uniform Resource Identifier). Generally, the URI is the name of a file storing web page data or the name of a CGI program, and it can also be a full URL. In other words, all kinds of access targets can be written here, and they are collectively called URIs. The "how" part is called the method. The method tells the web server what kind of work to do; typical examples are reading the data identified by the URI and sending data entered on the client to the program identified by the URI.

URI: Uniform Resource Identifier. Every resource available on the Web, such as an HTML document, image, video clip, or program, is identified by a URI.

The main HTTP methods (figure omitted): the two typical ones are GET, which reads the data identified by the URI, and POST, which sends data entered on the client to the program identified by the URI.
  2. After receiving the request message, the web server parses its content, uses the URI and the method to determine "what" to operate on and "how", does the work accordingly, and stores the result in a response message.
  3. The response message is sent back to the client. After the client receives it, the browser reads the required data from the message and displays it on the screen. At this point, the whole HTTP exchange is complete.

1.4 Generate HTTP request message

After parsing the URL, the browser knows the web server and the file name. The next step is to generate an HTTP request message based on this information.

1.4.1 Request message

The format of the request message: request line, request headers, empty line, request body
Request line: request method, URI, protocol version
Request headers: additional information about the request, written as key-value pairs (to be added)
Empty line
Request body: present for POST requests, absent for GET requests
Request message example (figure omitted)
Wait, if only the path is written in the URI, how do we know which server the request is for?
Hmm, the domain name of the server is in the URL! (In HTTP/1.1 it is also carried in the Host request header.)
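
Since the example figure is missing, here is a minimal sketch that assembles a request message in the format above. build_request is a hypothetical helper of mine, and the Host and Content-Length headers are the only ones it adds on its own; a real browser sends many more.

```python
def build_request(method: str, uri: str, host: str,
                  headers: dict = None, body: bytes = b"") -> bytes:
    """Assemble request line + headers + empty line + body (illustrative only)."""
    lines = [f"{method} {uri} HTTP/1.1",   # request line: method, URI, protocol version
             f"Host: {host}"]              # the server's domain name taken from the URL
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")   # request headers are key-value pairs
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # The empty line (a bare CRLF) separates the headers from the body
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

# GET has no body; POST carries the form data in the body
print(build_request("GET", "/dir/index.html", "www.lab.glasscom.com").decode())
print(build_request("POST", "/cgi/sample.cgi", "www.lab.glasscom.com",
                    body=b"field1=abc").decode())
```

The path /cgi/sample.cgi and the form data are made up for the demonstration.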

Common request headers
. . . . . . . . . . . .

1.4.2 Response message

After we send the request message above, the web server returns a response message. Sending actually involves many steps, such as querying the DNS server for the IP address of the web server; these are covered later. Let's first look at the structure of the response message.

The format of the response message: status line, response headers, empty line, response body
Status line: protocol version, status code (tells the program the result of execution), status description string (tells people the result of execution)
Response headers: additional information about the response, written as key-value pairs (to be added)
Empty line
Response body: the data returned to the client, such as the content of the requested page

(Figures: status code summary; sample response message)
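
To make the response format concrete, here is a rough sketch that splits a response message into its parts. parse_response is my own helper name and the sample message is made up; it assumes the status line always has a description string and ignores details such as repeated headers.

```python
def parse_response(raw: bytes):
    """Split a response into (version, status code, description, headers, body)."""
    # The first empty line separates the status line and headers from the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, status description
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")
version, status, reason, headers, body = parse_response(sample)
print(version, status, reason)
print(headers)
print(body)
```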

  • After the response message is returned, the browser extracts the data and displays it on the screen, and we can see the web page. If the page contains only text, all processing is complete at this point; if it also includes resources such as pictures, there is more to do.
  • When a web page contains a picture, a tag representing the picture file is embedded at the corresponding position in the page. **The browser looks for such tags while displaying the text.** When it encounters a picture tag, it reserves space on the screen for the picture, then accesses the web server again, requests the picture using the file name specified in the tag, and displays it in the reserved space.
  • This step is the same as fetching the web page file: write the file name of the picture in the URI part, then generate and send a request message. Since only one URI can be written in each request message, only one file can be fetched per request. If multiple files are needed, a separate request must be sent for each one.
  • For example, if a web page contains 3 pictures, a total of 4 requests must be sent to the web server to get the page and its pictures. Working out which files are needed, fetching them, and displaying them on the screen is coordinated by the browser as one of its tasks; the web server knows nothing about it.
  • The web server does not care at all whether the files fetched by these 4 requests belong to one web page or to different pages. Its task is simply to return a response to each individual request.
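
The "1 page + 3 pictures = 4 requests" count can be reproduced with a small sketch based on Python's html.parser. ImageCollector and requests_needed are names I made up, and the sketch only looks at img tags; it ignores caching, style sheets, scripts, and pictures that appear twice.

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect the file named in every <img src=...> tag, as a browser would."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def requests_needed(html: str) -> int:
    """One request for the page itself plus one per picture."""
    collector = ImageCollector()
    collector.feed(html)
    return 1 + len(collector.sources)

page = """
<html><body>
<p>Some text</p>
<img src="a.jpg"> <img src="b.png"> <img src="c.gif">
</body></html>
"""
print(requests_needed(page))
```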

2. Query the DNS server for the IP address of the Web server

After generating the HTTP message, the next step is to ask the operating system to send the message to the web server.
When asking the operating system to send a message, what must be provided is not the domain name of the other party but its IP address. Therefore, DNS resolution has to be performed first; that is, the IP address corresponding to the server's domain name is obtained from the DNS server.

1.1 TCP/IP

1.1.1 TCP/IP network structure

A large TCP/IP network consists of small subnets connected to each other by routers. A subnet here can be understood as several computers connected by a hub; we treat this group as one unit called a subnet.
(Figure: TCP/IP network structure)

  • In a network, every device is assigned an address. This address is like a real-world street address of the form "No. ××, Room ××" on a certain road.
  • The part corresponding to "No." is assigned to the whole subnet, and the part corresponding to "Room" is assigned to each computer in the subnet. Together they form the address in the network.
  • The part corresponding to "No." is called the network number, the part corresponding to "Room" is called the host number, and the whole address is called the IP address.

Data is sent in packets

1.1.2 Actual IP address

An actual IP address (network number + host number) is a 32-bit number. It is divided into 4 groups of 8 bits (1 byte) each, and each group is written in decimal, with the groups separated by dots.

1 bit is a binary bit
8 bits == 1 byte

Classification of IP addresses
. . . .

The split between the network number and the host number is not fixed, and there are 5 types.
When building a network, users can decide how the bits are allocated, so extra information is needed to describe the internal structure of an IP address. This extra information is called the subnet mask.

The subnet mask is a 32-bit number with the same length as the IP address. Its left part is all 1s and its right part is all 0s: the bits where the mask is 1 belong to the network number, and the bits where the mask is 0 belong to the host number.
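
Here is a small sketch of how the mask separates the two parts, using bitwise AND on the 32-bit values. The function name split_address and the sample address 10.11.12.13 are my own choices for illustration.

```python
import ipaddress

def split_address(ip: str, mask: str):
    """Return (network number in dotted form, host number as an integer)."""
    ip_bits = int(ipaddress.IPv4Address(ip))      # the address as one 32-bit number
    mask_bits = int(ipaddress.IPv4Address(mask))
    network = ip_bits & mask_bits                 # keep the bits where the mask is 1
    host = ip_bits & ~mask_bits & 0xFFFFFFFF      # keep the bits where the mask is 0
    return str(ipaddress.IPv4Address(network)), host

print(f"{int(ipaddress.IPv4Address('10.11.12.13')):032b}")   # the 32 bits of the address
print(split_address("10.11.12.13", "255.255.255.0"))
print(split_address("10.11.12.13", "255.255.0.0"))
```

With the mask 255.255.255.0 the first 24 bits are the network number and the last 8 bits are the host number; with 255.255.0.0 the boundary moves to 16 bits, so the same address splits differently.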

How to represent the IP address
(Figure omitted. I do not fully understand this part yet.)

1.2.3 Why does the URL use a domain name instead of the IP address directly?

In fact, it will work fine if the IP address is used instead of the server name.
However, just like it is difficult for you to remember a phone number, it is also very difficult to remember a string of numbers for an IP address. Therefore, it is better to use the server name in the URL than the IP address.

1.2.4 Why must the IP address be used to determine the communication object?

An IP address is only 32 bits (4 bytes) long, whereas a domain name is a variable-length string that can range from a few dozen bytes up to 255 bytes. Handling domain names would increase the load on routers and make data transmission take longer.

1.2.3 and 1.2.4 together are the reason the DNS mechanism exists: domain names and IP addresses each make up for the weakness of the other.

1.2 DNS resolution

1.2.1 How to issue a query to the DNS server

  1. The browser (an application) calls the resolver.
  2. The resolver generates a query message to send to the DNS server.
  3. The resolver delegates sending the query message to the protocol stack inside the operating system.
  4. The protocol stack performs the send operation and sends the message to the DNS server through the network card.
  5. When the DNS server receives the query message, it looks up the content requested in the message. If the web server to be accessed is registered on the DNS server, the record is found, and its IP address is written into a response message and returned to the client's network card. Section 1.2.2 describes the work of the DNS server in detail.
  6. The response is passed up through the protocol stack to the resolver, which extracts the IP address and writes it to the memory address specified by the application.
  7. The application can now read the IP address from memory.

The resolver is a program contained in the Socket library of the operating system. The Socket library is a collection of program components that let applications call the network functions of the operating system, and the resolver is one of those components.

Protocol stack: The network control software inside the operating system, also called "protocol driver", "TCP/IP driver" and so on.

Network card: the hardware responsible for Ethernet or wireless network communication

By the way, when sending a message to the DNS server, of course we also need to know the IP address of the DNS server. It's just that this IP address is set in advance as a setting item of TCP/IP, so there is no need to check it again.
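
From the application's point of view, the whole sequence above is one library call. A minimal sketch in Python, where socket.gethostbyname plays the role of the resolver call; the wrapper name resolve is mine, and which DNS server gets asked depends on the TCP/IP settings of the machine.

```python
import socket

def resolve(hostname: str) -> str:
    """Ask the operating system's resolver for an IPv4 address of hostname."""
    # The call blocks until the answer comes back through the protocol stack,
    # then returns the address as a dotted-decimal string.
    # A string that is already an IPv4 address is returned unchanged.
    return socket.gethostbyname(hostname)
```

Calling resolve("www.lab.glasscom.com") needs network access, and the result depends on what is registered in DNS at that moment, so I am not showing an output here.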

1.2.2 Basic work of DNS server

The DNS server stores in advance the records corresponding to the three kinds of information below. After receiving a query message, it finds the records that match the query and responds to the client.
The query message from the client contains the following 3 kinds of information:

  1. Domain name:
    the name being queried, such as the name of a server or of a mail server (the part after @ in a mail address)
  2. Class:
    there are now no networks in use other than the Internet, so the Class value is always IN, which stands for the Internet
  3. Record type:
    indicates what type of record the domain name corresponds to. For example, type A means the domain name corresponds to an IP address, and type MX means the domain name corresponds to a mail server...
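
To see where these three pieces of information sit in an actual query message, here is a sketch that builds the bytes of a DNS query by hand. The function names and the query ID 0x1234 are arbitrary choices of mine; the numeric codes (A = 1, MX = 15, IN = 1) and the flag value for "recursion desired" are the standard DNS ones. It only builds the message and does not send it.

```python
import struct

TYPE_A = 1     # record type A: domain name -> IP address
TYPE_MX = 15   # record type MX: domain name -> mail server
CLASS_IN = 1   # class IN: the Internet

def encode_name(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"

def build_query(domain: str, record_type: int = TYPE_A,
                record_class: int = CLASS_IN, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message containing one question."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: domain name, record type, class
    question = encode_name(domain) + struct.pack("!HH", record_type, record_class)
    return header + question

message = build_query("www.lab.glasscom.com")
print(len(message), message.hex())
```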

(Figure: basic workflow of the DNS server)

1.2.3 Find the DNS server corresponding to the domain name to obtain the IP address

There are countless servers on the Internet, and it is impossible to store the information for all of them in one DNS server. The information is therefore distributed across many DNS servers, which cooperate with each other to find the information being queried.

How the servers relay a query:
Domain names in DNS are separated by dots, for example www.lab.glasscom.com. The dots mark the boundaries between levels.

Compared to the organizational structure of a company, the domain name www.lab.glasscom.com reads roughly as "www of the lab section of the glasscom department of the com business group". The part corresponding to one level is called a domain.
So the level below the com domain is the glasscom domain, the level below that is the lab domain, and below that is the name www.
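
The levels can be listed from the top down with a few lines of Python. This is just a string-splitting sketch; domain_levels is a name I made up, and it says nothing about which DNS server actually holds each domain.

```python
def domain_levels(name: str) -> list:
    """List the domains a name belongs to, from the highest level down to the name itself."""
    # A trailing dot (the root) is dropped before splitting
    labels = name.rstrip(".").split(".")
    # Walk from the rightmost label to the left, adding one level at a time
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for level in domain_levels("www.lab.glasscom.com"):
    print(level)
```

This prints com, glasscom.com, lab.glasscom.com, and finally www.lab.glasscom.com, one per line.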

Domain name tree structure
At the top is the root domain name (root), followed by the top-level domain name (top-level domain, TLD for short), and then the first-level, second-level, and third-level domain names.

(1) Root domain name
The starting point of all domain names is the root domain name, which is written as a single dot "." placed at the end of the domain name. Because this part is the same for every domain name, it is usually omitted. For example, example.com is equivalent to example.com. (with one more dot at the end).
You can try it: add a dot at the end of any domain name, and the browser will still interpret it normally.

(2) Top-level domain name
The next level of the root domain name is the top-level domain name. It is divided into two types: generic top-level domains (gTLDs, such as .com and .net) and country-specific top-level domains (ccTLDs, such as .cn and .us).
The top-level domain name is controlled by the international domain name management agency ICANN, which entrusts commercial companies to manage gTLDs and entrusts countries to manage their own country domain names.

(3) First-level domain name
A first-level domain name is a domain name registered by yourself under a certain top-level domain name. For example, ruanyifeng.com is registered under the top-level domain name .com.

(4) Second-level domain name
A second-level domain name is a subdomain of a first-level domain name. The domain owner can set it up freely without asking anyone for permission. For example, es6.ruanyifeng.com is a second-level domain name under ruanyifeng.com.

This hierarchical domain name information is registered in the DNS server, and each domain is handled as a whole. Information of multiple domains can also be stored in one DNS server.


1.3 Send an HTTP request to the target IP address

After the steps above, the IP address of the target server has been obtained, so the browser is ready to send the HTTP request to that IP address.

1.3.1 Brief description of the process - 4 stages

Sending the HTTP request is somewhat similar to DNS resolution.
To delegate work to the protocol stack inside the operating system, the program components in the Socket library must be called in a specified order.

Data flows through a pipe-like structure.
First, the server creates a socket and waits for a client to connect a pipe to it.
Once the server is waiting, the client can connect. Specifically, the client first creates its own socket, then extends a pipe from it, and finally connects the pipe to the socket on the server side.
When the two sockets are connected, preparation for communication is complete. From then on, data can be sent and received simply by writing it into the socket.
When all the data has been sent, the pipe is disconnected. The connection is always initiated by the client, but disconnection can be initiated by either the client or the server. When one side disconnects, the other side is disconnected as well, and the sockets are then deleted.

1.3.2 Create socket stage

Creating a socket on the client is very simple: just call the socket program component in the Socket library.
After the socket is created, the protocol stack returns a descriptor, and the application stores the descriptor in memory.

Several sockets may exist on the same computer at the same time, so we need a way to identify a specific socket. The descriptor serves this purpose.

1.3.3 Connection stage: connect the pipes

Next, we need to delegate to the protocol stack to connect the socket created by the client with the socket on the server side. The application does this by calling a program component called connect in the Socket library. The main point here is that when calling connect, you need to specify the three parameters of descriptor, server IP address and port number.

1.3.4 Communication Phase: Passing Messages

First, the application prepares the data to be sent in memory. The HTTP request message generated from the URL entered by the user is the data we want to send. Next, the application calls write, specifying the descriptor and the data to send, and the protocol stack sends the data to the server.
Since the information about the connected party is already stored in the socket, specifying the socket through its descriptor is enough to identify the other party and send data to it. The data then travels through the network to the server we want to access.
Next, the server receives the data, parses its content, performs the corresponding operation, and returns a response message to the client.

1.3.5 Disconnection stage: end of sending and receiving data

When the browser receives the data, the process of sending and receiving data is over. Next, we need to call the close program component of the Socket library to enter the disconnection phase. Eventually, the pipe connecting the sockets will be disconnected, and the socket itself will be deleted.
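
The four stages can be sketched with Python's socket module, whose calls wrap the Socket library components named above (Python uses sendall and recv where the notes say write and read). The function name http_exchange is mine. It assumes the server closes the connection after responding, for example because the request carries a "Connection: close" header; otherwise the read loop would keep waiting.

```python
import socket

def http_exchange(server_ip: str, port: int, request: bytes) -> bytes:
    """Send one request over a new TCP connection and return the raw response."""
    # 1. Create: the protocol stack creates a socket; sock.fileno() is its descriptor
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # 2. Connect: the descriptor is implied by sock; we pass the IP address and port
        sock.connect((server_ip, port))
        # 3. Communicate: hand the request to the protocol stack, then read the response
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty result: the server has disconnected
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        # 4. Disconnect: close the pipe and delete the socket
        sock.close()
```

A call would look like http_exchange(ip, 80, b"GET / HTTP/1.1\r\nHost: www.lab.glasscom.com\r\nConnection: close\r\n\r\n"), where ip is the address obtained from DNS resolution.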


Origin blog.csdn.net/kwroi/article/details/127902185