III. Basics (7 lectures) 08 | What really happens after you type a URL and press Enter?

Note: "Perspective HTTP protocol" is a course column by Luo Jianfeng (a technical expert at Qihoo 360) on Geek Time. These are my study notes, for reference only.

After the last lecture, have you set up the "minimized" HTTP test environment on your computer?

I trust your answer is "yes", so let's get started right away. We will "build a temple inside a snail shell" and use this experimental environment to watch the whole working process of the HTTP protocol.

Using the IP address to access the Web server

First, run the "start" batch program in the www directory to start the local OpenResty server. You can then run the "list" batch program to confirm that the service is running normally.

Next, open Wireshark, select the "HTTP TCP port (80)" filter, and double-click "Npcap loopback Adapter" to start capturing network data on the local address 127.0.0.1.

Third, type "http://127.0.0.1/" into the Chrome address bar and press Enter. Once the welcome page is displayed, Wireshark will show the captured packets, as in the figure.

 

If you have not set up the experimental environment, or your capture differs from the one in this article, it does not matter. I saved the captured data as a pcap file named "08-1" and put it on GitHub. You can download it and open it in Wireshark to "replay" the earlier HTTP transmission exactly.

Packet capture analysis

In Wireshark you can see that 11 packets were captured in total (a packet filter is applied here that hides 3 packets; there were originally 14), taking 0.65 seconds. Let's analyze together the whole data transmission process after "typing the URL and pressing Enter".

From the earlier "ice-breaking" part you should know that HTTP runs on top of TCP/IP and relies on TCP/IP for reliable data transmission. So before a browser can send and receive data over HTTP, the first thing it must do is establish a TCP connection.

Because we entered the IP address "127.0.0.1" directly in the address bar, and the default Web server port is 80, the browser follows the TCP specification and uses the "three-way handshake" to establish a connection to the Web server.

In Wireshark this corresponds to the first three packets. The browser uses port 52085 and the server uses port 80. After the three packets SYN, SYN/ACK, and ACK, the TCP connection between the browser and the server is established.
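As a rough illustration of the handshake, here is a minimal Python sketch. It assumes a throwaway listening socket in place of OpenResty and lets the operating system pick a free port instead of port 80; the function name handshake_demo is made up for this note. The kernel performs SYN, SYN/ACK, ACK inside connect(), so the code only observes the result.

```python
import socket


def handshake_demo():
    """Open a TCP connection on the loopback interface and report the ports used."""
    # The listening socket stands in for the Web server. Port 0 asks the OS
    # for any free port, so this sketch does not need port 80.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # connect() returns only after the kernel has completed the handshake.
    client = socket.create_connection(("127.0.0.1", server_port))
    conn, peer = server.accept()

    client_port = client.getsockname()[1]  # ephemeral port, like 52085 in the capture
    seen_by_server = peer[1]               # the same port, as the server sees it

    for s in (conn, client, server):
        s.close()
    return server_port, client_port, seen_by_server
```

Calling handshake_demo() while Wireshark captures the loopback adapter should show the same SYN, SYN/ACK, ACK pattern, only with different port numbers.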

With a reliable TCP channel in place, HTTP can start working. The browser sends a "GET / HTTP/1.1" request message over TCP, in the format the HTTP protocol specifies; this is the fourth packet in Wireshark. Never mind for now what exactly the packet contains; we will cover that in the next lecture.

The Web server then replies with the fifth packet, an acknowledgment at the TCP level: "I have received the message you just sent." This TCP packet is invisible at the HTTP level.

After receiving the message, the Web server handles the request internally. It also follows the HTTP specification: it parses the message to see what the browser is asking for.

It sees that the request is for the default file in the root directory: "Fine, I'll read that file from disk in full, assemble a message in HTTP format, and send it back." This is the sixth packet in Wireshark, "HTTP/1.1 200 OK", which again travels over the underlying TCP protocol.

Likewise, the browser returns a TCP ACK to the server: "I have received your response message, thank you." This is the seventh packet.

At this point the browser has received the response data, but what is in it? It has to parse the message. It takes a look: "The server gave me an HTML file. Fine, I'll call the layout engine, the JavaScript engine, and so on to process it," and then it shows the welcome page in the browser window.
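The server-side step can be sketched as follows. This is only an illustration, not how OpenResty is implemented: the function handle_request, the in-memory files table that stands in for the disk, and the "/index.html" default file name are all assumptions of this note.

```python
def handle_request(raw_request, files):
    """Parse an HTTP request and build a response from an in-memory file table.

    `files` maps URL paths to bytes; it stands in for the disk (an assumption).
    """
    # The request line is the first line: method, target, version.
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = request_line.split(" ")

    # Map "/" to a default file, the way a Web server serves its index page.
    # The method is parsed but not checked; this sketch treats every request as GET.
    path = "/index.html" if target == "/" else target

    if path in files:
        status, body = "200 OK", files[path]
    else:
        status, body = "404 Not Found", b"<h1>404 Not Found</h1>"

    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Content-Type: text/html\r\n"
        "\r\n"  # a blank line separates the header section from the body
    )
    return head.encode("ascii") + body
```

A real server sends more header fields and handles many more cases; the point here is only the shape of the exchange: read the request line, find the file, wrap it in a status line and headers.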
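What "parsing the message" means on the browser side can be sketched roughly like this. The function parse_response is a made-up helper for this note, and it assumes the whole response is already in memory; a real browser parses incrementally and handles far more cases.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers, and body."""
    # A blank line ends the header section; everything after it is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version, three-digit code, optional reason phrase.
    parts = lines[0].split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return code, reason, headers, body
```

With the status code and the Content-Type header in hand, the browser can decide what to do with the body, for example hand HTML to the layout engine.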

After this there are two more round trips, four packets in total, repeating the same procedure. This is the browser automatically requesting the "favicon.ico" file to use as the site icon; it has nothing to do with the URL we entered. Our test environment has no such file, so the server cannot find it on disk and returns "404 Not Found".

At this point, the whole process of "typing the URL and pressing Enter" is over.

I drew a diagram of this interaction that you can refer to. Note, however, that the TCP "four-way wave" that closes a connection does not appear in the capture. This is because of HTTP/1.1's persistent connections: by default the connection is not closed immediately.
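The 200 and 404 outcomes can be reproduced without OpenResty. The sketch below is an assumption-laden stand-in: it uses Python's built-in http.server on a temporary directory that contains only index.html, and the names QuietHandler and fetch_statuses are invented for this note. Unlike the capture, it opens a fresh connection for each request.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request log lines


def fetch_statuses(paths):
    """Serve a temp directory holding only index.html and request each path."""
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "index.html").write_text("<html>welcome</html>")
        handler = functools.partial(QuietHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            statuses = []
            for path in paths:
                conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
                conn.request("GET", path)
                response = conn.getresponse()
                response.read()  # drain the body before closing
                statuses.append((path, response.status))
                conn.close()
            return statuses
        finally:
            server.shutdown()
            server.server_close()
```

Calling fetch_statuses(["/", "/favicon.ico"]) requests the welcome page and then the icon that does not exist, mirroring the two exchanges in the capture.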
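One way to observe the persistent connection is to check that two requests arrive from the same client port, which means they shared one TCP connection. The sketch below again uses Python's http.server as a stand-in for the real server; KeepAliveHandler, seen_client_ports, and two_requests_one_connection are names made up for this note.

```python
import http.client
import http.server
import threading

seen_client_ports = []  # source port of each request, recorded by the server


class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the
        # connection can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the per-request log lines


def two_requests_one_connection():
    """Send two GET requests over a single HTTPConnection and return the ports seen."""
    seen_client_ports.clear()
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        for path in ("/", "/favicon.ico"):
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before reusing the connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen_client_ports)
```

Both recorded ports being equal corresponds to what the capture shows: one handshake at the start, then two HTTP exchanges on the same connection, and no closing packets in between.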

 

To briefly recap the simplest browser HTTP request process:

The browser obtains the server's IP address and port number from the address bar;

The browser establishes a connection with the server using the TCP three-way handshake;

The browser sends the assembled request message to the server;

The server receives the message, processes the request, and likewise sends an assembled message back to the browser;

The browser parses the message and renders the page.
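The first step in the list above, getting the host and port out of what was typed, can be sketched with Python's standard urllib.parse. The helper host_and_port and the DEFAULT_PORTS table are assumptions of this note, not how a browser actually implements it.

```python
from urllib.parse import urlsplit

# Default ports used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url):
    """Return (host, port) for a URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port
```

For "http://127.0.0.1/" no port is written in the URL, so the default port 80 is used, which matches the port seen in the capture.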

Using a domain name to access the Web server

Just now we entered the IP address directly in the browser's address bar, but in most cases we do not know the server's IP address and use a domain name instead. Does the process differ once we switch to a domain name?

Let's try it hands-on. Type "http://www.Chrono.com" into the address bar and repeat the Wireshark capture. You will find that nothing seems different: the browser shows the welcome page again, and 11 packets are captured again: first the three-way handshake, then two HTTP exchanges.

This raises a question: how does the browser know from the URL that the IP address of "www.Chrono.com" is "127.0.0.1"?

Remember the DNS knowledge we covered earlier? The browser sees "www.Chrono.com" in the URL and finds that it is not a numeric IP address, so it must be a domain name. It therefore starts domain name resolution, visiting a series of DNS servers to try to translate the domain name into an IP address that TCP/IP can use.

But the full resolution process is complicated. If every domain name had to be looked up laboriously over the network, browsing would be unbearably slow.

So domain name resolution has multiple levels of cache. The browser first checks its own cache; if there is no entry, it asks the operating system's cache; if that fails too, it checks the local domain name resolution file, hosts, which is the "C:\WINDOWS\system32\drivers\etc\hosts" file we modified in the previous lecture.

It happens to contain the mapping line "127.0.0.1 www.Chrono.com", so the browser learns the domain's IP address and can happily establish a TCP connection and send HTTP requests.

I also drew a diagram of this process, omitting the TCP/IP interaction. The browser has one extra action, a visit to the hosts file, which is local DNS resolution.
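The lookup order described above can be sketched as a chain of tables. This is a simplified model under stated assumptions: the caches are plain dictionaries, the first hosts entry for a name is assumed to win, and the helper names parse_hosts and resolve are invented for this note.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {lowercased name: ip} mapping."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # assume the first entry for a name wins
    return table


def resolve(name, browser_cache, os_cache, hosts_table):
    """Check each level in the order the text describes; None means "ask DNS"."""
    key = name.lower()  # domain names are case-insensitive
    for level, table in (("browser cache", browser_cache),
                         ("OS cache", os_cache),
                         ("hosts file", hosts_table)):
        if key in table:
            return table[key], level
    return None
```

With empty caches and a hosts table built from the line "127.0.0.1 www.Chrono.com", resolve("www.Chrono.com", {}, {}, hosts) finds the address at the hosts-file level, so no DNS server needs to be contacted.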

 

The real world of the Internet

After the two experiments above in the "minimized" environment, do you now have a basic understanding of how HTTP works?

The first experiment is the simplest scenario, with only two roles: the browser and the server. The browser can use the server's IP address directly, and the two establish a TCP connection and exchange HTTP messages directly.

The second experiment adds a DNS role alongside the browser and the server. The browser does not know the server's IP address and must use DNS to resolve the domain name into an IP address before it can communicate with the server.

The real Internet is far more complex than these two scenarios. I'll use the figure below to explain in detail.

 

If you use a desktop computer, you probably plug a twisted-pair cable with a crystal-head (RJ45) connector into a network port and reach the fixed network through a switch. If you use a phone or tablet, you probably reach the mobile network over cellular or WiFi, through a telecom base station or a wireless hotspot.

Once you are on the network, the network operator assigns your device an IP address. The address may be static or dynamic: a static IP always stays the same, while a dynamic IP may have changed the next time you go online.

Suppose you want to visit Apple's website. You obviously do not know its real IP address and can only enter the domain name "www.apple.com" in the browser, so the next step must be DNS resolution. This uses the DNS protocol, starting from the operating system and going layer by layer through the local DNS, root DNS, top-level DNS, and authoritative DNS. Of course, caches along the way may return the result without much time being spent.

Don't forget another important role on the Internet, the CDN, which "meddles" in DNS resolution. DNS resolution may return the IP address of a CDN server, so what you get is the address of the CDN server rather than the actual address of the target website.

Because the CDN caches most of a website's resources, such as images and CSS style sheets, some HTTP requests never need to be sent to Apple: the CDN can respond directly and send you the data.

Pages generated dynamically by back-end services in PHP, Java, and so on are "dynamic resources" that the CDN cannot cache; they can only be obtained from the target website. So the HTTP request you sent begins a "long trek" across the Internet, passing through countless routers, gateways, and proxies before finally reaching its destination.

To the outside, the target website's server appears as a single IP address, but to withstand high concurrency it also has a complex internal architecture. The entry point is usually a load balancer, for example layer-4 LVS or layer-7 Nginx, with many servers behind it forming a more powerful and more stable cluster.

The load balancer first consults the caching servers in the system, typically the memory-level cache Redis and the disk-level cache Varnish. Their role is similar to a CDN's, but they work on the internal network, caching the most frequently accessed data for a few seconds or a few minutes to reduce pressure on the back-end application servers.

If the caching servers have no entry, the load balancer forwards the request to the application servers. This is where the various development frameworks show their prowess: Java's Tomcat/Netty/Jetty, Python's Django, as well as PHP, Node.js, Golang, and so on. They in turn access back-end database services such as MySQL, PostgreSQL, and MongoDB to carry out business operations such as user login, product queries, order placement, and payment, and then return the result to the load balancer, possibly also putting a copy into the caching servers.

Once the application server's output reaches the load balancer, the request has been processed, but the response still has to travel back the way it came, again passing through many routers, gateways, and proxies. If the resource allows caching, the CDN will also cache it as it passes through, so the next identical request will not need to reach the origin server.

Finally, the website's response data returns to your device. It may be HTML, JSON, images, or another data format, which the browser parses and processes before displaying. If the data contains hyperlinks pointing to other resources, the whole process is repeated until all resources have been downloaded.
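The layered lookup can be pictured with a toy model. Everything in the sketch below is hypothetical: the zone tables, the server names, and the address 192.0.2.10 (taken from a range reserved for documentation) are made up for illustration, and a real resolver speaks the DNS protocol over the network rather than reading dictionaries.

```python
# Hypothetical zone data for illustration only; none of it is real DNS content.
ROOT = {"com": "tld-com"}  # the root knows which server handles each top-level domain
SERVERS = {
    "tld-com": {"example.com": "auth-example"},         # top-level DNS points to the authoritative DNS
    "auth-example": {"www.example.com": "192.0.2.10"},  # authoritative DNS holds the final answer
}


def resolve_iteratively(name, cache):
    """Walk root -> top-level -> authoritative, consulting the cache first."""
    key = name.lower()
    if key in cache:
        return cache[key], ["cache"]  # a cached answer skips every layer

    labels = key.split(".")
    path = ["root"]
    tld_server = ROOT[labels[-1]]
    path.append(tld_server)
    auth_server = SERVERS[tld_server][".".join(labels[-2:])]
    path.append(auth_server)
    ip = SERVERS[auth_server][key]

    cache[key] = ip  # remember the answer for next time
    return ip, path
```

The first lookup of a name walks all three layers; a second lookup with the same cache dictionary returns immediately, which is the time saving the paragraph above mentions.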
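How one entry point can spread requests over many servers can be sketched with the simplest strategy, round robin. This is an assumption chosen for brevity: LVS and Nginx support several scheduling strategies, and the class name RoundRobinBalancer and the backend names are invented for this note.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, one per request."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Each call returns the next backend and wraps around at the end.
        return next(self._cycle)
```

With three backends, four consecutive calls to pick() return the first, second, third, and then the first backend again, so no single server takes all the traffic.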
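The "check the cache first, otherwise ask the application server and keep a copy" flow can be sketched as a small time-limited cache. This is a generic illustration, not the Redis or Varnish API; the class TTLCache, its get_or_load method, and the injectable clock are assumptions of this note.

```python
import time


class TTLCache:
    """Keep each value for a fixed number of seconds, then load it again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested without waiting
        self._store = {}     # key -> (expires_at, value)

    def get_or_load(self, key, load):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1], "hit"  # served from the cache, backend untouched
        value = load(key)           # cache miss: forward to the application server
        self._store[key] = (now + self._ttl, value)
        return value, "miss"
```

Here load plays the role of the application server. While an entry is fresh, repeated requests for the same key never reach it, which is how a cache of only a few seconds can still take pressure off the back end.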
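The step of discovering further resources inside a page can be sketched with Python's standard html.parser. This is a rough approximation that only looks at img, script, and link tags; the names ResourceCollector and find_resources are made up for this note, and a real browser follows many more rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and linked files in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


def find_resources(html, base_url):
    """Return the resource URLs found in `html`, resolved against `base_url`."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Each URL returned by find_resources would then go through the same resolve, connect, request, and parse cycle as the page itself.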

Summary

Today we did two simple experiments in the local environment and learned the whole request-response process of the HTTP protocol. Here is a summary.

HTTP is built on the underlying TCP/IP protocols, so a connection must be established using an IP address;

If the IP address is not known, the DNS protocol is used to resolve the domain name into one; otherwise the connection fails;

After the TCP connection is established, data is sent and received in order, and both the requester and the responder must build and parse messages according to the HTTP specification;

To reduce response time, there are caches at every stage of the process, enabling "short-circuit" operations;

Although real-world HTTP transmission is very complex, in theory it can still be simplified to the "two-point" model used in the experiments.

Homework

Can you try to explain what happens in the browser after you click a link on a page?

This lecture covered the normal request-processing flow. If the domain name does not exist, what does the browser's workflow look like?


Origin www.cnblogs.com/wxcx/p/12555086.html