[Learning Python from zero] 86. In-depth understanding of the HTTP protocol and its role in browser-server communication

1. Analysis using Chrome/Firefox

In a web application, the server sends web pages to the browser. More precisely, it sends the HTML code of the page, and the browser renders it. The transport protocol between the browser and the server is HTTP, so:

  • HTML is a text format used to define web pages. If you know HTML, you can write web pages;
  • HTTP is the protocol that transmits HTML over the network and is used for communication between browsers and servers.

The Chrome browser provides a complete set of debugging tools that is well suited to web development.

After installing Chrome, open it and select "View", "Developer", "Developer Tools" from the menu to display the developer tools:

Notes:

  • Elements displays the structure of the web page
  • Network shows the communication between the browser and the server

Click Network and make sure the record button (the small red dot at the far left) is on. Chrome will then record all communication between the browser and the server:

2. Analysis of the HTTP protocol

When we enter www.sina.com in the address bar, the browser displays the Sina home page. What does the browser do in the process? The Network records tell us. In Network, find the record for www.sina.com and click it; Request Headers appears on the right. Click view source next to it to see the request the browser sent to the Sina server:

2.1 Browser Request


Notes:

The first two lines are the most important. The first line is:

GET / HTTP/1.1

GET means a read request that retrieves web page data from the server. / is the path part of the URL; the path always starts with /, and / alone means the home page. The trailing HTTP/1.1 indicates that HTTP protocol version 1.1 is used. HTTP/1.1 is still widely deployed (newer versions, HTTP/2 and HTTP/3, also exist), and most servers also accept version 1.0. The main difference is that version 1.1 keeps the TCP connection open by default so that several HTTP requests can reuse it, which speeds up transmission.

From the second line on, each line has the form Xxx: abcdefg, that is, a header name, a colon, and a value:

Host: www.sina.com

This indicates that the requested domain name is www.sina.com. If one server hosts several websites, it uses the Host header to tell which site the browser is requesting.
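
The way a server might read the first line and the Host header can be sketched in Python as follows. This is only an illustration: the SITES mapping, its directory paths, and the function names are assumptions made up for this example, not part of any real server.

```python
# Hypothetical mapping from host name to site directory (illustrative only).
SITES = {
    "www.sina.com": "/var/www/sina",
    "news.example.com": "/var/www/news",
}

def parse_request_head(raw: str):
    """Split a raw request head into (method, path, version, headers)."""
    lines = raw.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # an empty line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

def pick_site(headers):
    """Choose a site directory from the Host header (None if unknown)."""
    host = headers.get("host", "").split(":")[0]  # drop an optional port
    return SITES.get(host)

raw = "GET / HTTP/1.1\r\nHost: www.sina.com\r\n\r\n"
method, path, version, headers = parse_request_head(raw)
print(method, path, version, pick_site(headers))
```

Header names are lowercased here because HTTP header names are case-insensitive.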

2.2 Server response

Continue down to Response Headers and click view source to display the raw response data returned by the server:

The HTTP response is divided into two parts, Header and Body (the Body is optional). The most important Header lines we see in Network are:

HTTP/1.1 200 OK

200 indicates a successful response, and the OK that follows is a human-readable description.

A status code other than 200 signals a different outcome, for example:

  • 404 Not Found: the page does not exist;
  • 500 Internal Server Error: the server failed while processing the request;
  • …etc…

Content-Type: text/html

Content-Type indicates the type of the response content; here text/html means an HTML web page.

Please note that browsers rely on Content-Type to decide whether the response content is a web page or a picture, a video or music. The browser does not use the URL to decide this, so even if the URL is http://www.baidu.com/meimei.jpg, the response is not necessarily a picture.
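
A minimal Python sketch of this rule is shown below. The function names and the handling categories are assumptions chosen for illustration; a real browser's logic is far more involved. The header value is parsed with the standard library's email.message.Message, which understands the "type/subtype; parameter" syntax.

```python
from email.message import Message

def media_type(content_type_value: str) -> str:
    """Return the bare 'type/subtype' part of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_type()

def how_to_handle(content_type_value: str) -> str:
    """Decide what to do with a response based only on its Content-Type."""
    mtype = media_type(content_type_value)
    if mtype == "text/html":
        return "render as a web page"
    main = mtype.split("/")[0]
    if main == "image":
        return "display as an image"
    if main in ("audio", "video"):
        return "play as media"
    return "download or hand to another program"

# The URL ends in .jpg, but the decision depends on the header, not the URL.
url = "http://www.baidu.com/meimei.jpg"
print(url, "->", how_to_handle("text/html; charset=utf-8"))
```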

The Body of the HTTP response is the HTML source code. We can view the HTML source directly in the browser by selecting "View", "Developer", "View Source" in the menu bar:

2.3 Browser parsing process

When the browser has read the HTML source of the Sina home page, it parses the HTML and displays the page. Then, following the links in the HTML, it sends further HTTP requests to the Sina server to fetch the corresponding resources, such as pictures, videos, Flash, JavaScript scripts, and CSS, and finally displays the complete page. That is why we see many additional HTTP requests under Network.
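
To get a feel for how the extra requests arise, the following Python sketch collects the resource URLs referenced by a small HTML snippet using the standard html.parser module. The sample HTML, the class name, and the choice of tags are assumptions for illustration; a real browser handles many more cases.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect URLs of resources a page refers to (simplified)."""
    # tag name -> attribute that holds the resource URL
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)

def find_resources(html: str):
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources

# Hypothetical page; each URL found would trigger one more HTTP request.
sample = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/logo.png"></body></html>'
)
print(find_resources(sample))  # ['/style.css', '/app.js', '/logo.png']
```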


3. Summary

3.1 HTTP request

After tracking Sina's homepage, let's summarize the process of HTTP requests:

3.1.1 Step 1:

The browser first sends an HTTP request to the server. The request includes:

  • Method: GET or POST. GET only requests a resource, while POST also carries user data;
  • Path: /full/url/path;
  • Domain name: specified by the Host header, e.g. Host: www.sina.com;
  • other related Headers;
  • if the method is POST, a Body containing the user data.

3.1.2 Step 2:

The server returns an HTTP response to the browser, and the response includes:

  • Response code: 200 means success, 3xx means redirection, 4xx means there is an error in the request sent by the client, and 5xx means an error occurred during processing by the server;
  • Response type: specified by Content-Type;
  • and other related Headers;
  • Usually the server's HTTP response carries content, that is, it has a Body. The HTML source code of the web page is in the Body.
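
The status code classes listed above can be turned into a small helper. The sketch below uses the standard http.HTTPStatus enum to look up the reason phrase; the function name and the category labels are assumptions for this example.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return 'code phrase (category)' for an HTTP status code."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    category = classes.get(code // 100, "other")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code not defined in the standard library enum
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"

for code in (200, 404, 500):
    print(describe_status(code))
```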

3.1.3 Step 3:

If the browser needs to continue to request other resources from the server, such as pictures, it sends an HTTP request again and repeats steps 1 and 2.

The HTTP protocol used by the Web follows a very simple request-response model, which greatly simplifies development. When we write a page, we only need to send the HTML in the HTTP response, without worrying about how to attach pictures, videos, and so on. If the browser needs pictures or videos, it sends another HTTP request for each of them. An HTTP request therefore handles exactly one resource. (With short-lived connections, as in HTTP/1.0, each TCP connection fetches a single resource, so fetching several resources means opening several connections; HTTP/1.1 can reuse one connection for successive requests.)

The HTTP protocol also scales well. Although the browser requests the home page of http://www.sina.com, Sina can link to resources on other servers in its HTML, for example <img src="http://i1.sinaimg.cn/home/2013/1008/U8455P30DT20131008135420.png">, so that the request load is spread across several servers. In addition, a site can link to other sites, and countless sites linked to each other form the World Wide Web, or WWW for short.

3.2 HTTP format

Every HTTP request and response follows the same format. An HTTP message consists of two parts, Header and Body, where the Body is optional.

The HTTP protocol is a text protocol, so its format is also very simple.

3.2.1 Format of HTTP GET request:

GET /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3

Each Header occupies one line, and the line break is \r\n. An empty line (two consecutive \r\n) marks the end of the Header section.
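
As a rough illustration of this format, the Python sketch below assembles the bytes of a GET request by hand. The function name and the extra Connection header are assumptions for the example; in practice a library such as http.client builds these bytes for you.

```python
def build_get_request(path: str, host: str, extra_headers=None) -> bytes:
    """Assemble a raw HTTP/1.1 GET request: request line, headers, empty line."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines are joined with \r\n; the final \r\n\r\n ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request("/", "www.sina.com", {"Connection": "close"})
print(request)
# b'GET / HTTP/1.1\r\nHost: www.sina.com\r\nConnection: close\r\n\r\n'
```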

3.2.2 Format of HTTP POST request:

POST /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3

body data goes here...
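
A POST request differs from a GET request mainly in that a Body follows the empty line. The Python sketch below builds one for form data; the /login path, the example.com host, and the form fields are hypothetical, and the form encoding shown (application/x-www-form-urlencoded) is just one common choice.

```python
from urllib.parse import urlencode

def build_post_request(path: str, host: str, form: dict) -> bytes:
    """Assemble a raw HTTP/1.1 POST request carrying URL-encoded form data."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # tells the server where the Body ends
        "\r\n"                              # empty line separates Header and Body
    )
    return head.encode("ascii") + body

# Hypothetical path, host, and form fields.
post = build_post_request("/login", "example.com", {"user": "alice", "lang": "python"})
print(post.decode("ascii"))
```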

3.2.3 Format of HTTP response:

The format of the HTTP response is similar to that of the request. It consists of three parts: the response line, the response headers (Response Headers), and the optional response body (Response Body).

Response line:

HTTP/1.1 200 OK

The first field indicates the HTTP protocol version used, the second field is the response status code, and the third field is the response status message.

Common HTTP status codes are:

  • 200 OK: Indicates that the request was successful
  • 404 Not Found: Indicates that the requested resource was not found
  • 500 Internal Server Error: Indicates an internal server error

Response header:

The response header contains some meta information returned by the server, such as content type, date, server type, etc. For example:

Content-Type: text/html
Content-Length: 1024
Date: Thu, 24 Aug 2023 08:04:28 GMT
Server: Apache/2.4.7 (Ubuntu)

Each response header is a key-value pair whose name and value are separated by a colon. Each header occupies one line, and the lines are separated by a carriage return and line feed (\r\n).

Response body:

The response body contains the data actually returned to the client, such as the source code of the HTML web page, the binary data of the picture, and so on.
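
The three parts of a response can be separated with a few lines of Python. This is a simplified sketch: the function name and the sample response are assumptions, and it ignores details such as chunked transfer encoding and repeated header names.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)              # response line
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body

# Hypothetical response for illustration.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
version, status, reason, headers, body = parse_response(raw_response)
print(version, status, reason)
print(headers["content-type"], len(body))
```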

To sum up, HTTP is a protocol for transferring data between a browser and a server. The browser obtains web page resources by sending HTTP requests, and the server returns the requested resources in HTTP responses. Both the request and the response follow a fixed format: a request line or response line, then headers, then an optional body. Through the HTTP protocol, the browser can load and display web pages and interact with the server.

Origin blog.csdn.net/qq_33681891/article/details/132476552