Repost: A Detailed Introduction to the HTTP Protocol and Its Use in Python

1. Analyzing with Chrome / Firefox

In a Web application, what the server sends to the browser is essentially the HTML code of a web page, which the browser then displays. The protocol used between the browser and the server is HTTP, so:

  • HTML is a text format used to define web pages; once you know HTML, you can write web pages;

  • HTTP is the protocol used to transmit HTML over the network, for communication between the browser and the server.

The Chrome browser provides a complete set of debugging tools that is ideal for Web development.

After installing Chrome, open it, open any web page, and press F12 to bring up the developer tools:

Explanation

  • Elements shows the structure of the web page
  • Network shows the communication between the browser and the server

Click Network and make sure the first button, the small red record dot, is lit. Chrome will then record all communication between the browser and the server:

2. HTTP protocol analysis

When we enter www.sina.com in the address bar, the browser displays the Sina home page. What does the browser do in this process? Recording with Network shows us. In Network, find the www.sina.com record and click it; the Request Headers are displayed on the right. Click view source on the right to see the request the browser sent to the Sina server:

2.1 Browser request

 

Explanation

Let us analyze the first two lines. The first line:

    GET / HTTP/1.1

  
  

GET indicates a read request, which fetches data from the web server. / is the URL path; the path always starts with /, and / on its own denotes the home page. The final HTTP/1.1 indicates that the HTTP protocol version in use is 1.1. Most servers also still support version 1.0 (and newer versions, HTTP/2 and HTTP/3, exist as well). The main difference between 1.0 and 1.1 is that 1.1 allows several HTTP requests to reuse one TCP connection, which speeds up transfers.
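As a rough Python illustration, the request line can be split into its three parts with plain string operations. `parse_request_line` is a hypothetical helper written for this example, not part of any library, and it only accepts paths that start with / as described above.

```python
def parse_request_line(line):
    """Split an HTTP/1.x request line such as 'GET / HTTP/1.1' into (method, path, version)."""
    # strip() also removes a trailing \r\n if the line came straight off the wire
    method, path, version = line.strip().split(" ")
    if not path.startswith("/"):
        # this sketch only handles the usual "path starts with /" form
        raise ValueError(f"unexpected request path: {path!r}")
    return method, path, version


print(parse_request_line("GET / HTTP/1.1"))  # ('GET', '/', 'HTTP/1.1')
```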

Starting from the second line, each line has the form Name: value, for example:

    Host: www.sina.com

  
  

This means the requested domain name is www.sina.com. If one server hosts several sites, it uses the Host header to tell which site the browser is requesting.
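A minimal Python sketch of reading such Name: value lines into a dictionary, assuming the lines have already been split apart. `parse_headers` is a hypothetical helper for illustration; names are lower-cased because HTTP header names are case-insensitive, and a repeated header simply overwrites the earlier one here.

```python
def parse_headers(lines):
    """Turn lines like 'Host: www.sina.com' into {'host': 'www.sina.com'}."""
    headers = {}
    for line in lines:
        # split on the first ':' only, so values such as 'example.com:8080' stay intact
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(["Host: www.sina.com", "Connection: keep-alive"])
print(headers["host"])  # www.sina.com
```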

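2.2 Server response

Scroll further down to Response Headers and click view source to display the raw response data returned by the server:

An HTTP response consists of two parts, the Header and the Body (the Body is optional). The most important Header lines we see in Network are the following: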

    HTTP/1.1 200 OK

  
  

200 indicates a successful response; the OK that follows is a description.

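If the status code is not 200, the response usually means something else, for example:

  • 404 Not Found: a failed response, the page does not exist
  • 500 Internal Server Error: an error occurred inside the server
  • ...and so on...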
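In Python, the standard library's `http.HTTPStatus` enum already knows the standard description for each status code, so a script does not need to hard-code them:

```python
from http import HTTPStatus

for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)  # the standard description for this code
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```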
    Content-Type: text/html

  
  

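Content-Type indicates the type of the response content; here text/html means an HTML web page.

Note that the browser relies on Content-Type to decide whether the response content is a web page or an image, a video or music. The browser does not rely on the URL to determine the content type, so even if the URL is http://www.baidu.com/meimei.jpg, the response is not necessarily an image.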
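The difference can be sketched in Python: `mimetypes.guess_type` only looks at the file extension in the URL, while the real answer comes from the Content-Type header. The header value used below is a made-up example of what a server might send for that URL, and `media_type_from_header` is a hypothetical helper built on `email.message.Message`.

```python
import mimetypes
from email.message import Message


def media_type_from_header(content_type_value):
    """Return the bare media type (e.g. 'text/html') from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_type()  # lower-cased, parameters such as charset dropped


url = "http://www.baidu.com/meimei.jpg"
guess_from_url, _ = mimetypes.guess_type(url)  # based only on the ".jpg" extension

# Hypothetical header value the server might actually return for that URL:
from_header = media_type_from_header("text/html; charset=utf-8")
print(guess_from_url, from_header)
```

A browser, like this sketch, trusts `from_header` rather than `guess_from_url`.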

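The Body of an HTTP response is the HTML source. Choosing "View", "Developer", "View Source" from the menu bar lets us view the HTML source directly in the browser:

Browser parsing process

After the browser reads the HTML source of the Sina home page, it parses the HTML and displays the page. Then, following the various links in the HTML, it sends further HTTP requests to the Sina servers to fetch images, videos, Flash, JavaScript scripts, CSS and other resources, and finally displays a complete page. That is why we can see many additional HTTP requests under Network.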
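To get a feel for where those extra requests come from, the Python sketch below uses the standard `html.parser.HTMLParser` to collect the image, script and stylesheet URLs a page refers to. The `ResourceCollector` class and the sample HTML are hypothetical; a real browser handles many more tags and attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of sub-resources (img, script, stylesheet) referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # relative links are resolved against the page URL
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


html = """<html><head><link rel="stylesheet" href="/css/main.css">
<script src="app.js"></script></head>
<body><img src="http://i1.sinaimg.cn/home/logo.png"></body></html>"""

collector = ResourceCollector("http://www.sina.com/")
collector.feed(html)
for url in collector.resources:
    print(url)  # each of these would trigger one more HTTP request
```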

3. Summary

3.1 HTTP requests

Having traced the Sina home page, let us summarize the HTTP request flow:

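3.1.1 Step 1: The browser first sends an HTTP request to the server. The request includes:

Method: GET or POST. GET only requests a resource, while POST also carries user data;

Path: /full/url/path;

Domain name: specified by the Host header, for example Host: www.sina.com;

Other relevant Headers;

If the method is POST, the request also includes a Body containing the user data.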
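In Python these request parts map onto `urllib.request.Request`: with no `data` it is a GET, and passing `data` turns it into a POST with that data as the Body. The form data below is hypothetical, and nothing is sent over the network until the request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request

get_req = Request("http://www.sina.com/")
# Hypothetical form data; supplying data makes the method POST and the data the Body
post_req = Request("http://www.sina.com/login", data=b"user=test&pwd=123")

print(get_req.get_method(), get_req.host, get_req.selector)    # GET www.sina.com /
print(post_req.get_method(), post_req.host, post_req.selector)  # POST www.sina.com /login
```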

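3.1.2 Step 2: The server returns an HTTP response to the browser. The response includes:

Status code: 200 means success, 3xx means redirection, 4xx means the request sent by the client contains an error, and 5xx means an error occurred while the server was processing the request;

Response type: specified by Content-Type;

Other relevant Headers;

Usually the server's HTTP response also carries content, that is, a Body containing the content of the response; the HTML source of a web page is in the Body.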
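The status-code ranges listed above translate directly into a small Python function. `status_class` is a hypothetical helper; codes outside 200 to 599 (such as 1xx informational responses) are simply reported as "other".

```python
def status_class(code):
    """Map an HTTP status code to the categories described above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"  # e.g. 1xx informational responses


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```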

3.1.3 Step 3: If the browser needs to request further resources from the server, such as images, it sends another HTTP request and repeats steps 1 and 2.

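The HTTP protocol used by the Web follows a very simple request-response model, which greatly simplifies development. When we write a page, we only need to send the HTML in the HTTP response, without worrying about how to attach images, videos and so on. If the browser needs an image or a video, it sends another HTTP request. Therefore, one HTTP request handles only one resource. (Under HTTP/1.0 this resembles a TCP short connection: each connection fetches one resource, so several resources need several connections. HTTP/1.1 can reuse one connection for several requests, but each request still fetches a single resource.)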
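To watch this request-response cycle from Python without touching a real site, the sketch below starts a tiny local server with the standard `http.server` module and fetches several resources with `http.client`, one request per resource over one HTTP/1.1 connection. The `RESOURCES` table, `ToyHandler` and `fetch_all` are hypothetical names made up for this example, and the server only listens on 127.0.0.1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical resources for the toy server: path -> (Content-Type, body)
RESOURCES = {
    "/": ("text/html; charset=utf-8", b'<html><body><img src="/logo.png"></body></html>'),
    "/logo.png": ("image/png", b"\x89PNG\r\n\x1a\n"),  # only the 8-byte PNG signature
}


class ToyHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the TCP connection open between requests

    def do_GET(self):
        if self.path not in RESOURCES:
            self.send_error(404)  # also sends "Connection: close"
            return
        content_type, body = RESOURCES[self.path]
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        # Content-Length tells the client where this Body ends on a reused connection
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def fetch_all(paths):
    """Fetch each path with its own GET request; return (path, status, Content-Type, body)."""
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        try:
            for path in paths:
                conn.request("GET", path)  # one request per resource
                resp = conn.getresponse()
                body = resp.read()  # read the whole Body before sending the next request
                results.append((path, resp.status, resp.getheader("Content-Type"), body))
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


for path, status, content_type, body in fetch_all(["/", "/logo.png", "/missing"]):
    print(path, status, content_type, len(body))
```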

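The HTTP protocol is also highly extensible. Although the browser requested the home page of http://www.sina.com, Sina can link resources from other servers into its HTML, for example <img src="http://i1.sinaimg.cn/home/2013/1008/U8455P30DT20131008135420.png">, which spreads the request load across different servers. Moreover, one site can link to other sites, and countless sites linked together form the World Wide Web, or WWW for short.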
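A short Python sketch of that load spreading: grouping a page's resource URLs by host shows which servers the browser would have to contact. `group_by_host` is a hypothetical helper built on `urllib.parse.urlsplit`.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URLs by the host (network location) that would serve them."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)


urls = [
    "http://www.sina.com/",
    "http://i1.sinaimg.cn/home/2013/1008/U8455P30DT20131008135420.png",
]
for host, items in group_by_host(urls).items():
    print(host, len(items))  # each host receives its own HTTP requests
```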

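3.2 HTTP format

Every HTTP request and response follows the same format: an HTTP message consists of a Header and a Body, and the Body is optional.

HTTP (in its 1.0 and 1.1 versions) is a text protocol, so its format is very simple.

3.2.1 Format of an HTTP GET request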


  
  
    GET /path HTTP/1.1
    Header1: Value1
    Header2: Value2
    Header3: Value3

Each Header occupies one line, and every line ends with \r\n.

3.2.2 Format of an HTTP POST request:
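In Python, a raw GET request in exactly this format can be assembled as bytes, ready to be written to a socket. `build_get_request` is a hypothetical helper for illustration; the empty line at the end (a second \r\n) marks the end of the Header.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Assemble a raw HTTP/1.1 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # every line ends with \r\n, and one extra \r\n (an empty line) ends the Header
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("www.sina.com", extra_headers={"Connection": "close"})
print(request)
# b'GET / HTTP/1.1\r\nHost: www.sina.com\r\nConnection: close\r\n\r\n'
```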


  
  
    POST /path HTTP/1.1
    Header1: Value1
    Header2: Value2
    Header3: Value3

    body data goes here...

When two consecutive \r\n sequences are encountered, the Header section ends, and all the data that follows is the Body.
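This rule is easy to apply in Python: splitting a raw message at the first b"\r\n\r\n" separates the Header lines from the Body. `split_head_body` and the form data in the sample POST request are hypothetical, chosen only for illustration.

```python
def split_head_body(raw):
    """Split a raw HTTP message into (list of Header lines, Body bytes)."""
    head, sep, body = raw.partition(b"\r\n\r\n")  # the first empty line ends the Header
    if not sep:
        raise ValueError("no empty line found: the Header section is incomplete")
    return head.split(b"\r\n"), body


body = b"user=test&pwd=123"  # hypothetical form data
raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.sina.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    + b"Content-Length: %d\r\n" % len(body)
    + b"\r\n"  # the empty line that ends the Header
    + body
)

head_lines, parsed_body = split_head_body(raw)
print(head_lines[0], parsed_body)  # b'POST /login HTTP/1.1' b'user=test&pwd=123'
```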

3.2.3 HTTP response format:


  
  
    HTTP/1.1 200 OK
    Header1: Value1
    Header2: Value2
    Header3: Value3

    body data goes here...

If the HTTP response contains a Body, it is likewise separated from the Header by \r\n\r\n.

Note again that the type of the Body data is determined by the Content-Type header: for a web page the Body is text, and for an image the Body is the image's binary data.
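Putting the response format together, here is a minimal Python sketch that parses a raw response into its status line, headers and Body, and decodes the Body to text only when Content-Type says it is text. `parse_response` is a hypothetical helper, and falling back to UTF-8 when no charset is given is an assumption of this sketch.

```python
def parse_response(raw):
    """Parse raw response bytes into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)  # e.g. 'HTTP/1.1', '200', 'OK'
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    media_type, _, params = headers.get("content-type", "").partition(";")
    if media_type.strip().lower().startswith("text/"):
        charset = "utf-8"  # assumed default when the server does not name a charset
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset" and value.strip():
                charset = value.strip().strip('"')
        body = body.decode(charset)  # text Body -> str
    return version, int(code), reason, headers, body  # otherwise Body stays bytes


raw = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 13\r\n\r\n<html></html>")
print(parse_response(raw))
```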

When a Content-Encoding header is present, the Body data is compressed. The most common compression method is gzip, so when you see Content-Encoding: gzip, you must first decompress the Body to get the real data. Compression reduces the size of the Body to speed up network transmission.
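With Python's standard `gzip` module the decompression step is a single call. `decode_body` is a hypothetical helper that only handles gzip; the repetitive sample page stands in for a real Body and compresses well.

```python
import gzip


def decode_body(body, content_encoding=None):
    """Undo Content-Encoding: gzip; any other Body is returned unchanged."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


original = b"<html>" + b"hello " * 200 + b"</html>"
compressed = gzip.compress(original)  # what a server might send with Content-Encoding: gzip
print(len(original), len(compressed))  # the compressed Body is much smaller
print(decode_body(compressed, "gzip") == original)
```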

Reposted from: https://blog.csdn.net/qq_26442553/article/details/95031100 Author: Cattle large fortune only

Old Ape Python: learn Python with the Old Ape!
Blog address: https://blog.csdn.net/LaoYuanPython



Origin blog.csdn.net/LaoYuanPython/article/details/95305524