concept
web-based
- HTTP (HyperText Transfer Protocol).
- The three core technologies of the WWW (World Wide Web): HTML, HTTP, and URL.
- RFC (Request for Comments): the design documents of the Internet.
URI
- URI (Uniform Resource Identifier).
- URL (Uniform Resource Locator).
- URN (Uniform Resource Name), such as urn:isbn:0-486-27557-4.
URI includes both URL and URN. At present only URLs are widely used on the Web, so almost every URI you see is a URL.
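To make the URL/URN distinction concrete, here is a minimal sketch using Python's standard `urllib.parse`. The example URL is made up for illustration; the URN is the one quoted above.

```python
from urllib.parse import urlparse

# Both strings are URIs; only the first one says where the resource lives.
url = urlparse("http://www.example.com:80/path/page.html?id=1#top")
urn = urlparse("urn:isbn:0-486-27557-4")

# A URL has a network location (host and port) plus a path on that host.
print(url.scheme, url.netloc, url.path, url.query, url.fragment)

# A URN only names the resource: there is no network location at all.
print(urn.scheme, repr(urn.netloc), urn.path)
```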
request response message
request message
response message
HTTP method
The first line of the request message sent by the client contains the method field.
GET
Retrieve a resource
Most network requests today use the GET method.
POST
Transfer an entity body
The main purpose of POST is not to retrieve a resource but to send data, which is carried in the entity body.
Both GET and POST requests can carry parameters. GET parameters appear in the URL as a query string, while POST parameters are stored in the entity body.
GET /test/demo_form.asp?name1=value1&name2=value2 HTTP/1.1
POST /test/demo_form.asp HTTP/1.1
Host: w3schools.com
name1=value1&name2=value2
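The difference in where the parameters travel can be sketched by building both messages as text. `build_request` is a hypothetical helper written for this note, not part of any library, and the `Content-Type` and `Content-Length` lines are an assumption about what a form submission would normally add.

```python
from urllib.parse import urlencode

def build_request(method, path, params, host="w3schools.com"):
    """Return the raw text of a GET or POST request carrying params."""
    encoded = urlencode(params)
    if method == "GET":
        # GET: parameters travel in the query string of the request line.
        return f"GET {path}?{encoded} HTTP/1.1\r\nHost: {host}\r\n\r\n"
    # POST: parameters travel in the entity body after the blank line.
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(encoded)}\r\n"
        "\r\n"
        f"{encoded}"
    )

params = {"name1": "value1", "name2": "value2"}
get_request = build_request("GET", "/test/demo_form.asp", params)
post_request = build_request("POST", "/test/demo_form.asp", params)
print(get_request)
print(post_request)
```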
HEAD
Retrieve the message headers
Works like the GET method, but the response does not carry an entity body.
It is mainly used to check whether a URL is valid and when the resource was last updated.
PUT
Upload a file
PUT itself has no verification mechanism, so anyone could upload files. Because of this security risk, the method is generally not used.
PUT /new.html HTTP/1.1
Host: example.com
Content-type: text/html
Content-length: 16
<p>New File</p>
PATCH
Partially modify a resource
PUT can also modify a resource, but only by replacing it completely; PATCH allows a partial modification.
PATCH /file.txt HTTP/1.1
Host: www.example.com
Content-Type: application/example
If-Match: "e0023aa4e"
Content-Length: 100
[description of changes]
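The contrast between whole-resource replacement and partial modification can be illustrated with an in-memory store. This is only a sketch: `store`, `put`, and `patch` are hypothetical names, and treating the PATCH body as a set of fields to overwrite is an assumption, since the format of the change description depends on the media type in use.

```python
def put(store, key, new_resource):
    """PUT semantics: the stored resource is replaced as a whole."""
    store[key] = dict(new_resource)

def patch(store, key, changes):
    """PATCH semantics (assumed here): only the listed fields are changed."""
    store[key].update(changes)

store = {"/user/1": {"name": "Ann", "email": "ann@example.com"}}

patch(store, "/user/1", {"email": "new@example.com"})
after_patch = dict(store["/user/1"])   # name survives, email changes

put(store, "/user/1", {"email": "other@example.com"})
after_put = dict(store["/user/1"])     # name is gone: the resource was replaced

print(after_patch)
print(after_put)
```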
DELETE
Delete a file
The opposite of PUT; it likewise has no authentication mechanism.
DELETE /file.html HTTP/1.1
OPTIONS
Query the supported methods
Asks which methods the specified URL supports.
The response contains something like Allow: GET, POST, HEAD, OPTIONS.
CONNECT
Request a tunnel connection through a proxy
Asks the proxy server to establish a tunnel. The traffic is encrypted with SSL (Secure Sockets Layer) or TLS (Transport Layer Security) and carried through that tunnel.
CONNECT www.example.com:443 HTTP/1.1
TRACE
Trace the path
The server returns the communication path to the client.
The client sets a value in the Max-Forwards header field when sending the request. Each server the request passes through decrements it by 1, and when the value reaches 0 the request is not forwarded any further.
TRACE is rarely used, and because it is vulnerable to XST (Cross-Site Tracing) attacks it is even less likely to be enabled.
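The Max-Forwards countdown can be simulated without any network. In this sketch `trace` and the hop names are hypothetical; it assumes each intermediary checks the value before forwarding and answers the request itself when the value is already 0.

```python
def trace(max_forwards, hops):
    """Return the name of the server that answers a TRACE request.

    hops is the ordered list of servers on the path, ending with the
    origin server.
    """
    for server in hops[:-1]:
        if max_forwards == 0:
            return server        # must not forward: this hop responds
        max_forwards -= 1        # decrement by 1 and pass the request on
    return hops[-1]              # the request reached the origin server

path = ["proxy1", "proxy2", "origin"]
print(trace(0, path))  # answered by the first proxy
print(trace(1, path))  # answered by the second proxy
print(trace(5, path))  # reaches the origin server
```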
HTTP status code
The status line, which is the first line of the response message returned by the server, contains the status code and reason phrase that tell the client the result of the request.
status code | category | meaning |
---|---|---|
1XX | Informational | The received request is being processed |
2XX | Success | The request was processed normally |
3XX | Redirection | Additional action is required to complete the request |
4XX | Client Error | The server was unable to process the request |
5XX | Server Error | An error occurred while the server was processing the request |
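Because the category is determined by the first digit alone, a client can classify any status code with integer division. The sketch below pairs that with the reason phrases shipped in Python's standard `http.HTTPStatus`; `CATEGORIES` and `describe` are names made up for this note.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code):
    """Return (category, reason phrase) for a registered status code."""
    category = CATEGORIES[code // 100]   # the first digit selects the class
    return category, HTTPStatus(code).phrase

for code in (200, 206, 404, 503):
    print(code, *describe(code))
```

`HTTPStatus(code)` raises `ValueError` for codes it does not know, so a real client would fall back to the category alone in that case.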
2XX success
- 200 OK
- 204 No Content : The request was processed successfully, but the response contains no entity body. It is typically used when the client only needs to send information to the server and no data has to come back.
- 206 Partial Content : The client made a range request. The response contains the part of the entity specified by Content-Range.
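A range request and its 206 response can be sketched as a pure function. `serve_range` is a hypothetical helper that handles only a single `bytes=first-last` range (the last position may be omitted); anything else falls back to a full 200 response, and a start position beyond the end of the body yields 416 Range Not Satisfiable.

```python
def serve_range(body, range_header):
    """Answer one 'bytes=first-last' range; returns (status, headers, data)."""
    unit, _, spec = range_header.partition("=")
    first_text, _, last_text = spec.partition("-")
    if unit != "bytes" or not first_text:
        # Suffix ranges and other units are out of scope for this sketch.
        return 200, {"Content-Length": str(len(body))}, body
    first = int(first_text)
    last = int(last_text) if last_text else len(body) - 1
    last = min(last, len(body) - 1)      # clamp to the end of the entity
    if first > last:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    data = body[first:last + 1]          # both positions are inclusive
    headers = {
        "Content-Range": f"bytes {first}-{last}/{len(body)}",
        "Content-Length": str(len(data)),
    }
    return 206, headers, data

print(serve_range(b"0123456789", "bytes=2-5"))
```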
3XX redirection
- 301 Moved Permanently : Permanent redirection
- 302 Found : Temporary redirection
- 303 See Other : Works like 302, but 303 explicitly requires the client to use the GET method to fetch the resource.
- Note: Although the HTTP specification does not allow the POST method to be changed to GET when redirecting on 301 and 302, most browsers change POST to GET when following a 301, 302, or 303 redirect.
- 304 Not Modified : Returned for a conditional request (typically one carrying If-None-Match or If-Modified-Since) when the resource has not changed, so the client can reuse its cached copy. The response has no entity body.
- 307 Temporary Redirect : Temporary redirection with the same meaning as 302, except that the browser must not change the POST method of the redirected request to GET.
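The method rules for the redirection codes above fit in a few lines. `method_after_redirect` is a hypothetical function modelling the common browser behaviour described in the note, not a statement of what every client does.

```python
def method_after_redirect(status, method):
    """Method a typical browser uses for the follow-up request."""
    if status == 303 and method != "HEAD":
        return "GET"       # 303 explicitly asks for GET
    if status in (301, 302) and method == "POST":
        return "GET"       # common browser behaviour, not a protocol rule
    return method          # 307 (and every other case) keeps the method

for status in (301, 302, 303, 307):
    print(status, method_after_redirect(status, "POST"))
```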
4XX client errors
- 400 Bad Request : There is a syntax error in the request message.
- 401 Unauthorized : The request requires authentication information (BASIC authentication or DIGEST authentication). If the client has already sent credentials once, it means that user authentication failed.
- 403 Forbidden : The request was rejected; the server is not obliged to give a detailed reason.
- 404 Not Found
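For the BASIC authentication mentioned under 401, the client answers the challenge with an Authorization field that carries `user:password` in Base64. A minimal sketch, with `basic_auth_header` as a hypothetical helper name and made-up credentials:

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization field a client sends after a 401 challenge."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"

header = basic_auth_header("Aladdin", "open sesame")
print(header)
```

Base64 is an encoding, not encryption: anyone who sees the header can recover the password, so BASIC authentication is only reasonable over an encrypted connection.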
5XX server error
- 500 Internal Server Error : An error occurred while the server was executing the request.
- 503 Service Unavailable : The server is temporarily overloaded or down for maintenance and cannot process requests right now.
HTTP header
There are 4 types of header fields: general header fields, request header fields, response header fields, and entity header fields.
General header field
header field name | description |
---|---|
Cache-Control | Controls caching behavior |
Connection | Lists hop-by-hop header fields that proxies must not forward; manages persistent connections |
Date | Date and time the message was created |
Pragma | Message directives |
Trailer | List of header fields present at the end of the message |
Transfer-Encoding | Transfer coding applied to the message body |
Upgrade | Upgrade to another protocol |
Via | Information about intermediate proxy servers |
Warning | Error notification |
request header field
header field name | description |
---|---|
Accept | Media types the user agent can handle |
Accept-Charset | Preferred character sets |
Accept-Encoding | Preferred content encodings |
Accept-Language | Preferred (natural) languages |
Authorization | Web authentication information |
Expect | Expects specific behavior from the server |
From | User's email address |
Host | Server that hosts the requested resource |
If-Match | Compares entity tags (ETag) |
If-Modified-Since | Compares the resource update time |
If-None-Match | Compares entity tags (the opposite of If-Match) |
If-Range | Sends a byte range request for the entity if the resource has not been updated |
If-Unmodified-Since | Compares the resource update time (the opposite of If-Modified-Since) |
Max-Forwards | Maximum number of hops the request may be forwarded |
Proxy-Authorization | Authentication information the client sends to the proxy server |
Range | Byte range request for the entity |
Referer | URI of the page from which the request originated |
TE | Priority of transfer codings |
User-Agent | Information about the HTTP client program |
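response header field
header field name | description |
---|---|
Accept-Ranges | Whether byte range requests are accepted |
Age | Estimated time elapsed since the resource was created |
ETag | Matching information (entity tag) for the resource |
Location | Redirects the client to the specified URI |
Proxy-Authenticate | Authentication information the proxy server requires from the client |
Retry-After | When the client may retry the request |
Server | Information about the HTTP server software |
Vary | Information for managing proxy server caches |
WWW-Authenticate | Authentication information the server requires from the client |
entity header field
header field name | description |
---|---|
Allow | HTTP methods supported by the resource |
Content-Encoding | Content encoding applied to the entity body |
Content-Language | Natural language of the entity body |
Content-Length | Size of the entity body |
Content-Location | Alternative URI for the resource |
Content-MD5 | Message digest of the entity body |
Content-Range | Position range of the entity body |
Content-Type | Media type of the entity body |
Expires | Date and time at which the entity body expires |
Last-Modified | Date and time the resource was last modified |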
Cookie
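The HTTP protocol is stateless, mainly to keep the protocol as simple as possible so that it can handle a large number of transactions. HTTP/1.1 introduced cookies to store state information.
A cookie is data that the server sends to the client. It is stored in the browser, and the client's subsequent requests include it. Cookies let the server tell whether two requests come from the same client, which makes features such as staying logged in possible.
Creation process
The response message sent by the server contains a Set-Cookie field. After receiving the response, the client stores the cookie content in the browser.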
HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry
[page content]
When the client sends later requests, it reads the cookie values from the browser and includes a Cookie field in the request message.
GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
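The round trip above can be reproduced with Python's standard `http.cookies` module. This sketch plays the client role only: it stores the two Set-Cookie values from the example response and rebuilds the Cookie field for the next request. The variable names are made up.

```python
from http.cookies import SimpleCookie

# Client side: collect the cookies from the response's Set-Cookie fields.
jar = SimpleCookie()
for header_value in ["yummy_cookie=choco", "tasty_cookie=strawberry"]:
    jar.load(header_value)

# Next request: send every stored name=value pair back in one Cookie field.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print("Cookie:", cookie_header)
```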
Set-Cookie
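attribute | description |
---|---|
NAME=VALUE | Name and value of the cookie (required) |
expires=DATE | Expiry date of the cookie (if omitted, the cookie lasts until the browser is closed) |
path=PATH | Server directory the cookie applies to (if omitted, defaults to the directory of the document) |
domain=DOMAIN | Domain the cookie applies to (if omitted, defaults to the domain of the server that created the cookie) |
Secure | The cookie is sent only over secure HTTPS connections |
HttpOnly | Prevents JavaScript from accessing the cookie |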
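On the server side, the same standard module can assemble a Set-Cookie value with these attributes. The cookie name `sid`, its value, and the domain are placeholders chosen for illustration.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sid"] = "abc123"                  # NAME=VALUE (hypothetical values)
cookie["sid"]["path"] = "/"               # applies to the whole site
cookie["sid"]["domain"] = "example.org"
cookie["sid"]["secure"] = True            # only sent over HTTPS
cookie["sid"]["httponly"] = True          # hidden from JavaScript

# The text that goes after "Set-Cookie: " in the response header.
set_cookie_value = cookie["sid"].OutputString()
print("Set-Cookie:", set_cookie_value)
```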
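Difference between Session and Cookie
A session is a mechanism the server uses to track a user. Each session has a unique identifier, the Session ID. When the server creates a session, the response message it sends contains a Set-Cookie field with a key-value pair named sid, which holds the Session ID. The client stores the cookie in the browser and includes the Session ID in every later request. HTTP tracks user state through sessions and cookies working together: the session lives on the server side, the cookie on the client side.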
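The cooperation between a server-side session and a client-side cookie might look like the sketch below. The in-memory `sessions` dictionary and both function names are assumptions made for illustration; a real server would also expire sessions and protect the store.

```python
import secrets

sessions = {}  # server-side store: Session ID -> data about the user

def create_session(user):
    """Create a session and return the Set-Cookie line that carries its ID."""
    sid = secrets.token_hex(16)           # unpredictable Session ID
    sessions[sid] = {"user": user}
    return f"Set-Cookie: sid={sid}; HttpOnly"

def lookup_session(cookie_header):
    """Find the session for a request's Cookie field value, or None."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "sid":
            return sessions.get(value)
    return None

response_line = create_session("alice")
sid = response_line.split("sid=")[1].split(";")[0]
print(lookup_session(f"theme=dark; sid={sid}"))
```

Only the identifier travels in the cookie; the data itself stays on the server.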
When the browser disables cookies
URL rewriting is used instead: sid=xxx is appended to the URL.
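URL rewriting can be sketched with `urllib.parse`, appending the Session ID as a query parameter while keeping any parameters already present. `add_sid` is a hypothetical helper and the URLs are examples.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def add_sid(url, sid):
    """Return url with sid=<sid> appended to its query string."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)        # keep the existing parameters
    query.append(("sid", sid))
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_sid("http://www.example.org/cart", "xxx"))
print(add_sid("http://www.example.org/cart?item=3", "xxx"))
```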
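Using cookies to auto-fill the user name and password
A script on the site reads the user name and password from the cookie stored in the browser and fills them in automatically.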
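Caching
Advantages:
- Reduces the load on the server
- Improves response speed (the cached copy is closer to the client than the resource on the server)
Implementation:
- Let a proxy server cache resources
- Let the client browser cache resources
Cache-Control field
HTTP controls caching through the Cache-Control header field.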
Cache-Control: private, max-age=0, no-cache
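no-cache directive
In the Cache-Control field of a request message, this directive means the cache server must first check with the origin server whether the cached resource has expired.
In the Cache-Control field of a response message, it means the cache server must validate the cached resource with the origin server before using it.
no-store directive
This directive means the cache server must not cache any part of the request or the response.
no-cache does not mean "do not cache"; it means the cached copy must be validated first. Only no-store forbids caching.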
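The distinction between no-cache and no-store is easy to encode once the field is split into directives. The parser and the two predicates below are a simplified sketch with made-up names; a real cache also honours directives that are not covered here.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        name, _, argument = item.strip().partition("=")
        if name:
            directives[name.lower()] = argument.strip('"') or None
    return directives

def may_store(directives):
    """no-store is the directive that forbids caching."""
    return "no-store" not in directives

def must_validate_first(directives):
    """no-cache allows storing but requires validation before reuse."""
    return "no-cache" in directives

parsed = parse_cache_control("private, max-age=0, no-cache")
print(parsed, may_store(parsed), must_validate_first(parsed))
```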
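max-age directive
In the Cache-Control field of a request message: if the cached resource has been cached for less time than the value given in the directive, the cached copy is acceptable.
In the Cache-Control field of a response message: it specifies how long the resource may be kept in the cache server.
The Expires field can also tell the cache server when the resource expires. In HTTP/1.1 the Cache-Control: max-age directive takes priority, while in HTTP/1.0 the Cache-Control: max-age directive is ignored.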
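The HTTP/1.1 priority of max-age over Expires can be sketched as follows. `expiry_time` is a hypothetical helper that takes the response headers as a dictionary and the time the response was received; the header values are examples.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def expiry_time(headers, received_at):
    """When a cached response stops being fresh, or None if unspecified."""
    # HTTP/1.1 behaviour: max-age is checked first and wins over Expires.
    for item in headers.get("Cache-Control", "").split(","):
        name, _, argument = item.strip().partition("=")
        if name.lower() == "max-age" and argument.isdigit():
            return received_at + timedelta(seconds=int(argument))
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"])
    return None

received = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=60",
        "Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
only_expires = {"Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(expiry_time(both, received))
print(expiry_time(only_expires, received))
```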
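Persistent connections
When a browser loads an HTML page that contains several images, it requests the image resources as well as the HTML page. If the TCP connection were closed after every HTTP exchange, the cost of setting up and tearing down connections would be high. A persistent connection needs only one TCP connection for many HTTP exchanges.
Persistent connections are managed with the Connection header field. From HTTP/1.1 onward connections are persistent by default; to close the TCP connection, the client or the server asks for it with Connection: close. Before HTTP/1.1 connections were non-persistent by default; to keep a connection open, Connection: Keep-Alive has to be sent.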
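The default for each version and the effect of the Connection field can be captured in one small function. `keeps_connection_open` is a hypothetical name; it looks only at the protocol version and the Connection value.

```python
def keeps_connection_open(version, connection_header=""):
    """Decide whether the TCP connection stays open after this exchange."""
    tokens = {t.strip().lower()
              for t in connection_header.split(",") if t.strip()}
    if version == "HTTP/1.1":
        return "close" not in tokens     # persistent unless told otherwise
    return "keep-alive" in tokens        # HTTP/1.0: must be requested

print(keeps_connection_open("HTTP/1.1"))
print(keeps_connection_open("HTTP/1.1", "close"))
print(keeps_connection_open("HTTP/1.0"))
print(keeps_connection_open("HTTP/1.0", "Keep-Alive"))
```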
Pipelining lets the client send several requests in a row without waiting for each response before sending the next request.
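Forwarding communication data
Proxy
A proxy server accepts requests from clients and forwards them to other servers.
The main purposes of using a proxy are caching, network access control, and access logging.
Proxy servers are either forward proxies or reverse proxies. Users are aware of a forward proxy, whereas a reverse proxy usually sits inside the internal network and users are not aware of it.
Gateway
Unlike a proxy server, a gateway server converts HTTP into another protocol, so it can request services from non-HTTP servers.
Tunnel
Uses encryption such as SSL to establish a secure communication line between the client and the server.
Version comparison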
The difference between HTTP/1.0 and HTTP/1.1
- HTTP/1.1 uses persistent (long-lived) connections by default, while HTTP/1.0 uses short-lived connections.
- HTTP/1.1 adds a version number to the message for better compatibility with extensions.
- The cache mechanism of HTTP/1.1 is more flexible.
- HTTP/1.1 optimizes bandwidth usage.
- HTTP/1.0 defines only 16 status codes, while HTTP/1.1 defines 24.
- In HTTP/1.0 one server can be bound to only one address, while in HTTP/1.1 several virtual hosts can share the same IP address, because request messages carry the Host header field.
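The virtual-host point in the last item can be illustrated by routing on the Host field. `SITES` and `document_root` are hypothetical, and the directory paths are placeholders; the sketch only reads the header section of a raw request.

```python
SITES = {  # hypothetical virtual hosts sharing one IP address
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}

def document_root(raw_request):
    """Pick the site directory from the Host field of a raw request."""
    for line in raw_request.split("\r\n")[1:]:   # skip the request line
        if not line:
            break                                # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().split(":")[0].lower()  # drop optional port
            return SITES.get(host)
    return None

print(document_root("GET / HTTP/1.1\r\nHost: blog.example.com\r\n\r\n"))
```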
The difference between HTTP/1.1 and HTTP/2.0
multiplexing
HTTP/2.0 uses multiplexing, using the same TCP connection to handle multiple requests.
header compression
The headers of HTTP/1.1 carry a lot of information and have to be sent repeatedly every time. HTTP/2.0 requires both communication parties to cache a header field table, thereby avoiding repeated transmission.
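The idea of a shared header field table can be shown with a toy model. This is not the real HTTP/2 compression scheme (HPACK), which adds a static table, Huffman coding, and size limits; `HeaderTable` is a made-up class that only demonstrates how both sides stay in sync and repeated headers shrink to indexes.

```python
class HeaderTable:
    """Toy header table kept by each side of the connection."""

    def __init__(self):
        self.entries = []

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.entries:
                out.append(self.entries.index(pair))  # send a small index
            else:
                self.entries.append(pair)
                out.append(pair)                      # send the literal once
        return out

    def decode(self, encoded):
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])    # look the index up
            else:
                self.entries.append(item)             # learn a new entry
                headers.append(item)
        return headers

sender, receiver = HeaderTable(), HeaderTable()
request = [("user-agent", "demo/1.0"), ("accept", "text/html")]
first = sender.encode(request)    # full literals on the first request
second = sender.encode(request)   # only indexes on the second request
print(first, second)
print(receiver.decode(first) == request, receiver.decode(second) == request)
```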
server push
When the client requests a resource, the server can send related resources along with it, so the client does not need to request them separately. For example, if the client requests index.html, the server can also push index.js to the client.
binary format
HTTP/1.1's parsing is text-based, while HTTP/2.0 uses a binary format.
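To give a feel for the binary format, the sketch below packs and unpacks the fixed 9-byte header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The function names are made up; the example values describe a HEADERS frame (type 0x1) with the END_HEADERS flag (0x4) on stream 1.

```python
import struct

def pack_frame_header(length, frame_type, flags, stream_id):
    """Pack the 9-byte HTTP/2 frame header."""
    length_bytes = struct.pack(">I", length)[1:]       # keep the low 24 bits
    rest = struct.pack(">BBI", frame_type, flags,
                       stream_id & 0x7FFFFFFF)         # clear the reserved bit
    return length_bytes + rest

def unpack_frame_header(data):
    """Return (length, type, flags, stream_id) from the first 9 bytes."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF

header_bytes = pack_frame_header(16, 0x1, 0x4, 1)
print(header_bytes.hex(), unpack_frame_header(header_bytes))
```

The stream identifier in every frame is also what makes multiplexing possible: frames from different requests can be interleaved on one TCP connection and reassembled per stream.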