About the HTTP protocol

Introduction to HTTP

HTTP stands for Hypertext Transfer Protocol. It is the protocol used to transfer hypertext from a World Wide Web (WWW) server to a local browser.

HTTP runs on top of TCP/IP and is used to transfer data such as HTML files, image files, and query results.

HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended since. HTTP/1.0 was followed by HTTP/1.1, which has long been standardized and is the version used in the examples in this article; HTTP/2 and HTTP/3 have since been standardized as well. A proposal called HTTP-NG (Next Generation of HTTP) was also put forward along the way.

HTTP follows a client-server architecture. The browser acts as the HTTP client and sends requests, addressed by URL, to the HTTP server (the web server). After receiving a request, the web server sends a response message back to the client.


[Figure: HTTP request-response model]

Main features

1. Simple and fast: When a client requests a service from a server, it only needs to send the request method and the path. Common request methods include GET, HEAD, and POST; each specifies a different kind of interaction between client and server. Because the protocol is simple, HTTP server programs are small and communication is fast.

2. Flexible: HTTP allows any type of data object to be transmitted. The type being transmitted is indicated by Content-Type.

3. Connectionless: Each connection handles only one request. After the server has processed the client's request and received the client's acknowledgement, it disconnects. This saves transmission time. Note that this describes HTTP/1.0-style behavior; HTTP/1.1 keeps the connection open by default (keep-alive).

4. Stateless: HTTP is a stateless protocol, meaning that the protocol has no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.

5. Supports both B/S (browser/server) and C/S (client/server) modes.

URLs in HTTP

HTTP uses Uniform Resource Identifiers (URIs) to transmit data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

URL stands for Uniform Resource Locator. It is the address used to identify a resource on the Internet. The following URL is used as an example to introduce the components of a typical URL:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name

As the URL above shows, a complete URL includes the following parts:

1. Protocol part: The protocol part of this URL is "http:", which means the page is accessed over HTTP. Various protocols can be used on the Internet, such as HTTP and FTP. The "//" after "http:" is a separator.

2. Domain name part: The domain name part of this URL is "www.aspxfans.com". An IP address can be used in place of the domain name.

3. Port part: The port follows the domain name, with ":" as the delimiter between them; in this example it is 8080. The port is optional. If it is omitted, the default port is used (80 for HTTP).

4. Virtual directory part: This runs from the first "/" after the domain name to the last "/". The virtual directory is optional. In this example it is "/news/".

5. File name part: This runs from the last "/" after the domain name to "?". If there is no "?", it runs from the last "/" to "#". If there is neither "?" nor "#", it runs from the last "/" to the end of the URL. The file name in this example is "index.asp". The file name is optional; if it is omitted, the server's default file name is used.

6. Anchor part: This runs from "#" to the end of the URL. The anchor in this example is "name". The anchor is not a required part of a URL.

7. Parameter part: This runs from "?" to "#" and is also called the search part or query part. The parameter part in this example is "boardID=5&ID=24618&page=1". Multiple parameters are allowed, with "&" as the delimiter between them.
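
The breakdown above can be checked with Python's standard urllib.parse module. The following is only a minimal sketch: the helper name split_url is made up for illustration, and the labels "virtual directory" and "file name" follow this article rather than the module's own vocabulary.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def split_url(url):
    """Split a URL into the parts named in the list above (illustrative helper)."""
    parts = urlsplit(url)
    # posixpath.split cuts the path at its last "/"; the directory is
    # returned without the trailing slash ("/news" rather than "/news/").
    directory, file_name = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "virtual_directory": directory,
        "file_name": file_name,
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    example = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
    for key, value in split_url(example).items():
        print(f"{key}: {value}")
```

Note that parse_qs returns a list for each parameter, because the same name may legally appear more than once in a query string.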

(Original: http://blog.csdn.net/ergouge/article/details/8185219  )

The difference between URI and URL

A URI (Uniform Resource Identifier) is used to uniquely identify a resource.

Every resource available on the Web, such as an HTML document, an image, a video clip, or a program, is located by a URI.
A URI generally consists of three parts:
① The naming scheme used to access the resource
② The name of the host where the resource is stored
③ The name of the resource itself, represented by a path, with the emphasis on the resource

A URL (Uniform Resource Locator) is a specific kind of URI: it not only identifies a resource but also indicates how to locate it.

A URL is a character string that describes an information resource on the Internet. It is used mainly by WWW client and server programs, notably the well-known Mosaic browser.
URLs describe many kinds of information resources in a unified format, including files, server addresses, and directories. A URL generally consists of three parts:
① The protocol (also called the service mode)
② The IP address of the host where the resource is stored (sometimes including a port number)
③ The specific address of the resource on the host, such as a directory and file name

A URN (Uniform Resource Name) identifies a resource by name, such as mailto: [email protected]. Strictly speaking, mailto is a URI scheme rather than a URN namespace; a URN in the strict sense looks like urn:isbn:0451450523.

URI is the abstract, high-level concept that defines a uniform resource identifier, while URL and URN are specific kinds of resource identifier. Both are types of URI. Broadly speaking, every URL is a URI, but not every URI is a URL, because URI also includes the subclass Uniform Resource Name (URN), which names a resource without specifying how to locate it. URIs such as mailto, news, and isbn are commonly cited as examples of naming a resource without saying where to retrieve it.

In Java, a URI instance can be absolute or relative, as long as it conforms to URI syntax. A URL instance must not only conform to the syntax but also contain the information needed to locate the resource, so it cannot be relative.
In the Java class library, the URI class has no methods for accessing resources; its only role is parsing.
The URL class, by contrast, can open a stream to the resource.

HTTP request message

An HTTP request message sent from the client to the server has the following format:

It consists of four parts: the request line, the request headers, a blank line, and the request data.




[Figure: HTTP request message structure]
  • The request line begins with a method token, followed by the request URI and the protocol version, separated by spaces.
GET request example, captured with Charles:
GET /562f25980001b1b106000338.jpg HTTP/1.1
Host: img.mukewang.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Accept: image/webp,image/*,*/*;q=0.8
Referer: http://www.imooc.com/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

Part 1: the request line, which states the request type, the resource to access, and the HTTP version used.

GET is the request type, /562f25980001b1b106000338.jpg is the resource to access, and the last part of the line indicates that HTTP/1.1 is used.

Part 2: the request headers, which immediately follow the request line (the first line) and carry additional information for the server.

The request headers start on the second line. Host gives the destination of the request. User-Agent is accessible to both server-side and client-side scripts and is an important basis for browser detection logic. This information is defined by the browser and sent automatically with every request.

Part 3: the blank line. The blank line after the request headers is required.

Even if the request data in part 4 is empty, the blank line must be present.

Part 4: the request data, also called the body. Any other data can be placed here.

The request data in this example is empty.
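
The four parts can be separated mechanically. The sketch below is a simplified illustration rather than a complete HTTP parser: parse_request is a made-up helper name, the sample message is a shortened copy of the GET example above, and repeated or folded headers are not handled.

```python
def parse_request(raw):
    """Split a raw HTTP request into request-line fields, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Part 1: the request line is "METHOD target VERSION".
    method, target, version = lines[0].split(" ", 2)
    # Part 2: each header is "Name: value"; split on the first colon only.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


# Shortened copy of the GET example, with explicit CRLF line endings.
RAW_GET = (
    "GET /562f25980001b1b106000338.jpg HTTP/1.1\r\n"
    "Host: img.mukewang.com\r\n"
    "Accept: image/webp,image/*,*/*;q=0.8\r\n"
    "Referer: http://www.imooc.com/\r\n"
    "\r\n"
)

if __name__ == "__main__":
    method, target, version, headers, body = parse_request(RAW_GET)
    print(method, target, version)
    print(headers)
    print("body is empty:", body == "")
```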

POST request example, captured with Charles:
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

Part 1: the request line. The first line states that this is a POST request using HTTP/1.1.
Part 2: the request headers, lines 2 through 6.
Part 3: the blank line, line 7.
Part 4: the request data, line 8.
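
To show how the body and the Content-Length header relate, here is a minimal sketch that assembles a message like the POST example. The helper name build_post is hypothetical, and the User-Agent header is left out for brevity.

```python
from urllib.parse import quote, urlencode


def build_post(host, path, fields):
    """Assemble a form-encoded POST request as text (illustrative helper)."""
    # quote_via=quote encodes a space as %20, as in the example above;
    # the default (quote_plus) would produce "+" instead.
    body = urlencode(fields, quote_via=quote)
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length is the size of the body in bytes.
        f"Content-Length: {len(body.encode('ascii'))}",
        "Connection: Keep-Alive",
        "",  # the required blank line between headers and data
        body,
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    fields = {"name": "Professional Ajax", "publisher": "Wiley"}
    print(build_post("www.wrox.com", "/", fields))
```

With these two fields the encoded body is 40 bytes long, which matches the Content-Length value in the captured example.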

HTTP response message

Generally, the server will return an HTTP response message after receiving and processing the request from the client.

The HTTP response also consists of four parts, namely: status line, message header, blank line, and response body.

 


[Figure: HTTP response message format]

Example:

HTTP/1.1 200 OK
Date: Fri, 22 May 2009 06:07:21 GMT
Content-Type: text/html; charset=UTF-8

<html>
<head></head>
<body>
<!--body goes here-->
</body>
</html>

Part 1: the status line, which consists of the HTTP protocol version, the status code, and the status message.

The first line is the status line: HTTP/1.1 indicates HTTP version 1.1, the status code is 200, and the status message is OK.

Part 2: the message headers, which carry additional information for the client.

The second and third lines are the message headers.
Date is the date and time at which the response was generated. Content-Type specifies the MIME type, here HTML (text/html), and the character encoding, here UTF-8.

Part 3: the blank line. The blank line after the message headers is required.

Part 4: the response body, the content that the server returns to the client.

The HTML after the blank line is the response body.
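
A response can be taken apart the same way as a request. The following sketch assumes a complete response held in a string; parse_response is a made-up name, and the standard library's email.message.Message class is borrowed only to split the Content-Type value into MIME type and charset.

```python
from email.message import Message


def parse_response(raw):
    """Split a raw HTTP response into status-line fields, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line ends the headers
    lines = head.split("\r\n")
    # Part 1: the status line is "VERSION code reason".
    version, code, reason = lines[0].split(" ", 2)
    # Part 2: "Name: value" pairs, split on the first colon only.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers, body


def content_type_parts(value):
    """Return (MIME type, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 22 May 2009 06:07:21 GMT\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<html><head></head><body><!--body goes here--></body></html>"
)

if __name__ == "__main__":
    version, code, reason, headers, body = parse_response(RAW_RESPONSE)
    print(version, code, reason)
    print(content_type_parts(headers["Content-Type"]))
    print(body)
```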

HTTP status code

The status code consists of three digits. The first digit defines the response category, which is divided into five categories:

1xx: Informational - the request has been received and processing continues
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client error - the request contains a syntax error or cannot be fulfilled
5xx: Server error - the server failed to fulfill a valid request

Common status codes:

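200 OK                     // The client request succeeded
400 Bad Request            // The request has a syntax error and cannot be understood by the server
401 Unauthorized           // The request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden              // The server received the request but refuses to serve it
404 Not Found              // The requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error  // The server encountered an unexpected error
503 Service Unavailable    // The server cannot currently handle the request; it may recover after a while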
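
Because the first digit determines the category, a status code can be classified with integer division. The sketch below uses Python's http.HTTPStatus enumeration for the standard reason phrases; the describe function and the CATEGORIES table are illustrative names, not part of any library.

```python
from http import HTTPStatus

# The five response categories, keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return 'code phrase (category)' for a standard status code."""
    category = CATEGORIES[code // 100]
    # HTTPStatus(code) raises ValueError for codes it does not know.
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 400, 401, 403, 404, 500, 503):
        print(describe(code))
```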

More status codes http://www.runoob.com/http/http-status-codes.html

HTTP request method

According to the HTTP standard, HTTP requests can use multiple request methods.
HTTP 1.0 defines three request methods: GET, POST and HEAD methods.
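HTTP/1.1 adds five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.

GET      Requests the specified page and returns the entity body.
HEAD     Like GET, except that the response has no body; used to obtain the headers.
POST     Submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is contained in the request body. A POST request may create a new resource and/or modify an existing one.
PUT      Replaces the content of the specified document with the data sent from the client to the server.
DELETE   Asks the server to delete the specified page.
CONNECT  Reserved in HTTP/1.1 for proxy servers that can switch the connection to tunnel mode.
OPTIONS  Lets the client query the capabilities of the server.
TRACE    Echoes back the request received by the server; mainly used for testing or diagnostics.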

How HTTP works

HTTP defines how a web client requests web pages from a web server and how the server delivers them to the client. It uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server replies with a response message that begins with a status line and includes the protocol version, a success or error code, server information, response headers, and response data.

The following are the steps of the HTTP request / response:

1. The client connects to the web server

An HTTP client, usually a browser, establishes a TCP socket connection with the HTTP port of the web server (80 by default). For example, http://www.oakcms.cn.

2. Send an HTTP request

Through the TCP socket, the client sends a text request message to the Web server. A request message consists of a request line, a request header, a blank line, and request data.

3. The server accepts the request and returns an HTTP response

The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, which is read by the client. A response consists of 4 parts: status line, response header, blank line and response data.

4. Release the TCP connection

If the connection mode is close, the server actively closes the TCP connection and the client closes passively, which releases the TCP connection. If the connection mode is keep-alive, the connection is kept open for a period of time and can continue to receive requests during that time.

5. The client browser parses the HTML content

The client browser first parses the status line and checks the status code to see whether the request succeeded. It then parses each response header; the headers tell it how many bytes of HTML follow and which character set the document uses. Finally the browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
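
The five steps above can be sketched with a plain TCP socket. To keep the example runnable without network access, it starts a tiny local server from Python's http.server module and talks to that instead of a real site; DemoHandler, fetch, run_demo, and the page content are all made up for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny stand-in web server so the sketch runs without network access."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path="/"):
    # Step 1: open a TCP connection to the server's HTTP port.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, the headers, and the blank line.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 4: leaving the "with" block releases the TCP connection.
    # Step 5: split the response into status line, headers, and body.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return status_line, header_lines, body


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status_line, header_lines, body = run_demo()
    print(status_line)
    print(header_lines)
    print(body.decode("utf-8"))
```

The request asks for Connection: close so that reading until end of stream is enough to find the end of the response; with keep-alive, a client would instead rely on Content-Length to know where the body stops.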

For example, typing a URL into the browser's address bar and pressing Enter triggers the following process:

1. The browser asks a DNS server to resolve the domain name in the URL to an IP address;

2. Once the IP address is resolved, the browser establishes a TCP connection to the server using that IP address and the default port 80;

3. The browser sends an HTTP request for the file identified by the URL. The request message may be sent to the server as the data of the third segment of the TCP three-way handshake;

4. The server responds to the browser's request and sends the corresponding HTML text to the browser;

5. The TCP connection is released;

6. The browser renders the HTML text and displays the content.
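
Steps 1 and 2 can be illustrated with the standard socket module. This is a rough sketch under stated assumptions: resolve and DEFAULT_PORTS are illustrative names, and real browsers add caching and other logic on top of the plain lookup shown here.

```python
import socket
from urllib.parse import urlsplit

# Default ports used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve(url):
    """Return the (ip_address, port) pairs a client could connect to."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # getaddrinfo performs the name lookup; a host that is already an
    # IP address is returned without contacting a DNS server.
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    # Each entry ends with a socket address whose first two fields
    # are the IP address and the port.
    return [info[4][:2] for info in infos]


if __name__ == "__main__":
    print(resolve("http://127.0.0.1/index.html"))
    print(resolve("http://localhost:8080/"))
```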

The difference between GET and POST requests

GET request
GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive

Note that the last line is a blank line

POST request
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

1. With GET, the request data is appended to the URL (that is, the data is carried in the request line of the HTTP message). A "?" separates the URL from the data, and multiple parameters are joined with "&", for example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as they are, a space is converted to "+", and Chinese or other characters are percent-encoded (URL-encoded, not Base64): the string is converted to bytes and each byte is written as %XX, where XX is the byte's value in hexadecimal. Here %E4%BD%A0%E5%A5%BD is the UTF-8 encoding of "你好".

With POST, the submitted data is placed in the body of the HTTP message. In the POST example above, the final line, name=Professional%20Ajax&publisher=Wiley, is the data actually transmitted.

Therefore, data submitted with GET is visible in the address bar, whereas the address bar does not change after a POST submission.
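
The encoding rule in point 1 can be reproduced in a few lines. The percent_encode function below is a deliberately simplified illustration of the byte-to-%XX mechanism (real encoders leave a few more characters unescaped); the calls after it use the actual urllib.parse functions.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode


def percent_encode(text):
    """Write every byte that is not an ASCII letter or digit as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # two uppercase hex digits per byte
    return "".join(out)


if __name__ == "__main__":
    print(percent_encode("你好"))      # hand-rolled version
    print(quote("你好"))               # standard library equivalent
    print(quote_plus("a b"))           # form style: space becomes "+"
    print(unquote("%E4%BD%A0%E5%A5%BD"))
    params = {"name": "hyddd", "password": "idontknow", "verify": "你好"}
    print("login.action?" + urlencode(params))
```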

2. Size of the transmitted data: The HTTP specification itself limits neither the size of the transmitted data nor the length of the URL.

The main limits in practice are:

GET: Some browsers and servers limit URL length. For example, IE limits URLs to 2083 bytes (2K + 35). Other browsers, such as Netscape and Firefox, have no theoretical limit; the limit depends on operating system support.

Therefore, for GET submission, the transmission data will be limited by the length of the URL.

POST: Because the values are not passed in the URL, the data size is theoretically unlimited. In practice, each web server sets its own limit on the size of POST data; Apache and IIS6 each have their own configuration for this.

3. Security

POST is more secure than GET. For example, when data is submitted with GET, the username and password appear in plain text in the URL. Because (1) the login page may be cached by the browser and (2) other people can view the browser history, someone else could obtain your account and password. In addition, submitting data with GET may expose the site to cross-site request forgery (CSRF) attacks. Note that neither method encrypts the data; without HTTPS, a POST body is also sent in plain text.

4. HTTP GET, HTTP POST, and SOAP all run over HTTP

(1) GET: The request parameters are a sequence of key/value pairs (the query string) appended to the URL. The length of the query string is limited by the web browser and web server (IE, for example, supports up to 2048 characters), so it is not suitable for transmitting large data sets. It is also very insecure.

(2) POST: The request parameters are transmitted in a separate part of the HTTP message, the entity body. This part is used to carry form data, so Content-Type must be set to application/x-www-form-urlencoded. POST is designed to support the user fields of web forms, and its parameters are also transmitted as key/value pairs.
However, it does not support complex data types, because POST does not define the semantics or rules of the transmitted data structure.

(3) SOAP: A specialized use of HTTP POST that follows a specific XML message format.
Content-Type is set to text/xml. Any data can be serialized as XML.

HTTP defines many methods for interacting with the server. The four most basic are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE correspond to the four basic operations on that resource: querying, modifying, adding, and deleting. In common REST usage the middle two are often assigned the other way round, with POST creating a resource and PUT replacing one. The most common methods are GET and POST: GET is generally used to obtain or query resource information, while POST is generally used to update it.

To summarize the differences between GET and POST:

    1. Data submitted with GET is placed after the URL. A "?" separates the URL from the data, and parameters are joined with "&", for example EditPosts.aspx?name=test1&id=123456. With POST, the submitted data is placed in the body of the HTTP message.

    2. The size of data submitted with GET is limited (because browsers limit URL length), while data submitted with POST has no such limit.

    3. On the server side (in ASP.NET), values submitted with GET are read through Request.QueryString, while values submitted with POST are read through Request.Form.

    4. Submitting data with GET can cause security problems. On a login page, for example, the username and password appear in the URL when submitted with GET; if the page can be cached, or if other people can access the machine, the user's account and password can be recovered from the browser history.
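
The placement difference in point 1 can be seen without sending anything over the network by building two urllib.request.Request objects. This is only a sketch: make_requests is an illustrative helper, and http://example.com/EditPosts.aspx is a placeholder address, not a real endpoint from this article.

```python
from urllib.parse import urlencode
from urllib.request import Request


def make_requests(base_url, params):
    """Build a GET and a POST request carrying the same parameters."""
    query = urlencode(params)
    # GET: the parameters travel in the URL, after "?".
    get_request = Request(f"{base_url}?{query}")
    # POST: the same encoded string travels in the body instead.
    # Supplying data makes urllib choose the POST method.
    post_request = Request(base_url, data=query.encode("ascii"))
    return get_request, post_request


if __name__ == "__main__":
    get_request, post_request = make_requests(
        "http://example.com/EditPosts.aspx",  # placeholder URL
        {"name": "test1", "id": "123456"},
    )
    for request in (get_request, post_request):
        print(request.get_method(), request.full_url, request.data)
```

Nothing is transmitted here; the objects would only be sent if passed to urllib.request.urlopen.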

Origin blog.csdn.net/u014320421/article/details/79641480