Everything you want to know about the HTTP protocol (a complete guide)

Reference article on the HTTP protocol: https://www.cnblogs.com/ranyonsue/p/5984001.html

Introduction to HTTP

HTTP stands for HyperText Transfer Protocol. It is the transfer protocol used to deliver hypertext from a World Wide Web (WWW) server to a local browser.

HTTP is a communication protocol based on TCP/IP to transfer data (HTML files, image files, query results, etc.).

HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended since. HTTP/1.0 was followed by HTTP/1.1, which has long been standardized and is still widely deployed, and later by HTTP/2 and HTTP/3 (the earlier HTTP-NG, "Next Generation of HTTP", proposal did not become a standard).

The HTTP protocol works on a client-server architecture. The browser, acting as the HTTP client, sends its requests to the HTTP server (the web server) by URL. The web server returns a response to the client according to the request it received.

[Figure: HTTP request-response model]

Main features

1. Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST; each defines a different kind of interaction between client and server. Because the protocol is simple, HTTP server programs are small and communication is fast.

2. Flexible: HTTP allows the transmission of any type of data object. The type being transmitted is marked by Content-Type.

3. Connectionless: Connectionless means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed, which saves transmission time. (This describes HTTP/1.0; HTTP/1.1 keeps the connection alive by default.)

4. Stateless: HTTP is a stateless protocol, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be sent again, which may increase the amount of data transmitted per connection. On the other hand, when the server does not need earlier information, it can respond faster.

5. Supports both the browser/server (B/S) and client/server (C/S) modes.

URLs in HTTP

HTTP uses Uniform Resource Identifiers (URIs) to transmit data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

URL stands for Uniform Resource Locator; it is the address that identifies a resource on the Internet. Take the following URL as an example of the components of a typical URL:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name
As the URL above shows, a complete URL consists of the following parts:
1. Protocol part: The protocol part of this URL is "http:", which means the page is served over HTTP. Many protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The "//" after "http:" is a separator.

2. Domain name part: The domain name part of this URL is "www.aspxfans.com". An IP address can be used in place of the domain name.

3. Port part: The port follows the domain name, separated from it by ":". The port is optional; if it is omitted, the default port is used.

4. Virtual directory part: Everything from the first "/" after the domain name to the last "/" is the virtual directory part. It is also optional. The virtual directory in this example is "/news/".

5. File name part: Everything from the last "/" after the domain name up to "?" is the file name part. If there is no "?", it runs up to "#"; if there is neither "?" nor "#", it runs to the end of the URL. The file name in this example is "index.asp". The file name is optional; if it is omitted, the default file name is used.

6. Anchor part: Everything from "#" to the end is the anchor part. The anchor in this example is "name". The anchor is optional.

7. Parameter part: Everything between "?" and "#" is the parameter part, also called the search part or query part. The parameter part in this example is "boardID=5&ID=24618&page=1". Multiple parameters are allowed, separated by "&".

(Original: http://blog.csdn.net/ergouge/article/details/8185219)
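
The decomposition above can be checked with a short Python sketch. It uses the standard library's urllib.parse; splitting the path into a "virtual directory" and a "file name" follows this article's terminology and is an assumption of the sketch, not something the library defines.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

# Split the path at the last "/" into directory and file name
# (the article's terms; urllib only knows about "path").
directory, _, filename = parts.path.rpartition("/")
directory += "/"

print("protocol :", parts.scheme)      # http
print("domain   :", parts.hostname)    # www.aspxfans.com
print("port     :", parts.port)        # 8080
print("directory:", directory)         # /news/
print("file name:", filename)          # index.asp
print("query    :", parts.query)       # boardID=5&ID=24618&page=1
print("anchor   :", parts.fragment)    # name

# The query string is itself a list of key/value pairs joined by "&".
params = parse_qs(parts.query)
print(params)
```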

The difference between URI and URL

A URI is a Uniform Resource Identifier, which uniquely identifies a resource.
Every resource available on the Web, such as an HTML document, image, video clip, or program, is identified by a URI.
A URI generally consists of three parts:
① The naming mechanism used to access the resource
② The name of the host where the resource is stored
③ The name of the resource itself, represented by a path, with the emphasis on the resource.

A URL is a Uniform Resource Locator, which is a specific kind of URI: a URL identifies a resource and also specifies how to locate it.
A URL is a string that describes an information resource on the Internet. It is used mainly in WWW client and server programs, notably the famous Mosaic browser.
URLs describe many kinds of information resources in a unified format, including files, server addresses, and directories. A URL generally consists of three parts:
① The protocol (also called the service mode)
② The IP address of the host where the resource is stored (sometimes including a port number)
③ The specific address of the resource on the host, such as the directory and file name.

A URN, or Uniform Resource Name, identifies a resource by name, such as mailto:[email protected].
URI is an abstract, high-level concept that defines uniform resource identification, while URL and URN are concrete ways of identifying resources. Both are kinds of URI. Generally speaking, every URL is a URI, but not every URI is a URL, because URIs also include the Uniform Resource Name (URN) sub-category, which names a resource without specifying how to locate it. The mailto example above, like news and isbn identifiers, is usually cited as this name-like kind of URI. (Strictly speaking, formal URNs use the urn: scheme, while mailto: and news: are URI schemes of their own.)

In Java, a URI instance can be absolute or relative, as long as it conforms to URI syntax. The URL class, by contrast, not only follows the syntax but also carries the information needed to locate the resource, so it cannot be relative.
In the Java class library, the URI class has no methods for accessing the resource; its only role is parsing.
The URL class, on the other hand, can open a stream to the resource.

HTTP request message (Request)

An HTTP request message sent by the client to the server has the following format:

It consists of four parts: request line, request headers, blank line and request data.

[Figure: HTTP request message structure]

The request line starts with the method token, followed by the request URI and the protocol version, separated by spaces.
GET request example, captured with Charles:
GET /562f25980001b1b106000338.jpg HTTP/1.1
Host: img.mukewang.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Accept: image/webp,image/*,*/*;q=0.8
Referer: http://www.imooc.com/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

The first part: the request line, which states the request type, the resource to be accessed and the HTTP version used.
GET is the request type, /562f25980001b1b106000338.jpg is the resource to be accessed, and the last field of the line shows that HTTP/1.1 is used.

The second part: the request headers, which immediately follow the request line (the first line) and carry additional information for the server.
The headers start on the second line. Host gives the destination of the request. User-Agent is available to both server-side and client-side scripts and is the main basis for browser detection logic; it is defined by your browser and sent automatically with every request. The other headers work in a similar way.

The third part: the blank line. The blank line after the request headers is required.
Even if the request data in the fourth part is empty, the blank line must be present.

The fourth part: the request data, also called the body, which can carry any additional data.
The request data in this example is empty.

POST request example, captured with Charles:
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

The first part: the request line. The first line shows that this is a POST request using HTTP/1.1.
The second part: the request headers, lines two to six.
The third part: the blank line, line seven.
The fourth part: the request data, line eight.
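
To make the four parts concrete, here is a minimal Python sketch that splits a raw request into request line, headers, and body at the blank line. The function name parse_request is made up for this illustration, and the sketch skips the edge cases (repeated headers, chunked bodies) that a real server has to handle.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into its parts (illustrative helper only)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Part 1: request line = method, request URI, protocol version.
    method, target, version = lines[0].split(" ", 2)

    # Part 2: request headers, one "Name: value" pair per line.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Part 4: whatever follows the blank line is the request data.
    return method, target, version, headers, body


raw = (
    b"POST / HTTP/1.1\r\n"
    b"Host: www.wrox.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 40\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
    b"name=Professional%20Ajax&publisher=Wiley"
)

method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["content-length"], len(body))
```

The Content-Length header and the actual body length agree here, which is what lets the receiver know where the request data ends.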

HTTP response message (Response)

Under normal circumstances, the server will return an HTTP response message after receiving and processing the request sent by the client.

HTTP response is also composed of four parts, namely: status line, message header, blank line and response body.

[Figure: HTTP response message format]

Example:

HTTP/1.1 200 OK
Date: Fri, 22 May 2009 06:07:21 GMT
Content-Type: text/html; charset=UTF-8

The first part: the status line, made up of the HTTP protocol version, the status code, and the status message. In the first line, HTTP/1.1 is the HTTP version, 200 is the status code, and OK is the status message.

The second part: the message headers, which carry additional information for the client.
The second and third lines are the message headers.
Date: the date and time the response was generated; Content-Type: the MIME type is HTML (text/html) and the character encoding is UTF-8.

The third part: the blank line. The blank line after the message headers is required.

The fourth part: the response body, the content that the server returns to the client.
The HTML that follows the blank line is the response body (omitted in the example above).
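
The same four-part structure can be taken apart in a few lines of Python. This is only a sketch: the body in the sample is a placeholder invented for illustration, and the standard library's email parser is used for the headers because HTTP headers follow the same "Name: value" syntax.

```python
from email.parser import Parser

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 22 May 2009 06:07:21 GMT\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<html><body>placeholder body</body></html>"  # hypothetical body
)

# The blank line separates status line + headers from the response body.
head, _, body = raw.partition("\r\n\r\n")
status_line, _, header_block = head.partition("\r\n")

# Status line: protocol version, status code, status message.
version, code, reason = status_line.split(" ", 2)

# Message headers parsed into a dict-like object.
headers = Parser().parsestr(header_block, headersonly=True)

print(version, int(code), reason)
print(headers["Date"])
print(headers.get_content_type(), headers.get_content_charset())
```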

HTTP status code

The status code consists of three digits. The first digit defines the response category, which is divided into five categories:

1xx: Informational - the request has been received and processing continues
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client error - the request contains a syntax error or cannot be fulfilled
5xx: Server error - the server failed to fulfill a valid request

Common status codes:

200 OK //The client request succeeded
400 Bad Request //The client request has a syntax error and cannot be understood by the server
401 Unauthorized //The request is not authorized. This status code must be used together with the WWW-Authenticate header field.
403 Forbidden //The server received the request but refuses to serve it
404 Not Found //The requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error //The server encountered an unexpected error
503 Service Unavailable //The server cannot currently process the client's request and may recover after a while
More status codes: http://www.runoob.com/http/http-status-codes.html
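
As a small illustration of the first-digit rule, the following Python sketch maps a status code to its category and looks up the standard reason phrase in the standard library's http.HTTPStatus. The helper name status_class and its category labels are choices made for this example.

```python
from http import HTTPStatus

# Category names keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code: int) -> str:
    """Return the response category for a three-digit status code."""
    return CLASSES.get(code // 100, "Unknown")

for code in (200, 400, 401, 403, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```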

HTTP request methods
According to the HTTP standard, an HTTP request can use one of several request methods.
HTTP/1.0 defines three request methods: GET, POST and HEAD.
HTTP/1.1 adds five more: OPTIONS, PUT, DELETE, TRACE and CONNECT.

GET: requests the specified page and returns the entity body.
HEAD: like GET, except that the response contains no body; it is used to obtain the headers only.
POST: submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body. A POST request may create a new resource and/or modify existing ones.
PUT: replaces the content of the specified document with the data sent by the client.
DELETE: asks the server to delete the specified page.
CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection to tunnel mode.
OPTIONS: lets the client query the capabilities of the server.
TRACE: echoes back the request received by the server; mainly used for testing or diagnosis.
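
The difference between GET and HEAD is easy to observe against a throwaway local server. The sketch below uses only the Python standard library; the handler class, its fixed HTML body, and the fetch helper are all invented for this demonstration, and the server listens on a random loopback port so nothing leaves the machine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    BODY = b"<html><body>hello</body></html>"  # demo content

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        # GET: headers followed by the entity body.
        self._send_headers()
        self.wfile.write(self.BODY)

    def do_HEAD(self):
        # HEAD: the same headers, but no body.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch(method: str, port: int):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result

server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch("GET", port)
head_result = fetch("HEAD", port)

server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses carry the same status and Content-Length, but only the GET response has a body.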

How HTTP works

The HTTP protocol defines how a web client requests a web page from a web server, and how the server delivers the page to the client. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server answers with a response that begins with a status line and contains the protocol version, a success or error code, server information, response headers, and response data.

The following are the steps for HTTP request/response:

1. The client connects to the web server
An HTTP client, usually a browser, opens a TCP socket connection to the HTTP port of the web server (80 by default), for example http://www.oakcms.cn.

2. The client sends the HTTP request
Over the TCP socket, the client sends a text request message to the web server. A request message consists of 4 parts: request line, request headers, blank line and request data.

3. The server accepts the request and returns an HTTP response
The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, where it is read by the client. A response consists of 4 parts: status line, response headers, blank line and response data.

4. The TCP connection is released
If the Connection mode is close, the server actively closes the TCP connection, and the client closes passively and releases it. If the Connection mode is keep-alive, the connection stays open for a while and can carry further requests during that time.

5. The client browser parses the HTML content
The client browser first parses the status line and checks the status code to see whether the request succeeded. It then parses each response header; the headers tell it that what follows is an HTML document of a given number of bytes, and which character set the document uses. Finally the browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
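
The steps above can be followed by hand with a plain TCP socket. The Python sketch below runs against a throwaway server on the loopback interface, not a real site; the handler class and its HTML body are invented for the demonstration, and "Connection: close" is sent so that step 4 happens right after the response.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Hello</body></html>"  # demo content
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Step 1: open a TCP connection to the server's HTTP port.
with socket.create_connection((host, port), timeout=5) as sock:
    # Step 2: send the request: request line, headers, blank line, no data.
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    # Steps 3 and 4: read the response until the server closes the connection.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)

server.shutdown()
server.server_close()

# Step 5: parse the status line and headers, then hand over the body.
raw = b"".join(chunks)
head, _, body = raw.partition(b"\r\n\r\n")
status_line = head.split(b"\r\n")[0].decode("ascii")
print(status_line)
print(body.decode("utf-8"))
```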

For example, when you type a URL into the browser's address bar and press Enter, the following happens:

1. The browser asks a DNS server to resolve the domain name in the URL to an IP address;

2. Once the IP address is known, the browser opens a TCP connection to the server using that address and the default port 80;

3. The browser sends an HTTP request for the file named by the URL. The request is sent once the TCP three-way handshake has completed (it can accompany the final ACK of the handshake);

4. The server answers the request and sends the corresponding HTML text to the browser;

5. The TCP connection is released;

6. The browser renders the HTML text and displays the content.
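
Steps 1 and 2 of this sequence, deriving the host and port from the URL and resolving the host name, might look like the following Python sketch. The helper resolve_target is a name made up here, and the example resolves localhost so that it works without network access; a browser would query DNS for the real domain.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def resolve_target(url: str):
    """Return (host, port, addresses) for the server a URL points at."""
    parts = urlsplit(url)
    # If the URL has no explicit port, fall back to the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # Ask the system resolver (hosts file or DNS) for the host's addresses.
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    return parts.hostname, port, addresses

host, port, addresses = resolve_target("http://localhost/index.html")
print(host, port, addresses)
```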

The difference between GET and POST requests
GET request
GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive
(Note that the request ends with a blank line.)

POST request
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley
1. GET submission: the request data is appended to the URL (that is, it travels in the request line). The URL and the data are separated by ?, and multiple parameters are joined with &,
for example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as is; a space is converted to +; Chinese and other characters are percent-encoded (URL-encoded, which is neither Base64 nor encryption), for example %E4%BD%A0%E5%A5%BD, where XX in %XX is the hexadecimal value of one byte of the character's encoding.

POST submission: the submitted data is placed in the body of the HTTP message. In the POST example above, the final line (name=Professional%20Ajax&publisher=Wiley) is the data actually transmitted.

Therefore, data submitted by GET is displayed in the address bar, while with POST the address bar does not change.
2. Size of the transmitted data: First of all, the HTTP specification limits neither the size of the transmitted data nor the length of a URL.

The main limitations in actual development are:

GET: Some browsers and servers limit the length of a URL. For example, IE limits URLs to 2,083 characters (2K+35). Other browsers, such as Netscape and Firefox, have no limit in theory; the practical limit depends on the operating system.

Therefore, for GET submission, the transmitted data is restricted by the URL length.

POST: Since the data is not passed in the URL, its size is unlimited in theory. In practice, every web server sets a limit on the size of POST data; Apache and IIS6 each have their own configuration for it.

3. Security

POST is more secure than GET. For example, when data is submitted via GET, the user name and password appear in the URL in plain text. Because (1) the login page may be cached by the browser and (2) other people can view the browser history, others may obtain your account and password. In addition, submitting data via GET may expose the site to Cross-Site Request Forgery (CSRF) attacks. Note that neither method encrypts the data: over plain HTTP a POST body is also sent in clear text, so confidentiality requires HTTPS.

4. HTTP GET, POST and SOAP all run on top of HTTP

(1) GET: The request parameters are a sequence of key/value pairs (the query string).
The length of the query string attached to the URL is limited by the web browser and the web server (for example, IE limits a URL to 2,083 characters). It is not suitable for transferring large data sets, and it is also insecure.

(2) POST: The request parameters are transmitted in a separate part of the HTTP message, the entity body. This part is used to transmit form data, so for forms the Content-Type is set to application/x-www-form-urlencoded. POST was designed to support the user fields of web forms, and its parameters are also transmitted as key/value pairs.
However, it does not support complex data types, because POST does not define the semantics or structure of the transmitted data.

(3) SOAP: It is a specialized use of HTTP POST. It follows a special XML message format, and the
Content-Type is set to text/xml. Any data can be represented as XML.
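
The encoding rules and the different placement of the data can be tried out in Python without sending anything over the network. In this sketch the field values come from the examples in this article; the Request objects are only constructed, never opened, and the variable names are chosen for the illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

fields = {"name": "Professional Ajax", "publisher": "Wiley"}

# Form-style encoding: letters and digits unchanged, space becomes "+".
form_encoded = urlencode(fields)
# The same pairs with spaces written as %20, as in the POST example above.
percent_encoded = urlencode(fields, quote_via=quote)
# Non-ASCII text is percent-encoded byte by byte from its UTF-8 form.
non_ascii = urlencode({"verify": "你好"})

print(form_encoded)
print(percent_encoded)
print(non_ascii)

# GET: the key/value pairs ride in the URL; there is no body.
get_req = Request("http://www.wrox.com/books/?" + form_encoded)

# POST: the URL stays clean and the pairs travel in the entity body.
post_req = Request(
    "http://www.wrox.com/",
    data=percent_encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```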

The HTTP protocol defines many methods for interacting with the server, the most basic of which are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE correspond to the four operations of querying, modifying, adding, and deleting that resource (REST-style APIs usually assign the middle two the other way round: POST creates, PUT replaces). The most common ones are GET and POST. GET is generally used to obtain or query resource information, and POST is generally used to update it.

Let's look at the differences between GET and POST.

Data submitted by GET is placed after the URL, separated from it by ?, with parameters joined by &, for example EditPosts.aspx?name=test1&id=123456. The POST method puts the submitted data in the body of the HTTP message.

The size of data submitted by GET is limited (because browsers limit the length of URLs), while data submitted by POST has no such limit.

On the server (in ASP.NET), values sent by GET are read with Request.QueryString, while values sent by POST are read with Request.Form.

Submitting data by GET raises security problems. For example, when a login form is submitted by GET, the user name and password appear in the URL. If the page can be cached, or other people can access the machine, the user's account and password can be read from the browser history.
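
Request.QueryString and Request.Form are ASP.NET names. As a rough Python analogue, the sketch below reads the same parameters from either the URL or the body, depending on the method. The function read_params is hypothetical, and it assumes a form-encoded ASCII body with one value per key.

```python
from urllib.parse import parse_qs, urlsplit

def read_params(method: str, target: str, body: bytes = b"") -> dict:
    """Return the submitted key/value pairs for a GET or POST request."""
    if method == "GET":
        # GET: parameters are the query string after "?" in the URL.
        raw = urlsplit(target).query
    elif method == "POST":
        # POST: parameters are the form-encoded entity body.
        raw = body.decode("ascii")
    else:
        raise ValueError(f"unsupported method: {method}")
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(raw).items()}

from_get = read_params("GET", "/EditPosts.aspx?name=test1&id=123456")
from_post = read_params("POST", "/EditPosts.aspx", b"name=test1&id=123456")
print(from_get)
print(from_post)
```

Both calls yield the same pairs; only the place where the server has to look differs.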

Origin blog.csdn.net/langezuibang/article/details/107055344