Introduction to HTTP Protocol

1. Introduction to HTTP

        HTTP is the abbreviation of Hypertext Transfer Protocol, a protocol for transferring hypertext from a World Wide Web (WWW) server to a local browser.
        HTTP is a communication protocol that runs over TCP/IP to transfer data (HTML files, image files, query results, etc.).
        HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended since. HTTP/1.0 was followed by HTTP/1.1, which has long been standardized and remains in wide use alongside the newer HTTP/2 and HTTP/3; the early HTTP-NG (Next Generation of HTTP) proposal was never adopted.
        HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends its requests to the HTTP server (the web server) identified by a URL. The web server returns a response to the client according to the request it received.


2. Main Features

        1. Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, an HTTP server program can be small, and communication is fast.
        2. Flexible: HTTP allows data objects of any type to be transmitted. The type being transferred is indicated by Content-Type.
        3. Connectionless: Connectionless means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. This saves transmission time. (HTTP/1.1 can also keep a connection open for reuse; see the keep-alive mode in section 9.)
        4. Stateless: HTTP is a stateless protocol, meaning that the protocol itself keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need the earlier information.
        5. Supports both B/S (browser/server) and C/S (client/server) modes.

3. HTTP URL

        HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.
        URL stands for Uniform Resource Locator; it is an address used to identify a resource on the Internet. The following URL is used as an example to introduce the components of a typical URL:
http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name
        As the URL above shows, a complete URL includes the following parts:
        1. Protocol part: The protocol part of this URL is "http:", which means that the web page uses the HTTP protocol. Various protocols can be used on the Internet, such as HTTP and FTP. The "//" after "http:" is the separator.
        2. Domain name part: The domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of the domain name.
        3. Port part: The port follows the domain name, with ":" as the separator between the two. The port in this example is "8080". The port is not a required part of a URL; if it is omitted, the default port is used.
        4. Virtual directory part: The part from the first "/" after the domain name to the last "/" is the virtual directory. It is not a required part of a URL. The virtual directory in this example is "/news/".
        5. File name part: The part from the last "/" after the domain name up to "?" is the file name. If there is no "?", it runs from the last "/" to "#"; if there is neither "?" nor "#", it runs from the last "/" to the end of the URL. The file name in this example is "index.asp". The file name is not a required part of a URL; if it is omitted, the default file name is used.
        6. Anchor part: The part from "#" to the end is the anchor. The anchor in this example is "name". The anchor is not a required part of a URL.
        7. Parameter part: The part from "?" to "#" is the parameter part, also known as the search part or query part. The parameter part in this example is "boardID=5&ID=24618&page=1". It can hold multiple parameters, with "&" as the separator between them.
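
The breakdown above can be checked with a short sketch using Python's standard urllib.parse module. The split of the path into "virtual directory" and "file name" follows this article's terminology rather than anything the library defines, so that step is done by hand here.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

# The library only knows about a "path"; splitting it at the last "/"
# gives the virtual directory and file name described in the text.
directory, _, filename = parts.path.rpartition("/")
directory += "/"

print(parts.scheme)    # protocol part
print(parts.hostname)  # domain name part
print(parts.port)      # port part (an int, or None when omitted)
print(directory)       # virtual directory part
print(filename)        # file name part
print(parts.query)     # parameter (query) part
print(parts.fragment)  # anchor part

# The query string can be split further into individual parameters.
params = parse_qs(parts.query)
print(params)
```

Note that parse_qs returns a list of values per key, because the same parameter name may legally appear more than once in a query string.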

4. The difference between URI and URL

        A URI (Uniform Resource Identifier) is used to uniquely identify a resource.
        Every resource available on the Web, such as an HTML document, image, video clip, or program, is identified by a URI.
        A URI generally consists of three parts:
        ① The naming mechanism for accessing the resource
        ② The host name where the resource is stored
        ③ The name of the resource itself, represented by a path, with the emphasis on the resource
        A URL (Uniform Resource Locator) is a specific kind of URI: it identifies a resource and also indicates how to locate it.
        A URL is a string used to describe information resources on the Internet. It is mainly used in WWW client and server programs, the famous Mosaic in particular.
        URLs provide a unified format for describing various information resources, including files, server addresses, and directories. A URL generally consists of three parts:
        ① The protocol (or service mode)
        ② The IP address of the host where the resource is stored (sometimes including the port number)
        ③ The specific address of the resource on the host, such as the directory and file name

5. HTTP request message (Request)

        The request message that a client sends to the server consists of four parts: the request line, the request headers, a blank line, and the request data.

(1)GET

        An example of a GET request, captured with Charles:
        GET /562f25980001b1b106000338.jpg HTTP/1.1
        Host: img.mukewang.com
        User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
        Accept: image/webp,image/*,*/*;q=0.8
        Referer: http://www.imooc.com/
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: zh-CN,zh;q=0.8
        The first part: the request line, which describes the request type, the resource to be accessed, and the HTTP version used.
        The request line begins with the method, followed by the requested URI and the protocol version, separated by spaces. GET indicates that the request type is GET, [/562f25980001b1b106000338.jpg] is the resource to be accessed, and the last part of the line indicates that HTTP version 1.1 is used.
        The second part: the request headers, the part immediately after the request line (the first line), which carry additional information for the server to use.
        The request headers start on the second line. Host indicates the destination of the request. User-Agent, which both server-side and client-side scripts can read, is an important basis for browser type detection logic. This information is set by the browser.
        The third part: the blank line. The blank line after the request headers is required.
        Even if the request data in the fourth part is empty, the blank line must be present.
        The fourth part: the request data, also called the body, which can carry any additional data.
        The request data in this example is empty.
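
As a minimal sketch of the four-part layout, the following Python function assembles a request message as raw bytes. The function name build_request is a hypothetical helper introduced for illustration, and only a few of the headers from the captured example are included.

```python
def build_request(method, target, headers, body=b""):
    """Assemble request line, headers, blank line, and body into raw bytes."""
    lines = [f"{method} {target} HTTP/1.1"]                   # part 1: request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # part 2: headers
    # The first CRLF ends the last header line; the second one is the
    # mandatory blank line (part 3).
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body                        # part 4: request data

raw = build_request(
    "GET",
    "/562f25980001b1b106000338.jpg",
    {
        "Host": "img.mukewang.com",
        "Referer": "http://www.imooc.com/",
        "Accept-Language": "zh-CN,zh;q=0.8",
    },
)
print(raw.decode("ascii"))
```

The message ends with the blank line even though the body is empty, which is exactly the case described for the GET example.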

(2)POST

        An example of a POST request, captured with Charles:
        POST / HTTP/1.1
        Host: www.wrox.com
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
        Content-Type: application/x-www-form-urlencoded
        Content-Length: 40
        Connection: Keep-Alive

        name=Professional%20Ajax&publisher=Wiley
        The first part: the request line. The first line shows that this is a POST request using HTTP version 1.1.
        The second part: the request headers, lines 2 through 6.
        The third part: the blank line, line 7.
        The fourth part: the request data, line 8.
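
Going the other way, a received request can be split back into its four parts. The sketch below is a simplified parser written for this article (parse_request is a hypothetical name); it assumes the whole message is already in memory and ignores details such as repeated headers.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into method, target, version, headers, body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body

raw = (
    b"POST / HTTP/1.1\r\n"
    b"Host: www.wrox.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 40\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
    b"name=Professional%20Ajax&publisher=Wiley"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["content-length"], len(body))
```

Content-Length gives the size of the body in bytes, which is how the receiver knows where the request data ends.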

6. HTTP response message (Response)

        Normally, the server returns an HTTP response message after receiving and processing the request sent by the client.
        The HTTP response also consists of four parts: the status line, the message headers, a blank line, and the response body.
        Example:
        HTTP/1.1 200 OK
        Date: Fri, 22 May 2009 06:07:21 GMT
        Content-Type: text/html; charset=UTF-8

        <html>
        <head></head>
        <body>
        </body>
        </html>
        The first part: the status line, which consists of three parts: the HTTP protocol version, the status code, and the status message.
        The first line is the status line: HTTP/1.1 indicates HTTP version 1.1, the status code is 200, and the status message is OK.
        The second part: the message headers, which carry additional information for the client to use.
        The second and third lines are the message headers. Date is the date and time when the response was generated; Content-Type specifies the MIME type, HTML (text/html), and the encoding, UTF-8.
        The third part: the blank line. The blank line after the message headers is required.
        The fourth part: the response body, the text that the server returns to the client.
        The HTML after the blank line is the response body.
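
A response can be taken apart in the same way as a request. This sketch (parse_response is a hypothetical helper, not a library function) splits the example response into its parts and uses the charset from Content-Type to decode the body; falling back to ISO-8859-1 when no charset is given is an assumption of the sketch.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 22 May 2009 06:07:21 GMT\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html>\n<head></head>\n<body>\n</body>\n</html>\n"
)
version, code, reason, headers, body = parse_response(raw)

# Content-Type carries the MIME type and, optionally, the character set.
mime, _, params = headers.get("content-type", "").partition(";")
charset = "iso-8859-1"  # fallback used by this sketch when none is declared
for param in params.split(";"):
    key, _, value = param.strip().partition("=")
    if key.lower() == "charset":
        charset = value

text = body.decode(charset)
print(version, code, reason)
print(mime, charset)
print(text)
```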

7. HTTP status code

        The status code consists of three digits. The first digit defines the category of the response, and there are five categories:
        1xx: Informational. The request has been received and processing continues
        2xx: Success. The request has been successfully received, understood, and accepted
        3xx: Redirection. Further action must be taken to complete the request
        4xx: Client error. The request has a syntax error or cannot be fulfilled
        5xx: Server error. The server failed to fulfill a valid request
        Common status codes:
        200 OK // The client request succeeded
        400 Bad Request // The client request has a syntax error and cannot be understood by the server
        401 Unauthorized // The request is not authorized; this status code must be used with the WWW-Authenticate header field
        403 Forbidden // The server received the request but refuses to provide the service
        404 Not Found // The requested resource does not exist, e.g. a wrong URL was entered
        500 Internal Server Error // An unexpected error occurred on the server
        503 Service Unavailable // The server cannot currently process the client's request and may return to normal after a while
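
The rule that the first digit decides the category is easy to express in code. The sketch below uses the standard http.HTTPStatus enum for the reason phrases; the CATEGORIES table and the describe function are illustrative names introduced here.

```python
from http import HTTPStatus

# Category by first digit, as listed above.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its category."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a registered code; the category is still given by the first digit.
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"

for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```

A client that meets an unfamiliar code can still treat it according to its category, which is why the first digit matters more than the exact number.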

8. HTTP request method

        According to the HTTP standard, HTTP requests can use various request methods.
        HTTP/1.0 defines three request methods: GET, POST, and HEAD.
        HTTP/1.1 added five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.
        GET requests the specified page information and returns the entity body.
        HEAD is similar to a GET request, except that the response contains no body; it is used to obtain the headers.
        POST submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is included in the request body. A POST request may create new resources and/or modify existing ones.
        PUT replaces the contents of the specified document with data sent from the client to the server.
        DELETE requests the server to delete the specified page.
        CONNECT is reserved in HTTP/1.1 for proxy servers that can switch the connection to tunnel mode.
        OPTIONS allows the client to query the capabilities of the server.
        TRACE echoes the request received by the server, mainly for testing or diagnostics.
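
The difference between GET and HEAD can be observed directly. The sketch below starts a throwaway server on the local machine with Python's standard http.server module and queries it with http.client; the handler class, the page content, and the fetch helper are all made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"

class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)      # GET: headers plus entity body

    def do_HEAD(self):
        self._send_headers()        # HEAD: same headers, no body

    def log_message(self, format, *args):
        pass                        # keep the demo output quiet

def fetch(port, method):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result

server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch(port, "GET")
head_result = fetch(port, "HEAD")
server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses carry the same status and headers, but only the GET response has a body.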

9. How HTTP works

        The HTTP protocol defines how web clients request web pages from web servers and how servers deliver web pages to clients. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line, containing the protocol version and a success or error code, followed by server information, response headers, and response data.
        The steps of an HTTP request/response are as follows:
        1. The client connects to the web server
        An HTTP client, usually a browser, establishes a TCP socket connection to the HTTP port of the web server (80 by default). For example, http://www.oakcms.cn.
        2. The client sends an HTTP request
        Through the TCP socket, the client sends a text request message to the web server. A request message consists of four parts: request line, request headers, blank line, and request data.
        3. The server accepts the request and returns an HTTP response
        The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, where it is read by the client. A response consists of four parts: status line, response headers, blank line, and response data.
        4. The TCP connection is released
        If the connection mode is close, the server actively closes the TCP connection, and the client passively closes its side and releases the connection. If the connection mode is keep-alive, the connection stays open for a period of time, during which further requests can be received.
        5. The client browser parses the HTML content
        The client browser first parses the status line for the status code, which indicates whether the request succeeded. It then parses each response header; the headers indicate that what follows is an HTML document of a given number of bytes and give the document's character set. The browser reads the response data (HTML), formats it according to HTML syntax, and displays it in the browser window.
        For example, when you type a URL into the browser's address bar and press Enter, the following happens:
        1. The browser asks the DNS server to resolve the IP address corresponding to the domain name in the URL;
        2. Once the IP address is resolved, the browser establishes a TCP connection with the server using that IP address and the default port 80;
        3. The browser sends an HTTP request to read the file (the file corresponding to the part of the URL after the domain name); the request message can be sent to the server as the data of the third segment of the TCP three-way handshake;
        4. The server responds to the browser's request and sends the corresponding HTML text to the browser;
        5. The TCP connection is released;
        6. The browser displays the HTML text and content.
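
The connect, send, receive, and release steps can be followed with a plain TCP socket. In this sketch a small local server stands in for the web server, so there is no DNS lookup and the port is chosen by the operating system instead of 80; the handler and its page are invented for the demonstration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>hi</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Step 1: establish the TCP connection.
with socket.create_connection((host, port), timeout=5) as sock:
    # Step 2: send the request text (request line, headers, blank line).
    request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    # Steps 3 and 4: read until the server closes the connection.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)

server.shutdown()
server.server_close()

# Step 5: split the response and look at the status line first.
response = b"".join(chunks)
head, _, body = response.partition(b"\r\n\r\n")
status_line = head.split(b"\r\n")[0].decode("ascii")
print(status_line)
print(body.decode("utf-8"))
```

Because the request asks for Connection: close, the end of the response is signalled by the server closing the socket; with keep-alive the client would rely on Content-Length instead.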

10. The difference between GET and POST requests

        GET request
        GET /books/?sex=man&name=Professional HTTP/1.1
        Host: www.wrox.com
        User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
        Connection: Keep-Alive
        (Note that the last line is a blank line.)
        POST request
        POST / HTTP/1.1
        Host: www.wrox.com
        User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
        Content-Type: application/x-www-form-urlencoded
        Content-Length: 40
        Connection: Keep-Alive

        name=Professional%20Ajax&publisher=Wiley
        1. Where the data goes
        GET submission: the request data is appended to the URL (that is, it is carried in the request line of the HTTP message). The URL and the data are separated by "?", and multiple parameters are joined with "&", for example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as they are; a space is converted to "+"; Chinese and other characters are percent-encoded (URL-encoded, not BASE64), giving strings such as %E4%BD%A0%E5%A5%BD, where XX in %XX is the hexadecimal value of one byte of the encoded character.
        POST submission: the submitted data is placed in the body of the HTTP message. In the POST example above, the last line (name=Professional%20Ajax&publisher=Wiley) is the actual transmitted data.
        Therefore, data submitted by GET is displayed in the address bar, while data submitted by POST does not change the address bar.
        2. Size of the transmitted data
        First of all, the HTTP specification limits neither the size of the transmitted data nor the length of the URL.
        The main limitations in actual development are:
        GET: Certain browsers and servers limit the length of URLs. For example, IE limits URLs to 2083 bytes (2K+35). Other browsers, such as Netscape and Firefox, have no length limit in theory; the limit depends on operating system support.
        Therefore, for GET submission, the transmitted data is limited by the length of the URL.
        POST: Since the values are not passed through the URL, the data is theoretically unlimited. In practice, each web server limits the size of POST data, and Apache and IIS6 each have their own configuration for it.
        3. Security
        POST is safer than GET in the sense that the submitted data stays out of the URL; neither method encrypts the data. For example, if a login page submits data via GET, the username and password appear in the URL in plain text, so (1) the login page may be cached by the browser, and (2) anyone who views the browser's history can obtain the account and password. In addition, submitting data with GET may also enable a cross-site request forgery attack.
        4. HTTP GET, POST, and SOAP all run over HTTP
        (1) GET: The request parameters are appended to the URL as a sequence of key/value pairs (the query string). The length of the query string is limited by web browsers and web servers (IE, for example, supports only about 2K characters), so it is not suitable for transferring large data sets. It is also very unsafe.
        (2) POST: The request parameters are transmitted in a different part of the HTTP message (the entity body), which is used to carry form information, so Content-Type must be set to application/x-www-form-urlencoded. POST is designed to support user fields on web forms, and its parameters are also transmitted as key/value pairs. However, it does not support complex data types, because POST does not define the semantics and rules for transferring data structures.
        (3) SOAP: It is a specialized form of HTTP POST that follows a special XML message format. Content-Type is set to text/xml, and any data can be expressed as XML.
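
The encoding rules for URL data can be tried out with Python's standard urllib.parse module. This short sketch shows that %E4%BD%A0%E5%A5%BD is the percent-encoded UTF-8 form of a two-character Chinese string, and that a space may be written either as "+" or as "%20"; the parameter values are the ones used in this article's examples.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

# Non-ASCII text: each UTF-8 byte is written as %XX (hexadecimal).
print(quote("你好"))
print(unquote("%E4%BD%A0%E5%A5%BD"))

# A space becomes "+" in form-style encoding, or "%20" in plain percent-encoding.
print(quote_plus("Professional Ajax"))
print(quote("Professional Ajax"))

# urlencode joins key/value pairs with "&" and encodes each value.
query = urlencode({"name": "hyddd", "password": "idontknow", "verify": "你好"})
print("login.action?" + query)
```

Percent-encoding is a reversible transport encoding, not encryption: anyone who sees the URL can decode it with unquote.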
HTTP defines many methods for interacting with the server. The four basic ones are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE correspond to the four operations of querying, modifying, adding, and deleting that resource. The most common are GET and POST: GET is generally used to obtain or query resource information, while POST is generally used to update it.
To summarize the differences between GET and POST:
Data submitted by GET is placed after the URL. The URL and the data are separated by "?", and the parameters are joined with "&", as in EditPosts.aspx?name=test1&id=123456. The POST method puts the submitted data in the body of the HTTP message.
The size of data submitted by GET is limited (because browsers limit the length of the URL), while data submitted by POST has no such limit.
On the server side (in ASP and ASP.NET), values submitted by GET are read with Request.QueryString, and values submitted by POST are read with Request.Form.
Submitting data by GET can cause security problems. On a login page, for example, the username and password appear in the URL when submitted by GET. If the page can be cached, or if other people can access the machine, the user's account and password can be obtained from the history.
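
Request.QueryString and Request.Form belong to ASP/ASP.NET. As a rough Python analogue, and only as a sketch, the two helpers below (query_params and form_params are hypothetical names) read the same kind of key/value data from the URL of a GET request and from the body of a form-encoded POST request.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(target: str) -> dict:
    """Values sent by GET: taken from the part of the URL after '?'."""
    return parse_qs(urlsplit(target).query)

def form_params(body: bytes) -> dict:
    """Values sent by POST as application/x-www-form-urlencoded: taken from the body."""
    return parse_qs(body.decode("ascii"))

get_values = query_params("/EditPosts.aspx?name=test1&id=123456")
post_values = form_params(b"name=Professional%20Ajax&publisher=Wiley")

print(get_values)
print(post_values)
```

The encoding of the data is the same in both cases; only its location in the message differs.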
