Introduction to Web Server and HTTP Protocol

1. Web Server

A web server is either the server software (a program) or the hardware (a computer) that runs it. Its main job is to communicate with clients, usually browsers, over HTTP: it receives, stores, and processes HTTP requests from the client, and answers each request with an HTTP response that carries the requested content (files, web pages, and so on) or an error message.
Users typically reach a server through a web browser. When you type a domain name or an "IP address:port" pair into the browser, the browser first resolves the domain name to an IP address (or uses the IP address directly if one was given). It then establishes a connection to the target web server through the TCP three-way handshake, builds an HTTP request message, and sends that message to the server over TCP, IP, and the lower-layer protocols.
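
As a rough illustration of the first step described above, the following Python sketch splits a URL into the host, port, and path a browser would use, and asks the system resolver for the host's IP addresses. The names split_target, resolve, and DEFAULT_PORTS are made up for this example; a real browser also consults caches and does considerably more.

```python
import socket
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https URLs are handled.
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_target(url):
    """Return (host, port, path) for the server a browser would contact."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"          # an empty path means the root resource
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path

def resolve(host, port):
    """Ask the system resolver (DNS for domain names) for candidate IP addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]

if __name__ == "__main__":
    print(split_target("http://www.baidu.com"))
    print(split_target("http://127.0.0.1:8080/index.html"))
```

Calling resolve("www.baidu.com", 80) would perform the actual DNS lookup; it is left out of the demo so the sketch runs without network access.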

2. HTTP protocol (application layer protocol)

2.1 Introduction

Hypertext Transfer Protocol (HTTP) is a simple request-response protocol that usually runs on top of TCP. It specifies what kinds of messages a client may send to a server and what kinds of responses it gets back. The headers of request and response messages are given in ASCII form; the message contents have a MIME-like format. HTTP is the basis for data communication on the World Wide Web. Its development was initiated by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in 1989. Standardization of HTTP has been coordinated by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and has produced a series of RFCs. The best known is RFC 2616, published in June 1999, which defines HTTP/1.1, a version of the protocol that is still widely used today. (RFC 2616 itself has since been superseded by later RFCs, most recently RFC 9110 and RFC 9112.)

2.2 Overview

HTTP is a standard for requests and responses between a client (the user side) and a server (the website side), usually carried over TCP. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The server that answers stores resources such as HTML files and images, and is called the origin server. There may be several intermediaries between the user agent and the origin server, such as proxies, gateways, or tunnels.
Although TCP/IP is the most widely used protocol suite on the Internet, HTTP does not require it or any particular set of underlying layers. In principle, HTTP can be implemented on top of any Internet protocol or other network. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers that guarantee can be used. Within the TCP/IP suite, this is why HTTP uses TCP as its transport layer.
Usually, the HTTP client initiates a request by opening a TCP connection to the specified port on the server (port 80 by default), where the HTTP server is listening for client requests. Once the request arrives, the server returns a status line such as "HTTP/1.1 200 OK" followed by the content: the requested file, an error message, or other information.
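
The exchange just described can be sketched with plain TCP sockets in Python. This is a minimal illustration, not a real server: serve_once, fetch, and demo are names invented here, the server answers a single request with a fixed body, and it listens on a free loopback port chosen by the operating system rather than on port 80.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, read the request head, send a fixed response."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:   # read until the blank line ending the headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        body = b"hello"
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)

def fetch(host, port, path="/"):
    """Send a minimal GET request and return the raw response bytes."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:                      # read until the server closes the connection
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

def demo():
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    raw = fetch("127.0.0.1", port)
    worker.join(timeout=5)
    listener.close()
    return raw

if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The first line of the returned bytes is the status line, and everything after the blank line is the returned content.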

2.3 Working principle

HTTP defines how a web client requests a web page from a web server and how the server delivers that page to the client. It uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (protocol version, a success or error code, and a reason phrase), followed by response headers that include server information, and the response data.
The steps of an HTTP request/response exchange are as follows:

1. Client connects to web server

    An HTTP client, usually a browser, opens a TCP socket connection to the web server's HTTP port (80 by default), for example for the URL http://www.baidu.com.

2. Send HTTP request

    Through the TCP socket, the client sends a text request message to the web server. A request message consists of four parts: ① request line, ② request headers, ③ blank line, and ④ request data.

3. The server accepts the request and returns an HTTP response

    The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, from which the client reads it. A response consists of four parts: ① status line, ② response headers, ③ blank line, and ④ response data.

4. Release the TCP connection

  • If the connection mode is close, the server actively closes the TCP connection and the client closes its side in turn, releasing the connection;
  • If the connection mode is keep-alive, the connection is kept open for a period of time, during which it can carry further requests.

5. Client browser parses HTML content

  • The client browser first parses the status line and checks the status code to see whether the request succeeded.
  • It then parses each response header; the headers tell it, among other things, the length in bytes of the HTML document that follows and the document's character set.
  • Finally, the browser reads the HTML in the response data, lays it out according to HTML syntax, and displays it in the browser window.

    For example, typing a URL into the browser address bar and pressing Enter triggers the following process:

  • 1. The browser asks a DNS server to resolve the domain name in the URL to an IP address;
  • 2. Once the IP address is known, the browser establishes a TCP connection to the server using that address and the default port 80;
  • 3. The browser issues an HTTP request to read a file (the one identified by the part of the URL after the domain name); the request message can be sent to the server as the data carried by the third segment of the TCP three-way handshake;
  • 4. The server responds to the browser's request and sends the corresponding HTML text to the browser;
  • 5. The TCP connection is released;
  • 6. The browser displays the HTML document in the browser window.
HTTP is an application layer protocol built on top of TCP/IP and based on the request-response model. It stipulates that the client issues a request and the server then returns a response to it. In other words, communication always starts from the client; the server does not send a response until it has received a request.
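
The keep-alive behavior from step 4 can be observed with Python's standard http.server and http.client modules. In this sketch (EchoPortHandler, get_port_seen_by_server, and demo are illustrative names, not part of any library), the server replies with the client's source port as it sees it. Two requests sent over one HTTPConnection report the same port, which shows that they shared one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # lets http.server keep connections open

    def do_GET(self):
        # The client's source port stays the same while the TCP connection is reused.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def get_port_seen_by_server(conn):
    conn.request("GET", "/")
    response = conn.getresponse()
    return response.read().decode("ascii")  # read fully so the connection can be reused

def demo():
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        first = get_port_seen_by_server(conn)
        second = get_port_seen_by_server(conn)   # same TCP connection as the first
        conn.close()

        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        third = get_port_seen_by_server(conn)    # a fresh TCP connection
        conn.close()
        return first, second, third
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```

The third value comes from a new connection and will normally differ from the first two, since the operating system assigns a new source port.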

3. HTTP request/response message format

3.1 HTTP request format 

  • GET 
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Cookie: BAIDUID=6729CB682DADC2CF738F533E35162D98:FG=1; BIDUPSID=6729CB682DADC2CFE015A8099199557E; PSTM=1614320692; BD_UPN=13314752; BDORZ=FFFB88E999055A3F8A630C64834BD6D0; __yjs_duid=1_d05d52b14af4a339210722080a668ec21614320694782; BD_HOME=1; H_PS_PSSID=33514_33257_33273_31660_33570_26350; BA_HECTOR=8h2001alag0lag85nk1g3hcm60q
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
(blank line)
(request data is empty)

  • POST
POST / HTTP/1.1
Host:www.wrox.com
User-Agent:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Content-Type:application/x-www-form-urlencoded
Content-Length:40
Connection: Keep-Alive
(blank line)
name=Professional%20Ajax&publisher=Wiley
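
To make the four-part layout concrete, here is a small Python sketch that assembles a request message from its parts. The helper names build_request and encode_form are assumptions made for this example; the function fills in Content-Length from the body when the caller does not supply it.

```python
from urllib.parse import quote, urlencode

def encode_form(fields):
    """Encode form fields as application/x-www-form-urlencoded bytes."""
    # quote_via=quote encodes spaces as %20, matching the POST example above.
    return urlencode(fields, quote_via=quote).encode("ascii")

def build_request(method, path, headers, body=b""):
    """Assemble request line + header lines + blank line + request data."""
    lines = [f"{method} {path} HTTP/1.1"]            # 1. request line
    header_items = dict(headers)
    if body:
        header_items.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in header_items.items()]  # 2. headers
    head = "\r\n".join(lines) + "\r\n\r\n"           # 3. blank line ends the head
    return head.encode("ascii") + body               # 4. request data

if __name__ == "__main__":
    form = encode_form({"name": "Professional Ajax", "publisher": "Wiley"})
    message = build_request(
        "POST",
        "/",
        {"Host": "www.wrox.com", "Content-Type": "application/x-www-form-urlencoded"},
        form,
    )
    print(message.decode("ascii"))
```

For the form data used in the POST example, the computed Content-Length is 40, which matches the header shown above.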

 

3.2 HTTP response message format

HTTP/1.1 200 OK
Bdpagetype: 1
Bdqid: 0xf3c9743300024ee4
Cache-Control: private
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html;charset=utf-8
Date: Fri, 26 Feb 2021 08:44:35 GMT
Expires: Fri, 26 Feb 2021 08:44:35 GMT
Server: BWS/1.1
Set-Cookie: BDSVRTM=13; path=/
Set-Cookie: BD_HOME=1; path=/
Set-Cookie: H_PS_PSSID=33514_33257_33273_31660_33570_26350; path=/; domain=.baidu.com
Strict-Transport-Security: max-age=172800
Traceid: 1614329075128412289017566699583927635684
X-Ua-Compatible: IE=Edge,chrome=1
Transfer-Encoding: chunked
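
A response like the one above can be taken apart in the same way. The following Python sketch (parse_response is a hypothetical helper, not a library function) splits raw response bytes into the status line fields, the headers, and the body. Headers are kept as a list of pairs because a name such as Set-Cookie may appear more than once. The sketch does not decode a chunked body, so for a response sent with Transfer-Encoding: chunked the returned body would still contain the chunk framing.

```python
def parse_response(raw):
    """Split raw response bytes into (version, status, reason, headers, body)."""
    head, _sep, body = raw.partition(b"\r\n\r\n")    # the blank line separates head and body
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version, 3-digit code, and an optional reason phrase.
    parts = lines[0].split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = []
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return version, status, reason, headers, body

if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html;charset=utf-8\r\n"
        b"Set-Cookie: BDSVRTM=13; path=/\r\n"
        b"Set-Cookie: BD_HOME=1; path=/\r\n"
        b"\r\n"
        b"<html></html>"
    )
    print(parse_response(sample))
```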

4. HTTP request methods

HTTP/1.1 defines a total of eight methods (also called "actions") that operate on the specified resource in different ways:

  • 1. GET: Requests a representation of the specified resource. GET should only be used to read data, not for operations with side effects (for example, state changes in a web application). One reason is that GET URLs may be fetched at any time by web crawlers and similar tools.
  • 2. HEAD: Like GET, asks the server for the specified resource, except that the server does not return the body. This lets the client obtain information about the resource (metainformation, or metadata) without transferring the entire content.
  • 3. POST: Submits data to the specified resource for the server to process (for example, submitting a form or uploading a file). The data is carried in the request body. The request may create a new resource, modify an existing one, or both.
  • 4. PUT: Uploads the latest content to the specified resource location.
  • 5. DELETE: Asks the server to delete the resource identified by the Request-URI.
  • 6. TRACE: Echoes back the request as received by the server; mainly used for testing or diagnosis.
  • 7. OPTIONS: Asks the server to return the HTTP request methods that the resource supports. Sending an OPTIONS request with '*' in place of the resource name tests whether the server itself is functioning properly.
  • 8. CONNECT: Reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel. Typically used to reach SSL/TLS-encrypted servers through an unencrypted HTTP proxy.
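
The difference between GET and HEAD can be seen with a small local experiment in Python. This is only a sketch: PageHandler, request, and demo are names invented here, and the server is the standard library's http.server bound to a free loopback port. The handler implements GET and HEAD only, so any other method (DELETE in the demo) is answered with 501 by http.server's default handling.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"

class PageHandler(BaseHTTPRequestHandler):
    def _send_head(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_head()
        self.wfile.write(PAGE)      # headers plus body

    def do_HEAD(self):
        self._send_head()           # same headers, no body

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def request(port, method):
    """Return (status, Content-Length header, body bytes) for one request."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result

def demo():
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return {m: request(port, m) for m in ("GET", "HEAD", "DELETE")}
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    for method, (status, length, body) in demo().items():
        print(method, status, length, len(body))
```

GET and HEAD report the same Content-Length, but only GET transfers the body.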

5. HTTP status code

The first line of every HTTP response is the status line, which consists of the HTTP version, a 3-digit status code, and a phrase describing the status, separated by spaces. The first digit of the status code indicates the class of the response:

  • 1xx Informational: the request has been received by the server and processing continues.
  • 2xx Success: the request has been successfully received, understood, and accepted by the server.
  • 3xx Redirection: further action is required to complete the request.
  • 4xx Client Error: the request contains a syntax error or cannot be fulfilled.
  • 5xx Server Error: the server encountered an error while processing an apparently valid request.

Although RFC 2616 recommends phrases to describe each status, such as "200 OK" and "404 Not Found", web developers can still choose their own phrases, for example to show localized status descriptions or custom information.
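
Because the phrase is free-form, a client should classify a response by the first digit of the code rather than by the phrase. A minimal Python sketch of that idea follows; parse_status_line, status_class, and STATUS_CLASSES are illustrative names, while http.HTTPStatus is the standard library's table of recommended phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line):
    """Split a status line into (version, code, reason phrase)."""
    version, code, *rest = line.split(" ", 2)   # the phrase itself may contain spaces
    reason = rest[0] if rest else ""
    return version, int(code), reason

def status_class(code):
    """Classify a status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

if __name__ == "__main__":
    version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
    print(version, code, reason, "->", status_class(code))
    # The recommended phrase for a code, independent of what the server sent:
    print(HTTPStatus(code).phrase)
```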

Origin blog.csdn.net/weixin_41987016/article/details/132610837