httpd related terms

socket

When a server and a client communicate, each end is identified by an IP address and a port number. The combination of the two is the socket address.
(Reproduced from Baidu Encyclopedia)

A socket sits at the abstraction layer between the application layer and the transport layer. It is really a programming interface (API): it defines how application-layer programs hand data to, and receive data from, the transport layer.

Socket communication principle

Take httpd as an example:
Server:

  • 1. Create a new socket (socket)
  • 2. Bind the socket to port 80 (bind)
  • 3. Put the socket into listening mode (listen)
  • 4. Wait for a client to connect

Client:

  • 1. Create a new socket
  • 2. Connect to the server's ip:port

Server:

  • 1. Detect the incoming client connection
  • 2. Accept the client's connection (accept)

Client:

  • 1. The connection is established.
  • 2. Send the request message
  • 3. Wait for the server's response message

Server:

  • 1. Read the client's request message
  • 2. After processing is complete, send a response message

Client:

  • 1. Receive the server's response message
  • 2. End the request and close the connection

Server:

  • 1. Close the connection with the client
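
The exchange above can be sketched with Python's standard socket module. This is only a minimal illustration, not how httpd itself is implemented: the function names (serve_once, fetch, demo) are made up for this example, the server handles a single client, and it binds to a free loopback port instead of port 80 so that it runs without root privileges.

```python
import socket
import threading


def serve_once(server_sock):
    """Server side: accept one client, read its request, reply, close."""
    conn, _addr = server_sock.accept()        # accept the client's connection
    with conn:
        conn.recv(1024)                       # read the client's request message
        body = b"hello"
        # send a response message after "processing" the request
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\n" + body)
    # leaving the with-block closes the connection with the client


def fetch(host, port):
    """Client side: connect, send a request, read the response, close."""
    with socket.create_connection((host, port)) as client:   # socket + connect
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")             # request message
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:                      # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    """Run one request/response round trip on the loopback interface."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server_sock.bind(("127.0.0.1", 0))        # bind (port 0 = any free port)
    server_sock.listen(1)                     # listen
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    response = fetch("127.0.0.1", port)
    worker.join()
    server_sock.close()
    return response
```

Calling demo() returns the raw response bytes, status line first and the body last. A real server would loop over accept() and serve many clients concurrently.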

http

HyperText Transfer Protocol (HTTP) is the most widely used network protocol on the Internet. All WWW documents must comply with this standard. HTTP was originally designed to provide a way to publish and receive HTML pages.

HTTP is a communication protocol that runs on top of TCP/IP and transfers data (HTML files, image files, query results, etc.).

HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends every request to the HTTP server (the web server) through a URL.

HTTP has three characteristics:

  • HTTP is connectionless: in its original form, each connection handles only one request and is closed once the request has been processed.
  • HTTP is media-independent: as long as the client and the server know how to handle the content, any type of data can be transmitted over HTTP. The client and the server indicate the content type with MIME types.
  • HTTP is stateless: the protocol keeps no memory of earlier transactions. Every visit is handled as if it were the first.

html

HTML stands for Hypertext Markup Language; it is a markup language made up of a series of tags. These tags give documents on the network a uniform format and connect scattered Internet resources into a logical whole. An HTML document is descriptive text composed of HTML tags, which can describe text, graphics, animations, sounds, tables, links, etc.

MIME

The earliest version of the http protocol could only transmit text, much like early e-mail, which could not carry attachments. MIME (Multipurpose Internet Mail Extensions) was created to lift that limit for mail, and adopting it gave the http protocol a new lease of life: it can now carry images, sound and so on. The server labels each response with a MIME type in the Content-Type header, and modern browsers use it to decide how to handle the content.
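
As a small illustration of how a MIME type is chosen, here is a sketch using Python's standard mimetypes module, which guesses the type from the file extension. The helper name content_type_for is made up for this example; real servers such as httpd use their own configuration for this mapping.

```python
import mimetypes


def content_type_for(filename):
    """Guess the Content-Type value for a file from its extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


print(content_type_for("index.html"))
print(content_type_for("logo.png"))
```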

HTTP protocol introduction (reproduced)

HTTP/0.9 is the first version of the HTTP protocol and is now obsolete. It is extremely simple: the client may only send a GET request, and request headers are not supported. Because there are no protocol headers, HTTP/0.9 supports only one kind of content, namely text. The page can still be formatted with HTML, but images cannot be embedded.

HTTP/0.9 is a typical stateless protocol: each transaction is processed independently, and the connection is released when the transaction ends. The stateless nature of HTTP had therefore already taken shape in version 0.9. An HTTP/0.9 transfer first establishes a TCP connection from the client to the web server; the client sends its request, the web server returns the page content, and the connection is closed. If the requested page does not exist, no error code is returned.

HTTP/1.0

HTTP/1.0 is the second version of the HTTP protocol and the first to state the version number in the communication. It is still widely used. Compared with HTTP/0.9, the following main features were added:

  • Requests and responses support header fields
  • The response begins with a status line
  • The response object is not limited to hypertext
  • Clients can submit data to the web server through the POST method
  • The GET, HEAD, and POST methods are supported
  • Long connections (short connections remain the default), a caching mechanism, and identity authentication are supported

HTTP/1.1

The third version of the HTTP protocol is HTTP/1.1, currently the mainstream and most widely used version. Compared with HTTP/1.0, the following content has been added:

(1) Persistent connections by default. HTTP/1.1 supports persistent connections and request pipelining. Multiple HTTP requests and responses can be carried over one TCP connection, which reduces the cost and latency of repeatedly opening and closing connections. In HTTP/1.1 a connection is kept alive by default (the behavior that HTTP/1.0 clients had to request with Connection: keep-alive), which to some extent makes up for the HTTP/1.0 shortcoming of creating a new connection for every request.

(2) Range requests (bandwidth optimization). HTTP/1.0 wastes bandwidth in some cases: for example, the client needs only part of an object, but the server sends the whole object, and there is no way to resume an interrupted transfer. HTTP/1.1 introduces the Range request header, which lets the client request only a certain part of a resource; the server then answers with status code 206 (Partial Content). This lets developers make fuller use of bandwidth and connections, and it is the basis of resumable file transfer.
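
To make the Range mechanism concrete, here is a rough sketch of how a server could answer a single byte range. The function serve_range is hypothetical and deliberately simplified: it only understands the forms bytes=START-END and bytes=START-, and it ignores suffix ranges and multi-range requests that real servers also handle.

```python
def serve_range(body, range_header=None):
    """Return (status, headers, payload) for an optional single byte range."""
    total = len(body)
    full = (200, {"Content-Length": str(total)}, body)
    if not range_header:
        return full                       # no Range header: send everything
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    start_text, end_text = start_text.strip(), end_text.strip()
    if unit.strip() != "bytes" or not start_text.isdigit():
        return full                       # unsupported form: ignore the range
    start = int(start_text)
    # "bytes=START-" means "from START to the end", i.e. resuming a download
    end = int(end_text) if end_text.isdigit() else total - 1
    end = min(end, total - 1)
    if start > end:
        # the requested range lies outside the resource
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    chunk = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk
```

A client that already holds the first 7 bytes of a 10-byte file would send Range: bytes=7- and get back only the last 3 bytes with status 206.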

(3) Virtual hosts (the Host header). HTTP/1.0 assumes that each server is bound to a unique IP address, so the URL in the request message does not carry the hostname. With the development of virtual host technology, however, one physical server can host multiple virtual hosts (multi-homed web servers) that share an IP address. HTTP/1.1 request messages must include the Host header field; if a request has no Host header field, the server reports an error (400 Bad Request).
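
The idea of choosing a site by the Host header can be sketched as follows. The table VIRTUAL_HOSTS, its directory paths, DEFAULT_ROOT and the function route_by_host are all invented for illustration; httpd does this through its own VirtualHost configuration. The sketch also assumes host names without IPv6 literals.

```python
# Hypothetical mapping from host name to document root.
VIRTUAL_HOSTS = {
    "www.a.com": "/var/www/a",
    "www.b.com": "/var/www/b",
}
DEFAULT_ROOT = "/var/www/html"


def route_by_host(headers):
    """Return (status, document_root) for a dict of request headers."""
    host = None
    for name, value in headers.items():
        if name.lower() == "host":        # header names are case-insensitive
            host = value.strip().lower()
    if not host:
        return 400, None                  # HTTP/1.1 request without Host
    hostname = host.split(":")[0]         # drop an optional ":port" suffix
    return 200, VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)
```

Two sites that share one IP address are told apart only by this header, which is why a missing Host field is treated as a bad request.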

(4) More cache-control fields. Building on 1.0, HTTP/1.1 adds new caching features: it introduces entity tags, usually called ETags, and a more powerful Cache-Control header.

(5) Error notification. HTTP/1.1 adds 24 error status codes. For example, 409 (Conflict) indicates that the request conflicts with the current state of the resource, and 410 (Gone) indicates that a resource on the server has been permanently deleted.

HTTP/2.0

The fourth version of the HTTP protocol is HTTP/2.0, which adds the following content relative to HTTP/1.1:

Binary framing: HTTP/2.0 uses binary encoding for all frames.

Frame: The client and the server communicate by exchanging frames, and the frame is the smallest unit of communication based on this new protocol.

Message: Refers to a logical HTTP message, such as request, response, etc., composed of one or more frames.

Stream: A stream is a virtual channel within the connection that can carry messages in both directions; each stream has a unique integer identifier (1, 2 … N).

Multiplexing: Multiple request-response messages can be in flight at the same time over a single HTTP/2.0 connection. With the new framing mechanism, HTTP/2.0 no longer relies on multiple TCP connections to handle concurrent requests. Each data stream is split into many independent frames; frames from different streams can be interleaved on the wire and can be prioritized. At the other end they are reassembled according to the stream identifier in each frame header. HTTP/2.0 connections are persistent, and only one connection per domain name is required between the client and the server.
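
Frames, streams and reassembly can be illustrated with a toy parser. It relies on the 9-byte frame header defined by the HTTP/2 specification (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier). Everything else is a simplification: the function names are made up, only DATA frames are handled, and padding, HEADERS frames and flow control are ignored.

```python
import struct
from collections import defaultdict

FRAME_HEADER_LEN = 9
DATA = 0x0  # frame type of DATA frames


def pack_frame(frame_type, flags, stream_id, payload):
    """Build one frame: 9-byte header followed by the payload."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,     # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,            # reserved bit cleared
    )
    return header + payload


def iter_frames(buffer):
    """Yield (type, flags, stream_id, payload) for each complete frame."""
    offset = 0
    while offset + FRAME_HEADER_LEN <= len(buffer):
        hi, lo, frame_type, flags, raw_id = struct.unpack_from(
            ">BHBBI", buffer, offset)
        length = (hi << 16) | lo
        start = offset + FRAME_HEADER_LEN
        yield frame_type, flags, raw_id & 0x7FFFFFFF, buffer[start:start + length]
        offset = start + length


def reassemble(buffer):
    """Group interleaved DATA payloads by stream identifier."""
    streams = defaultdict(bytes)
    for frame_type, _flags, stream_id, payload in iter_frames(buffer):
        if frame_type == DATA:
            streams[stream_id] += payload
    return dict(streams)
```

If frames of stream 1 and stream 3 arrive interleaved on one connection, reassemble() still returns each stream's bytes in order, which is the essence of multiplexing.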

Header compression: HTTP/1.1 headers carry a large amount of information that must be resent with every request. HTTP/2.0 has both sides of the connection maintain a table of header fields, so fields that were already sent do not have to be transmitted again in full.

Request priority: The browser can dispatch requests as soon as it discovers resources, assign a priority to each stream, and let the server determine the best order in which to respond. Requests therefore do not have to queue, which saves time and makes the most of each connection.

Server-side push: The server can send resources the client will need along with index.html, so the client does not have to request them separately. Because the client skips the round trip of issuing those requests, pushing static resources from the server can speed up page loading considerably.

web resources

A web page is composed of multiple resources, and opening it triggers many requests, because each resource must be requested separately. So the web pages we usually see are not a single resource but a collection of resources:

  • Static resources: The server does not need to do any processing on the files. Present the original file directly to the user.
  • Dynamic resources: The server needs to process the file first and then present it to the user.

URI

A Uniform Resource Identifier (URI) is a string used to identify an Internet resource. URIs are divided into URLs and URNs.

A Uniform Resource Name (URN) identifies an Internet resource by name. Like a person's name, it only says who the resource is; it does not give the address where it can be found.

A Uniform Resource Locator (URL) specifies the location of a piece of information on the World Wide Web. It defines where to find a resource and how to access it.

URL composition

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

  • scheme: the protocol to use when accessing the server
  • user: the username required to access certain resources
  • password: the password required to access certain resources
  • host: the hostname or ip address of the machine where the resource is located
  • port: the port number on which the server that holds the resource is listening
  • path: the local name of the resource on the server, for example the /index.html in www.a.com/index.html
  • params: input parameters given as name/value pairs; multiple parameters are separated by ;
  • query: parameters passed to the program, introduced by ?; multiple queries are separated by &. For example ?ie=utf-8&f=8&rsv_bp=1
  • frag: fragment, the name of a small piece or part of the resource; this component is used on the client side and is introduced by #
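
The components listed above can be pulled apart with Python's standard urllib.parse module. The helper split_url and the sample URL are made up for this illustration.

```python
from urllib.parse import urlparse, parse_qs


def split_url(url):
    """Break a URL into the components named in the template above."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "params": parts.params,
        "query": parse_qs(parts.query),   # name -> list of values
        "frag": parts.fragment,
    }


example = "http://tom:secret@www.a.com:8080/index.html;type=a?ie=utf-8&f=8#top"
for name, value in split_url(example).items():
    print(name, "=", value)
```

Note that the fragment is only visible here because the whole URL string is parsed locally; a browser does not send the #frag part to the server.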

Website visits

IP: The number of distinct client ip addresses that visited the website. Repeated visits from the same client ip address within one day are counted only once. It is an important indicator for measuring traffic.

PV: Page views, the number of page loads or clicks. It is counted again every time the user refreshes. A high PV does not necessarily mean more visitors: PV tends to grow with the number of visitors, but it does not determine the number of real visitors to the page. For example, a single person can create a very high PV just by refreshing a page over and over.

UV: Unique Visitor. Unique visitors means the number of computers from which a site has been visited, identified by the cookie on the user's computer.
Unique visitors are close to, but not exactly, the number of real individual people. The indicator is also affected by browser settings, such as disabling cookies or disabling third-party cookies. Most web analytics tools use first-party cookies to minimize the effect of disabled cookies (the disabled percentage is probably between 2% and 5%). Third-party cookies are disabled more often (probably between 10% and 30%).

The time standard for recording the number of unique visitors can generally be one day or one month. According to international practice, the standard for recording the number of unique visitors is generally "one day", that is, if a visitor visits a website n times from the same IP address in a day, the number of visits is counted as n, and the number of unique visitors is counted as 1. Generally, the annual UV number is not calculated.
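
The three indicators can be computed from one day's hits in a few lines. This is only a sketch: the record layout (ip address, visitor cookie id, page) and the function traffic_stats are assumptions made for the example, not the format of any particular analytics tool.

```python
def traffic_stats(hits):
    """Compute PV, UV and IP for one day's hits.

    hits: iterable of (ip, visitor_cookie, page) tuples.
    """
    hits = list(hits)
    return {
        "PV": len(hits),                                    # every page load
        "UV": len({cookie for _ip, cookie, _page in hits}),  # distinct cookies
        "IP": len({ip for ip, _cookie, _page in hits}),      # distinct addresses
    }


sample = [
    ("10.0.0.1", "cookie-a", "/index.html"),
    ("10.0.0.1", "cookie-a", "/index.html"),   # same visitor refreshes the page
    ("10.0.0.1", "cookie-b", "/about.html"),   # second computer behind same ip
    ("10.0.0.2", "cookie-c", "/index.html"),
]
print(traffic_stats(sample))
```

In the sample, refreshing raises PV but not UV, and two computers behind one address raise UV but not IP.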

cookie

http is a stateless protocol and by itself cannot track the state of a user. Yet when you put something in a shopping cart, it stays there until you delete it or check out. That is thanks to the cookie. When the user visits the website, the browser sends the cookie information stored for that site to the server along with the request. The server uses the cookie information to recognize the user's state and retrieve the earlier information.

[root@localhost httpd]# cat /var/www/html/test.php
<?php
   // Send a cookie named "test" with the value "Hello" that expires in one hour (3600 seconds)
   setcookie("test", "Hello", time()+3600);
?>
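
For comparison, the header exchange behind such a cookie can be sketched with Python's standard http.cookies module. The helper names are made up, and the sketch uses the Max-Age attribute to express the one-hour lifetime, which is close to but not identical to what the PHP call above emits.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age):
    """Server side: produce the Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age     # lifetime in seconds
    return cookie.output()


def read_cookie(cookie_header, name):
    """Server side: read one value from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel else None


print(build_set_cookie("test", "Hello", 3600))
print(read_cookie("test=Hello; theme=dark", "test"))
```

The first function is what the server sends once; the second is what it reads on every later request, which is how state survives on top of a stateless protocol.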

sendfile mechanism

Traditional network transmission process:

  • 1. The read() system call causes a context switch from user space to kernel space. The file data is then copied from the hard disk into a kernel buffer.
  • 2. The data is copied from the kernel buffer into the user buffer, and read() returns. This is a second context switch, from kernel space back to user space.
  • 3. The write() system call causes another context switch, from user mode to kernel mode. The data read into the user buffer in step 2 is copied into the kernel again, but this time into a different kernel buffer, one associated with the socket.
  • 4. write() returns, causing a fourth context switch, from kernel space back to user space. The data is then copied from the socket's kernel buffer to the protocol stack.

With the sendfile mechanism:

  • 1. The sendfile() system call copies the hard disk data into a kernel buffer through DMA, and the kernel then copies the data directly into another kernel buffer associated with the socket. Apart from entering and leaving sendfile() itself, there is no switching between user mode and kernel mode, and the copy from one buffer to the other is done entirely inside the kernel.
  • 2. DMA copies the data from the socket's kernel buffer to the protocol stack. No further switch is needed, and no data is copied between user mode and kernel mode, because the data never leaves the kernel.
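
The two paths can be compared in a short sketch. Python's socket.sendfile() uses the operating system's sendfile mechanism where it is available and otherwise falls back to ordinary send() calls, so it only approximates what httpd does in C. The function names and the roundtrip helper are made up for this example.

```python
import os
import socket
import tempfile


def send_with_copy(sock, path, chunk_size=4096):
    """Traditional path: read() into a user-space buffer, then send it."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)    # kernel buffer -> user buffer
            if not chunk:
                break
            sock.sendall(chunk)           # user buffer -> socket buffer
            sent += len(chunk)
    return sent


def send_with_sendfile(sock, path):
    """sendfile path: the file data is not copied through a Python buffer."""
    with open(path, "rb") as f:
        return sock.sendfile(f)           # returns the number of bytes sent


def roundtrip(sender, payload=b"x" * 4096):
    """Send a temporary file over a local socket pair with the given sender."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    a, b = socket.socketpair()
    try:
        sent = sender(a, tmp.name)
        a.close()                         # signal end of data to the reader
        received = b""
        while True:
            data = b.recv(65536)
            if not data:
                break
            received += data
    finally:
        a.close()
        b.close()
        os.unlink(tmp.name)
    return sent, received
```

Both senders deliver the same bytes; the difference lies in how many copies and mode switches happen along the way, which matters for large static files.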

Origin blog.csdn.net/qq_44564366/article/details/104894833