Web front-end browser chapter - HTTP protocol knowledge summary

An overview of the http protocol and knowledge about it

One appearance

2017.08.29 00:22:51

    The HTTP protocol has four versions: HTTP/0.9, HTTP/1.0, HTTP/1.1 and HTTP/2. Browsers currently still mainly use the HTTP/1.1 standard, so this article focuses on HTTP/1.1 and also touches on some new features of HTTP/2.

1. Introduction

    HTTP stands for HyperText Transfer Protocol. It is an application-layer protocol built on TCP/IP, used mainly to transmit hypertext from a web server to the local browser. It consists of requests and responses and follows the standard client-server model.

    A brief note on the difference between HTTP and HTTPS: HTTPS is HTTP carried over SSL/TLS. Its biggest difference from HTTP is security; it also connects differently and uses a different port (443, where HTTP uses 80). To provide that security, the client and server must exchange keys and agree on an encryption algorithm, which adds handshake round trips with the server, costs some performance and makes setup more cumbersome.

2. The main features of http

1 Support client/server mode

2 Simple and fast: To request a service from the server, the client only needs to send the request method and path. The commonly used request methods are GET, HEAD, and POST. Because the HTTP protocol is simple, an HTTP server program can be small and communication is fast.

3 Flexible: HTTP allows the transmission of any type of data object. The type being transmitted is marked by the Content-Type header.

4 HTTP 0.9 and 1.0 use non-persistent connections: each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. This releases server resources quickly, but every request pays the cost of setting up a new connection. HTTP 1.1 uses persistent connections: instead of creating a new connection for each web object, one connection can transfer multiple objects (this is controlled by the Connection header).

5 http is a stateless protocol. Stateless refers to the lack of memory for transaction processing. The lack of state means that if subsequent processing requires previous information, it must be retransmitted, which may result in an increase in the amount of data transmitted per connection.

Stateless protocol: the state of a protocol is the ability of the next transmission to "remember" information from this one. HTTP does not keep information from one connection for the next, in order to save memory on the server.

For example: a user fetches a web page, closes the browser, starts it again and logs in to the website; the server does not know that the user closed the browser in between. Because a web server has to handle concurrent access from many browsers, the HTTP protocol was designed so that, when the web server sends the HTTP response message and document, it does not save any state information about the browser process that made the request. If a browser requests the same object twice within a few seconds, the server does not refuse the second request on the grounds that it has already sent a response; it simply serves the object again. Since the web server keeps no information about the browser process that sent the request, HTTP is a stateless protocol.

The difference between the HTTP protocol being stateless and Connection: keep-alive:

HTTP is a stateless connection-oriented protocol. Statelessness does not mean that HTTP cannot maintain a TCP connection, nor does it mean that HTTP uses the UDP protocol (no connection).

Starting from HTTP/1.1, Keep-Alive is enabled by default to keep the connection open. Simply put, when a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it reuses the established connection.

Keep-Alive does not maintain the connection permanently. It has a retention time that can be set in different server software (such as Apache).

Version 1.1 also introduced the pipelining mechanism: within the same TCP connection, the client can send several requests without waiting for the response to each one. This further improves the efficiency of the HTTP protocol.

For example, suppose the client needs two resources. Previously, it would send request A on the TCP connection, wait for the server's response, and only then send request B. With pipelining, the browser can issue request A and request B back to back, but the server still responds in order: it answers A first and answers B only after A is complete.

3. Workflow

An http operation is called a transaction, and the working process can be divided into four steps:

1) First, the client and server need to establish a connection. As soon as a hyperlink is clicked, the HTTP work begins (the connection is established)

2) After that, the client sends a request to the server. The format of the request is: Uniform Resource Identifier (URI), protocol version number, followed by MIME information including request modifiers, client information and possible content. (send request)

3) After the server receives the request, it returns the corresponding response. The format is a status line, including the protocol version number and a success or error code, followed by MIME information including server information, entity information and possible content. (respond to request)

4) The client receives the information returned by the server and displays it on the user's display screen through the browser, and then the client disconnects from the server (disconnect)

 

In the process above, the client may also reach the web server through a proxy server.

HTTP runs on top of TCP, a connection-oriented, end-to-end transport-layer protocol. "End-to-end" can be understood as communication between processes, so HTTP must establish a TCP connection before transmission starts. Establishing a TCP connection requires the so-called "three-way handshake", as shown in the figure. Once connected, the transfer can begin. In HTTP 1.1 the TCP connection is not closed after a transfer completes; this is the default behavior and can be changed through the Connection header.

 

4. Detailed explanation of URLs

URL: Uniform Resource Locator, a type of URI (Uniform Resource Identifier), used to describe a resource on the network. The basic format is as follows: scheme://host[:port#]/path/.../[;url-params][?query-string][#anchor]

scheme: specifies the protocol used (for example: http, https, ftp)

host: the IP address or domain name of the HTTP server

port#: what follows ":" is the port; the default for http is 80

path: the path to the resource

url-params: what follows ";" are URL parameters, which can carry an identifier such as a session id

query-string: the data sent to the HTTP server, also called query parameters, separated by & symbols

anchor: what follows "#" is the anchor
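As a small illustration of the URL format, the sketch below splits a URL into the components listed above using Python's standard `urllib.parse` module. The URL itself (`www.example.com`, the `sid` parameter, the query names) is made up for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL that exercises every component of the format above.
url = "http://www.example.com:8080/docs/page;sid=abc123?var1=value&var2=value2#top"

parts = urlparse(url)

print(parts.scheme)    # scheme
print(parts.hostname)  # host
print(parts.port)      # port
print(parts.path)      # path
print(parts.params)    # url-params (what follows ";")
print(parts.query)     # query-string (what follows "?")
print(parts.fragment)  # anchor (what follows "#")

# The query string is a list of name=value pairs separated by "&".
query = parse_qs(parts.query)
print(query)
```

Note that the anchor is handled by the browser and is not sent to the server as part of the request.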

5. Request messages

5.1 The request message format is as follows:

1) Request line, such as GET /images/logo.gif HTTP/1.1, which means requesting the logo.gif file from the /images directory, using the get method, and the protocol version is http1.1

2) Request header, such as Accept-Language: en

3) Blank line

4) Optional message body

The request line and each header must end with a carriage return and line feed (CRLF), and the blank line must contain only a CRLF.
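The message layout can be sketched in a few lines of Python. `build_request` is a hypothetical helper written for this illustration, not part of any library, and the `Host` value is a placeholder.

```python
def build_request(method, path, headers, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]                       # 1) request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # 2) headers
    # Every line ends with CRLF; an empty line (a lone CRLF) ends the header block.
    head = "\r\n".join(lines) + "\r\n\r\n"                      # 3) blank line
    return head.encode("ascii") + body                          # 4) optional body


raw = build_request(
    "GET",
    "/images/logo.gif",
    {"Host": "www.example.com", "Accept-Language": "en"},
)
print(raw.decode("ascii"))
```

Printing the bytes makes the structure visible: one request line, one line per header, and then an empty line before the (here empty) body.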

5.2 Request method

The first three (GET, POST, HEAD) were already defined in HTTP/1.0 (HTTP/0.9 only had GET). HTTP/1.1 added five more: PUT, OPTIONS, DELETE, TRACE and CONNECT. PATCH was added later.

GET: Send a request to a specific resource

POST: Submit data to the specified resource to process the request (such as submitting a form or uploading a file). Data is included in the request. POST requests may result in the creation of new resources and/or modification of existing resources.

HEAD: Asks the server for a response consistent with the GET request, but the response body will not be returned. This method allows you to obtain the meta-information contained in the response headers without having to transmit the entire response content. This method is often used to test the validity of a hyperlink, whether it is accessible, and whether it has been updated recently (then it is mainly used to obtain response header information)

PUT: Upload the latest content to the specified resource location

OPTIONS: Returns the HTTP request methods supported by the server for specific resources.

DELETE: Request the server to delete the resource identified by Request-URI

TRACE: echo requests received by the server, for testing or diagnostics

CONNECT: Reserved in the HTTP/1.1 protocol for proxy servers that can switch the connection to a tunnel.

PATCH: used to apply partial modifications to a resource, added to specification RFC5789.

The summary is: GET method is used to obtain data in the server, POST method is used to modify resource data in the server, PUT is used to upload data, DELETE is used to delete resources in the server, and HEAD is used to obtain response header information.

The difference between GET and POST:

1) The location of the submitted data is different. GET is after the URL, while POST is in the body of the HTTP package.

2) The amount of data that GET can submit is limited in practice (a figure of 1024 bytes is often quoted), mainly because browsers and servers restrict the length of the URL; the actual limit varies between them. The protocol places no such limit on data submitted by POST.

3) POST is safer than GET, because GET exposes the submitted data in the URL. If the page can be cached or other people can use the same machine, the user's account and password can be read from the browsing history.

4) On the server side (in ASP/ASP.NET), the GET method uses Request.QueryString to obtain the value of a variable, while the POST method uses Request.Form.
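The first difference, where the data travels, can be illustrated without sending anything over the network. The sketch below uses Python's standard `urllib`; the URL and form fields are invented for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form data.
form = {"user": "alice", "q": "http cache"}
encoded = urlencode(form)

# GET: the data is appended to the URL after "?".
get_req = Request("http://www.example.com/search?" + encoded)

# POST: the same data travels in the message body instead.
post_req = Request("http://www.example.com/search", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`Request` only describes the request here; nothing is sent until it is passed to `urllib.request.urlopen`.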

6. Response messages

The client sends a request to the server, and the server responds starting with a status line. The response includes: the protocol version, a success or error code, server information, entity meta-information, and any entity content. Depending on the class of the response, it may contain entity content, but not every response does.

Format:

HTTP version, space, status code, space, Reason-Phrase, carriage return and line feed (Reason-Phrase is a short text description), for example: HTTP/1.1 200 OK
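As a rough illustration of the status-line format, the hypothetical helper below splits a status line into its three parts. It is a sketch for well-formed input only, not a full HTTP parser.

```python
def parse_status_line(line):
    """Split 'HTTP-version SP status-code SP reason-phrase CRLF' into its parts."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase may contain spaces and may be empty.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```

Splitting at most twice keeps a multi-word reason phrase such as "Not Found" in one piece.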

 

7. HTTP status codes

1XX (information type): Indicates that the request is received and processing continues

2XX (successful response): Indicates that the action was successfully received, understood and accepted

Pay attention to 200: Indicates that the request was completed successfully and the requested resource was sent back to the client

3XX (Redirect): Further action must be taken in order to complete the request

Pay attention to 304: The requested web page has not been modified since the last request. When the server returns this response, it will not return the web page content, which means that the last document has been cached and can continue to be used.

4XX (client error class): The request contains incorrect syntax or cannot be executed correctly

Pay attention to 404: A 404 error indicates that the server can be connected, but the server cannot obtain the requested web page and the requested resource does not exist. eg: Wrong URL entered

5XX (server-side error class): The server cannot correctly execute a correct request
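The five classes are determined by the first digit of the code. A minimal sketch, using the standard `http.HTTPStatus` enum for the reason phrases; the `describe` helper and the class labels are written for this example.

```python
from http import HTTPStatus

# Class of a status code, keyed by its first digit.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (class, reason phrase) for a status code."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = ""  # code is not one of the registered statuses
    return category, phrase


for code in (200, 304, 404, 500):
    print(code, describe(code))
```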

8. Header information

8.1 Common HTTP request headers

If-Modified-Since: Sends the last modification time of the page cached by the browser to the server, which compares it with the last modification time of the actual file on the server. If the times match, 304 is returned and the client uses its local cache file directly. If they do not match, 200 and the new file content are returned; the client then discards the old file, caches the new one, and displays it in the browser. (This is related to comparison caching, which is discussed later. The corresponding response header is Last-Modified.)

If-None-Match: If-None-Match works with ETag. The working principle is to add ETag information in the HTTP response header. When the user requests the resource again, the If-None-Match information (the value of ETag) will be added to the HTTP request header. If the server verifies that the resource's ETag has not changed (the resource has not been updated), it will return a 304 status to tell the client to use the local cache file. Otherwise a 200 status and a new resource and Etag will be returned (this is also related to the comparison cache and has higher priority than the If-Modified-Since/Last-Modified pair above)

Cache-Control: Specifies the caching mechanism that requests and responses follow. Cache directives are one-way (a directive that appears in the response does not necessarily appear in the request) and independent (setting Cache-Control in one message does not change how caching is handled for another message). The directives available in a request are no-cache, no-store, max-age, max-stale, min-fresh and only-if-cached; the directives available in a response are public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age and s-maxage. (Cache-Control appears in both request headers and response headers and relates to forced caching.)

Cache-Control: public Both the client and proxy servers can cache the response.

Cache-Control: private Only the client can cache the response.

Cache-Control: no-cache The cached data must be verified with comparison caching before it is used.

Cache-Control: no-store Nothing is cached; neither forced caching nor comparison caching is triggered.

Cache-Control: max-age=xxx The cached content expires after xxx seconds.

Cache-Control: min-fresh=xxx Indicates that the client wants a response that will still be fresh for at least xxx more seconds.

Cache-Control: max-stale Indicates that the client is willing to accept a response that has already expired. If a value is given, the response may be expired by no more than that number of seconds.

Accept: The MIME types that the browser can accept. For example: Accept: text/html means that the browser accepts a response of type text/html, which is what we usually call an HTML document.

Accept-Encoding: The browser declares the encoding method it can accept, usually specifying the compression method, whether it supports compression, and what compression methods it supports (gzip, deflate)

Accept-Language: The browser declares the language it accepts. The difference between language and character set: Chinese is a language, and Chinese has multiple character sets, such as big5, gb2312, gbk, etc.

Accept-Charset: Character set acceptable by the browser

User-Agent: Tells the HTTP server the name and version of the operating system and browser used by the client

Content-Type: For example: Content-Type: application/x-www-form-urlencoded.

Connection: For example:

Connection: keep-alive When a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on the server again, it reuses the established connection. HTTP 1.1 uses persistent connections by default. With persistent connections, the download time is significantly reduced when a page contains multiple elements (such as applets and pictures). To make this work, a Servlet needs to send a Content-Length header in the response. The simplest way to do that is to write the content to a ByteArrayOutputStream first and compute its size before actually writing the content out.

Connection: close means that after a Request is completed, the TCP connection used to transmit HTTP data between the client and the server will be closed. When the client sends a Request again, the TCP connection needs to be re-established.

Referer: Contains a URL from which the user accesses the currently requested page.

Host: (This header field is required when sending a request) It is mainly used to specify the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL (must be included in the http1.1 protocol)

For example: if we enter http://www.guet.edu.cn/index.html in the browser, the request message sent by the browser includes the Host request header field: Host: www.guet.edu.cn. The default port number 80 is used here. If a port number is specified, it becomes: Host: www.guet.edu.cn:port number.

Cookie: One of the most important request headers, sends the cookie value to the HTTP server

Content-Length: Indicates the length of the request message body

Authorization: Authorization information
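Of the request headers above, Host is the one derived mechanically from the URL. The sketch below shows one way that derivation could look; `host_header` and `DEFAULT_PORTS` are names invented for this illustration, and real clients handle more cases (IPv6 literals, for instance).

```python
from urllib.parse import urlsplit

# Default ports that may be omitted from the Host header.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url):
    """Return the value of the Host header for a request to `url`."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port
    # The port is left out when it is absent or is the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


print("Host:", host_header("http://www.guet.edu.cn/index.html"))
print("Host:", host_header("http://www.guet.edu.cn:8080/index.html"))
```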

8.2 Common HTTP response headers

Allow: Which request methods the server supports (such as GET, POST, etc.)

Date: Indicates the time when the message was sent. The description format of the time is defined by rfc822.

Expires: Indicates when the document should be considered expired, so it will no longer be cached and will be retrieved from the server again, which will update the cache.

P3P: used to set cookies across domains, which can solve the problem of iframe cross-domain access to cookies

Set-Cookie: A very important header, used to send cookies to the client browser. Each cookie written will generate a Set-Cookie.

For example: Set-Cookie: sc=4c31523a; path=/; domain=.acookie.taobao.com

ETag: Used in conjunction with If-None-Match.

Last-Modified: Used to indicate the last modified date and time of the resource. Last-Modified can also be set using the setDateHeader method.

Content-Type: The WEB server tells the browser the type and character set of the object it responds to.

For example: Content-Type: text/html;charset=utf-8

Content-Length: Indicates the length of the entity body, represented by a decimal number stored in bytes.

Content-Encoding: The WEB server indicates what compression method (gzip, deflate) it uses to compress the objects in the response.

Content-Range: Indicates where a partial body belongs within the entire entity. It also indicates the length of the entire entity.

Content-Language: The WEB server tells the browser the natural language used by the object it responds to.

Connection:

For example: Connection: keep-alive When a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on the server again, it reuses the established connection.

Connection: close means that after a Request is completed, the TCP connection used to transmit HTTP data between the client and the server will be closed. When the client sends a Request again, the TCP connection needs to be re-established.

Location: used to redirect to a new location, containing a new URL address

Refresh: Indicates the time after which the browser should refresh the document, in seconds.

Excerpted from: http://www.cnblogs.com/EricaMIN1987_IT/p/3837436.html

9. HTTP caching mechanisms

A web cache sits between the web server and the client.

The cache will save a copy of the output content according to the request, such as html pages, pictures, files. When the next request comes: if it is the same URL, the cache directly uses the copy to respond to the access request instead of sending the request to the source server again.

The HTTP protocol defines relevant headers to make web caching work as well as possible

9.1 Advantages of caching

Reduced response latency: Because requests are served from the cache server (which is closer to the client) rather than the origin server, the process takes less time, making the web server appear to respond faster.

Reduce network bandwidth consumption: When replicas are reused, client bandwidth consumption is reduced; customers can save bandwidth costs, control the growth of bandwidth requirements and make it easier to manage.

9.2  Header fields related to caching in http messages

In order to have a general understanding of some of the header information that can be used below, the following header fields related to caching are introduced.

1. Common header fields (fields that can be used in both request messages and response messages)

2. Request header fields

3. Response header fields

4. Entity header fields

9.3  Caching method

Caching actually determines whether to use some stored information in the browser based on some policy rules. This cache information can be considered as a cache database that exists in the browser (it can also be called a local cache).

Caching can be classified by whether a request must be sent to the server again, which gives two categories: forced caching and comparison caching.

Forced caching does not need a request to the server, while comparison caching does.

9.3.1 Forced caching

When you already have cached data and the cache time has not expired, use forced caching.

Forced caching in HTTP 1.0 uses two fields, Pragma (disables caching) and Expires (enables caching and defines the cache time). If both are present, Pragma takes priority. However, the expiry time that Expires defines in the response message is based on the server's clock. If the client's clock differs from the server's (especially if the user has changed the system time on their own computer), the cache time may be meaningless. To solve this problem, HTTP 1.1 introduced a new field: Cache-Control (this is the one to focus on and to treat as the reference).

Note: For backward compatibility with older versions of HTTP, many websites still send these two fields. In practice they are dispensable.

Cache-Control

Usage: "Cache-Control":"cache-directive"

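When used as a request header, the optional values of cache-directive are: no-cache, no-store, max-age, max-stale, min-fresh, only-if-cached.

When used as a response header, the optional values of cache-directive are: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.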

In practice, we focus on five values: private, public, no-cache, max-age, and no-store. The default is private.

private: the client can cache

public: Both the client and the proxy server can be cached (front-end students can think of public and private as the same)

max-age=xxx: The cached content will expire after xxx seconds

no-cache: Need to use comparison cache to verify cached data

no-store: Nothing is cached; neither forced caching nor comparison caching is triggered.

for example:

In the figure, Cache-Control only specifies max-age, so the default is private, and the cache time is 31536000 seconds (365 days)

In other words, if this data is requested again within 365 days, it is read from the cache database and used directly.
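The forced-caching decision can be sketched as follows. This is a simplified illustration that considers only the max-age, no-cache and no-store directives; `parse_cache_control` and `can_use_without_request` are hypothetical helpers, and a real browser cache applies more rules.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def can_use_without_request(cache_control, age_seconds):
    """True if a cached copy of this age may be used without contacting the server."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False  # not stored at all, or must be verified with the server first
    max_age = d.get("max-age")
    return max_age is not None and age_seconds < int(max_age)


one_day = 24 * 60 * 60
print(can_use_without_request("max-age=31536000", one_day))        # cached a day ago
print(can_use_without_request("max-age=31536000", 366 * one_day))  # older than max-age
```

When the function returns False for an expired entry, the browser falls back to a request to the server.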

9.3.2 Comparison caching

Comparison caching: a comparison is needed to determine whether the cache can be used. When the browser requests data for the first time, the server returns a cache identifier together with the data, and the client stores both in the cache database. When requesting the data again, the client sends the stored cache identifier to the server, and the server makes a judgment based on it. If the check succeeds, the server returns a 304 status code to tell the client that the comparison succeeded and the cached data can be used.

The problem comparison caching solves: the cache time has expired, but the server has not updated the resource. If the client asked the server to send the resource again at this point, bandwidth and time would be wasted. Comparison caching lets the server confirm that the file currently stored by the client is identical to its own, so the client can use its cache directly, which improves the cache reuse rate.

Comparison caching is decided by the cache identifiers in the request headers and response headers.

Cache identifier fields used by comparison caching

Last-Modified  /  If-Modified-Since

Last-Modified:

When the server responds to the request, it tells the browser the last modification time of the resource.

If-Modified-Since:

When requesting the server again, use this field to notify the server of the last modification time of the resource returned by the server during the last request.

After receiving the request, if the server finds an If-Modified-Since header, it compares it with the last modification time of the requested resource.

If the last modification time of the resource is later than If-Modified-Since, the resource has been modified again; the server responds with the entire resource content and status code 200.

If the last modification time of the resource is earlier than or equal to If-Modified-Since, the resource has not been modified; the server responds with HTTP 304 to tell the browser to keep using its saved cache.

Last-Modified is useful but not ideal: if a resource is rewritten on the server while its actual content stays the same, the entire entity is returned to the client because the Last-Modified time no longer matches (even though the client cache holds an identical copy).
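The server-side comparison for Last-Modified / If-Modified-Since can be sketched like this. `check_if_modified_since` is a hypothetical helper; it uses the standard `email.utils` functions to read and write HTTP dates, and the modification time is an arbitrary example value.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def check_if_modified_since(last_modified, if_modified_since):
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200  # first request: no cached copy to compare against
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: ignore the header and send the resource
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Arbitrary example modification time of a resource on the server.
resource_mtime = datetime(2017, 8, 29, 0, 22, 51, tzinfo=timezone.utc)
last_modified_header = format_datetime(resource_mtime, usegmt=True)

print(last_modified_header)
print(check_if_modified_since(resource_mtime, last_modified_header))
print(check_if_modified_since(resource_mtime, "Mon, 28 Aug 2017 00:00:00 GMT"))
```

The first call simulates a client whose cached copy is current; the second simulates a copy that predates the last modification.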

ETag / If-None-Match (higher priority than Last-Modified / If-Modified-Since)

In order to solve the above-mentioned possible inaccuracy of Last-Modified, Http1.1 also introduced the ETag entity header field.

ETag can be understood as a unique identifier for a resource, computed by the server with some algorithm (for example, a hash of the content). On the first request, the server sends it to the client together with the data; the client keeps the ETag value and sends it back to the server with the next request. The server only needs to compare the ETag sent by the client with the ETag of the resource on the server to tell reliably whether the resource has changed relative to the client's copy.

Etag:

When the server responds to the request, it tells the browser the unique identifier of the current resource on the server (the generation rules are determined by the server).

If-None-Match:

When requesting the server again, use this field to notify the server of the unique identifier of the client cache data.

After receiving the request, if the server finds an If-None-Match header, it compares it with the unique identifier of the requested resource.

If they differ, the resource has been modified again; the server responds with the entire resource content and status code 200.

If they are the same, the resource has not been modified; the server responds with HTTP 304 to tell the browser to keep using its saved cache.
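The ETag comparison can be sketched in the same spirit. Here the identifier is assumed to be a truncated SHA-256 hash of the body, which is only one possible generation rule; `make_etag` and `respond` are hypothetical helpers, and weak validators (`W/"..."`) are not handled.

```python
import hashlib


def make_etag(body):
    """Derive an identifier from the content (an assumed generation rule)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a request that may carry If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""  # unchanged: the client reuses its cache
    return 200, {"ETag": etag}, body         # first request, or resource changed


# First request: the server returns the data together with its ETag.
status, headers, payload = respond(b"<html>v1</html>")
print(status, headers["ETag"])

# Second request: the client sends the stored ETag back in If-None-Match.
print(respond(b"<html>v1</html>", if_none_match=headers["ETag"])[0])

# The resource changed on the server in the meantime.
print(respond(b"<html>v2</html>", if_none_match=headers["ETag"])[0])
```

Because the identifier depends only on the content, rewriting a file without changing it still yields a 304, which is the case Last-Modified handles poorly.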

Note: If both pairs of fields are present, If-None-Match takes precedence: according to the HTTP specification, the server ignores If-Modified-Since when the request contains If-None-Match.

 

Summary

For forced caching, the server tells the browser a cache time. Within that time, the next request uses the cache directly. Once that time has passed, the comparison caching policy is applied.

For comparison caching, the Etag and Last-Modified in the cache information are sent to the server through the request, and are verified by the server. When a 304 status code is returned, the browser directly uses the cache.

The browser's first request:

When the browser requests again:

Excerpted from: http://www.cnblogs.com/vajoy/p/5341664.html, http://www.cnblogs.com/chenqf/p/6386163.html

10. Solve the problem of HTTP statelessness

10.1 Saving state information through cookies

Cookies are data (usually encrypted) that a website stores on the user's local terminal (client side) in order to identify the user. They are small text files that the server temporarily stores on your computer and that the server uses during HTTP transfers to identify it. When you browse a website, the web server first sends a small piece of information to your computer, and the cookie records the text you type or some of the choices you make on the site.

Through Cookies, the server can clearly know that request 2 and request 1 come from the same client.

10.2 Saving status information through session

The Session mechanism is a server-side mechanism. The server uses a structure similar to a hash table (or may use a hash table) to save information.

When the program needs to create a session for a client's request, the server first checks whether the request already contains a session identifier, called a session id. If it does, a session has already been created for this client, and the server looks up that session by its session id and uses it (if it cannot be found, a new one may be created). If the request does not contain a session id, the server creates a session for the client and generates a session id associated with it. The value of the session id should be a string that never repeats and has no pattern that is easy to find and forge. This session id is returned to the client in the response to be stored.
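The lookup-or-create logic described above can be sketched with an in-memory dictionary standing in for the server's hash table. The names `sessions` and `get_or_create_session` are invented for this illustration; a real server would also expire old sessions.

```python
import secrets

# Server-side table: session id -> data saved for that client.
sessions = {}


def get_or_create_session(session_id=None):
    """Return (session id, session data), creating a new session when needed."""
    if session_id is not None and session_id in sessions:
        return session_id, sessions[session_id]  # known client: reuse its session
    # No id, or an unknown id: create a session with a hard-to-guess identifier.
    new_id = secrets.token_hex(16)
    sessions[new_id] = {}
    return new_id, sessions[new_id]


# Request 1 carries no session id, so one is created and returned to the client.
sid, data = get_or_create_session()
data["user"] = "alice"

# Request 2 sends the id back, and the server finds the same session.
sid_again, data_again = get_or_create_session(sid)
print(sid_again == sid, data_again)
```

`secrets.token_hex` is used because the id must not be predictable; a counter or timestamp would be easy to forge.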

Session implementation:

1. Implementation with cookies

The server assigns a unique JSESSIONID to each Session and sends it to the client through Cookie.

When the client initiates a new request, it will carry this JSESSIONID in the Cookie header. In this way, the server can find the Session corresponding to the client.

2. Implementation with URL rewriting

URL rewriting means that the server adds the JSESSIONID parameter to every link in the pages it sends to the browser, so that whichever link the client clicks, the JSESSIONID is sent to the server. If you type the URL of a server resource directly into the browser, no Session will be matched.

Tomcat's Session implementation uses both the Cookie and URL rewriting mechanisms at first. If it finds that the client supports cookies, it continues with cookies and stops rewriting URLs. If it finds that cookies are disabled, it keeps using URL rewriting. When handling Sessions in JSP development, remember to use response.encodeURL() for the links in the page.

10.3. Maintain state through form variables

In addition to cookies, form variables can also be used to maintain state. For example, ASP.NET maintains state through a hidden input field (input type="hidden") called ViewState.

This principle is similar to Cookies, except that the information attached to each request and response becomes a form variable.

10.4. Maintain state through QueryString

QueryString transmits information to the server by saving the information at the end of the requested address. It is usually used in conjunction with a form. A typical QueryString such as: www.xxx.com/xxx.aspx?var1=value&var2=value2

Note: I would like to share my own opinion here. I feel that maintaining a state means that the server identifies a certain state. This is actually very similar to the session mechanism. See below for details.

11. Cookies and sessions

11.1 Meaning

cookie mechanism

Cookies are small pieces of text that the server stores on the local machine and that are sent back to the same server with every request. IETF RFC 2965, HTTP State Management Mechanism, is a general cookie specification (it has since been superseded by RFC 6265). The web server sends cookies to the client in HTTP headers. On the client, the browser parses these cookies and saves them to a local file, and it automatically attaches them to every request to the same server.

Specifically, the cookie mechanism maintains state on the client side. It stores session state on the user's machine, and it requires the user to enable cookie support in the client. Cookies exist to work around the stateless nature of the HTTP protocol.

Standard cookie distribution is achieved by extending the HTTP protocol: the server adds a special header line (Set-Cookie) to the HTTP response, telling the browser to create the corresponding cookie. Pure client-side scripts such as JavaScript can also create cookies. The browser sends cookies to the server automatically in the background according to fixed rules. It checks all stored cookies, and if a cookie's declared scope covers the location of the resource being requested, the cookie is attached to the HTTP request header for that resource and sent to the server.
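The scope check can be sketched as a pair of small functions. This is a simplified reading of the domain and path matching rules, not a browser's full logic (real browsers also handle the Secure flag, expiry, public suffixes and more). The function names and the example.com hosts are assumptions for illustration.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """True if the host is the cookie's domain or one of its subdomains."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)

def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if the cookie path is a whole-segment prefix of the request path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Either the cookie path ends with "/", or the next character does.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False

def cookie_applies(host: str, path: str, cookie_domain: str, cookie_path: str) -> bool:
    """A cookie is sent only when both its domain and its path cover the request."""
    return domain_matches(host, cookie_domain) and path_matches(path, cookie_path)

print(cookie_applies("www.example.com", "/app/page", "example.com", "/app"))
print(cookie_applies("www.example.com", "/application", "example.com", "/app"))
```

The second call is rejected because /app must match a whole path segment: /application only shares a string prefix with it.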

The content of a cookie mainly includes: name, value, expiration time, path and domain. The path and domain together form the scope of the cookie. If no expiration time is set, the cookie lives for the duration of the browser session and disappears when the browser window is closed. Such a cookie is called a session cookie. Session cookies are generally kept in memory rather than on the hard disk, although the specification does not require this. If an expiration time is set, the browser saves the cookie to the hard disk, and it remains valid across browser restarts until the expiration time passes. Cookies stored on the hard disk can be shared between different browser processes, such as two IE windows. Browsers differ in how they handle cookies stored in memory. (Hence the terms memory cookie and hard disk cookie.)
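To make the difference between a session cookie and a persistent cookie concrete, the sketch below builds both with Python's standard http.cookies module. The cookie names, values and the example.com domain are assumptions for illustration.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()

# Session cookie: no expiration attribute, so it lasts for the browser session.
cookie["theme"] = "dark"

# Persistent cookie: an explicit lifetime plus a scope (path and domain).
cookie["remember"] = "1"
cookie["remember"]["max-age"] = 30 * 24 * 3600  # 30 days, in seconds
cookie["remember"]["path"] = "/"
cookie["remember"]["domain"] = "example.com"

# The text that would follow "Set-Cookie: " in the response header.
session_header = cookie["theme"].OutputString()
persistent_header = cookie["remember"].OutputString()

print(session_header)
print(persistent_header)
```

Only the second cookie carries lifetime and scope attributes; the first one leaves everything to the browser's defaults.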

The session mechanism maintains state on the server side. However, because a server-side solution still needs to keep an identifier on the client, the session mechanism may rely on the cookie mechanism to store that identifier. Session provides a convenient way to manage global variables.

A Session exists per user. The values of its variables are stored on the server, and a sessionID identifies which user's session variables they are. The browser returns this value to the server on each visit. When the client has cookies disabled, the value can also be configured to be returned to the server via GET.

In terms of security: when you visit a site that uses sessions, a cookie is still created on your machine. Even so, the server-side session mechanism is the safer choice, because it does not arbitrarily read information stored on the client.

session mechanism

The session mechanism is a server-side mechanism. The server uses a structure similar to a hash table (or may use a hash table) to save information.

When the program needs to create a session for a client's request, the server first checks whether the request already contains a session identifier (called a session id). If it does, a session has already been created for this client, and the server retrieves that session by its session id and uses it (if it cannot be found, a new one is created). If the request does not contain a session id, the server creates a session for the client and generates a session id associated with it. The session id should be a string that does not repeat and has no easily discovered pattern that could be imitated. The session id is returned to the client in the current response, and the client stores it.

The session id can be saved in a cookie, so that during the interaction the browser automatically sends this identifier to the server according to the cookie rules. The cookie's name is usually something like SESSIONID. However, cookies can be disabled by the user, so there must be other mechanisms that still pass the session id back to the server when cookies are disabled.

A frequently used technique is URL rewriting, which appends the session id directly to the end of the URL path. Another technique is hidden form fields: the server automatically modifies the form and adds a hidden field, so that the session id is passed back to the server when the form is submitted.

11.2 Function

They are all mechanisms that can maintain state.

11.3 Differences

11.3.1 Differences in access methods

Cookies can only store ASCII strings

Session can store any type of data

11.3.2 Differences in privacy policies

Cookies are stored in the client browser and are visible to the client. Programs on the client may snoop on, copy or even modify the contents of a cookie. A Session is stored on the server and is not visible to the client, so its contents are not exposed to this kind of leak.

If you choose cookies, avoid writing sensitive information such as account passwords into them. It is best to encrypt the cookie contents, as Google and Baidu do, and decrypt them after they are submitted to the server, so that only the site that issued the cookie can read the information in it. Choosing Session makes this much simpler: the data stays on the server, where any private information in the Session is protected from the client.
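Python's standard library has no general-purpose encryption, so the sketch below shows a related but different technique: signing a cookie value with an HMAC. Signing does not hide the value the way encryption does, but it lets the server detect a cookie that a client-side program has modified. The secret key and the function names sign and verify are assumptions for illustration.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret, known only to the server.
SECRET_KEY = b"server-side-secret"

def sign(value: str) -> str:
    """Return the value with an HMAC-SHA256 signature appended."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"

def verify(signed: str) -> Optional[str]:
    """Return the original value if the signature is valid, else None."""
    value, sep, mac = signed.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None

cookie_value = sign("user=alice")

# A client-side program swaps the value but cannot recompute the signature.
tampered = "user=admin." + cookie_value.rpartition(".")[2]

print(verify(cookie_value))
print(verify(tampered))
```

A site that needs the contents to stay unreadable as well would have to add real encryption on top, which requires a third-party library.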

11.3.3 Differences in validity period

Anyone who has used Google knows that once you log in, your login stays valid for a long time. You do not need to log in again on every visit, because Google keeps a long-lived record of your login. Cookies are the better choice for this effect: simply set the cookie's expiration time to a very large value.

Because Session relies on a cookie named JSESSIONID, and the default expiration time of the JSESSIONID cookie is -1, the Session becomes invalid as soon as the browser is closed, so Session cannot keep information valid permanently. URL rewriting cannot achieve this either. Moreover, the longer the session timeout is set, the more sessions accumulate on the server and the more likely they are to cause a memory overflow.

11.3.4 Different server pressures

Session is stored on the server side, and each user will generate a Session. If there are a lot of users accessing concurrently, a lot of Sessions will be generated, consuming a lot of memory. Therefore, websites with extremely high concurrent visits such as Google, Baidu, and Sina are unlikely to use Session to track user sessions.

Cookies are stored on the client side and do not occupy server resources. If there are many users reading concurrently, Cookie is a good choice. For Google, Baidu, and Sina, Cookie may be the only choice.

11.3.5 Differences in browser support

Cookies need to be supported by the client browser. If the client disables cookies or does not support cookies, session tracking will be invalid. Regarding applications on WAP, regular cookies are of no use.

If the client browser does not support cookies, Session with URL rewriting has to be used. Note that every URL in a program that uses Session must be rewritten, otherwise Session tracking fails. For WAP applications, Session plus URL rewriting may be the only option.

If the client supports cookies, a cookie can be set to be valid only within the current browser window and its sub-windows (set the expiration time to -1), or to be valid in all browser windows (set the expiration time to an integer greater than 0). A Session, however, is only valid within the current browser window and its sub-windows. If two browser windows are independent of each other, they use two different Sessions. (In IE8, different windows share a Session.)

11.3.6 Differences in cross-domain support

Cookies support access across domain names. For example, if the domain attribute is set to ".biaodianfu.com", every domain name ending in ".biaodianfu.com" (that is, every subdomain) can access the cookie. Cross-domain cookies are now commonly used on the Internet, for example by Google, Baidu and Sina. A Session does not support access across domain names; it is only valid within the domain name where it was created.

Using only Cookie or only Session may not achieve the desired effect. In that case, try using both together. Combining Cookie and Session can achieve many useful effects in practical projects.

11.4 Relationship

When the client sends its first request to the server, the server generates a unique sessionID and returns it to the client in a cookie. The cookie is kept in the client's memory and is tied to a browser window. Because of the way the HTTP protocol works, the connection is then closed.

When the client later sends another request to the server, it carries the cookie in the request. Since the cookie contains the sessionID, the server knows that this is the same client as before.

In other words, the cookie stores the sessionID as an identifier.
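On the server side, reading the sessionID back out of the request can be sketched with the standard http.cookies module. The header value and the cookie name JSESSIONID (borrowed from the Tomcat discussion earlier) are assumptions for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical value of the Cookie request header sent by the browser.
request_cookie_header = "JSESSIONID=abc123; theme=dark"

jar = SimpleCookie()
jar.load(request_cookie_header)  # parse "name=value; name=value" pairs

# The session id is simply the value of one particular cookie.
session_id = jar["JSESSIONID"].value if "JSESSIONID" in jar else None

print(session_id)
```

If the cookie is missing, session_id is None and the server treats the request as coming from a new client.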

Excerpt from: http://blog.csdn.net/weixin_37196194/article/details/55806366

12. New features of HTTP/2

        HTTP/2 was released in 2015.

12.1 Binary protocol

The header information of HTTP/1.1 version must be text (ASCII encoding), and the data body can be text or binary. HTTP/2 is a complete binary protocol. The header information and data body are both binary, and are collectively called "frames": header information frames and data frames.

One benefit of a binary protocol is that additional frame types can be defined. The original HTTP/2 specification defines ten frame types, laying the foundation for future advanced applications. Implementing this with text would make parsing the data very troublesome, whereas binary parsing is much more convenient.
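To show why binary parsing is convenient, here is a small sketch that decodes the fixed 9-byte header that starts every HTTP/2 frame: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and one reserved bit followed by a 31-bit stream identifier. The function name and the sample bytes are assumptions for illustration.

```python
# Frame type codes defined by the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

def parse_frame_header(header: bytes) -> tuple:
    """Decode a 9-byte frame header into (length, type name, flags, stream id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # 8-bit type code
    flags = header[4]                             # 8-bit flags
    # Mask off the reserved top bit to get the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id

# Hand-built sample: a HEADERS frame, 13-byte payload, flags 0x05, stream 1.
sample = bytes([0x00, 0x00, 0x0D, 0x01, 0x05, 0x00, 0x00, 0x00, 0x01])
print(parse_frame_header(sample))
```

Every field sits at a fixed offset, so no text scanning or delimiter handling is needed.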

12.2 Multiplexing

HTTP/2 reuses TCP connections. Within one connection, both the client and the server can send multiple requests or responses at the same time, without having to match them one-to-one in order, which avoids "head-of-line blocking". (Head-of-line blocking means that when the client or server sends messages in sequence, one large or slow message holds up everything queued behind it.)

For example, in one TCP connection the server receives request A and request B at the same time. It starts responding to A, finds that processing A is very time-consuming, and so sends the part of A that is already done and then responds to B. Once B is complete, it sends the rest of A.

Such two-way, real-time communication is called multiplexing.

12.3 Data streams

Because of multiplexing, HTTP/2 data packets are not sent in order. Consecutive data packets in the same connection may belong to different responses, so each packet must be marked to show which response it belongs to.

HTTP/2 calls all the data packets of one request or response a data stream. Each data stream has a unique number. Every data packet must be marked with its data stream ID when it is sent, to show which stream it belongs to. In addition, the IDs of data streams initiated by the client are always odd, and the IDs of data streams initiated by the server are even.
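The way stream IDs let a receiver untangle interleaved packets can be sketched as follows. The frames are modelled as plain (stream id, payload) pairs, which is an assumption for illustration; a real HTTP/2 implementation also handles flags, flow control and stream state.

```python
from collections import defaultdict

# Interleaved packets as they might arrive on one connection:
# part of response A (stream 1), all of B (stream 3), then the rest of A.
frames = [(1, b"A-part1 "), (3, b"B-complete"), (1, b"A-part2")]

def reassemble(frames) -> dict:
    """Group payloads by stream id, preserving order within each stream."""
    streams = defaultdict(bytearray)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return {sid: bytes(buf) for sid, buf in streams.items()}

def initiated_by_client(stream_id: int) -> bool:
    """Client-initiated streams use odd ids; server-initiated ones use even ids."""
    return stream_id % 2 == 1

print(reassemble(frames))
print(initiated_by_client(1), initiated_by_client(2))
```

Packets from different streams may interleave freely, but the order within each stream is preserved, so each response can be rebuilt from its own packets.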

When a data stream is only partly sent, either the client or the server can send a signal (an RST_STREAM frame) to cancel it. In version 1.1, the only way to cancel a data stream is to close the TCP connection. HTTP/2 can therefore cancel a request while keeping the TCP connection open and available for other requests.

The client can also specify the priority of the data flow. The higher the priority, the sooner the server will respond.

12.4 Header compression

The HTTP protocol is stateless, so all information must be attached to every request. As a result, many fields are repeated from request to request, such as Cookie and User-Agent. Sending exactly the same content with every request wastes a lot of bandwidth and also slows things down.

HTTP/2 optimizes this by introducing header compression (HPACK). On the one hand, header values are Huffman-coded before being sent, rather than compressed with a general-purpose algorithm such as gzip or compress. On the other hand, the client and server both maintain a table of header fields. Each field stored in the table gets an index number, so a field that has been sent before does not need to be sent again. Only its index number is sent, which improves speed.
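The idea of a header table shared by both sides can be illustrated with a toy sketch. This is deliberately simplified and is not the actual HTTP/2 encoding, which uses a predefined static table, a size-limited dynamic table and a binary wire format. The class name HeaderTable and the sample headers are assumptions for illustration.

```python
class HeaderTable:
    """Toy header table: the first use of a field sends it in full, repeats send an index."""

    def __init__(self):
        self.entries = []    # index -> (name, value)
        self.index_of = {}   # (name, value) -> index, used by the sender

    def encode(self, headers):
        out = []
        for field in headers:
            if field in self.index_of:
                out.append(self.index_of[field])   # already known: index only
            else:
                self.index_of[field] = len(self.entries)
                self.entries.append(field)
                out.append(field)                  # first time: full field
        return out

    def decode(self, encoded):
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self.entries[item])  # look the field up by index
            else:
                self.entries.append(item)           # mirror the sender's table
                headers.append(item)
        return headers

request = [("user-agent", "ExampleBrowser/1.0"), ("cookie", "JSESSIONID=abc123")]

sender, receiver = HeaderTable(), HeaderTable()
first = sender.encode(request)    # full fields
second = sender.encode(request)   # index numbers only

print(first)
print(second)
print(receiver.decode(first) == request, receiver.decode(second) == request)
```

The second request shrinks to two small integers because both sides built the same table while processing the first one.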

12.5 Server Push

HTTP/2 allows the server to actively send resources to the client without request. This is called server push.

A common scenario is that the client requests a web page that contains many static resources. Normally, the client must receive the web page, parse the HTML source, discover the static resources, and only then request them. In fact, the server can anticipate that a client requesting the page will very likely request its static resources next, so it proactively sends those resources to the client along with the page.

For further reading, search for the article about the HTTP protocol on Ruan Yifeng's web log. It is short and concise.


Origin blog.csdn.net/qq_29510269/article/details/108201769