Computer network - HTTP basic concept, HTTPS, HTTP status code, HTTP cache, HTTP request

Refer to Kobayashi coding

Basic HTTP concepts

HTTP stands for Hypertext Transfer Protocol. Hypertext is text that goes beyond ordinary text: its defining feature is the hyperlink, which lets a reader jump from one hypertext document to another.

HTML is the most common form of hypertext. It is a plain-text file that uses tags to define links and to embed resources such as pictures and videos. After the browser interprets it, the result is a web page composed of text and images.

HTTP transfers data between two points: from a server to a local browser, or from one server to another.

The difference between HTTP and HTTPS

  • HTTP is not encrypted; data is sent in clear text, so HTTP is insecure. HTTPS runs HTTP over SSL/TLS, which provides encrypted transmission and identity authentication, making it more secure.
  • HTTPS requires a certificate issued by a CA. Free certificates have traditionally been scarce, so a fee is often required (free CAs such as Let's Encrypt now exist).
  • HTTP and HTTPS use different connection methods and different default ports: 80 for HTTP, 443 for HTTPS.
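
As a small illustration of the default ports mentioned above, the following sketch resolves the port a URL would use. The helper name `effective_port` is made up for this example; `HTTP_PORT` and `HTTPS_PORT` are constants from Python's standard `http.client` module.

```python
from http.client import HTTP_PORT, HTTPS_PORT
from urllib.parse import urlsplit

# Default ports per scheme: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": HTTP_PORT, "https": HTTPS_PORT}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]
```

For example, `effective_port("https://example.com/")` falls back to the HTTPS default, while an explicit `:8443` in the URL overrides it.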

What problems of HTTP does HTTPS solve?

Eavesdropping Risk - Hybrid Encryption

Both symmetric and asymmetric encryption are used. Before communication is established, asymmetric encryption (which uses two keys, a public key and a private key) protects the exchange of the session key. During the communication itself, symmetric encryption with that session key is used.

Tampering Risk - Digest Algorithms and Digital Signatures

Use a digest algorithm (hash function) to calculate a hash value of the content. For practical purposes the hash is unique to that content:

We calculate a "fingerprint" for the content and transmit it to the other party together with the content. After receiving it, the other party also calculates a fingerprint for the content and compares it with the sender's fingerprint. If the two fingerprints are the same, the content has not been tampered with.

However, a middleman could replace both the content and its hash value. This is solved by combining the digest with an asymmetric encryption algorithm.
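
The fingerprint check, and the weakness just described, can be sketched as follows. This is a minimal illustration that assumes SHA-256 as the digest algorithm; the function names and the sample messages are invented for the example.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Compute the content's digest ("fingerprint") with SHA-256."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, received_fingerprint: str) -> bool:
    """Receiver side: recompute the fingerprint and compare."""
    return fingerprint(content) == received_fingerprint


original = b"transfer 100 to Alice"
sent_fingerprint = fingerprint(original)

# Case 1: only the content is altered in transit, so the mismatch is detected.
tampered = b"transfer 900 to Mallory"
content_only_tampered_passes = verify(tampered, sent_fingerprint)

# Case 2: a middleman replaces content AND fingerprint, so the check passes
# even though the message is forged. A bare hash cannot prevent this.
forged_fingerprint = fingerprint(tampered)
both_replaced_passes = verify(tampered, forged_fingerprint)
```

The second case is why the hash has to be protected by something the middleman cannot reproduce, such as a private key.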

Asymmetric encryption algorithm

An asymmetric encryption algorithm uses two keys: a public key and a private key.

  • Public key encryption, private key decryption: ensures the confidentiality of the transmitted content, because only the holder of the private key can decrypt it.
  • Private key encryption, public key decryption: ensures that the message is not impersonated. Because the private key is never disclosed, if the public key can decrypt content encrypted with the private key, the message must have been sent by the holder of the private key. This is the basis of a digital signature.
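
Both directions can be tried out with a toy version of RSA. The sketch below uses tiny textbook primes and no padding, so it is insecure and only illustrates the arithmetic; the variable names are chosen for this example. In real systems a signature is applied to a hash of the message rather than to the message itself.

```python
# Toy RSA with tiny primes: for illustration only, never for real security.
p, q = 61, 53
n = p * q                            # modulus, shared by both keys
e = 17                               # public exponent  -> public key is (e, n)
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent -> private key is (d, n); Python 3.8+


def apply_key(value: int, exponent: int) -> int:
    """Raise value to the given key exponent modulo n."""
    return pow(value, exponent, n)


message = 65

# Public key encryption, private key decryption (confidentiality).
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# Private key "encryption", public key "decryption" (signature check).
signature = apply_key(message, d)
verified = apply_key(signature, e)
```

In both cases the value comes back unchanged only when the matching key of the pair is used for the second step.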

Impersonation Risk - Digital Certificates

What if the public key itself is forged? Digital certificates address this.
A CA (certificate authority) is a trusted organization that issues digital certificates. The server operator submits its public key to the CA. After verifying the applicant's identity, the CA digitally signs the public key and issues the signed public key bound into a public key certificate.
During HTTPS communication, the server sends the certificate to the client. The client first verifies the certificate's digital signature; if verification passes, it trusts the public key in the certificate and communication can begin.
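
In Python, this certificate check is what the standard `ssl` module does by default. The sketch below only inspects the default client settings; `fetch_peer_certificate` is a hypothetical helper that needs network access and is not called here.

```python
import socket
import ssl

# The default client context loads the system's trusted CA certificates.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA ...
requires_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ... and the certificate must match the host name we asked for.
checks_hostname = context.check_hostname


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the verified server certificate."""
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()
```

If verification fails, `wrap_socket` raises an `ssl.SSLCertVerificationError` instead of returning a connection.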

How HTTPS establishes a connection

SSL/TLS protocol establishment

The basic idea is to use public key encryption: the client first asks the server for its public key and then encrypts information with it. After receiving the ciphertext, the server decrypts it with its own private key.

  • ClientHello: the client initiates an encrypted communication request to the server.
    It sends the TLS version it supports (for example TLS 1.2, which has replaced TLS 1.0 as the mainstream version), a random number generated by the client (later used to generate the session key), and the list of cipher suites the client supports (for example RSA).

  • ServerHello: after receiving the client's request, the server sends a response.
    The response confirms the TLS protocol version (if the browser does not support it, encrypted communication is closed) and contains a random number generated by the server, the confirmed cipher suite, and the server's digital certificate.

  • Client response: the client first confirms the authenticity of the server's digital certificate using the CA public keys built into the browser or operating system. If the certificate is valid, the client extracts the server's public key from it and uses it to encrypt the following message.
    The message contains a third random number (encrypted with the server's public key), a cipher change notification (indicating that subsequent messages will be encrypted with the session key), and a client handshake-finished notification, which also carries a digest of all previously exchanged handshake data for the server to verify.

  • Final server response: after receiving the third random number from the client, the server decrypts it with its private key and computes the session key for this connection using the negotiated algorithm. Both sides now hold the same three random numbers, so both derive the same session key, which encrypts the rest of the session.
    The server then sends its final message to the client: a cipher change notification (indicating that subsequent messages will be encrypted with the session key) and a server handshake-finished notification, which marks the end of the server's handshake phase and also carries a digest of all previously exchanged handshake data for the client to verify.
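
The role of the three random numbers can be sketched as follows. This is a simplified stand-in, not the actual TLS key derivation function: it assumes HMAC-SHA256 as the mixing step, and the function name `derive_session_key` is invented for the example.

```python
import hashlib
import hmac
import secrets


def derive_session_key(client_random: bytes, server_random: bytes,
                       pre_master: bytes) -> bytes:
    """Mix the three random numbers into one key (simplified, not the real TLS PRF)."""
    return hmac.new(pre_master, client_random + server_random,
                    hashlib.sha256).digest()


client_random = secrets.token_bytes(32)  # sent in clear text in ClientHello
server_random = secrets.token_bytes(32)  # sent in clear text in ServerHello
pre_master = secrets.token_bytes(48)     # third random number, sent encrypted

# Each side runs the same computation on the same three inputs.
client_key = derive_session_key(client_random, server_random, pre_master)
server_key = derive_session_key(client_random, server_random, pre_master)
```

An eavesdropper sees the first two random numbers but not the third, so it cannot reproduce the key.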

HTTP status code

1XX

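Informational responses: the request has been received and processing continues. The client can continue sending the request, or ignore the response if the request is already complete.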

2XX

  • 200:OK
  • 204: The request succeeded, but the response message does not contain an entity body.
  • 206: Indicates that the client made a range request; the response contains the part of the entity specified by Content-Range

3XX

  • 301: Permanent Redirect
  • 302: Temporary redirect
  • 303: Temporary redirect, but the client should use the GET method to obtain the resource
  • 304: Not Modified. The request carried a condition such as If-None-Match or If-Modified-Since, and the resource has not changed since the client cached it, so the server returns 304 without a body and the client uses its cached copy
  • 307: Temporary redirect, similar to 302, but the browser must not change a POST in the redirected request to a GET

4XX

  • 400: There is a syntax error in the request message
  • 401: The request requires authentication information (Basic authentication, Digest authentication). If a request with credentials has already been made, it means that user authentication failed.
  • 403: Request denied
  • 404: Indicates that the requested resource does not exist or is not found on the server, so it cannot be provided to the client.

5XX

  • 500: Like 400, a general error code; it does not say what went wrong on the server
  • 501: The function requested by the client is not supported yet
  • 502 Bad Gateway: Returned by a server acting as a gateway or proxy. The gateway itself is working normally, but an error occurred while it was accessing the backend server
  • 503: The server is currently busy and temporarily cannot respond to the client
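
The five classes above are determined by the first digit of the code. A small sketch using Python's standard `http.HTTPStatus` enum; the `CLASSES` table and the `describe` helper are names made up for this example.

```python
from http import HTTPStatus

# The first digit of a status code selects its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"
```

For instance, `describe(404)` gives `"404 Not Found (client error)"`.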

HTTP cache

For repeated HTTP requests whose response data is the same each time, the response can be cached locally so that next time the local copy is read directly, without fetching the server's response over the network. HTTP has two types of caching: mandatory caching and negotiation caching.

Why cache?

Caching reduces the pressure on the server and reduces the delay for the client to obtain resources: the cache is usually held in memory, so reading from it is faster. Either a proxy server or the client's browser can do the caching.

Mandatory Caching and Negotiating Caching

mandatory caching

As long as the browser judges that the cache has not expired, it uses its local cache directly (shown as "from disk cache" in the browser's developer tools); the decision whether to use the cache lies with the browser. If the cache has expired, the browser requests the resource from the server again, and the server's response carries an updated Cache-Control header.
Mandatory caching is implemented with two HTTP response header fields: Cache-Control and Expires. The former specifies a relative time, the latter an absolute time, and the former takes priority over the latter.
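
The browser's freshness decision can be sketched as below. This is a simplified model that only looks at `max-age` and `Expires`; the function name `is_fresh` and the `stored_at` parameter (the time the response was cached) are assumptions made for the example.

```python
import re
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Decide whether a cached response may be used without asking the server."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # Relative time: fresh for max-age seconds after it was stored.
        # Checked first, so it takes priority over Expires.
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    if "Expires" in headers:
        # Absolute time: fresh until the given HTTP date.
        return now < parsedate_to_datetime(headers["Expires"])
    return False  # no explicit freshness information in this sketch
```

Both `stored_at` and `now` should be timezone-aware (UTC) datetimes, because an `Expires` value ending in GMT is parsed as an aware datetime.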

negotiation cache

When a request receives the response code 304, the server is telling the browser to use its locally cached resource. This approach, in which the server tells the client whether the cache can be used, is called negotiation caching. It is implemented with the If-Modified-Since and If-None-Match request headers, which carry the values of the Last-Modified and ETag response headers from the earlier response. Reference: Kobayashi coding.

  • ETag has a higher priority than Last-Modified for three reasons: a file's last modification time may change even though its content has not, which would cause an unnecessary re-request; If-Modified-Since can only check at second-level granularity, whereas ETag also catches modifications made within the same second; and some servers cannot accurately obtain a file's last modification time.
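
The server side of negotiation caching can be sketched as follows. This is a simplified model: it compares a single ETag for exact equality and ignores weak validators and ETag lists. The function name `conditional_status` is invented for the example.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if "If-None-Match" in request_headers:
        # ETag comparison has priority: when present, the date is not consulted.
        return 304 if request_headers["If-None-Match"] == etag else 200
    if "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200
    return 200  # unconditional request: send the full resource
```

With 304 the server sends no body, so the client saves the download even though it still pays for one round trip.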

Disable caching and confirm caching

HTTP/1.1 controls caching through the Cache-Control header field. The no-store directive prohibits caching. The no-cache directive requires the cache server to verify the validity of the cached resource with the origin server first; only if the resource is still valid may the cache be used to respond to the client's request.

Public and private fields in HTTP cache

private

The private directive stipulates that the resource is used as a private cache, which can only be used by a single user, and is generally stored in the user's browser.

Cache-Control: private

public

The public directive specifies that the resource is used as a public cache, which can be used by multiple users, and is generally stored in a proxy server.

Cache-Control: public
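
How a cache might act on these directives can be sketched as follows. This is a simplified model that covers only no-store, no-cache, private, and public; the function names are made up for the example.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name] = argument or True
    return directives


def may_store(value: str, shared_cache: bool) -> bool:
    """May this cache keep a copy of the response at all?"""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return False                     # nobody may cache it
    if shared_cache and "private" in directives:
        return False                     # only the user's own browser may
    return True


def must_revalidate_first(value: str) -> bool:
    """no-cache: a copy may be stored, but must be checked with the origin before use."""
    return "no-cache" in parse_cache_control(value)
```

Here `shared_cache=True` stands for a proxy server and `shared_cache=False` for the user's browser.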

cache expiration mechanism

An expiration mechanism keeps cached data from going stale. When the max-age directive appears in a response message, it specifies how long, in seconds, the resource may be kept in the cache server. In HTTP/1.1 the max-age directive is processed first; in HTTP/1.0 the max-age directive is ignored.

Cache-Control: max-age=31536000

The Expires header field can also be used to tell cache servers when the resource will expire.

Expires: Wed, 04 Jul 2012 08:26:05 GMT

HTTP request

HTTP request process

  • Resolve the domain name to an IP address through DNS
  • Initiate a TCP three-way handshake
  • Send an HTTP request after the TCP connection is established
  • The server responds to the HTTP request, and the browser receives the HTML code
  • The browser parses the HTML code and requests the resources referenced in it
  • The browser renders the page and presents it to the user
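
The first four steps can be sketched with plain sockets. This is a minimal illustration over unencrypted HTTP with no redirect, chunked-encoding, or error handling; the function names are invented for the example, and `fetch` needs network access.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request message."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response: bytes) -> tuple:
    """Extract (version, status code, reason) from the first response line."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    # Step 1: DNS resolution turns the host name into an address.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as conn:
        conn.settimeout(5)
        # Step 2: connect() performs the TCP three-way handshake.
        conn.connect(address)
        # Step 3: send the HTTP request over the established connection.
        conn.sendall(build_request(host, path))
        # Step 4: read the server's response until it closes the connection.
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Parsing the HTML and rendering the page, the last two steps, are the browser's job and are not modeled here.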

HTTP request method

The first line of the request message sent by the client contains the method field.
[Figure: HTTP request methods. Image source: Axiu's study notes.]

The difference between GET and POST

  • The semantics of GET are to get the specified resource from the server.
  • The semantics of POST are to process the specified resource according to the request payload (message body).
  • The parameters of a GET request are generally written in the URL. URLs only support ASCII, so GET parameters may only contain ASCII characters. Browsers limit the length of the URL (about 2 KB, although the actual limit depends on the browser and the server), and servers also limit it for performance and security reasons, because processing long URLs consumes more resources. The HTTP protocol itself does not specify a URL length limit.
  • The data carried by a POST request is generally written in the message body. The body can contain data in any format, as long as the client and the server agree on it, and browsers generally do not limit the size of the body. In practice, the amount of data a POST can carry depends on the server's settings and memory size. POST data generally stays below the megabyte range, since uploading relatively large data or files to the server may fail.

Safety and idempotence

  • GET is safe and idempotent. Because it is read-only, the server's data is unchanged no matter how many times it is performed. (The response to a GET can therefore be cached, in the browser or on a proxy such as nginx, and a GET URL can be saved as a bookmark.)
  • POST modifies data, so it is not safe. Submitting the same data multiple times creates multiple resources, so it is not idempotent.

However, if the developer does not implement GET and POST according to the semantics in the RFC specification, the safety and idempotence of GET and POST cannot be guaranteed.
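
The difference can be seen in a toy in-memory service whose handlers follow the intended semantics. `NoteService` and its methods are hypothetical names for this sketch and do not model a real HTTP server.

```python
class NoteService:
    """In-memory stand-in for a server-side resource collection."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def get(self, note_id: int):
        # GET semantics: read-only, so repeating it never changes server state.
        return self.notes.get(note_id)

    def post(self, text: str) -> int:
        # POST semantics: every call creates a new resource.
        note_id = self.next_id
        self.notes[note_id] = text
        self.next_id += 1
        return note_id


service = NoteService()
first = service.post("hello")
second = service.post("hello")   # same payload, yet a second resource appears
reads = [service.get(first) for _ in range(3)]
```

If `get` were written to modify `self.notes`, it would no longer be safe, which is exactly the caveat stated above.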

From the perspective of transmission, both GET and POST are insecure, because HTTP is transmitted in clear text. Anyone who captures packets at a network node can read the complete message. The only way to achieve truly secure transmission is encryption, that is, HTTPS.

Can a request carry a body?

The location of the data carried by the POST request is generally in the body. In fact, the RFC specification does not stipulate that the GET request cannot carry the body, but GET is generally used to obtain resources, so the body is not required.

Generating TCP packets

  • GET is said to generate one TCP packet.
  • POST is said to generate two TCP packets: the browser sends the headers first and the server responds with 100 Continue; the browser then sends the data and the server responds with 200 OK. However, tests on Chrome show that the headers and body are not sent separately. Sending them separately is the behavior of some browsers or frameworks, not an inherent property of POST.

GET method parameters

By convention, parameters are written after ? and separated by &. We can also define our own parameter format, as long as the server can interpret it.
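
Python's standard `urllib.parse` module implements this convention. The sketch below builds and parses a query string; the host `example.com` and the parameter names are placeholders. It also shows that non-ASCII characters are percent-encoded so that the URL stays ASCII.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"q": "café au lait", "page": "2"}

# urlencode joins key=value pairs with & and percent-encodes unsafe characters.
query = urlencode(params)
url = f"https://example.com/search?{query}"

# The server side reverses the process: take what follows ? and split on &.
parsed = parse_qs(urlsplit(url).query)
```

`parse_qs` returns a list for each key, because the same parameter name may appear more than once in a query string.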

Cookies and Sessions

The difference between cookie and session

cookie

what is a cookie

The HTTP protocol is stateless, mainly to keep the protocol as simple as possible so that it can handle a large number of transactions. HTTP/1.1 introduced cookies to save state information.

A cookie is a small piece of data that the server sends to the user's browser, which stores it locally. The browser sends it back with later requests to the same server, which lets the server tell whether two requests come from the same browser.

Cookies exist because HTTP is a stateless protocol. In other words, the server cannot remember you: every time you refreshed the page you might have to re-enter your account and password to log in, which is obviously unacceptable. A cookie works like a label that the server attaches to you; every time you send a request, the server recognizes you by the cookie.

There are two ways to save cookies on the client side: session cookies and persistent cookies. A session cookie keeps the cookie string returned by the server in memory and is destroyed automatically when the browser is closed. A persistent cookie is stored on the client's disk, and its validity period is specified in the server's response header. Within the validity period, the client reads it from local storage when it requests the server again. Note that cookies stored on disk can be shared by multiple browser processes.
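
The two kinds of cookie differ only in whether the server's header specifies a lifetime. A sketch with Python's standard `http.cookies` module; the cookie names and values are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response headers.
response_cookies = SimpleCookie()
response_cookies["sid"] = "abc123"            # no lifetime given: session cookie
response_cookies["theme"] = "dark"
response_cookies["theme"]["max-age"] = 3600   # lifetime given: persistent cookie
response_cookies["theme"]["path"] = "/"

session_header = response_cookies["sid"].output()
persistent_header = response_cookies["theme"].output()

# Later request: the browser sends the values back in a Cookie header,
# and the server parses them.
request_cookies = SimpleCookie()
request_cookies.load("sid=abc123; theme=dark")
sid = request_cookies["sid"].value
```

The browser drops the first cookie when it closes and keeps the second one on disk for an hour.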

cookie purpose

  • Session state management: user login status, shopping cart, game score or other information that needs to be recorded
  • Personalization settings: user-defined settings, themes, etc.
  • Browser behavior tracking: track and analyze user behavior, etc.

session

How a session works

After the client finishes logging in, the server creates a corresponding session and sends the session ID to the client, which stores it in the browser. Every time the client accesses the server it sends the session ID along. With the session ID, the server can find the corresponding session in memory and continue working normally.

The session is saved on the server, in a database, a file, or memory. Each user has an independent session that records the user's operations on the client. Each user has a unique session ID that acts as the hash key of the session file; through this value the specific session structure can be located, and the user's operations are stored in that structure.

session process

  • The user logs in by submitting a form containing the user name and password, which is placed in the HTTP request message;
  • The server verifies the user name and password. If they are correct, it stores the user information in Redis, and the key under which it is stored in Redis is the session ID;
  • The Set-Cookie header field of the server's response message contains the session ID, and the client stores the cookie value in the browser after receiving the response message;
  • The cookie value is included when the client later makes requests to the same server. The server extracts the session ID, retrieves the user information from Redis, and continues the previous business operation.

Pay attention to the security of the session ID: it must not be easy for malicious attackers to obtain, so do not generate session ID values that are easy to guess. In addition, the session ID should be regenerated frequently. In scenarios with high security requirements, such as money transfers, in addition to using the session to manage user state, the user should also re-authenticate, for example by re-entering a password or using an SMS verification code.
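
The login flow with a hard-to-guess session ID can be sketched as follows. This is a minimal model under stated assumptions: a plain dict stands in for Redis, passwords are kept in clear text only for brevity, and all names (`USERS`, `SESSIONS`, `login`, `current_user`) are invented for the example.

```python
import secrets

USERS = {"alice": "correct horse"}   # hypothetical user table; real systems store password hashes
SESSIONS = {}                        # dict standing in for Redis: session id -> user info


def login(username: str, password: str):
    """Verify credentials and return a new session id, or None on failure."""
    if USERS.get(username) != password:
        return None
    # A cryptographically random token is not guessable, unlike a counter.
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"user": username}
    return session_id   # the server would send this back in a Set-Cookie header


def current_user(session_id):
    """Look up the user for the session id taken from the request's cookie."""
    session = SESSIONS.get(session_id)
    return session["user"] if session else None
```

Regenerating the ID amounts to calling `secrets.token_urlsafe` again and moving the stored data to the new key.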

The comparison between cookie and session

As a stateless protocol, HTTP must maintain the connection state in some way.

A cookie is the client-side way of maintaining state; a session is the server-side way.

Usage scenarios

Cookies can only store ASCII strings, while sessions can store any type of data, so a session is preferred when data complexity matters.
Cookies are stored in the browser and are easily inspected by malicious parties. If private data must be stored in a cookie, encrypt the cookie value and decrypt it on the server.
For large websites, storing all user information in sessions is very expensive, so it is not recommended to store all user information in the session.

When the server needs to identify the client, the session is combined with a cookie. With every HTTP request, the client sends the corresponding cookie information to the server. In fact, most applications use cookies to implement session tracking. When a session is first created, the server tells the client, in the HTTP response, to record a session ID in a cookie. On later requests the session ID is sent back to the server, which then knows who the client is. If the client's browser has cookies disabled, a technique called URL rewriting is used for session tracking instead: in every HTTP interaction a parameter such as sid=xxxxx is appended to the URL, and the server uses it to identify the user. This is how features such as automatically filling in the user name can still work.
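
URL rewriting can be sketched with the standard `urllib.parse` module. The helper name `add_session_id` and the example URLs are assumptions for illustration; the parameter name `sid` follows the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url: str, sid: str) -> str:
    """Return the URL with a sid query parameter appended (or replaced)."""
    parts = urlsplit(url)
    # Keep the existing parameters, but drop any stale sid.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "sid"]
    query.append(("sid", sid))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The server would apply this to every link it writes into a page, so that each following request carries the session ID even without cookies.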

Origin blog.csdn.net/qaaaaaaz/article/details/130888496