What exactly does HTTPS add on top of HTTP?

Source: Official Account [Jake's IT Journey]
Author: Alaska
ID: Jake_Internet
Original Address: What is the HTTPS protocol more than the HTTP protocol?

Hello everyone, my name is Jack. I recently put together some notes on the HTTP protocol, and this article walks through them.
[Figure: article outline]

Introduction to HTTP

HTTP is the abbreviation of Hyper Text Transfer Protocol, a transfer protocol for transferring hypertext from a World Wide Web (WWW) server to a local browser.

HTTP is a communication protocol based on TCP/IP to transfer data (HTML files, image files, query results, etc.).

HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended since. HTTP/1.0 and HTTP/1.1 have long been standardized and were later followed by HTTP/2 and HTTP/3, while the HTTP-NG (Next Generation of HTTP) proposal mentioned in older texts was never adopted.

HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends requests through URLs to the HTTP server, i.e., the Web server. The Web server sends a response to the client according to the request it received.

HTTP Features:

  • Simple and fast: When a client requests a service from the server, it only needs to transmit the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of contact between the client and the server. Because the HTTP protocol is simple, HTTP server programs are small, so communication is fast;
  • Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is marked by Content-Type;
  • Connectionless: Connectionless means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. In this way, transmission time can be saved. (This describes HTTP/1.0; HTTP/1.1 and later keep connections open by default.);
  • Stateless: HTTP is a stateless protocol. Stateless means that the protocol has no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information;
  • Supports B/S (browser/server) and C/S (client/server) modes;
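
To make the "method and path" and Content-Type points above concrete, here is a minimal sketch of what an HTTP/1.1 request looks like on the wire and how a client might read the Content-Type of a response. The helper names `build_request` and `content_type_of`, the host `example.com`, and the sample response are assumptions made for illustration; they are not part of any library.

```python
from typing import Optional


def build_request(method: str, path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 request: a request line plus a few headers."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request method and path
        f"Host: {host}",              # required by HTTP/1.1
        "Connection: close",          # ask for one request per connection
        "",                           # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def content_type_of(raw_response: bytes) -> Optional[str]:
    """Return the Content-Type header value of a raw HTTP response, if any."""
    head, _, _ = raw_response.partition(b"\r\n\r\n")
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-type":
            return value.strip().decode("ascii")
    return None


# Hypothetical request and response used only for illustration.
request = build_request("GET", "/index.html", "example.com")
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html></html>"
)
print(request.decode("ascii"))
print(content_type_of(response))
```

Everything in the request is plain text, which is exactly why the protocol is simple and also why it is readable by anyone who can capture the traffic.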

With so many advantages, does HTTP have any disadvantages? It does, and the reason is simple: if HTTP were perfect, there would be no need for a secure protocol such as HTTPS.

Disadvantages of HTTP:

When we send relatively private data (such as a bank card or ID card number) to the server over HTTP, its security cannot be guaranteed.

First, while the data is in transit it may be captured by a middleman, who can then steal it.

Second, once the middleman has the data, they may modify or replace it before forwarding it to the server.

Finally, when the server receives the data, it cannot tell whether the data has been modified or replaced. Nor can the server verify that the data really came from the client.

To sum up, HTTP has three disadvantages:

  • The confidentiality of messages cannot be guaranteed;
  • The integrity and accuracy of messages cannot be guaranteed;
  • The reliability of the source cannot be guaranteed;

Introduction to HTTPS

How to solve the disadvantages of HTTP? HTTPS was born to solve the above problems.

HTTPS (full name: Hyper Text Transfer Protocol over Secure Socket Layer) is a secure HTTP channel, simply a secure version of HTTP.

That is, an SSL layer is added beneath HTTP. The security of HTTPS rests on SSL (today its successor, TLS), so the encryption details are handled by SSL. It is now widely used for security-sensitive communications on the World Wide Web, such as transaction payments.

HTTPS uses an asymmetric encryption algorithm while the connection is being set up to protect the key exchange, so that an eavesdropper cannot work backwards from the transmitted data to the plaintext. Next, let's look at the specific workflow.

Working principle:

The establishment process of HTTPS
[Figure: HTTPS connection establishment flow]

Here, the establishment of HTTPS is divided into 6 stages and 12 steps. The 12 steps are explained below:

1. Client - Hello: The client starts SSL communication by sending a Client Hello message. The message contains the SSL version supported by the client and a list of cipher suites (the encryption algorithms to use, key lengths, and so on);

2. Server - Hello: If the server can communicate over SSL, it responds with a Server Hello message. Like the client's message, it contains the SSL version and a cipher suite. The server's cipher suite is selected from the list received from the client;

3. Server - Certificate: The server sends a Certificate message, which contains its public key certificate;

4. Server - I'm done: The server then sends a Server Hello Done message to tell the client that the initial phase of the SSL handshake negotiation is over;

5. Client - send key: After the first phase of the SSL handshake, the client responds with a Client Key Exchange message. The message contains a random string called the Pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from step 3;

6. Client - just use this key: The client sends a Change Cipher Spec message, which tells the server that communication after this message will be encrypted with keys derived from the Pre-master secret;

7. Client - I'm done: The client sends a Finished message, which contains a checksum over all handshake messages exchanged so far. Whether the handshake negotiation succeeds depends on whether the server can correctly decrypt this message;

8. Server - I am receiving the key: The server likewise sends a Change Cipher Spec message;

9. Server - I have finished receiving the key: The server likewise sends a Finished message;

10. Client - start sending the body: Once the Finished messages have been exchanged, the SSL connection is established. The client sends an HTTP request over it;

11. Server - start receiving the body: The server receives the HTTP request, processes it, and sends back an HTTP response;

12. Client - Disconnect: Finally, the client closes the connection. To do so it sends a close_notify message. The figure above omits some details; after this step, a TCP FIN segment is sent to close the TCP connection;
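
In application code the handshake steps above are normally carried out by a library rather than written by hand. The following is a minimal sketch using Python's standard `ssl` module, assuming a server reachable on port 443; the function names `make_client_context` and `describe_connection` are hypothetical helpers for this illustration. Note that current Python versions negotiate TLS, the successor of SSL, and the exact messages differ between protocol versions.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client-side context that verifies the server's certificate."""
    # create_default_context() turns on certificate verification and
    # hostname checking, which covers the Certificate step of the handshake.
    return ssl.create_default_context()


def describe_connection(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a TCP connection, run the handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # wrap_socket() performs the whole handshake (hello messages,
        # certificate check, key exchange, Finished) before it returns.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                "protocol": tls_sock.version(),          # negotiated protocol version
                "cipher_suite": tls_sock.cipher()[0],    # negotiated cipher suite name
                "peer_subject": tls_sock.getpeercert().get("subject"),
            }
        # Leaving the inner "with" block closes the TLS connection (step 12).
```

Calling `describe_connection("example.com")` on a machine with network access would return the negotiated protocol version, the cipher suite chosen by the server from the client's list, and the subject of the server certificate.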

In addition, in the flow above, a message digest called a MAC (Message Authentication Code) is attached when the application layer sends data. The MAC makes it possible to check whether a message has been tampered with, which ensures the integrity of the message;
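
The idea of a MAC can be sketched with Python's standard `hmac` module. This is only an illustration of the principle, assuming HMAC-SHA256 and a made-up shared key; it is not the exact construction used inside the SSL/TLS record layer, and the helper names `make_tag` and `verify` are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the message with the shared key (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example-session-key"        # assumption: agreed during the handshake
message = b"transfer 100 to account 42"    # made-up application data
tag = make_tag(shared_key, message)

print(verify(shared_key, message, tag))                         # untouched message
print(verify(shared_key, b"transfer 900 to account 42", tag))   # tampered message
```

Because the tag depends on both the key and every byte of the message, a middleman who changes the message cannot produce a matching tag without knowing the key.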

Next, a diagram illustrates the process visually. It is more detailed than the figure above (the picture comes from "Illustrated HTTP").

The above explains how an HTTPS connection is established and how communication proceeds. Given this workflow, what kind of algorithm makes it possible? How is asymmetric encryption achieved, how is it computed mathematically, and what theory underlies it? In short, what lets HTTPS encrypt its transmissions?

Theoretical principles of HTTPS:

HTTPS is built on encryption and decryption, digital certificates, and digital signatures. The basic concepts of these technologies are introduced below.

In order to ensure the confidentiality of the message, encryption and decryption are required. The current mainstream encryption and decryption algorithms are divided into symmetric encryption and asymmetric encryption.

Symmetric encryption (shared key encryption)

The client and server share a key to encrypt and decrypt messages, which is called symmetric encryption. The client and server agree on an encryption key. The client uses the key to encrypt the message before sending the message, and after sending it to the server, the server uses the key to decrypt the message to get the message.

Figure encryption process:

The symmetric encryption algorithm used here:

  • M: plaintext, the content we intend to transmit;
  • C: secret key; in a symmetric encryption algorithm the same secret key is used both to encrypt and to decrypt (the encryption operation can be as simple as addition, subtraction, multiplication, or division, or it can be very complex);
  • N: ciphertext, the content obtained by encrypting the plaintext with the secret key is called ciphertext, and what is transmitted on the network is also ciphertext;

For example, the client wants to send 1 (the plaintext) to the server. It computes 1 + 3 = 4 (3 is the secret key) to get the ciphertext and transmits it. The server receives the ciphertext 4 and computes 4 - 3 = 1 (3 is the secret key) to recover the plaintext. Communication from the server back to the client works the same way;
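
The arithmetic example above can be written out as a toy sketch. It uses addition as the "encryption algorithm", exactly as in the example, so it offers no real security; the function names `encrypt` and `decrypt` are chosen for this illustration only.

```python
def encrypt(plaintext: int, key: int) -> int:
    """Toy symmetric encryption: M + C = N (plaintext + key = ciphertext)."""
    return plaintext + key


def decrypt(ciphertext: int, key: int) -> int:
    """Toy symmetric decryption with the same key: N - C = M."""
    return ciphertext - key


secret_key = 3                               # shared by client and server
ciphertext = encrypt(1, secret_key)          # client side
recovered = decrypt(ciphertext, secret_key)  # server side
print(ciphertext, recovered)
```

The same key appears on both sides, which is what makes the scheme symmetric and also why a leaked key exposes every message.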

Advantages of symmetric encryption:

  • Symmetric encryption solves the problem of message confidentiality in HTTP;

Disadvantages of symmetric encryption:

  • Although symmetric encryption ensures message confidentiality, because the client and server share a key, the key is particularly vulnerable to leakage;
  • Because the risk of key leakage is high, it is difficult to ensure the reliability of the source of the message, the integrity and accuracy of the message;

The risk of key leakage in symmetric encryption is very high, and because the key is fixed it is easy to crack. Is there a better way to encrypt transmissions, for example one in which the encryption key and the decryption key are not the same?

Asymmetric encryption (public key encryption)

Since the key is so easy to leak in symmetric encryption, we can use an asymmetric encryption method to solve it. When using asymmetric encryption, both the client and the server have a public key and a private key. The public key can be exposed to the outside world, while the private key can only be seen by itself.

Messages encrypted with the public key can only be decrypted by the corresponding private key. Conversely, messages encrypted with the private key can only be decrypted by the public key. In this way, the client encrypts the message with the server's public key before sending the message, and the server decrypts it with its own private key after receiving it.

Figure encryption process:

The explanation is as follows:

  • M: the plaintext, the content we intend to transmit;
  • D: the public key, used for encryption in the asymmetric encryption algorithm;
  • E: the private key, used for decryption in the asymmetric encryption algorithm;
  • N: the ciphertext, the content obtained by encrypting the plaintext with the public key; what is transmitted over the network is the ciphertext;

This time the server generates the public key D and the private key E. It keeps the private key to itself and makes the public key D public. A client that wants to communicate with the server encrypts its message with the public key D and sends the ciphertext to the server. The server decrypts the ciphertext with the private key E and obtains the plaintext.

Introduction to Asymmetric Encryption Algorithm RSA

RSA is currently the most influential public key encryption algorithm. It resists the vast majority of cryptographic attacks known so far and has been recommended by ISO as the public key data encryption standard.

Today only short RSA keys can be brute-forced. As of 2008, there is no reliable way to attack the RSA algorithm in the world. As long as the length of the key is long enough, a message encrypted with RSA is practically unbreakable. However, with the maturity of distributed computing and quantum computer theory, the security of RSA encryption has been challenged.

The RSA algorithm is based on a very simple fact of number theory: it is easy to multiply two large prime numbers together, but it is extremely difficult to factor the product, so the product can be made public as an encryption key.
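
As a rough illustration of that fact, here is a textbook RSA sketch with deliberately tiny primes. The primes 61 and 53, the exponent 17, and the message 65 are arbitrary choices for this example. Real RSA uses primes hundreds of digits long together with a padding scheme, so this sketch must not be used for actual encryption. The modular inverse via `pow(x, -1, m)` needs Python 3.8 or later.

```python
# Key generation (done by the server)
p, q = 61, 53                      # two "large" primes, kept secret
n = p * q                          # the product 3233 can be made public
phi = (p - 1) * (q - 1)            # 3120, computable only if p and q are known
public_exponent = 17               # chosen so that it shares no factor with phi
private_exponent = pow(public_exponent, -1, phi)  # modular inverse, kept secret


def encrypt(message: int) -> int:
    """Anyone holding the public key (n, public_exponent) can encrypt."""
    return pow(message, public_exponent, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of private_exponent can decrypt."""
    return pow(ciphertext, private_exponent, n)


message = 65
ciphertext = encrypt(message)
print(n, private_exponent, ciphertext, decrypt(ciphertext))
```

Multiplying 61 by 53 is trivial, but an attacker who sees only 3233 must factor it to obtain phi and the private exponent. With a product this small that is easy; with primes of realistic size it is not.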

HTTP performance tuning

Reduce the number of HTTP requests

Reducing the number of HTTP requests is a very important aspect of performance optimization, so every basic list of optimization principles includes this one: reduce the number of HTTP requests.

Let's first consider why reducing HTTP requests can optimize performance:

1. Reduce the time spent on DNS requests. Setting aside whether this is always the main factor, reducing the number of HTTP requests can indeed reduce the time spent on DNS requests and resolution;

2. Reduce server pressure. This is usually the consideration that matters most, and it is the main reason I give when explaining this to others. Every HTTP request consumes server resources, especially on servers that must do computation, merging, and similar work. Server CPU consumption is no joke: hard drives can be bought with money, but CPU resources are not so cheap;

3. Reduce HTTP request headers. When we send a request to the server, the HTTP header carries the cookies for that domain name and some other information, and the server's response also carries cookies and similar header information. These headers are sometimes very large and consume bandwidth on every request and response;

DNS requests and resolution

To put it simply, take a URL such as www.taobao.com: the www part is the hostname, taobao is the second-level domain, and com is the first-level domain. In a URL such as www.ali.tao.com, ali is the third-level domain.
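
The naming of the parts can be sketched in a few lines. This follows the article's convention of treating the leftmost label as the hostname and counting domain levels from the right; the function name `split_domain` is a hypothetical helper.

```python
def split_domain(name: str) -> dict:
    """Split a dotted name into a hostname and numbered domain levels."""
    labels = name.split(".")
    hostname, domain_labels = labels[0], labels[1:]
    # Level 1 is the rightmost label, level 2 the one before it, and so on.
    levels = {
        level: label
        for level, label in enumerate(reversed(domain_labels), start=1)
    }
    return {"hostname": hostname, "levels": levels}


print(split_domain("www.taobao.com"))
print(split_domain("www.ali.tao.com"))
```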

When we request a URL, the local domain name server first checks whether its cache already holds a resolution result. If not, it queries a root domain name server, which returns the IP address of a domain name server responsible for the queried domain. The local server then queries that address, which in turn returns the address of the next-level domain name server, and so on until the IP address of the host named in the URL is found. The result is cached for next time and returned.

The DNS resolution of a URL requested for the first time may be very expensive, but once it has been resolved the result is cached, and subsequent requests do not need to go through the complex resolution process above.
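
The effect of caching can be sketched with a small wrapper around a lookup function. The class name `CachingResolver` is hypothetical, and the sketch ignores record expiry (TTL), which real DNS caches must honor. The demonstration uses a fake lookup function and an address from a range reserved for documentation so that it runs without network access.

```python
import socket
from typing import Callable, Dict, List


class CachingResolver:
    """Resolve hostnames, remembering earlier answers."""

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname) -> None:
        self._lookup = lookup             # the expensive resolution step
        self._cache: Dict[str, str] = {}  # hostname -> IP address

    def resolve(self, hostname: str) -> str:
        if hostname not in self._cache:   # first request: do the full lookup
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]      # later requests: answer from cache


lookups: List[str] = []


def fake_lookup(hostname: str) -> str:
    """Stand-in for a real DNS query; records each call it receives."""
    lookups.append(hostname)
    return "203.0.113.10"


resolver = CachingResolver(fake_lookup)
first = resolver.resolve("www.example.com")
second = resolver.resolve("www.example.com")
print(first, second, len(lookups))
```

The second call never reaches the lookup function, which mirrors why only the first request to a new domain pays the full resolution cost.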

Reduce server stress

Too many HTTP requests are dangerous for the server. If your server is not very powerful, take this point seriously. Other optimization strategies are merely optimizations, but this one concerns the server itself: make sure your server stays up and running.

But this is Taobao: we have enough speed to deliver a good user experience. If your server cannot provide that speed and cannot bear such frequent asynchronous requests, apply this optimization carefully. The delay may make the navigation unusable, so this too has to be weighed against the scenario.

Taobao is now widely deploying CDNs, and CDNs can provide us with sufficient back-end resource guarantees. With the continuous improvement of CDN and back-end environments, the focus should be on the improvement of front-end transmission speed and display parsing speed.

Reduce HTTP request headers

HTTP headers can be large. Open the taobao.com home page and run alert(document.cookie), and you will see that Taobao's cookie is relatively large. Every request to Taobao's servers carries this data back and forth along with other header information, and the space it occupies is not small, so you can imagine how much this costs.

In practice, once a CDN is used there is little need to worry about this. The CDN and the Taobao main site are not under the same domain name, so their cookies do not pollute each other, and there is basically no cookie or other header information under the CDN's domain name. Requests for static resources therefore do not carry the main site's cookie around; they transmit only the body of the resource, so the impact on performance becomes very small after using a CDN. But if your static resource server is under the same domain as the main server, you need to control the size of cookies and other headers, because they are sent with every transfer.
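
A back-of-the-envelope sketch shows how quickly this adds up. The cookie size (2000 characters) and the request count (50 static resources) are made-up numbers, and the calculation assumes uncompressed HTTP/1.1 headers; HTTP/2 header compression would reduce the cost. The function name `cookie_overhead_bytes` is a hypothetical helper.

```python
def cookie_overhead_bytes(cookie_value: str, request_count: int) -> int:
    """Bytes spent sending the same Cookie header with every request."""
    header_line = f"Cookie: {cookie_value}\r\n"
    return len(header_line.encode("utf-8")) * request_count


cookie_value = "x" * 2000   # hypothetical 2000-character cookie
static_requests = 50        # hypothetical number of static resources on a page

same_domain = cookie_overhead_bytes(cookie_value, static_requests)
print(same_domain)          # upload cost when static files share the main domain
```

Serving the same 50 resources from a cookie-free CDN domain avoids this upload entirely, which is the saving described above.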

Summary

This time we gained a preliminary understanding of the network protocols HTTP and HTTPS and looked at the advantages and disadvantages of HTTP. HTTPS exists precisely because of HTTP's shortcomings. We walked through its working principle with the help of diagrams, but it is still fairly complicated and deserves further study. We then discussed HTTP performance tuning: reducing the number of requests, reducing server pressure, and so on.

In short, different focuses should be considered for different scenarios, and appropriate optimization should be carried out for different website sizes and types, and standards and best practices should not be blindly pursued.



Origin blog.csdn.net/jake_tian/article/details/120821113