HTTPS (1) - Basic knowledge (key, symmetric encryption, asymmetric encryption, digital signature, digital certificate)

1 Overview of HTTPS

HTTPS (full name: Hyper Text Transfer Protocol over Secure Socket Layer, or Hypertext Transfer Protocol Secure) is an HTTP channel designed with security as the goal. Simply put, it is the secure version of HTTP: an SSL layer is added beneath HTTP. The security of HTTPS rests on SSL, so the details of encryption are handled by SSL. https: is a URI scheme whose syntax is similar to that of http:, and it is used for secure HTTP data transmission. An https: URL indicates that HTTP is used, but with a different default port and an additional encryption/authentication layer between HTTP and TCP.

There are two ways to communicate with a web service: HTTP and HTTPS. HTTP transmits data unencrypted and uses port 80 by default; HTTPS encrypts the transmitted data and uses port 443 by default. Most mainstream websites now use HTTPS by default.
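As a small illustration of the default ports, the sketch below resolves the port a client would connect to for a given URL. The helper name default_port is made up for this example; the port constants come from Python's standard http.client module.

```python
import http.client
from urllib.parse import urlsplit


def default_port(url: str) -> int:
    """Return the port a client would connect to for an http/https URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit port in the URL overrides the scheme default.
        return parts.port
    if parts.scheme == "https":
        return http.client.HTTPS_PORT  # 443
    if parts.scheme == "http":
        return http.client.HTTP_PORT  # 80
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


print(default_port("http://example.com/"))
print(default_port("https://example.com/"))
print(default_port("https://example.com:8443/"))
```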

SSL/TLS is a cryptographic protocol, and its goal is not just the encrypted transmission of web content. SSL/TLS has four main goals: cryptographic security, interoperability, extensibility, and efficiency. Security itself is addressed from several angles: confidentiality, authenticity, and integrity. Confidentiality means that the transmitted content cannot be obtained by any third party other than the two communicating parties; authenticity means that the peer is the expected one rather than an impersonator; integrity means that the transmitted data is complete and has not been tampered with or lost. To balance these needs, SSL/TLS is designed as a security framework in which a variety of security schemes can be applied, each consisting of several complex cryptographic processes. Different schemes make different trade-offs between security and efficiency and are built from different cryptographic processes.

2 Symmetric encryption

A symmetric encryption algorithm uses the same key for encryption and decryption.

If both communicating parties hold the same key and no one else knows it, the security of their communication can be guaranteed (unless the key is cracked). The biggest problem, however, is how to make this key known to both parties without anyone else learning it. If the server generates a key and sends it to the browser, and the key is intercepted in transit, the interceptor can then use it to decrypt everything either party sends.
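To make "the same key encrypts and decrypts" concrete, here is a minimal sketch of a toy symmetric cipher. It is not a real algorithm such as AES and must not be used to protect data: it XORs the message with a keystream derived from the key using SHA-256, and the function names are invented for this example.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 over key + counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)      # the key both parties must hold
message = b"GET /account HTTP/1.1"

ciphertext = xor_cipher(shared_key, message)     # sender encrypts
recovered = xor_cipher(shared_key, ciphertext)   # receiver decrypts with the same key
print(recovered == message)

# Anyone who learns shared_key can run the same xor_cipher call,
# which is exactly the key distribution problem described above.
```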

If the key of website A were pre-stored in the browser, and it could be ensured that no one other than the browser and website A knew the key, then symmetric encryption would work in theory. But that would require the browser to pre-store the keys of every HTTPS website in the world, which is obviously unrealistic.

What can we do? To solve this problem, we need asymmetric encryption.

3 Asymmetric encryption

Asymmetric encryption exists to address the problems of symmetric encryption. An asymmetric encryption algorithm requires a key pair, namely a public key and a private key, and the two keys always come together. Content encrypted with the public key can only be decrypted with the corresponding private key, and content encrypted with the private key can only be decrypted with the corresponding public key. The server keeps the private key to itself and sends the public key to the client. Once the client has the public key, it can encrypt its request and send it to the server. Even if the request is intercepted on the way, it cannot be decrypted without the private key. This ensures the security of the data sent from the client to the server.
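The key-pair relationship can be sketched with textbook RSA on tiny numbers. This is only an illustration under simplifying assumptions (very small primes, no padding), so it is completely insecure; real keys are thousands of bits long. The helper apply_key is a name made up for this example.

```python
from typing import Tuple

# Toy RSA key pair built from two tiny primes (illustration only).
p, q = 61, 53
n = p * q                     # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)           # sent to the client
private_key = (d, n)          # kept by the server


def apply_key(key: Tuple[int, int], value: int) -> int:
    """Raise value to the key's exponent modulo n (the same operation for both keys)."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


m = 65                                   # a message encoded as a number below n
c = apply_key(public_key, m)             # encrypted with the public key
print(apply_key(private_key, c) == m)    # only the private key recovers it

s = apply_key(private_key, m)            # "encrypted" with the private key
print(apply_key(public_key, s) == m)     # the public key recovers it
```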

4 Improved asymmetric encryption scheme

Through a set of public key and private key, the security of transmission in a single direction can already be guaranteed. Then, with two sets of public key and private key, can two-way transmission be guaranteed to be secure? Please see the following process:

  1. A website has a public key A1 and a private key A2 for asymmetric encryption; the browser has a public key B1 and a private key B2 for asymmetric encryption.
  2. The browser makes a request to the website server, and the server transmits the public key A1 to the browser in plain text.
  3. The browser transmits the public key B1 to the server in plain text.
  4. After that, everything that the browser transmits to the server is encrypted with the public key A1, and the server receives it and decrypts it with the private key A2. Since only the server has the private key A2 for decryption, the security of this piece of data can be guaranteed.
  5. Everything that the server transmits to the browser is encrypted with the public key B1, and the browser decrypts it with the private key B2 after receiving it. The same can also guarantee the security of this data.

It can be seen that it is indeed feasible. Aside from the remaining loopholes (man-in-the-middle attacks, discussed below), HTTPS encryption does not use this scheme. Why? The main reason is that asymmetric encryption algorithms are very time-consuming, especially when encrypting and decrypting some larger data. Symmetric encryption is much faster. Can we use the characteristics of asymmetric encryption to solve the aforementioned symmetric encryption problem?

5 Asymmetric encryption + symmetric encryption

Since asymmetric encryption is time-consuming, we consider whether a combination of asymmetric encryption + symmetric encryption can be used, and the number of times of asymmetric encryption should be minimized.

A scheme that needs only one round of asymmetric encryption and decryption:

  1. A website has a public key A1 and a private key A2 for asymmetric encryption.
  2. The browser makes a request to the website server, and the server sends the public key A1 to the browser in plain text.
  3. The browser randomly generates a key X for symmetric encryption, encrypts it with the public key A1, and sends it to the server.
  4. The server receives it and decrypts it with the private key A2 to get the key X.
  5. In this way, both parties have the key X, and no one else can know it. After that, all data of both parties can be encrypted and decrypted with the key X.
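The five steps above can be sketched end to end. This is a toy model, not what a TLS library does: the key pair A1/A2 is textbook RSA on tiny primes, the key X is encrypted byte by byte because the modulus is so small, and the symmetric cipher is a simple XOR keystream. All helper names are invented for the example.

```python
import hashlib
import secrets

# Step 1: the website's toy key pair, A1 (public) and A2 (private).
P, Q, E = 61, 53, 17
N = P * Q
A1 = (E, N)
A2 = (pow(E, -1, (P - 1) * (Q - 1)), N)


def rsa_apply(key, values):
    """Apply a toy RSA key to each small integer in values."""
    exponent, modulus = key
    return [pow(v, exponent, modulus) for v in values]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 based keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Step 2: the server sends A1 to the browser in plain text.
# Step 3: the browser generates X and encrypts it with A1.
x_browser = secrets.token_bytes(16)
encrypted_x = rsa_apply(A1, x_browser)        # iterating bytes yields ints 0..255

# Step 4: the server decrypts with A2 and obtains X.
x_server = bytes(rsa_apply(A2, encrypted_x))
print(x_server == x_browser)

# Step 5: all further traffic uses only the fast symmetric cipher with X.
request = xor_cipher(x_browser, b"GET /index.html")
print(xor_cipher(x_server, request))
```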

HTTPS basically adopts this scheme. But there are still loopholes.

6 Man-in-the-middle attack

It is true that the middleman cannot obtain the symmetric key X generated by the browser. The key itself is encrypted by the public key A1, and only the server can decrypt it with the private key A2. However, the intermediary can hijack the information without obtaining the private key A2 at all. Please see:

  1. A website has a public key A1 and a private key A2 for asymmetric encryption.
  2. The browser makes a request to the website server, and the server transmits the public key A1 to the browser in plain text.
  3. The middleman hijacks the public key A1, saves it, and replaces the public key A1 in the data packet with the forged public key B1 (of course it also has the private key B2 corresponding to the public key B1).
  4. The browser randomly generates a key X for symmetric encryption, encrypts it with the public key B1 (the browser does not know that the public key is replaced), and then sends it to the server.
  5. After hijacking, the middleman decrypts it with the private key B2 to obtain the key X, then encrypts it with the public key A1 and sends it to the server.
  6. After the server gets it, decrypt it with the private key A2 to get the key X.

In this way, without either party noticing anything abnormal, the middleman obtains the symmetric key X, and any data subsequently transmitted under X is no longer safe. The root cause is that the browser cannot confirm whether the public key it received really belongs to the website. So the next problem to solve is: how can we prove that the public key received by the browser is the website's public key?
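The attack steps above can be replayed in a few lines. This is a simulation under toy assumptions (textbook RSA on tiny primes, the key X encrypted byte by byte); make_keypair and rsa_apply are helper names made up for this sketch.

```python
import secrets


def make_keypair(p: int, q: int, e: int = 17):
    """Build a toy RSA key pair (public, private) from two small primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


def rsa_apply(key, values):
    """Apply a toy RSA key to each small integer in values."""
    exponent, modulus = key
    return [pow(v, exponent, modulus) for v in values]


A1, A2 = make_keypair(61, 53)   # step 1: the website's key pair
B1, B2 = make_keypair(67, 71)   # the middleman's forged key pair

# Steps 2-3: the server sends A1, but the middleman swaps in B1.
key_seen_by_browser = B1

# Step 4: the browser encrypts X with whatever public key it received.
x = secrets.token_bytes(16)
sent_by_browser = rsa_apply(key_seen_by_browser, x)

# Step 5: the middleman decrypts with B2, then re-encrypts with the real A1.
stolen_x = bytes(rsa_apply(B2, sent_by_browser))
forwarded = rsa_apply(A1, stolen_x)

# Step 6: the server decrypts with A2 and sees nothing unusual.
x_server = bytes(rsa_apply(A2, forwarded))

print(x_server == x)    # browser and server agree on X
print(stolen_x == x)    # and so does the middleman
```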

It is like a scammer who calls or texts students' parents while the students are sitting their CET exams, claims to be the school counselor, says that the student is seriously ill and urgently needs money, and asks the parents to send it. Parents who send money to the scammer suffer heavy losses. That is the kind of problem caused by not verifying the authenticity of data or information.

As another example, take a fake Taobao website whose domain name is very similar to the real one. We accidentally type the wrong domain name, or the domain name is hijacked, and we end up on the counterfeit site. We pick an item and pay as we usually do on Taobao, but the goods never arrive.

7 Digital Certificate

In real life, how do I prove that a certain ID number really belongs to Xiao Ming? By looking at the ID card. The government agency plays the role of a publicly trusted party here: it issues the ID card, and its authority is what vouches for a person's identity. The Internet has such a trusted party as well: the CA (certificate authority).

Before a website can use HTTPS, it needs to apply to a CA for a digital certificate. The digital certificate contains information such as the certificate holder and the certificate holder's public key (in reality it is a block of encoded data). The server sends the certificate to the browser, and the browser takes the public key from the certificate. However, there is another obvious question: how do we prevent the certificate from being tampered with in transit? That is, how do we prove the authenticity of the certificate itself and prevent digital certificates from being forged?

The CA is the authority that issues digital certificates; it is responsible for issuing certificates and vouching for their legitimacy. If a server needs to prove its identity, its operator must submit an application to a CA. Of course, money makes things easy: pay the fee and you can apply for a certificate.

The server submits an application to the CA organization, and needs to submit site information such as domain name, company name, public key, etc. After the CA has approved it, the server can be issued a certificate!

After the client gets the server's certificate, it needs to verify that the certificate was issued by a CA it trusts and check the certificate's basic information, such as whether the domain name on the certificate matches the domain name being accessed. It can then take the server's public key from the certificate and use it to negotiate a symmetric key.

The certificate has been issued, but how is forgery prevented, and how can we be sure it was not tampered with in transit? In other words, how do we prove the authenticity of the certificate itself?

8 Digital Signature

We generate a "signature" from the content of the certificate, and compare the content of the certificate with the signature to detect whether it has been tampered with. This technique is called digital signature.

The production process of the digital signature:

  1. The CA has its own private key and public key for asymmetric encryption.
  2. The CA hashes the plain text information of the certificate.
  3. The CA encrypts the hash value with its private key to obtain the digital signature.

The plaintext and the digital signature together form the digital certificate, which can then be issued to a website. After the browser gets the digital certificate from the server, how can it verify that the certificate is genuine, that is, that it has not been tampered with or swapped?

[Figure: creation of the digital signature (left) and its verification (right)]

Browser verification process:

  1. Get the certificate and extract the plain text T1 and the digital signature S1.
  2. Decrypt S1 with the public key of the CA (the CA is trusted by the browser, so the browser keeps its public key; see below for details) to obtain S2.
  3. Use the hash algorithm named in the certificate to hash the plaintext T1 and obtain T2.
  4. Compare S2 with T2. If they are equal, the certificate is credible.
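The signing and verification steps can be sketched as follows, reusing the names T1, S1, S2 and T2 from the lists above. The CA key pair here is hypothetical and deliberately insecure (its two prime factors are well-known Mersenne primes), the hash is truncated to 128 bits so it fits below the modulus, and no padding is applied, so this is only a model of the idea rather than how real certificates are signed.

```python
import hashlib

# Hypothetical toy CA key pair (insecure: both primes are publicly known).
P = 2**89 - 1                  # Mersenne prime
Q = 2**127 - 1                 # Mersenne prime
CA_N = P * Q
CA_E = 65537                               # CA public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # CA private exponent


def hash_to_int(plaintext: bytes) -> int:
    """Hash the certificate plaintext; keep 128 bits so the value is below CA_N."""
    return int.from_bytes(hashlib.sha256(plaintext).digest()[:16], "big")


def sign(plaintext: bytes) -> int:
    """CA side: encrypt the hash with the CA private key."""
    return pow(hash_to_int(plaintext), CA_D, CA_N)


def verify(plaintext: bytes, signature: int) -> bool:
    """Browser side: decrypt the signature with the CA public key and compare."""
    s2 = pow(signature, CA_E, CA_N)    # step 2
    t2 = hash_to_int(plaintext)        # step 3
    return s2 == t2                    # step 4


t1 = b"subject=www.example.com;public_key=(17, 3233)"
s1 = sign(t1)
print(verify(t1, s1))                  # untouched certificate

tampered = t1.replace(b"3233", b"4757")   # middleman swaps the public key
print(verify(tampered, s1))               # the old signature no longer matches
```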

[Figure: the CA adds a digital signature to ("stamps") the certificate it issues to the server]
[Figure: how the browser verifies the authenticity of the certificate it receives from the server]

Why can a signature prove that the certificate is trustworthy?
Suppose the middleman tampers with the plaintext of the certificate. Since he does not have the CA's private key, he cannot produce a signature that matches the altered plaintext. When the browser receives the certificate, it finds that the hash of the plaintext and the decrypted value of the signature do not match, concludes that the certificate has been tampered with and is untrustworthy, and stops sending information to the server, which prevents the information from leaking to the middleman.

Since tampering is impossible, what if the entire certificate is swapped?

Suppose another website B has also obtained a certificate from the CA and wants to hijack website A's traffic. It acts as a middleman, intercepts the certificate that A sends to the browser, replaces it with its own certificate, and passes that on to the browser. The browser would then mistakenly use the public key in B's certificate, which leads to the vulnerability described above.

In fact, this does not happen, because the certificate contains information about the website it was issued to, including the domain name. The browser compares the domain name in the certificate with the domain name it requested and can tell whether the certificate has been swapped.

Why do I need to hash once when making a digital signature?
The most obvious reason is performance. As already mentioned, asymmetric encryption is slow, and certificate information is usually long, so encrypting it directly would be time-consuming. After hashing, the result has a fixed length (for example, the md5 algorithm always produces a 128-bit value), so encryption and decryption are much faster. There are also security reasons besides this.
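The fixed-length property is easy to check with Python's hashlib. The helper digest_bits is a name invented for this sketch. MD5 appears only because the text uses it as an example; it is no longer considered safe for signatures, and SHA-256 is shown alongside it.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the length in bits of the digest of data under the named algorithm."""
    return hashlib.new(algorithm, data).digest_size * 8


short_input = b"example.com"
long_input = b"x" * 1_000_000          # about 1 MB of certificate-like data

for algorithm in ("md5", "sha256"):
    # The digest length depends only on the algorithm, not on the input size.
    print(algorithm, digest_bits(algorithm, short_input), digest_bits(algorithm, long_input))
```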

Does HTTPS need to perform a handshake and transmit a key at the SSL/TLS layer for every request?
Going through the key transmission process on every request would obviously be very time-consuming, so how can the key be transmitted only once? By using a session.

The server maintains a session ID for each browser (or client software) and passes it to the browser during the TLS handshake phase. After the browser generates the key and sends it to the server, the server stores the key under the corresponding session ID. From then on, the browser carries the session ID in every request, and the server looks up the key by session ID and uses it for decryption and encryption, so the key does not have to be regenerated and transmitted every time.
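The server-side bookkeeping described above amounts to a lookup table from session ID to key. The class below is a minimal sketch of that idea with invented names (SessionCache, store, lookup); it ignores expiry and eviction, and real TLS stacks handle session resumption internally.

```python
import secrets
from typing import Dict, Optional


class SessionCache:
    """Maps a session ID to the symmetric key negotiated for that client."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def store(self, key: bytes) -> str:
        """Save the key after the handshake and return a fresh session ID."""
        session_id = secrets.token_hex(16)
        self._keys[session_id] = key
        return session_id

    def lookup(self, session_id: str) -> Optional[bytes]:
        """Return the stored key, or None if a full handshake is needed."""
        return self._keys.get(session_id)


cache = SessionCache()
key_x = secrets.token_bytes(32)             # key agreed during the first handshake
session_id = cache.store(key_x)             # server hands this ID to the browser

print(cache.lookup(session_id) == key_x)    # later request: key found, no new handshake
print(cache.lookup("unknown-session"))      # unknown ID: fall back to a full handshake
```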

Finally, a few words about CAs and how to view a CA certificate in your browser.

Which CAs are authoritative, or recognized by the client? Taking Chrome as an example, you can view information about the CAs built into the client, including each CA's public key, signature algorithm, validity period, and so on. In Chrome, go to More tools -> Developer tools -> Security -> View certificate to see the details of the certificate.

9 How HTTPS works

This article is about HTTPS, yet HTTPS itself has not been explained so far. In fact, HTTPS = HTTP + SSL: an SSL/TLS layer is added between the HTTP layer and TCP.

SSL stands for Secure Sockets Layer. After SSL came into widespread use, it was standardized and renamed TLS (Transport Layer Security). HTTPS uses the methods described above to address the data leakage, tampering, and impersonation problems that can occur on the network, ensuring the security of network transmission.

The process of using HTTPS to access the server is as follows:

  1. The client sends a request for https://baidu.com and connects to port 443 of the server.

  2. The server must have a digital certificate, which can be self-signed or obtained from a CA. The difference is that a self-signed certificate has to be explicitly accepted on the client before access can continue, while a certificate issued by a trusted CA does not trigger a warning page. Along with the certificate, the server holds a pair of public and private keys.

  3. Sending the certificate
    The certificate carries the server's public key together with a lot of other information, such as the issuing authority of the certificate, the expiration time, the signature of a third-party certificate authority (CA), and the domain name of the server.

  4. The client parses the certificate
    This part of the work is done by the client's TLS implementation. First, it verifies whether the certificate is valid, checking the issuing authority, expiration time, and so on. If something is wrong, a warning is shown indicating that there is a problem with the certificate. If the certificate is fine, the client generates a random value (the key) and encrypts it with the public key from the certificate.


  5. Transmitting the encrypted information
    The client sends the key (random value) encrypted with the public key from the certificate, so that the server can obtain this key. From then on, communication between the client and the server can be encrypted and decrypted with this random value.

  6. Server-side decryption and encryption
    The server decrypts with its private key to obtain the key (random value) sent by the client, and then encrypts the content symmetrically with this value.

  7. Transmission of encrypted information
    This part of information is symmetrically encrypted by the server with a key (random value), which can be restored on the client.

  8. The client decrypts the information
    The client uses the previously generated key (random value) to decrypt the information sent by the server and obtains the decrypted content.
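From a client's point of view, the whole process above is handled by a TLS library. The sketch below uses Python's standard ssl module; fetch_certificate and make_client_context are names made up for this example, and the host you pass in is your choice. Calling fetch_certificate needs network access, so it is only defined here, not run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a context that trusts the system CAs and checks the host name."""
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect to host:port, complete the TLS handshake, and return the server certificate."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # The handshake (certificate check and key negotiation) happens in wrap_socket.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.getpeercert()


context = make_client_context()
# The certificate must chain to a trusted CA (step 4 above).
print(context.verify_mode == ssl.CERT_REQUIRED)
# The domain name in the certificate must match the requested host.
print(context.check_hostname)
```

With network access, fetch_certificate("example.com") returns a dictionary with fields such as subject, issuer, and notAfter, which correspond to the certificate information listed in step 3.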



Origin blog.csdn.net/happyjacob/article/details/108557143