Reprinted: HTTPS

Although the encryption mechanism of HTTPS (SSL/TLS) is basic knowledge that everyone should understand, many articles on the Internet leave out parts of the picture and never lay out the complete chain of reasoning. I also wasted a lot of effort when I was learning it.

Symmetric and asymmetric encryption, digital signatures, digital certificates, and so on: while learning them, beyond understanding "what it is", have you ever asked "why it is"? I think that understanding the latter is the only way to truly understand the encryption mechanism of HTTPS.

This article proceeds step by step in the form of questions, lifting the veil on HTTPS one layer at a time, in the hope of helping you understand HTTPS thoroughly.

Why is encryption needed?

Because HTTP content is transmitted in plain text, the data passes through multiple physical nodes such as intermediate proxy servers, routers, Wi-Fi hotspots, and communication service operators. If the traffic is hijacked along the way, the transmitted content is completely exposed. The hijacker can also tamper with the transmitted information without either party noticing. This is a man-in-the-middle attack. That is why we need to encrypt the information. The easiest kind of encryption to understand is symmetric encryption.

What is symmetric encryption?

Simply put, there is a single key that can both encrypt a piece of information and decrypt the encrypted information, much like the keys we use in daily life.
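To make "one key for both directions" concrete, here is a minimal sketch in Python. It is a toy cipher invented for this illustration (the names `keystream` and `xor_cipher` are hypothetical, and real HTTPS uses vetted ciphers such as AES, not this construction): the key is stretched into a byte stream with SHA-256 and XORed with the data, so the same function and the same key both encrypt and decrypt.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` bytes by hashing key + counter with SHA-256."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR the data with a key-derived stream.

    XOR undoes itself, so this one function both encrypts and decrypts.
    """
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = secrets.token_bytes(16)                  # the single shared key
ciphertext = xor_cipher(key, b"hello https")   # encrypt
plaintext = xor_cipher(key, ciphertext)        # decrypt with the same key
```

Anyone who holds `key` can read the message, which is exactly why getting the key to the other side safely becomes the central problem.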

Is it possible to use symmetric encryption?

If both communicating parties hold the same key and no one else knows it, the security of their communication can of course be guaranteed (unless the key is cracked).

However, the biggest problem is how to let both parties know the key while keeping it from everyone else. If the server generates a key and transmits it to the browser, what if the key is hijacked during that transmission? The hijacker could then use the key to decrypt everything the two parties send afterwards, so this approach does not work.

Another way of thinking? Imagine that the key of website A is pre-stored in the browser, and that no one other than the browser and website A knows it. Then symmetric encryption would be theoretically possible, and the browser would only need to pre-store the keys of all HTTPS websites in the world! That is obviously unrealistic.
What to do? We need asymmetric encryption.

What is asymmetric encryption?

Simply put, there are two keys, usually called the public key and the private key. Content encrypted with the public key can only be decrypted with the private key. Similarly, content encrypted with the private key can only be decrypted with the public key.
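The two-key behaviour can be sketched with textbook RSA on tiny numbers. This is only an illustration under loud assumptions: the primes 61 and 53 and the exponent 17 are chosen for readability, real keys are thousands of bits long, and real implementations add padding. `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy RSA key pair built from two tiny primes (illustration only, NOT secure).
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent  -> public key is (e, n)
d = pow(e, -1, phi)            # private exponent -> private key is (d, n)

message = 65                   # a message is an integer smaller than n

# Encrypt with the public key, decrypt with the private key.
c1 = pow(message, e, n)
recovered_1 = pow(c1, d, n)

# Encrypt with the private key, decrypt with the public key.
c2 = pow(message, d, n)
recovered_2 = pow(c2, e, n)
```

Both `recovered_1` and `recovered_2` equal the original `message`, which mirrors the two directions described above.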

Is it possible to use asymmetric encryption?

In view of the mechanism of asymmetric encryption, we may have this idea: the server first transmits the public key to the browser in plain text, and then the browser encrypts the data with this public key before transmitting it to the server. Safety seems to be guaranteed! Because only the server has the corresponding private key to decrypt the data encrypted by the public key.

However, in the other direction, how can security be ensured on the path from the server to the browser? If the server encrypts data with its private key and sends it to the browser, the browser can decrypt it with the public key, but that public key was originally transmitted to the browser in plain text. If a middleman hijacked the public key, he can also use it to decrypt the information sent by the server. So for now it seems that only the data transmitted from the browser to the server is protected (in fact, there are still loopholes, which are discussed below). Can you think of a way to take advantage of this?

An improved asymmetric encryption scheme: does it seem workable?

We have already understood that a set of public and private keys can guarantee the security of transmission in one direction, so can two sets of public and private keys be used to ensure the security of two-way transmission? Please see the process below:

  1. A website server has a public key A and a corresponding private key A'; a browser has a public key B and a corresponding private key B'.
  2. The browser transmits the public key B to the server in plain text.
  3. The server transmits the public key A to the browser in plain text.
  4. Afterwards, the content transmitted by the browser to the server is encrypted with the public key A, and the server decrypts it with the private key A' after receiving it. Since only the server owns the private key A', the security of this piece of data can be guaranteed.
  5. Similarly, the content transmitted from the server to the browser is encrypted with the public key B, and the browser decrypts it with the private key B' after receiving it. The same as above can also ensure the security of this data.

It does work! Leaving aside the loopholes that still exist (discussed later), HTTPS does not use this scheme. Why? A very important reason is that asymmetric encryption algorithms are very time-consuming, while symmetric encryption is much faster. Then can we use the properties of asymmetric encryption to solve the key-distribution problem of symmetric encryption mentioned above?

Asymmetric encryption + symmetric encryption?

Since asymmetric encryption is time-consuming, can we combine asymmetric and symmetric encryption while keeping the number of asymmetric operations to a minimum? Of course we can, and asymmetric encryption and decryption each need to be used only once.
Please take a look at this process:

  1. A website has public key A and private key A' for asymmetric encryption.
  2. The browser makes a request to the website server, and the server sends the public key A to the browser in plain text.
  3. The browser randomly generates a key X for symmetric encryption, encrypts it with the public key A and sends it to the server.
  4. After getting it, the server decrypts it with the private key A' to get the key X.
  5. In this way, both parties have the key X, and no one else can know it. After that, all the data of both parties can be encrypted and decrypted by key X.
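
The five steps above can be sketched as follows. Everything here is a toy under stated assumptions: the key pair uses textbook RSA built from two small Mersenne primes, the symmetric cipher is a hash-based XOR stream, and the helper names `make_keypair`, `rsa_apply`, and `xor_cipher` are hypothetical. None of it is what a real TLS library does internally; it only shows one asymmetric round trip followed by symmetric traffic.

```python
import hashlib
import secrets

E = 65537                       # public exponent (assumption for this sketch)


def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes: returns (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)


def rsa_apply(key, value: int) -> int:
    """Apply a public or private key to an integer smaller than the modulus."""
    exponent, n = key
    return pow(value, exponent, n)


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: the same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# 1. The website has public key A and private key A'.
public_a, private_a = make_keypair(2**61 - 1, 2**89 - 1)

# 2. The server sends public key A to the browser in plain text.
# 3. The browser generates a random symmetric key X and encrypts it with A.
key_x = secrets.token_bytes(16)
encrypted_x = rsa_apply(public_a, int.from_bytes(key_x, "big"))

# 4. The server decrypts with private key A' to recover X.
server_x = rsa_apply(private_a, encrypted_x).to_bytes(16, "big")

# 5. From now on both sides use X with the fast symmetric cipher.
request = xor_cipher(key_x, b"GET /account")
received = xor_cipher(server_x, request)
```

Only steps 3 and 4 touch the slow asymmetric operation; every later message goes through `xor_cipher` alone.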

Perfect! HTTPS basically adopts this scheme. Perfect? There are still loopholes.

Man-in-the-middle attack

If a middleman hijacks the data in transit, he cannot obtain the key X generated by the browser, because the key is encrypted with the public key A and only the server has the private key A' to unlock it. However, the middleman can do damage without ever getting the private key A'. Please see:

  1. A website has public key A and private key A' for asymmetric encryption.
  2. The browser makes a request to the website server, and the server sends the public key A to the browser in plain text.
  3. The middleman hijacks the public key A, saves it, and replaces the public key A in the data packet with his own forged public key B (he of course also holds the private key B' corresponding to the public key B).
  4. The browser generates a key X for symmetric encryption, encrypts it with the public key B (the browser cannot know that the public key has been replaced), and sends it to the server.
  5. The middleman hijacks this message, decrypts it with the private key B' to obtain the key X, then re-encrypts X with the public key A and sends it on to the server.
  6. After getting it, the server decrypts it with the private key A' to get the key X.

In this way, without either party noticing anything abnormal, the middleman pulls off a bait-and-switch on the public key sent by the server and obtains the key X. The root cause is that the browser cannot confirm whether the public key it received really belongs to the website, because the public key itself is transmitted in plain text. So must the transmission of the public key be encrypted too? This looks like a chicken-and-egg problem. What is the solution?
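
The key-swapping attack can be simulated in a few lines. This is a sketch under the same toy assumptions as before: textbook RSA on small Mersenne primes, with the hypothetical helpers `make_keypair` and `rsa_apply`. The comments follow the step numbers of the attack.

```python
import secrets

E = 65537                       # public exponent (assumption for this sketch)


def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes: returns (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)


def rsa_apply(key, value: int) -> int:
    """Apply a public or private key to an integer smaller than the modulus."""
    exponent, n = key
    return pow(value, exponent, n)


# Step 1: the website owns A / A'. The middleman owns his own pair B / B'.
public_a, private_a = make_keypair(2**61 - 1, 2**89 - 1)
public_b, private_b = make_keypair(2**107 - 1, 2**127 - 1)

# Steps 2-3: the server sends A, but the middleman swaps in B on the wire.
key_seen_by_browser = public_b

# Step 4: the browser encrypts X with whatever public key it received.
key_x = secrets.randbits(128)
to_server = rsa_apply(key_seen_by_browser, key_x)

# Step 5: the middleman decrypts with B', learns X, and re-encrypts with A.
stolen_x = rsa_apply(private_b, to_server)
forwarded = rsa_apply(public_a, stolen_x)

# Step 6: the server decrypts with A' and sees a perfectly normal key.
server_x = rsa_apply(private_a, forwarded)
```

At the end `stolen_x`, `server_x`, and `key_x` are all the same number, and nothing in the exchange gave the browser or the server a way to notice.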

How to prove that the public key received by the browser must be the public key of the website?

In fact, every proof ultimately rests on one or more self-evident "axioms" (recall the axioms in mathematics), from which everything else is derived. For example, in real life, if you want to prove that a certain ID number belongs to Xiao Ming, you can look at his ID card, and the ID card is vouched for by the government. The "axiom" here is that "government agencies are credible", which is also a premise for the normal operation of society.

Can an institution similarly act as the "axiom" of the Internet world, serving as the source of all proofs and issuing "ID cards" to websites?

That institution is the CA (Certificate Authority) organization, a prerequisite for the normal operation of today's Internet, and the "ID card" it issues is a digital certificate.

Digital certificate

Before using HTTPS, a website needs to apply for a digital certificate from a CA organization. The digital certificate contains the certificate holder's information, public key information, and so on. The server transmits the certificate to the browser, and the browser obtains the public key from the certificate. The certificate is like an ID card, proving that "this public key belongs to this website". This raises another obvious question: how do we prevent the certificate itself from being tampered with in transit? In other words, how do we prove the authenticity of the certificate itself? ID cards use anti-counterfeiting technology, but how can a digital certificate be protected against forgery? Solve that and we are close to victory!

How to prevent digital certificates from being tampered with?

We generate a "signature" from the original content of the certificate, and then check whether the certificate content is consistent with the signature to determine whether it has been tampered with. This is the "anti-counterfeiting technology" of digital certificates, and the "signature" here is called a digital signature:

Digital signature

For this part of the content, it is recommended to look at the figure below and understand it in conjunction with the text below. The left side of the figure is the digital signature creation process, and the right side is the verification process:
[Figure: digital signature creation process (left) and verification process (right)]

The process of making a digital signature:

  1. The CA organization has its own asymmetric key pair: a private key and a public key.
  2. The CA organization hashes the certificate plaintext data T.
  3. The CA organization encrypts the hash value with its private key to obtain the digital signature S.

The plaintext and the digital signature together form the digital certificate, which can then be issued to the website.
After the browser gets the digital certificate from the server, how does it verify that the certificate is genuine, that is, that it has not been tampered with or swapped?

Browser verification process:

  1. Take the certificate and extract the plaintext T and the signature S.
  2. Decrypt S with the public key of the CA organization (since the CA is trusted by the browser, the browser keeps its public key; see below for details) to get S'.
  3. Hash the plaintext T with the hash algorithm specified in the certificate to get T'.
  4. If neither the plaintext nor the signature has been tampered with, T' must equal S'. So compare S' with T': if they are equal, the certificate is credible.
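
Both the creation and the verification process can be sketched together. The assumptions are loud ones: the CA key is textbook RSA built from two Mersenne primes, the hash is SHA-256, the certificate plaintext is a made-up byte string, and the function names `ca_sign` and `browser_verify` are hypothetical. Real signature schemes also add padding, which is left out here.

```python
import hashlib

E = 65537                                     # CA public exponent (assumption)
CA_P, CA_Q = 2**127 - 1, 2**521 - 1           # toy CA primes, illustration only
CA_N = CA_P * CA_Q                            # CA public key is (E, CA_N)
CA_D = pow(E, -1, (CA_P - 1) * (CA_Q - 1))    # CA private exponent


def digest_as_int(plaintext: bytes) -> int:
    """Hash the certificate plaintext and read the digest as an integer."""
    return int.from_bytes(hashlib.sha256(plaintext).digest(), "big")


def ca_sign(plaintext: bytes) -> int:
    """Creation: hash T, then transform the hash with the CA private key to get S."""
    return pow(digest_as_int(plaintext), CA_D, CA_N)


def browser_verify(plaintext: bytes, signature: int) -> bool:
    """Verification: S' comes from the CA public key, T' from hashing T again."""
    s_prime = pow(signature, E, CA_N)
    t_prime = digest_as_int(plaintext)
    return s_prime == t_prime


T = b"subject=a.example; public_key=A"        # made-up certificate plaintext
S = ca_sign(T)

genuine_ok = browser_verify(T, S)
tampered_ok = browser_verify(b"subject=a.example; public_key=B", S)
```

`genuine_ok` is `True`, while `tampered_ok` is `False`: changing the plaintext changes T', and without the CA private key nobody can produce a matching S.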

Why does this ensure that the certificate is credible? Let's think about it.

Is it possible for a man in the middle to tamper with that certificate?

Suppose the middleman tampers with the original text of the certificate. Since he does not have the private key of the CA organization, he cannot produce a matching encrypted signature for the altered text. When the browser receives the certificate, it finds that the hash of the original text and the decrypted value of the signature are inconsistent, which shows that the certificate has been tampered with and cannot be trusted. The browser then stops transmitting information to the server, preventing the information from leaking to the middleman.

Since tampering is impossible, what if the entire certificate is swapped?

Is it possible for a middleman to swap the certificate?

Suppose another website B has also obtained a certificate from the CA organization and wants to hijack the information of website A. It acts as a middleman, intercepts the certificate that A sends to the browser, and replaces it with its own certificate. The browser would then mistakenly take the public key from B's certificate. Doesn't that lead to the "man-in-the-middle attack" vulnerability described above?

In fact, this will not happen, because the certificate contains the information of website A, including the domain name. The browser will compare the domain name in the certificate with the domain name requested by itself to know whether it has been swapped.
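
The domain check can be sketched as a simple comparison. The certificate layout below (a dict with a `domains` list) and the function name `hostname_matches` are assumptions made for this example; real browsers parse X.509 certificates and also handle cases such as wildcard names, which this sketch ignores.

```python
def hostname_matches(certificate: dict, requested_host: str) -> bool:
    """Return True only if the certificate lists the host the browser asked for."""
    wanted = requested_host.lower()
    return any(wanted == name.lower() for name in certificate["domains"])


# Hypothetical certificates: both are validly signed, but for different sites.
cert_a = {"domains": ["a.example"], "public_key": "A"}
cert_b = {"domains": ["b.example"], "public_key": "B"}

# The browser requested a.example, so B's certificate is rejected even though
# its signature is genuine.
accepts_real_cert = hostname_matches(cert_a, "a.example")
accepts_swapped_cert = hostname_matches(cert_b, "a.example")
```

Because the domain name is part of the signed plaintext, website B cannot simply edit its certificate to say `a.example` without breaking the signature.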

Why do you need to hash once when making a digital signature?

I had this question when I first met HTTPS, because it seems that the hash there is a bit redundant, and removing the hash process can also ensure that the certificate has not been tampered with.

The most obvious reason is performance. As already mentioned, asymmetric encryption is slow, and certificate data is generally long, so encrypting it directly would be time-consuming. After hashing, the result has a fixed length (for example, the md5 algorithm always produces a 128-bit value), so encryption and decryption are much faster.
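
The fixed-length property is easy to check with Python's `hashlib`. The helper name `digest_bits` and the sample inputs are made up for this sketch; SHA-256 is included next to md5 only for comparison.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the length in bits of the digest that `algorithm` produces for `data`."""
    return len(hashlib.new(algorithm, data).digest()) * 8


short_input = b"a"
long_input = b"certificate data " * 10_000    # 170,000 bytes

for name in ("md5", "sha256"):
    # The digest length depends on the algorithm, not on the input size.
    print(name, digest_bits(name, short_input), digest_bits(name, long_input))
# prints:
# md5 128 128
# sha256 256 256
```

However long the certificate is, the value that has to go through the slow asymmetric operation stays the same small size.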

Of course, there are also security reasons. This part of the content is relatively deep. If you are interested, you can read this answer: crypto.stackexchange.com/a/12780

How to prove that the public key of the CA organization is trusted?

You may have noticed that I glossed over the CA organization's public key above with a brief "the browser keeps its public key". How does that work? How can we prove that this public key is trustworthy?

Recall what a digital certificate is for. That's right: it proves that a public key is trustworthy, that is, that "this public key belongs to this website". So can the public key of the CA organization also be proved by a digital certificate? Exactly. The operating system and the browser come with a set of trusted root certificates pre-installed. If the root certificate of a CA organization is among them, the browser can obtain the CA's trusted public key from it.

In fact, certificates can vouch for each other across more than one level: A can trust B, B can trust C, and so on. We call this a chain of trust or a digital certificate chain. It is a series of digital certificates that starts from the root certificate and passes trust down layer by layer, so that the holder of the end-entity certificate inherits that trust and can prove its identity.
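
A chain of trust can be sketched by letting each certificate be signed with the key of the one above it. As before, this is a toy under stated assumptions: certificates are plain dicts, signatures are textbook RSA over a SHA-256 hash, the primes are Mersenne primes picked for convenience, and `issue` and `verify_chain` are hypothetical helpers, not how a real X.509 validator works.

```python
import hashlib

E = 65537                       # public exponent (assumption for this sketch)


def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes: returns (public, private)."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)


def digest(cert: dict) -> int:
    """Hash the signed fields of a certificate into an integer."""
    body = f"{cert['subject']}|{cert['issuer']}|{cert['public_key']}".encode()
    return int.from_bytes(hashlib.sha256(body).digest(), "big")


def issue(subject, subject_public, issuer, issuer_private) -> dict:
    """The issuer signs the subject's name and public key with its private key."""
    cert = {"subject": subject, "issuer": issuer, "public_key": subject_public}
    d, n = issuer_private
    cert["signature"] = pow(digest(cert), d, n)
    return cert


def verify_chain(chain: list, trusted_roots: dict) -> bool:
    """chain[0] is the website certificate; each next entry is its issuer."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        e, n = issuer_cert["public_key"]
        if cert["issuer"] != issuer_cert["subject"]:
            return False
        if pow(cert["signature"], e, n) != digest(cert):
            return False
    root = chain[-1]
    # The root is trusted only because it is pre-installed, not because of a signature.
    return trusted_roots.get(root["subject"]) == root["public_key"]


root_pub, root_priv = make_keypair(2**127 - 1, 2**521 - 1)
mid_pub, mid_priv = make_keypair(2**107 - 1, 2**607 - 1)
site_pub, _ = make_keypair(2**61 - 1, 2**89 - 1)

root_cert = issue("Root CA", root_pub, "Root CA", root_priv)
mid_cert = issue("Intermediate CA", mid_pub, "Root CA", root_priv)
site_cert = issue("a.example", site_pub, "Intermediate CA", mid_priv)

trusted_roots = {"Root CA": root_pub}         # what the OS or browser ships with
chain_ok = verify_chain([site_cert, mid_cert, root_cert], trusted_roots)
```

Here `chain_ok` is `True`. Emptying `trusted_roots`, or altering any certificate in the list, makes `verify_chain` return `False`.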

In addition, have you ever encountered a website that cannot be accessed until you install a certificate? What gets installed there is a root certificate. It means that the browser does not recognize the organization that issued this website's certificate, so you have to manually download and install that organization's root certificate (at your own risk XD). After installation, you have its public key, which you can use to verify that the certificate sent by the server is trustworthy.

Does every HTTPS request require a handshake and key transfer at the SSL/TLS layer?

This was also one of my confusions at the time. Obviously, it is very time-consuming to go through the key transmission process for each request, so how to achieve only one transmission?

The server maintains a session ID for each browser (or client software) and passes it to the browser during the TLS handshake. After the browser generates a key and sends it to the server, the server stores the key under that session ID. From then on the browser presents the session ID with its requests, and the server looks up the corresponding key by session ID to decrypt and encrypt, so the key does not have to be regenerated and transmitted every time!
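
The bookkeeping on the server side can be sketched as a small lookup table. The class `SessionCache` and its method names are invented for this illustration; real TLS stacks manage session state internally and differ in detail.

```python
import secrets


class SessionCache:
    """Server-side map from session ID to the symmetric key agreed in the handshake."""

    def __init__(self):
        self._keys = {}

    def new_session(self) -> str:
        """Create a session ID to hand to the browser during the handshake."""
        session_id = secrets.token_hex(16)
        self._keys[session_id] = None
        return session_id

    def store_key(self, session_id: str, key: bytes) -> None:
        """Remember the key the browser sent for this session."""
        self._keys[session_id] = key

    def lookup(self, session_id: str):
        """Return the stored key, or None if the session is unknown."""
        return self._keys.get(session_id)


server = SessionCache()
session_id = server.new_session()       # passed to the browser in the handshake
key_x = secrets.token_bytes(16)         # generated by the browser
server.store_key(session_id, key_x)     # saved under the session ID

# Later: the browser presents the session ID, so no new key transfer is needed.
reused_key = server.lookup(session_id)
```

`reused_key` equals `key_x`, while an unknown session ID yields `None`, in which case a full handshake would be needed again.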

Summary

You can look at this picture to review the whole process (SSL and TLS handshakes differ somewhat, and different versions differ as well, but the general process is like this):
[Figure: the overall HTTPS process]
So far, we have worked through the overall context and core knowledge points of HTTPS encryption from top to bottom. I wonder if you really understand HTTPS now?
Set aside some time, read it a few more times, think it over, and it will become clearer and clearer!
So, can you answer the following questions?

  1. Why use symmetric encryption + asymmetric encryption?
  2. Why can't we just use asymmetric encryption?
  3. Why do you need a digital certificate?
  4. Why is a digital signature required?

Of course, due to limited space and my own limited ability, some more in-depth content has not been covered. But I think that for front-end and back-end developers, understanding this much is enough, and those who are interested can study further~ If there are any omissions, please point them out.

Original address: https://zhuanlan.zhihu.com/p/43789231


Origin blog.csdn.net/qq_36968599/article/details/119926201