20 pictures to show you how to understand HTTPS

In recent years, more and more websites use the HTTPS protocol for data transmission, because HTTPS can provide more secure services than HTTP.

Many browsers will add a "warning" sign to the website using the HTTP protocol to indicate that the data transmission is not safe, and will add a "lock" sign to the website using the HTTPS protocol to indicate that the data transmission is safe.

Why is the HTTP protocol insecure? Mainly for the following three reasons:

  • Vulnerable to eavesdropping: HTTP transmits data in clear text. Hackers can easily intercept messages with sniffing tools, and since the data is not encrypted, they can read the content.

    For example: if the user enters the password to withdraw money, the hacker can do whatever he wants after eavesdropping on the password!

  • Easy to tamper with: after intercepting an HTTP message, a hacker can modify it and then forward it to the destination.

    For example: if the user wants to transfer money to his family, and the hacker changes the payee to himself, it will cause losses to the user!

  • Easy to forge identity: hackers can forge HTTP messages, pose as the website the user really wants to visit, and then communicate with the user.

    For example: if a user wants to visit the Taobao website for shopping, and the hacker pretends to be a Taobao website, the user may buy things on this fake Taobao website and cause losses!

How does HTTPS solve the above security problems? It relies on three main mechanisms:

  • Data encryption: what HTTPS transmits is no longer plaintext, but ciphertext produced by an encryption algorithm. Even if a hacker intercepts the message, he cannot understand the content!

  • Integrity digest: HTTPS computes a digest of the message with a digest algorithm. If a hacker tampers with the content of the message, the recomputed digest will change. After checking it, the receiver knows that the data is no longer intact and has been tampered with!

  • Digital certificate: HTTPS uses digital certificates to verify the identity of the communicating parties. Because hackers do not hold the corresponding certificates, they are exposed as soon as they pretend to be another website!

2. Encryption algorithm

The encryption algorithm addresses the problem that data transmitted over HTTP is easy to eavesdrop on.

To prevent hackers from eavesdropping on the transmitted data, the data needs to be encrypted and decrypted between the client and the server.

The sender uses the encryption algorithm to encrypt the plaintext into ciphertext, and the receiver uses the corresponding decryption algorithm to decrypt the ciphertext back into plaintext. Hackers can only see the ciphertext, and therefore cannot obtain any useful information. As shown below:

Generally speaking, encryption algorithms are divided into two categories: symmetric encryption and asymmetric encryption.

  • Symmetric encryption: encryption and decryption use the same key, that is, key A in the figure is equal to key B;

  • Asymmetric encryption: encryption and decryption use different keys, that is, key A in the figure is not equal to key B.

(1) Symmetric encryption

In a symmetric encryption algorithm, encryption and decryption use the same key, called the secret key.

The Caesar cipher is a relatively simple symmetric encryption algorithm that can be used to encrypt and decrypt English text. The main idea is: shift each letter in the plaintext K positions to the right in the alphabet to obtain the ciphertext (wrapping around at the end of the alphabet).

For example, if K = 2, then the letter "a" in the plaintext is replaced by the letter "c", and the letter "z" is replaced by the letter "b". At this time, K = 2 is the key in the symmetric encryption algorithm.
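The shift described above can be sketched in a few lines of Python. This is only an illustration; the function names `caesar_encrypt` and `caesar_decrypt` are made up for this example, and only lowercase English letters are shifted.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_encrypt(plaintext: str, k: int) -> str:
    """Shift each lowercase letter k positions to the right, wrapping around."""
    out = []
    for ch in plaintext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
        else:
            out.append(ch)  # leave spaces, digits, etc. untouched
    return "".join(out)


def caesar_decrypt(ciphertext: str, k: int) -> str:
    """Decrypting is just shifting back by the same key k."""
    return caesar_encrypt(ciphertext, -k)


print(caesar_encrypt("a", 2), caesar_encrypt("z", 2))  # c b
print(caesar_decrypt(caesar_encrypt("hello world", 2), 2))  # hello world
```

Both directions use the same value of K, which is what makes this a symmetric algorithm.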

The disadvantage of this method is that each plaintext letter always maps to the same ciphertext letter. If a hacker collects a lot of data and performs statistical analysis (for example, on letter frequencies), he is likely to crack the cipher.

A better way is to use several Caesar keys K in rotation. For example, letters in odd positions are encrypted with key K = 2, and letters in even positions are encrypted with key K = 3.
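A minimal sketch of this rotation, assuming positions are counted from 1 over the whole string (so non-letters also advance the position). The function names and the `keys=(2, 3)` default are chosen for this example only.

```python
def rotating_caesar_encrypt(plaintext: str, keys=(2, 3)) -> str:
    """Encrypt lowercase letters, cycling through several Caesar keys."""
    out = []
    for i, ch in enumerate(plaintext):
        if "a" <= ch <= "z":
            k = keys[i % len(keys)]  # 1st, 3rd, ... use keys[0]; 2nd, 4th, ... use keys[1]
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def rotating_caesar_decrypt(ciphertext: str, keys=(2, 3)) -> str:
    """Undo the encryption by shifting back with the same key sequence."""
    return rotating_caesar_encrypt(ciphertext, tuple(-k for k in keys))


print(rotating_caesar_encrypt("aaaa"))  # cdcd
```

The same plaintext letter now produces different ciphertext letters depending on its position, which makes simple frequency counting less effective.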

However, the Caesar cipher can only encrypt English text. If you want to encrypt arbitrary characters, you can use block encryption.

We know that any data is actually stored in a computer as a combination of 0/1 bits. The main idea of block encryption is: split the message to be encrypted into K-bit blocks, and encrypt each block through a one-to-one mapping table.

For example, if K = 3, the mapping table is as shown in the figure below, then the message 010110001111 will be encrypted as 101000111001. At this time, K=3 and the mapping table are the keys in the symmetric encryption algorithm.
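The sketch below shows block encryption with K = 3. The mapping table here is an assumption made for this example: it was chosen to be one-to-one and to reproduce the 010110001111 → 101000111001 result quoted above, but the table in the original figure may differ in its other entries.

```python
# Assumed 3-bit mapping table (one-to-one), consistent with the example in the text.
TABLE = {
    "000": "110", "001": "111", "010": "101", "011": "100",
    "100": "011", "101": "010", "110": "000", "111": "001",
}
INVERSE = {v: k for k, v in TABLE.items()}  # used for decryption


def block_crypt(bits: str, table: dict, k: int = 3) -> str:
    """Split the bit string into k-bit blocks and map each block through the table."""
    if len(bits) % k != 0:
        raise ValueError("message length must be a multiple of the block size")
    return "".join(table[bits[i:i + k]] for i in range(0, len(bits), k))


ciphertext = block_crypt("010110001111", TABLE)
print(ciphertext)                        # 101000111001
print(block_crypt(ciphertext, INVERSE))  # 010110001111
```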

As with using several Caesar keys K, a better way to make cracking harder is to use several mapping tables and rotate among them when encrypting the data.

Symmetric encryption algorithms commonly used in computer networks include DES, 3DES, and AES, all of which are block encryption algorithms.

(2) Asymmetric encryption

In an asymmetric encryption algorithm, the encryption and decryption keys are different; they are called the public key and the private key. Its characteristics are:

  • Data encrypted with the public key can only be decrypted with the private key; the public key itself cannot decrypt it.

  • Data encrypted with the private key can only be decrypted with the public key; the private key itself cannot decrypt it.

  • The public key is open to the public and anyone can get it; the private key is only known to you and cannot be disclosed.

Why does asymmetric encryption appear after symmetric encryption?

The reason is that symmetric encryption requires the communicating parties to negotiate a key first, and during that negotiation the key is transmitted in plaintext. If the key is intercepted by a hacker, then even though subsequent messages are encrypted, the hacker can decrypt them with this key!

A characteristic of asymmetric encryption is that data encrypted with the public key can only be decrypted with the private key. There is therefore no need to negotiate a key in advance as with symmetric encryption. The two parties can simply send their public keys to each other, and it does not matter if a hacker learns a public key. When one party encrypts a message with the other's public key, a hacker who intercepts the message cannot decrypt it with the public key. Only the other party, who holds the private key, can decrypt it successfully!
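To make the public/private key relationship concrete, here is a toy RSA example with deliberately tiny primes. It is a sketch for illustration only: real keys use primes hundreds of digits long together with padding schemes, and the helper name `rsa_apply` is made up for this example.

```python
# Toy key pair built from tiny primes (insecure, illustration only).
p, q = 61, 53
n = p * q                  # 3233, shared by the public and the private key
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key (Python 3.8+)


def rsa_apply(message: int, exponent: int, modulus: int) -> int:
    """RSA encryption and decryption are both modular exponentiation."""
    return pow(message, exponent, modulus)


message = 65
ciphertext = rsa_apply(message, e, n)     # encrypt with the public key
recovered = rsa_apply(ciphertext, d, n)   # decrypt with the private key
wrong = rsa_apply(ciphertext, e, n)       # trying the public key again does not decrypt

print(ciphertext, recovered, wrong != message)  # 2790 65 True
```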

Commonly used asymmetric encryption algorithms in computer networks are: RSA, ECDHE, etc.

Compared with symmetric encryption, asymmetric encryption algorithms are more complex and harder to understand, and involve more mathematical reasoning. If you are interested in the specific algorithms, you can read two articles by Ruan Yifeng: Principles of RSA Algorithm (1) and Principles of RSA Algorithm (2).

(3) Mixed encryption

As mentioned earlier, a symmetric encryption algorithm requires the key to be negotiated in advance, and the negotiation happens in plaintext (because there is no key yet). If a hacker intercepts the plaintext key, he can decrypt everything that is later encrypted with it, and there is no security at all!

Asymmetric encryption solves this problem, but it involves a large number of exponentiation operations, so it is very slow! Symmetric encryption is very fast by comparison, generally 100 to 10,000 times faster than asymmetric encryption!

Can the two be combined to make data transmission not only safe but also efficient? The answer is yes! HTTPS uses hybrid encryption, using both symmetric and asymmetric encryption.

The weakness of symmetric encryption is that negotiating the key in plaintext is unsafe, because the key may leak. So instead of plaintext, can we use an asymmetric encryption algorithm to negotiate the key, and then encrypt the actual data with a symmetric encryption algorithm?

That is, the key is transmitted using an asymmetric encryption algorithm, and the actual data is transmitted using a symmetric encryption algorithm. This key is commonly known as the "session key".

  • The session key is transmitted through an asymmetric encryption algorithm, which is very secure;

  • A large amount of data is transmitted (over many messages) through a symmetric encryption algorithm, while the session key only needs to be transmitted once, which is very efficient!
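The hybrid scheme can be sketched as follows. Everything here is a toy stand-in chosen for this example: the RSA key pair is built from two small Mersenne primes, and the "symmetric cipher" is a simple XOR with a SHA-256-based keystream rather than AES. The names `keystream` and `symmetric_crypt` are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (insecure sizes, illustration only).
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter, concatenated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def symmetric_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Client: pick a random session key and send it under the server's public key.
session_key = secrets.token_bytes(8)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# Server: recover the session key with its private key (done once).
server_key = pow(wrapped_key, D, N).to_bytes(8, "big")

# Both sides now use the fast symmetric cipher for the actual data.
ciphertext = symmetric_crypt(session_key, b"transfer 100 to Alice")
plaintext = symmetric_crypt(server_key, ciphertext)
print(plaintext)  # b'transfer 100 to Alice'
```

The slow asymmetric step runs once per session, while every data message only pays for the cheap symmetric step.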

3. Digest algorithm

The digest algorithm addresses the problem that data transmitted over HTTP is easily tampered with.

A digest algorithm, also known as a hash algorithm, takes arbitrary data as input and outputs a fixed-length string (called a digest). The main features are as follows:

  • Irreversible, that is, the input cannot be reversed through the output.

  • The same input must produce the same output.

  • Different inputs will most likely produce different outputs.

  • No matter how long the input data is, the length of the output digest is fixed.

For example: if the bit stream of the data is split into groups of 8 bits (padding with zeros where there are not enough bits), and all the groups are then combined with a bitwise XOR, the result can be called a digest, and this algorithm is a simple digest algorithm.
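This simple XOR digest is easy to write down. In the sketch below the input is a Python `bytes` object, so it is already split into 8-bit groups and no padding is needed; the function name `xor_digest` is made up for this example.

```python
def xor_digest(data: bytes) -> int:
    """XOR all 8-bit groups together; the result always fits in one byte."""
    digest = 0
    for byte in data:
        digest ^= byte
    return digest


print(xor_digest(b"ab"))                       # 3  (0x61 XOR 0x62)
print(xor_digest(b"ab") == xor_digest(b"ba"))  # True: reordering is not detected
```

The output length is fixed at 8 bits however long the input is, but as the second line shows, different inputs can easily produce the same digest, so this algorithm is far too weak for real use.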

If two different inputs produce the same digest, it is called a hash collision. A good digest algorithm has a very low probability of hash collisions, and it is very difficult to guess the content of the input from the output!

Digest algorithms commonly used in computer networks are: MD5, SHA-1, SHA-256, etc.
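The properties listed above can be observed with SHA-256 from Python's standard `hashlib` module. The inputs are arbitrary sample values chosen for this example.

```python
import hashlib

short = hashlib.sha256(b"hello").hexdigest()
long_ = hashlib.sha256(b"hello" * 10_000).hexdigest()
again = hashlib.sha256(b"hello").hexdigest()
changed = hashlib.sha256(b"hellp").hexdigest()  # one character differs

print(len(short), len(long_))  # 64 64: fixed-length output (256 bits as hex)
print(short == again)          # True: same input, same output
print(short == changed)        # False: a small change gives a different digest
```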

To prevent hackers from tampering with the transmitted data, the sender not only sends the actual data but also uses a digest algorithm to compute a digest of the data, and sends the digest along with it.

After receiving the data, the receiver uses the same digest algorithm to compute the digest of the data again, and compares it with the digest sent by the sender. If the two differ, the data has been tampered with; otherwise it has not.

You can easily see that this method has an obvious flaw. If a hacker tampers not only with the data but also with the digest, won't the receiver be unable to tell whether the data has been tampered with?

To prevent this, the sender and the receiver must share something that only the two of them know and hackers cannot, such as the session key used for symmetric encryption. However, to improve security, the session key is generally not reused here; a separate key, called the authentication key, is used instead. It is obtained in the same way as the session key.

With the authentication key, the input to the digest algorithm is no longer just the transmitted data, but the transmitted data plus the authentication key! Since the hacker does not know the authentication key, he cannot produce a matching digest for tampered data, so the tampering is detected!

The digest generated by the digest algorithm after the data and the authentication key are concatenated has a special name: the message authentication code, or MAC for short.

To further improve security, the client and the server actually use different session keys and authentication keys, so a total of four keys are required:
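A minimal sketch of the idea, using only the standard library. `naive_mac` follows the description above literally (digest of data concatenated with the key). `make_mac` uses the standard HMAC construction from Python's `hmac` module instead, which is a swapped-in technique: it serves the same purpose but combines key and data more carefully. The key and messages are made-up sample values.

```python
import hashlib
import hmac


def naive_mac(data: bytes, auth_key: bytes) -> bytes:
    """Literal version of the text: digest of data concatenated with the key."""
    return hashlib.sha256(data + auth_key).digest()


def make_mac(data: bytes, auth_key: bytes) -> bytes:
    """HMAC-SHA256, the standard keyed-digest construction."""
    return hmac.new(auth_key, data, hashlib.sha256).digest()


def verify(data: bytes, mac: bytes, auth_key: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(data, auth_key), mac)


auth_key = b"shared-authentication-key"  # known only to sender and receiver
data = b"pay 100 to Alice"
mac = make_mac(data, auth_key)

tampered = b"pay 100 to Mallory"
forged = hashlib.sha256(tampered).digest()  # attacker has no key, can only hash the data

print(verify(data, mac, auth_key))         # True
print(verify(tampered, mac, auth_key))     # False: data changed, MAC no longer matches
print(verify(tampered, forged, auth_key))  # False: digest forged without the key
```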

  1. The session key for data sent from the client to the server;

  2. The session key for data sent from the server to the client;

  3. The authentication key for data sent from the client to the server;

  4. The authentication key for data sent from the server to the client.

4. Digital certificate

Digital certificates are used to solve the problem that identities are easily forged in the HTTP protocol.

As mentioned earlier, HTTPS uses an asymmetric encryption algorithm to transmit the session key. Generally, the server publishes its public key, the client encrypts the session key with that public key, and the server decrypts it with its private key to obtain the session key. At this point, the two parties have agreed on the key used to symmetrically encrypt the transmitted data.

But what if the server's public key is forged by a hacker? For example, the classic "man-in-the-middle attack" problem:

  1. Requests sent by the client are hijacked by a man-in-the-middle (a hacker), for example through DNS hijacking, so that all requests go to the man-in-the-middle.

  2. The man-in-the-middle pretends to be the legitimate website (server) and returns his own public key 2 to the client, and requests public key 1 from the legitimate website.

  3. The client encrypts session key 1 with the man-in-the-middle's public key 2 and sends it to the man-in-the-middle.

  4. The man-in-the-middle decrypts it with his own private key 2 to obtain session key 1. Then, pretending to be the client, he encrypts session key 2 (which may be the same as session key 1) with the legitimate website's public key 1 and sends it to the legitimate website.

  5. The client encrypts its data with session key 1 and sends it to the man-in-the-middle.

  6. The man-in-the-middle decrypts the data with session key 1 and obtains the plaintext! (Eavesdropping achieved)

  7. The man-in-the-middle encrypts the data (possibly tampered with) using session key 2 and sends it to the legitimate website.

At this point, the communication between the client and the server is no longer secure! The middleman can not only eavesdrop on the content of the message, but also tamper with it!

How does the client know that the public key it holds comes from the legitimate website rather than a man-in-the-middle? This is where a digital certificate comes in!

A digital certificate is like an identity card: it exists to verify the identity of the communicating parties. An ID card is applied for at the police station, while a digital certificate is applied for from a Certification Authority (CA), and there is a fee!

The specific process of solving man-in-the-middle attacks through digital certificates is as follows:

  • The server (the legitimate website) first generates a public/private key pair, and then combines the domain name, applicant, public key (note: not the private key, which must never be disclosed) and other information into a .csr file, and sends this file to the certification authority (CA).

  • After the CA receives the application, it verifies the applicant's information through various means. If nothing is abnormal, it uses a digest algorithm to compute a digest of the plaintext information in the .csr, and then encrypts this digest with the CA's own private key. The resulting ciphertext is called a digital signature. The digital certificate contains this digital signature together with the plaintext information from the .csr. The CA returns this certificate to the applicant.

  • To prevent man-in-the-middle attacks, the client asks the server to send its certificate, and then verifies it.

  • When the client verifies the certificate, it takes out the signature and the plaintext information in the certificate. It decrypts the signature with the CA's public key, which it already holds, to obtain digest 1, and applies the digest algorithm to the plaintext information to obtain digest 2. If digest 1 and digest 2 are the same, the certificate is legitimate, that is, the public key in the certificate is correct; otherwise, the certificate is invalid.
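The signing and verification steps above can be sketched with a toy RSA key pair for the CA. This is an illustration under simplifying assumptions: the key is far too small to be secure, the digest is reduced modulo the CA's modulus so that it fits, real certificates use a structured format and padded signatures, and the certificate fields below are made-up sample values.

```python
import hashlib

# Toy CA key pair (insecure sizes, illustration only).
P, Q = 2**61 - 1, 2**31 - 1
CA_N = P * Q
CA_E = 65537                          # CA public key, built into the client
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))  # CA private key, known only to the CA


def digest_as_int(info: bytes) -> int:
    """SHA-256 digest of the plaintext info, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(info).digest(), "big") % CA_N


def ca_sign(info: bytes) -> int:
    """CA side: encrypt the digest with the CA private key."""
    return pow(digest_as_int(info), CA_D, CA_N)


def client_verify(info: bytes, signature: int) -> bool:
    """Client side: compare the decrypted signature with a fresh digest."""
    digest_1 = pow(signature, CA_E, CA_N)  # decrypt the signature with the CA public key
    digest_2 = digest_as_int(info)         # recompute the digest from the plaintext
    return digest_1 == digest_2


# Hypothetical certificate contents.
csr_info = b"domain=example.com;applicant=Example Inc;public_key=0x1234"
signature = ca_sign(csr_info)

forged_info = b"domain=example.com;applicant=Example Inc;public_key=0x9999"

print(client_verify(csr_info, signature))     # True: certificate is genuine
print(client_verify(forged_info, signature))  # False: public key was swapped
```

A man-in-the-middle who swaps in his own public key cannot produce a matching signature, because he does not have the CA's private key.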

How does the browser get the public key of the certification authority? What if that public key is forged too? To avoid this endless nesting-doll problem, the public keys of these certification authorities are built into the computer's operating system! Therefore, there is no need to worry about the certification authority's public key being forged.

Once the Chrome browser finds that a website's digital certificate is invalid, it displays the warning page below. If the user forces access anyway, there is a certain risk.

5. SSL/TLS handshake

To summarize the above:

  • HTTPS uses hybrid encryption to solve the problem that data transmitted over HTTP is easy to eavesdrop on. This requires negotiating the session key.

  • HTTPS uses a digest algorithm to solve the problem that data transmitted over HTTP is easily tampered with. This requires negotiating the authentication key.

  • HTTPS uses digital certificates to solve the problem that identities are easily forged in HTTP. This requires the client to verify the server's certificate.

So what exactly does HTTPS do? When do the communicating parties negotiate the session key and the authentication key, and verify that the certificate is legitimate? The answer is during the SSL/TLS handshake.

The "S" that HTTPS has more than HTTP refers to the SSL/TLS protocol.

In the HTTPS protocol, after the client and the server establish a TCP connection through the three-way handshake, data is not transmitted right away. The two sides first go through an SSL/TLS handshake to negotiate keys and verify certificates; only then can data be transferred securely using the session keys and authentication keys!

Let's use Wireshark to capture packets and talk about the SSL/TLS 1.2 four-way handshake process in detail.

First handshake

The client initiates an encrypted communication request to the server, which mainly includes:

  1. The SSL/TLS protocol version supported by the client, such as TLS 1.2 version.

  2. Random number 1, generated by the client and used later to derive the session keys and authentication keys.

  3. List of cipher suites supported by the client, each cipher suite contains:

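    1. An asymmetric encryption algorithm for transmitting the session key, such as ECDHE, RSA;

    2. An asymmetric encryption algorithm for verifying the digital certificate, such as ECDSA, RSA;

    3. A symmetric encryption algorithm for transmitting data, such as AES_128_GCM, AES_128_CBC;

    4. A digest algorithm for verifying message integrity, such as SHA256, SHA384;

    5. The naming format is: TLS_<asymmetric algorithm for key transmission>_<asymmetric algorithm for certificate verification>_WITH_<symmetric encryption algorithm>_<digest algorithm>. If the two asymmetric algorithms are the same, the second one is omitted.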
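As a small illustration of this naming format, the sketch below splits a cipher suite name into its four parts. It assumes names that contain the `_WITH_` separator, as in TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256; the function name `parse_cipher_suite` and the dictionary keys are made up for this example, and suites that follow other naming schemes are not handled.

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a TLS_<kx>[_<auth>]_WITH_<cipher>_<digest> name into its parts."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError(f"unexpected cipher suite name: {name}")
    left, right = name[len("TLS_"):].split("_WITH_", 1)
    asymmetric = left.split("_")
    cipher, _, digest = right.rpartition("_")  # the digest is the last component
    return {
        "key_exchange": asymmetric[0],
        "authentication": asymmetric[-1],  # same as key_exchange when omitted
        "cipher": cipher,
        "digest": digest,
    }


print(parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"))
print(parse_cipher_suite("TLS_RSA_WITH_AES_128_CBC_SHA"))
```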

Second handshake

After receiving the encrypted communication request from the client, the server sends a response to the client, which mainly includes:

  1. The confirmed SSL/TLS protocol version. If the two sides have no supported version in common, encrypted communication is abandoned.

  2. Random number 2, generated by the server and used later to derive the session keys and authentication keys.

  3. The confirmed cipher suite, such as TLS_RSA_WITH_AES_128_CBC_SHA.

  4. The server's digital certificate.

Third handshake

After the client receives the response from the server, it will verify whether its digital certificate is legal (the verification method is explained in the digital certificate section). If the certificate is legal, it will perform a third handshake, which mainly includes:

  1. Another random number 3 generated by the client (called the pre-master secret, abbreviated as PMS). This random number is encrypted with the server's public key.

    The client calculates the master key (Master Secret, MS) from random number 1, random number 2 and the pre-master secret, and then slices the master key to obtain two session keys and two authentication keys.

  2. An encryption change notification, indicating that all subsequent data will be encrypted with the session key.

  3. The client handshake end notification, indicating that the client's handshake phase has ended. The client generates a digest of all handshake messages so far, encrypts it with the session key, and sends it to the server for verification.
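The derivation of the four keys can be sketched as follows. The expansion function is loosely modeled on the HMAC-based pseudorandom function that TLS 1.2 defines in RFC 5246, but this is a simplified sketch, not a conforming implementation: the key lengths (32-byte authentication keys, 16-byte session keys) are assumptions for this example, initialization vectors are left out, and the names `p_sha256` and `derive_keys` are made up.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand secret + seed into `length` pseudorandom bytes using HMAC-SHA256."""
    out = b""
    a = seed  # A(0)
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()            # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def derive_keys(pre_master: bytes, random_1: bytes, random_2: bytes,
                auth_key_len: int = 32, session_key_len: int = 16) -> dict:
    """Master key from the three random values, then slice it into four keys."""
    master = p_sha256(pre_master, b"master secret" + random_1 + random_2, 48)
    total = 2 * auth_key_len + 2 * session_key_len
    block = p_sha256(master, b"key expansion" + random_2 + random_1, total)
    keys, offset = {}, 0
    for name, size in [
        ("client_auth_key", auth_key_len),
        ("server_auth_key", auth_key_len),
        ("client_session_key", session_key_len),
        ("server_session_key", session_key_len),
    ]:
        keys[name] = block[offset:offset + size]
        offset += size
    return keys


pms = b"\x03" * 48  # sample pre-master secret (random number 3)
client_side = derive_keys(pms, b"\x01" * 32, b"\x02" * 32)
server_side = derive_keys(pms, b"\x01" * 32, b"\x02" * 32)
other_session = derive_keys(pms, b"\x09" * 32, b"\x02" * 32)

print(client_side == server_side)    # True: both ends derive the same four keys
print(client_side == other_session)  # False: a new random number 1 changes every key
```

Because both sides feed the same three values into the same function, they arrive at identical keys without ever sending those keys over the network.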

Fourth handshake

After receiving the message from the client, the server decrypts the pre-master secret with its own private key, calculates the master key from random number 1, random number 2 and the pre-master secret, and then slices the master key to obtain two session keys and two authentication keys.

After that, the fourth handshake is carried out, which mainly includes:

  1. An encryption change notification, indicating that all subsequent data will be encrypted with the session key.

  2. The server handshake end notification, indicating that the server's handshake phase has ended. The server generates a digest of all handshake messages so far, encrypts it with the session key, and sends it to the client for verification.

At this point, the handshake phase of the entire SSL/TLS is over!

Why do the third and fourth handshakes send summaries of all handshake messages?

The main reason is to prevent the handshake information from being tampered with. For example, in the list of cipher suites supported by the client, some encryption algorithms are weak and some are strong, and the cipher suites are transmitted in plaintext. If a hacker modifies the cipher suite list so that only the less secure algorithms remain, the server can only choose from these weaker algorithms, and security is greatly reduced. Therefore, a digest is sent to prevent tampering with the handshake information.
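A simplified sketch of this check: each side computes a keyed digest over every handshake message it saw, and the two digests only match if nothing was altered in transit. The message contents, the sample master key, and the function `handshake_digest` are made up for this example, and the real TLS calculation differs in detail.

```python
import hashlib
import hmac


def handshake_digest(master_key: bytes, messages: list) -> bytes:
    """Keyed digest over the concatenation of all handshake messages."""
    transcript = hashlib.sha256(b"".join(messages)).digest()
    return hmac.new(master_key, transcript, hashlib.sha256).digest()


master_key = b"\x01" * 48  # sample value; both sides derive the same one

# What the client actually sent and received.
client_view = [
    b"ClientHello suites=AES_128_GCM,WEAK_CIPHER",
    b"ServerHello suite=WEAK_CIPHER",
]
# What the server saw after an attacker stripped the strong suite from the list.
server_view = [
    b"ClientHello suites=WEAK_CIPHER",
    b"ServerHello suite=WEAK_CIPHER",
]

client_digest = handshake_digest(master_key, client_view)
server_digest = handshake_digest(master_key, server_view)

# The server compares the client's digest with its own; a mismatch aborts the handshake.
print(hmac.compare_digest(client_digest, server_digest))  # False: tampering detected
```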

Why not send a master key directly, but regenerate a master key with two random numbers plus a previous master key?

The main reason is to prevent connection replay. Without the two random numbers, the client would simply generate a master key and send it to the server encrypted with the server's public key. A hacker who sniffs all the messages between the client and the server could then pretend to be the client and send the same messages to the server again (even though the hacker does not know their content). Because the messages are identical to ones the server has already authenticated, the server would believe it is talking to the client, resulting in another connection.

If the server is a shopping website, the replay of this connection will cause the client to place an order again, causing losses.

With the two random numbers, however, even if the hacker pretends to be the client and tries to replay the connection, the random numbers are different, so the generated keys are different, and the content resent by the hacker is invalid (the server cannot decrypt it, and the integrity digest does not match).

Finally, use a picture to summarize the process of the TLS four-way handshake.


Origin blog.csdn.net/weixin_45740811/article/details/129185172