The HTTPS protocol: this article is all you need

Insecure HTTP

In recent years, more and more websites have adopted the HTTPS protocol for data transmission, because HTTPS provides a more secure service than HTTP.
Many browsers mark websites that use HTTP with a "warning" sign to indicate that data transmission is not secure, and mark websites that use HTTPS with a "lock" sign to indicate that it is.


Why is the HTTP protocol insecure? There are three main reasons:

  • Vulnerable to eavesdropping: HTTP transmits data in clear text. Hackers can easily intercept messages by sniffing the network, and because the data is not encrypted, they can read the content. For example: if a user enters the password for withdrawing money, a hacker who eavesdrops on that password can do whatever he wants!
  • Easy to tamper with: After intercepting an HTTP message, a hacker can modify it and then forward it to the destination. For example: if a user wants to transfer money to his family and the hacker changes the payee to himself, the user suffers a loss!
  • Easy to forge identity: Hackers can forge HTTP messages, pretend to be the website the user really wants to visit, and then communicate with the user. For example: if a user wants to shop on the Taobao website and a hacker impersonates Taobao, the user may buy things on the fake site and suffer a loss!

How does HTTPS solve the above security problems? The main methods are:

  • Data encryption: HTTPS no longer transmits plaintext, but ciphertext produced by an encryption algorithm. Even if a hacker intercepts the message, he cannot understand its content!
  • Integrity digest: HTTPS computes a digest of the message with a digest algorithm. If a hacker tampers with the message, the recomputed digest changes. By checking the digest, the receiver learns that the data is no longer intact and has been tampered with!
  • Digital certificate: HTTPS uses digital certificates to verify the identity of communicating entities. Because hackers do not hold the corresponding certificate, they are exposed as soon as they pretend to be another website!

Encryption Algorithm

To prevent the transmitted data from being eavesdropped on by hackers, the data needs to be encrypted and decrypted between the client and the server.
The sender encrypts the plaintext into ciphertext using an encryption algorithm, and the receiver decrypts the ciphertext back into plaintext using the corresponding decryption algorithm. Hackers can only see the ciphertext, so they cannot obtain any useful information.


Generally speaking, encryption algorithms are divided into two categories: symmetric encryption and asymmetric encryption.

  • Symmetric encryption: encryption and decryption use the same key;
  • Asymmetric encryption: encryption and decryption use different keys.

Symmetric encryption

In a symmetric encryption algorithm, encryption and decryption use the same key, simply called the key.
The Caesar cipher is a relatively simple symmetric encryption algorithm that can be used to encrypt and decrypt English text. The main idea is: shift each letter of the plaintext K positions to the right in the alphabet to obtain the ciphertext (wrapping around at the end of the alphabet).
For example, if K = 2, the letter "a" in the plaintext is replaced by the letter "c", and the letter "z" is replaced by the letter "b". Here K = 2 is the key of the symmetric encryption algorithm.
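
The shifting rule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and only lowercase English letters are shifted (other characters are passed through unchanged, which is an assumption the article does not state).

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_encrypt(plaintext: str, k: int) -> str:
    """Shift each lowercase letter k positions to the right, wrapping around."""
    result = []
    for ch in plaintext:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
        else:
            result.append(ch)  # assumption: non-letters are left unchanged
    return "".join(result)


def caesar_decrypt(ciphertext: str, k: int) -> str:
    """Decryption is a shift by the same key in the opposite direction."""
    return caesar_encrypt(ciphertext, -k)


print(caesar_encrypt("az", 2))  # cb
print(caesar_decrypt("cb", 2))  # az
```

The same value K is used on both sides, which is exactly what makes the scheme symmetric.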


The disadvantage of this method is that each letter always encrypts to the same ciphertext letter. If a hacker collects a lot of data and performs statistical (frequency) analysis, the cipher is likely to be cracked.
A better way is to encrypt with several Caesar keys K in rotation; for example, letters at odd positions are encrypted with key K = 2, and letters at even positions with key K = 3.
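
A minimal sketch of this rotation, assuming that "position" means the 1-based position of the character in the message and that only lowercase letters are shifted (both are assumptions made for this example, as are the function names):

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def shift(ch: str, k: int) -> str:
    """Shift one lowercase letter by k positions; leave other characters alone."""
    if ch not in ALPHABET:
        return ch
    return ALPHABET[(ALPHABET.index(ch) + k) % 26]


def round_robin_encrypt(plaintext: str, keys: list) -> str:
    """Use keys[0] for the 1st character, keys[1] for the 2nd, and so on in rotation."""
    return "".join(shift(ch, k) for ch, k in zip(plaintext, cycle(keys)))


def round_robin_decrypt(ciphertext: str, keys: list) -> str:
    """Undo the encryption by rotating through the negated keys."""
    return round_robin_encrypt(ciphertext, [-k for k in keys])


print(round_robin_encrypt("aaaa", [2, 3]))  # cdcd
```

The same plaintext letter "a" now maps to two different ciphertext letters, which makes simple frequency counting less effective.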


However, the Caesar cipher can only encrypt English text. To encrypt arbitrary data, you can use block encryption.
We know that any data is stored in a computer as a sequence of 0/1 bits. The main idea of block encryption is: split the message to be encrypted into K-bit blocks, and encrypt each block through a one-to-one mapping table.
For example, if K = 3 and the mapping table sends 010 to 101, 110 to 000, 001 to 111, and 111 to 001, then the message 010110001111 is encrypted as 101000111001. Here K = 3 together with the mapping table is the key of the symmetric encryption algorithm.
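
The table lookup can be sketched as follows. Four entries of the table are implied by the example message in the text; the other four entries are assumptions chosen only so that the mapping is one-to-one, and the function name is hypothetical.

```python
K = 3

# One-to-one mapping table for 3-bit blocks.
TABLE = {
    "010": "101", "110": "000", "001": "111", "111": "001",  # implied by the example
    "000": "110", "011": "100", "100": "011", "101": "010",  # assumed for illustration
}
INVERSE = {cipher: plain for plain, cipher in TABLE.items()}


def apply_table(bits: str, table: dict, k: int = K) -> str:
    """Split the bit string into k-bit blocks and map each block through the table."""
    if len(bits) % k != 0:
        raise ValueError("message length must be a multiple of the block size")
    return "".join(table[bits[i:i + k]] for i in range(0, len(bits), k))


ciphertext = apply_table("010110001111", TABLE)
print(ciphertext)                        # 101000111001
print(apply_table(ciphertext, INVERSE))  # 010110001111
```

Decryption uses the inverse table, which exists only because the mapping is one-to-one.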


As with using several Caesar keys K, cracking can be made harder by using multiple mapping tables and rotating among them when encrypting the data.
Symmetric encryption algorithms commonly used in computer networks include DES, 3DES, and AES, all of which are block ciphers.

Asymmetric encryption

In an asymmetric encryption algorithm, the encryption key and the decryption key are different; they are called the public key and the private key. Its characteristics are:

  • Data encrypted with the public key can only be decrypted with the private key; the public key itself cannot decrypt it.
  • Data encrypted with the private key can only be decrypted with the public key; the private key itself cannot decrypt it.
  • The public key is open to the public and anyone can get it; the private key is known only to its owner and must never be disclosed.
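
These properties can be observed with a toy RSA key pair. The numbers below (primes 61 and 53) are textbook-sized and chosen only for illustration; they are far too small to be secure, and the helper name `apply_key` is hypothetical.

```python
# Toy RSA key pair (NOT secure): n = 61 * 53 = 3233, phi = 60 * 52 = 3120.
N = 61 * 53
E = 17      # public exponent:  the public key is (E, N)
D = 2753    # private exponent: the private key is (D, N), since 17 * 2753 % 3120 == 1


def apply_key(value: int, exponent: int, modulus: int = N) -> int:
    """RSA encryption and decryption are both modular exponentiation."""
    return pow(value, exponent, modulus)


message = 65

# Encrypt with the public key: only the private key recovers the message.
ciphertext = apply_key(message, E)
print(apply_key(ciphertext, D) == message)  # True
print(apply_key(ciphertext, E) == message)  # False: the public key cannot decrypt

# Encrypt with the private key: the public key recovers the message.
signed = apply_key(message, D)
print(apply_key(signed, E) == message)      # True
```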

Why was asymmetric encryption needed after symmetric encryption already existed?
The reason is that symmetric encryption requires the communicating parties to agree on a key first, and that negotiation happens in plain text. If the key is intercepted by a hacker, then even though the subsequent messages are encrypted, the hacker can decrypt them with this key!


A key property of asymmetric encryption is that data encrypted with the public key can only be decrypted with the private key. So there is no need to negotiate a key in advance as with symmetric encryption: the two parties can simply send each other their public keys. It does not matter if a hacker learns a public key. When one party encrypts a message with the other party's public key, a hacker who intercepts the message cannot decrypt it with the public key; only the other party, who holds the private key, can decrypt it successfully!


Asymmetric algorithms commonly used in computer networks include RSA and ECDHE (strictly speaking, ECDHE is a key agreement algorithm rather than an encryption algorithm).
Compared with symmetric encryption, asymmetric encryption algorithms are more complex and harder to understand, and involve more mathematical reasoning. If you are interested in a specific algorithm, you can read two articles by Ruan Yifeng: Principles of RSA Algorithm (1) and Principles of RSA Algorithm (2).
https://www.ruanyifeng.com/blog/2013/06/rsa_algorithm_part_one.html
http://www.ruanyifeng.com/blog/2013/07/rsa_algorithm_part_two.html

Hybrid encryption

As mentioned earlier, a symmetric encryption algorithm requires the key to be negotiated in advance, and the negotiation takes place in plaintext (because there is no key yet). If a hacker intercepts the plaintext key, he can decrypt everything that is later encrypted with it, and there is no security at all!
The asymmetric encryption algorithm solves this problem, but it involves a large number of exponentiation operations, so encryption is very slow! Symmetric encryption is much faster, generally 100 to 10,000 times as fast as asymmetric encryption!
Can the two be combined to make data transmission both secure and efficient? The answer is yes! HTTPS uses hybrid encryption, that is, both symmetric and asymmetric encryption.
The weakness of symmetric encryption is that negotiating the key in plain text is not safe, because the key may leak. So instead of plain text, we can use an asymmetric encryption algorithm to negotiate the key, and then encrypt the actual data with a symmetric encryption algorithm.
That is to say, the key is transmitted using an asymmetric encryption algorithm, and the actual data is transmitted using a symmetric encryption algorithm. This key is generally called the "session key".

  • The session key is transmitted through an asymmetric encryption algorithm, which is very safe;
  • A large amount of data is transmitted (over many messages) through a symmetric encryption algorithm, while the session key only needs to be transmitted once, which is very efficient!
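
The flow can be sketched end to end with toy building blocks. Everything here is an illustrative stand-in and not what HTTPS really uses: the asymmetric part is textbook RSA with a tiny key applied byte by byte, and the symmetric part is a repeating-key XOR. Only the division of labor (asymmetric for the session key, symmetric for the data) reflects the text.

```python
import secrets

# Toy RSA key pair of the server (NOT secure): n = 61 * 53.
N, E, D = 3233, 17, 2753


def rsa_encrypt_bytes(data: bytes) -> list:
    """Encrypt each byte with the server's public key (toy, slow, asymmetric)."""
    return [pow(b, E, N) for b in data]


def rsa_decrypt_bytes(blocks: list) -> bytes:
    """Decrypt each block with the server's private key."""
    return bytes(pow(c, D, N) for c in blocks)


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key; the same call decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Client: pick a session key, send it under the server's public key (once),
# then send the actual data under the session key (many times).
session_key = secrets.token_bytes(16)
wrapped_key = rsa_encrypt_bytes(session_key)
ciphertext = xor_cipher(b"transfer 100 to Alice", session_key)

# Server: unwrap the session key with the private key, then decrypt the data.
recovered_key = rsa_decrypt_bytes(wrapped_key)
plaintext = xor_cipher(ciphertext, recovered_key)
print(plaintext)  # b'transfer 100 to Alice'
```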

Digest algorithm

A digest algorithm, also known as a hash algorithm, takes arbitrary data as input and outputs a fixed-length string (called a digest). Its main features are as follows:

  • Irreversible: the input cannot be recovered from the output.
  • The same input always produces the same output.
  • Different inputs will most likely produce different outputs.
  • No matter how long the input data is, the length of the output digest is fixed.

For example: if the bit stream of the data is split into groups of 8 bits (padding the last group with zeros if it is too short) and all the groups are then XORed together bitwise, the result can be called a digest. This is a very simple digest algorithm.
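
This simple XOR digest can be written directly in Python. Since a `bytes` object is already a sequence of 8-bit groups, no padding is needed in this sketch; the function name is made up for the example.

```python
def xor_digest(data: bytes) -> int:
    """Fold the input into a single byte by XOR-ing all 8-bit groups together."""
    digest = 0
    for byte in data:
        digest ^= byte
    return digest


print(f"{xor_digest(b'ab'):08b}")              # 00000011
print(xor_digest(b"ab") == xor_digest(b"ba"))  # True: reordering the input goes unnoticed
```

The second line shows why this is only a teaching example: two different inputs produce the same digest far too easily.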


If two different inputs produce the same digest, this is called a hash collision. A good digest algorithm has a very low probability of hash collisions, and makes it very difficult to guess the input from the output!
Digest algorithms commonly used in computer networks include MD5, SHA-1, and SHA-256.


To prevent the transmitted data from being tampered with by hackers, the sender not only sends the actual data but also computes a digest of the data with a digest algorithm and sends the digest along with it.
After receiving the data, the receiver computes the digest again with the same digest algorithm and compares it with the digest sent by the sender. If the two differ, the data has been tampered with; otherwise it has not.
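
A minimal sketch of this send-and-compare scheme, using SHA-256 from Python's standard `hashlib` module. The function names and the sample messages are made up for this example.

```python
import hashlib


def make_packet(data: bytes) -> tuple:
    """Sender: attach the SHA-256 digest to the data."""
    return data, hashlib.sha256(data).hexdigest()


def verify_packet(data: bytes, digest: str) -> bool:
    """Receiver: recompute the digest and compare it with the one received."""
    return hashlib.sha256(data).hexdigest() == digest


data, digest = make_packet(b"pay 100 to Alice")
print(verify_packet(data, digest))                   # True
print(verify_packet(b"pay 100 to Mallory", digest))  # False: the data was changed
```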


You can easily see that this method has an obvious flaw. If a hacker tampers with the data and also replaces the digest at the same time, the receiver can no longer tell whether the data has been tampered with.


To prevent this from happening, the sender and the receiver must share something that only the two of them know and hackers cannot know, such as the session key used for symmetric encryption. However, to improve security, the session key is generally not reused for this purpose; a separate key, called the authentication key, is used instead, and it is obtained in the same way as the session key.
With the authentication key, the input of the digest algorithm is no longer just the transmitted data, but the transmitted data together with the authentication key! Since the hacker does not know the authentication key, he can no longer compute a matching digest for tampered data, so any tampering is detected and security is ensured!


The digest generated by the digest algorithm from the concatenation of the data and the authentication key has a special name: the message authentication code, or MAC for short.
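
The sketch below shows both the plain concatenation described above and HMAC, the standardized MAC construction available in Python's `hmac` module. Using HMAC here is a substitution made for this example; the article itself only describes hashing the data together with the key. The key and messages are invented sample values.

```python
import hashlib
import hmac


def naive_mac(data: bytes, auth_key: bytes) -> str:
    """Digest of the data concatenated with the authentication key, as described above."""
    return hashlib.sha256(data + auth_key).hexdigest()


def hmac_tag(data: bytes, auth_key: bytes) -> str:
    """HMAC-SHA256: a standardized way of mixing a key into a digest."""
    return hmac.new(auth_key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, tag: str, auth_key: bytes) -> bool:
    """Receiver: recompute the tag with the shared key and compare in constant time."""
    return hmac.compare_digest(hmac_tag(data, auth_key), tag)


auth_key = b"known only to client and server"  # sample value for illustration
tag = hmac_tag(b"pay 100 to Alice", auth_key)

# A hacker who changes the data cannot produce a valid tag without the key.
forged_tag = hashlib.sha256(b"pay 100 to Mallory").hexdigest()
print(verify(b"pay 100 to Alice", tag, auth_key))           # True
print(verify(b"pay 100 to Mallory", forged_tag, auth_key))  # False
```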
To further improve security, the client and the server actually use different session keys and authentication keys, so a total of four keys are required:

  1. The session key for data sent from the client to the server;
  2. The session key for data sent from the server to the client;
  3. The authentication key for data sent from the client to the server;
  4. The authentication key for data sent from the server to the client.

Digital certificate

As mentioned earlier, HTTPS uses an asymmetric encryption algorithm to transmit the session key. Generally, the server publishes its public key, the client encrypts the session key with that public key, and the server decrypts it with its private key to obtain the session key. At this point the two parties have agreed on the key for symmetrically encrypting the transmitted data.
But what if the server's public key is forged by a hacker? Consider the classic "man-in-the-middle attack":

  1. Requests sent by the client are hijacked by a man in the middle (a hacker), for example through DNS hijacking, so that all requests go to the man in the middle.
  2. The man in the middle pretends to be the legitimate website (server) and returns his own public key 2 to the client, while requesting public key 1 from the legitimate website.
  3. The client encrypts session key 1 with the man in the middle's public key 2 and sends it to the man in the middle.
  4. The man in the middle decrypts it with his own private key 2 to obtain session key 1. At the same time he pretends to be the client, encrypts session key 2 (which may be the same as session key 1) with the legitimate website's public key 1, and sends it to the legitimate website.
  5. The client encrypts its data with session key 1 and sends it to the man in the middle.
  6. The man in the middle decrypts the data with session key 1 and obtains the plaintext! (Eavesdropping achieved)
  7. The man in the middle encrypts the data (possibly tampered with) with session key 2 and sends it to the legitimate website.

At this point, the communication between the client and the server is no longer secure! The man in the middle can not only eavesdrop on the messages, but also tamper with them!

How does the client know that the public key it holds comes from the legitimate website rather than a man in the middle? This is where the digital certificate comes in!
A digital certificate is like an identity card: it is designed to verify the identity of a communicating entity. An ID card is applied for at the police station, while a digital certificate is applied for from a Certification Authority (CA), usually for a fee!
The specific process of defeating man-in-the-middle attacks with digital certificates is as follows:

  • The server (the legitimate website) first generates a public/private key pair, then combines the domain name, the applicant, the public key (note: not the private key, which must never be disclosed) and other information into a .csr file, and sends this file to the certification authority (CA).
  • After the CA receives the application, it verifies the applicant's information through various means. If nothing is abnormal, it uses a digest algorithm to compute a digest of the plaintext information in the .csr, and then encrypts the digest with the CA's own private key to produce a ciphertext string, which is called the digital signature. The digital certificate contains this digital signature and the plaintext information from the .csr. The CA returns the certificate to the applicant.
  • To prevent man-in-the-middle attacks, the client asks the server to send its certificate and then verifies it.
  • When the client verifies the certificate, it takes the signature and the plaintext information out of the certificate, decrypts the signature with the CA's public key that it already holds to obtain digest 1, and uses the digest algorithm to compute digest 2 from the plaintext information. It then compares digest 1 with digest 2. If they are the same, the certificate is valid, that is, the public key in the certificate is genuine; otherwise, the certificate is invalid.
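
The sign-then-verify steps above can be sketched with a toy CA key. The key below is built from two small Mersenne primes and is far too weak for real use; reducing the SHA-256 digest modulo N is a shortcut needed only because the toy key is so small. Real certificates use padded signatures and a structured format, none of which is modelled here, and all names and sample data are hypothetical.

```python
import hashlib

# Toy CA key pair (NOT secure), built from the Mersenne primes 2**61 - 1 and 2**89 - 1.
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537                               # CA public key:  (CA_E, CA_N)
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # CA private key: (CA_D, CA_N), Python 3.8+


def digest_as_int(info: bytes) -> int:
    """SHA-256 digest of the certificate's plaintext information, reduced to fit the toy key."""
    return int.from_bytes(hashlib.sha256(info).digest(), "big") % CA_N


def ca_sign(info: bytes) -> int:
    """CA: encrypt the digest with the CA's private key to produce the signature."""
    return pow(digest_as_int(info), CA_D, CA_N)


def client_verify(info: bytes, signature: int) -> bool:
    """Client: decrypt the signature with the CA's public key and compare the two digests."""
    digest_1 = pow(signature, CA_E, CA_N)
    digest_2 = digest_as_int(info)
    return digest_1 == digest_2


info = b"domain=example.com;applicant=Example Inc;public_key=SERVER_KEY"
signature = ca_sign(info)
print(client_verify(info, signature))  # True

# A man in the middle who swaps in his own public key invalidates the signature.
forged = b"domain=example.com;applicant=Example Inc;public_key=ATTACKER_KEY"
print(client_verify(forged, signature))  # False
```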


How does the browser get the public key of the certification authority? What if that public key is forged too? To avoid this endless regress, the public keys of trusted certification authorities are built into the operating system or browser! Therefore, there is no need to worry about the CA's public key being forged.
When the Chrome browser finds that a website's digital certificate is invalid, it displays a warning page. If the user forces access anyway, there is a certain risk.


SSL/TLS handshake

To summarize the above:

  • HTTPS uses hybrid encryption to solve the problem that data transmitted over HTTP is easy to eavesdrop on. This requires negotiating a session key.
  • HTTPS uses a digest algorithm to solve the problem that data transmitted over HTTP is easy to tamper with. This requires negotiating an authentication key.
  • HTTPS uses digital certificates to solve the problem that identities are easy to forge in HTTP. This requires the client to verify the server's certificate.

So how exactly does HTTPS do this? When do the communicating parties negotiate the session key and the authentication key, and when is the certificate verified? The answer: during the SSL/TLS handshake.
The extra "S" in HTTPS compared with HTTP refers to the SSL/TLS protocol.


In the HTTPS protocol, after the client and the server establish a TCP connection through the three-way handshake, data is not transmitted right away. The two sides first go through an SSL/TLS handshake to negotiate the session keys and authentication keys and to verify the certificate, after which data can be transferred securely!
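
The same order of events (TCP connection first, TLS handshake second) is visible when using Python's standard `socket` and `ssl` modules. The helper name `fetch_tls_info` is hypothetical, and calling it needs network access; which protocol version and cipher suite it reports depends on the server and on the local OpenSSL build.

```python
import socket
import ssl


def fetch_tls_info(host: str, port: int = 443) -> dict:
    """Open a TCP connection, run the TLS handshake on top of it, and report the result."""
    # The default context verifies the certificate chain and the host name.
    context = ssl.create_default_context()
    # Step 1: TCP three-way handshake.
    with socket.create_connection((host, port), timeout=5) as tcp_sock:
        # Step 2: TLS handshake (keys negotiated, certificate verified).
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return {
                "version": tls_sock.version(),   # negotiated protocol version
                "cipher": tls_sock.cipher(),     # negotiated cipher suite
                "subject": tls_sock.getpeercert().get("subject"),
            }
```

For example, `fetch_tls_info("example.com")` returns the negotiated parameters, or raises `ssl.SSLCertVerificationError` if the certificate cannot be verified.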

Let's use a Wireshark packet capture to walk through the four steps of the SSL/TLS 1.2 handshake in detail.

The first handshake
The client sends an encrypted communication request to the server, which mainly includes:

  1. The SSL/TLS protocol version supported by the client, such as TLS 1.2.
  2. Random number 1, generated by the client and used later to generate the session keys and authentication keys.
  3. The list of cipher suites supported by the client. Each cipher suite contains:
    1. The asymmetric algorithm used to transmit the session key, such as ECDHE or RSA;
    2. The asymmetric algorithm used to verify the digital certificate, such as ECDSA or RSA;
    3. The symmetric encryption algorithm used for data transmission, such as AES_128_GCM or AES_128_CBC;
    4. The digest algorithm used to verify message integrity, such as SHA256 or SHA384;
    5. The name has the format TLS_<asymmetric algorithm>_<asymmetric algorithm>_WITH_<symmetric encryption algorithm>_<digest algorithm>; if the two asymmetric algorithms are the same, one of them is omitted.
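
To make the naming format concrete, here is a small sketch that splits a TLS 1.2 style suite name into its four parts. The parser is a hypothetical helper written for this example and assumes names that contain `_WITH_`, such as `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`.

```python
def parse_cipher_suite(name: str) -> dict:
    """Split a name like TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 into its components."""
    prefix, _, rest = name.partition("_WITH_")
    asymmetric = prefix.split("_")[1:]  # drop the leading "TLS"
    key_exchange = asymmetric[0]
    # If only one asymmetric algorithm is named, it serves both purposes.
    authentication = asymmetric[1] if len(asymmetric) > 1 else asymmetric[0]
    *cipher_parts, digest = rest.split("_")
    return {
        "key_exchange": key_exchange,
        "authentication": authentication,
        "symmetric": "_".join(cipher_parts),
        "digest": digest,
    }


print(parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"))
print(parse_cipher_suite("TLS_RSA_WITH_AES_128_CBC_SHA"))
```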


The second handshake
After the server receives the client's encrypted communication request, it sends a response to the client, which mainly includes:

  1. The confirmed SSL/TLS protocol version. If the two parties have no version in common, encrypted communication is closed.
  2. Random number 2, generated by the server and used later to generate the session keys and authentication keys.
  3. The confirmed cipher suite, such as TLS_RSA_WITH_AES_128_CBC_SHA.
  4. The server's digital certificate.


The third handshake
After the client receives the server's response, it verifies whether the digital certificate is valid (the verification method is explained in the digital certificate section). If the certificate is valid, it performs the third handshake, which mainly includes:

  1. Another random number 3 generated by the client (called the pre-master secret, PMS), which is encrypted with the server's public key. The client computes the master secret (MS) from random number 1, random number 2 and the pre-master secret, and then expands the master secret and slices the result to obtain two session keys and two authentication keys.
  2. The change cipher spec notification, indicating that subsequent data will be encrypted with the session key.
  3. The client's handshake finished notification, indicating that the client's handshake phase has ended. The client generates a digest of all handshake data so far, encrypts it with the session key, and sends it to the server for verification.
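
The derivation in step 1 can be sketched with the pseudorandom function defined for TLS 1.2 in RFC 5246 (HMAC-SHA256 based). This is a simplified illustration: the key lengths are sample values that in practice depend on the cipher suite, the initialization vectors that real TLS also slices from the key block are omitted, and the names of the four keys follow this article rather than the RFC.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash from RFC 5246: stretch a secret into `length` pseudorandom bytes."""
    output = b""
    a = seed  # A(0)
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF: P_SHA256 applied to the label followed by the seed."""
    return p_sha256(secret, label + seed, length)


def derive_keys(pre_master: bytes, client_random: bytes, server_random: bytes,
                mac_len: int = 32, key_len: int = 16) -> tuple:
    """Compute the master secret, expand it, and slice out the four keys."""
    master = prf(pre_master, b"master secret", client_random + server_random, 48)
    block = prf(master, b"key expansion", server_random + client_random,
                2 * mac_len + 2 * key_len)
    layout = [("client_auth_key", mac_len), ("server_auth_key", mac_len),
              ("client_session_key", key_len), ("server_session_key", key_len)]
    keys, offset = {}, 0
    for name, size in layout:
        keys[name] = block[offset:offset + size]
        offset += size
    return master, keys


# Sample values: both sides know the same three inputs, so they derive the same keys.
pre_master = os.urandom(48)
client_random, server_random = os.urandom(32), os.urandom(32)
client_master, client_keys = derive_keys(pre_master, client_random, server_random)
server_master, server_keys = derive_keys(pre_master, client_random, server_random)
print(client_keys == server_keys)  # True
```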


The fourth handshake
After the server receives the client's message, it decrypts the pre-master secret with its own private key, computes the master secret from random number 1, random number 2 and the pre-master secret, and then expands the master secret and slices the result to obtain the two session keys and two authentication keys.
After that, the fourth handshake is carried out, which mainly includes:

  1. The change cipher spec notification, indicating that subsequent data will be encrypted with the session key.
  2. The server's handshake finished notification, indicating that the server's handshake phase has ended. The server generates a digest of all handshake messages so far, encrypts it with the session key, and sends it to the client for verification.


At this point, the entire SSL/TLS handshake phase is over!
Why do the third and fourth handshakes send a digest of all handshake messages?
The main reason is to prevent the handshake information from being tampered with. For example, among the cipher suites supported by the client, some are weak and some are strong, and the cipher suite list is transmitted in plain text. If a hacker modifies the list so that only the less secure suites remain, the server can only choose among those weaker algorithms, and security is greatly reduced. Therefore, a digest of the handshake is sent so that any tampering with the handshake information can be detected.
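
A simplified sketch of how comparing handshake digests exposes such a downgrade. The message contents are invented placeholders, and a plain SHA-256 over the transcript stands in for the keyed computation that real TLS performs.

```python
import hashlib


def transcript_digest(handshake_messages: list) -> str:
    """Digest over all handshake messages, in the order they were seen."""
    h = hashlib.sha256()
    for message in handshake_messages:
        h.update(message)
    return h.hexdigest()


# What the client actually sent and received.
client_view = [b"ClientHello suites=STRONG,WEAK", b"ServerHello suite=WEAK"]
# What the server saw after a hacker removed the strong suite in transit.
server_view = [b"ClientHello suites=WEAK", b"ServerHello suite=WEAK"]

# Each side digests its own view; a mismatch reveals the tampering.
print(transcript_digest(client_view) == transcript_digest(server_view))  # False
```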
Why not send the master secret directly, instead of deriving it from two random numbers plus a pre-master secret?
The main reason is to prevent connection replay. Suppose there were no random numbers, and the client simply generated a master secret and sent it to the server encrypted with the server's public key. Then a hacker who has sniffed all the messages between the client and the server could later pretend to be the client and send the same messages to the server again (even though the hacker does not know their content). Because the messages are identical to ones the server has already accepted, the server would believe it is talking to the client, and another connection would result.
If the server is a shopping website, replaying the connection would cause the client's order to be placed again, causing losses.
With the two random numbers, even if a hacker pretends to be the client and tries to replay the connection, the random numbers differ, so the generated keys differ, and the content resent by the hacker is invalid (the server cannot decrypt it, and the integrity digest does not match).
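
The effect of the fresh server random number can be sketched as follows. The single HMAC call is a simplified stand-in for the real TLS key derivation, and the record content is an invented sample; only the idea that a new random number yields new keys is taken from the text.

```python
import hashlib
import hmac
import os


def derive_key(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Simplified stand-in for the TLS key derivation: all three inputs affect the key."""
    return hmac.new(pre_master, client_random + server_random, hashlib.sha256).digest()


def mac(key: bytes, record: bytes) -> bytes:
    """Message authentication code protecting one record."""
    return hmac.new(key, record, hashlib.sha256).digest()


pre_master = os.urandom(48)
client_random = os.urandom(32)

# Original connection: the hacker records everything that is sent.
server_random_1 = os.urandom(32)
key_1 = derive_key(pre_master, client_random, server_random_1)
record = b"place order"
recorded_mac = mac(key_1, record)

# Replay: the hacker resends the same client messages, but the server
# picks a fresh random number, so it derives a different key.
server_random_2 = os.urandom(32)
key_2 = derive_key(pre_master, client_random, server_random_2)
replay_accepted = hmac.compare_digest(mac(key_2, record), recorded_mac)
print(replay_accepted)  # False
```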
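Finally, to summarize the TLS 1.2 handshake in four steps:

  1. The client sends its supported protocol version, random number 1, and its list of cipher suites.
  2. The server replies with the confirmed protocol version, random number 2, the chosen cipher suite, and its digital certificate.
  3. The client verifies the certificate, sends the pre-master secret encrypted with the server's public key, and then sends its change cipher spec and handshake finished notifications.
  4. The server decrypts the pre-master secret, derives the same keys, and sends its own change cipher spec and handshake finished notifications.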


Origin blog.csdn.net/h295928126/article/details/129656584