Linux networking: HTTPS encryption principles

Table of contents

1. Overview of HTTPS

2. Concept preparation 

3. Why encryption is necessary

4. Common encryption methods

1. Symmetric encryption

2. Asymmetric encryption

5. Data digest, digital signature

6. A study of the HTTPS encryption process

1. Option 1 - Only use symmetric encryption

2. Option 2 - Only use asymmetric encryption

3. Option 3 - Both parties use asymmetric encryption

4. Option 4 - Asymmetric encryption + symmetric encryption

5. Man-in-the-middle attack

7. Introducing certificates

1. CA certification

2. Understanding data signatures

3. Option 5 - Asymmetric encryption + symmetric encryption + certificate authentication


 

1. Overview of HTTPS

HTTPS is also an application-layer protocol. It adds an encryption layer on top of the HTTP protocol.
HTTP content is transmitted in plain text, which makes it easy to tamper with during transmission.

Why is HTTP unsafe?

When a user submits a username and password in a form over HTTP, both are transmitted in clear text, which is a serious risk. Before 2012, there was basically no requirement to bind a mobile phone number to an account, which meant that anyone who knew your username and password could easily steal your account.

The purpose of HTTPS is to encrypt the transmitted messages.

2. Concept preparation 

Encryption performs a series of transformations on plain text (the information to be transmitted) to generate ciphertext.
Decryption performs a series of transformations on ciphertext to restore it to plain text.
This encryption and decryption process often needs one or more pieces of auxiliary data. Such data is called a key.

Encryption and decryption have now developed into an independent discipline: cryptography.
The founder of cryptography is also one of the forefathers of computer science, Alan Mathison Turing.

Compare him with our other forefather: John von Neumann.

It seems that Mr. Turing has a little too much hair...
In fact, this is a sad story. Turing was young and promising. He not only laid the foundations of computing, artificial intelligence, and cryptography, but during World War II he also broke the German Enigma cipher, giving the Allies an intelligence advantage that helped turn the tide of the war. However, for certain reasons, Turing was persecuted by the British authorities and died young at the age of 41.

The film "The Imitation Game", adapted from a biography of Alan Turing, tells the story of Turing's development of a code-breaking machine during World War II.
The highest honor in the computer field, the Turing Award, is named after him.

3. Why encryption is necessary

The infamous "carrier hijacking"

Before HTTPS became widespread, we might search for a piece of software on the Internet and click download, only for the download link that popped up to be a link to some other software.

Since any data packets we transmit through the network will pass through the operator's network equipment (routers, switches, etc.), the operator's network equipment can parse out the content of the data you transmit and tamper with it.

Clicking the "Download" button actually sends an HTTP request to the server, and the HTTP response that comes back contains the download link of the app. After hijacking the traffic, the operator detects that the request is to download the "Tiantian" app and automatically rewrites the response sent to the user so that it carries the download address of "a Q browser" instead.

So: because HTTP content is transmitted in clear text, the clear-text data passes through multiple physical nodes such as routers, Wi-Fi hotspots, communication service operators, and proxy servers. If the information is hijacked during transmission, the transmitted content is completely exposed. The hijacker can also tamper with the transmitted information without either party noticing. This is a man-in-the-middle attack, so we need to encrypt the information.
Operators are not the only ones who can hijack traffic. Hackers can use similar means to steal users' private information or tamper with content.
Imagine a hacker obtaining the user's account balance when the user logs in to Alipay, or even the user's payment password...
On the Internet, clear-text transmission is a dangerous thing!
HTTPS adds encryption on top of HTTP to further ensure the security of user information.

4. Common encryption methods

1. Symmetric encryption

Symmetric encryption is the encryption method of a single-key cryptosystem: the same key is used both to encrypt and to decrypt information. It is also called single-key encryption. Its defining feature is that the key used for encryption and decryption is the same.

  • Common symmetric encryption algorithms (for reference): DES, 3DES, AES, TDEA, Blowfish, RC2, etc.
  • Features: public algorithm, small amount of computation, fast encryption, high encryption efficiency

Symmetric encryption uses the same "key" to encrypt plain text into ciphertext and to decrypt ciphertext back into plain text.

For example, the simplest symmetric encryption principle is the XOR operation:

Assume that the plain text is a=1234 and the key is key=8888.
Then the ciphertext b obtained by encrypting with a^key is 9834.
Performing the operation b^key on the ciphertext 9834 again yields the original plain text 1234.
(The same works for symmetric encryption of strings, since each character can be represented as a number.)
Of course, bitwise XOR is just the simplest form of symmetric encryption. Bitwise XOR is not used in HTTPS.
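The XOR example above can be sketched in a few lines of Python. This is only an illustration: the function names `xor_crypt` and `xor_crypt_text`, and the string key `b"k3y"`, are made up for this sketch and do not come from any real cipher.

```python
def xor_crypt(value: int, key: int) -> int:
    """Encrypt or decrypt an integer by XOR-ing it with the key."""
    return value ^ key


def xor_crypt_text(data: bytes, key: bytes) -> bytes:
    """Same idea for a string: XOR each byte with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plain = 1234
key = 8888
cipher = xor_crypt(plain, key)       # 9834, as in the text
restored = xor_crypt(cipher, key)    # back to 1234

# Strings work the same way, one byte at a time.
text_cipher = xor_crypt_text(b"hello", b"k3y")
text_plain = xor_crypt_text(text_cipher, b"k3y")
```

Because XOR is its own inverse, one function serves for both encryption and decryption, which is exactly what "symmetric" means here.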

2. Asymmetric encryption

Two keys are required for encryption and decryption: the public key and the private key.

  • Common asymmetric encryption algorithms (for reference): RSA, DSA, ECDSA
  • Features: the algorithms are complex, and security depends on both the algorithm and the key. Because of that complexity, encryption and decryption are not as fast as with symmetric encryption.

Asymmetric encryption uses two keys, one called the "public key" and the other called the "private key".
The public key and the private key form a pair. The biggest disadvantage is that the operations are very slow, much slower than symmetric encryption.

  • Encrypt plain text using the public key and turn it into cipher text
  • Decrypt the cipher text using the private key and turn it into plain text

Can also be used in reverse

  • Encrypt the plain text using the private key and turn it into cipher text
  • Decrypt the cipher text through the public key and turn it into plain text

The mathematical principles of asymmetric encryption are relatively complex and involve some number theory. Here is a simple example from everyday life.
A wants to give B some important documents, but B may not be there. So A and B make an agreement in advance:
B says: there is a box on my desk, and I will give you a lock. Put the documents in the box and lock it with the lock. Later I will come back and use my key to open the lock and get the documents.
In this scenario, the lock is equivalent to the public key and the key is the private key. The public key can be given to anyone (without fear of leakage), but the private key is held only by B. Only the person who holds the private key can decrypt.
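As a rough illustration of the public/private key pair and its two directions of use, here is a toy RSA sketch. The primes 61 and 53, the exponent 17, and the helper name `apply_key` are assumptions chosen for readability. Real keys are thousands of bits long and real RSA adds padding, so treat this purely as a demonstration.

```python
# Toy RSA with tiny primes; illustrative only, far too small to be secure.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public key is (e, n)
d = pow(e, -1, phi)            # private key is (d, n); pow(..., -1, m) needs Python 3.8+


def apply_key(value: int, exponent: int) -> int:
    """Core RSA operation: modular exponentiation with one of the two exponents."""
    return pow(value, exponent, n)


message = 65                   # must be smaller than n

# Public key encrypts, private key decrypts.
cipher = apply_key(message, e)
recovered = apply_key(cipher, d)

# Used in reverse: private key "encrypts", public key "decrypts".
signed = apply_key(message, d)
checked = apply_key(signed, e)
```

The reverse direction does not keep anything secret, since anyone may hold the public key. It is useful for proving who produced a value, which is the basis of the signatures discussed later.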

5. Data digest, digital signature

  • A data digest (data fingerprint) is produced by applying a one-way hash function to the information, yielding a fixed-length value. A digest is not an encryption mechanism, but it can be used to determine whether data has been tampered with: if even one bit of the data is modified, the hash function produces a very different result.
  • Common digest algorithms: MD5, SHA1, SHA256, SHA512, etc. These algorithms map an infinite input space to a finite output space, so collisions are possible (two different messages with the same digest), but the probability is very low and in practice this rarely happens.
  • Digest characteristics: unlike an encryption algorithm, a digest is not encryption in the strict sense, because there is no decryption. It is difficult to infer the original information from the digest, so digests are usually used for data comparison.
  • Encrypting the digest yields a digital signature (discussed later).
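The fixed-length and tamper-sensitivity properties listed above can be observed with Python's standard `hashlib` module. This is a small sketch: the sample inputs and the helper name `digest` are chosen only for illustration, and SHA-256 is picked arbitrarily from the algorithms mentioned.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short = digest(b"hello")
long = digest(b"hello" * 10_000)     # much longer input, same digest length
tampered = digest(b"helln")          # 'o' -> 'n' flips a single bit of the input

# Count how many of the 64 hex positions differ after the one-bit change.
differing = sum(1 for a, b in zip(short, tampered) if a != b)
```

Both `short` and `long` are 64 hexadecimal characters, and `differing` is typically close to the full length even though only one input bit changed.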

6. A study of the HTTPS encryption process

To ensure data security, the data must be "encrypted".
On the network, plain text is no longer transmitted directly. The encrypted "ciphertext" is transmitted instead.
There are many encryption methods, but they fall into two major categories: symmetric encryption and asymmetric encryption.

1. Option 1 - Only use symmetric encryption

If both parties to the communication hold the same key X and no one else knows it, the security of their communication can certainly be guaranteed (unless the key is cracked).

After symmetric encryption is introduced, even if the data is intercepted, the hacker does not know the key, cannot decrypt the data, and therefore cannot learn the real content of the request.
But things are not that simple. The server serves many clients at the same time, and each client must use a different key (if they all shared one, the key would spread too easily and hackers could get it too). The server would therefore have to maintain the association between every client and its key, which is also very troublesome.


 

The ideal approach is for the client and server to negotiate the key for this session when they establish a connection.
However, if the key is transmitted in clear text during negotiation, it can easily be intercepted by hackers. The transmission of the key must therefore be encrypted as well! But encrypting the key for transmission requires a key for the key, which becomes a chicken-and-egg problem. Symmetric encryption alone no longer works for key transmission.
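The weakness of negotiating the key in clear text can be simulated with a short sketch. Everything here is hypothetical: the `wire` list stands in for what an eavesdropper can observe on the network, and the XOR cipher is only a stand-in for a real symmetric algorithm.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key (encrypts and decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


wire = []                                    # everything an eavesdropper can see

# The client picks a session key and sends it to the server in clear text.
session_key = secrets.token_bytes(16)
wire.append(("key", session_key))

# The client then sends a request encrypted with that key.
wire.append(("data", xor_bytes(b"password=123456", session_key)))

# The eavesdropper only needs what was on the wire to undo the encryption.
captured = dict(wire)
stolen_plaintext = xor_bytes(captured["data"], captured["key"])
```

The encryption itself is never attacked here. The secret simply travelled next to the data it was supposed to protect.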

2. Option 2 - Only use asymmetric encryption

With the asymmetric encryption mechanism, if the server first transmits its public key to the browser in clear text, the browser can use this public key to encrypt data before sending it to the server. The channel from the client to the server then appears to be secure (in fact it still has security issues), because only the server has the corresponding private key and can decrypt data encrypted with the public key.
But how can security be ensured on the path from the server to the browser?
If the server encrypts data with its private key and sends it to the browser, the browser can decrypt it with the public key. But this public key was originally transmitted to the browser in clear text. If a middleman hijacks the public key, he can also use it to decrypt the information sent by the server.

3. Option 3 - Both parties use asymmetric encryption

  1. The server has the public key S and the corresponding private key S', and the client has the public key C and the corresponding private key C'.
  2. The client and server exchange public keys.
  3. The client sends a message to the server: first use S to encrypt the data, and then send it. It can only be decrypted by the server, because only the server has the private key S'.
  4. The server sends information to the client: it first uses C to encrypt the data. After sending, it can only be decrypted by the client, because only the client has the private key C'.

This seems to solve the security problem, but in fact security problems remain, and the overall efficiency is very low because every message goes through slow asymmetric encryption.

4. Option 4 - Asymmetric encryption + symmetric encryption

Let’s solve the efficiency problem first:

  1. The server has an asymmetric public key S and a private key S'.
  2. The client initiates an https request and obtains the server's public key S.
  3. The client generates the symmetric key C locally, encrypts it with the public key S, and sends it to the server.
  4. Since intermediate network devices do not have the private key, even if they intercept the data they cannot restore the original content, and so cannot obtain the symmetric key (really?).
  5. The server decrypts with its private key S' and recovers the symmetric key C sent by the client. It then uses this symmetric key to encrypt the response data returned to the client.
  6. Subsequent communication between the client and the server uses symmetric encryption only. Since the key is known only to the two hosts, the client and the server, other hosts/devices that intercept the data cannot make sense of it, because they do not know the key.
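The six steps above can be sketched as follows. This is a toy model under stated assumptions: the RSA parameters (two Mersenne primes and the exponent 65537) are picked only so the example runs with plain integers, the XOR cipher stands in for a real symmetric algorithm such as AES, and all names are hypothetical.

```python
import secrets

# Toy RSA key pair for the server; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                    # public key S = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))            # private key S' = (D, N)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Step 3: the client generates symmetric key C and encrypts it with public key S.
key_c = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(key_c, "big"), E, N)

# Step 5: the server recovers C with its private key S'.
server_key = pow(wrapped, D, N).to_bytes(16, "big")

# Step 6: from here on, both sides use only the fast symmetric cipher.
request = xor_bytes(b"GET /account", key_c)
seen_by_server = xor_bytes(request, server_key)
response = xor_bytes(b"balance=42", server_key)
seen_by_client = xor_bytes(response, key_c)
```

The slow asymmetric operation runs once per connection, only to transport the 16-byte key. All application data then goes through the cheap symmetric path.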
     

5. Man-in-the-middle attack

The above solutions all share a fatal problem: they cannot withstand a man-in-the-middle attack (Man-in-the-Middle Attack, "MITM attack" for short).

Indeed, in options 2/3/4, once the client has obtained the public key S, it encrypts the symmetric key C it generated with that public key. Even if the middleman steals the data, he cannot recover the key C, because only the server has the private key S'.
However, if the man-in-the-middle attack begins at the very start of the handshake negotiation, this no longer holds. Assume the hacker has already succeeded in becoming the middleman.

  1. The server has the public key S and the private key S' of the asymmetric encryption algorithm.
  2. The middleman has the public key M and the private key M' of the asymmetric encryption algorithm.
  3. The client initiates a request to the server, and the server sends the public key S to the client in plain text.
  4. The middleman hijacks the data message, extracts the public key S and saves it, then replaces the public key S in the hijacked message with his own public key M, and sends the forged message to the client.
  5. The client receives the message and extracts the public key M (it does not know that the public key has been swapped), generates its own symmetric key X, encrypts X with the public key M, and sends the message to the server.
  6. The middleman hijacks the message and decrypts it with his own private key M' to obtain the communication key X. He then re-encrypts X with the server's public key S, which he saved earlier, and forwards the message to the server.
  7. The server receives the message, decrypts it with its own private key S', and obtains the communication key X.
  8. Both parties start to communicate using symmetric encryption with X. But everything is under the control of the middleman, who can hijack the data, eavesdrop on it, or even modify it.
     

Where does the essence of the problem lie? The client cannot be sure that the message containing the public key was really sent by the target server!
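The key-swapping attack described in the steps above can be simulated with two toy RSA key pairs. This is a sketch under assumptions: the helper names `make_keypair` and `rsa` and the choice of small Mersenne primes are illustrative only, and no real network traffic is involved.

```python
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    """Build a toy RSA key pair from two primes: returns (public, private)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


def rsa(value: int, key) -> int:
    """Apply an RSA key (exponent, modulus) to an integer."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


server_pub, server_priv = make_keypair(2**61 - 1, 2**89 - 1)    # S and S'
mitm_pub, mitm_priv = make_keypair(2**31 - 1, 2**107 - 1)       # M and M'

# Steps 3-4: the server sends S, the middleman keeps S and forwards M instead.
key_seen_by_client = mitm_pub

# Step 5: the client generates X and encrypts it with the key it received.
x = secrets.randbits(96)
to_server = rsa(x, key_seen_by_client)

# Step 6: the middleman decrypts with M', then re-encrypts with the saved S.
stolen_x = rsa(to_server, mitm_priv)
forwarded = rsa(stolen_x, server_pub)

# Step 7: the server decrypts with S' and sees nothing unusual.
server_x = rsa(forwarded, server_priv)
```

Client, server, and middleman all end up holding the same X. Neither endpoint can detect the swap, because nothing ties the public key the client received to the server's identity.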

7. Introducing certificates

1. CA certification

Before using HTTPS, the server needs to apply for a digital certificate from a CA organization. The digital certificate contains the certificate applicant's information, public key information, and so on. The server transmits the certificate to the browser, and the browser obtains the public key from the certificate. The certificate is like an ID card, proving the legitimacy of the server's public key.

This certificate can be understood as a structured string containing the following information: certificate issuing authority, certificate validity period, public key, certificate owner, and signature.

It should be noted that when applying for a certificate, you need to generate a CSR (certificate signing request) on a specific platform, and a key pair, namely a public key and a private key, is generated at the same time. This key pair is used for plaintext encryption and digital signatures in network communication.
The public key is sent to the CA along with the CSR file for authoritative certification, while the private key is kept by the server for subsequent communication (in fact, it is mainly used to exchange the symmetric key).


2. Understanding data signatures

A signature is formed with an asymmetric encryption algorithm (the public and private keys of the CA organization). Note that these keys have nothing to do with HTTPS for now. Do not confuse them with the server's public and private keys used in HTTPS.

When the server applies for a CA certificate, the CA organization reviews the server and forms a digital signature specifically for the website. The process is as follows:

  1. The CA organization owns an asymmetric public key A and private key A'.
  2. The CA organization hashes the certificate plain text submitted by the server to form a data digest.
  3. It then encrypts the data digest with the CA private key A' to obtain the digital signature S.

The certificate plain text submitted by the server and the digital signature S together form the digital certificate, which can then be issued to the server.
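The three signing steps can be sketched as follows. This is a toy model, not how a real CA works: the RSA parameters, the certificate fields, the issuer name "DemoCA", and the helper names are all assumptions made for illustration, and real certificates use a standardized binary format with padded signatures.

```python
import hashlib

# Step 1: toy RSA key pair owned by the CA; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
CA_PUBLIC = (65537, N)                                   # public key A
CA_PRIVATE = (pow(65537, -1, (P - 1) * (Q - 1)), N)      # private key A'


def digest_as_int(plain_text: str) -> int:
    """Step 2: hash the certificate plain text (reduced to fit the toy modulus)."""
    h = hashlib.sha256(plain_text.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N


def issue_certificate(plain_text: str) -> dict:
    """Step 3: encrypt the digest with A' and bundle it with the plain text."""
    d, n = CA_PRIVATE
    signature = pow(digest_as_int(plain_text), d, n)
    return {"plain_text": plain_text, "signature": signature}


cert = issue_certificate(
    "issuer=DemoCA;owner=example.com;public_key=S;valid_to=2030-01-01"
)
```

Only the fixed-size digest passes through the private key operation. The plain text itself stays readable inside the certificate.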

3. Option 5 - Asymmetric encryption + symmetric encryption + certificate authentication

When the client and server first establish a connection, the server returns a certificate to the client. The certificate contains the server's public key mentioned earlier and the identity information of the website.

Client authentication:
After the client obtains this certificate, it will verify the certificate (to prevent the certificate from being forged).

  1. Determine whether the certificate's validity period has expired.
  2. Determine whether the issuing authority of the certificate is trusted (a trusted certificate issuing authority built into the operating system).
  3. Verify whether the certificate has been tampered with: obtain the public key of the certificate issuing authority from the system, decrypt the signature, and obtain a hash value (the data digest), called hash1. Then calculate the hash value of the certificate's plain text, called hash2. Compare hash1 and hash2. If they are equal, the certificate has not been tampered with.

Is it possible for the middleman to tamper with the certificate?

  1. The middleman tampers with the plain text of the certificate.
  2. Since he does not have the private key of the CA organization, he cannot hash the new plain text and encrypt the hash with the private key, so he cannot form a matching signature for the tampered certificate.
  3. If he tampers with it anyway, the client will find on receiving the certificate that the hash of the plain text and the decrypted value of the signature are inconsistent. This shows that the certificate has been tampered with and is not trustworthy, so the client stops sending information to the server, preventing it from being leaked to the middleman.
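The client-side checks and the tampering scenario above can be put together in one sketch. As before this is a toy: the RSA parameters, the `TRUSTED_CAS` table standing in for the operating system's built-in authorities, the certificate fields, and the function names are assumptions for illustration only.

```python
import hashlib
from datetime import date

# Toy CA key pair; only the public half is known to the client.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
TRUSTED_CAS = {"DemoCA": (E, N)}     # public keys "built into the operating system"


def digest_as_int(fields: dict) -> int:
    """Hash the certificate plain text (reduced to fit the toy modulus)."""
    text = ";".join(f"{k}={fields[k]}" for k in sorted(fields))
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % N


def verify(cert: dict, today: date) -> bool:
    fields = cert["fields"]
    if today > date.fromisoformat(fields["valid_to"]):    # 1. expired?
        return False
    ca_key = TRUSTED_CAS.get(fields["issuer"])            # 2. trusted issuer?
    if ca_key is None:
        return False
    e, n = ca_key
    hash1 = pow(cert["signature"], e, n)                  # 3. decrypt the signature
    hash2 = digest_as_int(fields)                         #    re-hash the plain text
    return hash1 == hash2


fields = {"issuer": "DemoCA", "owner": "example.com",
          "public_key": "S", "valid_to": "2030-01-01"}
genuine = {"fields": fields, "signature": pow(digest_as_int(fields), D, N)}

# The middleman swaps in his own public key but cannot re-sign without A'.
tampered = {"fields": {**fields, "public_key": "M"},
            "signature": genuine["signature"]}

today = date(2025, 1, 1)
ok = verify(genuine, today)
rejected = verify(tampered, today)
```

Here `ok` is true and `rejected` is false: changing any field changes hash2, while hash1 is fixed by the signature that only the CA could have produced.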

The middleman swaps the entire certificate?

  1. Because the middleman does not have the CA private key, it cannot create a fake certificate (why?).
  2. Therefore, the middleman can only apply for a real certificate from the CA and then swap in the certificate he applied for.
  3. This does achieve a wholesale replacement of the certificate, but remember that the certificate plain text contains server authentication information such as the domain name. Even if the whole certificate is swapped, the client can still detect it.
  4. Always remember: the middleman does not have the CA private key, so he cannot legitimately modify any certificate, including his own.
     

Why must the digest content be encrypted to form a signature when transmitted over the network?
Common digest algorithms include MD5 and the SHA series.
Taking MD5 as an example, we do not need to study exactly how the digest is calculated. We only need to understand the characteristics of MD5:

  1. Fixed length: no matter how long the string is, the calculated MD5 value has a fixed length (128 bits, usually written as 32 hexadecimal characters; a shortened 16-character form is also used).
  2. Dispersion: if the source string changes even slightly, the final MD5 value will be very different.
  3. Irreversible: it is easy to generate the MD5 value from the source string, but practically infeasible to restore the original string from the MD5 value.
     

Because MD5 has these characteristics, we can assume that if the MD5 values of two strings are the same, the two strings are the same.
Understanding how certificate tampering is detected (the process is like determining whether an ID card is fake):
 

Assume that our certificate is just the simple string hello. Calculate a hash value (such as MD5) for this string; the result is
BC4B2A76B9719D91.
If any character in hello is tampered with, for example changed to hella, the calculated MD5 value changes greatly:
BDBD6F9CF51F2FD8.

The server can then return the string hello together with the hash value BC4B2A76B9719D91 to the client. How does the client verify whether hello has been tampered with?

It just calculates the hash value of hello and checks whether it is BC4B2A76B9719D91.
But there is still a problem: if a hacker tampers with hello and also recalculates the hash value, the client cannot tell the difference. Therefore the hash value cannot be transmitted in plain text; it must be transmitted as ciphertext. A hacker can decrypt that ciphertext with the public key, but without the corresponding private key he cannot encrypt a new hash, so he cannot form a signature.

So the certificate plain text (here "hello") is hashed to form a digest, and the CA then encrypts the digest with its own private key to form a signature.
hello and the encrypted signature are combined to form a CA certificate, which is issued to the server and sent to the client when the client makes a request. If the middleman intercepts it, he lacks the CA private key, so he can neither change it nor swap the whole certificate. The validity of the certificate is therefore securely proven.
Finally, the client decrypts the signature with the public key of the certificate issuing authority already stored in the operating system, restores the original hash value, and then verifies it.
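The "hello" example, and the reason a plain-text hash is not enough, can be reproduced with a short sketch. It assumes that the 16-character values quoted in the text are the middle 16 hexadecimal characters of the full 32-character MD5 digest. The helper names `md5_16` and `naive_check` are made up for this illustration.

```python
import hashlib


def md5_16(text: str) -> str:
    """Short MD5 form: the middle 16 hex characters of the 32-character digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[8:24].upper()


def naive_check(text: str, claimed_hash: str) -> bool:
    """Client-side check when the hash travels in plain text."""
    return md5_16(text) == claimed_hash


# What the server sends: the string and its hash, both readable.
original = ("hello", md5_16("hello"))

# The hacker replaces the string and simply recomputes the plain-text hash.
forged = ("hella", md5_16("hella"))

original_passes = naive_check(*original)
forged_passes = naive_check(*forged)
```

Both checks pass, so the plain-text hash only guards against accidental corruption. A deliberate attacker is stopped only once the hash is encrypted with a private key he does not have.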

Why is the certificate content not encrypted directly to form the signature, but hashed into a digest first?

  • Hashing first reduces the length of the signed ciphertext and speeds up the calculation needed to verify the digital signature.

How to become a middleman

  1. ARP spoofing: in a LAN, a hacker can learn the (IP, MAC) addresses of other nodes by listening to ARP Request broadcast packets. For example, the hacker learns the addresses of two hosts A and B and tells B (the victim) that he is A, so that the packets B sends to A are intercepted by the hacker.
  2. ICMP attack: the ICMP protocol has a redirect message type, so an attacker can forge an ICMP message, send it to a client in the LAN, and pretend to be a better route. As a result, all of the target's Internet traffic is sent to the attacker's designated interface, achieving the same effect as ARP spoofing.
  3. Phishing Wi-Fi hotspots, fake websites, etc.
     

Origin blog.csdn.net/qq_63943454/article/details/134489843