[Computer Network Notes 8] Application Layer (5) HTTPS

What is HTTPS

HTTPS solves the problem of HTTP insecurity

All data in an HTTP exchange is transmitted in clear text. Anyone on the link can intercept, modify or forge request/response messages, so the data cannot be trusted.

  • HTTPS uses encryption algorithms to encrypt messages, so hackers cannot understand them even if they intercept them.

  • HTTPS uses a digest algorithm to confirm the integrity of the message. Once a hacker modifies the message, HTTPS can detect it and handle it accordingly.

  • HTTPS uses digital signatures to ensure that hackers cannot forge request or response messages.

HTTPS consists of HTTP + SSL/TLS: an SSL/TLS security layer is added beneath HTTP, so the HTTP protocol runs on top of SSL/TLS.

SSL stands for Secure Sockets Layer. It is usually placed at layer 5 (session layer) of the OSI seven-layer network model. It was invented by Netscape in 1994, and two versions, v2 and v3, were widely deployed (both are now deprecated).

The IETF renamed it TLS (Transport Layer Security) in 1999, officially standardized it, and restarted the version numbering from 1.0, so TLS 1.0 is actually SSL v3.1.

Since then, TLS has gone through three more versions: 1.1 in 2006, 1.2 in 2008 and 1.3 in 2018. The most widely used version is TLS 1.2.

The security of HTTPS is based on the encryption algorithm of TLS, so to understand HTTPS is to understand the encryption algorithm principle behind TLS.

Data encryption algorithm

Classical cryptography

In ancient wars, in order to prevent important information from being leaked if letters were intercepted, people began to encrypt letters.

Transposition encryption

An example is the cipher stick (scytale): a strip of cloth is wrapped around a wooden stick and the letter is written across the windings, so the text is scrambled once the strip is unwound.

  • Encryption algorithm: write on the strip after wrapping it around the stick
  • Key: the size (diameter) of the stick

Substitution encryption

Encrypt by replacing each character of the original text with a different character according to a rule.

For example, with this code table:
Original characters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher characters:   BCDEFGHIJKLMNOPQRSTUVWXYZA
Original letter: I love you
Encrypted letter: J mpwf zpv
After decryption: I love you

  • Encryption algorithm: character substitution
  • Key: the substitution code table
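
The code table above can be sketched in a few lines of Python. This is only an illustration of the substitution idea; the function names (build_table, encrypt, decrypt) are made up for this example.

```python
import string

PLAIN = string.ascii_uppercase        # ABCDEFGHIJKLMNOPQRSTUVWXYZ
CIPHER = PLAIN[1:] + PLAIN[:1]        # BCDEFGHIJKLMNOPQRSTUVWXYZA (the "key")


def build_table(src: str, dst: str) -> dict:
    """Build a translation table covering upper- and lower-case letters."""
    return str.maketrans(src + src.lower(), dst + dst.lower())


ENCRYPT_TABLE = build_table(PLAIN, CIPHER)
DECRYPT_TABLE = build_table(CIPHER, PLAIN)


def encrypt(text: str) -> str:
    # Characters outside the table (spaces, punctuation) pass through unchanged.
    return text.translate(ENCRYPT_TABLE)


def decrypt(text: str) -> str:
    return text.translate(DECRYPT_TABLE)


secret = encrypt("I love you")
print(secret)            # J mpwf zpv
print(decrypt(secret))   # I love you
```

Anyone who knows the code table can decrypt, which is why the table plays the role of the key.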

In contrast to classical cryptography, modern cryptography mainly includes two categories: symmetric encryption algorithms and asymmetric encryption algorithms. Compared with classical ciphers:

  • It can encrypt any binary data, not just text
  • The emergence of asymmetric encryption has given cryptography a wider range of uses, such as digital signatures

Symmetric encryption algorithm

Symmetric encryption algorithm: the same key is used both to encrypt the plaintext and to decrypt the ciphertext; encryption and decryption are inverse operations.

There are many symmetric encryption algorithms to choose from in TLS, such as RC4, DES, 3DES, AES and ChaCha20. The first three are considered insecure, and currently only AES and ChaCha20 are commonly used.

  • AES stands for "Advanced Encryption Standard". The key length can be 128, 192 or 256 bits. Its security strength is very high, its performance is very good, and some hardware is specially optimized for it, so it is the most widely used symmetric encryption algorithm.

  • ChaCha20 is another encryption algorithm, designed by Daniel J. Bernstein and promoted by Google. The key length is fixed at 256 bits. Its pure-software performance is better than that of AES, so it was once popular on mobile clients. Since ARMv8 added AES hardware optimization, it no longer has an obvious advantage, but it is still a good algorithm.

Cracking symmetric encryption

Approach to cracking:

  • Obtain one or more plaintext-ciphertext pairs
  • Try to find a key that encrypts the plaintext of each pair into its ciphertext and decrypts the ciphertext back into the plaintext. Finding such a key is a successful crack.

Anti-cracking:

  • The standard for an excellent symmetric encryption algorithm is that the cracker cannot find a more effective cracking method than brute force (trying every possible key), and that brute force takes long enough (for example, thousands of years).
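
To get a feel for "long enough", the rough arithmetic can be sketched as follows. The guessing rate of 10^12 keys per second is an assumed figure chosen purely for illustration, not a measurement of any real attacker.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def brute_force_years(key_bits: int, guesses_per_second: float) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return 2 ** key_bits / guesses_per_second / SECONDS_PER_YEAR


# Assumed attacker speed: 1e12 guesses per second (hypothetical).
for bits in (56, 128, 256):
    years = brute_force_years(bits, 1e12)
    print(f"{bits}-bit key: about {years:.3g} years")
```

Under this assumption a 56-bit key (the DES key size) falls in less than a day, while a 128-bit key needs on the order of 10^19 years, which is why key length matters so much.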

Disadvantages of symmetric encryption

The biggest problem facing symmetric encryption algorithms is how to transfer the key safely to the other party. This is called "key exchange".

The key cannot be transmitted over an insecure network: once the key is leaked, anyone holding it can decrypt the traffic and the encrypted communication is no longer confidential.

Asymmetric encryption algorithm

An asymmetric encryption algorithm uses one algorithm with two different keys (hence "asymmetric"): a public key and a private key. The public key can be made public for anyone to use, while the private key must be kept strictly confidential.

Public and private keys have one characteristic: they are one-way. Although both can be used for encryption and decryption, data encrypted with the public key can only be decrypted with the private key. Conversely, data encrypted with the private key can only be decrypted with the public key.

With asymmetric encryption, both parties can send their public keys to each other over an untrusted network. Before sending a message, the sender encrypts it with the other party's public key and signs it with its own private key. This achieves reliable key distribution and encrypted communication over an untrusted network.

RSA

Asymmetric encryption algorithms are much harder to design than symmetric ones. There are only a few in TLS, such as DH, DSA, RSA and ECC.

RSA is probably the most famous one, almost synonymous with asymmetric encryption. Its security is based on the mathematical problem of "integer factorization": the product of two very large prime numbers is used as the material for generating the keys, and deducing the private key from the public key is very difficult.

The recommended RSA key length used to be 1024 bits, but with the growth of computing power 1024 bits is no longer safe, and it is generally believed that at least 2048 bits are required.
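
The relationship between the two primes, the public key and the private key can be illustrated with a toy example. The numbers below are deliberately tiny and there is no padding, so this is a sketch of the arithmetic only and nothing like a secure implementation; the helper name rsa_apply is made up for this example.

```python
# Toy RSA with tiny primes. Real keys use primes of 1024+ bits each plus padding.
p, q = 61, 53
n = p * q                    # 3233: public modulus, part of both keys
phi = (p - 1) * (q - 1)      # 3120: easy to compute only if p and q are known
e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent = modular inverse (Python 3.8+)


def rsa_apply(value: int, exponent: int) -> int:
    """Raise value to the given exponent modulo n."""
    return pow(value, exponent, n)


message = 65
ciphertext = rsa_apply(message, e)     # encrypt with the public key (e, n)
recovered = rsa_apply(ciphertext, d)   # decrypt with the private key (d, n)

print(d, ciphertext, recovered)        # 2753 2790 65
```

An attacker who could factor n = 3233 back into 61 and 53 could recompute phi and d, which is why real moduli must be far too large to factor.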

ECC

ECC (Elliptic Curve Cryptography) is a "rising star" in asymmetric encryption. It is based on the mathematical problem of the "elliptic curve discrete logarithm" and uses specific curve equations and base points to generate public and private keys. The sub-algorithm ECDHE is used for key exchange, and ECDSA is used for digital signatures.

The two commonly used curves at present are P-256 (secp256r1, called prime256v1 in OpenSSL) and X25519. P-256 is the curve recommended by NIST (National Institute of Standards and Technology) and the NSA (National Security Agency), while X25519 is widely regarded as the safest and fastest curve.

Compared with RSA, ECC has clear advantages in security strength and performance. 160-bit ECC is roughly equivalent to 1024-bit RSA, and 224-bit ECC to 2048-bit RSA. Because the keys are short, computation, memory and bandwidth consumption are lower and encryption and decryption are faster, which is very attractive for today's mobile Internet.

Cracking asymmetric encryption

Approach to cracking:

  • Unlike symmetric encryption, the public key of asymmetric encryption is easy to obtain, so it is not difficult to create plaintext-ciphertext pairs.
  • Therefore, cracking asymmetric encryption comes down to finding the private key that can decrypt every ciphertext encrypted with the public key. Finding such a private key is a successful crack.
  • Because of how asymmetric encryption works, one line of attack is to derive the private key from the public key (as with RSA), but often the fallback is still exhaustive search. The difference lies in what is tested: when cracking symmetric encryption you keep checking whether a candidate key encrypts and decrypts the plaintext-ciphertext pairs you obtained; when cracking asymmetric encryption you keep checking whether a candidate private key and the public key undo each other.

Anti-cracking:

  • Like symmetric encryption, the standard for an asymmetric encryption algorithm to be excellent is that the cracker cannot find a more effective cracking method than the exhaustive method, and the cracking time of the exhaustive method is long enough.

Advantages and Disadvantages of Asymmetric Encryption

  • Advantages: Keys can be transmitted over unsecured networks
  • Disadvantages: The calculation is complex, so the performance is much worse than symmetric encryption

Symmetric encryption VS asymmetric encryption

  • Key exchange: symmetric encryption has a key exchange problem; asymmetric encryption does not.
  • Speed: symmetric encryption is fast; asymmetric encryption is very slow because it is based on complex mathematical problems. Even ECC is several orders of magnitude slower than AES.

If only asymmetric encryption is used, security is guaranteed but communication is so slow that it is impractical.

TLS uses hybrid encryption

At the beginning of communication, TLS uses an asymmetric algorithm, such as RSA or ECDHE, to solve the key exchange problem first.

A random number is then used to generate the "session key" for the symmetric algorithm. With RSA key exchange, this key is encrypted with the public key and sent to the other party; with ECDHE, both sides derive it from the public values they exchange. Because a session key is short, usually only 16 or 32 bytes, it does not matter that this step is a little slow.

From then on, both parties use the session key with the symmetric algorithm. Hybrid encryption thus solves the key exchange problem of symmetric encryption algorithms and takes into account both security and performance.
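
The RSA-style flow can be sketched end to end with toy building blocks. Everything here is an assumption made for illustration: the key pair is built from two small known Mersenne primes (far too small to be secure), and keystream_xor is a home-made stand-in for AES or ChaCha20, not a real cipher. ECDHE is not modelled.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes (NOT a secure key size).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                                  # public modulus, about 150 bits
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream.

    Calling it again with the same key undoes the encryption."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: generate a random 16-byte session key and wrap it with the public key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)

# Server: unwrap the session key with the private key.
unwrapped = pow(wrapped, D, N).to_bytes(16, "big")

# Both sides now hold the same session key and switch to the symmetric cipher.
ciphertext = keystream_xor(session_key, b"GET /index.html HTTP/1.1")
plaintext = keystream_xor(unwrapped, ciphertext)
print(plaintext)
```

The slow asymmetric operation runs once on 16 bytes; all the actual traffic goes through the fast symmetric step.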

Data integrity

Digest algorithm

Although hackers cannot get the session key and cannot decipher the ciphertext, they can collect enough ciphertext through eavesdropping, and then try to modify and recombine it before sending it to the website. Without an integrity guarantee, the server can only "accept everything", and the hacker can obtain further clues from the server's responses and eventually decipher the plaintext.

The main means of achieving integrity is the digest algorithm.

You can roughly think of a digest algorithm as a special compression algorithm that "compresses" data of any length into a fixed-length, practically unique "digest" string, like generating a digital "fingerprint" for the data.

From another perspective, the digest algorithm can also be understood as a special "one-way" encryption algorithm. It has only an algorithm and no key. The "encrypted" data cannot be decrypted, and the original text cannot be deduced from the digest.

MD5 (Message-Digest Algorithm 5) and SHA-1 (Secure Hash Algorithm 1) were long the two most commonly used digest algorithms, generating 16-byte and 20-byte digests respectively. However, their security strength is too low, and their use is no longer allowed in TLS.

Currently, TLS recommends the successor of SHA-1: SHA-2.

SHA-2 is actually a collective name for a family of digest algorithms, six in total. The commonly used ones are SHA224, SHA256 and SHA384, which generate 28-byte, 32-byte and 48-byte digests respectively.

If a hacker changes even one punctuation mark in the message, the digest will be completely different. By recomputing the digest and comparing it, the website discovers that the message has been tampered with and cannot be trusted.
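
Both properties, the fixed digest length and the sensitivity to a single changed character, can be checked with Python's standard hashlib module. The two sample messages are invented for this illustration.

```python
import hashlib

# Digest sizes of the commonly used SHA-2 variants, in bytes.
sizes = {name: hashlib.new(name).digest_size
         for name in ("sha224", "sha256", "sha384")}
print(sizes)   # {'sha224': 28, 'sha256': 32, 'sha384': 48}

# Changing a single punctuation mark yields a completely different digest.
original = hashlib.sha256(b"Transfer 100 yuan to Alice.").hexdigest()
tampered = hashlib.sha256(b"Transfer 100 yuan to Alice!").hexdigest()
print(original)
print(tampered)
print("message intact:", original == tampered)
```

The receiver only needs to recompute the digest of what arrived and compare it with the digest that was sent.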

Digital signature

Hackers can disguise themselves as websites to steal information. Conversely, they can also pretend to be you and send payment, transfer and other messages to the website. The website has no way to confirm your identity, and your money may be stolen.

In real life, the solution to identity authentication is the signature or the seal. As long as you sign or stamp a document, you can prove that it was indeed issued by you and not by someone else.

Is there anything in TLS that, like a real-life signature or seal, can only be held by you and by no one else? Such a thing could prove your identity in the digital world.

That thing is the "private key" in asymmetric encryption. Using the private key together with a digest algorithm achieves "digital signature" and "identity authentication".

A digital signature is the digest produced by the digest algorithm, encrypted with the private key. It is placed in the message together with the original digest as proof of identity.

Private key encryption - public key decryption

The process of the client sending encrypted messages to the server:

Client signing process:

  • ① Run the digest algorithm (SHA-2) on the plaintext to generate a digest
  • ② Use the client's private key to encrypt the digest, generating a signature. Only the digest is encrypted because it is short, so encryption and decryption are faster.
  • ③ Attach the digest and signature to the original text, then encrypt the whole message with the session key to generate the ciphertext, and transmit it.

Server-side signature verification process:

  • ① Use the session key to decrypt the ciphertext and obtain the plaintext
  • ② Run the digest algorithm (SHA-2) on the plaintext to generate a digest, and compare it with the digest in the message (integrity check)
  • ③ Use the client's public key to decrypt the signature and recover the digest, and compare it with the digest in the message (identity check)

The process of sending encrypted messages from the server to the client mirrors the above:

Server-side signing process:

  • ① Run the digest algorithm (SHA-2) on the plaintext to generate a digest
  • ② Use the server's private key to encrypt the digest, generating a signature
  • ③ Attach the digest and signature to the original text, then encrypt the whole message with the session key to generate the ciphertext, and transmit it.

Client signature verification process:

  • ① Use the session key to decrypt the ciphertext and obtain the plaintext
  • ② Run the digest algorithm (SHA-2) on the plaintext to generate a digest, and compare it with the digest in the message (integrity check)
  • ③ Use the server's public key to decrypt the signature and recover the digest, and compare it with the digest in the message (identity check)
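
The sign and verify steps above can be sketched with textbook RSA. This is a simplified illustration under loud assumptions: the key pair is a toy built from two small Mersenne primes, the digest is reduced modulo N so the toy key can process it, and there is no padding scheme, so it must not be used for anything real. The function names are made up for this example, and the session-key encryption step is left out.

```python
import hashlib

# Toy RSA key pair (NOT secure): two known Mersenne primes.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Step 1: SHA-256 digest, reduced modulo N to fit the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Step 2: 'encrypt' the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: 'decrypt' the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"pay 10 yuan to shop 42"
signature = sign(message)

print(verify(message, signature))                     # True
print(verify(b"pay 99 yuan to shop 42", signature))   # False
```

Only the holder of D can produce a signature that verifies under (E, N), so a valid signature proves both who sent the message and that it was not altered.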

Public key trust issues

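A signature can only be verified with the signer's public key, but how do you know that a public key really belongs to the party it claims to? The solution is to find a recognized, trusted third party and let it serve as the "starting point of trust and the end point of recursion" for building a public key trust chain.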

This "third party" is what we often call CA (Certificate Authority). It is like the Public Security Bureau, Ministry of Education, and Notary Center in the online world. It has extremely high credibility. It signs each public key and uses its own reputation to ensure that the public key cannot be forged and is trustworthy.

CA digital certificate

The CA's signature on a public key also follows a format. It does not simply bind the public key to the holder's identity; it also includes the serial number, purpose, issuer, validity period and so on. These are bundled into a package and then signed, which fully certifies the information associated with the public key and forms a "digital certificate" (Certificate).

There are only a few well-known CAs in the world, such as DigiCert, VeriSign, Entrust and Let's Encrypt. The certificates they issue are divided into three types: DV, OV and EV. The difference lies in the degree of trustworthiness.

DV is the lowest: it is only credible at the domain name level, and we do not know who is behind it. EV is the highest: it has been strictly verified through legal checks and audits, and can prove the identity of the website owner.

Question: How does a CA prove that it is trustworthy?

  • A smaller CA can have a larger CA sign and certify it, but the end of the chain, the Root CA, can only vouch for itself. Its self-signed certificate is called the "Root Certificate". You have to trust it, otherwise the entire certificate trust chain cannot be built.

With this certificate system in place, operating systems and browsers ship with the root certificates of the major CAs built in. When you browse the web, as long as the server sends its certificate, the signature in the certificate can be verified by following the certificate chain layer by layer until a root certificate is reached. At that point you can be sure that the certificate is trustworthy, and so is the public key inside it.
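
From a client program's point of view, this trust store is what a default TLS configuration relies on. The sketch below uses Python's standard ssl module; the helper fetch_peer_certificate is a made-up name for this example and needs network access, so it is defined but not called here.

```python
import socket
import ssl

# The default context loads the platform's trusted root certificates and
# enables certificate-chain and host-name verification.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)   # True
print(context.check_hostname)                     # True


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Open a TLS connection and return the server's validated certificate.

    Raises ssl.SSLCertVerificationError if the chain cannot be traced back
    to a trusted root or the host name does not match."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

If verification succeeds, the returned dictionary contains fields such as the subject, issuer and validity dates described above.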

TLS handshake process

For the detailed TLS handshake process with ECDHE and RSA, please refer to: https://www.processon.com/view/link/62bed0685653bb214fa3d58f

Summary

HTTPS:

  • HTTPS = HTTP + SSL/TLS, that is, the HTTP protocol running on top of SSL/TLS. The SSL/TLS security layer is added below HTTP and above the transport layer, and corresponds to layer 5 (session layer) in the OSI model.

The security of HTTPS is mainly reflected in three aspects:

  • ① HTTPS uses encryption algorithms to encrypt the message. Hackers cannot understand it even if they intercept it.
  • ② HTTPS uses a digest algorithm to confirm the integrity of the message. Once a hacker modifies the message, HTTPS can detect it and handle it accordingly.
  • ③ HTTPS uses digital signatures to ensure that hackers cannot forge request or response messages.

Symmetric encryption algorithm:

  • The same key is used to encrypt the plaintext and to decrypt the ciphertext.
  • Symmetric encryption algorithms in TLS include RC4, DES, 3DES, AES and ChaCha20. The first three are considered insecure, and currently only AES and ChaCha20 are commonly used.
  • AES: Advanced Encryption Standard. The key length can be 128, 192 or 256 bits, with high security strength and good performance.
  • ChaCha20: an encryption algorithm designed by Daniel J. Bernstein and promoted by Google, with a fixed key length of 256 bits. It was once popular on mobile clients, but AES hardware optimization was added in ARMv8, so it no longer has an obvious advantage, although it is still a good algorithm.
  • The problem that symmetric encryption needs to solve is key exchange. Its disadvantage is that the key cannot be transmitted over an insecure network.

Asymmetric encryption algorithm:

  • Asymmetric encryption uses two different keys, a public key and a private key. The public key can be made public for anyone to use, and the private key must be kept strictly confidential.
  • The public key and the private key are one-way. Data encrypted with the public key can only be decrypted with the private key. Conversely, data encrypted with the private key can only be decrypted with the public key.
  • The best-known asymmetric encryption algorithm is RSA. Its security is based on the mathematical problem of integer factorization. ECC offers better security strength and performance than RSA, and is likewise based on a hard mathematical problem.
  • Although asymmetric encryption solves the key exchange problem, it is based on complex mathematical problems and is very slow. Using only asymmetric encryption is impractical.

TLS uses hybrid encryption :

  • First solve the key exchange problem with an asymmetric encryption algorithm: use random numbers to generate the "session key" for the symmetric encryption algorithm, encrypt the "session key" with the public key and send it to the other party, who decrypts it with the private key. Both parties can then use the "session key" to communicate with the symmetric encryption algorithm.

    1) The server issues a public key (asymmetric encryption algorithm)
    2) The client uses the public key to encrypt the generated [session key] (asymmetric encryption algorithm)
    3) The server uses the private key to decrypt the [session key] sent by the client (asymmetric encryption algorithm)
    4) The client and server then use the [session key] to communicate (symmetric encryption algorithm)

In this way, the key exchange problem is first solved with an asymmetric encryption algorithm, and the performance is improved by using a symmetric encryption algorithm.

Data integrity:

  • Digest algorithm: it can be understood as a special compression algorithm, or hash algorithm, that compresses data of any length into a fixed-length, practically unique digest string. It can also be understood as a special one-way encryption algorithm with an algorithm but no key: the "encrypted" data cannot be decrypted and the original text cannot be deduced from the digest.

  • Commonly used digest algorithms include MD5 and SHA-1, but these two are not secure enough and have been disabled in TLS. The current recommendation is SHA-2, a family of digest algorithms. The commonly used ones are SHA224, SHA256 and SHA384, which generate 28-byte, 32-byte and 48-byte digests respectively.

Digital signature:

  • ① The client encrypts the digest with its private key to generate a signature, which serves as identity authentication. The signature and the digest are attached to the message, which is then encrypted with the [session key] and transmitted.
  • ② The server first decrypts the message with the [session key], then runs the digest algorithm and compares digests to verify data integrity. It then decrypts the signature with the client's public key and compares digests again to perform identity authentication.

This is the process of the client sending an encrypted message to the server. In turn, the process of the server sending an encrypted message to the client is similar to this.

Public key trust issues:

  • The CA signs and certifies the public key and generates a digital certificate. The operating system and browser have the root certificates of the major CAs built in. As long as the server sends its certificate, the signature in the certificate can be verified by following the certificate chain layer by layer until the root certificate is reached. At that point the certificate can be trusted, and so can the public key inside it.


Supplement: other encoding-related content

Base64

An encoding algorithm that converts binary data into a string made up of 64 different characters.

Algorithm: map every 6 bits of the original data to a character in the Base64 index table and assemble the characters into a string (each character occupies 8 bits).

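Base64 index table: values 0-25 map to A-Z, 26-51 to a-z, 52-61 to 0-9, 62 to "+" and 63 to "/".

Encoding example: Base64 encoding "Man"

The three bytes of "Man" are 01001101 01100001 01101110. Regrouped into 6-bit units they are 010011 010110 000101 101110, that is 19, 22, 5 and 46, which map to "TWFu".

Encoding example: Base64 trailing padding

When the input length is not a multiple of 3 bytes, the last group is padded with zero bits and "=" is appended so that the output length is a multiple of 4: "Ma" encodes to "TWE=" and "M" encodes to "TQ==".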

The purpose of Base64

  1. Expands the storage and transmission methods of binary data (for example, you can save data to text files, send binary data through chat dialog boxes or text messages, and add simple binary data to URLs)
  2. Ordinary strings will become unreadable to the naked eye after being Base64 encoded, so it can be used for anti-peeping under certain conditions (less commonly used)

Disadvantages of Base64

  • Because of how it works (every 6 bits become an 8-bit character), data grows by about 1/3 each time it is Base64 encoded, which affects storage and transmission performance.
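
Both the 6-bit regrouping and the 1/3 size increase can be reproduced in a few lines. The manual encoder below (b64encode_manual, a name made up for this example) is written for readability rather than speed and is checked against Python's standard base64 module.

```python
import base64
import string

# Index table: A-Z, a-z, 0-9, "+", "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_manual(data: bytes) -> str:
    # Write every byte as 8 bits, then pad with zero bits to a multiple of 6.
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)
    # Map each 6-bit group to one character of the index table.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" to a multiple of 4 characters.
    return "".join(chars) + "=" * (-len(chars) % 4)


for sample in (b"Man", b"Ma", b"M"):
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", b64encode_manual(sample))   # TWFu, TWE=, TQ==

# 300 raw bytes become 400 characters: an increase of one third.
encoded_size = len(base64.b64encode(bytes(300)))
print(encoded_size)   # 400
```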

"Base64 encrypted image transmission is more secure and efficient"? ? ?

  • No. First, Base64 is not encryption. Second, Base64 increases the data size by about 1/3, which reduces network performance and increases users' traffic costs; it is an unnecessary step.

  • Base64 encoding of images is useful when you occasionally need to transmit an image in text form. Other than that, there is no need to apply Base64 to images.

Variant: Base58

  • The encoding used by Bitcoin. Compared with Base64 it removes the digit "0", the uppercase letters "O" and "I", the lowercase letter "l", and the "+" and "/" symbols. It is used to represent Bitcoin addresses.
  • The main purpose of the changes relative to Base64 is user convenience. By removing easily confused characters, Base58 is easier to transcribe by hand. In addition, removing "+" and "/" lets most software select the whole string with a double-click.
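
A minimal sketch of Base58 follows, assuming the Bitcoin character ordering shown in BASE58_ALPHABET. Unlike Base64 it does not work on fixed bit groups: the whole input is treated as one large number and converted to base 58. The function names are made up for this example, and the checksum that Bitcoin addresses add on top is not included.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def b58decode(text: str) -> bytes:
    number = 0
    for char in text:
        number = number * 58 + BASE58_ALPHABET.index(char)
    leading_zeros = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_zeros + body


encoded = b58encode(b"\x00\x00hello")
print(encoded)
print(b58decode(encoded))
```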

Compression and decompression

  • Compression: Encode data using an encoding algorithm that has more storage advantages.
  • Decompression: Decode the compressed data and restore it to its original form for ease of use.

The purpose of compression is to reduce the storage space occupied by data.

Example of a crude algorithm

Compress the following text content:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

The data compressed using a certain algorithm is:

compress:a:1062;b:10

Note: There are many specific compression scenarios, so the compression algorithm will be much more complex. The above is just a prototype algorithm.
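
The prototype above is a form of run-length encoding. A minimal sketch, reusing the "character:count" notation from the example (the function names are made up here, and the input is assumed not to contain ":" or ";"):

```python
from itertools import groupby


def rle_compress(text: str) -> str:
    """Replace each run of identical characters with 'char:count'."""
    return ";".join(f"{char}:{len(list(run))}" for char, run in groupby(text))


def rle_decompress(packed: str) -> str:
    """Inverse of rle_compress (assumes no ':' or ';' in the original text)."""
    if not packed:
        return ""
    pairs = (item.split(":") for item in packed.split(";"))
    return "".join(char * int(count) for char, count in pairs)


packed = rle_compress("aaaaaaaabbb")
print(packed)                   # a:8;b:3
print(rle_decompress(packed))   # aaaaaaaabbb
```

This only pays off when the data contains long runs; on text without repetition the "compressed" form is longer than the input.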

Is compression encoding?

  • Yes. Encoding means converting data from one form into another. Compression is a kind of encoding, and decompression is the corresponding decoding.

Common compression algorithms: DEFLATE, JPEG, MP3, etc.

Hash

Convert arbitrary data into data within a specified size range (usually very small, such as within 256 bytes).

Function: It is equivalent to extracting summary information from data, so the main use is digital fingerprint .

Practical uses of Hash:

  • Uniqueness verification, such as the hashCode() method in Java. (How to override hashCode()? Feed every variable that equals() uses to determine equality into hashCode() and combine them into an integer that avoids collisions as far as possible.)
  • Data integrity verification: after downloading a file from the Internet, you can confirm whether it is damaged by comparing its hash value (such as MD5 or SHA-1). If the hash value of the downloaded file matches the one given by the file provider, the downloaded file is intact.
  • Quick lookup, as in HashMap
  • Privacy protection: when important data must be exposed, you can expose its hash value instead to protect the original data. For example, a website can store only the hash value of each user's password. At login, it only needs to compare the hash value of the entered password with the hash value saved in the database; the website never needs to know the password itself. This way, if the website's data is stolen, users' passwords are not directly exposed, which also protects their accounts on other websites. In practice a plain fast hash such as MD5 is too easy to brute-force for this purpose; a salted, deliberately slow password hash is preferred.
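
The password case can be sketched with Python's standard library. This example swaps the plain MD5 mentioned above for salted PBKDF2-HMAC-SHA256; the iteration count and the function names are assumptions chosen for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # illustrative work factor (an assumption)


def hash_password(password: str) -> tuple:
    """Return (salt, digest); only these two values are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))   # True
print(check_password("wrong guess", salt, stored))     # False
```

The random salt makes identical passwords hash differently for different users, and the iteration count makes each guess expensive for an attacker who steals the database.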

Origin blog.csdn.net/lyabc123456/article/details/133327300