HTTPS adds a software layer, SSL/TLS, between HTTP (the application layer) and TCP (the transport layer). HTTP combined with this layer is called HTTPS.

(HTTPS is also an application-layer protocol. It adds an encryption layer on top of HTTP. HTTP content is transmitted in plain text, so it can be read or tampered with by others along the way.)

2. HTTP and HTTPS

(1) Different ports, two separate services

HTTP binds to port 80 by default and HTTPS to port 443. They are two separate services; the difference is that HTTPS is encrypted.

(2) HTTP is more efficient, HTTPS is more secure

HTTP needs no encryption, so it is more efficient but not secure; HTTPS requires encryption, so it is somewhat less efficient but secure. Plain HTTP is only recommended in environments that are already trusted, such as an intranet.

3. Encryption, decryption, and keys

Encryption: applying a series of transformations to plaintext (the information to be transmitted, such as "hello world") to produce ciphertext.

Decryption: applying a series of transformations to ciphertext to restore the plaintext.

Key: encryption and decryption usually need one or more extra pieces of data to drive the transformation; such data is called a key. (The Chinese term 密钥 is properly pronounced mì yuè, though most people say mì yào.)

In the 1983 film <<Burning the Old Summer Palace>>, there is a plot to kill Empress Dowager Cixi, and Prince Gong (Yixin) hands Cixi a memorial. Here is what it really means:

Plaintext: "Beware of Sushun, Duanhua, and Dai Heng" (powerful officials at the time, later brought down by Empress Dowager Cixi).

Ciphertext: the full text of the memorial

Key: a sheet of paper with holes cut in it (laid over the memorial, it reveals only the hidden characters)

4. Why encryption?

All encryption exists to stop someone in the middle from eavesdropping on or tampering with the data.

The infamous "carrier hijacking"

Example: downloading the Tiantiandongting app

Without hijacking: clicking the download button brings up the download link for Tiantiandongting.

With hijacking: clicking the download button brings up the download link for QQ Browser instead.

Every packet we send over the network passes through the carrier's network equipment (routers, switches, etc.), so that equipment can parse the data you transmit and tamper with it.

Clicking the "Download" button actually sends an HTTP request to the server, and the HTTP response contains the app's download link. When the carrier hijacks the exchange and sees that the request is for Tiantiandongting, it automatically rewrites the response so that the user receives the download address of "QQ Browser" instead.

Therefore: because HTTP content is transmitted in plain text, and that plaintext passes through many physical nodes such as routers, Wi-Fi hotspots, carriers, and proxy servers, anyone who intercepts it in transit sees the content in full. The interceptor can also modify it without either party noticing. This is a man-in-the-middle attack, which is why the information needs to be encrypted.

Think about it, why do operators hijack?

5. Common encryption methods

(1) Symmetric encryption

• Encryption with a single-key cryptosystem, where the same key is used both to encrypt and to decrypt, is called symmetric encryption, also known as single-key encryption. Defining feature: encryption and decryption use the same key.

• Common symmetric encryption algorithms (for reference): DES, 3DES, AES, TDEA, Blowfish, RC2, etc.

• Features: public algorithms, low computational cost, fast encryption, high efficiency

In short, symmetric encryption uses one and the same "key" both to turn plaintext into ciphertext and to turn ciphertext back into plaintext.

A simple symmetric cipher: bitwise XOR

Suppose the plaintext is a = 1234 and the key is key = 8888.

Encrypting with a ^ key gives the ciphertext b = 9834.

Applying b ^ key to the ciphertext 9834 restores the original plaintext 1234. Here key plays the role of the symmetric key. (Strings work the same way, since every character can be represented as a number.)

Of course, bitwise XOR with a fixed key is only the simplest form of symmetric encryption; HTTPS does not use it.
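The XOR example above can be checked with a few lines of Python, using the same numbers (a = 1234, key = 8888). The string part assumes a single-byte key (0x5A, chosen arbitrarily) repeated over every byte of the text, purely for illustration; it is a toy, not a usable cipher.

```python
def xor_int(value: int, key: int) -> int:
    # XOR is its own inverse: applying the same key twice restores the input
    return value ^ key


def xor_bytes(data: bytes, key: int) -> bytes:
    # Toy string version: XOR every byte with the same one-byte key (0-255)
    return bytes(b ^ key for b in data)


a, key = 1234, 8888
b = xor_int(a, key)             # encrypt
print(b)                        # 9834
print(xor_int(b, key))          # 1234: decrypting uses the very same key

cipher = xor_bytes("hello world".encode(), 0x5A)
print(xor_bytes(cipher, 0x5A).decode())   # hello world
```

Because encryption and decryption are the same operation with the same key, whoever learns the key can do both, which is exactly the key-distribution problem discussed later.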

(2) Asymmetric encryption

• Encryption and decryption require two keys: a public key and a private key.

• Common asymmetric algorithms (for reference): RSA, DSA, ECDSA (DSA and ECDSA are used for signatures rather than encryption)

• Features: the algorithms are complex, and security depends on both the algorithm and the key. Because of that complexity, encryption and decryption are much slower than with symmetric algorithms.

Asymmetric encryption uses two keys, one is called "public key" and the other is called "private key".

The public key and the private key form a pair. The biggest drawback is speed: asymmetric encryption is much slower than symmetric encryption.

• Encrypt the plaintext with the public key and turn it into ciphertext

• Decrypt the ciphertext with the private key and turn it into plaintext

It also works the other way around:

• Encrypt the plaintext with the private key and turn it into ciphertext

• Decrypt the ciphertext with the public key and turn it into plaintext
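To make both directions concrete, here is a minimal sketch of textbook RSA (one of the algorithms listed above) in Python, with deliberately tiny primes (61 and 53, a classic worked example). Real RSA uses keys of 2048 bits or more plus padding; this only shows that each key of the pair undoes the other's operation, in either order. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Textbook RSA with tiny primes: illustration only, never secure
p, q = 61, 53
n = p * q                   # modulus shared by both keys (3233)
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent  -> public key (e, n)
d = pow(e, -1, phi)         # private exponent -> private key (d, n), here 2753


def apply_key(m: int, exponent: int) -> int:
    # Encryption and decryption are both modular exponentiation: m^exponent mod n
    return pow(m, exponent, n)


m = 65                          # the plaintext must be an integer smaller than n
c = apply_key(m, e)             # public key encrypts
print(c, apply_key(c, d))       # 2790 65: private key decrypts

s = apply_key(m, d)             # reverse direction: private key "encrypts"
print(apply_key(s, e))          # 65: public key "decrypts"
```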

The mathematics behind asymmetric encryption is fairly involved and draws on number theory. Here is a simple everyday analogy.

A wants to give B some important documents, but B may not be there. So A and B agree in advance:

B says: "There is a box on my desk, and I will give you a lock. Put the documents in the box and lock it; later I will open the lock with my key and take them out." In this scenario the lock plays the role of the public key and the key plays the role of the private key. The lock can be handed to anyone (leaking it does no harm), but only B holds the key, and only the holder of the private key can decrypt.

6. Data digest && data fingerprint

• A digital fingerprint (data digest) is produced by running the data through a one-way hash function, which outputs a fixed-length string called the digest or fingerprint. A fingerprint is not an encryption mechanism: the original data cannot be recovered from it, so there is nothing to decrypt. It can, however, be used to check whether data has been tampered with.

• Common digest algorithms: MD5, SHA1, SHA256, SHA512, etc. They map an unbounded set of inputs to a finite set of outputs, so collisions (two different inputs with the same digest) are possible, though very unlikely by chance. (MD5 and SHA1 now have practical collision attacks, so they are no longer considered safe for signatures.)

• Digests vs. encryption: strictly speaking a digest is not encryption, because there is no decryption, and it is practically impossible to derive the original data from the digest. Digests are typically used to compare data.
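A short sketch with Python's standard `hashlib` shows the two properties described above: the output length is fixed regardless of input size, and a one-character change produces an unrelated digest. The helper name `fingerprint` is ours, chosen for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # SHA-256 always yields 256 bits (64 hex characters), whatever the input length
    return hashlib.sha256(data).hexdigest()


original = b"hello world"
fp = fingerprint(original)

for candidate in [b"hello world", b"hello worle", b"hello world" * 1000]:
    digest = fingerprint(candidate)
    # Same length every time; only an identical input reproduces the same digest
    print(len(digest), digest == fp)
```

Comparing a freshly computed fingerprint with a known-good one is how tampering is detected; note there is no function here that goes from `fp` back to `original`.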

7. Digital signature

• Encrypting a digest with a private key yields a digital signature (details later)

8. Connecting the pieces

• Can symmetric encryption alone make HTTP communication secure? What is the problem?

• Why use asymmetric encryption? And why not use it for everything?

2. Exploring how HTTPS works

Since data must be kept secure, we need "encryption".

Data is no longer sent over the network as plaintext, but as encrypted "ciphertext".

There are many encryption methods, but they fall into two categories: symmetric encryption and asymmetric encryption.

1. Scheme 1: symmetric encryption only (not viable, because the key itself would travel in plaintext)

If both parties hold the same key X and nobody else knows it, their communication is of course secure (unless the key is cracked).

With symmetric encryption, even if the data is intercepted, the attacker cannot decrypt it without the key and therefore cannot see what the request really contains.

But it is not that simple. A server serves many clients at once, and each client must use a different key (if they all shared one, the key would spread too easily and attackers could obtain it too). The server would therefore have to keep track of which key belongs to which client, which is itself a lot of trouble.

Ideally, when the client and server establish a connection, they would negotiate the key for that session.

But if the key is sent in plain text, an attacker can capture it, and all subsequent encryption becomes useless.

So the key itself must also be transmitted encrypted!

But encrypting the key symmetrically would require agreeing on yet another "key for the key" first, which is a chicken-and-egg problem. So symmetric encryption cannot be used to transmit the key.

2. Scheme 2: asymmetric encryption only (secure in one direction only; not viable)

With asymmetric encryption, the server first sends its public key to the browser in plain text, and the browser encrypts its data with that public key before sending it to the server. The client-to-server direction then seems secure (though it still has a problem, covered in 5), because only the server holds the private key that can decrypt data encrypted with the public key.

However, while the browser-to-server direction is protected, the server-to-browser direction is not:

If the server encrypts data with its private key and sends it to the browser, the browser decrypts it with the public key, which was sent in plaintext. If a middleman captured that public key, they can decrypt everything the server sends just as easily.
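The weakness of Scheme 2 can be sketched with the same toy textbook RSA numbers (n = 3233, e = 17, d = 2753; tiny and insecure, for illustration only). The response value 1234 and the function names are hypothetical. Whatever the server "encrypts" with its private key can be undone by anyone holding the public key, and the public key was sent in the clear.

```python
# Toy textbook RSA key pair for the server (tiny primes, illustration only)
n, e, d = 3233, 17, 2753    # public key (e, n), private key (d, n)


def server_reply(m: int) -> int:
    # Server -> browser: the only key the server can use here is its private key
    return pow(m, d, n)


def decrypt_with_public_key(c: int) -> int:
    # Anyone who saw the plaintext public key (e, n) can run this
    return pow(c, e, n)


reply = server_reply(1234)                           # hypothetical response data, an integer < n
browser_view = decrypt_with_public_key(reply)
eavesdropper_view = decrypt_with_public_key(reply)   # the middleman copied (e, n) in transit
print(browser_view, eavesdropper_view)               # both recover 1234
```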

3. Scheme 3: both parties use asymmetric encryption (too inefficient to be practical)

1. The server has a public key S (S for server) and the matching private key S'; the client has a public key C and the matching private key C'

2. The client and server exchange public keys

3. Client to server: the client encrypts data with S before sending, so only the server can decrypt it, because only the server has the private key S'

4. Server to client: the server encrypts data with C before sending, so only the client can decrypt it, because only the client has the private key C'

This seems to work, but:

• It is too inefficient

• It still has security problems

4. Scheme 4: asymmetric + symmetric encryption (still has security problems; see 5)

First, solve the efficiency problem:

• The server has an asymmetric public key S and private key S'

• The client makes an HTTPS request and obtains the server's public key S

• The client generates a symmetric key C locally, encrypts it with the public key S, and sends it to the server.

• Since network devices along the way do not have the private key, they cannot recover the original text even if they intercept the data, so they cannot learn the symmetric key (really?)

• The server decrypts with its private key S' and recovers the symmetric key C sent by the client, then uses this symmetric key to encrypt the response it returns to the client. From then on, client and server communicate using only symmetric encryption. Since the key is known only to the client and the server, any other host or device that intercepts the data cannot make sense of it.

Because symmetric encryption is much more efficient than asymmetric encryption, asymmetric encryption is used only to negotiate the key at the start, and all later transmissions use symmetric encryption.
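A minimal sketch of Scheme 4 in Python, under loudly stated assumptions: the server's key pair is toy textbook RSA built from two Mersenne primes (2^31 - 1 and 2^61 - 1), and `keystream_xor` is a hypothetical toy stream cipher (data XORed with SHA-256 of key and counter) standing in for a real symmetric cipher such as AES. Neither is secure; the point is the shape of the protocol: one slow asymmetric operation to move the key, then only fast symmetric operations.

```python
import hashlib
import secrets

# Server's asymmetric key pair: toy RSA over two Mersenne primes (not secure)
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537                                  # (E, N) plays the role of public key S
D = pow(E, -1, (P - 1) * (Q - 1))          # D plays the role of private key S' (Python 3.8+)


def keystream_xor(key: int, data: bytes) -> bytes:
    # Toy symmetric cipher (stand-in for AES): XOR data with SHA-256(key || counter) blocks
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key.to_bytes(8, "big") + i.to_bytes(8, "big")).digest()
        out += bytes(x ^ y for x, y in zip(data[i:i + 32], block))
    return bytes(out)


# 1. Client generates a random 64-bit symmetric key C and encrypts it with S
session_key = secrets.randbits(64)
wrapped = pow(session_key, E, N)

# 2. Server recovers C with its private key S'
server_key = pow(wrapped, D, N)

# 3. From now on both sides use only the symmetric cipher
request = keystream_xor(session_key, b"GET /index.html")
print(keystream_xor(server_key, request))  # b'GET /index.html'
```

Applying `keystream_xor` twice with the same key returns the original bytes, so the same function serves for both encryption and decryption, mirroring the XOR example earlier.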

Although this is close to the answer, security problems remain.

Schemes 2, 3, and 4 share one problem: what if the man-in-the-middle is already in place from the very beginning?

5. Man-in-the-middle attack against the schemes above

• Man-in-the-Middle Attack, abbreviated "MITM attack"

It is true that in Schemes 2/3/4, once the client has the public key S, it encrypts the symmetric key X it generated with that public key S. If the middleman intercepts this data, they genuinely cannot recover X, because only the server has the private key S'.

However, if the middleman starts the attack at the very beginning of the handshake, this no longer holds. Suppose the attacker has successfully become the man-in-the-middle:

1. The server has a public key S and a private key S' of an asymmetric encryption algorithm

2. The middleman has the public key M and the private key M' of the asymmetric encryption algorithm

3. The client initiates a request to the server, and the server sends the public key S to the client in plain text

4. The middleman hijacks the data message, extracts the public key S and saves it, then replaces the public key S in the hijacked message with its own public key M, and sends the forged message to the client

5. The client receives the message and extracts the public key M (unaware that it has been swapped), generates a symmetric key X, encrypts X with M, and sends the message to the server

6. The middleman intercepts it, decrypts it with its own private key M' to obtain the session key X, re-encrypts X with the saved server public key S, and forwards the message to the server

7. The server gets the message, decrypts it with its own private key S', and obtains the session key X

8. Both parties now communicate with symmetric encryption using X, but everything is in the middleman's hands: it can intercept, eavesdrop on, and even modify the data

The same attack also applies to Scheme 2 and Scheme 3.

Where is the essence of the problem? The client cannot be sure that the received data message containing the public key is sent by the target server!
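The eight steps of the attack can be simulated in Python. As before, the key pairs are toy textbook RSA over Mersenne primes (insecure, illustration only), and the helper names `make_keypair` and `rsa` are hypothetical. The client never checks whose public key it received, which is exactly the gap identified above.

```python
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    # Toy textbook RSA: returns (public, private) as (exponent, modulus) pairs
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)


def rsa(m: int, key) -> int:
    exponent, n = key
    return pow(m, exponent, n)


S, S_priv = make_keypair(2**31 - 1, 2**61 - 1)    # step 1: server's S / S'
M, M_priv = make_keypair(2**89 - 1, 2**107 - 1)   # step 2: middleman's M / M'

# Steps 3-4: server sends S in plaintext; middleman keeps S and forwards M instead
key_seen_by_client = M

# Step 5: client creates symmetric key X and encrypts it with the key it received
X = secrets.randbits(64)
to_server = rsa(X, key_seen_by_client)

# Step 6: middleman decrypts with M', learns X, re-encrypts X with the saved S
stolen_X = rsa(to_server, M_priv)
forwarded = rsa(stolen_X, S)

# Step 7: server decrypts with S' and gets X
server_X = rsa(forwarded, S_priv)

# Step 8: client, server, and middleman all share X
print(server_X == X, stolen_X == X)   # True True
```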

6. Introducing certificates

(1) CA certification

Before using HTTPS, the server must apply to a CA (certificate authority) for a digital certificate. The certificate contains information about the applicant, the public key, and so on. The server sends the certificate to the browser, and the browser simply takes the public key from it. Like an ID card, the certificate vouches that the public key really belongs to the server.

(2) Composition of the certificate

Background: https://baike.baidu.com/item/CA%E8%AE%A4%E8%AF%81/6471579?fr=aladdin

A certificate can be thought of as a structured string made of two parts, the plaintext and the signature. It contains the following information:

• Issuing authority

• Validity period

• Public key

• Certificate owner

• Signature

• ......

Note that when applying for a certificate, the server first generates a key pair, a public key and a private key, using some tool or platform. This key pair is used for encryption and digital signatures in network communication.

The public key is sent to the CA for verification together with the CSR (certificate signing request) file, while the server keeps the private key for later communication (in practice, mainly for exchanging the symmetric key).

A CSR and private key can be generated online: https://myssl.com/csr_create.html

Once the CSR is ready, the next step is applying to a CA for certification. The process is usually cumbersome, so when you actually need a certificate, you can simply go through one of the many online providers that handle certificate applications.

(3) Understanding digital signatures

A signature is built on asymmetric encryption. Note that, for now, this has nothing to do with HTTPS itself; do not confuse these keys with the public and private keys used in HTTPS.

What a signature does for a document: it prevents the content from being tampered with undetected. See 7.(2) for details.

(4) The certificate application process

When the server applies for a certificate, the CA reviews the server and creates a digital signature for the website. The process is as follows:

1. The CA has an asymmetric key pair: private key A and public key A'

2. The CA hashes the plaintext data of the certificate the server applied for, producing a data digest

3. The CA then encrypts the digest with the private key A to obtain the digital signature S

The plaintext of the server's certificate and the digital signature S together form the digital certificate, which can then be issued to the server.
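The three signing steps above can be sketched as follows. Assumptions, all hypothetical: the CA key pair is toy textbook RSA over Mersenne primes, the digest is SHA-256 truncated to 64 bits so it fits the toy modulus (real CAs use full-size keys and standardized padding), and the certificate plaintext is an invented byte string.

```python
import hashlib

# Hypothetical CA key pair: toy RSA over Mersenne primes (illustration only, not secure)
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537                               # public exponent, shipped with browsers/OSes
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # private exponent, kept secret by the CA


def digest(data: bytes) -> int:
    # Step 2: hash the certificate plaintext (truncated to 64 bits to fit the toy modulus)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def ca_sign(cert_plaintext: bytes) -> int:
    # Step 3: "encrypt" the digest with the CA private key to get the signature
    return pow(digest(cert_plaintext), CA_D, CA_N)


# Hypothetical certificate plaintext submitted by the server
cert_plaintext = b"subject=www.example.com;issuer=Toy CA;not_after=2030-01-01;pubkey=server-key"
certificate = (cert_plaintext, ca_sign(cert_plaintext))   # plaintext + signature
print(certificate[1])
```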

(5) Viewing the browser's trusted certificate authorities

In Chrome, click the menu in the upper-right corner.

Choose "Settings" and search for "Certificate Management" to open the certificate management screen. (If it does not show up, look under Privacy and security -> Security.)

7. Scheme 5: asymmetric encryption + symmetric encryption + certificate authentication

(1) The asymmetric + symmetric + certificate scheme

Building on Scheme 4, the public key S is now delivered inside a certificate.

When the client and server first connect, the server returns a certificate containing the server's public key (as before) and the website's identity information.

The client verifies the certificate

After receiving the certificate, the client verifies it (to guard against forged certificates):

• Check whether the certificate has expired

• Check whether the issuing authority is trusted (against the trusted certificate authorities built into the operating system)

• Check whether the certificate has been tampered with: take the issuing authority's public key from the system, decrypt the signature to get a hash value (the data digest), and call it hash1. Then compute the hash of the certificate's plaintext and call it hash2. If hash1 equals hash2, the certificate has not been tampered with.
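A sketch of the three client-side checks, reusing the toy setup of the signing example (textbook RSA over Mersenne primes, SHA-256 truncated to 64 bits). `TRUSTED_CAS` is a hypothetical stand-in for the operating system's root store, and certificates are plain dicts serialized with `repr`; real certificates are X.509 structures.

```python
import hashlib
from datetime import date

# Hypothetical CA (toy RSA, not secure) and a one-entry stand-in for the OS root store
P, Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))
TRUSTED_CAS = {"Toy CA": (CA_E, CA_N)}


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def serialize(fields: dict) -> bytes:
    # Deterministic encoding of the certificate plaintext (keys are unique strings)
    return repr(sorted(fields.items())).encode()


def issue(fields: dict) -> dict:
    # What the CA did earlier: sign the plaintext digest with its private key
    return {**fields, "signature": pow(digest(serialize(fields)), CA_D, CA_N)}


def verify(cert: dict, today: date) -> bool:
    fields = {k: v for k, v in cert.items() if k != "signature"}
    if today > fields["not_after"]:             # 1. expired?
        return False
    if fields["issuer"] not in TRUSTED_CAS:     # 2. issuer trusted?
        return False
    e, n = TRUSTED_CAS[fields["issuer"]]
    hash1 = pow(cert["signature"], e, n)        # 3. decrypt the signature
    hash2 = digest(serialize(fields))           #    hash the plaintext
    return hash1 == hash2


cert = issue({"subject": "www.example.com", "issuer": "Toy CA",
              "not_after": date(2030, 1, 1), "pubkey": 123456789})
print(verify(cert, date(2025, 6, 1)))           # True
```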

(2) When the server sends its public key to the client, can a middleman tamper with the certificate?

A middleman could tamper with the plaintext of the certificate, but:

• The middleman does not have the CA's private key, so after hashing the modified plaintext they cannot encrypt the result into a signature. There is no way for them to produce a matching signature for the tampered certificate.

• If they tamper anyway, the client will find, on receiving the certificate, that the hash of the plaintext does not match the value obtained by decrypting the signature. The certificate has been tampered with and cannot be trusted, so the client stops sending information to the server, preventing any leak to the middleman.

If the middleman modifies the plaintext, can they sign it with their own private key instead?

- No. The client decrypts the signature with the CA public key stored in the system; a signature made with any other private key will not decrypt to the correct hash, which shows the certificate has been tampered with.

Summary: a certificate can be tampered with, but the client will notice that the tampered content no longer matches the digital signature, conclude that the certificate has been altered, and refuse to proceed with the request.
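Both tampering attempts can be tried against the same toy setup. Assumptions: toy textbook RSA keys over Mersenne primes for the CA and for the middleman, SHA-256 truncated to 64 bits, and invented certificate strings. The client only ever verifies with the CA public key.

```python
import hashlib


def keypair(p: int, q: int, e: int = 65537):
    # Toy textbook RSA (not secure): returns (public, private) as (exponent, modulus)
    return (e, p * q), (pow(e, -1, (p - 1) * (q - 1)), p * q)


CA_PUB, CA_PRIV = keypair(2**61 - 1, 2**89 - 1)
MITM_PUB, MITM_PRIV = keypair(2**107 - 1, 2**127 - 1)


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def sign(data: bytes, priv) -> int:
    d, n = priv
    return pow(digest(data), d, n)


def client_accepts(plaintext: bytes, signature: int) -> bool:
    # The client always checks against the CA public key stored in the system
    e, n = CA_PUB
    return pow(signature, e, n) == digest(plaintext)


genuine = b"subject=www.example.com;pubkey=S"
sig = sign(genuine, CA_PRIV)
forged = b"subject=www.example.com;pubkey=M"           # middleman swaps in its own key

print(client_accepts(genuine, sig))                    # True
print(client_accepts(forged, sig))                     # False: plaintext no longer matches
print(client_accepts(forged, sign(forged, MITM_PRIV))) # False: signed with the wrong private key
```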

(3) Can the middleman swap out the entire certificate?

• Since the middleman does not have the CA private key, they cannot forge a certificate (why? A certificate signed with the middleman's own private key cannot be verified with the CA's public key, so it is useless)

• So the middleman could only apply to the CA for a genuine certificate of their own and swap that in

• That does replace the whole certificate, but remember that the certificate plaintext contains the server's identity information, such as its domain name. Even with a swapped certificate, the client can still detect the swap.

• Always remember: the middleman does not have the CA private key, so no certificate can be validly modified, including their own
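Detecting a wholesale swap comes down to comparing the domain in the (validly signed) certificate with the host the client meant to reach. A simplified sketch, assuming both hypothetical certificates have already passed the signature check; real clients match against the certificate's subjectAltName entries and also handle wildcards.

```python
# Hypothetical certificates, both assumed to have already passed the CA signature check
real_cert = {"subject": "www.example.com", "pubkey": "S"}
mitm_cert = {"subject": "attacker.example", "pubkey": "M"}   # genuine CA cert for the attacker's own domain


def hostname_matches(cert: dict, requested_host: str) -> bool:
    # Simplified exact comparison; DNS names are case-insensitive
    return cert["subject"].lower() == requested_host.lower()


host = "www.example.com"                     # the host the browser is connecting to
print(hostname_matches(real_cert, host))     # True
print(hostname_matches(mitm_cert, host))     # False: the swapped certificate is rejected
```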

8. Frequently Asked Questions

(1) Why must the digest be encrypted into a signature when it is sent over the network?

Common digest algorithms are MD5 and the SHA family.

Taking MD5 as an example, we don't need to study how it is computed; we only need to understand its properties:

• Fixed length: no matter how long the input is, the MD5 value is always 128 bits, written as 32 hex characters (some tools also show a 16-character short form taken from the middle of it)

• Dispersion: changing the input even slightly produces a completely different MD5 value.

• Irreversible: computing MD5 from the input is easy, but recovering the input from the MD5 value is practically infeasible.

How tampering with a certificate is detected (much like checking whether an ID card is fake):

Suppose our certificate is just the simple string hello. Hashing this string (e.g., with MD5, shown in the 16-character short form) gives

BC4B2A76B9719D91

If any character of hello is changed, for example to hella, the MD5 value changes completely:

BDBD6F9CF51F2FD8

Suppose the server sends the client both the string hello and its hash BC4B2A76B9719D91. How does the client check whether hello was tampered with?

It simply computes the hash of the received hello and checks whether it equals BC4B2A76B9719D91.

But if the hash is sent in plaintext, an attacker can modify hello and recompute the hash at the same time, and the client cannot tell the difference.

So the hash cannot be sent in plaintext; it must be sent encrypted.
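The hello example can be reproduced with `hashlib`. The "16-character MD5" used above is the middle 16 hex digits of the full 32-digit MD5; the helper names `md5_16` and `looks_untampered` are ours. The last line shows why a plaintext hash proves nothing: the attacker simply recomputes it.

```python
import hashlib


def md5_16(text: str) -> str:
    # The 16-character MD5 form: hex digits 9-24 of the full 32-digit MD5, upper-cased
    return hashlib.md5(text.encode()).hexdigest()[8:24].upper()


def looks_untampered(text: str, claimed_hash: str) -> bool:
    # Client-side check when the hash travels alongside the data
    return md5_16(text) == claimed_hash


print(md5_16("hello"))                                 # BC4B2A76B9719D91
print(looks_untampered("hello", "BC4B2A76B9719D91"))   # True
print(looks_untampered("hella", "BC4B2A76B9719D91"))   # False: naive tampering is caught

# If the hash is sent in plaintext, the attacker replaces both the data and the hash
forged_text = "hella"
print(looks_untampered(forged_text, md5_16(forged_text)))   # True: the check is fooled
```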

Therefore, the CA hashes the certificate plaintext (here "hello") into a digest and encrypts that digest with its own private key to form a signature. The plaintext hello and the signature together make up the CA certificate, which is issued to the server and sent to the client when it connects. A middleman who intercepts it has no CA private key, so they can neither modify it nor swap it wholesale; this is what proves the certificate is legitimate.

Finally, the client decrypts the signature with the issuing authority's public key already stored in the operating system, recovers the original hash, and performs the comparison.

(2) Why is the digest signed instead of the data itself being encrypted directly?

• Hashing first shortens what gets signed, so the signature is small and signing and verification are fast, no matter how large the certificate is.

(3) How does someone become a middleman? (for reference)

• ARP spoofing: on a LAN, an attacker who receives ARP request broadcasts can learn the (IP, MAC) addresses of other nodes. For example, having learned the addresses of hosts A and B, the attacker tells B (the victim) that it is A, so every packet B sends to A is intercepted by the attacker

• ICMP attack: ICMP has a redirect message type, so an attacker can forge an ICMP redirect and send it to a client on the LAN, claiming to be a better route. All of the target's Internet traffic is then sent to the interface the attacker chose, with the same effect as ARP spoofing

• Fake Wi-Fi, fake websites, etc.

(4) Complete process

[Figure: the complete HTTPS process, with the client's steps on the left and the server's steps on the right]

9. Summary: the HTTPS workflow involves three groups of keys.

Group 1 (asymmetric): used to verify that the certificate has not been tampered with. The CA holds the private key and uses it to sign the certificate; the client holds the public key (the operating system lists the trusted certificate authorities and stores their public keys). When the client connects, the server returns the signed certificate, which in turn vouches for the server public key carried inside it.

Group 2 (asymmetric): used to negotiate the symmetric key. The server holds the private key (generated alongside the CSR file when applying for the certificate), and the public key is in the certificate. The client encrypts a randomly generated symmetric key with the public key from the received, now trusted, certificate and sends it to the server, which decrypts it with its private key.

Group 3 (symmetric): all subsequent data between client and server is encrypted and decrypted with this symmetric key.

In fact, everything revolves around this symmetric key; the other mechanisms exist to support it.

The second group of asymmetric keys lets the client deliver the symmetric key to the server.

The first group of asymmetric keys lets the client obtain, and trust, the public key of the second group.
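The three key groups can be tied together in one end-to-end sketch. Everything here is a toy under stated assumptions: textbook RSA over Mersenne primes for both asymmetric pairs, SHA-256 truncated to 64 bits as the digest, a hypothetical SHA-256 keystream cipher standing in for AES, and an invented `key=value;` certificate format. It mirrors the flow described in this summary, not the actual TLS message format.

```python
import hashlib
import secrets


def keypair(p: int, q: int, e: int = 65537):
    # Toy textbook RSA; returns (public, private) as (exponent, modulus) pairs
    return (e, p * q), (pow(e, -1, (p - 1) * (q - 1)), p * q)


def rsa(m: int, key) -> int:
    exponent, n = key
    return pow(m, exponent, n)


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def xor_stream(key: int, data: bytes) -> bytes:
    # Toy symmetric cipher (stand-in for AES): data XOR SHA-256(key || counter)
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key.to_bytes(8, "big") + i.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[i:i + 32], pad))
    return bytes(out)


# Group 1: CA key pair. The CA keeps the private key; the public key ships with the OS
CA_PUB, CA_PRIV = keypair(2**61 - 1, 2**89 - 1)
# Group 2: server key pair. The server keeps the private key; the public key goes in the certificate
SRV_PUB, SRV_PRIV = keypair(2**107 - 1, 2**127 - 1)

cert_plain = f"subject=www.example.com;e={SRV_PUB[0]};n={SRV_PUB[1]}".encode()
cert_sig = rsa(digest(cert_plain), CA_PRIV)          # done once, when the certificate is issued

# Client: verify the certificate with the CA public key (group 1)
if rsa(cert_sig, CA_PUB) != digest(cert_plain):
    raise ValueError("certificate rejected: signature does not match")
fields = dict(item.split("=", 1) for item in cert_plain.decode().split(";"))
server_pub = (int(fields["e"]), int(fields["n"]))    # now-trusted server public key

# Client: wrap a fresh symmetric key with the server public key (group 2)
session_key = secrets.randbits(64)
wrapped = rsa(session_key, server_pub)

# Server: unwrap with its private key
server_session_key = rsa(wrapped, SRV_PRIV)

# Group 3: all further traffic uses the symmetric key
ciphertext = xor_stream(session_key, b"GET / HTTP/1.1")
print(xor_stream(server_session_key, ciphertext))    # b'GET / HTTP/1.1'
```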

Linux articles [15]: application layer - network https protocol

1. Introduction to HTTPS

1. Definition of HTTPS