A Simple Introduction to How HTTPS Works

Why do you need HTTPS

HTTP is transmitted in clear text, which means that any node between the sender and the receiver can know what the content of your transmission is. These nodes may be routers, proxies, etc.

Take the most common example: user login. The user enters an account name and password. Over HTTP, anyone who controls a proxy server along the path can obtain the password with a few simple tricks.

User login -> proxy server (manipulation) -> actual authorization server
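
To make the risk concrete, here is a minimal sketch of what a plain HTTP login looks like on the wire. The host name, path, and form field names are invented for illustration; the point is only that an intermediate node needs nothing more than string handling to read the password.

```python
# A hypothetical plain-HTTP login request; host, path, and field names
# are assumptions made up for this sketch.
def build_login_request(account: str, password: str) -> bytes:
    body = f"account={account}&password={password}"
    request = (
        "POST /login HTTP/1.1\r\n"
        "Host: xx.example\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )
    return request.encode("ascii")


def sniff_password(raw: bytes) -> str:
    # What a proxy or router on the path could do: simply read the bytes.
    body = raw.split(b"\r\n\r\n", 1)[1].decode("ascii")
    fields = dict(pair.split("=", 1) for pair in body.split("&"))
    return fields["password"]


wire_bytes = build_login_request("xiaoming", "hunter2")
print(sniff_password(wire_bytes))  # hunter2
```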

What about encrypting the password on the sender's side? That does not help. Others may not learn the original password, but anyone who captures the encrypted account and password can simply resend them and still log in.
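
The replay problem can be sketched in a few lines. This example assumes the "encryption" on the sender's side is a SHA-256 hash and that a hypothetical server compares the received value against a stored one; both are illustrative assumptions, not a description of any real site.

```python
import hashlib


def client_side_hash(password: str) -> str:
    # Stand-in for "encrypting the password on the sender's side".
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical server-side store: account -> protected password.
STORED = {"xiaoming": client_side_hash("hunter2")}


def server_login(account: str, protected_password: str) -> bool:
    return STORED.get(account) == protected_password


# Legitimate login: the browser protects the password before sending it.
sent_over_http = client_side_hash("hunter2")

# The proxy never learns "hunter2", but it did see what was sent,
# and resending that value is enough to log in.
captured = sent_over_http
print(server_login("xiaoming", captured))  # True
```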

How HTTPS is Secure

HTTPS means secure HTTP; it is an upgraded version of HTTP. Readers with some networking background know that HTTP is an application layer protocol and that beneath it sits the transport protocol TCP. TCP is responsible for the transmission, and HTTP defines how the data is packaged.

HTTP –> TCP (clear text transmission)

How is HTTPS different from HTTP? An encryption layer, TLS/SSL, is added between HTTP and TCP.

What is TLS/SSL?

In layman's terms, TLS and SSL are similar things. SSL is an encryption protocol responsible for encrypting HTTP data, and TLS is its upgraded successor. Today, when people talk about HTTPS, the encryption layer they mean is basically TLS.

Transmission encryption process

Originally, the application layer sent data directly to TCP for transmission, but now the application layer sends data to TLS/SSL, encrypts the data, and then sends it to TCP for transmission.

Roughly:
HTTP –> TLS/SSL (encryption) –> TCP (encrypted transmission)

That's it. Encrypting the data before transmission, rather than letting it travel unprotected across a complex and dangerous network, goes a long way toward keeping it secure. Even if the data is intercepted by intermediate nodes, the bad guys cannot read it.

How HTTPS Encrypts Data

Readers with some background in security or cryptography will know the common encryption methods. Broadly, encryption is divided into symmetric encryption and asymmetric encryption (also called public key encryption).

Symmetric encryption

Symmetric encryption means that the key used to encrypt data is the same key used to decrypt data.

The advantage of symmetric encryption is that encryption and decryption are usually efficient. The disadvantage is that the sender and the receiver must agree on and share the same key, and make sure it is not leaked to anyone else. In addition, when many parties need to exchange data, a separate key has to be allocated and maintained for every pair of them, and that cost quickly becomes unacceptable.
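
As a rough illustration, the sketch below uses a toy XOR stream cipher built from SHA-256. It is not a vetted cipher and stands in for a real algorithm such as AES only to show that one shared key both encrypts and decrypts. The helper `pairwise_keys` is a hypothetical name that counts how many keys are needed when every pair of parties shares its own key.

```python
import hashlib
from itertools import count


def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || counter) blocks. For illustration only.
    out = bytearray()
    for i in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def pairwise_keys(parties: int) -> int:
    # One shared key per pair of parties: n * (n - 1) / 2.
    return parties * (parties - 1) // 2


shared_key = b"agreed in advance"
ciphertext = xor_cipher(shared_key, b"hello XX")
print(xor_cipher(shared_key, ciphertext))  # b'hello XX'
print(pairwise_keys(1000))                 # 499500
```

The second number is the key-management cost mentioned above: a thousand parties that all need to talk to each other privately already require almost half a million keys.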

Asymmetric encryption

Asymmetric encryption means that the key (public key) used to encrypt data is different from the key (private key) used to decrypt data.

What is a public key? As the name suggests, it is a key that anyone can obtain. This is why asymmetric encryption is also called public key encryption.

Correspondingly, the private key is a non-public key, generally held by the administrator of the website.

What is the relationship between the public key and the private key?

Simply put, data encrypted with the public key can only be decrypted with the private key. Data encrypted with the private key can only be decrypted with the public key.

Many readers know that data encrypted with the public key can be decrypted with the private key, but overlook the other direction: data encrypted with the private key can also be decrypted with the public key. This is critical to understanding the whole encryption and authorization system of HTTPS.
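
Both directions can be tried out with textbook RSA and tiny numbers. This is only a sketch: the primes are far too small to be secure, real systems add padding, and `apply_key` is a made-up helper name.

```python
# Textbook RSA with tiny primes; for illustration only.
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (needs Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(key, number):
    # Encrypting and decrypting are the same operation with different keys.
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65

# Direction 1: encrypt with the public key, decrypt with the private key.
c1 = apply_key(public_key, message)
print(apply_key(private_key, c1) == message)  # True

# Direction 2: encrypt with the private key, decrypt with the public key.
c2 = apply_key(private_key, message)
print(apply_key(public_key, c2) == message)   # True
```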

An example of asymmetric encryption

Login user: Xiao Ming
Authorized website: a well-known social networking site (hereinafter referred to as XX)

Xiao Ming is a user of the well-known social networking site XX, and for security reasons XX uses asymmetric encryption for login. Xiao Ming enters his account and password on the login page and clicks "Login". The browser then encrypts the account and password with the public key and sends a login request to XX. XX's login authorization program decrypts the account and password with the private key and verifies them. After that, Xiao Ming's personal information (including private data) is encrypted with the private key and sent back to the browser. The browser decrypts the data with the public key and shows it to Xiao Ming.

Step 1: Xiao Ming enters the account and password -> the browser encrypts them with the public key -> the request is sent to XX
Step 2: XX decrypts with the private key and verifies the login -> XX fetches Xiao Ming's social data and encrypts it with the private key -> the browser decrypts the data with the public key and displays it

Can asymmetric encryption alone solve the problem of data transmission security? As mentioned earlier, data encrypted with the private key can be decrypted with the public key, and the public key is public. In other words, asymmetric encryption on its own only secures data transmission in one direction.

Also, there is the issue of how the public key is distributed/obtained. These two issues are discussed further below.

Public Key Encryption: Two Obvious Problems

The example of Xiao Ming logging in to the social networking site XX shows two obvious problems with using public key encryption alone:

How to get the public key
Data transfer is only one-way secure

Problem 1: How to obtain the public key

How does the browser get XX's public key? Of course, Xiao Ming could look it up online, and XX could post its public key on its homepage. But for a social networking site with tens of millions of users, this would be a great inconvenience. After all, most users do not know what a "public key" is.

Problem 2: Data transmission is only one-way secure

As mentioned earlier, only the private key can unlock data encrypted with the public key, so Xiao Ming's account and password are safe even if they are intercepted along the way.

But there is a big problem: data encrypted with the private key can also be decrypted with the public key, and the public key is public. Xiao Ming's private data is therefore still exposed on the Internet, just in a different way. (Once an intermediate proxy server has the public key, it can decrypt Xiao Ming's data without any effort.)

The two problems are addressed in turn below.

Problem 1: How to obtain the public key

Two very important concepts are involved here: the certificate and the CA (Certificate Authority).

Certificate

For now, think of it as the website's ID card. This ID carries a lot of information, including the public key mentioned above.

That is to say, when Xiao Ming, Xiao Wang, Xiao Guang, and other users access XX, they no longer need to hunt for XX's public key. When they visit XX, XX sends its certificate to the browser, in effect saying: use the public key in here to encrypt the data.

This raises a question: where does the so-called "certificate" come from? That is the job of the CA, described next.

CA (Certificate Authority)

Two points to emphasize:

There are many CAs (both at home and abroad) that can issue certificates.
Only a small number of CAs are considered authoritative and impartial, and browsers trust the certificates these CAs issue. VeriSign is one example. (That said, it is not unheard of for a CA to forge certificates itself...)

The details of certificate issuance are not covered here. Put simply, the website submits an application to the CA; after the application passes the CA's review, the CA issues a certificate to the website. When a user accesses the website, the website sends the certificate to the user.

The details of the certificate itself are covered later.

Problem 2: Data transmission is only one-way secure

As mentioned above, data encrypted with the private key can be decrypted and restored with the public key. So, does this mean that the data the website transmits to the user is not secure?

The answer is: yes!!! (Three exclamation marks for emphasis to the third power.)

At this point you may be thinking: even with HTTPS the data is still exposed, so it is unreliable; better to just use HTTP and save the trouble.

Yet the industry's call for HTTPS keeps growing louder. Why? This seems to contradict what we have just seen.

Because although HTTPS uses public key encryption, it combines it with other means, such as symmetric encryption, to make authorization and encrypted transmission both efficient and secure.

In a nutshell, the entire simplified encrypted communication process is:

Xiao Ming visits XX, and XX gives its certificate to Xiao Ming (actually to the browser; Xiao Ming does not notice)
The browser gets XX's public key A from the certificate
The browser generates a symmetric key B that only it knows, encrypts it with public key A, and sends it to XX (in reality there is a negotiation process; it is simplified here for ease of understanding)
XX decrypts with its private key and obtains the symmetric key B
All subsequent communication between the browser and XX is encrypted with key B

Note: the symmetric key B generated for each user accessing XX is, in principle, different. For example, Xiao Ming, Xiao Wang, and Xiao Guang might generate B1, B2, and B3 respectively.
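
The simplified process can be sketched end to end with toy primitives. Everything here is an assumption made for illustration: the RSA key is built from two well-known Mersenne primes (far too small and structured for real use), the symmetric cipher is a SHA-256 based XOR stream, and the certificate step is reduced to the browser simply knowing (E, N). Real TLS negotiates keys differently.

```python
import hashlib
import secrets

# XX's toy RSA key pair. Public key A is (E, N); D is the private key.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def xor_cipher(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher: the same call encrypts and decrypts.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def as_bytes(key_number: int) -> bytes:
    return key_number.to_bytes(16, "big")


# Browser: generate symmetric key B and encrypt it with public key A.
key_b = secrets.randbits(128)
wrapped_key = pow(key_b, E, N)

# XX: decrypt with the private key to recover symmetric key B.
key_b_at_xx = pow(wrapped_key, D, N)

# Afterwards both sides encrypt traffic with key B.
request = xor_cipher(as_bytes(key_b), b"show my timeline")
print(xor_cipher(as_bytes(key_b_at_xx), request))  # b'show my timeline'
```

Because `key_b` comes from a random generator on every run, each visitor ends up with a different B, which matches the note above.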

[Figure: simplified HTTPS encrypted communication process]

What are the possible problems with the certificate

Now that the process of HTTPS encrypted communication is understood, the worry about data being exposed should be largely dispelled. However, attentive readers may ask: how can we be sure that the certificate is legitimate and valid?

A certificate can be illegitimate in two ways:

The certificate is forged: it was not issued by a CA at all
The certificate has been tampered with: for example, the public key of the XX website has been replaced

For example:

Proxies exist in this world, so Xiao Ming's login to the XX website may actually look like this: the login request first arrives at a proxy server, and the proxy server then forwards the request to the authorization server.

Xiao Ming –> Evil Proxy Server –> Login Authorization Server
Xiao Ming <– Evil Proxy Server <– Login Authorization Server

There are also plenty of bad people in this world. One day the proxy server gets a bad idea (or gets hacked) and intercepts Xiao Ming's request, returning an illegitimate certificate of its own.

Xiao Ming –> Evil Proxy Server –x–> Login Authorization Server
Xiao Ming <– Evil Proxy Server –x– Login Authorization Server

If the trusting Xiao Ming accepted this certificate, his data would be exposed all over again. Of course that must not be allowed to happen. So what mechanism prevents it?

Next, let's take a look at what the "certificate" contains, and then we can roughly guess how to prevent it.

Certificate Introduction

Before formally introducing the certificate format, a short detour: first a primer on digital signatures and digests, then a brief (not in-depth) introduction to certificates.

Why? Because digital signatures and digests are key weapons against certificate forgery.

Digital Signature and Digest

Simply put, a "digest" is a fixed-length string computed from the transmitted content with a hash algorithm (think of the abstract of an article). The digest is then encrypted with the CA's private key, and the result is the "digital signature". (The CA's private key is mentioned here; it is introduced later.)

Plaintext –> hash operation –> digest –> private key encryption –> digital signature

Combining the above content, we know that this digital signature can only be decrypted by the public key of the CA.
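
The chain from plaintext to digital signature can be sketched as follows. The CA key pair is a toy (two Mersenne primes, textbook RSA, no padding), the digest is reduced modulo the toy modulus only because that key is so small, and the function names are made up for this example.

```python
import hashlib

# Hypothetical toy CA key pair; not secure, for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537                               # CA public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))    # CA private exponent


def digest(plaintext: bytes) -> int:
    # SHA-256 always yields 32 bytes, whatever the input length.
    raw = hashlib.sha256(plaintext).digest()
    return int.from_bytes(raw, "big") % CA_N


def sign(plaintext: bytes) -> int:
    # plaintext -> hash -> digest -> private key encryption -> signature
    return pow(digest(plaintext), CA_D, CA_N)


def recover_digest(signature: int) -> int:
    # Anyone holding the CA's public key can undo the last step.
    return pow(signature, CA_E, CA_N)


content = b"certificate content"
signature = sign(content)
print(recover_digest(signature) == digest(content))         # True
print(recover_digest(signature) == digest(b"other content"))  # False
```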

For more information on digital signatures and digests, please refer to this article .

Certificate format

First, a large block of content is shamelessly pasted below; the certificate format comes from the fine article "OpenSSL and SSL Digital Certificate Concept Post".

There is a lot of content; the points to pay attention to here are:

The name of the authority (CA) that issued the certificate
The digital signature of the certificate content itself (encrypted with the CA's private key)
The certificate holder's public key
The hash algorithm used for the certificate signature

In addition, one thing needs to be added, that is:

The CA has a certificate of its own, commonly known as the "root certificate". The root certificate proves the identity of the CA and is, in essence, an ordinary digital certificate.
Browsers usually have the root certificates of most major authoritative CAs built in.

Certificate format

1. Certificate version number (Version)
The version number indicates the format version of the X.509 certificate. The current value can be:
1) 0: v1
2) 1: v2
3) 2: v3
Values are also reserved for future versions.

2. Certificate Serial Number (Serial Number)
The serial number is the unique "numeric identifier" that the CA assigns to the certificate. When a certificate is revoked, it is this serial number that is placed in the CRL issued by the CA, which is why the serial number must be unique.

3. Signature Algorithm
The signature algorithm identifier specifies the "signature algorithm" the CA used when issuing the certificate. It identifies two things:
1) Public key algorithm
2) Hash algorithm
Example: sha256WithRSAEncryption
The algorithm identifier must be registered with an internationally recognized standards organization (such as ISO).

4. Issuer Name (Issuer)
This field identifies the X.500 DN (Distinguished Name) of the CA that issued the certificate. It includes:
1) Country (C)
2) Province (ST)
3) Region (L)
4) Organization (O)
5) Organizational Unit (OU)
6) Common Name (CN)
7) Email Address

5. Validity
Specifies the validity period of the certificate, including:
1) The date and time when the certificate becomes effective
2) The date and time when the certificate expires
Every time you use a certificate, you need to check whether the certificate is within the validity period.

6. Certificate holder name (Subject)
Specifies the X.500 distinguished name of the certificate holder. It includes:
1) Country (C)
2) Province (ST)
3) Region (L)
4) Organization (O)
5) Organizational Unit (OU)
6) Common Name (CN)
7) Email Address

7. Certificate holder's public key information (Subject Public Key Info)
The certificate holder's public key information field contains two important pieces of information:
1) The value of the certificate holder's public key
2) The identifier of the algorithm the public key is used with. This identifier contains the public key algorithm and hash algorithm.

8. Extensions
X.509 v3 certificates build on v2 by adding extension items, in a standard or generic form, so that a certificate can carry additional information. Standard extensions are extensions defined by X.509 v3 that have broad application prospects and were added on top of v2. Anyone can register other extensions with an authoritative organization such as ISO. If such extensions become widely used, they may become standard extensions in the future.

9. Issuer Unique Identifier
The issuer unique identifier was added to the X.509 certificate definition in version 2.
This field uniquely identifies the issuer's X.500 name with a bit string when the same X.500 name is used for multiple certification authorities. Optional.

10. Certificate holder's unique identifier (Subject Unique Identifier)
The certificate holder's unique identifier was added to the X.509 certificate definition in version 2 of the standard.
This field uniquely identifies the certificate holder's X.500 name with a bit string when the same X.500 name is used for multiple certificate holders. Optional.

11. Signature Algorithm
The signature algorithm the certificate issuing authority used to sign the above content of the certificate.
Example: sha256WithRSAEncryption

12. Issuer's Signature
The signature value computed by the certificate issuing authority over the above content of the certificate.
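
To keep the field list straight, it can help to model it as a small data structure. The sketch below is a simplified, hypothetical model (real certificates are ASN.1 structures, and the class and field names here are invented), together with the validity-period check mentioned under field 5. All sample values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Certificate:
    version: int                # 2 means v3
    serial_number: int          # unique per CA, listed in the CRL on revocation
    signature_algorithm: str    # e.g. "sha256WithRSAEncryption"
    issuer: str                 # distinguished name of the CA
    not_before: datetime        # start of the validity period
    not_after: datetime         # end of the validity period
    subject: str                # distinguished name of the holder
    subject_public_key: str     # holder's public key (placeholder text here)
    signature: str              # CA's signature over the fields above


def is_within_validity(cert: Certificate, now: Optional[datetime] = None) -> bool:
    # Checked every time the certificate is used.
    now = now or datetime.now(timezone.utc)
    return cert.not_before <= now <= cert.not_after


example = Certificate(
    version=2,
    serial_number=1001,
    signature_algorithm="sha256WithRSAEncryption",
    issuer="CN=Toy Root CA",
    not_before=datetime(2024, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2025, 1, 1, tzinfo=timezone.utc),
    subject="CN=xx.example",
    subject_public_key="(holder public key)",
    signature="(CA signature)",
)

print(is_within_validity(example, datetime(2024, 6, 1, tzinfo=timezone.utc)))  # True
print(is_within_validity(example, datetime(2026, 1, 1, tzinfo=timezone.utc)))  # False
```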

How to Identify Illegal Certificates

As mentioned above, the XX certificate contains the following:

The name of the authority (CA) that issued the certificate
The digital signature of the certificate content itself (encrypted with the CA's private key)
The certificate holder's public key
The hash algorithm used for the certificate signature

The CA root certificate built into the browser contains the following key item:

CA's public key (very important!!!)

OK, let's look at how to identify the two kinds of illegitimate certificate mentioned above.

Completely fake certificate

This case is relatively simple. Check the certificate:

The certificate issuing authority is forged: the browser does not recognize it and immediately treats the certificate as dangerous
The certificate issuing authority does exist: the browser finds the corresponding built-in CA root certificate and the CA's public key by the CA name
The browser then uses the CA's public key to decrypt the digital signature of the forged certificate and finds that it does not decrypt properly, so the certificate is treated as dangerous

Tampered certificate

Suppose the proxy obtains XX's certificate somehow, quietly replaces the public key in it with its own, and then happily waits for users to take the bait. That is too naive:

Check the certificate, and find the corresponding CA root certificate and the CA's public key by the CA name
Use the CA's public key to decrypt the certificate's digital signature, obtaining the certificate digest AA
Compute the digest BB of the current certificate with the hash algorithm named in the certificate
Compare AA and BB and find that they differ -> the certificate is judged dangerous
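
Both checks can be sketched together. This is a toy model under several assumptions: certificates are reduced to three fields plus a signature, the CA key is textbook RSA over two Mersenne primes, the digest is SHA-256 reduced modulo the toy modulus, and `TRUSTED_ROOTS` stands in for the browser's built-in root certificates. None of the names come from a real library.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cert:
    issuer: str
    subject: str
    public_key: int     # stand-in for the holder's public key
    signature: int = 0  # digest encrypted with the CA's private key


# Toy CA key pair; only (CA_E, CA_N) is known to browsers.
P, Q = 2**61 - 1, 2**89 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))

# Stand-in for the root certificates built into the browser.
TRUSTED_ROOTS = {"Toy Root CA": (CA_E, CA_N)}


def content_digest(cert: Cert, modulus: int) -> int:
    content = f"{cert.issuer}|{cert.subject}|{cert.public_key}".encode()
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % modulus


def issue(cert: Cert) -> Cert:
    # Done by the CA, which alone holds CA_D.
    return replace(cert, signature=pow(content_digest(cert, CA_N), CA_D, CA_N))


def verify(cert: Cert) -> str:
    ca_key = TRUSTED_ROOTS.get(cert.issuer)
    if ca_key is None:
        return "dangerous: unknown issuer"
    e, n = ca_key
    digest_aa = pow(cert.signature, e, n)   # decrypt the signature
    digest_bb = content_digest(cert, n)     # recompute from the content
    return "ok" if digest_aa == digest_bb else "dangerous: digest mismatch"


genuine = issue(Cert("Toy Root CA", "xx.example", public_key=1111))
tampered = replace(genuine, public_key=6666)            # key swapped by proxy
fake_issuer = Cert("Nobody CA", "xx.example", 6666, signature=12345)
fake_signature = Cert("Toy Root CA", "xx.example", 6666, signature=12345)

for cert in (genuine, tampered, fake_issuer, fake_signature):
    print(verify(cert))
```

The proxy cannot repair the tampered certificate, because producing a signature that matches the new content would require the CA's private key.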

HTTPS handshake process

That was a lot of ground to cover. How HTTPS secures data encryption and transmission has basically been explained; details that are too technical have simply been skipped.

Finally, two questions remain:

How does the website give the certificate to the user (browser)?
How is the symmetric key mentioned above negotiated?

Both questions are answered by what happens in the HTTPS handshake phase. Like HTTP over TCP, HTTPS data transmission as a whole has two stages: handshake and data transmission.

Handshake: certificate delivery and key negotiation (in TLS 1.2 and earlier this stage is almost entirely in clear text; TLS 1.3 encrypts most of it after the ServerHello message)
Data transmission: this stage is encrypted with the symmetric key negotiated during the handshake
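
From a client program's point of view, both stages are handled by the TLS library. The sketch below uses Python's standard `ssl` module; `make_client_context` and `describe_connection` are hypothetical helper names, and `describe_connection` needs network access, so it is defined here but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Loads the system's trusted CA certificates and enables certificate
    # and host name verification.
    return ssl.create_default_context()


def describe_connection(host: str, port: int = 443) -> dict:
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket performs the handshake: the server's certificate is
        # received and verified, and the session keys are negotiated.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # From here on, data sent through `tls` is encrypted.
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "subject": tls.getpeercert().get("subject"),
            }


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Calling `describe_connection` with a real host name would return the negotiated protocol version, the cipher suite, and the subject of the certificate the server presented.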

Mr. Ruan's article is very well written and easy to understand. Interested readers can take a look.

Attachment: "Overview of the Operating Mechanism of the SSL/TLS Protocol"