How to construct HTTP requests and the working process of HTTPS

1. How to construct an HTTP request:

  1. Based on HTML/JS (the client constructs an HTTP request, the most common HTTP client is the browser)

    • based on form
    • based on ajax
  2. Based on Java (this solution is completely feasible, but it is not used as much as the above method in actual development)

    • based on socket

1. Construct HTTP request based on form

1.1. form

form is a common tag in HTML and can be used to send GET or POST requests to the server.

Important parameters of form:

  • action: The server that the constructed HTTP request is sent to, given as a URL
  • method: Whether the constructed HTTP request uses GET or POST (form only supports GET and POST, and the value is not case-sensitive)

The form tag alone cannot submit anything, because there is nothing to submit. It
needs to be combined with other tags inside the form, such as input.

Important parameters of input:

  • type: Indicates the type of the input box. text means a text field, password means a password field, and submit means the submit button
  • value: The value of the input tag. For the submit type, the value corresponds to the text displayed on the button.
  • name: Not id and not class; the name attribute has nothing to do with styling. The data that a form submits to the server is essentially a set of key-value pairs. The name here is the key in the query string of the constructed HTTP request, and the value is whatever the user typed into the input box.
<input type="text" name="username"> <!-- the key is username, the value is what the user typed into the input box -->
<input type="password" name="password"> <!-- the key is password, the value is what the user typed into the input box -->

Assume the user enters the user name zhangsan and the password 123.
The data that the form generates for submission then looks like: username=zhangsan&password=123
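The key-value format above can be reproduced with a small Java sketch. It is only an illustration of how name/value pairs are joined and URL-encoded; the class name FormDataSketch and the method encode are made up for this example and are not part of any browser or library API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class FormDataSketch {
    // Joins name/value pairs as key=value, separated by '&', URL-encoding each part.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "zhangsan");
        fields.put("password", "123");
        System.out.println(encode(fields)); // username=zhangsan&password=123
    }
}
```

The encoding step matters once a value contains characters such as spaces or &, which would otherwise be confused with the separators.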

Two input boxes alone are not enough. You also need a submit button (an input of type submit) to trigger the HTTP request.


1.2. GET request

<form action="http://www.sogou.com/index.html" method="get">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="submit" value="Submit">
</form>

After the submit button is clicked, the browser sends a GET request to http://www.sogou.com/index.html?username=zhangsan&password=123. The query string here is exactly the data that the page submits to the server.


1.3. POST request

<form action="http://www.sogou.com/index.html" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="submit" value="Submit">
</form>

This time, the URL no longer contains the name/value pairs.

Viewing the request in a packet capture tool shows that the same data, username=zhangsan&password=123, is carried in the body of the POST request instead.

Note: If you change the input to lisi and 123, the result is still Sogou's homepage, because the request is submitted directly to Sogou's homepage and Sogou does not process these parameters. When you write your own server, it can process the parameters submitted by the front end and implement different features based on them.
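To make the difference between the two form methods concrete, here is a rough Java sketch that builds the text of both requests. It is a simplification: a real browser adds many more headers, and the class and method names (FormRequestSketch, buildGet, buildPost) are hypothetical. The Content-Type shown is the default encoding type for HTML forms.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class FormRequestSketch {
    // GET: the form data travels in the query string of the request line.
    static String buildGet(String host, String path, String formData) {
        return "GET " + path + "?" + formData + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // POST: the form data travels in the body; the URL stays clean.
    static String buildPost(String host, String path, String formData) {
        int length = formData.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + formData;
    }

    public static void main(String[] args) {
        String data = "username=zhangsan&password=123";
        System.out.println(buildGet("www.sogou.com", "/index.html", data));
        System.out.println(buildPost("www.sogou.com", "/index.html", data));
    }
}
```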


2. Ajax constructs HTTP request

2.1. ajax

The form is a rather primitive way to construct requests. Using a form always involves a page jump:
the browser has to load a new page.

This is wasteful, especially when the page is complex. As
front-end pages grow more and more complex, we want to avoid reloading the whole page and update only the small part that needs to change.

In this case, you can use ajax.
In JavaScript, you can construct an HTTP request through ajax, handle the response in JS code, and update part of the page with the data obtained.

  • The full name of ajax is Asynchronous JavaScript And XML. It is a method, proposed in 2005, for JavaScript to send HTTP requests to the server.
  • Its defining feature is that data can be transferred without refreshing the page or jumping to another page.

2.2. Asynchronous

Asynchronous is a very common concept in computing. The "synchronous" discussed here is not the same thing as the "synchronization" discussed in the section on locks. It is a computer term with different meanings in different contexts.

For example, I arrange to go out with my girlfriend, arrive first, and wait for her. This is synchronous waiting. I am the caller and my girlfriend is the callee. The caller stays there waiting and actively obtains the callee's result.

With asynchronous waiting, I just tell her: I'll find a cool spot and play with my phone for a while, call me when you come down. After the caller issues the request, it pays no further attention. When the callee's result is ready, the callee actively notifies the caller.

Synchronous waiting comes in two forms:

  1. Blocking wait (stay put until the other party shows up)
  2. Non-blocking wait (check the result every once in a while)

Another example: I go to a restaurant and say: Boss, one egg fried rice, please.

  1. Synchronous blocking wait:
    I stay at the counter watching the kitchen cook until the meal is ready, then take it away myself.

  2. Synchronous non-blocking wait:
    I check at the counter and see the meal is not ready, so I go out for a walk. After a while I come back to the
    counter, find the meal is still not ready, and go play with my phone... After several rounds of this, I find the meal is ready and take it away myself.

  3. Asynchronous wait:
    I don't check at all. I find a seat in a corner, play with my phone, and do whatever I need to do.
    When the meal is ready, it is served directly to me.

Both methods 2 and 3 let you do other things while waiting.
The difference is that method 2 costs the caller more (it has to query the result repeatedly), so method 3 is often the better choice.

IO scenarios often involve these three situations.
IO includes input and output through the console, through files, and through the network.

Scanner and the input stream and output stream objects use synchronous blocking waits by default.

Ajax uses asynchronous waiting.

  • Synchronous vs. asynchronous: The main difference is whether the caller actively fetches the result or the callee notifies the caller.
  • Blocking vs. non-blocking: The difference is whether the caller can do other things while waiting.
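The three ways of waiting for the fried rice can be sketched in Java with CompletableFuture. This is only an analogy in code: the class WaitingStylesSketch, the cook method and the delays are all invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical demo class, for illustration only.
class WaitingStylesSketch {
    // Sleep helper that hides the checked InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The "kitchen": the callee produces the meal on another thread after a short delay.
    static CompletableFuture<String> cook() {
        return CompletableFuture.supplyAsync(() -> {
            pause(200);
            return "egg fried rice";
        });
    }

    // 1. Synchronous blocking: the caller waits at the counter until the meal is ready.
    static String blockingWait() {
        return cook().join();
    }

    // 2. Synchronous non-blocking: the caller checks back repeatedly.
    static String pollingWait() {
        CompletableFuture<String> meal = cook();
        while (!meal.isDone()) {
            pause(50); // stands in for "doing something else" between checks
        }
        return meal.join();
    }

    // 3. Asynchronous: the caller registers a callback and is notified by the callee.
    static CompletableFuture<Void> asyncWait(Consumer<String> onReady) {
        return cook().thenAccept(onReady);
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + blockingWait());
        System.out.println("polling:  " + pollingWait());
        CompletableFuture<Void> done = asyncWait(meal -> System.out.println("callback: " + meal));
        System.out.println("caller keeps working while the meal is cooked");
        done.join(); // only so the demo does not exit before the callback runs
    }
}
```

In the polling version the caller pays for every check, while in the callback version the caller does nothing until it is notified, which matches the cost difference described above.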

2.3. Ajax request

Ajax is based on asynchronous waiting.

  • First, construct an HTTP request and send it to the server
  • The browser cannot know when the server will respond, so it does not wait and continues executing other code (whatever needs to be done)
  • When the server's response comes back, the browser notifies our JS code, which processes the response in a callback function.

Native JS can construct the request and process the response through ajax on its own, but the native style is cumbersome, abstract and hard to follow. The code here is therefore demonstrated with the simpler, easier to understand
ajax support in jQuery.
jQuery is the most well-known library in the JS world (bar none). Its status in JS is comparable to Spring's status in Java. In recent years,
jQuery's limelight has been taken away by newer JS frameworks, namely the big three: Vue, React and Angular.

Introducing jQuery:

  1. Search for "jquery cdn" in a search engine
  2. In the results, find a suitable CDN URL
  3. Open that URL to load the jQuery source itself
  4. Copy the displayed content into a local file
http://libs.baidu.com/jquery/2.0.0/jquery.min.js

Using ajax through jQuery: $

$ is a variable name. JS allows $ to be part of a variable name, and this $ is the core object in jQuery. The various jQuery APIs are invoked through $.

$.ajax({

});

The ajax function is called through the $ object. It takes only one parameter, but that parameter is an object.

Values in the object:

  • type: The method of the HTTP request. It supports not only GET and POST but also other methods such as PUT and DELETE.
  • url: The URL of the HTTP request
  • success: A callback function that is called after the HTTP response has been obtained correctly. This is an asynchronous process.

The ajax parameter object can also carry other values; see the jQuery documentation for the ajax() method.

<script src="jquery.js"></script>
<script>
    $.ajax({
        type: 'get',
        url: 'http://www.sogou.com/index.html',
        success: function(body) {
            // The callback's parameter is the body of the HTTP response
            console.log("Got the response data! " + body);
        },
        error: function() {
            // error also takes a callback; it fires after the request fails and is asynchronous too
            console.log("Failed to get the response!");
        }
    });
</script>

In the ajax request just now, packet capture shows that the response is 200 OK and the body is HTML data.

But the browser still treats this as a failed request.

The reason is that the browser prohibits ajax from making cross-domain requests, that is, requests that span multiple domain names/multiple servers.

Here the current page is a local file, while the URL requested by ajax in the page has the domain name www.sogou.com, so the request is cross-domain.

If the current page were served from www.sogou.com and then requested a URL whose domain name is also www.sogou.com through ajax, that would not be considered cross-domain.

This behavior is a restriction imposed by the browser. There are ways around it:
if the response returned by the other server carries the relevant response headers (such as Access-Control-Allow-Origin) that allow cross-domain access, the browser will hand the response to the page normally.

So the ajax request constructed here cannot be handled correctly. When can it be? Once we have a server of our own, so that the page and the ajax URL are both served by that server.
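As a rough illustration of what "cross-domain" means, the sketch below compares two URLs the way the browser's same-origin rule does: by scheme, host and port. It is a simplified model, not browser code. The class SameOriginSketch and its methods are hypothetical, and treating every page without a host (such as a local file) as cross-domain is a simplifying assumption.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
class SameOriginSketch {
    // Falls back to the default port of the scheme when the URL has no explicit port.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // Two URLs share an origin when scheme, host and port all match.
    static boolean sameOrigin(String pageUrl, String requestUrl) {
        URI page = URI.create(pageUrl);
        URI request = URI.create(requestUrl);
        if (page.getScheme() == null || request.getScheme() == null
                || page.getHost() == null || request.getHost() == null) {
            return false; // e.g. a page opened from a local file has no host
        }
        return page.getScheme().equalsIgnoreCase(request.getScheme())
            && page.getHost().equalsIgnoreCase(request.getHost())
            && effectivePort(page) == effectivePort(request);
    }

    public static void main(String[] args) {
        // Local file requesting Sogou: cross-domain
        System.out.println(sameOrigin("file:///home/user/test.html", "http://www.sogou.com/index.html"));
        // Page on www.sogou.com requesting www.sogou.com: not cross-domain
        System.out.println(sameOrigin("http://www.sogou.com/page.html", "http://www.sogou.com/index.html"));
    }
}
```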


3. Construct HTTP request through Java socket

To construct an HTTP request in Java, the main approach is a TCP socket: build a string that matches the HTTP request message format and write it into the socket.

In real development there are indeed cases where HTTP requests are constructed from Java. These can be implemented with third-party libraries and do not have to use sockets directly.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HttpClient {
    private Socket socket;
    private String ip;
    private int port;

    public HttpClient(String ip, int port) throws IOException {
        this.ip = ip;
        this.port = port;
        socket = new Socket(ip, port);
    }

    public String get(String url) throws IOException {
        StringBuilder request = new StringBuilder();
        // Build the request line (HTTP lines end with \r\n)
        request.append("GET " + url + " HTTP/1.1\r\n");
        // Build the headers
        request.append("Host: " + ip + ":" + port + "\r\n");
        // Build the blank line that ends the header section
        request.append("\r\n");
        // Send the data
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(request.toString().getBytes(StandardCharsets.UTF_8));
        outputStream.flush();
        return readResponse();
    }

    public String post(String url, String body) throws IOException {
        StringBuilder request = new StringBuilder();
        // Build the request line
        request.append("POST " + url + " HTTP/1.1\r\n");
        // Build the headers
        request.append("Host: " + ip + ":" + port + "\r\n");
        request.append("Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n");
        request.append("Content-Type: text/plain\r\n");
        // Build the blank line that ends the header section
        request.append("\r\n");
        // Build the body
        request.append(body);
        // Send the data
        OutputStream outputStream = socket.getOutputStream();
        outputStream.write(request.toString().getBytes(StandardCharsets.UTF_8));
        outputStream.flush();
        return readResponse();
    }

    // Read the response data. Simplification: a single read may not return
    // the whole response if it is large or arrives in several packets.
    private String readResponse() throws IOException {
        InputStream inputStream = socket.getInputStream();
        byte[] buffer = new byte[1024 * 1024];
        int n = inputStream.read(buffer);
        if (n < 0) {
            return ""; // the server closed the connection without sending data
        }
        return new String(buffer, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        HttpClient httpClient = new HttpClient("42.192.83.143", 8080);
        String getResp = httpClient.get("/AjaxMockServer/info");
        System.out.println(getResp);
        String postResp = httpClient.post("/AjaxMockServer/info", "this is body");
        System.out.println(postResp);
    }
}

An HTTP client written in Java is not subject to the "cross-domain" restriction, so it can also be used to fetch data from other servers.
The cross-domain restriction is purely a browser behavior that applies to ajax. Clients written in other languages are generally unaffected by it.

HttpClient httpClient = new HttpClient("www.sogou.com", 80);
String resp = httpClient.get("/index.html");
System.out.println(resp);

// At this point the HTML of the Sogou homepage can be fetched

Regarding standard HTTP headers, which of the following statements are correct? (ABCD)

  • A. User-Agent: declares the user's operating system and browser version information
  • B. Content-Type: the data type of the body
  • C. Host: the client tells the server which host and which port the requested resource is on
  • D. Location: used with 3xx status codes to tell the client where to go next

2. HTTPS

1. Carrier hijacking

HTTPS is also an application layer protocol. HTTPS is like the twin brother of HTTP: it adds an encryption layer on top of the HTTP protocol.

The content of the HTTP protocol is transmitted in clear text, which allows it to be tampered with during transmission.

The infamous "carrier hijacking":

Take downloading Tiantian Dongting (a music player app) as an example.

Without hijacking, clicking the download button brings up the download link for Tiantian Dongting.

With hijacking, clicking the download button brings up the download link for QQ Browser instead.

Carriers are not the only ones who can hijack. Other hackers can use similar means to steal users' private information or tamper with content.
Imagine a hacker obtaining the user's account balance when the user logs in to Alipay, or even the user's payment password...

On the Internet, clear text transmission is quite dangerous!!!

HTTPS adds encryption on top of HTTP to further protect the security of user information.


2. Encryption

Encryption performs a series of transformations on the plaintext (the information to be transmitted) to produce ciphertext.
Decryption performs a series of transformations on the ciphertext to restore the plaintext.

This encryption and decryption process usually needs one or more pieces of auxiliary data. Such data is called a key (in Chinese the character is properly pronounced yue, fourth tone, but people usually read it as yao, fourth tone, or shi, second tone).

Encryption and decryption have developed into an independent discipline: cryptography.
One of the founders of cryptography, who is also one of the founders of computer science, is Alan Mathison Turing.


Turing showed his talent young. He not only laid foundations for computing, artificial intelligence and cryptography, but also broke the German Enigma machine in World War II, giving the Allies an intelligence advantage and helping turn the tide of the war. However, Turing was persecuted by the British authorities and died at the age of 41. The movie "The Imitation Game" tells his story.
The highest honor in computing, the Turing Award, is named after him.

In the 1983 version of "Burning the Old Summer Palace", someone plots a rebellion to kill Empress Dowager Cixi. Prince Gong Yixin (one of the representatives of the Westernization Movement) hands Cixi a memorial. Its contents look like ordinary household matters, but when a sheet of paper with holes cut in it is laid over the memorial, the real message shows through the holes.

  • Plaintext: The original message to be transmitted, "Beware of Sushun, Duanhua and Zaiyuan." (Sushun, Duanhua and Zaiyuan were regent ministers appointed by the late emperor before his death, and were later removed by Cixi)

  • Ciphertext: The full text of the memorial. Anyone else who obtains it cannot see the real message.

  • Key: Used to convert plaintext into ciphertext, or to restore ciphertext into plaintext. Here it is the paper with holes.


3. Working process of HTTPS

Encryption and decryption are closely tied to mathematics.
Here we can only sketch the process and cannot go into the implementation details of encryption and decryption.

Encryption does not make data absolutely safe. It only makes cracking extremely expensive in computation and cost.
For some encrypted data, even the most powerful computers today would need decades or centuries to crack it.
Data is considered safe as long as the cost of cracking it is higher than the value of the data itself.
(Imagine a gang that makes counterfeit banknotes so well that a banknote detector cannot tell them apart... but producing one counterfeit 100-yuan note actually costs 110 yuan...)

The encryption layer introduced in HTTPS is called SSL (old name) / TLS (new name).

The encryption operations involved in SSL mainly come in two forms:

  1. Symmetric encryption: the same key is used for both encryption and decryption
  2. Asymmetric encryption: a pair of keys (a public key and a private key) is used, and data encrypted with one is decrypted with the other

3.1. Symmetric encryption

Symmetric encryption uses one and the same key both to encrypt plaintext into ciphertext and to decrypt ciphertext back into plaintext.

A simple form of symmetric encryption is bitwise XOR:

  • Assume the plaintext a = 1234 and the key key = 8888
  • Then the ciphertext b obtained by computing a ^ key is 9834
  • Applying the operation again to the ciphertext, b ^ key, gives back the original plaintext 1234
  • (The same works for strings, since each character can be represented as a number)
  • Of course, bitwise XOR is only the simplest symmetric encryption. HTTPS does not use plain bitwise XOR.
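The XOR example can be checked with a few lines of Java. The class XorCipherSketch is made up for this illustration, and, as noted above, this toy scheme is not what HTTPS actually uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class XorCipherSketch {
    // Encrypting and decrypting are the same operation: XOR with the key.
    static int xor(int data, int key) {
        return data ^ key;
    }

    // The same idea applied to a string: XOR every byte with a one-byte key.
    static byte[] xor(byte[] data, byte key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ key);
        }
        return result;
    }

    public static void main(String[] args) {
        int ciphertext = xor(1234, 8888);
        System.out.println(ciphertext);            // 9834
        System.out.println(xor(ciphertext, 8888)); // 1234

        byte[] secret = xor("hello".getBytes(StandardCharsets.UTF_8), (byte) 0x5A);
        System.out.println(new String(xor(secret, (byte) 0x5A), StandardCharsets.UTF_8)); // hello
    }
}
```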

The client and the server hold the same key.

The data transmitted by the client (the header and body of the HTTP request) is symmetrically encrypted with this key, so what actually travels over the network is ciphertext.

After receiving the ciphertext, the server decrypts it with the same key and gets the plaintext.

This process looks fine, but it has a fatal flaw:
how do we make sure the client and the server hold the same key? Especially when one server serves many clients.

Clearly, different clients must use different keys.
If every client used the same key, a hacker could get it far too easily (the hacker only needs to start a client...).

Since the keys differ, the server has to keep track of which key belongs to which client,
and each key has to be passed between the client and the server.

Either the client generates a key and tells the server, or the server generates a key and tells the client. Either way, the key has to be transmitted over the network.

Assume the client generates the key, say 888888. The client then has to tell the server over the network that the key is 888888.

Network devices along the way may have been compromised by hackers long ago.
If the key is transmitted in clear text, a hacker can easily obtain it. Once the hacker knows the key, all later encryption is in vain.


3.2. Asymmetric encryption

From the discussion above, the biggest problem with symmetric encryption is that the key has to be passed to the other side. Passing it in plain text does not work, so the key itself must be encrypted.

The way to solve this is to introduce asymmetric encryption.

Asymmetric encryption has two keys, called the public key and the private key.

  • The public key can be obtained by everyone

  • The private key is known only to its owner

You can encrypt with the public key and decrypt with the private key.
Or you can encrypt with the private key and decrypt with the public key.

An intuitive way to understand the public key and the private key:

  • In many residential communities, there is a mailbox at the entrance of each unit.
  • You have one key and many locks. You hand the locks out to the mail carriers.
  • Any mail carrier can use a lock to lock a letter into your mailbox. Only you hold the key, so only you can open the box and take out the letter.
  • The lock here corresponds to the public key, and the key in your hand is the private key.

With asymmetric encryption, the server generates its own pair of public and private keys. The public key is handed out (everyone can get it) and the private key is kept by the server.

The client generates a symmetric key, encrypts it with the server's public key, and sends the result to the server. The server decrypts it with its private key.

The server holds the private key, and the client holds the public key. A hacker can get the public key, but not the private key.

After the client generates a symmetric key, it encrypts the symmetric key with that public key.

If a hacker gets the ciphertext, he cannot decrypt it without the private key, so he does not learn the symmetric key.
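The key transfer just described can be sketched with the standard Java crypto classes. This follows the simplified model in this article rather than the exact handshake of any TLS version. The choice of RSA with OAEP padding and a 128-bit AES key is an assumption made for the example, and the class KeyExchangeSketch is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, for illustration only.
class KeyExchangeSketch {
    // Assumed algorithm choice for this sketch.
    static final String RSA_TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Server side: generate the public/private key pair.
    static KeyPair newServerKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Client side: generate the symmetric key.
    static SecretKey newSymmetricKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Client side: encrypt the symmetric key with the server's public key.
    static byte[] encryptWithPublicKey(SecretKey symmetricKey, PublicKey serverPublicKey)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, serverPublicKey);
        return cipher.doFinal(symmetricKey.getEncoded());
    }

    // Server side: recover the symmetric key with the private key.
    static SecretKey decryptWithPrivateKey(byte[] encryptedKey, PrivateKey serverPrivateKey)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, serverPrivateKey);
        return new SecretKeySpec(cipher.doFinal(encryptedKey), "AES");
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair server = newServerKeyPair();
        SecretKey clientKey = newSymmetricKey();
        byte[] onTheWire = encryptWithPublicKey(clientKey, server.getPublic());
        SecretKey serverCopy = decryptWithPrivateKey(onTheWire, server.getPrivate());
        System.out.println(Arrays.equals(clientKey.getEncoded(), serverCopy.getEncoded())); // true
    }
}
```

Only the bytes in onTheWire cross the network, and they are useless without the server's private key.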

If asymmetric encryption works so well, why keep symmetric encryption at all? Why not use asymmetric encryption for everything?

  • In practice, the computational overhead of symmetric encryption is far lower than that of asymmetric encryption
  • Using asymmetric encryption for a few exchanges (such as transmitting the symmetric key) costs little
  • But if all data were encrypted asymmetrically, the cost would be too high.
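Once both sides share the symmetric key, the bulk of the traffic is protected with it. The sketch below uses AES in GCM mode from the standard Java crypto API. That algorithm choice is an assumption for illustration (real connections negotiate their cipher), and the class SessionDataSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo class, for illustration only.
class SessionDataSketch {
    // A fresh 12-byte nonce must be used for every message encrypted with the same key.
    static byte[] newNonce() {
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    static byte[] encrypt(SecretKey key, byte[] nonce, byte[] plaintext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        return cipher.doFinal(plaintext);
    }

    static byte[] decrypt(SecretKey key, byte[] nonce, byte[] ciphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        return cipher.doFinal(ciphertext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey sessionKey = generator.generateKey(); // stands in for the key exchanged earlier

        byte[] nonce = newNonce();
        byte[] request = "GET /index.html HTTP/1.1".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = encrypt(sessionKey, nonce, request);
        System.out.println(new String(decrypt(sessionKey, nonce, ciphertext), StandardCharsets.UTF_8));
    }
}
```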

3.3. Man-in-the-middle attack

The process above may seem perfect, but it is not. There is still a huge hole in it!!!

The server has to send its public key to the client, and this step is exposed to the classic "man-in-the-middle attack".

Normal situation: the client receives the server's real public key and encrypts the symmetric key with it, so only the server's private key can decrypt the result.

Man-in-the-middle attack:

The key to a man-in-the-middle attack is that the hacker generates his own pair of keys, public key2 and private key2.

The hacker intercepts the public key that the server returns to the client and replaces it with public key2, which the client then uses to encrypt the symmetric key.

When the hacker intercepts the ciphertext of the symmetric key, it was encrypted with public key2, so the hacker can decrypt it with private key2 and obtain the symmetric key, 888888.

Then, to stay hidden, the hacker encrypts 888888 with the public key obtained earlier from the server and forwards the new ciphertext to the server.
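The attack can be simulated in a single Java program, with local variables standing in for the three parties. It is a toy model of the steps above: the class MitmSketch, the method run and the RSA with OAEP choice are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical demo class, for illustration only.
class MitmSketch {
    static final String RSA_TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    static byte[] encrypt(byte[] data, PublicKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(data);
    }

    static byte[] decrypt(byte[] data, PrivateKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(data);
    }

    // Returns { what the attacker recovered, what the server recovered }.
    static String[] run(String symmetricKey) throws GeneralSecurityException {
        KeyPair server = newKeyPair();
        KeyPair attacker = newKeyPair(); // public key2 / private key2

        // The attacker swaps the server's public key for public key2 in transit.
        PublicKey keySeenByClient = attacker.getPublic();

        // The client encrypts the symmetric key with whatever public key it received.
        byte[] fromClient = encrypt(symmetricKey.getBytes(StandardCharsets.UTF_8), keySeenByClient);

        // The attacker decrypts with private key2, then re-encrypts for the real server.
        byte[] stolen = decrypt(fromClient, attacker.getPrivate());
        byte[] forwarded = encrypt(stolen, server.getPublic());

        // The server decrypts normally and notices nothing.
        byte[] atServer = decrypt(forwarded, server.getPrivate());

        return new String[] {
            new String(stolen, StandardCharsets.UTF_8),
            new String(atServer, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String[] result = run("888888");
        System.out.println("attacker learned: " + result[0]);
        System.out.println("server received:  " + result[1]);
    }
}
```

Nothing in this exchange lets the client notice that the public key it received was not the server's.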


3.4. Introduction of certificates

Given that man-in-the-middle attacks exist, how can the problem be solved?
The key is to let the client confirm that the public key it received really comes from the server and was not forged by a hacker.

Think about how identity is verified in everyday life.
For example, going to an Internet cafe or staying in a small hotel requires registering your identity.

  • How is your identity verified? You have an ID card
  • The staff swipes your ID card. The swipe actually queries a server of the Public Security Bureau to verify your identity information.
  • In the same way, a trusted third-party authority is introduced to certify that a public key is legitimate.
  • Because we trust this authority (just as we trust the police), when the authority says the public key is fine, we can consider the public key trustworthy!!!

When a server is first brought online, its operator applies to a CA (certificate authority) for a certificate.
The public key generated by the server is placed in this certificate (the certificate is just a piece of data).

When the client and the server first establish a connection, the server returns the certificate to the client. The certificate contains that public key and the identity information of the website.

The certificate can be understood as a structured string containing the following information:

  • Certificate issuing authority
  • Certificate validity period
  • Public key
  • Certificate owner
  • Signature

A hacker may also forge a certificate and send it to the client.

So when the client obtains a certificate, it verifies it (to guard against forgery).

How does the client verify that the certificate is valid?

  1. The certificate itself carries some verification mechanisms

  2. Verification is sought from the trusted authority

If a hacker has forged the certificate, it is exposed at this point and the browser pops up a warning. The checks are:

  • Check whether the certificate's validity period has expired
  • Check whether the certificate's issuing authority is trusted (trusted issuing authorities are built into the operating system)
  • Check whether the certificate has been tampered with: take the issuing authority's public key from the system and decrypt the signature to get a hash value (called the data digest), call it hash1. Then compute the hash value of the certificate content (everything except the signature), call it hash2. Compare hash1 and hash2.
    If they are equal, the certificate has not been tampered with.
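The tamper check can be sketched with java.security.Signature, which performs the hash and compare steps internally. This is a simplified model: the "certificate" here is just a string, the key pair stands in for the issuing authority's keys, SHA256withRSA is an assumed algorithm choice, and the class CertificateSignatureSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical demo class, for illustration only.
class CertificateSignatureSketch {
    // Issuing authority side: hash the certificate content and sign the hash with the authority's private key.
    static byte[] sign(byte[] certificateContent, PrivateKey authorityPrivateKey) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(authorityPrivateKey);
        signer.update(certificateContent);
        return signer.sign();
    }

    // Client side: recompute the hash and check it against the signature using the authority's public key.
    static boolean verify(byte[] certificateContent, byte[] signature, PublicKey authorityPublicKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(authorityPublicKey);
        verifier.update(certificateContent);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair authority = generator.generateKeyPair(); // stands in for the issuing authority's keys

        byte[] certificate = "owner=www.example.com;publicKey=...".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(certificate, authority.getPrivate());

        byte[] tampered = "owner=www.hacker.com;publicKey=...".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(certificate, signature, authority.getPublic())); // true
        System.out.println(verify(tampered, signature, authority.getPublic()));    // false
    }
}
```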

Wouldn't it be too much trouble to contact the trusted authority for every verification?
Indeed. In practice the client already holds information about trusted authorities (built into the operating system),
so verification can be done locally without a network request (like a well-run Internet cafe where the Public Security Bureau has stationed a police officer on site).

Everything described above is part of SSL. SSL is used not only by HTTPS but in many other places as well.

  • The whole encryption process is meant to prevent data from being intercepted, but even more importantly to prevent data from being tampered with.

  • Since HTTPS data is encrypted, why can Fiddler still capture and parse HTTPS datagrams?

    • Fiddler can capture these packets because of the dialog box that pops up the first time the HTTPS feature is enabled after installing Fiddler!!!
    • Confirming that dialog makes the operating system trust the certificate provided by Fiddler
    • In effect the user authorizes Fiddler to carry out a "man-in-the-middle attack"

Viewing the browser's trusted certificate issuing authorities:
In Chrome, click "Settings" in the upper right corner and search for "Certificate Management" to see the list of trusted authorities.

Understanding data digests/signatures:

  • At work we will often run into "reimbursement". To get an invoice reimbursed you need your manager's approval, but the manager cannot walk to the finance department with you. What then?

  • Simple: ask the manager to sign. When the finance officer sees the manager's signature, it is as good as seeing the manager in person. Because different people's signatures differ greatly, a signature can, to a certain extent, identify a specific person.

  • Similarly, for a piece of data (such as a string), certain algorithms can generate a "signature" for it, such that different data produce very different signatures. Such a signature can, to a certain extent, distinguish different data.

  • Common algorithms for generating such signatures include MD5 and the SHA family.
    Taking MD5 as an example, we do not need to study how it is computed. We only need to understand its characteristics:

  • Fixed length: No matter how long the string is, the MD5 value has a fixed length (128 bits, usually written as 32 hexadecimal characters; a shortened 16-character form is also seen)

  • Dispersion: If the source string changes even slightly, the final MD5 value changes greatly.

  • Irreversible: Generating MD5 from the source string is easy, but recovering the original string from the MD5 value is practically infeasible.

  • Because of these characteristics, if the MD5 values of two strings are the same, we can treat the two strings as the same (deliberate collisions can be constructed for MD5, so real certificates rely on the SHA family instead).

  • Understanding how to determine whether a certificate has been tampered with (this is like determining whether an ID card is fake):

  • Assume the certificate is just the simple string hello. Computing a hash value (such as md5) for this string gives
    BC4B2A76B9719D91

  • If any character in hello is tampered with, for example hella, the computed md5 value changes greatly:
    BDBD6F9CF51F2FD8

  • The server can then return the string hello together with the hash value BC4B2A76B9719D91 to the client. How does the client verify whether hello has been tampered with?

  • It just computes the hash value of hello and checks whether it is BC4B2A76B9719D91

  • But there is another problem. If a hacker tampers with hello and also recalculates the hash value, the client cannot tell the difference.

  • Therefore the hash value cannot be transmitted as plaintext. It has to be transmitted as ciphertext.

  • The hash value is encrypted with another private key: the certificate issuing authority's own private key, applied by the authority when it issues the certificate (this is not the server's private key used for transmitting the symmetric key between the client and the server).

  • The client then decrypts it with the issuing authority's public key already stored in the operating system, recovers the original hash value, and verifies it.
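The hello example can be reproduced with java.security.MessageDigest. The 16-character value quoted above is the middle part of the full 32-character MD5 string. The class DigestSketch and its method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, for illustration only.
class DigestSketch {
    // Full MD5 value as 32 lowercase hexadecimal characters.
    static String md5Hex(String text) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Shortened 16-character form: characters 8 to 24 of the full value, in upper case.
    static String md5Short(String text) throws NoSuchAlgorithmException {
        return md5Hex(text).substring(8, 24).toUpperCase();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(md5Hex("hello"));   // 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Short("hello")); // BC4B2A76B9719D91
        // A one-character change produces a completely different value.
        System.out.println(md5Short("hella"));
    }
}
```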


Origin blog.csdn.net/qq_56884023/article/details/125121614