JavaWeb: the HTTP protocol, network communication transport models, tools for monitoring HTTP requests and responses, forward proxy, reverse proxy, and URL (Uniform Resource Locator)

Introduction to HTTP Protocol

What is HTTP

HTTP, the Hypertext Transfer Protocol, is a very important protocol in computer network communication.

Going by the name, we can split HTTP into three parts: hypertext (Hypertext), transfer (Transfer), and protocol (Protocol).

Hypertext:
Text is what we usually mean by the word: plain characters, meaningful binary data that a computer can parse. On early computers, what we typed could only be stored locally, as text. As the Internet developed, people were no longer satisfied with transferring only text between two computers; they also wanted to transfer pictures, video, audio, and even hyperlinks that jump to another page when clicked. The meaning of "text" was expanded accordingly, and the expanded text is called hypertext.
Transfer:
Two computers are connected to each other, and binary data packets are sent from one computer to the other over a physical carrier (optical cable, telephone line, coaxial cable, and so on). Moving data from one terminal to another in this way is called transfer.
Protocol:
A network protocol is a set of specifications for transmitting and managing information in a network. The rules that computers must follow when communicating with each other are called a protocol.

In the network model, HTTP is an application-layer protocol and is implemented on top of TCP, a transport-layer protocol.
(HTTP/1.0, HTTP/1.1, and HTTP/2 run over TCP; HTTP/3 runs over QUIC, which is built on UDP.)
Currently, HTTP/1.1 and HTTP/2 are the versions mainly used.
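To make "HTTP is implemented on top of TCP" concrete, here is a minimal sketch that opens a plain TCP connection on the local machine and sends HTTP text through it by hand. The class name HttpOverTcpSketch and the tiny one-shot server are made up for illustration; a real web server does far more than this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: HTTP text carried over an ordinary TCP socket.
class HttpOverTcpSketch {

    // Runs one request/response exchange over a loopback TCP connection
    // and returns the raw response text that the client received.
    static String exchange() throws Exception {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(listener));
            serverThread.start();

            try (Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                String request = "GET / HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n";
                OutputStream out = client.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();

                // Read until the server closes the connection.
                byte[] raw = client.getInputStream().readAllBytes();
                serverThread.join();
                return new String(raw, StandardCharsets.US_ASCII);
            }
        }
    }

    // Accepts a single connection, skips the request, and writes a fixed response.
    private static void serveOnce(ServerSocket listener) {
        try (Socket peer = listener.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(peer.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // Ignore the request line and headers; the blank line ends them.
            }
            String body = "hello";
            String response = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Content-Length: " + body.length() + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n"
                    + body;
            OutputStream out = peer.getOutputStream();
            out.write(response.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(exchange());
    }
}
```

Nothing here is HTTP-specific at the socket level: the "protocol" is just an agreement about the text the two sides write into the TCP connection.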

Several core concepts in network communication

Browser:
Just as mail clients are the main carriers of SMTP, the mail transfer protocol, the browser is the main carrier of the HTTP protocol. The formal name for a browser is Web browser, an application for retrieving and viewing web resources on the Internet. (The Web here refers to the World Wide Web, that is, the www in a URL.)

The process of the browser initiating a request and receiving the response:
When we enter a URL (web address) in the browser's address bar, the browser asks DNS (the Domain Name System) to map the domain name in the URL to an IP address, and then sends the request to the server at that address. The server returns the result to the browser, typically as HTML, and the browser parses the HTML and renders it in the page area.
Server:
The machine that stores web pages and database data. Dynamic web pages are executed on the server side to produce HTML, CSS, JS, and other files, which are then downloaded to the client for display.
Client:
The computer we use, including the browsers commonly installed on it. It can also be understood as a terminal operated by a user: a program that provides local services to the user.

Web server:
The formal name is Web Server; it generally refers to the server that hosts a website. The browser on the local computer is the initiator of an HTTP request, and the Web server is the responder.

There are several transmission models between the server and the client:
1. One send, one receive: every request gets exactly one response, a one-to-one correspondence. This is the most common model in web development.
2. Multiple sends, one receive: several requests correspond to one response, for example uploading a large file.
3. One send, multiple receives: one request corresponds to multiple responses, for example watching a live broadcast.
4. Multiple sends, multiple receives: multiple requests correspond to multiple responses, for example Steam Link, where a computer is operated from a mobile phone and requests and data flow to and from the server continuously.

Use tools to observe http requests and responses

Three ways:
1. Chrome developer tools
Built into the browser. The information is not very complete, and the full raw format of the request and response cannot be seen.

2. Wireshark packet capture tool
Captures data directly from the network card (very comprehensive, even including Ethernet frames).

3. Fiddler packet capture tool
A tool dedicated to capturing HTTP/HTTPS requests.

In the Inspectors panel on the right, click Raw to see the full content of a request. For an encrypted HTTPS request, Fiddler can show the decrypted content once HTTPS decryption is enabled in its options.

To clear the captured request/response data: select all (Ctrl+A), then press Delete.

The difference between the HTTP protocol and the HTTPS protocol:

HTTPS is simply HTTP with an encryption layer (SSL/TLS) added on top.

Forward proxy and reverse proxy

A forward or reverse proxy is an arrangement in which an intermediate server or third-party tool sits between the client and the server.
Forward: from the server's point of view, a third party sends requests on behalf of the client, and the server returns its response to that third party without knowing who the real client is. Commonly used online game accelerators, VPNs, and the like all work this way.
Reverse: from the client's point of view, a third party answers on behalf of the server. The proxy server accepts the request, passes it to the real server, and then sends the real server's response back to the client.
The difference between the two: a forward proxy stands in for the client, so the server does not see the real client; a reverse proxy stands in for the server, so the client does not see the real server.
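One practical consequence is that a forward proxy has to be configured on the client side, while a reverse proxy needs no client configuration at all. The sketch below shows the client-side half using the JDK's java.net.http.HttpClient (Java 11 or later). The proxy address proxy.example.com:8080 and the class name ForwardProxySketch are placeholders, and no request is actually sent.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;

// Hypothetical demo class: pointing a Java HTTP client at a forward proxy.
class ForwardProxySketch {

    // A selector that routes HTTP and HTTPS requests through one fixed proxy.
    // createUnresolved avoids a DNS lookup for the placeholder host name.
    static ProxySelector selector(String proxyHost, int proxyPort) {
        return ProxySelector.of(InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    // A client whose requests would go to the proxy instead of the target server.
    static HttpClient clientVia(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(selector(proxyHost, proxyPort))
                .build();
    }

    public static void main(String[] args) {
        // Ask the selector which proxy it would pick for a given target URL.
        Proxy chosen = selector("proxy.example.com", 8080)
                .select(URI.create("http://example.com/"))
                .get(0);
        System.out.println(chosen.type() + " via " + chosen.address());
    }
}
```

With a reverse proxy there is nothing comparable to write: the client simply uses the site's normal URL and never learns that another server sits behind it.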

Format of http request and response

In HTTP/1.x, the first line and the headers of a request are all text-format data.

Data comes in two formats: text and binary. A quick way to tell them apart is to open the data in Notepad. If you can read it and it is not garbled, it is text; if it is garbled and unreadable, it is binary.
Request format:

1. First line
The first line has three parts: the HTTP method, the requested URL, and the version number.

2. Headers (request headers): a series of key-value pairs, one pair per line. The key comes first and the value after it, separated by a colon and a space.

3. Blank line
Marks the end of the header section.

4. Body: optional; usually the content being submitted, such as form data or a file.
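The four parts above can be assembled by hand. The following is a minimal sketch that builds the text of a request; the class name RequestTextSketch, the host example.com, the path /login, and the form field are all invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: assembling the text of an HTTP/1.1 request.
class RequestTextSketch {

    static String build(String method, String target, Map<String, String> headers, String body) {
        StringBuilder text = new StringBuilder();
        // 1. First line: method, URL, version number.
        text.append(method).append(' ').append(target).append(" HTTP/1.1\r\n");
        // 2. Headers: one "key: value" pair per line.
        for (Map.Entry<String, String> header : headers.entrySet()) {
            text.append(header.getKey()).append(": ").append(header.getValue()).append("\r\n");
        }
        // 3. Blank line: end of the header section.
        text.append("\r\n");
        // 4. Body: optional.
        if (body != null) {
            text.append(body);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String body = "user=lion";
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "example.com");
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        headers.put("Content-Length", String.valueOf(body.length()));
        System.out.print(build("POST", "/login", headers, body));
    }
}
```

Lines end with \r\n rather than a bare \n, which is what HTTP/1.1 expects on the wire.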

Response format:
1. First line: three parts: the protocol version number, the status code, and a description of the status code
2. Headers (response headers): same as in the request format, a series of key-value pairs
3. Blank line: marks the end of the headers
4. Body: may or may not be present; as with the request, it holds the content of the response, such as a file
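Going the other way, the response text can be split back into those four parts. This is a minimal sketch that assumes a well-formed response held entirely in one string; the class name ResponseTextSketch and the sample response are made up, and real clients handle many cases this ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: splitting a raw HTTP/1.1 response into its parts.
class ResponseTextSketch {

    // First line -> [version, status code, status description].
    static String[] statusLine(String raw) {
        String firstLine = raw.substring(0, raw.indexOf("\r\n"));
        return firstLine.split(" ", 3);
    }

    // Header lines sit between the first line and the blank line.
    static Map<String, String> headers(String raw) {
        String head = raw.substring(0, raw.indexOf("\r\n\r\n"));
        String[] lines = head.split("\r\n");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {   // index 0 is the first line
            int colon = lines[i].indexOf(':');
            result.put(lines[i].substring(0, colon).trim(),
                       lines[i].substring(colon + 1).trim());
        }
        return result;
    }

    // Everything after the blank line is the body (possibly empty).
    static String body(String raw) {
        return raw.substring(raw.indexOf("\r\n\r\n") + 4);
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 404 Not Found\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: 5\r\n"
                + "\r\n"
                + "oops!";
        String[] first = statusLine(raw);
        System.out.println(first[0] + " | " + first[1] + " | " + first[2]);
        System.out.println(headers(raw));
        System.out.println(body(raw));
    }
}
```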

Summary of http protocol

HTTP is a text-format protocol. It is divided into two parts: the request and the response.

Request
Method URL Version number
Request header key: value
Request header key: value

[Blank line]
Body

Response
Version number Status code Status code description
Response header key: value
Response header key: value

[Blank line]
Body

The request and response formats are very similar, but the information they carry is quite different.

HTTP request (request)

URL

URL stands for Uniform Resource Locator.

Many network protocols are described by standards documents, the RFC series, which spell out each protocol and its detailed rules.
URLs were originally specified in RFC 1738, and the generic URI syntax they follow is defined in RFC 3986.
URLs are not only used for HTTP; many other protocols use them as well,
such as HTTPS, jdbc:mysql://, and file://

The composition of a URL:
First comes the protocol name, http or https.
:// is fixed content, followed by the server address.
The address can be an IP address or a domain name; a domain name is resolved to an IP address through DNS.
The address can be followed by a colon and a port number, which indicates the server port to access.
If no port number is written, the browser uses a default: 80 for HTTP and 443 for HTTPS.
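The default-port rule can be sketched in a few lines. java.net.URI reports -1 when the URL has no explicit port, so filling in 80 or 443 is left to the caller; the class name DefaultPortSketch and the example.com URLs are invented for illustration.

```java
import java.net.URI;

// Hypothetical demo class: working out which port a URL actually refers to.
class DefaultPortSketch {

    static int effectivePort(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();          // a port was written explicitly
        }
        String scheme = uri.getScheme();
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;                     // default for HTTP
        }
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;                    // default for HTTPS
        }
        return -1;                         // unknown protocol: no default assumed here
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://example.com/index.html"));
        System.out.println(effectivePort("https://example.com/index.html"));
        System.out.println(effectivePort("http://example.com:8080/index.html"));
    }
}
```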

The port number is followed by a path.
The path in the URL indicates which resource on the server is being accessed; a single server program usually holds many resources.

The path is followed by parameters, also called the query string.
The query string carries information passed from the browser to the server.

These parameters are optional and are generally organized as key-value pairs:
key-value pairs are separated from each other by &
a key and its value are separated by =
the query string starts with ?
What each key-value pair means is agreed on by the programmers themselves.
The key-value pairs in the query string are completely different from one website to another.
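The ?, &, and = rules above are enough to split a query string into its pairs. Here is a minimal sketch; the class name QueryStringSketch and the sample URL are made up, and keys and values are returned exactly as written in the URL, without any unescaping.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class: splitting a query string into key-value pairs.
class QueryStringSketch {

    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        // getRawQuery returns the text after '?' (and before any '#'), or null.
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {        // pairs are separated by &
            int eq = pair.indexOf('=');               // key and value by =
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("https://example.com/search?q=http&page=2"));
    }
}
```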

Fragment identifier in a URL
Less common. It locates a specific position within an HTML page and is mostly seen on documentation-style sites.

URL format summary:
1. Protocol name: URLs support many kinds of protocols.
2. Username and password: deprecated and no longer used in practice.
3. Server address: can be a domain name or an IP address.
4. Port number: if no port number is written, a default value is used (filled in automatically by the browser). The default for HTTP is 80 and for HTTPS is 443.
5. Path: indicates which resource on the server is being accessed.
6. Query string: parameters passed from the browser to the server, defined by the programmer.
7. Fragment identifier: locates a particular part of the page.

The original purpose of a URL is to identify one unique resource on the network:
1) First, locate a specific server through the server address.
2) Then locate a specific application through the port number.
3) Then locate, through the path, a specific resource managed by that application.
4) Through the query string, further describe what is wanted from that resource.
5) Finally, through the fragment identifier, determine which part of the resource to go to.
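To see these parts pulled out of a real string, here is a minimal sketch using java.net.URI from the JDK. The class name UrlPartsSketch and the URL itself are invented for illustration.

```java
import java.net.URI;

// Hypothetical demo class: taking a URL apart into the pieces described above.
class UrlPartsSketch {

    static String describe(String url) {
        URI uri = URI.create(url);
        return "scheme=" + uri.getScheme()          // protocol name
                + ", host=" + uri.getHost()         // server address
                + ", port=" + uri.getPort()         // -1 when no port is written
                + ", path=" + uri.getPath()
                + ", query=" + uri.getQuery()       // null when absent
                + ", fragment=" + uri.getFragment(); // null when absent
    }

    public static void main(String[] args) {
        System.out.println(describe("https://example.com:8443/docs/index.html?lang=en#install"));
    }
}
```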

Not every part of a URL is required; some parts can be omitted:
the protocol name can be omitted
the username and password can be omitted
the server address can be omitted
the port number can be omitted
the path can be omitted
the query string can be omitted
the fragment identifier can be omitted

/path: a URL like this contains only the path, and everything else is omitted. In this case the server address and port number are those of the current page (so context is required), and the protocol name is the same as the current one as well.
The query string and fragment identifier are optional to begin with.
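The "context is required" point can be shown with URI.resolve, which fills in the omitted parts from the URL of the current page. This is a small sketch; the class name RelativeUrlSketch and the example.com URLs are placeholders.

```java
import java.net.URI;

// Hypothetical demo class: completing a path-only URL from the current page's URL.
class RelativeUrlSketch {

    static String resolve(String currentPage, String reference) {
        return URI.create(currentPage).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String current = "https://example.com:8443/docs/index.html?lang=en";
        // Protocol, server address, and port come from the current page.
        System.out.println(resolve(current, "/path"));
    }
}
```

Here resolve(current, "/path") yields https://example.com:8443/path: the protocol, address, and port are inherited, while the old path and query string are replaced.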

URL encode

A query string entered by the user may contain special characters (?, /, and so on). To avoid ambiguity when the URL is parsed, the browser automatically escapes the special characters in the query string.
Converting special characters into escape sequences => URL encode
Restoring escape sequences to the original characters => URL decode
Besides special symbols, other characters such as Chinese characters also need URL encoding.

Escaping means writing the bytes of the character's value in hexadecimal, with a % in front of each byte.

Note: when we construct a web request by hand, we must not write Chinese characters directly into the query string in code; we have to encode them ourselves.
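In Java, one way to do the manual encoding is java.net.URLEncoder (the Charset overload used here needs Java 10 or later). The sketch below encodes the two Chinese characters for "Chinese" and a value containing special symbols; the class name UrlEncodeSketch is made up, and the characters are written as Unicode escapes so the source file stays ASCII.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: URL encode and URL decode for query string values.
class UrlEncodeSketch {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587";            // the word for "Chinese"
        // Each character is three bytes in UTF-8, so it becomes three %XX groups.
        System.out.println(encode(chinese));
        // Special symbols are escaped as well.
        System.out.println(encode("a b&c=d?"));
        // Decoding restores the original text.
        System.out.println(decode(encode(chinese)).equals(chinese));
    }
}
```

One detail to be aware of: URLEncoder follows the HTML form convention, so a space comes out as + rather than %20.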

Methods in HTTP Requests

HTTP originally defined these methods to divide requests by function.
For example, GET is for obtaining data from the server, and POST is for submitting data to the server.

In practice, though, both GET and POST can be used to fetch data and to submit data; it all depends on how the programmer's code is implemented.
So in 2022 these methods have, in practice, largely lost their original distinction, and one method can stand in for another.
Put differently, there is no "essential difference" between one method and another.

A question interviewers often ask: what is the difference between the GET and POST methods in HTTP?
The first thing to say is that there is no essential difference (in theory, each can replace the other). What differs is convention: during development, GET usually carries its parameters in the query string, while POST usually carries its data in the body.
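The usual convention can be seen by building the same data into a GET and a POST with java.net.http.HttpRequest (Java 11 or later). This sketch only constructs the request objects and sends nothing; the class name GetVsPostSketch, the example.com URL, and the q=http parameter are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical demo class: the same data sent the GET way and the POST way.
class GetVsPostSketch {

    // By convention, GET puts the data in the query string and has no body.
    static HttpRequest asGet(String url, String form) {
        return HttpRequest.newBuilder(URI.create(url + "?" + form))
                .GET()
                .build();
    }

    // By convention, POST puts the same data in the body.
    static HttpRequest asPost(String url, String form) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest get = asGet("http://example.com/search", "q=http");
        HttpRequest post = asPost("http://example.com/search", "q=http");
        System.out.println(get.method() + " " + get.uri());
        System.out.println(post.method() + " " + post.uri());
    }
}
```

Either request could carry the same information to the server; which one a site expects is up to the code on the server side.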

Ways to trigger an HTTP request

1. Entering a URL directly in the browser triggers an HTTP request.
2. Certain tags in an HTML page also trigger HTTP GET requests: link, img, and script (these three fire while the page loads) and the a tag (fires when the user clicks it).
3. A form
4. Ajax
5. Java code or other libraries
6. The wget / curl commands on Linux
7. Third-party tools such as Postman
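For item 5, here is a minimal sketch of sending a request from Java code with java.net.http.HttpClient (Java 11 or later). So that it runs without Internet access, it starts a throwaway server on the local machine using the JDK's built-in com.sun.net.httpserver.HttpServer; the class name JavaClientSketch, the /hello path, and the reply text are all made up for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one request and one response, entirely from Java code.
class JavaClientSketch {

    static String fetchFromLocalServer() throws Exception {
        // Throwaway server on a free loopback port (port 0 = let the OS choose).
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello from server".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://127.0.0.1:" + port + "/hello"))
                    .GET()
                    .build();
            // send() blocks until the whole response has arrived.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchFromLocalServer());
    }
}
```

Pointing the same client code at a real site only requires changing the URL passed to URI.create.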

Reference article:
https://blog.csdn.net/qq_36894974/article/details/103930478


Origin blog.csdn.net/Merciful_Lion/article/details/123431454