JavaEE basics - HTTP protocol requests and responses

What is HTTP


HTTP stands for Hypertext Transfer Protocol.

HTTP is one of the most widely used application-layer protocols, and browsers fetch web pages over HTTP.
In other words, HTTP is the bridge between the browser and the server.


HTTP is usually carried over the TCP protocol at the transport layer.

Whenever we open a website, the data is transferred over HTTP.


When we enter the Sogou search web address (URL) in the browser, the browser sends an HTTP request to Sogou's server, and
Sogou's server returns an HTTP response.

The browser parses the response and displays it as the page we see.
(During this process, the browser may send multiple HTTP requests to the server, and the server returns a response for each one. These responses include the page's HTML, CSS, JavaScript, images, fonts, and so on.)




More generally, when we enter any URL in the browser, the browser sends an HTTP request to the corresponding server.
After receiving the request, that server processes it and returns an HTTP response.
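
The same exchange can be started from a Java program instead of a browser. The following is a minimal sketch, assuming Java 11 or later and its built-in java.net.http client. The class name SimpleGet and its method names are made up for illustration, and fetchStatus needs network access, so it is not called from main.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class, introduced here only for illustration.
final class SimpleGet {

    // Describe a GET request for the given URL; nothing is sent yet.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Send the request and return the status code of the response.
    // This goes over the network, so it can fail when offline.
    static int fetchStatus(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("https://www.sogou.com/");
        // Print the method and target without contacting the server.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Calling SimpleGet.fetchStatus("https://www.sogou.com/") performs one request and one response, which is the smallest version of what the browser does when it loads a page.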


HTTP protocol message format

HTTP protocol format

1 Use of packet capture tools


There are many packet capture tools. Fiddler is used here because it focuses on HTTP and is convenient to use.

It can be downloaded from the Internet, and installation is straightforward.



Once started, Fiddler immediately lists the HTTP exchanges between programs on the current computer and the servers they talk to.

2 The principle of the packet capture tool


Fiddler is essentially a proxy program, and there are two things to watch for when using it.

1. It may conflict with other proxy programs, so close other proxies (including some browser plug-ins) while using it.


2. To capture traffic correctly, you also need to enable HTTPS capture.

HTTPS is a secure, encrypted evolution of HTTP, and most servers on the Internet now use it.
By default, Fiddler cannot capture HTTPS traffic. You need to enable HTTPS manually and install the certificate.

2.1 What is a proxy


A proxy is someone who does a job on your behalf.

For example, I want to eat but am too lazy to go and buy food, so I say to my roommate: "Bring me a meal when you go to the cafeteria, OK?"
He buys the meal, brings it back, and hands it over to me.



I could go to the cafeteria and buy the food myself, but because I am lazy, I ask my roommate to bring it for me.
Here the roommate is a "proxy": he does the job for me when it is inconvenient for me to do it myself.


Throughout this process, the roommate knows exactly what passed between me and the cafeteria owner, so he could take out a small notebook and write it all down.
Fiddler is this kind of proxy: it sits between the browser and the server, so it can record every request and response.

2.2 Forward and reverse proxy


Proxies are divided into forward proxies and reverse proxies.
A proxy that acts on behalf of the client is called a forward proxy, and a proxy that acts on behalf of the server is called a reverse proxy.



Suppose that when my roommate goes to buy the food, the cafeteria owner is busy and has hired a part-time student to serve customers. The roommate represents me, the customer, so he is the forward proxy.

The part-time student represents the cafeteria owner, so she is the reverse proxy.

3 How to enable https


1. Open the Fiddler window and click Tools.


2. In the menu that appears, click Options...


3. In the dialog that appears, select the HTTPS tab and check all four options.



The first time you check them, a dialog asks whether to install the certificate. Be sure to click Yes.






In the request list, blue entries are HTML pages and green entries are JavaScript files.
Gray and black entries are plain data.


Double-click the option in the request list on the left to view the details of the request.






An HTTP request follows a fixed format, and Fiddler parses it according to that format and offers several views of it.

Select Raw to see the request exactly as it was sent.



Select Raw in the lower right corner to see the raw response.


The raw text can also be opened in Notepad from the same view.



The request, opened in Notepad, is examined below.



The first line of the request contains three parts, separated by spaces.

1. GET

This is the HTTP method.


2. https://www.sogou.com/

This is the URL (Uniform Resource Locator), also known as a web address. A URL identifies the location of one specific resource on the Internet.


3. HTTP/1.1

This is the protocol version.
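
The three parts of the first line can be pulled apart with a simple split on spaces. This is a minimal sketch; the class name RequestLine and its fields are hypothetical names chosen for this example.

```java
// Hypothetical helper class, introduced here only for illustration.
final class RequestLine {
    final String method;
    final String url;
    final String version;

    private RequestLine(String method, String url, String version) {
        this.method = method;
        this.url = url;
        this.version = version;
    }

    // Split the first line of a request into its three space-separated parts.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 parts: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine first = parse("GET https://www.sogou.com/ HTTP/1.1");
        System.out.println("method:  " + first.method);
        System.out.println("url:     " + first.url);
        System.out.println("version: " + first.version);
    }
}
```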

HTTP request

1 Understanding URLs




The port number identifies which program on the server to talk to, and the hierarchical file path identifies which resource under that program's control is wanted.
The query string carries parameters used when fetching the resource, and the server address is usually a domain name. The fragment identifier marks a position within the page; the browser handles it itself and does not send it to the server.
Login information in the URL is no longer used, because websites now provide a dedicated login page.



For example, if I want to eat pancakes in the school cafeteria, I must first decide which cafeteria to go to and which window to order from, and then choose the side dishes and the seasoning.

Here, the cafeteria corresponds to the IP address, the window to the port number, the side dishes to the hierarchical path, and the seasoning to the query string (organized as key-value pairs).


A URL has several parts, some of which can be omitted.

https://www.sogou.com/

This URL omits the port, so the browser supplies a default port.
The default port is 80 for HTTP and 443 for HTTPS.
The / is also a path: it denotes the root directory of the HTTP server.
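
The parts of a URL can be inspected with the JDK's java.net.URI class. This is a minimal sketch: the long URL in main is a hypothetical one, chosen only so that every part is present, and the class UrlParts with its effectivePort helper is an illustrative name, not something from the article.

```java
import java.net.URI;

// Hypothetical helper class, introduced here only for illustration.
final class UrlParts {

    // Return the explicit port, or the scheme's default when the URL omits it.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        if ("https".equalsIgnoreCase(uri.getScheme())) {
            return 443;
        }
        if ("http".equalsIgnoreCase(uri.getScheme())) {
            return 80;
        }
        return -1; // unknown scheme: no default assumed
    }

    public static void main(String[] args) {
        // Hypothetical URL; only the host name comes from the article.
        URI full = URI.create("https://www.sogou.com:8443/web/search?query=http&page=1#results");
        System.out.println("scheme:   " + full.getScheme());
        System.out.println("host:     " + full.getHost());
        System.out.println("port:     " + full.getPort());
        System.out.println("path:     " + full.getPath());
        System.out.println("query:    " + full.getQuery());
        System.out.println("fragment: " + full.getFragment());

        // The article's URL omits the port, so the default applies.
        URI bare = URI.create("https://www.sogou.com/");
        System.out.println("default port: " + effectivePort(bare));
    }
}
```

URI.getPort() returns -1 when the port is omitted, which is why the helper falls back to 80 or 443 based on the scheme.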

2 Understanding methods




The method describes the semantics of the request.

Most of the methods are rarely used; GET and POST are by far the most common.


As the old saying goes: if all the literary talent in the world amounts to ten measures, Cao Zijian alone holds eight, I hold one, and everyone else in the world shares the last one.
Likewise, if all HTTP method usage amounts to ten measures, GET takes eight, POST takes one, and the remaining methods share the last one.

2.1 GET method


GET is the most common HTTP method and is typically used to fetch a resource from the server. When you enter a URL directly in the browser,
the browser sends a GET request. Tags such as link, img, and script in the HTML also trigger GET requests.

Open Fiddler, visit the Sogou homepage, open the captured request in Notepad, and observe the result.




A GET request has two parts: the first line and the header.

2.2 POST method


POST is most typically seen when you log in and are then taken to another page. Another common case is uploading files.

Use Fiddler to capture a POST request and open it in Notepad.



An HTTP request can be divided into four parts:

1. The first line

2. Request header (header)

3. Empty line

4. Body

A GET request usually has no body, while a POST request usually has one.
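
The four parts can be separated mechanically, because the empty line marks where the header ends and the body begins. This is a minimal sketch, assuming lines end with CRLF as HTTP/1.1 specifies. The class RawRequest, the path /login, the host example.com, and the form fields are all hypothetical.

```java
// Hypothetical helper class, introduced here only for illustration.
final class RawRequest {
    final String firstLine;
    final String headerBlock;
    final String body;

    private RawRequest(String firstLine, String headerBlock, String body) {
        this.firstLine = firstLine;
        this.headerBlock = headerBlock;
        this.body = body;
    }

    // Split raw request text into first line, header block, and body.
    static RawRequest split(String raw) {
        // The empty line shows up as two consecutive CRLF sequences.
        int blank = raw.indexOf("\r\n\r\n");
        String head = blank >= 0 ? raw.substring(0, blank) : raw;
        String body = blank >= 0 ? raw.substring(blank + 4) : "";

        int firstBreak = head.indexOf("\r\n");
        String firstLine = firstBreak >= 0 ? head.substring(0, firstBreak) : head;
        String headers = firstBreak >= 0 ? head.substring(firstBreak + 2) : "";
        return new RawRequest(firstLine, headers, body);
    }

    public static void main(String[] args) {
        // Hypothetical login request, written out by hand.
        String raw = "POST /login HTTP/1.1\r\n"
                + "Host: example.com\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: 27\r\n"
                + "\r\n"
                + "username=alice&password=123";
        RawRequest request = split(raw);
        System.out.println("first line: " + request.firstLine);
        System.out.println("headers:\n" + request.headerBlock);
        System.out.println("body: " + request.body);
    }
}
```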

2.2.1 Understanding the request "header" (header)


The header is a set of key-value pairs, one per line, with the key and value separated by a colon (:).
Each standard key has its own special meaning.
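
Because each header line is just "key: value", the header block maps naturally onto a Map. This is a minimal sketch; the class name HeaderParser and the sample header values (other than Host: gitee.com) are assumptions for illustration. It keeps only the last value when a key repeats, which is a simplification.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, introduced here only for illustration.
final class HeaderParser {

    // Turn "Key: value" lines into a map. Header names are case-insensitive,
    // so the map compares keys without regard to case.
    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerBlock.split("\r\n")) {
            // Split at the FIRST colon only, since values may contain colons.
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(key, value); // simplification: a repeated key is overwritten
        }
        return headers;
    }

    public static void main(String[] args) {
        String block = "Host: gitee.com\r\n"
                + "User-Agent: ExampleBrowser/1.0\r\n"
                + "Referer: https://gitee.com/";
        parse(block).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```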

Host: gitee.com describes the address and port of the server.
The address and port in Host identify the primary target being accessed.

Content-Length gives the length of the body in bytes, and Content-Type gives the format of the data in the body.


A GET request usually has no body, so these two fields are absent from the request. A POST request that has a body normally includes both fields.
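
Content-Length counts bytes, not characters, which matters as soon as the body contains non-ASCII text. This is a minimal sketch that assembles a POST request by hand and computes the length from the UTF-8 bytes of the body. The class PostRequestBuilder, the host example.com, and the path /notes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced here only for illustration.
final class PostRequestBuilder {

    // Assemble first line, header, empty line, and body into one request text.
    static String build(String host, String path, String contentType, String body) {
        // Measure the body in bytes, because that is what Content-Length means.
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is 4 characters but 5 bytes in UTF-8.
        String request = build("example.com", "/notes",
                "text/plain; charset=utf-8", "caf\u00e9");
        System.out.println(request);
    }
}
```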

User-Agent (UA for short) describes the versions of the browser and the operating system.

Today User-Agent is mainly used to distinguish desktop clients from mobile clients.

Referer indicates the page that the current request came from. If you type the address directly into the address bar or open it from your bookmarks, there is no Referer.

Cookie is a very important header field. A cookie is essentially a mechanism that browsers provide for web pages to store data locally.
For security reasons, web pages are not allowed to access your computer's hard drive directly by default.

Cookies have an expiration time. What is it for?

Consider a public computer (for example, in the e-reading room of a school library) on which someone logs in with their own account.
The login state is saved in a cookie. By the time the next person uses the computer, the cookie has probably expired, so that person is not logged in as the previous user.
The more sensitive the site, the shorter the expiration time.

2.2.1.1 Three questions about cookies


1. Where do cookies come from?

The data in a cookie comes from the server:
the server decides what to store in the browser's cookie through the Set-Cookie field in the header of the HTTP response.


2. Where are cookies stored?

They are stored by the browser, on the hard disk. Cookies are partitioned by browser and by domain name:
each browser keeps its own cookies, and within one browser each domain name has its own cookies.

A cookie holds not only key-value pairs but also an expiration time. This is why many websites remember your login state after you log in once.


3. Where do cookies go?

Back to the server. A server handles many clients at the same time,

so each client uses cookies to save the current user's intermediate state.
When the client accesses the server, the browser automatically includes the cookie content in the request, so
the server knows the current state of that client.
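
The round trip described by the three questions can be sketched with the JDK's java.net.HttpCookie class: the server's Set-Cookie header is parsed as a browser would parse it, and the name and value are then sent back in a Cookie request header. The class CookieRoundTrip, the cookie name sessionId, and its value are hypothetical, and a real browser also checks the domain, path, and expiry before sending a cookie.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical helper class, introduced here only for illustration.
final class CookieRoundTrip {

    // Parse the Set-Cookie header that arrived in the server's response.
    static HttpCookie fromResponseHeader(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.get(0);
    }

    // Build the Cookie header that is attached to later requests.
    static String toRequestHeader(HttpCookie cookie) {
        return "Cookie: " + cookie.getName() + "=" + cookie.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical header: a session id that expires after 1800 seconds.
        HttpCookie cookie =
                fromResponseHeader("Set-Cookie: sessionId=abc123; Max-Age=1800; Path=/");
        System.out.println("name:    " + cookie.getName());
        System.out.println("value:   " + cookie.getValue());
        System.out.println("max age: " + cookie.getMaxAge() + " seconds");
        System.out.println(toRequestHeader(cookie));
    }
}
```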

2.3 The difference between GET and POST

  • The semantics differ: GET is generally used to fetch data, and POST is generally used to submit data.
  • The body of a GET is generally empty, and its data is carried in the query string. The query string of a POST is generally
    empty, and its data is carried in the body.
  • GET requests are generally idempotent, and POST requests are generally not. (A request is idempotent if repeating it
    gives the same result as making it once.)
  • GET responses can generally be cached, while POST responses generally are not. (This follows from idempotence.)

For example, if a cow eats grass and gives milk, and eating the same grass a second time gives the same milk, that is idempotent.
If the second feeding does not yield the same result as the first, it is not idempotent.


There is no essential difference between GET and POST; in most scenarios, each can stand in for the other.
The differences lie in conventional usage.

You can use GET to submit data and POST to fetch data; you can also design a POST to be idempotent and a GET not to be.
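
The conventional difference in where the data travels can be made concrete by sending the same parameters both ways. This is a minimal sketch, assuming Java 10 or later for the URLEncoder.encode(String, Charset) overload. The class ParamPlacement, the host example.com, the path /search, and the parameter names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, introduced here only for illustration.
final class ParamPlacement {

    // Encode parameters as key=value pairs joined by '&'.
    static String encode(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joined.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // GET: the parameters ride in the query string, and there is no body.
    static String getRequest(String path, Map<String, String> params) {
        return "GET " + path + "?" + encode(params) + " HTTP/1.1\r\n"
                + "Host: example.com\r\n"
                + "\r\n";
    }

    // POST: the same parameters ride in the body instead.
    static String postRequest(String path, Map<String, String> params) {
        String body = encode(params);
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: example.com\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "hello world");
        params.put("page", "1");
        System.out.println(getRequest("/search", params));
        System.out.println(postRequest("/search", params));
    }
}
```

Both requests carry the same information; only its position in the message changes, which is why the two methods can substitute for each other in most scenarios.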

HTTP response




The response is also divided into four parts:

1. The first line: HTTP/1.1 200 OK


2. Header


3. Empty line: marks the end of the header


4. Body
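
The first line of a response can be split in the same way as the first line of a request, with one difference: the description after the status code may itself contain spaces. This is a minimal sketch; the class name StatusLine and its fields are hypothetical names chosen for this example.

```java
// Hypothetical helper class, introduced here only for illustration.
final class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Split "HTTP/1.1 404 Not Found" into version, status code, and description.
    static StatusLine parse(String line) {
        // Limit the split to 3 pieces so "Not Found" stays in one piece.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    public static void main(String[] args) {
        StatusLine ok = parse("HTTP/1.1 200 OK");
        System.out.println(ok.version + " | " + ok.code + " | " + ok.reason);
    }
}
```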

1 Understanding the "status code"




200 OK indicates success.

404 Not Found means that the requested resource does not exist, that is, it was not found on the server.

403 Forbidden means access is denied (no permission).

302 Moved Temporarily means redirection, which is similar to call forwarding:
if a person changes their mobile phone number, a call to the old number is transferred to the new one.

500 Internal Server Error indicates an error inside the server (for example, the server code threw an exception).

504 Gateway Timeout means that a gateway or proxy waited too long for a response from the server behind it and gave up.


The difference between redirection and request forwarding





A redirect can point to external resources (jump to another website), and the browser sends a second request to the new address.
Request forwarding only works between resources inside the same server. It happens within a single request, so it saves one round trip and is more efficient.


Status codes can be divided into several categories:

1** means informational (the request was received and processing continues), 2** means success, 3** means redirection, 4** means client error, 5** means server error.
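
Since the category is determined by the first digit, it can be computed by integer division. This is a minimal sketch; the class name StatusCategory and the label strings are chosen for this example.

```java
// Hypothetical helper class, introduced here only for illustration.
final class StatusCategory {

    // Map a status code to its category using the hundreds digit.
    static String of(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 302, 403, 404, 500, 504};
        for (int code : samples) {
            System.out.println(code + " -> " + of(code));
        }
    }
}
```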

2 Understanding the response "header" (header)


The response header has basically the same format as the request header.

Fields such as Content-Type and Content-Length have the same meaning as they do in the request.


Origin blog.csdn.net/m0_63033419/article/details/129789613