Table of contents
- network model
- HTTP
  - The difference between HTTP and HTTPS
  - The content of the HTTP request message
  - What are the main fields in the HTTP header
  - HTTP response message
  - HTTP response header main fields
  - HTTP status code
  - HTTP hijacking
  - cross domain
  - HTTP request response interruption reason
  - HTTP has several request methods
  - The difference between GET and POST
  - cookies and sessions
  - DNS lookup process (application layer)
  - Long connections & short connections
  - The principle that the page is reset when searching for sensitive words
- IP
  - IP address classification
  - Transition mechanisms from IPv4 to IPv6
  - ARP protocol
  - What happens after you enter a URL in the browser and press Enter
  - Causes of web page lag
    - how to check
  - When a web page loads slowly, how to analyze its cause and solve the problem?
  - In what order is a domain name resolved?
- TCP, UDP
  - The process of three-way handshake of TCP connection
  - The process of TCP waving four times
    - Why TIME_WAIT is needed
  - Characteristics of TCP and UDP
  - Application scenarios of TCP and UDP
  - How does TCP achieve reliable transmission
    - flow control
      - sliding window
      - timer
    - congestion control
      - Slow start and congestion avoidance
      - Fast retransmission and fast recovery
network model
Seven-Layer Network Model
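- Physical layer: uses the transmission medium to provide physical connections for the data link layer and transparently transmits the bit stream.
- Data link layer: groups bits into frames and transfers them between adjacent nodes on the same link (e.g. Ethernet, MAC addresses).
- Network layer (IP): routes packets between hosts across different networks.
- Transport layer (TCP, UDP): provides end-to-end data transfer between processes on two hosts.
- Session layer: establishes, manages, and terminates sessions between applications.
- Presentation layer: translates the data format (encoding, compression, encryption) so that the application layer can interpret the underlying data.
- Application layer: application-layer protocols such as DNS, HTTP, and SMTP; users interact with the network at this layer.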
TCP/IP four layers
- network interface layer
- Internet Layer IP
- Transport layer TCP, UDP
- Application layer HTTP, SMTP, FTP, etc.
Five-layer model
- The physical layer transparently transmits bit streams between adjacent computers
- The data link layer assembles IP datagrams into frames and transmits them, together with control information, over the link between two adjacent nodes
- Network layer (IP): provides packet-switched delivery between different hosts, including routing and addressing
- The transport layer TCP and UDP provide general data transmission services for the communication between two hosts
- Application layer HTTP, SMTP, FTP, etc.
HTTP
The difference between HTTP and HTTPS
- HTTPS requires a certificate issued by a CA (certificate authority), which usually involves a fee
- HTTP (HyperText Transfer Protocol) transmits information in plain text, while HTTPS encrypts it with SSL/TLS; the encryption makes HTTPS somewhat more costly in computation and handshake time
- HTTP is simple and stateless. HTTPS is HTTP running over SSL/TLS, which adds encryption and authentication (at the HTTP level it is still stateless)
- By default, HTTP uses port 80 and HTTPS uses port 443
The content of the HTTP request message
- The request line includes the request method (GET, POST...), URL, HTTP protocol version
- Request headers, one per line, in the format Header-Name: value, followed by an empty line
- request body
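As a concrete illustration of the three parts, the following minimal Python sketch assembles a raw HTTP/1.1 request as text. The function name build_request, the host example.com, and the form body are hypothetical placeholders; a real client library would also validate header names and handle other encodings.

```python
def build_request(method: str, path: str, headers: dict, body: str = "") -> str:
    """Assemble a raw HTTP/1.1 request: request line, header lines, blank line, body."""
    request_line = f"{method} {path} HTTP/1.1"
    if body:
        # Content-Length (in bytes) tells the server where the body ends.
        headers = {**headers, "Content-Length": str(len(body.encode("utf-8")))}
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; the empty string produces the blank line
    # that separates the headers from the body.
    return "\r\n".join([request_line, *header_lines, "", body])


request = build_request(
    "POST",
    "/login",
    {"Host": "example.com", "Content-Type": "application/x-www-form-urlencoded"},
    "user=alice",
)
print(request)
```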
What are the main fields in the HTTP header
- Host: the host name (domain name or IP address) of the server that receives the request, optionally with a port
- User-Agent: identifies the client software sending the request (e.g. the browser name and version)
- Connection: connection options, e.g. keep-alive (persistent, long connection) or close
- Accept-Charset: the character encodings the client can accept
- Accept-Encoding: the compression formats the client can accept (e.g. gzip)
- Accept-Language: the languages the client prefers
HTTP response message
- Status line: protocol version, status code, status code description
- response header
- response body
HTTP response header main fields
- Server: the name and version of the server application software
- Content-Type: the type of the response body
- Content-Length: The length of the response body
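- Charset: the character encoding of the response body, normally given as the charset parameter of Content-Type (e.g. Content-Type: text/html; charset=utf-8) rather than as a separate header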
- Content-Encoding: The data compression format used by the response body
- Content-Language: The language used in the response body
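Going the other way, a response can be split back into its status line, header fields, and body. This is a simplified sketch (the name parse_response and the sample message are made up); it assumes a well-formed response whose status line includes a reason phrase, and it ignores chunked transfer encoding.

```python
def parse_response(raw: str) -> tuple:
    """Split a raw HTTP response into version, status code, reason, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")          # the blank line ends the header
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)  # e.g. "HTTP/1.1", "404", "Not Found"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")            # split at the first colon only
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return version, int(code), reason, headers, body
```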
HTTP status code
1xx informational: the server has received the request and the client should continue
2xx success: the request was processed successfully
3xx redirection: the resource has moved, and the client must take further action such as requesting another URL
4xx client error: e.g. 403 Forbidden (access to the requested resource is refused), 404 Not Found (the requested resource does not exist)
5xx server error: e.g. 500 Internal Server Error, 503 Service Unavailable (server overloaded or down for maintenance)
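The class of a status code is simply its first digit, so it can be computed directly. The sketch below (describe_status is a hypothetical helper) also looks up the standard reason phrase with Python's http.HTTPStatus enum.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return '<code> <reason phrase> (<class>)' for an HTTP status code."""
    kind = STATUS_CLASSES.get(code // 100, "unknown")   # the first digit gives the class
    try:
        phrase = HTTPStatus(code).phrase                 # standard phrase from the stdlib
    except ValueError:
        phrase = "non-standard"                          # code not defined in HTTPStatus
    return f"{code} {phrase} ({kind})"
```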
"200 OK" indicates that the request was successful.
"301 Moved Permanently" indicates a permanent redirection: the requested resource has moved to a new URL, which should be used from now on.
"302 Found" indicates a temporary redirection: the requested resource still exists, but it must temporarily be accessed through another URL.
Both 301 and 302 use the Location field in the response header to give the URL to redirect to, and the browser automatically follows it to the new URL.
"304 Not Modified" is not a real redirect: it tells the client that the resource has not been modified since its cached copy, so the client should keep using the cached resource. It is sometimes called a cache redirect and is used for cache control.
"400 Bad Request" indicates that there is an error in the client's request message; it is only a general error.
"403 Forbidden" means that the server refuses access to the resource, not that the client's request is malformed.
"404 Not Found" means that the requested resource does not exist or cannot be found on the server, so it cannot be provided to the client.
"500 Internal Server Error", like 400, is a general-purpose error code: it says that something went wrong on the server without saying what.
"501 Not Implemented" means that the function requested by the client is not yet supported, similar to "opening soon, please look forward to it".
"502 Bad Gateway" is usually an error code returned by the server as a gateway or proxy, indicating that the server itself is working normally, and an error occurred when accessing the back-end server.
"503 Service Unavailable" means that the server is currently busy and temporarily unable to respond to the client, similar to "the network service is busy, please try again later".
HTTP hijacking
HTTP hijacking injects crafted packets into a normal data flow so that the client interprets the forged data, typically displaying small ads or other web content in pop-up windows.
Steps:
- Identify HTTP traffic within TCP connections;
- Modify the HTTP response body;
- Send the tampered packets to the user before the genuine ones arrive; the genuine packets that arrive later are discarded (their sequence numbers have already been received), and the client displays the modified page.
Prevention:
- Beforehand (encryption): use HTTPS so that traffic is not sent in plain text and cannot be hijacked this way (this does not prevent DNS hijacking)
- In flight: split the HTTP request into several packets. The operator's bypass (out-of-path) device does not have a complete TCP/IP stack, so it cannot reassemble and identify the request, while the web server has a complete stack, reassembles the packets into a complete HTTP request, and serves it normally
- After the fact (shielding): while the page is displayed, the front end inspects its own content and triggers a callback whenever the DOM structure changes, so injected content can be detected and removed
DNS hijacking: by hijacking a DNS server, the attacker gains control over a domain's resolution records and modifies the resolution result, so that requests for domain A are sent to domain B or receive a wrong answer. It is often used to continuously promote certain products.
Difference: DNS hijacking tends to be persistent, forcing ads whenever a page is visited. HTTP hijacking happens intermittently and is very fast; it usually shows up as a small ad "tail" attached to a website's pages.
HTTPS hijacking: hijacking by means of forged certificates....
cross domain
Cross-domain restrictions are caused by the browser's same-origin policy, a security restriction imposed by the browser: a script belonging to one website cannot freely access resources of another website.
Same origin: the same protocol, domain name, and port.
That is, a script may only access resources (for example, read the response of a request or the DOM of a page) that share its page's protocol, domain name, and port; access to a resource of a different origin is blocked by the browser.
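Whether two URLs are same-origin can be checked by comparing their (protocol, domain name, port) triples. This is a minimal sketch that assumes only http and https, fills in their default ports when the URL omits one, and ignores browser-specific special cases; origin and same_origin are hypothetical helper names.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    # An omitted port means the scheme's default port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)
```

For example, http://example.com/a.js and http://example.com:80/b are same-origin, while switching to https, to a subdomain, or to port 8080 produces a different origin.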
HTTP request response interruption reason
Network disconnection, network congestion, request timeout, or a problem with the browser or the server
How to check
Check the network, check the local machine...
HTTP has several request methods
HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 adds OPTIONS, PUT, DELETE, TRACE, and CONNECT; PATCH was standardized later as an extension (RFC 5789).
The difference between GET and POST
GET places its parameters in the URL (the query string) and is used to retrieve data from the server; POST carries its parameters in the request body and is used to submit data to the server.
Because GET parameters are exposed in the URL (and therefore in browser history and server logs), they are less private, and URLs are subject to length limits imposed by browsers and servers. A POST body is also plain text unless HTTPS is used.
Application scenarios
GET is used to query data; POST is used to modify data and in scenarios that handle sensitive data such as passwords.
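The difference in where parameters travel is easy to see by building both kinds of request. In this sketch the helper names and sample values are made up, and required headers such as Host are omitted for brevity; urlencode from Python's standard library encodes the same parameters into the query string for GET and into the body for POST.

```python
from urllib.parse import urlencode


def get_request_line(path: str, params: dict) -> str:
    # GET: parameters are encoded into the URL's query string.
    return f"GET {path}?{urlencode(params)} HTTP/1.1"


def post_request(path: str, params: dict) -> str:
    # POST: the same encoding goes into the body; the URL stays clean.
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # urlencode output is ASCII, so chars == bytes
        "\r\n"
        f"{body}"
    )
```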
cookies and sessions
What are cookies?
A cookie is a small piece of text. When the client sends a request and the server needs to record the user's state, the server issues a cookie to the browser in the response (via the Set-Cookie header), and the browser saves it.
When the browser requests the website again, the browser submits the requested URL together with the cookie to the server. The server checks the cookie to identify the user status. The server can also modify the content of the cookie as needed.
What is Session?
Session is another mechanism for recording client state. The difference is that a cookie is stored in the client's browser, while a session is stored on the server. When the browser accesses the server, the server records the client's information on the server in some form; this is the session. When the browser visits again, the server only needs to look up the client's state in the session, which is usually identified by a session ID that the browser sends back in a cookie.
The difference between Session and Cookie?
1. Data storage location: cookie data is stored on the client's browser, and session data is stored on the server.
2. Security: cookies are not very secure; others can analyze cookies stored locally and forge them (cookie spoofing). Where security matters, use sessions.
3. Server performance: sessions are kept on the server for a certain period of time, so as the number of visitors grows they consume more server resources. To reduce server load, use cookies.
4. Data size: a single cookie can hold at most about 4 KB, and browsers limit how many cookies a single site may store (historically 20).
5. Importance of information: You can consider storing important information such as login information as a session. If you need to keep other information, you can put it in a cookie.
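A common way to combine the two mechanisms is to keep the state in a server-side session and put only a random session ID in a cookie. The sketch below assumes an in-memory dict as the session store (real servers use a database or cache with expiry) and uses hypothetical login and current_user helpers; http.cookies.SimpleCookie produces and parses the cookie headers.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> user state (lives on the server).
SESSIONS: dict = {}


def login(username: str) -> str:
    """Create a session and return the Set-Cookie header that carries its id."""
    session_id = secrets.token_hex(16)         # unguessable random id
    SESSIONS[session_id] = {"user": username}  # the state itself stays on the server
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True    # hide the cookie from page scripts
    return cookie.output()                     # "Set-Cookie: session_id=...; HttpOnly"


def current_user(cookie_header: str):
    """Look up the user from the Cookie header the browser sends back (None if unknown)."""
    morsel = SimpleCookie(cookie_header).get("session_id")
    if morsel is None:
        return None
    state = SESSIONS.get(morsel.value)
    return state["user"] if state else None
```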
DNS lookup process (application layer)
Used to resolve user-supplied hostnames to IP addresses
- The browser extracts the domain name from the URL and passes it to the DNS client (resolver)
- It checks whether the browser cache or the local hosts file has a mapping for this domain name; if so, that IP address is used
- If not, it checks whether the operating system's DNS resolver cache has a mapping for this domain name, and if so, returns it
- If not, it sends a query to the local DNS server
- When the local DNS server receives the query, it first looks in its locally configured zone data and returns the result if found
- If the name is not found there but the server has cached a mapping for it, the server returns the cached result
- If there is no cache entry either, the request is forwarded to upper-level DNS servers. The result is passed back level by level to the local DNS server, which returns it to the client and stores the mapping in its own cache
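The lookup order can be modelled as a chain of tables checked one after another. Everything here is a stand-in: the tables, their contents (192.0.2.10 is a documentation address), and the resolve helper are hypothetical, and the upstream step collapses the whole root/TLD/authoritative walk into one dictionary.

```python
# Hypothetical lookup tables standing in for each stage of the chain.
BROWSER_CACHE: dict = {}
HOSTS_FILE = {"localhost": "127.0.0.1"}
OS_RESOLVER_CACHE: dict = {}
LOCAL_DNS_ZONE: dict = {}       # records the local DNS server is configured with
LOCAL_DNS_CACHE: dict = {}
UPSTREAM_RECORDS = {"www.example.com": "192.0.2.10"}   # stands in for upper-level servers


def resolve(name: str) -> tuple:
    """Return (ip, where_it_was_found), checking the stages in order."""
    stages = [
        ("browser cache", BROWSER_CACHE),
        ("hosts file", HOSTS_FILE),
        ("OS resolver cache", OS_RESOLVER_CACHE),
        ("local DNS server zone", LOCAL_DNS_ZONE),
        ("local DNS server cache", LOCAL_DNS_CACHE),
    ]
    for source, table in stages:
        if name in table:
            return table[name], source
    ip = UPSTREAM_RECORDS.get(name)
    if ip is None:
        raise LookupError(f"cannot resolve {name}")
    LOCAL_DNS_CACHE[name] = ip      # the local DNS server caches the answer
    return ip, "upstream DNS servers"
```

A second lookup of the same name is then answered from the local DNS server's cache instead of going upstream.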
Long connections & short connections
- In HTTP/1.0, short connections are used by default: every time the browser and the server perform an HTTP operation, a connection is established, and it is closed when the task ends. If a web page accessed by the browser contains other resources, such as JavaScript files, images, and CSS files, the browser opens a new connection (a new HTTP session) for each of them.
- Starting with HTTP/1.1, long (persistent) connections are used by default. With persistent connections, the response header typically contains the line Connection: keep-alive
- With a persistent connection, the TCP connection used to transfer HTTP data between the client and the server is not closed after a page has been opened; if the client accesses a page on the same server again, it reuses the established connection. Keep-Alive does not keep a connection open forever: it has a timeout, which can be configured in the server software (such as Apache). Both the client and the server must support persistent connections for them to work.
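The default behaviour described above reduces to a small rule on the protocol version and the Connection header. This sketch (is_persistent is a hypothetical name) covers only HTTP/1.0 and HTTP/1.1; HTTP/2 manages connections differently.

```python
def is_persistent(http_version: str, connection_header: str = "") -> bool:
    """Decide whether the connection stays open after this exchange (HTTP/1.0 and 1.1 only)."""
    # Connection may list several comma-separated, case-insensitive tokens.
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens   # 1.0: short connection unless keep-alive is requested
    return "close" not in tokens        # 1.1: persistent unless "Connection: close"
```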
The principle that the page is reset when searching for sensitive words
Under TCP, establishing a connection between the user and the server requires a three-way handshake (notation: flag, seq:ack): first, the user sends a SYN packet to the server to request a connection (SYN, x:0); second, the server replies with a SYN/ACK packet (SYN/ACK, y:x+1); third, the user sends an ACK packet to confirm (ACK, x+1:y+1). At this point the TCP connection is established. Here x is the sequence number sent by the user to the server, and y is the sequence number sent by the server to the user.
Keyword detection: plaintext or weakly encoded content (such as base64) is matched against a prepared list of sensitive words. When a sensitive word is found, the filtering device forges TCP reset (RST) packets to the endpoints, so the TCP connection is reset. The user's browser reports that the connection failed, the user mistakenly believes the server refused the connection and gives up, and the page containing sensitive words is thereby blocked.
IP
IP address classification
Class A: 1-byte (8-bit) network number and 3-byte (24-bit) host number. The first bit of the network number is fixed at 0, and the remaining 7 bits can be used freely. Reserved network numbers: 0 (00000000) means "this network", and 127 (01111111) is used for local loopback testing.
Class B: 2-byte (16-bit) network number and 2-byte (16-bit) host number. The first two bits of the network number are fixed at 10, and the remaining 14 bits can be used freely.
Class C: 3-byte (24-bit) network number and 1-byte (8-bit) host number. The first three bits are fixed at 110, and the remaining 21 bits are available.
Class D: the first four bits are fixed at 1110; used for multicast.
Class E: the first four bits are fixed at 1111; reserved for experimental use (240.0.0.0 to 255.255.255.255).
An IP address whose host number is all 0s denotes the network itself, i.e. the single network to which "this host" is connected.
An IP address whose host number is all 1s is the broadcast address, i.e. all hosts on that network.
Class A addresses range from 0.0.0.0 to 127.255.255.255 (network numbers 0 and 127 are reserved, so usable networks are 1 to 126), and the default network mask is 255.0.0.0. Class A addresses are assigned to large networks.
Class B addresses range from 128.0.0.0 to 191.255.255.255, the default network mask is 255.255.0.0, and Class B addresses are assigned to medium-sized networks.
Class C addresses range from 192.0.0.0 to 223.255.255.255, the default network mask is 255.255.255.0, and Class C addresses are assigned to small networks such as LANs.
Class D addresses (224.0.0.0 to 239.255.255.255) are multicast addresses, used by special protocols to send information to a selected group of nodes.
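Because the class is fixed by the leading bits of the first byte, it can be computed from the address alone. The sketch below uses Python's ipaddress module; ipv4_class and classful_network are hypothetical helpers, and the classful scheme is historical (today's networks use CIDR prefixes instead).

```python
import ipaddress


def ipv4_class(address: str) -> str:
    """Return the classful category (A-E) of an IPv4 address from its leading bits."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:    # 0xxxxxxx
        return "A"
    if first_octet < 192:    # 10xxxxxx
        return "B"
    if first_octet < 224:    # 110xxxxx
        return "C"
    if first_octet < 240:    # 1110xxxx, multicast
        return "D"
    return "E"               # 1111xxxx, reserved


def classful_network(address: str) -> ipaddress.IPv4Network:
    """Network containing the address under the default mask of class A, B, or C."""
    prefix = {"A": 8, "B": 16, "C": 24}[ipv4_class(address)]  # KeyError for D and E
    return ipaddress.IPv4Network(f"{address}/{prefix}", strict=False)
```

For 192.168.1.20 this gives the network 192.168.1.0 (host bits all 0) with broadcast address 192.168.1.255 (host bits all 1).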
Transition mechanisms from IPv4 to IPv6
The transition from IPv4 to IPv6 is gradual: while users enjoy the benefits of IPv6, they can still communicate with IPv4 users on the network.
Mainstream technology:
- Dual stack (the most direct approach): add an IPv4 protocol stack to IPv6 nodes. Nodes with both stacks are called "IPv6/IPv4 nodes"; they use IPv4 to communicate with IPv4 nodes and IPv6 to communicate directly with IPv6 nodes.
- Tunneling: solves the problem of isolated "islands" of pure IPv6 networks separated by the IPv4 backbone. Tunnels across the existing IPv4 Internet connect the islands, gradually expanding IPv6 deployment. At the tunnel entrance, the router encapsulates the IPv6 packet inside an IPv4 packet whose source and destination addresses are the IPv4 addresses of the tunnel entrance and exit. At the tunnel exit, the IPv6 packet is extracted and forwarded to the destination node.
  In practice there are four forms of tunneling: configured tunnels, automatically configured tunnels, multicast tunnels, and 6to4.
- Tunnel Broker (TB): aims to simplify tunnel configuration by providing automatic configuration. A TB can be regarded as a virtual IPv6 ISP that lets users connected to the IPv4 network reach the IPv6 network; those IPv4-connected users are the TB's customers.
- Protocol translation: communication between IPv6 and IPv4 nodes goes through an intermediate protocol translation server, whose main function is to translate network-layer protocol headers between IPv6 and IPv4 to match the peer's protocol.
- SOCKS64: consists of two parts. One is a SOCKS library introduced on the client, which sits between the application layer and the socket layer and replaces the application's socket API and DNS resolution API. The other is a SOCKS gateway.
- Transport-layer relay: works like SOCKS64, except that the protocol translation is performed at the transport layer on the relay.
- Application Layer Proxy Gateway (ALG): similar, but protocol translation is performed at the application layer.
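As an example of the automatic tunnel forms, 6to4 derives a site's IPv6 prefix from its public IPv4 address: the prefix 2002::/16 followed by the 32 bits of the IPv4 address gives a /48 (RFC 3056). The sketch below builds that prefix with Python's ipaddress module, whose IPv6Address.sixtofour property recovers the embedded IPv4 address. sixtofour_prefix is a hypothetical name, and 192.0.2.4 is a documentation address used as a placeholder.

```python
import ipaddress


def sixtofour_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Build the 6to4 /48 prefix 2002:WWXX:YYZZ::/48 for an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # Bits 127-112 hold the 6to4 prefix 2002; bits 111-80 hold the 32-bit IPv4 address.
    network_int = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((network_int, 48))
```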
ARP protocol
Address Resolution Protocol (ARP) maps an IP address to a MAC address, i.e. it asks for the MAC address corresponding to a target IP address.
- The host checks its cache (its ARP table) for an entry mapping this IP address to a MAC address.
- If there is one, it sends the data directly. If not, it broadcasts an ARP request to all hosts on the local network segment, containing its own IP and MAC addresses and asking which host has the target IP address and what its MAC address is.
- Every other host on the segment compares the requested IP address with its own. The host that matches extracts the source host's IP and MAC addresses from the packet, writes them into its own ARP table, puts its own MAC address into a reply, and sends the reply back to the source host.
- After receiving the ARP reply, the source host can use this information to send data.
Why ARP is needed: OSI divides the network into 7 layers, and layers do not communicate directly except through specific interfaces. IP works at layer 3 (the network layer), while MAC addresses work at layer 2 (the data link layer). Sending a packet requires both the IP address and the MAC address, but the sender only knows the IP address and cannot look up the MAC address directly across layers, so it uses ARP to obtain the MAC address of the destination node.
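The request/reply exchange can be simulated with a few in-memory hosts on one segment. This is a simplified model: the Host class and resolve_mac are hypothetical, the "broadcast" is a loop over a list, and details such as cache expiry are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    ip: str
    mac: str
    arp_table: dict = field(default_factory=dict)   # IP -> MAC cache

    def on_arp_request(self, sender_ip: str, sender_mac: str, target_ip: str):
        """Handle a broadcast ARP request; reply with our MAC only if we own target_ip."""
        if target_ip != self.ip:
            return None                               # not for us: ignore
        self.arp_table[sender_ip] = sender_mac        # learn the requester's mapping
        return self.mac                               # reply (unicast) with our MAC


def resolve_mac(source: Host, target_ip: str, segment: list):
    """Return the MAC for target_ip, using the cache first and a broadcast otherwise."""
    if target_ip in source.arp_table:                 # step 1: cache hit
        return source.arp_table[target_ip]
    for host in segment:                              # step 2: every host sees the broadcast
        if host is source:
            continue
        reply = host.on_arp_request(source.ip, source.mac, target_ip)
        if reply is not None:                         # step 3: the owner replies
            source.arp_table[target_ip] = reply       # step 4: cache it and use it
            return reply
    return None                                       # nobody owns that IP on this segment
```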
What happens after you enter a URL in the browser and press Enter
URL (Uniform Resource Locator): simply put, URL = protocol + IP address or domain name + port number + resource path + parameters + anchor
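Python's urllib.parse.urlsplit splits a URL into exactly these pieces. The sketch below (url_parts is a hypothetical helper) also fills in the default port for http and https when the URL omits it.

```python
from urllib.parse import parse_qs, urlsplit


def url_parts(url: str) -> dict:
    """Break a URL into protocol, host, port, path, parameters, and anchor."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,                   # IP address or domain name
        "port": parts.port or default_port,       # explicit port, else the scheme default
        "path": parts.path,                       # resource location
        "params": parse_qs(parts.query),          # query parameters
        "anchor": parts.fragment,                 # the browser does not send this to the server
    }
```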
1. After a URL is entered, the browser first looks up the IP address of its domain name via DNS (querying DNS servers level by level until the IP address for the domain name is found).
2. Once it has the target server's IP address and port number (by default, port 80 for HTTP and 443 for HTTPS), the browser calls the system's socket library functions to request a TCP stream socket, and the client sends an HTTP request message to the server:
(1) Application layer: the client produces the HTTP request message.
(2) Transport layer: (adds source and destination ports) before any data is actually sent, the client and server establish a TCP connection with the three-way handshake.
(3) Network layer: (adds the IP header) routing and addressing.
(4) Data link layer: (adds the frame header) transmits data over each link.
(5) Physical layer: transmits the bits physically.
3. On the server, the request travels up through the physical layer → data link layer → network layer → transport layer → application layer; the server parses the request message and sends back an HTTP response message.
4. The connection is closed with TCP's four-way wave (unless it is kept alive for reuse).
5. The client parses the HTTP response message, and the browser starts rendering the HTML.
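The layer-by-layer wrapping in step 2 and the unwrapping in step 3 can be sketched as follows. The headers are toy, human-readable byte strings with made-up addresses and ports, not real protocol formats, and the link-layer trailer is omitted.

```python
# Toy, human-readable headers with made-up values; real headers are binary structures.
LAYERS = [
    ("transport", b"[TCP src_port=50000 dst_port=80]"),
    ("network", b"[IP src=192.0.2.1 dst=198.51.100.7]"),
    ("data link", b"[ETH dst=router-mac]"),
]


def encapsulate(http_message: bytes) -> bytes:
    """Going down the sender's stack: each layer prepends its own header."""
    data = http_message
    for _name, header in LAYERS:
        data = header + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Going up the receiver's stack: each layer strips its header, outermost first."""
    data = frame
    for name, header in reversed(LAYERS):
        if not data.startswith(header):
            raise ValueError(f"unexpected {name} header")
        data = data[len(header):]
    return data
```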
Causes of web page lag
Slow network speed, insufficient bandwidth, low hardware configuration, and full memory.
The JS script is too large, blocking the loading of the page.
Too many web resources, so receiving the data takes a long time, or a single resource that loads slowly.
Slow DNS resolution.
how to check
Hardware problems: check whether the network cable or wireless adapter is plugged in and whether the machine is connected to the router, i.e. whether the lower layers are connected;
Software problems: check whether the right drivers are installed, whether the server is working, whether DNS is configured correctly, and whether a proxy has been left on
When a web page loads slowly, how to analyze its cause and solve the problem?
Too many HTTP requests
Too many or too large resources
JS script is too large
slow internet
…
In what order is a domain name resolved?
Domain name hierarchy: from right to left come the top-level domain, the second-level domain, ..., and the leftmost label is the host name (server name). For example, com in www.baidu.com is the top-level domain; in email.tsinghua.edu.cn, cn is the top-level domain (China's country domain), edu is the domain of education and research institutions, and email is the server name.
During resolution, the matching subdomain is looked up first. If the subdomain exists, the result is taken from the subdomain's configuration file; if it does not, the result is taken from the configuration file of the parent domain.
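The right-to-left hierarchy and the fall-back-to-parent rule can be expressed as a short sketch. The names zone_chain and find_zone are hypothetical, and configured_zones is a stand-in for the set of configuration files a server has.

```python
def zone_chain(domain: str) -> list:
    """List the domain's suffixes from the top-level domain down to the full name."""
    labels = domain.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def find_zone(domain: str, configured_zones: set):
    """Most specific configured zone containing domain, falling back to parent domains."""
    for zone in reversed(zone_chain(domain)):   # most specific first
        if zone in configured_zones:
            return zone
    return None
```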
TCP, UDP
The process of three-way handshake of TCP connection
Initial state client CLOSED, server LISTEN
- Client A sends a SYN packet (SYN, x:0) to server B to request a connection and enters SYN_SENT, meaning it has sent a SYN message.
- Server B receives it and replies with a SYN/ACK packet (SYN/ACK, y:x+1). The server moves from LISTEN (the server socket is listening and can accept connections) to SYN_RECV, meaning it has received a SYN message.
- Client A receives the SYN/ACK, sends an acknowledgment (ACK, x+1:y+1), and enters ESTABLISHED; when the server receives this ACK, it also enters ESTABLISHED, and the connection is established.
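The sequence and acknowledgment numbers in the three steps follow a fixed pattern, which this small simulation reproduces. It is a sketch only (three_way_handshake is a hypothetical name): no packets are sent, and the initial sequence numbers x and y are random unless fixed for testing. Sequence numbers are 32-bit, so the +1 wraps around.

```python
import random

SEQ_SPACE = 2**32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the (flags, seq, ack) segments exchanged and each side's state history."""
    x = client_isn if client_isn is not None else random.randrange(SEQ_SPACE)
    y = server_isn if server_isn is not None else random.randrange(SEQ_SPACE)
    client_states, server_states = ["CLOSED"], ["LISTEN"]
    segments = []

    # 1. client -> server: SYN, seq = x
    segments.append(("SYN", x, None))
    client_states.append("SYN_SENT")

    # 2. server -> client: SYN/ACK, seq = y, ack = x + 1
    segments.append(("SYN/ACK", y, (x + 1) % SEQ_SPACE))
    server_states.append("SYN_RECV")

    # 3. client -> server: ACK, seq = x + 1, ack = y + 1
    segments.append(("ACK", (x + 1) % SEQ_SPACE, (y + 1) % SEQ_SPACE))
    client_states.append("ESTABLISHED")   # after sending the ACK
    server_states.append("ESTABLISHED")   # after receiving the ACK

    return segments, client_states, server_states
```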
The process of TCP waving four times
Initial state both sides ESTABLISHED
- Client A sends a FIN to close the data transfer from A to server B and enters FIN_WAIT_1: it is actively closing the connection, has sent a FIN to the other party, and is waiting for its acknowledgment.
- Server B receives the FIN, sends back an ACK whose acknowledgment number is the received sequence number plus 1, and enters CLOSE_WAIT. On receiving this ACK, the client enters FIN_WAIT_2: the connection is half-closed, and the server may still have data to send before it closes its side.
- When server B has finished, it closes its side of the connection, sends a FIN to client A, and enters LAST_ACK, waiting for the final ACK.
- Client A sends back an ACK whose acknowledgment number is the received sequence number plus 1 and enters TIME_WAIT, meaning it has received the other party's FIN and sent the ACK. It returns to the CLOSED state after waiting 2MSL (twice the maximum segment lifetime). The server enters CLOSED as soon as it receives this ACK.
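Why TIME_WAIT is needed
The final ACK sent by the client may be lost. In that case the server, still in LAST_ACK, times out and retransmits its FIN; TIME_WAIT keeps the client's side of the connection alive long enough to receive that FIN and resend the lost ACK. Waiting 2MSL also lets any delayed segments from this connection expire in the network, so they cannot be mistaken for segments of a new connection that reuses the same addresses and ports.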
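The states of both sides during the four-way wave, including the retransmission case that TIME_WAIT exists for, can be written as two transition tables. This is a simplified model with hypothetical event names; the full TCP state machine (RFC 9293) also covers simultaneous close and other cases.

```python
# Transition tables: (current state, event) -> next state.
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    # A retransmitted FIN (our ACK was lost) is answered again; stay in TIME_WAIT.
    ("TIME_WAIT", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}
SERVER_TRANSITIONS = {
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    # No ACK arrived in time: resend the FIN and keep waiting.
    ("LAST_ACK", "timeout, resend FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(transitions: dict, events: list, state: str = "ESTABLISHED") -> list:
    """Apply events in order and return the list of states visited."""
    history = [state]
    for event in events:
        state = transitions[(state, event)]   # KeyError: event not valid in this state
        history.append(state)
    return history
```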
Characteristics of TCP and UDP
TCP connection-oriented, UDP connectionless
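TCP provides reliable delivery (no loss, no duplication, in order); UDP provides best-effort delivery with no such guarantee
TCP is point-to-point (one-to-one only); UDP supports one-to-one, one-to-many, many-to-one, and many-to-many communication
TCP is byte-stream-oriented, while UDP is message-oriented; TCP has congestion control, UDP has none
TCP overhead is large (a header of at least 20 bytes), UDP overhead is small (an 8-byte header)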
Application scenarios of TCP and UDP
UDP usage scenarios: DNS (because UDP is fast), video streaming, voice calls, QQ chat, screen broadcasting in multimedia classrooms
TCP usage scenarios: HTTP, QQ file transfer, email, login
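The difference in programming model shows up directly in Python's socket API. This sketch runs both sides in one process over the loopback interface with OS-chosen ports (the helper names are hypothetical): UDP simply sends a datagram, while TCP first connects (the kernel performs the three-way handshake) and then reads a byte stream until the peer's FIN.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram over loopback: no handshake, message boundaries preserved."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(message, receiver.getsockname())  # no connection setup
        data, _addr = receiver.recvfrom(65535)               # one recvfrom = one datagram
        return data


def tcp_roundtrip(message: bytes) -> bytes:
    """Send bytes over a loopback TCP connection and read the stream until EOF."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # create_connection returns once the kernel has completed the three-way handshake.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)   # send FIN: no more data from the client
                chunks = []
                while chunk := conn.recv(4096):   # a byte stream has no message boundaries
                    chunks.append(chunk)
                return b"".join(chunks)
```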
How does TCP achieve reliable transmission
Acknowledgment and retransmission: the receiver acknowledges what it has received (including during connection setup); if a segment fails its checksum in transit, is lost, or its acknowledgment does not arrive in time, the sender retransmits it
Sequencing: data is split into segments numbered with sequence numbers, so the receiver can put them back in order and discard duplicates
Flow Control: Sliding Windows and Timers
Congestion control: slow start, congestion avoidance, fast retransmission, fast recovery
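The checksum verification in the first point uses TCP's 16-bit checksum, the one's-complement sum described in RFC 1071. This sketch computes it over a byte string; the real TCP checksum also covers a pseudo-header with the IP addresses, which is left out here, and internet_checksum is a hypothetical name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]         # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)      # fold carries back in
    return ~total & 0xFFFF                            # one's complement of the sum
```

The receiver runs the same computation over the data plus the transmitted checksum; a result of 0 means no error was detected, anything else means the segment is discarded.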
flow control
Flow control acts for the receiver: it limits how fast the sender sends so that the receiver can keep up, preventing packet loss.
Implemented by sliding window
sliding window
TCP performs flow control with a sliding window: the receiver tells the sender its current window size (its free buffer space), and the sender never has more unacknowledged data outstanding than that window, so a fast sender cannot overwhelm the receiver.
timer
When the sender receives a window size of 0, it starts a timer (the persist timer). When the timer expires, it sends a probe segment asking for the current window size. This prevents deadlock: if the receiver's later segment announcing a non-zero window were lost, both sides would otherwise wait for each other forever.
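The sliding window and the zero-window timer can be simulated together, one round trip per step. In this sketch the function name transfer, the receiver's buffer size, and the application's read pattern are hypothetical inputs, and a zero-window probe is logged instead of actually being sent.

```python
def transfer(total_bytes: int, buffer_size: int, reads_per_round: list) -> tuple:
    """Simulate receiver-window flow control, one round trip per loop iteration.

    reads_per_round: how many bytes the receiving application consumes from
    its buffer in each round (a hypothetical workload).
    """
    buffered = 0   # bytes waiting in the receiver's buffer
    sent = 0
    log = []
    for read in reads_per_round:
        rwnd = buffer_size - buffered           # window advertised by the receiver
        if rwnd == 0:
            log.append(("probe", 0))            # persist timer fired: zero-window probe
        else:
            n = min(rwnd, total_bytes - sent)   # never send more than the window allows
            sent += n
            buffered += n
            log.append(("send", n))
            if sent == total_bytes:
                break
        buffered -= min(read, buffered)         # application reads, freeing window space
    return log, sent
```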
congestion control
Congestion control acts on the network: it prevents too much data from being injected into the network, so that routers and links are not overloaded.
Congestion: the demand for a resource in the network exceeds what that resource can provide, degrading network performance.
Congestion window
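The congestion window (cwnd) is maintained by the sender for congestion control. Because the receiver's capacity must also be respected, the actual send window is the smaller of cwnd and the receiver's advertised window, so it may be smaller than the congestion window.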
Slow start and congestion avoidance
Slow start: do not send a large amount of data at the beginning; first probe how congested the network is by gradually increasing the congestion window from small to large.
The congestion window starts at 1 (segment) and grows by 1 for every acknowledgment received, so it doubles every round-trip time. When it reaches the slow start threshold ssthresh (16 in this example), it grows additively instead, by 1 per round trip, until congestion occurs. When congestion occurs, ssthresh is set to half of the congestion window at that moment, the congestion window is reset to 1, and the process repeats; at that moment the amount of data sent into the network drops sharply.
Congestion avoidance: The congestion avoidance algorithm makes the congestion window grow slowly, that is, the sender's congestion window cwnd is increased by 1 instead of doubled every time a round-trip time RTT passes.
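The growth rule can be traced round by round. This sketch (cwnd_trace is a hypothetical name) counts the window in segments, treats congestion as a timeout in given rounds, and caps the doubling at ssthresh, a common textbook simplification.

```python
def cwnd_trace(rounds: int, ssthresh: int = 16, loss_rounds=frozenset()) -> list:
    """Congestion window (in segments) at the start of each RTT.

    loss_rounds: hypothetical RTT indices in which a timeout is detected.
    """
    cwnd = 1
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)     # new threshold: half the window at congestion
            cwnd = 1                         # restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: +1 per ACK, i.e. doubling per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 per RTT
    return trace
```

With ssthresh = 16 and a timeout when the window is 24, the window grows 1, 2, 4, 8, 16, then 17 to 24, drops back to 1, and slow start resumes toward the new threshold of 12.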
Fast retransmission and fast recovery
Fast retransmission: every time the receiver receives an out-of-order segment (for example, it receives 4 right after 2, which means 3 is missing), it immediately sends a duplicate acknowledgment for 2, so that the sender learns of the loss as early as possible.
After receiving three duplicate acknowledgments, the sender immediately retransmits segment 3 instead of waiting for its retransmission timer to expire.
Fast recovery: when the sender receives three duplicate acknowledgments, it halves the slow start threshold (sets ssthresh to half the current congestion window), sets the congestion window to this new ssthresh instead of 1, and then runs the congestion avoidance algorithm, adding 1 per round-trip time.
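Duplicate-ACK detection and the window adjustment can be sketched as follows. The name fast_retransmit is hypothetical, and ACK values follow the numbering used above (ACK n means segments up to n have arrived); real TCP acknowledgment numbers name the next byte expected instead.

```python
def fast_retransmit(acks: list, cwnd: int, ssthresh: int) -> tuple:
    """Scan ACKs; on the third duplicate, retransmit and apply fast recovery.

    Returns (segments retransmitted, cwnd, ssthresh).
    """
    last_ack = None
    duplicates = 0
    retransmitted = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == 3:                   # third duplicate ACK
                retransmitted.append(ack + 1)     # resend the missing segment now
                ssthresh = max(cwnd // 2, 2)      # halve the threshold
                cwnd = ssthresh                   # fast recovery: no drop to 1
        else:
            last_ack, duplicates = ack, 0         # new data acknowledged
    return retransmitted, cwnd, ssthresh
```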