[Linux] Network Basics (3)

Foreword

        Hi~ hello everyone, welcome to part 3 of my network basics study notes~

Links to parts 1 and 2 of my network notes~

[Linux] Network Basics (1)

[Linux] Network Basics (2)

        These notes cover the network-layer IP protocol of the TCP/IP stack: how IP network segments are divided, what public and private networks are, the limitations of IPv4, and where the network layer fits. We then move on to the data link layer to see how data is actually transmitted hop by hop along a route, and what role the Ethernet data-link protocol plays in a local area network. Finally, we summarize the TCP/IP stack to build an overall picture of the whole network system.

Table of contents

1. Network layer

1. Protocol header format

MTU

2.IP address

1. Subnet mask

2. Special IP address

3. Private IP and public IP 

Private IP range

4. NAT technology

The conversion process of NAT ip:

NAPT

Defects of NAT Technology

5. Routing

2. Data link layer

1. Ethernet frame format

Effect of MTU

2.ARP protocol

ARP spoofing

3. Other important protocol technologies

1.DNS

2. NAT and proxy server

forward proxy server

reverse proxy server

3. ICMP protocol

4. Summary

data link layer

Network layer

transport layer

application layer


1. Network layer

        First, let's review what we have covered so far in the TCP/IP protocol stack:

        At the application layer, we mainly discussed how data is used and how its format is standardized.

        The transport layer, network layer, and data link layer handle the control and mechanics of the actual data transfer, so that data is sent reliably from the source host to the destination host across the network.

        Among them, the network layer is responsible for choosing a suitable path through a complex network environment.

        The relationship between the transport layer and the network layer is like that between a boss and employees: reliability is decided by the transport layer (TCP), while the ability to actually deliver data is provided by the IP protocol.

        In the network, we first fix the following definitions:

1. Host: a device with an IP address but no routing control;
2. Router: a device with an IP address and routing control;
3. Node: a collective name for hosts and routers.

        Now let's look at the header that the IP protocol adds, and how it is used to encapsulate, unpack, and deliver data upward.

1. Protocol header format

- 4-bit version: specifies the version of the IP protocol.

        There are IPv4 and IPv6; for IPv4 the value is 4.

- 4-bit header length: specifies the size of the IP datagram header, in units of 4 bytes.

        The largest value 4 bits can hold is 1111 = 15, so the maximum header length is 15 * 4 = 60 bytes.

- 8-bit type of service: a 3-bit precedence field (deprecated), a 4-bit TOS field (minimum delay, maximum throughput, highest reliability, minimum cost; these four conflict with each other, so only one can be selected), and a 1-bit reserved field (must be set to 0).

        It is enough to know that this field exists. For example, for applications such as ssh/telnet, minimum delay matters more; for programs such as ftp, maximum throughput matters more.

- 16-bit total length: how many bytes the whole IP datagram (header plus data) occupies. Unlike the header length, the unit here is 1 byte.

        The header consists of a fixed 20-byte part plus options. The length of the options is therefore (4-bit header length * 4) - 20 bytes. When encapsulating, the data is simply copied in after the options.

        For separation, the total size of the datagram (16-bit total length) and the header size are both known, so the header and the payload can be split apart: payload length (in bytes) = 16-bit total length - 4-bit header length * 4.

- 8-bit time to live (TTL): the maximum number of hops a datagram may take to reach its destination, commonly 64. The value is decremented by 1 at each router; if it reaches 0 before the datagram arrives, the datagram is discarded.

        The TTL exists to stop a packet from being forwarded in a loop forever. (The Internet is huge and the underlying equipment varies, so routing loops and data black holes cannot be ruled out.) Without this limit, such a packet would become garbage that is forwarded endlessly and wastes resources.

- 8-bit protocol: the upper-layer protocol carried in the payload, for example TCP or UDP.

- 16-bit header checksum: a 16-bit ones' complement checksum (not a CRC) used to check whether the header is damaged.

- 32-bit source address and 32-bit destination address: the IP addresses of the sender and the receiver.

- Option field: variable length, skipped here.
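        For reference, RFC 791 defines the IPv4 header checksum as the 16-bit ones' complement of the ones' complement sum of all 16-bit words in the header. Here is a minimal sketch of that calculation. The function name and the sample header bytes are illustrative choices, not something from these notes.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the 16-bit header words."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold any carry bits above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte header with the checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("4500003c1c4640004006" "0000" "ac100a63ac100a0c")
checksum = ipv4_header_checksum(sample)

# The receiver runs the same sum over the header including the checksum;
# an undamaged header gives 0.
filled = sample[:10] + struct.pack("!H", checksum) + sample[12:]
print(hex(checksum), ipv4_header_checksum(filled))
```

        The sender computes the value with the checksum field zeroed, and every router has to recompute it because the TTL changes at each hop.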

        When delivering upward, we can judge the protocol corresponding to the transport layer through the 8-bit protocol, and then remove the header and deliver the payload upward.
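        To make the separation and upward delivery concrete, here is a small sketch that parses the fixed part of an IPv4 header and splits off the payload. The `IPv4Packet` type, the `parse_ipv4` function, and the sample datagram are hypothetical names and data made up for illustration.

```python
import struct
from typing import NamedTuple

class IPv4Packet(NamedTuple):
    version: int
    header_len: int   # in bytes (4-bit header length field * 4)
    total_len: int    # in bytes (16-bit total length field)
    ttl: int
    protocol: int
    src: str
    dst: str
    payload: bytes

# A few well-known values of the 8-bit protocol field.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4(datagram: bytes) -> IPv4Packet:
    (ver_ihl, _tos, total_len, _ident, _flags_off,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", datagram[:20])
    version = ver_ihl >> 4
    header_len = (ver_ihl & 0x0F) * 4          # the unit is 4 bytes
    payload = datagram[header_len:total_len]   # total length is already in bytes
    dotted = lambda raw: ".".join(str(b) for b in raw)
    return IPv4Packet(version, header_len, total_len, ttl, protocol,
                      dotted(src), dotted(dst), payload)

# Illustrative datagram: 20-byte header (no options) + 8 bytes of data.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0, 64, 17, 0,
                     bytes([192, 168, 1, 10]), bytes([192, 168, 1, 20])) + b"UDP-DATA"
pkt = parse_ipv4(sample)
print(pkt.header_len, len(pkt.payload), PROTOCOLS[pkt.protocol])
```

        The protocol field is what tells the network layer which upper-layer protocol should receive the payload.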

MTU

        The network layer has to hand its datagrams down to the link layer. Because of its physical characteristics, the link layer generally cannot carry a frame with too much data in it. The MTU (Maximum Transmission Unit) is the largest payload this data link layer accepts; for Ethernet it is 1500 bytes.

        But if our IP datagram is larger than 1500 bytes, how do we pass it down to the link layer? This is where the three fields introduced next come in.

- 16-bit identification: uniquely identifies a datagram sent by the host.

        If one IP datagram is split into several fragments for sending, every fragment carries the same identification.

- 3-bit flags: the first bit is reserved. The second bit, when set to 1, prohibits fragmentation. The third bit means "more fragments": it is 0 in the last fragment of a datagram and 1 in all the other fragments, indicating that more fragments follow.

        If fragmentation is prohibited and the datagram is longer than the MTU, the datagram is simply discarded.

- 13-bit fragment offset: the offset of this fragment relative to the start of the original datagram, counted in units of 8 bytes.

        First of all, we need to make it clear that the behavior of fragmentation is done by the network layer, and the same assembly behavior is also done by the network layer of the other host, which is transparent to the upper layer. That is to say, the fragmentation and assembly behavior of the IP protocol, TCP does not know and does not care.

        Here, let's briefly explain how fragments are split and reassembled, so that even if an IP datagram exceeds 1500 bytes, the other host still gets a complete IP datagram.

For the receiver network layer, we need to pay attention to the following points:

        1. First of all, it is best not to fragment at the network layer at all. (In most cases there is no fragmentation.)

        2. It must be able to tell different datagrams apart. (The 16-bit identification does this.)

        3. It must recognize whether a datagram is fragmented, and gather the fragments of the same datagram together. (If the third flag bit is 1, the datagram is fragmented. The last fragment has this bit set to 0, but its 13-bit fragment offset is non-zero, which still marks it as a fragment. An unfragmented datagram therefore has both the more-fragments flag and the 13-bit offset equal to 0.)

        4. It must identify which fragment is the first, which are in the middle, and which is the last. (This can be judged from the fragment offset and the more-fragments flag.)

        Note that a fragment's offset + its own size = the offset of the next fragment. Sort the fragments in ascending order of offset and scan them: if an offset does not match, a fragment in between has been lost; if the scan reaches the last fragment without a mismatch, everything has been collected. (This is the reassembly process, and it also detects missing data.)

        5. For exception handling, if any fragment is lost, the receiver must be able to detect this during reassembly.
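        The points above can be sketched as a toy fragmenter and reassembler. This is only an illustration under simplifying assumptions: there are no real headers, one datagram at a time, and the `Fragment`, `fragment`, and `reassemble` names are made up for this sketch.

```python
from typing import List, NamedTuple, Optional

class Fragment(NamedTuple):
    ident: int            # 16-bit identification, same for all fragments
    more_fragments: bool  # third flag bit
    offset: int           # 13-bit fragment offset, in units of 8 bytes
    data: bytes

def fragment(ident: int, payload: bytes, mtu: int = 1500, header_len: int = 20) -> List[Fragment]:
    # Data per fragment must be a multiple of 8 because the offset counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    frags = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)
        frags.append(Fragment(ident, more, start // 8, chunk))
    return frags

def reassemble(frags: List[Fragment]) -> Optional[bytes]:
    """Return the original payload, or None if any fragment is missing."""
    ordered = sorted(frags, key=lambda f: f.offset)
    expected = 0  # byte offset where the next fragment must start
    parts = []
    for f in ordered:
        if f.offset * 8 != expected:
            return None          # a gap: something in the middle (or the first) was lost
        parts.append(f.data)
        expected += len(f.data)
    if not ordered or ordered[-1].more_fragments:
        return None              # the last fragment never arrived
    return b"".join(parts)

payload = bytes(range(256)) * 16          # 4096 bytes of sample data
frags = fragment(7, payload)
print(len(frags), [f.offset for f in frags])
print(reassemble(list(reversed(frags))) == payload)
```

        Dropping any one element of `frags` before calling `reassemble` makes it return `None`, which is the case where the upper layer has to retransmit.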

Fragmentation is strongly discouraged:

        1. The upper layers (transport and application) are unaware of fragmentation and reassembly at the network layer.
        2. Packet loss is probabilistic, and more fragments mean a higher chance that at least one is lost.
        3. If any one fragment is lost, the whole datagram is lost, and the transport layer has to retransmit all of it.

        So, what exactly is an IP address, and how is it divided?

2.IP address

        Ever since the Internet was born, we have needed a way to identify each host on the network so that two hosts can locate each other.

        IPv4 uses 32 bits for an IP address, which gives 2^32 possible addresses. For readability, the 32 bits are split into four 8-bit groups, each written in decimal and separated by dots. This is the commonly used dotted decimal notation. For example: 127.0.0.1 = 01111111 00000000 00000000 00000001.
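        A quick sketch of the conversion between dotted decimal and the underlying 32-bit number (the helper names are made up for illustration):

```python
def dotted_to_int(addr: str) -> int:
    """Pack the four decimal groups into one 32-bit integer, high byte first."""
    value = 0
    for part in addr.split("."):
        value = (value << 8) | int(part)
    return value

def int_to_dotted(value: int) -> str:
    """Split a 32-bit integer back into four decimal groups."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def to_binary(addr: str) -> str:
    """Show each group as 8 binary digits."""
    return " ".join(f"{int(part):08b}" for part in addr.split("."))

print(to_binary("127.0.0.1"))
print(dotted_to_int("127.0.0.1"))
print(int_to_dotted(dotted_to_int("127.0.0.1")))
```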

        An IP address is divided into two parts: network number + host number.

Network number: identifies the network segment. Hosts in the same segment share it, and two interconnected segments must have different network numbers;

Host number: within one network segment, hosts have the same network number but must have different host numbers;

1. A subnet is simply a group of hosts that share the same network number.
2. If a new host is added to a subnet, its network number is the same as the subnet's, but its host number must not duplicate that of any other host in the subnet.

3. Managing the IP addresses in a subnet by hand is tedious. A technology called DHCP can automatically assign IP addresses to new hosts in the subnet. Home routers generally have DHCP built in, so the router can also be regarded as a DHCP server.

        In the early days, when running out of addresses was not a concern, all IP addresses were divided into five classes: A, B, C, D, and E. The network number is taken from the high-order bits: 8 bits for class A, 16 bits for class B, and 24 bits for class C.

        However, as the Internet grew, the drawback of this scheme became clear: it wastes a large number of IP addresses. And IPv4 does not have enough addresses in the first place to support the huge growth in Internet users.

        A new division scheme was proposed for this situation, called CIDR (Classless Inter-Domain Routing).

1. Subnet mask

        CIDR introduces an additional subnet mask to separate the network number from the host number.

        Like the IP address, the subnet mask is a 32-bit number. It consists of a run of 1s followed by a run of 0s. Performing a bitwise AND of the IP address and the subnet mask yields the network number.

        This division into network number and host number has nothing to do with whether the address is class A, class B, and so on.

        When a router receives an IP packet, it extracts the destination IP and ANDs it with a subnet mask to obtain the destination network the packet is headed for. If that network is directly connected to the router, the packet is sent there directly, with no need to hand it to another router for further forwarding.

         There is also a more concise way to write an IP address together with its subnet mask, for example 140.252.20.68/24. The 24 means that the first 24 bits of the subnet mask, counting from the high-order end, are 1, which describes the entire mask (255.255.255.0).
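        The bitwise AND can be tried out with Python's standard `ipaddress` module. This is just a sketch using the 140.252.20.68/24 example; the variable names are my own.

```python
import ipaddress

iface = ipaddress.ip_interface("140.252.20.68/24")
mask = iface.netmask                       # /24 expands to 255.255.255.0

# IP address AND subnet mask = network number
network_number = ipaddress.ip_address(int(iface.ip) & int(mask))

# IP address AND inverted mask = host number
host_number = int(iface.ip) & int(iface.network.hostmask)

print(mask, network_number, host_number)
print(iface.network)                       # the library reaches the same result
```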

2. Special IP address

1. Setting all host bits of an IP address to 0 gives the network number, which represents the LAN itself;
2. Setting all host bits of an IP address to 1 gives the broadcast address, which is used to send data packets to all hosts on the same link;
3. Addresses of the form 127.* are used for local loopback tests, usually 127.0.0.1;
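        These special addresses can be checked with the standard `ipaddress` module. The subnet 192.168.1.0/24 below is just an example picked for illustration.

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")   # example subnet, chosen arbitrarily

print(net.network_address)     # host bits all 0: the network number
print(net.broadcast_address)   # host bits all 1: the broadcast address
print(net.num_addresses - 2)   # addresses left over for actual hosts

# Everything in 127.* is loopback, not only 127.0.0.1.
print(ipaddress.ip_address("127.0.0.1").is_loopback)
print(ipaddress.ip_address("127.8.9.10").is_loopback)
```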

        The following figure shows how a special IP address can be told apart from a normal one (the loopback device):

3. Private IP and public IP 

        However, although CIDR has eased the address shortage to some extent (by improving utilization), the absolute upper limit on the number of IP addresses has not increased. IPv4 has only about 4.3 billion addresses, so other methods are still needed to solve the shortage!

Solutions for insufficient IP addresses:

        Dynamically assign IP addresses ;

        Mainstream: NAT technology ;

       IPV6

        We focus on NAT technology here. Before introducing NAT technology, we first need to briefly understand private IP and public IP, which are often referred to as public network and private network.

        First of all, we can start with our lives.

1. If you want wireless Internet at home, what has to happen first?

        a. An operator exists and has network coverage near your home.
        b. Your family contacts the operator to have fiber brought into the house.
        c. A technician comes to the door and installs the modem and the wireless router.
        d. An account is opened, and the router is configured with the account and password (the operator uses them to authenticate you).
        e. The router's WiFi name and password are set (the router uses them to authenticate devices).
        f. You can now go online normally and pay monthly or annually.

2. We use mobile data to run apps on our phones, so why is it the operator we end up paying?

        The infrastructure is laid by the operators.
        To access any network resource, traffic must first pass through the operator.

3. Preliminary understanding: why can some foreign websites not be visited?
        Again, the operator.
        Both your mobile phone and your home router have accounts.
            - Authentication: if the balance is sufficient, traffic is let through; if not, it is discarded!
            The operator can also mark an account as in arrears and block its traffic.
            Accounts also serve to check legitimacy and identity.
        Blocking works the same way: when a packet to a blocked foreign IP is detected, it is simply discarded.

        First of all, a router naturally builds a local area network (subnet). As long as a data packet has not gone out to the public network, private IPs are used. (A private IP is only meaningful locally, so the same address can be reused in different subnets. This greatly eases the IP shortage.) Routers and base stations are set up for us by the operators, and the operator can be thought of as one such organization.

        If an organization builds a local area network whose IP addresses are only used inside the LAN and are not directly connected to the Internet, in theory any IP address could be used. However, RFC 1918 specifies the private IP addresses to be used for building local area networks:

Private IP range

10.*: the first 8 bits are the network number, 16,777,216 addresses in total;
172.16.* to 172.31.*: the first 12 bits are the network number, 1,048,576 addresses in total;
192.168.*: the first 16 bits are the network number, 65,536 addresses in total.
Addresses in these ranges are private IPs; all the others are called global IPs (or public IPs).
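        The three ranges and their sizes can be double-checked with a few lines of Python. `PRIVATE_BLOCKS` and `is_rfc1918` are names invented for this sketch.

```python
import ipaddress

# The three RFC 1918 blocks, written in CIDR form.
PRIVATE_BLOCKS = [ipaddress.ip_network(cidr)
                  for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the three private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in PRIVATE_BLOCKS)

for block in PRIVATE_BLOCKS:
    print(block, block.num_addresses)

print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.1"), is_rfc1918("163.221.120.9"))
```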

        The following is a simple diagram of a network communication process:

        From the above figure, our simple analysis is not difficult to get:

        A router can be configured with two IP addresses: the WAN port IP and the LAN port IP (the subnet IP).
        The hosts connected to a router's LAN port all belong to that router's subnet.
        Different routers often have the same subnet IP (usually 192.168.1.1). Host IP addresses must not repeat within a subnet, but they can repeat across different subnets.
        Every home router is in turn a node in the subnet of an operator's router. There may be many levels of operator routers; the WAN port IP of the outermost one is a public IP.

        When a host in a subnet needs to communicate with the outside network, the router replaces the source IP address in the IP header with its WAN port IP. This replacement happens level by level until the source address in the data packet is a public IP. This technology is called NAT (Network Address Translation).
        If we want our own server program to be reachable from the public network, we need to deploy it on a server with a public IP. Such servers can be purchased on Alibaba Cloud or Tencent Cloud.

4. NAT technology

        As a packet is forwarded through nodes of different intranets at different levels, its source IP address keeps being replaced. The technology that does this replacement is NAT.

        NAT is currently an important way of dealing with the IP address shortage, and it is also an important function of routers.

        Take the network communication diagram above: host A sends data to a public IP, and its source address is translated level by level along the route. But host A has a private IP, so how does the target host's reply find its way back to host A? Answering this question is exactly why we study NAT.

The conversion process of NAT ip:

        On the way out: the router replaces the source IP with its own WAN IP and records a mapping between the subnet IP and the replacement IP. (The router that finally sits on the public network is the operator's router; routers along the way replace the address with their own IP in the upper-level subnet.) After that, normal routing and forwarding take over.

        On the way back: the router replaces the destination address with the private IP that was mapped to it earlier, and sends the packet on to that host.

        For example, in the figure above, client A communicates with the server at the global IP 163.221.120.9. The source IP address is replaced (level by level) with a public IP of the NAT router, 202.244.174.37, and the packet is then sent to the server. Since it is now on the public network, normal route forwarding is enough. When the server replies to client A, it simply sends to the public IP of the NAT router. The router then looks up its translation table and forwards the reply to the corresponding client.

        This raises a question: if two clients in the NAT router's private network send to the same server (or to any servers at all), how does the router tell the two clients apart, and where is the mapping stored? This is where the NAPT translation table comes in.

NAPT

         In the translation table, each mapping is a 4-tuple {source LAN IP:port, destination IP:port, source WAN IP:port, destination IP:port}.

         So during NAT replacement, different ports on the public-side address are used to distinguish different hosts in the subnet, according to the NAPT mapping table. This also means that NAT has to reach up into the transport layer and use port numbers to build its translation table.

        1. During source address translation, the router may replace not only the source IP but also the source port when necessary! (When two hosts on the internal network use the same port number, the router gives them different ports on the external side.)

        2. Besides the replacement itself, the router also builds a mapping entry based on the addresses and ports of the outgoing packet.

        3. The source IP identifies a unique host, the source port identifies a unique process on that host, and so source IP + source port identifies a unique process in the network. (This holds on both the internal and the external side.)
        4. Both the internal side and the external side of an entry are unique, so each can serve as the key for looking up the other.
        5. If a host has never contacted the external network, can the external network reach it directly? In theory no, because there is no mapping entry for NAT to translate with.
        6. However, there is plenty of software built on the NAT principle that lets us reach the internal network from outside: this is intranet penetration (NAT traversal).
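        Here is a toy model of such a translation table, just to show the two-way lookup. It is not how a real router is implemented: the `NaptRouter` class, the private addresses 10.0.0.10 and 10.0.0.11, and the starting port 40000 are all assumptions made for this sketch. Only the server IP and the WAN IP come from the example above.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP, port)

class NaptRouter:
    def __init__(self, wan_ip: str, first_port: int = 40000):
        self.wan_ip = wan_ip
        self.next_port = first_port
        # Each mapping is stored in both directions so either side can be the key.
        self.out_map: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}
        self.in_map: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (src, dst)
        if key not in self.out_map:
            wan = (self.wan_ip, self.next_port)  # a fresh WAN port per LAN endpoint
            self.next_port += 1
            self.out_map[key] = wan
            self.in_map[(wan, dst)] = src
        return self.out_map[key], dst

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Endpoint]:
        """Find the LAN endpoint for a reply, or None if no mapping exists."""
        return self.in_map.get((dst, src))

router = NaptRouter("202.244.174.37")
server = ("163.221.120.9", 80)

# Two clients that happen to use the same source port.
a_wan, _ = router.outbound(("10.0.0.10", 1025), server)
b_wan, _ = router.outbound(("10.0.0.11", 1025), server)
print(a_wan, b_wan)
print(router.inbound(server, a_wan), router.inbound(server, b_wan))
print(router.inbound(server, ("202.244.174.37", 50000)))  # never mapped
```

        The last lookup returns `None`, which matches point 5: without an existing entry, a packet from outside has nowhere to go.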

Defects of NAT Technology

        1. A connection cannot be initiated directly from the outside to a server on the inside. (There are workarounds.)

        2. Creating and expiring translation table entries adds overhead.

        3. If the NAT device fails, all TCP connections through it are broken, even if there is a hot standby (a backup device that takes over the external traffic);

        So, leaving aside NAT's translation between private and public networks, how does host A find host B by IP on the real public network (where both have public IPs)? That is the routing and forwarding we look at next.

5. Routing

        When two hosts are communicating on the network, we need to find a route to the destination in the complex network structure.

        Routing works hop by hop: the packet moves from one LAN to the next, one hop at a time.

        A hop is one segment at the data link layer. In Ethernet specifically, it is the transmission of a frame from a source MAC address to a destination MAC address.

         The transmission of an IP datagram is a lot like asking for directions.

        A router checks the destination IP of a data packet and decides whether to deliver it directly to the target host or to send it on to the next router. If no specific route matches, the packet goes to the default router.

        In other words, a packet leaving an intranet is first forwarded toward a router with a public IP. Each router checks its routing table to find the next router on the way. When the packet reaches the target subnet, the router there finds that the destination IP belongs to one of its hosts and forwards the packet directly to that host.

        These steps repeat until the destination IP address is reached.

        Find the target network first, and then find the target host.

        All of these decisions are based on the routing table that every node (router or host) maintains.

        So the routing process can be shown in the following figure:

         On Linux (here, my cloud server), the route command shows the routing table:

        So how to determine the next hop process according to the routing table?

        Assume that the network interface configuration and routing table on a host are as follows:

         This host has two network interfaces, attached to the networks 192.168.10.0/24 and 192.168.56.0/24. If the network number of the destination IP matches one of these networks, the packet is sent out directly through that interface.

        If nothing matches, the last line of the routing table is used: the default route, which consists mainly of a next-hop address and a sending interface.

In the routing table:

        Destination is the destination network address, Genmask is the subnet mask, Gateway is the next-hop address, and Iface is the sending interface.
        In Flags, U means the entry is valid (entries can be disabled), and G means the next-hop address of the entry is the address of a router. An entry without the G flag means the destination network is directly connected to one of this machine's interfaces and no router is needed.

        According to the above routing table, we can introduce two examples to understand.

1. If the destination address of the sent data packet is 192.168.56.3.

        AND it with the subnet mask in the first line: the result is 192.168.56.0, which does not match the destination network address in the first line, so move on to the next entry.

        AND it with the subnet mask in the second line: the result is 192.168.56.0, which matches the destination network address in the second line, so the packet is sent out through the eth1 interface. Because the entry's flag is U with no G, the network is directly connected to the local interface and the packet can be delivered directly.

2. If the destination address of the data packet to be sent is 202.10.1.2.

        Neither of the first two entries matches. Following the default route entry, the packet is sent out through eth0 to the router 192.168.10.1, and that router then determines the next hop from its own routing table.
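        The two examples can be replayed with a small lookup function. The table below is reconstructed from the text, and the interface of the first entry (eth0) is an assumption, since the original table is not reproduced here. `Route`, `ROUTES`, and `lookup` are names invented for this sketch.

```python
from ipaddress import IPv4Network, ip_address, ip_network
from typing import NamedTuple, Optional

class Route(NamedTuple):
    destination: IPv4Network  # Destination + Genmask
    gateway: Optional[str]    # None = directly connected (no G flag)
    iface: str

ROUTES = [
    Route(ip_network("192.168.10.0/24"), None, "eth0"),       # eth0 is assumed
    Route(ip_network("192.168.56.0/24"), None, "eth1"),
    Route(ip_network("0.0.0.0/0"), "192.168.10.1", "eth0"),   # default route
]

def lookup(dst: str) -> Route:
    """Scan the table top to bottom: AND with each mask, compare with Destination."""
    ip = int(ip_address(dst))
    for route in ROUTES:
        masked = ip & int(route.destination.netmask)
        if masked == int(route.destination.network_address):
            return route
    raise LookupError(f"no route to {dst}")

print(lookup("192.168.56.3"))
print(lookup("202.10.1.2"))
```

        Scanning in order works here because the table is sorted from most specific to least specific; a real kernel picks the matching entry with the longest prefix.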

        We can now briefly summarize the transport layer and the network layer: together, TCP and IP deliver data reliably from host A to host B.

        The network layer mainly solves packet forwarding based on the IP datagram; it eases the shortage of IP addresses through subnetting and NAT; and it decides where each datagram should be sent by means of routing table lookups and routing algorithms.

        But note that what the IP layer provides is a forwarding strategy, and the real hop-by-hop transmission is done through the data link layer!

2. Data link layer

        The data link layer handles the transfer of data between two nodes on the same data link.

        The network layer solves, logically, the problem of the route from one host to another, but it is the data link layer that actually carries the data across each hop of that route.

        Think of it this way: when data is to be handed to the next-hop router, that router must be in the same LAN as the sender. Every hop is in essence forwarding within a subnet. -> Seen from above, the whole network is made up of subnets.

        Within the TCP/IP stack, the data link layer is where we reach the driver level of the computer, and the layer below it is the physical layer. There are many different LAN standards; representative ones are Ethernet, wireless LAN/WAN, and so on.

        "Ethernet" is not a specific network but a technical standard; it covers not only the data link layer but also part of the physical layer. For example, it specifies the network topology, the access control method, the transmission rate, and so on;

        For example, the network cable in Ethernet must be twisted pair; the transmission rate may be 10M, 100M, 1000M, and so on;
        Ethernet is currently the most widely used local area network technology; alongside Ethernet there are also token ring, wireless LAN, and others;

1. Ethernet frame format

1. The source address and destination address refer to the hardware address of the network card (also called MAC address ), the length is 48 bits, and it is solidified when the network card leaves the factory;

2. The frame protocol type field has three values, corresponding to IP (0800), ARP (0806), and RARP (8035);

3. At the end of the frame is a CRC check code, which is used to check for errors.

        Now for the so-called hardware address. The purpose of the MAC address is to identify nodes on the same data link.

        It is 48 bits long, that is, 6 bytes, and is generally written as hexadecimal numbers separated by colons (for example: 08:00:27:03:fb:19).

         It is fixed when the network card leaves the factory and normally cannot be changed. A MAC address is usually unique (a MAC address in a virtual machine is not a real hardware address and may conflict; some network cards also let the user configure the MAC address).

        We can re-understand the principle of LAN communication:

        Now in this LAN, suppose H1 host sends to H6.

        The H1 host first encapsulates the upper-layer IP datagram into a mac frame at the data link layer. The format of this mac frame is: m6-m1-0800-IP datagram-CRC.

        Once the frame is on the LAN, every host actually receives it. Each host extracts the frame header and compares the first field, the destination address, with its own MAC address. The other hosts see that the frame is not for them and discard it, while H6 sees that it is, separates the header from the payload, and delivers the payload upward.
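        The send-and-filter behaviour can be sketched like this. The MAC addresses standing in for m1, m2, and m6 are made up, the helper names are my own, and the trailing CRC is left out to keep the example short.

```python
import struct
from typing import Optional, Tuple

BROADCAST = "ff:ff:ff:ff:ff:ff"

def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))

def bytes_to_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    """Destination MAC, source MAC, 2-byte type, then the payload (CRC omitted)."""
    return struct.pack("!6s6sH", mac_to_bytes(dst), mac_to_bytes(src), ethertype) + payload

def receive(my_mac: str, frame: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (type, payload) if the frame is addressed to us, otherwise None."""
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if bytes_to_mac(dst) not in (my_mac.lower(), BROADCAST):
        return None               # not ours: discard
    return ethertype, frame[14:]  # ours: strip the header and deliver upward

# Made-up addresses playing the roles of m1, m2 and m6.
M1, M2, M6 = "08:00:27:03:fb:01", "08:00:27:03:fb:02", "08:00:27:03:fb:06"

frame = build_frame(M6, M1, 0x0800, b"ip-datagram")
print(receive(M2, frame))   # H2 discards the frame
print(receive(M6, frame))   # H6 accepts it and sees type 0x0800 (IP)
```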

        Note that while one host is sending on the LAN, other hosts may well send at the same time, and we cannot control other people's hosts. When that happens, a collision occurs.

        LAN protocols have mechanisms to deal with collisions: a host detects whether a collision has occurred, stops sending if so, and resends after waiting a while. The entire LAN can be seen as a shared resource here, so solving the collision problem is really solving the problem of safe, mutually exclusive use of a shared resource.

        1. With collisions in mind, is it better to have more hosts in a LAN or fewer? Clearly fewer. But large LANs cannot be avoided, so a device is added: the switch, which divides the LAN into separate collision domains and lowers the probability of collision in each.

        2. When sending LAN data frames, should a frame be as long as possible or as short as possible? Neither: the frame payload has both a minimum and a maximum size (Ethernet: 46~1500 bytes). The maximum is the MTU.

        This also hints at how a LAN attack works: if a host bypasses the data link layer's collision handling and keeps flooding the LAN with garbage data, the entire LAN soon becomes unable to communicate. (This can be done with some tools.)

        We already met the MTU when discussing fragmentation in the network-layer IP protocol. The Ethernet protocol defines that the data in a MAC frame is at most 1500 bytes (the Maximum Transmission Unit, MTU) and at least 46 bytes. An ARP packet is shorter than 46 bytes, so padding is appended to it. MTU values also differ between data link layer standards.

        Regarding the impact of MTU on upper-layer protocols, here is a brief summary:

Effect of MTU

        First, the impact of the MTU on the IP protocol. As described in the network layer section, the 16-bit identification, 3-bit flags, and 13-bit fragment offset in the IP header exist to handle this case. A datagram larger than the MTU must be fragmented, and IP provides a way to record the order of the fragments so that the receiver's IP layer can reassemble them. Note, however, that the network layer does not provide reliability: if reassembly fails, it is up to the upper-layer protocol (TCP) to retransmit.

        Next, the impact on the transport-layer UDP protocol. If UDP carries more than 1500 bytes (MTU) - 20 bytes (standard IP header) - 8 bytes (UDP header) = 1472 bytes of data, the datagram is fragmented at the IP layer, and the probability of losing the whole datagram rises sharply. (UDP cannot guarantee reliability.)
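        The 1472-byte limit, and how many fragments a larger UDP payload turns into, can be worked out in a few lines. This assumes an Ethernet MTU of 1500 and an IP header without options; `fragments_needed` is a helper invented for this sketch.

```python
import math

MTU = 1500
IP_HEADER = 20    # standard header, no options
UDP_HEADER = 8

MAX_UDP_PAYLOAD = MTU - IP_HEADER - UDP_HEADER   # largest payload that avoids fragmentation

def fragments_needed(udp_payload_len: int, mtu: int = MTU) -> int:
    """How many IP fragments a UDP datagram with this much data becomes."""
    ip_payload = udp_payload_len + UDP_HEADER    # the UDP header travels inside the IP payload
    per_fragment = (mtu - IP_HEADER) // 8 * 8    # fragment data must be a multiple of 8 bytes
    return max(1, math.ceil(ip_payload / per_fragment))

print(MAX_UDP_PAYLOAD)
print(fragments_needed(1472), fragments_needed(1473), fragments_needed(4096))
```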

        Impact on the transport layer TCP protocol :

        First of all, a TCP segment cannot be arbitrarily large; it is also limited by the MTU. The maximum amount of data a single TCP segment can carry is called the MSS (Maximum Segment Size). It refers to the size of the data only, excluding headers.

        While establishing a connection, the two communicating parties negotiate the MSS.
        Ideally, the MSS is exactly the largest size that will not be fragmented at the IP layer (this size is still bounded by the MTU of the data link layer).
        When each party sends its SYN, it writes the MSS value it can support into the TCP header.
        Once both parties know each other's MSS, the smaller of the two is used as the final MSS.
        The MSS value is carried in the variable-length options area of the TCP header (up to 40 bytes), as the option with kind = 2.
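
        The negotiation described above can be sketched as follows. This is a simplified model, not how a kernel implements it: the helper names are hypothetical, both headers are assumed to be 20 bytes (no options), and the option layout shown is kind = 2, length = 4, followed by a 2-byte MSS value.

```python
import struct


def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that avoids IP fragmentation on a link with this MTU."""
    return mtu - ip_header - tcp_header


def build_mss_option(mss: int) -> bytes:
    """Encode the TCP MSS option: kind=2, length=4, 16-bit MSS value."""
    return struct.pack("!BBH", 2, 4, mss)


def negotiate_mss(local_mss: int, peer_mss: int) -> int:
    """Simplified rule from the text: the smaller advertised value wins."""
    return min(local_mss, peer_mss)


if __name__ == "__main__":
    ours = mss_for_mtu(1500)     # e.g. Ethernet
    theirs = mss_for_mtu(1440)   # hypothetical peer behind a smaller MTU
    print(ours, theirs, negotiate_mss(ours, theirs))
    print(build_mss_option(ours).hex())
```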

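        The relationship between MSS and MTU can be memorized with the following figure (MTU = IP header + TCP header + MSS, so a 1500-byte Ethernet MTU with 20-byte IP and TCP headers gives an MSS of 1460 bytes):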

        Under Linux, we can use the ifconfig command to view the MTU, IP address, and MAC address.

        Careful readers will notice a problem here: when we encapsulate the MAC frame header we know our own MAC address, but how do we learn the other party's MAC address?

        This is where another protocol, ARP, comes in: it obtains the MAC address of the destination host for us.

2.ARP protocol

        The ARP protocol is not a pure data link layer protocol, but a protocol between the data link layer and the network layer.

        As shown in the MAC frame format above, a frame whose type field is 0806 carries an ARP request or response.

        The purpose of the ARP protocol is to establish the mapping between a host's IP address and its MAC address. When an IP datagram from the upper layer is handed down to this layer, the host first checks whether the target IP address is in its ARP cache table (that is, whether a mapped MAC address exists). If it is not found, the host builds an ARP request and broadcasts it on the LAN:

        The format of this arp datagram is as follows:

        First of all, the Ethernet header at the front can be ignored for now.

        In both ARP requests and responses, the hardware type, protocol type, hardware address length, and protocol address length hold fixed values.

        The 2-byte op field distinguishes whether the ARP packet is a request or a response.

        The sender IP address is the IP address of the host sending the request, and the destination IP address is the IP of the host whose MAC address the sender wants to learn. Because the destination Ethernet address is still unknown, the destination address of the frame is set to all 1 bits (the broadcast address), so that every host on the LAN receives it.

        After the broadcast, every host on the LAN receives the request. On each host the data link layer hands it up to the ARP layer, which first looks at the op field to tell a request from a response (1 means request). It then compares the destination IP address with its own IP: if they do not match, the packet is discarded; if they match, it is kept.

        The host that keeps the request builds an ARP response: it marks the op field as a response and fills in its own MAC address as the sender Ethernet address. The destination Ethernet address is the sender Ethernet address taken from the ARP request. (It also inserts the requester's mapping, ip<->mac, into its own ARP cache table.) Similarly, after receiving the response, the requesting host extracts the MAC address, records the mapping between the target IP and that MAC address, and then sends the Ethernet frame as usual.
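
        To make the request and response flow concrete, here is a minimal Python sketch that packs and parses the 28-byte ARP body (without the Ethernet header). The fixed values assume Ethernet and IPv4 (hardware type 1, protocol type 0x0800, address lengths 6 and 4, op 1 for request and 2 for response). The helper names and the sample addresses are made up, and zero-filling the unknown target MAC inside the ARP body is an assumption of this sketch.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"   # 28 bytes: fixed fields, op, sender, target
OP_REQUEST, OP_REPLY = 1, 2


def build_arp(op, sender_mac, sender_ip, target_mac, target_ip):
    """Pack an ARP body for Ethernet (hardware type 1) and IPv4 (0x0800)."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, op,
                       sender_mac, socket.inet_aton(sender_ip),
                       target_mac, socket.inet_aton(target_ip))


def parse_arp(packet):
    """Unpack the op field and the four address fields of an ARP body."""
    (_htype, _ptype, _hlen, _plen, op,
     sender_mac, sender_ip, target_mac, target_ip) = struct.unpack(
        ARP_FORMAT, packet[:28])
    return {"op": op,
            "sender_mac": sender_mac, "sender_ip": socket.inet_ntoa(sender_ip),
            "target_mac": target_mac, "target_ip": socket.inet_ntoa(target_ip)}


def answer_request(packet, my_mac, my_ip, cache):
    """Return a response if the request asks for my_ip, otherwise None."""
    fields = parse_arp(packet)
    if fields["op"] != OP_REQUEST or fields["target_ip"] != my_ip:
        return None                                   # not for us: discard
    cache[fields["sender_ip"]] = fields["sender_mac"]  # remember the requester
    return build_arp(OP_REPLY, my_mac, my_ip,
                     fields["sender_mac"], fields["sender_ip"])


if __name__ == "__main__":
    mac_a = bytes.fromhex("020000000001")   # hypothetical addresses
    mac_b = bytes.fromhex("020000000002")
    request = build_arp(OP_REQUEST, mac_a, "192.168.1.10",
                        b"\x00" * 6, "192.168.1.20")
    print(parse_arp(answer_request(request, mac_b, "192.168.1.20", {})))
```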

        In short, ARP presupposes that the IP address of the target is already known; only then can the target's MAC address be obtained. Note that the ARP cache table does not keep a mapping forever: an entry expires if it is not used for a certain period. Otherwise, if a host were powered off and later reconnected with a newly assigned IP address, the cached IP address would no longer correspond to its MAC address.
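
        The expiry behaviour can be modelled with a small cache class. This is a toy model, not how any operating system implements its ARP table: the class name, the 60-second default, and the injectable clock are all assumptions made so the idea is easy to try out.

```python
import time


class ArpCache:
    """Toy ip -> mac table whose entries expire when unused for ttl seconds."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be simulated
        self._entries = {}           # ip -> (mac, last_used_time)

    def update(self, ip: str, mac: str) -> None:
        """Store or overwrite a mapping learned from an ARP packet."""
        self._entries[ip] = (mac, self._clock())

    def lookup(self, ip: str):
        """Return the cached MAC, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_used = entry
        now = self._clock()
        if now - last_used > self._ttl:
            del self._entries[ip]    # stale: a new ARP request is needed
            return None
        self._entries[ip] = (mac, now)   # using the entry keeps it alive
        return mac


if __name__ == "__main__":
    cache = ArpCache(ttl_seconds=60.0)
    cache.update("192.168.1.20", "02:00:00:00:00:02")
    print(cache.lookup("192.168.1.20"))
```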

        You can view the arp cache table through the Linux command arp -a:

         Now that we have talked about the arp protocol, we can briefly explain the principle of arp spoofing here .

ARP spoofing

        ARP spoofing can easily modify the dynamic ARP cache tables of two hosts in the same LAN, so that the attacker becomes a man-in-the-middle.

        Suppose host A and host B in a local area network have already exchanged an ARP request and response. At this point A's ARP cache table contains the mapping IPb->MACb, and B's ARP cache table contains the mapping IPa->MACa.

        Now host C builds and sends a flood of ARP requests to host B in which the sender IP address is IPa but the sender Ethernet address is MACc.

        Note that in a dynamic ARP cache table a newer entry replaces the older one, so host B updates its cache to IPa->MACc. Similarly, host C sends a flood of ARP responses to host A in which the sender IP address is IPb but the sender Ethernet address is MACc, and host A updates its cache to IPb->MACc. From then on, whenever host A wants to send data to host B, the frame actually goes to host C, and vice versa.
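
        The cache poisoning above comes down to "the newest mapping wins". The following Python sketch only simulates the two cache tables as dictionaries, using the symbolic names from the text (IPa, MACc, and so on); it sends nothing on a network, and the function names are invented for the illustration.

```python
def learn(cache: dict, sender_ip: str, sender_mac: str) -> None:
    """Dynamic ARP cache behaviour: the newest mapping overwrites the old one."""
    cache[sender_ip] = sender_mac


def simulate_spoofing():
    """Replay the scenario from the text and return both cache tables."""
    cache_a = {"IPb": "MACb"}   # host A knows B's real MAC
    cache_b = {"IPa": "MACa"}   # host B knows A's real MAC

    # Host C claims to be IPa when talking to B, and IPb when talking to A.
    learn(cache_b, "IPa", "MACc")
    learn(cache_a, "IPb", "MACc")
    return cache_a, cache_b


if __name__ == "__main__":
    cache_a, cache_b = simulate_spoofing()
    print("A sends frames for IPb to", cache_a["IPb"])
    print("B sends frames for IPa to", cache_b["IPa"])
```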

        Although data protected by HTTPS or another security layer cannot be read or forged this way, the attacker can still cut the targeted host off from the network.

3. Other important protocol technologies

1.DNS

        When browsing the web every day, we identify sites by domain name rather than by IP address.

        IP addresses are inconvenient to remember, so people invented the host name, which is a string. As the Internet grew, the DNS system was eventually created to maintain the relationship between IP addresses and host names.

        A domain name is composed of a host name, an organization name, a network name, and a top-level domain name. For example, in the host domain name for.zj.edu.cn, for is the host name, zj is the organization name, edu is the network name, and cn is the top-level domain name.
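
        The hierarchy inside a domain name can be shown by splitting it on dots and rebuilding the zones from right to left. This is just an illustration; the function name `zone_hierarchy` is made up.

```python
def zone_hierarchy(domain: str):
    """List the zones of a domain name from the top-level domain downwards."""
    labels = domain.rstrip(".").split(".")
    # Join the last 1, 2, 3, ... labels: cn, edu.cn, zj.edu.cn, ...
    return [".".join(labels[-count:]) for count in range(1, len(labels) + 1)]


if __name__ == "__main__":
    for zone in zone_hierarchy("for.zj.edu.cn"):
        print(zone)
```

        For for.zj.edu.cn this walks from cn down to the full host name, which mirrors the levels named in the paragraph above.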

        The system management agency of an organization maintains the correspondence between the IP address and host name of each host in its system.
        When a new computer is connected to the network, this information is registered in the database.
        When a user enters a domain name, the DNS server is queried automatically; it searches the database and returns the corresponding IP address.

        The DNS system provides a domain name resolution service. For example, when a URL is entered in the browser, the browser first sends a request to the DNS system (note that DNS queries are normally carried over UDP), and the DNS system returns the corresponding IP address after receiving the request. (DNS resolution servers are organized in several levels, and a query may be passed upward until it reaches the root servers.)
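
        As a rough illustration of what such a request looks like on the wire, the sketch below builds a DNS query for an A record by hand and, optionally, sends it over UDP port 53. The function names, the query ID, and the timeout are assumptions of this sketch, and `send_query` needs the address of a resolver you are allowed to use; in everyday code `socket.getaddrinfo` does all of this for you.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the A record of hostname."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with 0.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question


def send_query(query: bytes, server: str, timeout: float = 2.0) -> bytes:
    """Send the query over UDP port 53 and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        response, _ = sock.recvfrom(512)
        return response


if __name__ == "__main__":
    print(build_dns_query("for.zj.edu.cn").hex())
```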

        For example, sometimes QQ and games log in normally but web pages cannot be opened. The usual cause is that the browser depends on DNS and the DNS server it is configured to use is down.

2. NAT and proxy server

        First of all, in the network layer section we introduced NAT, which separates the private network from the external network and gives hosts in a private network a way to communicate with the public network. This effectively alleviates the current shortage of IPv4 addresses.

        A proxy server resembles a NAT device in that both relay traffic on behalf of internal hosts. However, a NAT device mainly solves the shortage of IP addresses, while a proxy server leans toward the application layer and can fulfill specific needs for us. NAT devices therefore generally work at the network layer and rewrite IP addresses, whereas proxy servers work at the application layer.

        A proxy server is software deployed on a server, while NAT is usually built into hardware devices such as routers or firewalls.

forward proxy server

        For example, in the school's local area network, all host network communications need to be connected to the school's proxy server before going out.

        At this point, the school's proxy server can inspect the traffic and perform:

    1. Identity authentication.
    2. Cache data to improve access efficiency.
    3. Content moderation.
    4. Ensure intranet security.

        This is a forward proxy server: it sends requests to the outside on our behalf.

reverse proxy server

        When a company sets up a service and builds its machine room, it maintains a proxy server in front of the backend hosts.
        When a client accesses company resources, the request first reaches the proxy server, which distributes all requests by some strategy such as polling (round robin) or random selection. For example, it may maintain a mapping between each backend IP and its access pressure count, cnt. The proxy does no business processing itself; it is only responsible for pushing each request to a designated backend host, which keeps the load balanced across the cluster.

        In other words, a reverse proxy server stands in for the servers that would normally receive the network requests and distributes the requests among them, giving better control.
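
        Two of the distribution strategies mentioned above, polling and a per-backend pressure count, can be sketched like this. The class names and backend addresses are invented for the example, and a real reverse proxy would also handle health checks, timeouts, and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Polling: hand out the backends one after another, in a cycle."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


class LeastLoadBalancer:
    """Keep a backend -> cnt mapping and pick the least loaded backend."""

    def __init__(self, backends):
        self.cnt = {backend: 0 for backend in backends}

    def pick(self) -> str:
        backend = min(self.cnt, key=self.cnt.get)   # ties: first one listed
        self.cnt[backend] += 1                      # one more request in flight
        return backend

    def release(self, backend: str) -> None:
        self.cnt[backend] -= 1                      # request finished


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print([balancer.pick() for _ in range(4)])
```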

        Proxy servers are widely used, with different application scenarios on LANs and WANs. For example, a proxy for so-called scientific Internet access is a proxy on the wide area network.

        Here we can briefly mention the principle of scientific Internet access:

        In China, all local area networks are built by operators. The operator runs large servers, and all domestic traffic must pass through them. If a domestic operator finds that the destination IP is a foreign one that is not allowed, it simply discards the packet.

        However, some regions of China (Hong Kong, Macao, and Taiwan) do not go through the domestic operators. They have their own operators and can access the external network directly.

        Therefore, a proxy server can be set up in one of these regions, with a matching public key and private key, and the public key is given to the domestic client. To access the external network from within China, the client encrypts the request with the public key and wraps it in an HTTP request addressed to an IP in these regions, while the content inside is really destined for the external network. When the domestic operator receives it, the IP looks fine and the HTTP content cannot be decrypted, so the packet is forwarded to the proxy server in these regions without being detected.

        This proxy server is a forward proxy server. After decrypting the request, it forwards it to the external network; when the response arrives, it encrypts it and returns it to the client.

3. ICMP protocol

        The ICMP protocol is a network layer protocol. A newly built network usually needs a simple test first to verify that it is reachable. The IP protocol, however, does not provide reliable transmission: if a packet is lost, IP cannot tell the transport layer that the packet was lost or why.

The main functions of ICMP include:
        Confirming whether an IP packet has successfully reached the destination address.
        Reporting the reason why an IP packet was discarded in transit.
        ICMP is itself carried by the IP protocol, but it does not provide transport layer functions, so it is still classified as a network layer protocol.
        ICMP can only be used with IPv4. IPv6 requires ICMPv6.
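
        The reachability check is typically done with an ICMP echo request (the message that ping sends). The sketch below only builds such a message in memory, using type 8, code 0, and the standard 16-bit Internet checksum; the function names and sample payload are made up, and actually sending it would require a raw socket and the necessary privileges.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement sum used by ICMP (and the IP header)."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed with the checksum field set to zero first.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
    print(packet.hex())
```

        A receiver can validate the message by running the same checksum over the whole packet; a correct packet yields 0.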

Four. Summary

        For the study of the TCP/IP protocol stack, we focus on summarizing the following knowledge:

data link layer

        The role of the data link layer: transferring data between two devices (nodes on the same data link).
        Ethernet is a technical standard; it includes both the content of the data link layer and some content of the physical layer. For example, it specifies the network topology, access control mode, transmission rate, etc.
        Understand the Ethernet frame format.
        Understand the MAC address.
        Understand the ARP protocol.
        Understand the MTU.


Network layer

        The role of the network layer: determine a suitable path in a complex network environment.
        Understand the IP address, understand the difference between an IP address and a MAC address.
        Understand the IP protocol format.
        Understand the network segment division method.
        Understand how to solve the problem of insufficient IP addresses, and the two schemes for network segment division.
        Understand private IP and public IP.
        Understand the IP address routing process at the network layer.
        Understand how a data packet crosses network segments to reach its final destination.
        Understand the reason for IP fragmentation.
        Understand the ICMP protocol.
        Understand the working principle of NAT devices.

transport layer

        The role of the transport layer: responsible for transmitting data from the sender to the receiver.
        Understand the concept of port numbers.
        Know the UDP protocol and understand its characteristics.
        Know the TCP protocol and understand its reliability.
        Understand the state transitions of the TCP protocol.
        Master TCP connection management, acknowledgment, timeout retransmission, sliding window, flow control, congestion control, delayed acknowledgment, and piggybacked acknowledgment.
        Understand that TCP is byte-stream oriented, and understand the sticky packet problem and its solutions.
        Be able to achieve reliable transmission based on UDP.
        Understand the impact of MTU on UDP/TCP.

application layer

        The role of the application layer: the network programs that meet our daily needs all live at the application layer.
        Be able to design an application layer protocol according to your own needs.
        Understand the HTTP protocol. 
        Understand the principle and workflow of DNS.


Origin blog.csdn.net/weixin_61508423/article/details/129877642