Computer Network Chapter 6 (Application Layer)

Reference tutorial: 6.1 Application layer overview (bilibili)

1. Overview of the application layer

1. The status of the application layer

The application layer is the top layer of the computer network architecture. It is the ultimate goal of designing and establishing a computer network. It is also the fastest growing part of the computer network.

b2d12efd09bc4484b1e2ead9ecdd9752.png

2. Some classic network applications

83bd60504d8e4fee8f68b8ea009535a0.png

2. Client/server mode and peer-to-peer mode

1. How network applications are organized on various end systems and the relationships between them

Network applications run on different end systems at the edge of the network and communicate with each other to complete a task together. Therefore, the first issue to consider when developing a new network application is how the application is organized across the end systems and how its parts relate to each other. Two approaches are currently popular:

① The client/server method, also known as the C/S (Client/Server) method.

② The peer-to-peer method, also known as the P2P (Peer-to-Peer) method.

2. Client/server method

(1) Client and server refer to the two application processes involved in communication. The client/server method describes the relationship between the process that provides a service and the process that is served.

(2) As shown in the figure below, host A at the edge of the network runs a client program. The running client program is called a client process, or simply a client. The host running the client process should be called a client computer, but it is sometimes also referred to simply as a client. Host B at the edge of the network runs a server program. The running server program is called a server process, or simply a server. The host running the server process should be called a server computer, but it is sometimes also referred to simply as a server.

(3) In the client/server method, the client requests a service from the server, and the server provides the service to the client after receiving the request. In other words, the client is the requester of the service and the server is the provider of the service.

(4) The server is always running and waiting for service requests from clients. The server has a fixed port number (for example, the default port number of an HTTP server is 80), and the host running the server also has a fixed IP address.

cae95390a262492d959b2a5a843f6c7f.png

(5) The C/S method is the traditional and most mature method on the Internet. Many familiar network applications use it, including the World Wide Web (WWW), email, and file transfer (FTP).

(6) Application services based on the C/S method are usually service-centralized, that is, application services are concentrated on server computers, which are far fewer in number than the client computers in the network. Since one server computer has to serve many clients, it often happens in C/S applications that the server computer cannot keep up with the requests of its many clients. For this reason, C/S applications often use computer clusters (also called server farms) to build a powerful virtual server.
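The request/response relationship described in this section can be sketched with a pair of TCP sockets. This is only a minimal illustration on the loopback interface: the function names and the "served: " reply prefix are made up for the example, and the port is chosen by the operating system rather than being a fixed well-known port as a real server would use.

```python
import socket
import threading


def run_server(listener: socket.socket) -> None:
    """Server side: passively wait for one client, then answer its request."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"served: " + request)


def request_service(port: int, payload: bytes) -> bytes:
    """Client side: actively connect to the server's port and ask for service."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


def demo() -> bytes:
    # Port 0 lets the OS pick a free port for this demo (an assumption made
    # for convenience); a real server binds a fixed port such as 80 for HTTP.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    worker = threading.Thread(target=run_server, args=(listener,))
    worker.start()
    try:
        return request_service(port, b"hello")
    finally:
        worker.join()
        listener.close()
```

The server has to be listening before the client connects, which mirrors point (4): the server is always running and waiting, while the client decides when to initiate.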

3. Peer-to-peer method

(1) In the P2P method, there are no fixed service requesters and service providers. The application processes distributed in each end system at the edge of the network are peers and are called peers. Peers communicate directly with each other, and each peer is both a requester and provider of services.

(2) As shown in the figure below, hosts C, D, E, and F at the edge of the network run the same P2P program, such as a network download tool. The P2P processes in E and F are peers of each other, the P2P processes in C and D are peers of each other, and the P2P process in E is also a peer of the P2P process in D. For example, E's P2P process may be downloading a file from F while at the same time providing a download service to D's P2P process.

c06f5727a90949fc98c068b808b26623.png

(3) Currently, popular P2P applications on the Internet mainly include P2P file sharing, instant messaging, P2P streaming media, distributed storage, etc.

(4) P2P-based applications are service-dispersed: services are not concentrated in a few server computers but spread across a large number of peer computers. These computers are not owned by the service provider; they are desktop computers and laptops controlled by individuals, commonly found in residences, campuses, and offices.

(5) One of the most prominent features of the P2P method is its scalability. Every peer added to the system adds not only a service requester but also a service provider, so system performance does not degrade as the scale increases.

(6) The P2P method has cost advantages because it usually does not require huge server facilities and server bandwidth. In order to reduce costs, service providers are increasingly interested in using P2P methods for applications.
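The scalability claim in point (5) can be illustrated with a simple lower-bound model of file distribution time that is common in networking textbooks; it is not taken from the notes above. The function and parameter names are made up for this sketch, all peers are assumed to upload at the same rate, and rates are in bits per second.

```python
def client_server_time(file_bits, n_clients, server_up, min_down):
    """Lower bound on the time for one server to send a file to n_clients.

    The server must upload n_clients copies, and the slowest client
    must still download one full copy.
    """
    return max(n_clients * file_bits / server_up, file_bits / min_down)


def p2p_time(file_bits, n_peers, server_up, min_down, peer_up):
    """Lower bound when every peer also uploads at rate peer_up.

    The server must upload at least one copy, the slowest peer must
    download one copy, and n_peers copies in total must be uploaded
    by the combined capacity of the server and all peers.
    """
    total_up = server_up + n_peers * peer_up
    return max(file_bits / server_up,
               file_bits / min_down,
               n_peers * file_bits / total_up)
```

Under this model the client/server bound grows linearly with the number of clients, while the P2P bound stays below `file_bits / peer_up` however many peers join, because each new peer brings its own upload capacity.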

3. Dynamic Host Configuration Protocol DHCP

1. The role of DHCP

(1) Consider the network topology shown in the figure below. Network configuration information such as the IP address, subnet mask, default gateway, and DNS server must be correctly configured on each host in the network so that the user hosts can access the Web server normally.

(2) When there are a large number of hosts in the network, manual configuration involves a heavy workload and is error-prone.

(3) If a DHCP server is added to the network, the network configuration information that can be assigned to the other hosts is set up on that server. Each host in the network automatically starts its DHCP program after booting and requests its own network configuration information from the DHCP server, so every host can obtain its network configuration automatically without manual configuration.

cf936eb7d12a492aaa98a34e67fb18cc.png

2. Working process of DHCP

(1) As shown in the figure below, assume that there are two DHCP servers and multiple user hosts in the network. DHCP uses the client/server method. The DHCP server process (also simply called the DHCP server) runs on each DHCP server computer, and the DHCP client process (also simply called the DHCP client) runs on each user host.

(2) DHCP is an application-layer protocol in the TCP/IP protocol suite and uses the UDP service provided by the transport layer; that is, DHCP messages are encapsulated into UDP user datagrams. The DHCP server uses UDP port 67, and the DHCP client uses UDP port 68. Both are well-known ports.

09e55ae168ca405f9c8d0e4cf484b30d.png

(3) A UDP user datagram carrying a DHCP message is encapsulated into an IP datagram at the network layer and then into a data link layer frame appropriate to the network interface in use, for example an Ethernet frame. (For simplicity, the layer-by-layer encapsulation of DHCP messages is not repeated in the description below unless it is specifically needed.)

(4) The interaction process between DHCP client and DHCP server:

① When DHCP is enabled on a host, the DHCP client broadcasts a DHCP discovery message.

[1] The source IP address of the IP datagram encapsulating the message is 0.0.0.0, because the host has not yet been assigned an IP address and uses this address instead. The destination IP address is the broadcast address 255.255.255.255. The message is broadcast because the host does not yet know how many DHCP servers there are in the network or what their IP addresses are.

[2] Since it is a broadcast IP datagram, all devices in the network receive it and decapsulate it layer by layer, extracting the UDP user datagram that carries the DHCP discovery message.

[3] The format of a DHCP message is relatively complex. For the DHCP discovery message, it is enough to know that it carries a transaction ID and the MAC address of the DHCP client.

② On a host that runs only a DHCP client, no application-layer process listens on destination port 67 of the UDP user datagram (that port belongs to the DHCP server process), so the DHCP discovery message cannot be delivered and is discarded. On a DHCP server, the DHCP server process is always running at the application layer, so it accepts the DHCP discovery message and responds.

6983640236774d09bc4b27e9d01ded20.png

③ After receiving the DHCP discovery message, the DHCP server searches its database using the MAC address of the DHCP client carried in the message to see whether configuration information exists for that MAC address.

[1] If configuration information exists for the MAC address, the server uses it to build and send a DHCP offer message; otherwise, it uses the default configuration information to build and send the DHCP offer message.

[2] The source IP address of the IP datagram encapsulating the message is the IP address of the DHCP server; the destination IP address is still the broadcast address. Broadcast is still used because the host has not yet been configured with an IP address, so broadcasting is the only way for the host to receive the message. As a result, all devices in the network receive the IP datagram and decapsulate it layer by layer, extracting the UDP user datagram that carries the DHCP offer message.

[3] When the DHCP server selects an IP address from its address pool to lease to a host, it uses ARP to make sure that the selected IP address is not already occupied by another host on the network.

[4] The DHCP offer message also carries configuration information, such as the IP address, subnet mask, address lease period, default gateway, and DNS server.

④ On a DHCP server, no application-layer process listens on destination port 68 of the UDP user datagram (that port belongs to the DHCP client process), so a DHCP server that receives this message cannot deliver it and discards it. On the DHCP client, the DHCP client process is running at the application layer, so it accepts the DHCP offer message and processes it.

06182a76269c432e92df58eac919b6c4.png

⑤ After receiving a DHCP offer message, the DHCP client uses the transaction ID carried in it to decide whether the message answers its own request. If the transaction ID equals the one carried in the DHCP discovery message the client sent earlier, the message is a reply to its own request and is accepted; otherwise, the message is discarded.

⑥ In this example, the DHCP client receives DHCP offer messages from two DHCP servers. The DHCP client chooses one of them, generally the one that arrives first, and sends a DHCP request message to the selected DHCP server.

[1] The source IP address of the IP datagram encapsulating the message is still 0.0.0.0, because the DHCP client has only selected one of several DHCP servers as its own and must first obtain that server's consent before it can officially use the IP address leased from it. The destination IP address is still the broadcast address. Broadcasting spares the client from unicasting a DHCP request message to every DHCP server in the network to tell each one whether it has been chosen.

[2] The DHCP request message carries information such as the transaction ID, the MAC address of the DHCP client, the IP address in the accepted lease, and the IP address of the DHCP server that offered the lease.

⑦ In this example, assume that the DHCP client selects DHCP server 1 as its DHCP server and that DHCP server 1 accepts the request. DHCP server 1 then sends a DHCP confirmation message to the DHCP client. The source IP address of the IP datagram encapsulating the message is the IP address of DHCP server 1; the destination IP address is still the broadcast address.

db570cb4dc564bee8bc26ba44abcf5b3.png

⑧ After receiving the DHCP confirmation message, the DHCP client can use the leased IP address. Before using it, the host also uses ARP to check whether the IP address is occupied by another host on the network. If it is occupied, the client sends a DHCP decline message to the DHCP server to give up the lease and then sends a new DHCP discovery message. If it is not occupied, the host can use the leased IP address to communicate with other hosts on the network.

20e1f3fc16654cb19a7ef1bda9f4a930.png

⑨ When half of the lease period has passed, the DHCP client sends a DHCP request message to the DHCP server to request a renewal of the lease. The source IP address of the IP datagram encapsulating the message is the IP address previously leased by the DHCP client; the destination IP address is the address of DHCP server 1.

[1] If the DHCP server agrees, it sends back a DHCP confirmation message, and the DHCP client obtains a new lease period.

[2] If the DHCP server does not agree, it sends back a DHCP negative-acknowledgment (DHCPNAK) message. The DHCP client must then immediately stop using the previously leased IP address and send a new DHCP discovery message to apply for an IP address again.

[3] If the DHCP server does not respond, the DHCP client must resend the DHCP request message when 87.5% of the lease period has passed and then continue to wait for a response from the DHCP server. If the DHCP server still does not respond, then when the lease period expires the DHCP client must immediately stop using the previously leased IP address and send a new DHCP discovery message to apply for an IP address again.

⑩ The DHCP client can terminate the lease provided by the DHCP server at any time before it expires. To do so, it only needs to send a DHCP release message to the DHCP server.

d77d3e294ac1465296ff3b0fc8f94f87.png
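Two details of the interaction above lend themselves to a short sketch: the transaction ID and client MAC address carried in the DHCP discovery message, and the 50% and 87.5% renewal points of the lease. The field layout below follows the fixed BOOTP/DHCP header as I understand it from RFC 2131; the notes above do not describe the format, so treat the layout, the broadcast flag, and all function names as assumptions of this example rather than a complete client.

```python
import struct

# Fixed 236-byte header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file.
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s"
MAGIC_COOKIE = bytes([99, 130, 83, 99])   # marks the start of the options
DHCPDISCOVER = 1                          # value of option 53 (message type)
BROADCAST_FLAG = 0x8000                   # asks the server to reply by broadcast


def build_discover(xid: int, mac: bytes) -> bytes:
    """Build the DHCP payload of a discovery message (without UDP/IP headers)."""
    zero_ip = bytes(4)                    # the client has no address yet
    header = struct.pack(
        BOOTP_FORMAT,
        1,                # op: 1 = request sent by a client
        1,                # htype: 1 = Ethernet
        6,                # hlen: length of a MAC address
        0,                # hops
        xid,              # transaction ID chosen by the client
        0,                # secs
        BROADCAST_FLAG,
        zero_ip, zero_ip, zero_ip, zero_ip,
        mac,              # chaddr; struct pads it to 16 bytes with zeros
        b"", b"",         # sname and file are unused here
    )
    options = bytes([53, 1, DHCPDISCOVER, 255])   # message type, then end
    return header + MAGIC_COOKIE + options


def transaction_id(message: bytes) -> int:
    """Read the transaction ID, stored at byte offset 4."""
    return struct.unpack_from("!I", message, 4)[0]


def is_reply_for_me(message: bytes, my_xid: int) -> bool:
    """Step 5: accept a reply only if it echoes the client's transaction ID."""
    return transaction_id(message) == my_xid


def renewal_times(lease_seconds: float) -> tuple:
    """Seconds after which the client first renews (50%) and retries (87.5%)."""
    return lease_seconds * 0.5, lease_seconds * 0.875
```

For a one-hour lease, `renewal_times(3600)` gives 1800 and 3150 seconds. A real client would hand the bytes from `build_discover` to a UDP socket sending from port 68 to port 67 at 255.255.255.255.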

3. DHCP relay agent

(1) Consider the network topology shown in the figure below. Hosts on the network to the left of the router cannot automatically obtain network configuration information such as IP addresses, because the DHCP discovery messages they broadcast are not forwarded by the router but discarded.

(2) The solution is to configure the router with the IP address of the DHCP server and make it a DHCP relay agent. Hosts on that network can then obtain network configuration information automatically through DHCP: when the router receives a broadcast DHCP discovery message, it forwards it to the DHCP server by unicast. The subsequent interaction between the DHCP client and the DHCP server through the router is not described further here.

9d04a89c33b543fd91ba1a92dd412375.png

(3) The main reason for using DHCP relay agents is to avoid equipping every network with its own DHCP server, which would result in too many DHCP servers.
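The relay step in point (2) can be sketched as a function that takes a broadcast DHCP message and produces the unicast destination plus the message to forward. The notes above only say that the router unicasts the message; recording the relay's own address in the "gateway IP address" field (byte offset 24) and incrementing the hop count follows my understanding of standard relay behavior, and the function name and addresses are hypothetical.

```python
import ipaddress

HOPS_OFFSET = 3        # one-byte hop counter in the fixed header
GIADDR_OFFSET = 24     # four-byte gateway (relay agent) IP address field
DHCP_SERVER_PORT = 67


def relay_to_server(message: bytes, relay_ip: str, server_ip: str):
    """Turn a client broadcast into a unicast destined for the DHCP server.

    Returns ((server_ip, port), forwarded_message).
    """
    packet = bytearray(message)
    packet[HOPS_OFFSET] = min(packet[HOPS_OFFSET] + 1, 255)
    # Only the first relay fills in its address, so the server knows
    # which network the client is attached to.
    if packet[GIADDR_OFFSET:GIADDR_OFFSET + 4] == bytes(4):
        relay_bytes = ipaddress.IPv4Address(relay_ip).packed
        packet[GIADDR_OFFSET:GIADDR_OFFSET + 4] = relay_bytes
    return (server_ip, DHCP_SERVER_PORT), bytes(packet)
```

The returned address pair is what a real relay would pass to a UDP socket's `sendto`; the server can later send its reply back to the relay address recorded in the message.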

4. Domain Name System DNS

1. The role of domain name system DNS

(1) As shown in the figure below, if a host on the Internet wants to access a Web server, the user only needs to run a browser on the host, enter the domain name of the Web server in the address bar, and press the Enter key to access the content provided by the Web server. The destination host can also be addressed by its IP address instead of a domain name. However, a domain name is easier for people to remember than an IP address, so most network applications access the destination host by domain name rather than directly by IP address.

(2) When we enter the domain name of a web server in the browser address bar:

① The user host will first search for the IP address corresponding to the domain name in its own DNS cache. If it is not found, it will query a DNS server on the network.

②The DNS server has a database of mapping relationships between domain names and IP addresses. When the DNS server receives the DNS query message, it searches it in its database, and then sends the search results to the user host. (DNS messages are encapsulated using the UDP protocol of the transport layer, and the transport layer port number is 53)

③Then the browser in the user's host computer can access the web server through its IP address.

154da97107dd452dbd7803891168a2d4.png

(3) The Internet cannot rely on a single DNS server. Because the Internet is very large, a single domain name server would be overloaded and unable to work properly. Moreover, if that domain name server failed, the entire Internet would be paralyzed.

① As early as 1983, the Internet began to use a hierarchical naming tree for host names (that is, domain names), together with the distributed Domain Name System (DNS).

②DNS enables most domain names to be resolved locally, and only a small amount of resolution requires communication on the Internet, so the system is very efficient.

③Since DNS is a distributed system, even if a single computer fails, it will not hinder the normal operation of the entire system.
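The DNS query message mentioned in step ② of point (2) has a compact binary layout, sketched below. The layout (a 12-byte header followed by a question whose name is stored as length-prefixed labels) follows the standard DNS message format as I understand it from RFC 1035; the notes above do not describe it, and the function names are made up for this example.

```python
import struct

TYPE_A = 1       # ask for an IPv4 address
CLASS_IN = 1     # Internet class
FLAG_RD = 0x0100 # "recursion desired" bit in the flags field


def encode_name(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(domain: str, query_id: int) -> bytes:
    """Build a DNS query asking for the A record of the given domain."""
    # Header fields: ID, flags, then the counts of questions, answers,
    # authority records, and additional records.
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 0)
    question = encode_name(domain) + struct.pack("!HH", TYPE_A, CLASS_IN)
    return header + question
```

For example, `encode_name("y.abc.com")` yields `b"\x01y\x03abc\x03com\x00"`. A real client would send the bytes from `build_query` in a UDP datagram to port 53 of its DNS server and match the reply by the query ID.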

2. The hierarchical tree-like domain name structure used in the Internet domain name system

(1) The structure of a domain name consists of several components, each component is separated by "dots", representing different levels of domain names.

① Each level of domain name consists of English letters and numbers, no more than 63 characters, and does not distinguish between uppercase and lowercase letters.

②The domain name with the lowest level is written on the far left, and the top-level domain name with the highest level is written on the far right.

③The complete domain name should not exceed 255 characters.

85678b9dfa8b4ce39c3c94c7ae75de93.png

(2) The domain name system does not stipulate how many lower-level domain names a domain name needs to contain, nor does it stipulate what each level of domain name means.

(3) Domain names at all levels are managed by the domain name management agency at the upper level, and the highest top-level domain name is managed by ICANN, the Internet Corporation for Assigned Names and Numbers.

5db9c9de01b246aa9ed7a8499bf2f93b.png

(4) Top-level domains (TLDs) are divided into three categories:

① National top-level domain names (nTLD): these follow ISO 3166; for example, cn stands for China, us for the United States, and uk for the United Kingdom.

② Generic top-level domain names (gTLD): the seven most common are com (companies), net (network service organizations), org (non-profit organizations), int (international organizations), edu (U.S. educational institutions), gov (U.S. government departments), and mil (U.S. military departments).

③ Reverse domain arpa: used for reverse domain name resolution, that is, resolving an IP address back into a domain name.

(5) The second-level domain names registered under a national top-level domain name are determined by the country itself. For example, Japan, whose top-level domain name is jp, uses ac and co as the second-level domain names for its educational and corporate institutions instead of edu and com.

①China divides second-level domain names into the following two categories:

[1] Category domain names: seven in total, namely ac (scientific research institutions), com (industrial, commercial, financial, and other enterprises), edu (educational institutions), gov (government departments), net (institutions that provide network services), mil (military agencies), and org (non-profit organizations).

[2] Administrative region domain names: 34 in total, for the provinces, autonomous regions, and municipalities directly under the Central Government in China, such as bj for Beijing, sh for Shanghai, and js for Jiangsu Province.

②It should be noted that domain names with the same name may not have the same level.

(6) Internet domain name space:

bfd6c324fc1f4d0ba55f61d6d50219e3.png

① This hierarchically managed naming method makes it easy to maintain the uniqueness of names, and it is also easy to design an efficient domain name query mechanism.

②It should be noted that the domain name is only a logical concept and does not represent the physical location of the computer.
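The structural rules from point (1) of this section (labels separated by dots, at most 63 characters per label, at most 255 characters in total, case-insensitive, top level on the right) can be checked with a few lines of code. This is a simplified sketch with made-up function names; it also accepts hyphens inside labels, which real domain names allow although the notes above mention only letters and digits.

```python
MAX_LABEL_LENGTH = 63
MAX_NAME_LENGTH = 255


def is_valid_domain(name: str) -> bool:
    """Check the length and character rules for every level of a domain name."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    for label in name.split("."):
        if not 0 < len(label) <= MAX_LABEL_LENGTH:
            return False
        # Assumption: ASCII letters, digits, and hyphens are accepted.
        if not all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in label):
            return False
    return True


def levels(name: str) -> list:
    """List the labels from the top-level domain down to the lowest level."""
    # Lowercasing reflects that domain names are case-insensitive.
    return list(reversed(name.lower().split(".")))
```

For example, `levels("mail.cctv.com")` returns `["com", "cctv", "mail"]`: the top-level domain com, the second-level domain cctv, and the third-level domain mail.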

3. Domain name server

(1) The mapping relationship between domain names and IP addresses must be stored in the domain name server for query by all other applications, but obviously all information cannot be stored in one domain name server.

(2) DNS uses domain name servers distributed in various places to convert domain names to IP addresses.

(3) Domain name servers can be divided into the following four different types:

①Root domain name server :

[1] The root domain name server is the highest-level domain name server. Each root domain name server knows the domain names and IP addresses of all top-level domain name servers.

[2] There are 13 root domain name servers with different IP addresses on the Internet. Although each of the 13 is regarded as a single server, each one is actually a server cluster made up of many computers distributed around the world.

[3] When a local domain name server sends a query request to a root domain name server, routers forward the request to the root domain name server instance closest to the DNS client. This speeds up the DNS query process and makes better use of Internet resources.

[4] The root domain name server usually does not resolve the domain name directly, but returns the IP address of the top-level domain name server to which the domain name belongs.

② Top-level domain name server:

[1] These domain name servers are responsible for managing all second-level domain names registered under their top-level domain.

[2] When receiving a DNS query request, these domain name servers give a corresponding answer (which may be the final result or the IP address of a lower-level authoritative domain name server).

③ Authoritative domain name server:

[1] These domain name servers are responsible for managing the domain names in a certain zone.

[2] The domain name of each host must be registered with an authoritative domain name server, so the authoritative domain name server knows the mapping between the domain names under its jurisdiction and their IP addresses.

[3] In addition, an authoritative domain name server also knows the addresses of its subordinate domain name servers.

④Local domain name server :

[1] The local domain name server does not belong to the above-mentioned domain name server hierarchy.

[2] When a host sends a DNS request message, the message is first sent to the host's local domain name server. The local domain name server acts as a proxy and forwards the message into the domain name server hierarchy described above.

[3] Every Internet service provider (ISP), every university, and even a department within a university can have a local domain name server, which is sometimes called a default domain name server.

[4] The local domain name server is relatively close to the user, generally no more than a few routers away, or it may be in the same LAN.

[5] The IP address of the local domain name server needs to be configured directly in the host that requires domain name resolution.

4. Domain name resolution process

(1) Domain name resolution uses two query methods: recursive query and iterative query.

(2) Recursive query:

① Assume that the host in the figure below wants to know the IP address of the domain name y.abc.com. The host first performs a recursive query to its local domain name server.

② After receiving the recursive query request, the local domain name server also uses a recursive query to query a root domain name server.

③ After receiving the recursive query request, the root domain name server also uses a recursive query to query a top-level domain name server.

④ After receiving the recursive query request, the top-level domain name server also uses a recursive query to query an authoritative domain name server.

⑤ Once the IP address corresponding to the domain name has been found, the result is passed back along the chain of domain name servers that issued the queries and is finally returned to the user's host.

28cd6df4d8de48a9be3f31906eb29c6b.png

(3) Iterative query:

① Assume that the host in the figure below wants to know the IP address of the domain name y.abc.com. The host first performs a recursive query to its local domain name server.

②The local domain name server uses iterative query, which first queries a root domain name server.

③The root domain name server tells the local domain name server the IP address of the top-level domain name server that should be queried next.

④The local domain name server performs iterative queries to the top-level domain name server.

⑤ The top-level domain name server tells the local domain name server the IP address of the authoritative domain name server that should be queried next.

⑥ The local domain name server sends an iterative query to the authoritative domain name server.

⑦ The authoritative domain name server tells the local domain name server the IP address of the queried domain name.

⑧The local domain name server finally tells the host the query results.

ea0ce8cd782c4f40a5d5ced336b30f67.png

(4) Since recursive queries place too great a burden on the queried domain name server, the following pattern is usually adopted: the query from the requesting host to the local domain name server is a recursive query, and the remaining queries are iterative queries.
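The iterative procedure in point (3) can be sketched as a loop in which the local domain name server follows referrals until it receives an answer. The servers below are plain dictionaries standing in for real name servers, and every server name and IP address is invented for the illustration.

```python
# Toy name servers: each maps a domain suffix either to a referral
# (the next server to ask) or to a final answer (an IP address).
SERVERS = {
    "root":     {"com": ("referral", "tld-com")},
    "tld-com":  {"abc.com": ("referral", "auth-abc")},
    "auth-abc": {"y.abc.com": ("answer", "203.0.113.7")},
}


def query(server: str, name: str):
    """One iterative query: the server replies with what it knows, no more."""
    for suffix, record in SERVERS[server].items():
        if name == suffix or name.endswith("." + suffix):
            return record
    return ("unknown", None)


def resolve_iteratively(name: str):
    """What the local name server does: start at the root, follow referrals."""
    server = "root"
    asked = []
    while True:
        asked.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, asked
        if kind != "referral":
            raise LookupError(f"cannot resolve {name}")
        server = value   # ask the server named in the referral next
```

Calling `resolve_iteratively("y.abc.com")` asks the root, top-level, and authoritative servers in turn. From the host's point of view the exchange is still a single recursive query: it asks the local domain name server once and receives the final result.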

5. Cache

(1) In order to improve the query efficiency of DNS, reduce the load on the root domain name server and reduce the number of DNS query messages on the Internet, cache is widely used in domain name servers. The cache is used to store records of recently queried domain names and where the domain name mapping information was obtained.

(2) As shown in the figure below, if a user has recently queried the IP address of the domain name y.abc.com, the IP address corresponding to the domain name should be stored in the cache of the local domain name server. When the host sends a recursive query for that domain name to the local domain name server, the local domain name server does not need to send an iterative query to a root domain name server. Instead, it takes the previous query result (that is, the IP address of the domain name) from its cache and returns it to the user host.

1b6fed2f0a8e4e75b8ad35467986790a.png

(3) Since the mapping between domain names and IP addresses is not permanent, the domain name server should set a timer for each cache entry and delete entries that have been stored longer than a reasonable time (for example, each entry is kept for only two days), so that the contents of the cache stay correct.

(4) Cache is needed not only in the local domain name server, but also in the user host.

① Many user hosts download the entire database of domain names and IP addresses from the local domain name server at startup, maintain a cache of their recently used domain names, and only query the domain name server when the domain name is not found in the cache.

②Similarly, the host also needs to maintain the accuracy of the contents in the cache.
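The caching behavior described in this section, including the per-entry timer from point (3), can be sketched as a small class. The class and method names are hypothetical, the two-day lifetime is just the example value from the text, and the injectable clock exists only so that expiry can be demonstrated without waiting.

```python
import time

TWO_DAYS = 2 * 24 * 60 * 60   # example lifetime from the text, in seconds


class DnsCache:
    """Remember recent lookups and forget them after a fixed lifetime."""

    def __init__(self, lifetime_seconds=TWO_DAYS, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock
        self.entries = {}     # domain name -> (ip address, expiry time)

    def put(self, name: str, ip: str) -> None:
        """Store a query result together with its expiry time."""
        self.entries[name.lower()] = (ip, self.clock() + self.lifetime)

    def get(self, name: str):
        """Return the cached address, or None if absent or expired."""
        key = name.lower()
        entry = self.entries.get(key)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]    # stale mapping: drop it and query again
            return None
        return ip
```

A resolver would call `get` first and fall back to a real query only when it returns None, then call `put` with the fresh result.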

5. File transfer protocol FTP

1. Overview of file transfer protocol

(1) Transferring files from one computer to another computer that may be far away through the network is a basic network application, that is, file transfer.

(2) The File Transfer Protocol (FTP) is the most widely used file transfer protocol on the Internet.

① FTP provides interactive access, allows the client to specify the type and format of files (for example, whether to use ASCII), and allows files to have access permissions (for example, a user accessing a file must be authorized and enter a valid password).

②FTP shields the details of each computer system, so it is suitable for transferring files between any computers in heterogeneous networks.

(3) In the early stages of the development of the Internet, using FTP to transfer files accounted for about one-third of the entire Internet traffic, and the traffic generated by email and domain name systems was even smaller than the traffic generated by FTP. It was only in 1995 that the traffic volume of the World Wide Web exceeded FTP for the first time.

2. Application of file transfer protocol

(1) As shown in the figure below, assume that there is an FTP server and multiple user hosts in the network. FTP uses the client/server method. The FTP server process (also simply called the FTP server) runs on the FTP server computer, and the FTP client process (also simply called the FTP client) runs on each user host.

(2) FTP client computers on the Internet can upload various types of files to FTP server computers, and FTP client computers can also download files from FTP server computers.

2d1fff1ea2114632a807d05b40e847b2.png

(3) Depending on the application requirements, the FTP server may require a high-performance and high-reliability server computer, or it may only require an ordinary personal computer. This example uses an ordinary personal computer as the FTP server computer.

(4) For simplicity, assume that the FTP client computer and the FTP server computer are in the same LAN, and that an FTP server is created on the FTP server computer. An FTP server can be created with third-party FTP server software or with the FTP server software that comes with the operating system. For example, on a Windows system, the built-in FTP server function can be used to add an FTP site (FTP server). If its IP address is 192.168.124.16, the FTP client computer can use browser software to access the FTP server through this IP address.

ec39cb1f428f4ba9981ec638c2e4bfad.png

(5) Common uses of FTP:

①Transfer files between computers, especially for batch transfer of files.

② Let website designers batch upload a large number of files that make up website content to their web servers.

3. Basic working principle of FTP

(1) Active mode:

① As shown in the figure below, the FTP server listens on the well-known port 21, and the FTP client picks a random temporary port to establish a TCP connection with it. This TCP connection carries FTP control commands between the FTP client and the server; that is, it is the command channel between the FTP client and the server.

② When there is data to be transmitted, the FTP client picks another random temporary port and uses the command channel to tell the FTP server to establish a TCP connection to that port (that is, to establish a data channel). The FTP server uses its well-known port 20 to establish the TCP connection with that temporary port. This TCP connection is used to transfer files between the FTP client and the server; in other words, it is the data channel between the FTP client and the server.

③Since the FTP server actively connects to the FTP client when establishing a data channel, it is called active mode .

④The control connection remains open during the entire session and is used to transmit FTP-related control commands; the data connection is used for file transfer, is established each time the file is transferred, and is closed after the transfer is completed.

37918ba3cf7047b09a6c96c0fa50f804.png

(2) Passive mode:

①As shown in the figure below, for the establishment of a command channel between the FTP client and the server, there is no difference between the passive mode and the active mode.

②When there is data to be transmitted, the FTP client uses the command channel to tell the FTP server to open a temporary port and passively wait for a TCP connection from the FTP client (that is, to establish a data channel). The FTP server randomly selects a temporary port number and reports it to the client over the command channel, and the FTP client randomly selects another temporary port number of its own. The FTP client then initiates a TCP connection to the server's temporary port to establish the data channel.

③Since the FTP server passively waits for the FTP client's connection when establishing a data channel, it is called passive mode .

④The control connection remains open during the entire session and is used to transmit FTP-related control commands; the data connection is used for file transfer, is established each time the file is transferred, and is closed after the transfer is completed.

05aa9287dd21488eabb51d14db81583a.png
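The difference between the two modes shows up in how the data-channel address is announced on the command channel. The sketch below goes slightly beyond the notes above: it assumes the standard FTP commands PORT (active mode) and PASV (passive mode), in which an address is written as six decimal numbers h1,h2,h3,h4,p1,p2 and the port equals p1 × 256 + p2. The function names, the client address 192.168.124.20 and the port 50000 are made up for illustration.

```python
import re


def format_port_argument(ip: str, port: int) -> str:
    """Active mode: encode the client's data-channel address for the PORT command."""
    p1, p2 = divmod(port, 256)  # port = p1 * 256 + p2
    return ",".join(ip.split(".") + [str(p1), str(p2)])


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Passive mode: decode the server's data-channel address from its reply to PASV."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("no address found in reply")
    numbers = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    return ip, numbers[4] * 256 + numbers[5]


# Active mode: the client announces where the server should connect to.
print("PORT " + format_port_argument("192.168.124.20", 50000))
# Passive mode: the server announces where the client should connect to.
print(parse_pasv_reply("227 Entering Passive Mode (192,168,124,16,195,80)."))
```

In both cases only the address travels over the command channel; the file itself is sent over the separate data connection that is opened afterwards.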

6. Email

1. Email Overview

(1) Email (E-mail) is one of the earliest popular applications on the Internet, and is still one of the most important and practical applications on the Internet today.

(2) Traditional telephone communication is real-time communication and has the following two shortcomings:

① Both the calling party and the called party in telephone communication must be present at the same time.

②Some calls that are not very urgent often interrupt people's work or rest unnecessarily.

(3) Email is similar to sending letters in the postal system:

①The sender sends the email to the mail server he uses.

②The sender's mail server forwards the received mail to the recipient's mailbox in the recipient's mail server according to its destination address.

③The recipient can access his or her own mailbox in the recipient's mail server at a convenient time to obtain the received email.

(4) Email is easy to use, fast to deliver and low in cost. Not only can it transmit text messages, but it can also be accompanied by sounds and images.

2. Main components of email system

(1) The email system adopts the client/server method.

(2) The three main components of an email system: user agent, mail server, and protocols required for email.

①The user agent is the interface between the user and the email system, also known as email client software . A user agent is required on the sender's computer to send emails, and a user agent is also required on the recipient's computer to receive emails.

②The mail server is the infrastructure of the email system. All ISPs on the Internet have mail servers. Their function is to send and receive mails , and they are also responsible for maintaining users' mailboxes. You can simply think that there are many mailboxes in the mail server, as well as caches for caching mails to be forwarded.

③The sender uses the user agent to send the email to the sending mail server through a mail sending protocol (such as SMTP, Simple Mail Transfer Protocol), and the sending mail server also uses a mail sending protocol to send the mail to the receiving mail server. The recipient can, at a convenient time, use a user agent to read mail from the receiving mail server through a mail reading protocol (such as POP3, Post Office Protocol). That is to say, the protocols required for email include mail sending protocols (such as SMTP) and mail reading protocols (such as POP3 and IMAP).

4fc422c4abc742c09e548261400f5993.png

3. The specific process of sending and receiving emails

(1) The sender's user agent acts as an SMTP client and makes a TCP connection with the SMTP server in the sender's mail server , and then uses the SMTP protocol to send emails to the sender's mail server based on this connection.

(2) The SMTP client in the sending mail server establishes a TCP connection with the SMTP server in the receiving mail server , and then uses the SMTP protocol to send the received mail to be forwarded to the receiving mail server based on this connection.

(3) The recipient's user agent acts as a POP3 client and makes a TCP connection with the POP3 server in the recipient's mail server , and then uses the POP3 protocol to read emails from the recipient's mail server based on this connection.

5b190c66479045dab6fe211addb156a2.png

4. Basic working principle of Simple Mail Transfer Protocol SMTP

(1) As shown in the figure below, take as an example the sender's mail server using the SMTP protocol to send a mail awaiting forwarding to the recipient's mail server.

(2) The sending mail server periodically scans the mail cache. If a mail to be forwarded is found, the SMTP client in the sending mail server will make a TCP connection with the SMTP server in the receiving mail server. The port number is 25 .

The SMTP client can send SMTP commands to the SMTP server based on this TCP connection , a total of 14 types; the SMTP server will also send corresponding responses to the SMTP client , a total of 21 types.

The interaction between the SMTP client and the server through commands and responses ultimately enables the SMTP client to send emails to the SMTP server .

87a8e95b72b34266886eeccaf1040b5a.png

(3) When the TCP connection is successfully established, the SMTP server actively pushes a "service ready" response to the SMTP client: response code 220, which may be followed by descriptive text. After receiving the response, the SMTP client identifies itself to the server by giving its own domain name. The command is HELO, followed by the domain name as its parameter.

(4) If the SMTP server believes that the identity is valid, it will send back a response code of 250, otherwise it will send back another code (for example, 421 means that the service is unavailable). After receiving the response, the SMTP client uses the command MAIL FROM to tell the server where the mail comes from.

(5) If the SMTP server thinks it is reasonable, it will send back the response code 250, otherwise it will send back other error codes. After the SMTP client receives the response, it uses the command RCPT TO to tell the server where the email goes.

(6) If the recipient's mailbox exists in the SMTP server, response code 250 will be sent back, otherwise other error codes will be sent back. After receiving the response, the SMTP client uses the DATA command to tell the server that it is ready to send the email content.

(7) If the SMTP server is ready to receive, it will send back response code 354, otherwise it will send back other error codes. After receiving the response, the SMTP client starts sending the email content to the server. After the SMTP client sends the email content, it also needs to send the end character.

(8) If the SMTP server receives the message successfully, it will send back the response code 250, otherwise it will send back other error codes. After receiving the response, the SMTP client uses the QUIT command to request the server to disconnect.

(9) The SMTP server sends back response code 221, indicating that it accepts the request and actively disconnects.

389c6229f9ca4d8c978d0c9c26aaa62c.png

Notice:

①For the sake of simplicity, the authentication process is omitted in the above example.

②The response code is usually followed by simple description information.

③The description information of the same response code given by different SMTP servers may be different.
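The command/response exchange in steps (3) to (9) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not a real SMTP implementation: `ToySmtpServer` and `send_mail` are hypothetical names, no network connection is opened, the error code 550 for an unknown mailbox is one example of the "other error codes" mentioned above, and the end marker is assumed to be a line containing only a period.

```python
class ToySmtpServer:
    """In-memory stand-in for an SMTP server; it only knows the steps described above."""

    def __init__(self, mailboxes):
        self.mailboxes = set(mailboxes)  # mailboxes hosted on this server
        self.receiving_data = False
        self.delivered = []              # accepted message contents

    def greeting(self):
        return "220 service ready"       # pushed once the TCP connection is up

    def handle(self, line):
        if self.receiving_data:
            # The content arrives followed by the end marker (a line holding only ".").
            self.receiving_data = False
            self.delivered.append(line.removesuffix("\r\n.\r\n"))
            return "250 OK"
        command = line.upper()
        if command.startswith(("HELO", "MAIL FROM:")):
            return "250 OK"
        if command.startswith("RCPT TO:"):
            address = line[len("RCPT TO:"):].strip(" <>")
            return "250 OK" if address in self.mailboxes else "550 no such mailbox"
        if command == "DATA":
            self.receiving_data = True
            return "354 start mail input"
        if command == "QUIT":
            return "221 closing connection"
        return "500 unrecognized command"


def send_mail(server, sender, recipient, content):
    """Play the client side of steps (3)-(9); a real client would stop on an error code."""
    codes = [server.greeting()[:3]]
    for line in (
        "HELO client.example",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
        content + "\r\n.\r\n",
        "QUIT",
    ):
        codes.append(server.handle(line)[:3])
    return codes


toy_server = ToySmtpServer(["bob@b.example"])
print(send_mail(toy_server, "alice@a.example", "bob@b.example", "Hello Bob"))
```

For a mailbox that exists, the printed list of codes follows the sequence in the text: 220, 250, 250, 250, 354, 250, 221.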

5. Email information format

(1) The information format of email is not defined by SMTP, but is defined separately in RFC 822. This RFC document has been updated to RFC 5322 in 2008.

(2) An email has two parts: envelope and content, and the content consists of two parts: header and body. Both header and body information need to be filled in by the user.

(3) The header contains a number of keywords, each followed by a colon and then the corresponding content. After the user writes the header, the email system automatically extracts the information needed for the envelope from the header and writes it on the envelope, so the user does not need to fill in the envelope. (To and Subject are usually required.)

①Keyword From, fill in the sender’s email address after the colon, which is usually filled in automatically by the email system.

②Keyword To, fill in the email address of one or more recipients after the colon.

③Keyword Cc, after the colon, fill in the email address of one or more carbon copy persons other than the recipient (after the carbon copy person receives the email, he or she may or may not read the email, and may or may not reply to the email).

④Keyword Subject, fill in the subject of the email after the colon, which reflects the main content of the email.

(4) After filling in the content of each keyword in the header, the user also needs to write the body of the email. This is the core information that the user wants to convey to the recipient.

63f2876e7c2d46ce90d616e989c33f6b.png
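As a small illustration of the header/body layout, the sketch below builds a message with Python's standard `email.message.EmailMessage` class. The addresses, subject and body text are invented; the library also adds a few MIME-related header lines of its own, which can be ignored here.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@a.example"      # keyword From: sender's address
msg["To"] = "bob@b.example"          # keyword To: one or more recipients
msg["Cc"] = "carol@c.example"        # keyword Cc: carbon-copy recipients
msg["Subject"] = "Meeting notes"     # keyword Subject: topic of the mail
msg.set_content("Hello Bob,\nsee you at ten.\n")

text = msg.as_string()
header, _, body = text.partition("\n\n")  # a blank line separates header and body
print(header)
print("--- blank line ---")
print(body)
```

Each header line has the form "keyword: value", and everything after the first blank line is the body.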

6. Multipurpose Internet Mail Extension MIME

(1) The SMTP protocol can only transmit ASCII text and cannot transmit executable files or other binary objects. That is to say, SMTP cannot meet the needs of multimedia email (such as pictures, audio or video data), and text in many languages other than English (such as Chinese, Russian, or even accented French or German) cannot be transmitted using SMTP either.

(2) In order to solve the problem of SMTP transmitting non-ASCII text, Multipurpose Internet Mail Extensions (MIME) were proposed.

(3) Because SMTP can only transmit ASCII text, an email that contains non-ASCII data cannot be sent directly using SMTP. The sender must first use MIME to convert the non-ASCII data into ASCII data, which can then be transmitted using SMTP.

(4) The receiver must also use MIME to reverse-convert the received ASCII code data, so that an email containing non-ASCII code data can be obtained.

3b68b47752df4285902de7b84e12a0db.png

(5) In order to achieve this conversion, MIME has made the following improvements:

①Added 5 new email header fields, which provide information about the email body.

② Defines the format of many email contents and standardizes the representation method of multimedia emails.

③Defines transfer encodings, so that content in any format can be converted into a form that the mail system will not alter in transit.

(6) In fact, MIME is not only used for SMTP, but also for the later Hypertext Transfer Protocol HTTP, which is also oriented to ASCII code characters.
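The conversion in (3) and (4) can be illustrated with Base64, which is one of the transfer encodings MIME defines. This is a minimal sketch using Python's standard `base64` module; the two function names and the sample text are made up, and a real mail client would also add the MIME header fields that describe the encoding.

```python
import base64


def to_transfer_form(data: bytes) -> str:
    """Sender side: turn arbitrary bytes into ASCII-only text."""
    return base64.b64encode(data).decode("ascii")


def from_transfer_form(text: str) -> bytes:
    """Receiver side: recover the original bytes from the ASCII text."""
    return base64.b64decode(text)


original = "你好, café".encode("utf-8")   # non-ASCII content
encoded = to_transfer_form(original)
print(encoded.isascii())                  # safe to carry over SMTP
print(from_transfer_form(encoded) == original)
```

Both checks print True: the encoded form contains only ASCII characters, and decoding restores the original bytes exactly.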

7. Commonly used email reading protocols

(1) Post Office Protocol POP (Post Office Protocol):

①POP3 is its third version and is the official standard of the Internet. It is a very simple and limited-function email reading protocol.

② Users can only download emails from the mail server to the user's computer in the download and delete mode or download and keep mode. Users are not allowed to manage their own emails on the mail server, such as creating folders and classifying emails.

(2) Internet Message Access Protocol IMAP (Internet Message Access Protocol):

①IMAP4 is its fourth version and is currently only an Internet recommended standard. It is a mail reading protocol that is more powerful than POP3.

② Users can control the mailbox in the mail server on their own computer just like they control it locally, so IMAP is an online protocol.

(3) Both POP3 and IMAP4 use the client/server method based on TCP connection. POP3 uses the well-known port 110, and IMAP4 uses the well-known port 143.

8. Web-based email

(1) The user logs in to the mail server's World Wide Web site through a browser (providing a user name and password) to compose, send, receive, read and manage emails. This working mode is very similar to IMAP. The difference is that the user's computer does not need a dedicated user agent program; a general-purpose World Wide Web browser is enough.

(2) Mail server websites usually provide very powerful and convenient mail management functions. Users can manage and process their own mails on the mail server website without downloading the mails to local management.

(3) As shown in the figure below, assume that users A and B both use NetEase mail server , and user A wants to send an email to user B.

①User A uses a browser to log in to the mail server website, composes and sends an email to user B.

② User B also uses a browser to log in to the mail server website and read the received mail.

③Users A and B use the HTTP protocol when sending and receiving emails with the server , and do not need to use the SMTP and POP3 protocols we introduced before.

9f09b000144644d0823918e8b4650e56.png

(4) As shown in the figure below, assume that user A uses NetEase mail server, user C uses Google mail server , and user A wants to send an email to user C.

①User A uses a browser to log in to his mail server website, composes and sends an email to user C, using the HTTP protocol.

②User A's mail server uses SMTP to send the mail to user C's mail server.

③User C also uses a browser to log in to his mail server website and read the received mail , also using the HTTP protocol.

6502157f68914153abbb1610ad73a717.png

7. World Wide Web WWW

1. Overview of the World Wide Web

(1) The World Wide Web (WWW) is not a special kind of computer network. It is a large-scale, online information repository and a distributed application running on the Internet.

(2) The World Wide Web uses hyperlinks between web pages to link web pages on different websites into a logical information network.

(3) The World Wide Web was originally proposed by Tim Berners-Lee of the European Particle Physics Laboratory in March 1989.

2. Browser

(1) Relevant history:

①In February 1993, Mosaic, the world's first graphical interface browser, was born.

②In 1995, the famous Netscape Navigator browser was launched.

(2) The currently popular browsers are as follows: (The red words are their rendering engines)

a14eb1bd901b43f69f3fedb6a03f378e.png

(3) The most important part of the browser is the rendering engine, which is the browser kernel, which is responsible for parsing and displaying web content.

① Different browser kernels also parse web page content differently, so the display effect of the same web page in browsers with different kernels may be different.

②Web page writers need to test the web page display effect in browsers with different cores.

3. World Wide Web Applications

(1) Use a browser on the user host to access the World Wide Web server of Hunan University of Science and Technology, that is, access the official website of Hunan University of Science and Technology.

① After entering the domain name of the official website of Hunan University of Science and Technology in the address bar of the browser and pressing the Enter key, the browser will send a request message to the server.

②The server performs corresponding operations after receiving the request message, and then sends a response message back to the browser.

③The browser parses and renders the content in the response message, so that you can see the homepage of the website.

6042a102f0334059ad4ec0120682b63b.png

(2) In order to facilitate access to documents around the world, the World Wide Web uses Uniform Resource Locators (URLs) to indicate the location of any kind of "resource" on the Internet.

①The general form of a uniform resource locator consists of four parts, namely protocol, host, port, and path .

② Previously, the domain name entered into the browser address bar was the domain name of the official website of Hunan University of Science and Technology. The purpose was to obtain the content of the homepage of the website. The corresponding uniform resource locator is as shown in the figure.

855cf44c935c4e3e9224b51d8abf8260.png

(3) When we click on a hyperlink in the web page, it will jump to another web page, in which the protocol, host and port of the uniform resource locator are the same as the homepage of the website, but the path and web page file are different .

60aa242eca3d43e09568f98e390f11f8.png
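The four parts of a URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. The two URLs below are hypothetical stand-ins (the actual addresses appear only in the figures): they share protocol, host and port and differ only in the path, as described in (3).

```python
from urllib.parse import urlsplit

home = urlsplit("http://www.example.com:80/index.htm")
other = urlsplit("http://www.example.com:80/news/2023/item.htm")

for url in (home, other):
    # protocol, host, port, path
    print(url.scheme, url.hostname, url.port, url.path)

same_site = (home.scheme, home.hostname, home.port) == (other.scheme, other.hostname, other.port)
print(same_site, home.path != other.path)
```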

4. Documentation of the World Wide Web

(1) Save the previously visited homepage of the official website of Hunan University of Science and Technology as a file. The result is one file with the extension htm and one folder. The contents of the folder are shown below: the three files with the extension htm are HTML documents, the five files with the extension js are JavaScript documents, the two files with the extension css are CSS documents, and the remaining JPG and PNG files are image files.

0d3486636c944ae3b74e190f67473c47.png

①HyperText Markup Language HTML (HyperText Markup Language): Uses a variety of "tags" to describe the structure and content of web pages.

②Cascading Style Sheets (CSS): describes the style of a web page from an aesthetic perspective.

③JavaScript: A scripting language (nothing to do with Java) that controls the behavior of web pages.

(2) World Wide Web documents written in HTML, CSS, and JavaScript are parsed and rendered by the browser kernel.

(3) The picture below shows the simplest HTML document written in HTML. Open the HTML document with a browser, and the browser will render a very simple web page.

① In the HTML document, two html tags are used to define the scope of the HTML document, two head tags are used inside to define the header of the HTML document, and two body tags are used to define the main body of the HTML document.

②The content between the two title tags in the header is rendered as the title of the web page, and the content between the two p tags is rendered as a text paragraph.

9ad93a60eddf4fafb7702c655666dee8.png

③HTML uses a variety of "tags" to describe the structure and content of web pages, but the presented content style is too simple, or not beautiful enough. In this case, you can define some required styles in the CSS document to beautify the displayed content of the web page.
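To make the tag structure in (3) concrete, the sketch below stores a minimal HTML document as a string and walks it with Python's standard `html.parser`. The document text is a reconstruction from the description (html, head, title, body and p tags); the title "My page" is invented, and `TextByTag` is a hypothetical helper, not how a browser kernel actually works.

```python
from html.parser import HTMLParser

DOCUMENT = """<html>
<head><title>My page</title></head>
<body><p>Hello world</p></body>
</html>"""


class TextByTag(HTMLParser):
    """Record the text found directly inside each kind of tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags that are currently open
        self.text = {}        # tag name -> text inside it

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if self.open_tags and data.strip():
            self.text[self.open_tags[-1]] = data.strip()


parser = TextByTag()
parser.feed(DOCUMENT)
print(parser.text["title"])   # what a browser would show as the page title
print(parser.text["p"])       # what a browser would show as a paragraph
```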

(4) The figure below (upper part) shows a CSS document in which a style is defined: the color is dark pink and the font size is 36 pixels.

①Introduce the CSS document using the link tag at the beginning of the previously written HTML document.

②Assign the style name to the p tag in the body whose style needs to be changed.

c1d678b232a3421182657f4266bb9214.png

③ Refresh in the browser, and you can see that the browser has re-rendered the web page content. You can see that the color and font size of the "Hello world" paragraph have changed accordingly.

499bb47e39d74f6fad2be1ec0d78df44.png

(5) If you want users to perform corresponding operations on the web page, you need to write a JS document.

① Use the button tag in the body of the previous HTML document to add a button, and specify the handler function to be called when the button's click event occurs. Then write a JS document in the JavaScript scripting language that contains the implementation of this click event handler.

② Use the script tag at the beginning of the HTML document to introduce the JS document.

③In the event handler, find the target element by its id (here, the p tag that displays "Hello world") and then change its displayed content.

49524aa7e99348f5a8ef5122fd1a284b.png

④ Refresh the browser and you will see the button you just added. When the button is clicked with the mouse, "Hello world" changes to "Thank you for the like".

7da3e9ea257f48faa4fdd310dd9c2b3f.png

5. Hypertext Transfer Protocol HTTP

(1) HTTP defines how the browser (i.e., the World Wide Web client process) requests World Wide Web documents from the World Wide Web server, and how the World Wide Web server transmits the World Wide Web documents to the browser.

(2) As shown in the figure below, using the user host to access the World Wide Web server of Hunan University of Science and Technology can be regarded as an Internet-based communication between the browser process (ie, client process) in the user host and the server process in the server.

①The browser process first initiates a TCP connection with the server process, using the well-known port number 80. Over this established TCP connection, the browser process sends an HTTP request message to the server process.

②The server process performs corresponding operations after receiving the HTTP request message, and then sends an HTTP response message back to the browser process .

4539aa58d3d445ca8df3af437c5c7d0c.png

(3) HTTP/1.0 uses a non-persistent connection method. In this method, every time the browser requests a file, it must establish a TCP connection with the server, and the connection is closed immediately after receiving a response .

①The client and server establish a TCP connection through the three-way handshake. The data payload of the last of these three segments carries an HTTP request message. After receiving it, the server sends an HTTP response message back to the client.

②The time spent on a request and response is recorded as the round-trip time RTT. The time required to request a World Wide Web document is 2RTT + the transmission delay of the document.

③Every time a document is requested, there is twice the RTT overhead. If there are many reference objects (such as pictures, etc.) on a web page, it will take 2RTT to request each object.

④To reduce latency, browsers usually establish multiple parallel TCP connections to request multiple objects at the same time, but this occupies a large amount of resources on the World Wide Web server. Since a World Wide Web server often has to serve a large number of client requests at the same time, this places a heavy burden on it.

a78fce14a44d417fb975039ff80d7909.png

(4) HTTP/1.1 uses persistent connections (persistent by default, but this can be changed to non-persistent). In this mode, the World Wide Web server keeps the connection open after sending a response, so that the same client (browser) and the server can continue to transmit subsequent HTTP request and response messages over this connection.

① This is not limited to objects referenced by the same page; it applies to any documents, as long as they are on the same server.

②In order to further improve efficiency, HTTP/1.1's persistent connection can also work in a pipeline manner, that is, the browser can continuously send multiple request messages before receiving the HTTP response message. After such request messages arrive at the server one after another, the server sends back response messages one after another, which saves a lot of RTT time, reduces the idle time in the TCP connection, and improves the efficiency of downloading documents. (The figure below shows the non-pipeline method)

3290f7f66113438b842cdd183a01026d.png
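The cost difference between the two connection modes can be estimated with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes that every document has the same transmission delay, that requests are made one after another (no parallel connections, no pipelining), and that a persistent connection pays the connection-setup RTT only once. The function names and the numbers are made up.

```python
def non_persistent_time(rtt, transmit, objects):
    """HTTP/1.0 style: every document costs 2*RTT plus its transmission delay."""
    documents = 1 + objects                  # the page itself plus referenced objects
    return documents * (2 * rtt + transmit)


def persistent_time(rtt, transmit, objects):
    """HTTP/1.1 persistent, non-pipelined: connection setup is paid only once."""
    first = 2 * rtt + transmit               # connection setup + first request/response
    return first + objects * (rtt + transmit)


rtt, transmit, objects = 0.1, 0.01, 10       # seconds, seconds, count (illustrative)
print(round(non_persistent_time(rtt, transmit, objects), 2))
print(round(persistent_time(rtt, transmit, objects), 2))
```

With these numbers the page takes about 2.31 s over non-persistent connections and about 1.31 s over one persistent connection; the saving is one RTT per referenced object.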

(5) HTTP is text-oriented, each field in its message is some ASCII code string, and the length of each field is uncertain.

①HTTP request message format:

[1] The first line is the request line . It starts with the method field , followed by a space , followed by the Uniform Resource Locator field , followed by a space , followed by the version field , and finally a carriage return and line feed .

[2] The header lines start from the second line. Each header line starts with the header field name, followed by a colon, then a space, then the value of the field, and finally a carriage return and line feed.

[3] There can be multiple header lines, as shown in the figure below.

[4] Below the last header line is a blank line .

[5] Below the blank line is the entity body, which is usually not used.

036c3e422f06491f97fee3c520e8e9b9.png

344abf7135984286aba04171b8d2fd94.png

②HTTP response message format:

[1] The first line is the status line . It starts with the version field , followed by a space , followed by the status code field , followed by a space , followed by the phrase field , and finally a carriage return and line feed .

[2] Except for the status line, other parts are similar to the corresponding parts of the HTTP request message format and will not be described again here.

63fdc684398646e1b0111c663914907f.png

7b33f20227294653aa1249d8074ca32b.png
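The two message formats can be sketched as plain string handling. This is a minimal illustration, assuming a hypothetical host `www.example.com` and path `/index.htm`; `build_request` and `parse_response` are made-up helpers and ignore many details of real HTTP.

```python
CRLF = "\r\n"


def build_request(method, url, version, headers, body=""):
    """Request line, header lines, blank line, then the (usually empty) entity body."""
    request_line = f"{method} {url} {version}{CRLF}"
    header_lines = "".join(f"{name}: {value}{CRLF}" for name, value in headers.items())
    return request_line + header_lines + CRLF + body


def parse_response(message):
    """Split a response into (version, status code, phrase), headers and entity body."""
    head, _, body = message.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, phrase = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return (version, int(code), phrase), headers, body


request = build_request("GET", "/index.htm", "HTTP/1.1", {"Host": "www.example.com"})
print(repr(request))

response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_response(response))
```

Because every field is a variable-length ASCII string, spaces, colons and CRLF pairs are the only things that mark where one field ends and the next begins.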

6. Cookie

(1) When a user visits a website, cookies are commonly used so that the server can keep track of information about that user.

① Early World Wide Web applications were very simple. They only allowed users to view various static documents stored on different servers. Therefore, HTTP was designed as a stateless protocol. This simplifies server design.

② Now, users can implement various complex applications through the World Wide Web, such as online shopping, e-commerce, etc. These applications often require the World Wide Web server to be able to identify the user.

③Cookie provides a mechanism that enables the World Wide Web server to "remember" users without requiring users to actively provide user identification information. In other words, Cookie is a technology that makes stateless HTTP stateful.

(2) How cookies work:

①As shown in the figure below, the browser process in the user host first establishes a TCP connection with the server process in the World Wide Web server.

② When the user's browser process sends an HTTP request message to the server process for the first time, the server process generates a unique Cookie identification code for it and, using this code as an index, creates an entry in the server's back-end database to record information about the user's visits to the website.

③Then the server process sends an HTTP response message back to the browser process. The response message contains a header line with the header field Set-Cookie. The value of this field is the Cookie identification code .

④When the browser process receives the response message, it adds a line to a specific Cookie file to record the domain name and Cookie identification code of the server.

8c551c4805f641d9b6b48c7a1e0246c5.png

⑤When the user uses the browser to visit the website again , each time an HTTP request message is sent, the browser will take out the cookie identification code of the website from the Cookie file and put it in the Cookie header line of the HTTP request message .

⑥After the server process receives the HTTP request message, it can identify the user based on the Cookie identification code and return the user's personalized web page.

b1fd85e5a32249d792b7daee35d7c12a.png
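The six steps above can be sketched as a toy simulation with no networking. `ToyWebServer`, `ToyBrowser` and the domain `shop.example` are hypothetical, the "database" is a dictionary that only counts visits, and the cookie value is a bare identification code (real cookies are name=value pairs with optional attributes).

```python
import itertools


class ToyWebServer:
    """Remembers visitors by the identification code it hands out in Set-Cookie."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.database = {}                  # cookie identification code -> visit count

    def handle(self, request_headers):
        code = request_headers.get("Cookie")
        response_headers = {}
        if code not in self.database:
            # First visit: create a database entry and send the code to the browser.
            code = f"id{next(self._next_id)}"
            self.database[code] = 0
            response_headers["Set-Cookie"] = code
        self.database[code] += 1
        return response_headers, f"visit number {self.database[code]}"


class ToyBrowser:
    """Keeps a cookie file mapping each server's domain name to its cookie code."""

    def __init__(self):
        self.cookie_file = {}

    def visit(self, domain, server):
        headers = {}
        if domain in self.cookie_file:
            headers["Cookie"] = self.cookie_file[domain]   # sent on every later request
        response_headers, body = server.handle(headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[domain] = response_headers["Set-Cookie"]
        return body


web_server, browser = ToyWebServer(), ToyBrowser()
print(browser.visit("shop.example", web_server))
print(browser.visit("shop.example", web_server))
```

The second visit is recognized as the same user ("visit number 2") even though HTTP itself carried no state between the two requests.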

7. World Wide Web Cache and Proxy Server

(1) Caching mechanisms can also be used in the World Wide Web to improve the efficiency of the World Wide Web.

(2) The World Wide Web cache is also called Web Cache, which can be located on the client or on an intermediate system. The Web Cache located on the intermediate system is also called a Proxy Server.

(3) The Web cache temporarily stores some recent requests and responses on its local disk. When a new request arrives and is found to match a stored request, the stored response is returned, and there is no need to go out to the Internet again to fetch the resource at that URL.

(4) As shown in the figure below, the server on the right is a World Wide Web server on the Internet. To distinguish it from the World Wide Web proxy server, it is referred to as the original server. The server on the left is a World Wide Web proxy server on the campus network, referred to as the proxy server.

① When a host in the campus network wants to access the original server on the Internet, it will first send a request to the proxy server in the campus network.

② If the requested object is stored in the proxy server, the proxy server sends a response containing the requested object back to the host. If the proxy server does not have the requested object, it sends a request to the original server on the Internet; the original server sends a response containing the requested object back to the proxy server, and the proxy server stores the response in its Web cache and then sends the response back to the host.

d82e5d8e651441eab2de739fa129cc56.png

③If the hit rate of the Web cache is relatively high, the communication volume of the link between routers R1 and R2 will be greatly reduced, thus reducing the delay of each host on the campus network accessing the Internet.

(5) Assume that there is a document in the original server and a copy of the document in the proxy server.

① Suppose the document in the original server has been changed, and then a host in the campus network requests the document. The host first sends a request to the proxy server in the campus network. The proxy server finds its copy of the document, encapsulates it in a response message and sends it back to the host. As a result, the document received by the host is inconsistent with the document in the original server.

f30009f5dc1f49f4b50d8b7f228de8c1.png

②To solve this problem, the original server usually sets a last-modified time field (Last-Modified) and an expiry date field (Expires) for each response object.

③When a host in the campus network wants to request the document in the original server, it first sends a request to the proxy server in the campus network.

[1] If the document in the proxy server has not expired, the proxy server will encapsulate it in a response message and send it back to the host.

[2] If the document in the proxy server has expired, the proxy server sends a request to the original server on the Internet. The request message contains a header line with the header field If-Modified-Since, whose value is the modification date of the proxy's copy of the document. Based on this date, the original server can determine whether the proxy's copy is consistent with the document it stores itself.

#1 If they are consistent, the original server sends the proxy server a response that does not contain an entity body, with status code 304 and phrase Not Modified. The proxy server renews the expiry date of the document, then encapsulates the document in a response message and sends it back to the host.

dd988eea93ca4896b57eb7b0c42b3c51.png

#2 If it is inconsistent, send a response message encapsulating the document to the proxy server, so that the proxy server updates the document, and then encapsulates the updated document in a response message and sends it back to the host.

e2d37c7765f64b6dac828d0deefb70dc.png
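The expiry check and the If-Modified-Since exchange can be sketched as a small simulation. This is an illustration under simplifying assumptions: times are plain integers instead of HTTP dates, the proxy caches a single document, and `OriginServer`, `ProxyServer` and the `lifetime` parameter are hypothetical names.

```python
class OriginServer:
    def __init__(self, document, last_modified, lifetime):
        self.document = document
        self.last_modified = last_modified
        self.lifetime = lifetime                 # how long a copy stays valid

    def update(self, document, now):
        self.document, self.last_modified = document, now

    def get(self, now, if_modified_since=None):
        """Return (status code, document or None, Last-Modified, Expires)."""
        expires = now + self.lifetime
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified, expires   # Not Modified, no entity body
        return 200, self.document, self.last_modified, expires


class ProxyServer:
    def __init__(self, origin):
        self.origin = origin
        self.cached = None                       # (document, last_modified, expires)

    def get(self, now):
        if self.cached and now < self.cached[2]:
            return self.cached[0]                # copy has not expired: answer from cache
        since = self.cached[1] if self.cached else None
        status, document, last_modified, expires = self.origin.get(now, since)
        if status == 304:
            document = self.cached[0]            # keep the copy, only renew its expiry
        self.cached = (document, last_modified, expires)
        return document


origin = OriginServer("version 1", last_modified=0, lifetime=10)
proxy = ProxyServer(origin)
print(proxy.get(now=1))        # cache miss: fetched from the original server
origin.update("version 2", now=5)
print(proxy.get(now=6))        # copy not yet expired: the stale copy is returned
print(proxy.get(now=12))       # expired and modified: 200 with the new document
print(proxy.get(now=30))       # expired but unchanged: 304, cached copy reused
```

The second call shows the remaining weakness of this scheme: until the copy expires, the proxy can still hand out a document that no longer matches the original server.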

Origin blog.csdn.net/Zevalin/article/details/135096652