Chapter 2, Application Layer

2.1 Principles of the application layer

2.1.1 Web Application Architecture

Two mainstream architectures are used in modern web applications: the client-server architecture and the peer-to-peer (P2P) architecture.

  • Client-server architecture: simply put, the server provides the data, and clients contact the server to obtain it.
  • P2P architecture: every node in the network has equal status; each node can both receive data from and provide data to other nodes. This gives P2P systems strong self-scalability.

2.1.2 Process communication

Processes on two different end systems communicate with each other by exchanging messages across a computer network. The sending process creates and sends messages onto the network; the receiving process receives these messages and possibly responds with messages of its own.

  • Client and Server Processes

    We usually label one of these two processes the client and the other the server.

    In a communication session between a pair of processes, the process that initiates the communication (that is, the one that first contacts the other process at the start of the session) is the client; the process that waits to be contacted is the server.

  • Interface between process and computer network

    The process sends and receives messages to and from the network through a software interface called a socket.

  • Process addressing

    In order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address. In the Internet, the host is identified by its IP address; in addition, a destination port number identifies the particular receiving process (socket) on that host, as sketched below.
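To make the socket interface and the (IP address, port number) form of a process address concrete, here is a minimal Python sketch (the address 203.0.113.10 and port 12000 are placeholders; no process actually listens there, and since UDP is used the send simply goes unanswered):

```python
from socket import socket, AF_INET, SOCK_DGRAM

# A receiving process is identified by its host's IP address plus a port number.
receiver = ("203.0.113.10", 12000)      # placeholder IP address and port

s = socket(AF_INET, SOCK_DGRAM)         # the socket is the process's "door" to the network
s.sendto("hello".encode(), receiver)    # the message is handed to the transport layer here
s.close()
```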

2.1.3 Transport services available to applications

We can broadly categorize application service requirements in four areas: reliable data transfer, throughput, timing, and security.

  • Reliable data transfer

    An important service that a transport-layer protocol can potentially provide to applications is reliable process-to-process data transfer. When a transport protocol provides this service, the sending process can simply pass its data into the socket with complete confidence that the data will reach the receiving process without errors.

    When a transport-layer protocol does not provide reliable data transfer, some of the data sent by the sending process may never reach the receiving process.

  • Throughput

    That is, a transport-layer protocol can provide guaranteed available throughput at some specified rate. With this service, the application can request a guaranteed throughput of r bits/second, and the transport protocol then ensures that the available throughput is always at least r bits/second.

  • Timing

    Transport-layer protocols can also provide timing guarantees, for example that every bit the sender injects into its socket arrives at the receiver's socket no later than some specified bound (say, 100 ms).

  • Security

    A transport protocol can provide one or more security services to an application. For example, such a service can provide confidentiality between the sending and receiving processes, so that the data remains protected even if it is somehow observed in transit between the two processes.

2.1.4 Transport services provided by the Internet

The Internet (and more generally the TCP/IP network) provides applications with two transport layer protocols, UDP and TCP.

  • TCP service: the TCP service model includes a connection-oriented service and a reliable data transfer service.

    • Connection-oriented service: TCP has the client and server exchange transport-layer control information with each other before the application-level messages begin to flow. This so-called handshaking alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshaking phase, a TCP connection is said to exist between the sockets of the two processes. The connection is full-duplex: the processes on both sides can send and receive messages over the connection at the same time. When the application finishes sending messages, it must tear down the connection.

    • Reliable data transfer service: the communicating processes can rely on TCP to deliver all sent data without error and in the proper order. When one side of the application passes a stream of bytes into a socket, it can count on TCP to deliver the same stream of bytes to the receiving socket, with no missing and no duplicate bytes.

  • UDP service: UDP is a no-frills, lightweight transport protocol that provides only minimal services. UDP is connectionless, so there is no handshaking before the two processes start to communicate. UDP provides an unreliable data transfer service: when a process sends a message into a UDP socket, UDP gives no guarantee that the message will ever reach the receiving process. Furthermore, messages that do arrive at the receiving process may arrive out of order.

  • Services not provided by Internet transport protocols: today's Internet transport protocols provide no guarantee on throughput or timing; applications have been designed to cope as well as possible with this lack of guarantees.


2.1.5 Application layer protocol

An application-layer protocol defines how an application's processes, running on different end systems, pass messages to each other. In particular, an application-layer protocol defines:

  • The types of messages exchanged, for example, request messages and response messages.
  • The syntax of the various message types, such as the fields in the message and how the fields are delineated.
  • The semantics of the fields, that is, the meaning of the information in those fields.
  • Rules that determine when and how a process sends and responds to messages.

2.2 The Web and HTTP

2.2.1 Overview of HTTP

The application layer protocol of the Web is the Hypertext Transfer Protocol (HyperText Transfer Protocol, HTTP), which is the core of the Web and is defined in [RFC 1945] and [RFC 2616]. HTTP is implemented by two programs: a client program and a server program. The client program and the server program run in different end systems and conduct conversations by exchanging HTTP messages. HTTP defines the structure of these messages and the way clients and servers exchange messages.

HTTP defines the way a Web client requests a Web page from a Web server, and the way the server transmits a Web page to a client.

HTTP uses TCP as its underlying transport protocol (rather than running over UDP). The HTTP client first initiates a TCP connection with the server.

Because the HTTP server does not keep any information about the client, we say that HTTP is a stateless protocol (stateless protocol).

2.2.2 Non-persistent and persistent connections

  • With non-persistent connections, every request for an object on the page goes through its own TCP handshake. If the page references 10 images, then the page plus its images require 11 requests and 11 TCP handshakes. This has two drawbacks. First, a brand-new connection must be established and maintained for each requested object. Second, each object suffers a delivery delay of two RTTs: one RTT to establish the TCP connection and one RTT to request and receive the object.

  • Persistent connections: with HTTP/1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can travel over this same connection, and requests for objects can be issued back-to-back (pipelined) without waiting for replies to pending requests. A rough delay comparison of the two modes is sketched after this list.
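A back-of-the-envelope comparison for the 11-object page above, sketched in Python (the RTT value is assumed and transmission times are ignored):

```python
RTT = 0.05      # assumed round-trip time: 50 ms, purely illustrative
objects = 11    # the base HTML file plus 10 referenced images

# Non-persistent, serial connections: every object costs one RTT for the TCP
# handshake plus one RTT to request and receive the object.
non_persistent = objects * 2 * RTT

# Persistent connection with pipelining: one RTT for the handshake, one RTT for
# the base page, then roughly one more RTT for the ten pipelined image requests.
persistent_pipelined = 3 * RTT

print(round(non_persistent, 2), round(persistent_pipelined, 2))   # about 1.1 s versus 0.15 s
```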

2.2.3 HTTP message format

1. HTTP request message

A typical HTTP request message is provided below:

 GET /somedir/page.html HTTP/1.1 
 Host: www.someschool.edu 
 Connection: close 
 User-agent: Mozilla/5.0 
 Accept-language: fr

Each line of the message is terminated by a carriage return and a line feed, and an extra carriage return and line feed follow the last header line, marking the end of the header.

The first line of an HTTP request message is called the request line, and the subsequent lines are called header lines. The request line has 3 fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, HEAD, PUT, and DELETE. The great majority of HTTP request messages use the GET method. A programmatic version of this request is sketched below.
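The same request could also be issued from code. Here is a minimal sketch using Python's standard http.client module (the host and path are the illustrative ones from the example above, so the request is not expected to succeed against a real server):

```python
import http.client

# Open a TCP connection to the Web server (port 80 is the HTTP default).
conn = http.client.HTTPConnection("www.someschool.edu", 80)

# Send a GET request carrying a few of the header lines shown above.
conn.request("GET", "/somedir/page.html",
             headers={"Connection": "close",
                      "User-Agent": "Mozilla/5.0",
                      "Accept-Language": "fr"})

resp = conn.getresponse()
print(resp.status, resp.reason)   # e.g. 200 OK
print(resp.read()[:200])          # first bytes of the entity body
conn.close()
```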

2. HTTP response message

The response to the request message.

HTTP/1.1 200 OK 
Connection: close 
Date: Tue, 18 Aug 2015 15:44:04 GMT 
Server: Apache/2.2.3 (CentOS) 
Last-Modified: Tue, 18 Aug 2015 15:11:03 GMT 
Content-Length: 6821 
Content-Type: text/html 
(data data data data data •••)

The message above has three parts: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message: it contains the requested object itself (represented here by data data data data data ...).

The status line has 3 fields: the protocol version field, a status code, and a corresponding status phrase.

2.2.4 Interaction between user and server: cookie

Cookies can be used to identify users. Cookie technology has four components: ① a cookie header line in the HTTP response message; ② a cookie header line in the HTTP request message; ③ a cookie file kept on the user's end system and managed by the user's browser; ④ a back-end database at the Web site. A short sketch of the exchange follows.
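A hedged sketch of how these pieces fit together, again with Python's http.client and a hypothetical cookie-setting site (www.example.com here is only a placeholder and may not actually return a Set-Cookie header):

```python
import http.client

# First visit: the server chooses an identifier and returns it in a
# Set-Cookie header line of the HTTP response.
conn = http.client.HTTPConnection("www.example.com")    # hypothetical cookie-setting site
conn.request("GET", "/")
resp = conn.getresponse()
cookie = resp.getheader("Set-Cookie")   # e.g. "id=1678; Path=/" (value chosen by the server)
resp.read()
conn.close()

# Later visit: the browser-side cookie file supplies the identifier, which is
# echoed back in a Cookie header line so the server can consult its back-end database.
if cookie:
    conn2 = http.client.HTTPConnection("www.example.com")
    conn2.request("GET", "/", headers={"Cookie": cookie.split(";")[0]})
    print(conn2.getresponse().status)
    conn2.close()
```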


2.2.5 Web caching

A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage.

2.2.6 Conditional GET

Although caching can reduce the response time perceived by users, it also introduces a new problem, that is, the copy of the object stored in the cache may be stale. In other words, objects stored on the server may have been modified since the copy was cached on the client.

The HTTP protocol has a mechanism that allows a cache to verify that its objects are up to date: the conditional GET method. An HTTP request message is a conditional GET message if ① it uses the GET method and ② it includes an "If-Modified-Since:" header line.

The server compares the date in the If-Modified-Since: header line with the object's last modification time and responds accordingly. If the object has been modified since that date, the server returns the object as usual. If it has not been modified, the server responds with a 304 Not Modified status line and an empty entity body, telling the cache it may go ahead and use its copy. A small sketch follows.
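A minimal conditional GET sketch with Python's http.client, reusing the illustrative host, path, and Last-Modified date from the examples above (none of these correspond to a live server):

```python
import http.client

# Revalidate a cached copy of /somedir/page.html: the date is copied from the
# Last-Modified header of the cached response shown earlier.
conn = http.client.HTTPConnection("www.someschool.edu")   # illustrative host
conn.request("GET", "/somedir/page.html",
             headers={"If-Modified-Since": "Tue, 18 Aug 2015 15:11:03 GMT"})
resp = conn.getresponse()

if resp.status == 304:
    # Not Modified: empty entity body, so the cache keeps serving its stored copy.
    print("use cached copy")
else:
    fresh_object = resp.read()   # the object changed; the response carries the new version
conn.close()
```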

2.3 E-mail in the Internet

(Figure: a high-level view of the Internet e-mail system)

The figure above gives a high-level view of the Internet e-mail system. It has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP).

2.3.1 SMTP

SMTP is at the heart of Internet e-mail. SMTP is used to send messages from the sender's mail server to the receiver's mail server.

2.3.2 Comparison with HTTP

  1. Both SMTP and HTTP are used to transfer files from one host to another.

    • HTTP transfers files (also called objects) from a Web server to a Web client (typically a browser). SMTP transfers files (i.e., e-mail messages) from one mail server to another.

    • Both HTTP and SMTP use persistent connections when transferring files.

  2. HTTP is primarily a pull protocol: someone loads information onto a Web server at their convenience, and users later pull that information from the server with HTTP. SMTP is basically a push protocol: the sending mail server pushes the file to the receiving mail server.

  3. SMTP requires each message (including their body) to be in 7-bit ASCII format. HTTP data is not subject to this restriction.

  4. HTTP encapsulates each object into its own HTTP response message, while SMTP puts all message objects in one message.

2.3.3 Mail message format

From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.

Each header must contain a From: header line and a To: header line. A header may contain a Subject: header line and other optional header lines.

After the message header, a blank line follows, and then comes the message body in ASCII format. It is worth trying to use Telnet to send a message containing some header lines, including the Subject: header line, to a mail server.

Below is an example of exchanging message text between an SMTP client (C) and an SMTP server (S).

S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr ... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles? 
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection
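The same dialogue can be driven from Python's standard smtplib module, which speaks SMTP on the caller's behalf. This is only a sketch: it assumes a reachable SMTP server at hamburger.edu, and the hosts and addresses are the illustrative ones from the transcript above.

```python
import smtplib

# The message: header lines, a blank line, then the ASCII body.
msg = (
    "From: alice@crepes.fr\r\n"
    "To: bob@hamburger.edu\r\n"
    "Subject: Searching for the meaning of life.\r\n"
    "\r\n"
    "Do you like ketchup?\r\nHow about pickles?\r\n"
)

# Connect to the receiving mail server on SMTP's port 25 and push the message.
server = smtplib.SMTP("hamburger.edu", 25)   # hypothetical, reachable server assumed
server.helo("crepes.fr")
server.sendmail("alice@crepes.fr", ["bob@hamburger.edu"], msg)
server.quit()
```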

2.3.4 Mail Access Protocol

(Figure: the path of an e-mail message from Alice to Bob and the protocols used at each step)

The figure above shows how an e-mail message travels from Alice to Bob: Alice's user agent first sends the message to her own mail server, that mail server then transfers it to Bob's mail server using SMTP, and finally Bob retrieves the message from his own mail server using a mail access protocol. SMTP cannot be used for this last step, because SMTP is a push protocol while fetching mail is a pull operation.

POP3 (Post Office Protocol version 3) is an application-layer protocol for retrieving e-mail from a server. It allows users to download their own emails from the mail server through the email client for offline reading and management.

IMAP (Internet Message Access Protocol) is an application-layer protocol for email that allows users to synchronize, manage, and access email across multiple devices through an email client.

Today, more and more users send and receive e-mail through their Web browsers. With this kind of Web-based e-mail, the user agent talks to its mail server over HTTP, while SMTP is still used to transfer mail between mail servers.
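On the retrieval side, here is a minimal POP3 sketch using Python's standard poplib module. The server name and credentials are placeholders, not taken from the text above.

```python
import poplib

# Bob's user agent pulls mail from his mail server with POP3 (default port 110).
mailbox = poplib.POP3("mail.hamburger.edu")   # hypothetical mail server
mailbox.user("bob")                           # placeholder username
mailbox.pass_("password")                     # placeholder password

num_messages = len(mailbox.list()[1])         # list() returns (response, listings, size)
for i in range(1, num_messages + 1):
    response, lines, octets = mailbox.retr(i)     # download message i, line by line
    print(b"\n".join(lines)[:200])                # show the first bytes of each message
mailbox.quit()
```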

2.4 DNS: Directory Service for the Internet

Simply put, DNS provides the mapping domain name -> IP address.

2.4.1 Services provided by DNS

The DNS is: ① a distributed database implemented in a hierarchy of DNS servers; ② an application-layer protocol that allows hosts to query this distributed database. DNS servers are often UNIX machines running BIND (Berkeley Internet Name Domain) software [BIND 2012]. The DNS protocol runs over UDP and uses port 53.

When a browser makes a request for www.someschool.edu/index.html, the following happens

  1. The client of the DNS application is running on the same user host.

  2. The browser extracts the host name www.someschool.edu from the above URL, and passes this host name to the client of the DNS application.

  3. A DNS client sends a request containing a hostname to a DNS server.

  4. The DNS client will eventually receive a reply message containing the IP address corresponding to the hostname.

  5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process on port 80 at the IP address.

In addition to converting hostnames to IP addresses, DNS provides some important services:

  • Host aliasing: a host with a complicated hostname can have one or more alias names. An application can invoke DNS to obtain the canonical hostname corresponding to an alias, along with the host's IP address (a quick lookup sketch follows this list).
  • Mail server aliasing (mail server aliasing): An email application can call DNS to resolve a provided hostname alias to obtain the host's canonical hostname and its IP address.
  • Load distribution (load distribution): DNS is also used for load distribution among redundant servers (such as redundant web servers, etc.).
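The hostname-to-address and canonical-name services are visible from Python's standard socket module. A quick sketch (the hostname is only an example, and the output depends on live DNS data at the time the code runs):

```python
import socket

# Resolve a hostname to an IPv4 address (what a browser does before opening a TCP connection).
print(socket.gethostbyname("www.example.com"))

# gethostbyname_ex also reports the canonical name and any aliases,
# illustrating the host-aliasing service described above.
canonical, aliases, addresses = socket.gethostbyname_ex("www.example.com")
print(canonical, aliases, addresses)
```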

2.4.2 DNS working mechanism

Domain name servers are divided into three categories, as follows

  • Root DNS servers: there are more than 400 root name servers around the world, managed by 13 different organizations. Root name servers provide the IP addresses of the TLD (top-level domain) servers.
  • Top-level domain (TLD) servers: there are TLD servers (or server clusters) for each of the top-level domains (such as com, org, net, edu, and gov) and for all of the country top-level domains (such as uk, fr, ca, and jp). The TLD servers provide the IP addresses of authoritative DNS servers.
  • Authoritative DNS servers: every organization with publicly accessible hosts on the Internet (such as Web servers and mail servers) must provide publicly accessible DNS records that map the names of those hosts to IP addresses.


The working mechanism of DNS can be briefly summarized as the following steps:

  1. Domain name query initiation: When a user enters a domain name (such as www.example.com) in a web browser, the browser will send a domain name query request to the DNS resolver of the local computer, asking for the IP address corresponding to the domain name.
  2. Local resolver query: The local resolver is a DNS cache server in the user device or network, and it will first check whether there is an IP address corresponding to the requested domain name in its own cache. If it does, it returns that IP address directly, speeding up the resolution process.
  3. Recursive query: If the local resolver does not have the IP address corresponding to the requested domain name in its cache, it will initiate a recursive query. The local resolver sends a query request to the root domain name server, asking for the IP address of the top-level domain name server of the domain name (such as the top-level domain name server of the .com domain name).
  4. Top-level domain name server lookup: The root domain name server directs local resolvers to the correct top-level domain name server, such as the top-level domain name server for a .com domain name. The top-level nameserver knows the IP address of the authoritative nameserver responsible for that domain name.
  5. Authoritative nameserver lookup: The top-level nameserver directs local resolvers to the authoritative nameserver responsible for the requested domain name. The authoritative domain name server keeps the record of the IP address corresponding to the domain name.
  6. IP address return: The authoritative domain name server returns the IP address of the requested domain name to the local resolver.
  7. Local resolver returns: The local resolver returns the received IP address to the user's computer or device.
  8. Accessing the website: The user's computer or device now knows the IP address of the requested domain name, and it can use this IP address to establish a connection with the server and access the corresponding website or resource.


In order to improve the delay performance and reduce the number of DNS packets transmitted on the Internet, the DNS widely uses caching technology.

In a chain of queries, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory.

2.4.3 DNS records and packets

The general format of a DNS record is as follows:

Name     |  TTL  |  Class  |  Type  |  Data

Below is a description of each field:

  1. Name: the hostname or domain name being mapped, such as www.example.com. In DNS records, domain names are written in dot-separated form; in this example, www is the host label, example is the second-level domain, and com is the top-level domain.
  2. TTL (Time to Live): TTL indicates the survival time of DNS records in the cache, in seconds. During this time, other devices or DNS servers can use the cached records without having to query the DNS server again. Once the TTL expires, the cache will become invalid and the DNS server needs to be queried again.
  3. Class: the class of a DNS record is usually IN, meaning the Internet class. Other possible values are rarely used.
  4. Type: the Type field indicates the record type, such as A, AAAA, MX, or CNAME. Different types carry different information; for example, an A record maps a domain name to an IPv4 address, and an MX record names a mail exchange server for the domain.
  5. Data: The data field contains data related to a particular record type. For example, the data field of the A record will be an IPv4 address, and the data field of the MX record will be the hostname of the mail server.
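For concreteness, a few illustrative records in that layout (the hostnames, address, and TTLs below are made up for the example):

```
www.example.com.     3600   IN   A       203.0.113.7
example.com.         3600   IN   MX      10 mail.example.com.
www.example.org.     3600   IN   CNAME   example.org.
```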

The DNS message format is as follows

(Figure: DNS message format)

2.5 P2P file distribution

P2P (Peer-to-Peer, point-to-point) file distribution is a file sharing and transmission method based on a distributed architecture. Unlike the traditional client-server model, P2P file distribution allows files to be shared directly between users, resulting in faster download speeds and more efficient bandwidth utilization.

The advantages of P2P file distribution include:

  • Bandwidth efficiency: because a file is downloaded from multiple sources at once, P2P makes better use of network bandwidth and speeds up the download.
  • Decentralization: no centralized server is required; peers share files directly with one another, reducing the risks associated with a centralized server.
  • Scalability: as more users join the P2P network, download speeds usually improve, because each new peer also contributes upload resources (see the distribution-time bounds sketched after this list).
  • Resilience: P2P networks are robust, because when one node goes offline the remaining nodes can continue sharing the file.
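The scalability claim can be made precise with the standard distribution-time comparison, sketched here using the usual textbook notation rather than anything defined above: $F$ is the file size, $N$ the number of peers, $u_s$ the server upload rate, $u_i$ peer $i$'s upload rate, and $d_{\min}$ the slowest peer download rate.

$$
D_{\text{cs}} \;\ge\; \max\!\left\{\frac{NF}{u_s},\; \frac{F}{d_{\min}}\right\},
\qquad
D_{\text{P2P}} \;\ge\; \max\!\left\{\frac{F}{u_s},\; \frac{F}{d_{\min}},\; \frac{NF}{u_s + \sum_{i=1}^{N} u_i}\right\}
$$

The client-server bound grows linearly with $N$, while in the P2P bound the aggregate upload capacity $u_s + \sum_i u_i$ also grows as peers join, which is why adding users tends to speed distribution up rather than slow it down.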

2.6 Video Streaming and Content Distribution Network

Pre-recorded streaming video is now the bulk of traffic at ISPs.

2.6.1 Internet video

Video can be transmitted in the form of a video stream.

2.6.2 HTTP streaming and DASH

HTTP streaming and DASH (Dynamic Adaptive Streaming over HTTP) are both technologies for streaming video and audio over the Internet; they deliver media content in a way that aims to give a good user experience under varying network conditions.

  • HTTP streaming:
    HTTP Streaming is a technology that streams media content, such as video and audio, to end-user devices. It uses the HTTP protocol to split media files into small chunks (fragments), which are then transferred chunk by chunk. An advantage of this approach is that the user can start playback from the transferred media stream without having to wait for the entire file to download. This helps start playback faster and reduces buffering times.

    HTTP streams typically use files with the extension .ts (Transport Stream) or .m3u8 (HLS format), where the .ts file contains the actual content of the media segment and the .m3u8 file is a playlist that instructs the client how to request and play the clips.

  • DASH (Dynamic Adaptive Streaming over HTTP):
    DASH is an adaptive streaming media technology that allows media content to dynamically adjust the quality and resolution according to the user's network conditions and device capabilities. DASH uses the HTTP protocol to divide media files into segments of different qualities, and then selects appropriate segments for transmission and playback according to the client's network bandwidth and performance.

    The core idea of DASH is to switch among different quality versions of the media content so as to provide the best viewing experience under the current network conditions. If network bandwidth drops, the client can automatically switch to a lower-quality segment, avoiding stalls and rebuffering; conversely, when network conditions are good, DASH can switch to a higher-quality segment, providing a sharper picture at higher resolution.

In general, both HTTP streaming and DASH are technologies used to transmit media content through the HTTP protocol, helping to optimize the user's viewing experience under different network conditions. They are widely used in video-on-demand, live broadcast and streaming services.
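As a toy illustration of the client-side adaptation loop (a sketch only: the bitrate ladder and the measured throughput are made-up values, and real DASH clients use the MPD manifest and more elaborate algorithms):

```python
# Available representations of one video, in bits per second (illustrative values).
BITRATES = [300_000, 750_000, 1_500_000, 3_000_000]

def choose_bitrate(measured_throughput_bps: float, safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits comfortably within the measured throughput."""
    budget = measured_throughput_bps * safety
    candidates = [b for b in BITRATES if b <= budget]
    return max(candidates) if candidates else min(BITRATES)

# Example: after downloading a segment the client measured roughly 1.2 Mbps.
print(choose_bitrate(1_200_000))   # -> 750000, so the next segment is requested at 750 kbps
```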

2.6.3 Content distribution network

Content Delivery Network (CDN) is a distributed infrastructure for providing web content, designed to improve users' access speed, availability and performance of websites, applications, media and other content. A CDN reduces network latency and congestion by caching content on servers located around the world, allowing users to get content from the server closest to them.

The main principle of CDN is to place a series of cache servers in the network, and these servers are distributed in different geographical locations and network nodes. When a user requests a certain content, CDN will automatically redirect the user's request to the nearest cache server, thus providing faster content delivery. Here are the key features and working principles of CDNs:

  1. Caching and nearby access: CDN servers are distributed around the world, caching the static content of the website (such as images, CSS, JavaScript, etc.). When users request content, they are connected to the closest CDN server, reducing delivery delays.

  2. Load balancing: CDN realizes load balancing by distributing user requests to multiple servers. This avoids overloading a single server and improves overall performance.

  3. Dynamic content acceleration: In addition to caching static content, some CDNs can also accelerate dynamic content, such as pages generated based on user requests.

  4. Fault tolerance: If a CDN node fails, requests can be automatically redirected to other available nodes, thereby improving availability.

  5. Content optimization: Some CDNs can optimize content, such as compression, image optimization, etc., to improve transmission speed and user experience.

  6. Security: CDN can provide some security features, such as DDoS attack mitigation, SSL certificate management, etc.

CDN has a wide range of applications and is suitable for various types of websites, applications and media content, including static websites, e-commerce websites, streaming services, online games, etc. By distributing content to servers around the world, CDN can significantly reduce user latency when accessing content, providing a faster, more reliable and high-performance user experience.

2.7 Socket programming: creating network applications

Socket programming is a method of creating network applications in computer networks. By using sockets (sockets), developers can establish communication connections between different computers to achieve data transmission and exchange. Socket programming is usually used to develop various network applications, including online games, chat applications, Web servers and clients, etc.

2.7.1 UDP socket programming

The UDP socket programming process is as follows

(Figure: client-server interaction using UDP sockets)
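In outline: the client creates a UDP socket and sends a datagram straight to the server's (IP address, port); the server binds a socket to a known port, reads the datagram, and replies to the client's address. A minimal Python sketch of both sides ("localhost" and port 12000 are placeholders, and the server simply upper-cases whatever it receives):

```python
# UDPServer.py -- minimal sketch
from socket import socket, AF_INET, SOCK_DGRAM

server_socket = socket(AF_INET, SOCK_DGRAM)
server_socket.bind(("", 12000))                 # wait for datagrams on port 12000
print("The server is ready to receive")
while True:
    message, client_address = server_socket.recvfrom(2048)
    server_socket.sendto(message.decode().upper().encode(), client_address)
```

```python
# UDPClient.py -- minimal sketch
from socket import socket, AF_INET, SOCK_DGRAM

server_name, server_port = "localhost", 12000   # placeholder address of the server above
client_socket = socket(AF_INET, SOCK_DGRAM)     # no connection setup: UDP is connectionless
client_socket.sendto(input("Input sentence: ").encode(), (server_name, server_port))
modified, _ = client_socket.recvfrom(2048)
print(modified.decode())
client_socket.close()
```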

2.7.2 TCP socket programming

The TCP socket programming process is as follows

(Figure: client-server interaction using TCP sockets)
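With TCP the server first creates a welcoming socket and listens; accept() then returns a new connection socket dedicated to one client, and the client must connect() (triggering the three-way handshake) before it can send. A minimal Python sketch under the same placeholder host and port:

```python
# TCPServer.py -- minimal sketch
from socket import socket, AF_INET, SOCK_STREAM

welcome_socket = socket(AF_INET, SOCK_STREAM)
welcome_socket.bind(("", 12000))
welcome_socket.listen(1)                                 # welcoming socket awaits handshakes
print("The server is ready to receive")
while True:
    connection_socket, addr = welcome_socket.accept()    # new socket for this client
    sentence = connection_socket.recv(1024).decode()
    connection_socket.send(sentence.upper().encode())
    connection_socket.close()
```

```python
# TCPClient.py -- minimal sketch
from socket import socket, AF_INET, SOCK_STREAM

client_socket = socket(AF_INET, SOCK_STREAM)
client_socket.connect(("localhost", 12000))   # three-way handshake happens here
client_socket.send(input("Input sentence: ").encode())
print(client_socket.recv(1024).decode())
client_socket.close()
```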

2.8 Summary

This chapter surveyed the application layer: network application architectures and the transport services applications require; the Web and HTTP; electronic mail with SMTP, POP3, and IMAP; DNS, the Internet's directory service; P2P file distribution; video streaming, DASH, and content distribution networks; and socket programming with UDP and TCP.
