Distributed software architecture - domain name resolution system

Design principles of a transparent multi-level traffic distribution system

When a user uses an information system, the request starts from the browser, finds the system entrance under the guidance of DNS, and then passes through a series of facilities such as the gateway, load balancer, cache, and service cluster until it reaches the information stored in the database server at the far end of the system. The result is then returned to the user's browser step by step.

This process passes through many technical components. As system designers, we should recognize that different facilities and components have different value in the system:

  • Some components sit on the client or at the edge of the network. They can respond to user requests quickly and spare the I/O and CPU behind them, for example the local cache, content distribution network, and reverse proxy.

  • Some components have processing capacity that grows linearly and scales easily. Machines can be added at low cost to reach the concurrency that the number of users requires, so these components should carry as much of the business logic as possible. A typical example is a cluster of service nodes that scales automatically.

  • Some components provide services whose stability affects the whole system. They must keep fault-tolerant backups at all times to remain highly available, for example service registries and configuration centers.

  • Some facilities are inherently single-point components. Their processing capacity can only be improved by upgrading the network, storage, and computing performance of the machine itself. Typical examples are the routers, gateways, or load balancers at the system entrance and the traditional relational database at the end of the request call chain.

Therefore, when planning traffic for the system, we need to fully understand the value differences of these components. Here are two simple, universal design principles:

  1. Minimize the number of single-point components, and where a single point is unavoidable, minimize the traffic that reaches it.

For example, if a user requests an avatar picture that is stored in a database, the browser cache, content distribution network, reverse proxy, web server, file server, and database can all provide this picture. Requests should therefore be guided to the most appropriate component so that most traffic does not converge on single-point components such as the database. At the same time, the result must still be correct (or correct most of the time), and when a single point fails, remedial measures must still take effect automatically and quickly. This is the significance of multi-level traffic distribution in system architecture.
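The avatar example can be sketched as a chain of tiers that is tried in order, so that the single-point component is consulted only when nothing in front of it can answer. This is a minimal illustration rather than a real implementation: the tier names, the `fetch_through_tiers` helper, and the dictionaries standing in for caches and the database are all hypothetical.

```python
from typing import Callable, Optional

# Each tier is (name, lookup function). All names here are hypothetical.
Tier = tuple[str, Callable[[str], Optional[bytes]]]


def fetch_through_tiers(key: str, tiers: list[Tier]) -> tuple[str, bytes]:
    """Return (tier name, value) from the first tier that can answer."""
    for name, lookup in tiers:
        value = lookup(key)
        if value is not None:
            return name, value
    raise KeyError(key)


# Toy stand-ins for real components: plain dicts acting as caches and a database.
browser_cache: dict[str, bytes] = {}
cdn_cache: dict[str, bytes] = {"avatar/42.png": b"png-bytes"}
database: dict[str, bytes] = {
    "avatar/42.png": b"png-bytes",
    "avatar/7.png": b"other-bytes",
}

# Ordered from the edge of the network to the single-point component.
tiers: list[Tier] = [
    ("browser cache", browser_cache.get),
    ("cdn", cdn_cache.get),
    ("database", database.get),
]
```

With this data, a request for `avatar/42.png` is answered by the CDN tier and never reaches the database, while `avatar/7.png` falls through to it.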

  2. Occam's razor principle

Entities should not be multiplied without necessity.
(Occam's Razor, William of Ockham)

The simplest system is the best system as long as it can meet the needs.

How DNS works

The Domain Name System (DNS) is a service of the Internet. As a distributed database that maps domain names and IP addresses to each other, it makes it easier for people to access the Internet. DNS queries are normally sent over UDP port 53 (TCP port 53 is also used, for example for large responses). Each level (label) of a domain name is limited to 63 characters, and the total length of a domain name cannot exceed 253 characters. Its function is to convert a domain name that is easy for humans to understand (such as www.baidu.com) into an IP address that is easy for computers to process (such as 14.119.104.254).
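The two length limits above can be checked with a few lines of code. The sketch below is illustrative only: `check_domain_length` is a hypothetical helper, it counts characters of a plain ASCII name, and it checks nothing other than the lengths (not the allowed character set).

```python
def check_domain_length(name: str) -> bool:
    """Return True if every label has 1-63 characters and the name has at most 253."""
    # A single trailing dot stands for the root and is not counted here.
    if name.endswith("."):
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    # An empty label (two consecutive dots) is also rejected.
    return all(1 <= len(label) <= 63 for label in name.split("."))
```

For example, `check_domain_length("www.baidu.com")` passes, while a name with a 64-character label fails.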

First, two terms:

  • Authoritative DNS: The DNS server responsible for resolving a specific domain name. "Authoritative" means that this server decides the final resolution result for the domain name.
  • Root Domain Name Server (Root DNS): A server for the root of the name hierarchy. Its address is fixed and does not need to be queried, so it can be thought of as built in by default. The root servers hold the records of the authoritative servers for the top-level domains (Top-Level Domain, TLD). There are 13 groups of root domain name servers in the world, and each group is replicated as a large set of mirrors through anycast. The usual explanation for the limit of 13 is that DNS mainly exchanges data over UDP, and a DNS message in an unfragmented UDP packet under IPv4 was limited to 512 bytes, which leaves room for 13 sets of address records.

DNS resolution steps are as follows:

  1. The client checks the local DNS cache to see whether an address record for the domain name exists and is still valid (cache entries expire according to their TTL, Time to Live).
  2. The client sends the domain name to the local DNS configured in the operating system (Local DNS, set manually by the user, assigned automatically by DHCP, or obtained from the PPP server when dialing up).
  3. After the local DNS receives the query, it searches its own records in this order: whether it knows the authoritative server for www.baidu.com -> whether it knows the authoritative server for baidu.com -> whether it knows the authoritative server for com. If none is found, the local DNS falls back to the root domain name server, which is represented by the final dot of the name.
  4. Assume the local DNS is brand new and holds no authoritative server record for any domain name. The query then follows the sequence in step 3 all the way to the root domain name server. The root server returns the authoritative server record for com, the authoritative server for com returns the authoritative server record for baidu.com, and so on, until the authoritative server that can resolve www.baidu.com is found.
  5. The authoritative server for www.baidu.com is queried for its records. A record here is not necessarily an IP address. The RFC specifications define dozens of record types, for example the A record for an IPv4 address, the AAAA record for an IPv6 address, and the CNAME record for a host alias.
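The steps above can be simulated with a toy in-memory hierarchy. The sketch below models only the brand-new local DNS of step 4 plus the TTL cache of step 1. Everything in it is an assumption made for illustration: the server names, the `SERVERS` table, the 300-second TTL, and the `LocalDNS` class. It sends no real network traffic.

```python
import time

# Hypothetical delegation data: each server either refers the query to the
# server of a more specific zone or answers with a final record.
SERVERS = {
    "root": {"referrals": {"com.": "com-tld"}},
    "com-tld": {"referrals": {"baidu.com.": "baidu-auth"}},
    "baidu-auth": {"answers": {"www.baidu.com.": ("A", "14.119.104.254", 300)}},
}


def query(server: str, name: str):
    """Ask one server; return ("answer", record) or ("referral", next_server)."""
    zone = SERVERS[server]
    if name in zone.get("answers", {}):
        return "answer", zone["answers"][name]
    for suffix, next_server in zone.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    raise LookupError(f"{server} cannot resolve {name}")


class LocalDNS:
    def __init__(self, now=time.monotonic):
        self.cache = {}            # name -> (record type, value, expiry time)
        self.now = now             # injectable clock, handy for testing TTLs
        self.upstream_queries = 0  # how many servers were asked so far

    def resolve(self, name: str):
        # Step 1: answer from the cache while the TTL has not expired.
        hit = self.cache.get(name)
        if hit and hit[2] > self.now():
            return hit[0], hit[1]
        # Steps 3-5: start at the root and follow referrals downwards.
        server = "root"
        while True:
            self.upstream_queries += 1
            kind, payload = query(server, name)
            if kind == "answer":
                rtype, value, ttl = payload
                self.cache[name] = (rtype, value, self.now() + ttl)
                return rtype, value
            server = payload
```

Resolving `www.baidu.com.` on a fresh `LocalDNS` asks three servers in turn (root, com, baidu.com). Repeating the query within the TTL asks none, which is why caching matters so much for response time.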

The multi-level distribution design lets the DNS system withstand the continuous load of global network traffic, but it is not without shortcomings. The typical problem is response speed. In extreme cases, each domain name may need several rounds of recursive queries before a result is found, which noticeably slows down the response.
For example, in the figure below the DNS query takes about 310 milliseconds.
[Figure: time taken by the first DNS request]
To reduce this cost, there is a dedicated front-end optimization called DNS Prefetching: if the website will later use resources from other domains, a link tag is emitted when the page loads, prompting the browser to resolve those domain names in advance, as shown below:

<link rel="dns-prefetch" href="//domain.not-icyfenx.cn">
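Tags like the one above can also be generated on the server from the list of resources a page uses. The following sketch assumes the page URL and resource URLs are already known; `dns_prefetch_tags` is a hypothetical helper, not part of any framework.

```python
from urllib.parse import urlsplit


def dns_prefetch_tags(page_url: str, resource_urls: list[str]) -> list[str]:
    """Build one dns-prefetch <link> tag per distinct third-party host."""
    own_host = urlsplit(page_url).hostname
    hosts: list[str] = []
    for url in resource_urls:
        host = urlsplit(url).hostname  # None for relative URLs such as "/app.js"
        if host and host != own_host and host not in hosts:
            hosts.append(host)
    return [f'<link rel="dns-prefetch" href="//{host}">' for host in hosts]
```

Relative URLs and resources on the page's own host are skipped, since the browser has already resolved that host in order to load the page.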

Another, possibly more serious, defect is that the hierarchical query process means every level may be exposed to man-in-the-middle attacks, which creates a risk of hijacking.
The root domain name servers and the links at the top of the recursive chain are very difficult to attack, and they are protected by highly professional security measures. However, many Local DNS servers at the bottom of the recursive chain, or those run by local operators, have relatively lax protection. Operators in many areas even hijack queries themselves: they return a wrong IP address and proxy user requests through it in order to inject advertisements into specific types of resources (mainly HTML) for profit.
In response, a new way of working with DNS has emerged in recent years: HTTPDNS (also known as DNS over HTTPS, DoH). It exposes DNS resolution as a query service over HTTPS, replacing resolution over the UDP transport protocol. The program itself, rather than the operating system, obtains resolution data directly from the authoritative DNS or a reliable Local DNS, bypassing the traditional Local DNS. This approach cuts out the middleman: underlying domain name hijacking is no longer a concern, and problems caused by an unreliable Local DNS, such as record changes taking effect slowly, inaccurate source IP detection, and wrong smart-routing decisions, can be effectively avoided.
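To make the idea concrete, the sketch below builds the URL for a DNS over HTTPS GET request in the wire format defined by RFC 8484. It only constructs the request and does not send it. The server URL passed in is a placeholder, the helper names are hypothetical, and vendor HTTPDNS services often expose their own, different HTTP APIs.

```python
import base64
import struct


def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    if name.endswith("."):
        name = name[:-1]
    for label in name.split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_doh_get_url(server_url: str, name: str, qtype: int = 1) -> str:
    """Build an RFC 8484 GET URL carrying a DNS query for `name`.

    qtype 1 is an A record. `server_url` is whatever DoH endpoint the
    caller trusts (an assumption of this sketch).
    """
    # Header: ID 0 (recommended for HTTP cache friendliness), flags 0x0100
    # (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    question = encode_qname(name) + struct.pack("!HH", qtype, 1)  # class 1 = IN
    # RFC 8484 requires base64url encoding without padding.
    payload = base64.urlsafe_b64encode(header + question).rstrip(b"=")
    return f"{server_url}?dns={payload.decode('ascii')}"
```

A client would send this URL as an HTTPS GET with the header `Accept: application/dns-message` and parse the binary DNS response from the body. Because the query travels inside TLS, an on-path Local DNS can neither read nor rewrite it.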

Origin blog.csdn.net/zkkzpp258/article/details/131500038