DNS Cloud School | Everything you want to know about DNS cache management (worth bookmarking)

DNS cache management is an important part of operating and maintaining a domain name resolution system. This issue of DNS Cloud School explains the key points of DNS cache management on the authoritative side, the recursive side, and the client side, to help you avoid operational pitfalls.

 

1. Where DNS TTL exists

1.1 The Internet resolution process

A previous article covered in detail what DNS is and what it does, so that is not repeated here. This article starts by walking through a typical DNS resolution for an ordinary Internet service.

A client that wants to visit the www.zdns.cn website goes through three stages: a recursive request, iterative requests, and finally direct IP access between the client and the application server.

In the figure below, the first quadrant contains the authoritative servers on the Internet, the second quadrant contains the operator's recursive domain name server (hereinafter Local DNS or LDNS), the third quadrant contains the client, that is, the computers and mobile phones we use every day, and the fourth quadrant contains the application server or server cluster.

 

image

Assume that nothing is cached at any stage of this visit:

1. The user enters www.zdns.cn in the browser. The browser first checks whether it has a cached entry for the site (a mapping from domain name to IP address) and uses it directly if so. If not, it sends a DNS request to the client resolver, and the resolver sends a resolution request to the operator's LDNS;

2. LDNS sends resolution requests in turn to the root, cn., and zdns.cn. servers, and finally obtains the IP address for www.zdns.cn: 202.173.11.10. LDNS caches the A record together with its TTL (assumed here to be 300 s) and replies "www.zdns.cn. 300 IN A 202.173.11.10" to the client;

3. The browser now has the address for www.zdns.cn, so the client sends an HTTP request to 202.173.11.10, using its own public IP address as the source address;

4. For the next 300 seconds, if another client sends a www.zdns.cn query to the same LDNS, LDNS no longer queries the authoritative servers but answers from its cache. The TTL in the response counts down, for example "www.zdns.cn. 210 IN A 202.173.11.10", where the TTL is now less than 300 seconds.
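The TTL countdown in step 4 can be illustrated with a toy cache. This is a minimal sketch, not how any real LDNS is implemented: the class and method names are invented for illustration, and time is passed in explicitly as seconds so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    name: str
    address: str
    ttl: int          # TTL received from the authoritative server, in seconds
    stored_at: float  # time the record entered the cache, in seconds


class TtlCache:
    """Toy LDNS cache: answers carry the remaining TTL, expired entries are dropped."""

    def __init__(self):
        self._records = {}

    def store(self, name, address, ttl, now):
        self._records[name] = CachedRecord(name, address, ttl, now)

    def lookup(self, name, now):
        record = self._records.get(name)
        if record is None:
            return None  # cache miss: a real LDNS would query the authoritative servers
        remaining = record.ttl - int(now - record.stored_at)
        if remaining <= 0:
            del self._records[name]  # the entry has aged out
            return None
        return f"{record.name} {remaining} IN A {record.address}"


cache = TtlCache()
cache.store("www.zdns.cn.", "202.173.11.10", ttl=300, now=0)
print(cache.lookup("www.zdns.cn.", now=90))   # www.zdns.cn. 210 IN A 202.173.11.10
print(cache.lookup("www.zdns.cn.", now=300))  # None
```

A query 90 seconds after the record was cached is answered with a TTL of 210, and once the full 300 seconds have passed the entry is gone and the next query has to go back to the authoritative servers.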

 

 

1.2 Where the cache exists

Caches exist mainly on the LDNS in the second quadrant and on the client in the third quadrant.

If an authoritative server in the first quadrant has recursion enabled, that server also caches responses.

When an application server in the fourth quadrant acts as a client to resolve other domain names, it keeps a domain name cache as well.

In practice, a client may have a system cache, an application cache, and a browser cache.

DNS caches can therefore appear at almost every point in the access path. Caching reduces the resolution load on DNS servers and speeds up resolution, but it also delays the moment at which a service switchover takes effect.

For example:

1. During a planned operations switchover, there is a delay before traffic actually moves to the standby server;

2. During a failover, cached records keep sending users to the failed server, which degrades their experience;

3. If an authoritative domain name is hijacked, caches keep serving the hijacked records, and the losses can be irreparable.

 

So how should DNS caches be managed sensibly in a complex production environment?

 

2. Cache management

2.1 Cautions for authoritative server cache management

 

 

Under normal circumstances, an authoritative DNS server should not have recursion enabled. With recursion enabled, when the server receives a query for a domain name it is not authoritative for, it performs the resolution itself, as an LDNS would. This reduces its capacity for answering queries for its own zones and may expose it to DNS attacks from the Internet.

If recursion must be enabled on an authoritative server, enable it only for the network segments that need it.
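The idea of limiting recursion to specific segments can be sketched as an address check. This is only an illustration of the decision, not a configuration for any particular DNS server: the function name and the allowed segments (taken from the documentation address ranges) are assumptions.

```python
import ipaddress

# Hypothetical segments that are allowed to use recursion (documentation-only ranges).
RECURSION_ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def recursion_allowed(client_ip, allowed=RECURSION_ALLOWED):
    """Return True if the client address falls inside an allowed segment."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


print(recursion_allowed("192.0.2.17"))      # True
print(recursion_allowed("198.51.100.200"))  # False
print(recursion_allowed("203.0.113.5"))     # False
```

Queries from any other source would then receive only authoritative answers, never recursive ones.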

 

2.2 Recursive server cache management considerations

Recursive servers are mainly the LDNS servers run by operators around the world. They serve large client populations, cache records from authoritative servers, and improve resolution efficiency.

Under normal circumstances, LDNS should honor the TTL set on the authoritative server, so that domain name administrators can predict how quickly caches age out. In practice, however, some operators in remote areas ignore the authoritative TTL in order to reduce server load and force the TTL of locally cached domain names to 1 day or longer. When a domain name's address then changes, the experience for those users becomes very poor.

Recursive servers also face many kinds of attack. The most damaging is interception of and tampering with resolution results: when a result is tampered with, the client talks to an illegitimate server and user information can be leaked.

The most effective countermeasures are DNSSEC and HTTPDNS. However, neither of these two technologies is currently used in China.

 

DNSSEC

DNSSEC defends against such attacks by digitally signing DNS data, so that a resolver can verify that the data is authentic. Deployment of DNSSEC has spread gradually downward from the root servers.

Compared with traditional DNS resolution, DNSSEC adds processing time and transmission overhead on the server. These costs are expected to diminish as the technology improves.

 

HTTPDNS

With HTTPDNS, the client sends domain name resolution requests directly to an HTTPDNS server over HTTP, which avoids the risk of a tampered LDNS cache. HTTPDNS is currently suitable for apps that you develop yourself.

ZDNS products support DNSSEC and HTTPDNS, and improve security in recursive scenarios through measures such as 0x20 encoding, random query IDs, and random source ports.
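Two of the hardening measures named above, 0x20 encoding and random query IDs, can be sketched as follows. This is a simplified illustration under the assumption that the upstream server echoes the query name back with its letter case preserved; it is not ZDNS's implementation, the dictionaries stand in for real DNS messages, and source port randomization is not shown.

```python
import secrets


def encode_0x20(name):
    """Randomly flip the case of each letter in the query name."""
    return "".join(
        ch.upper() if secrets.randbits(1) else ch.lower()
        for ch in name
    )


def new_query(name):
    """Build a toy query: a random 16-bit ID plus a 0x20-encoded name."""
    return {"id": secrets.randbelow(1 << 16), "qname": encode_0x20(name)}


def response_matches(query, response):
    """Accept a response only if the ID and the exact letter case both match."""
    return response["id"] == query["id"] and response["qname"] == query["qname"]


query = new_query("www.zdns.cn.")
genuine = {"id": query["id"], "qname": query["qname"]}
forged = {"id": query["id"], "qname": query["qname"].swapcase()}
print(response_matches(query, genuine))  # True
print(response_matches(query, forged))   # False
```

An off-path attacker who wants to poison the cache has to guess the ID, the port, and the case pattern at once, which makes a forged response much less likely to be accepted.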

 

 

2.3 Client Cache Management

Whether it is a PC or a server, any machine that needs DNS resolution acts as a client and sends resolution requests to a DNS server. Clients on different operating systems and in different application environments handle caching differently. Some common cases are analyzed below:

Windows client

Windows enables the cache by default and inherits the TTL of the DNS records. When a name is resolved, the system caches the result.

Note that when the TTL of a CNAME record is high and the TTL of the A record is low, the Windows client by default caches both with the shorter of the two TTLs.

For example, when the authoritative configuration is:

www.baidu.com.      1200      IN    CNAME  www.a.shifen.com.

www.a.shifen.com.  300 IN    A     180.101.49.12

www.a.shifen.com.  300 IN    A     180.101.49.11

 

After resolution on a Windows system, the system cache shows:

www.baidu.com.      300 IN    CNAME  www.a.shifen.com.

www.a.shifen.com.  300 IN    A     180.101.49.12

www.a.shifen.com.  300 IN    A     180.101.49.11

The TTL of the CNAME record has changed to 300.
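The behavior in this example can be modeled in a few lines. This is a sketch of the observed result only, not of how Windows implements its cache; the function name and the tuple layout are chosen for illustration.

```python
def cap_chain_ttls(records):
    """Return the records with every TTL replaced by the smallest TTL in the chain."""
    shortest = min(ttl for _, ttl, _, _ in records)
    return [(name, shortest, rtype, value) for name, _, rtype, value in records]


authoritative = [
    ("www.baidu.com.", 1200, "CNAME", "www.a.shifen.com."),
    ("www.a.shifen.com.", 300, "A", "180.101.49.12"),
    ("www.a.shifen.com.", 300, "A", "180.101.49.11"),
]

for name, ttl, rtype, value in cap_chain_ttls(authoritative):
    print(f"{name} {ttl} IN {rtype} {value}")
```

Running it prints the three records with a TTL of 300 each, matching the cache contents listed above.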

 

SUSE operating system

SUSE 12 does not enable the cache by default. When the cache is enabled, it inherits the TTL of the DNS records.

 

AIX operating system

AIX 7.1 does not enable the cache by default. When the cache is enabled, it does not inherit the TTL of the DNS records and instead uses the cache timeout configured locally in the operating system.

 

JDK environment

The JDK enables caching by default. It does not inherit the TTL of DNS records and instead uses its own configured timeout.

 

In practice, whether and how caching is enabled varies greatly across combinations of operating system and application environment. To ensure that services switch quickly and accurately, a unified standard is needed that requires systems to follow the TTL set in DNS. The standard should also specify how to set the cache duration for each combination so that it matches the DNS setting.

 

2.4 Negative cache management

In practice, an authoritative DNS server receives many queries for domain names that do not exist in its zones. The negative cache time controls how long such negative answers are cached, which limits these junk requests.

In BIND, the negative cache time (negative TTL) is the last field of the zone's SOA record.

Figure 2-1

image

How the negative cache time takes effect: it is compared with the TTL of the SOA record, and the smaller of the two values is used.

For example, in Figure 2-1 the TTL of the SOA record is 3600 and the negative cache time is 1800. A query for a nonexistent domain name then returns a TTL of 1800, as shown in Figure 2-2.

Figure 2-2

image

If the TTL of the SOA record is changed to 1500:

Figure 2-3

image

A query for a nonexistent domain name now returns a TTL of 1500:

Figure 2-4

image
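The rule illustrated by the figures in this section comes down to taking a minimum. The short sketch below restates it with the two cases from the text; the function and parameter names are chosen for illustration.

```python
def negative_cache_ttl(soa_ttl, soa_negative_ttl):
    """Effective TTL of a negative answer: the smaller of the SOA record's
    own TTL and the negative cache time in its last field."""
    return min(soa_ttl, soa_negative_ttl)


print(negative_cache_ttl(3600, 1800))  # 1800
print(negative_cache_ttl(1500, 1800))  # 1500
```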

 

3. Service switchover management

3.1 Active service switchover

Active (planned) switchovers mainly cover switchover drills, service migration, and disaster recovery switchovers. For an active switchover, the domain name TTL can generally be lowered in advance. Once the lower TTL has taken effect across the whole network, the domain name is switched from the old IP address to the new IP address.

In the Internet scenario, the old cluster needs to run in parallel with the new cluster for 1-2 days, so that users of remote operators are still served normally while those operators' caches age out.
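The timing of a planned switchover can be worked out with simple arithmetic. The sketch below assumes that every resolver honors the authoritative TTL; the function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def switchover_plan(ttl_lowered_at, old_ttl, new_ttl):
    """Return (earliest safe switch time, time the old address ages out
    of caches if the switch happens at that earliest time)."""
    # Records cached just before the TTL was lowered still live for old_ttl seconds.
    earliest_switch = ttl_lowered_at + timedelta(seconds=old_ttl)
    # After the switch, the old address survives in caches for at most new_ttl seconds.
    old_address_gone = earliest_switch + timedelta(seconds=new_ttl)
    return earliest_switch, old_address_gone


lowered = datetime(2024, 1, 10, 9, 0, 0)  # hypothetical time the TTL was lowered
switch, gone = switchover_plan(lowered, old_ttl=3600, new_ttl=60)
print(switch)  # 2024-01-10 10:00:00
print(gone)    # 2024-01-10 10:01:00
```

Operators that override the TTL do not follow this schedule, which is why the old cluster still has to run in parallel for a day or two.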

 

3.2 Passive service switchover

Passive switchover mainly occurs when a service cluster fails and traffic must move to the standby cluster for emergency recovery. However robust the service cluster is, you must be prepared for an unexpected switchover. Cache management for intranet services and for Internet services must be handled separately.

 

3.2.1 Intranet production environment

First, detect proactively whether the service cluster is available: configure the DNS server to actively probe whether the servers are up;

Second, set the domain name TTL appropriately. If the TTL is too long, the service remains unreachable for a long time after a failure. If the TTL is short, the service recovers quickly, but the load on the DNS system increases. The TTL should therefore be set according to the recovery level each service requires; for production services it needs to be at the level of seconds, so that the service recovers as quickly as possible during a switchover.

If there is an LDNS server in the intranet, clear the cache for the affected domain name on the LDNS after the switchover.
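The trade-off between recovery time and DNS load can be made concrete with a rough estimate. The sketch below uses deliberately simple upper bounds, and the detection time and client count are hypothetical figures, not measurements.

```python
def worst_case_outage(detect_seconds, ttl_seconds):
    """Upper bound on how long a caching client keeps using the failed address."""
    return detect_seconds + ttl_seconds


def max_refresh_queries_per_hour(ttl_seconds, caching_clients):
    """Upper bound on refresh queries per hour if every client re-queries at each expiry."""
    return caching_clients * 3600 // ttl_seconds


# Hypothetical figures: failure detected within 10 s, 1000 caching clients.
for ttl in (5, 60, 3600):
    print(ttl, worst_case_outage(10, ttl), max_refresh_queries_per_hour(ttl, 1000))
```

With these figures, a 5 second TTL bounds the outage at 15 seconds but allows up to 720000 refresh queries per hour, while a one hour TTL allows only 1000 queries but leaves clients on the failed address for up to 3610 seconds.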

 

3.2.2 Intranet office environment

For intranet office services, the TTL can be set at the level of hours, because the scope of impact is controllable and configuration changes are minor. If there is an LDNS server in the intranet, clear the cache for the affected domain name on the LDNS after the switchover.

 

3.2.3 Internet environment

In the Internet environment, service availability monitoring must also be configured, so that switchover happens automatically after a service failure.

In addition, because the operators' LDNS caches are outside your control, a long cache time is not suitable in the Internet environment. A smaller value, for example at the level of minutes, is recommended, chosen with DNS performance in mind.

As for operator caches, as operators' DNS systems have developed, it has become possible to have the cache for a specified domain name cleared on the LDNS servers under an operator's jurisdiction.
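The automatic switch driven by availability monitoring can be sketched as a selection over candidate addresses. This is an outline of the idea only: the probe is passed in as a function and stubbed here, and the addresses are hypothetical values from the documentation range.

```python
def pick_answer(candidates, is_healthy):
    """Return the first address whose health probe passes, or None if all fail."""
    for address in candidates:
        if is_healthy(address):
            return address
    return None


# Stubbed probe results: the primary is down, the standby is up.
probe_results = {"192.0.2.10": False, "192.0.2.20": True}
answer = pick_answer(["192.0.2.10", "192.0.2.20"], lambda ip: probe_results[ip])
print(answer)  # 192.0.2.20
```

In a real deployment the probe would be an actual check against the service, such as a TCP connection or an HTTP request, and the DNS server would answer with the selected address.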

 

3.3 Cluster robustness construction

To keep services running stably, passive switchovers should be minimized, so the service cluster itself must be sufficiently robust.


Origin blog.csdn.net/weixin_38354951/article/details/111595020