Fun Talking about Network Protocols - Lecture 19 | HTTPDNS: The Address Book of the Online World Can Also Be Wrong

This blog series is based on the Geek Time column on network protocols.

In the previous section, we learned about the two functions of DNS. The first is to find a specific address based on a name. The second is load balancing across multiple addresses: from several candidate addresses, DNS can pick one close to you.

However, this address book sometimes points you the wrong way. There is clearly a place to eat 500 meters away, yet it recommends one 5 kilometers away. Why does this happen?

Do you remember? When we issue a DNS query, we first contact the operator's local DNS server. This server walks the DNS tree on our behalf and returns the result to the client. But the local DNS server, like a local tour guide, often has its own little agenda.

What are the problems with traditional DNS?

1. Domain name cache issues

The local DNS server can cache results. Not every request goes to the authoritative DNS server; after the first lookup the result is cached locally, and when someone else asks, the cached data is returned directly.

This is like a tour guide who has been to a restaurant and memorized its address. When a tourist asks, he answers from memory without checking the address book. The common problem is that the restaurant has since moved, but the guide never refreshed his memory. You arrive at the location and find that the restaurant has become a clothing store. Disappointing, isn't it?
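The stale-answer effect can be reproduced with a toy cache. This is a minimal sketch, not a real resolver: the class name `LocalDnsCache`, the `authoritative` dictionary, and the example host and addresses are all assumptions made for illustration.

```python
import time


class LocalDnsCache:
    """Toy model of a resolver cache; not a real DNS implementation."""

    def __init__(self, authoritative, clock=time.monotonic):
        self.authoritative = authoritative  # dict: host -> (ip, ttl_seconds)
        self.clock = clock                  # injectable so time can be faked
        self.entries = {}                   # host -> (ip, expires_at)

    def resolve(self, host):
        now = self.clock()
        cached = self.entries.get(host)
        if cached and now < cached[1]:
            return cached[0]  # answered from memory, authority not consulted
        ip, ttl = self.authoritative[host]
        self.entries[host] = (ip, now + ttl)
        return ip
```

If the authoritative record changes while the entry is still cached, `resolve` keeps returning the old address until the entry expires, which is exactly the "restaurant has moved" situation.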

In addition, some operators cache static pages on their own servers so that user requests do not have to cross operator boundaries. This speeds up access and also reduces inter-operator traffic settlement costs. During domain name resolution, users are then directed not to the real website but to this caching server.

Most of the time nobody notices, but when the page is updated, users still see the old page and the problem surfaces. For example, you hear that a restaurant has launched a new dish and want to try it, but the tour guide tells you that a place nearby is just the same. Some tourists will not mind, but if you wanted the new dish and the guide takes you somewhere that does not serve it, would you not be disappointed too?

Local caching also often defeats global load balancing. The cached address was chosen for whoever asked last time, so it is not necessarily the closest one to the customer asking this time. Returning it sends that customer on a detour.

Suppose a customer once wanted West Lake vinegar fish while standing by West Lake, and the tour guide pointed to a restaurant on the lakeside. The next time, a customer is at Lingyin Temple and wants the same dish, and the guide again points to the lakeside restaurant, which is now far too distant.

2. Domain forwarding issues

With the caching problem, the local DNS service still queries the authoritative DNS server, just not every time. You could say it is still a major tour guide, a major intermediary. There are also minor guides and minor intermediaries who, on receiving a request, forward it straight to another operator for resolution. They simply outsource the work.

Normally, a customer of operator A queries operator A's DNS server. When operator A queries the authoritative DNS server, the authoritative server recognizes that the request comes from operator A and returns the address of a site deployed inside operator A's network, so access stays within one operator and is much faster.

However, suppose operator A is lazy and forwards the resolution request to operator B. When operator B queries the authoritative DNS server, the authoritative server mistakenly concludes that you are a customer of operator B and returns the address of a site deployed in operator B's network. As a result, every visit crosses operators and is very slow.
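The forwarding problem follows from one fact: the authoritative server only sees the address of whichever resolver contacts it. The sketch below models that with made-up lookup tables. The resolver addresses, site addresses, and function names are hypothetical.

```python
# Hypothetical data: which operator each resolver IP belongs to, and which
# site address the authoritative server hands out per operator.
RESOLVER_OPERATOR = {"198.51.100.1": "A", "203.0.113.1": "B"}
SITE_BY_OPERATOR = {"A": "192.0.2.1", "B": "192.0.2.2"}


def authoritative_answer(resolver_ip):
    """The authoritative server sees the resolver's address, not the user's."""
    return SITE_BY_OPERATOR[RESOLVER_OPERATOR[resolver_ip]]


def resolve_via(local_resolver_ip, forward_to=None):
    """Return the address a user gets, optionally with the query forwarded."""
    querying_ip = forward_to or local_resolver_ip
    return authoritative_answer(querying_ip)


# A user of operator A, first resolved directly, then with forwarding to B.
direct = resolve_via("198.51.100.1")
forwarded = resolve_via("198.51.100.1", forward_to="203.0.113.1")
```

Here `direct` is the site inside operator A, while `forwarded` is the site inside operator B, even though the user never left operator A.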

3. Egress NAT issues

When we covered gateways earlier, we saw that many data centers configure NAT (network address translation) at the egress. Packets leaving through the gateway have their source address replaced with a new IP address, and the address is translated back when the reply returns, so access itself works fine.

However, once the address has been translated, the authoritative DNS server can no longer use it to determine which operator the customer belongs to. It may well misjudge the operator from the translated address and cause cross-operator access.

4. Domain name update issues

Local DNS servers are deployed independently by different regions and operators, and their caching policies for resolution results differ. Some are lazy and ignore the TTL of the resolution result, so when the authoritative DNS server changes a record, it takes a very long time for the change to take effect across the whole network. Yet some DNS switchover scenarios need the change to take effect quickly.

For example, in a dual data center deployment, DNS is used for load balancing and disaster recovery across data centers. When one data center fails, the authoritative DNS record must be changed to point the domain name at the new IP address. If the update propagates too slowly, many users will see access failures.

Some tour guides are diligent and dedicated, always keeping track of changes to hotels, restaurants, and transportation, so when you ask them you usually get the latest information. Others are lazy and still recite a script memorized eight years ago, so their directions are often wrong.
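How much a TTL-ignoring resolver slows down a switchover can be estimated with simple arithmetic. The sketch below assumes one common form of "ignoring the TTL", a resolver that enforces its own minimum cache time; the function names and the numbers in the example are illustrative only.

```python
def effective_ttl(record_ttl, resolver_min_ttl=0):
    """Seconds a resolver keeps an answer if it enforces its own minimum TTL."""
    return max(record_ttl, resolver_min_ttl)


def worst_case_stale_seconds(record_ttl, resolver_min_ttl=0):
    """Upper bound on how long a user may still receive the old address.

    Worst case: the resolver cached the old answer just before the change.
    """
    return effective_ttl(record_ttl, resolver_min_ttl)


# A record published with a 60 s TTL, seen through a well-behaved resolver
# and through one that (hypothetically) never caches for less than an hour.
well_behaved = worst_case_stale_seconds(60)
lazy = worst_case_stale_seconds(60, resolver_min_ttl=3600)
```

With these assumed numbers, users behind the well-behaved resolver recover within a minute, while users behind the lazy one may keep hitting the failed data center for up to an hour.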

5. Resolution latency issues

As the DNS query process in the previous section showed, a query may have to visit multiple DNS servers recursively before the final result is obtained. This adds latency and can even cause the resolution to time out.

Working mode of HTTPDNS

With so many problems in DNS resolution, what should we do? Go back to using IP addresses directly? That is clearly not practical, which is why HTTPDNS exists.

HTTPDNS does not use traditional DNS resolution. Instead, a company builds its own cluster of resolution servers that speak HTTP, distributed across multiple locations and operators. When the client needs a name resolved, it asks this cluster directly over HTTP and gets the nearest address.

This amounts to each company implementing its own HTTP-based domain name resolution and keeping its own address book rather than using the shared one. However, the default way to resolve names is DNS, so using HTTPDNS means bypassing the default DNS path, and the default client cannot be used. HTTPDNS users are therefore mostly mobile applications, which embed a client SDK that supports HTTPDNS.

With its own HTTPDNS servers and its own SDK, the application moves from relying on a local tour guide to researching its own itinerary online and traveling independently, going wherever it likes. This way you no longer depend on a guide who is not professional and whom you cannot do much about anyway.

Let me walk through how HTTPDNS works.

The client SDK dynamically requests the list of HTTPDNS server IPs from the server side and caches it locally. As domain names are resolved over time, the SDK also caches the resolution results locally.

When the mobile application wants to access an address, it first checks the local cache and, if there is an entry, returns it directly. Unlike the local DNS cache, this cache is maintained by the application itself rather than by the operator for all its users. How and when to update it is something the application's client and server can coordinate.

If there is no local entry, the SDK has to ask an HTTPDNS server. It picks one from its local list of HTTPDNS server IPs and sends an HTTP request, which returns a list of IPs for the site to be visited.
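The lookup order described so far can be sketched as a single function: SDK cache first, then the known HTTPDNS servers. The final fallback to LocalDNS reflects the last-resort behavior this article mentions for when HTTPDNS is unavailable. Everything here is an assumption for illustration: the function signature, the injected `query_httpdns` and `query_localdns` callables, and the use of `OSError` to signal an unreachable server. No real network I/O is performed.

```python
import time


def resolve(host, cache, httpdns_servers, query_httpdns, query_localdns,
            clock=time.monotonic):
    """Resolve host: SDK cache first, then HTTPDNS servers, then LocalDNS.

    cache: dict host -> (ips, expires_at)
    query_httpdns(server_ip, host) -> (ips, ttl), raises OSError on failure
    query_localdns(host) -> ips
    Returns (ips, source) where source names the layer that answered.
    """
    now = clock()
    entry = cache.get(host)
    if entry and now < entry[1]:
        return entry[0], "cache"

    for server_ip in httpdns_servers:  # try each known HTTPDNS server in turn
        try:
            ips, ttl = query_httpdns(server_ip, host)
        except OSError:
            continue  # this server is unreachable, try the next one
        cache[host] = (ips, now + ttl)
        return ips, "httpdns"

    return query_localdns(host), "localdns"  # last resort
```

Injecting the two query functions keeps the control flow testable without a network; a real SDK would put its HTTP client and the platform resolver behind them.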

The request looks like this.

curl http://106.2.xxx.xxx/d?dn=c.m.163.com
{"dns":[{"host":"c.m.163.com","ips":["223.252.199.12"],"ttl":300,"http2":0}],"client":{"ip":"106.2.81.50","li
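On the client, a reply of this shape could be turned into cache entries roughly as follows. This is a sketch that reads only the fields visible in the sample above ("dns", "host", "ips", "ttl"); the function name is hypothetical, and the `sample` string is a shortened, well-formed stand-in for the reply, not the full payload.

```python
import json


def parse_httpdns_response(body):
    """Map each host in an HTTPDNS reply to its (ips, ttl) pair.

    Only the fields visible in the sample reply are read; anything else
    in the payload is ignored.
    """
    payload = json.loads(body)
    return {
        record["host"]: (record["ips"], record["ttl"])
        for record in payload.get("dns", [])
    }


# Shortened stand-in for the reply shown above (only the "dns" part).
sample = '{"dns":[{"host":"c.m.163.com","ips":["223.252.199.12"],"ttl":300,"http2":0}]}'
print(parse_httpdns_response(sample))
# prints {'c.m.163.com': (['223.252.199.12'], 300)}
```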

The mobile client naturally knows which operator the phone is on and where it is. Because the communication is direct HTTP, the HTTPDNS server learns this information accurately and can perform precise global load balancing.
Of course, when none of this works, the client can fall back to traditional LocalDNS resolution, which is better than not being able to access the site at all.

How does HTTPDNS solve the above problems?

It comes down to two main problems. One is the balance between resolution speed and update speed; the other is intelligent scheduling. The corresponding solutions are the cache design and the scheduling design of HTTPDNS.

HTTPDNS cache design

DNS resolution is a complicated process with many round trips, which slows resolution down considerably. Caching speeds it up but introduces the problem of stale entries. Worst of all, both aspects are in someone else's hands, namely the local DNS server's. It will not be customized for you, and as a client there is nothing you can do about it.

HTTPDNS puts resolution speed and update speed in your own hands. On the one hand, resolution does not need a long chain of recursive queries through the local DNS service; a single HTTP request does the job, and when a record must be updated in real time, the change takes effect immediately. On the other hand, the cache is maintained in the client SDK, where you control expiration and update times yourself.

The cache design of HTTPDNS follows a pattern that is also common in application architecture, with three layers: client, cache, and data source.

  • In application architecture, the layers are the application, the cache, and the database, commonly Tomcat, Redis, and MySQL.
  • In HTTPDNS, the layers are the mobile client, the DNS cache, and the HTTPDNS server.

Wherever a cache is involved, the problems of expiration, update, and inconsistency arise, and the solutions are very similar.

For example, DNS results are cached in memory and can also be persisted to storage, so that after the app restarts it can quickly load the accumulated resolution results for frequently visited sites from storage back into the cache. This is a bit like Redis, which is an in-memory cache but also offers persistence, so that data is not completely lost on a restart or a master-slave failover.
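Persisting the cache across restarts might look like the sketch below. The JSON file format, the function names, and the choice to drop entries that expired while the app was closed are assumptions; wall-clock time (`time.time`) is used because expiry timestamps must remain meaningful after a restart.

```python
import json
import os
import tempfile
import time


def save_cache(cache, path):
    """Write the in-memory cache (host -> (ips, expires_at)) to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def load_cache(path, now=None):
    """Load a saved cache, dropping entries that expired while the app was closed."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as f:
            stored = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}  # first launch or corrupt file: start with an empty cache
    return {host: (ips, exp) for host, (ips, exp) in stored.items() if exp > now}


# Usage: simulate an app restart with a temporary file.
path = os.path.join(tempfile.mkdtemp(), "dns_cache.json")
save_cache({"a.example": (["192.0.2.1"], time.time() + 300)}, path)
restored = load_cache(path)  # contains a.example again after the "restart"
```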

The cache in the SDK strictly honors the cache expiration time. If a lookup misses or the entry has expired, and the client does not allow expired records to be used, a resolution is initiated so that the record is up to date.

Resolution can be synchronous: the caller invokes the HTTPDNS interface directly, gets the latest record, and updates the cache. It can also be asynchronous: a resolution task is added to a background queue, and the background task calls the HTTPDNS interface.

The advantage of synchronous update is freshness. The disadvantage is that if several requests find the entry expired at the same time, each of them queries HTTPDNS, which is wasteful.
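A synchronous update path can be sketched as follows: check the cache, and on a miss or expiry call the source, write the result back, and return it. The class name `SyncResolver` and the injected `fetch` callable (standing in for the HTTPDNS call) are hypothetical, and `upstream_calls` exists only to make the cost visible.

```python
import time


class SyncResolver:
    """Synchronous update: on a miss or expiry, fetch, store, then return."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch        # fetch(host) -> (ips, ttl); stands in for HTTPDNS
        self.clock = clock
        self.cache = {}           # host -> (ips, expires_at)
        self.upstream_calls = 0

    def resolve(self, host):
        now = self.clock()
        entry = self.cache.get(host)
        if entry and now < entry[1]:
            return entry[0]                   # cache hit
        self.upstream_calls += 1              # miss: the caller waits for this
        ips, ttl = self.fetch(host)
        self.cache[host] = (ips, now + ttl)   # write the result back
        return ips
```

Nothing in this sketch stops two threads that notice the same expired entry from both calling `fetch`, which is the duplicated work described above.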

Synchronous update corresponds to the Cache-Aside pattern in application architecture: read the cache first; on a miss, read the database and write the result into the cache.

The advantage of asynchronous update is that when several requests find the entry expired, they can be merged into a single HTTPDNS request task that runs only once, reducing the load on HTTPDNS. You can also create a task that refreshes an entry shortly before it expires, so that it never has to be refreshed after expiring. This is called preloading.

The disadvantage is that the current request gets expired data. If the client allows expired data to be used, it takes a risk: if the expired address still works, there is no problem; if it does not, the request fails once and succeeds only after the cache has been updated.
Asynchronous update corresponds to the Refresh-Ahead pattern in application architecture: the application only accesses the cache, and entries are refreshed in the background as they approach expiry.

The well-known application cache Guava Cache has a refreshAfterWrite mechanism. When multiple concurrent accesses miss the cache and would each go back to the source, it lets only one request go back to the source. Application caches also often use data warm-up or preloading.
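The asynchronous path, including request merging and preloading, can be sketched without threads by modeling the background queue as a set of pending hosts. This is an illustration under stated assumptions, not how any particular SDK or Guava implements it: the class name, the `fetch` callable, and the 25% `preload_fraction` threshold are all hypothetical, and the code is not thread-safe as written.

```python
import time


class AsyncResolver:
    """Asynchronous update: serve from cache, refresh via a background queue."""

    def __init__(self, fetch, clock=time.monotonic, preload_fraction=0.25):
        self.fetch = fetch                        # fetch(host) -> (ips, ttl)
        self.clock = clock
        self.preload_fraction = preload_fraction  # assumed preload threshold
        self.cache = {}                           # host -> (ips, expires_at, ttl)
        self.pending = set()                      # hosts with a refresh queued
        self.upstream_calls = 0

    def resolve(self, host):
        """Return cached IPs (possibly expired) or None; never calls fetch."""
        now = self.clock()
        entry = self.cache.get(host)
        if entry is None:
            self.pending.add(host)
            return None
        ips, expires_at, ttl = entry
        if expires_at - now <= ttl * self.preload_fraction:
            self.pending.add(host)  # expired or close to it: queue one refresh
        return ips

    def run_background_tasks(self):
        """Execute each queued refresh exactly once."""
        while self.pending:
            host = self.pending.pop()
            self.upstream_calls += 1
            ips, ttl = self.fetch(host)
            self.cache[host] = (ips, self.clock() + ttl, ttl)
```

Because `pending` is a set, any number of `resolve` calls for the same stale host collapse into one refresh. A real SDK would run `run_background_tasks` on a worker thread and guard the shared state with a lock.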

HTTPDNS scheduling design

Because the SDK is embedded in the client, the caching, forwarding, and NAT of the local DNS no longer cause the client's location and operator to be misjudged. The HTTPDNS server gets first-hand information instead.

The client knows which country, operator, province, and even city the phone is in. The HTTPDNS server can use this information to choose the best service node to return.

If there are several candidate nodes, it also weighs error rate, request time, server load, network conditions, and so on to make an overall choice, rather than considering geography alone. When a node goes down or its performance degrades, traffic can be switched away quickly.

To make this possible, the client must use the IPs returned by HTTPDNS to access the business application. The client SDK collects request quality data such as error rate and request time and sends it to a statistics backend, which analyzes and aggregates it to show the service quality of each IP.
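The client-side collection step could be as simple as the sketch below, which records one sample per request and aggregates per IP. The class name, field names, and report shape are assumptions; how the report is shipped to the statistics backend is left out.

```python
from collections import defaultdict


class RequestStats:
    """Per-IP request quality counters collected on the client."""

    def __init__(self):
        self.samples = defaultdict(list)  # ip -> [(ok, latency_ms), ...]

    def record(self, ip, ok, latency_ms):
        """Store the outcome of one request made to the given IP."""
        self.samples[ip].append((ok, latency_ms))

    def summary(self):
        """Return ip -> {"error_rate": float, "avg_latency_ms": float}."""
        report = {}
        for ip, rows in self.samples.items():
            errors = sum(1 for ok, _ in rows if not ok)
            report[ip] = {
                "error_rate": errors / len(rows),
                "avg_latency_ms": sum(ms for _, ms in rows) / len(rows),
            }
        return report
```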

On the server side, the application can configure priorities and weights for the different quality metrics by calling the HTTPDNS management interface. HTTPDNS then computes a ranking under these policies, taking geographic location and line conditions into account, and returns the IPs that currently offer low latency and high quality first.
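One simple way to turn configured weights into a ranking is a weighted penalty score, where lower is better. The formula, the default weights, and the metric names below are assumptions chosen for illustration; the article does not specify how HTTPDNS actually computes its ranking.

```python
def rank_nodes(nodes, latency_weight=1.0, error_weight=1000.0, load_weight=100.0):
    """Order candidate IPs from best to worst by a weighted penalty score.

    nodes: dict ip -> {"avg_latency_ms", "error_rate", "load"}, with
    error_rate and load expressed as fractions between 0 and 1.
    """
    def score(ip):
        m = nodes[ip]
        return (latency_weight * m["avg_latency_ms"]
                + error_weight * m["error_rate"]
                + load_weight * m["load"])

    return sorted(nodes, key=score)


# The geographically nearer node has lower latency but is failing and busy.
candidates = {
    "192.0.2.1": {"avg_latency_ms": 20, "error_rate": 0.2, "load": 0.5},
    "192.0.2.2": {"avg_latency_ms": 60, "error_rate": 0.0, "load": 0.1},
}
ranking = rank_nodes(candidates)
```

With these assumed weights the farther but healthy node ranks first, which matches the point that geography is only one of several inputs.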

The results that HTTPDNS returns through intelligent scheduling are also cached on the client. To keep caching from distorting the scheduling, the client can key its cache by network: by operator on a mobile network, or by SSID on Wi-Fi. Different operators or Wi-Fi networks then get different results.
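Keying the cache by network can be sketched by making the network identity part of the dictionary key. The helper `network_key`, the class name, and the example operator and SSID values are hypothetical; expiry handling is omitted to keep the focus on the key.

```python
def network_key(network_type, operator=None, ssid=None):
    """Identify the current network: operator for cellular, SSID for Wi-Fi."""
    if network_type == "wifi":
        return ("wifi", ssid)
    return ("cellular", operator)


class NetworkAwareCache:
    """Keeps separate resolution results for each network the phone has used."""

    def __init__(self):
        self.entries = {}  # (network_key, host) -> ips

    def put(self, net, host, ips):
        self.entries[(net, host)] = ips

    def get(self, net, host):
        return self.entries.get((net, host))
```

When the phone switches from cellular to Wi-Fi, a lookup under the new key misses and triggers a fresh resolution instead of reusing an address that was scheduled for the other network.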

Summary

That is the end of this section. To summarize, remember these two key points:

  • Traditional DNS has many problems, such as slow resolution and slow updates. Caching, forwarding, and NAT cause the client's location and operator to be misjudged, which affects traffic scheduling.
  • HTTPDNS lets the client SDK call the server directly over HTTP to resolve names, bypassing these shortcomings of traditional DNS and enabling intelligent scheduling.

Finally, I will leave you with two questions to think about.

  1. To use HTTPDNS, the client has to ask the HTTPDNS server to resolve domain names. But how does the client learn the address or domain name of the HTTPDNS server?
  2. The intelligent scheduling of HTTPDNS mainly lets the client choose the nearest server. There is another mechanism that distributes the resources themselves to locations closer to the client to speed up access. Do you know which technology that is?

Origin blog.csdn.net/aha_jasper/article/details/105575453