Detailed explanation of CDN request process

Introduction to CDN

Most readers are already familiar with CDNs (content delivery networks), so here is only a brief introduction.

A CDN lets users fetch resources from CDN nodes that are close to them, instead of from the machine that actually provides the service. As a result, a CDN can:

  1. Let users get the content they need faster
  2. Reduce backbone network traffic
  3. Reduce server pressure

CDN has gone through three stages:

The first stage: in 1995, Tim Berners-Lee, the inventor of the World Wide Web, challenged colleagues at MIT to find a better way to deliver Internet content. The work that followed led to Akamai, the first CDN service company, founded in 1998.

The second stage: 1999~2001, the peak of the Internet boom, when CDN developed rapidly.

The third stage: the Internet bubble burst in 2001, many CDN companies went bankrupt, and Akamai was hit hard as well. From 2002 on, broadband upgrades and the growth of games and video drove CDN development again.

Although CDNs have been developing for more than 20 years, no standard specification has emerged, and each company's implementation is different. This article explains only one implementation, in the hope of giving you a deeper understanding of CDNs.

CDN request process

The CDN request process is roughly as shown in this diagram: https://www.processon.com/view/link/5ed5175e0791291d5dba30ea. In the diagram, the solid box on the left is the DNS lookup phase, and the dotted box on the right is the scope of the CDN. I will briefly walk through both below.

DNS lookup phase

When the user requests a link in the browser, such as event.mi.com, the browser needs to find the IP address that corresponds to the domain name:

  1. First, check whether the DNS client on the user's machine has a record. If it does not,

  2. ask the local DNS server. If the local DNS server has no record either,

  3. the local DNS server queries a root name server (there are 13 named root servers worldwide, each run as many physical instances), which returns the address of the com name server.

  4. The local DNS server then queries the com name server, which returns the address of the authoritative name server for event.mi.com. The authoritative name server holds all the records of the domain.

  5. The answer from the authoritative name server shows that the domain name has a CNAME. This CNAME generally points to the CDN's global load balancing system.
  • DNS A record

    An A record maps a domain name to an IP address: the record holds the IP address of the server for that domain name.

  • DNS CNAME

    A CNAME is an alias for a domain name. It generally has two uses:

    1) Multiple domain names point to the same service IP, so that when the service IP changes, only one A record needs to be changed. For example, if the A record of www.abc.com is 1.1.1.1, then mail.abc.com and study.abc.com can both be set as aliases of www.abc.com. When the service changes its IP address, only the A record of www.abc.com needs to change and the other domain names stay untouched, which reduces maintenance costs.

    2) CNAME also plays a key role in CDN. To put a domain name on a CDN, you set the CNAME of the domain name to a domain name supplied by the CDN provider, so that the provider can steer traffic to the CDN through DNS. Once the CNAME of a domain name points to the CDN, that domain name can no longer have an A record of its own. You can inspect the DNS resolution with the nslookup and dig commands. When I looked up event.mi.com, its CNAME was a domain name belonging to Baishan Cloud, and that Baishan Cloud domain name had a CNAME of its own. This is mainly for load balancing, which is explained later.

  • Iterative and recursive lookup

    In the process above, the local DNS server queries the root, com, and authoritative name servers iteratively: each server returns a referral, and the local DNS server follows it itself. DNS also supports recursive lookup, in which the queried server completes the whole lookup on behalf of the requester. If you are interested, take a look at the difference between the two.
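The lookup steps above can be sketched as a toy simulation. This is only an in-memory model under my own assumptions, with no real network traffic: the server names, the CDN domain event.mi.com.cdn.example, and the IP address are all made up for illustration.

```python
# Toy model of iterative DNS resolution with a CNAME hop.
# Every name and address below is hypothetical.
SERVERS = {
    # Root and TLD servers only hand out referrals: zone suffix -> next server.
    "root": {"referrals": {"com": "tld-com", "example": "tld-example"}, "records": {}},
    "tld-com": {"referrals": {"mi.com": "ns-mi"}, "records": {}},
    "tld-example": {"referrals": {"cdn.example": "ns-cdn"}, "records": {}},
    # Authoritative servers hold the actual records.
    "ns-mi": {
        "referrals": {},
        "records": {"event.mi.com": ("CNAME", "event.mi.com.cdn.example")},
    },
    "ns-cdn": {
        "referrals": {},
        "records": {"event.mi.com.cdn.example": ("A", "203.0.113.10")},
    },
}


def resolve(name, trace=None):
    """Resolve `name` the way a local DNS server would: iteratively, from the root."""
    trace = [] if trace is None else trace
    server = "root"
    while True:
        trace.append(server)
        zone = SERVERS[server]
        if name in zone["records"]:
            record_type, value = zone["records"][name]
            if record_type == "A":
                return value, trace
            # CNAME: start a fresh lookup for the alias target.
            return resolve(value, trace)
        # No record here, so follow the referral whose suffix matches the name.
        for suffix, next_server in zone["referrals"].items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                break
        else:
            raise LookupError(f"no record or referral for {name} at {server}")


ip, trace = resolve("event.mi.com")
print(ip)
print(" -> ".join(trace))
```

The trace shows two passes from the root: one to find the CNAME at the authoritative server of the original domain, and a second to resolve the CDN domain that the CNAME points to.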

CDN stage

Through the DNS resolution process described above, the CDN provider has successfully directed the request to its own systems.

  1. There are many ways to implement a CDN's global load balancing system. Here is one of the more common solutions, the DNS-based global load balancing system.
  • First, the system is itself a DNS server, so it can resolve domain names.

  • Second, the system has a load balancing function. CDNs generally use two types of load balancing strategy: static load balancing (select a server based on the user's geographic location, network operator, and so on) and dynamic load balancing (select a server based on live data such as server traffic, performance, and load). Because dynamic load balancing consumes more resources, global load balancing systems generally use static strategies, and regional load balancing systems generally use dynamic strategies.

  • Finally, the global load balancing system uses its static strategy to select an appropriate regional load balancing system and returns that system's IP to the requester.

  • Note: the global load balancing system generally has a backup system whose configuration is exactly the same as the live system. When the CDN is found to be under a DDoS attack, the backup system is activated, its IP is added to DNS, and a longer cache time (TTL) is set for that IP. This helps mitigate DDoS attacks.

  • Note: load balancing comes in Layer 4 and Layer 7 variants, where the numbers refer to the seven layers of the OSI model. Layer 4 load balancing can only decide based on information such as IP address and port, while Layer 7 load balancing can see the content of the request, such as cookies, and decide based on that. The commonly used nginx can do both Layer 4 and Layer 7 load balancing.
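To make the difference between the two strategies concrete, here is a minimal sketch. The region names, operator names, IP addresses, and the shape of the stats dictionary are all my own assumptions for illustration, not any CDN provider's real configuration.

```python
# Hypothetical static table: (region, network operator) -> regional load balancer IP.
REGIONAL_LB = {
    ("north-china", "telecom"): "198.51.100.1",
    ("north-china", "unicom"): "198.51.100.2",
    ("south-china", "telecom"): "198.51.100.3",
}
REGION_DEFAULT = {"north-china": "198.51.100.1", "south-china": "198.51.100.3"}
GLOBAL_DEFAULT = "198.51.100.254"


def pick_regional_lb(region, operator):
    """Static strategy: the answer depends only on where the request comes from."""
    exact = REGIONAL_LB.get((region, operator))
    if exact is not None:
        return exact
    return REGION_DEFAULT.get(region, GLOBAL_DEFAULT)


def pick_cache_server(stats):
    """Dynamic strategy: pick the healthy server with the lowest current load."""
    healthy = {name: s for name, s in stats.items() if s["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy cache server in this region")
    return min(healthy, key=lambda name: healthy[name]["load"])


print(pick_regional_lb("north-china", "unicom"))
print(pick_cache_server({
    "cache-1": {"healthy": True, "load": 0.9},
    "cache-2": {"healthy": True, "load": 0.2},
    "cache-3": {"healthy": False, "load": 0.0},
}))
```

The static lookup is a plain table read, which is cheap enough for the global tier. The dynamic choice needs fresh per-server statistics, which is why it fits better at the regional tier, where something has to collect those statistics anyway.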

  2. The client sends its request to the CDN's regional load balancing system, which decides which CDN cache server will serve it. Regional load balancing systems generally use dynamic strategies, so a separate server is needed to collect information about the CDN cache servers in the region (such as session capacity, round-trip time, traffic, and cache location).
  • Here is a brief look at how a CDN cache server can be selected based on cache location. The principle is very simple: an algorithm maps the requested URL to one CDN cache server, so when the same URL is requested again later, it hits the same cache server. The advantage is that it saves space, because each resource is stored on only one server, with no redundancy. The disadvantage is that a hot URL puts too much pressure on a single server, and if that server has a problem, all of its requests may go back to the origin.

  3. If the CDN cache server chosen by the regional load balancing system has no cached copy, or its copy is no longer valid, it requests the resource from the upper-level CDN cache server. Commonly used protocols include ICP, HTCP, and CARP. Whether a cached copy is still valid is judged with ordinary Web caching mechanisms, such as Pragma, Expires, Cache-Control, Last-Modified, and ETag.

  4. If the upper-level CDN cache server has no cached copy either, or its copy has expired, it requests the file from the origin server and caches it once the request succeeds.
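Steps 2 to 4 can be sketched as follows. The article does not say which algorithm maps a URL to a cache server, so this sketch assumes rendezvous (highest-random-weight) hashing as one possible choice. It also replaces the real HTTP freshness rules with a single max_age value, and every server name is hypothetical. The current time is passed in explicitly to keep the example deterministic.

```python
import hashlib


def pick_server(url, servers):
    """Map a URL to one server: hash every (server, url) pair, keep the highest."""
    def score(server):
        digest = hashlib.sha256(f"{server}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)


class Cache:
    """One cache tier; `upstream` is any callable (url, now) -> body."""

    def __init__(self, upstream, max_age=60):
        self.upstream = upstream
        self.max_age = max_age
        self.store = {}  # url -> (body, expires_at)

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]                  # fresh copy: serve it
        body = self.upstream(url, now)       # missing or expired: go one tier up
        self.store[url] = (body, now + self.max_age)
        return body


origin_requests = []


def origin(url, now):
    """Stand-in for the origin server; records every back-to-origin request."""
    origin_requests.append(url)
    return f"content of {url}"


# Three edge caches share one upper-level cache, which falls back to the origin.
parent = Cache(origin)
edges = {name: Cache(parent.get) for name in ("edge-a", "edge-b", "edge-c")}


def fetch(url, now):
    edge = edges[pick_server(url, sorted(edges))]
    return edge.get(url, now)


fetch("/img/logo.png", now=0)    # cold: edge -> parent -> origin
fetch("/img/logo.png", now=10)   # served by the same edge cache
print(len(origin_requests))
```

Because the same URL always maps to the same edge, the second fetch never reaches the origin. The downside described above is also visible here: every request for one URL lands on a single edge server.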

Some problems in the use of CDN

CDNs are very useful for coping with high concurrency. For more on this, see my article "Common Caching Techniques".

However, you may also run into problems when using a CDN. Here are some that I have encountered.

Obtaining files takes too long

I recently ran into a case where fetching a 50KB image through the CDN took 120s.

The cause was a mistake in the CDN vendor's load balancing configuration. With the wrong configuration, a request had to travel halfway around the earth to get the image. After the vendor fixed the configuration, the same fetch took only 0.2s.

Getting the wrong file

Our product has a product site, and the product site references a js file. Whenever the product site changes, the js file name stays the same but its version tag changes, for example from base.js?v01 to base.js?v02. The js file is set to never expire, and a request goes back to the origin only when the version tag changes. That is the premise for what follows.

If the product site and the js version do not match, the product site produces errors: for example, the page cannot be opened or some functions do not work. There are two situations in which the product site and the js do not match.

  1. The product site is the new version, js is the old version

This usually happens because the release did not publish the js to the origin server first, but published the new product site first. When the new product site requests the new js, the CDN goes back to the origin, the origin still has the old version, and so the old js is cached as if it were the new js. When this happens, we generally generate another new js version and run the release again.

  2. The product site is the old version, js is the new version

There are many causes for this situation, and it is often hard to deal with. The precondition is that the new js has been released to the origin server while the new product site has not yet been released.

  • With only one CDN provider: suppose all users so far have been abroad, and someone now visits the product site from inside China. Under the global load balancing strategy, this is the first request from that region, so the CDN there has no cached copy and fetches the js from the origin, which already serves the new js. The probability of this is relatively small, and the impact is limited.
  • With multiple CDN providers, the probability is much higher. Different providers cover different regions, and operations may also shift traffic between providers, so the requested js is quite likely not cached on the CDN that receives the request, which triggers a back-to-origin fetch. In this case, the usual fix is to delete the CDN cache to force the versions to agree, but users may still be affected in the meantime.
  • Another case: if a file is cached on only one CDN cache server and that server fails, a back-to-origin fetch also occurs. This is rare, because servers do not fail often and there is an upper-level cache server above the CDN cache server.
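Both mismatch situations can be reproduced with a tiny simulation. This is only a sketch under the stated premise (the CDN caches by full URL and never expires entries). The Origin and Cdn classes are hypothetical stand-ins, not any provider's API.

```python
class Origin:
    """Stand-in for the origin server; it always serves its current js."""

    def __init__(self):
        self.js = "old js"


class Cdn:
    """Caches by full URL and never expires entries, as in the premise above."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self.origin.js  # back-to-origin fetch
        return self.cache[url]


def site_released_first():
    """Situation 1: the new site goes live before the new js reaches the origin."""
    origin = Origin()
    cdn = Cdn(origin)
    first = cdn.get("base.js?v02")   # origin still has the old js, cached as v02
    origin.js = "new js"             # the js release arrives too late
    second = cdn.get("base.js?v02")  # still the cached old js
    third = cdn.get("base.js?v03")   # bumping the version again fixes it
    return first, second, third


def js_released_first():
    """Situation 2: the new js is on the origin while the old site is still live."""
    origin = Origin()
    warm = Cdn(origin)               # a CDN that already cached v01
    cold = Cdn(origin)               # another region or provider with no cache
    warm.get("base.js?v01")
    origin.js = "new js"
    return warm.get("base.js?v01"), cold.get("base.js?v01")


print(site_released_first())
print(js_released_first())
```

In the second situation, the old site asks for base.js?v01 in both cases, but the cold CDN has to go back to the origin and therefore returns the new js.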

Large request volume

When this happens, the origin server is often overwhelmed. The reason is that most CDN providers decide whether a request hits the cache based on the entire URL. If the URL is promoted through Google ads, a different suffix is appended to it each time, so the cache is never hit. To prevent this, you can ask operations to add a special configuration so that only changes to specified query parameters go back to the origin (operations may be reluctant to do this, because it makes later maintenance harder), or you can improve the performance of your own service.
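The "only specified query parameters matter" idea can be sketched as a cache key function. This is an illustration, not a real CDN configuration: the parameter name v, the host cdn.example, and the tracking parameter gclid are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_key(url, significant_params=("v",)):
    """Build a cache key that ignores every query parameter not listed as significant."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in significant_params
    ]
    kept.sort()  # parameter order should not create distinct keys
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Two ad clicks with different tracking suffixes share one cache entry.
print(cache_key("https://cdn.example/base.js?v=02&gclid=abc"))
print(cache_key("https://cdn.example/base.js?gclid=xyz&v=02"))
# A real version change still produces a different key.
print(cache_key("https://cdn.example/base.js?v=03"))
```

With a key like this, only a change to the version parameter causes a back-to-origin request, no matter what suffix the ad platform appends.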

Finally

If you like my article, you can follow my public account (Programmer Mala Tang)





Origin blog.csdn.net/shida219/article/details/106748366