Talking about network protocols, Lecture 20 | CDN: Have you ever picked up a parcel at the store?

This series of blog posts follows the Geek Time column on network protocols.

In the previous section, we looked at the general pattern of accessing a website.

When a user wants to visit a website, they specify its domain name, DNS resolves the domain name into an IP address, and the user then sends a request to that address and gets a web page back. It is just like shopping: you first find out where the store is, then go to the store to find what you want, and finally take it home.

Is there anything here that could be optimized?

For example, when you place an order on an e-commerce website, does the item have to be shipped from the central warehouse at the company's headquarters? That used to be basically the case: every order was shipped separately, so it could take a long time for your item to arrive. Later, the logistics systems of e-commerce websites became smarter. They built many warehouses all over the country instead of relying only on the central warehouse at headquarters.

From its statistics, the e-commerce website knows roughly how many books, rolls of toilet paper, bags, electrical appliances and other items with a relatively long shelf life it sells every day in Beijing, Shanghai, Guangzhou, Shenzhen, Hangzhou and other places. These items do not need to be shipped from the central warehouse, so they are stocked in advance in warehouses around the country. When a customer orders, the nearest warehouse ships the item, and it can arrive the next day.

In this way, the user experience is greatly improved. Of course, there is a difficulty here as well: the shelf life of things like fresh food is too short. If you stock them in advance and no one places an order, they will certainly spoil. I will come back to this later.

Let's start with the fact that website access can borrow this idea of "nearby distribution".

CDN distribution system architecture

There are so many data centers around the world that, wherever you go online, there is basically one nearby. Could we deploy a few machines in each of these data centers to form a cache cluster that caches part of the data, so that users can fetch the data from a nearby location?

Of course we can. These nodes, distributed across data centers in different places, are called edge nodes.

There are many edge nodes, but each cluster is relatively small and cannot cache everything, so a request may miss. That is why there is another layer above the edge nodes: regional nodes. They are larger and cache more data, so the probability of a hit is higher. Above the regional nodes is the central node, which is larger still and caches even more data. If the central node also misses, the request has to go back to the source website.

This is the architecture of the CDN distribution system. The CDN cache is layered, and as long as a request can be served from a cache, the real origin at the backend is not disturbed. This is the same idea as the logistics system of the e-commerce website: if the Beijing depot does not have the item, ask the North China depot; if the North China depot does not have it, ask the Northern depot.
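
The layered lookup described here can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `CacheNode` class, the `fetch` function and the "fill every tier that missed" policy are made up for this example and are not taken from any particular CDN.

```python
from typing import Callable, Dict, List, Tuple


class CacheNode:
    """One cache tier (edge, regional or central) with a simple key/value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def fetch(path: str, tiers: List[CacheNode],
          origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Look up `path` tier by tier and fall back to the origin on a full miss.

    Returns the content and the name of the place that served it.
    """
    missed: List[CacheNode] = []
    for node in tiers:  # ordered from closest to the user to farthest
        if path in node.store:
            content, served_by = node.store[path], node.name
            break
        missed.append(node)
    else:
        # No tier had the content, so go back to the source website.
        content, served_by = origin(path), "origin"
    # Assumed policy: fill every tier that missed, so the next request
    # for the same path hits closer to the user.
    for node in missed:
        node.store[path] = content
    return content, served_by


edge, regional, central = CacheNode("edge"), CacheNode("regional"), CacheNode("central")
tiers = [edge, regional, central]
origin_hits: List[str] = []


def origin(path: str) -> bytes:
    """Stand-in for the source website; records every back-to-source request."""
    origin_hits.append(path)
    return b"content of " + path.encode()


first = fetch("/logo.png", tiers, origin)   # full miss, goes back to the origin
second = fetch("/logo.png", tiers, origin)  # now served from the edge
```

After the first request the origin has been contacted exactly once, and the second request for the same path never leaves the edge node.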

Related concepts supplement

CDN stands for Content Delivery Network. A CDN is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various places, and on the load balancing, content distribution, scheduling and other functional modules of a central platform, it lets users obtain the content they need from a nearby location, which reduces network congestion and improves response speed and hit rate. The key technologies of a CDN are mainly content storage and content distribution.

The CDN concept dates back to 1996, when a research team at the Massachusetts Institute of Technology proposed it as a way to improve the quality of Internet services. In order to publish rich broadband media content over the traditional IP network, they proposed building a content distribution platform on top of the existing Internet to serve websites, and in 1999 they founded a dedicated CDN service company that provided professional services for Yahoo. Because a CDN is an overlay network optimized for speeding up network access, it is figuratively called a "network accelerator".

The birth of CDNs greatly improved the quality of Internet services, so traditional large network operators such as AT&T, Deutsche Telekom and China Telecom began to build their own CDNs. As market demand grew, pure CDN operators emerged as well. Akamai in the United States is the largest, with more than 1,000 nodes distributed around the world. China's first pure CDN service company is Beijing Lanxun, which has been building a dedicated CDN service network, ChinaCache, since 2000. At present, that network has more than 50 nodes covering six backbone networks in China (China Telecom, China Netcom, China Mobile, China Unicom, China Railcom and the China Education Network), with a bandwidth reserve of more than 35G and more than 300 customers.

If this is new to you, you can look up more material on your own.

Load balancing

With this distribution system in place, the next question is: how does the client find the right edge node to access?

Remember the DNS-based global load balancing we talked about earlier? It is mainly used to select a nearby server on the same operator's network. The CDN distribution network is likewise a distributed system spread across multiple regions and multiple operators, so the same idea can be used to select the most suitable edge node.

Without a CDN, the user enters the domain name www.web.com in the browser and the client queries the local DNS server. If the local DNS server has the answer cached, it returns the website's address. If not, it resolves the name recursively until it reaches the authoritative DNS server responsible for web.com, which returns the website's IP address. The local DNS server caches the IP address and returns it, and the client then accesses the website directly at that IP address.

With a CDN, the situation changes. On the authoritative DNS server for web.com, a CNAME alias is configured that points to another domain name, www.web.cdn.com, and this alias is returned to the local DNS server.

When the local DNS server gets the new domain name, it has to continue resolving it. This time the server it queries is no longer the authoritative DNS server for web.com but the authoritative DNS server for web.cdn.com, which is the CDN's own authoritative DNS server. On this server another CNAME is configured, pointing to yet another domain name, which belongs to the CDN's global load balancer.
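
To make the chain of aliases concrete, here is a toy resolver that follows CNAME records until it reaches an address. It is a sketch only: the `RECORDS` table stands in for the three authoritative servers, the name `www.web.gslb.cdn.com` and the address `203.0.113.10` (a documentation-only IP) are invented for the example, and no real DNS query is made.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data. Only www.web.com and www.web.cdn.com come from
# the text; the load balancer name and the IP address are made up.
RECORDS: Dict[str, Tuple[str, str]] = {
    # Configured on the authoritative DNS server for web.com.
    "www.web.com": ("CNAME", "www.web.cdn.com"),
    # Configured on the CDN's own authoritative DNS server.
    "www.web.cdn.com": ("CNAME", "www.web.gslb.cdn.com"),
    # Answered by the CDN's global load balancer.
    "www.web.gslb.cdn.com": ("A", "203.0.113.10"),
}


def resolve(name: str, max_hops: int = 8) -> Tuple[str, List[str]]:
    """Follow CNAME records until an A record is found.

    Returns the IP address and the list of names visited on the way.
    """
    chain = [name]
    for _ in range(max_hops):
        record_type, value = RECORDS[name]
        if record_type == "A":
            return value, chain
        name = value  # a CNAME: keep resolving the alias target
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


ip, chain = resolve("www.web.com")
```

The `max_hops` limit is a defensive assumption so that a misconfigured loop of aliases cannot keep the resolver spinning forever.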

Next, the local DNS server asks the CDN's global load balancer to resolve the domain name. The global load balancer selects a suitable cache server to serve the user. The selection is based on:

  • The user's IP address, to determine which server is closest to the user;
  • The operator the user is on;
  • The content name carried in the URL the user requested, to determine which server has the content the user needs;
  • The current load of each server, to determine which servers still have spare capacity.

After weighing all of the above, the global load balancer returns the IP address of one cache server.
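
One way to picture this selection is as a filter followed by a ranking. The sketch below is an assumption about how the four criteria might be combined, not how any real CDN weighs them: the `CacheServer` fields, the `max_load` threshold and the priority order (region, then operator, then content, then load) are all chosen just for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class CacheServer:
    ip: str
    region: str
    operator: str
    load: float                      # fraction of capacity in use, 0.0 to 1.0
    cached: Set[str] = field(default_factory=set)


def pick_server(servers: List[CacheServer], user_region: str, user_operator: str,
                content: str, max_load: float = 0.9) -> Optional[CacheServer]:
    """Return the best server with spare capacity, or None if all are overloaded."""
    candidates = [s for s in servers if s.load < max_load]
    if not candidates:
        return None

    def score(s: CacheServer) -> Tuple[bool, bool, bool, float]:
        # Tuples compare element by element, so earlier criteria dominate.
        return (
            s.region == user_region,      # closest to the user
            s.operator == user_operator,  # same operator as the user
            content in s.cached,          # already holds the content
            -s.load,                      # least loaded wins remaining ties
        )

    return max(candidates, key=score)


servers = [
    CacheServer("198.51.100.1", "beijing", "telecom", 0.95, {"/v.mp4"}),  # overloaded
    CacheServer("198.51.100.2", "beijing", "telecom", 0.40),
    CacheServer("198.51.100.3", "beijing", "unicom", 0.10, {"/v.mp4"}),
    CacheServer("198.51.100.4", "shanghai", "telecom", 0.05, {"/v.mp4"}),
]
choice = pick_server(servers, "beijing", "telecom", "/v.mp4")
```

With this ordering the Beijing server on the user's own operator is chosen even though it does not hold the file yet; a different priority order would be just as easy to express by reordering the tuple.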

The local DNS server caches this IP address and returns it to the client, and the client accesses the edge node to download resources. The cache server responds to the request and sends the requested content to the user's terminal. If this cache server does not have the content the user wants, it requests the content from its upper-level cache server, and so on up the hierarchy until, if necessary, the request reaches the website's origin server and the content is pulled back to the local node.

Cache

A CDN can cache many kinds of content.

Static data cache

In the e-commerce warehouse analogy, daily necessities with a long shelf life are easy to stock because they do not expire quickly. They correspond to static pages and images, which rarely change and are therefore well suited to caching.

Remember the architecture of the access-layer cache? When requests enter the data center, we want the outermost access-layer cache to absorb most requests for static resources. The CDN goes one step further and caches these static resources outside the data center, closer to the user. The closer to the user, the better the access performance and the lower the latency.

Streaming media cache

Among static content there is one special kind for which CDNs are used a great deal: the streaming media mentioned earlier.

CDNs support streaming media protocols, such as the RTMP protocol described earlier. In many cases the CDN node acts as a proxy that reads content from the upper-level cache and forwards it to the user. Because streaming media is usually continuous, it can be cached ahead of time or pushed to the user's client in advance.

For static pages, content is usually distributed in pull mode: when a miss occurs, the node pulls the content from the level above. Streaming media, however, involves large amounts of data, and going back to the origin puts considerable pressure on it, so an active push mode is often used instead, pushing hot data to the edge nodes proactively.

For streaming media, many CDNs also offer preprocessing services, meaning that files are processed before they are distributed. For example, the video is transcoded into streams with different bit rates to suit users with different network bandwidths, and it is split into fragments to reduce storage pressure, so that the client can load different fragments at different bit rates. This is the familiar choice between "super clear", "standard definition" and "smooth".
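
As a rough illustration of how a client might choose among preprocessed streams, the sketch below picks the highest bit rate that fits the measured bandwidth and then lists the fragments for that stream. The bit rates in `RENDITIONS`, the function names and the URL layout are all invented for this example.

```python
from typing import Dict, List

# Hypothetical renditions produced by preprocessing, in kilobits per second.
RENDITIONS: Dict[str, int] = {"smooth": 500, "standard": 1500, "super clear": 4000}


def choose_rendition(bandwidth_kbps: float) -> str:
    """Pick the highest bit rate that fits the measured bandwidth."""
    fitting = [name for name, rate in RENDITIONS.items() if rate <= bandwidth_kbps]
    if not fitting:
        # Nothing fits: fall back to the lowest bit rate available.
        return min(RENDITIONS, key=lambda name: RENDITIONS[name])
    return max(fitting, key=lambda name: RENDITIONS[name])


def segment_urls(video: str, rendition: str, count: int) -> List[str]:
    """Build the URL of each fragment for one rendition (made-up URL layout)."""
    folder = rendition.replace(" ", "_")
    return [f"/{video}/{folder}/seg{i:04d}.ts" for i in range(count)]


urls = segment_urls("movie", choose_rendition(2000), 2)
```

Because every fragment exists at every bit rate, a player could call `choose_rendition` again before each fragment and switch quality in the middle of playback.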

For a streaming media CDN, a key issue is hotlink protection. Video copyrights cost a lot of money to buy, and the site earns that money back through advertising fees. If the stream is hotlinked and played on someone else's website, the loss can be huge.

The most common and simplest method is the Referer field in the HTTP header. When the browser sends a request, it usually includes a Referer that tells the server which page the link came from, and the server can use this information in its processing. If the Referer does not come from this site, the server blocks the request or redirects it to another link.
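
A minimal version of such a check might look like the following. It is a sketch under assumptions: `ALLOWED_HOSTS` and `referer_allowed` are hypothetical names, the headers are passed as a plain mapping keyed by the exact string "Referer", and the decision to let requests without a Referer through is a policy choice, not a rule.

```python
from typing import Mapping, Set
from urllib.parse import urlparse

ALLOWED_HOSTS: Set[str] = {"www.web.com"}  # hypothetical list of trusted sites


def referer_allowed(headers: Mapping[str, str], allow_empty: bool = True) -> bool:
    """Return True if the request's Referer header points at an allowed host."""
    referer = headers.get("Referer", "")
    if not referer:
        # Direct visits and some privacy settings send no Referer at all.
        return allow_empty
    host = urlparse(referer).hostname
    return host in ALLOWED_HOSTS


own_page = referer_allowed({"Referer": "https://www.web.com/play.html"})
other_site = referer_allowed({"Referer": "https://other.example/page"})
```

A request that fails the check would then be blocked or redirected, as described above.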

The Referer mechanism is relatively easy to defeat, so it needs to be combined with other mechanisms.

A commonly used mechanism is timestamp-based hotlink protection. The administrator of the site that uses the CDN agrees on a secret string with the CDN vendor through the configuration interface.

The client takes the current timestamp, the resource to be accessed and its path, runs them together with the secret string through a signature algorithm to obtain a signature string, and then builds a download link that carries this signature string and the expiry timestamp, which it uses to access the CDN.

On the CDN server, the expiry time is first compared with the CDN node's current time to check whether the request has expired. The CDN server then has the resource and path, the timestamp and the agreed secret string, so it computes the signature with the same algorithm. If the result matches the signature in the link, the access is legitimate and the resource is returned to the client.
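
The signing and checking steps can be sketched with Python's standard `hmac` module. The text does not say which signature algorithm or link format is used, so HMAC-SHA256, the `expires` and `sign` query parameters, and the `SECRET` value below are all assumptions made for this example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

SECRET = b"shared-secret-agreed-with-the-cdn"  # placeholder value


def _sign(path: str, expires: int) -> str:
    """Sign the resource path and the expiry timestamp with the shared secret."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Link generator side: build a download link valid for ttl_seconds."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    return f"{path}?expires={expires}&sign={_sign(path, expires)}"


def verify_link(url: str, now: Optional[float] = None) -> bool:
    """CDN side: reject expired links, then recompute and compare the signature."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        sign = query["sign"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if now > expires:
        return False  # the link has expired
    expected = _sign(parsed.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign.encode(), expected.encode())


link = make_link("/video/a.mp4", ttl_seconds=60, now=1000)
```

Because the path and the expiry time are both covered by the signature, changing either one in the link makes verification fail, and an unmodified link stops working once the expiry time has passed.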

Dynamic data cache

In the e-commerce warehouse example, I mentioned that stocking fresh food is troublesome. Fresh food corresponds to dynamic data, which is harder to cache. What can be done? There are dynamic CDNs as well, and they work mainly in two modes.

  • One is the fresh-food supermarket mode, that is, the edge computing mode. Since the data is generated dynamically, the logic and storage for the data are placed on the edge nodes as well. The stored data is synchronized from the origin periodically, and the results are then computed at the edge. Cooking fresh food is like this: it is dynamic and cannot be prepared in advance, so a fresh-food supermarket next to your home that can both deliver to your door and cook on site is a form of edge computing.
  • The other is the cold chain transportation mode, that is, the path optimization mode. The data is generated at the origin rather than at the edge, but it is delivered through the CDN network, which optimizes the path. Because there are many CDN nodes, it is possible to find an edge node very close to the origin as well as one very close to the user. The links in between are planned entirely by the CDN, which chooses a more reliable path and provides something similar to a dedicated line.
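
For the path optimization mode, planning the links in between can be thought of as a shortest-path problem over the CDN's own nodes. The sketch below uses Dijkstra's algorithm on a tiny latency graph; the node names and the latency figures in `LINKS` are invented, and real CDNs may weigh reliability and cost as well as latency.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical one-way latencies in milliseconds between CDN nodes.
LINKS: Dict[str, Dict[str, float]] = {
    "user-edge": {"origin-edge": 180.0, "relay-a": 40.0, "relay-b": 70.0},
    "relay-a": {"origin-edge": 60.0, "relay-b": 20.0},
    "relay-b": {"origin-edge": 30.0},
    "origin-edge": {},
}


def best_path(src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's shortest path over the latency graph."""
    queue: List[Tuple[float, List[str]]] = [(0.0, [src])]
    done: Set[str] = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return cost, path
        if node in done:
            continue  # already reached this node by a cheaper route
        done.add(node)
        for nxt, latency in LINKS[node].items():
            if nxt not in done:
                heapq.heappush(queue, (cost + latency, path + [nxt]))
    raise ValueError(f"no route from {src} to {dst}")


latency, route = best_path("user-edge", "origin-edge")
```

In this made-up graph the direct link costs 180 ms, while the route through both relays costs 90 ms, which is the kind of gain the CDN's planned path is meant to provide.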

With ordinary TCP connections, packets are often lost on the public network, which keeps the TCP window small and the sending rate low. Based on the principles of TCP flow control and congestion control covered earlier, TCP parameters can be tuned inside the CDN's acceleration network so that TCP sends data more aggressively.

Multiple requests can also be multiplexed over one connection, so that when a dynamic request arrives the connection is already established. There is no need for a fresh three-way handshake each time, nor for a large number of connections that would increase the pressure on the server. In addition, the transmitted data can be compressed to improve transmission efficiency.

All these methods are like cold chain transportation: the entire logistics chain is optimized, and the goods are kept frozen and transported at high speed the whole way. Whether fresh food reaches your home from the supermarket next door or from its place of origin, it is guaranteed to arrive fresh.

Summary

OK, that is the end of this section. To summarize, just remember these two key points.

  • Like the distributed warehouse system of an e-commerce site, a CDN is divided into central nodes, regional nodes and edge nodes, and data is cached at the position closest to the user.
  • A CDN is best at caching static data. It can also cache streaming media, in which case hotlink protection needs attention. It supports dynamic data as well, in two modes: the fresh-food supermarket mode based on edge computing, and the cold chain transportation mode based on path optimization.

Finally, I will leave you with two questions:

  1. This section describes how a CDN uses DNS for global load balancing. How would a CDN use HTTPDNS?
  2. So far the client has been dealing with DNS, HTTPDNS and CDN, and has still not entered the data center. Do you know what is inside the data center?

Origin blog.csdn.net/aha_jasper/article/details/105575484