Let’s talk about how CDN and load balancing are implemented.

Analysis & Answers

What is CDN

CDN stands for Content Delivery Network, also known as a content distribution network.

It is an intelligent virtual network built on top of the existing network. It relies on edge servers deployed in many locations and on the central platform's load balancing, content distribution, scheduling, and other functional modules so that users can obtain the content they need from a nearby node. This reduces network congestion and improves access response speed and hit rate. The key technologies of a CDN are mainly content storage and content distribution.

Simply put, a CDN serves each user from the nearest available resource, based on the user's location.

Therefore, users do not need to access the origin site directly when browsing. Instead, they visit the CDN node "nearest" to them. This node is called an "edge node", and it is essentially a proxy server that caches the content of the origin site.

CDN principle analysis

Without a CDN, accessing a site by its domain name follows this path:

The user enters the domain name → the browser parses the domain name → DNS resolves it to the IP address of the destination host → the browser sends a request to that IP address → the server returns the requested data

With a CDN, DNS no longer returns an IP address directly. Instead, it returns a CNAME (Canonical Name) alias record that points to the CDN's global load balancing system.

CNAME actually plays the role of a middleman (or agent) in the domain name resolution process, which is the key to CDN implementation.
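To make the CNAME's middleman role concrete, here is a minimal Python sketch of the resolution chain. It uses an in-memory table instead of real DNS, and the domain names, the IP address, and the `resolve` helper are all made up for illustration. A real CDN would choose the final address dynamically rather than read it from a fixed table.

```python
# Toy DNS table: each name maps to a (record_type, value) pair.
# All names and addresses below are hypothetical.
RECORDS = {
    "www.example.com": ("CNAME", "www.example.com.cdn-provider.example"),
    "www.example.com.cdn-provider.example": ("A", "203.0.113.10"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record (an IP address) is found."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value, chain
        # CNAME: the answer is another name, so resolve that one next.
        name = value
        chain.append(name)
    raise RuntimeError("too many CNAME hops")


ip, chain = resolve("www.example.com")
print(" -> ".join(chain), "->", ip)
```

The site owner only publishes the CNAME. Which IP address the alias finally resolves to is left entirely to the CDN.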

Load balancing system

Because the CNAME answer contains no IP address, the local DNS sends a second query, this time to the CDN's global load balancing system, which performs intelligent scheduling based on the following:

Look up the user's IP address in a table to determine the geographical location, and find the nearest edge node.
Check which operator's network the user is on, and find an edge node in the same network.
Check the load of the edge nodes, and find nodes with lighter loads.
Other factors, such as node health, service capacity, bandwidth, and response time.

Combining these factors yields the most suitable edge node. That node is returned to the user, who can then access a nearby CDN caching proxy.
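The scheduling factors above can be sketched as a simple scoring function. This is only an illustration: the `EdgeNode` fields, the weights, and the node names are assumptions, and real global load balancers use far richer signals.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    region: str      # where the node is deployed
    carrier: str     # operator network the node sits in
    load: float      # 0.0 (idle) to 1.0 (saturated)
    healthy: bool = True


def pick_edge_node(nodes, user_region, user_carrier):
    """Return the healthy node with the best score for this user."""
    def score(node):
        s = 0.0
        if node.region == user_region:
            s += 2.0          # geographic proximity (weight is an assumption)
        if node.carrier == user_carrier:
            s += 1.0          # same operator network
        s += 1.0 - node.load  # lighter load scores higher
        return s

    candidates = [n for n in nodes if n.healthy]  # skip unhealthy nodes
    if not candidates:
        raise LookupError("no healthy edge node available")
    return max(candidates, key=score)


nodes = [
    EdgeNode("edge-north-1", "north", "carrier-a", load=0.6),
    EdgeNode("edge-north-2", "north", "carrier-b", load=0.2),
    EdgeNode("edge-south-1", "south", "carrier-a", load=0.1),
    EdgeNode("edge-north-3", "north", "carrier-a", load=0.1, healthy=False),
]
best = pick_edge_node(nodes, user_region="north", user_carrier="carrier-a")
print(best.name)
```

With these made-up weights, a node in the same region and on the same carrier wins even when it is moderately loaded, and the unhealthy node is never considered.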

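Overall, the process is: the user's DNS query returns a CNAME, the local DNS follows it to the CDN's global load balancing system, that system selects the most suitable edge node and returns its address, and the user then sends the request to that edge node.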

Caching proxy

The caching system is another key component of CDN. The caching system will selectively cache the most commonly used resources.

There are two indicators to measure CDN service quality:

Hit rate: the resource the user requests is already in the cache system and can be returned directly. The hit rate is the ratio of the number of hits to the total number of requests.
Return-to-origin rate: the resource is not in the cache and must be fetched from the origin site through the proxy. The return-to-origin rate is the ratio of the number of return-to-origin requests to the total number of requests.
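As a small illustration of the two indicators, the sketch below computes them from a list of per-request outcomes. The `"hit"`/`"miss"` event format and the `cache_metrics` name are assumptions made for this example. It also assumes a single cache layer, so every miss goes back to the origin.

```python
def cache_metrics(events):
    """Compute (hit_rate, origin_rate) from a list of 'hit'/'miss' events."""
    total = len(events)
    if total == 0:
        return 0.0, 0.0
    hits = sum(1 for e in events if e == "hit")
    misses = total - hits  # single-layer assumption: each miss returns to origin
    return hits / total, misses / total


events = ["hit"] * 9 + ["miss"]
hit_rate, origin_rate = cache_metrics(events)
print(f"hit rate: {hit_rate:.0%}, return-to-origin rate: {origin_rate:.0%}")
```

Under that single-layer assumption the two rates always add up to 100%.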

The cache system can also be layered into first-level and second-level cache nodes. First-level nodes have a higher hardware configuration and connect directly to the origin site. Second-level nodes have a lower configuration and connect directly to users.

On a miss, a second-level cache asks only the first-level cache. Only if the first-level cache does not have the resource either does the request go back to the origin site. This effectively reduces the number of requests that actually reach the origin.
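The two-level lookup can be sketched as a chain of caches, each of which asks its upstream on a miss. The `Origin` and `CacheNode` classes are hypothetical, and the sketch leaves out expiry and eviction, which any real cache needs.

```python
class Origin:
    """Stands in for the origin site and counts how often it is contacted."""

    def __init__(self, content):
        self.content = content
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.content[key]


class CacheNode:
    """A cache that asks its upstream (another cache or the origin) on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


origin = Origin({"/logo.png": b"image-bytes"})
first_level = CacheNode(upstream=origin)    # connected to the origin
edge_a = CacheNode(upstream=first_level)    # second level, connected to users
edge_b = CacheNode(upstream=first_level)

edge_a.get("/logo.png")   # misses at both levels, so the origin is contacted
edge_b.get("/logo.png")   # misses at edge_b but hits the first-level cache
print(origin.requests)
```

Two different edge nodes requested the same file, but the origin was contacted only once, because the second request was absorbed by the first-level cache.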

Today's commercial CDNs have hit rates above 90%. Since fewer than one request in ten then reaches the origin site, this is equivalent to amplifying the origin's service capacity by more than 10 times.
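The arithmetic behind this claim is short: if a fraction h of requests is served from cache, the origin sees only 1 - h of them, so the same origin can sit behind roughly 1 / (1 - h) times as much user traffic. The sketch below works this out for a few hit rates. The function name is made up, and it assumes every request costs the origin the same.

```python
def origin_amplification(hit_rate):
    """User requests served per request that reaches the origin."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


for rate in (0.5, 0.9, 0.95):
    print(f"hit rate {rate:.0%}: about {origin_amplification(rate):.0f}x")
```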

Summary

The purpose of a CDN is to improve the service quality of the Internet. In plain terms, it makes access faster.

A CDN builds national and global private networks so that users can reach a nearby edge node inside that network, which reduces transmission latency and accelerates the website.

The CDN's load balancing system intelligently schedules edge nodes to serve users, which makes it the brain of the CDN service, while the cache system is the heart. On a cache hit, the content is returned directly to the user. Otherwise, the request goes back to the origin.

Reflect & Expand

Put simply, a CDN is one or more servers that hold copies of static files, which get there through replication, caching, and similar means.

1. What are static files?

CSS, HTML, images, and media files are all static files, meaning that requests sent by users do not affect their content. Files such as JSP and PHP pages are not static, because their output changes depending on the request.
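A rough way to express this distinction in code is to classify paths by file extension. The extension list and the `is_static` helper below are illustrative assumptions. In practice, cacheability is usually decided by CDN configuration and HTTP cache headers rather than by extension alone.

```python
from pathlib import PurePosixPath

# Extensions treated as static here; the list is illustrative, not exhaustive.
STATIC_EXTENSIONS = {".css", ".html", ".png", ".jpg", ".gif", ".mp3", ".mp4"}


def is_static(url_path):
    """Guess from the file extension whether a path is a static file."""
    return PurePosixPath(url_path).suffix.lower() in STATIC_EXTENSIONS


for path in ("/assets/site.css", "/img/banner.PNG", "/cart.php", "/login.jsp"):
    print(path, "static" if is_static(path) else "dynamic")
```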

2. How does CDN achieve acceleration?

Normally, the data we want comes from the main server. If the main server is in the south and the user is in the north, access will be relatively slow. There are many reasons for this, such as transmission distance, operator networks, and bandwidth. With a CDN, nodes are distributed across many locations. When a user sends a request, the nearest CDN server is assigned to the user based on the user's regional information.

3. Where does CDN data come from?

Through replication and caching. A CDN server can cache a file after a user first requests it, or it can proactively fetch content from the main server.
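The two ways of filling a CDN server can be sketched side by side: caching on first request, and copying content ahead of time. The `EdgeCache` class, its method names, and the sample files are hypothetical and only illustrate the idea.

```python
class EdgeCache:
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> content
        self.store = {}

    def get(self, path):
        """Cache on demand: fetch the file the first time a user asks for it."""
        if path not in self.store:
            self.store[path] = self.fetch_from_origin(path)
        return self.store[path]

    def prefetch(self, paths):
        """Proactive fetch: copy files from the origin before anyone asks."""
        for path in paths:
            self.store[path] = self.fetch_from_origin(path)


origin_files = {"/a.css": "body{}", "/b.png": "png-bytes"}
origin_calls = []  # records every path actually fetched from the origin


def fetch_from_origin(path):
    origin_calls.append(path)
    return origin_files[path]


edge = EdgeCache(fetch_from_origin)
edge.prefetch(["/a.css"])   # copied ahead of time
edge.get("/a.css")          # served from cache, no origin call
edge.get("/b.png")          # first request fetches it from the origin
edge.get("/b.png")          # now cached
print(origin_calls)
```

Each file is fetched from the origin exactly once. The only difference is whether that fetch happens before or during the first user request.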
