HTTP-related concepts

Browser

The browser is the requester in the HTTP protocol: it uses HTTP to fetch resources of all kinds from the network.
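To make "requester" concrete, here is a minimal sketch of the text a client might send for a simple GET. The function name `build_get_request`, the `demo-browser/0.1` User-Agent string, and the `example.com` host are placeholders chosen for illustration; a real browser sends many more headers.

```python
def build_get_request(host, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",           # request line: method, target, version
        f"Host: {host}",                  # Host is mandatory in HTTP/1.1
        "User-Agent: demo-browser/0.1",   # hypothetical client name
        "Accept: */*",
        "Connection: close",
        "",                               # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.com", "/index.html")
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the host is, in essence, what the browser does on every plain HTTP page load.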

Web server

The web server is the responder in the HTTP protocol: the party that provides Web services and answers HTTP requests.

In the hardware sense, a web server is a machine, either physical or in the "cloud". In most cases it is not a single server but a large cluster fronted by technologies such as reverse proxies and load balancers.

The web server in the software sense is usually of more concern to us. It is an application that provides Web services and runs on a server in the hardware sense. Examples include Apache, Nginx, IIS, Tomcat, and Jetty; Node.js, although a runtime rather than a dedicated server, is also commonly used to serve HTTP.
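As a rough illustration of a web server in the software sense, the sketch below uses Python's standard `http.server` module to answer every GET with a fixed text body, then requests it once from the same process. The names `HelloHandler` and `fetch_once` are made up for this example, and a production server such as Nginx does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Responder: answers every GET with the same plain-text body."""

    def do_GET(self):
        body = b"hello from a tiny web server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start the server on a free local port, send one GET, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = pick any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```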

CDN

CDN stands for "Content Delivery Network". It uses the caching and proxy mechanisms of the HTTP protocol to respond to client requests on behalf of the origin server.

The advantage of a CDN is that it caches the origin's data, so a browser request does not have to travel "a thousand miles" to the origin server; the response can be served "halfway" from a cache node. If the CDN's scheduling algorithm is good, it can route the user to the nearest node, greatly reducing response time.
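The caching idea can be sketched in a few lines. This is a toy model under stated assumptions, not how any real CDN is implemented: `EdgeCache`, `pick_nearest`, the 60-second TTL, and the latency figures are all invented for illustration.

```python
import time


class EdgeCache:
    """Toy CDN node: serves from its cache, falls back to the origin on a miss."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"       # answered "halfway", origin not contacted
        body = self._fetch(path)          # the long trip to the origin
        self._store[path] = (now + self._ttl, body)
        return body, "MISS"


def pick_nearest(latencies_ms):
    """Scheduling stand-in: choose the node with the lowest measured latency."""
    return min(latencies_ms, key=latencies_ms.get)


origin_calls = []


def origin(path):
    origin_calls.append(path)
    return f"content of {path}"


edge = EdgeCache(origin, ttl_seconds=60.0)
first = edge.get("/logo.png")    # miss: fetched from the origin
second = edge.get("/logo.png")   # hit: served from the edge cache
nearest = pick_nearest({"tokyo": 120.0, "frankfurt": 18.0})
```

Real CDNs decide freshness from HTTP headers such as `Cache-Control` rather than a fixed TTL, and pick nodes with DNS or anycast routing rather than a simple minimum.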

Crawler

As mentioned earlier, the browser is a user agent that accesses the Internet on our behalf.

But the HTTP protocol does not require that a "real human" be behind the user agent; it can also be a "robot". These "robots" are formally called "crawlers": applications that automatically access Web resources.

Where do crawlers come from?

Most of them are "released" by the major search engines. They crawl web pages, store them in a huge database, and build a keyword index, so that a search engine can quickly find pages from every corner of the Internet.
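The "keyword index" step can be illustrated with a tiny inverted index that maps each word to the pages containing it. This is only a sketch: the function names, the sample pages, and the tokenizing rule are assumptions, and real search engines use far more elaborate ranking and storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Stand-in for pages a crawler has already downloaded (hypothetical URLs).
pages = {
    "http://example.com/a": "HTTP is a protocol for the Web",
    "http://example.com/b": "A CDN caches Web content",
}
index = build_index(pages)
```

With this index, `search(index, "web cdn")` looks up two small sets and intersects them instead of rescanning every stored page.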

Crawlers also have a downside: they can consume excessive network resources and take up server bandwidth. Hence the "gentleman's agreement" robots.txt, which states which paths crawlers may crawl and which they should not.
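A well-behaved crawler checks robots.txt before fetching a page. The sketch below uses Python's standard `urllib.robotparser` on an inline rule set; the rules, the `demo-crawler` name, and the `example.com` URLs are invented for illustration, and a real crawler would download the file from the site's `/robots.txt` first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="demo-crawler"):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("http://example.com/index.html"))
print(may_crawl("http://example.com/private/data.html"))
```

Nothing enforces these rules on the server side, which is why the text calls robots.txt a gentleman's agreement: compliance is up to the crawler.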

Summary

  1. Most resources on the Internet are transmitted using the HTTP protocol;
  2. The browser is the requester in the HTTP protocol, that is, the User Agent;
  3. The server is the responder in the HTTP protocol; Apache and Nginx are commonly used;
  4. The CDN sits between the browser and the server and mainly provides cache acceleration;
  5. Crawlers are another type of User Agent: programs that automatically access network resources.

Origin blog.csdn.net/jeikerxiao/article/details/93618883