Thoughts on the Evolution of Large-Scale Website Technology (9): Website Staticization, Overview (1)

Source: Summer Forest

[Diagram: website staticization → separation of dynamic and static content → reduce distance (CDN), size (compression), and number of requests (merging); cache; ESI: combining dynamic and static content on the server side]

 

At the beginning of the series on storage bottlenecks, I mentioned that a navigation site like hao123 can handle large-scale concurrent traffic as long as it deploys enough web servers. For a dynamic website, especially one backed by a database, it is hard to raise concurrent capacity effectively just by adding more web servers. Yet large dynamic sites such as Taobao and JD.com still respond quickly under high concurrency. What technical means let a dynamic website support high-concurrency scenarios? This is probably a question that interests everyone who does web development, and today I am starting a new series to discuss it. I hope my experience and research will be useful to readers. Note that this series is written differently from the storage bottleneck series: the first part mainly explains the principles, and the later parts will describe concrete implementations based on those principles.

  My own conclusion is that these large dynamic websites achieve fast responses and high concurrency by making as much of the site static as possible. To understand how staticization speeds up a dynamic website, we first need to understand why static websites respond quickly, so I will start with the characteristics of static websites that can be exploited to improve response speed.

  A static website is very simple. The browser requests a web page on the web server through a URL; the server receives the request and returns the page to the browser over the network using the HTTP protocol, and the browser parses the HTTP response and renders the page. Sometimes the page is more complex and references additional resources such as images, external CSS files, external JS files, and multimedia such as Flash. Each of these resources is fetched with its own HTTP request, and the browser combines them with the page through attributes such as src and href and tags such as object, finally rendering the complete page. Whatever the resource type, unless we change it manually, we get the same result on every request. This reveals a key feature of static pages: their resources are essentially unchanging. So our first visit to a static page and every later visit are repeated requests for the same content, and the loading speed is determined mainly by network transmission speed and the size of each resource. Since the resources hardly ever change, isn't it a waste of time to request them again and wait for them? It is, and that is why browsers provide caching. During development we can set the corresponding directives in the HTTP response headers of resources that do not change; these directives tell the browser to cache the static resources after the first visit. On the second visit the browser no longer needs to repeat the request, because the resource is already cached locally, so obtaining it becomes extremely fast.
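  To make this concrete, here is a minimal Python sketch of the server side of browser caching. The function name serve_static, the one-day max_age, and the SHA-256-based ETag are assumptions chosen for illustration; the Cache-Control, ETag, and If-None-Match headers and the 304 Not Modified status are standard HTTP. The If-None-Match handling is deliberately simplified.

```python
import hashlib
from typing import Dict, Optional, Tuple


def serve_static(body: bytes, if_none_match: Optional[str] = None,
                 max_age: int = 86400) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a request for an unchanging static resource (hypothetical helper)."""
    # Strong validator derived from the content: it changes only if the bytes change.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        # Let the browser reuse its copy for max_age seconds
        # without sending any request at all.
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    # Once max_age has expired, the browser revalidates with If-None-Match.
    # Simplified: real servers also accept lists of ETags and "*".
    if if_none_match == etag:
        return 304, headers, b""  # Not Modified: headers only, no body
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


css = b"body { font-family: sans-serif; }"
status, headers, _ = serve_static(css)  # first visit: full response
print(status, headers["Cache-Control"])  # 200 public, max-age=86400
status, _, body = serve_static(css, if_none_match=headers["ETag"])  # revalidation
print(status, len(body))  # 304 0
```

  The max-age directive removes the request entirely while the copy is fresh; the ETag lets the browser confirm an expired copy with a tiny 304 response instead of downloading the resource again.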

  Since the resources of a static website rarely change, they are easy to move around. Network transmission efficiency depends on distance, so we can distribute static resource servers across multiple service nodes in different regions. When a user requests the site, a routing algorithm directs the request to the node closest to the user, which shortens the network transmission distance and improves access efficiency. This is the well-known CDN (content delivery network) technology.
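  The routing step can be sketched in Python as follows. Real CDNs route with DNS, anycast, measured latency, and node load; this sketch uses plain geographic (great-circle) distance as a simplified stand-in, and the node names and coordinates are hypothetical.

```python
import math

# Hypothetical CDN edge nodes: name -> (latitude, longitude) in degrees.
EDGE_NODES = {
    "beijing": (39.90, 116.40),
    "shanghai": (31.23, 121.47),
    "guangzhou": (23.13, 113.26),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_node(user_location, nodes=EDGE_NODES):
    """Pick the edge node geographically closest to the user."""
    return min(nodes, key=lambda name: haversine_km(user_location, nodes[name]))


print(nearest_node((22.54, 114.06)))  # a user in Shenzhen -> guangzhou
```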

 

  Network transmission efficiency also depends on the size of the resources we transfer, so we compress resources before sending them to reduce their size and speed up transmission. In addition, every HTTP request runs over a TCP connection, and establishing and releasing these connections consumes considerable system resources; for small resources this overhead can exceed the cost of transferring the content itself. Therefore we try hard to reduce the number of HTTP requests, or use HTTP persistent (keep-alive) connections to avoid repeatedly opening and closing connections (whether persistent connections are appropriate depends on the scenario, which I will discuss in a later article).
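  Merging and compression can be sketched together in Python. The stylesheet names and contents are hypothetical and deliberately repetitive (real CSS is also repetitive, which is why text compresses well); merging is assumed to be plain concatenation in sorted order, whereas real build tools also minify. Persistent connections are not shown.

```python
import gzip

# Hypothetical stylesheets that a page would otherwise fetch with 3 requests.
CSS_FILES = {
    "header.css": ".header { margin: 0; padding: 0; color: #333; }\n" * 20,
    "menu.css": ".menu li { margin: 0; padding: 4px; color: #333; }\n" * 20,
    "footer.css": ".footer { margin: 0; padding: 8px; color: #999; }\n" * 20,
}


def merge_resources(files: dict) -> bytes:
    # Concatenate in a fixed order so the bundle (and its cache key) is stable.
    return "".join(files[name] for name in sorted(files)).encode("utf-8")


def compress(body: bytes) -> bytes:
    # What a server does when the browser sends "Accept-Encoding: gzip";
    # the response then carries "Content-Encoding: gzip".
    return gzip.compress(body)


bundle = merge_resources(CSS_FILES)
packed = compress(bundle)
print(f"requests: {len(CSS_FILES)} -> 1")
print(f"bytes: {len(bundle)} -> {len(packed)}")
```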

  In fact, most of the 14 website optimization rules proposed by Yahoo are based on the principles above. I will discuss Yahoo's 14 rules in detail later in this series, so I will not go into them here.

  I often think the best performance optimization is caching, but what gets cached is generally data that does not change often. The browser cache mentioned above, and the CDN as well, can both be understood as forms of caching, and they are among the most effective ways to improve website performance. Yet these caching techniques become extremely hard to apply to dynamic websites. Why is that?

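  First, how do dynamic websites differ from static ones? In my view, the difference is this: a dynamic page also has a URL, but if we pass different parameters, the page returned for that URL is not exactly the same. In other words, the content of a dynamic page changes depending on conditions, yet all the variants share the same URL. In a static website a URL is the address of one resource, whereas in a dynamic website one address actually points to different resources. Because of this, we cannot cache dynamic pages effectively, and using a cache inappropriately can even cause errors, so for dynamic pages we usually set meta tags in the page to tell the browser not to cache it.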
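  The error that inappropriate caching causes can be reproduced with a small Python sketch. The page renderer, the /item URL, and the PageCache class are hypothetical. Keying the cache on the path alone, as if the URL named a single static resource, serves the wrong page; keying on the full URL works here but stores one copy per parameter value, and it still fails when content depends on something outside the URL, such as the logged-in user's cookie.

```python
from urllib.parse import parse_qs, urlsplit


def render_product_page(url: str) -> str:
    """Hypothetical dynamic page: same path, content depends on ?id=."""
    product_id = parse_qs(urlsplit(url).query).get("id", ["0"])[0]
    return f"<h1>Product {product_id}</h1>"


class PageCache:
    """Whole-page cache whose key is computed by key_func."""

    def __init__(self, key_func):
        self.key_func = key_func
        self.store = {}

    def get(self, url: str) -> str:
        key = self.key_func(url)
        if key not in self.store:  # miss: render once and remember
            self.store[key] = render_product_page(url)
        return self.store[key]


# Treating the path as the resource address, as on a static site.
by_path = PageCache(lambda url: urlsplit(url).path)
by_path.get("/item?id=1")
print(by_path.get("/item?id=2"))  # <h1>Product 1</h1>  (wrong page!)

# Keying on the full URL is correct here, but stores one entry per variant.
by_url = PageCache(lambda url: url)
print(by_url.get("/item?id=2"))  # <h1>Product 2</h1>
```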

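  If every visit to a dynamic page returned completely different content, there would be no reason to write about website staticization at all. In reality, usually only part of a dynamic page changes; on an e-commerce site, for example, the menu, page header, and page footer rarely change. If we make users download these unchanged parts again on every request just because a small part of the page changes often, we waste a great deal of resources. Let's do the math: suppose the unchanging content of a dynamic page is 10 KB and the page receives 10 million visits a day. That consumes 100 million KB of network bandwidth every day, which is very uneconomical, and this repeatedly consumed bandwidth does nothing for the user experience; on the contrary, it slows down page loading. So we have to consider splitting the page: separate its dynamic and static parts, treat the static part as unchanging static resources, keep processing the dynamic content dynamically, and then merge the two at an appropriate place.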
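  The arithmetic above, written out in Python using the article's own figures (10 KB of unchanging content, 10 million page views a day). The conversion uses decimal units (1 GB = 10^6 KB), and the 30-day month is an assumption added for scale.

```python
STATIC_PART_KB = 10             # unchanging header/menu/footer per page view
DAILY_PAGE_VIEWS = 10_000_000

wasted_kb_per_day = STATIC_PART_KB * DAILY_PAGE_VIEWS
wasted_gb_per_day = wasted_kb_per_day / 1_000_000    # decimal: 1 GB = 10^6 KB
wasted_tb_per_month = wasted_gb_per_day * 30 / 1_000  # assumed 30-day month

print(f"{wasted_kb_per_day:,} KB/day = {wasted_gb_per_day:.0f} GB/day "
      f"= about {wasted_tb_per_month:.0f} TB per 30 days")
# 100,000,000 KB/day = 100 GB/day = about 3 TB per 30 days
```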

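  The key point here is where the dynamic and static content get merged; the choice of this location directly determines the architecture of the entire web front end. Let's take Java web development as an example to discuss this question.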

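  In Java web development we generally write pages with JSP, or with more advanced template engines such as Velocity or FreeMarker. Whether we use JSP or a template engine, these HTML-like files are not really HTML. A JSP, for example, is essentially a servlet, that is, a Java program, so these technologies are really a fusion of a server-side language and HTML. At runtime, the web container combines the JSP or template with the data produced on the server side to render HTML that the browser can parse, and then sends that HTML to the browser. So the page-development technologies provided by server-side languages are the root reason dynamic and static content cannot be separated, although they handle the dynamic part well. Therefore, if we want to separate dynamic and static content, we first have to extract the static resources from the JSP or template files. The extracted static resources should naturally be handled by a static web server; the static resource servers we commonly use are Apache or Nginx, so these resources should be placed on such servers. Can we then merge dynamic and static content on these static web servers? The answer is yes. Apache, for example, has a module that can combine the static resources it stores with content sent from the application server; techniques of this kind are called ESI (Edge Side Includes) and SSI (Server Side Includes). We can turn the unchanging static content into templates stored on the static server, and when dynamic content arrives at the static server, ESI or SSI tags splice the two together, completing the merge of dynamic and static content.

  This raises a question. I mentioned CDN earlier, and a CDN is also a group of static web servers, so could we do this merging on the CDN? In theory yes, but in practice it is not easy. Apart from a few extremely wealthy Internet companies, most companies use third-party CDNs; a third-party CDN is usually a general-purpose solution, the provider is not part of your own team, and the main purpose of a CDN is not dynamic/static separation anyway. So in most cases this kind of work does not go smoothly on a CDN. Instead, we usually put a static web server in front of the application's web container. This static server acts as a reverse proxy and can do many things, one of which is merging dynamic and static content.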
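  What such a front server does can be sketched in Python: it answers static requests itself, forwards everything else to the application container, and assembles pages by replacing ESI include tags with dynamic fragments. In practice this is done by a proxy with ESI support or by an SSI module in Apache or Nginx; here the routing rule, the /fragments/cart URL, and the function names are hypothetical, and fetch_from_app stands in for a real HTTP call to the Java web container. The <esi:include src="..."/> tag is the standard ESI include syntax.

```python
import re
from typing import Callable

# Static, rarely changing page shell kept on the front (reverse-proxy) server.
PAGE_TEMPLATE = """<html><body>
<div id="header">Shop header and menu</div>
<esi:include src="/fragments/cart"/>
<div id="footer">Shop footer</div>
</body></html>"""

# Matches a self-closing ESI include tag and captures its src attribute.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

# Hypothetical rule: these suffixes are served directly by the static server.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".html")


def route(path: str) -> str:
    """Decide who answers a request: the static server or the app container."""
    return "static" if path.endswith(STATIC_SUFFIXES) else "app"


def fetch_from_app(src: str) -> str:
    """Hypothetical stand-in for an HTTP call to the Java web container."""
    fragments = {"/fragments/cart": '<div id="cart">3 items</div>'}
    return fragments.get(src, "")


def assemble(template: str, fetch: Callable[[str], str]) -> str:
    """Replace each ESI include with the dynamic fragment it points to."""
    return ESI_INCLUDE.sub(lambda m: fetch(m.group(1)), template)


print(route("/css/site.css"), route("/fragments/cart"))  # static app
print(assemble(PAGE_TEMPLATE, fetch_from_app))
```

  Only the small cart fragment is generated per request; the header, menu, and footer stay in a static template that the front server can keep in memory.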

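  Now let's push the merge point even further forward, all the way to the browser. Can the browser do this? If it can, static resources can be cached on the client, which is even more efficient than caching them on a CDN. In fact browsers can do it, and since AJAX appeared, combining dynamic and static resources in the browser has become much easier. Typically, though, when we use AJAX for dynamic/static separation, we request an HTML fragment from the server and, once it arrives, insert it into the page with DOM manipulation. This is already far more efficient than returning the whole page, but there is still a problem. The end result of the server's processing is really just pure data, yet we have to turn that data into a page fragment before returning it to the browser. In essence, this wraps the pure data in a lot of markup that the server has no need to produce; it is unnecessary because the browser can build that structure itself, so why insist that the server do it? Hence JavaScript template technologies emerged. They are similar to JSP and Velocity, except that they are template languages implemented in JavaScript. With JavaScript templates, the server no longer has to deal with page rendering at all; it only needs to return the useful data to the page. JavaScript templates make dynamic/static separation even more thorough: essentially everything browser-related becomes static, and the server only transmits the raw data to the browser. This brings us to the cutting edge of web front-end technology: JavaScript MVC architecture.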
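  JavaScript templates run in the browser, but the idea can be illustrated in Python, with string.Template standing in for a JavaScript template engine. The order data and the row template are hypothetical. The sketch compares the two approaches described above: the server sending a rendered HTML fragment, versus the server sending only raw JSON while the template, shipped once as a cacheable static file, does the rendering on the client.

```python
import json
from string import Template

# Hypothetical order data produced by the server.
orders = [{"id": 1, "item": "book", "price": "12.50"},
          {"id": 2, "item": "pen", "price": "1.20"}]

# The row template: in the browser this would be a JavaScript template
# delivered as a static, cacheable file.
ROW = Template('<tr class="order-row"><td>$id</td><td>$item</td><td>$price</td></tr>')


def render_rows(rows):
    """Fill the template once per row (substitute() calls str() on values)."""
    return "".join(ROW.substitute(row) for row in rows)


# Approach 1: the server renders an HTML fragment and the browser inserts it.
html_fragment = render_rows(orders)

# Approach 2: the server sends raw data only; the client renders it.
json_payload = json.dumps(orders, separators=(",", ":"))
client_side_html = render_rows(json.loads(json_payload))

print(len(html_fragment), "chars of HTML vs", len(json_payload), "chars of JSON")
```

  Both approaches produce the same markup, but in the second one the server no longer spends effort on presentation and the response shrinks to the data itself; the gap grows with richer markup.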

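  That's all for today. This article is an overview of the theory behind website staticization; in later articles I will describe, step by step, the implementation details of the various techniques used to make websites static.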
