Internet Architecture Evolution model

6, Using a reverse proxy and a CDN to speed up site responses

    To speed up access further, consider adding a CDN and a reverse proxy. A CDN provider deploys nodes in data centers across many networks, so a user's request can be served from the data center closest to that user on their own carrier's network. The reverse proxy is deployed in the site's central data center: when a request arrives there, the first server it reaches is the reverse proxy. If the reverse proxy has cached the requested resource, it returns it directly to the user, which shortens response time and also reduces load on the back-end servers. The CDN and the reverse proxy rest on the same basic principle: caching.
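The cache-first behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CachingProxy`, `origin`, and the 60-second TTL are hypothetical names and values, not part of any real proxy product.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy reverse proxy: serve from cache when possible, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}  # path -> (expiry time, body)
        self.origin_hits = 0  # how many requests reached the back end

    def get(self, path: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the back end is never contacted
        body = self._fetch(path)  # cache miss: forward to the application server
        self.origin_hits += 1
        self._cache[path] = (now + self._ttl, body)
        return body


def origin(path: str) -> str:
    # Hypothetical back-end handler standing in for the application servers.
    return f"<html>content of {path}</html>"


proxy = CachingProxy(origin)
first = proxy.get("/index.html")   # miss: fetched from the origin
second = proxy.get("/index.html")  # hit: served from the cache
```

After the two calls, `proxy.origin_hits` is 1: the second request never reached the back end, which is the load reduction the paragraph refers to.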

Knowledge involved at this stage: CDN and reverse proxy fundamentals.

 

| 7, Splitting databases and tables (vertical / horizontal splitting) and a distributed file system

    At this point in the site's evolution, user, product, and transaction data still live in the same database. Caching and read/write separation have already been added, but as pressure on the database keeps growing, the database becomes an increasingly prominent bottleneck. Two approaches address this: splitting databases and splitting tables.

    Splitting databases: also called vertical splitting. Data for different business areas is moved into separate databases; in our example, user, product, and transaction data are separated. Advantages: it relieves the pressure of having all business data in one database, and each database can be optimized for the characteristics of its business. Disadvantage: multiple databases must be maintained.

    Problems encountered when splitting databases: 1) transactions that used to span business areas must be reconsidered; 2) cross-database joins.

    Solution: avoid cross-database transactions at the application layer; where one is unavoidable, control it in code as far as possible. Cross-database joins can be handled by third-party middleware such as the mycat mentioned above, which provides a range of cross-database join strategies; see the official mycat documentation for details.
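When no middleware is in place, a cross-database join can also be done in application code. The sketch below is a minimal illustration, not mycat's approach: two in-memory SQLite databases stand in for the separated user and transaction databases, and the table and column names are assumptions made for the example.

```python
import sqlite3

# Two separate in-memory databases stand in for the split user and
# transaction databases; a SQL JOIN across them is not possible.
user_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")

user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

order_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
order_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 9.5), (11, 2, 20.0), (12, 1, 3.0)],
)


def orders_with_user_names():
    """Join orders to users in application code instead of with a SQL JOIN."""
    orders = order_db.execute(
        "SELECT id, user_id, amount FROM orders ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    # One batched lookup against the user database avoids a query per order.
    placeholders = ",".join("?" for _ in user_ids)
    rows = user_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall()
    names = dict(rows)
    return [(order_id, names.get(user_id), amount) for order_id, user_id, amount in orders]


result = orders_with_user_names()
```

The trade-off is that the application now owns the join logic, and the two reads are not covered by a single transaction.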

    Splitting tables: also called horizontal splitting. The rows of a single table are divided across two or more databases. The reason for horizontal splitting is that the data volume or update volume of some business table has reached the bottleneck of a single database; at that point the table can be split across two or more databases. Advantage: once the problem of an oversized single table is solved, the system copes well with rapidly growing data volumes.

    Problems encountered when splitting tables:

    1) Applications that access user information must solve a SQL routing problem: user data now sits in two databases, so each operation needs to know which database holds the data it targets.

    2) Primary keys must be handled differently; for example, an auto-increment column can no longer simply be used as before.

    3) Paginated queries become much more troublesome.
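Problems 1 and 2 can be illustrated with a small routing sketch. It assumes two shards, a CRC32-based routing rule, and UUID primary keys; all names (`SHARD_COUNT`, `shard_for`, `create_user`, `get_user`) are hypothetical, and real middleware would apply a configured rule rather than this hard-coded one.

```python
import uuid
import zlib
from typing import Dict, List

SHARD_COUNT = 2  # assumption: user data is split across two databases

# Each dict stands in for one physical database holding part of the user table.
shards: List[Dict[str, dict]] = [{} for _ in range(SHARD_COUNT)]


def shard_for(user_id: str) -> int:
    """Map a user ID to a shard index with a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT


def create_user(name: str) -> str:
    # A UUID replaces the auto-increment key, which is unique only within
    # a single database.
    user_id = uuid.uuid4().hex
    shards[shard_for(user_id)][user_id] = {"id": user_id, "name": name}
    return user_id


def get_user(user_id: str) -> dict:
    # The router decides which database to read; callers never pick a shard.
    return shards[shard_for(user_id)][user_id]


ids = [create_user(f"user{i}") for i in range(100)]
```

A modulo rule like this is simple but makes adding a shard expensive, because most keys would map to a different shard afterwards.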

    Solution: this can again be handled by third-party middleware such as mycat. mycat parses our SQL with its SQL parsing module and, according to our configuration, forwards the request to a specific database. Primary key uniqueness can be guaranteed with UUIDs or a custom ID scheme. mycat also provides a range of paginated query strategies, such as running the paginated query on each database first and then merging the results into the final page.
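The "query each database, then merge" idea for pagination can be sketched as follows. This is a simplified illustration and not mycat's actual implementation: the three lists are hypothetical shards of order IDs, and `query_shard` stands in for a per-database `ORDER BY ... LIMIT` query.

```python
import heapq
from typing import List

# Hypothetical shards: each list stands in for one database's orders table.
shard_a = [1, 4, 7, 10, 13]
shard_b = [2, 3, 8, 9, 14]
shard_c = [5, 6, 11, 12, 15]


def query_shard(rows: List[int], limit: int) -> List[int]:
    """Stand-in for: SELECT id FROM orders ORDER BY id LIMIT :limit."""
    return sorted(rows)[:limit]


def paged_query(shards: List[List[int]], page: int, page_size: int) -> List[int]:
    """Return one global page (1-based) from rows spread over several shards."""
    offset = (page - 1) * page_size
    # Every shard must return its first offset + page_size rows, because any
    # of those rows could belong to the requested global page.
    partials = [query_shard(rows, offset + page_size) for rows in shards]
    merged = heapq.merge(*partials)  # k-way merge of already sorted lists
    return list(merged)[offset:offset + page_size]


page1 = paged_query([shard_a, shard_b, shard_c], page=1, page_size=4)
page2 = paged_query([shard_a, shard_b, shard_c], page=2, page_size=4)
```

Here `page1` is `[1, 2, 3, 4]` and `page2` is `[5, 6, 7, 8]`. The cost grows with the page number, since each shard returns `offset + page_size` rows, which is why deep pagination over split tables is expensive.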

    Splitting tables is the last resort for splitting a database and is used only when a single table holds a very large amount of data. The more common approach is splitting databases by business area, with each business database deployed on its own physical server. By now our database has become a distributed database, and we can likewise adopt a distributed file system to cope with substantial growth in data volume.

 

Knowledge involved at this stage: using third-party middleware such as mycat for database and table splitting.

 

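| 8, Using NoSQL and search engines

    As the site's business grows more complex, so do its data storage and retrieval needs. The site needs non-relational database (NoSQL) technology and non-database query technology (search engines), so at this point we can introduce NoSQL databases and search engines. Application servers access the various data stores through a unified data access module, which spares applications the trouble of managing many data sources.

    When a relational database serves as the read store, it often struggles with fuzzy search, and read/write separation does not solve this. Take our example trading site: published products are stored in the database, and the feature users rely on most is product search, especially finding products by title. Such a requirement is usually implemented with LIKE, but that is very expensive, because a pattern with a leading wildcard generally cannot use an ordinary index and forces a scan of the table. Here we can use a search engine's inverted index instead.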
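To show why an inverted index answers title searches without scanning every row, here is a minimal sketch. The product titles are made up, and tokenization is a plain whitespace split; real search engines add analysis, ranking, and much more.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical product titles keyed by product ID.
products = {
    1: "red cotton shirt",
    2: "blue cotton jeans",
    3: "red leather shoes",
}


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of product IDs whose title contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, title in docs.items():
        for term in title.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return IDs of products whose titles contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up each term's posting set, then intersect them.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


index = build_index(products)
red = search(index, "red")
red_cotton = search(index, "red cotton")
```

`red` is `[1, 3]` and `red_cotton` is `[1]`. Each lookup touches only the entries for the query terms, whereas a LIKE query would have to examine every title.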

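    Advantage of a search engine: it greatly improves query speed.

    Introducing a search engine also brings the following costs:

    1) A large amount of maintenance work: we have to implement the index build process ourselves and design full and incremental builds to serve non-real-time and real-time query needs.

    2) The search engine cluster has to be maintained.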
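The difference between a full build and an incremental build can be sketched like this. `ProductIndex` and its methods are hypothetical names for illustration; a real deployment would feed a search engine from database snapshots and change events rather than in-memory dictionaries.

```python
from collections import defaultdict
from typing import Dict, List


class ProductIndex:
    """Toy term index supporting full rebuilds and incremental updates."""

    def __init__(self) -> None:
        self._index = defaultdict(set)      # term -> product IDs
        self._titles: Dict[int, str] = {}   # product ID -> indexed title

    def full_build(self, products: Dict[int, str]) -> None:
        """Full build: discard everything and re-index a complete snapshot."""
        self._index = defaultdict(set)
        self._titles = {}
        for product_id, title in products.items():
            self.upsert(product_id, title)

    def upsert(self, product_id: int, title: str) -> None:
        """Incremental build: apply a single insert or update."""
        self.delete(product_id)  # drop terms of the old title, if any
        self._titles[product_id] = title
        for term in title.lower().split():
            self._index[term].add(product_id)

    def delete(self, product_id: int) -> None:
        old_title = self._titles.pop(product_id, None)
        if old_title is None:
            return
        for term in old_title.lower().split():
            self._index[term].discard(product_id)

    def lookup(self, term: str) -> List[int]:
        return sorted(self._index.get(term.lower(), set()))


idx = ProductIndex()
idx.full_build({1: "red shirt", 2: "blue jeans"})  # e.g. a periodic batch job
idx.upsert(2, "red jeans")                         # e.g. a product was edited
```

After the update, `idx.lookup("red")` returns `[1, 2]` and `idx.lookup("blue")` returns `[]`. Full builds are simple and correct any drift; incremental updates keep search results close to real time between full builds.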

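    A search engine cannot replace the database. It solves the "read" problem in certain scenarios; whether to introduce one depends on the needs of the whole system.

    [Figure: system structure after introducing NoSQL and a search engine]

Knowledge involved at this stage:

    Search engines: for example Elasticsearch, Solr, Sphinx

    NoSQL: for example MongoDB, HBase, Cassandra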

 

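| 9, Splitting by business module

    As business modules multiply and the user base reaches a certain scale, the system needs to be divided into multiple modules, divide-and-conquer style, with different business teams responsible for different modules. Technically, the system is split along the same module boundaries, and each module is developed, maintained, and deployed independently.

Knowledge involved at this stage: splitting by business module is a large amount of work; it requires close familiarity with each business area and careful, responsible execution.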

 

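| 10, Service orientation and middleware

    As the business is split into ever smaller pieces, system complexity rises sharply and deployment and maintenance become harder and harder. The following problems appear:

    1) After the split, some code may be duplicated. For example, products and transactions both need user information, so both systems keep nearly identical code for working with user data. How can this code be reused?

    2) Because every application connects to the database directly, database connections run short and the database refuses service.

    To solve these problems, common functionality is extracted, deployed independently, and exposed as unified, reusable distributed services; this approach is also called SOA. After the move to services, the same code is no longer scattered across applications: the implementations live in service centers, where the code is easier to maintain. Database access also moves into the service centers, so front-end web applications can concentrate on interaction with the browser, and new business features only need to call these distributed services. At the same time, new problems appear:

    1) How to make remote service calls.

    2) As the site keeps growing, the system may contain submodules written in different languages and subsystems deployed on different platforms. We then need a platform that transfers data reliably and independently of platform and language, makes load balancing transparent, and collects and analyzes call data along the way, for example to estimate the site's traffic growth rate and predict how the site should grow.

    This can be solved by introducing service middleware, for example Alibaba's Dubbo, an RPC framework, which can be paired with Apache ZooKeeper, an open-source distributed coordination service, to implement service registration and discovery.

    [Figure: architecture diagram for this step]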
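Service registration and discovery can be illustrated with an in-memory sketch. It does not use the Dubbo or ZooKeeper APIs; `ServiceRegistry`, the service name, and the addresses are all hypothetical, and the round-robin rule is just one simple way to make load balancing transparent to callers.

```python
from collections import defaultdict
from typing import Dict, List


class ServiceRegistry:
    """In-memory stand-in for a registry; not a real ZooKeeper client."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)  # round-robin counters

    def register(self, service: str, address: str) -> None:
        """Called by a provider when it starts."""
        if address not in self._providers[service]:
            self._providers[service].append(address)

    def unregister(self, service: str, address: str) -> None:
        """Called when a provider shuts down or is detected as dead."""
        if address in self._providers[service]:
            self._providers[service].remove(address)

    def discover(self, service: str) -> str:
        """Called by a consumer; returns one provider address, round-robin."""
        providers = self._providers.get(service)
        if not providers:
            raise LookupError(f"no provider registered for {service!r}")
        index = self._next[service] % len(providers)
        self._next[service] += 1
        return providers[index]


registry = ServiceRegistry()
registry.register("user-service", "10.0.0.1:20880")
registry.register("user-service", "10.0.0.2:20880")
picks = [registry.discover("user-service") for _ in range(3)]
```

`picks` alternates between the two addresses, so the product and transaction applications can call the user service by name without knowing where its instances run.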

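    Knowledge involved at this stage: identifying reusable business functions, designing service interfaces, and governing service dependencies. It demands a deep understanding of communication, remote invocation, and messaging mechanisms, with a clear grasp from theory through the hardware and operating system levels down to the implementation in the chosen language. Operations also involves a very broad body of knowledge; in most cases it requires distributed parallel computing, reporting, monitoring, and rule and policy techniques.

Evolving to this stage usually takes a long time and brings many challenges:

    1) Once the system is distributed, it needs a high-performance, stable communication framework that supports several different communication and remote invocation styles.

    2) Splitting a huge application takes a long time and requires sorting out the business and controlling dependencies between systems.

    3) How to operate this huge distributed application well (dependency management, health management, error tracing, tuning, monitoring, alerting, and so on).

    After this step the system architecture is more or less in a relatively stable phase, and the site can begin to use large numbers of machines to support huge traffic and data volumes. From here, this architecture, together with the experience gained through the earlier rounds of evolution, serves as the basis for adopting whatever other methods are needed to support ever higher traffic.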

 

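| Summary

    When designing a site architecture, always start small and let the architecture evolve with the business. Do not pursue the architecture of a "1" while the business is still at "0"; that only puts the cart before the horse, and the cost outweighs the gain. The classic evolution of a site architecture looks broadly like the path described above, although the solution chosen at each step and the order of the steps may differ. Sites with different businesses also have different specialized technical needs. This article explains the evolution mainly from an architectural perspective, and many techniques are not covered here, such as data mining and real-time statistics. In a real evolution, a site also relies on measures such as upgrading hardware, improving the network environment, and modifying the operating system to support more traffic, so real-world paths will differ in many ways. A large site has to get far more right than what is covered above, including security, operations and maintenance, business operations, service, and storage. Building a large site well is genuinely hard.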


Origin www.cnblogs.com/liliuguang/p/11933529.html