Large-scale website architecture technology list (repost)

http://coderknock.com/blog/2016/12/09/all.html (repost)

Large-scale website architecture technology at a glance

This article is reproduced from: List of large-scale website architecture technologies

The challenges of large websites come mainly from huge numbers of users, highly concurrent access, and massive data. Once even a simple piece of business has to process tens of petabytes of data and serve hundreds of millions of users, the problem becomes difficult. Large-scale website architecture exists mainly to solve this kind of problem. For more, you can also read the two articles on the evolution of the architecture of major Internet companies and the evolution of the architecture of large-scale websites.

Most of the content of this article comes from "Technical Architecture of Large Websites". The book is worth reading and is highly recommended.

The layers of the website's system architecture are shown in the following figure:

[Figure: website system architecture]

1. Front-end architecture

The front end refers to the stages that a user request passes through before it reaches the website's application servers. It usually contains none of the website's business logic and does not handle dynamic content.

Browser optimization technology

This does not mean optimizing the browser itself, but speeding up how quickly the browser loads and displays a page by optimizing the page sent in the response. Common techniques are page caching, merging HTTP requests to reduce their number, and page compression.
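As a minimal sketch of the page compression mentioned above, the following uses Python's standard gzip module to compress an HTML response body before it is sent. The sample page and the function name compress_page are illustrative assumptions, not something the original article specifies.

```python
import gzip


def compress_page(html: str) -> bytes:
    """Gzip-compress an HTML page, as a server might do before responding."""
    return gzip.compress(html.encode("utf-8"))


# Hypothetical page with repetitive markup, which compresses well.
page = "<html><body>" + "<p>product item</p>" * 200 + "</body></html>"
compressed = compress_page(page)

print(len(page.encode("utf-8")), "bytes before,", len(compressed), "bytes after")
```

In a real deployment the web server or reverse proxy usually performs this step and marks the response with a Content-Encoding: gzip header so the browser knows to decompress it.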

CDN

A content delivery network (CDN) is deployed in the data centers of network operators. Static page content is distributed to the CDN server closest to the user, so that the user can obtain it over the shortest path.

Dynamic and static separation, independent deployment of static resources

Static resources, such as JS, CSS and other files, are deployed on a dedicated server cluster, separated from the dynamic content service of Web applications, and use a dedicated (second-level) domain name.

Image service

Images here do not mean website logos, button icons, and the like. Those files belong to the static resources described above and should be deployed together with JS and CSS. The images here are those uploaded by users, such as product pictures and user avatars. The image service is likewise deployed on an independent image server cluster and uses an independent (second-level) domain name.

Reverse proxy

Deployed in the website's data center, in front of the application servers, static resource servers, and image servers, the reverse proxy provides page caching.
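The caching role of a reverse proxy can be sketched as a small time-limited cache placed in front of an origin server. This is only an illustration under stated assumptions: the class CachingProxy, the origin_server function, and the 60-second lifetime are hypothetical names and values, and a production proxy would also honor HTTP cache headers.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingProxy:
    """Serves cached pages and forwards to the origin only on a miss."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float) -> None:
        self._origin = origin
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._origin(path)  # cache miss or expired entry
        self._cache[path] = (now, body)
        return body


origin_calls: List[str] = []


def origin_server(path: str) -> str:
    """Hypothetical application server that records each request it receives."""
    origin_calls.append(path)
    return f"<html>content of {path}</html>"


proxy = CachingProxy(origin_server, ttl_seconds=60.0)
first = proxy.get("/index.html")
second = proxy.get("/index.html")  # served from the cache
print(len(origin_calls))
```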

DNS

The Domain Name System resolves domain names into IP addresses, and DNS can also be used to implement load balancing. Configuring a CDN also requires modifying DNS so that the domain name resolves to the CDN servers.

2. Application layer architecture

The application layer is where the main business logic of the website is handled.

Development framework

A website's business changes constantly, and most of its software engineers work overtime to build that business, so a good development framework is very important. A good framework should separate concerns so that designers and development engineers can each do their own job and collaborate easily. It should also build in some security policies to protect against web attacks.

Page rendering

Page rendering integrates the separately developed and maintained dynamic content and static page templates into the complete page that is finally displayed to the user.

Load balancing

Multiple application servers are formed into a cluster, and user requests are distributed to different servers through load balancing technology to cope with the high concurrent load pressure generated when a large number of users access at the same time.
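The article does not name a specific distribution strategy. As one simple possibility, the sketch below uses round-robin selection, where requests are handed to the servers in turn. The class name RoundRobinBalancer and the server names are illustrative assumptions.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle: Iterator[str] = itertools.cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Round-robin assumes the servers are roughly equal in capacity. Weighted or least-connections strategies are common alternatives when they are not.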

Session management

To achieve a high-availability application server cluster, application servers are usually designed to be stateless and do not save user request context. However, website business usually needs to maintain user session information, so a dedicated mechanism is required to manage sessions so that they can be shared among application servers within a cluster or even across clusters.
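One way to picture session sharing is to keep session data in a store that every application server can reach, while the servers themselves stay stateless. In the sketch below an in-memory dictionary stands in for that shared store; SessionStore, AppServer, and the user name are hypothetical, and a real system would typically use a separate session or cache service.

```python
import uuid
from typing import Dict, Optional


class SessionStore:
    """Stand-in for a shared session service (an in-memory dict here)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, str]] = {}

    def create(self, data: Dict[str, str]) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = dict(data)
        return session_id

    def load(self, session_id: str) -> Optional[Dict[str, str]]:
        return self._sessions.get(session_id)


class AppServer:
    """Stateless application server: it keeps no session data of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self._store = store

    def login(self, user: str) -> str:
        return self._store.create({"user": user})

    def whoami(self, session_id: str) -> str:
        session = self._store.load(session_id)
        return session["user"] if session else "anonymous"


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

sid = server_a.login("alice")
print(server_b.whoami(sid))  # the next request lands on a different server
```

Because neither server holds the session, either one can fail or be replaced without logging the user out.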

Static generation of dynamic pages

Dynamic pages that are visited heavily but updated infrequently can be made static: a static page is generated from them, and static page optimizations such as reverse proxies, CDNs, and browser caching are then used to speed up user access.
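A minimal sketch of static generation: the dynamic page is rendered once from a template and written to an HTML file, which can then be served like any other static resource. The template, the article fields, and the file naming scheme are illustrative assumptions.

```python
import tempfile
from pathlib import Path
from string import Template

PAGE_TEMPLATE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def render_dynamic(article: dict) -> str:
    """Merge dynamic content into the page template."""
    return PAGE_TEMPLATE.substitute(title=article["title"], body=article["body"])


def publish_static(article: dict, out_dir: Path) -> Path:
    """Render the page once and store it as a static HTML file."""
    path = out_dir / f"article-{article['id']}.html"
    path.write_text(render_dynamic(article), encoding="utf-8")
    return path


out_dir = Path(tempfile.mkdtemp())  # stands in for the static file directory
static_file = publish_static(
    {"id": 42, "title": "Hello", "body": "Rarely updated."}, out_dir
)
print(static_file.read_text(encoding="utf-8"))
```

The file has to be regenerated whenever the underlying content changes, which is why the technique suits pages that are updated infrequently.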

Business splitting

Splitting a complex, huge business into multiple smaller products that are developed, deployed, and maintained independently not only reduces system coupling but also makes it easier to split the database by business. Splitting a relational database by business is relatively low in technical difficulty and works relatively well.

Virtualized servers

Virtualizing one physical server into multiple virtual servers makes it easier, for services with low concurrent access, to build a high-availability application server cluster with fewer resources.

3. Service layer architecture

The service layer provides basic services that the application layer calls to carry out the website's business.

Distributed messaging

A message queue mechanism is used to send messages asynchronously and to keep coupling low between one business and another, and between businesses and services.
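The decoupling that a message queue provides can be sketched in a single process with Python's standard queue and threading modules. A real distributed message queue runs as a separate service, so this only illustrates the producer and consumer pattern; the order identifiers and the "shipped" step are hypothetical.

```python
import queue
import threading

order_queue = queue.Queue()
processed = []


def consumer() -> None:
    """Downstream service: handles messages whenever they arrive."""
    while True:
        message = order_queue.get()
        if message is None:  # sentinel value: no more messages
            break
        processed.append(f"shipped {message}")


worker = threading.Thread(target=consumer)
worker.start()

# Upstream business: enqueue and move on without waiting for processing.
for order_id in ("order-1", "order-2", "order-3"):
    order_queue.put(order_id)

order_queue.put(None)
worker.join()
print(processed)
```

The producer only knows about the queue, not about the consumer, so either side can be changed or scaled without touching the other.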

Distributed service

Distributed services are high-performance, loosely coupled, easy to use, and easy to manage, and they implement a Service-Oriented Architecture (SOA) for the website.

Distributed cache

Providing a large-scale hot data cache service through a scalable server cluster is an important means of website performance optimization.
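The article does not say how keys are spread over the cache servers. One widely used approach for scalable cache clusters is consistent hashing, sketched below as an assumption rather than as the article's method. When a server is added, only the keys that now belong to the new server move. The class ConsistentHashRing, the server names, and the replica count of 100 are illustrative.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._replicas = replicas  # virtual points per server
        self._ring: Dict[int, str] = {}
        self._sorted_hashes: List[int] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place several virtual points for the server on the ring."""
        for i in range(self._replicas):
            h = _hash(f"{node}#{i}")
            self._ring[h] = node
            bisect.insort(self._sorted_hashes, h)

    def get_node(self, key: str) -> str:
        """Return the server owning the first point at or after the key."""
        h = _hash(key)
        idx = bisect.bisect(self._sorted_hashes, h) % len(self._sorted_hashes)
        return self._ring[self._sorted_hashes[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-4")  # scale the cluster out by one server
after = {k: ring.get_node(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")
```

With plain modulo hashing, adding a fourth server would remap most keys and invalidate most of the cache. Here the keys that do not move keep hitting their original server.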

Distributed configuration

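A running system needs many configuration parameters. Without distributed configuration, changing them is costly: for example, when a new cache server joins the distributed cache cluster, the cache server list configured in the application client must be modified and the application servers restarted. Distributed configuration provides a dynamic configuration push service at runtime, pushing configuration changes to application systems in real time with no server restart.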
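The push model can be sketched with a simple publish and subscribe arrangement: clients register a callback for a configuration key, and the configuration center calls it whenever the value changes. The names ConfigCenter, CacheClient, and the key cache.servers are hypothetical, and a real system would deliver the change over the network.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, List[str]], None]


class ConfigCenter:
    """Holds configuration values and pushes changes to subscribers."""

    def __init__(self) -> None:
        self._values: Dict[str, List[str]] = {}
        self._listeners: Dict[str, List[Listener]] = {}

    def subscribe(self, key: str, listener: Listener) -> None:
        self._listeners.setdefault(key, []).append(listener)

    def publish(self, key: str, value: List[str]) -> None:
        self._values[key] = value
        for listener in self._listeners.get(key, []):
            listener(key, value)  # push the change; no restart involved


class CacheClient:
    """Application-side client that keeps its server list up to date."""

    def __init__(self) -> None:
        self.servers: List[str] = []

    def on_config_change(self, key: str, value: List[str]) -> None:
        self.servers = list(value)


center = ConfigCenter()
client = CacheClient()
center.subscribe("cache.servers", client.on_config_change)

center.publish("cache.servers", ["cache-1", "cache-2"])
center.publish("cache.servers", ["cache-1", "cache-2", "cache-3"])  # new server joins
print(client.servers)
```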

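4. Storage layer architecture

The storage layer provides persistent storage, access, and management services for data and files.

Distributed files

Most of the files that a website's online business needs to store are relatively small ones such as images, web pages, and videos, but their number is enormous and usually keeps growing, so a distributed file system designed for good scalability is needed.

Relational database

The main business of most websites is built on relational databases, but relational databases have relatively poor support for cluster scalability. By adding database access routing to the application's data access layer, and routing each database access to a different physical database according to business configuration, distributed access to relational databases can be achieved.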
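Routing in the data access layer can be as simple as a lookup from business table to physical database. The table names, host names, and the default database below are purely illustrative assumptions.

```python
from typing import Dict

# Hypothetical business configuration: which database holds which table.
ROUTING_CONFIG: Dict[str, str] = {
    "users": "db-user.internal",
    "orders": "db-order.internal",
    "products": "db-product.internal",
}
DEFAULT_DATABASE = "db-main.internal"


def route(table: str) -> str:
    """Return the physical database that should serve queries on the table."""
    return ROUTING_CONFIG.get(table, DEFAULT_DATABASE)


for table in ("users", "orders", "audit_log"):
    print(table, "->", route(table))
```

Because the mapping lives in configuration, a table can be moved to another database without changing the business code that queries it.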

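NoSQL database

NoSQL databases of all kinds keep emerging, each with its own strengths in memory management, data model, distributed cluster management, and so on. Judged by community activity, though, HBase is undoubtedly the best at present.

Data synchronization

Until distributed database technology that supports global data sharing matures, a website with multiple data centers must synchronize data among them so that every data center holds the complete data. In practice, to reduce database load, the database's transaction log (or the write-operation log for NoSQL) is synchronized to the other data centers, which replay the operations from the log to achieve data synchronization.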
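Log-based synchronization can be sketched as follows: the primary data center appends every write to a log, and another data center applies the same entries in order to reach the same state. The DataCenter class and the log entry format are hypothetical simplifications of a real transaction log.

```python
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[str, str, Optional[str]]  # (operation, key, value)


class DataCenter:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[LogEntry] = []

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append(("put", key, value))

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.log.append(("delete", key, None))

    def replay(self, entries: List[LogEntry]) -> None:
        """Apply log entries from another data center, in order."""
        for op, key, value in entries:
            if op == "put" and value is not None:
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)


primary = DataCenter()
replica = DataCenter()

primary.put("user:1", "alice")
primary.put("user:2", "bob")
primary.delete("user:1")
primary.put("user:3", "carol")

replica.replay(primary.log)  # ship the log, then replay it
print(replica.data)
```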

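5. Background architecture

Besides handling users' real-time requests, a website application also has non-real-time background data analysis to process.

Search engine

Even a website's internal search engine needs incremental and full data updates, index building, and so on. These operations are executed on a schedule by the background system.

Data warehouse

The data warehouse provides data analysis and data mining services based on offline data.

Recommendation system

Social networking and shopping websites mine the relationships between people, and between people and products, to uncover potential social connections and shopping interests and to provide users with personalized recommendations.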

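6. Data collection and monitoring

This layer monitors website traffic and system operation to support operational decisions and operations and maintenance management.

Browser data collection

JS scripts embedded in the website's pages collect the user's browser environment and operation records so that user behavior can be analyzed.

Server business data collection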

There are two types of server business data. One is to collect user request operation logs recorded on the server side;

Server performance data collection

Collect server performance data, such as system load, memory usage, network card traffic, etc.

System monitoring

The data collected above is displayed as graphs so that operations and maintenance staff can monitor the running status of the website. This step is only system monitoring. A more advanced approach is to automate operations based on the collected data, handling abnormal system conditions automatically and achieving automated control.

System alarm

If the collected data exceeds a preset normal threshold, for example when the system load is too high, an alarm is sent by email, text message, voice call, or similar so that engineers can intervene.
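A threshold check of this kind can be sketched in a few lines. The metric names, threshold values, and alert message format are illustrative assumptions, and the sketch only builds the alert text rather than sending an email or text message.

```python
from typing import Dict, List

# Hypothetical thresholds above which a metric is considered abnormal.
THRESHOLDS: Dict[str, float] = {
    "load_average": 8.0,
    "memory_used_percent": 90.0,
}


def check_metrics(host: str, metrics: Dict[str, float]) -> List[str]:
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"[ALERT] {host}: {name}={value} exceeds {limit}")
    return alerts


alerts = check_metrics(
    "app-1", {"load_average": 12.5, "memory_used_percent": 71.0}
)
for message in alerts:
    print(message)  # a real system would send this by email, SMS, or phone
```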

7. Security Architecture

Protect your website from attacks and leaks of sensitive information.

Web attacks

Among attacks launched in the form of HTTP requests, XSS and SQL injection are the most harmful. However, both are relatively easy to prevent if the right measures are taken.
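The article does not list the specific measures. Two commonly used ones are sketched here as assumptions: binding user input as query parameters so it cannot alter the SQL statement, and escaping user input before placing it in HTML. The table, the sample user, and the function names are hypothetical; sqlite3 and html.escape come from Python's standard library.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))


def find_user(name: str) -> list:
    """Look up a user; the ? placeholder binds input as data, not as SQL."""
    cursor = conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


def render_comment(comment: str) -> str:
    """Escape user text so the browser shows it instead of running it."""
    return f"<p>{html.escape(comment)}</p>"


injection_result = find_user("' OR '1'='1")  # classic injection attempt
safe_html = render_comment("<script>alert(1)</script>")

print(injection_result)
print(safe_html)
```

The injection attempt matches no rows because the whole string is compared against the name column, and the script tag is rendered as visible text rather than executed.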

Data protection

Sensitive information is encrypted for transmission and storage, protecting the website and user assets.

8. Data center physical architecture

Large-scale websites require hundreds of thousands of servers, so the physical architecture of the data center also needs attention.

Data center architecture

For a large website with 100,000 servers, if the electricity for each server (covering both the server itself and its share of the air conditioning) costs about RMB 2,000 per year, the website's annual electricity bill comes to RMB 200 million. Data center energy consumption is becoming an increasingly serious problem. When Google and Facebook choose data center locations, they tend to pick places with good heat dissipation and ample power supply.

Cabinet architecture

This covers a series of issues such as cabinet size, network cable layout, indicator light specifications, uninterruptible power supplies, and voltage specifications (48V DC or 220V mains AC).

Server architecture

Because they buy servers on such a large scale, large websites mostly use custom servers rather than off-the-shelf ones. Hard disks, memory, and even CPUs are customized to the website's application requirements, unnecessary peripheral interfaces (display output, mouse and keyboard input) are removed, and the physical layout is designed to favor heat dissipation.

