[Repost] Large-scale website architecture series: an e-commerce website architecture case

This series covers the architecture of large-scale websites. This installment works through an e-commerce website case: starting from the site's requirements and a single-server architecture, it evolves step by step into a prototype of a commonly used distributed architecture that can serve as a reference. Beyond the functional requirements, the architecture also meets non-functional quality requirements (architectural goals) such as high performance, high availability, scalability, and extensibility.

Modified and expanded to suit actual needs, it should have no trouble supporting tens of millions of page views (PV).

Outline

  1. Why an e-commerce case
  2. E-commerce website requirements
  3. Initial website architecture
  4. System capacity estimation
  5. Website architecture analysis
  6. Website architecture optimization
  7. Architecture summary

The e-commerce case spans three articles. The first covers the website's requirements, its initial architecture, and a method for estimating system capacity.

1. Why an e-commerce case

Distributed large-scale websites currently fall into a few categories: (1) large portals, such as NetEase and Sina; (2) SNS websites, such as Xiaonei and Kaixin; (3) e-commerce websites, such as Alibaba, Jingdong Mall, Gome Online, and Autohome. Large portals mostly serve news-style content, which can be optimized with CDNs and static pages. SNS sites such as Kaixin are more interactive and tend to rely more on NoSQL stores, distributed caches, and high-performance communication frameworks. E-commerce websites share the traits of both: product detail pages can use CDNs and static pages, while the highly interactive parts call for NoSQL and similar technologies. That is why this series uses an e-commerce website as its case.

2. Requirements for e-commerce websites

Customer requirements:

  • Build a full-category e-commerce website (B2C) where users can buy goods online and pay either online or on delivery;
  • Users can chat with customer service online while shopping;
  • After receiving a product, users can rate and review it;
  • A mature invoicing system already exists and must be integrated with the website;
  • The site should support business growth for 3 to 5 years;
  • The number of users is expected to reach 10 million in 3 to 5 years;
  • Promotions are held regularly: Double 11, Double 12, March 8th Men's Day, and so on;
  • For other functions, refer to websites such as JD.com or Gome Online.

Customers are customers. They will not tell you specifically what they need, only what they want, so we often have to guide them and dig out the real requirements. Fortunately, this customer named clear reference sites. The next step is therefore to do a lot of analysis, combining industry knowledge with the reference sites, and propose a solution to the customer.

(Other requirements omitted.)

Requirement-function matrix

Requirements management traditionally describes requirements with use case diagrams or block diagrams (requirement lists). These often leave out a very important class of requirements, the non-functional ones, so it is recommended to describe requirements with a requirement-function matrix instead.

The requirement matrix for this e-commerce website is as follows:

| Website requirement | Functional requirements | Non-functional requirements |
| --- | --- | --- |
| Full-category e-commerce website | Category management, product management | Easy multi-category management (flexibility); fast site access (high performance); image storage (massive numbers of small images) |
| Users can buy goods online | Membership management, shopping cart, checkout | Good shopping experience (usability, performance) |
| Online payment or cash on delivery | Multiple online payment methods | Secure payment process with data encryption (security); flexible switching between payment interfaces (flexibility, scalability) |
| Online communication with customer service | Online customer service | Reliability: instant messaging |
| Product rating | Product reviews | |
| A mature invoicing system already exists | Integration with the invoicing system | A constraint; data consistency and robustness must be considered during integration |
| Support business growth for 3~5 years | | A constraint; scalability, extensibility |
| 10 million users in 3~5 years | | A constraint |
| Double 11, Double 12, March 8 Men's Day, and other promotions | Promotion management, flash sales | Burst traffic (scalability); real-time requirements (high performance) |
| Refer to JD.com or Gome Online | | Reference condition |

The above is a simple example of requirements for an e-commerce website. Its purpose is twofold: (1) to show that requirements analysis must be comprehensive, and that for large-scale distributed systems the non-functional requirements deserve particular attention; (2) to describe a simple e-commerce scenario so that the analysis and design that follow have a basis.

 

3. Initial website architecture

A typical website starts with three servers: one for the application, one for the database, and one for the NFS file system.

That was the conventional practice a few years ago. I once saw a website with more than 100,000 members, a vertical portal for clothing design with a very large number of images, that ran the application, the database, and image storage all on a single server. It had many performance problems.

As shown below:

 

However, mainstream website architecture has since changed dramatically. Clusters are generally used to design for high availability, so an architecture now looks at least like this:

 

(1) Use a cluster of redundant application servers to achieve high availability (the load balancer can be deployed alongside the applications);

(2) Use database active-standby mode to achieve data backup and high availability.

4. System capacity estimation

Estimation steps:

(1) Number of registered users → average daily UV → daily PV → daily concurrency;

(2) Peak estimate: 2 to 3 times the normal level;

(3) Calculate the system capacity from the concurrency (concurrent requests, number of transactions) and the storage volume.

 

Customer demand: 10 million registered users in 3 to 5 years;

 

Estimated concurrency per second:

(1) Daily UV is 2 million (80/20 rule: 20% of the 10 million registered users);

(2) Each user clicks and browses 30 pages per day;

(3) Daily PV: 2 million × 30 = 60 million;

(4) Concentrated visits (80/20 rule): 80% of the traffic, 60 million × 0.8 = 48 million, arrives within 20% of the day, 24 × 0.2 = 4.8 hours;

(5) Visits per minute: 4.8 × 60 = 288 minutes, and 48 million / 288 ≈ 167,000 visits per minute;

(6) Visits per second: 167,000 / 60 ≈ 2,780;

(7) Assumption: the peak is three times the normal level, so concurrency can reach 8,340 per second;

(8) That is roughly 2.8 visits per millisecond at normal load and about 8.3 at peak.
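The estimation steps above can be reproduced with a few lines of Python. This is only a sketch: the function name `estimate_concurrency` and its parameter names are made up for illustration, and the ratios are the ones assumed in the article (80/20 rule, 30 page views per user, peak factor of 3).

```python
def estimate_concurrency(registered_users, active_ratio=0.2, views_per_user=30,
                         busy_time_ratio=0.2, busy_traffic_ratio=0.8,
                         peak_factor=3):
    """Return (daily_uv, daily_pv, per_second, peak_per_second)."""
    daily_uv = registered_users * active_ratio    # 80/20 rule: 20% of users are active
    daily_pv = daily_uv * views_per_user          # page views per day
    busy_seconds = 24 * 3600 * busy_time_ratio    # 20% of the day = 4.8 hours
    busy_pv = daily_pv * busy_traffic_ratio       # 80% of traffic arrives in that window
    per_second = busy_pv / busy_seconds
    return daily_uv, daily_pv, per_second, per_second * peak_factor


daily_uv, daily_pv, per_second, peak_per_second = estimate_concurrency(10_000_000)
print(f"UV={daily_uv:,.0f} PV={daily_pv:,.0f} "
      f"per second={per_second:,.0f} peak={peak_per_second:,.0f}")
```

Without intermediate rounding the result is about 2,778 visits per second and about 8,333 at peak, which agrees with the article's rounded figures of 2,780 and 8,340.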

 

Regretting that you did not study math harder? (I cannot promise the arithmetic above is free of mistakes, hehe.)

 

Server estimation (taking a Tomcat server as an example):

(1) Assuming one web server supports 300 concurrent requests per second, about 10 servers are needed at normal load (2,780 / 300 ≈ 9.3, rounded up); [the Tomcat default configuration is 150]

(2) Peak period: 30 servers are needed (three times the normal number).
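The server count follows from dividing the load by the per-server capacity and rounding up. The sketch below assumes the article's figure of 300 requests per second per server; `servers_needed` is a hypothetical helper name.

```python
import math


def servers_needed(requests_per_second, capacity_per_server=300):
    """Round up, because a fraction of a server cannot be deployed."""
    return math.ceil(requests_per_second / capacity_per_server)


normal_servers = servers_needed(2780)  # normal load from the estimate
peak_servers = servers_needed(8340)    # peak load, three times normal
print(normal_servers, peak_servers)
```

Computed directly from the peak load this gives 28 servers; the article's figure of 30 is simply three times the normal count of 10, which leaves a little extra headroom.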

 

Capacity estimation: the 70/90 principle

System CPU utilization should generally stay at about 70%, rising to about 90% during peak periods. This wastes no resources and keeps the system relatively stable. The same applies to memory and I/O.

 

These estimates are for reference only, because server configuration, the complexity of the business logic, and other factors all affect the result. CPU, disk, network, and so on are not evaluated further here.

This is the second article in the e-commerce website architecture case series. It covers website architecture analysis, website architecture optimization, business splitting, application cluster architecture, multi-level caching, and distributed sessions.

5. Website architecture analysis

Based on the estimates above, there are several problems:

  • A large number of servers must be deployed. Sizing for the peak may mean 30 web servers, yet those 30 servers are only fully used during flash sales, so much of the capacity is wasted.
  • All applications are deployed on the same servers and are tightly coupled. Vertical and horizontal splitting is needed.
  • Many applications contain redundant code.
  • Synchronizing sessions between servers consumes a lot of memory and network bandwidth.
  • Data is fetched from the database very frequently, which puts the database under heavy access pressure.

 

Large-scale websites generally need the following architectural optimizations. (Optimization is something considered when the architecture is designed and is generally addressed at the architecture or code level. Tuning is mainly the adjustment of simple parameters, such as JVM tuning; if the tuning requires extensive code changes, it is no longer tuning but refactoring.)

  • Business splitting
  • Application cluster deployment (distributed deployment, cluster deployment, and load balancing)
  • Multi-level caching
  • Single sign-on (distributed session)
  • Database cluster (read-write separation, database and table sharding)
  • Service-orientation
  • Message queues
  • Other technologies

6. Website Architecture Optimization

6.1 Business split

By business attribute, the system is divided into a product subsystem, a shopping subsystem, a payment subsystem, a comment subsystem, a customer service subsystem, and an interface subsystem (which integrates with external systems such as invoicing and SMS).

By business level, the subsystems are divided into core and non-core systems. Core: product, shopping, and payment subsystems. Non-core: comment, customer service, and interface subsystems.

The role of business splitting: each subsystem can be owned by a dedicated team or department, so that specialists do specialized work, which addresses coupling between modules and extensibility. Each subsystem is also deployed separately, which avoids the situation in a centralized deployment where one application hanging makes all applications unavailable.

The role of level definition: it protects key applications during traffic bursts by allowing graceful degradation, so that the key applications are not affected.
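As a rough illustration of graceful degradation based on the core/non-core levels, the sketch below keeps only the core subsystems enabled once the load passes a threshold. The function name, the load metric, and the threshold value are all assumptions made for the example; a real system would degrade per feature and use richer signals.

```python
CORE = {"product", "shopping", "payment"}
NON_CORE = {"comment", "customer_service", "interface"}


def enabled_subsystems(current_load, degrade_threshold):
    """Return the subsystems that stay enabled at the given load."""
    if current_load > degrade_threshold:
        # Traffic burst: switch off non-core subsystems to protect the core ones.
        return set(CORE)
    return CORE | NON_CORE


normal_mode = enabled_subsystems(current_load=2780, degrade_threshold=8000)
burst_mode = enabled_subsystems(current_load=9500, degrade_threshold=8000)
print(sorted(normal_mode))
print(sorted(burst_mode))
```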

Split architecture diagram:

 

Reference Deployment Scenario 2

(1) As shown above, each application is deployed separately

(2) Combined deployment of core systems and non-core systems

 

6.2 Application cluster deployment (distributed, cluster, load balancing)

Distributed deployment: the applications produced by business splitting are deployed separately and communicate with each other directly through RPC;

Cluster deployment: an e-commerce website needs high availability, so each application is deployed as a cluster of at least two servers;

Load balancing: essential for a high-availability system. Ordinary applications achieve high availability through a load balancer, distributed services through their built-in load balancing, and relational databases through active-standby mode.
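To make the load-balancing idea concrete, here is a minimal round-robin balancer that skips servers marked as down. It is a sketch only: the class name, the server names, and the manual `mark_down` call are invented for the example, and production load balancers use active health checks and more elaborate strategies.

```python
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2"])
before_failure = [balancer.pick() for _ in range(3)]
balancer.mark_down("app-1")  # simulate one node of the cluster failing
after_failure = [balancer.pick() for _ in range(3)]
print(before_failure, after_failure)
```

With at least two servers per application, as the article recommends, requests keep flowing to the remaining node when one fails.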

Architecture diagram after cluster deployment:

 

6.3 Multilevel Cache

By storage location, caches generally fall into two types: local caches and distributed caches. This case uses a two-level cache design: the first level is a local cache and the second level is a distributed cache. (There are also page caches, fragment caches, and so on, which is a finer-grained classification.)

The first-level cache holds the data dictionary and frequently used hotspot data, that is, information that basically never changes or changes on a regular schedule. The second-level cache holds all data that needs to be cached. When the first-level cache has expired or is unavailable, the data is read from the second-level cache; if the second-level cache does not have it either, the database is accessed.

As for the ratio, caching is generally worth considering once the ratio (writes to reads) reaches about 1:4; in theory, 1:2 is enough.
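The two-level lookup can be sketched as follows. Plain dictionaries stand in for both the local cache and the distributed cache, and `load_from_db` is a placeholder for a real database query; the `hot_keys` parameter is an assumption used to model the idea that only dictionary and hotspot data live in the first level.

```python
class TwoLevelCache:
    def __init__(self, load_from_db, hot_keys=()):
        self.local = {}         # level 1: in-process cache for hot data
        self.distributed = {}   # level 2: stand-in for a distributed cache
        self.load_from_db = load_from_db
        self.hot_keys = set(hot_keys)

    def get(self, key):
        if key in self.local:                  # level 1 hit
            return self.local[key]
        if key in self.distributed:            # level 2 hit
            value = self.distributed[key]
        else:                                  # miss in both levels: go to the database
            value = self.load_from_db(key)
            self.distributed[key] = value
        if key in self.hot_keys:               # only hot data is kept locally
            self.local[key] = value
        return value


db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return f"value-of-{key}"


two_level = TwoLevelCache(load_from_db, hot_keys={"dict:colors"})
two_level.get("dict:colors")
two_level.get("dict:colors")   # served from level 1
two_level.get("product:42")
two_level.get("product:42")    # served from level 2
print(db_calls)
```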

 

Depending on business characteristics, the following cache expiration policies can be used:

(1) Automatic expiration: entries expire on their own after a set time;

(2) Triggered expiration: entries are invalidated when a triggering event occurs.
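Both policies fit in one small class. This is a sketch under assumptions: `ExpiringCache` and its method names are invented, and the clock is injectable only so that the example can simulate the passage of time without sleeping.

```python
import time


class ExpiringCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]    # policy (1): automatic expiration
            return None
        return value

    def invalidate(self, key):
        self._items.pop(key, None)  # policy (2): triggered expiration


fake_now = [0.0]
price_cache = ExpiringCache(ttl_seconds=60, clock=lambda: fake_now[0])

price_cache.put("price:42", 100)
fresh = price_cache.get("price:42")
fake_now[0] = 61.0                    # more than 60 seconds later
expired = price_cache.get("price:42")

price_cache.put("price:42", 120)
price_cache.invalidate("price:42")    # e.g. triggered by a price update
invalidated = price_cache.get("price:42")
print(fresh, expired, invalidated)
```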

6.4 Single Sign On (Distributed Session)

Once the system is split into multiple independently deployed subsystems, session management inevitably becomes a problem. Common approaches are session synchronization, cookies, and distributed sessions. E-commerce websites generally use distributed sessions.

Going further, a complete single sign-on or account management system can be built on top of the distributed session.

 

Flow description

(1) When the user logs in for the first time, the session information (user ID and user information) is written to the distributed session, for example with the user ID as the key;

(2) When the user returns, the distributed session is checked for session information; if there is none, the user is redirected to the login page;

(3) The distributed session is generally implemented with cache middleware. Redis is recommended because it supports persistence, so that session information can be reloaded from persistent storage after the distributed session store goes down;

(4) When saving a session, a lifetime can be set, for example 15 minutes, after which the session times out automatically.

Combined with cache middleware, a distributed session implemented this way can emulate an ordinary session well.
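The flow above can be sketched with an in-memory store standing in for the shared cache middleware. Everything here is illustrative: `SessionStore`, `handle_request`, the user ID, and the redirect string are assumptions, and a real deployment would keep the data in Redis or a similar store so that every application server sees the same sessions.

```python
import time

SESSION_TTL_SECONDS = 15 * 60  # the 15-minute lifetime mentioned in step (4)


class SessionStore:
    """In-memory stand-in for a shared session store such as Redis."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}  # user_id -> (user_info, expires_at)

    def save(self, user_id, user_info):
        expires_at = self._clock() + SESSION_TTL_SECONDS
        self._sessions[user_id] = (user_info, expires_at)

    def load(self, user_id):
        entry = self._sessions.get(user_id)
        if entry is None:
            return None
        user_info, expires_at = entry
        if self._clock() >= expires_at:  # session timed out
            del self._sessions[user_id]
            return None
        return user_info


def handle_request(store, user_id):
    user_info = store.load(user_id)
    if user_info is None:
        return "redirect:/login"         # step (2): no session, go to login
    return f"welcome back, {user_info['name']}"


fake_now = [0.0]
store = SessionStore(clock=lambda: fake_now[0])

before_login = handle_request(store, "u1001")
store.save("u1001", {"name": "Alice"})   # step (1): write session at login
after_login = handle_request(store, "u1001")
fake_now[0] = SESSION_TTL_SECONDS + 1    # simulate 15 minutes passing
after_timeout = handle_request(store, "u1001")
print(before_login, after_login, after_timeout)
```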

This is the third article in the e-commerce website architecture case series. It covers database clustering, read-write separation, database and table sharding, service-orientation, the use of message queues, and the architecture summary of this e-commerce case.

6.5 Database cluster (read-write separation, database and table sharding)

Large websites need to store massive amounts of data. To achieve massive storage, high availability, and high performance, the storage layer is generally designed with redundancy. There are two common approaches: read-write separation, and database and table sharding.

Read-write separation: generally used when reads far outnumber writes. One master with one backup, one master with multiple backups, or multiple masters with multiple backups can be used.
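A minimal sketch of read-write separation: writes go to the master, reads rotate across the backups. The class name and the database names are hypothetical, and routing on the first SQL keyword is a deliberate simplification; in practice this is usually handled by middleware or the database driver, and reads issued right after a write may see stale data if a backup lags behind the master.

```python
import itertools


class ReadWriteRouter:
    def __init__(self, master, backups):
        self.master = master
        self._backups = itertools.cycle(backups) if backups else None

    def route(self, sql):
        """Return the name of the database that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._backups is not None:
            return next(self._backups)  # reads are spread over the backups
        return self.master              # writes (and anything else) hit the master


router = ReadWriteRouter("master-db", ["backup-1", "backup-2"])
statements = [
    "SELECT * FROM product WHERE id = 1",
    "UPDATE product SET stock = stock - 1 WHERE id = 1",
    "select name FROM product",
    "INSERT INTO orders (id) VALUES (1)",
]
targets = [router.route(sql) for sql in statements]
print(targets)
```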

On top of business splitting, this case combines database and table sharding with read-write separation, as shown below:

 

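(1) After business splitting, each subsystem needs its own database;

(2) If a single database grows too large, it can be split again by business characteristics, for example into a product category database and a product database;

(3) After the databases are split, any table that holds a very large amount of data is split as well, generally by ID, time, and so on (a more advanced approach is consistent hashing);

(4) Read-write separation is applied on top of the database and table sharding.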
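The two table-splitting strategies mentioned in step (3) can be sketched as follows. The table and node names, the number of tables, and the number of virtual nodes are assumptions for the example; SHA-256 is used only as a convenient stable hash.

```python
import bisect
import hashlib


def table_for(order_id, table_count=4):
    """Modulo sharding: map a numeric ID to one of `table_count` tables."""
    return f"orders_{order_id % table_count}"


def _hash(value):
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Consistent hashing with virtual nodes to even out the distribution."""

    def __init__(self, nodes, virtual_nodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
three_nodes = ConsistentHashRing(["db-0", "db-1", "db-2"])
four_nodes = ConsistentHashRing(["db-0", "db-1", "db-2", "db-3"])
moved = [k for k in keys if three_nodes.node_for(k) != four_nodes.node_for(k)]
print(table_for(10), len(moved))
```

With modulo sharding, changing the number of tables remaps most IDs. With consistent hashing, adding `db-3` only moves the keys that now belong to `db-3`, which is why the article calls it the more advanced approach.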

 

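For related middleware, see Cobar (Alibaba, no longer maintained), TDDL (Alibaba), Atlas (Qihoo 360), and MyCat (built on Cobar by many strong engineers in China, and billed as China's number-one open source project).

The problems that arise after sharding, such as sequences, JOINs, and transactions, will be covered in a separate article on database and table sharding.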

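6.6 Service-orientation

Functions or modules shared by several subsystems are extracted and used as common services. In this case, for example, the membership subsystem can be extracted as a common service.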

        

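6.7 Message queue

A message queue decouples subsystems and modules and enables asynchronous, highly available, high-performance systems. It is standard equipment in a distributed system. In this case, message queues are used mainly in the shopping and delivery stages.

(1) After the user places an order, the order is written to the message queue and the response is returned to the client immediately;

(2) Inventory subsystem: reads the message from the queue and deducts the stock;

(3) Delivery subsystem: reads the message from the queue and arranges delivery.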
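The order flow above can be imitated inside one process with Python's standard `queue` and `threading` modules. This is only a sketch: one queue per consuming subsystem stands in for a real broker's fan-out, and the SKU, the stock figure, and the function names are invented for the example.

```python
import queue
import threading

inventory_queue = queue.Queue()
delivery_queue = queue.Queue()
stock = {"sku-1": 10}
shipments = []


def place_order(order):
    # Step (1): publish the order and return to the client immediately.
    inventory_queue.put(order)
    delivery_queue.put(order)
    return "order accepted"


def inventory_worker():
    # Step (2): consume orders and deduct stock.
    while True:
        order = inventory_queue.get()
        if order is None:   # sentinel: stop the worker
            break
        stock[order["sku"]] -= order["quantity"]


def delivery_worker():
    # Step (3): consume orders and arrange delivery.
    while True:
        order = delivery_queue.get()
        if order is None:
            break
        shipments.append(order["id"])


workers = [threading.Thread(target=inventory_worker),
           threading.Thread(target=delivery_worker)]
for worker in workers:
    worker.start()

reply = place_order({"id": 1, "sku": "sku-1", "quantity": 2})

inventory_queue.put(None)   # shut the workers down after the demo order
delivery_queue.put(None)
for worker in workers:
    worker.join()

print(reply, stock, shipments)
```

The caller gets its reply without waiting for stock deduction or delivery, which is the decoupling the article describes.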

 

Widely used MQs today include ActiveMQ, RabbitMQ, ZeroMQ, and MSMQ; choose according to the specific business scenario. RabbitMQ is worth studying.

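6.8 Other architectures (technologies)

Besides the business splitting, application clusters, multi-level caching, single sign-on, database clusters, service-orientation, and message queues described above, there are also CDNs, reverse proxies, distributed file systems, big data processing, and other systems.

They are not covered in detail here; you can look them up on Baidu or Google, and I may share more on them when there is a chance.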

7. Architecture summary

 

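The above is the architecture summary for this case; for details, refer to the earlier parts. Many places can still be optimized and refined. Because this is a case study, only the important parts were covered, and in real work you need to design the architecture around your specific business scenario.

This concludes the three-article e-commerce website architecture case, which started from the site's requirements and a single-server architecture and evolved step by step into a prototype of a commonly used distributed architecture that can serve as a reference. Beyond the functional requirements, it also meets non-functional quality requirements (architectural goals) such as high performance, high availability, scalability, and extensibility.

This blog will publish a series of articles on commonly used techniques and architectures such as load balancing, business splitting, cluster architecture, read-write separation, database and table sharding, service-orientation, and message queues. Everyone is welcome to follow along.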
