Large-scale website architecture design

Overview

  • Three dimensions: evolution, pattern, element
  • Five elements: performance, availability, scalability, extensibility, security

Evolution

The architecture of a large-scale website typically evolves through the following stages:

  1. Initial-stage architecture: a single server hosts all resources (application, database, files), e.g. a LAMP stack
  2. Separation of application and data services: three servers with different hardware profiles, namely an application server, a file server, and a database server
  3. Caching to improve website performance: there are two types, a local cache on the application server and a remote cache on dedicated distributed cache servers
  4. Application server clusters to improve concurrent processing capacity: a load-balancing scheduling server distributes incoming requests to any machine in the application server cluster
  5. Database read-write separation: the database runs in master-slave hot standby mode. The application server writes to the master database, which synchronizes updates to the slave database through master-slave replication; reads are served from the slave database. A dedicated data access module makes the read-write separation transparent to the application
  6. Reverse proxies and CDNs to speed up website responses: both work by caching. The reverse proxy is deployed in the website's central data center, and the CDN is deployed in the network providers' data centers
  7. Distributed file systems and distributed database systems: a distributed database is the last resort for splitting a database; splitting databases by business line is more common
  8. NoSQL and search engines: both offer better support for scalable, distributed deployment
  9. Business splitting: split the entire website business into separate applications, each deployed and maintained independently. The applications relate to one another through hyperlinks, distribute data through message queues, or access the same data storage system
  10. Distributed services: extract shared business logic and deploy it independently as reusable services
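
The dedicated data access module from step 5 can be sketched roughly as follows. This is a minimal illustration, not a production router: the class name `ReadWriteRouter`, the server names, and the verb-based classification are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical data access module: writes go to the master, reads to slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Rotate through the slaves; fall back to the master if there are none.
        self._read_targets = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the database that should execute this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._read_targets)


router = ReadWriteRouter("master-db", ["slave-db-1", "slave-db-2"])
print(router.route("INSERT INTO orders VALUES (1)"))  # master-db
print(router.route("SELECT * FROM orders"))           # slave-db-1
print(router.route("SELECT * FROM users"))            # slave-db-2
```

The application only calls `route`, so it never needs to know how many databases exist. When replication is asynchronous, a read issued right after a write may still see old data on a slave, which a real module would have to account for.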

Architecture Evolution - Distributed Services

Value of the evolution

  • The core value of large website architecture is to respond flexibly to the needs of the website
  • The main force driving the development of large-scale website technology is the business development of the website

Common misconceptions

  • Blindly following big companies' solutions
  • Pursuing technology for technology's sake
  • Attempting to solve every problem with technology

Architectural patterns

The value of a pattern lies in its repeatability

  • Layering: horizontal slicing
  • Splitting: vertical slicing
  • Distributed: the main purpose of layering and splitting is to make it easy to deploy the resulting modules in a distributed fashion. Common schemes:
    • Distributed Applications and Services
    • Distributed static resources
    • Distributed Data and Storage
    • Distributed Computing
    • Distributed configuration, distributed locks, distributed files, etc.
  • Cluster: multiple servers deploy the same application to form a cluster and jointly serve external requests through load balancing equipment
  • Cache: place data as close as possible to where it is computed to speed up processing. Caching is the first technique to reach for when improving performance: it speeds up access and reduces the load on the backend. Caching pays off under two conditions: 1. data access is skewed toward hotspots; 2. data stays valid for some period and does not expire quickly
    • CDN
    • Reverse proxy
    • Local cache
    • Distributed cache
  • Asynchronous: aims at decoupling the system. An asynchronous architecture is a typical producer-consumer pattern with the following benefits:
    • Improves system availability
    • Speeds up website access
    • Smooths out concurrent access spikes
  • Redundancy: achieves high availability, e.g. cold and hot backups of the database
  • Automation: including release process automation, automated code management, automated testing, automated security detection, automated deployment, automated monitoring, automated alarms, automated failover, automated failure recovery, automated degradation, and automated resource allocation
  • Security: passwords, mobile phone verification codes, encryption, CAPTCHAs, filtering, risk control
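
The producer-consumer shape of the asynchronous pattern above can be illustrated with a small sketch. It uses Python's in-process `queue.Queue` as a stand-in for a real message queue; `place_order`, `consumer`, and the order IDs are hypothetical names chosen for the example.

```python
import queue
import threading

task_queue = queue.Queue()   # stands in for a message queue
processed = []


def consumer():
    """Drain the queue at the consumer's own pace."""
    while True:
        order = task_queue.get()
        if order is None:          # sentinel: no more work
            break
        processed.append(f"handled {order}")


def place_order(order_id):
    """Producer: enqueue and return immediately instead of doing the slow work."""
    task_queue.put(order_id)
    return "accepted"


worker = threading.Thread(target=consumer)
worker.start()

# A burst of requests is absorbed by the queue rather than hitting the backend at once.
responses = [place_order(i) for i in range(5)]
task_queue.put(None)
worker.join()

print(responses[0], len(processed))  # accepted 5
```

The producer returns as soon as the message is queued, which is where the faster response comes from, and the queue buffers bursts so the consumer sees a steadier load.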

Core elements

Architecture is "the highest-level planning, the decisions that are hard to change". It focuses on five elements:

  • Performance
  • Availability
  • Scalability
  • Extensibility
  • Security

Architecture

These five elements are summarized below

High performance

The main performance test indicators are:

  • Response time: refers to the time it takes for an application to perform an operation
  • Concurrency: refers to the number of requests that the system can process at the same time
  • Throughput: refers to the number of requests processed by the system per unit time
  • Performance counters: some data indicators that describe server or operating system performance
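
As a rough illustration of how the first three indicators relate, the sketch below derives them from a made-up request log. The timestamps are hypothetical sample data, not measurements.

```python
# Hypothetical request log: (start_time, end_time) in seconds.
requests = [(0.0, 0.2), (0.1, 0.4), (0.5, 0.6), (0.8, 1.0)]

# Response time: how long each request took, averaged over all requests.
durations = [end - start for start, end in requests]
avg_response_time = sum(durations) / len(durations)

# Throughput: requests completed per unit of time over the observed window.
window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window


def concurrency_at(t):
    """Concurrency: number of requests in flight at time t."""
    return sum(1 for start, end in requests if start <= t < end)


print(round(avg_response_time, 3), throughput, concurrency_at(0.15))  # 0.2 4.0 2
```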

Performance testing methods:

  • Performance test
  • Load test
  • Stress test
  • Stability test

Performance test curve

Performance optimization, according to the hierarchical structure of the website, can be divided into three categories:

  • Web front-end performance optimization
    • Browser access optimization
      • Reduce HTTP requests
      • Use the browser cache
      • Enable compression
      • Put CSS at the top of the page and JavaScript at the bottom
      • Reduce cookie transmission
    • CDN acceleration: essentially a cache, generally used for static resources
    • Reverse proxy
      • Protects the website
      • Speeds up web requests through configured caching
      • Provides load balancing
  • Application server performance optimization: the main means are caching, clustering, and asynchronous processing
    • Distributed cache (the first law of website performance optimization: consider caching first)
    • Asynchronous operations (message queues, peak shaving)
    • Use a cluster
    • Code optimization
      • Multithreading (design objects to be stateless, use local objects, use locks for concurrent access to shared resources)
      • Resource reuse (singletons, object pools)
      • Data structures
      • Garbage collection
  • Storage server performance optimization
    • HDD vs. Solid State Drive
    • B+ tree vs. LSM tree
    • RAID vs. HDFS
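
The "consider caching first" rule from the application server section can be sketched as a small local cache with expiry. This is a minimal illustration under stated assumptions: `TTLCache`, `load_product`, and the 60-second lifetime are hypothetical, and a distributed cache would replace the in-memory dictionary with a remote store.

```python
import time


class TTLCache:
    """Minimal local cache: entries expire after ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}             # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                      # cache hit
        value = loader(key)                      # cache miss: go to the backend
        self._store[key] = (value, self.clock() + self.ttl)
        return value


backend_calls = []


def load_product(product_id):
    """Hypothetical slow backend lookup."""
    backend_calls.append(product_id)
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TTLCache(ttl=60)
cache.get_or_load(42, load_product)
cache.get_or_load(42, load_product)
print(len(backend_calls))  # 1
```

The second lookup never reaches the backend, which is how a cache reduces load. The expiry time reflects the prerequisite that cached data must stay valid for a while to be worth caching.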

High availability

  • High-availability website architecture: the goal is that when server hardware fails, the service remains available and the data remains stored and accessible. The main means are redundant backup and failover of data and services.
  • Highly available applications: their distinguishing feature is that the application is stateless
    • Failover of stateless services through load balancing
    • Session management for application server clusters
      • Session replication
      • Session binding
      • Recording the session in cookies
      • Session server
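  • Highly available services: stateless services can use a failover strategy similar to load balancing; other strategies include:
    • Tiered management
    • Timeout settings
    • Asynchronous calls
    • Service degradation
    • Idempotent design
  • Highly available data: the main means are data backup and failover mechanisms
    • CAP theorem
      • Consistency
      • Availability
      • Partition tolerance
    • Data backup
      • Cold backup: the drawback is that it cannot guarantee eventual data consistency or data availability
      • Hot backup: either asynchronous or synchronous
    • Failover: consists of the following three steps
      • Failure confirmation
      • Access transfer
      • Data recovery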
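  • Software quality assurance for highly available websites
    • Website release
    • Automated testing
    • Pre-release verification
    • Code control
      • Develop on trunk, release from branches
      • Develop on branches, release from trunk
    • Automated release
    • Canary (gray) release
  • Website operation monitoring
    • Monitoring data collection
      • User behavior log collection (server side and client side)
      • Server performance monitoring
      • Operational data reports
    • Monitoring management
      • Alerting system
      • Failover
      • Automatic graceful degradation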
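
Timeouts and idempotent design from the service strategies above work together: a caller that times out usually retries, and the retry is only safe if repeating the request has no extra effect. The sketch below illustrates this with a request ID used as an idempotency key; `PaymentService`, `call_with_retry`, and the amounts are hypothetical names and values for the example.

```python
class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self):
        self.balance = 100
        self._results = {}          # request_id -> result of the first execution

    def deduct(self, request_id, amount):
        if request_id in self._results:
            return self._results[request_id]   # replay: no second deduction
        self.balance -= amount
        self._results[request_id] = self.balance
        return self.balance


def call_with_retry(func, attempts=3):
    """Retry on timeout; safe only because the callee is idempotent."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except TimeoutError as exc:
            last_error = exc
    raise last_error


service = PaymentService()
state = {"first": True}


def flaky_deduct():
    result = service.deduct("req-1", 30)
    if state["first"]:
        state["first"] = False
        # The work was done, but the reply never reached the caller.
        raise TimeoutError("response lost")
    return result


print(call_with_retry(flaky_deduct), service.balance)  # 70 70
```

Without the request ID check, the retry would deduct 30 twice and leave a balance of 40.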

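Scalability

The "large" in "large-scale website" refers to:

  • Users: a large number of users and a large volume of visits
  • Features: many complex features and numerous products
  • Technology: the website needs to deploy a large number of servers

Scalability can be broken down into the following aspects: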

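  • Scalability design of the website architecture
    • Scale by physically separating different functions
      • Vertical separation (separation after layering)
      • Horizontal separation (separation after business splitting)
    • Scale a single function by growing its cluster
  • Scalability design of application server clusters
    • HTTP redirect load balancing
    • DNS-based load balancing
    • Reverse proxy load balancing (at the HTTP protocol level, i.e. application-layer load balancing)
    • IP load balancing (data is distributed inside the operating system kernel)
    • Data link layer load balancing (rewrites the MAC address at the data link layer; triangle transmission mode; LVS)
    • Load balancing algorithms
      • Round Robin (RR)
      • Weighted Round Robin (WRR)
      • Random
      • Least Connections
      • Source Hashing
  • Scalability design of distributed cache clusters
    • Access model of a Memcached distributed cache cluster
      • Memcached client (API, routing algorithm, server list, communication module)
      • Memcached server cluster
    • Scalability challenges of a Memcached distributed cache cluster
    • Consistent hashing for distributed caches (consistent hash ring, virtual node layer)
  • Scalability design of data storage clusters
    • Scalability design of relational database clusters
    • Scalability design of NoSQL databases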
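
The consistent hash ring with virtual nodes mentioned for distributed cache clusters can be sketched as follows. This is a minimal illustration, assuming MD5 as the hash function and 100 virtual nodes per server; the class name `ConsistentHashRing` and the server names are hypothetical. The point of the technique is that adding a server moves only the keys that land on the new server, whereas a simple `hash(key) % N` routing remaps most keys when N changes.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring using MD5 (any stable hash works)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._points = []        # sorted hash positions of all virtual nodes
        self._owner = {}         # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place virtual_nodes points for this server on the ring."""
        for i in range(self.virtual_nodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-4")
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

Virtual nodes spread each server over many positions on the ring, so the load given up to a new server is taken roughly evenly from all existing servers instead of from a single neighbor.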

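Extensibility

The "open-closed principle" applied at the level of system architecture design

  • Build an extensible website architecture
  • Use distributed message queues to reduce coupling
    • Event Driven Architecture
    • Distributed message queues
  • Use distributed services to build a reusable business platform
    • Web Services and enterprise-level distributed services
    • Characteristics of distributed services for large websites
    • Distributed service framework design (Thrift, Dubbo)
  • Extensible data structures (e.g. ColumnFamily design)
  • Use an open platform to build a website ecosystem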
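
How an event-driven architecture supports the open-closed principle can be shown with a small in-process sketch. The `EventBus` class below is a hypothetical stand-in for a distributed message queue, and the topic name and handlers are made up for the example.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a distributed message queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who consumes the event.
        for handler in self._subscribers.get(topic, []):
            handler(event)


bus = EventBus()
log = []

bus.subscribe("user.registered", lambda e: log.append(f"send welcome email to {e['email']}"))
# A new feature is added by subscribing, without touching the publisher.
bus.subscribe("user.registered", lambda e: log.append(f"grant signup coupon to {e['email']}"))

bus.publish("user.registered", {"email": "a@example.com"})
print(len(log))  # 2
```

The registration code stays closed to modification while the system stays open to extension through new subscribers.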

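Website security architecture

XSS attacks and SQL injection attacks are the two most common ways of attacking web applications; others include CSRF and session hijacking.

  • Attack and defense
    • XSS attack: Cross Site Scripting
      • Reflected
      • Persistent
    • XSS defenses
      • Sanitization (escaping dangerous HTML characters)
      • HttpOnly
    • Injection attacks
      • SQL injection
      • OS injection
    • Injection defenses
      • Prevent attackers from guessing the database table structure
      • Sanitization
      • Parameter binding
    • CSRF attack: Cross Site Request Forgery
    • CSRF defense: the main means is identifying the requester
      • Form token
      • CAPTCHA
      • Referer check
    • Other attacks and vulnerabilities
      • Error codes
      • HTML comments
      • File upload
      • Path traversal
    • Web application firewall (ModSecurity)
    • Website security vulnerability scanning
  • Information encryption and key security management
    • One-way hashing: inputs of different lengths are hashed to a fixed-length output
      • Irreversible; the stored value is not plaintext
      • A salt can be added to improve security
      • A tiny change in the input produces a completely different output
    • Symmetric encryption: encryption and decryption use the same key
    • Asymmetric encryption
      • Information transmission: encrypt with the public key, decrypt with the private key
      • Digital signature: encrypt with the private key, decrypt with the public key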
    • Key security management: the security of encrypted information depends on keeping the key safe. Ways to improve key security include:
      • Put both the keys and the algorithms on a separate server
      • Keep the encryption and decryption algorithms in the application system and the keys on an independent server
  • Information filtering and anti-spam
    • Text matching
    • Classification algorithms
    • Blacklists
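
Parameter binding, listed among the injection defenses in the attack and defense section, can be demonstrated with Python's built-in `sqlite3` module. The table, the user record, and the malicious input below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

malicious_name = "nobody' OR '1'='1"

# Vulnerable: the input is spliced into the SQL text and changes its meaning.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious_name}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious_name,)
).fetchall()

print(unsafe_rows)  # [('alice',)]
print(safe_rows)    # []
```

With string concatenation the injected `OR '1'='1'` clause matches every row; with a bound parameter the whole input is compared as a literal name and matches nothing.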

Origin http://43.154.161.224:23101/article/api/json?id=326402896&siteId=291194637