How large websites handle high concurrency

1. General high-concurrency web systems
This refers to e-commerce flash-sale systems in general, such as Xiaomi flash sales and Taobao's Double Eleven. These differ fundamentally from the 12306 ticket-booking website, as discussed below.

System Architecture Diagram
The following diagram is the system architecture I put together from my survey. It cannot serve as a general solution, of course. For one thing, what companies such as Taobao publish is certainly not their most advanced or best-performing technology. For another, each specific case still requires its own detailed analysis.

Core technical points

  • Front-end optimization  Front-end optimization mainly consists of making dynamic pages static and adding front-end caching. Making pages static means converting dynamic web pages that contain many dynamic elements, such as JSP or PHP pages, into static HTML pages. Because static pages do not need to load dynamic elements, they are served much faster than dynamic pages and reduce the pressure on the database. Front-end page caching means caching pages on the web server at the front of the system.
  • CDN technology  A CDN is a content distribution network. Its basic idea is to avoid, as far as possible, the bottlenecks and links on the Internet that may affect the speed and stability of data transmission, so that content is delivered faster and more reliably. By placing node servers throughout the network to form an intelligent virtual network on top of the existing Internet, a CDN can redirect each user's request in real time to the service node closest to the user, based on combined information such as network traffic, the connection and load status of each node, and the distance and response time to the user. The goal is to let users fetch content from a nearby node, relieve Internet congestion, and improve website response times.
  • Load balancing  The basic idea of load balancing is to distribute highly concurrent traffic evenly across the server nodes, thereby reducing the pressure on each node in the distributed database.
  • Database middleware  Database middleware separates the application layer from the database layer by adding a layer in between, so that the application does not access the database directly. Because the system may use read-write separation and therefore different databases, the middleware can hide the differences between the databases and provide a unified interface. The middleware is also responsible for coordinating and processing transactions and for managing data connections: multiple client connections can share one database connection through the middleware.
  • memcached  memcached is a high-performance distributed in-memory object caching system. It reduces the number of database reads by caching data and objects in memory, which speeds up dynamic, database-driven websites. It is built on a hashmap that stores key/value pairs.
  • Concurrency control  Access to the database is throttled. Once the database reaches its maximum number of concurrent connections, requests start waiting on row locks. Without control, a single stuck connection can trigger an avalanche effect that affects the entire system.
  • Queuing system  The locking mechanism forces requests to queue.
  • Parallel replication  Parallel replication can solve the problem of replication delay between the primary and standby databases.
  • Database splitting  is divided into
  • Read-write separation  Some systems are dominated by reads, while others are dominated by writes. Read-write separation can effectively improve access speed.
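
The caching idea in the memcached item can be sketched as a cache-aside read path. This is a minimal illustration, not how any of the sites above implement it: a plain dict stands in for a memcached client, and the `CacheAside` class, the `PRODUCTS` table, and the `db_reads` counter are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}  # assumption: dict stands in for memcached
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database is actually hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]  # cache hit: no database access
        self.db_reads += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value  # populate the cache for later readers
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row has been written."""
        self._cache.pop(key, None)


# Hypothetical product table standing in for the real database.
PRODUCTS = {"sku-1": "phone", "sku-2": "charger"}

store = CacheAside(PRODUCTS.get)
first = store.get("sku-1")   # miss: falls through to the database
second = store.get("sku-1")  # hit: served from memory
```

Repeated reads of a hot key reach the database only once until `invalidate` is called, which is the effect the text attributes to memcached.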



2. The difference between the 12306 website and Taobao

What makes 12306 special is its complexity:

  • Frequent mixed read and write operations
  • Real-time seat reuse  Orders are strongly correlated with one another. Every time a ticket is sold, the remaining-ticket count for every possible section of that train has to be adjusted.
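
The coupling between orders can be made concrete with a small sketch. It is a deliberate simplification, not 12306's actual data model: it tracks only the number of free seats per adjacent pair of stations and ignores individual seat assignment. The `TrainInventory` class and the station names are hypothetical.

```python
from typing import List


class TrainInventory:
    """Free-seat counts per adjacent station pair of one train."""

    def __init__(self, stations: List[str], seats: int) -> None:
        self.stations = stations
        # free[i] = seats still free between stations[i] and stations[i + 1]
        self.free = [seats] * (len(stations) - 1)

    def _span(self, origin: str, dest: str) -> range:
        start, end = self.stations.index(origin), self.stations.index(dest)
        if start >= end:
            raise ValueError("destination must come after origin")
        return range(start, end)

    def remaining(self, origin: str, dest: str) -> int:
        # A trip needs a seat on every segment it crosses, so the tightest
        # segment bounds how many tickets can still be sold for it.
        return min(self.free[i] for i in self._span(origin, dest))

    def sell(self, origin: str, dest: str) -> bool:
        if self.remaining(origin, dest) == 0:
            return False
        for i in self._span(origin, dest):
            self.free[i] -= 1  # one write touches every crossed segment
        return True


train = TrainInventory(["A", "B", "C", "D"], seats=2)
train.sell("B", "C")
# The single B->C sale changes the answer for every trip that crosses B->C
# (A->C, A->D, B->C, B->D) but not for A->B or C->D.
```

Even in this toy model, one sale changes the result of several different queries, which is why reads and writes on 12306 cannot be separated as cleanly as in an ordinary product catalogue.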

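Before looking into this, I also naively assumed that simply adding more nodes would be enough to handle the peak of a ticket rush. It is not. Access speed is limited mainly by two things: CPU processing speed and disk I/O. Simply adding machines removes neither bottleneck. On the contrary, the more nodes there are, the higher the cost of keeping their data synchronized. To explain what makes 12306 special, a few basic concepts need to be introduced.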

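  • Transactions

E-commerce systems such as Taobao and 12306 are transaction-oriented processing systems, that is, OLTP systems. A database transaction is a unit of program execution that accesses and possibly updates various data items in the database. It has four properties: atomicity, consistency, isolation, and durability, abbreviated as ACID.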

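  •         Atomicity  A transaction is an indivisible unit of work: either all of the operations in it are performed, or none of them are.
  •         Consistency  A transaction must take the database from one consistent state to another consistent state. Consistency is closely related to atomicity.
  •         Isolation  The execution of a transaction must not be interfered with by other transactions. The operations inside a transaction and the data it uses are isolated from other concurrent transactions, and transactions running concurrently must not interfere with one another.
  •         Durability  Durability, also called permanence, means that once a transaction is committed, its changes to the data in the database are permanent. Subsequent operations or failures should not affect them.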
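
Atomicity can be illustrated with a short sketch using Python's built-in `sqlite3` module. The `stock` and `orders` tables and the `place_order` function are hypothetical and serve only to show the all-or-nothing behaviour. They do not reflect how Taobao or 12306 store their data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint rejects any update that would oversell the item.
conn.execute(
    "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("INSERT INTO stock VALUES ('ticket', 1)")
conn.commit()


def place_order(item: str) -> bool:
    """Record an order and decrement stock as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            conn.execute(
                "UPDATE stock SET qty = qty - 1 WHERE item = ?", (item,)
            )
        return True
    except sqlite3.IntegrityError:
        # The stock update failed, so the order insert was rolled back too.
        return False


first = place_order("ticket")   # succeeds: stock goes from 1 to 0
second = place_order("ticket")  # fails: stock would drop below 0
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
qty = conn.execute("SELECT qty FROM stock WHERE item = 'ticket'").fetchone()[0]
```

The second call inserts an order row and then fails on the stock update. Because both statements belong to one transaction, the order row disappears with the rollback and the database is left in a consistent state.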

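The core of parallel processing is isolation: different transactions must not affect one another, and one user's booking must not affect other users. Otherwise users would see tickets on offer and yet be unable to buy them. Isolation has to deal with three problems: dirty reads, non-repeatable reads, and phantom reads. Guaranteeing isolation means locking data, locks cause queuing, and queuing inevitably adds latency. With concurrency on the order of tens of millions, this becomes a problem. So the key is still to make each individual transaction faster, that is, to speed up CPU processing and to reduce the time spent on disk I/O. 12306 began its technical overhaul in 2012, built around VMware GemFire in-memory database technology, which virtualizes multiple x86 servers into what behaves like a single machine with very large memory and processing power and keeps all data in memory for computation. Given the gap between memory and disk I/O speeds, this approach is bound to improve performance substantially.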
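
The trade-off between isolation and queuing can be sketched with threads and a lock. This is only an in-process analogy for database row locks, with a hypothetical `TicketCounter` class and arbitrary numbers of tickets and buyers. It says nothing about how GemFire or 12306 implement locking.

```python
import threading
from typing import List


class TicketCounter:
    """Sells a fixed number of tickets to concurrent buyers."""

    def __init__(self, tickets: int) -> None:
        self.tickets = tickets
        self.sold = 0
        self._lock = threading.Lock()

    def buy(self) -> bool:
        # The lock turns check-then-decrement into one isolated step.
        # Concurrent buyers wait here one at a time, which is the queuing
        # (and the added latency) that locking introduces.
        with self._lock:
            if self.tickets == 0:
                return False
            self.tickets -= 1
            self.sold += 1
            return True


counter = TicketCounter(tickets=100)
results: List[bool] = []
results_lock = threading.Lock()


def buyer() -> None:
    ok = counter.buy()
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=buyer) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every purchase passes through the same lock, no ticket is sold twice, but the buyers are processed strictly one after another. The shorter each critical section is, the shorter the queue, which is why the text focuses on speeding up the individual transaction.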

 
