Principles and implementation of flash sales (seckill)

What is a flash sale

 

Flash sales (秒杀, also rendered as "seckill" or "spike") typically occur during promotional events on e-commerce sites, or when people rush to grab tickets on the 12306 website around holidays. For scarce or specially priced goods, an e-commerce site generally sells a limited quantity at an agreed time. Because of the special nature of these goods, they attract a large number of customers, who all rush to buy on the flash-sale page at the agreed moment.

Characteristics of flash-sale scenarios

  • A large number of users try to buy at the same moment, so site traffic surges instantaneously.
  • The number of requests usually far exceeds the inventory, so only a few users succeed.
  • The business flow is relatively simple: generally just placing an order and deducting inventory.

Business analysis of flash sales

  1. Normal e-commerce flow

    (1) query goods; (2) create order; (3) deduct inventory; (4) update order; (5) payment; (6) seller ships

  2. Characteristics of flash-sale business

    (1) low price; (2) heavy promotion; (3) sells out instantly; (4) usually listed at a scheduled time; (5) short duration, high instantaneous concurrency

 

Flash-sale architecture design principles

Rate limiting: since only a small number of users can succeed, most traffic should be restricted, and only a small portion allowed through to the back-end services.

Peak shaving: a flash sale sees a huge influx of users in an instant, so there is a very high peak the moment the sale starts. Peak traffic is a major reason systems get overwhelmed, so turning an instantaneous burst into a smoother flow over a period of time is an important design idea. Common ways to shave peaks are caching and message-queue middleware.

Asynchronous processing: a flash-sale system is a high-concurrency system, and asynchronous processing can greatly increase the concurrency it sustains. Asynchronous processing is in fact one way of implementing peak shaving.

In-memory caching: the biggest bottleneck is generally database reads and writes, because they involve disk I/O and are slow. Moving some data or business logic into an in-memory cache improves efficiency greatly.

Scalability: to support more users and higher concurrency, the system is best designed to scale elastically, so that when traffic arrives, adding machines is enough. Taobao and JD.com, for example, add large numbers of machines to cope with the transaction volume of Double Eleven.

Architecture solution

General flash-sale system architecture

[Figure: general flash-sale system architecture; the image is missing from the source]

 

Technical challenges of flash sales

Suppose a website launches a flash sale for one product and expects 10,000 people to take part, that is, a maximum of 10,000 concurrent requests. The flash-sale system then faces the following technical challenges.

  1. Impact on the site's existing business

    A flash sale is just an extra marketing campaign for the site. It is short-lived and has very high concurrent traffic. If it is deployed together with the site's existing applications, it will inevitably affect the existing business, and a small mistake could bring down the whole site.

    Solution: deploy the flash-sale system independently, even under its own domain name, so that it is completely isolated from the main site.

     

  2. Application and database load under high concurrency

    Before the sale starts, users keep refreshing the page to make sure they do not miss it. If these requests follow the usual web application path (hit the application server, connect to the database), they put heavy load on both the application servers and the database servers.

    Solution: redesign the flash-sale product page instead of reusing the site's existing product detail page, and make the page content static so that user requests do not need to pass through the application service.

     

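  3. Sudden increase in network and server bandwidth

    Assuming the product page is 200 KB (mostly product images), 10,000 simultaneous requests need about 2 GB (200 KB × 10,000) of network and server bandwidth. This extra bandwidth is required only because of the flash sale and exceeds what the site normally uses.

    Solution: the additional bandwidth required by the flash sale has to be purchased or rented from the carrier. To relieve the site's own servers, the flash-sale product page should be cached on a CDN, which likewise means temporarily renting extra egress bandwidth from the CDN provider.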

     

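  4. Direct ordering

    The rule of a flash sale is that orders can only be placed once the sale has started; before then, users can only browse the product information. The order page is an ordinary URL, so anyone who obtains it could place an order without waiting for the sale to start.

    Solution: to prevent users from accessing the order page URL directly, make the URL dynamic, so that not even the developers of the flash-sale system can reach the order page before the sale starts. The approach is to add a server-generated random number as a parameter to the order page URL; the number only becomes available when the sale starts.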

     

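  5. Controlling when the buy button on the product page lights up

    The buy button lights up only when the sale starts; before that it is grayed out. If the page were generated dynamically, the server could of course decide whether the button is lit or gray when it builds the response. But to reduce server load and make better use of CDNs, reverse proxies, and other performance optimizations, the page is designed to be static and is cached on the CDN, on reverse proxy servers, and even in the user's browser. When the sale starts and the user refreshes the page, the request never reaches the application server.

    Solution: control the button with JavaScript. The static product page references a JavaScript file whose flash-sale-started flag is set to "no". When the sale starts, a new JavaScript file is generated (same file name, different content) that sets the flag to "yes" and adds the order page URL with its random-number parameter. Only one random number is generated, so everyone sees the same URL; the server can keep it in a distributed cache such as Redis. The user's browser loads the file, which controls how the product page is displayed. The file is requested with a random version number (for example xx.js?v=32353823) so that it is not cached by the browser, the CDN, or reverse proxy servers.

    The JavaScript file is very small, so even if every browser refresh hits the server that hosts it, this does not put much pressure on the server cluster or network bandwidth.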
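As a rough illustration of the flag file described in this item, the sketch below renders the two versions of the JavaScript file and a cache-busting script URL. The function names, the `flashSale` variable, the `/order` path, and the `token` parameter name are all hypothetical; only the idea (one shared random number, published when the sale starts, plus a random `v=` version) comes from the text.

```python
import json
import secrets


def render_flag_script(started: bool, order_url: str = "", token: str = "") -> str:
    """Return the body of the small JavaScript file that the static page loads."""
    state = {"started": started}
    if started:
        # The order URL and its random parameter are published only once the sale begins.
        state["orderUrl"] = f"{order_url}?token={token}"
    return "var flashSale = " + json.dumps(state) + ";"


def versioned_script_url(path: str) -> str:
    """Append a random version so browsers, CDNs and proxies do not serve a cached copy."""
    return f"{path}?v={secrets.randbelow(10**8)}"


# One random number for the whole sale (everyone sees the same URL).
# The text suggests keeping it in a shared cache such as Redis; a variable stands in here.
sale_token = secrets.token_hex(8)

before = render_flag_script(started=False)
after = render_flag_script(True, "/order", sale_token)
```

The file name stays the same before and after the sale starts; only the content changes, which is why the page requests it with a fresh `v=` value on every load.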

     

  6. Allowing only the first submitted order through to the order subsystem

    Because only one user can ultimately buy the product, the system must check on each submission whether an order has already been submitted. Once an order has been submitted successfully, the JavaScript file is updated so that the started flag is "no" and the buy button is grayed out again. In fact, since only one user can ultimately submit an order, the entrance to the order page can be restricted to reduce load on the order servers: only a few users are allowed onto the order page, and everyone else goes straight to the sale-ended page.

    Solution: suppose the order server cluster has 10 servers and each accepts at most 10 order requests. Before anyone has successfully submitted an order, one server may already have 10 requests while another has handled none. This can produce a poor user experience: a user clicks the buy button and lands on the sale-ended page, refreshes, is routed to a server that has handled no requests yet, and ends up on the order form. Cookies can be used to deal with this and keep each user's result consistent. Alternatively, a least-connections load-balancing algorithm greatly reduces the probability of this situation.

     

  7. Order pre-checks

    Each order server checks the number of order requests it has handled itself:

            1) if the limit of 10 has been exceeded, return the sale-ended page directly to the user;

            2) if not, let the user enter the order form and confirmation page.

    Then check the global number of submitted orders:

            1) if it has reached the total number of flash-sale items, return the sale-ended page to the user;

            2) if it has not, submit the order to the order subsystem.
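The two pre-checks can be sketched roughly as follows. This is a minimal illustration, not the original system: the class and method names are made up, the limit of 10 comes from the example in item 6, and the global count is passed in as a plain argument where a real deployment would read it from a shared store.

```python
import threading

PER_SERVER_LIMIT = 10  # from the example above: each server accepts at most 10 requests


class OrderPreCheck:
    """Hypothetical per-server gate in front of the order page."""

    def __init__(self, total_stock: int, per_server_limit: int = PER_SERVER_LIMIT):
        self._lock = threading.Lock()
        self._local_handled = 0              # requests this server has let through
        self._per_server_limit = per_server_limit
        self._total_stock = total_stock

    def allow_order_page(self) -> bool:
        """Local check: may this user see the order form on this server?"""
        with self._lock:
            if self._local_handled >= self._per_server_limit:
                return False                 # caller shows the sale-ended page
            self._local_handled += 1
            return True

    def allow_submit(self, global_submitted: int) -> bool:
        """Global check: forward to the order subsystem only while stock remains."""
        return global_submitted < self._total_stock


check = OrderPreCheck(total_stock=1)
page_results = [check.allow_order_page() for _ in range(12)]
```

The local check is cheap and needs no network call, so most excess requests are turned away before the shared counter is ever consulted.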

  8. Flash sales are usually listed on a timer

    There are many ways to implement this. A good approach at present is to set the listing time in advance: users can see the product on the storefront but cannot click the "Buy It Now" button. One thing to consider is that someone can bypass the front-end restriction and initiate a purchase directly through the URL, so the front-end product page, the purchase page, and the back-end database all need synchronized clocks. The further back in the stack the control is enforced, the more secure it is.

    With timed listing, sellers should also be prevented from editing the product before the sale in ways that have unexpected effects. Such changes need to be assessed. Editing is generally prohibited; if a change is required, it can go through a data-correction process.

  1. Inventory deduction

    There are two options: deduct inventory when the order is placed, or deduct inventory when payment is made. The approach currently used is to deduct when the order is placed, which gives users the sense that they secured the item the moment they ordered and makes for a better experience.

  2. Deducting inventory brings the "oversold" problem: more units are sold than are in stock

    Because inventory is updated concurrently, stock can keep being decremented even after it is actually insufficient, so the seller ends up selling more units than were offered. Solution: use optimistic locking.

    update auction_auctions set
    quantity = #inQuantity#
    where auction_id = #itemId# and quantity = #dbQuantity#

    Another, better approach is to attempt the deduction directly; the order logic runs only if the deduction succeeds.

    update auction_auctions set
    quantity = quantity-#count#
    where auction_id = #itemId# and quantity >= #count#
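To see the conditional deduction in action, here is a small sketch that runs the same statement against an in-memory SQLite database. SQLite, the `try_deduct` helper, and the starting stock of 3 are assumptions made for the demonstration; the table and column names follow the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auction_auctions (auction_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO auction_auctions VALUES (1, 3)")  # 3 units in stock
conn.commit()


def try_deduct(conn: sqlite3.Connection, item_id: int, count: int) -> bool:
    """Attempt the deduction; report whether a row was actually updated."""
    cur = conn.execute(
        "UPDATE auction_auctions SET quantity = quantity - ? "
        "WHERE auction_id = ? AND quantity >= ?",
        (count, item_id, count),
    )
    conn.commit()
    # rowcount is 0 when the WHERE clause no longer matches (not enough stock).
    return cur.rowcount == 1


results = [try_deduct(conn, 1, 1) for _ in range(5)]
remaining = conn.execute(
    "SELECT quantity FROM auction_auctions WHERE auction_id = 1"
).fetchone()[0]
```

Five attempts against three units give three successes and two failures, and the stock never goes below zero; the caller proceeds to the order logic only when `try_deduct` returns `True`.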

    Dealing with flash-sale bots

    Bots generally place orders extremely quickly, so some of them can be identified from purchase records. They can also be held off with a verification code, which must be secure enough not to be cracked. Methods used include flash-sale-specific CAPTCHAs, verification codes announced on TV, and flash-sale quiz questions.

    Design ideas

    Intercept requests upstream to reduce downstream pressure: a flash-sale system has very high concurrency, but very few requests actually succeed. If requests are not intercepted at the front, they are likely to cause read/write lock contention in the database, or even deadlocks, and requests end up timing out.
    Make full use of caching: caching can greatly improve read and write speed.
    Message queues: a message queue shaves peaks by absorbing a large number of concurrent requests. Processing is asynchronous: the back-end service actively pulls request messages from the queue and handles them at a rate that matches its own capacity.
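A minimal sketch of peak shaving with a queue follows, using Python's standard `queue` module as a stand-in for real messaging middleware. The capacity of 100, the batch size of 10, and the function names are arbitrary example values, not figures from the text.

```python
import queue

# Bounded buffer between the request handlers and the back-end worker.
buffer = queue.Queue(maxsize=100)


def accept(request_id: int) -> bool:
    """Hot path: enqueue and return immediately, or reject when the buffer is full."""
    try:
        buffer.put_nowait(request_id)
        return True
    except queue.Full:
        return False


def drain(batch_size: int) -> list:
    """Back-end worker: pull only as many messages as it can handle right now."""
    batch = []
    while len(batch) < batch_size and not buffer.empty():
        batch.append(buffer.get_nowait())
    return batch


accepted = sum(accept(i) for i in range(1000))  # a burst of 1000 requests
first_batch = drain(10)                         # the worker takes 10 at a time
```

The burst is absorbed or rejected at the edge, while the worker sees a steady stream whose rate it controls itself.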

    Front-end solution

    Browser (JS):

    Static pages: make every element of the event page that can be static static, and keep dynamic elements to a minimum. Use the CDN to absorb the peak.
    No duplicate submissions: gray out the button after the user submits, and forbid resubmission.
    Per-user rate limiting: allow a user to submit only one request within a given period; for example, rate limit by IP.

    Back-end solution

    Server controller layer (gateway layer)

    Limit access frequency per uid (user ID): the measures above block excess requests from the browser, but to handle malicious attacks and plug-ins, the server-side control layer also needs to limit how often each uid can make requests.
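One simple way to limit frequency per uid is a fixed-window counter, sketched below. This is an illustrative choice rather than the article's method; the class name and parameters are hypothetical, the counters live in a local dict (a multi-server gateway would need a shared store such as Redis), and the sketch is not thread-safe.

```python
import time


class UidRateLimiter:
    """Fixed-window limiter: at most max_requests per uid in each window."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                 # injectable so the behaviour can be tested
        self._windows = {}                 # uid -> (window_start, count)

    def allow(self, uid: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(uid, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0          # the old window expired; start a new one
        if count >= self.max_requests:
            self._windows[uid] = (start, count)
            return False
        self._windows[uid] = (start, count + 1)
        return True


fake_now = [0.0]
limiter = UidRateLimiter(max_requests=2, window_seconds=1.0, clock=lambda: fake_now[0])
burst = [limiter.allow("u1") for _ in range(3)]
```

Each uid gets its own counter, so one aggressive client cannot use up another user's allowance.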

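      nginx rate-limiting modules

      ngx_http_limit_conn_module

Connection-limiting module.

Usually used to limit the number of concurrent connections from a single IP address.

Directive reference: http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html

Note that the key should be $binary_remote_addr rather than $remote_addr. $remote_addr is 7 to 15 bytes long and its stored state takes 32 or 64 bytes, whereas $binary_remote_addr is 4 bytes long (for an IPv4 address) and its stored state takes 32 bytes. With $binary_remote_addr, a 1M zone can therefore hold about 32,000 states.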

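      ngx_http_limit_req_module

Request-limiting module.

Usually used to limit the number of requests a single IP address can complete per unit of time. It uses the leaky bucket algorithm: a fixed number of requests is processed per second and excess requests are delayed; once the bucket's threshold is exceeded, further requests are terminated immediately with a 503.

Directive reference: http://nginx.org/en/docs/http/ngx_http_limit_req_module.html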
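The leaky bucket idea can be modelled in a few lines. The sketch below is a simplified illustration of the algorithm as described above, not nginx's actual implementation; the class name, the `offer` method, and the use of `None` to stand for a 503 are assumptions made for the example.

```python
class LeakyBucket:
    """Requests drain at a fixed rate; excess waits, overflow is rejected."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst        # how many excess requests may wait in the bucket
        self.level = 0.0          # requests currently in the bucket
        self.last = None          # time of the previous call

    def offer(self, now: float):
        """Return the delay in seconds before the request is served, or None to reject."""
        if self.last is not None:
            # Leak: the bucket drains at the fixed rate while time passes.
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level >= self.burst + 1:
            return None           # bucket full: the caller would answer 503
        delay = self.level / self.rate
        self.level += 1
        return delay


bucket = LeakyBucket(rate_per_sec=1.0, burst=2)
delays = [bucket.offer(0.0) for _ in range(4)]   # four requests at the same instant
later = bucket.offer(1.0)                        # one more request a second later
```

With a rate of one request per second and a burst of two, the first request is served at once, the next two are delayed by one and two seconds, and the fourth is rejected.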

      ngx_http_limit_req_module in Tengine (an nginx fork)

Similar to the nginx module, but it supports multiple variables, as well as multiple limit_req_zone settings and a forbid_action setting.

Directive reference: http://tengine.taobao.org/document_cn/http_limit_req_cn.html

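      ngx_http_limit_req_module in SEnginx (an nginx fork)

Directive reference: http://www.senginx.org/cn/index.php/%E5%9F%BA%E4%BA%8E%E6%9D%A1%E4%BB%B6%E7%9A%84%E9%99%90%E9%80%9F%E5%8A%9F%E8%83%BD

Known as condition-based rate limiting. It builds on Tengine's limit_req module and adds a condition parameter; the limiting action is applied only when the condition is true.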

      ngx_http_ip_behavior in SEnginx (an nginx fork)

Directive reference: http://www.senginx.org/cn/index.php/%E8%AE%BF%E9%97%AE%E8%A1%8C%E4%B8%BA%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9D%97

Known as the behavior identification module; its role is to monitor how users access the site.

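      ngx_http_robot_mitigation in SEnginx (an nginx fork)

Directive reference: http://www.senginx.org/cn/index.php/Robot_Mitigation%E6%A8%A1%E5%9D%97

Known as HTTP robot mitigation. The Robot Mitigation module uses a challenge-based verification method: it sends the client a specific response that a browser can parse. If the client is a real browser, it re-issues the request carrying a specific cookie value, and the module uses the information in that cookie to decide whether to let the request through.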

    Service layer

The measures above intercept only part of the traffic. When the number of participating users is very large, even one request per user still adds up to a very large number of requests at the service layer. For example, if 1,000,000 users try to grab 100 phones at the same time, the service layer faces at least 1,000,000 concurrent requests.

      1. Initialize the flash-sale product information and inventory into the Redis cache.

      2. Check that the request is legitimate (for example, whether the user is logged in). If it is not, return an error code directly to the front end along with an appropriate message.

      3. Check the in-memory flag (true means the sale has ended, false means it has not). If the flag is true, return "sale ended" directly.

      4. Pre-deduct stock with a Redis DECR. If the stock after the DECR is less than 0, set the in-memory flag to true (sale ended) and return "sale ended".

      5. Check whether this is a repeat purchase, to prevent the same user from buying twice. If it is, return the repeat-purchase error code directly.

      6. Send the flash-sale message to MQ for the consuming service to process, and return "queued" to the client. When the client receives "queued", it automatically polls for the result until it gets either success or failure.

      7. Business processing on the consumer side: this is where the flash sale is really handled. Verify again (for example, whether the sale has ended and whether stock is sufficient), store the user id and product id in Redis as a key marking that the user has successfully bought the product (used by step 5 above), deduct inventory, generate the flash-sale order, and return success.

           Note: even if a request reaches this real business processing, the purchase can still fail, for example because the sale has ended, stock is insufficient, the real inventory deduction fails, or order generation fails. On any such failure, return "sale ended".
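The request path in steps 2 to 7 can be sketched as below. This is a simplified, single-process illustration: a lock-protected integer stands in for the Redis DECR, a Python set for the Redis success keys, and `queue.Queue` for the MQ; the class, method names, and return strings are hypothetical, and the real inventory deduction and order creation in step 7 are left out.

```python
import queue
import threading


class FlashSaleService:
    def __init__(self, stock: int):
        self._lock = threading.Lock()   # stands in for the atomicity of Redis DECR
        self._stock = stock             # step 1: stock preloaded into the cache
        self._sold_out = False          # step 3: in-memory "sale ended" flag
        self._winners = set()           # success keys written in step 7
        self.mq = queue.Queue()         # stands in for the message queue

    def request(self, user_id):
        if user_id is None:             # step 2: reject requests that are not logged in
            return "NOT_LOGGED_IN"
        if self._sold_out:              # step 3: cheap in-memory short circuit
            return "SALE_ENDED"
        with self._lock:                # step 4: pre-deduct stock
            self._stock -= 1
            remaining = self._stock
        if remaining < 0:
            self._sold_out = True
            return "SALE_ENDED"
        if user_id in self._winners:    # step 5: repeat purchase (same order as the list above)
            return "REPEATED"
        self.mq.put(user_id)            # step 6: hand off and return immediately
        return "QUEUED"

    def consume_one(self):
        """Step 7, reduced to the success marker; real order creation is omitted."""
        user_id = self.mq.get_nowait()
        if user_id in self._winners:
            return "FAILED"
        self._winners.add(user_id)
        return "SUCCESS"


svc = FlashSaleService(stock=2)
responses = [svc.request(u) for u in (None, "a", "b", "c", "d")]
outcomes = [svc.consume_one(), svc.consume_one()]
```

Once the pre-deducted stock drops below zero, every later request is answered from the in-memory flag without touching the cache or the queue.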

Optimizations:

           Hide the flash-sale endpoint: when the user clicks the flash-sale button, the server generates a unique encrypted string based on the user id, caches it, and returns it to the client. The client then includes this string in its flash-sale request, and the back end checks whether it is valid; if it is not, the request is rejected as illegal.

           Limit endpoint access frequency: this can be implemented with an interceptor plus a custom annotation, which keeps the limit separate from the business logic, is less invasive, and is easy to use.
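The hidden-endpoint idea might look roughly like the following. The text does not say how the string is generated, so the scheme here (a SHA-256 hash over a random UUID plus the user and item ids) is an assumption, as are the function names; a dict stands in for the shared cache.

```python
import hashlib
import hmac
import uuid

_token_cache = {}  # stands in for a shared cache such as Redis


def issue_token(user_id: str, item_id: str) -> str:
    """Generate a per-user string when the button is clicked, and cache it."""
    raw = f"{uuid.uuid4()}:{user_id}:{item_id}"
    token = hashlib.sha256(raw.encode()).hexdigest()
    _token_cache[(user_id, item_id)] = token
    return token


def check_token(user_id: str, item_id: str, token: str) -> bool:
    """Validate the string that the client sends with its flash-sale request."""
    expected = _token_cache.get((user_id, item_id))
    # compare_digest avoids leaking the match position through timing.
    return expected is not None and hmac.compare_digest(expected, token)


token = issue_token("u1", "item-1")
```

Because the string is unpredictable and tied to one user and one item, a script cannot call the purchase endpoint directly without first going through the button click.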
 

Database layer

The database layer is the most fragile layer. Application design should generally intercept requests upstream so that the database only bears requests within its capacity. Introducing queues and caches at the service layer, as described above, keeps the database underneath safe.

To prevent negative stock, where the number of flash-sale orders exceeds the real inventory, the real inventory update should include the condition where stock > 0, and the flash-sale orders table should have a unique composite index on user id and product id.
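Both database safeguards can be tried out with an in-memory SQLite database, as in the sketch below. SQLite, the table and column names, and the `place_order` helper are assumptions for the demonstration; the `stock > 0` condition and the unique composite index are the parts taken from the text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE goods (id INTEGER PRIMARY KEY, stock INTEGER);
CREATE TABLE flash_orders (user_id INTEGER, goods_id INTEGER);
CREATE UNIQUE INDEX uniq_user_goods ON flash_orders (user_id, goods_id);
INSERT INTO goods VALUES (1, 2);
""")


def place_order(user_id: int, goods_id: int) -> bool:
    try:
        with db:  # one transaction: commits on success, rolls back on an exception
            cur = db.execute(
                "UPDATE goods SET stock = stock - 1 WHERE id = ? AND stock > 0",
                (goods_id,),
            )
            if cur.rowcount == 0:
                return False  # no stock left
            db.execute("INSERT INTO flash_orders VALUES (?, ?)", (user_id, goods_id))
        return True
    except sqlite3.IntegrityError:
        # Duplicate (user_id, goods_id): the stock deduction was rolled back too.
        return False


first = place_order(1, 1)
duplicate = place_order(1, 1)
stock_after_duplicate = db.execute("SELECT stock FROM goods WHERE id = 1").fetchone()[0]
```

Running the deduction and the insert in one transaction means a rejected duplicate order does not consume stock.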
 


Origin blog.csdn.net/qq_39399966/article/details/105007927