Design and Implementation of a Flash-Sale (Spike) System

Problem Description

How to design and implement a flash-sale / panic-buying system

As the saying goes, ten minutes on stage takes ten years of practice off stage. A flash-sale system is even more striking: the instantaneous traffic peak may last only two or three minutes, yet it demands a great deal of preparation. You need to assess capacity properly, confirm that bandwidth is ready, check that front-end and back-end interception is adequate, decide whether requests should be queued, and so on.

Design Challenges

Instantaneous peak

The instantaneous peak challenges server bandwidth

At the moment the sale opens, bandwidth demand may be several times the usual level, and the link can be saturated instantly.

The instantaneous peak challenges application server resources

When traffic multiplies, a back-end architecture that was not designed for it will collapse. Flash-sale workloads can avalanche within two or three minutes, and traffic drops off a cliff once the event ends. Without good upfront design, it is almost impossible to come up with an effective emergency plan during the peak itself. In addition, without good isolation between services, the flash sale will also affect other business services.

The instantaneous peak challenges DB load capacity

If a large number of requests reach the DB, with massive reads and writes against a single inventory row and the resulting write conflicts, there will be heavy locking and waiting; the next step is a crawl-speed response followed by collapse.

Thinking

The earlier you intercept, the lower the cost and the higher the throughput

Abstracted simply, a common application architecture looks like this:

Interface/application layer (app / browser, etc.) -> service layer -> storage layer

The core idea is to intercept as many requests as possible before they reach the storage layer; the deeper a request travels, the higher the hardware and architecture cost. Building and operating a DB cluster that supports one million concurrent transactions is much harder than building a static HTTP server cluster that supports one million concurrent reads.

How to intercept

Interface/application layer (app / browser client, etc.)

Gray out the button to prevent duplicate clicks

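//####
// Prevent duplicate clicks within a single page
//####
if button_is_clicked
    return

button_is_clicked = yes
post_data()

//####
// Check whether the user clicked within the last 5 seconds,
// to prevent duplicate clicks across multiple pages
//####
if clicked_within_5_seconds
    return

// Stored locally so it is shared across pages, e.g. in a cookie
clicked_within_5_seconds = yes
post_data()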

Stagger submissions to shift the peak

Once the sale begins, to avoid a flood of traffic hitting the server within a very short time, the client can require the user to complete a small task before the buy button can be clicked, which spreads out the load. For example:

  1. Solve a simple math problem
  2. Enter a verification code
  3. Type a string of Chinese characters
  4. Answer a playful quiz question

Display the purchase result asynchronously

Keep submitting the purchase and displaying its result separate; avoid designing them as one synchronous flow in the product. This buys the server breathing room and lets it respond to the user immediately, which improves the experience.

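Synchronous flow:

Click buy -> wait synchronously for the server result -> display the result


Asynchronous flow:

Click buy -> server responds 201 -> display an "order in progress" page

-> pull the result asynchronously after [a few seconds] -> display the result
The [a few seconds] here leaves plenty of room for design.

For example, 50% of users pull the result from the server after 5 seconds,

and the other 50% after 10 seconds. This also staggers the peak.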
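
The 50/50 split described above can be sketched in a few lines. This is a minimal Python sketch, not a definitive implementation: the helper name `poll_delay_seconds` and the use of a SHA-256 hash of the user id to pick a bucket are assumptions made for illustration. The server could, for instance, return this delay together with its 201 response.

```python
import hashlib

# Candidate delays taken from the example above: about half of the users
# wait 5 seconds, the other half wait 10 seconds.
POLL_DELAYS = (5, 10)

def poll_delay_seconds(user_id: str) -> int:
    """Map a user id to a stable polling delay (hypothetical helper)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # The first digest byte picks a bucket deterministically, so the same
    # user always gets the same delay.
    return POLL_DELAYS[digest[0] % len(POLL_DELAYS)]
```

Hashing rather than picking at random keeps the delay stable if the same user reloads the page.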

Service Layer

Deduplicate by user_id to block scripted bulk requests

Front-end protection is an important part: it effectively stops ordinary users, but it is also the least controllable link, and there are many ways to bypass it. The server therefore needs to deduplicate and intercept requests based on a unique identifier for each user. The pseudocode is similar to the client's:

//####
// Check whether this user clicked within the last 5 seconds
//####
if user_clicked_within_5_seconds
    return
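
One way to make the server-side check concrete is shown below. This is a minimal single-process Python sketch; the class name `UserThrottle` and the injectable `clock` parameter are assumptions for illustration. With several application servers, the last-seen timestamps would have to live in a shared store instead of a local dict.

```python
import time

class UserThrottle:
    """Reject a request if the same user_id was accepted within `window` seconds."""

    def __init__(self, window: float = 5.0, clock=time.monotonic):
        self.window = window
        self.clock = clock          # injectable so the logic can be tested
        self._last_seen = {}        # user_id -> time of the last accepted request

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        last = self._last_seen.get(user_id)
        if last is not None and now - last < self.window:
            return False            # clicked within the window: drop the request
        self._last_seen[user_id] = now
        return True
```

A request handler would call `allow(user_id)` first and return early when it is `False`, mirroring the pseudocode above.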

Limit the number of concurrent connections

Limiting the number of concurrent connections per IP will, with some probability, block legitimate users. In practice you usually leave some headroom to strike a balance between these false positives and effective blocking.
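
A per-IP cap can be sketched as a counter guarded by a lock. This is a minimal Python sketch under stated assumptions: the class name `IpConnectionLimiter` and the `max_per_ip` parameter are hypothetical, and in production this kind of limit is often enforced at the gateway rather than in application code.

```python
import threading
from collections import defaultdict

class IpConnectionLimiter:
    """Track active connections per IP and refuse new ones above a cap."""

    def __init__(self, max_per_ip: int):
        self.max_per_ip = max_per_ip
        self._active = defaultdict(int)   # ip -> number of open connections
        self._lock = threading.Lock()

    def acquire(self, ip: str) -> bool:
        """Call when a connection opens; False means it should be rejected."""
        with self._lock:
            if self._active[ip] >= self.max_per_ip:
                return False
            self._active[ip] += 1
            return True

    def release(self, ip: str) -> None:
        """Call when a connection that was acquired closes."""
        with self._lock:
            if self._active[ip] > 0:
                self._active[ip] -= 1
            if self._active[ip] == 0:
                del self._active[ip]      # keep the table from growing forever
```

Because many real users can share one public IP, `max_per_ip` should sit above what a single user needs; that gap is the headroom mentioned above.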

Meter requests through an MQ to shield the DB

  1. Maintain a request counter and let only slightly more requests than the actual stock through to the MQ; respond to all remaining requests that the item is sold out.

  2. Use a number of workers to consume the queue and update the inventory.


// Admit requests until the count reaches 130% of the stock
if mq_count < quantity * 1.3
    push_to_mq()
    mq_count++
    return 201

response('Sold out')

// Workers update the inventory
while mq_has_data
    // Optimistic concurrency lock
    update_quantity_where_version_is_1()
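
The two steps above can be sketched with an in-process queue standing in for the MQ. This is a minimal Python sketch under stated assumptions: `AdmissionGate`, `drain`, and the integer `percent` parameter (130, matching the 130% in the pseudocode) are hypothetical names, and the inventory update is simplified to an in-memory counter where the pseudocode uses an optimistically locked DB update.

```python
import queue
import threading

class AdmissionGate:
    """Admit at most stock * percent / 100 requests into the queue."""

    def __init__(self, stock: int, percent: int = 130):
        self.limit = stock * percent // 100   # integer math avoids float rounding
        self.admitted = 0
        self.queue = queue.Queue()            # stand-in for a real MQ
        self._lock = threading.Lock()

    def submit(self, user_id: str) -> bool:
        """True: queued (respond 201). False: respond 'sold out'."""
        with self._lock:
            if self.admitted >= self.limit:
                return False
            self.admitted += 1
        self.queue.put(user_id)
        return True

def drain(gate: AdmissionGate, stock: int) -> list:
    """Worker loop: consume queued requests and hand out stock until it runs out."""
    winners = []
    while True:
        try:
            user_id = gate.queue.get_nowait()
        except queue.Empty:
            break
        if stock > 0:
            stock -= 1
            winners.append(user_id)
    return winners
```

With a stock of 10, the gate queues 13 requests and rejects the rest immediately, so the workers and the DB behind them only ever see 13 candidates.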

Use of caching

Flash sales are read-heavy and write-light under high concurrency, so sensible use of caching is especially critical.

1. Client-side cache: put static resources on the CDN so that as few requests as possible go back to the origin.

2. Server-side cache

(1) Hot read data must be warmed up in advance and placed in redis / memcache, or even a local cache.

(2) For excess requests, avoid assembling the response at runtime; the gateway can serve a response directly from a local cache (e.g. Nginx-Lua).
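
The local-cache idea can be illustrated with a small time-to-live cache. This is a minimal Python sketch, assuming a hypothetical `LocalTTLCache` class and a caller-supplied `loader` function; it is not thread-safe and only shows the read path, not how a gateway such as Nginx-Lua would do it.

```python
import time

class LocalTTLCache:
    """Serve hot values from process memory and reload them after `ttl` seconds."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock     # injectable so expiry can be tested
        self._store = {}       # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # still fresh: no backend call
        value = loader()                       # miss or expired: go to the backend
        self._store[key] = (value, now + self.ttl)
        return value
```

Even a TTL of one second means the backend sees at most about one load per key per second per process, regardless of how many requests arrive.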

Common problems

How to open the sale (enable the buy button)

1. Client-side timer: when the time arrives, the front end enables the grayed-out button.

Advantages: simple

Disadvantages: client-side restrictions are easily bypassed

2. A dedicated timing server keeps the countdown; when it completes, it pushes the result to every server and the sale begins.

Advantages: simple; each worker only needs to listen for the pushed result.

Disadvantages: the timing server is a single point of failure, and the push takes time to reach every server, so some requests may land on servers where the sale has already begun while others land on servers where it has not. The load on each machine also spikes the instant the sale begins.

3. Store a key in Redis whose TTL expires at the sale start time; each server checks whether the key has expired to decide whether the sale has begun.

Recommended: a highly available Redis cluster can easily handle 100,000+ concurrent reads.
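
The TTL-key approach can be sketched as follows. This is a minimal Python sketch under stated assumptions: the key name and both function names are hypothetical, and `client` is any object exposing `set(name, value, ex=seconds)` and `exists(name)` in the style of the redis-py client, so the logic can be tried without a running Redis.

```python
import time

NOT_STARTED_KEY = "flashsale:not_started"   # hypothetical key name

def schedule_start(client, start_epoch: float, now: float = None) -> None:
    """Create a marker key that expires when the sale should open."""
    now = time.time() if now is None else now
    ttl = int(start_epoch - now)            # whole seconds until the start
    if ttl > 0:
        client.set(NOT_STARTED_KEY, "1", ex=ttl)

def sale_has_started(client) -> bool:
    """The sale is open once the marker key no longer exists."""
    return client.exists(NOT_STARTED_KEY) == 0
```

One consequence of this design is that a missing key means "started", so the marker has to be written before the page goes live and must survive any Redis failover.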

How to provide a fallback

If the outermost LB is hardware-based, set an overload threshold for it. Once the threshold is exceeded, either drop the connections directly or route them to a static file on the CDN that says something like "sold out".

Overselling

Use locking. After the interception described above, very few requests actually reach the DB layer, so simply add an optimistic concurrency lock.
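
An optimistic lock on the inventory row can be sketched with a version column. This is a minimal Python sketch using the standard `sqlite3` module; the `inventory` table with `id`, `quantity`, and `version` columns and the function name `try_purchase` are assumptions for illustration, not the original system's schema.

```python
import sqlite3

def try_purchase(conn: sqlite3.Connection, item_id: int) -> bool:
    """Decrement stock by one using an optimistic version check; True on success."""
    row = conn.execute(
        "SELECT quantity, version FROM inventory WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None or row[0] <= 0:
        return False                         # unknown item or already sold out
    version = row[1]
    # The UPDATE only matches if nobody changed the row since we read it,
    # and the quantity guard makes a negative stock impossible.
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - 1, version = version + 1 "
        "WHERE id = ? AND version = ? AND quantity > 0",
        (item_id, version),
    )
    conn.commit()
    return cur.rowcount == 1                 # 0 rows means we lost the race
```

A caller that gets `False` because of a lost race can retry a bounded number of times or simply report failure; either way the stock never drops below zero.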

Bandwidth / server expansion

Capacity must be assessed before the event, and the flash-sale system should be deployed independently of other application servers. Pay-as-you-go servers from providers such as Alibaba Cloud / Tencent Cloud are a good choice; after the event ends, synchronize the data back to your own servers.

Bot account problem

This is the most vexing problem. Professional bonus hunters generally control a large number of accounts. Viewed along any single dimension they often look like normal users, so they are very hard to detect. Still, there are some ways to guard against them:

1. IP risk assessment

2. Real-name authentication

3. Risk assessment based on an account's past transactions, filtering out high-risk accounts

4. If you depend on a third-party platform, you can use that platform's own risk-control features, such as Tencent Tianyu: https://cloud.tencent.com/document/api/295/1774

Origin www.cnblogs.com/WuYiStudio/p/11012648.html